ks.test {stats} R Documentation

## Kolmogorov-Smirnov Tests

### Description

Performs one or two sample Kolmogorov-Smirnov tests.

### Usage

```
ks.test(x, y, ...,
        alternative = c("two.sided", "less", "greater"),
        exact = NULL)
```

### Arguments

| Argument | Description |
| --- | --- |
| `x` | a numeric vector of data values. |
| `y` | either a numeric vector of data values, or a character string naming a distribution function. |
| `...` | parameters of the distribution specified (as a character string) by `y`. |
| `alternative` | indicates the alternative hypothesis and must be one of `"two.sided"` (default), `"less"`, or `"greater"`. You can specify just the initial letter of the value, but the argument name must be given in full. See Details for the meanings of the possible values. |
| `exact` | `NULL` or a logical indicating whether an exact p-value should be computed. See Details for the meaning of `NULL`. Not used for the one-sided two-sample case. |

### Details

If `y` is numeric, a two-sample test of the null hypothesis that `x` and `y` were drawn from the same continuous distribution is performed.

Alternatively, `y` can be a character string naming a continuous distribution function. In this case, a one-sample test is carried out of the null hypothesis that the distribution function which generated `x` is distribution `y` with parameters specified by `...`.
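A minimal sketch of the two calling forms follows. The simulated data, the seed, and the names `a`, `b`, `two_sample` and `one_sample` are illustrative assumptions rather than part of this page. A numeric `y` gives the two-sample test, while a character `y` gives the one-sample test with the distribution parameters passed through `...`.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
a <- rnorm(40)                   # sample from N(0, 1)
b <- rnorm(40, mean = 2)         # sample from N(2, 1)

# Two-sample test: y is a numeric vector.
two_sample <- ks.test(a, b)

# One-sample test: y names a distribution function; mean and sd
# are passed through '...' to pnorm().
one_sample <- ks.test(a, "pnorm", mean = 0, sd = 1)

two_sample$p.value
one_sample$p.value
```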

The presence of ties generates a warning, since continuous distributions do not generate them.
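As a small illustration, the sketch below feeds a one-sample test a vector with a repeated value and captures the resulting warning instead of printing it. The data and the names `x_tied` and `msg` are made up for this example, and the exact wording of the warning may vary between R versions.

```r
# Hypothetical data containing a tie (0.5 appears twice).
x_tied <- c(0.2, 0.5, 0.5, 1.1, 1.7, 2.3)

# Capture the warning message; NA is returned if no warning is raised.
msg <- tryCatch(
  {
    ks.test(x_tied, "pnorm", mean = 1)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
msg
```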

The possible values `"two.sided"`, `"less"` and `"greater"` of `alternative` specify the null hypothesis that the true distribution function of `x` is equal to, not less than or not greater than the hypothesized distribution function (one-sample case) or the distribution function of `y` (two-sample case), respectively. This is a comparison of cumulative distribution functions, and the test statistic is the maximum difference in value, with the statistic in the `"greater"` alternative being D^+ = max_u [ F_x(u) - F_y(u) ]. Thus in the two-sample case `alternative="greater"` includes distributions for which `x` is stochastically smaller than `y` (the CDF of `x` lies above and hence to the left of that for `y`), in contrast to `t.test` or `wilcox.test`.
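To make the direction of `"greater"` concrete, the following sketch computes D^+ by hand from the two empirical distribution functions and compares it with the statistic returned by `ks.test`. The simulated samples and the names `Fx`, `Fy`, `u` and `d_plus` are assumptions for illustration. Here `x` is stochastically smaller than `y`, so the `"greater"` alternative is the one expected to yield the small p-value.

```r
set.seed(42)                     # arbitrary seed
x <- rnorm(30)                   # centred at 0
y <- rnorm(30, mean = 2)         # centred at 2, so x tends to be smaller

Fx <- ecdf(x)                    # empirical CDF of x
Fy <- ecdf(y)                    # empirical CDF of y
u  <- sort(c(x, y))              # the ECDFs only change at observed points

# D^+ = max_u [ F_x(u) - F_y(u) ], evaluated at the pooled sample.
d_plus <- max(Fx(u) - Fy(u))

res_greater <- ks.test(x, y, alternative = "greater")
res_less    <- ks.test(x, y, alternative = "less")

all.equal(unname(res_greater$statistic), d_plus)
c(greater = res_greater$p.value, less = res_less$p.value)
```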

Exact p-values are not available for the one-sided two-sample case, or in the case of ties. If `exact = NULL` (the default), an exact p-value is computed if the sample size is less than 100 in the one-sample case, and if the product of the sample sizes is less than 10000 in the two-sample case. Otherwise, asymptotic distributions are used whose approximations may be inaccurate in small samples. In the one-sample two-sided case, exact p-values are obtained as described in Marsaglia, Tsang & Wang (2003). The formula of Birnbaum & Tingey (1951) is used for the one-sample one-sided case.
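The effect of `exact` can be checked with a short sketch such as the one below, which assumes a tie-free simulated sample of size 20 (below the one-sample threshold of 100). The names `p_default`, `p_exact` and `p_asymp` are illustrative. With such a sample the default should agree with `exact = TRUE`, while `exact = FALSE` uses the asymptotic distribution.

```r
set.seed(7)                      # arbitrary seed
x <- rnorm(20)                   # small, tie-free sample

p_default <- ks.test(x, "pnorm")$p.value                 # exact = NULL
p_exact   <- ks.test(x, "pnorm", exact = TRUE)$p.value   # force exact
p_asymp   <- ks.test(x, "pnorm", exact = FALSE)$p.value  # force asymptotic

c(default = p_default, exact = p_exact, asymptotic = p_asymp)
```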

If a single-sample test is used, the parameters specified in `...` must be pre-specified and not estimated from the data. There is some more refined distribution theory for the KS test with estimated parameters (see Durbin, 1973), but that is not implemented in `ks.test`.
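Why the parameters must be pre-specified can be illustrated by a rough simulation, sketched here under assumed settings (500 replications, samples of size 50 from a normal distribution with mean 5 and standard deviation 2; all names are hypothetical). When the null hypothesis is true and the parameters are fixed in advance, about 5% of tests should reject at the 0.05 level. Plugging in estimates from the same data typically makes the test reject far less often, so the reported p-values are not valid.

```r
set.seed(123)                    # arbitrary seed
n_rep <- 500                     # number of simulated data sets
n     <- 50                      # sample size per data set

p_fixed <- numeric(n_rep)        # p-values with pre-specified parameters
p_est   <- numeric(n_rep)        # p-values with parameters estimated from z

for (i in seq_len(n_rep)) {
  z <- rnorm(n, mean = 5, sd = 2)
  p_fixed[i] <- ks.test(z, "pnorm", mean = 5, sd = 2)$p.value
  p_est[i]   <- ks.test(z, "pnorm", mean = mean(z), sd = sd(z))$p.value
}

# Proportion of rejections at the 0.05 level under a true null.
c(fixed = mean(p_fixed < 0.05), estimated = mean(p_est < 0.05))
```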

### Value

A list with class `"htest"` containing the following components:

| Component | Description |
| --- | --- |
| `statistic` | the value of the test statistic. |
| `p.value` | the p-value of the test. |
| `alternative` | a character string describing the alternative hypothesis. |
| `method` | a character string indicating what type of test was performed. |
| `data.name` | a character string giving the name(s) of the data. |
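The components can be extracted from the returned object with `$`, as in this brief sketch (the simulated data and the name `res` are assumptions for illustration; newer versions of R may return additional components).

```r
set.seed(3)                      # arbitrary seed
x <- rnorm(25)
res <- ks.test(x, "pnorm")

class(res)                       # "htest"
res$statistic                    # named numeric, the test statistic
res$p.value                      # numeric p-value
res$alternative                  # character description of the alternative
res$method                       # character description of the test
res$data.name                    # character, here "x"
```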

### References

Z. W. Birnbaum & Fred H. Tingey (1951), One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics, 22/4, 592–596.

William J. Conover (1971), Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295–301 (one-sample “Kolmogorov” test), 309–314 (two-sample “Smirnov” test).

Durbin, J. (1973) Distribution theory for tests based on the sample distribution function. SIAM.

George Marsaglia, Wai Wan Tsang & Jingbo Wang (2003), Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8/18. http://www.jstatsoft.org/v08/i18/.

### See Also

`shapiro.test`, which performs the Shapiro-Wilk test for normality.

### Examples

```
x <- rnorm(50)
y <- runif(30)
# Do x and y come from the same distribution?
ks.test(x, y)
# Does x come from a shifted gamma distribution with shape 3 and rate 2?
ks.test(x+2, "pgamma", 3, 2) # two-sided, exact
ks.test(x+2, "pgamma", 3, 2, exact = FALSE)
ks.test(x+2, "pgamma", 3, 2, alternative = "gr")

# test if x is stochastically larger than x2
x2 <- rnorm(50, -1)
plot(ecdf(x), xlim=range(c(x, x2)))