This function calculates a variety of statistics to characterize the discrepancy between 1D data and a model for the distribution from which the data were drawn. Each statistic is based on the empirical distribution function (i.e. the cdf of the data). Such statistics can be used to evaluate whether a model distribution is consistent with the data.
In what follows, Fn(x) is the empirical distribution function (edf) and F(x) is the model cdf. Currently, four statistics are implemented:
1) The Kolmogorov-Smirnov statistic: max(|F(x) - Fn(x)|)
2) An Anderson-Darling style statistic: Mean( (Fn(x) - F(x))^2 / (F(x) * (1 - F(x))) )
3) The Kuiper statistic: max(F(x) - Fn(x)) + max(Fn(x) - F(x))
4) The mean absolute deviation: Mean( |Fn(x) - F(x)| )
Note that statistic 2 is designed to be more sensitive than the KS statistic to discrepancies at low and high values of x.
Note also that the Kuiper statistic is meant to be used for values of x wrapped onto a circle. See Numerical Recipes.
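The four formulas above can be sketched in a few lines. The following is a Python illustration, not the IDL implementation of edf_stats: the names edf_stats_sketch and uniform_cdf are hypothetical, the edf convention Fn(x_i) = i / n at the i-th sorted data point is an assumption (the edf routine may use a different one), and all statistics are evaluated only at the data points.

```python
import numpy as np

def edf_stats_sketch(data, model, **extra):
    """Return the four EDF statistics as a dict (illustrative sketch only)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    # Assumed edf convention: Fn(x_i) = i / n at the i-th sorted point.
    fn = np.arange(1, n + 1) / n
    # Model cdf F(x); extra keywords are forwarded to the model function.
    f = model(x, **extra)
    diff = fn - f
    return {
        "ks": np.max(np.abs(diff)),                # max(|F - Fn|)
        "ad": np.mean(diff**2 / (f * (1.0 - f))),  # undefined where F is 0 or 1
        "ky": np.max(-diff) + np.max(diff),        # max(F - Fn) + max(Fn - F)
        "mad": np.mean(np.abs(diff)),              # Mean(|Fn - F|)
    }

def uniform_cdf(x, lo=0.0, hi=1.0):
    """Hypothetical model: cdf of a uniform distribution on [lo, hi]."""
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

stats = edf_stats_sketch([0.1, 0.4, 0.7, 0.9], uniform_cdf)
print(stats)
```

Because statistic 2 divides by F(x) * (1 - F(x)), data points where the model cdf is exactly 0 or 1 would need special handling in a sketch like this one.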
If only one of ks, ad, ky, or mad is set, then the return value is that particular statistic. Otherwise, the KS statistic is returned.
SEE ALSO: edf
MODIFICATION HISTORY
June 2009: Written by Chris Beaumont
July 2009: Added mad statistic
- data in required
A vector of data values
- model in required
The string name of a function which calculates the cdf of the model distribution. The function must have the calling sequence result = model(x, _extra = extra), and must be written to handle x as a scalar or vector. Extra keywords to edf_stats will be passed to this function.
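As a rough Python analog of this calling sequence (not IDL code), the sketch below shows a model cdf that accepts a scalar or a vector and receives forwarded keywords. The names exponential_cdf and call_model are hypothetical, and a Python callable stands in for the string function name that the IDL routine expects.

```python
import numpy as np

def exponential_cdf(x, scale=1.0):
    """Cdf of an exponential distribution; accepts a scalar or an array."""
    x = np.asarray(x, dtype=float)
    # Clamp negative inputs to zero so the cdf is 0 there and exp never overflows.
    return 1.0 - np.exp(-np.maximum(x, 0.0) / scale)

def call_model(model, x, **extra):
    # Analog of result = model(x, _extra = extra): keywords pass straight through.
    return model(x, **extra)

scalar_result = call_model(exponential_cdf, 2.0, scale=2.0)
vector_result = call_model(exponential_cdf, [0.0, 2.0, 4.0], scale=2.0)
print(scalar_result, vector_result)
```

The point of the pattern is that the caller needs no knowledge of the model's parameters: any keyword it does not recognize is handed to the model unchanged.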
- ks in optional
If nonzero or set to a named variable, the KS statistic is calculated and returned in that variable.
- ad in optional
Same as above, for the Anderson-Darling statistic.
- ky in optional
Same as above, for the Kuiper statistic.
- mad in optional
Same as above, for the mean absolute deviation.
Modification date: Mon Mar 22 16:17:13 2010