To get a summary of the most basic statistics in R, you may call the summary() function on your data, for example summary(dataset$variable).
The typical output of the summary() function includes:
Min., 1st Qu., Median, Mean, 3rd Qu., Max.
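As a minimal sketch of what that output looks like, the snippet below calls summary() on a small numeric vector. The vector `x` is an illustrative assumption; any numeric vector or data frame column should behave the same way.

```r
# Illustrative data (an assumption, not a real data set)
x <- c(2, 6, 6, 12, 17, 25, 32)

# summary() returns the six values listed above as a named object
s <- summary(x)
print(s)

# Individual entries can be pulled out by name
s[["Median"]]
s[["3rd Qu."]]
```

For this vector the printed summary shows a minimum of 2, a first quartile of 6, a median of 12, a mean of about 14.29, a third quartile of 21 and a maximum of 32.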
Minimum, Maximum and Range in R
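A short sketch of the base R functions this heading refers to, assuming a small illustrative vector `x`. Note that range() returns both extremes rather than their difference, so the width of the range needs one extra step.

```r
# Illustrative data (an assumption)
x <- c(2, 6, 6, 12, 17, 25, 32)

min(x)          # smallest value: 2
max(x)          # largest value: 32
range(x)        # both extremes as a vector: 2 32
diff(range(x))  # width of the range: 30
```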
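Quartiles split a data set into four equal parts. The quantile() function returns the cut point for any probability between 0 and 1, so the three quartiles are the quantiles at 1/4, 2/4 and 3/4, and the 4/4 quantile is simply the maximum.
quantile(dataset$variable, 1/4) # Gives the first quartile (25th percentile)
quantile(dataset$variable, 2/4) # Gives the second quartile (the median)
quantile(dataset$variable, 3/4) # Gives the third quartile (75th percentile)
quantile(dataset$variable, 4/4) # Gives the maximum (100th percentile)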
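The four calls can also be collapsed into one by passing a vector of probabilities. This is a sketch on an illustrative vector `x` (an assumption standing in for dataset$variable); it relies on quantile()'s default interpolation method (type = 7), so other `type` settings may give slightly different cut points.

```r
# Illustrative data (an assumption)
x <- c(2, 6, 6, 12, 17, 25, 32)

# One call, several probabilities; the result is a named vector
q <- quantile(x, probs = c(0.25, 0.5, 0.75, 1))
print(q)

# Interquartile range: third quartile minus first quartile
iqr_manual <- unname(q["75%"] - q["25%"])
iqr_manual
```

With the default method, the quartiles of this vector come out as 6, 12 and 21, so the interquartile range is 15, which matches the built-in IQR(x).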
Mean Absolute Deviation in R
Median Absolute Deviation (MAD), or Absolute Deviation Around the Median, is a robust measure of variability (spread). It is built on the median, which is itself a robust measure of central tendency (the most common measures of central tendency are the arithmetic mean, the median and the mode).
Robust statistics are statistics that perform well for data drawn from a wide range of probability distributions, especially ones that are not normal. Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. The interquartile range is also resistant to the influence of outliers, although the mean and median absolute deviation are better in that they can be converted into values that approximate the standard deviation.
Essentially, the breakdown point of an estimator (median, mean, variance, etc.) is the proportion of arbitrarily small or large extreme values that must be introduced into a sample to cause the estimate to yield an arbitrarily bad result. The median’s breakdown point is 0.5, or half (the mean’s is 0). This means that the median only becomes “bad” when at least half of the observations are arbitrarily extreme, whereas a single extreme value is enough to ruin the mean.
set <- c(2, 6, 6, 12, 17, 25, 32)
The median is 12 and the mean is approximately 14.29.
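To make the breakdown point concrete, the sketch below computes both statistics for the example set and then again after replacing the largest value with an extreme one. The contaminated value 1e6 is a hypothetical choice made purely for illustration.

```r
set <- c(2, 6, 6, 12, 17, 25, 32)

median(set)  # 12
mean(set)    # about 14.29

# Replace the last value with an extreme one (hypothetical contamination)
contaminated <- replace(set, length(set), 1e6)

median(contaminated)  # still 12
mean(contaminated)    # about 142867
```

A single bad observation moves the mean by several orders of magnitude while the median does not change at all.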
The constant “b” in the MAD formula, MAD = b * median(|x_i - median(x)|), depends on the distribution. b = 1.4826 when dealing with normally distributed data, but we’ll need to calculate a new “b” if a different underlying distribution is assumed:
b = 1/Q(0.75), where Q(0.75) is the 0.75 quantile of that underlying distribution, taken in standardized form
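For the normal case the constant can be checked directly, since qnorm() gives quantiles of the standard normal distribution. This is only a quick sanity check; the variable name `b_normal` is an assumption introduced for the example.

```r
# 0.75 quantile of the standard normal distribution
q75 <- qnorm(0.75)

# Scale factor that makes the MAD comparable to the standard deviation
b_normal <- 1 / q75
b_normal  # approximately 1.4826
```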
To calculate the MAD, we find the median of absolute deviations from the median. In other words, the MAD is the median of the absolute values of the residuals (deviations) from the data’s median.
Using the same set from earlier:
- Subtract the median from each value: [(2 – 12), (6 – 12), (6 – 12), (12 – 12), (17 – 12), (25 – 12), (32 – 12)] = [-10, -6, -6, 0, 5, 13, 20]
- Take the absolute value of each deviation: |[-10, -6, -6, 0, 5, 13, 20]| = [10, 6, 6, 0, 5, 13, 20]
- Sort the absolute deviations and find their median: [10, 6, 6, 0, 5, 13, 20] -> [0, 5, 6, 6, 10, 13, 20] -> 6
- Multiply by b: 6 * b -> 6 * 1.4826 = 8.8956
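The steps above can be sketched in R one line at a time and then compared with the built-in mad() function. The intermediate variable names are assumptions chosen for readability.

```r
set <- c(2, 6, 6, 12, 17, 25, 32)

# Step 1: subtract the median from each value
deviations <- set - median(set)

# Step 2: take absolute values
abs_dev <- abs(deviations)

# Step 3: median of the absolute deviations (the unscaled MAD)
raw_mad <- median(abs_dev)

# Step 4: scale by b for normally distributed data
mad_manual <- 1.4826 * raw_mad
mad_manual

# The built-in function should agree with the manual result
all.equal(mad_manual, mad(set))
```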
We now have our MAD (8.8956) to use with a predetermined threshold. Going back to our example set’s median of 12, we can flag anything outside the median +/- 2, 2.5 or 3 MAD. For example, with 2 MAD:
12 + 2*8.8956 = 29.7912 as our upper threshold
12 – 2*8.8956 = -5.7912 as our lower threshold
Using this criterion we can identify 32 as an outlier in our example set of [2, 6, 6, 12, 17, 25, 32].
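The same rule can be written as a short filter. This is a sketch rather than a general-purpose outlier detector: the multiplier `k` and the variable names are assumptions, and the choice between 2, 2.5 and 3 is left to the analyst.

```r
set <- c(2, 6, 6, 12, 17, 25, 32)

k <- 2                 # threshold multiplier (2, 2.5 or 3)
center <- median(set)  # 12
spread <- mad(set)     # 8.8956

lower <- center - k * spread
upper <- center + k * spread

# Keep only the values that fall outside the thresholds
outliers <- set[set < lower | set > upper]
outliers
```

With k = 2 only the value 32 is flagged; raising k to 2.5 or 3 widens the interval enough that nothing in this set is flagged.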
R code for MAD
mad(x, center = median(x), constant = 1.4826, na.rm = FALSE, low = FALSE, high = FALSE)
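A brief usage sketch of the arguments in that signature, on an illustrative vector `x` (an assumption). Setting constant = 1 returns the unscaled median of absolute deviations, and na.rm = TRUE drops missing values before the calculation. The low and high arguments only matter for even-length samples, where they pick the lower or upper of the two middle values instead of averaging them.

```r
x <- c(2, 6, 6, 12, 17, 25, 32)

mad(x)                # scaled for normal data: 8.8956
mad(x, constant = 1)  # unscaled median absolute deviation: 6

# Missing values propagate unless na.rm = TRUE
x_na <- c(x, NA)
mad(x_na)                # NA
mad(x_na, na.rm = TRUE)  # 8.8956
```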
Standard Deviation and Variation in R