The median is another measure of central tendency that represents the middle value of a dataset when the data is arranged in either ascending or descending order. The median divides the dataset into two equal halves, where half the values are less than the median and the other half are greater than it.
The median is often used when dealing with skewed data or when outliers may distort the mean. Unlike the mean, the median depends only on the middle of the ordered data, so extreme values have little effect on it.
How to Calculate the Median
For an Odd Number of Values:
- Sort the data in ascending (or descending) order.
- The median is the middle value in the sorted dataset.
For an Even Number of Values:
- Sort the data in ascending (or descending) order.
- The median is the average of the two middle values.
Formula for the Median:
- For an odd number of observations, the median is the value at position (n + 1)/2 in the sorted data.
- For an even number of observations, the median is the average of the values at positions n/2 and n/2 + 1 in the sorted data.
Where:
- n is the number of observations.
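The two position rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `median` and the choice to raise an error for empty input are assumptions made for this example. Note that the formulas use 1-based positions, while Python lists are indexed from 0.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        # Assumption for this sketch: an empty dataset has no median.
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1)/2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # odd count: the middle value, 2
print(median([4, 1, 3, 2]))  # even count: (2 + 3) / 2 = 2.5
```

Sorting a copy with `sorted` leaves the caller's list unchanged, which is usually the safer choice for a helper like this.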
Example 1 (Odd Number of Data Points):
Consider the following dataset representing the ages of 5 people:
[ 24, 19, 35, 42, 29 ]
- Sort the dataset:
[ 19, 24, 29, 35, 42 ]
- Since there are 5 values (odd number), the median is the middle value:
Median = 29
Example 2 (Even Number of Data Points):
Consider the following dataset of 6 test scores:
[ 85, 78, 92, 88, 73, 91 ]
- Sort the dataset:
[ 73, 78, 85, 88, 91, 92 ]
- Since there are 6 values (even number), the median is the average of the two middle values (85 and 88):
Median = (85 + 88)/2 = 173/2 = 86.5
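Both worked examples can be checked with Python's standard-library `statistics.median`, which sorts the data internally and applies the same odd/even rule. The variable names below are chosen for this illustration only.

```python
import statistics

ages = [24, 19, 35, 42, 29]        # Example 1: 5 values (odd)
scores = [85, 78, 92, 88, 73, 91]  # Example 2: 6 values (even)

median_age = statistics.median(ages)
median_score = statistics.median(scores)

print(median_age)    # 29
print(median_score)  # 86.5
```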
Advantages of the Median
- Not Affected by Outliers: The median is resistant to extreme values or outliers that might distort the mean.
- Useful for Skewed Data: In datasets where the values are not symmetrically distributed, the median gives a better indication of the central tendency.
- Applicable for Ordinal Data: The median can be used for ordinal data, where values have a natural order but differences between values may not be meaningful.
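For ordinal data, averaging the two middle values may not make sense, because the labels cannot be added. One possible approach, sketched below, is to map each label to its rank and use `statistics.median_low`, which always returns an actual data point (the lower of the two middle values when the count is even). The rating scale and survey responses here are hypothetical.

```python
import statistics

# Hypothetical ordinal scale, listed from lowest to highest.
SCALE = ["poor", "fair", "good", "excellent"]
rank = {label: i for i, label in enumerate(SCALE)}

# Hypothetical survey responses.
responses = ["good", "poor", "excellent", "good", "fair", "fair"]

# Convert labels to ranks, take the low median of the ranks, convert back.
ranks = [rank[r] for r in responses]
median_label = SCALE[statistics.median_low(ranks)]

print(median_label)  # fair
```

With six responses the two middle ranks are "fair" and "good". `median_low` picks "fair", while `statistics.median_high` would pick "good". Which one to report is a judgment call.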
Disadvantages of the Median
- Ignores Extremes: Resistance to outliers is usually a benefit, but sometimes the extreme values are exactly what needs to be considered.
- Less Sensitive than the Mean: The median does not take into account the precise values of all the data points, so it may miss some subtle trends that the mean would detect.
When to Use the Median
- When your data is skewed (e.g., income distributions, real estate prices).
- When there are outliers that you don’t want to influence your measure of central tendency.
- When working with ordinal data or ranked data where the exact numerical difference between data points is not meaningful.
Median vs. Mean
- The mean is sensitive to outliers, while the median is not. For example, in a dataset of house prices where most houses are valued around $200,000 but one mansion is priced at $10 million, the mean will be pulled up significantly by the mansion, but the median will still reflect the typical house price.
- In symmetrical distributions, the mean and median are usually close or the same, but in skewed distributions, the median is often a better representation of central tendency.
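The house-price comparison above can be made concrete with a small sketch. The prices are invented for illustration: five houses near $200,000 and one $10 million mansion.

```python
import statistics

# Hypothetical prices: five typical houses plus one mansion.
prices = [190_000, 195_000, 200_000, 205_000, 210_000, 10_000_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print(f"mean:   ${mean_price:,.0f}")    # mean:   $1,833,333
print(f"median: ${median_price:,.0f}")  # median: $202,500
```

The single mansion pulls the mean to roughly nine times the typical price, while the median stays close to $200,000.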
Summary
- The median is the middle value of an ordered dataset and is a robust measure of central tendency, especially in the presence of outliers or skewed data. It is widely used in fields like economics, real estate, and social sciences, where data often exhibit skewness.