Measure of Center
Sometimes we need to answer questions like "What is the typical value of a quantity?" There are well-known metrics for this kind of question, called measures of center. They are as follows:
Mean
The mean of a quantity is defined as the sum of the values of that quantity over all observations, divided by the total number of observations. This is sometimes called the sample mean.
- \(N\): total number of observations (number of records in the dataset)
- \(x_{i}\): the value of quantity \(x\) for \(i\)th observation
- $$\bar{x} = \frac{x_1 + x_2 + \dots + x_N }{N}.$$
Here is an example of how to find the mean in Python:
import numpy as np
height = [165, 170, 175, 180, 185, 190, 195]
np.mean(height)
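To connect the formula for \(\bar{x}\) with the NumPy call, here is a minimal sketch that applies the definition by hand and compares it with `np.mean`. The variable names `manual_mean` and `numpy_mean` are only illustrative.

```python
import numpy as np

height = [165, 170, 175, 180, 185, 190, 195]

# Apply the definition directly: sum of the values divided by N.
manual_mean = sum(height) / len(height)

# np.mean should agree with the hand-computed value.
numpy_mean = float(np.mean(height))

print(manual_mean, numpy_mean)
```

Both values should come out the same, since `np.mean` computes the same arithmetic average.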
Median
The median of a quantity is defined as the middle value among all gathered observations. To find the median, we do the following:
- \(N\): total number of observations (number of records in the dataset)
- \(x_{i}\): the value of quantity \(x\) for \(i\)th observation
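- \(y_{j}\): the \(j\)th smallest value of \(x\), that is, the \(j\)th element after sorting all observations
- $$\mathrm{median} \approx y_{\frac{N}{2}}$$
Note that the above formula is only approximate. The exact rule depends on whether \(N\) is odd or even: for odd \(N\) the median is \(y_{(N+1)/2}\), and for even \(N\) it is the average of the two middle values, \(\frac{1}{2}\left(y_{N/2} + y_{N/2+1}\right)\).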
Here is an example of how to find the median in Python:
import numpy as np
height = [175, 185, 165, 180, 170, 190, 195]
np.median(height)
Note that NumPy's median function takes care of the ordering internally, so you don't need to sort the data before passing it in.
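The odd/even rule for the median can also be written out by hand. The following is a minimal sketch, not NumPy's actual implementation. The helper name `median_from_definition` and the `even_heights` list are assumptions made for illustration.

```python
import numpy as np

def median_from_definition(values):
    """Return the median by sorting and picking the middle value(s)."""
    y = sorted(values)  # y[0] <= y[1] <= ... <= y[N-1] (0-based indices)
    n = len(y)
    if n == 0:
        raise ValueError("median of an empty collection is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: there is a single middle element.
        return float(y[mid])
    # Even N: average the two middle elements.
    return (y[mid - 1] + y[mid]) / 2

odd_heights = [175, 185, 165, 180, 170, 190, 195]   # N = 7
even_heights = [175, 185, 165, 180, 170, 190]       # N = 6 (illustrative)

print(median_from_definition(odd_heights), np.median(odd_heights))
print(median_from_definition(even_heights), np.median(even_heights))
```

For both lists the hand-written function and `np.median` should agree. In the even case the result is a value that does not appear in the data.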
Mode
The mode is the most frequent value in the dataset. For a discrete variable, this is well defined. For a continuous variable, exact values rarely repeat, so we plot a histogram of the variable instead: the bin with the highest count (the tallest bar) gives the mode of that variable for the given dataset. Note that the result depends on the choice of bins.
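Here is a rough sketch of both cases in Python. The data (`shoe_sizes`, `heights`) and the choice of 5 bins are made up for illustration. A different number of bins may give a different modal bin.

```python
from collections import Counter

import numpy as np

# Discrete variable: the mode is the most frequent value.
shoe_sizes = [38, 40, 41, 40, 42, 40, 41]  # hypothetical data
discrete_mode, frequency = Counter(shoe_sizes).most_common(1)[0]
print(discrete_mode, frequency)

# Continuous variable: bin the values and take the fullest bin.
heights = np.array([165.2, 170.1, 171.4, 172.8, 173.5,
                    174.9, 180.3, 185.7, 190.0])  # hypothetical data
counts, edges = np.histogram(heights, bins=5)  # 5 bins is an arbitrary choice
tallest = int(np.argmax(counts))               # index of the bin with the most values
modal_bin = (edges[tallest], edges[tallest + 1])
print(counts, modal_bin)
```

`Counter.most_common(1)` returns one most frequent value even if several values tie, so a tie would need extra handling.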
Example: Wealth Distribution
Imagine a "Fake Earth" in a "Fake Galaxy" where people work and gather assets. Their total assets, or wealth, are shown in the following figure:
Most of the people on Fake Earth are poor and a few of them are extremely rich. Suppose you want to answer this question: "What is the typical wealth of people living on Fake Earth?" At first you may think of the mean, but as you can see in the plot, the median looks like a better indicator of people's wealth. The reason is that the mean is much more sensitive to outliers than the median. Elon Musk's wealth has a much greater impact on the mean than on the median: it is a huge number, but Elon Musk is just one person.
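The effect of a single outlier can be checked numerically. The wealth values below are invented for illustration (arbitrary units) and are not the data behind the figure.

```python
import numpy as np

# Hypothetical wealth values for ten people (arbitrary units).
wealth = [10, 12, 15, 18, 20, 22, 25, 30, 35, 40]

# Add one hypothetical, extremely rich person.
wealth_with_outlier = wealth + [1_000_000]

mean_before = float(np.mean(wealth))
mean_after = float(np.mean(wealth_with_outlier))
median_before = float(np.median(wealth))
median_after = float(np.median(wealth_with_outlier))

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this sketch the single extra person pushes the mean up by several orders of magnitude, while the median moves only from one middle value to a neighboring one.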