Mean, Median and Mode in Python
19th February 2022, 22:56
The most basic ways to analyze numeric data are formulae that summarize the numbers: the mean, median and mode of a dataset. Long before computer programming came along and changed the world forever, these calculations were done by hand in statistics. Now we have languages such as Python, which automate these tasks for us. However, it is still useful to know how to derive these numbers ourselves.
To do this, I am going to explain each measure and implement it in Python. At the same time, I am going to use Python's NumPy and Statistics libraries to check that these implementations are correct. For the purpose of this exercise, we will be disregarding edge cases such as negative numbers and zero values.
The Mean
This is simply the average value of an entire dataset. To calculate it, we add up all the values and divide the total by the number of values.
Let's write a function to do this. We'll call it tt_mean(). It accepts one parameter, vals, which is a list.
import numpy as np
import statistics as stat

def tt_mean(vals):
We use a for loop to iterate through the list, adding up the values. The running sum is stored in a variable, total.
import numpy as np
import statistics as stat

def tt_mean(vals):
    total = 0
    for v in vals:
        total = total + v
Then we divide that total by the number of values in the list and return the result.
import numpy as np
import statistics as stat

def tt_mean(vals):
    total = 0
    for v in vals:
        total = total + v
    return total / len(vals)
Compare the result against that of NumPy's mean() function. Here, the test dataset will be a list of 10 numeric values.
import numpy as np
import statistics as stat

def tt_mean(vals):
    total = 0
    for v in vals:
        total = total + v
    return total / len(vals)

test = [1, 3, 10, 45, 7, 8, 8, 10, 10, 8]
print(tt_mean(test))
print(np.mean(test))
An exact match!
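As a side note, the loop in tt_mean() can also be written with Python's built-in sum(). The sketch below is only an illustration: mean_with_sum() is a hypothetical name that is not part of the original code, and the check for an empty list is an added assumption, since dividing by len(vals) would otherwise fail with a ZeroDivisionError.

```python
def mean_with_sum(vals):
    """Arithmetic mean using the built-in sum() (hypothetical helper)."""
    if not vals:
        # Assumption: treat an empty list as an error instead of dividing by zero.
        raise ValueError("mean of an empty list is undefined")
    return sum(vals) / len(vals)


test = [1, 3, 10, 45, 7, 8, 8, 10, 10, 8]
print(mean_with_sum(test))  # the values add up to 110, divided by 10 values
```

For the test list above, the values add up to 110 and there are 10 of them, so both this helper and tt_mean() should give 11.0.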

Let us now examine the Median of a dataset. We use this to discover the middle of a dataset.
The Median
To derive the Median, we sort the values, then take the value that lies right in the middle of the sorted dataset. If there are two values in the middle (that is, the dataset has an even number of values), we take the Mean of those two values as the Median.
So here's our function, tt_median().
def tt_median(vals):
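We sort the list using the standard sort() method. Note that sort() sorts the list in place and returns None, so there is no result to assign to a new variable.
def tt_median(vals):
    vals.sort()  # sorts vals in place; sort() itself returns None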
Declare med, then check the length of the list. If the length is even, make med the average of the two values on either side of the halfway mark. We can use the tt_mean() function we've already written for this.
def tt_median(vals):
    vals.sort()  # sorts vals in place; sort() itself returns None
    med = 0
    if len(vals) % 2 == 0:
        med = tt_mean([vals[len(vals) // 2 - 1], vals[len(vals) // 2]])
    else:
If the length of the list is odd, then just set med to the value of the element right at the midpoint of the list. Finally, return med.
def tt_median(vals):
    vals.sort()  # sorts vals in place; sort() itself returns None
    med = 0
    if len(vals) % 2 == 0:
        med = tt_mean([vals[len(vals) // 2 - 1], vals[len(vals) // 2]])
    else:
        med = vals[len(vals) // 2]
    return med
Let's try this with two different lists. test will be the list we defined while testing tt_mean().
def tt_median(vals):
    vals.sort()  # sorts vals in place; sort() itself returns None
    med = 0
    if len(vals) % 2 == 0:
        med = tt_mean([vals[len(vals) // 2 - 1], vals[len(vals) // 2]])
    else:
        med = vals[len(vals) // 2]
    return med

test2 = [3, 3, 10, 17, 17]
print(tt_median(test))
print(tt_median(test2))
Compare the results against NumPy's median() function.
print(tt_median(test))
print(tt_median(test2))
print(np.median(test))
print(np.median(test2))
Close enough!
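One thing to be aware of is that tt_median() sorts the list it is given in place, so the caller's list is reordered afterwards. As a rough sketch of an alternative, the version below works on a sorted copy instead. The name median_sorted_copy() is hypothetical and not part of the original code.

```python
def median_sorted_copy(vals):
    """Median computed on a sorted copy, leaving the input list untouched."""
    ordered = sorted(vals)   # sorted() returns a new list
    mid = len(ordered) // 2  # integer division gives a usable index
    if len(ordered) % 2 == 0:
        # Even length: average the two values around the halfway mark.
        return (ordered[mid - 1] + ordered[mid]) / 2
    # Odd length: take the value at the midpoint.
    return ordered[mid]


data = [10, 3, 17, 3, 17]
print(median_sorted_copy(data))
print(data)  # still in its original order
```

Whether to sort in place or work on a copy is a design choice: the copy costs a little extra memory, but the function no longer has a side effect on its argument.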

The Mode is a measure of how frequently values appear in a dataset. The value that appears the greatest number of times is the Mode. If more than one value shares the highest number of appearances, we will take the smallest of them as the Mode for this exercise.
The Mode
We begin by declaring a function to determine the frequency. It is tt_frequency(), and it accepts a list and a value as parameters.
def tt_frequency(vals, val):
We declare freq, and set it to 0.
def tt_frequency(vals, val):
    freq = 0
Then we use a For loop to iterate through the list, and increment freq each time the current element matches the value.
def tt_frequency(vals, val):
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
Finally, return freq.
def tt_frequency(vals, val):
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
    return freq
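For what it's worth, Python lists already have a count() method that does the same job as tt_frequency(). The short check below repeats tt_frequency() so that it runs on its own; the sample list is made up purely for illustration.

```python
def tt_frequency(vals, val):
    # Count how many elements of vals are equal to val.
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
    return freq


sample = [3, 3, 10, 17, 17]  # illustrative data only
print(tt_frequency(sample, 17))
print(sample.count(17))      # built-in list method, same idea
```

Both calls should report the same number of appearances for any value, including one that is not in the list at all.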
Next, we declare tt_mode(), which accepts a list as a parameter. The code you are about to read is extremely inefficient, but it does the job!
def tt_frequency(vals, val):
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
    return freq

def tt_mode(vals):
In this function, we declare val and freq. Then we use a For loop on the list.
def tt_frequency(vals, val):
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
    return freq

def tt_mode(vals):
    val = 0
    freq = 0
    for v in vals:
We run the current element through the tt_frequency() function, and assign the value to temp_freq. If temp_freq is greater than freq, then we replace that value in freq, and set val to the current element.
def tt_frequency(vals, val):
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
    return freq

def tt_mode(vals):
    val = 0
    freq = 0
    for v in vals:
        temp_freq = tt_frequency(vals, v)
        if temp_freq > freq:
            freq = temp_freq
            val = v
If not, then we check if it is equal. In that case, we next check if the current element is smaller than val.
def tt_frequency(vals, val):
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
    return freq

def tt_mode(vals):
    val = 0
    freq = 0
    for v in vals:
        temp_freq = tt_frequency(vals, v)
        if temp_freq > freq:
            freq = temp_freq
            val = v
        elif temp_freq == freq:
            if v < val:
If so, we set val to the smaller value. freq does not need to be changed because it is already the same value. Finally, return val.
def tt_frequency(vals, val):
    freq = 0
    for v in vals:
        if v == val:
            freq = freq + 1
    return freq

def tt_mode(vals):
    val = 0
    freq = 0
    for v in vals:
        temp_freq = tt_frequency(vals, v)
        if temp_freq > freq:
            freq = temp_freq
            val = v
        elif temp_freq == freq:
            if v < val:
                val = v
    return val
Now let's test tt_mode()'s output with two different lists, against the Statistics library's mode() function.
print(tt_mode(test))
print(tt_mode(test2))
print(stat.mode(test))
print(stat.mode(test2))
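Correct output. Note that when several values tie, the Statistics library's mode() returns the first of them that it encounters (in Python 3.8 and later) rather than the smallest. The results agree here because tt_median() has already sorted test in place and test2 was in sorted order to begin with, so the smallest tied value also comes first.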
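Since tt_mode() calls tt_frequency() once for every element, it walks the list again and again. As one possible way to tighten this up, the sketch below counts every value in a single pass with collections.Counter from the standard library, then applies the same "smallest value wins a tie" rule. The name mode_with_counter() is hypothetical, and this is an alternative technique, not how the Statistics library does it.

```python
from collections import Counter


def mode_with_counter(vals):
    """Most frequent value; the smallest value wins a tie (hypothetical helper)."""
    counts = Counter(vals)           # maps each value to its number of appearances
    highest = max(counts.values())   # raises ValueError if vals is empty
    # Keep only the values that reach the highest count, then take the smallest.
    return min(v for v, c in counts.items() if c == highest)


print(mode_with_counter([1, 3, 10, 45, 7, 8, 8, 10, 10, 8]))
print(mode_with_counter([17, 17, 3, 3]))
```

Because the tie-break uses min(), the result does not depend on the order of the input list.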

In summation
Whatever I have presented above is probably not an exact replica of how the NumPy and Statistics libraries perform their calculations, but it is a pretty close match to how the average human brain would calculate these numbers. Of course, the methods could always be better optimized.
In the meantime, seeya!