An introduction to normal distributions

Many things in nature follow what we call a normal, or Gaussian, distribution. The general pattern is that most of a population sits close to some average, or mean, with progressively fewer members found the further from the mean you look. Common examples include the height of a population, birth weight, and even some more unexpected metrics such as reading ability. Given a normally distributed population, we can use a number of analytical tools to interpret the behaviour of the distribution. Every normal distribution has a mean, around which one can calculate values such as the standard deviation, z-scores and p-values.

The standard deviation in essence tells us how "spread out" a distribution is. It is found by squaring the distance of each value from the mean, averaging those squares, and taking the positive square root of the result. It can be interpreted as a scaled norm: if each member's distance from the mean is treated as a coordinate of a vector in an n-dimensional vector space, where n is the size of the population, then the standard deviation is the length of that vector divided by the square root of n. Given this, it tells you an "average"(ish) of how far every point lies from the mean, so a distribution with a higher standard deviation will be more spread out, while a distribution with a low standard deviation will be more packed together. The variance of the distribution is the mean of the squared distances, so it is the square of the standard deviation; equivalently, the standard deviation is the positive square root of the variance.
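As a rough illustration, the calculation can be sketched in Python: the variance is the mean of the squared distances from the mean, and the standard deviation is its positive square root. The function names and the `heights_cm` values are made up for this example, and the sketch uses the population form (dividing by n rather than n - 1).

```python
import math


def population_variance(values):
    """Mean of the squared distances of each value from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def population_std(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))


# Hypothetical heights in centimetres, purely for illustration.
heights_cm = [150, 160, 170, 180, 190]

variance = population_variance(heights_cm)  # 200.0
std_dev = population_std(heights_cm)        # roughly 14.14

print(f"variance = {variance}, standard deviation = {std_dev:.2f}")
```

Python's standard library offers the same calculations as `statistics.pvariance` and `statistics.pstdev`, which would usually be preferable outside of a teaching example.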

With the standard deviation in hand, one can identify and isolate outliers using what we call a "z-score". The z-score of an element in the population describes its distance from the mean in multiples of the standard deviation. For example, a member of the population with a z-score of -0.5 falls 0.5 standard deviations below the mean, while a z-score of 2.2 means it falls 2.2 standard deviations above the mean. Z-scores are a helpful metric for putting outlying values in the context of how the population as a whole is distributed.
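A minimal sketch of this idea, assuming a population standard deviation and an arbitrary cut-off of 2 standard deviations for what counts as an outlier (the cut-off, the function names and the birth weights below are all assumptions for illustration, not a fixed rule):

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-scores are undefined when every value is identical")
    return [(x - mean) / std for x in values]


def flag_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical birth weights in kilograms, purely for illustration.
birth_weights_kg = [3.1, 3.3, 3.4, 3.5, 3.6, 3.7, 5.9]

for weight, z in zip(birth_weights_kg, z_scores(birth_weights_kg)):
    print(f"{weight} kg has z-score {z:+.2f}")

print("outliers:", flag_outliers(birth_weights_kg))  # [5.9]
```

Note that a single extreme value inflates the standard deviation itself, so in small samples the choice of threshold matters a great deal.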

Z-scores can also help compare values across populations that are distributed differently. If one wanted to compare a footballer's performance metrics with the rest of their team, or their league, percentiles give you a ranking but do not show how far the player sits from the bulk of the pack (you know where they sit, but not where that is in comparison to the mean). By looking across important metrics (goals and assists per 90, pass completion percentage or sprints per 90, for example), one can measure where the player sits with respect to their peers. Because every z-score is expressed in standard deviations, this also 'normalises' the metrics, so they can be compared across different metrics or methods of appraisal.
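To make the footballer comparison concrete, here is a small sketch that converts one player's numbers into z-scores against their squad, so that metrics on very different scales can be read side by side. The squad figures, the player's figures and the metric names are invented for illustration only.

```python
import statistics


def z_score(value, population):
    """How many standard deviations `value` sits from the population mean."""
    mean = statistics.fmean(population)
    std = statistics.pstdev(population)
    return (value - mean) / std


# Hypothetical squad data: one list of values per metric.
squad = {
    "goals_per_90": [0.1, 0.2, 0.3, 0.4, 0.5],
    "pass_completion_pct": [70, 75, 80, 85, 90],
}

# Hypothetical player to appraise against the squad.
player = {"goals_per_90": 0.5, "pass_completion_pct": 80}

# One z-score per metric, all expressed in the same unit (standard deviations).
profile = {metric: z_score(player[metric], squad[metric]) for metric in squad}

for metric, z in profile.items():
    print(f"{metric}: {z:+.2f} standard deviations from the squad mean")
```

In this made-up case the player is about 1.41 standard deviations above the squad mean for goals per 90 but exactly average for pass completion, a comparison the raw numbers (0.5 versus 80) do not make obvious.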

Author:
Kealan Daly
© 2024 The Information Lab