Trend Lines and Linear Regressions
When working with scatter plots, we’ll often see a trendline, maybe even with some values attached to it. But what do they actually mean?
Before we delve deeper into what a trendline and these numbers actually mean, I think it’s important to discuss whether or not they’re actually necessary. The presence of a trendline typically implies there is a relationship between an independent variable (x) and a dependent variable (y). So when you have a scatter plot of variables that don’t influence each other, or of variables that you suspect to be related but that aren’t showing a strong correlation, consider whether a trendline is really necessary.
The graph above shows the relationship between penguins’ flipper length and their body mass. As you’d expect, larger (heavier) penguins tend to have longer flippers. This is reflected not only in the shape of the graph, but also in the R-Squared value.
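R-Squared measures how much of the variability in y is explained by the model y=mx+c, but what does that actually mean? In simple words, R-Squared tells you how closely the points follow the trendline. It ranges from 0 to 1: a value of 1 means the line accounts for all of the variation in y, and 0 means it accounts for none of it. It’s easy to mix this up with the correlation coefficient r, which runs from -1 (a perfect negative relationship) through 0 (no linear relationship) to 1 (a perfect positive relationship). For a straight-line fit, R-Squared is simply r squared, so it tells you how strong the relationship is but not its direction.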
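To make that a little more concrete, here is a minimal sketch of how R-Squared could be calculated by hand for a straight-line fit. The numbers are made up for illustration (they are not real penguin measurements), and the function names `fit_line` and `r_squared` are my own, not from any particular charting tool.

```python
from statistics import mean

# Made-up illustrative values, NOT real penguin measurements.
flipper_mm = [180, 185, 190, 195, 200, 210, 215, 220]
mass_g = [3500, 3600, 3550, 3900, 4100, 4400, 4800, 5000]


def fit_line(x, y):
    """Return the least-squares slope m and intercept c for y = m*x + c."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = y_bar - m * x_bar
    return m, c


def r_squared(x, y):
    """Share of the variation in y that the fitted line accounts for."""
    m, c = fit_line(x, y)
    y_bar = mean(y)
    # Variation left over after fitting the line.
    ss_res = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean.
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


m, c = fit_line(flipper_mm, mass_g)
print(f"slope={m:.2f}, intercept={c:.2f}")
print(f"R-Squared={r_squared(flipper_mm, mass_g):.3f}")
```

Notice that a perfectly falling line and a perfectly rising line both give an R-Squared of 1. The direction of the relationship shows up in the sign of the slope, not in R-Squared.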
What about the p-value?
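The p-value helps us judge whether the relationship described by the trend line is statistically significant. It’s the probability of seeing a slope at least as steep as this one purely by chance if there were really no relationship between the two variables. Typically a threshold of 0.05 is used. If the p-value is less than 0.05, the relationship is unlikely to be down to random chance alone and is called statistically significant. If the p-value is more than 0.05, the data don’t give strong enough evidence that the relationship is real. Note that a small p-value doesn’t by itself make the model a good predictor; how well the line fits the points is what R-Squared describes.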
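One way to get a feel for the p-value is to simulate "random chance" directly. The sketch below uses a permutation test: it shuffles the y values many times, which breaks any real link to x, and counts how often the shuffled data produce a slope at least as steep as the real one. This is a swapped-in technique chosen because it is easy to follow; charting tools usually report a p-value from a t-test on the slope instead, so the numbers won't match exactly. The data and the names `slope` and `permutation_p_value` are assumptions for illustration.

```python
import random
from statistics import mean

# Made-up illustrative values, NOT real penguin measurements.
flipper_mm = [180, 185, 190, 195, 200, 210, 215, 220]
mass_g = [3500, 3600, 3550, 3900, 4100, 4400, 4800, 5000]


def slope(x, y):
    """Least-squares slope of the line y = m*x + c."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone gives a slope this steep.

    Shuffling y destroys any real relationship with x, so the shuffled
    slopes show what "no relationship" looks like for this data.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            hits += 1
    # The +1 counts the observed data itself, so the estimate is never 0.
    return (hits + 1) / (n_shuffles + 1)


p = permutation_p_value(flipper_mm, mass_g)
print(f"estimated p-value: {p:.4f}")
print("below the 0.05 threshold" if p < 0.05 else "not below the 0.05 threshold")
```

More shuffles give a steadier estimate at the cost of a longer run time; 10,000 is just a convenient default here.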
That’s what the R-Squared and p-values mean in simplified terms. To really understand them, a good grounding in statistics helps.
So it's important to consider whether or not a trendline is necessary. That depends on who'll be using your graph, and on whether they'll get any tangible benefit from a trendline and its associated numbers.