About scatterplots

A scatterplot shows correlation, not causation.

Even a strong apparent relationship in a scatterplot can be misleading.

The trend could have any of several explanations:

  • A third factor (like income, education, or demographics) could affect both the X and Y topics. In this case, X and Y appear related only because both are related to that other factor. For instance, places with low education also have a lot of people using food stamps, but using food stamps does not lead to poor education, nor does poor education lead to food stamp use. Instead, both affect (or are affected by) income.

  • X and Y could be two sides of the same coin. For instance, unemployment and average hours worked may appear strongly related only because they both reflect the same economic conditions: people who are unemployed work few hours.

  • The relationship could be due to random chance. This can happen when the scatterplot shows a small number of places (under 50 or so), but it is unlikely when looking at hundreds or thousands of places.

  • There could actually be a true relationship between X and Y, where one contributes to the other.
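
The first and third explanations above can be illustrated with a small simulation. This is a minimal sketch on made-up random data, not Metopio data or Metopio's method; every name in it (pearson, residuals, third, largest_chance_r) is hypothetical, and the noise levels and sample sizes are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, zs):
    """Remove the straight-line effect of zs from ys (least-squares fit)."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Case 1: a hidden third factor drives both X and Y.
# X and Y never influence each other; each is the third factor plus noise.
n = 2000
third = [rng.gauss(0, 1) for _ in range(n)]
x = [t + rng.gauss(0, 0.5) for t in third]
y = [t + rng.gauss(0, 0.5) for t in third]

raw_r = pearson(x, y)
# Correlation that is left once the third factor is accounted for.
adjusted_r = pearson(residuals(x, third), residuals(y, third))

# Case 2: random chance with few places versus many places.
def largest_chance_r(n_places, trials=200):
    """Largest |r| seen across many pairs of unrelated random variables."""
    best = 0.0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n_places)]
        b = [rng.gauss(0, 1) for _ in range(n_places)]
        best = max(best, abs(pearson(a, b)))
    return best

small = largest_chance_r(20)    # 20 places per scatterplot
large = largest_chance_r(1000)  # 1000 places per scatterplot

print(f"raw correlation of X and Y:          {raw_r:.2f}")
print(f"after accounting for third factor:   {adjusted_r:.2f}")
print(f"largest chance |r| with 20 places:   {small:.2f}")
print(f"largest chance |r| with 1000 places: {large:.2f}")
```

In this setup, X and Y are strongly correlated even though neither affects the other, and the correlation nearly disappears once the third factor is removed. Likewise, unrelated variables can produce a sizable correlation by chance when only 20 places are plotted, while with 1000 places the largest chance correlation stays small.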

How do you know whether the relationship is genuine?

Verify it.

See Verify Your Scatterplot in Metopio's tutorials for step-by-step instructions.