Avoiding Data Pitfalls, Part 1: Gaps Between Data and Reality
It’s easy when working with data to treat it as reality rather than data collected about reality. Here are some examples:
- It’s not crime, it’s reported crime.
- It’s not the number of meteor strikes, it’s the number of recorded meteor strikes.
- It’s not the outer diameter of a mechanical part, it’s the measured outer diameter.
- It’s not how the public feels about a controversial topic, it’s how survey respondents are willing to say they feel.
- It’s not how many people suffer from a particular disease, it’s how many times a doctor diagnosed people with a particular disease.
You get the picture. This distinction may seem like a technicality, and sometimes it is (the number of home runs Hank Aaron “reportedly” hit?) but it can also be a big deal. Let’s see an example of how missing it can lead us astray:
Click here to read more from this January 5, 2014 post by Ben Jones.