Averages.
January 29, 2003
Bush likes to talk about how much money the “average” taxpayer will get back from the IRS because of his tax plan. Many listeners are going to hear that and think “yeah, average Joe, like me,” because many people think “average” means “most.” In colloquial, everyday speech it often does. But not when people are talking taxes.
“Average” also has a very precise mathematical definition: the value obtained by dividing the sum of a set of quantities by the number of quantities in the set. The average does not tell you what “most” of the quantities in a set are. Unless the quantities are clustered closely together, the average can be quite far from most of them.
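To make that definition concrete, here is a minimal Python sketch. The function name `average` is just an illustrative choice; the standard library's `statistics.mean` computes the same thing. The scores are the first set of test grades used in the examples.

```python
from statistics import mean


def average(values):
    """Sum of the values divided by how many values there are (the arithmetic mean)."""
    if not values:
        raise ValueError("the average of an empty set is undefined")
    return sum(values) / len(values)


# Percent scores from the first table of test grades.
scores = [100, 50, 50, 50, 50, 50, 50, 0]
print(average(scores))  # 50.0
print(mean(scores))     # same value, computed by the standard library
```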
If you’re nodding your head, you can skip the rest of this and pop over to Spinsanity, where they explain that the distortion of averages is not the only distortion game Bush is playing.
But if you caught yourself thinking Bush’s averages sounded pretty good, read on. I’ll show you how averages can distort — or even completely misrepresent — the facts.
Here are some examples of how this works:
Test Grades
Student        Score
Mike           100%
Maria          50%
John           50%
Sarah          50%
Nathaniel      50%
Amy            50%
Tim            50%
Melissa        0%
Class Average  50%
In this set of scores, most people scored 50% on the test, and the class average is, indeed, 50%. So the average is representative of the group. Now look at this set:
Test Grades
Student        Score
Mike           100%
Maria          100%
John           100%
Sarah          100%
Nathaniel      0%
Amy            0%
Tim            0%
Melissa        0%
Class Average  50%
See? No one made 50%. Four people did twice as well, and the other four scored zero. Saying the class has an average of 50% says nothing substantive about the performance of individual students. You cannot even say “most of the students in this class are failing,” because half of them are making perfect scores and the other half are abject failures.
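One way to see the difference between the first two tables is to count how many students actually landed on the class average. A short Python sketch (the helper names are illustrative, not from any library):

```python
def average(values):
    return sum(values) / len(values)


def count_at_average(values):
    """How many values are exactly equal to the average of the set."""
    avg = average(values)
    return sum(1 for v in values if v == avg)


first = [100, 50, 50, 50, 50, 50, 50, 0]    # first table, Mike through Melissa
second = [100, 100, 100, 100, 0, 0, 0, 0]   # second table, same order

for label, scores in (("first table", first), ("second table", second)):
    print(f"{label}: average {average(scores):.0f}%, "
          f"{count_at_average(scores)} of {len(scores)} students scored exactly that")
```

Both sets average 50%, but six students sit on the average in the first set and none do in the second.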
Test Grades
Student        Score
Mike           100%
Maria          100%
John           100%
Sarah          100%
Nathaniel      100%
Amy            100%
Tim            0%
Melissa        0%
Class Average  75%
Here six of the eight students made perfect scores, but the class gets a “C” average. (In my high school, this was a “D”.)
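The same kind of count makes the third table's problem explicit: the 75% average sits between two clusters of scores, with six students above it, two below it, and nobody on it. A Python sketch with an illustrative helper name:

```python
def split_around_average(values):
    """Return the average plus counts of values below, at, and above it."""
    avg = sum(values) / len(values)
    below = sum(1 for v in values if v < avg)
    at = sum(1 for v in values if v == avg)
    above = sum(1 for v in values if v > avg)
    return avg, below, at, above


third = [100, 100, 100, 100, 100, 100, 0, 0]  # third table, Mike through Melissa
print(split_around_average(third))  # (75.0, 2, 0, 6)
```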
The larger the possible difference between numbers, the greater the distortion of the average can be:
National Income
Individual        Annual Income
Mike              $68,000,000,000
Maria             $500,000
John              $50,000
Sarah             $50,000
Nathaniel         $40,000
Amy               $20,000
Tim               $5,000
Melissa           $1,000
National Average  $8,500,083,250
Because Mike’s income is so much higher than everyone else’s, the average income for citizens of this tiny nation is about $8.5 billion. This is despite the fact that three people (37.5% of the population) live below the poverty line.
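Here is the same arithmetic as a Python sketch, using the incomes from the table. It counts how many people earn less than the national average and shows what happens to the average if Mike alone is left out. The variable names are illustrative.

```python
incomes = {
    "Mike": 68_000_000_000,
    "Maria": 500_000,
    "John": 50_000,
    "Sarah": 50_000,
    "Nathaniel": 40_000,
    "Amy": 20_000,
    "Tim": 5_000,
    "Melissa": 1_000,
}

national_average = sum(incomes.values()) / len(incomes)
below_average = [name for name, income in incomes.items() if income < national_average]

# Average of everyone except the single outlier.
others = [income for name, income in incomes.items() if name != "Mike"]
average_without_mike = sum(others) / len(others)

print(f"National average: ${national_average:,.0f}")                      # $8,500,083,250
print(f"Earning less than that: {len(below_average)} of {len(incomes)}")  # 7 of 8
print(f"Average without Mike: ${average_without_mike:,.0f}")              # $95,143
```

One person moves the average from under $100,000 to billions, and seven of the eight citizens end up “below average.”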
So the average doesn’t really tell us anything about distribution. Furthermore, unless the values in the set are very close to each other, the average is not even representative of the set as a whole. That’s why when anyone, liberal or conservative, says “average” in a political context, you should switch on your BS detector and ask some very pointed questions. When discussing figures, don’t let anyone imply to you that “average” is the same as “most.” In fact, it rarely is.
January 29th, 2003 at 12:47 pm
I couldn’t have said it better myself. By the way, the most common value in a distribution is the mode.
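Python's `statistics` module (3.8 or later) has `multimode`, which returns every most-common value, so ties show up instead of being hidden. A sketch comparing the mean and the mode(s) of John's four example sets:

```python
from statistics import mean, multimode

# The four example sets from the post, in table order (Mike through Melissa).
data_sets = {
    "first test": [100, 50, 50, 50, 50, 50, 50, 0],
    "second test": [100, 100, 100, 100, 0, 0, 0, 0],
    "third test": [100, 100, 100, 100, 100, 100, 0, 0],
    "national income": [68_000_000_000, 500_000, 50_000, 50_000,
                        40_000, 20_000, 5_000, 1_000],
}

for name, values in data_sets.items():
    # multimode lists the modes in the order they first appear in the data.
    print(f"{name}: mean {mean(values):,}, mode(s) {multimode(values)}")
```

Only the first set has a single mode that matches its mean. The second set has two modes (100 and 0) and nobody at the mean of 50, and the most common income is $50,000, nowhere near the $8.5 billion average.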
As a statistician and a crusader against polls and botched statistics, I feel like anyone who publishes a mean or average without giving an estimate of the standard deviation ought to be shot. And anyone who believes a statistic without looking at the standard deviation is a dupe.
The standard deviation basically measures how far observations typically fall from the average. In John’s examples, the (sample) standard deviations are:
First test: 26.73
Second test: 53.45
Third test: 46.29
National income: $24,041,596,923
The higher the standard deviation, the more scattered your data is, which is exactly the point that John was making in those examples.
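The figures above can be checked with Python's `statistics.stdev`, which computes the sample standard deviation (dividing by n - 1). They match, which suggests that is the version used; `pstdev`, which divides by n, gives somewhat smaller values for eight data points.

```python
from statistics import pstdev, stdev

# The four example sets from the post, in table order (Mike through Melissa).
data_sets = {
    "first test": [100, 50, 50, 50, 50, 50, 50, 0],
    "second test": [100, 100, 100, 100, 0, 0, 0, 0],
    "third test": [100, 100, 100, 100, 100, 100, 0, 0],
    "national income": [68_000_000_000, 500_000, 50_000, 50_000,
                        40_000, 20_000, 5_000, 1_000],
}

for name, values in data_sets.items():
    # stdev: sample standard deviation (divides by n - 1)
    # pstdev: population standard deviation (divides by n)
    print(f"{name}: stdev {stdev(values):,.2f}, pstdev {pstdev(values):,.2f}")
```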
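Now, all of these may seem high to you. Mostly that’s because John’s examples are deliberately spread out. We also only have 8 data points: more data won’t shrink the spread itself, but you need at least several hundred observations before the average is pinned down precisely.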
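A Python sketch of the standard error of the average (the sample standard deviation divided by the square root of the sample size) shows what a larger sample does and doesn't change. The classes here are hypothetical, made by repeating the second test's scores, and `standard_error` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical classes: the second test's half-100, half-0 pattern repeated.
for copies in (1, 10, 100):
    scores = [100, 100, 100, 100, 0, 0, 0, 0] * copies
    print(f"n = {len(scores)}: standard deviation {stdev(scores):.2f}, "
          f"standard error {standard_error(scores):.2f}")
```

The standard deviation stays near 50 however many students there are, while the standard error falls roughly in proportion to 1/√n.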
So never trust an average unless you know:
1. The sample size (and it should be large)
2. The standard deviation (sometimes given as standard error). Do not trust a margin of error. It’s only telling you a small part of the variation, not the whole story.
3. How the questions were worded (for polls) or how the study was completed. The way you collect data can have a HUGE impact on your results.
If you have all of this info, then maybe you can trust the results. But a crafty politician can spin anything, so buyer beware.
January 29th, 2003 at 2:57 pm
Maria, I can’t believe that you would say someone ought to be shot. In these troubled times!