Monday, March 4, 2013

Naked Statistics and surveys

This year Charles Wheelan, who teaches public policy and economics at Dartmouth College, published a book titled Naked Statistics: Stripping the Dread from the Data. I saw a favorable review of it by Stephen Few, and got a copy from my friendly local public library. I immensely enjoyed reading it. Mr. Wheelan does an excellent job of providing memorable examples to explain his points. Here’s my version of his example from the second chapter, on descriptive statistics.

Suppose up in Seattle there is a row of ten stools in a neighborhood bar. Nine of them are occupied by patrons watching a football game. Purely by accident they are sitting precisely in order of their annual incomes, which are:

$31,000 
$32,000 
$33,000 
$34,000 
$35,000 
$36,000 
$37,000 
$38,000 
$39,000

Their incomes could be described either by the mean (average) or the median (middle value), both of which are $35,000. Now Bill Gates walks in (with a talking parrot on his shoulder), and he sits down on the empty stool at the right. Bill’s income is $1 billion a year. What happens to those statistics when we include him? The median barely moves. With ten people it is the midpoint of the two middle incomes, or $35,500. But the mean now is around $100 million. Using the mean here is formally correct, but grossly misleading.
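For anyone who wants to check the arithmetic, here is a minimal sketch using Python's standard statistics module. The variable names are mine, and the $1 billion figure is just the round number used in the example above.

```python
from statistics import mean, median

# Incomes of the nine patrons, in dollars: $31,000 through $39,000
patrons = [31_000 + 1_000 * i for i in range(9)]

# The round figure assumed for Bill Gates in the example
gates_income = 1_000_000_000

# The same row of stools once the tenth one is occupied
everyone = patrons + [gates_income]

print(f"Nine patrons: mean ${mean(patrons):,.0f}, median ${median(patrons):,.0f}")
print(f"With Gates:   mean ${mean(everyone):,.0f}, median ${median(everyone):,.0f}")
```

With an even number of values, median() returns the midpoint of the two middle values, which is why the median shifts slightly while the mean explodes.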

Chapter 10 on polling (surveys) is subtitled How we know that 64 percent of Americans support the death penalty (with a sampling error of ± 3 percent). Mr. Wheelan asks three important questions regarding surveys. The first is:

“Is this an accurate sample of the population whose opinions we are trying to measure?”

For example, if we are trying to get a random sample of adults, suppose we call each randomly selected phone number just once, and only during the day. Who will we get? Old people, the unemployed, and lonely types who like to answer random phone calls. Interviewers for serious polls instead will try to reach each phone number multiple times, in both the day and the evening.

Back in Chapter 7, on The Importance of Data, Mr. Wheelan mentioned the classic example where a huge but nonrandom sample provided wildly incorrect data. The Literary Digest magazine polled ten million prospective voters for the 1936 presidential election. Their poll was mailed to their subscribers, and to automobile and telephone owners. They predicted that Alf Landon would get 57% of the popular vote and defeat Franklin Roosevelt. But their subscribers and the owners of automobiles or telephones were wealthier than average, and Roosevelt instead won 60% of the popular vote. Similarly, college students (mostly sophomores) taking Introduction to Psychology or Introduction to Public Speaking probably aren’t a random sample of adults.

If only a small proportion (say about 25% or less) of those contacted completed the poll, there may also be bias, since the people who respond can differ from those who don’t. For example, the NACE Job Outlook 2013 Survey I blogged about on February 25th had only a 25.2% response rate. Their 2011 survey had just a 20.7% response rate.
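As an aside on the ± 3 percent sampling error in the Chapter 10 subtitle: for a simple random sample, the margin of error at 95% confidence is commonly approximated as 1.96 times the square root of p(1 - p)/n. Here is a rough sketch of that textbook formula. The sample size of 1,000 is my assumption (the subtitle doesn't give one), and the function name is mine.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of size n (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

# 64 percent support, with an ASSUMED sample of 1,000 respondents
moe = margin_of_error(0.64, 1000)
print(f"Margin of error: {moe:.1%}")
```

Note that this number only describes random sampling error. It says nothing about the biases discussed above, which a larger sample will not fix.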


His second question is:

“Have the questions been posed in a way that elicits accurate information on the topic of interest?”

Opinion about the “inheritance tax” may differ from that about the “death tax.”

His third question is:

“Are respondents telling the truth?”

If you asked a sample of couples how many times a day (or a week) they had sex, you might not get an accurate answer. Similarly, if you asked men to report their penis length, you could expect to see inflated numbers. 

