Nate Silver asks some questions here which can actually be answered by some research that Tian Zheng, Matt Salganik, and I published a couple years ago, in our article, “How Many People Do You Know in Prison?: Using Overdispersion in Count Data to Estimate Social Structure in Networks.”
Just a few key bits:
- The average number of people known is more like 750 than 290. We actually estimate the 750 using the same survey that the earlier researchers used to get the 290, but we discuss why 290 is too low an estimate. (In short, it is based on recall of common names such as Michael and Robert, which are under-recalled compared to rarer names and attributes.)
- I doubt that anything close to 3.2% of Americans really know someone who’s gotten sick with swine flu. The trouble is that survey estimates of the frequency of rare events are contaminated with misreporting errors. See here for a discussion by David Hemenway of this phenomenon in an unrelated context.
- The pattern of recall in a social network depends a lot on the attribute being asked about. In this example, I may very well know 750 people, but 600 of these are people I don’t see very often, and if they had swine flu, I’d have no idea. On the other hand, there are all sorts of people I don’t really know, but if I heard they got swine flu, I might count them. In our paper, we found that people overreported the number of people they knew who had died in auto accidents in the past year, while underreporting the number they knew who had AIDS. (See Figure 5b on page 416 of the linked article.)
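To see how much the assumed network size matters for this kind of calculation, here is a rough back-of-the-envelope sketch. It is not the model from the paper. It assumes, purely for illustration, that each acquaintance is independently sick with the same probability and that respondents report perfectly, which are exactly the assumptions questioned above. The function name `implied_prevalence` is made up for this example.

```python
def implied_prevalence(share_knowing: float, network_size: float) -> float:
    """Solve 1 - (1 - p)**network_size == share_knowing for p.

    Illustrative assumption: each of the `network_size` acquaintances is
    independently a case with probability p, and reporting is error-free.
    """
    return 1.0 - (1.0 - share_knowing) ** (1.0 / network_size)


share = 0.032  # reported share who say they know someone with swine flu

for n in (290, 750):  # the two network-size estimates discussed above
    p = implied_prevalence(share, n)
    print(f"network size {n}: implied prevalence {p:.6f}")
```

Under these assumptions the implied prevalence is roughly inversely proportional to the network size, so moving from 290 to 750 cuts the estimate by a factor of about 2.6, before any of the reporting problems are taken into account.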
In summary, I like the idea of using this kind of indirect network data to learn about prevalence, but I’m afraid that the survey response is so unreliable as to make the estimate close to useless, except as a measure of people’s perceptions of swine flu prevalence.