People Doing Statistics Badly #1

My intent is for this post to kick off a whole series on examples of bad statistical analysis in the media. For my purposes, that includes blogs.

There is a graph reproduced at Hooking Up Smart which is this installment’s example of statistics done badly. It purports to show the average man’s and the average woman’s “sexual value” over their lifetimes. I’m going to start with the things I find wrong with the graph and then go on to address some of the responses.

Things I find wrong with the graph:

  1. This graph is entirely subjective speculation sans any sort of data. This is a fact pointed out by a variety of people.
  2. No axis labels. Petty but true.
  3. For a graph that is supposedly an average over time, there should be a non-zero value (including for women) out to at least the average life expectancy, which, last time I checked, was around 78.
  4. What are the units of sexual value and how is it computed? It strikes me that averaging preferences is closely related to at least some of the literature on what we in the US would call alternative voting schemes, in particular preference voting. The OkCupid data from further down seems like the closest thing to the sort of preference-vote data that would be applicable. As Mrs. Walsh points out, the data doesn’t support the model.
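To make the preference-voting comparison concrete, here is a minimal sketch of one way ranked ballots could be turned into a single score per person. It uses a Borda count, which is only one of many preference-voting schemes and is my choice for illustration, not something the graph or the OkCupid data uses. The ballots and the names `A`, `B`, and `C` are made up.

```python
from collections import defaultdict


def borda_scores(ballots):
    """Score candidates from ranked ballots (most preferred first).

    With n candidates on a ballot, first place earns n - 1 points,
    second place n - 2, and so on down to 0 for last place.
    """
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    return dict(scores)


# Hypothetical ballots: each voter ranks the same three people.
ballots = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
]

scores = borda_scores(ballots)
for candidate, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(candidate, score)
```

The point is that the resulting number only has meaning relative to the balloting rules. Change the rules and the “units” change with them, which is why the graph needs to say how its values were computed.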

Now to some of the responses . . .

  1. The area under each curve doesn’t integrate to one/the same value. This one is perhaps the most interesting objection. It is very strongly tied to how we want to think about sexual value over time and what the curve is trying to measure. Let’s suppose we were asking what percentage of sexual value is held by a certain segment of the population as defined by their age. Then yes, we would want to make the area under the curve equal to one, because all of the sexual value has to go somewhere and each sex has a fixed total sum of it. Now let’s suppose we’re talking about the average person’s sexual value in each year of their life. Given our understanding that a unit of measure should not change when we talk about an average, consider the following: should a person’s peak sexual value drop because they live longer? Men and women do, on average, have different life expectancies in the USA. As a result, if we assert that the area under both curves should be equal, then we are penalizing the longer-lived sex with slightly lower sexual value throughout their younger years in order to have some left over for those extra couple of years of life. On the other hand, if we scale everything so that the peak of sexual value is 100 (which the graph doesn’t do either), then differences in when the average man’s or woman’s value peaks and in how it drops off (it will start at zero and end at zero eventually) will determine the area under each curve.
  2. The top valued man is never, ever a “10”. Let’s start with the fact that we first have to agree on how to measure sexual value and that sexual value is tied up in tastes and preferences. Ergo, to the extent that the 0-10 scale is nothing more than one way to build a quick-and-dirty preference ballot, lots of people are ranked ten because they are, of the known choices, the highest ranked person under the “rules” of the balloting system. The objection cited isn’t per se wrong, but it is tied up in modeling assumptions which need to be cited and validated.
  3. Fertility is a significant driver of sexual value. To be fair, if you care about the possibility of having kids, this matters. If you don’t, it doesn’t. I’m personally very skeptical of evo-psych because I have a hard time distinguishing it from post-hoc story-telling, but that doesn’t make it wrong.*
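The trade-off in the first response, equal areas versus equal peaks, can be sketched numerically. The curves below are entirely made up (simple triangles that start at zero, peak, and fall back to zero), and the ages 15, 30, 75, and 85 are placeholders rather than estimates of anything. The only thing the sketch is meant to show is that the two scaling conventions cannot both hold when the curves span different lengths of time.

```python
def triangle_curve(start, peak_age, end, ages):
    """Hypothetical value curve: 0 at start and end, 1.0 at peak_age."""
    values = []
    for age in ages:
        if age <= start or age >= end:
            values.append(0.0)
        elif age <= peak_age:
            values.append((age - start) / (peak_age - start))
        else:
            values.append((end - age) / (end - peak_age))
    return values


def area(values, step=1.0):
    """Area under the curve by the trapezoid rule, one point per year."""
    return sum((values[i] + values[i + 1]) / 2 * step
               for i in range(len(values) - 1))


ages = list(range(0, 101))
shorter = triangle_curve(15, 30, 75, ages)  # made-up shorter-lived group
longer = triangle_curve(15, 30, 85, ages)   # made-up longer-lived group

# Convention 1: force the area under each curve to equal one.
area_shorter, area_longer = area(shorter), area(longer)
by_area_shorter = [v / area_shorter for v in shorter]
by_area_longer = [v / area_longer for v in longer]

# Convention 2: force each curve to peak at 100.
by_peak_shorter = [100 * v / max(shorter) for v in shorter]
by_peak_longer = [100 * v / max(longer) for v in longer]

print("equal areas, peaks:", max(by_area_shorter), max(by_area_longer))
print("equal peaks, areas:", area(by_peak_shorter), area(by_peak_longer))
```

Under the equal-area convention the longer-lived curve ends up with the lower peak, which is the penalty described above. Under the equal-peak convention the peaks match and the areas differ instead.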


*I find that the whole idea of saying, “Oh, here’s an interesting modern human behavior! Let’s go study ancient humans and look for environmental conditions which would prompt this behavior to develop!”** invites confirmation bias. In a situation like this it’s important to remember that humanity is much more complicated than the models posit, the data is much worse than we would like it to be, and confirmation bias is insidious.

**The alternative version of this, “I think _____ might have been a pressure faced by the ancients that would cause this sort of behavior, so let’s go look and see if it was present” just shifts the risk from looking for evidence of some pressure (and therefore the existence of some story of how the behavior arose) to looking for evidence of the chosen pressure (aka the chosen story) and does not escape the risk of confirmation bias.***

***Yes, I nested footnotes.
