March 3, 2017

When Presentation Fights Against Information

I recently came across yet another choropleth (that's "color coded" for those of you who don't speak geek) map, this time from the League of Conservation Voters (LCV). It was part of an evaluation of the 2016 Congressional voting records on environmental issues, and it presented both the House and the Senate. Since I'm talking about it on my grouchy blog, you have already guessed that I have an argument with it. I'm not going to argue about whether or not their basic premise is true. A given congresscritter did vote for or against a given bill, and we can all look that up for ourselves. My quibble is with their presentation of the data. It falls short in two ways. First, they made some choices that were a bit misleading. Second, their presentation falls short in accessibility and ease of comprehension. The map on the left, below, takes their data for the House of Representatives and shows it in approximately the same colors that the League used. (If you mouse over the map, you'll get more information by state.)

[Maps, left to right: "House Votes by State, Colored by Fifths" and "House Votes by State, Colored by Percent"]

The first map is an approximation of the LCV's own map. The LCV chose to arbitrarily divide the state average scores into even fifths (aka "quintiles"). This was not necessary. The second map shows what happens when, instead of arbitrarily cutting the data into fifths, you represent the actual percentage scores on the same color scale. If you notice that the two maps do resemble each other, you are not hallucinating. The point of my comparison is not that a direct representation of scale gives a drastically different result. The point is that arbitrary categorization oversimplified the results and eliminated useful nuances.

Compare the two maps to each other. In the first, everything is rigidly defined and perfectly clear-cut. The "green" states are all nicely and strictly "green" (in their particular shades), while the "red" states are nicely and strictly "red", with no variation in their "redness". Finally, the "orange" states are all staunchly in the middle. The reality, if you look at the second map, is that none of this is true. Let's look at Washington, Maine, and Illinois. According to the first map, Washington and Maine are meaningfully different from Illinois, while Illinois and Wisconsin are the same as each other. Why? Because the entirely arbitrary cutoffs used by LCV put Washington and Maine into a different category than Illinois, and put Illinois into the same category as Wisconsin. In the second map, Illinois, Washington, and Maine are actually all closer to each other than Illinois is to Wisconsin or than Washington and Maine are to California. That's how the actual percentage ratings come out. The first map misrepresents these differences and similarities.

Let's look at Wisconsin vs. Michigan. In the first map, Wisconsin is a middling "orange" state while Michigan is in a "red" category. In the second map, they are very difficult to tell apart. I will not go so far as to claim that the intent of LCV is to mislead people, but the unfortunate outcome of their poor data presentation choices is misleading. The worst part is that it is immediately obvious that there was no need at all to use arbitrary categories. The data they wish to present is easily comprehended on the second map. The simplification into fifths managed to be both unnecessary and misleading. Let's put it another way: is 19% (Louisiana) closer to 24% (Indiana) or to 0% (South Dakota)? The LCV method portrays 19% as not being meaningfully different from 0% but as belonging to an entirely different category than 24%. Does that even make sense?
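To make that arithmetic concrete, here is a minimal Python sketch. It assumes the "fifths" are equal-width bins with cutoffs at 20, 40, 60, and 80 percent, which is an assumption on my part and not something taken from the LCV's documentation. The `fifth` helper is a hypothetical name, and the three scores are the ones quoted above.

```python
def fifth(score):
    """Return the 0-based fifth (0..4) for a 0-100 score.

    Assumes equal-width bins: 0-19.99 -> 0, 20-39.99 -> 1, and so on.
    A score of exactly 100 is folded into the top bin.
    """
    return min(int(score // 20), 4)


# The three House scores quoted in the text (percent).
scores = {"Louisiana": 19, "Indiana": 24, "South Dakota": 0}

# Which fifth each state lands in under the assumed cutoffs.
bins = {state: fifth(pct) for state, pct in scores.items()}

# How far apart the actual scores are, in percentage points.
gap_to_indiana = abs(scores["Louisiana"] - scores["Indiana"])
gap_to_south_dakota = abs(scores["Louisiana"] - scores["South Dakota"])

for state, pct in scores.items():
    print(f"{state}: {pct}% falls in fifth {bins[state] + 1}")
print(f"Louisiana vs. Indiana: {gap_to_indiana} points apart")
print(f"Louisiana vs. South Dakota: {gap_to_south_dakota} points apart")
```

Under these assumed cutoffs, Louisiana shares a bin with South Dakota even though its score sits much closer to Indiana's. Moving the cutoffs would change which pairs get split, but any fixed cutoff will separate some near neighbors.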

In addition to the unnecessary and misleading use of categories, the LCV made two other fundamental and very common errors in data presentation. First, their choice of colors is almost tailor-made to ensure that people with the most common forms of color-blindness will have difficulty interpreting their graphic. For people with full color vision, red and green have excellent mutual discernibility. Unfortunately, red-green colorblindness is the most common form of colorblindness. Fortunately, there has been a good deal of research on color schemes that can be interpreted by people with multiple types of colorblindness. Much of that research on useful palettes is even available online.

The other problem is that LCV chose the wrong kind of color scheme altogether. The data in their scorecard is "sequential". Sequential data has "low" and "high" values with no "natural" center. They chose a color scheme with two extremes (green vs. red) and imposed an arbitrary center (orange). Such a color scheme is appropriate for "divergent" data, when you actually want to emphasize how the data deviates from a specific central value. The LCV certainly does not believe that "roughly 50%" environmental friendliness is a natural center or ideal from which everything deviates. But their choice of color gives that impression. The middle of a data set is not always a "natural" center. If what you want to emphasize is that there are "bad" situations and "better" situations, and the "ideal" is the "best", then you have sequential data. For sequential data, you want to use a sequential color scheme, where greater intensity means "more". What would the LCV map look like with a proper sequential color scheme?
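For the curious, a sequential scheme is simple to sketch in Python. This is only an illustration under stated assumptions: the endpoint colors `LIGHT` and `DARK` are picks of my own for the example (a pale blue and a dark navy), not the colors used by the LCV or in the maps here, and all the function names are hypothetical. The luminance check uses the WCAG relative-luminance formula as a rough way to confirm that the ramp gets steadily darker as the score rises.

```python
# Illustrative endpoints for a single-hue ramp (assumed, not from the LCV).
LIGHT = (247, 251, 255)  # pale blue for the lowest scores
DARK = (8, 48, 107)      # dark navy for the highest scores


def lerp_color(low_rgb, high_rgb, t):
    """Linearly interpolate between two RGB colors; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb))


def sequential_color(percent):
    """Map a 0-100 score straight onto the ramp: higher score, darker color."""
    return lerp_color(LIGHT, DARK, percent / 100)


def to_hex(rgb):
    """Format an (r, g, b) tuple as a CSS-style hex string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def relative_luminance(rgb):
    """WCAG relative luminance of an 8-bit sRGB color (0 = black, 1 = white)."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


for pct in (0, 25, 50, 75, 100):
    color = sequential_color(pct)
    print(pct, to_hex(color), round(relative_luminance(color), 3))
```

Because every step along this ramp is darker than the one before it, reading the map does not depend on telling red from green. A diverging scheme, by contrast, would put its lightest or most neutral color in the middle, which is exactly the "natural center" this data does not have.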

[Maps, left to right: "House Votes by State, Colored by Fifths with a Sequential Color Scheme" and "House Votes by State, Colored by Percent with a Sequential Color Scheme"]

The map on the left is the same set of "fifths" that the LCV used, but colored according to its appropriate data type. You can see at a glance which states had higher environmental voting scores for their House of Representatives members. You do not need to look at a "key" or muddle through attempting to interpret color choices with no inherent link to the rankings. This is immediately more informative than the original LCV map. However, because it keeps the same arbitrary fifths, it is still as misleading as the original LCV map.

This is corrected by the map on the right, which presents actual percentage scores by color intensity. Again, at a glance, you can see not only which states are stronger in environmental voting record but even just how much stronger or weaker they are than other states. The misleading impressions given for states like Wisconsin vs. Michigan or Washington vs. Illinois are gone. Their relationships are accurately portrayed.

I don't know why LCV presented their data as they did. I don't think they intended to mislead, but I can speculate a bit. First, very few people are given adequate training in proper data presentation. Use of proper color schemes for different data types is rarely taught in formal coursework. Sensitivity to colorblindness is also rarely covered in the incidental information given on data presentation. Second, arbitrary and rigid categories are comforting and comfortable. We like to think that the world is nicely split up into "our side" and "the enemy". Admitting that situations can be more gradual, and that there might not be simplistic steps between "saved" and "damned", makes us uncomfortable. The politically dedicated are very often most prone to such thinking.

In the end, I can't claim to actually know the motivations behind bad data presentation choices, but I can offer suggestions on how to present the data more accurately and more effectively.