Things I Write: State Tax Burdens, how to lie without technically lying.

Ranks. I hate ranks. People who love data hate ranks because ranks destroy knowledge. In my professional analytics, I use ranks under only two circumstances: 1) when my data is such a bizarre mess that I have to "go nonparametric" (that's the nuclear option for data analysts), or 2) when a client foolishly insists on ranks. There are other reasons people use ranks, and they boil down to two: 1) the people are clueless and want something simplistic, or 2) the people are dishonest and want to hide something. Whenever it's a political question, option 2 is most likely.
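To make "ranks destroy knowledge" concrete, here is a minimal sketch. The two lists are made-up numbers, not tax data, and `to_ranks` is just a throwaway helper for this illustration. One list is tightly bunched, the other is spread all over the place, and once ranked they are indistinguishable.

```python
def to_ranks(values):
    """Rank values from highest (1) to lowest. Assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical numbers, for illustration only.
tight = [9.1, 9.0, 8.9, 8.8]     # barely any real difference
spread = [15.0, 9.0, 4.0, 1.0]   # huge real differences

tight_ranks = to_ranks(tight)
spread_ranks = to_ranks(spread)

# Both collapse to the same ranks; the size of the gaps is gone.
print(tight_ranks, spread_ranks)
```

Whatever distance there was between first place and last place, the ranks report it as "three places" either way.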

This appears to be the case with some data I found on WalletHub. It presents a table of state tax burdens for 2019. All well and good. They had no data methodology other than to lift the numbers from a group that calls itself "Tax Policy Center". A lot of other web pages did that, too, each presenting it their own way. I'm picking on WalletHub because 1) it comes up very early in Google searches on this topic, and 2) it made some goofy choices in presenting the data.

What were the goofy choices? First, when they made their choropleth map, they decided to use ranks instead of the actual tax burdens. They produced something that looked like this.

State Tax Burden, Colored by Rank

I will give credit where credit is due. WalletHub did use a sequential color scheme for the data instead of the extremely common and stupid practice of a diverging scheme. Not the specific color or range I'd have chosen, but at least conceptually okay. What is wrong with their presentation? If ranks were all they had, then there'd be nothing wrong with it. They had more than ranks. To visualize their data, they chose to eliminate information. That tells me the actual data didn't paint the picture they wanted, so they diddled and fiddled until they got something more to their taste.

The proof of this pudding is the next, completely illegitimate, pseudo-analysis they pulled off. They classified states into "red" and "blue" without stating their criteria. Why should this matter? After all, aren't red states totally red, no matter what, and the same with blue? Perhaps if one is a complete moron, one might think that. The topic, in this case, is state taxes, and state taxes are set by state governments.

There are states in the USA that are currently neither red nor blue at the level of state government. Their state governments are split between the parties. Since the WalletHub goobers decided not to explain how they resolved such a situation into "red" or "blue", there is no way to understand what they did. I do suspect that they blindly and stupidly (yes, stupidly) applied the results of the 2016 presidential election, even though the states have held statewide elections since then.

In any case, they then used this division to "compare" the states' tax burdens by political allegiance. They did so by averaging the ranks, not the actual percent burdens. As expected, using ranks, they came up with an enormous difference: red states had an "average rank" of 30.13 while blue states had 18.4. This is crap. How do I know it's crap? Because I repeated their analysis as best I could reconstruct it. I came up with the same "average ranks", but when I averaged the actual percent burdens, the "red" states had an average of 8.08% vs. 9.27% for the "blue" states. Yes, still a difference, but the magnitude is far different. While the "average ranks" differed by about 63% (or 11.7 rank points), the proper analysis of actual percent burden showed a difference of about 15% (1.2 percentage points)!
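For anyone who wants to check the arithmetic, here is a small sketch that works only from the four averages quoted above. The helper name `relative_gap` is mine, and dividing by the smaller of the two numbers is my assumption about how to express the gap as a percent. Since the inputs are already rounded, the results should land near the quoted figures rather than match them to the digit.

```python
def relative_gap(larger, smaller):
    """Gap between two averages as a fraction of the smaller one."""
    return (larger - smaller) / smaller

# Averages as quoted in the text above.
rank_red, rank_blue = 30.13, 18.4   # "average rank" per group
pct_red, pct_blue = 8.08, 9.27      # average tax burden, in percent

rank_points = rank_red - rank_blue            # gap in rank points
rank_gap = relative_gap(rank_red, rank_blue)  # gap as a fraction

pct_points = pct_blue - pct_red               # gap in percentage points
pct_gap = relative_gap(pct_blue, pct_red)     # gap as a fraction

print(f"ranks:   {rank_points:.1f} rank points, {rank_gap:.1%}")
print(f"burdens: {pct_points:.1f} percentage points, {pct_gap:.1%}")
```

Same states, same grouping, and the rank version makes the gap look roughly four times bigger than the burden version does.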

Without technically falsifying data, WalletHub invented a very large difference that didn't reflect reality.

Of course there are better ways; otherwise, I'd not have written this. First, the overall presentation of ranks is simply dishonest in this situation. Nothing is gained by presenting this data as ranks, as the following maps show:

Left: State Tax Burden, "Squashed" Range. Right: State Tax Burden, Starting at Zero.

The map on the left shows the actual tax burden percents, scaled to the same color range used by WalletHub. The minimum value is the lowest tax burden (Alaska). The map on the right shows the same data, except scaled to a minimum value of 0% tax burden. You will notice a difference. If the data is "stretched" so that the lowest state sits at the bottom of the color range, differences can still look big. If a natural bottom end is chosen, you see that most of the differences melt into the background.

So, what about the blue/red thing? When I consulted Ballotpedia, I got the distribution of state government political domination: Republican, Democrat, or Divided. Running these through the data gave me the following: Republican: 8.10% tax burden; Democrat: 9.39% tax burden; Divided: 8.43% tax burden. This isn't radically different from the outcomes based on the red/blue WalletHub categorization, but it's a more complete picture. I won't bother averaging ranks. I'm not that dishonest.
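Going back to the two maps for a moment, the difference between them comes down to where the bottom of the color scale is pinned. Here is a minimal sketch of that idea. The three burden values and their labels are hypothetical stand-ins, not the real state figures, and `color_position` is a helper I made up for the illustration. It returns where a value falls on the color ramp, from 0.0 (lightest) to 1.0 (darkest).

```python
def color_position(value, vmin, vmax):
    """Linear position of value on a color ramp running from vmin to vmax."""
    return (value - vmin) / (vmax - vmin)

# Hypothetical tax burdens in percent, for illustration only.
burdens = {"Low": 5.0, "Middle": 8.5, "High": 13.0}

lowest = min(burdens.values())
highest = max(burdens.values())

# "Squashed" range: the lowest burden is pinned to the bottom of the ramp.
squashed = {k: color_position(v, lowest, highest) for k, v in burdens.items()}

# Zero-based range: a 0% burden is the bottom of the ramp.
zero_based = {k: color_position(v, 0.0, highest) for k, v in burdens.items()}

for name in burdens:
    print(f"{name:6s} squashed={squashed[name]:.2f} zero_based={zero_based[name]:.2f}")
```

With the squashed range, the lowest value always gets the lightest color available, however close it is to everyone else. With the zero-based range, the values share a narrower band of the ramp, which is why the second map looks so much calmer.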


November 7, 2019
