December 22, 2015

Liars, Damned Liars, Presidential Candidates

Tree of Candidates, 22 December 2015
[Figure: hierarchical clustering tree of Bush, Carson, Christie, Clinton, Cruz, Fiorina, Huckabee, Kasich, Obama, O'Malley, Paul, Pelosi, Rubio, Santorum, and Trump]

A few months ago, I plotted out the Presidential candidates from the two major parties in terms of their truthfulness. I did this through a "tree" (really more of a "bush") that showed how close each was to all the others if one uses the Politifact Truth-o-Meter to measure "truthfulness".

As I mentioned before, Politifact gives us a summary chart for each person and a description of each of their determinations. Unfortunately, comparing the profiles isn't quite straightforward, especially if you want to compare several of them at once. That's where nerdistry comes in.

So, once again, I took the data on each politician's page, ran it through some nerd magic, and came up with the clustering. The names correspond to formally filed candidates who have more than 4 rulings on Politifact and haven't dropped out of the race, plus Barack Obama and Nancy Pelosi for reference. I color-coded them by party. You can click on any name to go to that person's Politifact page. Many of the candidates now have more data--more statements that got a Politifact "ruling". That is how Dr. Ben Carson made it onto the tree: he was missing in August only because Politifact had not yet rated enough of his statements. Now they have plenty. Several candidates have dropped out. Nevertheless, if you look at the older tree, you'll see that things haven't changed much. As before, three "meaningful" clusters appeared in the data and have curves drawn around them. The differences among politicians inside the same "meaningful" cluster are not worth noting. Yes, this means that, when it comes to truthfulness as measured by Politifact, Santorum, Fiorina, and Huckabee lump in with Pelosi, and Clinton (and Obama) are pretty much the same as Bush, Christie, Kasich, Paul, and Rubio.

What do the clusters show? As last time, it comes down to which end of the "True" vs. "Pants-on-Fire" profile a candidate sits on. The top left cluster (let's call it Clinton-Bush) leans more to "True" and "Mostly True". The cluster on the right (Pelosi and her boys) tends to prefer "Half True" and "Mostly False". The bottom cluster is heavily dominated by "False", with a dash of "Pants on Fire" and a rare "True" or "Mostly True". O'Malley is still in his own world, but he has moved relative to the tree as a whole. Last time, he was a complete outlier and couldn't be related to anyone. Now that the wacky pack at the bottom has amassed a truly monumental pile of whoppers, O'Malley has been squeezed in closer to the other two clusters than to Trump/Fiorina/Cruz.

And the take-home message? Two messages, actually. First, if you agree with Politifact, the tree is a rough indication of who is more trustworthy; if you reject Politifact's conclusions, just invert the true/false interpretations. Second, you can see who resembles whom in terms of trustworthiness, and that this hasn't changed much since August. Whether you agree with Politifact or reject it, this part is consistent: politicians in the same cluster seem to have the same basic character when it comes to honesty or its lack.

Nerd Section

This is a repeat of August's methods. I used a copy of the "R" statistical language and the "cluster", "gclus", "ape", "clue", "protoclust", "MultinomialCI" and "GMD" packages. Then I gathered up the names of declared candidates for US President. I did not intend to limit this to only Republicans or Democrats. Unfortunately, when I looked people up on Politifact, only Republicans and Democrats had more than 4 rulings. Why more than 4? A rough estimate of the "standard error" of count data is the square root of the total. The square root of 4 is 2, which means that if a candidate had 4 rulings, the accuracy was plus or minus 2 rulings--half the total. That's too much for my taste. This time, I had 15 candidates.
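In code, the screen amounts to a couple of lines. A minimal sketch (the `totals` vector is a hypothetical named vector of ruling counts per candidate, not real data):

```r
# Rough standard error of a count n is sqrt(n); the relative error
# shrinks as rulings pile up.
n <- c(4, 9, 25)
sqrt(n)      # 2 3 5
sqrt(n) / n  # 0.50 0.33 0.20

# totals: hypothetical named vector of Politifact ruling counts per candidate.
candidates <- names(totals)[totals > 4]  # the "more than 4 rulings" screen
```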

Comparing them required a distance metric. I could have assigned scores to each ruling level and then calculated an average total per ruling. While this might be tempting, it is also wrong. Why is it wrong? Because that method would make a loose cannon the same as a muddled fence-sitter. Imagine a candidate who only tells the complete truth or complete whoppers. If you assign scores and average, this will come out being the same as a candidate who never commits but only makes halfway statements. Such people should show up as distinct in any valid comparison.
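To make that concrete, here is a minimal R sketch with made-up counts; the 1-to-6 scoring is my own invention, purely to illustrate the pitfall:

```r
# Six ruling levels scored 1 ("True") through 6 ("Pants on Fire") -- an
# arbitrary scoring, just to show how averaging erases real differences.
scores <- 1:6

loose_cannon <- c(10, 0, 0, 0, 0, 10)  # only "True" or "Pants on Fire"
fence_sitter <- c(0, 0, 10, 10, 0, 0)  # only "Half True" or "Mostly False"

weighted.mean(scores, loose_cannon)  # 3.5
weighted.mean(scores, fence_sitter)  # 3.5 -- identical, yet the candidates differ
```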

Fortunately, there are other ways to handle this question. I decided to use a metric based on the chi distance. Chi distance is based on the square of the difference between two counts divided by the expected value. It's used for comparing pictures, among other uses. However, a raw chi distance depends very much upon the total, and the totals were very different among candidates. The solution to this was easy, of course. I just took the relative counts (count divided by total) for each candidate.
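For the curious, a minimal sketch of such a chi distance on relative counts follows. Taking the expected value as the mean of the two profiles' bins is one common convention, not the only defensible one:

```r
# Chi-squared-style distance between two relative-count profiles p and q.
# Each is a vector of (count / total) over the six Politifact ruling levels.
chi_dist <- function(p, q) {
  expected <- (p + q) / 2
  keep <- expected > 0  # skip ruling levels neither candidate has
  sqrt(sum((p[keep] - q[keep])^2 / expected[keep]))
}

p <- c(10, 0, 0, 0, 0, 10) / 20  # the loose cannon, as relative counts
q <- c(0, 0, 10, 10, 0, 0) / 20  # the fence-sitter
chi_dist(p, q)  # distinctly nonzero, unlike the score-average comparison
```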

I needed one more element for my metric. Politifact does not rate every single statement someone makes. They pick and choose. Eventually, if they get enough statements, their profiles probably present an accurate picture, but until they get a very large number of statements, there is always some uncertainty. Fortunately, multinomialCI estimates that uncertainty. I ran the counts through multinomialCI and got a set of "errors" for each candidate. I combined these with the chi distances to obtain "uncertainty-corrected distance" between each candidate. Long story short, this was done by dividing the chi distance by the square root of the sums of the squares of the errors. What that meant is that a candidate with a large error (few rulings) was automatically "closer" to every other candidate due to the uncertainty of that candidate's actual position.
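Here is a minimal sketch of that step in R. The `multinomialCI()` call is the package's real function; folding its half-widths into the distance as shown is simply one direct reading of the description above:

```r
library(MultinomialCI)  # Sison-Glaz simultaneous confidence intervals

# counts_a, counts_b: two candidates' ruling counts over the six levels.
ci_a <- multinomialCI(counts_a, alpha = 0.05)  # matrix of lower/upper bounds
ci_b <- multinomialCI(counts_b, alpha = 0.05)
err_a <- (ci_a[, 2] - ci_a[, 1]) / 2  # half-widths as per-level "errors"
err_b <- (ci_b[, 2] - ci_b[, 1]) / 2

# Uncertainty-corrected distance: chi_dist() from the sketch above, shrunk
# by the combined error. Large errors pull candidates "closer" together.
d_ab <- chi_dist(counts_a / sum(counts_a), counts_b / sum(counts_b)) /
        sqrt(sum(err_a^2) + sum(err_b^2))
```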

I then created a series of hierarchical clustering trees from this set of distances. There is a good deal of argument over which tree creation method is best. I decided to combine multiple methods. I created trees using "nearest neighbor", "complete linkage", "UPGMA", "WPGMA", "Ward's", "Protoclust", and "Flexible Beta" methods. The "clue" package was designed to combine such trees in a rational fashion. Feel free to look it up if you want to follow all the math. I used clue to create the "consensus tree", which is the structure I posted on my blog. But clue doesn't tell you how to "cut" the clusters. For that, I turned to the "elbow method".
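Before the cutting, here is a minimal sketch of the tree-building and consensus step, assuming `dmat` holds the uncertainty-corrected distances; the flexible-beta parameter shown is a common default, not necessarily the value I tuned:

```r
library(cluster)     # agnes() for flexible-beta linkage
library(protoclust)  # "Protoclust" (minimax) linkage
library(clue)        # consensus across trees

d <- as.dist(dmat)  # matrix of uncertainty-corrected distances

trees <- list(
  hclust(d, method = "single"),    # nearest neighbor
  hclust(d, method = "complete"),  # complete linkage
  hclust(d, method = "average"),   # UPGMA
  hclust(d, method = "mcquitty"),  # WPGMA
  hclust(d, method = "ward.D2"),   # Ward's
  protoclust(d),                   # Protoclust
  as.hclust(agnes(d, method = "flexible", par.method = 0.625))  # flexible beta
)

consensus <- cl_consensus(cl_ensemble(list = trees), method = "euclidean")
tree <- as.hclust(cl_ultrametric(consensus))  # back to a plain hclust for cutting
```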

The elbow method is an old statistical rule of thumb. Any hierarchical clustering can be cut in multiple ways to say "these things fall into those groups and smaller groups don't really matter". The elbow method compares the "variance" each possible cut leaves unexplained, charted as number of clusters vs. unexplained variance. The math under the hood is not simple, but the picture is: plot the unexplained variance against the number of clusters. The line will always be descending, because more clusters always soak up more variance. What you look for is a "scree" or an "elbow", some point where there is a sharp bend in the line. Past that bend, more clusters won't add enough additional explanation to be worth the cut. In this case, my elbow was at four clusters, the three I outlined plus O'Malley.
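The "GMD" package automates the elbow bookkeeping, but a hand-rolled sketch shows the arithmetic, assuming `d` and `tree` from the previous sketch:

```r
# Unexplained ("within-cluster") variance for each candidate cut k.
dm <- as.matrix(d)
total_ss <- sum(dm^2) / (2 * nrow(dm))  # total dispersion from pairwise distances

within_ss <- function(k) {  # dispersion left inside clusters after cutting at k
  groups <- cutree(tree, k)
  sum(sapply(split(seq_len(nrow(dm)), groups), function(idx) {
    sub <- dm[idx, idx, drop = FALSE]
    sum(sub^2) / (2 * length(idx))
  }))
}

ks <- 1:10
unexplained <- sapply(ks, within_ss) / total_ss
plot(ks, unexplained, type = "b",
     xlab = "number of clusters", ylab = "unexplained variance")
# The curve always descends; the sharp bend ("elbow") marks the cut.
# For this data, the bend sat at k = 4.
```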

July 28, 2015

Lies, Damned Lies, and Presidents

Tree of Candidates, 5 August 2015
[Figure: hierarchical clustering tree of Sanders, Obama, Chafee, Clinton, Webb, O’Malley, Pelosi, Paul, Christie, Rubio, Graham, Kasich, Bush, Jindal, Fiorina, Cruz, Trump, Santorum, Perry, Huckabee, and Walker]

In case you are a hermit who somehow has internet access: the USA has already started the run-up to the run-up to our Presidential election. Since our current Commander in Chief is a Democrat, that means there are boatloads of Republicans trying to get the job. Just to keep things goofy, though, the Democrats have no "anointed one", no presumed favorite of the current President for the post.

Anyway, we've got an awful lot of "Vote for me because vote for me" already going on. If you don't know dink about the US political system, here's how this part works. Political parties in the USA are technically "private" organizations. Nevertheless, the two biggies (that would be the Republicans and Democrats) hold pre-elections, where they fight amongst themselves to get the support of their hardcore hardline members in the various states. These pre-elections then decide who will be the official candidates for each party.

That means that the campaigning starts early in the USA, with a lot of argle, bargle, wharr, and garble. There are all kinds of "ratings" out there that try to summarize each candidate on various axes. Conservative? Liberal? Environmental? Business? Pick a special interest and go with it. One that I like is Politifact. What they do is pick out statements made by public figures and rate them on a six-level scale, on the basis of how factual the statement is. They have a page devoted to the many people they've looked at.

You could, if you wanted, browse through every single page and get a rough idea of a politician's history. They even summarize things with a little chart for each person. Unfortunately, comparing these histories can be a little gnarled, especially if you want to compare several of them at once. That's where nerdistry comes in.

I am certain you have noticed the diagram on the right. It's called a "hierarchical clustering". I took the data on each politician's page, ran it through some nerd magic, and came up with the clustering. The names correspond to formally filed candidates who have more than 4 rulings on Politifact, plus Barack Obama and Nancy Pelosi. I color-coded them by party. Each name can be clicked to lead you to the person's Politifact page. The three "meaningful" clusters have curves drawn around them. The differences among politicians inside the same "meaningful" cluster are not worth noting.

What do the clusters show? To keep matters short, it comes down to which end of the "True" vs. "Pants-on-Fire" profile one comes out on. The cluster at the top of the figure leans more to "True" and "Mostly True". The bottom left cluster is fairly evenly distributed among all the answers, but tends to prefer "Half True" and "Mostly False". The bottom right cluster is dominated by "False", with a sprinkling of everything else from "Mostly True" to "Pants on Fire". Outright "True" is rare for them, though. O'Malley is just too far from anyone else to fit in.

What does this mean? If you agree with Politifact, it's a reflection of who is more trustworthy, though not in any fine-grained sense. If you reject Politifact's conclusions, just invert the true/false interpretations. What is important is that you can see who resembles whom in terms of trustworthiness. Whether you agree with Politifact or reject it, that part is consistent: politicians in the same cluster seem to have the same basic character as each other when it comes to honesty or its lack.

Nerd Section

If you are interested in how I came up with the "truth tree" and its "meaningful clusters": first, I needed a copy of the "R" statistical language and the "cluster", "gclus", "ape", "clue", "protoclust", "MultinomialCI" and "GMD" packages. Then I gathered up the names of declared candidates for US President. I did not intend to limit this to only Republicans or Democrats. Unfortunately, when I looked people up on Politifact, only Republicans and Democrats had more than 4 rulings. Why more than 4? A rough estimate of the "standard error" of count data is the square root of the total. The square root of 4 is 2, which means that if a candidate had 4 rulings, the accuracy was plus or minus 2 rulings--half the total. That's too much for my taste. This left me with 21 candidates.

Comparing them required a distance metric. I could have assigned scores to each ruling level and then calculated an average total per ruling. While this might be tempting, it is also wrong. Why is it wrong? Because that method would make a loose cannon the same as a muddled fence-sitter. Imagine a candidate who only tells the complete truth or complete whoppers. If you assign scores and average, this will come out being the same as a candidate who never commits but only makes halfway statements. Such people should show up as distinct in any valid comparison.

Fortunately, there are other ways to handle this question. I decided to use a metric based on the chi distance. Chi distance is based on the square of the difference between two counts divided by the expected value. It's used for comparing pictures, among other uses. However, a raw chi distance depends very much upon the total, and the totals were very different among candidates. The solution to this was easy, of course. I just took the relative counts (count divided by total) for each candidate.

I needed one more element for my metric. Politifact does not rate every single statement someone makes. They pick and choose. Eventually, if they get enough statements, their profiles probably present an accurate picture, but until they get a very large number of statements, there is always some uncertainty. Fortunately, multinomialCI is perfect for estimating that uncertainty. I ran the counts through multinomialCI and got a set of "errors" for each candidate. I combined these with the chi distances to obtain an "uncertainty-corrected distance" between each pair of candidates. Long story short, this was done by dividing the chi distance by the square root of the sums of the squares of the errors. What that meant is that a candidate with a large error (few rulings) was automatically "closer" to every other candidate, due to the uncertainty of that candidate's actual position.


I then created a series of hierarchical clustering trees from this set of distances. There is a good deal of argument over which tree creation method is best. I decided to combine multiple methods. I created trees using "nearest neighbor", "complete linkage", "UPGMA", "WPGMA", "Ward's", "Protoclust", and "Flexible Beta" methods. The "clue" package was designed to combine such trees in a rational fashion. Feel free to look it up if you want to follow all the math. I used clue to create the "consensus tree", which is the structure I posted on my blog. But clue doesn't tell you how to "cut" the clusters. For that, I turned to the "elbow method".

The elbow method is an old statistical rule of thumb. Any hierarchical clustering can be cut in multiple ways to say "these things fall into those groups and smaller groups don't really matter". The elbow method compares the "variance" each possible cut leaves unexplained, charted as number of clusters vs. unexplained variance. The math under the hood is not simple, but the picture is: plot the unexplained variance against the number of clusters. The line will always be descending, because more clusters always soak up more variance. What you look for is a "scree" or an "elbow", some point where there is a sharp bend in the line. Past that bend, more clusters won't add enough additional explanation to be worth the cut. In this case, my elbow was at four clusters, the three I outlined plus extreme outlier O'Malley.

May 28, 2015

What does "Right to work" actually do?

So, what's the argument over?

Arguments made in favor of right-to-work (RTW), at least for the consumption of the general public, all boil down to claiming a better employment atmosphere overall in a state: it increases overall employment, and it increases wages. Before I go on, I'm going to explain some of my personal biases. No law should ever be made without compelling need. A slight marginal improvement is usually not worth the burden put upon the citizenry by a new law, whatever that burden may be. Thus, the burden of proof is not that a law will not make things any worse; it is that the law must make things better. This is why I'm not explicitly testing anti-right-to-work claims: anti-any-law is automatically favored as the "null hypothesis". Of course, some laws are trivial to justify. The damage done to people and society by practices such as child prostitution is so enormous, and the moral issue so clear-cut, that it is trivial to show an overriding social need for a law against such practices. When it comes to labor law, things become less immediately clear-cut. What is RTW? I explain at the end if you don't already know.

Let's Talk Money

[Figure: histogram of all state differences in median wages. Orange: right-to-work state does better. Blue: non-RTW state does better.]

One conventional measurement that dominates the argument about right-to-work is wages. If you know me, though, you'll already know that I will not look at them in conventional ways. This is because most attempts both to support and to attack RTW on a wage basis have been pretty much crap, and pretty much dishonest. Dishonest crap? Yes. The attempts to attack and defend RTW are based on "average" wages. That's just plain silly. I'll show why at the end of the article. Instead of the average, I will use the median.

Okay, so now that I've chosen median income as the basis of comparison (conveniently available from the Bureau of Labor Statistics), how to compare? One way is to aggregate the two groups of states (RTW vs. non-RTW), subtract one aggregation from the other, et voila! But "simple" isn't always so simple. If there is a difference between the two, is that difference meaningful? There are several little statistical "tests" that could be used, but the tests make a lot of assumptions.

Our data covers all the possible bases (all 50 states for that year), so statistics may actually obscure information. Anyway, what aggregation should I use? Should I compare the average of the medians? That sounds pretty darn kooky, although it's easy to calculate. If the average is a bad idea for individual states, what makes it a good idea for states as groups? Okay, how about the median of the medians? Again, there could be problems with this. If nothing else, it's very coarse and clunky.

What can be done, then? There are only 50 states. Of these, 24 are RTW and 26 are not. That's not much. That's only 24 × 26 = 624 pairwise comparisons of states. We have spreadsheets in the modern day; 624 subtractions are nothing! Okay, so I can do 624 subtractions of one state's median wage from another's, then what? Aggregate the subtractions and present column charts with error bars and all kinds of statistical gobbie-goo?

I could, but it would hide more than reveal. After all, when I've got that few data points (yes, 624 is few in my world), why not just present them all and let the reader see directly? That's what I did. The figure to the right is a "histogram". It displays every single comparison, grouped into "income difference" brackets. Orange columns are where an RTW state had a higher median income than a non-RTW state. Blue columns are the other way around. If you mouse over, you'll see the limits of each bracket and the actual number of comparisons that fell into that bracket. Overall, an RTW state was better in 106 comparisons. A non-RTW state was better in 518 comparisons.
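If you want to reproduce the comparison, a minimal R sketch follows; `states`, `median_wage`, and `rtw` are stand-in names for whatever you build from the BLS table:

```r
# states: one row per state, with its median wage and an RTW flag
# (placeholder names; build this from the BLS median-wage data).
rtw_wages     <- states$median_wage[states$rtw]
non_rtw_wages <- states$median_wage[!states$rtw]

# Every RTW state paired with every non-RTW state: 24 x 26 = 624 differences.
diffs <- as.vector(outer(rtw_wages, non_rtw_wages, `-`))

hist(diffs, breaks = 20,
     main = "RTW minus non-RTW median wage, all 624 state pairs",
     xlab = "income difference")
abline(v = 0, lty = 2)

sum(diffs > 0)  # pairs where the RTW state had the higher median
sum(diffs < 0)  # pairs where the non-RTW state did
```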

Is that "significant"? I hate "significant", and I do statistics for a living. "Significant" was just a shorthand that Professor Fisher came up with years ago as an arbitrary cut-off. Some moron could come along and note that less than 95% of the comparisons went against RTW. So? Does that mean the 5 out of every 6 comparisons that went against RTW don't count? If you were told that drinking something had an 83% chance of making you sick while doing nothing good for you, would you say, "It's not a 95% chance, so it's not significant!"? Of course you wouldn't. Statistics were invented to deal with situations where we do not have all the data points and are trying to draw a conclusion. Here, we don't have to guess. We know that out of all possible combinations, RTW did more poorly in 518 out of 624 comparisons. Those comparisons, by the way, were not of "typical" (middling) RTW states vs. "typical" (middling) non-RTW states. They included everything.

[Figure: histogram of all state differences in employment per population. Orange: right-to-work state does better. Blue: non-RTW state does better.]

Let's Talk Jobs

Enough on income (for now). What about claims on employment? Here is where it gets more murky. First, we have the crap input problem. Most analyses of RTW use the "unemployment rate". Funny thing about the "unemployment rate". When a Democrat is president and the "unemployment rate" goes down, Republicans say that it's a crappy metric. When a Republican is president and the "unemployment rate" goes down, Democrats say that it's a crappy metric. If the value of a metric depends entirely upon who is in office and what party is commenting, it's a flat-out crappy metric!

There is a number I will use. The number of people employed (for 2013, from the BLS) divided by total population (estimated for 2013, US Census) reflects not only how many people are working, but how many "mouths to feed" the fruits of their labor must be spread across. In 218 comparisons, RTW states came out better. In 406 comparisons, non-RTW states came out better. It's closer, but I'm not comfortable calling it a wash. After all, roughly 2/3 of the time, RTW lost out again. So, even though it's closer, RTW lags behind non-RTW in simply providing employment to the population at large.

[Figure: histogram of all state differences in median wages adjusted by employment, population, and cost of living. Orange: right-to-work state does better. Blue: non-RTW state does better.]

Let's Talk Money and Jobs

The tale is not told, though. After all, what if RTW states (like Texas) happen to be very populous states and non-RTW states (like Alaska) are sparsely populated? Then, even though RTW might not do well on a pure state-by-state basis, in terms of overall prosperity of human beings it might shine! But how to measure that? If we are thinking primarily of ordinary people (and we should, since the arguments about RTW always come down to whether it helps ordinary people), we can start with median income again. If everyone were making the median income, the median income would not change. Then multiply the median income by the number of employed people in a state to get an aggregate income estimate. We also have to take into account that a state may have a lot more people to support on top of those who are working, so divide the aggregate income by the state's total population. This "population-adjusted income" can give us an idea of how well each state does vs. another in terms of the comfort of its mass of people.

Let us not forget cost of living, since higher wages can be passed on to consumers by the businesses paying them. What does that give us? You've probably already been looking over the last graph. As you can see, RTW still does not do as well as non-RTW in terms of income, adjusted by employment, population, and cost of living. In 202 comparisons, an RTW state did better than a non-RTW state, but in 422 comparisons, a non-RTW state did better than an RTW state.
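Here is the adjustment chain as a sketch, again with stand-in column names; pick whatever cost-of-living index you trust for `col_index`:

```r
# Placeholders: median_wage and employed from the BLS, population from the
# Census, col_index from a cost-of-living table (higher = more expensive).
states$adj_income <- with(states,
  median_wage * employed / (population * col_index))

# Then the same all-pairs comparison as before, on the adjusted number.
diffs <- as.vector(outer(states$adj_income[states$rtw],
                         states$adj_income[!states$rtw], `-`))
sum(diffs > 0)  # RTW state better
sum(diffs < 0)  # non-RTW state better
```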

What does this tell us? Look at the charts and tell yourself what they tell you. The argument tested is "Right to work improves the lot of the worker". While this might sometimes be true, it is false in two-thirds of the comparisons. That is enough to severely call into question the argument that RTW is of benefit to the ordinary worker. What it tells me, personally, is that government interference in the free market is a bad idea even if that interference is supported by large businesses. If government is doing its job in terms of police and night-watchman duties, unions cannot actually force employers to accept any terms at all, except by using the exact same tactics that companies use to "force" restrictive clauses into contracts: playing hard-ball, which businessmen do through purely legal means all the time. However, when government decides to meddle (and RTW is meddling by government in what should be a purely business-to-business interaction; yes, Virginia, unions are businesses), things don't work as well. Socialistic meddling in favor of employers is no less stupid than socialistic meddling in favor of workers.

What is the take-home? Given the data at hand, a compelling argument in favor of enacting or maintaining RTW cannot be made except perhaps in a few extreme circumstances. By and large, RTW is not a policy that produces enough benefit to be worthy of being kept as law. Government meddling is never anything better than a necessary evil, and if it is not actually necessary, then it is merely evil.

What is "right to work"?

Several states in the USA have laws that are called "right to work" by their proponents. In a "right-to-work" (RTW) state, a union and employer are prohibited from entering into an agreement to "govern the extent to which an established union can require employees' membership, payment of union dues, or fees as a condition of employment, either before or after hiring". This has a lot of political fol-de-rol associated with it, nearly all of it hypocrisy. More "pro-business" groups support such law, since it restricts those nasty-wasty scary pants-wet-inducing unions. Yet those same groups never come out in favor of banning "non-compete" or "conflict of interest" clauses in employment contracts. On the other hand, "pro-labor" groups (which always oppose "right-to-work") also oppose non-compete and conflict of interest clauses. If exclusivity must be prohibited in any business agreement (labor contracts are just sales contracts, after all), then it must be prohibited in all business agreements. But I'm trying to apply logic to politics, which only serves to make the stupid angry.

Why Average Can Be Silly

Here's a very simple example. Suppose you have a conference room with five graduate students and one well-established full professor who has discovered some lucrative inventions and gets patent royalties. The graduate students get $24,000 per year for their work as teachers and laboratory assistants. The faculty member gets $120,000 a year from his salary and those patents he has a piece of. The "average" annual pay for the room is $40,000 per year. Some political idiot can then come along and say that $40,000 a year is "typical" for that room. It's obvious that it isn't. There is a big difference between $24,000 and $40,000.

What number would give us a better idea of the "typical" income? That would be the "median", the "number in the middle": half of the group makes no more than that, and half makes no less. In our example, the median would be $24,000. Remember, we're not talking about the full range, but about the one number most likely to represent a given individual in that room.
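If you'd rather see it than take my word for it, a few lines of R settle the matter:

```r
salaries <- c(rep(24000, 5), 120000)  # five grad students, one professor
mean(salaries)    # 40000: the misleading "average"
median(salaries)  # 24000: the "typical" pay in the room
```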

So, for income, I will use state medians instead of state "averages". If you bother to look up the matter, you will find out that state medians in the USA are always lower than state "averages". This is because of how top-heavy the US income distribution is. This is also a good indication that "average" is a particularly stupid way to express "typical" income in the USA. If this were a non-issue, we would not see an invariant relationship between "average" and median.

Before I forget, here is the data I used to generate the histograms, nicely summarized.

More Gallup Misrepresentation. This time, obesity.

Yet another Gallup survey is making the rounds lately. This one is about obesity by state. This time, instead of quintiles, the accompanying map is, as you can see, split into arbitrary cut-offs of obesity rate. If you want, you can go over there and look at their map, or look at the first map below, which reproduces the cut-offs in approximately the same colors. (If you mouse over the map, you'll get more information by state.)

[Maps: Obesity by State in (1) arbitrary categories, (2) "squashed" range, (3) full range; darker is more.]

This map is an excellent example of how data presentation choices can be fraudulent without being fraudulent, how to lie without lying. Honest people use quintiles, quartiles, percentiles, and other such non-parametric numbers to represent either data that has a long, uneven, strung-out range (like achievement test scores), or to group a different set of data to show how it is distributed (like wealth per quintile). It just so happens that you can look at the obesity percentages for yourself. Notice that the data is not strung-out and scattered. In fact, it is very densely packed. It also is not linked to some other unevenly-distributed data.

The obesity rates are actually very densely packed. On a scale that could run from 0% obese to 100% obese, the lowest state's rate is 19% and the highest is 35.2%. Look at that first map again. Is a difference of about 16 percentage points worth that much of a visual difference?

How else to represent the difference so people can get an idea of reality instead of a visual lie that technically represents the actual numbers? The second, or "squashed scale", map does that. The "worst color" (dark gray) is matched to the highest rate, 35.2%. The "best color" is matched to 19%. The range in between is then evenly filled in among the three color points. Look different? It does. Yes, there is some rough correspondence between the misleading map that comes from Gallup and the (somewhat) more truthful map I created.

But I'm not finished. You see a third map. This is a map where the "worst color" corresponds to 100% obesity in the population (which is the worst possible condition) and the "best color" corresponds to no obesity. Thus, changes in color correspond to linear differences along the full possible range. Having a hard time telling the states apart? That is because the differences among them in this index really are quite small--total range is 16.2 percentage points out of 100. This map shows you what that looks like.
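For anyone who wants to recreate the two scalings, here is a small R sketch; `rates` stands in for the vector of state obesity percentages:

```r
# A light-to-dark ramp with 101 steps (one per whole percentage point).
pal <- colorRampPalette(c("gray95", "gray25"))(101)

# "Squashed" scale: stretch the observed min..max over the whole ramp.
squashed <- pal[1 + round(100 * (rates - min(rates)) / diff(range(rates)))]

# Full scale: map 0%..100% obesity linearly onto the ramp. With real rates
# all between 19% and 35.2%, these colors come out nearly indistinguishable.
full <- pal[1 + round(rates)]
```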

So, why does Gallup do this, and why do people so stupidly and eagerly swallow such dishonest representation of data? First, explaining Gallup. I don't work there, so this is speculation, but Gallup makes its money off turmoil. Anything they publish that will stir the pot will inspire more surveys that they can sell. Likewise, presenting things in extreme (and dishonest) ways ensures that there will be more arguments, leading to more survey commissions, leading to more dishonest data presentation, leading to more arguments. It's a lucrative circle for Gallup.

But why do people so eagerly devour such steaming dog turds of quasi-information? First, they're simple. People like very stark, very simple things to natter on about with each other. People do not like complex and shaded descriptions. They want things to be very neatly pigeonholed, and this comforts them. In addition, people with agendas want things presented as rigidly and extremely as possible to the public, all the better to sound the panic alarm and drum the masses into obedience. Finally, we are taught that only rigid and extreme answers can be "true". We are indoctrinated to be mental weaklings, to always see the world as "good" and "evil" with nothing in between. We are taught that someone who is able to see gradual differences is a "fence-sitter" or "spineless". We are told that only extremism is good--although it's only actually extremism when it's someone you don't like doing it.