March 6, 2017

What is behind the frequency of hate groups in the USA?

Up front, I am going to specifically invite comments on this entry. I have actually been thinking of this matter for several years, ever since I first discovered the annual SPLC list of hate groups. I am sure that my essay could be improved and would love to do so.

This is a very large blog entry. It's because there is a lot to be said on this particular issue, and there is no ethical or moral way to boil it down to a few talking points. Recent events have brought the prevalence of "organized hate" back into the larger public eye. This, of course, shouldn't be taken to mean that this sort of thing suddenly sprang up out of nowhere. Our most recent presidential election did not create hate groups or mass hatred. It goes back a long way. The Southern Poverty Law Center publishes an annual accounting of how many hate groups each state has. I have compiled the data from 2007 to 2015 and present it below as they and many press outlets who mentioned the report present it.

SPLC Reported Hate Groups by State
[Interactive choropleth maps, one per year, 2007 through 2015]

The more intense the color, the more hate groups per state. Hovering your mouse or pointer over a state should get more info to pop up for you. California does look like it has a lot of hate groups, doesn't it? But notice that Texas and Florida are also relatively dark. Whenever you see a recent map that shows anything "by state", and California, Texas, and Florida (often New York, too) are the most intensely-colored states, that should be taken as a warning. Why? Things change when you take population into account. A raw count of "hate groups" by state is mostly a crude measure of state population.

How does population change the picture?

The SPLC gives far too little attention to effects of population sizes. The first thing we must remember is that, when you compare states, countries, cities, or any places that people live, population size should never be dismissed with a single sentence. Population size makes a big difference. We see that when we look at state populations vs. number of hate groups by state for 2007-2015. All other factors being equal, the bigger a state's population, the more of any human activity it should be expected to have. Any report that ignores or downplays this fundamental fact grossly misrepresents reality. When we look at hate groups per million people by state, we can see that the states still differ among each other. Something also worth noticing is that these by-state rates of hate groups to population change over time. In some states, they increase; in others, they decrease.
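
The per-capita adjustment itself is simple arithmetic. Here is a minimal sketch in Python, using made-up counts and populations (not the actual SPLC or Census figures):

```python
# Toy numbers for illustration only -- not the real SPLC counts or Census populations.
raw_counts = {"California": 60, "Texas": 55, "Vermont": 2}
pop_millions = {"California": 38.0, "Texas": 26.0, "Vermont": 0.63}

# Groups per million residents: the rate, not the raw count.
per_million = {s: raw_counts[s] / pop_millions[s] for s in raw_counts}

# Ranking by raw count and ranking by rate can disagree sharply.
by_count = max(raw_counts, key=raw_counts.get)
by_rate = max(per_million, key=per_million.get)
```

With these toy numbers, California leads by raw count while tiny Vermont leads by rate, which is exactly why the maps change once population is taken into account.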

SPLC Reported Hate Groups per Million People by State
[Interactive choropleth maps, one per year, 2007 through 2015]

That doesn't mean that we can just say "case closed" and put down any differences in hate groups by state to population. After all, while the apparent differences change, they aren't erased by taking population into account. What could explain the inter-state differences? The SPLC prefers to attribute variation of organized hate in the USA almost entirely to politics. This probably fails to tell the whole story, although it is still probably an important factor. Demographic (race, education, age) and economic (employment, income) issues can weigh more heavily than any political group wants us to believe. But how important are economics, education, demographics, etc., in this? To answer this, I took the SPLC data from 2007-2015 and compared it to economic, educational, age, and race/ethnicity trends for those years from the US Census American Fact Finder. I didn't use numbers for 2016 because the Census office won't release those until nearly the end of 2017.

I don't believe there are no political elements. It's just that a great deal of what we call "politics" is actually a result of larger demographic and economic factors. Politicians spin these factors as if it were the politicians who magically control all these factors.

I selected elements of race/ethnicity, age, education, income, and employment. I could have added more, such as religiosity, but annual state-by-state estimates of religiosity were not available to me, and I wished to do a time course analysis, not just a single-year snapshot.

Making a "model".

I am going to use the word "model" a lot, but what do I mean? In this case, "model" means an equation that estimates an outcome (number of hate groups per population) based on some input data. Even restricting the choices, there is a confusing variety of possible ways to describe age, race (and ethno-racial diversity), income, education, etc. Many of these descriptions overlap each other to some extent or another. Too many variables could lead to models that "overfit" the data. The model might look good, but it has so much input that it describes the random noise more than it does the issue in question.

So, how to put the variables together? There are an enormous number of ways to explore this question. I decided to use a method called "model averaging", where a set of "better" models are selected from a large number of potential models and combined to produce an overall estimate. I also presume that the numbers generated by the model indicate how important each input is toward the estimates. The model type I used was a generalized linear mixed model (or multi-level model). This kind of model accounts for "fixed" or "universal" effects, which in our case would apply to all the states, and for "random" or "specific" effects, which would be unique to each state. So, in our model, we could call the "fixed" effects "shared outcome" and the "random" effects "particular outcome". I chose "outcome" because it would mean that, if two states had the same, for example, employment level, a "shared outcome" effect would predict the same outcome in each state from employment.

I started many models that included variables for age, race/ethnicity, education, poverty, income, and employment. The details of selecting the final set and averaging them come later in this article. What I ended up with was a model that was based on frequencies of non-white and Hispanic or Latino Americans, frequency of 4-year or higher degrees, employment statistics, median household incomes, and poverty rates. I will explain how I ended up using these effects and not using others. What I want to show right now is how strong each shared effect of the final model turned out to be. I will start the pictures:

The Shared Effects

Strengths of Shared Effects on Hate Groups per Million People
[Charts, one per effect: Percent Hispanic (any), Asian, or Mixed-Race*; 4-Year or Higher Degree*; Median Income ($1000s)*; Average Hours Worked per Week†; Percent of State Population non-Hispanic Black*†; Poverty Rate; Employment Rate per Total Population]
*Main effect is "significant" by two-tailed 95% confidence interval as determined by nonparametric stratified bootstrap.
†Interaction of effect and year is significant by two-tailed 95% confidence interval as determined by nonparametric stratified bootstrap.

The graphs have multiple lines. This is because the strength of each shared effect might differ by year, so each year is represented by its own line. What do the charts mean? The more people of Hispanic (any), Asian, or Mixed-Race descent made up a state population, the less common hate groups were, and this did not change much over time. It was the most consistent strong effect. Having at least a 4-year degree was significantly associated with reduced frequency of hate groups, but this effect got weaker over time. The weakening was not significant, though. Median income had a significant effect as well, but a surprising one. In 2007, it was positively associated with hate group frequency, but this changed over time. By 2015, the higher a state's median income, the less common hate groups would be. This change over time was not significant. Average hours worked per week did not have a significant effect by itself, but when considered along with year, the longer a state's average work-week, the more common hate groups became over time. This combined effect was significant. There was a significant association between the percent of a state's population identifying as black or African-American (not Hispanic or Latino) and hate group frequency, but it was not very strong and it grew weaker with time. Poverty rate and employment per total population had no significant associations with frequency of hate groups.

You probably notice no effect for age in the final model. There are statistical reasons for this, but it is also backed up by other research. We like to assume that "old" = "racist" and "young" = "tolerant". However, recent research has shown that age is not a factor in racism. The biggest surprise for me, though, was the relationship between median income or hours worked and hate group frequency.

Some of the effects are marked as "significant". Notice, though, how a "significant" effect can be large or tiny. How can that be? It is because "significant" in statistics does not mean "important". It only means that the distribution of the data falls within certain parameters when defined by the effects of a model. That is, a "significant" effect is "tight". A "non-significant" effect has a lot of wobble. The first statistician to use the term "significant" ultimately expressed regret over how it became a gold standard for data analysis. Always be cautious when a data analysis is called "significant" without any presentation of effect sizes. For example, if some event might "significantly" increase crime, but the increase is from 1001 events per year to 1002, it's not really an important event, and resources are probably better used elsewhere. I was a good boy, though, and showed the effect sizes.
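
A quick sketch of the "tight but tiny" situation, with invented numbers: a minuscule average effect, measured very precisely, still produces a confidence interval that excludes zero.

```python
import math

# Invented numbers: a tiny effect measured with enormous precision.
effect = 0.001   # e.g. one extra event per thousand per year
sd = 0.05        # spread of the measurements
n = 100_000      # a very large sample

# Normal-approximation two-tailed 95% confidence interval for the mean effect.
se = sd / math.sqrt(n)
low, high = effect - 1.96 * se, effect + 1.96 * se

# "Significant" here just means the interval excludes zero -- the effect is "tight".
significant = low > 0 or high < 0
```

The effect is "significant", yet its entire confidence interval sits below 0.01. Significance tells you about tightness, not about size.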

There is one more point to be made about the shared effects. You have probably already noticed it. There seems to be a "corrosive" effect of time. As each year passed, effects that contributed to hate group frequency grew stronger, while effects that reduced hate group frequency grew weaker, except for the weak effect of frequency of African-American population. Something has been going on in the USA for several years, independent of the economic and demographic trends. I am certainly not the first to notice this, but this analysis lays it out rather starkly.

Individual state effects

The type of model I constructed also estimated individual effects for each state in addition to shared national effects. How strong were they? According to a statistic known as "R2", which is based on the model, the shared outcome effects "explain" about 30% of the variation in hate group frequency. If we include some state-specific effect, this jumps to about 83%. In the social sciences, an R2 of 0.298 (about 30%) is not bad. An R2 of 0.828 (about 83%) is astonishing. This tells us that some sort of state-specific effect is very important to understanding what could be going on, and the specific effects are probably stronger than the shared outcome effects.
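
One common way to get those two R2 figures from a mixed model is to split the variance into fixed-effect, random-effect, and residual components (the Nakagawa-Schielzeth approach). A sketch, with variance components invented purely to reproduce roughly the 30%/83% split reported above:

```python
def r2_marginal(var_fixed, var_random, var_resid):
    # Variance explained by the shared (fixed) effects alone.
    return var_fixed / (var_fixed + var_random + var_resid)

def r2_conditional(var_fixed, var_random, var_resid):
    # Variance explained by shared AND state-specific (random) effects together.
    return (var_fixed + var_random) / (var_fixed + var_random + var_resid)

# Invented variance components chosen to illustrate the 0.30 / 0.83 split.
marginal = r2_marginal(0.30, 0.53, 0.17)
conditional = r2_conditional(0.30, 0.53, 0.17)
```

The jump from marginal to conditional R2 is exactly the contribution of the state-specific effects.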

Individual state effects per population for 2007 to 2015. All maps to same color scale
[Interactive choropleth maps, one per year, 2007 through 2015]

In these maps, the more intensely red (through black) a state is colored, the greater its individual hate group frequency is after removing nationally shared effects. The more intensely blue a state is colored, the lower its individual hate group frequency is after removing nationally shared effects. None of the individual states are marked as "significant", because there is not a large enough sample size to validly estimate confidence intervals for 50 states. I do have the data for Washington, DC, but it just isn't visible on the maps.

Individual state effects can be pretty sizable, altering national effects by up to +6/-3 hate groups per million people beyond an estimated "national average" for a state. Second, there are some interesting similarities and trends to look at. Texas and California, two states usually presumed to be cultural opposites, seem to be converging over time if we look at the state-specific effects. It is how the two states differ on national effects that determines the overall different outcomes. The states of the Ohio River Valley, although usually conservative or "swing", tend to have individual effects that push hate group frequencies below what the shared effects alone would predict. Montana, Idaho, Mississippi, Arkansas, and New Jersey seem to consistently do more poorly than the rest of the country when it comes to individual effects, and South Carolina gets to be very bad as time goes along.

Did anything change?

Changes in individual state effects from 2007 to 2015

Another question I wanted to ask, and you're probably also interested, is whether individual state effects changed over time, and if they did, how did they change? I could have just subtracted the 2007 effects from the 2015 effects, but that would ignore the possibility that either 2007 or 2015 could be hiccups in overall trends. Instead, I used a technique called "robust regression" to estimate the overall rate of change of effect per year, if that rate had been constant. If you look at the map to the right, you will immediately see that North Carolina sticks out like a sore thumb. This is because the individual effects for North Carolina rose far faster than for any other state over the 2007-2015 period. This means that the frequency of hate groups in North Carolina, independent of national effects, got worse far faster than in any other state.
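
The post doesn't say which robust regression method was used; the Theil-Sen estimator (the median of all pairwise slopes) is one standard choice that shrugs off a single "hiccup" year, sketched here with an invented series:

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(years, values):
    # Median of all pairwise slopes: one outlier year barely moves the result,
    # unlike ordinary least squares.
    pairs = combinations(zip(years, values), 2)
    slopes = [(v2 - v1) / (y2 - y1) for (y1, v1), (y2, v2) in pairs]
    return median(slopes)

# Invented series: a steady rise of 0.5 per year, with one bad (outlier) year.
years = list(range(2007, 2016))
values = [0.5 * i for i in range(9)]
values[4] += 10.0  # a "hiccup" in 2011

slope = theil_sen_slope(years, values)
```

Despite the 2011 spike, the estimated slope comes back as the underlying 0.5 per year, which is the point of preferring a robust estimate over a simple 2015-minus-2007 subtraction.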

What states improved in terms of individual effects? The District of Columbia led, followed by Idaho, New York, Vermont, New Hampshire, and Pennsylvania. Of course, "improved" does not necessarily mean "best outcome". Idaho consistently remained a state with a higher than average frequency of hate groups, but it did improve beyond the changes in national factors. Other states appear to have deteriorated over time. South Carolina seems to have done the worst, but noteworthy deterioration also occurred in Nevada, Alabama, Missouri, Mississippi, and Alaska. I could mention a lot of other states as well, but I hope the map is plain enough for you to draw your own conclusions.

And so?

And so what? What does all this mean? I can speculate, but I'm not a sociologist, anthropologist, or social psychologist. What I can say is that the largest consistent effect is that the more common people of Hispanic/Latino, Asian, or Mixed-Race ancestry are, the less likely the model predicts hate groups to be. After this, better education seems to reduce hate groups, but this effect may be getting weaker as the years go by. I simply don't know how to interpret the income and work week results, but I invite input from someone who knows more than I do about how these things relate to social trends in the USA. It is interesting to see that both of these trends seem to reverse over time. Early on, hate groups were more common in states with higher median incomes, but this reversed. Likewise, early on, hate groups were more common in states with longer work-weeks, but this reversed. This might reflect significant and important national changes in the economy. States with more hate groups per population may have become less prosperous (lower median income, shorter work-weeks) over the 2007-2015 period than states with a lower hate group load. However, this is just speculation. More in-depth analysis would be necessary to pin this down.

Anyway, as you can see, the matter is fairly complex and doesn't lend itself to suggesting easy solutions. Do the effects of race and ethnicity show a cause or a result? Does the association between greater education and fewer hate groups mean that higher education limits hate groups, or does it mean that states with lower frequency of hate groups tend to prefer more education in the first place? I don't know enough to say, either way, but I hope that my summary will be useful to someone.

Statistical Methods

This is the nerd section. Yes, this is the nerd section. I include it so people will know that I really did do the work and didn't just make stuff up. To follow this, at a minimum, you will need a basic understanding of the statistical concepts of linear regression, correlation, and robust methods. This is going to be very brief, because I am not getting paid to write any of this. If you want more details of my analysis, you are free to contact me or leave a comment and I will get back to you.

I began with SPLC and census bureau data for the 50 states and Washington, DC.

As I mentioned earlier, there were an enormous number of variables I could have modeled. In particular, effects of age and race/ethnicity would be tricky because there are many potential ways to slice those pies. In an attempt to avoid overfitting and try to keep the model simpler, I decided to use a single variable to represent age.

[Chart: age categories used, including under 18, 18-23, and over 75]

For age, I looked for a way to summarize ages by state. I tried both a combination of central tendency (median) and skewness, and an attempt to find specific age categories by principal component analysis. Parallel analysis of the ages by year indicated that seven components were necessary to summarize aging trends. It should be noted that I did not use conventional age categories. One major problem with the conventions is that they are not based on recent research. Instead, they have been continued from earlier decades on the basis of convenience. For example, what biological, neurological, or cognitive basis is there for considering the age of 65 as an immutable cut-off? None. It merely corresponds to an administrative division. In the end, it was all mooted for age, since all age-related effects had excess multicollinearity in the models.

What is multicollinearity? When you have a model with several effects, there is a chance that one or more of those effects "track" each other very closely. If they do it very closely, it is called "collinearity". When you have collinearity, one effect can pretty much stand in for the other. One way to measure multicollinearity is by the variance inflation factor (VIF). It just happened that the VIFs for all the age variables I tried were too high. Thus, whatever effect age might have was already present in the other effects. So age effects were deleted from the models.
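
For the special case of exactly two predictors, the VIF reduces to 1/(1 - r²), where r is the correlation between them. A minimal sketch with invented data:

```python
import math

def vif_two_predictors(x, y):
    # For exactly two predictors, VIF = 1 / (1 - r^2), r = Pearson correlation.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    r = cov / (sx * sy)
    return 1.0 / (1.0 - r ** 2)

# Invented predictors: one nearly tracks the other (collinear), one does not.
x = [1, 2, 3, 4, 5]
collinear = [2.1, 3.9, 6.2, 7.8, 10.1]
unrelated = [3, 1, 4, 1, 5]
```

A VIF far above the usual rule-of-thumb thresholds (often 5 or 10, conventions vary) flags a variable whose information is already carried by the others, which is what happened with every age variable here.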

What about race or ethnicity? I compared using a single number based on the exponentiation of "Shannon entropy" of races and ethnicities as recorded by the US Census. This is a common way to measure diversity in ecology. I also did a factor analysis of the racial/ethnic groups to see if they tended to naturally group together. These two methods were compared in later model building stages. In the end, I found that Black/African-American (non-Hispanic) and a factor that was the sum of Hispanic or Latino (any race), Asian, and Mixed Race worked "best" in the model.
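
The "exponentiation of Shannon entropy" has a handy interpretation: it is the effective number of equally common groups (a Hill number, in ecology jargon). A short sketch:

```python
import math

def effective_diversity(proportions):
    # exp(Shannon entropy): the "effective number" of equally common groups.
    # Four groups at 25% each -> 4.0; one group at 100% -> 1.0.
    h = -sum(p * math.log(p) for p in proportions if p > 0)
    return math.exp(h)

four_even_groups = effective_diversity([0.25, 0.25, 0.25, 0.25])
one_group = effective_diversity([1.0])
```

A state whose population is split evenly among four racial/ethnic categories scores 4.0; a perfectly homogeneous state scores 1.0, regardless of how the categories are labeled.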

That left several other factors to measure economic and education effects. The ones that survived testing were "Poverty Rate", "Median Income", frequency of a 4-year or higher degree, and "Employment" (total civilian and military employment divided by state population). I intentionally included Year as another effect. How did I choose these from competing effects that I didn't use? I built sub-models around each potential effect group (economic, education, employment) and compared the models using something called the "second-order Akaike Information Criterion". These are the variables that ended up being "better" than the ones that were excluded. I combined all the variables into a "global" model.
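
The second-order Akaike Information Criterion (AICc) is ordinary AIC plus a correction term that matters when the sample is small relative to the number of parameters k:

```python
def aicc(log_likelihood, k, n):
    # AICc = AIC + small-sample correction; converges to plain AIC as n grows.
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)
```

Lower is better. With the same fit, a model with more parameters (larger k) or a smaller sample (smaller n) is penalized harder, which is the guard against overfitting mentioned earlier.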

Since I was essentially testing a repeated-measures model, the global model was a hierarchical generalized linear Poisson model (aka "mixed model"), with the fixed (common) factors of Black/African-American, Hispanic + Asian + Mixed Race, Poverty Rate, Median Income, 4-Year or Higher Degree, Employment, and Year, and two-way interactions of Year with each other fixed factor, plus an offset by the log of population in millions. For random factors, I used intercept and noncorrelated slopes for each individual fixed factor (no interactions), each by state.

This model was then used to generate nested models, which eliminated one or more of the main effects (both fixed and random) and associated effect/year interaction. Models were then evaluated with the second-order Akaike information criterion. An "elbow" method was used to select a subset to use for model averaging.

Six models were selected by this method, and a weighted average of the models was constructed. I used the weighted average to calculate the sizes of the fixed ("shared") and random ("state specific") effects. Confidence intervals for fixed effects were determined by stratified bootstrap, stratifying on state. Choropleth maps were all created using GoogleViz, then edited by hand for color scheme choices. Chart code was written by hand according to the Google visualization protocol. As I have mentioned, if you would like more explicit detail on my modeling, I would be happy to share it; I just don't have the time to formally write it up at the moment.
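
The weighted average works roughly like this: each model's AICc is turned into an Akaike weight, and the averaged coefficient is the weight-weighted sum of the per-model coefficients. A sketch with invented scores and coefficients for six candidate models:

```python
import math

def akaike_weights(aicc_scores):
    # w_i is proportional to exp(-delta_i / 2), where delta_i = AICc_i - best AICc.
    best = min(aicc_scores)
    rel = [math.exp(-(a - best) / 2.0) for a in aicc_scores]
    total = sum(rel)
    return [r / total for r in rel]

# Invented AICc scores for six models, and one coefficient estimate from each.
scores = [100.0, 101.5, 102.0, 104.0, 106.0, 110.0]
coefs = [-0.40, -0.38, -0.45, -0.35, -0.50, -0.30]

weights = akaike_weights(scores)
averaged_coef = sum(c * w for c, w in zip(coefs, weights))
```

The best-scoring model dominates but does not dictate the answer; models a few AICc points behind still contribute, which is the point of averaging rather than picking a single "winner".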
