October 10, 2017

Comparing Indianapolis Neighborhoods

(There is a map and a chart that may take a little time to load. If you see blank spots, don't panic, just wait. If they don't fill in, try reloading the page.)

Some time ago, I wrote about the data available from the Polis Center at IUPUI, SAVI. The SAVI data is the input for Indy Vitals. You can get a lot of information from Indy Vitals to compare the neighborhoods. You can get so much that it's probably impossible to actually conclude anything. It becomes a blur. Some time ago, I did an analysis of a part of the IndyVitals data. Since then, IndyVitals has been updated, so I decided to take a look at the newer data and a larger chunk of it. Like before, I cleaned, analyzed, and clustered the data into similar neighborhoods.

Clustering Results of Indy Vitals Data

On the left is the neighborhood map, color-coded by cluster. You should be able to click the map to get a slide-over with more specific information about a neighborhood. Warning: the slide-over will cover most of the map. I'm not a master of iframes. I did not outline the neighborhood borders, because I wanted to emphasize the great "sea of nothing much" in which various islands of more extreme situations float. Several neighborhoods are labeled. If you zoom in, more labels will appear. Most of Marion County is "middling", which is what we would expect. A partial ring of unpleasantness surrounds the Downtown neighborhood, and the most comfortable neighborhoods are in the north. None of this is a surprise to anyone who is familiar with Marion County. Two big islands associated with more generally comfortable conditions are in the southwest and southeast corners, and there are two very unfortunate-looking neighborhoods.

If you click any of the neighborhoods to get the pop-up, there are several numbers associated with it. These are what I used to create the clusters. I will get into how I created those numbers in a moment. Briefly, though, the numbers were compared as groups per neighborhood to generate "distances" among all the neighborhoods, and those distances were examined for where neighborhoods would "bunch up" as "clusters". My clustering method (more below) also calculated the "prototype" of each cluster: the neighborhood that most resembles the cluster as a whole. I named each cluster for its prototype, using initials to keep the names shorter. One more thing: the clusters are not the same size. The clusters at the "ends" are smaller than those in the middle. This is what you should expect from an "honest" clustering of any complex data. Extreme values should be rare. Never, never, never trust any kind of culture- or social-related analysis that splits cities, states, neighborhoods, or anything else into nicely-spaced, evenly-sized groups.

[Tree diagram relating the clusters: IM, MB, CH, KC, SD, NC, BR]

I came up with seven total clusters that could be arranged on a gradient of "pleasantness". The clusters can themselves be related to each other; the tree diagram on the right illustrates this. The worst cluster (IM) is off by itself, away from the rest of the county. The two most favored clusters (BR and NC) are also pretty isolated from the rest of the county. Remember, when I say "isolated", I don't mean physically distant; I mean that the neighborhood traits are pretty far from the rest of the county, regardless of location. The rest of Marion County clumps together, but there are still enough differences to split it into four more clusters. If you're interested in how I got my results and how individual neighborhoods might "stack up" without having to go through every one on the map, read on.

The Data

This time around, I got the data for 45 different traits for 99 different neighborhoods in Marion County. I looked at it in one table, and it still meant nothing at all to me. That's 4,455 points of data all at once, which is usually not very meaningful. There were a lot of things I could have done. The most popular (unfortunately), when one compares cities or neighborhoods, is to turn each data variable into a "rank", then add the ranks to produce a final "score" or "rank". I get why people do it. It's simple. It looks like you are "analyzing" the data. But when you do that, you are not analyzing the data. You're just smashing it together. The "rank and add" method ignores a lot of important things.

To interrupt myself, many data sets also need to be "cleaned" before being analyzed. This sometimes means dropping an entry because it is too incomplete. I dropped the "Airport" and "Park 100" neighborhoods from the analysis because of missing data. Sometimes, you might be able to get a substitute for missing data from another source. I used a real estate site to get property and violent crimes per 1000 residents for the two Lawrence neighborhoods, Speedway, and Beech Grove. I give the specific site further down, if you want to check for yourself.

Anyway, back to what is wrong with rank-and-add. First, it ignores that data can be uneven. When you compare ranks, you have thrown away how far apart the things you're comparing are. For example, suppose you have "average niceness". Seven towns have "average niceness" values of 1, 4, 17, 92, 93, 94, and 1001. If you rank them, lowest to highest, you get 1, 2, 3, 4, 5, 6, 7. You don't have to be a math genius to see the problem. There is no way that the distance between 93 and 94 is the same as the distance between 1 and 4, or between 94 and 1001. But when you use ranks, that's exactly what happens.
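A quick sketch makes the point concrete. The town values are the ones from the example above; plain Python, no libraries needed:

```python
# The seven "niceness" values from the example, ranked by hand.
niceness = [1, 4, 17, 92, 93, 94, 1001]
order = sorted(niceness)
ranks = [order.index(v) + 1 for v in niceness]   # no ties, so index lookup works

gap_93_94 = niceness[5] - niceness[4]     # 1
gap_94_1001 = niceness[6] - niceness[5]   # 907
# After ranking, both gaps collapse to exactly 1:
rank_gap_a = ranks[5] - ranks[4]          # 1
rank_gap_b = ranks[6] - ranks[5]          # 1
```

A 907-point gap and a 1-point gap become identical once you rank, which is exactly the information rank-and-add throws away.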

Second, it ignores that data can be redundant. The fact that you can count or measure five different things doesn't mean that each of those five different measurements make an equal contribution to an accurate overall picture. Some measurements will closely track others, because they both reflect a deeper underlying connection. In effect, if you just add the contributions of very closely-tracking variables, you're actually "double-counting" the single underlying effect.
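A tiny simulation (with made-up numbers, not the neighborhood data) shows the double-counting at work: two "different" variables that both track one underlying trait are nearly perfectly correlated, so summing them weights that trait twice:

```python
import math
import random

random.seed(1)
n = 200
underlying = [random.gauss(0, 1) for _ in range(n)]
# Two "different" measurements that mostly reflect the same trait:
var_a = [u + random.gauss(0, 0.05) for u in underlying]
var_b = [u + random.gauss(0, 0.05) for u in underlying]

def pearson(x, y):
    """Plain Pearson correlation, written out by hand."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(var_a, var_b)   # very close to 1.0
# Summing var_a + var_b into a composite "score" gives the single
# underlying trait double the weight of any genuinely separate variable.
```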

The technique of “exploratory factor analysis” (EFA) can handle these issues, if used correctly. What EFA does is look for how parts of a data set are related to each other and groups those parts together. This can be important because as a set of data gets larger, it is more likely that more and more categories will relate together or "co-relate". Oddly enough, when data co-relates, it's called "correlation"--really, that's all it means. This could be because there is some hidden “factor” that these data points describe. EFA allows for these factors to be guessed at in a reasonable fashion.

But, just to keep myself honest, correlation could mean nothing at all! How? The distance between North America and Europe and my waistline track each other very closely. Both increase by a small amount every year. That doesn't mean that one causes the other, or that either is caused by some shared underlying factor.

Clusters vs. Factors

This time, I ended up with five factors instead of three. I clustered the factors, but I used a different method that is less prone to making artificially even-sized clusters. When I looked at the clusters vs. the factors, I noticed that they actually could allow for the neighborhoods to be ranked into seven categories of what most people would consider desirability. The chart summarizes how the clusters relate to the factors. You will understand exactly how I named the factors if you keep reading, but in a nutshell, Comfort is how comfortable a neighborhood appears. Difficulty is how common certain other difficult or unpleasant individual life conditions are in that neighborhood. Deterioration is the physical state of the neighborhood's buildings. Crime is crime reports per population in that neighborhood. Density is the population and building density and some measure of how convenient daily necessities are.

As you can see for yourself, Crime is the standout factor for cluster IM. Cluster MB has high Difficulty and very high Deterioration. Cluster CH has less Deterioration but nearly as much Difficulty as cluster MB. Cluster KC is middling; it doesn't have much of any of the factors. Cluster SD is somewhat improved on KC. It's not particularly comfortable, but at least it has lower Difficulty and less Deterioration and Crime. It's also the least dense of the clusters. Cluster NC is very comfortable. However, it is still beat out by cluster BR, primarily because cluster BR also has the lowest Difficulty. It also has the highest Density, which includes nearby availability of foodstuffs. Where did the clusters come from, and why seven? That is explained in the next section.

The Method, the Madness

This is where I explain how I got my numbers. The first thing that must be said is that these numbers only matter within Marion County. They were generated only using the IndyVitals data, so they can't be used to compare the neighborhoods to anything in Hamilton County, for example. I would have to find a comparable Hamilton County dataset and repeat the analysis with both datasets combined to create a two-county model.

I downloaded the data from IndyVitals. There was a lot more this time than before. A few categories had missing values that could be reasonably imputed or otherwise accounted for. By "otherwise accounted for", I mean deleting the entries for Airport and Park 100. I consider this acceptable for my purpose because those "neighborhoods" are far more industrial districts than neighborhoods. After I did this, only two categories had missing values: "Violent Crime per 1000" and "Property Crime per 1000", which were missing for Speedway, Lawrence, and Lawrence-Fort Ben-Oaklandon. I took values from the Area Vibes web site. They are probably not as reliable as the IMPD numbers for the rest of the neighborhoods, but probably not too far off the mark. That source only had one number covering both of the Lawrence-based neighborhoods, so I repeated it for both.

This left me with a working data set of 97 neighborhoods and 45 variables (4,365 data points). Some of the variables were problematic. First, two variables were identical: Tax Delinquent Properties and Tax Sale Properties. Every single point matched, perfectly. I took this to mean that they were actually the same variable, so I deleted one of the two. Second, two variables had a lot of zero values: Parcels with Greenway Access (54 out of 97) and Demolition Orders (72 out of 97). I could have deleted these, but there are ways to handle variables with lots of zeroes.

My starting data set was 44 variables for 97 neighborhoods, with two variables needing special treatment. This special treatment was "jittering", where a very small value is added or subtracted at random from each value in a variable. This usually does not change the behavior of the variable but makes it possible to analyze by methods that can't handle large numbers of zeroes.
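A minimal sketch of what jittering looks like in code. The counts and the noise scale below are made up for illustration; the post doesn't state the actual magnitude used:

```python
import random

random.seed(42)

def jitter(values, scale=1e-4):
    """Add tiny uniform noise so a zero-heavy variable can be analyzed
    by methods that choke on many identical values."""
    return [v + random.uniform(-scale, scale) for v in values]

demolition_orders = [0, 0, 3, 0, 7, 0, 0, 1]   # invented counts
jittered = jitter(demolition_orders)
# The zeros are now all slightly different, but each value stays within
# 0.0001 of the original, so the variable's overall behavior is unchanged.
```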

As before, I used exploratory factor analysis (EFA) to try to make sense of the data. EFA is based on correlation, and a major assumption behind ordinary correlation is that the data are "normally distributed". I checked this data with a normality-testing utility; it was not normally distributed, so ordinary correlation would not give a realistic basis for analysis. So, as before, I ended up using a method called "Orthogonalized Gnanadesikan-Kettenring". For most people, that will mean nothing, of course, but anyone who wants to check my work will want to know it.
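For anyone wanting to try the normality check at home, here is a sketch using SciPy's Shapiro-Wilk test. The post doesn't name the actual utility used, so this is just one reasonable stand-in, run on invented zero-heavy data:

```python
import random
from scipy.stats import shapiro

random.seed(0)
# Invented zero-heavy variable, shaped like "Demolition Orders":
values = [0] * 70 + [random.randint(1, 40) for _ in range(27)]

stat, p = shapiro(values)
# A tiny p-value says the data are far from normal, so ordinary
# (Pearson) correlation is not a sound basis for the EFA.
not_normal = p < 0.05
```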

Input                                       Factor          Loading
Per Capita Income                           Comfort          0.901
Median Age                                  Comfort          0.854
Associate or Higher Degree                  Comfort          0.704
Median Assessed Value                       Comfort          0.684
Tree Cover                                  Comfort          0.652
Employment Density                          Comfort          0.608
Without Health Insurance                    Comfort         -0.550
Median Household Income                     Comfort          0.519
Poverty Rate                                Comfort         -0.482
Births with First Trimester Prenatal Care   Comfort          0.446
Labor Force Involvement                     Difficulty      -0.832
Population with Disability                  Difficulty       0.642
Housing Cost Burden                         Difficulty       0.516
Mowing Orders                               Deterioration    0.914
Boarding Orders                             Deterioration    0.910
Tax Delinquent Properties                   Deterioration    0.857
Trash Orders                                Deterioration    0.849
Surplus Properties                          Deterioration    0.772
Property Crimes per 1000                    Crime Risk       0.911
Violent Crimes per 1000                     Crime Risk       0.784
Resident Employment in Neighborhood         Crime Risk      -0.674
Housing Density                             Density          0.947
Income Density                              Density          0.891
Pop Density                                 Density          0.787
Land Value Density                          Density          0.697
Food Access                                 Density          0.653
Permeable Surface Area                      Density         -0.635
Walk Score                                  Density          0.562

Parallel analysis suggested 7 factors. When I looked at the factors, I noticed that some of the input variables had very low "loadings". A loading is a measure of how much a variable contributes to a factor. By itself, a single low loading is not a problem, but if a variable has low loadings on all the factors, its influence is spread thinly among them and it does not make a good contribution to the analysis. A common cut-off is an absolute value of 0.4. Therefore, if a variable had no loading with an absolute value of at least 0.4 and had a "communality" of less than 0.6, I deleted it from the EFA and repeated the process, starting from re-calculating the correlation matrix. I repeated this until every remaining variable had at least one loading with an absolute value of 0.4 or more, or a communality of at least 0.6. This produced an EFA outcome with five factors (the table).
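Parallel analysis itself is simple enough to sketch: compare the eigenvalues of your correlation matrix against those from random data of the same shape, and keep the factors that beat chance. The data below is a synthetic stand-in with five planted factors, not the IndyVitals table:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake 97x44 data with 5 latent factors standing in for the real table.
latent = rng.normal(size=(97, 5))
loadings = rng.normal(size=(5, 44))
X = latent @ loadings + 0.5 * rng.normal(size=(97, 44))

def parallel_analysis(X, n_sims=100, seed=1):
    """Retain components whose eigenvalues exceed the average
    eigenvalues of same-shaped random data (Horn's method)."""
    r = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        R = r.normal(size=(n, p))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
    return int(np.sum(obs > sims.mean(axis=0)))

n_factors = parallel_analysis(X)   # should recover roughly the 5 planted factors
```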

The table describes how strongly the factors relate to the variables. The numbers in the table are the "loadings", and I used them to guide how I named the factors. The first factor was a combination of better income and education, more trees, lower poverty, better prenatal care, etc. It made sense to call this factor Comfort, since places with such features are probably more comfortable places to live. The second factor combined a high proportion of disabled residents and housing cost burden with low labor force participation. It made sense to call this Difficulty, since people with those traits probably have a difficult time getting by. The third factor was all negative property-related variables, plus unemployment. Since it was mostly property traits that people wouldn't want in their neighborhoods, I called it "Deterioration". The "Crime Risk" factor corresponded to higher rates of property and violent crimes, along with low resident employment, so "Crime Risk" was a good name. The final factor amounted to overall "Density", since it was four "density" variables along with two measures that amounted to "lots of stores nearby".

I used the factors to produce factor scores and the factor scores to produce clusters. This time around, I used "minimax hierarchical clustering". It is not uncommon for factors in EFA to be correlated with each other; this is not a flaw in the EFA result. However, to do the clustering, I still had to estimate how "distant" each neighborhood was from every other in terms of factor scores. For this, I calculated pairwise "Mahalanobis distances". While somewhat tricky to calculate, Mahalanobis distances take these correlations into account to produce a more realistic description of the data. Then I did the cluster analysis on those distances. Like all hierarchical clustering methods, minimax clustering creates clusters but doesn't tell you the optimal number of them. This time, I computed clustering sums of squares for successive numbers of clusters and used the number that produced an "elbow". This turned out to be seven clusters.

How do the clusters relate to factor scores? Since I had five final factors, I couldn't really chart them all at once. However, looking at the factor scores vs. cluster assignments, three of the five appeared to make a larger contribution to the clustering than the other two. I built a rotatable chart that plots these three factors vs. cluster: "x" is Comfort, "y" is Difficulty, and "z" is Deterioration. If you click and drag on it, you can rotate the chart. Points are colored by cluster, using colors similar to those on the map. You will notice that IM is not nicely separated in the 3D chart. This is because it is set apart by Crime levels, which are the fourth factor.
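The distance-and-cluster step can be sketched as follows. Two caveats: the factor scores here are random placeholders, and SciPy has no minimax linkage (that lives in R's protoclust package), so ordinary complete linkage stands in for it:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
scores = rng.normal(size=(97, 5))   # stand-in for the real 97x5 factor scores

# Mahalanobis distances account for correlation among the factors.
VI = np.linalg.inv(np.cov(scores, rowvar=False))
dists = pdist(scores, metric="mahalanobis", VI=VI)

# Hierarchical clustering on the distance matrix, cut at 7 clusters.
tree = linkage(dists, method="complete")   # "complete" stands in for minimax
clusters = fcluster(tree, t=7, criterion="maxclust")
```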

But how do the neighborhoods RANK?

I am sure that some people have come all the way to the bottom to find the "ranks" of each individual neighborhood. This is flatly wrong-headed, and I already explained why. That being said, if you like, you can download a table of the neighborhoods that shows cluster and factor scores and create your own ranks.

August 30, 2017

Brief analysis of effect of right-to-work laws on per-person real income availability

This is mostly data and accompanying analysis. If you are a casual reader, I apologize for the complete lack of background or any real readability. It may be of use to people who are already familiar with the issue. I asked the question: "What is the effect of having right-to-work laws on a 'meaningful measure' of income?" First, a look at those states with right-to-work laws as of 2015:

Incidence of Right to Work laws, 2015
Blue: No RTW; Red: RTW

The first question to answer is what would constitute a "meaningful measure". I began with by-state median income. I chose the median instead of the mean because the median is a far more robust estimator of central tendency. Unfortunately, my limited access to data (American FactFinder) meant that I could only get state median incomes for "households" or "families". Since many households are non-family households, I chose median income by household. I also downloaded average household size by state. I then obtained the "Implicit Regional Price Deflator" (IRPD), by state, from the BEA. This combines differences in cost of living by state with inflation per year, and it can be used to give a state-adjusted, real-dollar estimate of income. It was only available for the years 2008-2015, which limited my analysis to those years. Finally, I downloaded total civilian full-time employment and total military employment, both by state, and divided their sum by the total population of the state for that year. I did not restrict this to the "workforce", since children have to be supported, too, even if they are not in the workforce. Each state's status as right-to-work or not was coded as an ordinal variable by year. The basic data set is available for you to check yourself.

Right-to-Work model coefficients
Factor                   Estimate    Bootstrap bounds
Right to Work†           -0.1504     +0.1052 / -0.0570 *
Right to Work × Year      0.0002     +0.0166 / -0.0103
* Factor is significant at p ≤ 0.05 by nonparametric bootstrap.
† Estimate corresponds to state having right-to-work law.

From these numbers, I created my "metric": (((Median Income)/(IRPD/100))/Average Household Size)×(Employment Percent). I call it "Effective Income per Person". I modeled this metric using generalized linear mixed models, with state as the grouping factor for random effects. Sum contrasts were used. The fixed portion was "Metric ~ RTW + Year + RTW×Year". For calculation purposes, year was divided by the standard deviation of all years in the data set. Different error structures were compared by second-order Akaike Information Criterion (AICc). The compared models used Gaussian, gamma, and inverse Gaussian distributions, with identity, inverse, and log link functions. Many of these did not converge. Of those that converged, the lowest AICc belonged to the model with a gamma distribution and log link. The next-nearest model had a gamma distribution and identity link. ΔAICc was greater than 6.9, indicating very strong evidence favoring the first model over all other models that converged. The model was evaluated by stratified non-parametric bootstrap, with "state" as the stratifying feature.
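The metric itself is easy to write out as a function; the example numbers below are invented for illustration, not taken from the data set:

```python
# "Effective Income per Person" as described in the text.
def effective_income_per_person(median_income, irpd, household_size,
                                employment_percent):
    """Deflate median household income by the state IRPD, spread it
    across the average household, and scale by employment share."""
    real_income = median_income / (irpd / 100.0)
    per_person = real_income / household_size
    return per_person * employment_percent

# Hypothetical state-year: $52,000 median income, IRPD of 104,
# 2.5 people per household, 45% of total population employed.
metric = effective_income_per_person(52_000, 104, 2.5, 0.45)   # about 9000
```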

Difference between RTW and non-RTW states

Since this had a log link, the estimate for "Right to Work" means that, on average, a right-to-work state could be expected to have a 15% lower effective income per person. I bootstrapped the estimated average effective income for RTW and non-RTW states for each year and subtracted the RTW average from the non-RTW average. Adjusted for multiple comparisons, the 95% confidence intervals show that the difference was significant for all years examined, as the chart shows. In addition, overall effective income per person dropped by roughly 1% every six months, regardless of right-to-work status. There was no significant interaction between right-to-work and year, meaning the difference due to right-to-work remained constant.
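For anyone checking the arithmetic, the "15% lower" figure comes from exponentiating the log-link coefficient (the exact value works out to roughly 14%, which the text rounds):

```python
import math

beta_rtw = -0.1504                    # "Right to Work" estimate from the table
multiplier = math.exp(beta_rtw)       # multiplicative effect under a log link
pct_change = (multiplier - 1) * 100   # about -14 percent
```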

I glossed over using a mixed (or multilevel) model to reach my results. I chose such a model for two reasons. First, this was repeated-measures data: the same states were "measured" each year, so we can presume that data from one year within a state will be correlated with data from other years in the same state. Second, as has been noted in other analyses of RTW laws, individual state effects may play large roles that could mask overall RTW effects. The mixed model allows one to account for both within-state correlations and individual state effects. What it does not let us do, with the data on hand, is actually identify those individual state effects. That is, we can estimate how large the effects are but not what they are. It's like measuring a hole without knowing what actually made it. You don't need to know how a hole was made to measure how wide and deep it is. I will present those "random effects" in a later post.

An alternate model

After getting snark from someone who believes that a "difference-in-differences" model magically establishes "causation" better than a mixed-level GLM does (free clue: neither type of model actually establishes causation), I ran the magical DID on my data. My results:

DID model coefficients
Factor               Estimate    Bootstrap bounds
Right to Work†       -956.61     +412.31 / -399.59 *
* Factor is significant at p ≤ 0.05 by nonparametric bootstrap.
† Estimate corresponds to state having right-to-work law.

Now, what does this mean? It will make more sense if you understand that "DID" is actually the same thing as an interaction between Right-to-Work and year. The only difference is that "Year" has been coded as a 0/1 variable instead of as specific years. The cutoff was 2012, the only year in which some states swapped from not having RTW to having it. While the values of the coefficients are different, the result is the same. The DID analysis indicates that, overall, non-RTW states had a higher per-person adjusted income and that imposing RTW did not significantly alter this.
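The recoding is simple enough to sketch. The rows below are invented stand-ins, not actual state data:

```python
# DID as an interaction: "post" flags the years from 2012 on, "group"
# flags states that ever adopted RTW, and their product is the DID term.
rows = [
    # (state, year, metric); state "A" adopted RTW in 2012, "B" never did
    ("A", 2010, 14.1), ("A", 2014, 13.2),
    ("B", 2010, 15.0), ("B", 2014, 14.8),
]
treated = {"A": 1, "B": 0}   # ever-adopters vs. never-adopters

design = []
for state, year, metric in rows:
    post = 1 if year >= 2012 else 0          # 0/1 period instead of the year
    design.append({"group": treated[state], "post": post,
                   "did": treated[state] * post, "metric": metric})
# Feeding "group", "post", and "did" to any regression routine gives the
# DID estimate as the coefficient on "did".
```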

So, what does that mean? It means that, using this metric, there is no net benefit to most people in a state from imposing RTW over not having it. Now, if one believes "the more regulation the better", then one would say "Okay, so impose RTW everywhere, since it doesn't make a difference." However, if one believes that more laws are not good in and of themselves, and that government interference in business practices (interfering in permitted terms of contracts is government interference) should only be done if there is a compelling benefit, then RTW fails to actually grant sufficient benefit.

March 6, 2017

Is there a relationship between population trends and the frequency of hate groups in the USA?

Up front, I am going to specifically invite comments on this entry. I have actually been thinking of this matter for several years, ever since I first discovered the annual SPLC list of hate groups. I am sure that my essay could be improved and would love to do so.

This is a very large blog entry. It's because there is a lot to be said on this particular issue, and there is no ethical or moral way to boil it down to a few talking points. Recent events have brought the prevalence of "organized hate" back into the larger public eye. This, of course, shouldn't be taken to mean that this sort of thing suddenly sprang up out of nowhere. Our most recent presidential election did not create hate groups or mass hatred. It goes back a long way. The Southern Poverty Law Center publishes an annual accounting of how many hate groups each state has. I have compiled the data from 2007 to 2015 and present it below as they and many press outlets who mentioned the report present it.

SPLC Reported Hate Groups by State
(one map per year, 2007 through 2015)

The more intense the color, the more hate groups per state. Hovering your mouse or pointer over a state should get more info to pop up for you. California does look like it has a lot of hate groups, doesn't it? But notice that Texas and Florida are also relatively dark. Whenever you see a recent map that shows anything "by state", and California, Texas, and Florida (often New York, too) are the most intensely-colored states, that should be taken as a warning. Why? Things change when you take population into account. A raw count of "hate groups" by state is mostly a crude measure of state population.

How does population change the picture?

The SPLC gives far too little attention to effects of population sizes. The first thing we must remember is that, when you compare states, countries, cities, or any places that people live, population size should never be dismissed with a single sentence. Population size makes a big difference. We see that when we look at state populations vs. number of hate groups by state for 2007-2015. All other factors being equal, the bigger a state's population, the more of any human activity it should be expected to have. Any report that ignores or downplays this fundamental fact grossly misrepresents reality. When we look at hate groups per million people by state, we can see that the states still differ among each other. Something also worth noticing is that these by-state rates of hate groups to population change over time. In some states, they increase, in others, they decrease.

SPLC Reported Hate Groups per Million People by State
(one map per year, 2007 through 2015)

That doesn't mean that we can just say "case closed" and put down any differences in hate groups by state to population. After all, while the apparent differences change, they aren't erased by taking population into account. What could explain the inter-state differences? The SPLC prefers to attribute variation of organized hate in the USA almost entirely to politics. This probably fails to tell the whole story, although it is still probably an important factor. Demographic (race, education, age) and economic (employment, income) issues can weigh more heavily than any political group wants us to believe. But how important are economics, education, demographics, etc., in this? To answer this, I took the SPLC data from 2007-2015 and compared it to economic, educational, age, and race/ethnicity trends for those years from the US Census American Fact Finder. I didn't use numbers for 2016 because the Census office won't release those until nearly the end of 2017.

I don't believe there are no political elements. It's just that a great deal of what we call "politics" is actually a result of larger demographic and economic factors. Politicians spin these factors as if it were the politicians who magically control them.

I selected elements of race/ethnicity, age, education, income, and employment. I could have added more, such as religiosity, but annual state-by-state estimates of religiosity were not available to me, and I wished to do a time course analysis, not just a single-year snapshot.

Making a "model".

I am going to use the word "model" a lot, but what do I mean? In this case, "model" means an equation that estimates an outcome (number of hate groups per population) based on some input data. Even restricting the choices, there is a confusing variety of possible ways to describe age, race (and ethno-racial diversity), income, education, etc. Many of these descriptions overlap each other to some extent or another. Too many variables could lead to models that "overfit" the data. The model might look good, but it has so much input that it describes the random noise more than it does the issue in question.

So, how to put the variables together? There are an enormous number of ways to explore this question. I decided to use a method called "model averaging", where a set of "better" models are selected from a large number of potential models and combined to produce an overall estimate. I also presume that the numbers generated by the model indicate how important each input is toward the estimates. The model type I used was a generalized linear mixed model (or multi-level model). This kind of model accounts for "fixed" or "universal" effects, which in our case would apply to all the states, and for "random" or "specific" effects, which would be unique to each state. So, in our model, we could call the "fixed" effects "shared outcome" and the "random" effects "particular outcome". I chose "outcome" because it would mean that, if two states had the same, for example, employment level, a "shared outcome" effect would predict the same outcome in each state from employment.
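The model-averaging step can be sketched in a few lines: AICc differences become weights, and a coefficient's averaged estimate is the weighted sum across models. All the numbers below are invented for illustration:

```python
import math

def akaike_weights(aiccs):
    """Convert AICc values into normalized Akaike weights:
    exp(-delta/2) for each model, scaled to sum to 1."""
    best = min(aiccs)
    raw = [math.exp(-(a - best) / 2.0) for a in aiccs]
    total = sum(raw)
    return [r / total for r in raw]

# Three hypothetical candidate models and their AICc values:
aiccs = [100.0, 102.0, 110.0]
weights = akaike_weights(aiccs)

# The same coefficient, as estimated by each model:
estimates = [0.50, 0.40, 0.10]
averaged = sum(w * e for w, e in zip(weights, estimates))
```

Note how the ΔAICc of 10 leaves the third model with almost no weight, which is why a large ΔAICc counts as strong evidence against a model.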

I started with many models that included variables for age, race/ethnicity, education, poverty, income, and employment. The details of selecting the final set and averaging them come later in this article. What I ended up with was a model based on the frequencies of non-white and Hispanic or Latino Americans, the frequency of 4-year or higher degrees, employment statistics, median household incomes, and poverty rates. I will explain how I ended up using these effects and not others. What I want to show right now is how strong each shared effect of the final model turned out to be. The following graphs summarize these effect sizes.

The Shared Effects

Strengths of Shared Effects on Hate Groups per Million People
Average Hours Worked per Week*†
Employment Rate per Total Population*†
Percent Hispanic (any), Asian, or Mixed-Race*
Median Income ($1000s)†
Proportion of Population at Middle Age†
4-Year or Higher Degree
Percent of State Population non-Hispanic Black*
Poverty Rate
* Main effect is "significant" by two-tailed 95% confidence interval as determined by nonparametric stratified bootstrap.
† Interaction of effect and year is significant by two-tailed 95% confidence interval as determined by nonparametric stratified bootstrap.

The graphs have multiple lines. This is because the strength of each shared effect might differ by year, so each year is represented by its own line. Individual points on the lines are the minimum, 5th, 25th, 50th, 75th, and 95th percentile, and maximum values of each effect. What do the charts mean? Going through each one in turn: the more hours per week people work in a state, the more common hate groups were; while this effect did not change much over time, the small increase between 2007 and 2015 was significant. The employment rate per population started out reducing hate group frequency as more people got work, but over time this effect significantly weakened. The strongest time-independent effect was the percent of state population that is Hispanic (of any race), Asian, or Mixed-Race; this consistently reduced hate group frequency. While median income didn't have a significant effect on its own, its effect combined with time was significant, and odd: early on, higher state median income was associated with greater frequency of hate groups, but this reversed over time. A similar, but smaller, effect was associated with the proportion of a state in the "Middle Age" category (ages 36-51). The effects of median income and "Middle Age" may be related, since these are the "prime earning years", and the proportion of people in this category seems to have diminished. Finally, there was a weak but significant association between the percent of a state's population identifying as black or African-American (not Hispanic or Latino) and the frequency of hate groups.

Two other factors, while appearing in the model, did not have significant effects. Having at least a 4-year degree was associated with increased(!) frequency of hate groups, and this effect got stronger over time. I have no way to explain those numbers. It should be noted that the estimate was not significant, which means that state-to-state fluctuation was so wild that the trend is probably not to be seen as reliable. Poverty rate had no significant associations with frequency of hate groups.

Some of the effects are marked as "significant". Notice, though, how a "significant" effect can be large or tiny. How can that be? It is because "significant" in statistics does not mean "important". It only means that the distribution of the data falls within certain parameters defined by the effects of a model. That is, a "significant" effect is "tight", while a "non-significant" effect has a lot of wobble. The first statistician to use the term "significant" ultimately expressed regret over how it became a gold standard for data analysis. Always be cautious when a data analysis is called "significant" without any presentation of effect sizes. For example, if some event "significantly" increases crime, but the increase is from 1001 events per year to 1002, it's not really an important event, and resources are probably better used elsewhere. I was a good boy, though, and also showed the effect sizes.

There is one more point to be made about the shared effects. You have probably already noticed it. There seems to be a "corrosive" effect of time. As each year passed, effects that contributed to hate group frequency grew stronger, while effects that reduced hate group frequency grew weaker, except for median income. Something has been going on in the USA for several years, independent of the economic and demographic trends. I am certainly not the first to notice this, but this analysis lays it out rather starkly.

Individual state effects

The type of model I constructed also estimated individual effects for each state in addition to shared national effects. How strong were they? According to a statistic known as "R2", which is based on the model, the shared national effects "explain" about 37% of the variation in hate group frequency. If we also include the state-specific effects, this jumps to about 82%. In the social sciences, an R2 of 0.372 (about 37%) is not bad, and an R2 of 0.824 (about 82%) is very good. This tells us that some sort of state-specific effect is very important to understanding what could be going on, and that the state-specific effects are probably stronger than the shared national effects.
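For the curious, the two R2 figures correspond to the "marginal" and "conditional" R2 often reported for mixed models: the first counts only the shared (fixed) effects, the second adds the state-specific (random) effects. A minimal sketch of the arithmetic, with variance components chosen purely for illustration so that they reproduce the percentages above (they are not values extracted from the model):

```python
# Illustrative variance components, chosen so they reproduce the
# post's R2 figures; NOT values extracted from the fitted model.
var_fixed = 0.372  # variance attributable to shared (fixed) effects
var_state = 0.452  # variance of state-specific (random) effects
var_resid = 0.176  # residual / distribution-specific variance

total = var_fixed + var_state + var_resid

# Marginal R2: fixed effects only; conditional R2: fixed plus random
r2_marginal = var_fixed / total                    # ~0.372
r2_conditional = (var_fixed + var_state) / total   # ~0.824
```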

Individual state effects per population for 2007 to 2015 (nine maps, one per year from 2007 through 2015; all maps use the same color scale).

In these maps, the more intensely red (through black) a state is colored, the greater its individual hate group frequency is after removing nationally shared effects. The more intensely blue a state is colored, the lower its individual hate group frequency is after removing nationally shared effects. None of the individual states are marked as "significant", because there is not a large enough sample size to validly estimate confidence intervals for 50 states. I mapped the estimates for Washington, DC, but it just isn't visible at this scale.

Individual state effects can be pretty sizable, altering national effects by up to +6/-5 hate groups per million people beyond an estimated "national average" for a state. There are also some interesting similarities and trends to look at. Texas and California, two states usually presumed to be cultural opposites, seem to be converging over time in their state-specific effects; it is how the two states differed on the national effects that determined their different overall outcomes. Most states of the Ohio River Valley, although usually conservative or "swing" states, tend to have individual effects lower than those predicted by the shared effects. Montana, Idaho, Mississippi, Arkansas, and New Jersey consistently do more poorly than the rest of the country when it comes to individual effects, and South Carolina becomes very bad as time goes along.

Did anything change?

Individual state effects changes from 2007-2015

Another question I wanted to ask, and you're probably also interested in, is whether individual state effects changed over time and, if they did, how. I could have just subtracted the 2007 effects from the 2015 effects, but that would ignore the possibility that either 2007 or 2015 was a hiccup in the overall trend. Instead, I used what is called "robust regression" to estimate the overall rate of change of effect per year, as if that rate had been constant. If you look at the map to the right, you will immediately see that North Carolina sticks out like a sore thumb. This is because the individual effects for North Carolina rose far faster than for any other state over the 2007-2015 period. In other words, the frequency of hate groups in North Carolina, independent of national effects, got worse far faster than in any other state.
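As a concrete illustration of robust slope estimation (I use the Theil-Sen estimator here, which takes the median of all pairwise slopes and so shrugs off a single hiccup year; this is one standard choice, not necessarily the exact method used), here is a sketch with made-up yearly effect estimates for a single hypothetical state:

```python
import numpy as np
from scipy.stats import theilslopes

years = np.arange(2007, 2016)
# Made-up state-specific effect estimates, one per year
effects = np.array([-0.5, -0.3, -0.1, 0.2, 0.1, 0.4, 0.6, 0.5, 0.9])

# Theil-Sen: the median of all pairwise slopes, with a confidence band
slope, intercept, lo, hi = theilslopes(effects, years)
# A positive slope means the state's individual effect worsened per year
```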

What states improved in terms of individual effects? The District of Columbia led, followed by Idaho, Montana, Vermont, and Alaska. Of course, "improved" does not necessarily mean "best outcome": Idaho consistently remained a state with a higher than average frequency of hate groups, but it did improve beyond the changes in national factors. Other states appear to have deteriorated over time. South Carolina seems to have done the worst, but noteworthy deterioration also occurred in Nevada, Missouri, and New Jersey. I hope the map is plain enough for you to draw your own conclusions for the states in general.

And so?

And so what? What does all this mean? I can speculate, but I'm not a sociologist, anthropologist, or social psychologist. What I can say is that the largest consistent effect is that the more hours worked, the more common hate groups are. How could that make sense? If one takes an economic view of organized hate, high average hours worked do not necessarily go along with prosperity. People may be having to work longer simply to keep up, leading to resentment that can be taken advantage of. Employment per population exerts the next strongest effect. Early on, it follows reasonable presumptions: lower employment goes along with more frequent hate groups. However, this gradually reverses over time, until employment has little effect at all. This could reflect deteriorating quality of the jobs available, or some other social change over time; I can't say which.

Interestingly, when we get to questions of race or ethnicity affecting hate group frequency, there is an inverse relationship with Hispanic/Latino, Asian, or Mixed-Race ancestry. That is, the more common people in this group are, the less frequent hate groups are predicted to be. After this, the oddest factor weighs in: median income. In 2007, the higher a state's median income was, the more frequent hate groups were predicted to be. Over time, this completely reverses. Again, we are left with a "what does this mean?" situation. It may reflect a shifting of prosperity away from states that have ongoing historical causes to prop up hate and toward states that have fewer such causes. Or it could reflect an overall concentration of income away from the working class toward upper classes, which would lower the median and likely build resentment. The final factor is the proportion of the population at Middle Age (36 to 51). By itself, it has no significant effect; it is the switch from a negative to a positive relationship over time that is significant. It may actually be following the employment and earning trends, since these are the prime earning years for most workers. In any case, it exerts a small effect.

We then have a few non-significant effects. More education seems to go along with more hate groups, and this effect gets stronger as years go by, but variation is so high that it cannot be deemed significant at the 95% threshold I used. Surprisingly, the frequency of people identifying themselves as non-Hispanic Black and the poverty rate exert very little influence on the frequency of hate groups, although the small effect for the Black non-Hispanic population percent is significant. Regarding poverty: in and of itself, it may not be as important as perceptions of inequity and of having to work more to get less. The effect for the non-Hispanic Black population may be due to the pervasiveness of anti-Black racism in US culture. If anti-Black racism is a nearly-universal trait of most hate groups, it could be possible for there to be a significant relationship to the simple presence of Black people, regardless of numbers.

Anyway, as you can see, the matter is fairly complex and doesn't lend itself to easy solutions. Do the effects of race and ethnicity show a cause or a result? What does the association between greater education and more hate groups mean? I don't know, but I hope that my summary will be useful to someone.

Statistical Methods

This is the nerd section. Yes, this is the nerd section. I include it so people will know that I really did do the work and didn't just make stuff up. To follow this, at a minimum, you will need a basic understanding of the statistical concepts of linear regression, correlation, and robust methods. This is going to be very brief, because I am not getting paid to write any of this. If you want more details of my analysis, you are free to contact me or leave a comment, and I will get back to you.

I began with SPLC and census bureau data for the 50 states and Washington, DC.

As I mentioned earlier, there were an enormous number of variables I could have modeled. In particular, effects of age and race/ethnicity would be tricky because there are many potential ways to slice those pies.

Age Categories
under 18
18-23
over 75

Age presented a particular problem, because there are so very many ways to slice it up and categorize it. I tried both a combination of central tendency (median) and skewness, and finding specific age categories by principal component analysis. Parallel analysis of the ages by year indicated that seven components were necessary to summarize aging trends (see table). I did not use conventional age categories. One major problem with the conventional categories is that they are not based on recent research; they have been carried over from earlier decades on the basis of convenience. For example, what biological, neurological, or cognitive basis is there for considering the age of 65 an immutable cut-off? None. It merely corresponds to an administrative division.
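Parallel analysis, in brief, keeps a principal component only if its eigenvalue beats what random data of the same shape would produce. A minimal sketch of the idea (the function name and simulated data are mine, for illustration, not part of the actual analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_analysis(data, n_sims=200, rng=rng):
    """Count components whose eigenvalues exceed the 95th percentile
    of eigenvalues obtained from random data of the same shape."""
    n, p = data.shape
    # Eigenvalues of the correlation matrix, largest first
    real_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        sim = rng.standard_normal((n, p))
        rand_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(rand_eigs, 95, axis=0)
    return int(np.sum(real_eig > threshold))
```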

What about race or ethnicity? I compared using a single number based on the exponentiation of the "Shannon entropy" of races and ethnicities as recorded by the US Census; this is a common way to measure diversity in ecology. I also did a factor analysis of the racial/ethnic groups to see if they tended to naturally group together. The two approaches were compared in later model-building stages. In the end, three factors worked "best" in the model: Black/African-American (non-Hispanic); the sum of Hispanic or Latino (any race except Native American or Pacific Islander), Asian, and Mixed Race; and the sum of Native American and Pacific Islander. I also tested the frequency of non-Hispanic White for the model.
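The exponentiated Shannon entropy (sometimes called the Hill number of order 1) can be read as "the number of equally common groups that would produce this much diversity". A small sketch:

```python
import numpy as np

def effective_diversity(proportions):
    """Exponentiated Shannon entropy: the number of equally common
    groups that would yield the same entropy as the observed mix."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0]          # treat 0 * log(0) as 0
    p = p / p.sum()       # normalize to proportions
    return float(np.exp(-np.sum(p * np.log(p))))

# Four equally common groups give an effective diversity of 4
effective_diversity([0.25, 0.25, 0.25, 0.25])  # → 4.0
```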

That left several other factors to measure economic and education effects. The ones that survived testing were "Poverty Rate", "White Poverty Rate", "White Poverty Risk", "Median Income", "White Median Income", the frequency of a 4-year or higher degree, and "Employment" (total civilian and military employment divided by state population). I intentionally included Year as another effect. How did I choose these from competing effects that I didn't use? I built sub-models around each potential effect group (economic, education, employment) and compared them using something called the "second-order Akaike Information Criterion" (AICc). If you know what AIC means in this context, you don't need me to explain it. If you don't, you will need a lot more background than I could give here to really understand it. Very roughly put, it looks at how much "information" is lost if a variable is not included in the model, but it also penalizes having many variables. In any case, the variables left in the sub-models are the ones that tested "better" than the ones that were excluded. I combined all the surviving variables into a "global" model.
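For reference, the AICc is just the ordinary AIC with an extra small-sample penalty; a one-function sketch (lower scores are better):

```python
def aicc(log_likelihood, k, n):
    """Second-order Akaike Information Criterion: ordinary AIC plus a
    penalty that grows as the number of parameters k approaches the
    sample size n. Lower scores indicate less information loss."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# A model with log-likelihood -100, 5 parameters, and 50 observations:
aicc(-100.0, k=5, n=50)  # AIC of 210 plus a small-sample penalty
```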

Averaged Model with 95% Confidence Intervals (* marks significant effects)

Effect or Interaction: Estimate (+/- CI bounds)
Proportion of Population, Hispanic (any), Asian, or Mixed Race: -0.305 +/- 0.076/0.066*
Average Hours Worked per Week: 0.168 +/- 0.064/0.217
Employment as Percent of Population: -0.142 +/- 0.087/0.046*
Median Household Income: -0.048 +/- 0.120/0.108
Middle Aged: -0.021 +/- 0.077/0.110
4-Year or Higher Degree: 0.017 +/- 0.071/0.212
Proportion of Population, Black, non-Hispanic: 0.016 +/- 0.055/0.014*
Year: 0.021 +/- 0.054/0.110
Proportion of Population, Hispanic (any), Asian, or Mixed Race × Year: 0.040 +/- 0.061/0.038*
Average Hours Worked per Week × Year: 0.015 +/- 0.099/0.144
Employment as Percent of Population × Year: 0.119 +/- 0.079/0.084*
Median Household Income × Year: -0.136 +/- 0.124/0.102*
Middle Aged × Year: 0.038 +/- 0.052/0.058
Proportion of Population, Black, non-Hispanic × Year: 0.003 +/- 0.032/0.005
4-Year or Higher Degree × Year: 0.039 +/- 0.105/0.050
(Intercept): 1.151 +/- 0.071/0.055*

This model was then used to generate nested models, each of which eliminated one or more of the main effects (both fixed and random) along with the associated effect × year interaction. The models were then evaluated with AICc, and an "elbow" method was used to select a subset for model averaging. The averaged model is in the table above. Estimates are from z scores, so they are in standard deviations of the effect or interaction.
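Model averaging of this sort typically weights each selected model by its Akaike weight, derived from the AICc differences. A sketch with hypothetical AICc scores for six models (the scores are made up; only the mechanics are real):

```python
import numpy as np

def akaike_weights(aicc_scores):
    """Turn AICc scores into model-averaging weights: each model's
    relative likelihood exp(-delta/2), normalized to sum to 1."""
    a = np.asarray(aicc_scores, dtype=float)
    delta = a - a.min()          # difference from the best model
    rel = np.exp(-0.5 * delta)   # relative likelihood of each model
    return rel / rel.sum()

# Made-up AICc scores for six candidate models
w = akaike_weights([210.0, 211.2, 212.5, 213.1, 214.0, 215.6])
# An averaged coefficient is then the sum of each model's estimate times its weight
```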

Since I was essentially testing a repeated-measures design, the global model was a hierarchical generalized linear Poisson model (aka "mixed model"), with the fixed (shared) factors of Black/African-American, Hispanic + Asian + Mixed Race, Poverty Rate, Median Income, 4-Year or Higher Degree, Employment, and Year; two-way interactions of Year with each other fixed factor; plus an offset by the log of population in millions. For random factors, I used an intercept and uncorrelated slopes for each individual fixed factor (no interactions), each by state. Because estimating coefficients for a mixed model can be extremely difficult when variables are on different scales, I standardized the predictors as z scores.
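The standardization and the population offset are easy to show concretely. In a Poisson model with a log link, adding log(population in millions) as an offset turns the linear predictor into a log rate per million people. A sketch with synthetic data (the design matrix and coefficients are illustrative, not the fitted ones):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: 51 "states" x 9 years of three predictor values
X = rng.normal(loc=50.0, scale=12.0, size=(459, 3))

# Standardize to z scores so coefficients are on comparable scales
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# With a log link, an offset of log(population in millions) means
#   log E[hate groups] = log(pop_millions) + Z @ beta
# so the model predicts a rate per million people:
pop_millions = rng.uniform(0.6, 39.0, size=459)
beta = np.array([0.17, -0.14, -0.31])  # illustrative coefficients
expected_groups = pop_millions * np.exp(Z @ beta)
```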

Six models were selected by this method, and a weighted average of the models was constructed. I used the weighted average to calculate the sizes of the fixed ("shared") and random ("state specific") effects. Confidence intervals for fixed effects were determined by stratified bootstrap, stratifying on state. Choropleth maps were all created using GoogleViz, then edited by hand for color scheme choices. Chart code was written by hand according to the Google visualization protocol. As I have mentioned, if you would like more explicit detail on my modeling, I would be happy to share it.
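A stratified bootstrap on state means resampling with replacement within each state's block of yearly rows, so every bootstrap replicate keeps the same per-state sample sizes. A minimal sketch of the index-resampling step (the function name is mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def stratified_bootstrap_indices(strata, rng=rng):
    """Resample row indices with replacement within each stratum
    (each state's block of yearly rows), preserving per-state sizes."""
    strata = np.asarray(strata)
    parts = []
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)                  # rows for this state
        parts.append(rng.choice(idx, size=idx.size, replace=True))
    return np.concatenate(parts)

# The model would then be refit on each resample to build the
# confidence intervals for the fixed effects.
```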