August 21, 2014

Mass Killings and the Good Old Days

On some message board or another, I saw somebody post a link to an interactive map from USA Today. The thread quickly degenerated into a dogmatic flame war about how horrible guns are or aren't, with various racist jabs based on locations. I, being a nerd, took one look at the map and made a quip that killed the whole thread: "Look! Somebody has re-invented the population density map!" (I say things like that.) To my eye, all it really showed was where the US population is dense. Then somebody finally threw out a comment that got it all roiling again: to wit, that it's all due to the breakdown of American society since the "good old days".

That got me to thinking. My off-the-cuff remark about population maps might or might not be quite on the button, but I really knew nothing about the frequency of mass murder in the USA over time, certainly not since the "good old days". My dachshund-like instincts triggered, I had to dig to the bottom of this. I gleaned the "rampage killing" events that occurred in the USA from the appropriate Wikipedia articles. Yes, I know, but it's a free source, and I did spot checking for several of the entries.

I pulled out information on mass killings that fell into the categories of "school massacres", "workplace killings" (with a subcategory for military), "religious, political, or racial", "familicides", "home intruders", and "rampage killers". "Rampage Killers" was further subdivided into "Vehicular" and "Other Methods". I found that bizarre, since there were examples classified as "school massacres" that used only a vehicle as the weapon, but they were not also classified as "vehicular". So, I collapsed the "vehicular" and "other methods" in with generic "rampage". I also combined non-military and military workplace killings. To make things crazier, Wikipedia has various pages outlining what it calls "terrorism" in the USA, but they disagree over what constitutes a terrorist act and likewise classify killings done for insurance fraud as "terrorism". I gleaned the mass killings that I could find under "terrorism" and either reclassified them to "Religious/Racial" (for race riots or killings based entirely on the victims' religion without specific political context) or as "Terrorism", defined as an intentional attack upon civilians for a specific political purpose. This left me with the categories of "Workplace", "Terrorism", "School Attacks", "Religious/Racial", "Rampage", "Home Intrusion", and "Familicide".

But how to compare those with the "good old days"? There are a lot of ways to slice up time. Individual years are too fine for this purpose, and decades seem too arbitrary. "Generations" seem to be a good place to at least start.

Is there a difference in mass killings by generation? There are a lot of ways that cake could be sliced. I picked Strauss and Howe's Generations: The History of America's Future (1992, Harper Collins). Most of us are roughly familiar with it, even if we don't know it. It does have its flaws (the authors get a bit kooky), but it has seemed to be roughly useful. The relevant Strauss & Howe generations are thus:

Generation | Birth Years | Dominant Years
Baby Boom | 1943–1960 | 1982–2004

"Birth Years" are the years that members of a given generation were born in. While useful for saying when a generation begins, it's a lousy way to describe when a generation was influential. Taking the name "GI Generation" as a guide, I decided that a generation's major "Dominant Years" are the "Birth Years" of the generation two steps after their own. Thus, the GI Generation was the major cultural force in the USA during the time that the Baby Boomers were born. Finally, I looked up population by state and year from the US Census Bureau.
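The "two steps later" rule is easy to sketch in a few lines of Python. This is only an illustration, not the code used for the analysis. The birth-year ranges below are the ones commonly cited from Strauss & Howe and should be treated as assumptions; only the Baby Boom row is confirmed by this post. The names BIRTH_YEARS, dominant_years, and DOMINANT are made up for the example.

```python
# Birth-year ranges as commonly cited from Strauss & Howe.
# ASSUMPTION: only the Baby Boom range is confirmed by this post.
BIRTH_YEARS = [
    ("Missionary", 1860, 1882),
    ("Lost", 1883, 1900),
    ("GI", 1901, 1924),
    ("Silent", 1925, 1942),
    ("Baby Boom", 1943, 1960),
    ("Generation X", 1961, 1981),
    ("Millennial", 1982, 2004),
]


def dominant_years(generations, steps=2):
    """Map each generation to the birth years of the generation `steps` later.

    Generations that have no listed successor `steps` later get no entry.
    """
    result = {}
    for i, (name, _start, _end) in enumerate(generations):
        if i + steps < len(generations):
            _later_name, start, end = generations[i + steps]
            result[name] = (start, end)
    return result


DOMINANT = dominant_years(BIRTH_YEARS)
for name, (start, end) in DOMINANT.items():
    print(f"{name}: dominant {start}-{end}")
```

With these ranges, the GI Generation's dominant years come out as the Baby Boom birth years, and the Baby Boom's as 1982–2004, matching the table. Generation X gets no entry here only because the list stops at the Millennials.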

And what do I do with all that? Raw event counts are often useless. If there are more people, anybody reasonable would expect more of any kind of event to happen. So, the thing to do is to adjust events for people. This makes a very big difference when you're talking about risk. I'll put it another way. In 2000, there were three mass killings in the USA. In 1902, there were three mass killings in the USA. Does that mean the risk of a mass murder occurring in the USA was equal in 1902 and 2000? No, because risk takes into account all the other times an event might have occurred but didn't. If an intersection has 10 collisions per year, it's a much bigger deal if only 20 cars use it in that year than if 2000 cars do. The same is true for these mass killings. Taking population into account, 1902 was three and a half times as dangerous for mass killing events as 2000 was. Well, then, divide the number of mass killings by population, group by generation, and voilà! (Pardon my French.)
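Here is the 1902 versus 2000 comparison as a rough Python sketch. The two population figures are rounded national totals that I am assuming for illustration (about 79.2 million in 1902 and about 281.4 million in 2000); they are not the state-by-year Census numbers used in the analysis. The per-billion scaling is there only so the rates are readable, and the function name is made up.

```python
def events_per_billion(events, population):
    """Return events per billion people for one year."""
    return events / population * 1e9


# ASSUMPTION: rounded national populations, for illustration only.
POP_1902 = 79_200_000
POP_2000 = 281_400_000

rate_1902 = events_per_billion(3, POP_1902)
rate_2000 = events_per_billion(3, POP_2000)

# Same raw count in both years, so the ratio of rates is just the
# inverse ratio of the populations.
ratio = rate_1902 / rate_2000

print(f"1902: {rate_1902:.1f} per billion")
print(f"2000: {rate_2000:.1f} per billion")
print(f"1902 relative to 2000: {ratio:.2f}x")
```

Because the event count is the same in both years, the whole difference in risk comes from the denominator.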

[Figure: Population-Adjusted Mass Killings by Generation and Type]

Take a minute to absorb this figure. It's a lot if you're not used to fancy graphics or scientific papers, but it makes sense if you take it in stages. First, the height of each column (or stack) is the population-adjusted average of mass killings per year, by generation. I multiplied these numbers by a billion. Yes, a billion. Why a billion? Because, even if they are plastered all over the news, mass killings are a very rare event in terms of total US population. I multiplied by a billion just so the numbers would be readable.

You'll notice the columns are actually stacks of different colors. This is so you can see how the total mass killings were split among different types. If you roll your mouse over each stack, you'll get its summary. The letters above each stack are from what is called a "multiple range test", which is used to sort more than two groups into sets that do or don't differ from each other. If a stack shares a letter with another stack, it's not considered statistically different (in total height) from that other stack. If you want the details on all of this, I've got a nerds-only section at the end where I go through the nuts and bolts of the modeling I did.

But what does all this mean? First, look at the total stack heights: most of them are pretty close to each other. The error bars are "standard errors" from the model. Consider them the "reasonable wiggle room" that represents variability in each total stack. The letters are the real take-home message. According to the "model" I made of the data, for most of the 20th century and up into the 21st until the end of 2013, if we go by generation, overall mass killings were no less common "back then" than they are now. Let that sink in. No less common in the "good old days".

Yes, I do know you're making "Ooh!" noises and pointing at that short little stack for the GI Generation. I'm not ignoring it. I'm just waiting for everyone to notice it. Now that it's been noticed, "we're going to do some science", as my old ecology professor used to say. What Brent (at my alma mater, we were to call professors by their first names) meant is that numbers and charts are summaries. Science is what happens when you try to pull a little meaning out of it.

So, what's the meaning we can pull? Short version: The GI Generation was not normal.

Long version: We look back to the WWII and post-war years as the defining times of "America". Every aspect of American life is still defined in terms of what the GI Generation did, had, or wanted. How did this happen? It's a collision of several forces. Television. While the Baby Boomers grew up on TV, it was the GI Generation that made the shows they watched and thus defined what the Boomers considered "normal life". The Boomers, themselves, contribute to America's unquestioning acceptance of the GI Generation's dominant period as the standard. It was when they grew up. Thus, the conditions of that era are remembered as "the way things work" by what is still the largest generational group in America. It doesn't matter that "the way things worked" might have been very different before their own childhoods. What we grew up with is usually what we decide is "normal" for the rest of our lives.

But what about the categories? Why did I bother classifying them? And what about maps? This post is already long enough; I'll get to those next time.

Nerd Postlude

Warning: If you have not yet earned your Gold Pocket Protector with 20-Sided Dice Clusters, the following may cause your brain to ooze out your ears like badly-made guacamole, your ears to then slide down to your chin, and your vocabulary to be reduced to repeating "Uhhhhhhhhhh" for an indefinite period of time. If you notice these symptoms, immediately apply an appropriate antidote, including but not limited to funny kitten videos, babes/hunks in bikinis/speedos, cookie recipes, or other uplifting but not painfully technical uses of the World Wide Web. This part is fairly hardcore nerdery, with stuff that would take an enormous amount of space to explain. If it makes no sense to you, it's okay, it doesn't mean you're stupid.

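Factor | Estimate (with intercept) | Std. Error (with intercept) | z value | p | Estimate (without intercept) | Std. Error (without intercept)
Miss. Gen. | 0.254 | 0.146 | 1.748 | 0.081 | -10.958 | 0.158
Lost Gen. | 0.015 | 0.161 | 0.093 | 0.926 | -11.198 | 0.180
GI Gen. | -0.798 | 0.209 | -3.815 | <0.001 | -12.011 | 0.243
Sil. Gen. | 0.120 | 0.121 | 0.994 | 0.320 | -11.093 | 0.123
Gen. X | 0 | NA | NA | NA | -10.938 | 0.143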

As promised, I lift the hood on my analysis. Some of you have, no doubt, already noticed that this entire blog entry was just to present a fairly simple linear model. But preliminaries like identifying my data sources (with all their flaws), introducing my factor definition, etc. can't be skipped. The model I presented is a simple one-factor linear model, specifically "Events/Population ~ Generation". That is, Events (mass killings), adjusted for Population, grouped by Generation. The question I asked was "Does grouping by generation actually mean anything?" A naive approach to this would have been to do an ANOVA. However, my data is actually counts offset by population. Count data is very often better modeled with a Poisson error distribution. I began with a simple generalized linear model (glm). However, testing dispersion revealed that it was underdispersed. Among the many alternatives to deal with this, I chose to use a mixed model (glmm), which sort of "shoves" the dispersion issue onto a random variable. I used the year for this.
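For anyone wondering what "testing dispersion" looks like, here is a minimal Python sketch of one common check, the Pearson dispersion statistic. It is not the R code used for the analysis and the data are invented. It leans on a shortcut that only holds for a Poisson model with a single categorical factor, a log-population offset, and no random effect: the fitted rate for each group is just total events divided by total population. A statistic near 1 is what a Poisson model expects; well below 1 suggests underdispersion, well above 1 overdispersion.

```python
from collections import defaultdict

# HYPOTHETICAL toy data: (group, events, population in arbitrary units).
ROWS = [
    ("A", 2, 100.0), ("A", 3, 110.0), ("A", 2, 120.0),
    ("B", 1, 130.0), ("B", 1, 140.0), ("B", 2, 150.0),
]


def fit_group_rates(rows):
    """Closed-form Poisson fit for one categorical factor with an offset.

    For this simple model the maximum-likelihood rate per group is
    sum(events) / sum(population). Groups are assumed to have at least
    one event, so no fitted rate is zero.
    """
    events = defaultdict(float)
    exposure = defaultdict(float)
    for group, count, population in rows:
        events[group] += count
        exposure[group] += population
    return {group: events[group] / exposure[group] for group in events}


def pearson_dispersion(rows, rates):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi_square = 0.0
    for group, count, population in rows:
        expected = rates[group] * population
        chi_square += (count - expected) ** 2 / expected
    degrees_of_freedom = len(rows) - len(rates)
    return chi_square / degrees_of_freedom


RATES = fit_group_rates(ROWS)
DISPERSION = pearson_dispersion(ROWS, RATES)
print(f"dispersion statistic: {DISPERSION:.3f}")
```

In this toy data the counts vary less than a Poisson model would expect, so the statistic falls below 1. With a random effect in the model there is no closed form like this, and the fit needs a proper mixed-model routine.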

I set up orthogonal contrasts to compare each generation against Generation X, then ran the model "Event ~ Generation + offset(log(Population)) + (1|Year)" in the R environment, using the lme4 package. I ran the model with and without an intercept. The no-intercept model was used to generate coefficients and standard errors for the figure. Analysis was done on the with-intercept model. The glmm showed that the GI Generation was distinct from Generation X, but I was interested in simultaneous pairwise comparisons. For this I used simultaneous Tukey contrasts from the multcomp package. They are summarized in the figure.
