I performed my first regression at 19. It felt almost as important as losing my virginity. Speculations became testable. The deeper structure of society became legible. At the time, I liked baseball, loved the Baltimore Orioles, and hated Cal Ripken, whose consecutive games streak actually hurt the team by inflating his salary without creating many marginal runs. I ran to the computer lab and crunched regressions using singles, doubles, and home runs to predict wins. It worked. Not only did all the coefficients make sense, they proved Cal Ripken was grossly overpaid for a mediocre third baseman. Vindication tasted sweet. I knew shit the announcers didn’t and could prove it with statistical tools they couldn't even understand. The assholes who bleat “correlation does not imply causation” have no idea how slippery and inane causation is. A high R-squared paired with an intuitive story is all I ever want or need.
Since college, I've averaged a regression or two per year. Gathering data is tedious, though. I’ll only do it if I’m motivated. I might gather data more often if it paid well, but being a middling trial lawyer pays far better than being a middling data nerd.
ChatGPT Pro has made regressions fun again. It gathers data for me, so I can run a dozen regressions in an hour, testing a new hypothesis every 3 to 5 minutes. Of course, you can’t trust AI completely: it has a nasty habit of forgetting data it already looked up or slipping in dummy numbers that ruin your regression. Still, a good R-squared is like the scent of prey. With only a few predictors and a couple dozen observations, getting a strong R-squared purely by chance is damn near impossible. I worry very little about data quality until I find a promising R-squared that matches a plausible story. Only then do I double-check that the AI hasn’t hallucinated. I’ve spent the past couple of days running regressions to understand murder rates better. I’ve checked the numbers, they make sense, and I’m confident someone who tried to replicate my work would reach similar results.
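The claim that a strong R-squared rarely shows up by chance can be checked by simulation. Here is a minimal sketch, assuming 20 observations and 3 predictors (the size of the city regression discussed later) and pure random noise for both the outcome and the predictors. The function names are hypothetical, and numpy is the only dependency.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R² of an ordinary least squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def null_r_squared(n_obs: int = 20, n_pred: int = 3,
                   n_sims: int = 10_000, seed: int = 0) -> np.ndarray:
    """R² values from regressing pure noise on pure noise, n_sims times."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_sims)
    for i in range(n_sims):
        X = rng.standard_normal((n_obs, n_pred))
        y = rng.standard_normal(n_obs)
        out[i] = r_squared(X, y)
    return out

r2 = null_r_squared()
# Theory for pure noise: mean R² = k / (n - 1) = 3 / 19, about 0.158.
print(f"mean R² under pure noise: {r2.mean():.3f}")
print(f"share of runs with R² >= 0.8: {(r2 >= 0.8).mean():.5f}")
```

This simulation covers a single regression. Screening many specifications and keeping the best one raises the odds that a strong R-squared is luck.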
Crude Racism and Crude Egalitarianism are Both Bullshit
Plenty of people online want nothing more than to prove white people are superior to black people. Plenty of others want to assume all races are inherently equal and that any group differences must stem from sinister social dynamics. Both positions are equally unscientific. Racial difference is a complicated, messy empirical question. Huge genetic and cultural differences exist within any racial category, especially among black Africans. Indeed, because humans have lived in Africa far longer than anywhere else, African populations carry more genetic diversity than any other group.
Why do I care about race and human biodiversity? Part of my interest is just the boyish thrill of tasting forbidden fruit, like liberals reading banned books or telling Trump to fuck off. Race is also a useful analytical category. It strongly correlates with voting patterns and educational achievement, retaining explanatory power even when you control for class, and it correlates with wealth and income. Racial demographics predict murder rates better than poverty rates, household income, gun ownership, weather, or the proportion of murders that get solved. Refusing to acknowledge a variable this powerful puts ideology above truth.
Environment and Genetics Have a Roughly Multiplicative Effect
Bad enough genes and a bad enough environment are equally fatal. Suppose a young Japanese boy, born in 1944 with outstanding genes for intelligence, dies in an American bombing raid. Whatever genetic advantages this poor kid had were obliterated by being raised in a war zone. Suppose a billionaire conceives a child with a common trisomy. Even with the best obstetric care, this child probably won’t survive the womb. If the child has a comparatively minor trisomy, such as Down syndrome, he might survive into adulthood, yet all the education in the world is unlikely to produce an above-average IQ.
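A compact way to read "roughly multiplicative," offered as an illustrative simplification rather than a fitted model: score an outcome $Y$ as the product of a genetic factor $G$ and an environmental factor $E$, each scaled from 0 (fatal) to 1 (ideal). The symbols and the 0-to-1 scaling are assumptions for illustration. Under an additive model, a zero in one factor leaves the other's contribution intact. Under a multiplicative model, it zeroes out the whole product, which is what the bombing-raid and trisomy examples describe.

$$
Y_{\text{add}} = \tfrac{1}{2}(G + E), \qquad Y_{\text{mult}} = G \cdot E
$$

$$
G = 1,\; E = 0: \quad Y_{\text{add}} = \tfrac{1}{2}, \qquad Y_{\text{mult}} = 0
$$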
Subtler environmental effects can have big impacts too. Usain Bolt would never have become the fastest man in human history had he grown up stunted by malnutrition or slaving away 60 hours a week in a sweatshop from age 10. Einstein’s name would never have become synonymous with genius had he been exposed to too much lead as a child. Plenty of clever peasants throughout history likely had genetic gifts equal to Einstein’s but never received enough education to pursue science.
People who claim human behavior is entirely genetic or entirely environmental are idiots. Both matter; the real question is how much weight each gets. Environment is also notoriously tricky to quantify. Severe poverty devastates human development, but it’s far less clear whether parents in the 80th percentile of parenting skill meaningfully outperform those in the 60th. Add in the fact that more skilled parents probably pass along better genes, and it becomes brutally hard to disentangle genes from environment. One of my smartest friends in college had an alcoholic mother; he went on to drink a fifth of vodka a day and died at 43, having spent his last half dozen years unemployed. Genes and environment intertwine in confounding ways. How much parents should sacrifice for their offspring is an incredibly important question, but it doesn’t have an easy answer.
Environment Strongly Predicts Murder Rates
Black skin, by itself, doesn’t correlate well with murder. Several cities in sub-Saharan Africa have lower murder rates than Thunder Bay, Ontario—a city in a safe country with almost no black population:
The variation of murder rates within Africa is massive. Some cities in sub-Saharan Africa are incredibly violent:
However, African cities aren't uniquely violent. Latin American cities can be even more sanguinary. Consider the homicide rates below:
By contrast, many Latin American cities have lower murder rates than Dallas, Texas, or Thunder Bay:
Africa, Latin America, and the United States all demonstrate massive local variation in murder rates.
Clusters of Poor Blacks and Latinos Explain Most American Murder
The single best predictor of American murder rates is the concentration of poor black residents in a community. Using only three variables, I explained 89.3% of the variation in murder rates among America’s 20 largest cities (R² = 0.893), with an overall p-value of 7.44 × 10⁻⁷. That's a hell of a lot of signal and very little noise.
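The sketch below shows the general kind of fit described here: ordinary least squares with an intercept, reporting R², adjusted R², and the overall F-statistic. It assumes numpy. The data are synthetic placeholders for 20 "cities" and 3 predictors, not the city data used in this post. The name `fit_ols` and the coefficients in `true_beta` are hypothetical.

```python
import numpy as np

def fit_ols(X: np.ndarray, y: np.ndarray):
    """Fit y = b0 + X @ b by ordinary least squares; return coefficients and fit stats."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    # Overall F-test of all slopes = 0, with k and n - k - 1 degrees of freedom.
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return beta, r2, adj_r2, f_stat

# Synthetic placeholder data: 20 rows, 3 predictors (NOT the real city data).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 0.3, size=(20, 3))
true_beta = np.array([2.0, 40.0, 15.0, -10.0])  # hypothetical: intercept, then slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0.0, 1.0, size=20)

beta, r2, adj_r2, f_stat = fit_ols(X, y)
print("coefficients:", np.round(beta, 2))
print(f"R² = {r2:.3f}, adjusted R² = {adj_r2:.3f}, F(3, 16) = {f_stat:.2f}")
```

The overall p-value is the upper tail of an F distribution with k and n − k − 1 degrees of freedom. With SciPy, that is `scipy.stats.f.sf(f_stat, 3, 16)`.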
The homicide rate per 100,000 residents can be expressed clearly through the following regression formula:
Here's the exact data fed into the regression model:
The biggest outlier is Charlotte, whose murder rate is 30% lower than the model predicts. Charlotte has a reputation for good race relations; perhaps that has suppressed homicide. That’s only a surmise, and I want to look into Charlotte more. The figures for poor blacks and Latinos represent the fraction of the total city population that belongs to the specified group and earns less than 200% of the poverty threshold.
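This paragraph relies on two pieces of arithmetic: the poor-group share, taken as a fraction of the whole city population, and the "30% lower than predicted" relative residual used to rank outliers. The sketch below is minimal. The function names are hypothetical, and the cities and numbers in the example are made up to show the arithmetic, not real figures.

```python
def poor_group_share(group_below_200pct_poverty: int, total_population: int) -> float:
    """Fraction of the WHOLE city population that belongs to a group
    and earns less than 200% of the poverty threshold."""
    return group_below_200pct_poverty / total_population

def relative_residual(actual: float, predicted: float) -> float:
    """Observed minus predicted, as a fraction of the prediction
    (-0.30 means 30% lower than the model expects)."""
    return (actual - predicted) / predicted

def biggest_outlier(actual: dict, predicted: dict) -> tuple:
    """City with the largest absolute relative residual, plus that residual."""
    city = max(actual, key=lambda c: abs(relative_residual(actual[c], predicted[c])))
    return city, relative_residual(actual[city], predicted[city])

# Hypothetical illustration only (homicides per 100,000 residents).
share = poor_group_share(150_000, 1_000_000)
actual = {"City A": 7.0, "City B": 21.0, "City C": 11.0}
predicted = {"City A": 10.0, "City B": 20.0, "City C": 12.0}
print(share, biggest_outlier(actual, predicted))
```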
The “financial capital” dummy variable needs a brief explanation. Major cities that host national financial markets typically have lower murder rates than poorer provincial cities: Toronto is safer than Winnipeg or Edmonton, Mexico City is safer than Tijuana or Juárez, Paris is safer than Marseille, and London is safer than Manchester or Birmingham. New York is America's financial capital, with fiscal capacity that other cities can't match, so it gets full credit. I gave Chicago half credit because it hosts the Board of Trade and the Mercantile Exchange, America's commodities and futures exchanges. That is an objective reason to treat it as a partial financial capital, and its metropolitan GDP is also comparable to London's or Paris's. Chicago’s murder rate is lower than racial demographics alone would predict. San Francisco and Los Angeles arguably deserved half credit too, and giving it to them would have slightly improved the R-squared, but I didn't want to rely on too many subjective judgments.
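The coding rule above fits in a few lines. This is a minimal sketch: the dictionary and function names are hypothetical, and only New York and Chicago get nonzero values, as the text describes.

```python
# Hypothetical coding of the "financial capital" dummy:
# New York gets full credit, Chicago half credit, every other city zero.
FINANCIAL_CAPITAL_CREDIT = {"New York": 1.0, "Chicago": 0.5}

def financial_capital(city: str) -> float:
    """Dummy value for a city; cities not listed default to 0.0."""
    return FINANCIAL_CAPITAL_CREDIT.get(city, 0.0)

cities = ["New York", "Chicago", "Houston", "Charlotte"]
print([financial_capital(c) for c in cities])  # [1.0, 0.5, 0.0, 0.0]
```

Because a half-credit value enters the regression as an ordinary number, a linear model shifts Chicago's predicted rate by exactly half of the dummy's coefficient.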
The dummy works well: it constrains New York, which would otherwise be a huge outlier, and treats Chicago intuitively. If you object to suppressing the biggest outlier with a dummy variable, I grant you have a point. Fortunately, I only have to give up a wee bit of R-squared to use a numerical measure instead: millionaires per capita. This is a decent proxy for fiscal capacity, and it is significant at p = 0.011.
Coefficient Table:
| Predictor | Coef. | ... | ... | P>\|t\| |
|---|---:|---|---|---:|
| Intercept | −5.94 | ... | ... | 0.08 |
| (Black Share × Black Poverty Share)² | 30.9 | ... | ... | <0.001 |
| (Poor Latino Share)² | 17.2 | ... | ... | 0.003 |
| Millionaires per Capita | −80.9 | ... | ... | 0.011 |
Model Fit:
F-statistic: 34.24
p-value: 3.41 × 10⁻⁷
Adjusted R²: 0.840
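These fit statistics hang together arithmetically. Given n = 20 cities and k = 3 predictors (the 20 largest cities and the three non-intercept rows in the table, both assumptions about how the model was set up), the adjusted R² implies a raw R², and the raw R² implies the F-statistic. Here is a minimal pure-Python sketch of that check. The function names are hypothetical.

```python
def r2_from_adjusted(adj_r2: float, n: int, k: int) -> float:
    """Invert adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)

def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F-statistic for k predictors plus an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

n, k = 20, 3
r2 = r2_from_adjusted(0.840, n, k)
f = f_from_r2(r2, n, k)
print(f"R² ≈ {r2:.3f}, F ≈ {f:.2f}")  # R² ≈ 0.865, F ≈ 34.25
```

The small gap from the reported 34.24 most likely comes from rounding the adjusted R² to three decimals.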
In any event, concentrations of poor blacks and Latinos correlate catastrophically with higher murder rates. Racial caste systems strongly predict violence. The highest murder rates in Africa occur in South Africa, still deeply scarred by apartheid. South Africa retains the largest white population in sub-Saharan Africa, and that population wielded more political and economic power, for longer, than white populations anywhere else in Africa. Cape Town has a much higher murder rate than Chicago and a much uglier racial history. This makes intuitive sense. If I were poor and constantly saw people who looked different from me holding most of the wealth, I'd be pretty pissed off.