My Model of the Presidential Election

Hey Mom, I Coded a Cholesky Decomposition

Oct 02, 2024

I’ve followed Nate Silver’s models of presidential elections since 2008. I greatly respect his method but don’t always agree with his assumptions. For instance, penalizing Harris’s polling numbers for a “convention bounce” seemed dubious when Harris had only recently entered the race. I’m also not a big fan of discounting Harris’s polling numbers based on “economic fundamentals” when unemployment is around 4%, inflation is under 3%, gas is about $3 a gallon, and the stock market is near record highs. These adjustments cost Harris a couple tenths of a point in Nate’s polling averages.

The key to coding a “professional grade” model is capturing the correlations between the different states. Calculating the correlations is easy; it only takes a spreadsheet or a calculator and some patience. Producing random variables that follow these correlations is trickier, because every swing state outcome is correlated with, and therefore constrained by, every other swing state. Thankfully, ChatGPT (GPT-4) is a totally awesome tool if you sort of know what you are doing. Even better, it is amazingly good at linear algebra. It suggested code that can factor the correlation matrix into a Cholesky decomposition lickety-split. Multiply the resulting lower-triangular matrix by a vector of independent, normally distributed variables and, voilà, you can simulate state results in a way that builds in 45 separate correlations.
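To make the trick in the paragraph above concrete, here is a minimal two-variable sketch. It is not the model itself: the names `choleskyLower`, `correlate` and `sampleCorrelation` are made up for illustration, and the 0.9 correlation in the usage note is simply the MN/NE-2 value from the matrix in the listing further down. The idea is to factor a correlation matrix into a lower-triangular matrix, multiply it by independent standard normals, and check that the simulated pairs really are correlated as intended.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Lower-triangular L with L * L^T == corr (corr must be symmetric, positive definite).
Matrix choleskyLower(const Matrix& corr) {
    const std::size_t n = corr.size();
    Matrix lower(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < j; ++k) {
                sum += lower[i][k] * lower[j][k];
            }
            lower[i][j] = (i == j) ? std::sqrt(corr[i][i] - sum)
                                   : (corr[i][j] - sum) / lower[j][j];
        }
    }
    return lower;
}

// Correlated draws x = L * z, where z holds independent standard normals.
std::vector<double> correlate(const Matrix& lower, const std::vector<double>& z) {
    std::vector<double> x(z.size(), 0.0);
    for (std::size_t i = 0; i < z.size(); ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            x[i] += lower[i][j] * z[j];
        }
    }
    return x;
}

// Simulates `draws` pairs with target correlation rho and returns the sample correlation.
double sampleCorrelation(double rho, int draws, unsigned seed) {
    const Matrix corr = {{1.0, rho}, {rho, 1.0}};
    const Matrix lower = choleskyLower(corr);
    std::mt19937 generator(seed);  // seeded once, outside the loop
    std::normal_distribution<double> standardNormal(0.0, 1.0);
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int d = 0; d < draws; ++d) {
        const double z0 = standardNormal(generator);
        const double z1 = standardNormal(generator);
        const std::vector<double> x = correlate(lower, {z0, z1});
        sx += x[0];
        sy += x[1];
        sxx += x[0] * x[0];
        syy += x[1] * x[1];
        sxy += x[0] * x[1];
    }
    const double n = draws;
    const double cov = sxy / n - (sx / n) * (sy / n);
    const double varX = sxx / n - (sx / n) * (sx / n);
    const double varY = syy / n - (sy / n) * (sy / n);
    return cov / std::sqrt(varX * varY);
}
```

With a few hundred thousand draws, `sampleCorrelation(0.9, 200000, 42)` should land close to 0.9. The same two steps scale to the full 11-by-11 matrix; only the matrix size changes.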

I calibrated my model so that Harris’s share of the two-party “national” vote has a standard deviation of 1.62% and her shares of the two-party vote in Pennsylvania and Georgia have a mean standard deviation of 3%. Because a two-party margin is twice the vote share minus 100, this translates to roughly a 3.24% standard deviation in Harris’s popular vote margin and a 6% standard deviation in her margin in Pennsylvania/Georgia. My experience suggests I should use somewhat smaller variances on election day, perhaps vote share standard deviations of 1.5% and 2.5%, respectively. Intriguingly, because Harris is somewhat ahead in the polls and has far more plausible paths to victory than Trump, the choice of standard deviation doesn’t affect the final result much. This will change if either candidate opens a significant lead, in which case greater variance would constrain the front runner’s win probability.
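A quick way to see why the choice of standard deviation matters little in a close race is to look at one state in isolation. The sketch below is a simplification, not the simulation itself: it assumes a single normally distributed polling error, and the helper names `normalCdf`, `winProbability` and `marginSd` are hypothetical. The 50.6 polling average in the usage note is the Pennsylvania default from the listing further down.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF, written with the complementary error function.
double normalCdf(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Chance that a two-party vote share finishes above 50%, assuming the
// polling error is normal with standard deviation shareSd (in points).
double winProbability(double shareAvg, double shareSd) {
    return normalCdf((shareAvg - 50.0) / shareSd);
}

// margin = share - (100 - share) = 2 * share - 100,
// so the standard deviation of the margin is twice that of the share.
double marginSd(double shareSd) {
    return 2.0 * shareSd;
}
```

Under these assumptions, `winProbability(50.6, 3.0)` and `winProbability(50.6, 2.5)` differ by less than two percentage points, while the same change in standard deviation at a 53% average moves the answer by a bit more than four points. That is the pattern described above: a tighter distribution helps the front runner, and the effect grows with the lead.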

Right now, my model gives Harris a 58.4% chance of winning the electoral college.

Anyway, the best thing about having my own “professional grade” model is I can test different scenarios. Nine times out of ten, I convince myself that Nate Silver is basically right. However, I have a lot more confidence in his predictions having done the work myself.

I’m making my code public so anyone with a C++ compiler can try different scenarios for themselves. My model assumes Harris will win every state bluer than Minnesota and that Trump will win every state redder than Texas. Texas is the tipping point state only 0.1% of the time, so the odds that Virginia or Missouri ends up tipping the election are minuscule. I also assume that the Nebraska 2nd congressional district shows the same variance patterns as Minnesota.
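The safe-state assumption boils down to a fixed electoral vote base plus whatever the simulated contests add. Here is a small table-driven sketch of that tally. The 215 base and the per-state electoral votes are the ones used in the listing below; the `Contest` struct, `kSafeHarrisEv`, `harrisElectoralVotes` and `blueWallOnly` are hypothetical names for illustration. Texas is left out of the table because the listing tracks it without adding its electoral votes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One simulated contest: label, electoral votes, and whether Harris carried it.
struct Contest {
    std::string name;
    int electoralVotes;
    bool harrisWins;
};

// Electoral votes from states assumed safe for Harris (everything bluer than Minnesota).
constexpr int kSafeHarrisEv = 215;

// Safe base plus the electoral votes of every contest Harris carries.
int harrisElectoralVotes(const std::vector<Contest>& contests) {
    int total = kSafeHarrisEv;
    for (const Contest& c : contests) {
        if (c.harrisWins) {
            total += c.electoralVotes;
        }
    }
    return total;
}

// Example scenario: Harris holds MI, PA, WI, MN and NE-2 and loses the rest.
std::vector<Contest> blueWallOnly() {
    return {
        {"AZ", 11, false}, {"FL", 30, false}, {"GA", 16, false},
        {"MI", 15, true},  {"MN", 10, true},  {"NV", 6, false},
        {"NC", 16, false}, {"PA", 19, true},  {"WI", 10, true},
        {"NE-2", 1, true},
    };
}
```

In this scenario `harrisElectoralVotes(blueWallOnly())` comes to exactly 270, which is why the listing can treat a held blue wall (with Minnesota and NE-2) as a win on its own.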

My code follows; you can copy and paste it into your C++ compiler:

#include <iostream>

#include <random>

#include <vector>

#include <cmath>

#include<iomanip>

using namespace std;

// Function for Cholesky decomposition of a matrix

std::vector<std::vector<double>> choleskyDecomposition(const std::vector<std::vector<double>>& matrix) {

int n = matrix.size();

std::vector<std::vector<double>> lower(n, std::vector<double>(n, 0));

for (int i = 0; i < n; i++) {

for (int j = 0; j <= i; j++) {

double sum = 0;

for (int k = 0; k < j; k++) {

sum += lower[i][k] * lower[j][k];

}

if (i == j) {

lower[i][j] = std::sqrt(matrix[i][i] - sum);

}

else {

lower[i][j] = (matrix[i][j] - sum) / lower[j][j];

}

} // end for j

} // end for i

cout <<endl<<"FACTORIZING CORRELATION MATRIX" << endl;

for (short a = 0;a < lower.size();a++)

{

for (short b = 0;b <= a;b++)

{

cout<<setprecision(2) << lower[a][b]<<"\t";

}

std::cout << endl;

}

return lower;

}

int main() {

// Mean and standard deviation

double mean = 0;

double sigma = 1.45;

int dualwin(0);

// Correlation matrix; rows and columns are ordered AZ, FL, GA, MI, MN, NV, NC, PA, WI, TX, NE-2

std::vector<std::vector<double>> correlationMatrix = {

{1.0, 0.79911, -0.04574, 0.79701, 0.46629, 0.89996, 0.34071, 0.79601, 0.64514, 0.41656, 0.46629}, // AZ; last entry matches the NE-2 row so the matrix is symmetric

{0.79911, 1.0, 0.41431, 0.81127, 0.56702, 0.86392, 0.71487, 0.80164, 0.68268, 0.56121, 0.56702}, // FL

{-0.04574, 0.41431, 1.0, 0.01899, 0.45030, 0.03037, 0.85474, 0.07055, 0.04081, 0.73922, 0.45030},

{0.79701, 0.81127, 0.01899, 1.0, 0.57500, 0.83774, 0.43582, 0.90569, 0.86946, 0.29993, 0.57500},

{0.46629, 0.56702, 0.45030, 0.57500, 1.0, 0.44486, 0.57853, 0.73339, 0.74048, 0.74582, 0.90000}, // MN and NE-2 correlation

{0.89996, 0.86392, 0.03037, 0.83774, 0.44486, 1.0, 0.48683, 0.85462, 0.74400, 0.39811, 0.44486},

{0.34071, 0.71487, 0.85474, 0.43582, 0.57853, 0.48683, 1.0, 0.43292, 0.43005, 0.82933, 0.57853},

{0.79601, 0.80164, 0.07055, 0.90569, 0.73339, 0.85462, 0.43292, 1.0, 0.87445, 0.41170, 0.73339},

{0.64514, 0.68268, 0.04081, 0.86946, 0.74048, 0.74400, 0.43005, 0.87445, 1.0, 0.44291, 0.74048},

{0.41656, 0.56121, 0.73922, 0.29993, 0.74582, 0.39811, 0.82933, 0.41170, 0.44291, 1.0, 0.74582},

{0.46629, 0.56702, 0.45030, 0.57500, 0.90000, 0.44486, 0.57853, 0.73339, 0.74048, 0.74582, 1.0} // NE-2 correlation

};

// Cholesky decomposition of the correlation matrix

std::vector<std::vector<double>> lowerTriangular = choleskyDecomposition(correlationMatrix);

// Random number generation

std::random_device rd;

int exactly270(0);

int harrisev(0);

int dwin(0);

int pacount(0);

int mwsplit(0);

int fatalsplit(0);

int gacount(0);

int southernsplit(0);

int southernev(0);

int narrowpath(0);

int micount(0);

int withoutpa(0);

int flct(0);

int txct(0);

int ncct(0);

int nvct(0);

int mnct(0);

int azct(0);

int wict(0);

int azwin(0);

int gawin(0);

int flwin(0);

int miwin(0);

int mnwin(0);

int nvwin(0);

int ncwin(0);

int pawin(0);

int wiwin(0);

int txwin(0);

int ne2win(0);

int azloss(0);

int galoss(0);

int flloss(0);

int miloss(0);

int mnloss(0);

int nvloss(0);

int ncloss(0);

int paloss(0);

int wiloss(0);

int txloss(0);

int ne2ct(0);

int ne2loss(0);

int bluewallholds(0);

int harrisgood(0);

int trumpgood(0);

float correlsum(0);

int correlharris(0);

int correltrump(0);

int closeharris(0);

int georgiasave(0);

int trumpyga(0);

int bluega(0);

int paha(0);int patrump(0);

bool pa, mi, wi, mn, ga, nc, az, fl, nv, tx, ne2;

short j1;

float azavg, gaavg, flavg, miavg, mnavg, nvavg, ncavg, paavg, txavg, wiavg, ne2avg;

azavg = 49.35;flavg = 48.25;gaavg = 49.65;miavg = 51;mnavg = 52.95;

nvavg = 50.9;ncavg = 49.8;paavg = 50.6;ne2avg = 53.5;txavg = 47.1;

wiavg = 50.95;

cout << "press 1 to enter polling averages manually, any other key to use defaults: ";

cin >> j1;

if (j1 == 1)

{

cout << "enter AZ polling avg";

cin >> azavg;

cout << "enter FL polling avg";

cin >> flavg;

cout << "enter GA polling avg";

cin >> gaavg;

cout << "enter MI polling avg";

cin >> miavg;

cout << "enter MN polling avg";

cin >> mnavg;

cout << "enter NV polling avg";

cin >> nvavg;

cout << "enter NC polling avg";

cin >> ncavg;

cout << "enter PA polling avg";

cin >> paavg;

cout << "enter TX polling avg";

cin >> txavg;

cout << "enter WI polling avg";

cin >> wiavg;

cout << "enter NE-2 polling avg";

cin >> ne2avg;

}

for (int bl = 0; bl < 1000000; bl++) {

correlsum = 0;

pa = false; mi = false; wi = false; mn = false; ga = false; nc = false;

az = false; fl = false; tx = false; ne2 = false;nv = false;

ne2 = false;

harrisev = 215;

std::mt19937 generator(rd()); // Uses a new seed for each run

normal_distribution<double>ndistribution(0, 1.55);

double mean0 = ndistribution(generator);

normal_distribution<double> distribution(mean0, 1.45);

// Generate new uncorrelated random variables for each iteration

std::vector<double> uncorrelatedRandomVariables(11);

for (int i = 0; i < 11; ++i) {

uncorrelatedRandomVariables[i] = distribution(generator);

if (i < 10)

correlsum += uncorrelatedRandomVariables[i];

}

if (correlsum > 29.2)correlharris++;

else if (correlsum < -29.2)correltrump++;

float vs(0);

// Apply Cholesky transformation to create correlated variables

vector<double> correlatedRandomVariables(11, 0.0);

for (int i = 0; i < 11; ++i) {

for (int j = 0; j <= i; ++j) {

correlatedRandomVariables[i] += lowerTriangular[i][j] * uncorrelatedRandomVariables[j];

}

if (i == 0) { // Arizona, 11 EV

if (azavg + correlatedRandomVariables[0] > 50.0) {

az = true;

harrisev += 11;

azct++;

vs += correlatedRandomVariables[0];

}

}

else if (i == 1) { // Florida, 30 EV

if (flavg + correlatedRandomVariables[1] > 50.0) {

harrisev += 30;

fl = true;

vs += correlatedRandomVariables[1];

}

}

else if (i == 2) { // Georgia, 16 EV

if (correlatedRandomVariables[2] < -2.5)

trumpyga++;

if (gaavg + correlatedRandomVariables[2] > 50.0) {

gacount++;

harrisev += 16;

ga = true;

vs += correlatedRandomVariables[2];

if (correlatedRandomVariables[2] > 2.5)

bluega++;

}

}

else if (i == 3) { // Michigan, 15 EV

if (miavg + correlatedRandomVariables[3] > 50.0) {

harrisev += 15; mi = true;

micount++;

vs += correlatedRandomVariables[3];

}

}

else if (i == 4) { // Minnesota, 10 EV

if (mnavg + correlatedRandomVariables[4] > 50.0) {

harrisev += 10;

mn = true;

mnct++;

vs += correlatedRandomVariables[4];

}

}

else if (i == 5) { // Nevada, 6 EV

if (nvavg + correlatedRandomVariables[5] > 50.0) {

harrisev += 6;

nvct++;

nv = true;

vs += correlatedRandomVariables[5];

}

}

else if (i == 6) { // North Carolina, 16 EV

if (ncavg + correlatedRandomVariables[6] > 50.0) {

harrisev += 16;

nc = true;

ncct++;

vs += correlatedRandomVariables[6];

}

}

else if (i == 7) { // Pennsylvania, 19 EV

if (correlatedRandomVariables[7] < -3)

patrump++;

if (paavg + correlatedRandomVariables[7] > 50.0)

{

if (correlatedRandomVariables[7] > 3)

paha++;

harrisev += 19;

pacount++;

pa = true;

vs += correlatedRandomVariables[7];

}

}

else if (i == 8) { // Wisconsin, 10 EV

if (wiavg + correlatedRandomVariables[8] > 50.0) {

harrisev += 10;

wi = true;

wict++;

vs += correlatedRandomVariables[8];

}

}

else if (i == 9) { // Texas (tracked, but its EVs are not added)

if (txavg + correlatedRandomVariables[9] > 50.0) {

//harrisev += 38;

tx = true;

txct++;

vs += correlatedRandomVariables[9];

}

}

else if (i == 10) { // NE-2, 1 EV

if (ne2avg + correlatedRandomVariables[10] > 50.0) {

harrisev += 1;

ne2 = true;

ne2ct++;

}

}

} // end of per-state loop

// calibration

if (correlsum > 16.2)harrisgood++;

if (correlsum < -16.2)trumpgood++;

if (mi && pa && wi && mn && ne2)

bluewallholds++;

// Scenario analysis and output logic

if (harrisev > 269) {

dwin++;

if (az)azwin++;if (ga)gawin++;if (fl)flwin++;

if (mi)miwin++;if (mn)mnwin++;if (nv)nvwin++;

if (nc)ncwin++;if (pa)pawin++; if (wi)wiwin++;

if (tx)txwin++;if (ne2)ne2win++;

if (pa && wi && mi && mn && ne2)

mwsplit++;

}

else {

if (az)azloss++;if (ga)galoss++;if (fl)flloss++;

if (mi)miloss++;if (mn)mnloss++;if (nv)nvloss++;

if (nc)ncloss++;if (pa)paloss++;if (wi)wiloss++;

if (tx)txloss++;if (ne2)ne2loss++;

if (pa && wi && mi && mn && ne2)

mwsplit++;

if (!mwsplit && harrisev < 270)

fatalsplit++;

}

if (ga || nc || az || fl || tx)

southernev++; // NE-2 excluded from Southern EVs

if (ga != nc) southernsplit++;

if (harrisev == 270)exactly270++;

if (harrisev > 269 && harrisev < 289)

closeharris++;

if (ga && !mi && harrisev > 269) // Harris wins this run with Georgia but without Michigan

georgiasave++;

}

std::cout << std::endl << "HARRIS WON " << dwin << " out of 1 Million times"<<endl;

cout << endl << "she got exactly 270 EVs " << exactly270 << " times" << endl<<endl;

cout << "\t the blue wall holds " << bluewallholds << " times";

cout<<endl << " Blue Wall Cracks " << 1000000 - mwsplit << " times \t of which Harris salvages " << dwin - bluewallholds << " wins";

cout << endl << " Harris wins a southern state (GA/AZ/NC/FL/TX): " << southernev <<" times"<< endl;

cout << endl << "State\t" << "wins with state\t\t " << "losses with state\t\t"

<< "wins without state\t\t" << "losses without state" << endl;

cout << "AZ\t\t" << azwin << "\t\t\t" << azloss << "\t\t\t\t"

<< dwin - azwin << "\t\t\t\t" << 1000000 - dwin - azloss << endl;

cout << "GA\t\t" << gawin << "\t\t\t" << galoss << "\t\t\t\t"

<< dwin - gawin << "\t\t\t\t" << 1000000 - dwin - galoss << endl;

cout << "FL\t\t" << flwin << "\t\t\t" << flloss << "\t\t\t\t"

<< dwin - flwin << "\t\t\t\t" << 1000000 - dwin - flloss << endl;

cout << "MI\t\t" << miwin << "\t\t\t" << miloss << "\t\t\t\t"

<< dwin - miwin << "\t\t\t\t" << 1000000 - dwin - miloss << endl;

cout << "MN\t\t" << mnwin << "\t\t\t" << mnloss << "\t\t\t\t"

<< dwin - mnwin << "\t\t\t\t" << 1000000 - dwin - mnloss << endl;

cout << "NV\t\t" << nvwin << "\t\t\t" << nvloss << "\t\t\t\t"

<< dwin - nvwin << "\t\t\t\t" << 1000000 - dwin - nvloss << endl;

cout << "NC\t\t" << ncwin << "\t\t\t" << ncloss << "\t\t\t\t"

<< dwin - ncwin << "\t\t\t\t" << 1000000 - dwin - ncloss << endl;

cout << "PA\t\t" << pawin << "\t\t\t" << paloss << "\t\t\t\t"

<< dwin - pawin << "\t\t\t\t" << 1000000 - dwin - paloss << endl;

cout << "TX\t\t" << txwin << "\t\t\t" << txloss << "\t\t\t\t"

<< dwin - txwin << "\t\t\t\t" << 1000000 - dwin - txloss << endl;

cout << "WI\t\t" << wiwin << "\t\t\t" << wiloss << "\t\t\t\t"

<< dwin - wiwin << "\t\t\t\t" << 1000000 - dwin - wiloss << endl;

cout << "NE-2\t\t" << ne2win << "\t\t\t" << ne2loss << "\t\t\t\t"

<< dwin - ne2win << "\t\t\t\t" << 1000000 - dwin - ne2loss << endl;

cout << endl << endl << "harris overperforms: " << correlharris << " trump overperforms: " << correltrump;

cout << endl << "harris wins close election with <289EV: " << closeharris;

cout << endl << "georgia saves michigan " << georgiasave;

cout << endl << "PA >+1sd for harris " << paha << " and >1 pro trump: " << patrump;

cout << endl << "GA calibration: " << bluega << " / " << trumpyga;

cout << endl << "swing state corr. errors >1SD harris: " << harrisgood << " \t and >1SD trump: " << trumpgood;

return 0;

}

David’s Determinist Substack
