Model Reflection

In this blog post, I’ll reflect on the performance of my model using actual results from the 2022 midterms. As a reminder, my final model found that Republicans were strongly favored to take the House. My model predicted that Democrats would win 200 seats and Republicans would win 235 seats. Moreover, I gave Democrats an 8.14% chance of winning the House.

In 80% of simulations of my model, Democrats won between 185 and 216 seats, and Republicans won between 219 and 250 seats.
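An interval like this can be read directly off the simulated seat totals. A minimal sketch of the idea, assuming a hypothetical array `dem_seat_sims` of simulated Democratic seat counts (the data below is fake, just to make the example run):

```python
import numpy as np

# Hypothetical simulated Democratic seat totals, one entry per simulation (fake data).
rng = np.random.default_rng(0)
dem_seat_sims = np.round(rng.normal(loc=200, scale=12, size=10_000))

# The 80% prediction interval is the 10th-90th percentile range of the simulated totals.
lower, upper = np.percentile(dem_seat_sims, [10, 90])
print(f"80% interval: {lower:.0f} to {upper:.0f} Democratic seats")
```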

The drop-down menu below displays histograms with my predictions for every district in the nation.

Accuracy of My Model

Clearly, my model underestimated Democrats' chances, predicting they would win 200 seats when in reality they are on track to win 213. But the final 213-222 seat split was still a reasonably likely outcome according to my topline histogram, sitting within the 80% prediction interval. This suggests that although my final point prediction was off, my model did a reasonable job of capturing the uncertainty in the election.

Using the probabilities generated for each of my district forecasts, I calculated a Brier score for my model of 0.03396. And using the point predictions for each of my district forecasts in contested races, I calculated a root mean square error of 2.6852. According to Kiara's calculations, FiveThirtyEight's district models had a Brier score of 0.032 and an RMSE of 3.99, so although my overall seat prediction was further from the mark, the accuracy of my model was in line with that of the major forecasters.
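The calculations themselves are simple. Here is a rough sketch of how such metrics can be computed, assuming a hypothetical data frame `forecast` with columns `dem_win_prob` (on a 0-1 scale), `dem_won` (0/1), `pred_dem_vote`, and `actual_dem_vote`; these names are illustrative, not my actual code:

```python
import numpy as np
import pandas as pd

def brier_score(win_probs: pd.Series, outcomes: pd.Series) -> float:
    """Mean squared difference between forecast probabilities (0-1) and outcomes (0/1)."""
    return float(np.mean((win_probs - outcomes) ** 2))

def rmse(predicted: pd.Series, actual: pd.Series) -> float:
    """Root mean square error of vote-share point predictions, in percentage points."""
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# One row per district; restrict RMSE to contested races.
# brier = brier_score(forecast["dem_win_prob"], forecast["dem_won"])
# error = rmse(forecast.loc[contested, "pred_dem_vote"],
#              forecast.loc[contested, "actual_dem_vote"])
```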

Listed below are the districts that I called incorrectly, sorted by the Democratic win probability my model assigned to them.

| District | Actual Winner | Dem Win Prob (%) | Actual Dem Vote (%) | Predicted Dem Vote (%) | Lower (%) | Upper (%) |
| --- | --- | --- | --- | --- | --- | --- |
| WA-03 | Democrat | 5.91 | 50.48 | 41.70 | 32.84 | 50.57 |
| NC-13 | Democrat | 13.69 | 51.32 | 44.22 | 35.36 | 53.09 |
| OH-13 | Democrat | 18.13 | 52.59 | 45.26 | 36.40 | 54.13 |
| AK-01 | Democrat | 18.30 | 64.86 | 45.36 | 36.49 | 54.23 |
| CO-08 | Democrat | 22.22 | 50.38 | 45.99 | 37.13 | 54.86 |
| PA-17 | Democrat | 23.31 | 53.16 | 46.22 | 37.35 | 55.09 |
| NM-02 | Democrat | 26.26 | 50.35 | 46.62 | 37.75 | 55.49 |
| MI-03 | Democrat | 28.41 | 56.67 | 47.03 | 38.16 | 55.89 |
| ME-02 | Democrat | 31.49 | 51.80 | 47.43 | 38.56 | 56.30 |
| NC-01 | Democrat | 33.52 | 52.28 | 47.67 | 38.80 | 56.54 |
| OH-01 | Democrat | 34.20 | 52.46 | 47.88 | 39.01 | 56.74 |
| IL-17 | Democrat | 34.78 | 51.73 | 47.91 | 39.04 | 56.78 |
| IL-13 | Democrat | 40.88 | 55.89 | 48.77 | 39.90 | 57.64 |
| CA-13 | Democrat | 43.84 | 50.27 | 49.30 | 40.44 | 58.17 |
| PA-08 | Democrat | 43.84 | 51.25 | 49.11 | 40.24 | 57.98 |
| RI-02 | Democrat | 47.02 | 51.87 | 49.75 | 40.88 | 58.62 |
| OR-06 | Democrat | 47.23 | 51.14 | 49.72 | 40.85 | 58.58 |
| OR-04 | Democrat | 48.37 | 53.97 | 49.79 | 40.92 | 58.66 |
| CO-07 | Democrat | 48.70 | 57.65 | 49.74 | 40.87 | 58.61 |
| OH-09 | Democrat | 49.65 | 56.55 | 49.98 | 41.11 | 58.85 |
| IA-03 | Republican | 50.11 | 49.65 | 50.05 | 41.18 | 58.92 |
| CA-22 | Republican | 51.36 | 47.57 | 50.12 | 41.25 | 58.99 |
| NY-04 | Republican | 53.25 | 48.12 | 50.43 | 41.56 | 59.29 |
| VA-02 | Republican | 56.01 | 48.29 | 50.73 | 41.86 | 59.59 |
| NJ-07 | Republican | 63.31 | 47.70 | 51.76 | 42.90 | 60.63 |
| NY-17 | Republican | 84.55 | 49.53 | 55.29 | 46.42 | 64.16 |

I called 26 races incorrectly (FiveThirtyEight, for reference, called 23 incorrectly). Notably, 20 of these were races I called for Republicans that Democrats ultimately won, while only 6 of the districts I predicted would go Democratic in fact went Republican. This aligns with my broader underestimation of Democrats.

Many of the districts that I called incorrectly were those that I had pegged as toss-ups. I gave Democrats a 50.11% chance of winning Iowa's 3rd, so a Republican victory there was essentially a coin flip. In 12 of the districts I called incorrectly, I gave the actual winner at least a 40% chance of victory. Some of my bigger misses were upsets that came as a shock to everyone. I gave Democrat Marie Gluesenkamp Perez only a 5.91% chance of winning Washington's 3rd District — FiveThirtyEight gave her only a 2% chance. Similarly, my model thought Democrat Sean Patrick Maloney would win New York's 17th in 84.55% of simulations. FiveThirtyEight was less certain of Maloney's victory but still confident enough to give him 70-30 odds. Other miscalled races came down to clear flaws in my methodology. I gave Democrat Mary Peltola only an 18.30% chance of winning Alaska's at-large district, despite the conventional (and ultimately correct) wisdom that Peltola was competitive and even favored in the race. My forecast was off because it did not consider Alaska's ranked-choice voting (RCV) system, the poor candidate quality of the Republicans in the race, or the quirks of Alaska politics.

The map below plots the districts that I called incorrectly.

Republicans won upset victories in two races in New York, and Democrats outperformed my model in districts across the Midwest and in South Carolina.

Turning now to my vote share point predictions, the histogram below plots the model’s error in contested seats (that is, the actual Democratic vote share minus the predicted Democratic vote share).

My point predictions are slightly skewed toward underestimating Democrats (the median error, for reference, is 0.70 percentage points), but the distribution of errors is still roughly centered around 0. My model was within 5 percentage points of the true Democratic vote share in the majority of contested districts, and within 10 percentage points in nearly all of them. A notable outlier is Alaska, though this is effectively just a data error caused by Alaska's RCV system. (The data set only counts the votes of the top vote-getting Republican, rather than adding the votes of the two Republicans in the race or adjusting for the RCV results in some other way.)

The map below plots the errors across districts. Grey districts were either uncontested or had multiple Republicans or Democrats on the ballot.

Once again, Republicans overperformed my model in New York state, as well as Florida and parts of California. My model underestimated Democrats across the Midwest and in South Carolina.

Finally, the graph below plots the actual Democratic two-party vote share versus the predicted Democratic two-party vote share.

Overall, my predictions track the actual results fairly closely, with districts generally falling along the 45-degree line.

Where I Went Wrong

I think my predictions were fairly reasonable given the inputs I was feeding my model. My forecast was based on fundamental conditions, such as the partisanship and demographics of districts and the fact that this was a midterm election cycle with an incumbent Democratic president. These fundamentals looked objectively bad for Democrats according to the historical data my model was trained on. My model also incorporated national polling, and because it was trained on recent cycles in which polls had underestimated Republicans, it expected another modest polling error in Republicans' favor. Together, these factors led the model to overestimate Republicans' chances.

In particular, my model was unable to account for significant regional variation in party performance. This midterm cycle saw a wide range of swings across states: Florida and New York experienced red waves, while Michigan experienced a blue wave. But my model simply applied a national swing (based on the generic ballot polling average) to all districts and was thus unable to capture regional correlations.
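To make the limitation concrete, my swing adjustment amounted to something like the following simplified sketch (not my exact code; `district_lean` and `national_swing` are stand-in names):

```python
# Uniform national swing: every district shifts by the same amount, derived from
# the generic-ballot polling average, regardless of state or region.
def predict_dem_share(district_lean: float, national_swing: float) -> float:
    """district_lean: baseline Democratic two-party share for the district (e.g., from PVI).
    national_swing: shift implied by the national generic-ballot polling average."""
    return district_lean + national_swing

# Because every district gets the identical shift, a red wave in Florida and a
# blue wave in Michigan cannot both show up in the same simulation.
```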

I worry that this lack of attention to state-based dynamics left my model unable to capture the particular quirks of this election cycle. As evidence, my model generally underestimated Democrats in places where they overperformed Biden and the baseline partisanship of districts (the Midwest) and underestimated Republicans in places where Democrats underperformed Biden and the baseline partisanship of districts (Florida, New York, California). In other words, my model was overly reliant on district-level fundamentals like PVI and did not anticipate the particular regional contexts of certain races in 2022.

To test this hypothesis further, I may regress my model's error on the 2020-to-2022 district-level vote swing. If my model is indeed overly reliant on district-level fundamentals, I'd expect a strong positive correlation between Democratic overperformance relative to my model and Democratic overperformance relative to 2020.
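A rough version of that check might look like the following, assuming a hypothetical data frame `df` with columns `model_error` (actual minus predicted Democratic share) and `swing_20_22` (2022 Democratic share minus 2020 Democratic share); the column names are illustrative:

```python
import statsmodels.api as sm

# Regress the model's district-level error on the 2020-to-2022 vote swing.
X = sm.add_constant(df["swing_20_22"])
fit = sm.OLS(df["model_error"], X).fit()
print(fit.summary())

# A significantly positive coefficient on swing_20_22 would support the hypothesis:
# the model misses most in districts that moved most relative to 2020.
```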

Improvements for Next Time

One way to address the pitfalls discussed above would be to consider the correlations between districts based on geographic proximity and demographic similarity. FiveThirtyEight’s CANTOR algorithm, for example, is able to make predictions for districts with sparse data based on data in similar districts. This would allow me to consider region- and demographic-based uncertainties rather than relying on a uniform swing model.
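One lightweight way to approximate that idea, without anything as elaborate as CANTOR, would be to let each district's swing borrow from similar districts. Below is a sketch under assumed inputs: a `features` matrix of standardized geographic and demographic variables and a vector of `swings` (observed or simulated); both are hypothetical, not part of my current model:

```python
import numpy as np

def similarity_weighted_swing(features: np.ndarray, swings: np.ndarray,
                              bandwidth: float = 1.0) -> np.ndarray:
    """Blend each district's swing with swings in similar districts.

    features: (n_districts, n_features) standardized matrix (region indicators,
              density, education levels, etc.)
    swings:   (n_districts,) raw swings, e.g., drawn in a simulation
    """
    # Pairwise squared distances between district feature vectors.
    diffs = features[:, None, :] - features[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # Gaussian kernel weights: similar districts get more weight.
    weights = np.exp(-dists / (2 * bandwidth ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ swings
```

In a simulation framework, smoothing swings this way would induce correlated errors among neighboring or demographically similar districts, so a red wave in one New York seat would pull its neighbors in the same direction.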

Another approach could be to consider the effects of top-of-ballot races on congressional elections. States in which Senate and gubernatorial candidates of a certain party performed well also tended to see that party fare well in House races (e.g. for Republicans, Ron DeSantis and Lee Zeldin; for Democrats, Gretchen Whitmer). In future models, I may therefore include polling for Senate and gubernatorial candidates, which provides a means of considering regional variations. It also provides a way to integrate potentially valuable state-level polling data that I did not consider this time around.