Blog 3: Polling

This blog is an ongoing assignment for Gov 1347: Election Analytics, a course at Harvard College taught by Professor Ryan Enos. It will be updated weekly and culminate in a predictive model of the 2022 midterm elections.

In this week’s blog, I will be incorporating polling into my predictive modeling. I will create predictions using polling in two ways. First, I’ll update a version of my economic fundamentals model from last week with polling data. Next, I’ll incorporate district-level polls and partisanship indicators to forecast seat share in the 2022 midterms, completing blog extension 3.

District-Level Polling Model

Of course, thus far I have only considered national generic-ballot polls, not district-level polling. In this next model, I’ll use district-level polling to predict seat share in the 2022 midterms. Unfortunately, polling is not available for every district-level race. Polling is expensive, and there’s little need to poll non-competitive districts. Thus, we also need to be able to predict results for congressional districts that have not been polled this cycle. To deal with this, forecasters like FiveThirtyEight have used algorithms like CANTOR, which infer results in unpolled districts from demographically similar districts that have been polled. For my model, I will use a simpler method to forecast the winner of districts that have little polling data available. The Cook Political Report’s Partisan Voting Index (PVI) measures how Republican or Democratic a district is relative to the nation as a whole, based on presidential election data from previous cycles. For example, a district with a PVI of D+3 is around 3 points more Democratic than the nation as a whole in terms of two-party vote share.

We can’t just use PVI on its own to predict the results of congressional districts since this would presume we’re in a national environment where Democratic and Republican support is equal. But we know from my previous popular vote model that the national environment currently skews Republican: based on my combined fundamentals/polling model, Democrats are only on track to win 49.09 percent of the two-party vote. In other words, relative to an even 50-50 split, Democrats trail by 0.91 percentage points in terms of the two-party vote, which gives a national environment value of -0.91. If we trust my national popular vote model, we can take this value to represent the national partisan environment from which congressional districts deviate. We can thus predict the two-party outcome in each congressional district by adding our value for national partisanship to the PVI of that district. This method assumes that all districts undergo uniform swing from election to election. This assumption likely fails, since one group of demographically similar districts may swing differently from another. But for now, we can use uniform swing as a heuristic.
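The uniform-swing adjustment described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the post: the function names and the example districts are hypothetical, PVI is assumed to be stored as a signed number (positive for D+, negative for R+), and only the 49.09 figure comes from the national model.

```python
# A minimal sketch of the uniform-swing adjustment, under stated assumptions.
# Only NATIONAL_DEM_TWO_PARTY comes from the post; all names and the example
# districts below are hypothetical.

NATIONAL_DEM_TWO_PARTY = 49.09  # predicted Democratic two-party share (percent)


def national_environment(dem_two_party_share: float) -> float:
    """Democratic deviation from an even 50-50 split, in percentage points."""
    return dem_two_party_share - 50.0


def adjusted_partisanship(pvi: float, environment: float) -> float:
    """Uniform swing: shift every district's signed PVI by the same amount."""
    return pvi + environment


def predicted_dem_share(pvi: float, environment: float) -> float:
    """Predicted Democratic two-party vote share in a district (percent)."""
    return 50.0 + adjusted_partisanship(pvi, environment)


env = national_environment(NATIONAL_DEM_TWO_PARTY)  # roughly -0.91

# Hypothetical districts: D+3, R+1, and D+0.5.
example_pvi = {"District A": 3.0, "District B": -1.0, "District C": 0.5}

for name, pvi in example_pvi.items():
    share = predicted_dem_share(pvi, env)
    winner = "Dem" if share > 50 else "Rep" if share < 50 else "Tied"
    print(f"{name}: predicted Dem share {share:.2f}% -> {winner}")
```

Note that the hypothetical D+0.5 district flips to a narrow Republican lean once the -0.91 environment is applied, which is exactly the kind of case where relying on raw PVI alone would mislead.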

Outlined below are the outputs for two district-level models. The PVI model simply relies on the PVI of each district, adjusted based on the current national environment. The PVI_polls model only uses this adjusted PVI when polling is unavailable and otherwise defers to the district-level polling.

Model            PVI      PVI_polls
Dem Wins         208.00   209.00
Rep Wins         227.00   224.00
Tied Seats       0.00     2.00
Dem Seat Share   0.48     0.48
Rep Seat Share   0.52     0.51
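The fallback rule behind the PVI_polls column, along with the seat tally, might look something like the Python sketch below. It rests on several assumptions that the post does not confirm: each district's poll is summarized as a single Democratic deviation from 50 percent, a value of exactly zero counts as a tied seat, and the function names and example numbers are all hypothetical.

```python
# A minimal sketch of the "polls if available, otherwise adjusted PVI" rule.
# Assumptions (not confirmed by the post): polls are summarized as the
# Democratic two-party share minus 50, and a value of exactly 0 is a tie.
from typing import Dict, List, Optional, Tuple


def district_partisanship(adjusted_pvi: float,
                          poll_value: Optional[float]) -> float:
    """Defer to district polling when it exists; else use adjusted PVI."""
    return adjusted_pvi if poll_value is None else poll_value


def tally_seats(values: List[float]) -> Dict[str, float]:
    """Count wins, ties, and seat shares from district partisanship values."""
    dem = sum(1 for v in values if v > 0)
    rep = sum(1 for v in values if v < 0)
    tied = sum(1 for v in values if v == 0)
    total = len(values)
    return {
        "dem_wins": dem,
        "rep_wins": rep,
        "tied": tied,
        "dem_seat_share": dem / total,
        "rep_seat_share": rep / total,
    }


# Hypothetical districts as (adjusted PVI, poll value or None if unpolled).
districts: List[Tuple[float, Optional[float]]] = [
    (4.09, None),    # unpolled, leans Democratic after adjustment
    (-1.91, 1.0),    # polled: the poll overrides a Republican-leaning PVI
    (-0.41, 0.0),    # polled: an exactly even poll counts as tied
    (-10.91, None),  # unpolled, safely Republican
]

values = [district_partisanship(pvi, poll) for pvi, poll in districts]
result = tally_seats(values)
print(result)
```

Treating an even poll as a tied seat is one plausible reading of why only the PVI_polls column reports ties, but that is a guess about the table rather than something the post states.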


Both models predict very similar seat counts for the parties, suggesting that PVI with adjustments for the national environment may be a decent stand-in for district-level polling in unpolled districts. Shown below are histograms plotting the distribution of the predicted district partisanship for the two models. (Note that a district partisanship value of 20 means we’d expect Democrats to win roughly 50 + 20 = 70% of the two-party vote.) Plotted on the bottom is the distribution of baseline PVI for all congressional districts, without adjusting for the national environment.

Overall, PVI provides a helpful heuristic for modeling district-level results when district-level polls are unavailable. I’ll continue to explore the value of these types of district ratings as we consider expert predictions next week.