Weighting and Its Consequences

Variance Reduction in Discrete Outcomes and Its Implications for Survey Aggregation

Mark Rieke

May 15, 2025

Disclaimer

The views and opinions expressed in this presentation are wholly my own and do not necessarily represent those of my employer, Game Data Pros, Inc. (GDP).

Outline

  • Setting the Stage
  • Recreating Results
  • Extending the Example
  • Adjustments for Aggregators

Setting the Stage

Setting the Stage: About Me

Setting the Stage: Pennsylvania Polling

date          sample_size  margin
September 24  760          -
September 24  582          +3.1%
September 23  601          +2.2%
September 23  384          -
September 22  644          -
September 20  768          -
September 20  760          +1.1%
September 19  1,020        -
September 19  432          -
September 19  752          +2.1%

Recreating Results

Recreating Results

group  group_mean  population  p_respond
A      400         60%         10%
B      800         20%         5%
C      700         10%         3%
D      600         10%         1%

  • True population mean: 530
  • Responses sampled from \(\mathcal{N}(\mu_g,50)\)
  • Simple population weighting strategy: \(w_g = \frac{P_g}{N_g / \sum_{g'} N_{g'}}\), where \(P_g\) is group \(g\)’s population share and \(N_g\) is its number of respondents
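The setup above can be sketched as a small simulation. This is a minimal illustration rather than the code behind the slides: it assumes that 50 is the standard deviation of the responses, that 10,000 people are contacted per survey (a hypothetical number), and all function names are made up for the example.

```python
import random
import statistics

# (group mean, population share, response probability), copied from the table above.
GROUPS = {
    "A": (400, 0.60, 0.10),
    "B": (800, 0.20, 0.05),
    "C": (700, 0.10, 0.03),
    "D": (600, 0.10, 0.01),
}
SIGMA = 50  # assumption: treated as the standard deviation of responses


def simulate_survey(rng, n_contacted=10_000):
    """Contact n_contacted people; return {group: [observed responses]}."""
    names = list(GROUPS)
    shares = [GROUPS[g][1] for g in names]
    responses = {g: [] for g in names}
    for g in rng.choices(names, weights=shares, k=n_contacted):
        mean, _, p_respond = GROUPS[g]
        if rng.random() < p_respond:
            responses[g].append(rng.gauss(mean, SIGMA))
    return responses


def unweighted_mean(responses):
    return statistics.fmean([y for ys in responses.values() for y in ys])


def weighted_mean(responses):
    """Weight each respondent in group g by w_g = P_g / (N_g / total N)."""
    n_total = sum(len(ys) for ys in responses.values())
    num = den = 0.0
    for g, ys in responses.items():
        if ys:  # an empty group contributes nothing and is renormalized away
            w = GROUPS[g][1] / (len(ys) / n_total)
            num += w * sum(ys)
            den += w * len(ys)
    return num / den


def run(n_sims=100, seed=1):
    """Return the unweighted and weighted estimate from each simulated survey."""
    rng = random.Random(seed)
    surveys = [simulate_survey(rng) for _ in range(n_sims)]
    return {
        "unweighted": [unweighted_mean(s) for s in surveys],
        "weighted": [weighted_mean(s) for s in surveys],
    }


if __name__ == "__main__":
    for name, estimates in run().items():
        print(name, round(statistics.fmean(estimates), 1),
              round(statistics.stdev(estimates), 1))
```

Under these assumptions the unweighted mean is pulled toward group A, whose members respond most often, while the weighted mean centers on the true value of 530.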

Extending the Example

Extending the Example

  • Little and Vartivarian demonstrate the effect of weighting with a continuous outcome.
  • A broad class of survey results are discrete outcomes.
  • How do these effects hold up in the discrete case?

Extending the Example

group  group_mean  population  p_respond
A      3%          50%         5%
B      97%         50%         7%

  • True population mean: 50%
  • Responses sampled from \(\text{Bernoulli}(\theta_g)\)
  • Simple population weighting strategy: \(w_g = \frac{P_g}{N_g / \sum_{g'} N_{g'}}\)
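A discrete version of the simulation might look like the sketch below. It is an illustration under assumptions, not the original code: the number of people contacted (4,000) and the function names are hypothetical, and the standard error is taken as the standard deviation of the estimates across simulated surveys.

```python
import random
import statistics

# (P(outcome = 1), population share, response probability), from the table above.
GROUPS = {"A": (0.03, 0.50, 0.05), "B": (0.97, 0.50, 0.07)}


def simulate_survey(rng, n_contacted=4_000):
    """Return {group: [0/1 responses]} for one simulated survey."""
    names = list(GROUPS)
    shares = [GROUPS[g][1] for g in names]
    responses = {g: [] for g in names}
    for g in rng.choices(names, weights=shares, k=n_contacted):
        theta, _, p_respond = GROUPS[g]
        if rng.random() < p_respond:
            responses[g].append(1 if rng.random() < theta else 0)
    return responses


def estimates(responses):
    """Return (unweighted, weighted) estimates of the population proportion."""
    n_total = sum(len(ys) for ys in responses.values())
    unweighted = sum(sum(ys) for ys in responses.values()) / n_total
    # With w_g = P_g / (N_g / n_total), the weighted mean reduces to
    # sum_g P_g * (group sample mean), assuming every group has respondents.
    weighted = sum(GROUPS[g][1] * sum(ys) / len(ys) for g, ys in responses.items())
    return unweighted, weighted


def simulated_standard_errors(n_sims=300, seed=1):
    """Return {estimator: (mean of estimates, standard deviation of estimates)}."""
    rng = random.Random(seed)
    draws = [estimates(simulate_survey(rng)) for _ in range(n_sims)]
    unweighted, weighted = zip(*draws)
    return {
        "unweighted": (statistics.fmean(unweighted), statistics.stdev(unweighted)),
        "weighted": (statistics.fmean(weighted), statistics.stdev(weighted)),
    }
```

With group means this far apart, the weighted estimate should be both closer to 50% and less variable than the unweighted one.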

Extending the Example

  • We don’t observe the same effects for cases 1 and 2.
  • I drastically increased the correlation with the outcome to see an effect in cases 3 and 4.
  • New simulation: how much correlation with the outcome is needed to see a reduction in variance?
    • Vary the correlation with nonresponse and with the outcome
    • Record the simulated standard error
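One way to set up such a grid is sketched below. The parameterization is an assumption made for illustration: two equal-sized groups, a hypothetical outcome_gap controlling how strongly group membership predicts the outcome, a hypothetical response_gap controlling how strongly it predicts response, and a fixed number of respondents per survey.

```python
import random
import statistics


def simulated_se(outcome_gap, response_gap, n_respondents=200, n_sims=300, seed=1):
    """Simulated standard errors (unweighted, weighted) for two equal-sized groups.

    outcome_gap:  theta_B - theta_A, centered on 0.5 (correlation with the outcome).
    response_gap: respondent share of group B minus 0.5 (correlation with nonresponse).
    """
    rng = random.Random(seed)
    theta = {"A": 0.5 - outcome_gap / 2, "B": 0.5 + outcome_gap / 2}
    share_b = 0.5 + response_gap  # share of respondents who belong to group B
    unweighted, weighted = [], []
    for _ in range(n_sims):
        counts = {"A": [0, 0], "B": [0, 0]}  # [respondents, positive responses]
        for _ in range(n_respondents):
            g = "B" if rng.random() < share_b else "A"
            counts[g][0] += 1
            counts[g][1] += rng.random() < theta[g]
        n_a, y_a = counts["A"]
        n_b, y_b = counts["B"]
        if n_a == 0 or n_b == 0:
            continue  # weights are undefined when a group has no respondents
        unweighted.append((y_a + y_b) / n_respondents)
        weighted.append(0.5 * y_a / n_a + 0.5 * y_b / n_b)  # population is 50/50
    return statistics.stdev(unweighted), statistics.stdev(weighted)


if __name__ == "__main__":
    for outcome_gap in (0.0, 0.4, 0.8):
        for response_gap in (0.0, 0.2, 0.4):
            se_u, se_w = simulated_se(outcome_gap, response_gap)
            print(outcome_gap, response_gap, round(se_u, 4), round(se_w, 4))
```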

Extending the Example

  • When the weighting variable is uncorrelated with nonresponse, any amount of correlation with the outcome decreases the variance.
  • As correlation with nonresponse increases, so too must correlation with the outcome to see a reduction in variance.
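These two observations can be checked against a rough analytic approximation. The sketch below is not from the slides: it assumes a fixed number of respondents n drawn independently, uses p(1 - p)/n for the unweighted proportion, and uses a first-order approximation (replacing each group's respondent count by its expected value) for the weighted estimate. It compares variances only and ignores the bias of the unweighted estimate.

```python
import math


def approx_standard_errors(theta, pop_share, resp_share, n):
    """First-order standard errors (unweighted, weighted) with n respondents.

    theta:      per-group probability of a positive response
    pop_share:  per-group share of the population (P_g)
    resp_share: per-group share of respondents
    """
    # Unweighted: n independent draws from the respondent mixture.
    p_bar = sum(q * t for q, t in zip(resp_share, theta))
    var_unweighted = p_bar * (1 - p_bar) / n
    # Weighted: sum_g P_g^2 * theta_g (1 - theta_g) / N_g, with N_g ~ n * q_g.
    var_weighted = sum(
        p**2 * t * (1 - t) / (n * q)
        for p, t, q in zip(pop_share, theta, resp_share)
    )
    return math.sqrt(var_unweighted), math.sqrt(var_weighted)


# Uncorrelated with nonresponse: respondent shares match population shares.
print(approx_standard_errors([0.3, 0.7], [0.5, 0.5], [0.5, 0.5], n=200))
# Strongly correlated with nonresponse: group B is heavily over-represented.
print(approx_standard_errors([0.3, 0.7], [0.5, 0.5], [0.1, 0.9], n=200))
```

In the first call the weighted standard error is the smaller of the two; in the second, the large weights on the under-represented group make it the larger.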

Adjustments for Aggregators

Adjustments for Aggregators

  • Effect for poll aggregation models
    • Reasonable models with discrete likelihoods can overstate the variance in model parameters.
    • Modeling the variance per-poll directly improves the precision of parameter estimates.
  • Let’s demonstrate this by simulating a campaign!

Adjustments for Aggregators

strata_1  strata_2  group  group_mean  population  p_respond
A         1         A1     97%         25%         5%
A         2         A2     90%         25%         5%
B         1         B1     10%         25%         5%
B         2         B2     3%          25%         5%

  • True population mean: 50%
  • Responses sampled from \(\text{Bernoulli}(\theta_g)\)
  • Simple population weighting strategy: \(w_g = \frac{P_g}{N_g / \sum_{g'} N_{g'}}\)

Adjustments for Aggregators

pollster     strategy  bias
Pollster 1   cross     -0.028
Pollster 2   cross     -0.012
Pollster 3   cross     0.078
Pollster 18  single    -0.098
Pollster 19  single    0.035
Pollster 20  single    -0.024

  • Simulated pollsters have (logit-scale) statistical bias

Adjustments for Aggregators

  • “Cross” strategy: weight on all variables
  • “Single” strategy: weight on strata_2 only

strata_2  strata_mean
1         53.5%
2         46.5%
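The strata_2 means follow from collapsing the four cells of the campaign table, and the same arithmetic hints at why the two strategies differ in precision. The sketch below is illustrative: the helper names are hypothetical, and the "within-cell variance" is the average Bernoulli variance left inside the weighting cells, which approximates the per-respondent variance of the weighted estimate when response rates are equal across cells.

```python
# Cells from the campaign table: (strata_1, strata_2) -> (group_mean, population share).
CELLS = {
    ("A", 1): (0.97, 0.25),
    ("A", 2): (0.90, 0.25),
    ("B", 1): (0.10, 0.25),
    ("B", 2): (0.03, 0.25),
}


def collapse(cells, key_index):
    """Population-weighted mean and share for each level of one stratum."""
    totals = {}
    for key, (mean, share) in cells.items():
        level = key[key_index]
        s_mean, s_share = totals.get(level, (0.0, 0.0))
        totals[level] = (s_mean + mean * share, s_share + share)
    return {level: (m / s, s) for level, (m, s) in totals.items()}


def within_cell_variance(groups):
    """Average Bernoulli variance left inside the weighting cells."""
    return sum(share * mean * (1 - mean) for mean, share in groups)


strata_2 = collapse(CELLS, key_index=1)                   # "single" strategy cells
cross_var = within_cell_variance(CELLS.values())          # "cross" strategy
single_var = within_cell_variance(strata_2.values())      # "single" strategy

if __name__ == "__main__":
    print(strata_2)
    print(cross_var, single_var, (cross_var / single_var) ** 0.5)
```

Under these assumptions the ratio of standard errors, (cross_var / single_var) ** 0.5, is about 0.49, so a "cross" pollster's standard error would be roughly half that of a "single" pollster with the same sample size.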

Adjustments for Aggregators

day  pollster     sample_size  mean   err
1    Pollster 2   941          49.5%  +/-1.6%
1    Pollster 8   987          48.1%  +/-1.5%
1    Pollster 11  863          53.6%  +/-3.3%
2    Pollster 11  847          51.7%  +/-3.4%
2    Pollster 12  949          54.1%  +/-3.1%
2    Pollster 20  948          49.0%  +/-3.2%
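To see how the reported errors relate to what a binomial likelihood would imply, the rows above can be compared with the usual binomial margin of error. This is a sketch under one labeled assumption: that err is a 95% margin of error, so the multiplier is 1.96. The function names are hypothetical.

```python
import math

# (day, pollster, sample_size, mean, reported err), copied from the table above.
POLLS = [
    (1, "Pollster 2", 941, 0.495, 0.016),
    (1, "Pollster 8", 987, 0.481, 0.015),
    (1, "Pollster 11", 863, 0.536, 0.033),
    (2, "Pollster 11", 847, 0.517, 0.034),
    (2, "Pollster 12", 949, 0.541, 0.031),
    (2, "Pollster 20", 948, 0.490, 0.032),
]
Z_95 = 1.96  # assumption: err is a 95% margin of error


def binomial_margin(mean, sample_size, z=Z_95):
    """Margin of error implied by a binomial likelihood with this sample size."""
    return z * math.sqrt(mean * (1 - mean) / sample_size)


def compare(polls):
    """Return (pollster, reported err, binomial margin, reported / binomial)."""
    rows = []
    for _day, pollster, n, mean, err in polls:
        implied = binomial_margin(mean, n)
        rows.append((pollster, err, round(implied, 3), round(err / implied, 2)))
    return rows


if __name__ == "__main__":
    for row in compare(POLLS):
        print(row)
```

Where the ratio in the last column is well below 1, a binomial likelihood based on sample size alone would treat that poll as noisier than its reported error suggests.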

Adjustments for Aggregators

\[ \begin{aligned} \text{Y}_{d,p} &\sim \text{Binomial}(\text{K}_{d,p}, \theta_{d,p}) \\ \text{logit}(\theta_{d,p}) &= \alpha + \beta_d + \beta_p \\ \beta_p &= \eta_p \sigma_\beta \end{aligned} \]

  • Estimated true support: \(\text{logit}(\theta_d) = \alpha + \beta_d\)
  • Bias parameters: \(\beta_p\)
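As a concrete reading of the model, the log-likelihood contribution of a single poll can be written out directly. This is a minimal sketch with hypothetical function names, not the fitted model itself: priors on the parameters are not shown on the slide and are left out here.

```python
import math


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1 / (1 + math.exp(-x))


def binomial_logpmf(y, k, theta):
    """Log of the Binomial(k, theta) probability of observing y."""
    log_choose = math.lgamma(k + 1) - math.lgamma(y + 1) - math.lgamma(k - y + 1)
    return log_choose + y * math.log(theta) + (k - y) * math.log1p(-theta)


def poll_log_likelihood(y, k, alpha, beta_d, eta_p, sigma_beta):
    """Log-likelihood of one poll: y supporters among k respondents."""
    beta_p = eta_p * sigma_beta  # pollster bias on the logit scale
    theta = inv_logit(alpha + beta_d + beta_p)
    return binomial_logpmf(y, k, theta)
```

Note that the poll's precision enters only through k, the sample size, which is why this likelihood cannot reflect any variance reduction achieved by weighting.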

Adjustments for Aggregators

  • How can we improve?

\[ \text{Y}_{d,p} \sim \text{Normal}(\theta_{d,p}, \sigma_{d,p}) \]

  • Latent model for \(\text{logit}(\theta_{d,p})\) remains the same
  • Interpretation of \(\beta_p\) remains the same
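A per-poll normal likelihood can be sketched in the same style. Two assumptions are made here and are not confirmed by the slide: that \(\text{Y}_{d,p}\) is now the poll's reported mean rather than a count, and that \(\sigma_{d,p}\) is derived from the reported err by treating it as a 95% margin of error. The function names are hypothetical.

```python
import math


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1 / (1 + math.exp(-x))


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y, with sigma a standard deviation."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def poll_log_likelihood(mean, err, alpha, beta_d, beta_p, z=1.96):
    """Log-likelihood of one poll's reported mean, given its reported error."""
    sigma = err / z  # assumption: err is a 95% margin of error
    theta = inv_logit(alpha + beta_d + beta_p)  # latent model is unchanged
    return normal_logpdf(mean, theta, sigma)
```

For the same gap between the reported mean and \(\theta_{d,p}\), a poll with a smaller reported error is penalized more heavily, so precise polls carry more information about the parameters.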

In Summary

  • Weighting can reduce both bias and variance when the weighting variables are highly correlated with nonresponse and the outcome.
  • The threshold for “highly correlated” with the outcome increases as the correlation with nonresponse increases.
  • This is particularly pertinent for discrete outcomes.
  • Aggregators can improve the precision of parameter estimates by modeling the variance of each poll directly.

Reading Material

Stay in Touch