Race as a Control Variable

One way to minimize proxy discrimination in multivariate pricing models is to introduce a “control variable” representing a protected class into the analysis. Birny Birnbaum explained the approach during the CAS Ratemaking, Product and Modeling seminar in March.

First, he described some of the basics of multivariate models used in insurance pricing, starting with an equation predicting the likelihood of an auto claim using three predictive variables:

b₀ + b₁X₁ + b₂X₂ + b₃X₃ + e = y, where y is the outcome being predicted, the X values are predictive variables, the b values measure how much each variable contributes to predicting y, and e is the error term.

The variables (age, gender and credit score, perhaps) can be tested for statistical significance (reliability in predicting a claim outcome). And because the model analyzes the predictive variables simultaneously, it adjusts for the correlation among them, so each b value reflects that variable's contribution with the others held constant.

The ability to eliminate correlation and to isolate each individual predictive variable’s unique contribution to an outcome allows insurers to test for biases and distortions, Birnbaum noted. For example, he offered the possibility that state-by-state variations in tort systems, liability limits, or occupation or age mixes could distort a national pricing model.

To remedy the problem, an insurer could introduce “state” as a “control variable,” C₁, in a multivariate analysis: b₀ + b₁X₁ + b₂X₂ + b₃X₃ + b₄C₁ + e = y.

While the control variable itself wouldn’t be used when the models are ultimately deployed, by analyzing the interplay of state with the other predictive variables, the insurer winds up refining its understanding of the contributions of the other predictors, independent of their correlation with the variable “state.”
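
Why adding a control changes the other coefficients can be written out with the standard omitted-variable result for least squares. This is a sketch under stated assumptions, not part of Birnbaum's presentation: suppose the outcome truly follows the model with the control, and describe the control's linear relationship to the predictors with hypothetical coefficients d₀ through d₃ and a leftover term u that is uncorrelated with the predictors. Substituting the second line into the first gives the model that is effectively fitted when the control is left out:

```latex
\begin{aligned}
y   &= b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 C_1 + e \\
C_1 &= d_0 + d_1 X_1 + d_2 X_2 + d_3 X_3 + u \\
y   &= (b_0 + b_4 d_0) + (b_1 + b_4 d_1) X_1 + (b_2 + b_4 d_2) X_2
       + (b_3 + b_4 d_3) X_3 + (e + b_4 u)
\end{aligned}
```

Under these assumptions, a model fitted without C₁ estimates b₁ + b₄d₁ rather than b₁, and likewise for the other predictors. Including C₁ in the fit strips the b₄d term out of each coefficient, which is the refinement described above.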

Birnbaum recommends the same approach for race, introducing it as a “control variable” (R₁ in his illustration) to weed out proxy variables: b₀ + b₁X₁ + b₂X₂ + b₃X₃ + b₄R₁ + e = y.

If one of the predictive variables, say X₁, is a perfect proxy for race, then when race is added as a control variable, X₁ loses all of its predictive value. “All it was doing was predicting race, not the outcome,” Birnbaum said. What if, instead, X₁ is both predictive and correlated with race? Then “by introducing the race control factor you sharpened the contribution of X₁, made it actually a more accurate predictor of the outcome.”
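
Both cases can be reproduced on synthetic data. The sketch below is illustrative only and is not Birnbaum's code or data. It assumes a linear outcome rather than a claim likelihood and a binary control `r`, and all coefficients are arbitrary values chosen to make the mechanics visible. The hypothetical predictor `x1` is a strong but imperfect proxy with no effect of its own (an exact proxy could not be fitted alongside the control, because the two columns would be identical). The hypothetical predictor `x2` is both predictive and correlated with the control. The helper `ols` is a small least-squares solver written for this example.

```python
import random


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(0)  # fixed seed so the run is repeatable
data = []
for _ in range(5000):
    r = 1.0 if rng.random() < 0.5 else 0.0   # control variable (assumed binary)
    x1 = r + rng.gauss(0, 0.5)               # proxy only: no effect of its own on y
    x2 = r + rng.gauss(0, 1.0)               # predictive and correlated with r
    y = 0.5 + 0.0 * x1 + 1.0 * x2 + 2.0 * r + rng.gauss(0, 1.0)
    data.append((x1, x2, r, y))

ys = [d[3] for d in data]
without_control = ols([(d[0], d[1]) for d in data], ys)
with_control = ols([(d[0], d[1], d[2]) for d in data], ys)

print("without control: b1=%.2f b2=%.2f" % (without_control[1], without_control[2]))
print("with control:    b1=%.2f b2=%.2f" % (with_control[1], with_control[2]))
```

In this setup, the fit without the control should give `x1` a sizable coefficient even though it has no effect of its own, and should overstate the coefficient on `x2`. With the control included, the coefficient on `x1` should fall to near zero and the coefficient on `x2` should move close to its true value of 1.0.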