One way to minimize proxy discrimination in multivariate pricing models is to add to the analysis a “control variable” representing a protected class. Birny Birnbaum explained the approach at the CAS Ratemaking, Product and Modeling seminar in March.

First, he described some of the basics of multivariate models used in insurance pricing, starting with an equation predicting the likelihood of an auto claim using three predictive variables:

b0 + b1X1 + b2X2 + b3X3 + e = y, where y is the outcome being predicted, the X values are the predictive variables, the b values measure how much each variable contributes to predicting y, and e is the error term (the part of the outcome the variables do not explain).
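To make the equation concrete, here is a minimal Python sketch that estimates the b values by ordinary least squares on simulated data. The data, the "true" coefficients and the helper names `solve` and `fit_ols` are hypothetical illustrations, not anything from Birnbaum's presentation. The error term e is set to zero so that the fit recovers the chosen coefficients exactly, and the model is the plain linear form written above rather than the more elaborate models insurers typically fit with a statistics library.

```python
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(columns, y):
    """Return [b0, b1, ..., bk] minimizing squared error of y ~ b0 + sum(bj * Xj)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y

# Hypothetical data: three predictors, true b0..b3 = 0.05, 0.02, -0.01, 0.03.
rng = random.Random(42)
n = 500
x1 = [rng.random() for _ in range(n)]
x2 = [float(rng.random() < 0.5) for _ in range(n)]  # a 0/1 variable
x3 = [rng.random() for _ in range(n)]
y = [0.05 + 0.02 * u - 0.01 * v + 0.03 * w for u, v, w in zip(x1, x2, x3)]  # e = 0

coefs = fit_ols([x1, x2, x3], y)
print([round(c, 4) for c in coefs])  # [0.05, 0.02, -0.01, 0.03]
```

Solving the normal equations directly keeps the sketch dependency-free; for real data a numerically sturdier routine from a statistics or linear-algebra library would be the usual choice.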

The variables (age, gender and credit score, perhaps) can be tested for statistical significance, that is, whether each reliably helps predict a claim outcome. And because the model analyzes the predictive variables simultaneously, it accounts for the correlation among them: each b value reflects that variable’s contribution with the other variables held constant.

The ability to control for correlation and isolate each predictive variable’s unique contribution to an outcome allows insurers to test for biases and distortions, Birnbaum noted. For example, he suggested that state-by-state differences in tort systems, liability limits, or occupation or age mix could distort a national pricing model.

To remedy the problem, an insurer could introduce “state” as a “control variable,” C1, in a multivariate analysis: b0 + b1X1 + b2X2 + b3X3 + b4C1 + e = y.

The control variable itself wouldn’t be used when the model is ultimately deployed. But by including state in the analysis, the insurer refines its estimates of the other predictors’ contributions, independent of their correlation with the variable “state.”
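The state example can be sketched in Python on simulated data. Everything here is a hypothetical assumption: two states coded as a 0/1 indicator `s` (playing the role of C1), one rating variable `x1` whose mix runs higher in one state, and no error term. The deployment step is also an assumption, since the article does not say how the control would be dropped: this sketch holds C1 at its portfolio average, which folds the state effect into the intercept so the deployed formula uses `x1` alone.

```python
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s_ = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s_) / m[r][r]
    return x

def fit_ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y ~ intercept + columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)

rng = random.Random(7)
n = 1000
s = [float(i % 2) for i in range(n)]   # hypothetical state indicator C1: 0 = state A, 1 = state B
x1 = [si + rng.random() for si in s]   # x1 runs higher in state B, so it correlates with state
y = [1.0 + 0.3 * xi + 1.5 * si for xi, si in zip(x1, s)]  # true x1 effect is 0.3

b_no_control = fit_ols([x1], y)        # x1 absorbs part of the state effect
b_control = fit_ols([x1, s], y)        # x1's own effect is recovered
print(f"x1 without control: {b_no_control[1]:.3f}, with control: {b_control[1]:.3f}")

# Deployment without the control: hold C1 at its average and fold that into the intercept.
s_bar = sum(s) / n
b0_deploy = b_control[0] + b_control[2] * s_bar

def rate(x):
    """Deployed prediction that uses only x1."""
    return b0_deploy + b_control[1] * x
```

Because the simulated state B has both higher `x1` values and higher outcomes, the fit without the control overstates the `x1` coefficient; with the control it returns the 0.3 used to generate the data.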

Birnbaum recommends the same approach for race, introducing race as a “control variable” (R1 in his illustration) to weed out proxy variables: b0 + b1X1 + b2X2 + b3X3 + b4R1 + e = y.

If one of the predictive variables, say X1, is a perfect proxy for race, then when race is added as a control variable, X1 loses all of its predictive value. “All it was doing was predicting race, not the outcome,” Birnbaum said. What if, instead, X1 is both predictive and correlated with race? Then “by introducing the race control factor you sharpened the contribution of X1, made it actually a more accurate predictor of the outcome.”
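Both of Birnbaum's cases can be reproduced in a short Python sketch on simulated data. All of it is a hypothetical illustration: `r` is a simulated 0/1 protected-class indicator standing in for R1, `x1` is built to correlate with `r`, and the error term is omitted so the results are exact. `x1` is deliberately not identical to `r`; if it were, the two columns would be perfectly collinear and their separate coefficients could not be estimated. In the first case the outcome depends only on `r`, so `x1` is a pure proxy; in the second, the outcome depends on both.

```python
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y ~ intercept + columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(w[p] * w[q] for w in rows) for q in range(k)] for p in range(k)]
    xty = [sum(w[p] * yi for w, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)

rng = random.Random(1)
n = 1000
r = [float(i % 2) for i in range(n)]   # hypothetical protected-class indicator (control R1)
x1 = [ri + rng.random() for ri in r]   # correlated with r, but not identical to it

# Case 1: pure proxy. The outcome depends only on r.
y1 = [1.0 + 2.0 * ri for ri in r]
proxy_alone = fit_ols([x1], y1)[1]     # x1 looks predictive on its own
proxy_ctrl = fit_ols([x1, r], y1)[1]   # drops to zero once r is controlled

# Case 2: x1 has its own effect (0.5) and is also correlated with r.
y2 = [1.0 + 0.5 * xi + 2.0 * ri for xi, ri in zip(x1, r)]
mixed_alone = fit_ols([x1], y2)[1]     # overstated: picks up part of r's effect
mixed_ctrl = fit_ols([x1, r], y2)[1]   # recovers x1's own contribution, 0.5

print(f"pure proxy: {proxy_alone:.3f} alone, {proxy_ctrl:.3f} with control")
print(f"predictive and correlated: {mixed_alone:.3f} alone, {mixed_ctrl:.3f} with control")
```

In the first case the coefficient on `x1` collapses to zero once `r` is in the model, matching "all it was doing was predicting race"; in the second it moves from an inflated value to the 0.5 used to generate the data.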