The Importance of Soft Skills in Data Science

August 30, 2017 by Bret Shroyer

When most insurers think of data science in their industry, they naturally think of numbers and of turning abstract data into correlations that become actionable business intelligence. However, there is considerably more to data science than sifting through spreadsheets and building predictive models. It requires not just mathematical skills but human judgment and intelligent decisions at pivotal moments.

Executive Summary

Statistically, it’s been proven that a firm that just had a really big workers comp claim won’t have another for a while. But insurers don’t allow that knowledge to factor into their pricing. Why not? Here, Bret Shroyer of Valen Analytics explains the reasoning as he describes the importance of applying the soft skills of understanding, fairly using and communicating the outputs of big data analysis to underwriting and pricing models.

When looking for an ideal candidate to be in charge of the data behind decisions with large-scale business impact, it’s important for insurers to identify data scientists who have an even mix of hard (tech) and soft (communication) skillsets.

All of the panelists (myself included) who spoke on a big data panel at the recent “Super Regional P/C Insurer Conference 2017” firmly agreed on this point. Soft skills are crucial to moving data initiatives through your organization, getting approval from regulators and knowing how to avoid a lack of context around data correlation (which can ruin a model’s performance in production). It is important to always keep data-driven decisions in check in order to develop a sustainable data strategy within a business environment. Below are areas where human judgment is critical in data analysis.

Ethical Implications

A data scientist analyzes data with a combination of different predictive modeling techniques, including generalized linear modeling (GLM), machine learning, classification trees, and other multivariate and univariate techniques. Regardless of approach, there always will be certain statistically supported variables that should not be used. A good life insurance example is whether a candidate has relatives who are felons. While this has proven to be a statistically viable metric, it cannot be used within a predictive model because it is ethically unsound.

In workers compensation, it has been statistically proven that after a company has had a huge claim against it, that company is far less likely to have another for a long period of time. This is due to a number of factors, including the extra precautions the company takes and its investment in preventative safety measures moving forward. But insurers cannot in good faith reward those who recently suffered a severe claim, and therefore this finding is not used to influence rates.

One of the most widespread consumer-facing issues in insurance with regard to data and analytics has been price optimization. Confusion about how these models are constructed leads some to believe there are unethical practices at play, like unfair weighting of socioeconomic factors in auto insurance rates. Deciding which variables to include or exclude is an important strategic decision to be made jointly by the data science and business units involved.
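As a rough illustration of the include-or-exclude decision described in this section, the sketch below screens candidate variables against both a statistical threshold and an ethics-based exclusion list, and records the reason for each outcome. Everything in it is an assumption made for illustration: the variable names, the p-values, the 0.05 threshold and the `screen` helper are hypothetical and do not describe any insurer’s actual process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    p_value: float  # hypothetical significance from an earlier modeling step


# Hypothetical exclusion list agreed between data science and business units.
# A variable listed here is never used, however predictive it is.
EXCLUDED = {
    "relative_is_felon": "ethically unsound",
    "recent_severe_claim": "cannot in good faith reward a recent severe claim",
}


def screen(candidates, alpha=0.05, excluded=EXCLUDED):
    """Return (kept variable names, decision log with a reason per variable)."""
    kept, log = [], []
    for c in candidates:
        if c.name in excluded:
            # Judgment overrides statistics: excluded even if significant.
            log.append((c.name, "excluded: " + excluded[c.name]))
        elif c.p_value >= alpha:
            log.append((c.name, "dropped: not statistically significant"))
        else:
            kept.append(c.name)
            log.append((c.name, "kept"))
    return kept, log


# Illustrative inputs only; these numbers are invented for the example.
candidates = [
    Candidate("payroll_size", 0.001),
    Candidate("relative_is_felon", 0.003),
    Candidate("recent_severe_claim", 0.010),
    Candidate("office_wall_color", 0.640),
]

kept, log = screen(candidates)
for name, decision in log:
    print(f"{name}: {decision}")
```

The decision log is the part that matters for soft skills: it gives a plain-language record of what was and wasn’t included, and why, that can be shared with business stakeholders.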

Correlation vs. Causation

No matter what dataset is being used to build a predictive model, there will always be certain variables that correlate with each other in ways that simply don’t make sense from a logical or contextual perspective. Correlation alone is not enough of a reason to include a specific variable in a predictive model or any other data project.

It is the job of the data scientist to determine whether a relationship is causal. For example, one positive relationship that has survived the testing process is that between global temperature and the increase in piracy. The two may correlate, but clearly one does not cause the other. This is an obvious example to demonstrate a point, but many cases aren’t that clear and require meticulous discretion and a lot of “connecting the dots” to determine whether there is causation.
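To make the point concrete, here is a minimal sketch using synthetic data (not real temperature or piracy figures). It builds two series that are unrelated except that both happen to rise over time, then compares the correlation of their levels with the correlation of their year-over-year changes. The series, the noise level and the helper names are all assumptions for illustration. Differencing is only one simple diagnostic for a shared trend; it neither proves nor rules out causation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def first_differences(xs):
    """Period-over-period changes, which remove a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = range(100)

# Two synthetic series: each is a time trend plus its own independent noise.
series_a = [1.0 * t + rng.gauss(0, 5) for t in years]
series_b = [2.0 * t + rng.gauss(0, 5) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(first_differences(series_a), first_differences(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The levels correlate strongly only because both series trend upward; once the trend is removed, the relationship largely disappears. Deciding whether a surviving relationship makes contextual sense still takes human judgment.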

Data Analysis Won’t Get Far Without Understanding

One of the most critical components of soft skillsets in data analysis is simply the ability to explain the variables and recommendations that the model output is showing. Transparency makes it simpler to obtain buy-in from all parties in the organization and explain to regulators how models come to their decisions.

Take machine learning, for example. Machine learning provides deep and accurate predictors and can often maximize model lift, but because data is being continuously fed into the model without human interruption, you forgo transparency and the ability to show how the model arrived at its decisions. There are instances where this tradeoff is completely appropriate and other times when it is problematic. Responsibility falls to the data scientist to know which modeling technique is appropriate to the use case and to provide clear explanations to all stakeholders involved.

Insurance has entered into a new era that requires a new communication strategy with regulators. Regulators are trying to regulate a rapid succession of new products being introduced as the industry makes strides to be more innovative and responsive to modern consumer demands. The experience has gotten convoluted enough that the National Association of Insurance Commissioners has created a task force to try to smooth over the communication between regulators and InsurTech representatives to improve speed to market. Data scientists have an important role to play in making this a simpler process by preemptively explaining to regulators what was and wasn’t included in a data initiative supporting a new product launch or production model.

While data scientists will always be valued for their statistical abilities, it is important to understand that soft skills have immense value to the sustainability of data initiatives. This is particularly important to consider when hiring for the slew of data positions that are coming into the insurance industry. Data provides companies with a seemingly limitless arsenal of information that can be used to provide key business advantages, but it takes people to empirically decide which of that information is actionable and to be mindful of what will pass in a stringent regulatory environment.