Improvements in Predictive Analytics Help With Early Identification of Workers Comp Reinsurance Claims

The early identification of claims likely to pierce reinsurance retention levels has long been a challenge for primary insurers and reinsurers. The good news is that over the past decade or so, the field of claim analytics has moved from forensic work on closed claims to analytics that can identify, at 60 days from the date of injury (or sooner), claims with a high likelihood of exceeding a retention level.

Executive Summary

Developments in predictive analytics are helping early identification of claims that are likely to pierce workers compensation reinsurance layers, write Philip S. Borba and Lori Julga of Milliman.

The analytics are not limited to identifying claims piercing the excess layer but look to identify claims likely to get within 50 percent or even 30 percent of the retention limit.
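As a minimal sketch of how such watch levels might be expressed, the snippet below compares a projected claim cost with the retention. It assumes that "within 50 percent" means the projected cost has reached at least 50 percent of the retention; the function name, tier labels and dollar figures are hypothetical and do not come from the authors.

```python
def retention_tier(projected_cost: float, retention: float) -> str:
    """Label a claim by the share of the retention its projected cost has reached.

    Assumption: the 30 and 50 percent levels are measured as
    projected_cost / retention. Other readings are possible.
    """
    if retention <= 0:
        raise ValueError("retention must be positive")
    ratio = projected_cost / retention
    if ratio >= 1.0:
        return "pierces retention"
    if ratio >= 0.5:
        return "at least 50% of retention"
    if ratio >= 0.3:
        return "at least 30% of retention"
    return "below watch levels"


# Hypothetical projections against a $1,000,000 retention.
for cost in (1_250_000, 600_000, 350_000, 90_000):
    print(cost, retention_tier(cost, 1_000_000))
```

The projected cost would come from whatever predictive model is in use; the tiers only decide which claims are routed for closer review.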

While an excess loss is obvious for many catastrophic claims (e.g., serious burns, certain amputations for young workers), for many excess loss claims, the buildup to the attachment point is less obvious due to the subtleties of compounding factors. These factors may include subtle combinations in the demographics and medical experience that are not easily noticed.

In addition, claims are often managed by several specialists within and outside claims administration operations (e.g., adjusters, case managers, medical management specialists), a division of responsibility that can cause the size of potential losses to be understated. Complicating early identification further, information concerning costly medical visits typically is not available from the medical bill review process until several weeks after treatment.

A significant challenge with early identification analytics for claims that have not reached an excess loss attachment point is that the administration of the claim is often handled by several specialists without any single participant noticing the aggregation of costly factors. For a simple lower back injury, the claim adjuster may notice the nature and cause of the injury and that there was a call from an attorney. The case manager may recommend additional laboratory tests and surgery, while the medical examiner may report an obesity problem with hypertension. But these factors are not compiled in one place, which may be due to a failure to recognize compounding effects or the lack of easily accessible data, particularly unstructured data.

New Analytical Tools

Developments over the past few years in predictive analytics are providing opportunities to improve the early identification of claims with a high likelihood of piercing workers compensation reinsurance layers.

One development is the use of “machine learning” software that extends the principles of conventional multivariate analyses. New statistical tools enable testing a much larger number of model specifications in a significantly shorter turnaround time than conventional multivariate analyses, such as multivariate analysis of variance, multiple regression or general linear model (GLM) analyses.

In contrast to the conventional analyses, the recently developed analytic methods are not limited to linear relationships. These tools can include claims where data is not available for all characteristics, as well as characteristics where data is not available for all claims—the pernicious “incomplete” and “missing” data problems.
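One common way tree-based machine learning methods keep claims with missing values is to send them down a default branch chosen during fitting. The sketch below illustrates that idea with a single threshold split. It is an illustration under stated assumptions, not the authors' method: the function name and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def best_stump(
    values: Sequence[Optional[float]],
    labels: Sequence[int],
) -> Tuple[float, bool, int]:
    """Find the threshold split with the fewest misclassified claims.

    Rule: predict 1 (large loss) when value > threshold, otherwise 0.
    Claims with a missing value (None) are all sent to one side; both
    directions are tried and the better one is kept.
    Returns (threshold, missing_predicts_large, errors).
    """
    observed = sorted({v for v in values if v is not None})
    if not observed:
        raise ValueError("need at least one observed value")
    best = None
    for threshold in observed:
        for missing_large in (False, True):
            errors = 0
            for v, y in zip(values, labels):
                if v is None:
                    pred = 1 if missing_large else 0
                else:
                    pred = 1 if v > threshold else 0
                errors += int(pred != y)
            if best is None or errors < best[2]:
                best = (threshold, missing_large, errors)
    return best


# Toy data: a hypothetical characteristic with two missing values.
values = [1, 2, None, 8, 9, None]
labels = [0, 0, 1, 1, 1, 1]  # 1 = claim later became a large loss
print(best_stump(values, labels))
```

In this toy data the claims with no recorded value are all large losses, so the split keeps them and routes them to the large-loss side rather than dropping them from the analysis.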

The analytical tools use machine learning to analyze hundreds of characteristics and correlations. Larger computing capacity, faster computing speed and new software are allowing for the testing of hundreds (if not thousands) of combinations of variables in a single execution. More combinations of demographics (e.g., age, worker classification), injury characteristics (e.g., body part, nature of injury, cause of accident) and medical experience (e.g., laboratory tests, surgery, pharmaceutical products), among other characteristics, can be tested today than would have been the case a few years ago.
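A short calculation shows how quickly the number of candidate combinations grows. The characteristic names below are taken from the examples in the text; the helper `count_specifications` and the figure of 30 characteristics are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import comb

characteristics = [
    "age", "worker_classification", "body_part", "nature_of_injury",
    "cause_of_accident", "laboratory_tests", "surgery",
    "pharmaceutical_products",
]

# Every pair of characteristics that could be tested together.
pairs = list(combinations(characteristics, 2))
print(len(pairs))  # 28 pairs from 8 characteristics


def count_specifications(n_characteristics: int, max_size: int) -> int:
    """Count subsets of 1..max_size characteristics drawn from n_characteristics."""
    return sum(comb(n_characteristics, k) for k in range(1, max_size + 1))


print(count_specifications(8, 3))   # 8 + 28 + 56 = 92
print(count_specifications(30, 3))  # 30 + 435 + 4060 = 4525
```

Even when limited to combinations of three or fewer, 30 characteristics produce several thousand candidate specifications, which is why testing them in a single execution depends on the computing capacity described above.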

A second development—one that is just as important in early identification analytics as the machine learning software—is the extraction of text information from claim adjusters’ notes, nurse case manager reports and medical reports.

The following points show areas where text information is benefiting claim predictive analytics:

1. Updated or more detailed information can be found in text data that is not in structured data. This includes information that has been overlooked or changed since the structured data was gathered, new concepts for the cause of or conditions associated with the injury, and information that injured parties are averse to providing through direct reporting methods. There are situations where the structured data does not have the capacity to capture certain characteristics of excess loss claims, such as injuries to multiple body parts, injuries involving multiple causes and circumstances connected to assignment of liability.

2. There is useful information that has not reached structured data, such as payments for medical treatments or claimant attorneys. The time needed for medical treatments to become structured data (due to the time needed for bill processing and payment of medical bills) limits the usefulness of medical payments in the early identification analytics. References to surgery, laboratory tests, use of pharmaceutical products and other medical treatments that appear several weeks before these treatments are observed in the medical bill payments are beneficial in predicting the claim’s value.

Also, subtleties such as use of pain medications, alternative medical treatments and long-term medical plans (particularly for pharmaceutical products) can be teased out of the data. Information on attorney involvement is often noted in an adjuster’s notes long before a settlement or pro-rata payment that includes a payment to an attorney.

3. In call logs, adjusters’ notes, nurse case manager notes and medical reports, there often is information on comorbidities and an individual’s condition that typically is not captured in workers comp claims structured data. This includes information on hypertension, smoking, obesity, diabetes, cancer and other comorbidity conditions, as well as information on physical restrictions, chronic pain and home healthcare services.
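The kind of text extraction described in these points can be sketched with simple keyword matching. This is a deliberately minimal illustration: the pattern list, the function name and the sample note are hypothetical, and production text mining would need much richer vocabularies plus handling of negation (for example, "denies smoking") and abbreviations.

```python
import re
from typing import Dict, List

# Hypothetical keyword patterns for a few of the factors named in the text.
PATTERNS: Dict[str, str] = {
    "hypertension": r"\bhypertens\w*|\bhigh blood pressure\b",
    "obesity": r"\bobes\w*",
    "diabetes": r"\bdiabet\w*",
    "smoking": r"\bsmok\w*",
    "surgery": r"\bsurg\w*",
    "attorney": r"\battorney\w*|\blawyer\w*",
    "chronic_pain": r"\bchronic pain\b",
}
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in PATTERNS.items()
}


def extract_flags(note: str) -> List[str]:
    """Return the sorted names of all factors mentioned in a free-text note."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(note))


# Hypothetical adjuster note.
note = (
    "Claimant reports chronic pain in lower back. Hx of hypertension "
    "and obesity. Call received from attorney; surgery recommended."
)
print(extract_flags(note))
```

Flags produced this way can be added to the structured data as extra characteristics, which makes them available weeks before the corresponding medical bills or attorney payments appear.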

In sum, analytics has dramatically shifted interest in excess loss claims from forensic (closed claim) studies to early identification (e.g., at 60 days from the date of injury). Today, the advances with machine learning software and text mining algorithms are necessary tools for the early identification of claims most likely to become excess loss claims.