
Big data is back. Once the darling buzzword of innovation, big data was eclipsed for a time by a trendier focus on forces of “disruption” and a passion for everything “digital.”

The reason for big data’s resurgence as a foundation of innovation is the arrival of tools that let the industry more intelligently harness the vast pools of data that at first appeared so promising, only to drown early users in volume before they could sift through it and find insights.

Executive Summary

“Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” The old saw about advertising costs could easily apply to insurance claims departments and the money they spend on investigation, specifically the money and time spent combing through web data for relevant information about claimants. In the accompanying article, Patrick Sullivan of Carpe Data describes tools that searched 200 million URLs potentially related to two million claimants in 2018.

One example of new tools harnessing the previously frustrating power of big data is the modern management of web data. We have spent the last several years honing a tool that identifies claimants as they traverse the online world and then determines whether that dataset contains online behavior relevant to a claim.

Many insurers are already working with web data, though often manually and only for a small subset of the claims a carrier incurs. Scaling up such a time-consuming, resource-intensive method manually is next to impossible: moving from a subset of claimants to the entire list would take an army of online investigators.

Consider three of the big challenges to managing web data:

  • First is correctly identifying that an online data point is indeed connected to the individual in question. There may be a person with the same name and age as a claimant, but is this person who posts about their prowess as a weightlifter really the claimant? Confirming this reliably and automatically was among the biggest hurdles to using web data in the past, and it is now being overcome.
  • A second challenge is to determine if that data point is relevant to a claim. It is possible to find a claimant online and identify a data point about their behavior. But does it matter to the claim? In theory, an investigator can look over each data point and make a determination of value. But in practice, the volume of information is just too great. The filtering must be automated in some fashion to turn data into usable information.
  • A third challenge is repeating this process on a regular basis. Web data is highly dynamic, with new activity likely at any moment. Just because there is nothing relevant on a Tuesday doesn’t mean there will not be something on a Thursday.

Automating the process to overcome these challenges isn’t easy, which is why it was not done in the early days of big data. But today automation tools are being deployed with measurable results. Between January and November of 2018, Carpe Data’s Claims Activity tool analyzed over 200 million URLs related to two million claimants submitted by insurers.

“In theory, an investigator can look over each data point and make a determination of value. But in practice, the volume of information is just too great.”

A key component of the tool is Identity Resolution technology, which algorithmically determines how likely a given URL is to be connected to the individual in question. We make this determination through a multidisciplinary approach that combines rulesets, natural language processing and recursive searching to correctly identify an individual’s web presence. Recursive searching lets us use data points that were not provided but were discovered during a search to expand the search parameters and achieve higher match accuracy. From the original 200 million URLs, Identity Resolution narrowed the volume to approximately three million unique URLs with a 90 percent or higher likelihood of being connected to the submitted claimants.
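To make the recursive-searching step concrete, the sketch below shows one way such a resolver could work. It is a minimal illustration of the technique, not Carpe Data’s implementation: the helper functions (find_candidate_urls, extract_attributes, score_match) and the depth limit are hypothetical assumptions; only the 90 percent threshold comes from the figures above.

```python
# Minimal sketch of recursive identity resolution. All helper functions
# are hypothetical stand-ins: find_candidate_urls queries the web for
# candidate profiles, extract_attributes is the NLP step, and
# score_match applies rulesets to estimate how likely a page
# belongs to the claimant.

def resolve_identity(known_attrs, find_candidate_urls, extract_attributes,
                     score_match, max_depth=2):
    """Return {url: likelihood} for URLs scoring >= 0.90, expanding the
    search with attributes discovered along the way."""
    confirmed = {}
    attrs = dict(known_attrs)            # e.g. {"name": ..., "age": ...}
    frontier = [dict(attrs)]

    for _ in range(max_depth):
        next_frontier = []
        for query_attrs in frontier:
            for url in find_candidate_urls(query_attrs):
                if url in confirmed:
                    continue
                page_attrs = extract_attributes(url)          # NLP step
                likelihood = score_match(attrs, page_attrs)   # ruleset step
                if likelihood < 0.90:
                    continue
                confirmed[url] = likelihood
                # The recursive step: attributes that were never submitted
                # (a username, an employer) seed a wider follow-up search.
                new_attrs = {k: v for k, v in page_attrs.items()
                             if k not in attrs}
                if new_attrs:
                    attrs.update(new_attrs)
                    next_frontier.append(dict(attrs))
        frontier = next_frontier
    return confirmed
```

The essential property is the feedback loop: a confirmed profile can reveal a data point that was never provided with the claim, and that new attribute widens the next round of searching, which is what pushes match accuracy higher.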

Once the data is refined to a usable volume connected to the claimant, the next task is to determine whether any of those URLs contain something relevant to the claim or are simply noise. As one claims leader said, making a small tweak to an old advertising mantra, “I’m wasting half of my investigation dollars, but I just don’t know which half.”

From January to November of 2018, approximately 5 percent of all URLs identified as having a high or very high likelihood of being connected to the claimant in question actually contained something the Claims Activity tool found relevant to the claim. Sometimes the finding confirmed the claim, which is just as valuable as finding fraud because it enables the insurer to pay quickly and with confidence, and sometimes the findings were contradictory.

By limiting users’ exposure to the noisier parts of the data, new tools bring the power of big data to bear, pushing some of the decision-making into the hands of technology, which can sift through vast amounts of information and highlight the most useful parts.

Using a host of technologies, most critically rulesets that look for specific activity, the tool highlights behaviors that tend to affect a claim, particularly an injury claim. Details about the claimant’s travels, events in which they take part, pain they report, even specific information about the claim are among the things the system flags. A simplified version of such a ruleset is sketched below.
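The following is an illustrative sketch only, not Carpe Data’s production ruleset: the categories mirror the article’s examples (travel, events, reported pain), but the patterns and the flag_post function are hypothetical.

```python
import re

# Hypothetical ruleset: each rule pairs a behavior category from the
# article (travel, events, reported pain) with illustrative patterns.
RULES = {
    "travel":        re.compile(r"\b(flight|road trip|vacation|hiking)\b", re.I),
    "event":         re.compile(r"\b(marathon|tournament|5k|softball league)\b", re.I),
    "reported_pain": re.compile(r"\b(back pain|physical therapy|recovering from)\b", re.I),
}

def flag_post(text):
    """Return the rule categories a post triggers, if any."""
    return [category for category, pattern in RULES.items()
            if pattern.search(text)]

# A post about running a marathon during an open back-injury claim
# would be flagged for review, surfaced rather than auto-judged.
print(flag_post("Finished my first marathon this weekend!"))  # ['event']
```

A flag like this only surfaces the post; as the article stresses, the decision still belongs to the adjuster or investigator.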

According to Carpe Data:

  • Searching 200 million URLs for online data would require over 660,000 human hours.
  • The cost of paying adjusters for over 660,000 hours of manual searching would be over $20 million.
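A back-of-the-envelope check of those figures: 200 million URLs across 660,000 hours works out to roughly 300 URLs per hour, or about 12 seconds of review per URL, and $20 million over 660,000 hours implies a labor cost of roughly $30 per hour.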

Users spend less time gathering raw data and assessing its usefulness, and more time making decisions, which is where experienced adjusters, investigators, claim handlers and SIU departments provide their highest value. The goal of using big data here is to give the user as much specific, useful data as possible (rather than a vat of unfiltered information), as quickly and as early as possible, so those professionals can make a decision based on the totality of the information.

New tools also resolve the mismatch between the nature of web data and how it was initially analyzed. Early searches of web data were static, grabbing a picture of a single point in time, yet web data is the opposite of static: it changes with every passing moment. Technology-based solutions allow for continual refreshing of the data, an impossibility with manual methods.
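In schematic terms, continual refreshing is just the matching and flagging steps re-run on a schedule. The loop below is an illustrative sketch, assuming a hypothetical scan_claimant callable that wraps the resolution and flagging sketched earlier; a production system would presumably use a job scheduler or queue rather than a blocking loop.

```python
import time

# Illustrative re-scan loop; the interval and helpers are assumptions.
RESCAN_INTERVAL_SECONDS = 48 * 3600  # nothing on Tuesday, maybe Thursday

def monitor_claimants(claimants, scan_claimant):
    """Re-scan every open claim on a fixed interval, reporting new finds.

    claimants maps a claim ID to the claimant's known attributes;
    scan_claimant(attrs) returns the relevant URLs found this pass.
    """
    seen = {claim_id: set() for claim_id in claimants}
    while True:
        for claim_id, attrs in claimants.items():
            new_urls = set(scan_claimant(attrs)) - seen[claim_id]
            if new_urls:
                seen[claim_id] |= new_urls
                print(f"claim {claim_id}: {len(new_urls)} new relevant URL(s)")
        time.sleep(RESCAN_INTERVAL_SECONDS)
```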

Only by using these new tools is the power of web data, perhaps the biggest set of big data, truly unlocked. As time goes on and technology improves, insurers will continue to gain access to noisy datasets such as those found on the web. (For example, insurers are already struggling with the volume of data flowing from vehicles.) The key to harnessing these new datasets successfully is recognizing when technology has reached the point where assessing the data is best handled by machines. For web data, at least, that day has come.