Who Do You Trust? When Behavioral Science Fails

Late last week, I was reviewing my Twitter feed and came across a tweet that made no sense to me.

While it’s a common occurrence for me to be baffled by Twitter exchanges, this particular tweet purported to be telling a mathematics joke—and it mentioned Duke University Prof. Dan Ariely, a behavioral scientist known to CM readers for his work related to honesty in insurance claims reporting.

Recalling that Ariely did some early work with InsurTech Lemonade, and that he and Harvard Business School Prof. Max Bazerman have published research that is frequently referenced by executives of Lemonade and Slice Labs, I was curious to find out more about this seemingly cryptic tweet, which suggested that the number of economics graduate students scrutinizing data in Dan Ariely’s research papers “is RANDBETWEEN(50000,1000000).”

The author of the tweet, Adam Davidson, a contributing writer for The New Yorker magazine and author of the book “The Passion Economy,” ended his tweet by flagging his observation as a “Stats joke!” And since I have a degree in mathematics, I went down the Twitter rabbit hole to investigate what in the world this particular “stats joke” was all about.

The Ariely-Bazerman research I was remembering has to do with asking people to sign a document, such as a claims submission or an income tax form, at the top, before answering questions and providing details, instead of at the bottom of the form. The action of pre-signing an honesty pledge is supposed to reduce instances of fraud.

Individually, the two researchers have been cited by InsurTechs on their websites and in publications like Carrier Management and Insurance Journal to support this conclusion, and for related work that scientists have done on behaviors like submitting claims by video—again found to promote honesty.

So, what does the tweet mean?

After some investigation, I found out that RANDBETWEEN(50000,1000000) refers to an Excel function that returns a random number between 50,000 and 1,000,000. But I still wasn’t getting the joke.
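For readers who don't live in spreadsheets, here is a rough Python stand-in for that Excel function. The name `randbetween` is my own label for illustration; the only assumption is that, like Excel's version, it should return a whole number with both endpoints included.

```python
import random

def randbetween(bottom: int, top: int) -> int:
    """Rough stand-in for Excel's RANDBETWEEN: a random whole number,
    with both endpoints included."""
    return random.randint(bottom, top)

# The call from the tweet: some whole number from 50,000 to 1,000,000.
guess = randbetween(50000, 1000000)
print(guess)
```

Run it twice and you will almost certainly get two different answers, which is the whole point of the function.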

As it turns out, the tweet that I happened to stumble upon was directly related to the body of work by behavioral scientists that supports pre-signing of honesty declarations. And Davidson’s tweet is a tiny pebble in an avalanche of tweets and articles that refer to an investigative report alleging that Ariely and other honesty researchers themselves needed to pre-sign an honesty pledge before publishing their research.

In short, there are questions swirling most directly around Ariely suggesting that a man who wrote a book on dishonesty has lied.

A Fraud Discovery

On Aug. 17, a trio of behavioral scientists who are committed to analyzing, replicating and discussing the work of others in the field published a post on their blog, Data Colada, alleging that a 2012 research paper published jointly by Ariely, Bazerman and three other researchers (I’ll refer to them collectively as “honesty researchers”) contained fake insurance company data, generated with the help of this random number function.

The details in the blog post, “[98] Evidence of Fraud in an Influential Field Experiment About Dishonesty,” written by Uri Simonsohn of the Esade Business School in Spain, Leif Nelson of UC Berkeley and Joe Simmons of the University of Pennsylvania (who were helped by some anonymous researchers), are a fascinating read for math types. I won’t review them all here. One form of data manipulation involves duplicating a set of insurance data records (odometer readings reported by customers) and then using the random number generator to make the last three digits of the readings look less like duplicates. There’s also an interesting bit about the duplicated numbers being presented in a different font on an Excel spreadsheet, and another discussion about the shape of the reported mileage data (which would be expected to graphically conform to a bell-shaped distribution rather than a uniform one).
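To make the bell-shaped versus uniform point concrete, here is a minimal sketch using made-up numbers. Nothing below comes from the study or from Data Colada's analysis: the function names, the 5,000-mile bins, the 50,000-mile ceiling and the bell curve's center and spread are all assumptions chosen only to illustrate the idea.

```python
import random

def bin_counts(values, bin_width=5000, upper=50000):
    """Count how many values fall in each bin of width bin_width from 0 up to upper."""
    counts = [0] * (upper // bin_width)
    for v in values:
        if 0 <= v < upper:
            counts[int(v // bin_width)] += 1
    return counts

def flatness(counts):
    """Emptiest bin divided by fullest bin: near 1 means a flat, uniform-looking histogram."""
    return min(counts) / max(counts)

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical, invented mileage figures. NOT the data from the study.
uniform_miles = [rng.randint(0, 49999) for _ in range(10000)]  # every value equally likely
bell_miles = [rng.gauss(25000, 8000) for _ in range(10000)]    # clustered around a typical value

uniform_flatness = flatness(bin_counts(uniform_miles))
bell_flatness = flatness(bin_counts(bell_miles))
print(f"uniform-style data: {uniform_flatness:.2f}")
print(f"bell-style data:    {bell_flatness:.2f}")
```

In the bell-style data, a few bins near the middle hold most of the records and the edges are nearly empty, so the ratio is small. In the uniform-style data, every bin holds about the same number of records, which is the pattern you would expect from a random number generator rather than from real drivers.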

Putting the math aside and cutting to the chase here, the honesty researchers all agree that the Data Colada findings are accurate. Data was manipulated. They also all agree that Ariely was the only one of them who had direct access to the spreadsheet with miles-driven data. But that doesn’t mean he manipulated the data, Ariely points out, blaming an unnamed Southeastern insurance company for giving him bad data. (Editor’s Note: The company is an incumbent, not Lemonade, since this was in 2011-2012, before Lemonade was formed and the InsurTech movement really picked up steam.)

The Backstory

There’s more to the backstory, which I was able to piece together from the research papers, relying heavily on the honesty researchers’ accounts of events in their responses to the Data Colada allegations (linked at the bottom of the blog item) and on a thread of 16 tweets from one of the researchers, Lisa Shu, executive director of the Newton Venture Program.

According to Shu, she and Bazerman and Harvard Business School Prof. Francesca Gino conducted some lab experiments that had nothing to do with insurance back in 2010 when Shu was a Northwestern University grad student. They independently came to the same conclusion as Ariely and Prof. Nina Mazar (then at the University of Toronto and now a marketing professor at Boston University) about honesty declarations. At a conference, they heard Ariely describe his results from the insurance field experiment, and the two sets of researchers decided to link up to publish one joint research paper with all their findings in 2012. (“Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end,” Proceedings of the National Academy of Sciences, Sept. 18, 2012)

The study was edited by Princeton University behavioral economist Daniel Kahneman, adding the name of a Nobel Prize winner into the mix.

“Did one team race to publish in order to scoop the other? NO: both teams thought they found the yin to their yang. A merger was born: Shu-Gino-Bazerman’s Studies 1 and 2 from the lab + Ariely-Mazar’s Study 3 from the field, which appeared [to] be perfect complements,” wrote Shu in a portion of her recent Twitter thread recounting the events.

Over a year ago, the honesty researchers themselves, who along with two others (Kristal and Whillans) tried unsuccessfully to replicate the earlier results, wrote a new report and an article in Scientific American stating that their original conclusions were incorrect. (“Signing at the beginning versus at the end does not decrease dishonesty,” PNAS, March 31, 2020; “When We’re Wrong, It’s Our Responsibility as Scientists to Say So,” Scientific American, March 21, 2020)

“Seven years and hundreds of citations and media mentions later, we want to update the record. Based on research we recently conducted—with a larger number of people—we found abundant evidence that signing a veracity statement at the beginning of a form does not increase honesty compared to signing at the end,” they wrote.

That’s a pretty bold admission. But it stopped short of saying that anyone made up any data.

In a statement linked to the Data Colada analysis dated Aug. 16, Ariely confirmed that he was the only author in contact with the insurance company, also stating:

“The data were collected, entered, merged and anonymized by the [insurance] company and then sent to me [and] I was not involved in the data collection, data entry, or merging data with information from the insurance database for privacy reasons.”

Further, Ariely’s statement said: “I did not suspect any problems with the data” and “I also did not test the data for irregularities, which after this painful lesson, I will start doing regularly.”

In their 2020 writing reversing their prior conclusion, the researchers explained their failure in trying to recreate Shu, Gino and Bazerman’s lab experiment with 20 times more participants. They also said that the findings of the insurance company field study were now suspect because, upon further analysis, top-of-form signers and bottom-of-form signers in the original study no longer appear to have been randomly assigned, or that randomization “may have even failed to occur as instructed.” (Bottom-signers had 15,000 more miles, on average, to begin with.)
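Why does a gap in starting mileage point to a randomization problem? Here is a minimal sketch, again with invented numbers. The helper `baseline_gap`, the 2,000 hypothetical readings and the deliberately broken "sort, then split" assignment are all my own assumptions for illustration; none of it describes how the study was actually run.

```python
import random
from statistics import mean

def baseline_gap(group_a, group_b):
    """Difference in average starting value between two groups (group_b minus group_a)."""
    return mean(group_b) - mean(group_a)

rng = random.Random(1)  # fixed seed so the illustration is repeatable

# Hypothetical starting odometer readings, invented for illustration only.
readings = [rng.gauss(60000, 20000) for _ in range(2000)]

# Proper random assignment: shuffle, then split in half.
shuffled = readings[:]
rng.shuffle(shuffled)
top_signers, bottom_signers = shuffled[:1000], shuffled[1000:]
random_gap = baseline_gap(top_signers, bottom_signers)

# A deliberately broken assignment: sort by mileage, then split in half.
ordered = sorted(readings)
sorted_gap = baseline_gap(ordered[:1000], ordered[1000:])

print(f"gap with random assignment: {random_gap:,.0f} miles")
print(f"gap with sorted assignment: {sorted_gap:,.0f} miles")
```

When people are truly assigned at random, the two groups should start out looking roughly alike, so the gap is small relative to the spread of the readings. A gap of many thousands of miles before the experiment even begins is a sign that something other than chance decided who landed in which group.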

“What we originally thought was a reporting difference (between customers who signed at the top versus bottom of the form) now seems more likely to be a difference in actual driving behavior—not the honest or dishonest reporting of it,” they wrote in Scientific American.

Why This Matters to P/C Insurers

Why am I sharing all this with Carrier Management readers?

It’s not to disparage Ariely or to throw shade on the field of behavioral science. I have always been fascinated by articles about behavioral science, and while I understand that the investigation of this particular report was born out of a larger probe of the field of study itself, I haven’t read enough to draw negative conclusions about that. Clearly, however, that part matters to insurers and InsurTechs who are reshaping business practices to incorporate some of its key ideas.

While there is lots of commentary on social media throwing Ariely under the bus, with many asking the same question that first came to my head—Why would an insurance company make up data to support his conclusions?—beyond reading the original paper, I haven’t delved into this enough to understand exactly how the study was conducted. (And I have no particular opinion about Data Colada’s Footnote 14 about the swapping of labels on spreadsheet columns for those readers who are really digging into the details.) Did he request a certain number of records to guarantee a statistically significant result? Who at the insurance company managed the data project? What motivation did the insurance company have to take part in this particular research?

(Any readers who worked at the unnamed Southeastern insurance company are invited to fill in the blanks in the comment section of this article or by contacting me directly.)

I have, however, worked at—and with—enough insurance companies during my prior career as an actuary and consultant to know that bad data happens—on a regular basis.

So, what motivated me to share some of the details of this academic saga was not the possible existence of a fraudulent researcher who signed the report or a sloppy data entry professional at the bottom of the chain. What most upsets me personally are worries about the people in the middle—in this case, the other researchers who didn’t quite do enough to prevent fraudulent research from seeing the light of day—because we all know that this isn’t a problem isolated to academia.

When we read the responses of the other researchers, we get a pretty clear picture of how fake news, fake research, inaccurate rate filings, optimistic loss reserve analyses and incorrect company valuations proliferate. Below are some excerpts.

Prof. Shu in a 16-part Twitter thread:

Did we request to scrutinize each other’s datasets at the time? NO …

We began our collaboration from a place of assumed trust—rather than earned trust. Lesson learned. ….

I am delighted that this story has come to light and the scientific record is being corrected. Science is an evolving conversation. We must continually test and retest to confirm or disprove what we think we know and find the limits of what we don’t know.

Prof. Bazerman in his written response to Data Colada:

[In 2011, on] initial reading, I thought I saw a problem with implausible data in Study 3 [auto study]. I raised the issue with a coauthor and was assured the data was accurate. I continued to ask questions because I was not convinced by the initial responses. When I eventually met another coauthor responsible for this portion of the work at a conference, I was provided more plausible explanations and felt more confidence in the underlying data. I would note that this coauthor quickly showed me the data file on a laptop; I did not nor did I have others examine the data more carefully….

Shu and I were the only two of the original five authors explicitly in favor of retraction, and lacking a majority we did not retract the 2012 paper. I now believe I should have independently taken action to push for retraction even without a majority of co-authors in favor of such action.

In sum, I wish I had worked harder to identify the data were fraudulent, to ensure rigorous research in a collaborative context, and to promptly retract the 2012 paper. While I had doubts and raised questions, I believed the responses I received. We reported our failure to replicate the 2012 finding, but I should have argued more forcefully to retract the paper sooner….

Prof. Mazar in her written response to Data Colada:

I recognize now that the severity of the issues discussed in the 2012 and 2020 papers were only the tip of the iceberg. In retrospect, we all should have done a better job of scrutinizing the data before the first submission to PNAS in June 2012. This whole situation has reinforced the importance of having an explicit team contract, that clearly establishes roles, responsibilities, and processes, and of properly sharing and archiving the digital trail of the work done. …

Prof. Gino in her written response to Data Colada:

Being notified about this issue was exceedingly difficult. I start all my research collaborations from a place of trust and assume that all of my co-authors provide data collected with proper care and due diligence, and that they are presented with accuracy. … I did not have any suspicion about the quality of the data at the time we published the paper. …

In July 2020, PNAS reached out to us after a reader asked whether we intended to retract the 2012 PNAS paper. I regret not taking a stronger stance in support of that decision. My logic was the following: unless we believed the data were problematic, the proper course of action was, in my mind, to leave the paper on record and demonstrate that the results failed to replicate. Science moves forward only when we correct the record. …

I will approach all future projects with more diligence and attention, no matter how much respect and trust I have for the people I work with.

In your daily job, you probably have experienced something like this too. There are always folks in the middle who notice something is wrong.

How often do you face situations where you come across data or information that doesn’t add up? Are you asking enough questions? Are you blindly trusting superiors? Are you satisfied with the answers? It is tempting to accept a weak reason to satisfy a customer, your manager or the CEO.

In my two worlds of insurance and journalism, I know firsthand about the pressures to meet a deadline, satisfy the client, secure the company bonuses, sign the clean statement of opinion.

For a prior publication, in the wake of the Enron debacle, I wrote a piece about rating agencies that said they relied on inaccurate information from auditors, and about the everyday pressures that analysts, auditors and actuaries face to look the other way when they aren’t coming to an answer that they or their clients want to hear. “In this business, it’s easy to trade the risks that unseen policyholders, employees and other stakeholders might get hurt at some time in the future for today’s rewards,” I wrote at the time. (“The Pressure to Deceive: Confessions Of A Former Auditor,” National Underwriter/PC360.com, April 29, 2002, registration required)

You don’t have to be an actuary to come up with your own examples.

In the situation I described today, the rewards were academic celebrity and the validation of a hypothesis the honesty researchers were sure was correct.

When you find yourself in a similar position in the days and months ahead, remember your reactions to reading what the honesty researchers said in their defense. Did they do enough?

Write down your explanations for waving away your own questions about whatever it is you’re involved in. Do they sound more convincing?

In the bigger picture, many of us, for our own health and safety—and for the good of the environment—have been questioning the role of science in helping us to make risk-based decisions.

“We have worked on enough fraud cases in the last decade to know that scientific fraud is more common than is convenient to believe, and that it does not happen only on the periphery of science,” the Data Colada bloggers wrote.

Who can we trust if not some of the world’s greatest minds?