People want to believe they have good instincts, but when it comes to hiring, they can’t best a computer. Hiring managers select worse job candidates than the ones recommended by an algorithm, new research from the National Bureau of Economic Research finds.

Looking across 15 companies and more than 300,000 hires in low-skill service-sector jobs, such as data entry and call center work, NBER researchers compared the tenure of employees who had been hired based on the algorithmic recommendations of a job test with that of people who’d been picked by a human. The test asked a variety of questions about technical skills, personality, cognitive skills, and fit for the job. The applicant’s answers were run through an algorithm, which then spat out a recommendation: Green for high-potential candidates, yellow for moderate potential, and red for the lowest-rated.

First, the researchers confirmed that the algorithm works, in line with what previous studies have found. On average, greens stayed at the job 12 days longer than yellows, who stayed 17 days longer than reds. Median tenure in these jobs isn’t long to begin with: about three months. “That’s still a big deal, on average, when you’re hiring tens of thousands of people,” said researcher Mitchell Hoffman, an assistant professor of strategic management, calling the extra few weeks the algorithm bought a “modest or significant improvement.”

Often hiring managers, possibly because of overconfidence or bias, don’t listen to the algorithm. Those cases, it turns out, lead to worse hires. When, for example, recruiters hired a yellow from an applicant pool instead of available greens, who were then hired at a later date to fill other open positions, those greens stayed at the jobs about 8 percent longer, the researchers found. The more managers deviated from the testing recommendations, the less likely candidates were to stick around.

Recruiters might argue that they make these exceptions to hire more productive people, even if those hires don’t stay as long at the job. The numbers suggest otherwise. For six of the 15 companies, the researchers measured productivity, such as the number of calls completed per hour, amount of data entered per hour, or number of standardized tests graded per hour. The exceptions to the algorithm did no better than their peers. “There is no statistical evidence that the exceptions are doing better in this other dimension,” said researcher Danielle Li, an assistant professor of entrepreneurship at Harvard Business School. In some cases, she said, the exceptions did worse.

While hiring algorithms have started to gain popularity as a way to reduce hiring and turnover costs and to find employees who fit better within companies, there’s still a tendency to trust one’s gut over a machine. One study dubbed the phenomenon “algorithm aversion.” People can be blinded by bias, however, especially when it comes to hiring. Some hiring managers gravitate to people like themselves; others are just overconfident in their abilities to predict success. “It’s human nature to think that some of that information you’re learning in an interview is valuable,” added Li. “Is it more valuable than the information in the test? In a lot of cases, the answer is no.”