What Researchers Are Thinking When You Tell Us About Some Success You’ve Had

Jeffrey Butts
June 6, 2024

You’ve probably seen researchers roll their eyes when you tell them about a success you had in the public safety field. We don’t mean to be disrespectful — well, ok, some of us do, but mainly we’re reacting to how often we hear unproven claims of effectiveness.

When people really believe in what they do for their community, they can be fooled into embracing data analyses that a trained researcher would see as sketchy. It can be tricky to tell bogus research from credible evaluations. That’s why false findings have such long shelf lives.

I made up some examples to show you what I mean.

When We Hear:
Crime (or violence) is down in our city (or our neighborhood), so we’re clearly doing the right things.

We’re Thinking:
Sure, but you do realize it might be down in other places too, right? Have you looked to see whether your experience actually differs from your neighbors’? It might be a general trend that has nothing to do with your program or policy. You should check.
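
Here is the kind of quick check we mean, sketched in Python with invented numbers (no real city or program is represented):

```python
# Rough comparison with invented numbers: did our city's decline actually
# outpace nearby places that do NOT have the program?
pct_change_in_crime = {
    "Our city": -18.0,     # has the program
    "Neighbor A": -16.5,   # no program
    "Neighbor B": -19.0,   # no program
    "Neighbor C": -15.0,   # no program
}

for place, change in pct_change_in_crime.items():
    print(f"{place:>10}: {change:+.1f}%")

# If everyone is down by roughly the same amount, the regional trend, not
# the program, is the simpler explanation.
```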

When We Hear:
We saw a statistically significant improvement after starting our program/policy.

We’re Thinking:
Um, ok. But what was the actual difference? Based on how many observations? Even a very small difference can be statistically significant if your sample is large enough.
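
To see why we ask, here’s a minimal sketch in Python using simulated, made-up numbers (nothing here comes from a real program):

```python
# Invented example: with a big enough sample, even a trivially small
# difference produces a "statistically significant" p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical outcome scores; the "after" group is better by only
# 0.05 standard deviations -- a difference with little practical meaning.
before = rng.normal(loc=50.0, scale=10.0, size=200_000)
after = rng.normal(loc=50.5, scale=10.0, size=200_000)

t_stat, p_value = stats.ttest_ind(after, before)
cohens_d = (after.mean() - before.mean()) / before.std()

print(f"p-value: {p_value:.2g}")           # far below 0.05 -> "significant"
print(f"effect size (d): {cohens_d:.2f}")  # about 0.05 -> tiny
```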

When We Hear:
Crime dropped by 20 percent after we started our program.

We’re Thinking:
Really? And it never declined by that much before? Crime rates go up and down all the time. You need a change that stands out from those normal ups and downs before you can claim a connection to your program.
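
One quick way to check is to line up the year-over-year changes you already lived through, as in this sketch (the yearly counts are invented):

```python
# Back-of-the-envelope check with invented yearly counts: how unusual is
# a 20 percent drop, given the normal year-to-year bounce?
annual_counts = [512, 430, 488, 390, 455, 470, 360]  # hypothetical totals

year_over_year = [
    (curr - prev) / prev * 100
    for prev, curr in zip(annual_counts, annual_counts[1:])
]
print([f"{change:+.1f}%" for change in year_over_year])

# If drops of 15-20 percent showed up in years with no program at all,
# a 20 percent decline this year is weak evidence on its own.
```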

When We Hear:
Recidivism was lower among our participants.

We’re Thinking:
Gee, that’s great, but recidivism is a weak outcome on which to base your entire argument. A behavior only becomes recidivism through a chain of interactions between people, communities, and bureaucracies like the police, so the measure reflects those systems as much as it reflects your participants. It’s relevant, but it’s not a perfect outcome or indicator of program effectiveness.

When We Hear:
Our program graduates had 20 percent fewer arrests than the state average.

We’re Thinking:
Oh wow, but who does the state average include? Can you construct matched comparison groups? And how do people get involved in your program? Is it voluntary? Do some fail or quit before completing? We would call that sample attrition. Maybe those who stick with the program were inherently more likely to avoid arrest due to other factors, and that’s why your program graduates have fewer arrests, not because of something the program does.
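
Here’s a toy illustration of the attrition problem, in Python with invented participants (the risk scores and outcomes are made up):

```python
# Invented illustration of attrition bias: suppose the people with lower
# baseline risk are the ones who stick with the program.
participants = [
    # (baseline_risk, completed_program, arrested_later)
    (0.20, True, False),
    (0.25, True, True),
    (0.30, True, False),
    (0.60, False, False),
    (0.70, False, True),
    (0.80, False, True),
]

def arrest_rate(group):
    return sum(arrested for _, _, arrested in group) / len(group)

graduates = [p for p in participants if p[1]]

print(f"graduates only:  {arrest_rate(graduates):.0%}")     # looks great
print(f"all who started: {arrest_rate(participants):.0%}")  # less impressive

# The graduates were lower-risk to begin with, so comparing them to a state
# average mixes the program's effect with who selected into finishing.
```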

When We Hear:
We launched the program in 10 neighborhoods of the city. Those areas saw significant improvements compared with matched comparison areas.

We’re Thinking:
Great! That’s a very promising finding. But there are other things to consider. Did you see similar changes in every treatment site, or did outcomes vary? Did you measure each site’s fidelity to the program plan? If the program was responsible for the improved outcomes, researchers need to know how and why it worked. Other communities can’t replicate your success just by giving a program the same name and embracing a few general principles. They’ll need details!
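
The kind of site-by-site look we mean can be sketched like this (in Python, with invented site names and numbers):

```python
# Hypothetical per-site breakdown: a pooled result can hide the fact that
# a few sites drive the whole effect. Numbers are invented.
change_vs_comparison = {        # percent change relative to matched area
    "Site 01": -22, "Site 02": -18, "Site 03": -2, "Site 04": 4,
    "Site 05": -25, "Site 06": -1, "Site 07": -3, "Site 08": -20,
    "Site 09": 1, "Site 10": -2,
}

big_improvers = [s for s, change in change_vs_comparison.items() if change <= -10]
print(f"sites with sizable improvement: {len(big_improvers)} of {len(change_vs_comparison)}")

# If only 4 of 10 sites improved, the next questions are about fidelity:
# what did those four do differently in staffing, dosage, and delivery?
```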

And, one of the worst…

When We Hear:
We did a Randomized Controlled Trial! That’s the Gold Standard! By the time those randomly assigned to treatment completed our program, their outcomes were 30 percent better than the control group.

We’re Thinking:
Those who… What…? Are you reporting outcomes only for treatment group members who finished the program? How many were randomly assigned to treatment but never engaged with it sufficiently? How many started but never completed? Did you report results for the entire treatment group, everyone randomized to treatment, before examining participant selection and attrition? No? OK, so you really can’t call these results an RCT. Sure, most people will believe you, but trained researchers will see the problems. Someone might decide to expose them in the future. Just for fun.
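
For anyone following along, here’s a minimal sketch of the intent-to-treat principle in Python, with made-up outcomes (none of this comes from an actual trial):

```python
# Sketch of the intent-to-treat idea, with made-up outcomes (1 = rearrest).
# In an RCT, you analyze everyone as randomized, whether or not they
# engaged with or completed the program.
treatment = {
    "completed":     [0, 0, 1, 0, 0, 0, 1, 0],
    "dropped_out":   [1, 1, 0, 1, 1, 1],
    "never_engaged": [1, 1, 1, 0],
}
control = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

completers_only = sum(treatment["completed"]) / len(treatment["completed"])
intent_to_treat = sum(sum(group) for group in treatment.values()) / sum(
    len(group) for group in treatment.values()
)
control_rate = sum(control) / len(control)

print(f"completers only: {completers_only:.0%}")   # the flattering number
print(f"intent-to-treat: {intent_to_treat:.0%}")   # the honest RCT estimate
print(f"control group:   {control_rate:.0%}")
```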

Researchers: Do you have other favorites? Please add them in the comments.
