On "Liberating Evaluation from the Academy"
I like Richard Hahn's argument that we should put the tools of causal research in more hands, especially in the hands of public servants
In January, a few different people forwarded me the same social science paper with a note to the effect of "you should read this." I'm not known for my penchant for reading academic papers of any kind, but it seemed these people knew something I didn't. They were right. I devoured it.
The paper was "Cause, Effect, and the Structure of the Social World" by University of Virginia professor Megan Stevenson. It's controversial because she looks across social science research in the criminal justice space and concludes that, measured by the bar of randomized controlled trials, basically none of the interventions studied really works. But it's not a bleak paper. It's a call to reconsider how we think about and measure social change. And it deeply resonates with the work of integrated policy-delivery teams in UK government who are calling out that we can't keep pretending we live in a complicated world, where mechanistic thinking can solve problems as long as the solutions are sufficiently complicated. We have to accept that we live in a complex world, where we'll only know whether something works by trying it in real-life situations and observing the results. That has HUGE implications for policymaking... and of course delivery.
I expanded on those thoughts in an essay, as part of a collection of essays, all responding to Stevenson, published by the Niskanen Center (where I am a senior fellow) and Vital City. My contribution is here, but the one I want to comment on is Richard Hahn’s. Hahn[1] is a senior policy analyst at Niskanen, and his take is that the correct response to Stevenson’s claims is to free the tools of policy evaluation from academia and to demystify and democratize their use, especially by practitioners in government. “The solution is not to abandon the economic tools that make causal research possible,” Hahn writes, “but instead to put them in more hands and apply them to smaller problems — a democratization of knowledge generation that could itself be a big social change.” His argument ties into three points from my own work (points I touched on in Recoding America): the ways highly specialized domains lose the forest for the trees, the outsized returns from building internal capacity within government and upskilling public servants, and the value of tight build-measure-learn cycles.
Hahn starts with a well-stated caveat about evaluation in general, which I’m sharing here mostly because the metaphors are so good that I intend to borrow them and you might want to too.
Making a causal claim — the central goal of economic evaluation — means isolating the measurable effect of a certain cause by minimizing the chance of alternative explanations. Therein lies the majesty of economics. Correctly applied, economic techniques of measurement — econometrics — paint highly accurate portraits of the relationships between specific causes and effects. But the paintings are necessarily miniatures. Expanding their canvases to comprehend the breadth of human nature leaves the pictures patchy and pixelated.
Econometrics helps us get to the truths of very small matters. Any attempt to apply these truths to very large matters requires a strong draught of assumptions. But we need not drink this brew — at least not all of us.
With that note of caution out of the way, he dives into a claim about academia’s hold on evaluation that mirrors my complaints about the risk-averse legalism in government culture – that by abdicating the right to interpret law and policy to specialists, we’ve ended up with “small, cramped, professionalized” thinking (to quote Ezra Klein). In Recoding America, I tell the story of a service designer in a federal agency who, when faced with needlessly complex requirements born of fussy, overly literal interpretations of law, says “I get that it’s complicated. But it needs to make sense to a person.” (And she has to repeat that daily in order to tame the requirements and be able to build a service that’s usable by real people.) In insisting on this principle, this service designer is claiming a right that law scholar Larry Kramer defends under what he calls “popular constitutionalism”: courts were never considered the final word on interpretation of the Constitution until the late twentieth century, and the public was supposed to have a meaningful say about what laws meant, independent of legal interpretation. Hahn suggests something like a more popular right to interpret not law, but data.
…most good scholars are transparent about their assumptions and uncertainty. It causes harm only when we rely almost exclusively on academics to identify and test policy solutions. In my experience as a program evaluator and policy analyst for governments at all levels, I have seen that the econometric tools now nearly monopolized by academic social scientists might be properly liberated and shared with the very people who make and implement social policies.
Perhaps freeing legal interpretation from the exclusive reach of lawyers and other policy specialists is easier: all you have to know how to do is read, and to follow the thread when one law or policy points back to another.[2] Hahn is pointing to the need to learn new technical skills, but he sees how well positioned non-academic public servants might be to do this well:
In some ways, the people on the ground are actually at an advantage. Academics who evaluate policies from the outside have to make strong assumptions about how policies are implemented and how data are collected, but practitioners don’t have to make the same leaps, because they are doing the work. Practitioner-led research is rarely perfect, or even publishable, but the very act of questioning and systematically measuring the outcomes of policies is itself a huge step forward.
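To make that concrete, here is a minimal sketch of one of the workhorse causal-inference tools a practitioner team could pick up: a difference-in-differences regression. To be clear, this example is mine, not Hahn’s, and the data file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per site per period, with the outcome we
# care about (say, a program enrollment rate), a flag for sites that
# adopted the new policy, and a flag for observations after the rollout.
df = pd.read_csv("site_outcomes.csv")  # columns: outcome, treated, post

# The coefficient on treated:post is the difference-in-differences
# estimate: how much more the outcome changed in adopting sites than in
# comparison sites over the same period.
model = smf.ols("outcome ~ treated + post + treated:post", data=df).fit()
print(model.summary())
```

The point is less the particular technique than that it fits in a dozen lines: a team that runs the program, and therefore already understands how the data were collected, can produce and interrogate an estimate like this themselves.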
I would add to this argument that while practitioner-led research is rarely perfect or publishable, it will be faster – much faster. And there is enormous value in this speed. Skipping the long peer-review and publishing cycle means what’s learned can be acted on immediately. In the tech world, we often improve communications and forms through A/B tests. We send multiple versions of an email (one trying to get students to enroll in advantageous loan repayment programs, for example) to a fraction of the list and measure the response rates. The version or versions that get the best responses are then mailed to the remainder of the list – usually later that week. This goes terribly awry when someone in a compliance role mistakes this practice for something requiring academic rigor and prevents the team from acting on the results until they’ve gone through some imagined form of review, including ethics review. If you know which email does better, it’s unethical not to send the higher-performing version soon thereafter. The email content will be stale and the results invalid by the time the review process concludes, and in the meantime, more people will suffer from lack of access to the program. The kinds of research Hahn is talking about are more sophisticated than an A/B test, I presume, but the value of keeping your measure-learn-act cycle fast and tight applies.
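For readers who haven’t seen one, the whole A/B flow is simple enough to sketch in a few lines. Everything here is hypothetical: send_email and response_rate stand in for whatever mail system and response tracking a real team would use.

```python
import random

def run_ab_test(recipients, variants, test_fraction, send_email, response_rate):
    """Send each variant to an equal slice of a small random test group,
    then send the best-performing variant to everyone else."""
    recipients = list(recipients)
    random.shuffle(recipients)
    n_test = int(len(recipients) * test_fraction)
    test_group, remainder = recipients[:n_test], recipients[n_test:]
    per_variant = len(test_group) // len(variants)

    rates = {}
    for i, variant in enumerate(variants):
        group = test_group[i * per_variant:(i + 1) * per_variant]
        for person in group:
            send_email(person, variant)
        # In practice you'd wait a day or two before measuring responses.
        rates[variant] = response_rate(group)

    winner = max(rates, key=rates.get)
    for person in remainder:
        send_email(person, winner)
    return winner, rates
```

The design choice that matters is the last loop: the learning is applied to the rest of the list the same week, which is exactly the step a misapplied review process blocks.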
Hahn doesn’t directly address how to bring the skills needed for policy evaluation to a broader set of public servants, though he does refer to them at one point as “basic math,” suggesting that it shouldn’t be all that hard. I love the way he demystifies something that should be broadly accessible, but he may be underestimating what it takes for new practices to take hold: government job descriptions and lists of duties can be quite specific, and adopting new approaches often relies on personal motivation rather than structural incentives. But what he’s suggesting fits neatly with calls to stop infantilizing government by outsourcing expertise, knowledge, and tasks that should be considered “inherently governmental.” Increasing policy teams’ ability to measure outcomes isn’t a matter of saving money on outsourced consultants; it’s a way to speed learning and improve outcomes, where the social benefits will dwarf any budget-driven cost-benefit analysis.
The irony of all this discussion of measurement is how hard it is to measure the returns on investing in public servants (and the policy advocates who work with them from outside government). But I’m here for any voice that calls for that investment. When I testified in front of the Senate Homeland Security and Governmental Affairs Committee a few months ago, one press outlet summarized the message from me and my co-panelists as “Capacity, capacity, capacity.” I’ll keep saying that, and elevating others who say the same.
[1] According to my own personal style guide, I refer to people by their first names if I know them well enough that it feels weird not to. Mr. Hahn works at Niskanen, but I don’t know him, though I may have been in a large meeting with him the other day. So we’re not on a first-name basis.
[2] As I proofread this sentence, it sounds a bit insulting to lawyers. That’s not how I mean it. I rely on lawyers for all sorts of things, and I respect and value their expertise. “Some of my best friends are lawyers” — well, that’s not going to get me off the hook! What I mean is that law is indeed open to interpretation, all the time, as evidenced by how differently lawyers in different agencies interpret the same statutes. As a public servant, it often feels like you’re not so much subject to the law as you are subject to the whims of competing teams of counsel in various parts of your agency and across other agencies and offices with a stake in your work. They interpret law and policy differently because they infer different goals — if the goal is zero risk of criticism or lawsuit, or the technically most accurate interpretation, you will get a different answer than if the goal is to enable services that make sense to people and get the desired outcome. In that sense, I am encouraging non-lawyers to also read the law and ask whether a different interpretation might better serve a goal, if that non-lawyer thinks that goal is important. And I am forever grateful to and inspired by lawyers who see the bigger picture of what’s at stake and work towards those higher goals.