Before we get started, I want to mention that this follow-up to Emmerich’s post was composed before I had seen what he wrote. I suggest reading his post before mine (to check out Part 1, click here or just scroll down a bit).
Because I manage a long-term randomized evaluation (RE), the debate about the merits of this type of study is close to my heart, and I enjoy reading eminent economists and development experts go back and forth attacking and defending the value of such studies. It is fun to get caught up in intellectual disputes where the adversaries have ground to stand on. (Some might say that I am making too much of the disagreement, but it seems to me that Dani Rodrik and Abhijit Banerjee and Esther Duflo have strongly contrasting views.)
But when I take a step back from the discussion, it all seems quite premature. Only in the last ten years have a large number of randomized evaluations of development programs been conducted. And among those, only a handful have both statistical power and major policy implications. The impact of randomized evaluations will take years to understand, both in how greatly the results from these studies will affect policy and in whether these results can lead to a new paradigm in how we think about development.
I like to think of international economic development as a large corporation like the old RJR Nabisco (a cigarette and snack company). Randomized evaluation is a new product in which the company is investing a fair amount, with the understanding that it is a calculated risk. It might turn out to be a grand failure like RJR’s smokeless cigarette, or it may well turn out to be a massive success like the Animal Cracker. I tend to believe that randomized evaluations will turn out to be quite fruitful in developing and promoting effective programs, but either way, the intellectual foundation for conducting these types of evaluations is sound, so it seems a sensible risk.
As I have discussed on this blog, my personal hope for randomized evaluations is that they demonstrate the importance of public funding for public goods, as opposed to wasteful attempts to create “sustainable” markets for goods that households perceive as having little value. But I have no idea whether the variety of randomized evaluations that have convinced me to take this position will also convince policy makers and philanthropists (many of these papers are highlighted in this Kremer and Holla paper).
Will Jessica Cohen’s fascinating and oft-quoted study on bednet usage affect the way bednet distribution is conducted in the future? Will the J-PAL study on the impact of de-worming drugs on schooling actually lead to huge de-worming projects funded by governments or major philanthropic institutions?
In his post, Emmerich highlights some reasons for optimism on this front.
I believe/hope it will, but I am not positive about how open microfinance policy makers are to conclusions that contradict some of their most closely held beliefs on the importance of weekly meetings and joint liability for microfinance. Policy makers may simply not believe in the external validity of a study documenting a certain effect (and they may be right).
Of course, the impact of REs is not simply up to chance. How the results are promoted and explained to stakeholders will make a large difference in their influence. In the recently published Banerjee and Duflo article that I link to in the first paragraph, the authors speak of the power of the relationship created during REs between the experimenters and the institution whose program is being evaluated, and how this relationship can develop into intimate cooperation. In my experience, developing this type of bond between researcher and institution is the exception and not the rule, but perhaps this can change if researchers begin to put more emphasis on this relationship.
They conclude the paper by highlighting the importance of developing research in concert with policy makers so that they are invested in the results. This is a promising tactic, and I think it would go a long way towards dealing with some of the criticism targeted at REs.
This blog post was originally supposed to be a response to Evelyn Stark’s list of issues that “elicited the most concern with randomized control trials,” and my job was to respond to her last three comments (I apologize to Emmerich for digressing). So here are my responses:
Stark: “Time – results are measured over a relatively long periods (at least 1 year) and therefore the inability to make changes to the product/service in that time”
Randomized evaluations do generally take a great deal of time to produce statistics. It is not just the interventions that take time but also collecting and cleaning the data. The suggestion behind this complaint seems to be that by using simple observation and operational data one could reach conclusions more quickly. But in my experience, randomized evaluations do not prevent researchers from using observation and operational data. To give one example, conducting an evaluation of SKS’s health insurance program did not stop the researchers from publishing a toolkit or stop SKS from further rolling out the program once they believed it was successful.
Stark: “Relative merit (time, cost & results) to good market research and product piloting seemed unclear, or negative.”
The majority of this post was spent responding to this issue. Put simply, I don’t think proponents or detractors can say with any certainty whether randomized evaluations have more merit than “market research and product piloting.” It will take years of analyzing the impact of REs to draw any conclusions.
Stark: “Limited by the specificity of the research question; unable to probe nuance and context or make results replicable across programs in different contexts.”
I would be interested in an expansion on what Stark’s colleagues meant by this, because I don’t see why this is true. Why can’t researchers conducting REs delve into nuance? I believe that we certainly do this in the study that I am currently involved in, and I believe all my colleagues at the Centre for Micro Finance do the same. And how can other types of evaluators “make results replicable across programs in different contexts” any better than randomized evaluators can? I hope I am not beginning to sound overly defensive, but these issues sound more like problems with the concept of structured evaluation itself than problems with randomized evaluations specifically.
We are eager to hear others’ reactions to our opinions and the issues Stark raises, even if it is too early to discuss. :)