Wednesday 24 September 2008

Random Discontents: III

I appreciate Emmerich and Dan's spirited defence of RCTs, and having spent some time with CMF and J-PAL and the brilliant academics we worked with, my impressionable mind has been reasonably impressed. The C-GAP blog, Rodrik's paper and the responses to it have, in my opinion, been largely academic, evaluating the costs and benefits of running these experiments. I will take more of a layman's stance here.

On innovation, yes – RCTs are amazing! Imagine using a scientific trial to evaluate social programs - breaking the myth that in social programs, all we can rely on is our gut feel, years of experience and, maybe, qualitative studies. The studies take time – that’s no issue. Any good study will and should take time. The studies are costly – no problem, as long as there are donors with research funding. RCTs exclude some people from benefits for some time – no problem, no organization can serve everyone at the same time and tricky field issues can be resolved with some clever design.

Where I have a problem is the way in which RCT researchers promote the method as the ultimate tool. Social experiments, which we call development interventions, when tried in a particular context, lets say Orissa, and succeed are not these days acknowledged as successful enough, if they cannot be proven to be replicable. The romance with scale, replication and cost-effectiveness has brought damning judgements on many development projects, and many promising interventions have lost out on funding because of their seemingly non-replicable nature. Academic investigations, however, seem to escape this scrutiny. To be fair, this is not just true of RCTs; it also holds for other kinds of studies that attempt to judge social programs. Why would an MFI in Bihar, for instance, modify its operations because a study in Hyderabad puts out a certain result?

Secondly, researchers need to internalise that most practitioners are a different breed of people, unlike themselves and some academics-turned-policymakers. These people have their eyes and ears on the ground and are also exposed to a large amount of information that comes their way regarding different organisations and their programs in different parts of the country. From what I have seen, RCTs almost seem to assume that their study examines a problem from scratch and in isolation. Thus, when RCTs are promoted using results that the initial rounds of the study throw up, one needs to exercise more restraint. For example, in the evaluation of an intervention where a south Indian MFI bundled a micro-insurance product with its micro-credit product, the researchers said there was no evidence that introducing the insurance product in any way adversely affected the composition of the MFI’s clients. In another study, on repayment schedules in a state in east India, researchers found that monthly repayment schedules did not seem to increase default rates. Both these studies have been widely publicized and, I can imagine, must have amazed (and horrified) many practitioners who attended the conferences and seminars where they were presented.

However, the forcefulness of these conclusions tends to gloss over the fact that most of these are only preliminary findings; worse still, they look at only one small component of a program over a pre-defined period of time. It is obvious that human relationships change with time – my relationship with my banker will change as soon as I discover there are hidden charges that were not explained to me previously, and it will dramatically improve as soon as I am told that, as a reward for my excellent credit history, my subsequent loans will cost me less; my relationship with my insurance company will change if it takes them 3 weeks to process my claim, after having hassled me for over a week about proper documentation in presenting my claim – meaning, my behaviour with an organization/program with which I have a transactional relationship is dynamic. My social networks, in turn, shape the nature and the extent of my influence on others regarding that particular organization/program.

Also, the artificial separation of the organization from the program simply refuses to convince me. Especially in contexts where programs are made or marred by those implementing them, this separation is quite inexplicable. Unless we standardize implementers all over the world, we cannot study them the way RCTs propose. These studies spend a considerable amount of time and expertise determining sample sizes, emphasizing the law of large numbers; how then would the same law work when it comes to the number of experiments? How many experiments would I need before I can say that I have covered all types of organizations in India and have an answer for the standard prototype? I cannot even hope to achieve it for a state, let alone a country or a continent.

I firmly believe one must temper the ‘conclusions’ that these studies yield. It is true that governments and practitioners indulge in rhetoric, and without some good rhetoric they wouldn’t survive. However, academics, in my opinion, ought to desist from such strong posturing – even if all this is being done in an attempt to legitimize a particular methodology and out of the conviction that this is “the way”. Results that are promoted as methodologically sound to a bunch of practitioners and policy makers, and that could later be revised, corrected or retracted (on studying the same program location for a longer duration, or on studying multiple locations and realizing that many of the previous results are actually attributable to non-replicable factors), erode the credibility of the researcher; and in this case, since the methodology is promoted as the infallible hero, it is the RCT that is likely to take a beating. A far more modest study would have looked at a bunch of MFIs at the same time and remarked that in a majority of cases, insurance schemes did seem to be doing badly, and that one could possibly find the reasons by looking into how programs are being managed and implemented by each organization. By looking at anthropological and sociological accounts of the same population over time, some more inferences can be drawn as to why these events happen as they do. Not the most scientifically accurate study, but one that is possibly more representative of reality than what one RCT will reveal.


Having proven that RCTs can study a snapshot better than any other technique can, we need to hear from their proponents how they can be made less costly and easier to handle – so that we can have multiple rounds of the same experiment, in multiple locations at the same time. With this, we will also need to know how to integrate the existing knowledge of an area, its people and its interventions. Also, if RCTs are like incisions into a program/organisation’s body, we have to see how they can be made as painless and non-disruptive as possible. This will give them some further credence, and then the non-standardization issue can be approached (and solved, I am sure, using some complicated equation). As I see it, converting socio-economic impacts into solvable mathematical equations is not the final frontier; being able to answer the ‘why’ and ‘how’ and attaching the same to the estimated impact probably is…

Finally, as Dan points out, the debate on how good RCTs are is probably premature. I got caught up in this since I was an ‘insider’. If I were not, there is little chance I would worry too much about these questions, especially about which study was better than the other. I would just worry in general about any evaluation and its impact on my work in the field…

2 comments:

Selvan Kumar said...

Suvojit,

Thanks for the post, which certainly highlights some of the criticisms of RCTs that have been largely ignored by the academic literature.

However, I have to take issue with one of your points.

You state that "Social experiments, which we call development interventions, when tried in a particular context, lets say Orissa, and succeed are not these days acknowledged as successful enough, if they cannot be proven to be replicable." You suggest that the questionable external validity of RCTs means that they are not "replicable", and that they should therefore be subject to the same criticisms levelled against other non-replicable projects.

But this is hardly a criticism of RCTs. Quite the contrary, the type of micro-development work done by Poverty Action Lab, IPA, and other organizations that rely on randomization has clearly shown that "one-size-fits-all" approaches to development are thoroughly misguided. It is the innovative and careful use of RCTs that has helped development researchers prove just how context-specific the process of development is, and how development interventions must be designed with this in mind to have any chance of success.

Extrapolating the results of RCTs from one context to another is more art than science, and as you correctly point out many of the conclusions they yield cannot be generalized. But by replicating RCTs across a variety of contexts (something you suggest, which IPA is now doing), it will become possible to make broader generalizations that are more empirically sound.

But regardless of external validity, the fact remains that RCTs are the most methodologically sound way to evaluate programs in a specific context. While I wholeheartedly agree with your point that there is too much of a gap between researchers and practitioners (something I have seen firsthand time and again), I'm still not convinced that other evaluation strategies are any better at addressing these issues. If anything, less rigorous methods such as qualitative studies should be viewed as a supplement to RCTs. The insights gleaned from qualitative research can definitely help us interpret results from RCTs and translate them into meaningful policy mandates, but they should not be viewed as a replacement for the rigorous and unbiased information RCTs provide.

Suvojit said...

Selvan, there is a problem with what you say here. It is definitely not true that "It is the innovative and careful use of RCTs that has helped development researchers prove just how context-specific the process of development is, and how development interventions must be designed with this in mind" - that different communities and locations need different solutions is common knowledge, and I am sure no proponent of RCTs would lay claim to having invented this logic!

As you rightly point out, extrapolating these results is possibly more art than science and yet, it is important that scientific answers be replicated - and that's the point I raise towards the end. How do we make it easier? How can we make it more user-friendly? The beauty of a technology lies in making it as widely usable and user-friendly as possible, and that is what I would be waiting to see next from the randomistas of this world.

The pride that RCT proponents take in their methodological perfection is justifiable. However, considering the time and resources required for each project, you should be realistic about how many projects you can evaluate in a reasonable time-frame so as to frame a policy that addresses each social program.