How RCTs Lie

Cartwright (2007) has argued that, contrary to popular opinion, the RCT is not a gold standard. She is right, but she overlooks many of the most interesting reasons for her conclusion.

Big (and nicely philosophical) question: WHAT are RCTs meant to be the gold standard for? Obviously for getting the truth in some area; but which area? If it’s some sufficiently narrow area, then OK; but it’s going to have to be a very narrow area. I can write a general section on this at the beginning, and at the end of the paper I can argue that one such area might (with some caveats) be hospital-based studies of typical (in some vague sense) pharmaceuticals. (JG)

see arguments below from Grossman and Mackenzie (2005); need to reword these, expand some of them and delete others

make much more of a big deal about Intention to Treat (NT)

maybe something about Bayesianism: philosophers generally help themselves to Bayesianism (e.g. Bayesian decision theory; Bayesian normative probability arguments) when it suits them, but RCTs are analysed in a way which is strictly incompatible with Bayesianism (JG expand?)


---

The following is not much changed from Mackenzie and Grossman 2005, so whichever bits we want to keep need to be re-worked.

The claim that randomization is enough to bring the scientific method into clinical trials gives far too much credit to randomization as a technique of bias adjustment. The assumption that randomization always reduces bias creates what Horwitz (1987) terms “the illusion of homogeneity,” which results in a failure to consider the numerous other factors that can go wrong (Howson and Urbach 1993; Vandenbroucke 1998). (See also Grossman in preparation on what “bias” means.)

Sample size

Randomizing is completely futile when the sample size is small compared to the number of variables. If there is one take-home message from the technicalities we mention in this paper, this is it. It is irrational to expect that small trials will have balanced variables (D’Agostino and Kwan 1995; Evidence-Based Medicine Working Group 2002; Kunz and Oxman 1998). On the contrary, we should expect small trials to be much less well balanced than if the balancing had been done by hand. This can easily be seen by example. Suppose we have four trial subjects, two male and two female, and we wish to test the contraceptive properties of a new drug believed to be relevant to hormone metabolism. We would be well advised to put one of our males and one of our females in the intervention group and one of each in the control group. We can do this . . . provided that we don’t randomize. If we randomize, there is a 50% chance that both females will end up in one group and both males in the other. That’s the chance of going wrong in a very small study with only one variable that needs balancing. If we are trying to balance two dichotomous variables, the chance that randomization will get them both right is only 25%, and so on.
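The 50% and 25% figures can be checked by brute-force enumeration. This is a minimal sketch, assuming simple per-subject randomization (an independent coin flip for each of the four subjects); the subject labels and the choice of a second variable are illustrative:

```python
from itertools import product

# Four subjects: two male, two female.
subjects = ["M1", "M2", "F1", "F2"]
# All 2^4 equally likely coin-flip allocations (0 = control, 1 = intervention).
assignments = list(product([0, 1], repeat=4))

def balanced(assign, pair):
    # A dichotomous variable is balanced when its two carriers are
    # split one per arm.
    a, b = (assign[subjects.index(s)] for s in pair)
    return a != b

# Chance the two females are split one per arm: 50%.
sex_ok = sum(balanced(a, ["F1", "F2"]) for a in assignments) / len(assignments)

# Add a second dichotomous variable, say age, with M1 and F1 'young':
# the chance that BOTH variables come out balanced is only 25%.
both_ok = sum(
    balanced(a, ["F1", "F2"]) and balanced(a, ["M1", "F1"])
    for a in assignments
) / len(assignments)

print(sex_ok, both_ok)  # 0.5 0.25
```

Each extra variable to be balanced roughly halves the chance that randomization gets everything right, which is the sense in which small randomized trials should be expected to be badly balanced.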

Treatment allocation

Support for randomization is often based on inadequate information about alternative forms of treatment allocation that could be used to balance variables. For example, the Evidence-Based Medicine Working Group (2002) tries to show the superiority of randomization as a bias adjustment method by suggesting that the only alternative is the allocation of patients into groups by a clinician who is personally interested in seeing those with the greatest need given the test treatment. Aside from the fact that, as Worrall (2002) points out, this is specifically a problem with blinding, the Working Group’s suggestion is a ridiculous straw man. There are many other ways of allocating subjects to groups, some of which, in some cases, are almost certain to give more accurate results than randomizing. There has been very little work on non-random allocation methods, precisely because of the dominance of RCTs, but that can and should change.

Intention to Treat

[Neil to expand this] An intention-to-treat analysis is one in which the two groups that are compared are not the group that has received treatment versus a control, but the group that was originally intended to receive treatment versus a control. Intention-to-treat analysis in a sense takes into account the proportion of subjects who will not, in the real world, take the treatment that they are advised to take. According to the Evidence-Based Medicine Working Group (2002), failure to follow the intention-to-treat principle “defeat[s] the purpose of randomization.”

An intention-to-treat analysis on its own is relatively useless. This is most easily shown by example. An experimental weight-loss program recruits 100 subjects, 50 of whom are randomized into a control group. Of the other 50, only 10 complete the program. An intention-to-treat analysis would (by definition) work out average results for all 50, including the 40 who did not complete the intervention: consequently, its estimate of the program’s effectiveness is likely to be very low indeed. Up to a point, this low estimate of the program’s effectiveness is intentional: there may have been something about the program that made study participants drop out, and if so that should count as a black mark against it. Alternatively, it may be that only a few of the subjects were motivated to take the program seriously. The intention-to-treat analysis does not enable us to distinguish between these two possibilities. Now suppose that the program is adopted by a health organization and marketed. What effectiveness should be quoted? Its average effectiveness among 50 people, only 10 of whom received it? Or its effectiveness among the people who actually received it (sometimes called its “efficacy”)? The latter figure (or perhaps both) is the one which prospective clients of the marketed program need to hear; but if only an intention-to-treat analysis has been done, only the former figure is available.
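The gap between the two figures is stark. A minimal sketch of the arithmetic, with hypothetical per-subject numbers (a 5 kg loss for completers, no loss for dropouts):

```python
# Hypothetical numbers for the weight-loss example above.
n_treated, n_completed = 50, 10
loss_if_completed = 5.0   # kg lost by each subject who completes (assumed)
loss_if_dropped = 0.0     # dropouts lose nothing (assumed)

# Intention-to-treat: average over everyone randomized to the program,
# including the 40 who never completed it.
itt = (n_completed * loss_if_completed
       + (n_treated - n_completed) * loss_if_dropped) / n_treated

# 'Efficacy': average over the subjects who actually completed the program.
efficacy = loss_if_completed

print(itt, efficacy)  # 1.0 5.0
```

On these (made-up) numbers the intention-to-treat figure is one fifth of the efficacy figure, and nothing in the intention-to-treat analysis alone tells a prospective client which figure is relevant to them.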

Time-Frame Problems

If the amount of time between consecutive events on which the outcome measures for a particular study depend is long or unpredictable (for example, occurrences of rare exacerbations in asthma treatment studies, as discussed in Cleland, Thomas, and Price 2003; other examples are discussed in Black [1996]), a standard RCT is inadequate to obtain certainty of measurement. The reduced rigidity and increased simplicity of observational studies make them much more suited to such a scenario, as they can cover more bases and do not require the maintenance of an ongoing structural framework. At the other end of the spectrum, when a study is to be conducted in an area where advances are occurring rapidly, such as research into antiretroviral treatments for HIV, the use of observational databases is a preferable option to avoid the results being out of date by the time the study is complete (Phillips et al. 1999).

Additional Problems with RCTs in Social Contexts

It has been widely acknowledged that it is not always possible to conduct RCTs in social contexts (Black 1996; D’Agostino and Kwan 1995; Evidence-Based Medicine Working Group 2002; Levin et al. 1997; Mackenbach and Howden-Chapman 2003; Moses 1995; Pocock and Elbourne 2000; Rychetnik et al. 2002). It is obvious, for example, that to assess the negative effects of smoking, one can neither randomize nor blind participants, and Thomson et al. (2004) discuss an investigation into the effects of income supplements on health, where performing an RCT was found difficult to justify ethically. When this point is conceded by RCT advocates, however, one will inevitably find an accompanying note stating that the alternative observational or historically controlled trials that must be done instead are “suboptimal in comparison to an RCT” (D’Agostino and Kwan 1995, p. AS96) and therefore must be carried out only if there is absolutely no way to do an RCT. This is not an adequate response to our arguments. First, it seems only logical to say that whenever an RCT is impossible, it is, in that case, an inferior study design; if an impossible study design can count as the best one, then we might as well say that magical divination of the true properties of a drug is the best study design. Second, and more importantly, even when an RCT is feasible, it may not be the best study design. [JG to fill in why not?]

There are several reasons why the RCT is particularly unlikely to be the best study design in social contexts.

Bleeding of effects from the intervention group into the control group

When the subjects of an experiment are free-living humans, the effects of the study treatment can bleed from the intervention group into the control group, destroying the comparison between the groups. For example, subjects in the active arm of a drug trial sometimes share their drugs with friends in the control group. This can happen even in a blinded study, either because the sharing goes both ways, or because the subjects are able to tell which pill is the active one by its medical effects. In this case, an intention-to-treat analysis, as mandated by orthodox RCT methodology, is inappropriate. Often, the best solution in such cases is to analyze the study’s results according to the drugs actually taken by each subject, rather than by the drugs delivered to each subject. This may well be easiest to do if the study is deliberately not blinded, because in a blind study the subjects have an extra incentive to lie about their actions, for fear of being seen to have broken the blinding. Another way to avoid this effect would be to use a historical, non-randomized control group. The details of the best methodology will depend on the details of the social interactions of the subjects with the researchers. The necessity of taking these interactions into account argues against the automatic privileging of a particular simple research methodology such as the RCT.
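A back-of-the-envelope sketch of how contamination dilutes the group comparison, with hypothetical numbers (a true effect of 10 points and 30% of controls obtaining the drug):

```python
# Hypothetical figures: the drug raises the outcome by 10 points, and
# 30% of the control group obtains the drug from friends in the active arm.
true_effect = 10.0
contamination = 0.3  # fraction of controls actually taking the drug

treated_mean = true_effect                   # everyone in the active arm benefits
control_mean = contamination * true_effect   # contaminated controls benefit too

# The groups-as-randomized comparison understates the true effect.
itt_estimate = treated_mean - control_mean
print(itt_estimate)  # 7.0, versus a true effect of 10.0
```

An analysis by drugs actually taken would recover the full 10-point difference here; the groups-as-randomized comparison cannot, however large the trial.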

Disincentives to participation

Randomization can provide disincentives to participation, which can cause problems on two levels. First, the requirement for studies to take the form of an RCT may make some topics impossible to study. For example, new treatments are frequently considered better than old treatments, even though this superiority may not have been scientifically established. In this case, patients have an extreme disincentive to enter a randomized study comparing the two treatments, and even to ask them to enter may be unethical (Little 2003; Miller and Weijer 2003). Such treatments can be compared more fruitfully using other methods, such as historical (non-RCT) control groups. Second, the benefits of randomization are diminished or extinguished when the subjects who drop out of the intervention group are not identical to the subjects who drop out of the control group. For example, try asking a community of sex workers who make money from unprotected sex to accept randomization into a safe-sex program. The resulting study won’t be representative because there will be differential drop-out: some of the subjects who are randomized to the safe-sex arm will drop out rather than lose income. Not only are the subjects who drop out generally not identical to those who don’t; there are reasons to believe that in many studies they will exhibit major systematic differences. For example, in a drug trial, subjects who drop out of a treatment group (but not a control group) will often be those least able to cope with drug toxicity or other side effects. Statistical methods can minimize the damage caused in these cases, but only if the differential dropout is recognized as such. Dropout from RCTs is almost always treated as random, as required by intention-to-treat analysis, and this should be expected often to lead to substantially incorrect results. This problem is avoided in (some) non-RCT studies in which variables related to dropout are explicitly modelled throughout the analysis.
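A minimal sketch of the drug-trial case, with hypothetical numbers, showing how non-random dropout from the treatment arm biases a completers-only comparison (an analysis that treated the dropout as random would miss this entirely):

```python
# Hypothetical population of 10 subjects with toxicity 'sensitivity'
# evenly spread from 0 to 9.
sensitivity = list(range(10))

# Untreated outcome is 50 for everyone; the drug adds 10 points of benefit
# but subtracts 2 points per unit of sensitivity.
untreated = [50] * 10
treated = [50 + 10 - 2 * s for s in sensitivity]

# True average effect over the whole population.
true_effect = sum(treated) / 10 - sum(untreated) / 10

# Subjects with sensitivity >= 7 cannot tolerate the drug and drop out of
# the treatment arm (but would have stayed in a control arm).
completers = [t for t, s in zip(treated, sensitivity) if s < 7]
completer_effect = sum(completers) / len(completers) - sum(untreated) / 10

print(true_effect, completer_effect)  # 1.0 4.0
```

On these (made-up) numbers the completers-only comparison inflates the effect fourfold, precisely because the subjects who dropped out were systematically those who fared worst on the drug; modelling the dropout variable explicitly is what recovers the true figure.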