How to randomise
- Douglas G Altman, professor of statistics in medicine (a)
- J Martin Bland, professor of medical statistics (b)
- (a) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF
- (b) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
- Correspondence to: Professor Altman
We have explained why random allocation of treatments is a required feature of controlled trials.1 Here we consider how to generate a random allocation sequence.
Almost always patients enter a trial in sequence over a prolonged period. In the simplest procedure, simple randomisation, we determine each patient's treatment at random independently with no constraints. With equal allocation to two treatment groups this is equivalent to tossing a coin, although in practice coins are rarely used. Instead we use computer generated random numbers. Suitable tables can be found in most statistics textbooks. The table shows an example2: the numbers can be considered as either random digits from 0 to 9 or random integers from 0 to 99.
For equal allocation to two treatments we could take odd and even numbers to indicate treatments A and B respectively. We must then choose an arbitrary place to start and also the direction in which to read the table. The first 10 two digit numbers from a starting place in column 2 are 85 80 62 36 96 56 17 17 23 87, which translate into the sequence A B B B B B A A A A for the first 10 patients. We could instead have taken each digit on its own, or numbers 00 to 49 for A and 50 to 99 for B. There are countless possible strategies; it makes no difference which is used.
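The odd and even rule, and the alternative of splitting the range at 49, can be sketched in a few lines of Python. This is only an illustration of the worked example above: the function names are our own, and the ten numbers are typed in by hand from the text rather than read from the table.

```python
def allocate_by_parity(numbers):
    """Odd numbers indicate treatment A, even numbers treatment B."""
    return ["A" if n % 2 == 1 else "B" for n in numbers]


def allocate_by_halves(numbers):
    """Numbers 00 to 49 indicate treatment A, 50 to 99 treatment B."""
    return ["A" if n <= 49 else "B" for n in numbers]


# The first 10 two digit numbers quoted in the text.
table_numbers = [85, 80, 62, 36, 96, 56, 17, 17, 23, 87]

parity_sequence = allocate_by_parity(table_numbers)
print(" ".join(parity_sequence))                    # A B B B B B A A A A
print(" ".join(allocate_by_halves(table_numbers)))  # B B B A B B A A A B
```

The two rules give different sequences from the same numbers, but each is an equally valid random allocation.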
We can easily generalise the approach. With three groups we could use 01 to 33 for A, 34 to 66 for B, and 67 to 99 for C (00 is ignored). We could allocate treatments A and B in proportions 2 to 1 by using 01 to 66 for A and 67 to 99 for B.
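As a rough sketch, the same ten numbers from the earlier example can be mapped to three groups or to a 2 to 1 allocation. The function names are hypothetical, and we assume that 00 is skipped in the 2 to 1 scheme as well, which the text implies but does not state.

```python
def three_group_treatment(n):
    """01 to 33 -> A, 34 to 66 -> B, 67 to 99 -> C."""
    if n <= 33:
        return "A"
    if n <= 66:
        return "B"
    return "C"


def two_to_one_treatment(n):
    """01 to 66 -> A, 67 to 99 -> B, giving A twice as often as B."""
    return "A" if n <= 66 else "B"


# The two digit numbers quoted in the earlier example.
numbers = [85, 80, 62, 36, 96, 56, 17, 17, 23, 87]

# 00 is ignored in both schemes (an assumption for the 2 to 1 case).
usable = [n for n in numbers if n != 0]

three_group_sequence = [three_group_treatment(n) for n in usable]
two_to_one_sequence = [two_to_one_treatment(n) for n in usable]

print(" ".join(three_group_sequence))  # C C B B C B A A A C
print(" ".join(two_to_one_sequence))   # B B A A B A A A A B
```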
At any point in the sequence the numbers of patients allocated to each treatment will probably differ, as in the above example. But sometimes we want to keep the numbers in each group very close at all times. Block randomisation (also called restricted randomisation) is used for this purpose. For example, if we consider subjects in blocks of four at a time there are only six ways in which two get A and two get B:
1: A A B B
2: A B A B
3: A B B A
4: B B A A
5: B A B A
6: B A A B
We choose blocks at random to create the allocation sequence. Using the single digits of the previous random sequence and omitting numbers outside the range 1 to 6 we get 5 6 2 3 6 6 5 6 1 1. From these we can construct the block allocation sequence B A B A / B A A B / A B A B / A B B A / B A A B, and so on. The numbers in the two groups at any time can never differ by more than half the block length. Block size is normally a multiple of the number of treatments. Large blocks are best avoided as they control balance less well. It is possible to vary the block length, again at random, perhaps using a mixture of blocks of size 2, 4 or 6.
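The block example can be reproduced with a short Python sketch. The first part follows the text exactly, using the digits of the numbers quoted earlier. The second part is our own illustrative assumption of how blocks of random length might be generated by computer: it shuffles a balanced block using Python's standard `random` module, and the function names and the `seed` parameter are hypothetical.

```python
import random

# The six possible blocks of four with two A and two B, numbered as in the text.
BLOCKS = {1: "AABB", 2: "ABAB", 3: "ABBA", 4: "BBAA", 5: "BABA", 6: "BAAB"}


def max_imbalance(sequence):
    """Largest difference between the numbers given A and B at any point."""
    difference = 0
    worst = 0
    for treatment in sequence:
        difference += 1 if treatment == "A" else -1
        worst = max(worst, abs(difference))
    return worst


# Single digits of 85 80 62 36 96 56 17 17 23 87, keeping only 1 to 6.
digits = [int(c) for c in "85806236965617172387"]
kept_digits = [d for d in digits if d in BLOCKS]

chosen_blocks = [BLOCKS[d] for d in kept_digits[:5]]
sequence = "".join(chosen_blocks)
print(" / ".join(chosen_blocks))  # BABA / BAAB / ABAB / ABBA / BAAB
print(max_imbalance(sequence))    # 1


def variable_block_sequence(n_patients, sizes=(2, 4, 6), seed=None):
    """Sketch: balanced blocks whose length is itself chosen at random."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        size = rng.choice(sizes)
        block = ["A"] * (size // 2) + ["B"] * (size // 2)
        rng.shuffle(block)  # random order within the balanced block
        allocations.extend(block)
    return allocations[:n_patients]
```

With blocks of four the imbalance can never exceed two, and with a mixture of sizes 2, 4 and 6 it can never exceed three.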
Table: Excerpt from a table of random digits.2 The numbers used in the example are shown in bold.
While simple randomisation removes bias from the allocation procedure, it does not guarantee, for example, that the individuals in each group have a similar age distribution. In small studies especially some chance imbalance will probably occur, which might complicate the interpretation of results. We can use stratified randomisation to achieve approximate balance of important characteristics without sacrificing the advantages of randomisation. The method is to produce a separate block randomisation list for each subgroup (stratum). For example, in a study to compare two alternative treatments for breast cancer it might be important to stratify by menopausal status. Separate lists of random numbers should then be constructed for premenopausal and postmenopausal women. It is essential that stratified treatment allocation is based on block randomisation within each stratum rather than simple randomisation; otherwise there will be no control of balance of treatments within strata, so the object of stratification will be defeated.
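A minimal sketch of stratified randomisation, assuming blocks of four and two strata defined by menopausal status, might look as follows. The function names, the number of blocks, and the seed are hypothetical choices for illustration. The essential point is that each stratum gets its own block randomisation list.

```python
import random


def block_randomisation_list(n_blocks, rng, block_size=4):
    """Build one list from balanced blocks, each shuffled independently."""
    allocations = []
    for _ in range(n_blocks):
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations


def stratified_lists(strata, n_blocks, seed=None):
    """Produce a separate block randomisation list for each stratum."""
    rng = random.Random(seed)
    return {stratum: block_randomisation_list(n_blocks, rng) for stratum in strata}


lists = stratified_lists(["premenopausal", "postmenopausal"], n_blocks=3, seed=1)
for stratum, allocations in lists.items():
    print(stratum, " ".join(allocations))
```

Each new patient is given the next unused allocation from the list for her own stratum, so treatment numbers stay balanced within premenopausal and postmenopausal women separately.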
Stratified randomisation can be extended to two or more stratifying variables. For example, we might want to extend the stratification in the breast cancer trial to tumour size and number of positive nodes. A separate randomisation list is needed for each combination of categories. If we had two tumour size groups (say ≤4 cm and >4 cm) and three groups for node involvement (0, 1-4, >4) as well as menopausal status, then we have 2×3×2=12 strata, which may exceed the limit of what is practical. Also, with multiple strata some combinations of categories may be rare, so that blocks in those strata are left incomplete and the intended treatment balance is not achieved.
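The count of strata is simply the product of the numbers of categories, which can be checked by listing every combination. The category labels below are our own shorthand for the groups described above.

```python
from itertools import product

tumour_size = ["<=4 cm", ">4 cm"]
positive_nodes = ["0", "1-4", ">4"]
menopausal_status = ["premenopausal", "postmenopausal"]

# One stratum, and so one randomisation list, per combination of categories.
strata = list(product(tumour_size, positive_nodes, menopausal_status))

print(len(strata))  # 12
for stratum in strata:
    print(", ".join(stratum))
```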
In a multicentre study the patients within each centre will need to be randomised separately unless there is a central coordinated randomising service. Thus “centre” is a stratifying variable, and there may be other stratifying variables as well.
In small studies it is not practical to stratify on more than one or perhaps two variables, as the number of strata can quickly approach the number of subjects. When it is really important to achieve close similarity between treatment groups for several variables, minimisation can be used; we discuss this method in a separate Statistics note.3
We have described the generation of a random sequence in some detail so that the principles are clear. In practice, for many trials the process will be done by computer. Suitable software is available at http://www.sghms.ac.uk/phs/staff/jmb/jmb.htm.
We shall also consider in a subsequent note the practicalities of using a random sequence to allocate treatments to patients.