Note: this is the first of two letters constituting a review of the Harris et al. study of Vancouver and Toronto bicycle facilities.
You may read an explanation of how these letters came to be posted here. It includes a link to the other letter and to the edited version which is posted with the original article.
Vulnerabilities of the case-crossover method as applied, and unsuitability of the epidemiological approach, to transportation injuries and traffic engineering problems – Part I
M Kary, Mathematician
Montreal, Canada
Re: Comparing the effects of infrastructure on bicycling injury at intersections and non-intersections using a case–crossover design. Harris, et al. 19:5 303-310 doi:10.1136/injuryprev-2012-040561
The case-crossover method in its familiar application looks for factors that recur when cases occur, for individuals crossing exposure to them as examined over a time interval. This study [1-3] applies the method in a different way, the exposures being examined over a spatial route, with neither the identified factors nor the various routes being independent of the highly constrained urban geographies of the settings. Thus in addition to all the familiar vulnerabilities of the case-crossover method [4-10] (some of which require translation to the new context), this application brings new problems of its own. Overlaid on these is the general unsuitability of the epidemiological approach to transportation injuries and traffic engineering problems.
These issues occur in abundance and deserve illumination, while replies are required to be brief. Consequently this is not a balanced assessment of strengths and weaknesses, but a spotlight on a selection of the latter. Even so this will have to be done over a series of eventual responses. I thank the authors for kindly providing extra information about their study as necessary for the following analysis.
The case-crossover method is particularly vulnerable to recruitment and severity bias, information and recall bias, bias in the selection of control sites, temporal confounding of various sorts, and other problems [4-10]. It is also as subject as any other method to confounding by unmeasured, uncontrolled factors, and model dependence of adjustments for measured confounders. I focus on a few of these that take unusual forms in this study, even though there is reason to suspect that the others are at least as important. Examined in this first response are vulnerabilities to control site selection bias.
For each site where an injury event occurred, the authors find a control site by randomly selecting another location along the route the rider took, from start to termination at the injury event. This is supposed to adjust for exposure to infrastructure types: the probability of selecting each type, and thus hopefully the resulting overall relative frequencies, should be proportional to its relative length along the routes.
To compare by facility types, intersections must be paired with other intersections, and likewise non-intersections with non-intersections. For injuries occurring at intersections, usually the location randomly chosen for use as a control will not land on another intersection, so the authors randomly adjust that location forward or back until it does. The authors have informed me that this was necessary about 70% of the time.
In these instances, the selection of control intersections of various sizes (i.e., traversed widths) is dependent on their spatial distribution along the route, but indifferent to their widths. This allows their selection to be disproportionately biased in favour of smaller intersections associated with longer non-intersection segments. For example, take a route whose length is 30% intersections and 70% non-intersections, beginning at 0 and terminating at 1, with intersections from 0 to 0.05, 0.5 to 0.6, and 0.85 to 1. The probabilities of choosing the three intersections as control sites should occur in ratios of 1:2:3. But by the authors’ adjustment method, in those instances where adjustment is needed, the initial location falls in one of the two non-intersection segments (of lengths 0.45 and 0.25) and is moved forward or back with equal probability, so the probabilities are in fact respectively 0.5 × 0.45/0.7, [(0.5 × 0.45) + (0.5 × 0.25)]/0.7, and 0.5 × 0.25/0.7, thus occurring in ratios of approximately 1:1.56:0.56. Maclure [4] has discussed the potentially large biases in relative risk estimates that can result from not taking intersection widths into account.
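The arithmetic of this example can be reproduced with a minimal sketch. It assumes that a location landing in a non-intersection segment is moved, with probability 1/2 each way, to the neighbouring intersection, and that the route begins and ends with an intersection, as in the example. The function names are hypothetical and are not taken from the study.

```python
from fractions import Fraction


def width_proportional_probs(intersections):
    """Selection probabilities proportional to intersection width (the intended outcome)."""
    widths = [end - start for start, end in intersections]
    total = sum(widths)
    return [w / total for w in widths]


def adjusted_selection_probs(intersections):
    """Selection probabilities when a draw landing in a gap between intersections
    is moved back or forward, with probability 1/2 each (assumed rule), to the
    neighbouring intersection. Conditional on an adjustment being needed."""
    # Non-intersection segments are the gaps between consecutive intersections.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(intersections, intersections[1:])]
    total_gap = sum(gaps)
    probs = [Fraction(0)] * len(intersections)
    for i, gap in enumerate(gaps):
        share = Fraction(1, 2) * gap / total_gap
        probs[i] += share      # moved back to the preceding intersection
        probs[i + 1] += share  # moved forward to the following intersection
    return probs


# The route from the example: intersections on [0, 0.05], [0.5, 0.6], [0.85, 1].
route = [
    (Fraction(0), Fraction(1, 20)),
    (Fraction(1, 2), Fraction(3, 5)),
    (Fraction(17, 20), Fraction(1)),
]
intended = width_proportional_probs(route)
actual = adjusted_selection_probs(route)
print(intended)  # 1/6, 1/3, 1/2, i.e. ratios 1:2:3
print(actual)    # 9/28, 1/2, 5/28
print([round(float(p / actual[0]), 2) for p in actual])  # ratios relative to the first
```

Exact fractions are used so that the ratios can be compared without rounding error.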
Likewise, sometimes the random location selected to control for a non-intersection injury event site lands on an intersection, so the authors randomly adjust that location forward or back until it doesn’t. In these instances (about 30%), this allows the selection of non-intersection control sites to be disproportionately biased in favour of whatever are adjacent larger intersections. This might very well have included cycle tracks, considering the ones in existence at the time of the study.
There is though already a potential selection bias before this stage. Injuries almost always occur some distance before the very end of a planned trip. The authors restrict their selection of control sites to the route traversed before the injury event, thus systematically excluding the tails of the planned trips from selection. Bicycle facilities have particular distributions within cities and along routes (and over all routes in the sample considered as a whole), and consequently, excluding the tails of the planned trips may disproportionately bias the selection of control sites. For example, consider a route with no intersections, having a bicycle facility in the first and last thirds only. Suppose injury events occur completely at random along this route, so that they have no association with infrastructure type (or any other factor). The injury events therefore occur in bicycle facilities and non-facilities in proportions of 2:1, and likewise so should the selection of control sites. But an elementary calculation shows that, under the authors’ method of selection, the probability of the control being in a facility is [2+ln(4/3)]/3, so that instead the control sites occur in facilities and non-facilities in proportions of about 3.21:1.
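The elementary calculation can be sketched as follows. The notation is introduced here for illustration only: X is the injury location, assumed uniform on [0, 1]; C is the control location, assumed uniform on [0, X]; and F = [0, 1/3] ∪ [2/3, 1] is the part of the route with a bicycle facility.

```latex
\begin{align*}
P(C \in F \mid X = x) &=
\begin{cases}
1, & 0 < x \le \tfrac13,\\[4pt]
\dfrac{1}{3x}, & \tfrac13 < x \le \tfrac23,\\[8pt]
1 - \dfrac{1}{3x}, & \tfrac23 < x \le 1,
\end{cases}\\[6pt]
P(C \in F) &= \int_0^{1/3} dx
  + \int_{1/3}^{2/3} \frac{dx}{3x}
  + \int_{2/3}^{1} \Bigl(1 - \frac{1}{3x}\Bigr)\,dx\\
&= \tfrac13 + \tfrac13 \ln 2 + \tfrac13 - \tfrac13 \ln\tfrac32
 = \frac{2 + \ln(4/3)}{3} \approx 0.7626,\\[6pt]
\frac{P(C \in F)}{1 - P(C \in F)} &\approx \frac{0.7626}{0.2374} \approx 3.21 .
\end{align*}
```

The middle case is the one that drives the bias: for an injury in the middle third, the only facility available for selection is the first third, which occupies a share 1/(3x) of the traversed route.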
Some of the above vulnerabilities to bias are analogous to those familiar from meteorological case-crossover studies, where selection bias may occur if there are long-scale temporal waves of exposure, or serial autocorrelation. In the present context these correspond to spatial waves or autocorrelation of exposure, which occur both within routes and across subjects, the latter if only because the constraints of urban geography mean their routes may overlap. There can also be temporal waves or autocorrelations in the present study, e.g. for injuries occurring to different people during the same rush hour, in response to e.g. large scale spatio-temporal patterns of traffic congestion.
Other studies have addressed such issues with more or less success by using bidirectional sampling. This and the resulting matter of selecting control sites post-terminating event have been discussed extensively in the meteorological literature on case-crossover studies [7-9], and by Maclure in the epidemiological literature [4].
There is still another potential randomisation failure at this level of selection to consider. The standard deviation of the uniform distribution on [0, 1] is almost one-third (1/(2√3) ≈ 0.29). For individual runs of only 801 in length (for non-intersections), or 272 (for intersections), this can easily result in quintiles being out of balance by plus or minus 10 to 25%, which can again skew the estimates. (Thus the reader wishing to closely check by simulation the probability calculations given above should use a much larger n, such as on the order of 10^5.)
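A minimal simulation sketch of both points follows. It rests on assumptions that are not stated in the study: "quintiles" is read as the fifths of [0, 1] into which uniformly drawn positions fall, the control is drawn uniformly on the route traversed before the injury, and the function names and the seed are hypothetical.

```python
import math
import random


def quintile_imbalance(n, rng):
    """Relative deviation of each quintile count from n/5 for n uniform draws."""
    counts = [0] * 5
    for _ in range(n):
        counts[min(int(rng.random() * 5), 4)] += 1
    expected = n / 5
    return [(c - expected) / expected for c in counts]


def control_in_facility_fraction(n, rng):
    """Monte Carlo estimate for the route with no intersections and a bicycle
    facility on the first and last thirds: the injury is uniform on [0, 1) and
    the control is uniform on the portion traversed before the injury."""
    hits = 0
    for _ in range(n):
        injury = rng.random()
        control = rng.random() * injury
        if control < 1 / 3 or control >= 2 / 3:
            hits += 1
    return hits / n


rng = random.Random(2013)  # arbitrary seed, for reproducibility only

# Imbalance at the run lengths mentioned in the text, and at a much larger n.
for n in (272, 801, 10**5):
    worst = max(abs(d) for d in quintile_imbalance(n, rng))
    print(n, f"largest quintile deviation: {100 * worst:.1f}%")

# Check of the closed-form probability [2 + ln(4/3)]/3 with n on the order of 10^5.
exact = (2 + math.log(4 / 3)) / 3
estimate = control_in_facility_fraction(10**5, rng)
print(f"exact {exact:.4f}, simulated {estimate:.4f}")
```

Under these assumptions the standard deviation of a single quintile count is sqrt(0.16 n), which is about 12% of the expected count at n = 272 and well under 1% at n = 10^5.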
This ends a first instalment, devoted only to control site selection bias. The next eventual instalment shall cover various other vulnerabilities to bias in this study, including the fundamental one introduced by considering only distance at risk, instead of or without the addition of time at risk [11].
References
1. Harris MA, Reynolds CCO, Winters M, Chipman M, Cripton PA, Cusimano MD, Teschke K. The Bicyclists’ Injuries and the Cycling Environment study: a protocol to tackle methodological issues facing studies of bicycling safety. Inj Prev 2011;17:e6. doi:10.1136/injuryprev-2011-040071.
2. Teschke K, Harris MA, Reynolds CCO, Winters M, Babul S, Chipman M, et al. Route Infrastructure and the risk of injuries to bicyclists: a case-crossover study. Am J Pub Health 2012;Oct 18:e1-e8. doi:10.2105/AJPH.2012.300762.
3. Harris MA, Reynolds CCO, Winters M, Cripton PA, Shen H, Chipman ML, et al. Comparing the effects of infrastructure on bicycling injury at intersections and non-intersections using a case-crossover design. Inj Prev 2013;0:1-8. doi:10.1136/injuryprev-2012-040561.
4. Maclure M, Mittleman MA. Should we use a case-crossover design? Ann Rev Public Health 2000;21:193-221.
5. Redelmeier DA, Tibshirani RJ. Interpretation and bias in case-crossover studies. J Clin Epidemiol 1997;50:1281-1287.
6. Sorock GS, Lombardi DA, Gabel CL, Smith GS, Mittleman MA. Case-crossover studies of occupational trauma: methodological caveats. Inj Prev 2001;7(Suppl I):i38-42.
7. Lee J-T, Kim H, Schwartz J. Bidirectional case-crossover studies of air pollution: bias from skewed and incomplete waves. Env Health Perspectives 2000;108:1107-1111.
8. Bateson TF, Schwartz J. Selection bias and confounding in case-crossover analyses of environmental time-series data. Epidemiology 2001;12:654-661.
9. Lumley T, Levy D. Bias in the case-crossover design: implications for studies of air pollution. NRCSE Technical Report Series NRCSE-TRS No. 031, 1999.
10. Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epid 1991;133:144-153.
11. Chipman ML, MacGregor CG, Smiley AM, Lee-Gosselin M. Time vs. distance as measures of exposure in driving surveys. Accident Analysis & Prevention 1992;24:679-684.
Conflict of Interest:
None declared