Logistic Regression Analysis: Applications to Ophthalmic Research
The purpose of this Editorial is to present a brief overview of logistic regression, an extremely useful tool for the analysis of data that frequently arise in health sciences research. To facilitate understanding of the method, we consider a hypothetical study of 286 patients treated for a particular ophthalmic condition. Two treatments are available (let us call these treatments A and B). Both treatments offer immediate results, and after just one week of treatment, all symptoms and evidence of disease are gone in all 286 patients. However, disease will recur in a certain proportion of patients, so the study is designed to last for a total of 20 months to determine which treatment is more effective for the relief of symptoms and disease in this period. In our hypothetical data, 76 (26.57%) of the 286 patients experienced a recurrence of symptoms before the end of the study. The breakdown of recurrence by treatment is shown in Table 1.
TABLE 1. Cross Classification of Recurrence of Disease and Treatment
| Recurrence | Treatment A | Treatment B | Total |
|---|---|---|---|
| No | 114 | 96 | 210 |
| Yes | 16 | 60 | 76 |
| Total | 130 | 156 | 286 |
It is clear from this cross tabulation that 38.5% (60/156) of patients receiving treatment B experience a recurrence of symptoms, whereas the rate is only 12.3% (16/130) among patients receiving treatment A. To put this in epidemiologic terms, the odds of recurrence among patients receiving treatment A are (16/130)/(114/130) = 16/114 = 0.140. For patients who receive treatment B, the odds of recurrence are (60/156)/(96/156) = 60/96 = 0.625. The odds ratio (OR) is the ratio of these odds, 0.140/0.625 = 0.224; that is, the odds of recurrence are only about 22% as great in patients who received treatment A (x = 1) as in those who received treatment B (x = 0).
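The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch using only the counts in Table 1; the variable names and the `odds` helper are illustrative and do not come from the original analysis.

```python
# Counts from Table 1 (recurrence by treatment).
recur_a, no_recur_a = 16, 114   # treatment A
recur_b, no_recur_b = 60, 96    # treatment B


def odds(events, non_events):
    """Odds of an event: number with the event divided by number without."""
    return events / non_events


# Proportion of patients with a recurrence in each treatment group.
risk_a = recur_a / (recur_a + no_recur_a)
risk_b = recur_b / (recur_b + no_recur_b)

# Odds of recurrence in each group, and their ratio (A relative to B).
odds_a = odds(recur_a, no_recur_a)
odds_b = odds(recur_b, no_recur_b)
odds_ratio = odds_a / odds_b

print(f"recurrence rate: A = {risk_a:.1%}, B = {risk_b:.1%}")
print(f"odds of recurrence: A = {odds_a:.3f}, B = {odds_b:.3f}")
print(f"odds ratio (A vs B) = {odds_ratio:.4f}")
```

Carried to four decimal places the ratio is approximately 0.2246; the value 0.224 quoted in the text comes from dividing the rounded odds, 0.140/0.625.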
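The logistic regression model is appropriate for modeling a binary outcome (such as the recurrence of disease at the end of the study). The actual model is as follows:

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

where π(x) is the probability of recurrence for a patient with treatment indicator x (x = 1 for treatment A, x = 0 for treatment B), β0 is the constant, and β1 is the coefficient for treatment.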
The fitted logistic regression model for these data is shown in Table 2. Exponentiating the estimated coefficient for treatment in Table 2 gives $e^{-1.494} = 0.224$, the same OR we computed earlier from the contingency table in Table 1.
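The step from coefficient to OR can be written out explicitly. The derivation below is a sketch that assumes the usual single-covariate form of the model on the log-odds scale, with π(x) the probability of recurrence, x = 1 for treatment A and x = 0 for treatment B; the symbols β0 (constant) and β1 (treatment coefficient) are notation adopted here for illustration.

```latex
\begin{align*}
\ln\frac{\pi(x)}{1-\pi(x)} &= \beta_0 + \beta_1 x \\[4pt]
\ln(\text{odds}_A) - \ln(\text{odds}_B)
  &= (\beta_0 + \beta_1 \cdot 1) - (\beta_0 + \beta_1 \cdot 0) = \beta_1 \\[4pt]
\text{OR} = \frac{\text{odds}_A}{\text{odds}_B}
  &= e^{\beta_1} = e^{-1.494} \approx 0.224
\end{align*}
```

Setting x = 0 in the first line also shows that the exponentiated constant, $e^{-0.470} \approx 0.625$, is the odds of recurrence under treatment B, in agreement with Table 1.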
TABLE 2. Fitted Logistic Regression Model Containing Treatment
| Variable | Coefficient | Standard Error | z | P value | 95% Confidence Interval (lower limit) |
|---|---|---|---|---|---|
| Treatment | −1.494 | 0.3136 | −4.76 | <.001 | −2.108 |
| Constant | −0.470 | 0.1646 | −2.86 | .004 | −0.793 |
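When the model contains a single binary covariate, the maximum likelihood estimates can be written in closed form from the four cell counts, which gives a convenient check on Table 2. The sketch below relies on that special case and on Wald (normal-approximation) intervals with a critical value of 1.96; it is not necessarily how the table was produced, and the function and variable names are illustrative.

```python
import math

# Cell counts from Table 1.
a, b = 16, 114   # treatment A (x = 1): recurrence, no recurrence
c, d = 60, 96    # treatment B (x = 0): recurrence, no recurrence

# Closed-form maximum likelihood estimates for one binary covariate.
beta0 = math.log(c / d)                # log odds of recurrence under treatment B
beta1 = math.log((a / b) / (c / d))    # log odds ratio, A versus B

# Large-sample standard errors from the reciprocal cell counts.
se_beta0 = math.sqrt(1 / c + 1 / d)
se_beta1 = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

Z_CRIT = 1.96  # approximate 97.5th percentile of the standard normal


def wald_summary(estimate, se):
    """Return the z statistic and the 95% Wald confidence limits."""
    return estimate / se, estimate - Z_CRIT * se, estimate + Z_CRIT * se


for name, est, se in [("Treatment", beta1, se_beta1), ("Constant", beta0, se_beta0)]:
    z, lower, upper = wald_summary(est, se)
    print(
        f"{name:9s} coef = {est:.3f}  SE = {se:.4f}  z = {z:.2f}  "
        f"95% CI = ({lower:.3f}, {upper:.3f})  "
        f"exponentiated = ({math.exp(lower):.3f}, {math.exp(upper):.3f})"
    )
```

The coefficients, standard errors, z statistics, and lower confidence limits computed this way should agree with Table 2 to the reported precision. Exponentiating the limits for treatment gives an interval for the OR; exponentiating those for the constant gives an interval for the odds of recurrence under treatment B.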
Researchers will immediately recognize that one potential reason why treatment A performed so much better than treatment B is that the patients who received treatment A may have differed with respect to some other characteristic, such as age. If age is related to recurrence and also differs between the treatment groups, then age may be a confounder of the association between treatment and recurrence. To explore this possibility using a logistic regression model, we only have to include age in the previous model (Table 3).
TABLE 3. Fitted Logistic Regression Model Containing Treatment and Age
| Variable | Coefficient | Standard Error | z | P value | 95% Confidence Interval (lower limit) |
|---|---|---|---|---|---|
| Treatment | −1.460 | 0.3162 | −4.62 | <.001 | −2.080 |
| Age | 0.025 | 0.0130 | 1.88 | .060 | −0.001 |
| Constant | −1.171 | 0.4217 | −2.78 | .005 | −1.997 |
If we exponentiate the coefficient associated with treatment, we obtain $e^{-1.460} = 0.232$. This is the adjusted OR, where we have controlled for age. Note that the OR did not change much from the crude OR (ie, controlling for nothing), and hence, age is judged not to be a confounder of treatment in these data.
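The comparison of crude and adjusted ORs can be made concrete by computing the relative change between them. The sketch below uses the coefficients reported in Tables 2 and 3. The 10% cut-off is a commonly used rule of thumb that is assumed here for illustration and is not stated in the text, and age is assumed to be recorded in years.

```python
import math

# Coefficients as reported in Tables 2 and 3.
crude_beta = -1.494      # treatment, unadjusted (Table 2)
adjusted_beta = -1.460   # treatment, adjusted for age (Table 3)
age_beta = 0.025         # age (Table 3); units assumed to be years

crude_or = math.exp(crude_beta)
adjusted_or = math.exp(adjusted_beta)

# Relative change in the treatment OR after adjusting for age.
pct_change = 100 * (adjusted_or - crude_or) / crude_or

# Assumed rule of thumb: flag confounding if the OR shifts by more than 10%.
THRESHOLD_PCT = 10.0
flag_confounding = abs(pct_change) > THRESHOLD_PCT

# OR for age per one unit and per ten units of age.
age_or_per_unit = math.exp(age_beta)
age_or_per_10 = math.exp(10 * age_beta)

print(f"crude OR = {crude_or:.3f}, adjusted OR = {adjusted_or:.3f}")
print(f"change after adjustment = {pct_change:.1f}%")
print(f"flag confounding at {THRESHOLD_PCT:.0f}% threshold: {flag_confounding}")
print(f"age OR per unit = {age_or_per_unit:.3f}, per 10 units = {age_or_per_10:.3f}")
```

With these coefficients the adjusted OR differs from the crude OR by roughly 3% to 4%, well under the assumed threshold, which is consistent with the conclusion that age is not a confounder in these data.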
Note that to use the logistic regression model, all we need to know about recurrence is whether it is present (y = 1) or absent (y = 0) for each subject at the end of the study. The fact that subjects might have been under observation for varying lengths of time over the course of the study was not considered or used in any way. The resulting estimate of effect for treatment is the OR, adjusted for age, and is applicable as a measure of effect only at the end of the study. Hosmer and Lemeshow provide a detailed treatment of modeling binary outcome data using the logistic regression model.¹
The authors indicate no financial support or financial conflict of interest. Both authors were involved in design and conduct of study; data collection; analysis and interpretation of data; and preparation and review of the manuscript.
Reference
PII: S0002-9394(08)00610-7
doi:10.1016/j.ajo.2008.07.042
© 2009 Elsevier Inc. All rights reserved.
