Volume 149, Issue 3, Pages 364-366 (March 2010)



Current Research in Biostatistics

Abdelmonem A. Afifi, Fei Yu

Accepted 16 November 2009.

In the 1950s and '60s, a doctoral student in biostatistics could reasonably be expected to acquire a fairly sophisticated knowledge of the whole field before completing the doctoral dissertation. This knowledge would include the fundamentals of probability and the mathematical theory of statistical inference, as well as biostatistics proper, that is, the theory and application of statistics to the life and health sciences. Today, biostatistics has grown to the point that no doctoral student can become an expert in all of it. Thus, our ambitious title notwithstanding, we can, in this limited space, only aspire to cover part of the current research in biostatistics.

As in many scientific fields, widespread application of new biostatistical methods lags behind their publication by a period ranging from a few years to 2 or 3 decades. For example, the Cox proportional hazards model was proposed in 1972 but became common practice in survival analysis only in the mid-1980s.[1] As another example, the current popularity of mixed-effects regression models and longitudinal data analysis not only lagged behind their theoretical development by some decades but also had to await the availability of practical computer software, such as SAS PROC MIXED, first made generally available in the 1990s. To identify, in general terms, where the field may be going, we surveyed recent issues of several leading statistical journals and attempted to distill their contents into main categories. These summaries, we hope, will point the way to some topics that may become common practice in the near future, although we recognize that others may fall by the wayside.

We sought to summarize influential recent work in journals that have high impact factors and an orientation that overlaps substantially with the disciplines of statistics and biostatistics. In selecting journals for our search, we relied mainly on the journals' impact factors.[2] We selected the top 10 journals in the field of statistics and probability (Table 1) with 2 exceptions: Econometrica was excluded, since it is a specialized journal in an area far from biostatistics; and although Biometrika is ranked 26th among statistics and probability journals, we included it because of its historical importance and because it is the next-highest ranked statistical journal in the field of biostatistics. We systematically scanned all the articles in the most recent issues of these 10 journals. In a few instances, we excluded special sections that were not related to biostatistics. Although the number of articles per journal varied greatly, we are confident that the 583 articles we reviewed represent a good cross section of the most recent statistical publications in the field of biostatistics.

TABLE 1. List of Statistical Journals and Issues Reviewed

Journal Title                                                                Impact Factor  Journal Issues Reviewed         Number of Articles
Biostatistics                                                                3.394          October 2008 – April 2009       45
Journal of the Royal Statistical Society, Series B: Statistical Methodology  2.835          November 2008 – September 2009  47
Annals of Applied Statistics                                                 2.448          September 2008 – June 2009      55
Journal of the American Statistical Association                              2.394          September 2008 – March 2009     92
Annals of Statistics                                                         2.307          February 2009 – June 2009       52
Statistical Methods in Medical Research                                      2.177          October 2008 – August 2009      33
Statistical Science                                                          2.135          November 2007 – September 2008  17
Statistics in Medicine                                                       2.111          January 2009 – May 2009         89
Biometrics                                                                   1.970          September 2008 – March 2009     100
Biometrika                                                                   1.405          September 2008 – March 2009     53
Total                                                                                                                       583

Classifying articles into categories presented some challenges. For example, an article may use Bayesian ideas to develop a new methodology for generalized linear models (GLM), a broad area that encompasses as special cases such varied topics as analysis of variance and multiple linear, logistic, and Poisson regression models. Should we classify such an article under regression analysis or under Bayesian analysis? In most such cases, we opted for the former since that is where the emphasis usually was. This example illustrates the many decisions we made regarding the overlap of subjects. Such decisions have inevitably affected the relative frequencies of the categories. Table 2 presents the 10 categories we settled on, along with their relative frequencies. They are listed in descending order of frequency, except for the category “other.” Each category includes articles where the main emphasis is on that topic, and several also encompass some subcategories. Several topics were covered in previous editorials in this series. In those cases we simply list the categories below. For categories not covered previously, we present a very brief description.

TABLE 2. Categories of Statistical Research and Their Frequencies in Reviewed Journals

Category of Statistical Research       Number (%) of Articles
Nonparametric/semiparametric analysis  83 (14.2%)
Regression analysis                    81 (13.9%) (a)
High-dimensional data                  73 (12.5%)
Bayesian analysis                      71 (12.2%)
Post hoc analysis                      58 (9.9%)
Study design                           46 (7.9%)
General inference                      45 (7.7%)
Causal inference                       33 (5.7%)
Genetic analysis                       25 (4.3%)
Other                                  68 (11.7%)
Total                                  583 (100%)

(a) Including 29 generalized linear regression models (5.0%) and 52 survival regression analyses (8.9%).
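
The relative frequencies in Table 2 follow directly from the article counts and the total of 583. As a minimal sketch (the variable and function names are our own illustrative choices, not part of the original analysis), the percentages can be recomputed in Python as follows:

```python
# Article counts per category, copied from Table 2.
counts = {
    "Nonparametric/semiparametric analysis": 83,
    "Regression analysis": 81,
    "High-dimensional data": 73,
    "Bayesian analysis": 71,
    "Post hoc analysis": 58,
    "Study design": 46,
    "General inference": 45,
    "Causal inference": 33,
    "Genetic analysis": 25,
    "Other": 68,
}

# The total should match the number of articles reviewed (Table 1).
total = sum(counts.values())


def percent(n, total):
    """Relative frequency as a percentage, rounded to 1 decimal place."""
    return round(100 * n / total, 1)


# Relative frequency of each category.
shares = {name: percent(n, total) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name:40s}{counts[name]:4d}  {share:5.1f}%")
print(f"{'Total':40s}{total:4d}")
```

The same helper reproduces the footnote figures: `percent(29, total)` and `percent(52, total)` give the shares of the generalized linear regression and survival regression subcategories.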

The category with the highest frequency covers nonparametric and semiparametric approaches to inference techniques, GLM, regression models, and variable selection. It also includes functional data analysis, that is, statistical methods that deal with the analysis of samples of curves, surfaces, images, and other functional observations, which are usually represented as a function of time, spatial location, or wavelength, and whose basic unit of analysis is the entire observed function rather than individual numbers. There is a short discussion of the use of nonparametric versus parametric tests in this series.[3] The next category is regression analysis, including survival analysis and parametric approaches to GLM. Regression analysis is currently the most commonly used statistical method in practice, and other editorials in this series discuss different types of regression analysis.[4, 5] Next is the high-dimensional data category, which includes handling time series data, spatial-temporal data (ie, data dispersed in time or space, or both), data mining, and classification models (ie, assigning an observation to 1 of several groups based on multiple measurements, eg, diagnosis).

The next category includes general Bayesian analysis methodology as well as Bayesian approaches to genetics/ecology, stochastic processes, model selection, nonparametric analysis, and experimental design. More details about Bayesian methods in statistics are available in an editorial in this series.[6] Post hoc analysis is a category that encompasses what statisticians often do after performing an initial data analysis. It includes missing data analysis and parametric model and variable selection, as well as multiple comparisons, that is, adjustments necessary when making several inferences simultaneously. The impact of missing data is discussed in another editorial in this series.[7] Study design encompasses experimental design research, design of clinical trials, and survey sampling. The general inference category includes classical statistical inference methods, such as hypothesis testing and confidence intervals, as well as multivariate distributions. Genetic analysis contains statistical methodology and applications to genetic data, such as gene sequence, population genomic data, and gene expression microarray data. Causal inference encompasses methods that aim to uncover whether observed phenomena reflect statistical association or a true causal relationship, such as the propensity score methods discussed in this series.[8] Finally, we put in the “other” category all publications that did not fit in any 1 of the above 9 categories. It contains, for example, research on ecology, quality control, meta-analysis, and graphical theory. We also included in it papers on stochastic processes (16 publications). We would certainly have found many more publications on any of these subjects had we expanded our search to journals specializing in those areas.
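
To make the idea of multiple comparisons concrete, the following Python sketch applies two standard adjustments, Bonferroni and Holm, to a set of P values. It is an illustration only and is not drawn from the articles reviewed; the P values are hypothetical and the function names are our own.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest P value is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical P values from 4 simultaneous tests.
p_values = [0.001, 0.012, 0.030, 0.200]

for raw, b, h in zip(p_values, bonferroni(p_values), holm(p_values)):
    print(f"raw={raw:.3f}  Bonferroni={b:.3f}  Holm={h:.3f}")
```

Holm's procedure controls the same familywise error rate as Bonferroni but is never more conservative, which is why its adjusted values are less than or equal to the Bonferroni values.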

This quick review of our findings illustrates the tremendous breadth of subjects covered in biostatistical research and displays certain trends that may point the way to future research. We highlight 2 such trends in particular. The first is the interplay between statistical theory and computational methods. The relationship is highly synergistic. Statistics was one of the first areas in which specialized, relatively user-friendly software was developed, leading the way to the revolution in mass-distribution software in many areas. Conversely, gains in computing capability and speed have helped statisticians invent new theory and methodology and improve existing ones. Furthermore, widespread application of new statistical methods frequently must await their incorporation into standard computer software, which lengthens the lag before they become popular.

Another area to watch is the place of the Bayesian paradigm in statistics.[6] For example, a cross-cutting theme in modern statistical research is to develop methods where intervals with associated probability statements (such as confidence intervals) are well calibrated to their targets (eg, that 95% confidence intervals cover underlying true values 95% of the time in repeated applications). This goal is cited by researchers approaching statistics from diverse philosophical perspectives.[9, 10, 11] To achieve this goal in practice, it is crucial for statistical models to accommodate the most prominent sources of variability in data sets, motivating efforts to incorporate additional model structure or to relax modeling assumptions as a way of achieving robustness of statistical inferences. The Bayesian approach presents a very useful way to achieve this goal, its applicability having been greatly helped by recent breakthroughs in statistical computing.
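
The calibration property described above, that 95% confidence intervals cover the true value about 95% of the time in repeated applications, can be checked by simulation. The following Python sketch is illustrative only and is not taken from the articles reviewed. It assumes normally distributed data with a known standard deviation, and the function name, sample size, and parameter values are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def coverage(n_reps=5000, n=30, mu=10.0, sigma=2.0, level=0.95, seed=0):
    """Estimate how often a z-interval for the mean covers the true mean.

    Assumes normal data with known sigma (an illustrative simplification).
    """
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_reps):
        # One "repeated application": draw a fresh sample of size n.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = fmean(sample)
        # Count the interval as a hit if it contains the true mean.
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / n_reps


estimated = coverage()
print(f"Estimated coverage of nominal 95% intervals: {estimated:.3f}")
```

When the model assumptions hold, the estimated coverage should be close to the nominal level. Repeating the simulation with data that violate the assumptions (for example, a misspecified standard deviation) is one way to see how coverage can drift from its target.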

In summary, biostatistics has established itself as one of the pillars on which biomedical research rests. It will certainly be exciting to see where statistical thinking will lead us next.

 


The authors indicate no financial support or financial conflict of interest. Both authors (A.A.A., F.Y.) were involved in design and conduct of study; data management and analysis; data interpretation; and preparation, review, and approval of manuscript. No human subjects were involved in this editorial.

The authors acknowledge Anne L. Coleman, MD, PhD, from the Jules Stein Eye Institute, David Geffen School of Medicine at UCLA, who provided assistance in the formulation of this editorial.

References 


1. Cox DR. Regression models and life tables (with discussion). J R Stat Soc Series B Stat Methodol. 1972;34:187–220.

2. Thomson Reuters. 2008 Journal Citation Reports Science Edition. http://admin-apps.isiknowledge.com/JCR/JCR2009; Accessed: August 18, 2009.

3. Kitchen CMR. Nonparametric vs parametric tests of location in biomedical research. Am J Ophthalmol. 2009;147:571–572.

4. Lemeshow S, Hosmer DW. Logistic regression analysis: Applications to ophthalmic research. Am J Ophthalmol. 2009;147:766–767.

5. Hosmer DW, Lemeshow S. Survival analysis: Applications to ophthalmic research. Am J Ophthalmol. 2009;147:957–958.

6. Weiss R. Bayesian methods and data analysis. Am J Ophthalmol. 2010;149:187–188.

7. Belin TR. Missing data: What a little can do, and what researchers can do in response. Am J Ophthalmol. 2009;148:820–822.

8. Rubin DB. Propensity score methods. Am J Ophthalmol. 2010;149:7–9.

9. Efron B. Bayesians, frequentists and scientists. J Am Stat Assoc. 2005;100:1–5.

10. Rubin DB. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann Stat. 1984;12:1151–1172.

11. Little R. Calibrated Bayes: a Bayes/frequentist roadmap. Am Stat. 2006;60:213–223.

Department of Biostatistics, UCLA School of Public Health, University of California, Los Angeles, Los Angeles, California

Inquiries to Abdelmonem A. Afifi, PhD, Department of Biostatistics, School of Public Health, UCLA, Los Angeles, CA 90095

PII: S0002-9394(09)00886-1

doi:10.1016/j.ajo.2009.11.023

