Data Analysis Software for Complex Sample Designs
Running head: Data Analysis Software for Complex Sample Designs
Mark Tabladillo, Ph.D.
Contractor, Centers for Disease Control and Prevention, Office on Smoking and Health
(Technology Scientist, MarkTab Consulting, and Associate Faculty, University of Phoenix)
Curtis Blanton, M.A.
Statistician, Division of Emergency and Environmental Health Sciences, Centers for Disease Control
Presented at
National Conference on Tobacco or Health
Minneapolis, MN
Abstract
Survey analysts must choose among commonly available software packages when analyzing survey
data collected under a complex sample design. This paper compares and contrasts features of five
complex sample analysis software applications: 1) Epi Info, 2) Stata, 3) SPSS, 4) SUDAAN, and
5) SAS. The factors considered include cost and licensing, data manipulation and analysis
capabilities, comprehensive documentation, available technical support, and ongoing product
development. Published research may provide peer-reviewed insight into one or more of these
factors. The sample code uses a free, publicly available data set from the Global Youth Tobacco
Survey (GYTS). With the sample code and free data, anyone can try out and evaluate these
software choices.
Data Analysis Software for Complex Sample Designs
Complex survey designs provide a way to obtain population estimates from samples drawn
within predefined strata. See Cochran (1977), Lohr (1999), Kalton (1983), and Kish (1965) for a
comprehensive statistical description and justification of complex survey data analysis. The
United Nations has published a helpful online guide that covers general survey analysis (Chromy &
Abeyasekera, 2005; Nathan, 2005) and variance estimation issues in statistical software
(Brogan, 2005). In practice, it can be challenging to choose among commonly available
software packages when analyzing survey data from a complex sample design. How can a
data analyst assess software suitability beyond the technical commands alone?
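Consider a prevalence as an example of a population estimate from a complex sample. The proportion of students who currently smoke is usually estimated as a weighted ratio. The notation below is chosen here for illustration:

- h indexes strata.
- i indexes primary sampling units (PSUs, or clusters) within stratum h.
- j indexes respondents within PSU hi.
- w_{hij} is the respondent's final sampling weight.
- y_{hij} equals 1 if the respondent has the characteristic and 0 otherwise.

```latex
\hat{p} \;=\;
\frac{\sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}\, y_{hij}}
     {\sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}}
```

The strata and clusters do not change this point estimate. They enter through its variance. For this reason, an analysis that applies the weights but ignores the strata and clusters generally yields the same point estimates with incorrect standard errors.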
Various software packages can analyze data from complex sample designs. Wang
(2001) describes using three statistical packages (SPSS, SAS, and Stata) to compute variance
estimates for complex sample data. Like Wang's (2001) article, this paper compares and contrasts
features of five complex sample analysis software applications: 1) Epi Info, 2) Stata, 3) SPSS,
4) SUDAAN, and 5) SAS. The factors considered include cost and licensing, data manipulation
and analysis capabilities, comprehensive documentation, available technical support, and ongoing
product development. Published research may provide peer-reviewed insight into one or more of
these factors. The examples in this paper use prevalence data from the Youth Tobacco Survey
(YTS) and the Global Youth Tobacco Survey (GYTS); the GYTS datasets are publicly available.
Complex sample analysis examples are in the appendix.
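Variance estimation is where complex sample software differs from an ordinary weighted analysis. The sketch below is for illustration only. It computes a weighted prevalence and its standard error by Taylor series linearization, one variance method commonly offered by complex sample software. It rests on these assumptions:

- PSUs are treated as sampled with replacement within strata.
- No finite population correction is applied.
- Every stratum contains at least two PSUs.

The function name `weighted_prevalence_taylor`, the `(stratum, psu, weight, y)` record layout, and the toy data are hypothetical. They are not drawn from YTS, GYTS, or any package's implementation.

```python
from collections import defaultdict
from math import sqrt


def weighted_prevalence_taylor(records):
    """Return (p_hat, se) for a 0/1 outcome under a stratified cluster design.

    records: iterable of (stratum, psu, weight, y) tuples, y in {0, 1}.
    Hypothetical helper (assumptions): PSUs are treated as sampled with
    replacement within strata, no finite population correction is applied,
    and every stratum must contain at least two PSUs.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    # Weighted prevalence: weighted count of y == 1 divided by total weight.
    p_hat = sum(w * y for _, _, w, y in records) / total_w

    # Linearized values for the ratio estimator, summed within each PSU.
    # PSUs are keyed by (stratum, psu), so PSU ids need only be unique
    # within a stratum.
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in records:
        psu_totals[(stratum, psu)] += w * (y - p_hat) / total_w

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for stratum, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)

    return p_hat, sqrt(variance)


if __name__ == "__main__":
    # Hypothetical toy data: 2 strata x 2 PSUs x 2 respondents each.
    toy = [
        ("A", 1, 1.0, 1), ("A", 1, 1.0, 0),
        ("A", 2, 1.0, 1), ("A", 2, 1.0, 1),
        ("B", 1, 2.0, 0), ("B", 1, 2.0, 0),
        ("B", 2, 2.0, 1), ("B", 2, 2.0, 0),
    ]
    p, se = weighted_prevalence_taylor(toy)
    print(f"prevalence = {p:.4f}, SE = {se:.4f}")  # prevalence = 0.4167, SE = 0.1863
```

Packages differ in options such as finite population corrections and the handling of strata with a single PSU. Results from this sketch may therefore not match a given package exactly.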
Epi Info
Epi Info (Centers for Disease Control, 2007a) is free epidemiologic and statistical
software developed by the Centers for Disease Control, an agency of the United States
Federal Government. The Epi Info website (Centers for Disease Control, 2007a) provides free