Secondary data analysis is the use of data that was collected by someone other than the researcher using it. The researcher poses questions that can be addressed through the analysis of a data set that they were not involved in collecting or creating. The data was not collected to answer the researcher’s specific research questions and was instead collected for another purpose. The same data set can therefore be a primary data set to one researcher and a secondary data set to another researcher.
Secondary data is often confused with primary data. Primary data is data that was collected by the researcher, or team of researchers, for the specific purpose of their research questions or interests. Here, a researcher or research team develops a research project, collects data designed to address specific questions, and performs their own analyses of the data they collected.
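As a rough illustration of the distinction, the sketch below poses a new question (do average weekly work hours differ by region?) of a small data set that the analyst did not collect. The data, the column names, and the `mean_by_group` helper are all hypothetical, invented for this example; they are not drawn from any of the sources described on this page.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical extract standing in for a data set collected by someone else.
# The column names and values are invented for illustration only.
SECONDARY_CSV = """respondent_id,region,hours_worked
1,rural,38
2,urban,42
3,rural,40
4,urban,36
5,urban,45
"""


def mean_by_group(csv_text, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    return {group: mean(values) for group, values in groups.items()}


# The secondary analyst's own question, asked of data gathered for another purpose.
result = mean_by_group(SECONDARY_CSV, "region", "hours_worked")
print(result)
```

In practice the text would come from a file distributed by the original data collector, and the analyst would first read the accompanying documentation to learn how each variable was measured.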
Frequently Used Data Sources
Humboldt State University’s Social Science Data Research Guide
The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-95 school year. The Add Health cohort has been followed into young adulthood with four in-home interviews, the most recent in 2008, when the sample was aged 24-32. Add Health combines longitudinal survey data on respondents’ social, economic, psychological and physical well-being with contextual data on the family, neighborhood, community, school, friendships, peer groups, and romantic relationships, providing unique opportunities to study how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood (Add Health 2014).
*If you have any questions about Add Health, or are interested in doing a project with Add Health data, please contact Dr. Meredith Williams, Humboldt State University.
The United States Bureau of Labor Statistics is a branch of the United States Department of Labor. It is the government agency responsible for collecting data about employment, unemployment, pay and benefits, consumer spending, work productivity, workplace injuries, employment projections, and international labor comparisons, and it is also responsible for the National Longitudinal Survey of Youth. Data can be accessed online in a variety of formats (United States Bureau of Labor Statistics 2014).
The United States Census Bureau is the government agency that is responsible for the United States Census and serves as a leading source of data about America’s people and economy. It also gathers other national and economic data, much of which is available online. The U.S. Census Bureau website includes data from the Economic Census, the American Community Survey, the 1990 Census, the 2000 Census, and current population estimates. Also available are interactive online tools, including mapping tools, and data at the national, state, county, and city levels.
The National Center for Health Statistics (NCHS) is part of the Centers for Disease Control and Prevention (CDC). It collects data from birth and death records, medical records, and interview surveys, and through direct physical exams and laboratory testing, in order to provide surveillance information that helps identify and address critical health problems in the United States (National Center for Health Statistics 2014).
The National Survey of Families and Households (NSFH) was designed to provide a broad range of information on family life to serve as a resource for research across disciplinary perspectives. A considerable amount of life-history information was collected, including: the respondent’s family living arrangements in childhood, departures and returns to the parental home, and histories of marriage, cohabitation, education, fertility, and employment (National Survey of Families and Households 2014).
SPSS is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners, and others. The current versions (2014) are officially named IBM SPSS Statistics. Companion products in the same family are used for survey authoring and deployment, data mining, text analytics, and collaboration and deployment (SPSS 2014).
Stata is a powerful statistical software package with smart data-management facilities, a wide array of up-to-date statistical techniques, and an excellent system for producing publication-quality graphs (Princeton 2014). It is a general-purpose package created in 1985 by StataCorp. Most of its users work in research, especially in sociology, political science, and other social sciences.
How to Learn Secondary Data Analysis
- Presentation on Secondary Data Analysis
- ICPSR Summer Program in Quantitative Methods of Social Research
Associations and Journals
- Sanders, David. 1981. “Teaching Social Science Data Analysis: A Political Scientist’s View.” Sociology 15(4):578–85.
- Hammersley, Martyn. 2010. “Can We Re-Use Qualitative Data Via Secondary Analysis? Notes on Some Terminological and Substantive Issues.” Sociological Research Online 15(1).
- Cronk, Christine E. and Paul D. Sarvela. 1997. “Alcohol, Tobacco, and Other Drug Use among Rural/small Town and Urban Youth: A Secondary Analysis of the Monitoring the Future Data Set.” American Journal of Public Health 87(5):760–64.
- Clark, Rich and Marc Maynard. 1998. “Research Methodology: Using Online Technology for Secondary Analysis of Survey Research Data — ‘Act Globally, Think Locally.’” Social Science Computer Review 16(1):58–71.
John MacInnes – Data Documentation in Secondary Data Analysis
Data Never Sleeps: How Much Data is Generated Every Minute? (Team Domo 2014)
Written by: Janae Teal
Last updated: 18 November 2014