Advanced Certificate in Public Health Data Science

Public Health Data Science draws on methods from statistics, epidemiology, and computer science. The Advanced Certificate in Public Health Data Science provides students and practitioners with training in biostatistics, epidemiology, regression, and data science as applied to public health research and practice, preparing them to work at the intersection of these fields.

*The advanced certificate in PHDS is currently unavailable for NYU/GPH students.


The advanced certificate is currently open to non-student professionals.

Graduates work in many settings. The certificate provides a broad base of knowledge that prepares them for positions in organizations such as:

  • Academic medical centers, e.g., biostatistics departments, predictive analytics cores
  • Pharmaceutical companies
  • Health insurance companies
  • Financial consulting firms
  • International organizations, e.g., UNICEF, WHO



To be competitive for jobs in public health and in other industries that rely on modern data analysis and manipulation, students need strong competencies in the analytical tools of both public health and data science. The certificate program provides an organized framework for acquiring those skills. Each course teaches technical fundamentals and tools and ties them to public health data sets and research questions through teaching examples and through analyses of real public health data in homework and projects.

Upon completing the Advanced Certificate, you will be able to do the following. Bracketed numbers are the GPH-GU course numbers of the courses that address each skill.

  1. Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question. [2995, 2106]
  2. Harness basic concepts of probability, random variation and commonly used statistical probability distributions. [2995, 2353, 2106, 2338]
  3. Distinguish among the different measurement scales and the implications for selection of statistical methods to be used based on these distinctions. [2995, 2353, 2106, 2338]
  4. Implement the appropriate analytic methods for calculating key measures of association. [2995, 2353, 2106, 2338]
  5. Understand and apply ethical principles to data acquisition, management, storage, sharing, and analysis. [2183, 2184, 2338]
  6. Interpret results of statistical analyses found in public health research studies. [2995, 2353, 2106, 2338]
  7. Utilize relevant statistical software for data analysis. [2183, 2184, 2353, 2338]



The advanced certificate may be taken as a hybrid of online and classroom-based courses. The courses focus on methods for study design and analysis and on statistical computing and data science tools. There is flexibility in course selection to accommodate the background of the student. Listed below are the six required courses that provide the training for the Public Health Data Science advanced certificate.

GPH-GU 2995 Biostatistics for Public Health (3 credits) ¹

This course covers basic probability, descriptive and inferential statistics, and the role of biostatistics in the practice of public health. Specific attention will be given to common probability distributions in public health and medicine, t-tests, analysis of variance, multiple linear and logistic regression, categorical data analysis, and survival analysis. Statistical topics are presented conceptually with little derivation, and applications are demonstrated using the statistical software Stata.

GPH-GU 2183 Introduction to Statistical Programming in R (2 credits)

R is one of the most popular programming languages in statistics and data science. This course will introduce various R programming topics, including data visualization, exploration, and transformation, via illustrations with public health datasets. Students will learn how to program in R effectively and efficiently for data analysis with popular R packages including dplyr, tibble, readr, and ggplot2. By the end of the course, students will be able to write R code from scratch for data visualization, exploratory analysis, transformation, and import and export. This course does not require prior experience in programming or statistics and serves as a foundation for other courses in data science. Students are encouraged to take the follow-up course, Intermediate Statistical Programming in R.

GPH-GU 2184 Intermediate Statistical Programming in R (2 credits)

R is one of the most popular languages in data science. This course is the follow-up to GPH-GU 2183 Introduction to Statistical Programming in R and covers intermediate R programming topics, including data wrangling, R Markdown, and simple statistical simulations. The course uses public health datasets as illustrations to best meet the practical needs of GPH students but is also open to those of other backgrounds. By the end of the course, students will be able to comfortably program in R for effective data preprocessing, analysis, and presentation. In addition, students will be able to write statistical reports with reproducible code using R Markdown. This course serves as good preparation for courses in statistics and machine learning.

GPH-GU 2353 Regression I: Linear Regression and Modeling (3 credits) ²

Regression modeling is one of the most important statistical techniques used in public health. This course focuses on data analysis using linear regression models for continuous outcomes. The first part of the course introduces simple and multiple linear regression, principles of ordinary least squares regression models, model assumptions, and inferences about model parameters. The second part of the course focuses on important practical matters, such as prediction, variable selection, moderated effects, and mediation. Together, these two parts provide the foundations for more advanced statistical modeling. Examples are drawn from broad areas of public health research. All analyses will be taught and performed using Stata statistical software.

GPH-GU 2106 Epidemiology (3 credits) ³

Epidemiology is the study of the distribution and determinants of health and disease in different human populations and the application of methods to improve disease outcomes. As such, epidemiology is the basic science of public health. This course is designed to introduce students in all fields of public health to the background, basic principles, and methods of public health epidemiology. Topics covered in this course include: basic principles of epidemiology; measures of disease frequency; epidemiologic study designs, both experimental and observational; bias; confounding; outbreak investigations; screening; causality; and ethical issues in epidemiologic research. In addition, students will develop skills to read, interpret, and evaluate health information from published epidemiologic studies.

GPH-GU 2338 Machine Learning in Public Health (3 credits)

This course provides students with a strong foundation in machine learning relevant to public health and biomedical applications. Topics include the data generating process; model selection and evaluation; generalized linear models; common supervised and unsupervised machine learning algorithms such as support vector machines, decision trees, random forests, neural networks, and k-means; and ethics and communication. Students will learn methods for sound implementation of machine learning, such as assessment of assumptions about the data generating process, feature generation, treatment of missing data, and reduction of bias. Students will gain familiarity with the potential power of machine learning in public health, as well as the particular challenges inherent to public health applications.

Note: Students who have taken the equivalent of any of these courses prior to their enrollment at GPH will substitute advanced courses on the same topics:

¹ Substitute GPH-GU 3225 Statistical Inference.

² Substitute GPH-GU 2354 Regression II: Categorical Data Analysis.

³ Substitute GPH-GU 2450 Intermediate Epidemiology, GPH-GU 2930 Epidemiological Methods and Design, or APSTA-GE 2012 Causal Inference.

How to Apply


Applicants must have already obtained an undergraduate degree. They should either have work experience in data science or be currently enrolled in, or have completed, one of the following: a Master of Public Health, a Master of Science in Biostatistics, a Master of Science in Epidemiology, a PhD in Public Health, a Master of Public Policy, a Master of Public Administration, or a law, medical, dental, or nursing degree program. Other graduate degree programs will be considered on a case-by-case basis. Applicants should be able to articulate a clear interest in and understanding of Public Health Data Science.

  • Advanced Certificate in Public Health Data Science

    • Applicants must submit applications online through SOPHAS Express, the common application for schools and programs of public health. In order to be eligible for the certificate, you must hold the following:

      • Bachelor's degree or US equivalent from an accredited institution

      • Minimum 2.75 cumulative undergraduate GPA

    • To apply, you must submit your application as well as the following materials:

      • Scanned copies of transcripts for all post-secondary education completed, regardless of whether a degree was awarded

      • Resume or CV

      • Personal statement of no more than 1,200 words explaining your rationale for pursuing the certificate

      • One letter of support from a professional or academic reference

Financial Aid

You may be eligible for federal financial aid and/or private educational loans to pursue the certificate program. Learn more about your options for aid from GPH’s Office of Financial Aid.

For More Information

For additional information please email 

Online Learning at GPH

The School of Global Public Health is dedicated to providing a connected, professional, and scholastic environment for our online courses.