Rumi Chunara

Associate Professor of Biostatistics

Associate Professor of Computer Science and Engineering, Tandon

Director of Center for Health Data Science

Professional overview

The overarching goal of Dr. Rumi Chunara's research is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. She focuses on designing and developing data mining and machine learning methods that address challenges specific to public health data and goals, as well as on fairness and ethics in the design and use of data and algorithms embedded in social systems.

At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analysis and machine learning to study population health. Previously, she was a Postdoctoral Fellow and Instructor at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. She completed her PhD at the Harvard-MIT Division of Health Sciences and Technology and her BS at Caltech.

Education

BS, Electrical Engineering (Honors), Caltech
MS, Electrical Engineering and Computer Science, MIT
PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)

Honors and awards

Max Planck Sabbatical Award (2021)
Speaker at NSF Computer and Information Science and Engineering Directorate CAREER Proposal Writing Workshop (2020)
Invited tutorial on Public Health and Machine Learning at ACM Conference on Health, Inference and Learning (2020)
Keynote at Human Computation and Crowdsourcing (2019)
Invited Speaker at Expert Group Meeting at United Nations Population Fund, Advances in Mobile Technologies for Data Collection Panel (2019)
Keynote at "Mapping the Equity Dimensions of Artificial Intelligence in Public Health", University of Toronto (2019)
Facebook Research Award (2019)
Gates Foundation Grand Challenges Exploration Award (2019)
NSF CAREER Award (2019)
MIT Technology Review Top 35 Innovators Under 35 (2014)
MIT Presidential Fellow (2004)

Areas of research and study

Health Disparities
Machine Learning
Social Computing
Social Determinants of Health

Publications

Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study

Daughton, A. R., Chunara, R., & Paul, M. J. (2020). JMIR Public Health and Surveillance, 6(2).

Abstract
Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored, as studying them requires ground truth data.
Objective: This study sought to identify relationships between web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset.
Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals to assess the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users' tweets.
Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Among these participants' tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral study sample used Twitter with similar frequencies (similar @-mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).
Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.

COVID-19 transforms health care through telemedicine: Evidence from the field

Quantifying depression-related language on social media during the COVID-19 pandemic

Quantifying the localized relationship between vector containment activities and dengue incidence in a real-world setting: A spatial and time series modelling analysis based on geo-located data from Pakistan

Rehman, N. A., Salje, H., Kraemer, M. U., Subramanian, L., Saif, U., & Chunara, R. (2020). PLoS Neglected Tropical Diseases, 14(5), 1-22.

Abstract
Increasing urbanization is having a profound effect on infectious disease risk, posing significant challenges for governments that must allocate limited resources for optimal control at a sub-city scale. With recent advances in data collection practices, it is now possible to gather empirical evidence about the efficacy of highly localized containment and intervention activities, which can lead to optimal deployment of resources. However, there are several challenges in analyzing data from such real-world observational settings. Using data on 3.9 million instances of seven dengue vector containment activities collected between 2012 and 2017, here we develop and assess two frameworks for understanding how the generation of new dengue cases changes in space and time with respect to the application of different types of containment activities. When we account for the non-random deployment of each containment activity in relation to dengue cases and other types of containment activities, as well as the deployment of activities in different epidemiological contexts, results from both frameworks reinforce existing knowledge about the efficacy of containment activities aimed at the adult phase of the mosquito lifecycle. Results show a 10% (95% CI: 1–19%) and a 20% (95% CI: 4–34%) reduction in the probability of a case occurring within 50 meters and 30 days of cases that had Indoor Residual Spraying (IRS) and fogging, respectively, performed in the immediate vicinity, compared to cases of similar epidemiological context that had no containment in their vicinity. At the same time, limitations due to the real-world nature of activity deployment are used to guide recommendations for future deployment of resources during outbreaks, as well as for data collection practices. Conclusions from this study will enable more robust and comprehensive analyses of localized containment activities in resource-scarce urban settings and lead to improved allocation of government resources in outbreak settings.

Role of the built and online social environments on expression of dining on Instagram

Using Digital Data to Protect and Promote the Most Vulnerable in the Fight Against COVID-19

Chunara, R., & Cook, S. H. (2020). Frontiers in Public Health, 8.

Reports of the workshops held at the 2019 international AAAI conference on web and social media

Alburez-Gutierrez, D., Chandrasekharan, E., Chunara, R., Gil-Clavel, S., Hannak, A., Interdonato, R., Joseph, K., Kalimeri, K., Kairam, S., Malik, M. M., Mayer, K., Mejova, Y., Paolotti, D., & Zagheni, E. (2019). AI Magazine, 40(4), 78-82.

Quantitative methods for measuring neighborhood characteristics in neighborhood health research

Duncan, D. T., Goedel, W. C., & Chunara, R. (2018). In Neighborhoods and Health (pp. 57-90).

Reports of the workshops held at the 2018 international AAAI conference on web and social media

Socio-spatial self-organizing maps: Using social media to assess relevant geographies for exposure to social processes

Tracking health seeking behavior during an Ebola outbreak via mobile phones and SMS

Feng, S., Grépin, K. A., & Chunara, R. (2018). npj Digital Medicine, 1(1).

Abstract
The recent Ebola outbreak in West Africa exemplified the need to rapidly measure population-level health-seeking behaviors in order to understand healthcare utilization during emergency situations. Taking advantage of the high prevalence of mobile phones, we deployed a national SMS poll and collected data about individual-level health and health-seeking behavior throughout the outbreak from 6694 individuals in Liberia from March to June 2015. Using propensity score matching to generate balanced subsamples, we compared outcomes in our survey to those from a recent household survey (the 2013 Liberian Demographic Health Survey). We found that the matched subgroups had similar patterns of delivery location in aggregate, and, using data on date of birth, we were able to show that facility-based deliveries were significantly lower during the outbreak than after it (p < 0.05), consistent with findings from retrospective studies using healthcare-based data. Directly assessing behaviors from individuals via SMS also enabled separate measurement of public and private sector facility utilization, which has been a challenge for other studies in countries, including Liberia, that rely mainly on government sources of data. These data also suggest that public facility-based deliveries returned to baseline values after the outbreak. Thus, we demonstrate that with an appropriate methodological approach to account for different population denominators, data sourced via mobile tools such as SMS polling could serve as an important low-cost complement to existing data collection strategies, especially in situations where data are needed at a higher frequency than surveys can feasibly provide.

Denominator Issues for Personally Generated Data in Population Health Monitoring

Determinants of participants' follow-up and characterization of representativeness in Flu Near You, a participatory disease surveillance system

Etiology of respiratory tract infections in the community and clinic in Ilorin, Nigeria

High-resolution temporal representations of alcohol and tobacco behaviors from social media data

Network inference from multimodal data: A review of approaches from infectious disease transmission

Characterizing sleep issues using Twitter

Estimating influenza attack rates in the United States using a participatory cohort

Flu Near You: Crowdsourced symptom reporting spanning 2 influenza seasons

Surveillance of acute respiratory infections using community-submitted symptoms and specimens for molecular diagnostic testing

A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives

Public health for the people: Participatory infectious disease surveillance in the digital age

Assessing the Online Social Environment for Surveillance of Obesity Prevalence

Monitoring Influenza Epidemics in China with Search Query from Baidu

Twitter as a Sentinel in Emergency Situations: Lessons from the Boston Marathon Explosions

Contact

rumi.chunara@nyu.edu
708 Broadway
New York, NY 10003