Rumi Chunara

Associate Professor of Biostatistics

Associate Professor of Computer Science and Engineering, Tandon

Director of Center for Health Data Science

Professional overview

The overarching goal of Dr. Rumi Chunara's research is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. She focuses on the design and development of data mining and machine learning methods to address challenges related to data and goals of public health, as well as fairness and ethics in the design and use of data and algorithms embedded in social systems.

At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analyses and machine learning, to study population health. Previously, she was a Postdoctoral Fellow and Instructor at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. She completed her PhD at the Harvard-MIT Division of Health Sciences and Technology and BSc at Caltech.

Education

BS, Electrical Engineering (Honors), Caltech
MS, Electrical Engineering and Computer Science, MIT
PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)

Honors and awards

Max Planck Sabbatical Award (2021)
Speaker at NSF Computer and Information Science and Engineering Directorate CAREER Proposal Writing Workshop (2020)
Invited tutorial on Public Health and Machine Learning at ACM Conference on Health, Inference and Learning (2020)
Keynote at Human Computation and Crowdsourcing (2019)
Invited Speaker at Expert Group Meeting at United Nations Population Fund, Advances in Mobile Technologies for Data Collection Panel (2019)
Keynote at "Mapping the Equity Dimensions of Artificial Intelligence in Public Health", University of Toronto (2019)
Facebook Research Award (2019)
Gates Foundation Grand Challenges Exploration Award (2019)
NSF CAREER Award (2019)
MIT Technology Review Top 35 Innovators Under 35 (2014)
MIT Presidential Fellow (2004)

Areas of research and study

Health Disparities
Machine learning
Social Computing
Social Determinants of Health

Publications

Social Determinants in Machine Learning Cardiovascular Disease Prediction Models : A Systematic Review

Zhao, Y., Wood, E. P., Mirin, N., Cook, S. H., & Chunara, R.

Publication year

2021

Journal title

American Journal of Preventive Medicine

Volume

61

Issue

4

Page(s)

596-605
Abstract
Introduction: Cardiovascular disease is the leading cause of death worldwide, and cardiovascular disease burden is increasing in low-resource settings and for lower socioeconomic groups. Machine learning algorithms are being developed rapidly and incorporated into clinical practice for cardiovascular disease prediction and treatment decisions. Significant opportunities for reducing death and disability from cardiovascular disease worldwide lie with accounting for the social determinants of cardiovascular outcomes. This study reviews how social determinants of health are being included in machine learning algorithms to inform best practices for the development of algorithms that account for social determinants. Methods: A systematic review using 5 databases was conducted in 2020. English language articles from any location published from inception to April 10, 2020, which reported on the use of machine learning for cardiovascular disease prediction that incorporated social determinants of health, were included. Results: Most studies that compared machine learning algorithms and regression showed increased performance of machine learning, and most studies that compared performance with or without social determinants of health showed increased performance with them. The most frequently included social determinants of health variables were gender, race/ethnicity, marital status, occupation, and income. Studies were largely from North America, Europe, and China, limiting the diversity of the included populations and variance in social determinants of health. Discussion: Given their flexibility, machine learning approaches may provide an opportunity to incorporate the complex nature of social determinants of health. 
The limited variety of sources and data in the reviewed studies emphasizes that there is an opportunity to include more social determinants of health variables, especially environmental ones, that are known to impact cardiovascular disease risk, and that recording such data in electronic databases will enable their use.

Telemedicine and healthcare disparities : a cohort study in a large healthcare system in New York City during COVID-19

Chunara, R., Zhao, Y., Chen, J., Lawrence, K., Testa, P. A., Nov, O., & Mann, D. M.

Publication year

2021

Journal title

Journal of the American Medical Informatics Association

Volume

28

Issue

1

Page(s)

33-41
Abstract
OBJECTIVE: Through the coronavirus disease 2019 (COVID-19) pandemic, telemedicine became a necessary entry point into the process of diagnosis, triage, and treatment. Racial and ethnic disparities in healthcare have been well documented in COVID-19 with respect to risk of infection and in-hospital outcomes once admitted, and here we assess disparities in those who access healthcare via telemedicine for COVID-19. MATERIALS AND METHODS: Electronic health record data of patients at New York University Langone Health between March 19 and April 30, 2020 were used to conduct descriptive and multilevel regression analyses with respect to visit type (telemedicine or in-person), suspected COVID diagnosis, and COVID test results. RESULTS: Controlling for individual and community-level attributes, Black patients had 0.6 times the adjusted odds (95% CI: 0.58-0.63) of accessing care through telemedicine compared to white patients, though they are increasingly accessing telemedicine for urgent care, driven by a younger and female population. COVID diagnoses were significantly more likely for Black versus white telemedicine patients. DISCUSSION: There are disparities for Black patients accessing telemedicine, although there is increased uptake by young, female Black patients. Mean income and decreased mean household size of a zip code were also significantly related to telemedicine use. CONCLUSION: Telemedicine access disparities reflect those in in-person healthcare access. Roots of disparate use are complex and reflect individual, community, and structural factors, including their intersection, many of which are due to systemic racism. Evidence regarding disparities that manifest through telemedicine can be used to inform tool design and systemic efforts to promote digital health equity.

Uncertainty as a Form of Transparency : Measuring, Communicating, and Using Uncertainty

Bhatt, U., Antorán, J., Zhang, Y., Liao, Q. V., Sattigeri, P., Fogliato, R., Melançon, G., Krishnan, R., Stanley, J., Tickoo, O., Nachman, L., Chunara, R., Srikumar, M., Weller, A., & Xiang, A.

Publication year

2021

Page(s)

401-413
Abstract
Algorithmic transparency entails exposing system properties to various stakeholders for purposes that include understanding, improving, and contesting predictions. Until now, most research into algorithmic transparency has predominantly focused on explainability. Explainability attempts to provide reasons for a machine learning model's behavior to stakeholders. However, understanding a model's specific behavior alone might not be enough for stakeholders to gauge whether the model is wrong or lacks sufficient knowledge to solve the task at hand. In this paper, we argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions. First, we discuss methods for assessing uncertainty. Then, we characterize how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems. Finally, we outline methods for displaying uncertainty to stakeholders and recommend how to collect information required for incorporating uncertainty into existing ML pipelines. This work constitutes an interdisciplinary review drawn from literature spanning machine learning, visualization/HCI, design, decision-making, and fairness. We aim to encourage researchers and practitioners to measure, communicate, and use uncertainty as a form of transparency.

Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data : Observational Study

Daughton, A. R., Chunara, R., & Paul, M. J.

Publication year

2020

Journal title

JMIR Public Health and Surveillance

Volume

6

Issue

2
Abstract
Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19). 
Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.

COVID-19 transforms health care through telemedicine : Evidence from the field

Mann, D. M., Chen, J., Chunara, R., Testa, P. A., & Nov, O.

Publication year

2020

Journal title

Journal of the American Medical Informatics Association

Volume

27

Issue

7

Page(s)

1132-1135
Abstract
This study provides data on the feasibility and impact of video-enabled telemedicine use among patients and providers and its impact on urgent and nonurgent healthcare delivery from one large health system (NYU Langone Health) at the epicenter of the coronavirus disease 2019 (COVID-19) outbreak in the United States. Between March 2 and April 14, 2020, telemedicine visits in urgent care increased from 102.4 daily to 801.6 daily (683% increase) after the system-wide expansion of virtual urgent care staff in response to COVID-19. Of all virtual visits post expansion, 56.2% of urgent and 17.6% of nonurgent visits were COVID-19-related. Telemedicine usage was highest by patients 20 to 44 years of age, particularly for urgent care. The COVID-19 pandemic has driven rapid expansion of telemedicine use for urgent care and nonurgent care visits beyond baseline periods. This reflects an important change in telemedicine that other institutions facing the COVID-19 pandemic should anticipate.

Population-aware hierarchical Bayesian domain adaptation via multi-component invariant learning

Mhasawade, V., Rehman, N. A., & Chunara, R.

Publication year

2020

Page(s)

182-192
Abstract
While machine learning is rapidly being developed and deployed in health settings such as influenza prediction, there are critical challenges in using data from one environment to predict in another due to variability in features. Even within disease labels there can be differences (e.g. "fever" may mean something different reported in a doctor's office versus in an online app). Moreover, models are often built on passive, observational data which contain different distributions of population subgroups (e.g. men or women). Thus, there are two forms of instability between environments in this observational transport problem. We first harness substantive knowledge from health research to conceptualize the underlying causal structure of this problem in a health outcome prediction task. Based on sources of stability in the model and the task, we posit that we can combine environment and population information in a novel population-aware hierarchical Bayesian domain adaptation framework that harnesses multiple invariant components through population attributes when needed. We study the conditions under which invariant learning fails, leading to reliance on the environment-specific attributes. Experimental results for an influenza prediction task on four datasets gathered from different contexts show the model can improve prediction in the case of largely unlabelled target data from a new environment and different constituent population, by harnessing both environment and population invariant information. This work represents a novel, principled way to address a critical challenge by blending domain (health) knowledge and algorithmic innovation. The proposed approach will have significant impact in many social settings wherein who the data comes from, and how it was generated, matters.

Quantifying depression-related language on social media during the COVID-19 pandemic

Davis, B. D., McKnight, D. E., Teodorescu, D., Quan-Haase, A., Chunara, R., Fyshe, A., & Lizotte, D. J.

Publication year

2020

Journal title

International Journal of Population Data Science

Volume

5

Issue

4
Abstract
Introduction: The COVID-19 pandemic had clear impacts on mental health. Social media presents an opportunity for assessing mental health at the population level. Objectives: 1) Identify and describe language used on social media that is associated with discourse about depression. 2) Describe the associations between identified language and COVID-19 incidence over time across several geographies. Methods: We create a word embedding based on the posts in Reddit's /r/Depression and use this word embedding to train representations of active authors. We contrast these authors against a control group and extract keywords that capture differences between the two groups. We filter these keywords for face validity and to match character limits of an information retrieval system, Elasticsearch. We retrieve all geo-tagged posts on Twitter from April 2019 to June 2021 from Seattle, Sydney, Mumbai, and Toronto. The tweets are scored with BM25 using the keywords. We call this score rDD. We compare changes in average score over time with case counts from the pandemic's beginning through June 2021. Results: We observe a pattern in rDD across all cities analyzed: There is an increase in rDD near the start of the pandemic which levels off over time. However, in Mumbai we also see an increase aligned with a second wave of cases. Conclusions: Our results are concordant with other studies which indicate that the impact of the pandemic on mental health was highest initially and was followed by recovery, largely unchanged by subsequent waves. However, in the Mumbai data we observed a substantial rise in rDD with a large second wave. Our results indicate possible un-captured heterogeneity across geographies, and point to a need for a better understanding of this differential impact on mental health.

Quantifying the localized relationship between vector containment activities and dengue incidence in a real-world setting : A spatial and time series modelling analysis based on geo-located data from Pakistan

Rehman, N. A., Salje, H., Kraemer, M. U., Subramanian, L., Saif, U., & Chunara, R.

Publication year

2020

Journal title

PLoS Neglected Tropical Diseases

Volume

14

Issue

5

Page(s)

1-22
Abstract
Increasing urbanization is having a profound effect on infectious disease risk, posing significant challenges for governments to allocate limited resources for their optimal control at a sub-city scale. With recent advances in data collection practices, empirical evidence about the efficacy of highly localized containment and intervention activities, which can lead to optimal deployment of resources, is possible. However, there are several challenges in analyzing data from such real-world observational settings. Using data on 3.9 million instances of seven dengue vector containment activities collected between 2012 and 2017, here we develop and assess two frameworks for understanding how the generation of new dengue cases changes in space and time with respect to application of different types of containment activities. Accounting for the non-random deployment of each containment activity in relation to dengue cases and other types of containment activities, as well as deployment of activities in different epidemiological contexts, results from both frameworks reinforce existing knowledge about the efficacy of containment activities aimed at the adult phase of the mosquito lifecycle. Results show a 10% (95% CI: 1–19%) and 20% (95% CI: 4–34%) reduction in probability of a case occurring within 50 meters and 30 days of cases which had Indoor Residual Spraying (IRS) and fogging performed in the immediate vicinity, respectively, compared to cases of similar epidemiological context and which had no containment in their vicinity. Simultaneously, limitations due to the real-world nature of activity deployment are used to guide recommendations for future deployment of resources during outbreaks as well as data collection practices. Conclusions from this study will enable more robust and comprehensive analyses of localized containment activities in resource-scarce urban settings and lead to improved allocation of government resources in an outbreak setting.

Quasi-experimental designs for assessing response on social media to policy changes

Tian, Y., & Chunara, R.

Publication year

2020

Page(s)

671-682
Abstract
Regulation of tobacco products is rapidly evolving. Understanding public sentiment in response to changes is very important as authorities assess how to effectively protect population health. Social media systems are widely recognized to be useful for collecting data about human preferences and perceptions. However, how social media data may be used in rapid policy change settings, given the challenges of narrow time periods, specific locations, and the non-representative population using social media, is an open question. In this paper we apply quasi-experimental designs, which have been used previously in observational data such as social media, to control for time and location confounders on social media, and then use content analysis of Twitter and Reddit posts to illustrate the content of reactions to tobacco flavor bans and the effect of taxation on e-cigarettes. Conclusions distill the potential role of social media in settings of rapidly changing regulation, in complement to what is learned by traditional denominator-based representative surveys.

Role of the built and online social environments on expression of dining on Instagram

Mhasawade, V., Elghafari, A., Duncan, D. T., & Chunara, R.

Publication year

2020

Journal title

International Journal of Environmental Research and Public Health

Volume

17

Issue

3
Abstract
Online social communities are becoming windows for learning more about the health of populations, through information about our health-related behaviors and outcomes from daily life. At the same time, just as public health data and theory have shown that aspects of the built environment can affect our health-related behaviors and outcomes, it is also possible that online social environments (e.g., posts and other attributes of our online social networks) can shape facets of our life. Given the important role of the online environment in public health research and implications, factors which contribute to the generation of such data must be well understood. Here we study the role of the built and online social environments in the expression of dining on Instagram in Abu Dhabi: a ubiquitous social media platform, a city with a vibrant dining culture, and a topic (food posts) which has been studied in relation to public health outcomes. Our study uses available data on user Instagram profiles and their Instagram networks, as well as the local food environment measured through the dining types (e.g., casual dining restaurants, food court restaurants, lounges etc.) by neighborhood. We find evidence that factors of the online social environment (profiles that post about dining versus profiles that do not post about dining) have different influences on the relationship between a user’s built environment and the social dining expression, with effects also varying by dining types in the environment and time of day. We examine the mechanism of the relationships via moderation and mediation analyses. Overall, this study provides evidence that the interplay of online and built environments depends on attributes of said environments and can also vary by time of day. We discuss implications of this synergy for precisely targeting public health interventions, as well as for using online data for public health research.

Using Digital Data to Protect and Promote the Most Vulnerable in the Fight Against COVID-19

Chunara, R., & Cook, S. H.

Publication year

2020

Journal title

Frontiers in Public Health

Volume

8

Race, ethnicity and national origin-based discrimination in social media and hate crimes across 100 U.S. cities

Relia, K., Li, Z., Cook, S. H., & Chunara, R.

Publication year

2019

Page(s)

417-427
Abstract
We study malicious online content via a specific type of hate speech: race, ethnicity and national-origin based discrimination in social media, alongside hate crimes motivated by those characteristics, in 100 cities across the United States. We develop a spatially-diverse training dataset and classification pipeline to delineate targeted and self-narration of discrimination on social media, accounting for language across geographies. Controlling for census parameters, we find that the proportion of discrimination that is targeted is associated with the number of hate crimes. Finally, we explore the linguistic features of discrimination Tweets in relation to hate crimes by city, features used by users who Tweet different amounts of discrimination, and features of discrimination compared to non-discrimination Tweets. Findings from this spatial study can inform future studies of how discrimination in physical and virtual worlds varies by place, or how physical and virtual world discrimination may synergize.

Reports of the workshops held at the 2019 International AAAI Conference on Web and Social Media

Alburez-Gutierrez, D., Chandrasekharan, E., Chunara, R., Gil-Clavel, S., Hannak, A., Interdonato, R., Joseph, K., Kalimeri, K., Kairam, S., Malik, M. M., Mayer, K., Mejova, Y., Paolotti, D., & Zagheni, E.

Publication year

2019

Journal title

AI Magazine

Volume

40

Issue

4

Page(s)

78-82

Using Contextual Information to Improve Blood Glucose Prediction

Akbari, M., & Chunara, R.

Publication year

2019

Journal title

Proceedings of Machine Learning Research

Volume

106

Page(s)

91-108
Abstract
Blood glucose value prediction is an important task in diabetes management. While it is reported that glucose concentration is sensitive to social context such as mood, physical activity, stress, diet, alongside the influence of diabetes pathologies, we need more research on data and methodologies to incorporate and evaluate signals about such temporal context into prediction models. Person-generated data sources, such as actively contributed surveys as well as passively mined data from social media offer opportunity to capture such context, however the self-reported nature and sparsity of such data mean that such data are noisier and less specific than physiological measures such as blood glucose values themselves. Therefore, here we propose a Gaussian Process model to both address these data challenges and combine blood glucose and latent feature representations of contextual data for a novel multi-signal blood glucose prediction task. We find this approach outperforms common methods for multi-variate data, as well as using the blood glucose values in isolation. Given a robust evaluation across two blood glucose datasets with different forms of contextual information, we conclude that multi-signal Gaussian Processes can improve blood glucose prediction by using contextual information and may provide a significant shift in blood glucose prediction research and practice.

Creating full individual-level location timelines from sparse social media data

Rehman, N. A., Relia, K., & Chunara, R. (L. Xiong, R. Tamassia, K. F. Banaei, R. H. Guting, & E. Hoel, Eds.).

Publication year

2018

Page(s)

379-388
Abstract
In many domain applications, a continuous timeline of human locations is critical; for example for understanding possible locations where a disease may spread, or the flow of traffic. While data sources such as GPS trackers or Call Data Records are temporally-rich, they are expensive, often not publicly available or garnered only in select locations, restricting their wide use. Conversely, geo-located social media data are publicly and freely available, but present challenges especially for full timeline inference due to their sparse nature. We propose a stochastic framework, Intermediate Location Computing (ILC) which uses prior knowledge about human mobility patterns to predict every missing location from an individual’s social media timeline. We compare ILC with a state-of-the-art RNN baseline as well as methods that are optimized for next-location prediction only. For three major cities, ILC predicts the top 1 location for all missing locations in a timeline, at 1 and 2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all compared methods). Specifically, ILC also outperforms the RNN in settings of low data; both cases of very small number of users (under 50), as well as settings with more users, but with sparser timelines. In general, the RNN model needs a higher number of users to achieve the same performance as ILC. Overall, this work illustrates the tradeoff between prior knowledge of heuristics and more data, for an important societal problem of filling in entire timelines using freely available, but sparse social media data.

From the user to the medium : Neural profiling across web communities

Akbari, M., Relia, K., Elghafari, A., & Chunara, R.

Publication year

2018

Page(s)

552-555
Abstract
Online communities provide a unique way for individuals to access information from those in similar circumstances, which can be critical for health conditions that require daily and personalized management. As these groups and topics often arise organically, identifying the types of topics discussed is necessary to understand their needs. As well, these communities and people in them can be quite diverse, and existing community detection methods have not been extended towards evaluating these heterogeneities. This has been limited as community detection methodologies have not focused on community detection based on semantic relations between textual features of the user-generated content. Thus here we develop an approach, NeuroCom, that optimally finds dense groups of users as communities in a latent space inferred by neural representation of published contents of users. By embedding of words and messages, we show that NeuroCom demonstrates improved clustering and identifies more nuanced discussion topics in contrast to other common unsupervised learning approaches.

Quantitative methods for measuring neighborhood characteristics in neighborhood health research

Duncan, D. T., Goedel, W. C., & Chunara, R.

Publication year

2018

Page(s)

57-90

Reports of the workshops held at the 2018 International AAAI Conference on Web and Social Media

An, J., Chunara, R., Crandall, D. J., Frajberg, D., French, M., Jansen, B. J., Kulshrestha, J., Mejova, Y., Romero, D. M., Salminen, J., Sharma, A., Sheth, A., Tan, C., Taylor, S. H., & Wijeratne, S.

Publication year

2018

Journal title

AI Magazine

Volume

39

Issue

4

Page(s)

36-44

Socio-spatial self-organizing maps : Using social media to assess relevant geographies for exposure to social processes

Relia, K., Akbari, M., Duncan, D., & Chunara, R.

Publication year

2018

Journal title

Proceedings of the ACM on Human-Computer Interaction

Volume

2

Issue

CSCW
Abstract
Social media offers a unique window into attitudes like racism and homophobia, exposure to which are important, hard to measure and understudied social determinants of health. However, individual geo-located observations from social media are noisy and geographically inconsistent. Existing areas by which exposures are measured, like Zip codes, average over irrelevant administratively-defined boundaries. Hence, in order to enable studies of online social environmental measures like attitudes on social media and their possible relationship to health outcomes, first there is a need for a method to define the collective, underlying degree of social media attitudes by region. To address this, we create the Socio-spatial Self-organizing Map (“SS-SOM”) pipeline to best identify regions by their latent social attitude from Twitter posts. SS-SOMs use neural embedding for text-classification, and augment traditional SOMs to generate a controlled number of non-overlapping, topologically-constrained and topically-similar clusters. We find that not only are SS-SOMs robust to missing data, but also that the exposure of a cohort of men who are susceptible to multiple racism and homophobia-linked health outcomes changes by up to 42% when using SS-SOM measures as compared to using Zip code-based measures.

Tracking health seeking behavior during an Ebola outbreak via mobile phones and SMS

Feng, S., Grépin, K. A., & Chunara, R.

Publication year

2018

Journal title

npj Digital Medicine

Volume

1

Issue

1
Abstract
The recent Ebola outbreak in West Africa was an exemplar for the need to rapidly measure population-level health-seeking behaviors, in order to understand healthcare utilization during emergency situations. Taking advantage of the high prevalence of mobile phones, we deployed a national SMS-poll and collected data about individual-level health and health-seeking behavior throughout the outbreak from 6694 individuals from March to June 2015 in Liberia. Using propensity score matching to generate balanced subsamples, we compared outcomes in our survey to those from a recent household survey (the 2013 Liberian Demographic Health Survey). We found that the matched subgroups had similar patterns of delivery location in aggregate, and utilizing data on the date of birth, we were able to show that facility-based deliveries were significantly decreased during the outbreak compared to after it (p < 0.05), consistent with findings from retrospective studies using healthcare-based data. Directly assessing behaviors from individuals via SMS also enabled the measurement of public and private sector facility utilization separately, which has been a challenge in other studies in countries including Liberia which rely mainly on government sources of data. In doing so, our data suggest that public facility-based deliveries returned to baseline values after the outbreak. Thus, we demonstrate that with the appropriate methodological approach to account for different population denominators, data sourced via mobile tools such as SMS polling could serve as an important low-cost complement to existing data collection strategies, especially in situations where higher-frequency data than can be feasibly obtained through surveys is useful.

Assessing behavioral stages from social media data

Liu, J., Weitzman, E. R., & Chunara, R. (n.d.).

Publication year

2017

Page(s)

1320-1333
Abstract
Important work rooted in psychological theory posits that health behavior change occurs through a series of discrete stages. Our work builds on the field of social computing by identifying how social media data can be used to resolve behavior stages at high resolution (e.g. hourly/daily) for key population subgroups and times. In essence this approach opens new opportunities to advance psychological theories and better understand how our health is shaped based on the real, dynamic, and rapid actions we make every day. To do so, we bring together domain knowledge and machine learning methods to form a hierarchical classification of Twitter data that resolves different stages of behavior. We identify and examine temporal patterns of the identified stages, with alcohol as a use case (planning or looking to drink, currently drinking, and reflecting on drinking). Known seasonal trends are compared with findings from our methods. We discuss the potential health policy implications of detecting high frequency behavior stages.

Denominator Issues for Personally Generated Data in Population Health Monitoring

Chunara, R., Wisk, L. E., & Weitzman, E. R. (n.d.).

Publication year

2017

Journal title

American journal of preventive medicine

Volume

52

Issue

4

Page(s)

549-553

Determinants of participants' follow-up and characterization of representativeness in flu near you, a participatory disease surveillance system

Baltrusaitis, K., Santillana, M., Crawley, A. W., Chunara, R., Smolinski, M., & Brownstein, J. S. (n.d.).

Publication year

2017

Journal title

JMIR Public Health and Surveillance

Volume

3

Issue

2
Abstract
Background: Flu Near You (FNY) is an Internet-based participatory surveillance system in the United States and Canada that allows volunteers to report influenza-like symptoms using a brief weekly symptom report. Objective: Our objective was to evaluate the representativeness of the FNY population compared with the general population of the United States, explore the demographic and behavioral characteristics associated with FNY's high-participation users, and summarize results from a user survey of a cohort of FNY participants. Methods: We compared (1) the representativeness of sex and age groups of FNY participants during the 2014-2015 flu season versus the general US population and (2) the distribution of Human Development Index (HDI) scores of FNY participants versus that of the general US population. We analyzed associations between demographic and behavioral factors and the level of participant follow-up (ie, high vs low). Finally, descriptive statistics of responses from FNY's 2015 and 2016 end-of-season user surveys were calculated. Results: During the 2014-2015 influenza season, 47,234 unique participants had at least one FNY symptom report that was either self-reported (users) or submitted on their behalf (household members). The proportion of female FNY participants was significantly higher than that of the general US population (n=28,906, 61.2% vs 51.1%, P

Etiology of respiratory tract infections in the community and clinic in Ilorin, Nigeria

Kolawole, O., Oguntoye, M., Dam, T., & Chunara, R. (n.d.).

Publication year

2017

Journal title

BMC research notes

Volume

10

Issue

1

Page(s)

712
Abstract
OBJECTIVE: Recognizing increasing interest in community disease surveillance globally, the goal of this study was to investigate whether respiratory viruses circulating in the community may be represented through clinical (hospital) surveillance in Nigeria. RESULTS: Children were selected via convenience sampling from communities and a tertiary care center (n = 91) during spring 2017 in Ilorin, Nigeria. Nasal swabs were collected and tested using polymerase chain reaction. The majority (79.1%) of subjects were under 6 years old, of whom 46 were infected (63.9%). A total of 33 of the 91 subjects had one or more respiratory tract virus; there were 10 cases of triple infection and 5 of quadruple. Parainfluenza virus 4, respiratory syncytial virus B and enterovirus were the most common viruses in the clinical sample; present in 93.8% (15/16) of clinical subjects, and 6.7% (5/75) of community subjects (significant difference, p < 0.001). Coronavirus OC43 was the most common virus detected in community members (13.3%, 10/75). A different strain, Coronavirus OC 229 E/NL63 was detected among subjects from the clinic (2/16) and not detected in the community. This pilot study provides evidence that data from the community can potentially represent different information than that sourced clinically, suggesting the need for community surveillance to enhance public health efforts and scientific understanding of respiratory infections.

High-resolution temporal representations of alcohol and tobacco behaviors from social media data

Huang, T., Elghafari, A., Relia, K., & Chunara, R. (n.d.).

Publication year

2017

Journal title

Proceedings of the ACM on Human-Computer Interaction

Volume

1

Issue

CSCW
Abstract
Understanding tobacco- and alcohol-related behavioral patterns is critical for uncovering risk factors and potentially designing targeted social computing intervention systems. Given that we make choices multiple times per day, hourly and daily patterns are critical for better understanding behaviors. Here, we combine natural language processing, machine learning and time series analyses to assess Twitter activity specifically related to alcohol and tobacco consumption and their sub-daily, daily and weekly cycles. Twitter self-reports of alcohol and tobacco use are compared to other data streams available at similar temporal resolution. We assess if discussion of drinking by inferred underage versus legal age people or discussion of use of different types of tobacco products can be differentiated using these temporal patterns. We find that time and frequency domain representations of behaviors on social media can provide meaningful and unique insights, and we discuss the types of behaviors for which the approach may be most useful.

Contact

rumi.chunara@nyu.edu 708 Broadway New York, NY, 10003