Rumi Chunara

Associate Professor of Biostatistics

Associate Professor of Computer Science and Engineering, Tandon

Director of Center for Health Data Science

Professional overview

The overarching goal of Dr. Rumi Chunara's research is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. She focuses on the design and development of data mining and machine learning methods to address challenges related to data and goals of public health, as well as fairness and ethics in the design and use of data and algorithms embedded in social systems.

At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analyses and machine learning, to study population health. Previously, she was a Postdoctoral Fellow and Instructor at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. She completed her PhD at the Harvard-MIT Division of Health Sciences and Technology and BSc at Caltech.

Education

BS, Electrical Engineering (Honors), Caltech
MS, Electrical Engineering and Computer Science, MIT
PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)

Honors and awards

Max Planck Sabbatical Award (2021)
Speaker at NSF Computer and Information Science and Engineering Directorate CAREER Proposal Writing Workshop (2020)
Invited tutorial on Public Health and Machine Learning at ACM Conference on Health, Inference and Learning (2020)
Keynote at Human Computation and Crowdsourcing (2019)
Invited Speaker at Expert Group Meeting at United Nations Population Fund, Advances in Mobile Technologies for Data Collection Panel (2019)
Keynote at "Mapping the Equity Dimensions of Artificial Intelligence in Public Health", University of Toronto (2019)
Facebook Research Award (2019)
Gates Foundation Grand Challenges Exploration Award (2019)
NSF CAREER Award (2019)
MIT Technology Review Top 35 Innovators Under 35 (2014)
MIT Presidential Fellow (2004)

Areas of research and study

Health Disparities
Machine learning
Social Computing
Social Determinants of Health

Publications

A Brief Tutorial on Sample Size Calculations for Fairness Audits

Singh, H., Xia, F., Kim, M.-O., Pirracchio, R., Chunara, R., & Feng, J. (n.d.).

Publication year

2023

A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives

Nagar, R., Yuan, Q., Freifeld, C. C., Santillana, M., Nojima, A., Chunara, R., & Brownstein, J. S. (n.d.).

Publication year

2014

Journal title

Journal of Medical Internet Research

Volume

16

Issue

10

Page(s)

e236
Abstract
Background: Twitter has shown some usefulness in predicting influenza cases on a weekly basis in multiple countries and on different geographic scales. Recently, Broniatowski and colleagues suggested Twitter's relevance at the city-level for New York City. Here, we look to dive deeper into the case of New York City by analyzing daily Twitter data from temporal and spatiotemporal perspectives. Also, through manual coding of all tweets, we look to gain qualitative insights that can help direct future automated searches. Objective: The intent of the study was first to validate the temporal predictive strength of daily Twitter data for influenza-like illness emergency department (ILI-ED) visits during the New York City 2012-2013 influenza season against other available and established datasets (Google search query, or GSQ), and second, to examine the spatial distribution and the spread of geocoded tweets as proxies for potential cases. Methods: From the Twitter Streaming API, 2972 tweets were collected in the New York City region matching the keywords "flu", "influenza", "gripe", and "high fever". The tweets were categorized according to the scheme developed by Lamb et al. A new fourth category was added as an evaluator guess for the probability of the subject(s) being sick to account for strength of confidence in the validity of the statement. Temporal correlations were made for tweets against daily ILI-ED visits and daily GSQ volume. The best models were used for linear regression for forecasting ILI visits. A weighted, retrospective Poisson model with SaTScan software (n=1484), and vector map were used for spatiotemporal analysis. Results: Infection-related tweets (R=.763) correlated better than GSQ time series (R=.683) for the same keywords and had a lower mean average percent error (8.4 vs 11.8) for ILI-ED visit prediction in January, the most volatile month of flu. 
SaTScan identified a primary outbreak cluster of high-probability infection tweets with a 2.74 relative risk ratio compared to medium-probability infection tweets at P=.001 in Northern Brooklyn, in a radius that includes Barclays Center and the Atlantic Avenue Terminal. Conclusions: While others have looked at weekly regional tweets, this study is the first to stress test Twitter for daily city-level data for New York City. Extraction of personal testimonies of infection-related tweets suggests Twitter's strength both qualitatively and quantitatively for ILI-ED prediction compared to alternative daily datasets mixed with awareness-based data such as GSQ. Additionally, granular Twitter data provide important spatiotemporal insights. A tweet vector-map may be useful for visualization of city-level spread when local gold standard data are otherwise unavailable.
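The two comparison metrics reported in the abstract above, a correlation coefficient (R) and a mean percent error, can be illustrated with a minimal sketch. It assumes R is the Pearson correlation and that the abstract's "mean average percent error" is the usual mean absolute percentage error; the function names and the daily counts are hypothetical and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage error is undefined when an actual value is 0")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical daily counts, for illustration only.
daily_tweets = [3, 5, 8, 13, 9, 4]
daily_ili_visits = [120, 150, 210, 300, 240, 140]
print(pearson_r(daily_tweets, daily_ili_visits))
```

A higher R and a lower percentage error for one signal than for another is the kind of comparison the abstract draws between tweets and search queries.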

Active Linear Regression in the Online Setting via Leverage Score Sampling

Singh, H., Musco, C., & Chunara, R. (n.d.).

Publication year

2022

Area-based determinants of outreach vaccination for reaching vulnerable populations: A cross-sectional study in Pakistan

Chen, X., Porter, A., Rehman, N. A., Morris, S. K., Saif, U., & Chunara, R. (n.d.).

Publication year

2023

Journal title

PLOS Global Public Health

Volume

3

Issue

9 (September)
Abstract
The objective of this study is to gain a comparative understanding of spatial determinants for outreach and clinic vaccination, which is critical for operationalizing efforts and breaking down structural biases, and is particularly relevant in countries where resources are low and sub-region variance is high. Leveraging a massive effort to digitize public system reporting by Lady and Community Health Workers (CHWs) with geo-located data on over 4 million public-sector vaccinations from September 2017 through 2019, understanding health service operations in relation to vulnerable spatial determinants was made feasible. Location and type of vaccinations (clinic or outreach) were compared to regional spatial attributes where they were performed. Important spatial attributes were assessed using three modeling approaches (ridge regression, gradient boosting, and a generalized additive model). Consistent predictors for outreach, clinic, and proportion of third dose pentavalent vaccinations by region were identified. Of all Penta-3 vaccination records, 86.3% were performed by outreach efforts. At the tehsil level (fourth-order administrative unit), controlling for child population, population density, proportion of population in urban areas, distance to cities, average maternal education, and other relevant factors, increased poverty was significantly associated with more in-clinic vaccinations (β = 0.077), and lower proportion of outreach vaccinations by region (β = -0.083). Analyses at the union council level (fifth-order administrative unit) showed consistent results for the differential importance of poverty for outreach versus clinic vaccination. Relevant predictors for each type of vaccination (outreach vs. in-clinic) show how design of outreach vaccination can effectively augment vaccination efforts beyond healthcare services through clinics.
As Pakistan is third among countries with the most unvaccinated and under-vaccinated children, understanding barriers and factors associated with vaccination can be demonstrative for other national and sub-national regions facing challenges and also inform guidelines on supporting CHWs in health systems.

Area-based determinants of outreach vaccination for reaching vulnerable populations: A cross-sectional study in Pakistan

Chen, X., Porter, A., Abdur, R. N., Morris, S. K., Saif, U., & Chunara, R. (n.d.).

Publication year

2023

Journal title

PLOS Global Public Health

Volume

3

Issue

9

Page(s)

e0001703

Assessing behavioral stages from social media data

Liu, J., Weitzman, E. R., & Chunara, R. (n.d.).

Publication year

2017

Page(s)

1320-1333
Abstract
Important work rooted in psychological theory posits that health behavior change occurs through a series of discrete stages. Our work builds on the field of social computing by identifying how social media data can be used to resolve behavior stages at high resolution (e.g. hourly/daily) for key population subgroups and times. In essence this approach opens new opportunities to advance psychological theories and better understand how our health is shaped based on the real, dynamic, and rapid actions we make every day. To do so, we bring together domain knowledge and machine learning methods to form a hierarchical classification of Twitter data that resolves different stages of behavior. We identify and examine temporal patterns of the identified stages, with alcohol as a use case (planning or looking to drink, currently drinking, and reflecting on drinking). Known seasonal trends are compared with findings from our methods. We discuss the potential health policy implications of detecting high frequency behavior stages.

Assessing the Online Social Environment for Surveillance of Obesity Prevalence

Chunara, R., Bouton, L., Ayers, J. W., & Brownstein, J. S. (n.d.).

Publication year

2013

Journal title

PLOS ONE

Volume

8

Issue

4
Abstract
Background: Understanding the social environment around obesity has been limited by available data. One promising approach used to bridge similar gaps elsewhere is to use passively generated digital data. Purpose: This article explores the relationship between online social environment via web-based social networks and population obesity prevalence. Methods: We performed a cross-sectional study using linear regression and cross validation to measure the relationship and predictive performance of user interests on the online social network Facebook to obesity prevalence in metros across the United States of America (USA) and neighborhoods within New York City (NYC). The outcomes, proportion of obese and/or overweight population in USA metros and NYC neighborhoods, were obtained via the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance and NYC EpiQuery systems. Predictors were geographically specific proportion of users with activity-related and sedentary-related interests on Facebook. Results: Higher proportion of the population with activity-related interests on Facebook was associated with a significant 12.0% (95% Confidence Interval (CI) 11.9 to 12.1) lower predicted prevalence of obese and/or overweight people across USA metros and 7.2% (95% CI: 6.8 to 7.7) across NYC neighborhoods. Conversely, greater proportion of the population with interest in television was associated with higher prevalence of obese and/or overweight people of 3.9% (95% CI: 3.7 to 4.0) (USA) and 27.5% (95% CI: 27.1 to 27.9, significant) (NYC). For activity-interests and national obesity outcomes, the average root mean square prediction error from 10-fold cross validation was comparable to the average root mean square error of a model developed using the entire data set. Conclusions: Activity-related interests across the USA and sedentary-related interests across NYC were significantly associated with obesity prevalence.
Further research is needed to understand how the online social environment relates to health outcomes and how it can be used to identify or target interventions.

Association Between Copayment Amount and Filling of Medications for Angiotensin Receptor Neprilysin Inhibitors in Patients With Heart Failure

Mukhopadhyay, A., Adhikari, S., Li, X., Dodson, J. A., Kronish, I. M., Shah, B., Ramatowski, M., Chunara, R., Kozloff, S., & Blecker, S. (n.d.).

Publication year

2022

Journal title

Journal of the American Heart Association

Volume

11

Issue

24

Page(s)

e027662

Association Between Copayment Amount and Filling of Medications for Angiotensin Receptor Neprilysin Inhibitors in Patients With Heart Failure

Mukhopadhyay, A., Adhikari, S., Li, X., Dodson, J. A., Kronish, I. M., Shah, B., Ramatowski, M., Chunara, R., Kozloff, S., & Blecker, S. (n.d.).

Publication year

2022

Journal title

Journal of the American Heart Association

Volume

11

Issue

24
Abstract
BACKGROUND: Angiotensin receptor neprilysin inhibitors (ARNI) reduce mortality and hospitalization for patients with heart failure. However, relatively high copayments for ARNI may contribute to suboptimal adherence, thus potentially limiting their benefits. METHODS AND RESULTS: We conducted a retrospective cohort study within a large, multi-site health system. We included patients with: ARNI prescription between November 20, 2020 and June 30, 2021; diagnosis of heart failure or left ventricular ejection fraction ≤40%; and available pharmacy or pharmacy benefit manager copayment data. The primary exposure was copayment, categorized as $0, $0.01 to $10, $10.01 to $100, and >$100. The primary outcome was prescription fill nonadherence, defined as the proportion of days covered $100. Patients with higher copayments had higher rates of nonadherence, ranging from 17.2% for $0 copayment to 34.2% for copayment >$100 (P$100 (OR, 2.58 [95% CI, 1.63– 4.18], P

Association between visit frequency, continuity of care, and pharmacy fill adherence in heart failure patients

Hamo, C. E., Mukhopadhyay, A., Li, X., Zheng, Y., Kronish, I. M., Chunara, R., Dodson, J., Adhikari, S., & Blecker, S. (n.d.).

Publication year

2024

Journal title

American Heart Journal

Volume

273

Page(s)

53-60
Abstract
Background: Despite advances in medical therapy for heart failure with reduced ejection fraction (HFrEF), major gaps in medication adherence to guideline-directed medical therapies (GDMT) remain. Greater continuity of care may impact medication adherence and reduced hospitalizations. Methods: We conducted a cross-sectional study of adults with a diagnosis of HF and EF ≤40% with ≥2 outpatient encounters between January 1, 2017 and January 10, 2021, prescribed ≥1 of the following GDMT: 1) Beta Blocker, 2) Angiotensin Converting Enzyme Inhibitor/Angiotensin Receptor Blocker/Angiotensin Receptor Neprilysin Inhibitor, 3) Mineralocorticoid Receptor Antagonist, 4) Sodium Glucose Cotransporter-2 Inhibitor. Continuity of care was calculated using the Bice-Boxerman Continuity of Care Index (COC) and the Usual Provider of Care (UPC) index, categorized by quantile. The primary outcome was adherence to GDMT, defined as average proportion of days covered ≥80% over 1 year. Secondary outcomes included all-cause and HF hospitalization at 1-year. We performed multivariable logistic regression analyses adjusted for demographics, insurance status, comorbidity index, number of visits and neighborhood SES index. Results: Overall, 3,971 individuals were included (mean age 72 years (SD 14), 71% male, 66% White race). In adjusted analyses, compared to individuals in the highest COC quartile, individuals in the third COC quartile had higher odds of GDMT adherence (OR 1.26, 95% CI 1.03-1.53, P = .024). UPC tertile was not associated with adherence (all P > .05). Compared to the highest quantiles, the lowest UPC and COC quantiles had higher odds of all-cause (UPC: OR 1.53, 95%CI 1.23-1.91; COC: OR 2.54, 95%CI 1.94-3.34) and HF (UPC: OR 1.81, 95%CI 1.23-2.67; COC: OR 1.77, 95%CI 1.09-2.95) hospitalizations. Conclusions: Continuity of care was not associated with GDMT adherence among patients with HFrEF but lower continuity of care was associated with increased all-cause and HF-hospitalizations.
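The two continuity measures named in the abstract above have simple closed forms, sketched here under the standard textbook definitions rather than the authors' own code. For a patient with N visits, of which n_i were to provider i, the Bice-Boxerman index is (sum of n_i squared, minus N) divided by N(N - 1), and the Usual Provider of Care index is the largest n_i divided by N. The function name and the provider labels are hypothetical.

```python
from collections import Counter

def continuity_indices(provider_ids):
    """Return (COC, UPC) for one patient's list of visit provider IDs.

    COC is the Bice-Boxerman Continuity of Care Index; UPC is the
    Usual Provider of Care index. Both range from 0 to 1.
    """
    n_total = len(provider_ids)
    if n_total < 2:
        # COC divides by N * (N - 1), so at least two visits are required.
        raise ValueError("at least 2 visits are required")
    counts = Counter(provider_ids)
    sum_sq = sum(c * c for c in counts.values())
    coc = (sum_sq - n_total) / (n_total * (n_total - 1))
    upc = max(counts.values()) / n_total
    return coc, upc

# Hypothetical patient: five visits, three of them to provider "A".
coc, upc = continuity_indices(["A", "A", "B", "A", "C"])
print(coc, upc)
```

The two-visit minimum in the sketch mirrors the study's inclusion criterion of at least 2 outpatient encounters, below which the Bice-Boxerman index is undefined.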

Association of U.S. birth, duration of residence in the U.S., and atherosclerotic cardiovascular disease risk factors among Asian adults

Al Rifai, M., Kianoush, S., Jain, V., Joshi, P. H., Cainzos-Achirica, M., Nasir, K., Merchant, A. T., Dodani, S., Wong, S. S., Samad, Z., Mehta, A., Chunara, R., Kalra, A., & Virani, S. S. (n.d.).

Publication year

2022

Journal title

Preventive Medicine Reports

Volume

29
Abstract
Introduction: Prior studies have shown a direct association between U.S. birth and duration of residence with atherosclerotic cardiovascular disease (ASCVD), though few have specifically focused on Asian Americans. Methods: We utilized cross-sectional data from the 2006 to 2015 National Health Interview Survey. We compared prevalent cardiovascular risk factors and ASCVD among Asian American individuals by U.S. birth and duration of time spent in the U.S. Results: The study sample consisted of 18,150 Asian individuals of whom 20.5% were Asian Indian, 20.5% were Chinese, 23.4% were Filipino, and 35.6% were of other Asian ethnic groups. The mean (standard error) age was 43.8 (0.21) years and 53% were women. In multivariable-adjusted logistic regression models, U.S. birth was associated with a higher prevalence odds ratio (95% confidence interval) of current smoking 1.31 (1.07,1.60), physical inactivity 0.62 (0.54,0.72), obesity 2.26 (1.91,2.69), hypertension 1.33 (1.12,1.58), and CAD 1.96 (1.24,3.11), but lower prevalence of stroke 0.28 (0.11,0.71). Spending greater than 15 years in the U.S. was associated with a higher prevalence of current smoking 1.65 (1.24,2.21), obesity 2.33 (1.57,3.47), diabetes 2.68 (1.17,6.15), and hyperlipidemia 1.72 (1.09,2.71). Conclusion: Heterogeneity exists in cardiovascular risk factor burden among Asian Americans according to Asian ethnicity, U.S. birth, and duration of time living in the U.S.

Associations between Anti-Gay Prejudice, Traditional Masculine Self-Concept, and Colorectal Cancer Screening–Related Outcomes among Black and White Men in the United States

Chen, T., Wicke, R., King, A. J., Margolin, D., Chunara, R., & Niederdeppe, J. (n.d.).

Publication year

2025

Journal title

Cancer Epidemiology Biomarkers and Prevention

Volume

34

Issue

5

Page(s)

714-721
Abstract
Background: Colorectal cancer screening can reduce colorectal cancer risk, yet many men are not up to date with screening guidelines. Although previous qualitative studies have suggested links among anti-gay prejudice, traditional masculine self-concept, racial identity, and colorectal cancer screening among men, scholars have yet to fully explore these associations using quantitative data. This study used a nationally representative sample of Black and White men in the United States to test these associations and examine the sociodemographic correlates. Methods: Using the National Opinion Research Center (NORC)/AmeriSpeak probability-based panel, we recruited a sample of Black and White men in the United States ages 45 to 74 years who had never been diagnosed with colorectal cancer (N = 909). Participants completed an online questionnaire measuring anti-gay prejudice, traditional masculine self-concept, sociodemographic variables, and screening-related outcomes (awareness of screening test options, screening intention, and adherence to screening recommendations). Results: Black participants reported higher levels of anti-gay prejudice and traditional masculine self-concept than White participants. Anti-gay prejudice was associated with lower awareness and lower screening intention. Black participants reported higher intention to follow screening recommendations but not higher odds of actual adherence than White participants. Conclusions: Men with anti-gay prejudice are less likely to be aware of colorectal cancer screening test options and less likely to intend to engage in colorectal cancer screening. The results have implications for the design and development of future interventions aimed at increasing colorectal cancer screening rates.

Associations between news coverage, social media discussions, and search trends about celebrity deaths, screening, and other colorectal cancer-related events

Liu, J., Niederdeppe, J., Tong, C., Margolin, D., Chunara, R., Smith, T., & King, A. J. (n.d.).

Publication year

2024

Journal title

Preventive Medicine

Volume

185
Abstract
Objective: Colorectal cancer (CRC) is the third leading cause of cancer death among both men and women in the United States. CRC-related events may increase media coverage and public attention, boosting awareness and prevention. This study examined associations between several types of CRC events (including unplanned celebrity cancer deaths and planned events like national CRC awareness months, celebrity screening behavior, and screening guideline changes) and news coverage, Twitter discussions, and Google search trends about CRC and CRC screening. Methods: We analyzed data from U.S. national news media outlets, posts scraped from Twitter, and Google Trends on CRC and CRC screening during a three-year period from 2020 to 2022. We used burst detection methods to identify temporal spikes in the volume of news, tweets, and search after each CRC-related event. Results: There is a high level of heterogeneity in the impact of celebrity CRC events. Celebrity CRC deaths were more likely to precede spikes in news and tweets about CRC overall than CRC screening. Celebrity screening preceded spikes in news and tweets about screening but not searches. Awareness months and screening guideline changes did precede spikes in news, tweets, and searches about screening, but these spikes were inconsistent, not simultaneous, and not as large as those events concerning most prominent public figures. Conclusions: CRC events provide opportunities to increase attention to CRC. Media and public health professionals should actively intervene during CRC events to increase emphasis on CRC screening and evidence-based recommendations.

Averting the perfect storm: Addressing youth substance use risk from social media use

Salimian, P. K., Chunara, R., & Weitzman, E. R. (n.d.).

Publication year

2014

Journal title

Pediatric Annals

Volume

43

Issue

10

Page(s)

e242-e247
Abstract
Adolescents are developmentally sensitive to pathways that influence alcohol and other drug (AOD) use. In the absence of guidance, their routine engagement with social media may add a further layer of risk. There are several potential mechanisms for social media use to influence AOD risk, including exposure to peer portrayals of AOD use, socially amplified advertising, misinformation, and predatory marketing against a backdrop of lax regulatory systems and privacy controls. Here the authors summarize the influences of the social media world and suggest how pediatricians in everyday practice can alert youth and their parents to these risks to foster conversation, awareness, and harm reduction.

Building Public Health Surveillance 3.0: Emerging Timely Measures of Physical, Economic, and Social Environmental Conditions Affecting Health

Thorpe, L. E., Chunara, R., Roberts, T., Pantaleo, N., Irvine, C., Conderino, S., Li, Y., Hsieh, P. Y., Gourevitch, M. N., Levine, S., Ofrane, R., & Spoer, B. (n.d.).

Publication year

2022

Journal title

American Journal of Public Health

Volume

112

Issue

10

Page(s)

1436-1445
Abstract
In response to rapidly changing societal conditions stemming from the COVID-19 pandemic, we summarize data sources with potential to produce timely and spatially granular measures of physical, economic, and social conditions relevant to public health surveillance, and we briefly describe emerging analytic methods to improve small-area estimation. To inform this article, we reviewed published systematic review articles set in the United States from 2015 to 2020 and conducted unstructured interviews with senior content experts in public health practice, academia, and industry. We identified a modest number of data sources with high potential for generating timely and spatially granular measures of physical, economic, and social determinants of health. We also summarized modeling and machine-learning techniques useful to support development of time-sensitive surveillance measures that may be critical for responding to future major events such as the COVID-19 pandemic. (Am J Public Health. 2022;112(10):1436-1445. https://doi.org/10.2105/AJPH.2022.306917).

Causal Multi-level Fairness

Mhasawade, V., & Chunara, R. (n.d.).

Publication year

2021

Page(s)

784-794
Abstract
Algorithmic systems are known to impact marginalized groups severely, and more so, if all sources of bias are not considered. While work in algorithmic fairness to-date has primarily focused on addressing discrimination due to individually linked attributes, social science research elucidates how some properties we link to individuals can be conceptualized as having causes at macro (e.g. structural) levels, and it may be important to be fair to attributes at multiple levels. For example, instead of simply considering race as a causal, protected attribute of an individual, the cause may be distilled as perceived racial discrimination an individual experiences, which in turn can be affected by neighborhood-level factors. This multi-level conceptualization is relevant to questions of fairness, as it may not only be important to take into account if the individual belonged to another demographic group, but also if the individual received advantaged treatment at the macro-level. In this paper, we formalize the problem of multi-level fairness using tools from causal inference in a manner that allows one to assess and account for effects of sensitive attributes at multiple levels. We show importance of the problem by illustrating residual unfairness if macro-level sensitive attributes are not accounted for, or included without accounting for their multi-level nature. Further, in the context of a real-world task of predicting income based on macro and individual-level attributes, we demonstrate an approach for mitigating unfairness, a result of multi-level sensitive attributes.

Characterizing sleep issues using Twitter

McIver, D. J., Hawkins, J. B., Chunara, R., Chatterjee, A. K., Bhandari, A., Fitzgerald, T. P., Jain, S. H., & Brownstein, J. S. (n.d.).

Publication year

2015

Journal title

Journal of Medical Internet Research

Volume

17

Issue

6

Page(s)

e140
Abstract
Background: Sleep issues such as insomnia affect over 50 million Americans and can lead to serious health problems, including depression and obesity, and can increase risk of injury. Social media platforms such as Twitter offer exciting potential for their use in studying and identifying both diseases and social phenomena. Objective: Our aim was to determine whether social media can be used as a method to conduct research focusing on sleep issues. Methods: Twitter posts were collected and curated to determine whether a user exhibited signs of sleep issues based on the presence of several keywords in tweets such as insomnia, "can't sleep", Ambien, and others. Users whose tweets contain any of the keywords were designated as having self-identified sleep issues (sleep group). Users who did not have self-identified sleep issues (non-sleep group) were selected from tweets that did not contain pre-defined words or phrases used as a proxy for sleep issues. Results: User data such as number of tweets, friends, followers, and location were collected, as well as the time and date of tweets. Additionally, the sentiment of each tweet and average sentiment of each user were determined to investigate differences between non-sleep and sleep groups. It was found that sleep group users were significantly less active on Twitter (P=.04), had fewer friends (P

Cohort profile: a large EHR-based cohort with linked pharmacy refill and neighbourhood social determinants of health data to assess heart failure medication adherence

Adhikari, S., Mukhyopadhyay, A., Kolzoff, S., Li, X., Nadel, T., Fitchett, C., Chunara, R., Dodson, J., Kronish, I., & Blecker, S. B. (n.d.).

Publication year

2023

Journal title

BMJ Open

Volume

13

Issue

12
Abstract
Purpose Clinic-based or community-based interventions can improve adherence to guideline-directed medication therapies (GDMTs) among patients with heart failure (HF). However, opportunities for such interventions are frequently missed, as providers may be unable to recognise risk patterns for medication non-adherence. Machine learning algorithms can help in identifying patients with high likelihood of non-adherence. While a number of multilevel factors influence adherence, prior models predicting non-adherence have been limited by data availability. We have established an electronic health record (EHR)-based cohort with comprehensive data elements from multiple sources to improve on existing models. We linked EHR data with pharmacy refill data for real-time incorporation of prescription fills and with social determinants data to incorporate neighbourhood factors. Participants Patients seen at a large health system in New York City (NYC), who were >18 years old with diagnosis of HF or reduced ejection fraction (

Colorectal Cancer Racial Equity Post Volume, Content, and Exposure: Observational Study Using Twitter Data

Tong, C., Margolin, D., Niederdeppe, J., Chunara, R., Liu, J., Jih-Vieira, L., & King, A. J. (n.d.).

Publication year

2025

Journal title

Journal of Medical Internet Research

Volume

27
Abstract
Background: Racial inequity in health outcomes, particularly in colorectal cancer (CRC), remains one of the most pressing issues in cancer communication and public health. Social media platforms like Twitter (now X) provide opportunities to disseminate health equity information widely, yet little is known about the availability, content, and reach of racial health equity information related to CRC on these platforms. Addressing this gap is essential to leveraging social media for equitable health communication. Objective: This study aims to analyze the volume, content, and exposure of CRC racial health equity tweets from identified CRC equity disseminator accounts on Twitter. These accounts were defined as those actively sharing information related to racial equity in CRC outcomes. By examining the behavior and impact of these disseminators, this study provides insights into how health equity content is shared and received on social media. Methods: We identified accounts that posted CRC-related content on Twitter between 2019 and 2021. Accounts were classified as CRC equity disseminators (n=798) if they followed at least 2 CRC racial equity organization accounts. We analyzed the volume and content of racial equity–related CRC tweets (n=1134) from these accounts and categorized them by account type (experts vs nonexperts). Additionally, we evaluated exposure by analyzing follower reach (n=6,266,269) and the role of broker accounts—accounts serving as unique sources of CRC racial equity information to their followers. Results: Among 19,559 tweets posted by 798 CRC equity disseminators, only 5.8% (n=1134) mentioned racially and ethnically minoritized groups. Most of these tweets (641/1134, 57%) addressed disparities in outcomes, while fewer emphasized actionable content, such as symptoms (11/1134, 1%) or screening procedures (159/1134, 14%). 
Expert accounts (n=479; 716 tweets) were more likely to post CRC equity tweets compared with nonexpert accounts (n=319; 418 tweets). Broker accounts (n=500), or those with a substantial portion of followers relying on them for equity-related information, demonstrated the highest capacity for exposing followers to CRC equity content, thereby extending the reach of these critical messages to underserved communities. Conclusions: This study emphasizes the critical roles played by expert and broker accounts in disseminating CRC racial equity information on social media. Despite the limited volume of equity-focused content, broker accounts were crucial in reaching otherwise unexposed audiences. Public health practitioners should focus on encouraging equity disseminators to share more actionable information, such as symptoms and screening benefits, and implement measures to amplify the reach of such content on social media. Strengthening these efforts could help bridge disparities in cancer outcomes among racially minoritized groups.

Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study

Daughton, A. R., Chunara, R., & Paul, M. J. (n.d.).

Publication year

2020

Journal title

JMIR Public Health and Surveillance

Volume

6

Issue

2
Abstract
Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19). 
Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.

Constructing social vulnerability indexes with increased data and machine learning highlight the importance of wealth across global contexts

Zhao, Y., Paul, R., Reid, S., Coimbra, V. C., Wolfe, C., Zhang, Y., & Chunara, R. (n.d.).

Publication year

2024

Journal title

Social Indicators Research

Volume

175

Issue

2

Page(s)

639-657

Correction to: Constructing Social Vulnerability Indexes with Increased Data and Machine Learning Highlight the Importance of Wealth Across Global Contexts (Social Indicators Research, (2024), 10.1007/s11205-024-03386-9)

Zhao, Y., Paul, R., Reid, S., Vieira, C. C., Wolfe, C., Zhang, Y., & Chunara, R. (n.d.).

Publication year

2024

Journal title

Social Indicators Research

Volume

174

Issue

3

Page(s)

1141-1142
Abstract
The wrong Supplementary file was originally published with this article; it has now been replaced with the correct file. The original article has been corrected.

COVID-19 transforms health care through telemedicine: Evidence from the field

Mann, D. M., Chen, J., Chunara, R., Testa, P. A., & Nov, O. (n.d.).

Publication year

2020

Journal title

Journal of the American Medical Informatics Association

Volume

27

Issue

7

Page(s)

1132-1135
Abstract
This study provides data on the feasibility and impact of video-enabled telemedicine use among patients and providers and its impact on urgent and nonurgent healthcare delivery from one large health system (NYU Langone Health) at the epicenter of the coronavirus disease 2019 (COVID-19) outbreak in the United States. Between March 2nd and April 14th 2020, telemedicine visits increased from 102.4 daily to 801.6 daily (683% increase) in urgent care after the system-wide expansion of virtual urgent care staff in response to COVID-19. Of all virtual visits post expansion, 56.2% and 17.6% of urgent and nonurgent visits, respectively, were COVID-19-related. Telemedicine usage was highest by patients 20 to 44 years of age, particularly for urgent care. The COVID-19 pandemic has driven rapid expansion of telemedicine use for urgent care and nonurgent care visits beyond baseline periods. This reflects an important change in telemedicine that other institutions facing the COVID-19 pandemic should anticipate.

Creating full individual-level location timelines from sparse social media data

Rehman, N. A., Relia, K., & Chunara, R. (n.d.). (L. Xiong, R. Tamassia, K. F. Banaei, R. H. Guting, & E. Hoel, Eds.).

Publication year

2018

Page(s)

379-388
Abstract
In many domain applications, a continuous timeline of human locations is critical; for example for understanding possible locations where a disease may spread, or the flow of traffic. While data sources such as GPS trackers or Call Data Records are temporally-rich, they are expensive, often not publicly available or garnered only in select locations, restricting their wide use. Conversely, geo-located social media data are publicly and freely available, but present challenges especially for full timeline inference due to their sparse nature. We propose a stochastic framework, Intermediate Location Computing (ILC) which uses prior knowledge about human mobility patterns to predict every missing location from an individual’s social media timeline. We compare ILC with a state-of-the-art RNN baseline as well as methods that are optimized for next-location prediction only. For three major cities, ILC predicts the top 1 location for all missing locations in a timeline, at 1 and 2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all compared methods). Specifically, ILC also outperforms the RNN in settings of low data; both cases of very small number of users (under 50), as well as settings with more users, but with sparser timelines. In general, the RNN model needs a higher number of users to achieve the same performance as ILC. Overall, this work illustrates the tradeoff between prior knowledge of heuristics and more data, for an important societal problem of filling in entire timelines using freely available, but sparse social media data.

Denominator Issues for Personally Generated Data in Population Health Monitoring

Chunara, R., Wisk, L. E., & Weitzman, E. R. (n.d.).

Publication year

2017

Journal title

American Journal of Preventive Medicine

Volume

52

Issue

4

Page(s)

549-553

Contact

rumi.chunara@nyu.edu 708 Broadway New York, NY, 10003