Rumi Chunara

Associate Professor of Biostatistics

Associate Professor of Computer Science and Engineering, Tandon

Director of Center for Health Data Science

Professional overview

The overarching goal of Dr. Rumi Chunara's research is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. She focuses on the design and development of data mining and machine learning methods to address challenges related to data and goals of public health, as well as fairness and ethics in the design and use of data and algorithms embedded in social systems.

At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analyses and machine learning, to study population health. Previously, she was a Postdoctoral Fellow and Instructor at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. She completed her PhD at the Harvard-MIT Division of Health Sciences and Technology and BSc at Caltech.

Education

BS, Electrical Engineering (Honors), Caltech
MS, Electrical Engineering and Computer Science, MIT
PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)

Honors and awards

Max Planck Sabbatical Award (2021)
Speaker at NSF Computer and Information Science and Engineering Directorate Career Proposal Writing Workshop (2020)
Invited tutorial on Public Health and Machine Learning at ACM Conference on Health, Inference and Learning (2020)
Keynote at Human Computation and Crowdsourcing (2019)
Invited Speaker at Expert Group Meeting at United Nations Population Fund, Advances in Mobile Technologies for Data Collection Panel (2019)
Keynote at "Mapping the Equity Dimensions of Artificial Intelligence in Public Health", University of Toronto (2019)
Facebook Research Award (2019)
Gates Foundation Grand Challenges Exploration Award (2019)
NSF CAREER Award (2019)
MIT Technology Review Top 35 Innovators Under 35 (2014)
MIT Presidential Fellow (2004)

Areas of research and study

Health Disparities
Machine learning
Social Computing
Social Determinants of Health

Publications

Associations between Anti-Gay Prejudice, Traditional Masculine Self-Concept, and Colorectal Cancer Screening–Related Outcomes among Black and White Men in the United States

Chen, T., Wicke, R., King, A. J., Margolin, D., Chunara, R., & Niederdeppe, J. (n.d.).

Publication year

2025

Journal title

Cancer Epidemiology Biomarkers and Prevention

Volume

34

Issue

5

Page(s)

714-721
Abstract
Background: Colorectal cancer screening can reduce colorectal cancer risk, yet many men are not up to date with screening guidelines. Although previous qualitative studies have suggested links among anti-gay prejudice, traditional masculine self-concept, racial identity, and colorectal cancer screening among men, scholars have yet to fully explore these associations using quantitative data. This study used a nationally representative sample of Black and White men in the United States to test these associations and examine the sociodemographic correlates. Methods: Using the National Opinion Research Center (NORC)/AmeriSpeak probability-based panel, we recruited a sample of Black and White men in the United States ages 45 to 74 years who had never been diagnosed with colorectal cancer (N = 909). Participants completed an online questionnaire measuring anti-gay prejudice, traditional masculine self-concept, sociodemographic variables, and screening-related outcomes (awareness of screening test options, screening intention, and adherence to screening recommendations). Results: Black participants reported higher levels of anti-gay prejudice and traditional masculine self-concept than White participants. Anti-gay prejudice was associated with lower awareness and lower screening intention. Black participants reported higher intention to follow screening recommendations but not higher odds of actual adherence than White participants. Conclusions: Men with anti-gay prejudice are less likely to be aware of colorectal cancer screening test options and less likely to intend to engage in colorectal cancer screening. The results have implications for the design and development of future interventions aimed at increasing colorectal cancer screening rates.

Colorectal Cancer Racial Equity Post Volume, Content, and Exposure: Observational Study Using Twitter Data

Tong, C., Margolin, D., Niederdeppe, J., Chunara, R., Liu, J., Jih-Vieira, L., & King, A. J. (n.d.).

Publication year

2025

Journal title

Journal of Medical Internet Research

Volume

27
Abstract
Background: Racial inequity in health outcomes, particularly in colorectal cancer (CRC), remains one of the most pressing issues in cancer communication and public health. Social media platforms like Twitter (now X) provide opportunities to disseminate health equity information widely, yet little is known about the availability, content, and reach of racial health equity information related to CRC on these platforms. Addressing this gap is essential to leveraging social media for equitable health communication. Objective: This study aims to analyze the volume, content, and exposure of CRC racial health equity tweets from identified CRC equity disseminator accounts on Twitter. These accounts were defined as those actively sharing information related to racial equity in CRC outcomes. By examining the behavior and impact of these disseminators, this study provides insights into how health equity content is shared and received on social media. Methods: We identified accounts that posted CRC-related content on Twitter between 2019 and 2021. Accounts were classified as CRC equity disseminators (n=798) if they followed at least 2 CRC racial equity organization accounts. We analyzed the volume and content of racial equity–related CRC tweets (n=1134) from these accounts and categorized them by account type (experts vs nonexperts). Additionally, we evaluated exposure by analyzing follower reach (n=6,266,269) and the role of broker accounts—accounts serving as unique sources of CRC racial equity information to their followers. Results: Among 19,559 tweets posted by 798 CRC equity disseminators, only 5.8% (n=1134) mentioned racially and ethnically minoritized groups. Most of these tweets (641/1134, 57%) addressed disparities in outcomes, while fewer emphasized actionable content, such as symptoms (11/1134, 1%) or screening procedures (159/1134, 14%). 
Expert accounts (n=479; 716 tweets) were more likely to post CRC equity tweets compared with nonexpert accounts (n=319; 418 tweets). Broker accounts (n=500), or those with a substantial portion of followers relying on them for equity-related information, demonstrated the highest capacity for exposing followers to CRC equity content, thereby extending the reach of these critical messages to underserved communities. Conclusions: This study emphasizes the critical roles played by expert and broker accounts in disseminating CRC racial equity information on social media. Despite the limited volume of equity-focused content, broker accounts were crucial in reaching otherwise unexposed audiences. Public health practitioners should focus on encouraging equity disseminators to share more actionable information, such as symptoms and screening benefits, and implement measures to amplify the reach of such content on social media. Strengthening these efforts could help bridge disparities in cancer outcomes among racially minoritized groups.

Identifying and mitigating algorithmic bias in the safety net

Mackin, S., Major, V. J., Chunara, R., & Newton-Dame, R. (n.d.).

Publication year

2025

Journal title

npj Digital Medicine

Volume

8

Issue

1
Abstract
Algorithmic bias occurs when predictive model performance varies meaningfully across sociodemographic classes, exacerbating systemic healthcare disparities. NYC Health + Hospitals, an urban safety net system, assessed bias in two binary classification models in our electronic medical record: one predicting acute visits for asthma and one predicting unplanned readmissions. We evaluated differences in subgroup performance across race/ethnicity, sex, language, and insurance using equal opportunity difference (EOD), a metric comparing false negative rates. The most biased classes (race/ethnicity for asthma, insurance for readmission) were targeted for mitigation using threshold adjustment, which adjusts subgroup thresholds to minimize EOD, and reject option classification, which re-classifies scores near the threshold by subgroup. Successful mitigation was defined as 1) absolute subgroup EODs

Quantifying greenspace with satellite images in Karachi, Pakistan using a new data augmentation paradigm

Zhang, M., Arshad, H., Abbas, M., Jehanzeb, H., Tahir, I., Hassan, J., Samad, Z., & Chunara, R. (n.d.).

Publication year

2025

Journal title

ACM Journal on Computing and Sustainable Societies

Association between visit frequency, continuity of care, and pharmacy fill adherence in heart failure patients

Hamo, C. E., Mukhopadhyay, A., Li, X., Zheng, Y., Kronish, I. M., Chunara, R., Dodson, J., Adhikari, S., & Blecker, S. (n.d.).

Publication year

2024

Journal title

American Heart Journal

Volume

273

Page(s)

53-60
Abstract
Background: Despite advances in medical therapy for heart failure with reduced ejection fraction (HFrEF), major gaps in medication adherence to guideline-directed medical therapies (GDMT) remain. Greater continuity of care may impact medication adherence and reduced hospitalizations. Methods: We conducted a cross-sectional study of adults with a diagnosis of HF and EF ≤40% with ≥2 outpatient encounters between January 1, 2017 and January 10, 2021, prescribed ≥1 of the following GDMT: 1) Beta Blocker, 2) Angiotensin Converting Enzyme Inhibitor/Angiotensin Receptor Blocker/Angiotensin Receptor Neprilysin Inhibitor, 3) Mineralocorticoid Receptor Antagonist, 4) Sodium Glucose Cotransporter-2 Inhibitor. Continuity of care was calculated using the Bice-Boxerman Continuity of Care Index (COC) and the Usual Provider of Care (UPC) index, categorized by quantile. The primary outcome was adherence to GDMT, defined as average proportion of days covered ≥80% over 1 year. Secondary outcomes included all-cause and HF hospitalization at 1-year. We performed multivariable logistic regression analyses adjusted for demographics, insurance status, comorbidity index, number of visits and neighborhood SES index. Results: Overall, 3,971 individuals were included (mean age 72 years (SD 14), 71% male, 66% White race). In adjusted analyses, compared to individuals in the highest COC quartile, individuals in the third COC quartile had higher odds of GDMT adherence (OR 1.26, 95% CI 1.03-1.53, P = .024). UPC tertile was not associated with adherence (all P > .05). Compared to the highest quantiles, the lowest UPC and COC quantiles had higher odds of all-cause (UPC: OR 1.53, 95%CI 1.23-1.91; COC: OR 2.54, 95%CI 1.94-3.34) and HF (UPC: OR 1.81, 95%CI 1.23-2.67; COC: OR 1.77, 95%CI 1.09-2.95) hospitalizations. Conclusions: Continuity of care was not associated with GDMT adherence among patients with HFrEF but lower continuity of care was associated with increased all-cause and HF-hospitalizations.

Associations between news coverage, social media discussions, and search trends about celebrity deaths, screening, and other colorectal cancer-related events

Liu, J., Niederdeppe, J., Tong, C., Margolin, D., Chunara, R., Smith, T., & King, A. J. (n.d.).

Publication year

2024

Journal title

Preventive Medicine

Volume

185
Abstract
Objective: Colorectal cancer (CRC) is the third leading cause of cancer death among both men and women in the United States. CRC-related events may increase media coverage and public attention, boosting awareness and prevention. This study examined associations between several types of CRC events (including unplanned celebrity cancer deaths and planned events like national CRC awareness months, celebrity screening behavior, and screening guideline changes) and news coverage, Twitter discussions, and Google search trends about CRC and CRC screening. Methods: We analyzed data from U.S. national news media outlets, posts scraped from Twitter, and Google Trends on CRC and CRC screening during a three-year period from 2020 to 2022. We used burst detection methods to identify temporal spikes in the volume of news, tweets, and search after each CRC-related event. Results: There is a high level of heterogeneity in the impact of celebrity CRC events. Celebrity CRC deaths were more likely to precede spikes in news and tweets about CRC overall than CRC screening. Celebrity screening preceded spikes in news and tweets about screening but not searches. Awareness months and screening guideline changes did precede spikes in news, tweets, and searches about screening, but these spikes were inconsistent, not simultaneous, and not as large as those events concerning most prominent public figures. Conclusions: CRC events provide opportunities to increase attention to CRC. Media and public health professionals should actively intervene during CRC events to increase emphasis on CRC screening and evidence-based recommendations.

Constructing social vulnerability indexes with increased data and machine learning highlight the importance of wealth across global contexts

Zhao, Y., Paul, R., Reid, S., Coimbra, V. C., Wolfe, C., Zhang, Y., & Chunara, R. (n.d.).

Publication year

2024

Journal title

Social Indicators Research

Volume

175

Issue

2

Page(s)

639-657

Correction to: Constructing Social Vulnerability Indexes with Increased Data and Machine Learning Highlight the Importance of Wealth Across Global Contexts (Social Indicators Research, (2024), 10.1007/s11205-024-03386-9)

Zhao, Y., Paul, R., Reid, S., Vieira, C. C., Wolfe, C., Zhang, Y., & Chunara, R. (n.d.).

Publication year

2024

Journal title

Social Indicators Research

Volume

174

Issue

3

Page(s)

1141-1142
Abstract
The wrong Supplementary file was originally published with this article; it has now been replaced with the correct file. The original article has been corrected.

Making Sense of Social Media Data About Colorectal Cancer Screening

King, A. J., Margolin, D., Tong, C., Chunara, R., & Niederdeppe, J. (n.d.).

Publication year

2024

Journal title

Journal of the American College of Radiology

Volume

21

Issue

4

Page(s)

543-544

Mitigating Urban-Rural Disparities in Contrastive Representation Learning with Satellite Imagery

Zhang, M., & Chunara, R. (n.d.).

Publication year

2024

Volume

7

Page(s)

1723-1734

Social determinants of health: the need for data science methods and capacity

Chunara, R., Gjonaj, J., Immaculate, E., Wanga, I., Alaro, J., Scott-Sheldon, L. A., Mangeni, J., Mwangi, A., Vedanthan, R., & Hogan, J. (n.d.).

Publication year

2024

Journal title

The Lancet Digital Health

Volume

6

Issue

4

Page(s)

e235-e237

Understanding colorectal cancer screening message preferences of black Americans: Results from a crowdsourced wiki survey

King, A., Chen, T., Wicke, R., Tong, C., Margolin, D., Chunara, R., Kanrar, R., Nettleton, D., & Niederdeppe, J. (n.d.).

Publication year

2024

Understanding Disparities in Post Hoc Machine Learning Explanation

Mhasawade, V., Rahman, S., Haskell-Craig, Z., & Chunara, R. (n.d.).

Publication year

2024

Page(s)

2374-2388
Abstract
Previous work has highlighted that existing post-hoc explanation methods exhibit disparities in explanation fidelity (across "race" and "gender" as sensitive attributes), and while a large body of work focuses on mitigating these issues at the explanation metric level, the role of the data generating process and black box model in relation to explanation disparities remains largely unexplored. Accordingly, through both simulations as well as experiments on a real-world dataset, we specifically assess challenges to explanation disparities that originate from properties of the data: limited sample size, covariate shift, concept shift, omitted variable bias, and challenges based on model properties: inclusion of the sensitive attribute and appropriate functional form. Through controlled simulation analyses, our study demonstrates that increased covariate shift, concept shift, and omission of covariates increase explanation disparities, with the effect pronounced higher for neural network models that are better able to capture the underlying functional form in comparison to linear models. We also observe consistent findings regarding the effect of concept shift and omitted variable bias on explanation disparities in the Adult income dataset. Overall, results indicate that disparities in model explanations can also depend on data and model properties. Based on this systematic investigation, we provide recommendations for the design of explanation methods that mitigate undesirable disparities.

Utilizing big data without domain knowledge impacts public health decision-making

Zhang, M., Rahman, S., Mhasawade, V., & Chunara, R. (n.d.).

Publication year

2024

Journal title

Proceedings of the National Academy of Sciences of the United States of America

Volume

121

Issue

39
Abstract
New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries, and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. Findings demonstrate robustness challenges; built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, as average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, intervention on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework accounting for these mediators, we determined that intervening by improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, compared to the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates.

A Brief Tutorial on Sample Size Calculations for Fairness Audits

Singh, H., Xia, F., Kim, M.-O., Pirracchio, R., Chunara, R., & Feng, J. (n.d.).

Publication year

2023

Area-based determinants of outreach vaccination for reaching vulnerable populations: A cross-sectional study in Pakistan

Chen, X., Porter, A., Rehman, N. A., Morris, S. K., Saif, U., & Chunara, R. (n.d.).

Publication year

2023

Journal title

PLOS Global Public Health

Volume

3

Issue

9
Abstract
The objective of this study is to gain a comparative understanding of spatial determinants for outreach and clinic vaccination, which is critical for operationalizing efforts and breaking down structural biases; particularly relevant in countries where resources are low, and sub-region variance is high. Leveraging a massive effort to digitize public system reporting by Lady and Community Health Workers (CHWs) with geo-located data on over 4 million public-sector vaccinations from September 2017 through 2019, understanding health service operations in relation to vulnerable spatial determinants were made feasible. Location and type of vaccinations (clinic or outreach) were compared to regional spatial attributes where they were performed. Important spatial attributes were assessed using three modeling approaches (ridge regression, gradient boosting, and a generalized additive model). Consistent predictors for outreach, clinic, and proportion of third dose pentavalent vaccinations by region were identified. Of all Penta-3 vaccination records, 86.3% were performed by outreach efforts. At the tehsil level (fourth-order administrative unit), controlling for child population, population density, proportion of population in urban areas, distance to cities, average maternal education, and other relevant factors, increased poverty was significantly associated with more in-clinic vaccinations (β = 0.077), and lower proportion of outreach vaccinations by region (β = -0.083). Analyses at the union council level (fifth-administrative unit) showed consistent results for the differential importance of poverty for outreach versus clinic vaccination. Relevant predictors for each type of vaccination (outreach vs. in-clinic) show how design of outreach vaccination can effectively augment vaccination efforts beyond healthcare services through clinics. 
As Pakistan is third among countries with the most unvaccinated and under-vaccinated children, understanding barriers and factors associated with vaccination can be demonstrative for other national and sub-national regions facing challenges and also inform guidelines on supporting CHWs in health systems.

Area-based determinants of outreach vaccination for reaching vulnerable populations: A cross-sectional study in Pakistan

Chen, X., Porter, A., Abdur, R. N., Morris, S. K., Saif, U., & Chunara, R. (n.d.).

Publication year

2023

Journal title

PLOS Global Public Health

Volume

3

Issue

9

Page(s)

e0001703

Cohort profile: a large EHR-based cohort with linked pharmacy refill and neighbourhood social determinants of health data to assess heart failure medication adherence

Adhikari, S., Mukhyopadhyay, A., Kolzoff, S., Li, X., Nadel, T., Fitchett, C., Chunara, R., Dodson, J., Kronish, I., & Blecker, S. B. (n.d.).

Publication year

2023

Journal title

BMJ Open

Volume

13

Issue

12
Abstract
Purpose Clinic-based or community-based interventions can improve adherence to guideline-directed medication therapies (GDMTs) among patients with heart failure (HF). However, opportunities for such interventions are frequently missed, as providers may be unable to recognise risk patterns for medication non-adherence. Machine learning algorithms can help in identifying patients with high likelihood of non-adherence. While a number of multilevel factors influence adherence, prior models predicting non-adherence have been limited by data availability. We have established an electronic health record (EHR)-based cohort with comprehensive data elements from multiple sources to improve on existing models. We linked EHR data with pharmacy refill data for real-time incorporation of prescription fills and with social determinants data to incorporate neighbourhood factors. Participants Patients seen at a large health system in New York City (NYC), who were >18 years old with diagnosis of HF or reduced ejection fraction (

Disparate Effect Of Missing Mediators On Transportability of Causal Effects

Mhasawade, V., & Chunara, R. (n.d.).

Publication year

2023
Abstract
Transported mediation effects provide an avenue to understand how upstream interventions (such as improved neighborhood conditions like green spaces) would work differently when applied to different populations as a result of factors that mediate the effects. However, when mediators are missing in the population where the effect is to be transported, these estimates could be biased. We study this issue of missing mediators, motivated by challenges in public health, wherein mediators can be missing, not at random. We propose a sensitivity analysis framework that quantifies the impact of missing mediator data on transported mediation effects. This framework enables us to identify the settings under which the conditional transported mediation effect is rendered insignificant for the subgroup with missing mediator data. Specifically, we provide the bounds on the transported mediation effect as a function of missingness. We then apply the framework to longitudinal data from the Moving to Opportunity Study, a large-scale housing voucher experiment, to quantify the effect of missing mediators on transport effect estimates of voucher receipt, an upstream intervention on living location, in childhood on subsequent risk of mental health or substance use disorder mediated through parental health across sites. Our findings provide a tangible understanding of how much missing data can be withstood for unbiased effect estimates.

Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model

Rahman, S., Jiang, L. Y., Gabriel, S., Aphinyanaphongs, Y., Oermann, E. K., & Chunara, R. (n.d.).

Publication year

2023
Abstract
Advances in large language models (LLMs) provide new opportunities in healthcare for improved patient care, clinical decision-making, and enhancement of physician and administrator workflows. However, the potential of these models importantly depends on their ability to generalize effectively across clinical environments and populations, a challenge often underestimated in early development. To better understand reasons for these challenges and inform mitigation approaches, we evaluated ClinicLLM, an LLM trained on [HOSPITAL]'s clinical notes, analyzing its performance on 30-day all-cause readmission prediction focusing on variability across hospitals and patient characteristics. We found poorer generalization particularly in hospitals with fewer samples, among patients with government and unspecified insurance, the elderly, and those with high comorbidities. To understand reasons for lack of generalization, we investigated sample sizes for fine-tuning, note content (number of words per note), patient characteristics (comorbidity level, age, insurance type, borough), and health system aspects (hospital, all-cause 30-day readmission, and mortality rates). We used descriptive statistics and supervised classification to identify features. We found that, along with sample size, patient age, number of comorbidities, and the number of words in notes are all important factors related to generalization. Finally, we compared local fine-tuning (hospital specific), instance-based augmented fine-tuning and cluster-based fine-tuning for improving generalization. Among these, local fine-tuning proved most effective, increasing AUC by 0.25% to 11.74% (most helpful in settings with limited data). Overall, this study provides new insights for enhancing the deployment of large language models in the societally important domain of healthcare, and improving their performance for broader populations.

Global prevalence and content of information about alcohol use as a cancer risk factor on Twitter

King, A. J., Dunbar, N. M., Margolin, D., Chunara, R., Tong, C., Jih-Vieira, L., Matsen, C. B., & Niederdeppe, J. (n.d.).

Publication year

2023

Journal title

Preventive Medicine

Volume

177

Page(s)

107728

Global prevalence and content of information about alcohol use as a cancer risk factor on Twitter

King, A. J., Dunbar, N. M., Margolin, D., Chunara, R., Tong, C., Jih-Vieira, L., Matsen, C. B., & Niederdeppe, J. (n.d.).

Publication year

2023

Journal title

Preventive Medicine

Volume

177
Abstract
Objectives: Alcohol use is a major risk factor for several forms of cancer, though many people have limited knowledge of this link. Public health communicators and cancer advocates desire to increase awareness of this link with the long-term goal of reducing cancer burden. The current study is the first to examine the prevalence and content of information about alcohol use as a cancer risk on social media internationally. Methods: We used a three-phase process (hashtag search, dictionary-based auto-identification of content, and human coding of content) to identify and evaluate information from Twitter posts between January 2019 and December 2021. Results: Our hashtag search retrieved a large set of cancer-related tweets (N = 1,122,397). The automatic search process using an alcohol dictionary identified a small number of messages about cancer that also mentioned alcohol (n = 9061, 0.8%), a number that got smaller after adjusting for human coded estimates of the dictionary precision (n = 5927, 0.5%). When cancer-related messages also mentioned alcohol, 82% (n = 1003 of 1225 examined through human coding) indicated alcohol use as a risk factor. Coding found rare instances of problematic information (e.g., promotion of alcohol, misinformation) in messages about alcohol use and cancer. Conclusions: Few social media messages about cancer types that can be linked to alcohol mention alcohol as a cancer risk factor. If public health communicators and cancer advocates want to increase knowledge and understanding of alcohol use as a cancer risk factor, efforts will need to be made on social media and through other communication platforms to increase exposure to this information over time.

Impact on Public Health Decision Making by Utilizing Big Data Without Domain Knowledge

Zhang, M., Rahman, S., Mhasawade, V., & Chunara, R. (n.d.).

Publication year

2023
Abstract
New data sources, and artificial intelligence (AI) methods to extract information from them are becoming plentiful, and relevant to decision making in many societal applications. An important example is street view imagery, available in over 100 countries, and considered for applications such as assessing built environment aspects in relation to community health outcomes. Relevant to such uses, important examples of bias in the use of AI are evident when decision-making based on data fails to account for the robustness of the data, or predictions are based on spurious correlations. To study this risk, we utilize 2.02 million GSV images along with health, demographic, and socioeconomic data from New York City. Initially, we demonstrate that built environment characteristics inferred from GSV labels at the intra-city level may exhibit inadequate alignment with the ground truth. We also find that the average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, as measured through GSV. Finally, using a causal framework which accounts for these mediators of environmental impacts on health, we find that altering 10% of samples in the two lowest tertiles would result in a 4.17 (95% CI 3.84 to 4.55) or 17.2 (95% CI 14.4 to 21.3) times bigger decrease on the prevalence of obesity or diabetes, than the same proportional intervention on the number of crosswalks by census tract. This work illustrates important issues of robustness and model specification for informing effective allocation of interventions using new data sources.

Is there a need for graduate-level programmes in health data science? A perspective from Pakistan

Hoodbhoy, Z., Chunara, R., Waljee, A., AbuBakr, A., & Samad, Z. (n.d.).

Publication year

2023

Journal title

The Lancet Global Health

Volume

11

Issue

1

Page(s)

e23-e25

Making sense of social media data about colorectal cancer screening

King, A. J., Margolin, D., Tong, C., Chunara, R., & Niederdeppe, J. (n.d.).

Publication year

2023

Journal title

Journal of the American College of Radiology

Contact

rumi.chunara@nyu.edu
708 Broadway, New York, NY 10003