Rumi Chunara
Associate Professor of Biostatistics
Associate Professor of Computer Science and Engineering, Tandon
Director of Center for Health Data Science
-
Professional overview
-
The overarching goal of Dr. Rumi Chunara's research is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. She focuses on the design and development of data mining and machine learning methods to address challenges related to data and goals of public health, as well as fairness and ethics in the design and use of data and algorithms embedded in social systems.
At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analyses and machine learning, to study population health. Previously, she was a Postdoctoral Fellow and Instructor at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. She completed her PhD at the Harvard-MIT Division of Health Sciences and Technology and BSc at Caltech.
-
Education
-
BS, Electrical Engineering (Honors), Caltech
MS, Electrical Engineering and Computer Science, MIT
PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)
-
Honors and awards
-
Max Planck Sabbatical Award (2021)
Speaker at NSF Computer and Information Science and Engineering Directorate Career Proposal Writing Workshop (2020)
Invited tutorial on Public Health and Machine Learning at ACM Conference on Health, Inference and Learning (2020)
Keynote at Human Computation and Crowdsourcing (2019)
Invited Speaker at Expert Group Meeting at United Nations Population Fund, Advances in Mobile Technologies for Data Collection Panel (2019)
Keynote at "Mapping the Equity Dimensions of Artificial Intelligence in Public Health", University of Toronto (2019)
Facebook Research Award (2019)
Gates Foundation Grand Challenges Exploration Award (2019)
NSF CAREER Award (2019)
MIT Technology Review Top 35 Innovators Under 35 (2014)
MIT Presidential Fellow (2004)
-
Areas of research and study
-
Health Disparities
Machine Learning
Social Computing
Social Determinants of Health
-
Publications
Association between visit frequency, continuity of care, and pharmacy fill adherence in heart failure patients
Hamo, C. E., Mukhopadhyay, A., Li, X., Zheng, Y., Kronish, I. M., Chunara, R., Dodson, J., Adhikari, S., & Blecker, S.
Publication year: 2024
Journal title: American Heart Journal
Volume: 273
Page(s): 53-60
Abstract: Background: Despite advances in medical therapy for heart failure with reduced ejection fraction (HFrEF), major gaps in medication adherence to guideline-directed medical therapies (GDMT) remain. Greater continuity of care may impact medication adherence and reduce hospitalizations. Methods: We conducted a cross-sectional study of adults with a diagnosis of HF and EF ≤40% with ≥2 outpatient encounters between January 1, 2017 and January 10, 2021, prescribed ≥1 of the following GDMT: 1) Beta Blocker, 2) Angiotensin Converting Enzyme Inhibitor/Angiotensin Receptor Blocker/Angiotensin Receptor Neprilysin Inhibitor, 3) Mineralocorticoid Receptor Antagonist, 4) Sodium Glucose Cotransporter-2 Inhibitor. Continuity of care was calculated using the Bice-Boxerman Continuity of Care Index (COC) and the Usual Provider of Care (UPC) index, categorized by quantile. The primary outcome was adherence to GDMT, defined as average proportion of days covered ≥80% over 1 year. Secondary outcomes included all-cause and HF hospitalization at 1 year. We performed multivariable logistic regression analyses adjusted for demographics, insurance status, comorbidity index, number of visits and neighborhood SES index. Results: Overall, 3,971 individuals were included (mean age 72 years (SD 14), 71% male, 66% White race). In adjusted analyses, compared to individuals in the highest COC quartile, individuals in the third COC quartile had higher odds of GDMT adherence (OR 1.26, 95% CI 1.03-1.53, P = .024). UPC tertile was not associated with adherence (all P > .05). Compared to the highest quantiles, the lowest UPC and COC quantiles had higher odds of all-cause (UPC: OR 1.53, 95% CI 1.23-1.91; COC: OR 2.54, 95% CI 1.94-3.34) and HF (UPC: OR 1.81, 95% CI 1.23-2.67; COC: OR 1.77, 95% CI 1.09-2.95) hospitalizations. Conclusions: Continuity of care was not associated with GDMT adherence among patients with HFrEF but lower continuity of care was associated with increased all-cause and HF hospitalizations.
Associations between news coverage, social media discussions, and search trends about celebrity deaths, screening, and other colorectal cancer-related events
Liu, J., Niederdeppe, J., Tong, C., Margolin, D., Chunara, R., Smith, T., & King, A. J.
Publication year: 2024
Journal title: Preventive Medicine
Volume: 185
Abstract: Objective: Colorectal cancer (CRC) is the third leading cause of cancer death among both men and women in the United States. CRC-related events may increase media coverage and public attention, boosting awareness and prevention. This study examined associations between several types of CRC events (including unplanned celebrity cancer deaths and planned events like national CRC awareness months, celebrity screening behavior, and screening guideline changes) and news coverage, Twitter discussions, and Google search trends about CRC and CRC screening. Methods: We analyzed data from U.S. national news media outlets, posts scraped from Twitter, and Google Trends on CRC and CRC screening during a three-year period from 2020 to 2022. We used burst detection methods to identify temporal spikes in the volume of news, tweets, and search after each CRC-related event. Results: There is a high level of heterogeneity in the impact of celebrity CRC events. Celebrity CRC deaths were more likely to precede spikes in news and tweets about CRC overall than CRC screening. Celebrity screening preceded spikes in news and tweets about screening but not searches. Awareness months and screening guideline changes did precede spikes in news, tweets, and searches about screening, but these spikes were inconsistent, not simultaneous, and not as large as those events concerning most prominent public figures. Conclusions: CRC events provide opportunities to increase attention to CRC. Media and public health professionals should actively intervene during CRC events to increase emphasis on CRC screening and evidence-based recommendations.
Constructing Social Vulnerability Indexes with Increased Data and Machine Learning Highlight the Importance of Wealth Across Global Contexts
Zhao, Y., Paul, R., Reid, S., Coimbra Vieira, C., Wolfe, C., Zhang, Y., & Chunara, R.
Publication year: 2024
Journal title: Social Indicators Research
Volume: 175
Issue: 2
Page(s): 639-657
Abstract: We consider the availability of new harmonized data sources and novel machine learning methodologies in the construction of a social vulnerability index (SoVI), a multidimensional measure that defines how individuals and communities may respond to hazards including natural disasters, economic changes, and global health crises. The factors underpinning social vulnerability (namely, economic status, age, disability, language, ethnicity, and location) are well understood from a theoretical perspective, and existing indices are generally constructed based on specific data chosen to represent these factors. Further, the indices’ construction methods generally assume structured, linear relationships among input variables and may not capture subtle nonlinear patterns more reflective of the multidimensionality of social vulnerability. We compare a procedure which considers an increased number of variables to describe the SoVI factors with existing approaches that choose specific variables based on consensus within the social science community. Reproducing the analysis across eight countries, as well as leveraging deep learning methods which in recent years have been found to be powerful for finding structure in data, demonstrate that wealth-related factors consistently explain the largest variance and are the most common element in social vulnerability.
Making Sense of Social Media Data About Colorectal Cancer Screening
King, A. J., Margolin, D., Tong, C., Chunara, R., & Niederdeppe, J.
Publication year: 2024
Journal title: Journal of the American College of Radiology
Volume: 21
Issue: 4
Page(s): 543-544
Utilizing big data without domain knowledge impacts public health decision-making
Zhang, M., Rahman, S., Mhasawade, V., & Chunara, R.
Publication year: 2024
Journal title: Proceedings of the National Academy of Sciences of the United States of America
Volume: 121
Issue: 39
Abstract: New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries, and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. Findings demonstrate robustness challenges; built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, as average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, intervention on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework accounting for these mediators, we determined that intervening by improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, compared to the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates.
Area-based determinants of outreach vaccination for reaching vulnerable populations: A cross-sectional study in Pakistan
Chen, X., Porter, A., Rehman, N. A., Morris, S. K., Saif, U., & Chunara, R.
Publication year: 2023
Journal title: PLOS Global Public Health
Volume: 3
Issue: 9
Abstract: The objective of this study is to gain a comparative understanding of spatial determinants for outreach and clinic vaccination, which is critical for operationalizing efforts and breaking down structural biases; particularly relevant in countries where resources are low, and sub-region variance is high. Leveraging a massive effort to digitize public system reporting by Lady and Community Health Workers (CHWs) with geo-located data on over 4 million public-sector vaccinations from September 2017 through 2019, understanding health service operations in relation to vulnerable spatial determinants was made feasible. Location and type of vaccinations (clinic or outreach) were compared to regional spatial attributes where they were performed. Important spatial attributes were assessed using three modeling approaches (ridge regression, gradient boosting, and a generalized additive model). Consistent predictors for outreach, clinic, and proportion of third dose pentavalent vaccinations by region were identified. Of all Penta-3 vaccination records, 86.3% were performed by outreach efforts. At the tehsil level (fourth-order administrative unit), controlling for child population, population density, proportion of population in urban areas, distance to cities, average maternal education, and other relevant factors, increased poverty was significantly associated with more in-clinic vaccinations (β = 0.077), and lower proportion of outreach vaccinations by region (β = -0.083). Analyses at the union council level (fifth-order administrative unit) showed consistent results for the differential importance of poverty for outreach versus clinic vaccination. Relevant predictors for each type of vaccination (outreach vs. in-clinic) show how design of outreach vaccination can effectively augment vaccination efforts beyond healthcare services through clinics. As Pakistan is third among countries with the most unvaccinated and under-vaccinated children, understanding barriers and factors associated with vaccination can be demonstrative for other national and sub-national regions facing challenges and also inform guidelines on supporting CHWs in health systems.
Cohort profile: a large EHR-based cohort with linked pharmacy refill and neighbourhood social determinants of health data to assess heart failure medication adherence
Adhikari, S., Mukhyopadhyay, A., Kolzoff, S., Li, X., Nadel, T., Fitchett, C., Chunara, R., Dodson, J., Kronish, I., & Blecker, S. B.
Publication year: 2023
Journal title: BMJ Open
Volume: 13
Issue: 12
Abstract: Purpose: Clinic-based or community-based interventions can improve adherence to guideline-directed medication therapies (GDMTs) among patients with heart failure (HF). However, opportunities for such interventions are frequently missed, as providers may be unable to recognise risk patterns for medication non-adherence. Machine learning algorithms can help in identifying patients with high likelihood of non-adherence. While a number of multilevel factors influence adherence, prior models predicting non-adherence have been limited by data availability. We have established an electronic health record (EHR)-based cohort with comprehensive data elements from multiple sources to improve on existing models. We linked EHR data with pharmacy refill data for real-time incorporation of prescription fills and with social determinants data to incorporate neighbourhood factors. Participants: Patients seen at a large health system in New York City (NYC), who were >18 years old with diagnosis of HF or reduced ejection fraction (<40%) since 2017, had at least one clinical encounter between 1 April 2021 and 31 October 2022 and active prescriptions for any of the four GDMTs (beta-blocker, ACEi/angiotensin receptor blocker (ARB)/angiotensin receptor neprilysin inhibitor (ARNI), mineralocorticoid receptor antagonist (MRA) and sodium-glucose cotransporter 2 inhibitor (SGLT2i)) during the study period. Patients with non-geocodable address or outside the continental USA were excluded. Findings to date: Among 39 963 patients in the cohort, the average age was 73±14 years old, 44% were female and 48% were current/former smokers. The common comorbid conditions were hypertension (77%), cardiac arrhythmias (56%), obesity (33%) and valvular disease (33%). During the study period, 33 606 (84%) patients had an active prescription of beta blocker, 32 626 (82%) had ACEi/ARB/ARNI, 11 611 (29%) MRA and 7472 (19%) SGLT2i. Ninety-nine per cent were from urban metropolitan areas. Future plans: We will use the established cohort to develop a machine learning model to predict medication adherence, and to support ancillary studies assessing associates of adherence. For external validation, we will include data from an additional hospital system in NYC.
Global prevalence and content of information about alcohol use as a cancer risk factor on Twitter
King, A. J., Dunbar, N. M., Margolin, D., Chunara, R., Tong, C., Jih-Vieira, L., Matsen, C. B., & Niederdeppe, J.
Publication year: 2023
Journal title: Preventive Medicine
Volume: 177
Abstract: Objectives: Alcohol use is a major risk factor for several forms of cancer, though many people have limited knowledge of this link. Public health communicators and cancer advocates desire to increase awareness of this link with the long-term goal of reducing cancer burden. The current study is the first to examine the prevalence and content of information about alcohol use as a cancer risk on social media internationally. Methods: We used a three-phase process (hashtag search, dictionary-based auto-identification of content, and human coding of content) to identify and evaluate information from Twitter posts between January 2019 and December 2021. Results: Our hashtag search retrieved a large set of cancer-related tweets (N = 1,122,397). The automatic search process using an alcohol dictionary identified a small number of messages about cancer that also mentioned alcohol (n = 9061, 0.8%), a number that got smaller after adjusting for human coded estimates of the dictionary precision (n = 5927, 0.5%). When cancer-related messages also mentioned alcohol, 82% (n = 1003 of 1225 examined through human coding) indicated alcohol use as a risk factor. Coding found rare instances of problematic information (e.g., promotion of alcohol, misinformation) in messages about alcohol use and cancer. Conclusions: Few social media messages about cancer types that can be linked to alcohol mention alcohol as a cancer risk factor. If public health communicators and cancer advocates want to increase knowledge and understanding of alcohol use as a cancer risk factor, efforts will need to be made on social media and through other communication platforms to increase exposure to this information over time.
National cervical cancer burden estimation through systematic review and analysis of publicly available data in Pakistan
Chughtai, N., Perveen, K., Gillani, S. R., Abbas, A., Chunara, R., Manji, A. A., Karani, S., Noorali, A. A., Zakaria, M., Shamsi, U., Chishti, U., Khan, A. A., Soofi, S., Pervez, S., & Samad, Z.
Publication year: 2023
Journal title: BMC Public Health
Volume: 23
Issue: 1
Abstract: Background: Cervical cancer is a major cause of cancer-related deaths among women worldwide. Paucity of data on cervical cancer burden in countries like Pakistan hampers requisite resource allocation. Objective: To estimate the burden of cervical cancer in Pakistan using available data sources. Methods: We performed a systematic review to identify relevant data on Pakistan between 1995 and 2022. Study data identified through the systematic review that provided enough information to allow age specific incidence rates and age standardized incidence rates (ASIR) calculations for cervical cancer were merged. Population at risk estimates were derived and adjusted for important variables in the care-seeking pathway. The calculated ASIRs were applied to 2020 population estimates to estimate the number of cervical cancer cases in Pakistan. Results: A total of 13 studies reported ASIRs for cervical cancer for Pakistan. Among the studies selected, the Karachi Cancer Registry reported the highest disease burden estimates for all reported time periods: 1995–1997 ASIR = 6.81, 1998–2002 ASIR = 7.47, and 2017–2019 ASIR = 6.02 per 100,000 women. Using data from Karachi, Punjab and Pakistan Atomic Energy Cancer Registries from 2015–2019, we derived an unadjusted ASIR for cervical cancer of 4.16 per 100,000 women (95% UI 3.28, 5.28). Varying model assumptions produced adjusted ASIRs ranging from 5.2 to 8.4 per 100,000 women. We derived an adjusted ASIR of 7.60 (95% UI 5.98, 10.01) and estimated 6166 (95% UI 4833, 8305) new cases of cervical cancer per year. Conclusion: The estimated cervical cancer burden in Pakistan is higher than the WHO target. Estimates are sensitive to health seeking behavior, and appropriate physician diagnostic intervention, factors that are relevant to the case of cervical cancer, a stigmatized disease in a low-lower middle income country setting. These estimates make the case for approaching cervical cancer elimination through a multi-pronged strategy.
Neighborhood-Level Socioeconomic Status and Prescription Fill Patterns among Patients with Heart Failure
Mukhopadhyay, A., Blecker, S., Li, X., Kronish, I. M., Chunara, R., Zheng, Y., Lawrence, S., Dodson, J. A., Kozloff, S., & Adhikari, S.
Publication year: 2023
Journal title: JAMA Network Open
Volume: 6
Issue: 12
Page(s): e2347519
Abstract: Importance: Medication nonadherence is common among patients with heart failure with reduced ejection fraction (HFrEF) and can lead to increased hospitalization and mortality. Patients living in socioeconomically disadvantaged areas may be at greater risk for medication nonadherence due to barriers such as lower access to transportation or pharmacies. Objective: To examine the association between neighborhood-level socioeconomic status (nSES) and medication nonadherence among patients with HFrEF and to assess the mediating roles of access to transportation, walkability, and pharmacy density. Design, Setting, and Participants: This retrospective cohort study was conducted between June 30, 2020, and December 31, 2021, at a large health system based primarily in New York City and surrounding areas. Adult patients with a diagnosis of HF, reduced EF on echocardiogram, and a prescription of at least 1 guideline-directed medical therapy (GDMT) for HFrEF were included. Exposure: Patient addresses were geocoded, and nSES was calculated using the Agency for Healthcare Research and Quality SES index, which combines census-tract level measures of poverty, rent burden, unemployment, crowding, home value, and education, with higher values indicating higher nSES. Main Outcomes and Measures: Medication nonadherence was obtained through linkage of health record prescription data with pharmacy fill data and was defined as proportion of days covered (PDC) of less than 80% over 6 months, averaged across GDMT medications. Results: Among 6247 patients, the mean (SD) age was 73 (14) years, and majority were male (4340 [69.5%]). There were 1011 (16.2%) Black participants, 735 (11.8%) Hispanic/Latinx participants, and 3929 (62.9%) White participants. Patients in lower nSES areas had higher rates of nonadherence, ranging from 51.7% in the lowest quartile (731 of 1086 participants) to 40.0% in the highest quartile (563 of 1086 participants) (P < .001). In adjusted analysis, patients living in the lower 2 nSES quartiles had significantly higher odds of nonadherence when compared with patients living in the highest nSES quartile (quartile 1: odds ratio [OR], 1.57 [95% CI, 1.35-1.83]; quartile 2: OR, 1.35 [95% CI, 1.16-1.56]). No mediation by access to transportation and pharmacy density was found, but a small amount of mediation by neighborhood walkability was observed. Conclusions and Relevance: In this retrospective cohort study of patients with HFrEF, living in a lower nSES area was associated with higher rates of GDMT nonadherence. These findings highlight the importance of considering neighborhood-level disparities when developing approaches to improve medication adherence.
Prevalence of familial hypercholesterolemia in a country-wide laboratory network in Pakistan: 10-year data from 988,306 patients
Farhad, A., Noorali, A. A., Tajuddin, S., Khan, S. D., Ali, M., Chunara, R., Khan, A. H., Zafar, A., Merchant, A., Bokhari, S. S., Virani, S. S., & Samad, Z.
Publication year: 2023
Journal title: Progress in Cardiovascular Diseases
Volume: 79
Page(s): 19-27
Abstract: Introduction: Familial hypercholesterolemia (FH) is a modifiable risk factor for premature coronary heart disease but is poorly diagnosed and treated. We leveraged a large laboratory network in Pakistan to study the prevalence, gender and geographic distribution of FH. Methodology: Data were curated from the Aga Khan University Hospital clinical laboratories, which comprise 289 laboratories and collection points spread over 94 districts. Clinically ordered lipid profiles from 1st January 2009 to 30th June 2018 were included and data on 1,542,281 LDL-C values were extracted. We used the Make Early Diagnosis to Prevent Early Death (MEDPED) criteria to classify patients as FH and reported data on patients with low-density lipoprotein cholesterol (LDL-C) ≥ 190 mg/dL. FH cases were also examined by their spatial distribution. Results: After applying exclusions, the final sample included 988,306 unique individuals, of which 24,273 individuals (1:40) had LDL-C values of ≥190 mg/dL. Based on the MEDPED criteria, 2416 individuals (1:409) had FH. FH prevalence was highest in individuals 10–19 years (1:40) and decreased as the patient age increased. Among individuals ≥40 years, the prevalence of FH was higher for females compared with males (1:755 vs 1:1037, p < 0.001). Median LDL-C for the overall population was 112 mg/dL (IQR = 88-136 mg/dL). The highest prevalence after removing outliers was observed in Rajan Pur district (1.23% [0.70–2.10%]) in Punjab province, followed by Mardan (1.18% [0.80–1.70%]) in Khyber Pakhtunkhwa province, and Okara (0.99% [0.50–1.80%]) in Punjab province. Conclusion: There is high prevalence of actionable LDL-C values in lipid samples across a large network of laboratories in Pakistan. Variable FH prevalence across geographic locations in Pakistan may need to be explored at the population level for intervention and management of contributory factors. Efforts at early diagnosis and treatment of FH are urgently needed.
Structural racism and homophobia evaluated through social media sentiment combined with activity spaces and associations with mental health among young sexual minority men
Duncan, D. T., Cook, S. H., Wood, E. P., Regan, S. D., Chaix, B., Tian, Y., & Chunara, R.
Publication year: 2023
Journal title: Social Science and Medicine
Volume: 320
Abstract: Background: Research suggests that structural racism and homophobia are associated with mental well-being. However, structural discrimination measures which are relevant to lived experiences and that evade self-report biases are needed. Social media and global-positioning systems (GPS) offer opportunity to measure place-based negative racial sentiment linked to relevant locations via precise geo-coding of activity spaces. This is vital for young sexual minority men (YSMM) of color who may experience both racial and sexual minority discrimination and subsequently poorer mental well-being. Methods: P18 Neighborhood Study (n = 147) data were used. Measures of place-based negative racial and sexual-orientation sentiment were created using geo-located social media as a proxy for racial climate via socially-meaningfully-defined places. Exposure to place-based negative sentiment was computed as an average of discrimination by places frequented using activity space measures per person. Outcomes were number of days of reported poor mental health in last 30 days. Zero-inflated Poisson regression analyses were used to assess influence of and type of relationship between place-based negative racial or sexual-orientation sentiment exposure and mental well-being, including the moderating effect of race/ethnicity. Results: We found evidence for a non-linear relationship between place-based negative racial sentiment and mental well-being among our racially and ethnically diverse sample of YSMM (p < .05), and significant differences in the relationship for different race/ethnicity groups (p < .05). The most pronounced differences were detected between Black and White non-Hispanic vs. Hispanic sexual minority men. At two standard deviations above the overall mean of negative racial sentiment exposure based on activity spaces, Black and White YSMM reported significantly more poor mental health days in comparison to Hispanic YSMM. Conclusions: Effects of discrimination can vary by race/ethnicity and discrimination type. Experiencing place-based negative racial sentiment may have implications for mental well-being among YSMM regardless of race/ethnicity, which should be explored in future research including with larger sample sizes.
Association Between Copayment Amount and Filling of Medications for Angiotensin Receptor Neprilysin Inhibitors in Patients With Heart Failure
Mukhopadhyay, A., Adhikari, S., Li, X., Dodson, J. A., Kronish, I. M., Shah, B., Ramatowski, M., Chunara, R., Kozloff, S., & Blecker, S. (n.d.).Publication year
2022Journal title
Journal of the American Heart AssociationVolume
11Issue
24AbstractBACKGROUND: Angiotensin receptor neprilysin inhibitors (ARNI) reduce mortality and hospitalization for patients with heart failure. However, relatively high copayments for ARNI may contribute to suboptimal adherence, thus potentially limiting their benefits. METHODS AND RESULTS: We conducted a retrospective cohort study within a large, multi-site health system. We included patients with: ARNI prescription between November 20, 2020 and June 30, 2021; diagnosis of heart failure or left ventricular ejection fraction ≤40%; and available pharmacy or pharmacy benefit manager copayment data. The primary exposure was copayment, categorized as $0, $0.01 to $10, $10.01 to $100, and >$100. The primary outcome was prescription fill nonadherence, defined as the proportion of days covered <80% over 6 months. We assessed the association between copayment and nonadherence using multivariable logistic regression, and nonbinarized proportion of days covered using multivariable Poisson regression, adjusting for demographic, clinical, and neighborhood-level covariates. A total of 921 patients met inclusion criteria, with 192 (20.8%) having $0 copayment, 228 (24.8%) with $0.01 to $10 copayment, 206 (22.4%) with $10.01 to $100, and 295 (32.0%) with >$100. Patients with higher copayments had higher rates of nonadherence, ranging from 17.2% for $0 copayment to 34.2% for copayment >$100 (P<0.001). After multivariable adjustment, odds of nonadherence were significantly higher for copayment of $10.01 to $100 (odds ratio [OR], 1.93 [95% CI, 1.15– 3.27], P=0.01) or >$100 (OR, 2.58 [95% CI, 1.63– 4.18], P<0.001), as compared with $0 copayment. Similar associations were seen when assessing proportion of days covered as a proportion. CONCLUSIONS: We found higher rates of not filling ARNI prescriptions among patients with higher copayments, which persisted after multivariable adjustment. 
Our findings support future studies to assess whether reducing copayments can increase adherence to ARNI and improve outcomes for heart failure.
Association of U.S. birth, duration of residence in the U.S., and atherosclerotic cardiovascular disease risk factors among Asian adults
Al Rifai, M., Kianoush, S., Jain, V., Joshi, P. H., Cainzos-Achirica, M., Nasir, K., Merchant, A. T., Dodani, S., Wong, S. S., Samad, Z., Mehta, A., Chunara, R., Kalra, A., & Virani, S. S. (n.d.).
Publication year
2022
Journal title
Preventive Medicine Reports
Volume
29
Abstract
Introduction: Prior studies have shown a direct association between U.S. birth and duration of residence with atherosclerotic cardiovascular disease (ASCVD), though few have specifically focused on Asian Americans. Methods: We utilized cross-sectional data from the 2006 to 2015 National Health Interview Survey. We compared prevalent cardiovascular risk factors and ASCVD among Asian American individuals by U.S. birth and duration of time spent in the U.S. Results: The study sample consisted of 18,150 Asian individuals of whom 20.5 % were Asian Indian, 20.5 % were Chinese, 23.4 % were Filipino, and 35.6 % were of other Asian ethnic groups. The mean (standard error) age was 43.8 (0.21) years and 53 % were women. In multivariable-adjusted logistic regression models, U.S. birth was associated with a higher prevalence odds ratio (95 % confidence interval) of current smoking 1.31 (1.07,1.60), physical inactivity 0.62 (0.54,0.72), obesity 2.26 (1.91,2.69), hypertension 1.33 (1.12,1.58), and CAD 1.96 (1.24,3.11), but lower prevalence of stroke 0.28 (0.11,0.71). Spending greater than 15 years in the U.S. was associated with a higher prevalence of current smoking 1.65 (1.24,2.21), obesity 2.33 (1.57,3.47), diabetes 2.68 (1.17,6.15), and hyperlipidemia 1.72 (1.09,2.71). Conclusion: Heterogeneity exists in cardiovascular risk factor burden among Asian Americans according to Asian ethnicity, U.S. birth, and duration of time living in the U.S.
Building Public Health Surveillance 3.0: Emerging Timely Measures of Physical, Economic, and Social Environmental Conditions Affecting Health
Thorpe, L. E., Chunara, R., Roberts, T., Pantaleo, N., Irvine, C., Conderino, S., Li, Y., Hsieh, P. Y., Gourevitch, M. N., Levine, S., Ofrane, R., & Spoer, B. (n.d.).
Publication year
2022
Journal title
American Journal of Public Health
Volume
112
Issue
10
Page(s)
1436-1445
Abstract
In response to rapidly changing societal conditions stemming from the COVID-19 pandemic, we summarize data sources with potential to produce timely and spatially granular measures of physical, economic, and social conditions relevant to public health surveillance, and we briefly describe emerging analytic methods to improve small-area estimation. To inform this article, we reviewed published systematic review articles set in the United States from 2015 to 2020 and conducted unstructured interviews with senior content experts in public health practice, academia, and industry. We identified a modest number of data sources with high potential for generating timely and spatially granular measures of physical, economic, and social determinants of health. We also summarized modeling and machine-learning techniques useful to support development of time-sensitive surveillance measures that may be critical for responding to future major events such as the COVID-19 pandemic.
Discrimination is associated with C-reactive protein among young sexual minority men
Cook, S. H., Slopen, N., Scarimbolo, L., Mirin, N., Wood, E. P., Rosendale, N., Chunara, R., Burke, C. W., & Halkitis, P. N. (n.d.).
Publication year
2022
Journal title
Journal of Behavioral Medicine
Volume
45
Issue
4
Page(s)
649-657
Abstract
This report examines associations between everyday discrimination, microaggressions, and CRP to gain insight on potential mechanisms that may underlie increased CVD risk among sexual minority male young adults. The sample consisted of 60 participants taken from the P18 cohort between the ages of 24 and 28 years. Multinomial logistic regression models were used to examine the association between perceived everyday discrimination and LGBQ microaggressions with C-reactive protein cardiovascular risk categories of low-, average-, and high-risk, as defined by the American Heart Association and Centers for Disease Control. Adjustments were made for BMI. Individuals who experienced more everyday discrimination had a higher risk of being classified in the high-risk CRP group compared to the low-risk CRP group (RRR = 3.35, p = 0.02). Interpersonal LGBQ microaggressions were not associated with CRP risk category. Everyday discrimination, but not specific microaggressions based on sexual orientation, were associated with elevated levels of CRP among young sexual minority men (YSMM). Thus, to implement culturally and age-appropriate interventions, further research is needed to critically examine the specific types of discrimination and the resultant impact on YSMM’s health.
Evidence for Telemedicine’s Ongoing Transformation of Health Care Delivery Since the Onset of COVID-19: Retrospective Observational Study
Mandal, S., Wiesenfeld, B. M., Mann, D., Lawrence, K., Chunara, R., Testa, P., & Nov, O. (n.d.).
Publication year
2022
Journal title
JMIR Formative Research
Volume
6
Issue
10
Abstract
Background: The surge of telemedicine use during the early stages of the COVID-19 pandemic has been well documented. However, scarce evidence considers the use of telemedicine in the subsequent period. Objective: This study aims to evaluate use patterns of video-based telemedicine visits for ambulatory care and urgent care provision over the course of recurring pandemic waves in 1 large health system in New York City (NYC) and what this means for health care delivery. Methods: Retrospective electronic health record (EHR) data of patients from January 1, 2020, to February 28, 2022, were used to longitudinally track and analyze telemedicine and in-person visit volumes across ambulatory care specialties and urgent care, as well as compare them to a prepandemic baseline (June-November 2019). Diagnosis codes to differentiate suspected COVID-19 visits from non–COVID-19 visits, as well as evaluating COVID-19–based telemedicine use over time, were compared to the total number of COVID-19–positive cases in the same geographic region (city level). The time series data were segmented based on change-point analysis, and variances in visit trends were compared between the segments. Results: The emergence of COVID-19 prompted an early increase in the number of telemedicine visits across the urgent care and ambulatory care settings. This use continued throughout the pandemic at a much higher level than the prepandemic baseline for both COVID-19 and non–COVID-19 suspected visits, despite the fluctuation in COVID-19 cases throughout the pandemic and the resumption of in-person clinical services. The use of telemedicine-based urgent care services for COVID-19 suspected visits showed more variance in response to each pandemic wave, but telemedicine visits for ambulatory care have remained relatively steady after the initial crisis period. During the Omicron wave, the use of all visit types, including in-person activities, decreased.
Patients between 25 and 34 years of age were the largest users of telemedicine-based urgent care. Patient satisfaction with telemedicine-based urgent care remained high despite the rapid scaling of services to meet increased demand. Conclusions: The trend of the increased use of telemedicine as a means of health care delivery relative to the pre–COVID-19 baseline has been maintained throughout the later pandemic periods despite fluctuating COVID-19 cases and the resumption of in-person care delivery. Overall satisfaction with telemedicine-based care is also high. The trends in telemedicine use suggest that telemedicine-based health care delivery has become a mainstream and sustained supplement to in-person-based ambulatory care, particularly for younger patients, for both urgent and nonurgent care needs. These findings have implications for the health care delivery system, including practice leaders, insurers, and policymakers. Further investigation is needed to evaluate telemedicine adoption by key demographics, identify ongoing barriers to adoption, and explore the impacts of sustained use of telemedicine on health care outcomes and experience.
Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database
Singh, H., Mhasawade, V., & Chunara, R. (n.d.).
Publication year
2022
Journal title
PLOS Digital Health
Volume
1
Issue
4
Abstract
Modern predictive models require large amounts of data for training and evaluation, absence of which may result in models that are specific to certain locations, populations in them and clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models vary significantly when applied to hospitals or geographies different from the ones in which they are developed. Further, what characteristics of the datasets explain the performance variation? In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. Generalization gap, defined as difference between model performance metrics across hospitals, is computed for area under the receiver operating characteristic curve (AUC) and calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using a causal discovery algorithm “Fast Causal Inference” that infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st-3rd quartile or IQR; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). Distribution of all variable types (demography, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to the groups.
Moreover, for developing methods to improve model performance in new environments, a better understanding and documentation of provenance of data and health processes are needed to identify and mitigate sources of variation.
Impact of COVID-19 forecast visualizations on pandemic risk perceptions
Padilla, L., Hosseinpour, H., Fygenson, R., Howell, J., Chunara, R., & Bertini, E. (n.d.).
Publication year
2022
Journal title
Scientific Reports
Volume
12
Issue
1
Abstract
People worldwide use SARS-CoV-2 (COVID-19) visualizations to make life and death decisions about pandemic risks. Understanding how these visualizations influence risk perceptions to improve pandemic communication is crucial. To examine how COVID-19 visualizations influence risk perception, we conducted two experiments online in October and December of 2020 (N = 2549) where we presented participants with 34 visualization techniques (available at the time of publication on the CDC’s website) of the same COVID-19 mortality data. We found that visualizing data using a cumulative scale consistently led to participants believing that they and others were at more risk than before viewing the visualizations. In contrast, visualizing the same data with a weekly incident scale led to variable changes in risk perceptions. Further, uncertainty forecast visualizations also affected risk perceptions, with visualizations showing six or more models increasing risk estimates more than the others tested. Differences between COVID-19 visualizations of the same data produce different risk perceptions, fundamentally changing viewers’ interpretation of information.
Search Term Identification Methods for Computational Health Communication: Word Embedding and Network Approach for Health Content on YouTube
Tong, C., Margolin, D., Chunara, R., Niederdeppe, J., Taylor, T., Dunbar, N., & King, A. J. (n.d.).
Publication year
2022
Journal title
JMIR Medical Informatics
Volume
10
Issue
8
Abstract
Background: Common methods for extracting content in health communication research typically involve using a set of well-established queries, often names of medical procedures or diseases, that are often technical or rarely used in the public discussion of health topics. Although these methods produce high recall (ie, retrieve highly relevant content), they tend to overlook health messages that feature colloquial language and layperson vocabularies on social media. Given how such messages could contain misinformation or obscure content that circumvents official medical concepts, correctly identifying (and analyzing) them is crucial to the study of user-generated health content on social media platforms. Objective: Health communication scholars would benefit from a retrieval process that goes beyond the use of standard terminologies as search queries. Motivated by this, this study aims to put forward a search term identification method to improve the retrieval of user-generated health content on social media. We focused on cancer screening tests as a subject and YouTube as a platform case study. Methods: We retrieved YouTube videos using cancer screening procedures (colonoscopy, fecal occult blood test, mammogram, and pap test) as seed queries. We then trained word embedding models using text features from these videos to identify the nearest neighbor terms that are semantically similar to cancer screening tests in colloquial language. Retrieving more YouTube videos from the top neighbor terms, we coded a sample of 150 random videos from each term for relevance. We then used text mining to examine the new content retrieved from these videos and network analysis to inspect the relations between the newly retrieved videos and videos from the seed queries. Results: The top terms with semantic similarities to cancer screening tests were identified via word embedding models.
Text mining analysis showed that the 5 nearest neighbor terms retrieved content that was novel and contextually diverse, beyond the content retrieved from cancer screening concepts alone. Results from network analysis showed that the newly retrieved videos had at least one total degree of connection (sum of indegree and outdegree) with seed videos according to YouTube relatedness measures. Conclusions: We demonstrated a retrieval technique to improve recall and minimize precision loss, which can be extended to various health topics on YouTube, a popular video-sharing social media platform. We discussed how health communication scholars can apply the technique to inspect the performance of the retrieval strategy before investing human coding resources and outlined suggestions on how such a technique can be extended to other health contexts.
Machine learning and algorithmic fairness in public and population health
Mhasawade, V., Zhao, Y., & Chunara, R. (n.d.).
Publication year
2021
Journal title
Nature Machine Intelligence
Volume
3
Issue
8
Page(s)
659-666
Abstract
Until now, much of the work on machine learning and health has focused on processes inside the hospital or clinic. However, this represents only a narrow set of tasks and challenges related to health; there is greater potential for impact by leveraging machine learning in health tasks more broadly. In this Perspective we aim to highlight potential opportunities and challenges for machine learning within a holistic view of health and its influences. To do so, we build on research in population and public health that focuses on the mechanisms between different cultural, social and environmental factors and their effect on the health of individuals and communities. We present a brief introduction to research in these fields, data sources and types of tasks, and use these to identify settings where machine learning is relevant and can contribute to new knowledge. Given the key foci of health equity and disparities within public and population health, we juxtapose these topics with the machine learning subfield of algorithmic fairness to highlight specific opportunities where machine learning, public and population health may synergize to achieve health equity.
Social Determinants in Machine Learning Cardiovascular Disease Prediction Models: A Systematic Review
Zhao, Y., Wood, E. P., Mirin, N., Cook, S. H., & Chunara, R. (n.d.).
Publication year
2021
Journal title
American Journal of Preventive Medicine
Volume
61
Issue
4
Page(s)
596-605
Abstract
Introduction: Cardiovascular disease is the leading cause of death worldwide, and cardiovascular disease burden is increasing in low-resource settings and for lower socioeconomic groups. Machine learning algorithms are being developed rapidly and incorporated into clinical practice for cardiovascular disease prediction and treatment decisions. Significant opportunities for reducing death and disability from cardiovascular disease worldwide lie with accounting for the social determinants of cardiovascular outcomes. This study reviews how social determinants of health are being included in machine learning algorithms to inform best practices for the development of algorithms that account for social determinants. Methods: A systematic review using 5 databases was conducted in 2020. English language articles from any location published from inception to April 10, 2020, which reported on the use of machine learning for cardiovascular disease prediction that incorporated social determinants of health, were included. Results: Most studies that compared machine learning algorithms and regression showed increased performance of machine learning, and most studies that compared performance with or without social determinants of health showed increased performance with them. The most frequently included social determinants of health variables were gender, race/ethnicity, marital status, occupation, and income. Studies were largely from North America, Europe, and China, limiting the diversity of the included populations and variance in social determinants of health. Discussion: Given their flexibility, machine learning approaches may provide an opportunity to incorporate the complex nature of social determinants of health.
The limited variety of sources and data in the reviewed studies emphasize that there is an opportunity to include more social determinants of health variables, especially environmental ones, that are known to impact cardiovascular disease risk and that recording such data in electronic databases will enable their use.
Telemedicine and healthcare disparities: a cohort study in a large healthcare system in New York City during COVID-19
Chunara, R., Zhao, Y., Chen, J., Lawrence, K., Testa, P. A., Nov, O., & Mann, D. M. (n.d.).
Publication year
2021
Journal title
Journal of the American Medical Informatics Association
Volume
28
Issue
1
Page(s)
33-41
Abstract
Objective: Through the coronavirus disease 2019 (COVID-19) pandemic, telemedicine became a necessary entry point into the process of diagnosis, triage, and treatment. Racial and ethnic disparities in healthcare have been well documented in COVID-19 with respect to risk of infection and in-hospital outcomes once admitted, and here we assess disparities in those who access healthcare via telemedicine for COVID-19. Materials and Methods: Electronic health record data of patients at New York University Langone Health between March 19 and April 30, 2020 were used to conduct descriptive and multilevel regression analyses with respect to visit type (telemedicine or in-person), suspected COVID diagnosis, and COVID test results. Results: Controlling for individual and community-level attributes, Black patients had 0.6 times the adjusted odds (95% CI: 0.58-0.63) of accessing care through telemedicine compared to white patients, though they are increasingly accessing telemedicine for urgent care, driven by a younger and female population. COVID diagnoses were significantly more likely for Black versus white telemedicine patients. Discussion: There are disparities for Black patients accessing telemedicine, however increased uptake by young, female Black patients. Mean income and decreased mean household size of a zip code were also significantly related to telemedicine use. Conclusion: Telemedicine access disparities reflect those in in-person healthcare access. Roots of disparate use are complex and reflect individual, community, and structural factors, including their intersection - many of which are due to systemic racism. Evidence regarding disparities that manifest through telemedicine can be used to inform tool design and systemic efforts to promote digital health equity.
Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study
Daughton, A. R., Chunara, R., & Paul, M. J. (n.d.).
Publication year
2020
Journal title
JMIR Public Health and Surveillance
Volume
6
Issue
2
Abstract
Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).
Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.