Rumi Chunara

Associate Professor of Biostatistics

Associate Professor of Computer Science and Engineering, Tandon

Director of Center for Health Data Science

The overarching goal of Dr. Rumi Chunara's research is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. She focuses on the design and development of data mining and machine learning methods to address challenges related to data and goals of public health, as well as fairness and ethics in the design and use of data and algorithms embedded in social systems.

At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analysis and machine learning to study population health. Previously, she was a Postdoctoral Fellow and Instructor at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. She completed her PhD at the Harvard-MIT Division of Health Sciences and Technology and her BS at Caltech.

Education

BS, Electrical Engineering (Honors), Caltech

MS, Electrical Engineering and Computer Science, MIT

PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)

Honors and awards

Max Planck Sabbatical Award (2021)

Speaker at NSF Computer and Information Science and Engineering Directorate CAREER Proposal Writing Workshop (2020)

Invited tutorial on Public Health and Machine Learning at ACM Conference on Health, Inference and Learning (2020)

Keynote at Human Computation and Crowdsourcing (2019)

Invited Speaker at Expert Group Meeting at United Nations Population Fund, Advances in Mobile Technologies for Data Collection Panel (2019)

Keynote at "Mapping the Equity Dimensions of Artificial Intelligence in Public Health", University of Toronto (2019)

Facebook Research Award (2019)

Gates Foundation Grand Challenges Exploration Award (2019)

NSF CAREER Award (2019)

MIT Technology Review Top 35 Innovators Under 35 (2014)

MIT Presidential Fellow (2004)

Areas of research and study

Health Disparities

Machine learning

Social Computing

Social Determinants of Health

Publications

Structural racism and homophobia evaluated through social media sentiment combined with activity spaces and associations with mental health among young sexual minority men

Chunara, R. (2023).

Publication year

2023

Journal title

Social Science & Medicine

Volume

320

Page(s)

115755

10.1016/j.socscimed.2023.115755

Abstract

BACKGROUND: Research suggests that structural racism and homophobia are associated with mental well-being. However, structural discrimination measures which are relevant to lived experiences and that evade self-report biases are needed. Social media and global-positioning systems (GPS) offer opportunity to measure place-based negative racial sentiment linked to relevant locations via precise geo-coding of activity spaces. This is vital for young sexual minority men (YSMM) of color who may experience both racial and sexual minority discrimination and subsequently poorer mental well-being. METHODS: P18 Neighborhood Study (n = 147) data were used. Measures of place-based negative racial and sexual-orientation sentiment were created using geo-located social media as a proxy for racial climate via socially-meaningfully-defined places. Exposure to place-based negative sentiment was computed as an average of discrimination by places frequented using activity space measures per person. Outcomes were number of days of reported poor mental health in last 30 days. Zero-inflated Poisson regression analyses were used to assess influence of and type of relationship between place-based negative racial or sexual-orientation sentiment exposure and mental well-being, including the moderating effect of race/ethnicity. RESULTS: We found evidence for a non-linear relationship between place-based negative racial sentiment and mental well-being among our racially and ethnically diverse sample of YSMM (p < .05), and significant differences in the relationship for different race/ethnicity groups (p < .05). The most pronounced differences were detected between Black and White non-Hispanic vs. Hispanic sexual minority men. At two standard deviations above the overall mean of negative racial sentiment exposure based on activity spaces, Black and White YSMM reported significantly more poor mental health days in comparison to Hispanic YSMM. 
CONCLUSIONS: Effects of discrimination can vary by race/ethnicity and discrimination type. Experiencing place-based negative racial sentiment may have implications for mental well-being among YSMM regardless of race/ethnicity, which should be explored in future research including with larger sample sizes.
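The METHODS paragraph above models counts of poor mental health days with zero-inflated Poisson (ZIP) regression. As an illustration only, a ZIP model mixes a "structural zero" component with an ordinary Poisson count process. The sketch below assumes a log link for the Poisson rate and a logit link for the zero-inflation probability (a common parameterization, not confirmed as the one used in the study); the function names are hypothetical and this is not the study's analysis code.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Probability of count k under a zero-inflated Poisson (ZIP) model.

    pi  -- probability of a structural (excess) zero
    lam -- rate of the underlying Poisson count process
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # Zeros come from two sources: the structural-zero component and the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_log_likelihood(counts, x_rows, beta, z_rows, gamma):
    """Log-likelihood of a ZIP regression for given (hypothetical) coefficient vectors.

    Assumes log(lam) = x . beta and logit(pi) = z . gamma for each observation.
    """
    total = 0.0
    for y, x, z in zip(counts, x_rows, z_rows):
        lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        pi = 1.0 / (1.0 + math.exp(-sum(g * zi for g, zi in zip(gamma, z))))
        total += math.log(zip_pmf(y, lam, pi))
    return total
```

Under this model the expected count is (1 - pi) * lam, so exposure variables can shift the number of poor mental health days through the count rate, the excess-zero probability, or both; fitting would maximize `zip_log_likelihood` over `beta` and `gamma`.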

Understanding Disparities in Post Hoc Machine Learning Explanation

Mhasawade, V., Rahman, S., Haskell-Craig, Z., & Chunara, R. (2023).

Publication year

2023

Abstract

Previous work has highlighted that existing post-hoc explanation methods exhibit disparities in explanation fidelity (across 'race' and 'gender' as sensitive attributes), and while a large body of work focuses on mitigating these issues at the explanation metric level, the role of the data generating process and black box model in relation to explanation disparities remains largely unexplored. Accordingly, through both simulations and experiments on a real-world dataset, we specifically assess challenges to explanation disparities that originate from properties of the data: limited sample size, covariate shift, concept shift, omitted variable bias, and challenges based on model properties: inclusion of the sensitive attribute and appropriate functional form. Through controlled simulation analyses, our study demonstrates that increased covariate shift, concept shift, and omission of covariates increase explanation disparities, with the effect more pronounced for neural network models that are better able to capture the underlying functional form in comparison to linear models. We also observe consistent findings regarding the effect of concept shift and omitted variable bias on explanation disparities in the Adult income dataset. Overall, results indicate that disparities in model explanations can also depend on data and model properties. Based on this systematic investigation, we provide recommendations for the design of explanation methods that mitigate undesirable disparities.

When do Minimax-fair Learning and Empirical Risk Minimization Coincide?

Singh, H., Kleindessner, M., Cevher, V., Chunara, R., & Russell, C. (2023).

Publication year

2023

Journal title

International Conference on Machine Learning (ICML)

Volume

202

Page(s)

31969-31989

Abstract

Minimax-fair machine learning minimizes the error for the worst-off group. However, empirical evidence suggests that when sophisticated models are trained with standard empirical risk minimization (ERM), they often have the same performance on the worst-off group as a minimax-trained model. Our work makes this counter-intuitive observation concrete. We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group. We provide additional empirical evidence of how this observation holds on a wide range of datasets and hypothesis classes. Since ERM is fundamentally easier than minimax optimization, our findings have implications on the practice of fair machine learning.
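To make the two objectives in this abstract concrete: ERM minimizes the average loss over all examples, while minimax-fair learning minimizes the average loss of the worst-off group. The toy sketch below uses made-up per-example losses for a few hypothetical candidate models. When the candidate set is restricted, the two criteria pick different models; once a hypothetical "group_aware" candidate that fits each group is available (loosely echoing the abstract's expressiveness condition), both pick it. This is an illustration of the objectives only, not the paper's proof or experimental code.

```python
from collections import defaultdict


def group_risks(losses, groups):
    """Average loss per group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, g in zip(losses, groups):
        sums[g] += loss
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def average_risk(losses):
    """ERM objective: mean loss over all examples."""
    return sum(losses) / len(losses)


def worst_group_risk(losses, groups):
    """Minimax-fair objective: mean loss of the worst-off group."""
    return max(group_risks(losses, groups).values())


def select_models(candidates, groups):
    """Return (ERM choice, minimax choice) from a finite set of candidate models.

    candidates maps a model name to its per-example losses on the same data.
    """
    erm = min(candidates, key=lambda m: average_risk(candidates[m]))
    minimax = min(candidates, key=lambda m: worst_group_risk(candidates[m], groups))
    return erm, minimax


groups = ["a", "a", "a", "b"]
restricted = {"ignores_b": [0.0, 0.0, 0.0, 1.0], "uniform": [0.3, 0.3, 0.3, 0.3]}
print(select_models(restricted, groups))  # ('ignores_b', 'uniform')
expressive = dict(restricted, group_aware=[0.0, 0.0, 0.0, 0.2])
print(select_models(expressive, groups))  # ('group_aware', 'group_aware')
```

The toy case also shows why the minority group matters: with only the restricted candidates, ERM accepts total failure on group "b" because it contributes one example in four to the average.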

Active Linear Regression in the Online Setting via Leverage Score Sampling

Singh, H., Musco, C., & Chunara, R. (2022).

Publication year

2022

Association Between Copayment Amount and Filling of Medications for Angiotensin Receptor Neprilysin Inhibitors in Patients With Heart Failure

Mukhopadhyay, A., Adhikari, S., Li, X., Dodson, J. A., Kronish, I. M., Shah, B., Ramatowski, M., Chunara, R., Kozloff, S., & Blecker, S. (2022).

Publication year

2022

Journal title

Journal of the American Heart Association

Page(s)

e027662

10.1161/JAHA.122.027662

Abstract

BACKGROUND: Angiotensin receptor neprilysin inhibitors (ARNI) reduce mortality and hospitalization for patients with heart failure. However, relatively high copayments for ARNI may contribute to suboptimal adherence, thus potentially limiting their benefits. METHODS AND RESULTS: We conducted a retrospective cohort study within a large, multi-site health system. We included patients with: ARNI prescription between November 20, 2020 and June 30, 2021; diagnosis of heart failure or left ventricular ejection fraction ≤40%; and available pharmacy or pharmacy benefit manager copayment data. The primary exposure was copayment, categorized as $0, $0.01 to $10, $10.01 to $100, and >$100. The primary outcome was prescription fill nonadherence, defined as the proportion of days covered […] $100. Patients with higher copayments had higher rates of nonadherence, ranging from 17.2% for $0 copayment to 34.2% for copayment >$100 (P […] $100 (OR, 2.58 [95% CI, 1.63–4.18], P […]

Association of U.S. birth, duration of residence in the U.S., and atherosclerotic cardiovascular disease risk factors among Asian adults

Al Rifai, M., Kianoush, S., Jain, V., Joshi, P. H., Cainzos-Achirica, M., Nasir, K., Merchant, A. T., Dodani, S., Wong, S. S., Samad, Z., Mehta, A., Chunara, R., Kalra, A., & Virani, S. S. (2022).

Publication year

2022

Journal title

Preventive Medicine Reports

10.1016/j.pmedr.2022.101916

Abstract

Introduction: Prior studies have shown a direct association between U.S. birth and duration of residence with atherosclerotic cardiovascular disease (ASCVD), though few have specifically focused on Asian Americans. Methods: We utilized cross-sectional data from the 2006 to 2015 National Health Interview Survey. We compared prevalent cardiovascular risk factors and ASCVD among Asian American individuals by U.S. birth and duration of time spent in the U.S. Results: The study sample consisted of 18,150 Asian individuals of whom 20.5% were Asian Indian, 20.5% were Chinese, 23.4% were Filipino, and 35.6% were of other Asian ethnic groups. The mean (standard error) age was 43.8 (0.21) years and 53% were women. In multivariable-adjusted logistic regression models, U.S. birth was associated with a higher prevalence odds ratio (95% confidence interval) of current smoking 1.31 (1.07,1.60), physical inactivity 0.62 (0.54,0.72), obesity 2.26 (1.91,2.69), hypertension 1.33 (1.12,1.58), and CAD 1.96 (1.24,3.11), but lower prevalence of stroke 0.28 (0.11,0.71). Spending greater than 15 years in the U.S. was associated with a higher prevalence of current smoking 1.65 (1.24,2.21), obesity 2.33 (1.57,3.47), diabetes 2.68 (1.17,6.15), and hyperlipidemia 1.72 (1.09,2.71). Conclusion: Heterogeneity exists in cardiovascular risk factor burden among Asian Americans according to Asian ethnicity, U.S. birth, and duration of time living in the U.S.

Building Public Health Surveillance 3.0: Emerging Timely Measures of Physical, Economic, and Social Environmental Conditions Affecting Health

Thorpe, L. E., Chunara, R., Roberts, T., Pantaleo, N., Irvine, C., Conderino, S., Li, Y., Hsieh, P. Y., Gourevitch, M. N., Levine, S., Ofrane, R., & Spoer, B. (2022).

Publication year

2022

Journal title

American Journal of Public Health

Volume

112

Issue

10

Page(s)

1436-1445

10.2105/AJPH.2022.306917

Abstract

In response to rapidly changing societal conditions stemming from the COVID-19 pandemic, we summarize data sources with potential to produce timely and spatially granular measures of physical, economic, and social conditions relevant to public health surveillance, and we briefly describe emerging analytic methods to improve small-area estimation. To inform this article, we reviewed published systematic review articles set in the United States from 2015 to 2020 and conducted unstructured interviews with senior content experts in public health practice, academia, and industry. We identified a modest number of data sources with high potential for generating timely and spatially granular measures of physical, economic, and social determinants of health. We also summarized modeling and machine-learning techniques useful to support development of time-sensitive surveillance measures that may be critical for responding to future major events such as the COVID-19 pandemic. (Am J Public Health. 2022;112(10):1436-1445. https://doi.org/10.2105/AJPH.2022.306917).

Discrimination is associated with C-reactive protein among young sexual minority men

Cook, S. H., Slopen, N., Scarimbolo, L., Mirin, N., Wood, E. P., Rosendale, N., Chunara, R., Burke, C. W., & Halkitis, P. N. (2022).

Publication year

2022

Journal title

Journal of Behavioral Medicine

Page(s)

649-657

10.1007/s10865-022-00307-4

Abstract

This report examines associations between everyday discrimination, microaggressions, and CRP to gain insight into potential mechanisms that may underlie increased CVD risk among sexual minority male young adults. The sample consisted of 60 participants taken from the P18 cohort between the ages of 24 and 28 years. Multinomial logistic regression models were used to examine the association between perceived everyday discrimination and LGBQ microaggressions with C-reactive protein cardiovascular risk categories of low-, average-, and high-risk, as defined by the American Heart Association and Centers for Disease Control. Adjustments were made for BMI. Individuals who experienced more everyday discrimination had a higher risk of being classified in the high-risk CRP group compared to the low-risk CRP group (RRR = 3.35, p = 0.02). Interpersonal LGBQ microaggressions were not associated with CRP risk category. Everyday discrimination, but not specific microaggressions based on sexual orientation, was associated with elevated levels of CRP among young sexual minority men (YSMM). Thus, to implement culturally and age-appropriate interventions, further research is needed to critically examine the specific types of discrimination and the resultant impact on YSMM’s health.

Evidence for Telemedicine’s Ongoing Transformation of Health Care Delivery Since the Onset of COVID-19: Retrospective Observational Study

Mandal, S., Wiesenfeld, B. M., Mann, D., Lawrence, K., Chunara, R., Testa, P., & Nov, O. (2022).

Publication year

2022

Journal title

JMIR Formative Research

10.2196/38661

Abstract

Background: The surge of telemedicine use during the early stages of the COVID-19 pandemic has been well documented. However, scarce evidence considers the use of telemedicine in the subsequent period. Objective: This study aims to evaluate use patterns of video-based telemedicine visits for ambulatory care and urgent care provision over the course of recurring pandemic waves in 1 large health system in New York City (NYC) and what this means for health care delivery. Methods: Retrospective electronic health record (EHR) data of patients from January 1, 2020, to February 28, 2022, were used to longitudinally track and analyze telemedicine and in-person visit volumes across ambulatory care specialties and urgent care, as well as compare them to a prepandemic baseline (June-November 2019). Diagnosis codes to differentiate suspected COVID-19 visits from non–COVID-19 visits, as well as evaluating COVID-19–based telemedicine use over time, were compared to the total number of COVID-19–positive cases in the same geographic region (city level). The time series data were segmented based on change-point analysis, and variances in visit trends were compared between the segments. Results: The emergence of COVID-19 prompted an early increase in the number of telemedicine visits across the urgent care and ambulatory care settings. This use continued throughout the pandemic at a much higher level than the prepandemic baseline for both COVID-19 and non–COVID-19 suspected visits, despite the fluctuation in COVID-19 cases throughout the pandemic and the resumption of in-person clinical services. The use of telemedicine-based urgent care services for COVID-19 suspected visits showed more variance in response to each pandemic wave, but telemedicine visits for ambulatory care have remained relatively steady after the initial crisis period. During the Omicron wave, the use of all visit types, including in-person activities, decreased. 
Patients between 25 and 34 years of age were the largest users of telemedicine-based urgent care. Patient satisfaction with telemedicine-based urgent care remained high despite the rapid scaling of services to meet increased demand. Conclusions: The trend of the increased use of telemedicine as a means of health care delivery relative to the pre–COVID-19 baseline has been maintained throughout the later pandemic periods despite fluctuating COVID-19 cases and the resumption of in-person care delivery. Overall satisfaction with telemedicine-based care is also high. The trends in telemedicine use suggest that telemedicine-based health care delivery has become a mainstream and sustained supplement to in-person-based ambulatory care, particularly for younger patients, for both urgent and nonurgent care needs. These findings have implications for the health care delivery system, including practice leaders, insurers, and policymakers. Further investigation is needed to evaluate telemedicine adoption by key demographics, identify ongoing barriers to adoption, and explore the impacts of sustained use of telemedicine on health care outcomes and experience.

Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database

Singh, H., Mhasawade, V., & Chunara, R. (2022).

Publication year

2022

Journal title

PLOS Digital Health

Issue

4 April

10.1371/journal.pdig.0000023

Abstract

Modern predictive models require large amounts of data for training and evaluation, absence of which may result in models that are specific to certain locations, populations in them and clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models vary significantly when applied to hospitals or geographies different from the ones in which they are developed. Further, what characteristics of the datasets explain the performance variation? In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. Generalization gap, defined as difference between model performance metrics across hospitals, is computed for area under the receiver operating characteristic curve (AUC) and calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using a causal discovery algorithm “Fast Causal Inference” that infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st-3rd quartile or IQR; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). Distribution of all variable types (demography, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to the groups. 
Moreover, for developing methods to improve model performance in new environments, a better understanding and documentation of provenance of data and health processes are needed to identify and mitigate sources of variation.
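The abstract's two headline quantities can be sketched in a few lines. Below, AUC is computed by the pairwise (Mann-Whitney) definition, the generalization gap is taken as the source-hospital metric minus the test-hospital metric, and FNR disparity is assumed to be the largest minus the smallest group false negative rate; the abstract does not spell out these exact formulas, so treat them as assumptions. Function names are hypothetical, calibration slope is omitted, and this is not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def false_negative_rate(labels, preds):
    """Share of true positives (label 1) that the model predicted as 0."""
    positive_preds = [p for y, p in zip(labels, preds) if y == 1]
    return sum(1 for p in positive_preds if p == 0) / len(positive_preds)


def fnr_disparity(labels, preds, groups):
    """Largest minus smallest group FNR; assumes every group has at least one positive."""
    rates = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates.append(false_negative_rate([labels[i] for i in idx], [preds[i] for i in idx]))
    return max(rates) - min(rates)


def generalization_gap(source_metric, target_metric):
    """Drop in a performance metric when a model built at one hospital is applied at another."""
    return source_metric - target_metric
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, and the same functions can be applied per test hospital to report the spread of AUC and FNR disparity across sites, as the abstract does with quartiles.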

Impact of COVID-19 forecast visualizations on pandemic risk perceptions

Padilla, L., Hosseinpour, H., Fygenson, R., Howell, J., Chunara, R., & Bertini, E. (2022).

Publication year

2022

Journal title

Scientific Reports

Volume

12

Issue

1

Page(s)

2014

10.1038/s41598-022-05353-1

Abstract

People worldwide use SARS-CoV-2 (COVID-19) visualizations to make life and death decisions about pandemic risks. Understanding how these visualizations influence risk perceptions to improve pandemic communication is crucial. To examine how COVID-19 visualizations influence risk perception, we conducted two experiments online in October and December of 2020 (N = 2549) where we presented participants with 34 visualization techniques (available at the time of publication on the CDC’s website) of the same COVID-19 mortality data. We found that visualizing data using a cumulative scale consistently led to participants believing that they and others were at more risk than before viewing the visualizations. In contrast, visualizing the same data with a weekly incident scale led to variable changes in risk perceptions. Further, uncertainty forecast visualizations also affected risk perceptions, with visualizations showing six or more models increasing risk estimates more than the others tested. Differences between COVID-19 visualizations of the same data produce different risk perceptions, fundamentally changing viewers’ interpretation of information.

Publisher Correction: Impact of COVID-19 forecast visualizations on pandemic risk perceptions

Padilla, L., Hosseinpour, H., Fygenson, R., Howell, J., Chunara, R., & Bertini, E. (2022).

Publication year

2022

Journal title

Scientific Reports

Page(s)

3650

10.1038/s41598-022-07502-y

Abstract

The original version of this Article contained an error in Figure 6 where “No Forecast” was incorrectly given as “No Rorecast”. The original Figure 6 accompanying legend appears below. The original Article has been corrected.

Search Term Identification Methods for Computational Health Communication: Word Embedding and Network Approach for Health Content on YouTube

Tong, C., Margolin, D., Chunara, R., Niederdeppe, J., Taylor, T., Dunbar, N., & King, A. J. (2022).

Publication year

2022

Journal title

JMIR Medical Informatics

10.2196/37862

Abstract

Background: Common methods for extracting content in health communication research typically involve using a set of well-established queries, often names of medical procedures or diseases, that are often technical or rarely used in the public discussion of health topics. Although these methods produce high precision (ie, retrieve highly relevant content), they tend to overlook health messages that feature colloquial language and layperson vocabularies on social media. Given how such messages could contain misinformation or obscure content that circumvents official medical concepts, correctly identifying (and analyzing) them is crucial to the study of user-generated health content on social media platforms. Objective: Health communication scholars would benefit from a retrieval process that goes beyond the use of standard terminologies as search queries. Motivated by this, this study aims to put forward a search term identification method to improve the retrieval of user-generated health content on social media. We focused on cancer screening tests as a subject and YouTube as a platform case study. Methods: We retrieved YouTube videos using cancer screening procedures (colonoscopy, fecal occult blood test, mammogram, and pap test) as seed queries. We then trained word embedding models using text features from these videos to identify the nearest neighbor terms that are semantically similar to cancer screening tests in colloquial language. Retrieving more YouTube videos from the top neighbor terms, we coded a sample of 150 random videos from each term for relevance. We then used text mining to examine the new content retrieved from these videos and network analysis to inspect the relations between the newly retrieved videos and videos from the seed queries. Results: The top terms with semantic similarities to cancer screening tests were identified via word embedding models.
Text mining analysis showed that the 5 nearest neighbor terms retrieved content that was novel and contextually diverse, beyond the content retrieved from cancer screening concepts alone. Results from network analysis showed that the newly retrieved videos had at least one total degree of connection (sum of indegree and outdegree) with seed videos according to YouTube relatedness measures. Conclusions: We demonstrated a retrieval technique to improve recall and minimize precision loss, which can be extended to various health topics on YouTube, a popular video-sharing social media platform. We discussed how health communication scholars can apply the technique to inspect the performance of the retrieval strategy before investing human coding resources and outlined suggestions on how such a technique can be extended to other health contexts.
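The nearest-neighbor step in the Methods above ranks candidate terms by how close their embedding vectors are to a seed query's vector. A minimal sketch, assuming cosine similarity as the closeness measure and using invented two-dimensional toy vectors (the terms "scope", "checkup", and "recipe" and all vector values are hypothetical; in the study the vectors come from word embedding models trained on video text features):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(seed, embeddings, k=5):
    """Return the k terms whose vectors are most cosine-similar to the seed term's vector."""
    target = embeddings[seed]
    scored = [(term, cosine_similarity(target, vec))
              for term, vec in embeddings.items() if term != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:k]]


# Hypothetical toy embeddings for illustration only.
toy_embeddings = {
    "colonoscopy": [1.0, 0.0],
    "scope": [0.9, 0.1],
    "checkup": [0.5, 0.5],
    "recipe": [0.0, 1.0],
}
print(nearest_terms("colonoscopy", toy_embeddings, k=2))  # ['scope', 'checkup']
```

The default `k=5` mirrors the five nearest neighbor terms mentioned in the Results; the returned terms would then serve as new search queries whose retrieved videos are checked for relevance.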

Segmenting across places: The need for fair transfer learning with satellite imagery

Zhang, M., Singh, H., Chok, L., & Chunara, R. (2022).

Publication year

2022

Page(s)

2915-2924

10.1109/CVPRW56347.2022.00329

Abstract

The increasing availability of high-resolution satellite imagery has enabled the use of machine learning to support land-cover measurement and inform policy-making. However, labelling satellite images is expensive and is available for only some locations. This prompts the use of transfer learning to adapt models from data-rich locations to others. Given the potential for high-impact applications of satellite imagery across geographies, a systematic assessment of transfer learning implications is warranted. In this work, we consider the task of land-cover segmentation and study the fairness implications of transferring models across locations. We leverage a large satellite image segmentation benchmark with 5987 images from 18 districts (9 urban and 9 rural). Via fairness metrics we quantify disparities in model performance along two axes: across urban-rural locations and across land-cover classes. Findings show that state-of-the-art models have better overall accuracy in rural areas compared to urban areas, though unsupervised domain adaptation methods transfer learning better to urban versus rural areas and enlarge fairness gaps. In analysis of reasons for these findings, we show that raw satellite images are overall more dissimilar between source and target districts for rural than for urban locations. This work highlights the need to conduct fairness analysis for satellite imagery segmentation models and motivates the development of methods for fair transfer learning in order not to introduce disparities between places, particularly urban and rural locations.

Causal Multi-level Fairness

Mhasawade, V., & Chunara, R. (2021).

Publication year

2021

Page(s)

784-794

10.1145/3461702.3462587

Abstract

Algorithmic systems are known to impact marginalized groups severely, and more so, if all sources of bias are not considered. While work in algorithmic fairness to-date has primarily focused on addressing discrimination due to individually linked attributes, social science research elucidates how some properties we link to individuals can be conceptualized as having causes at macro (e.g. structural) levels, and it may be important to be fair to attributes at multiple levels. For example, instead of simply considering race as a causal, protected attribute of an individual, the cause may be distilled as perceived racial discrimination an individual experiences, which in turn can be affected by neighborhood-level factors. This multi-level conceptualization is relevant to questions of fairness, as it may not only be important to take into account if the individual belonged to another demographic group, but also if the individual received advantaged treatment at the macro-level. In this paper, we formalize the problem of multi-level fairness using tools from causal inference in a manner that allows one to assess and account for effects of sensitive attributes at multiple levels. We show importance of the problem by illustrating residual unfairness if macro-level sensitive attributes are not accounted for, or included without accounting for their multi-level nature. Further, in the context of a real-world task of predicting income based on macro and individual-level attributes, we demonstrate an approach for mitigating unfairness, a result of multi-level sensitive attributes.

Fairness violations and mitigation under covariate shift

Singh, H., Singh, R., Mhasawade, V., & Chunara, R. (2021).

Publication year

2021

Page(s)

3-13

10.1145/3442188.3445865

Abstract

We study the problem of learning fair prediction models for unseen test sets distributed differently from the train set. Stability against changes in data distribution is an important mandate for responsible deployment of models. The domain adaptation literature addresses this concern, albeit with the notion of stability limited to that of prediction accuracy. We identify sufficient conditions under which stable models, both in terms of prediction accuracy and fairness, can be learned. Using the causal graph describing the data and the anticipated shifts, we specify an approach based on feature selection that exploits conditional independencies in the data to estimate accuracy and fairness metrics for the test set. We show that for specific fairness definitions, the resulting model satisfies a form of worst-case optimality. In the context of a healthcare task, we illustrate the advantages of the approach in making more equitable decisions.

Machine learning and algorithmic fairness in public and population health

Mhasawade, V., Zhao, Y., & Chunara, R. (2021).

Publication year

2021

Journal title

Nature Machine Intelligence

Page(s)

659-666

10.1038/s42256-021-00373-4

Abstract

Until now, much of the work on machine learning and health has focused on processes inside the hospital or clinic. However, this represents only a narrow set of tasks and challenges related to health; there is greater potential for impact by leveraging machine learning in health tasks more broadly. In this Perspective we aim to highlight potential opportunities and challenges for machine learning within a holistic view of health and its influences. To do so, we build on research in population and public health that focuses on the mechanisms between different cultural, social and environmental factors and their effect on the health of individuals and communities. We present a brief introduction to research in these fields, data sources and types of tasks, and use these to identify settings where machine learning is relevant and can contribute to new knowledge. Given the key foci of health equity and disparities within public and population health, we juxtapose these topics with the machine learning subfield of algorithmic fairness to highlight specific opportunities where machine learning, public and population health may synergize to achieve health equity.

Social Determinants in Machine Learning Cardiovascular Disease Prediction Models : A Systematic Review

Zhao, Y., Wood, E. P., Mirin, N., Cook, S. H., & Chunara, R. (2021).

Publication year

2021

Journal title

American Journal of Preventive Medicine

Page(s)

596-605

10.1016/j.amepre.2021.04.016

Abstract

Introduction: Cardiovascular disease is the leading cause of death worldwide, and cardiovascular disease burden is increasing in low-resource settings and for lower socioeconomic groups. Machine learning algorithms are being developed rapidly and incorporated into clinical practice for cardiovascular disease prediction and treatment decisions. Significant opportunities for reducing death and disability from cardiovascular disease worldwide lie with accounting for the social determinants of cardiovascular outcomes. This study reviews how social determinants of health are being included in machine learning algorithms to inform best practices for the development of algorithms that account for social determinants. Methods: A systematic review using 5 databases was conducted in 2020. English language articles from any location published from inception to April 10, 2020, which reported on the use of machine learning for cardiovascular disease prediction that incorporated social determinants of health, were included. Results: Most studies that compared machine learning algorithms and regression showed increased performance of machine learning, and most studies that compared performance with or without social determinants of health showed increased performance with them. The most frequently included social determinants of health variables were gender, race/ethnicity, marital status, occupation, and income. Studies were largely from North America, Europe, and China, limiting the diversity of the included populations and variance in social determinants of health. Discussion: Given their flexibility, machine learning approaches may provide an opportunity to incorporate the complex nature of social determinants of health. The limited variety of sources and data in the reviewed studies emphasizes that there is an opportunity to include more social determinants of health variables, especially environmental ones, that are known to impact cardiovascular disease risk, and that recording such data in electronic databases will enable their use.

Telemedicine and healthcare disparities : a cohort study in a large healthcare system in New York City during COVID-19

Chunara, R., Zhao, Y., Chen, J., Lawrence, K., Testa, P. A., Nov, O., & Mann, D. M. (2021).

Publication year

2021

Journal title

Journal of the American Medical Informatics Association : JAMIA

Page(s)

33-41

10.1093/jamia/ocaa217

Abstract

OBJECTIVE: Through the coronavirus disease 2019 (COVID-19) pandemic, telemedicine became a necessary entry point into the process of diagnosis, triage, and treatment. Racial and ethnic disparities in healthcare have been well documented in COVID-19 with respect to risk of infection and in-hospital outcomes once admitted, and here we assess disparities in those who access healthcare via telemedicine for COVID-19. MATERIALS AND METHODS: Electronic health record data of patients at New York University Langone Health between March 19 and April 30, 2020 were used to conduct descriptive and multilevel regression analyses with respect to visit type (telemedicine or in-person), suspected COVID diagnosis, and COVID test results. RESULTS: Controlling for individual and community-level attributes, Black patients had 0.6 times the adjusted odds (95% CI: 0.58-0.63) of accessing care through telemedicine compared to white patients, though they are increasingly accessing telemedicine for urgent care, driven by a younger and female population. COVID diagnoses were significantly more likely for Black versus white telemedicine patients. DISCUSSION: There are disparities for Black patients accessing telemedicine; however, uptake is increasing among young, female Black patients. Mean income and decreased mean household size of a zip code were also significantly related to telemedicine use. CONCLUSION: Telemedicine access disparities reflect those in in-person healthcare access. Roots of disparate use are complex and reflect individual, community, and structural factors, including their intersection, many of which are due to systemic racism. Evidence regarding disparities that manifest through telemedicine can be used to inform tool design and systemic efforts to promote digital health equity.
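For readers unfamiliar with odds ratios, the sketch below computes a crude (unadjusted) odds ratio and a Wald 95% confidence interval from a 2x2 table. It is only an illustration of the quantity being reported. The study's 0.6 figure is an adjusted odds ratio from multilevel regression controlling for individual and community-level attributes, which this sketch does not reproduce. The function name and the counts in the usage example are hypothetical.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Crude (unadjusted) odds ratio for a 2x2 table with a Wald confidence interval.

    a: group 1 with outcome      b: group 1 without outcome
    c: group 2 with outcome      d: group 2 without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    or_ = (a * d) / (b * c)
    # Large-sample standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: group 1 = 30 telemedicine / 70 in-person visits,
    # group 2 = 40 telemedicine / 60 in-person visits.
    or_, lower, upper = odds_ratio_wald(30, 70, 40, 60)
    print(f"OR = {or_:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

An odds ratio of 0.6 means the odds of a telemedicine visit for the first group are 60% of the odds for the comparison group; it is not the same as a 40% lower probability.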

Uncertainty as a Form of Transparency : Measuring, Communicating, and Using Uncertainty

Bhatt, U., Antorán, J., Zhang, Y., Liao, Q. V., Sattigeri, P., Fogliato, R., Melançon, G., Krishnan, R., Stanley, J., Tickoo, O., Nachman, L., Chunara, R., Srikumar, M., Weller, A., & Xiang, A. (2021).

Publication year

2021

Page(s)

401-413

10.1145/3461702.3462571

Abstract

Algorithmic transparency entails exposing system properties to various stakeholders for purposes that include understanding, improving, and contesting predictions. Until now, research into algorithmic transparency has focused predominantly on explainability. Explainability attempts to provide reasons for a machine learning model's behavior to stakeholders. However, understanding a model's specific behavior alone might not be enough for stakeholders to gauge whether the model is wrong or lacks sufficient knowledge to solve the task at hand. In this paper, we argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions. First, we discuss methods for assessing uncertainty. Then, we characterize how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems. Finally, we outline methods for displaying uncertainty to stakeholders and recommend how to collect information required for incorporating uncertainty into existing ML pipelines. This work constitutes an interdisciplinary review drawn from literature spanning machine learning, visualization/HCI, design, decision-making, and fairness. We aim to encourage researchers and practitioners to measure, communicate, and use uncertainty as a form of transparency.
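As one example of "assessing uncertainty," the sketch below shows a widely used entropy-based decomposition for an ensemble of classifiers: total predictive entropy splits into an aleatoric part (average entropy of each member) and an epistemic part (disagreement between members). This is a common technique from the uncertainty literature, offered as an illustration; the function names and probability values are assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution; 0*log(0) treated as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def decompose_uncertainty(member_probs: Sequence[Sequence[float]]) -> tuple[float, float, float]:
    """Split ensemble predictive uncertainty into (total, aleatoric, epistemic).

    member_probs[m][k] is the probability of class k from ensemble member m.
    """
    m = len(member_probs)
    k = len(member_probs[0])
    # Average the members' class probabilities to get the ensemble prediction.
    mean = [sum(member[j] for member in member_probs) / m for j in range(k)]
    total = entropy(mean)
    # Expected entropy of individual members: noise the members agree on.
    aleatoric = sum(entropy(member) for member in member_probs) / m
    # Remainder (mutual information): uncertainty from member disagreement.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


if __name__ == "__main__":
    # Both members unsure in the same way: all uncertainty is aleatoric.
    print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
    # Members confident but disagreeing: all uncertainty is epistemic.
    print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

Both cases have the same total uncertainty (ln 2), yet they call for different responses, which is why communicating the source of uncertainty can matter to stakeholders.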

Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data : Observational Study

Daughton, A. R., Chunara, R., & Paul, M. J. (2020).

Publication year

2020

Journal title

JMIR Public Health and Surveillance

10.2196/14986

Abstract

Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).
Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.
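The AUC values above are easiest to read through the rank interpretation of AUC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. An AUC of 0.50 is chance level, so 0.51 and 0.53 indicate almost no discriminative signal. The pairwise sketch below is an illustration written for clarity, not the study's code; for large datasets, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive/negative pair (O(n^2), fine for small illustrations).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: 3 of the 4 positive/negative pairs are ranked correctly.
    print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
    # A model that gives everyone the same score is at chance level.
    print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5
```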

COVID-19 transforms health care through telemedicine : Evidence from the field

Mann, D. M., Chen, J., Chunara, R., Testa, P. A., & Nov, O. (2020).

Publication year

2020

Journal title

Journal of the American Medical Informatics Association

Page(s)

1132-1135

10.1093/jamia/ocaa072

Abstract

This study provides data on the feasibility of video-enabled telemedicine use among patients and providers and its impact on urgent and nonurgent healthcare delivery from one large health system (NYU Langone Health) at the epicenter of the coronavirus disease 2019 (COVID-19) outbreak in the United States. Between March 2 and April 14, 2020, telemedicine visits increased from 102.4 daily to 801.6 daily (a 683% increase) in urgent care after the system-wide expansion of virtual urgent care staff in response to COVID-19. Of all virtual visits after the expansion, 56.2% of urgent visits and 17.6% of nonurgent visits were COVID-19-related. Telemedicine usage was highest among patients 20 to 44 years of age, particularly for urgent care. The COVID-19 pandemic has driven rapid expansion of telemedicine use for urgent and nonurgent care visits beyond baseline periods. This reflects an important change in telemedicine that other institutions facing the COVID-19 pandemic should anticipate.
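The 683% figure follows from the two daily averages reported above as a relative increase; equivalently, daily visits rose to about 7.8 times the earlier level:

```latex
\text{relative increase}
  = \frac{801.6 - 102.4}{102.4} \times 100\%
  = \frac{699.2}{102.4} \times 100\%
  \approx 682.8\% \approx 683\%,
\qquad
\frac{801.6}{102.4} \approx 7.83
```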

Publications

Structural racism and homophobia evaluated through social media sentiment combined with activity spaces and associations with mental health among young sexual minority men

Understanding Disparities in Post Hoc Machine Learning Explanation

When do Minimax-fair Learning and Empirical Risk Minimization Coincide

When do Minimax-fair Learning and Empirical Risk Minimization Coincide?

Active Linear Regression in the Online Setting via Leverage Score Sampling

Association Between Copayment Amount and Filling of Medications for Angiotensin Receptor Neprilysin Inhibitors in Patients With Heart Failure

Association of U.S. birth, duration of residence in the U.S., and atherosclerotic cardiovascular disease risk factors among Asian adults

Building Public Health Surveillance 3.0 : Emerging Timely Measures of Physical, Economic, and Social Environmental Conditions Affecting Health

Discrimination is associated with C-reactive protein among young sexual minority men

Evidence for Telemedicine’s Ongoing Transformation of Health Care Delivery Since the Onset of COVID-19 : Retrospective Observational Study

Generalizability challenges of mortality risk prediction models : A retrospective analysis on a multi-center database

Impact of COVID-19 forecast visualizations on pandemic risk perceptions

Publisher Correction : Impact of COVID-19 forecast visualizations on pandemic risk perceptions

Publisher Correction : Impact of COVID-19 forecast visualizations on pandemic risk perceptions (Scientific Reports, (2022), 12, 1, (2014), 10.1038/s41598-022-05353-1)
