Rumi Chunara
Associate Professor of Biostatistics
Associate Professor of Computer Science and Engineering, Tandon
Director of Center for Health Data Science
-
Professional overview
-
The overarching goal of Dr. Rumi Chunara's research is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. She focuses on the design and development of data mining and machine learning methods to address challenges related to data and goals of public health, as well as fairness and ethics in the design and use of data and algorithms embedded in social systems.
At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analyses and machine learning, to study population health. Previously, she was a Postdoctoral Fellow and Instructor at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. She completed her PhD at the Harvard-MIT Division of Health Sciences and Technology and BSc at Caltech.
-
Education
-
BS, Electrical Engineering (Honors), Caltech
MS, Electrical Engineering and Computer Science, MIT
PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)
-
Honors and awards
-
Max Planck Sabbatical Award (2021)
Speaker at NSF Computer and Information Science and Engineering Directorate Career Proposal Writing Workshop (2020)
Invited tutorial on Public Health and Machine Learning at ACM Conference on Health, Inference and Learning (2020)
Keynote at Human Computation and Crowdsourcing (2019)
Invited Speaker at Expert Group Meeting at United Nations Population Fund, Advances in Mobile Technologies for Data Collection Panel (2019)
Keynote at "Mapping the Equity Dimensions of Artificial Intelligence in Public Health", University of Toronto (2019)
Facebook Research Award (2019)
Gates Foundation Grand Challenges Exploration Award (2019)
NSF CAREER Award (2019)
MIT Technology Review Top 35 Innovators Under 35 (2014)
MIT Presidential Fellow (2004)
-
Areas of research and study
-
Health Disparities
Machine learning
Social Computing
Social Determinants of Health
-
Publications
Methodological Improvements in Social Vulnerability Index Construction Reinforce Role of Wealth Across International Contexts
Chunara, R., Paul, R., Reid, S., Vieira, C. C., Wolfe, C., Zhange, Y., Zhao, Y., & Chunara, R. (n.d.).
Publication year: 2023
Journal title: MPIDR Working Papers

Mitigating Urban-Rural Disparities in Contrastive Representation Learning with Satellite Imagery
Chunara, R., Zhang, M., & Chunara, R. (n.d.).
Publication year: 2024
Volume: 7
Page(s): 1723-1734

Monitoring Influenza Epidemics in China with Search Query from Baidu
Yuan, Q., Nsoesie, E. O., Lv, B., Peng, G., Chunara, R., & Brownstein, J. S. (n.d.).
Publication year: 2013
Journal title: PLoS ONE
Volume: 8
Issue: 5
Abstract: Several approaches have been proposed for near real-time detection and prediction of the spread of influenza. These include search query data for influenza-related terms, which has been explored as a tool for augmenting traditional surveillance methods. In this paper, we present a method that uses Internet search query data from Baidu to model and monitor influenza activity in China. The objectives of the study are to present a comprehensive technique for: (i) keyword selection, (ii) keyword filtering, (iii) index composition and (iv) modeling and detection of influenza activity in China. Sequential time-series for the selected composite keyword index is significantly correlated with Chinese influenza case data. In addition, one-month ahead prediction of influenza cases for the first eight months of 2012 has a mean absolute percent error less than 11%. To our knowledge, this is the first study on the use of search query data from Baidu in conjunction with this approach for estimation of influenza activity in China.

National cervical cancer burden estimation through systematic review and analysis of publicly available data in Pakistan
Chughtai, N., Perveen, K., Gillani, S. R., Abbas, A., Chunara, R., Manji, A. A., Karani, S., Noorali, A. A., Zakaria, M., Shamsi, U., Chishti, U., Khan, A. A., Soofi, S., Pervez, S., & Samad, Z. (n.d.).
Publication year: 2023
Journal title: BMC Public Health
Volume: 23
Issue: 1
Abstract: Background: Cervical cancer is a major cause of cancer-related deaths among women worldwide. Paucity of data on cervical cancer burden in countries like Pakistan hampers requisite resource allocation. Objective: To estimate the burden of cervical cancer in Pakistan using available data sources. Methods: We performed a systematic review to identify relevant data on Pakistan between 1995 and 2022. Study data identified through the systematic review that provided enough information to allow age specific incidence rates and age standardized incidence rates (ASIR) calculations for cervical cancer were merged. Population at risk estimates were derived and adjusted for important variables in the care-seeking pathway. The calculated ASIRs were applied to 2020 population estimates to estimate the number of cervical cancer cases in Pakistan. Results: A total of 13 studies reported ASIRs for cervical cancer for Pakistan. Among the studies selected, the Karachi Cancer Registry reported the highest disease burden estimates for all reported time periods: 1995–1997 ASIR = 6.81, 1998–2002 ASIR = 7.47, and 2017–2019 ASIR = 6.02 per 100,000 women. Using data from Karachi, Punjab and Pakistan Atomic Energy Cancer Registries from 2015–2019, we derived an unadjusted ASIR for cervical cancer of 4.16 per 100,000 women (95% UI 3.28, 5.28). Varying model assumptions produced adjusted ASIRs ranging from 5.2 to 8.4 per 100,000 women. We derived an adjusted ASIR of 7.60 (95% UI 5.98, 10.01) and estimated 6166 (95% UI 4833, 8305) new cases of cervical cancer per year. Conclusion: The estimated cervical cancer burden in Pakistan is higher than the WHO target. Estimates are sensitive to health seeking behavior, and appropriate physician diagnostic intervention, factors that are relevant to the case of cervical cancer, a stigmatized disease in a low-lower middle income country setting. These estimates make the case for approaching cervical cancer elimination through a multi-pronged strategy.

Neighborhood-Level Socioeconomic Status and Prescription Fill Patterns Among Patients With Heart Failure
Mukhopadhyay, A., Blecker, S., Li, X., Kronish, I. M., Chunara, R., Zheng, Y., Lawrence, S., Dodson, J. A., Kozloff, S., & Adhikari, S. (n.d.).
Publication year: 2023
Journal title: JAMA Network Open
Volume: 6
Issue: 12
Page(s): e2347519
Abstract: Importance: Medication nonadherence is common among patients with heart failure with reduced ejection fraction (HFrEF) and can lead to increased hospitalization and mortality. Patients living in socioeconomically disadvantaged areas may be at greater risk for medication nonadherence due to barriers such as lower access to transportation or pharmacies. Objective: To examine the association between neighborhood-level socioeconomic status (nSES) and medication nonadherence among patients with HFrEF and to assess the mediating roles of access to transportation, walkability, and pharmacy density. Design, Setting, and Participants: This retrospective cohort study was conducted between June 30, 2020, and December 31, 2021, at a large health system based primarily in New York City and surrounding areas. Adult patients with a diagnosis of HF, reduced EF on echocardiogram, and a prescription of at least 1 guideline-directed medical therapy (GDMT) for HFrEF were included. Exposure: Patient addresses were geocoded, and nSES was calculated using the Agency for Healthcare Research and Quality SES index, which combines census-tract level measures of poverty, rent burden, unemployment, crowding, home value, and education, with higher values indicating higher nSES. Main Outcomes and Measures: Medication nonadherence was obtained through linkage of health record prescription data with pharmacy fill data and was defined as proportion of days covered (PDC) of less than 80% over 6 months, averaged across GDMT medications. Results: Among 6247 patients, the mean (SD) age was 73 (14) years, and the majority were male (4340 [69.5%]). There were 1011 (16.2%) Black participants, 735 (11.8%) Hispanic/Latinx participants, and 3929 (62.9%) White participants. Patients in lower nSES areas had higher rates of nonadherence, ranging from 51.7% in the lowest quartile (731 of 1086 participants) to 40.0% in the highest quartile (563 of 1086 participants) (P < .001). In adjusted analysis, patients living in the lower 2 nSES quartiles had significantly higher odds of nonadherence when compared with patients living in the highest nSES quartile (quartile 1: odds ratio [OR], 1.57 [95% CI, 1.35-1.83]; quartile 2: OR, 1.35 [95% CI, 1.16-1.56]). No mediation by access to transportation and pharmacy density was found, but a small amount of mediation by neighborhood walkability was observed. Conclusions and Relevance: In this retrospective cohort study of patients with HFrEF, living in a lower nSES area was associated with higher rates of GDMT nonadherence. These findings highlight the importance of considering neighborhood-level disparities when developing approaches to improve medication adherence.

Network inference from multimodal data: A review of approaches from infectious disease transmission
Ray, B., Ghedin, E., & Chunara, R. (n.d.).
Publication year: 2016
Journal title: Journal of Biomedical Informatics
Volume: 64
Page(s): 44-54
Abstract: Network inference problems are commonly found in multiple biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions ranging from those between molecular biomarkers, neurons, and microbial communities, to those found in human or animal populations. Recent technological advances have resulted in an increasing amount of healthcare data in multiple modalities, increasing the preponderance of network inference problems. Multi-domain data can now be used to improve the robustness and reliability of recovered networks from unimodal data. For infectious diseases in particular, there is a body of knowledge that has been focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has demonstrated greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data) also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods as this enables recovering multiple parameters simultaneously, for example, not just the disease transmission network, but also parameters of epidemic dynamics. Our review studies their assumptions, key inference parameters and limitations, and ultimately provides insights about improving future network inference methods in multiple applications.

New data paradigms: From the crowd and back
Chunara, R. (n.d.). (J.-Y. Nie, Z. Obradovic, T. Suzumura, R. Ghosh, R. Nambiar, C. Wang, H. Zang, R. Baeza-Yates, X. Hu, J. Kepner, A. Cuzzocrea, J. Tang, & M. Toyoda, Eds.).
Publication year: 2017
Page(s): 3979-3980
Abstract: Knowledge generation from citizens is becoming both more feasible and more important. Data directly from individuals can be critical as it can add information beyond what is available otherwise. Crowdsourced data is also very amenable to open data efforts given the nature of its generation. In this talk I will describe several efforts in which we are generating crowdsourced knowledge from open data and using it to more readily improve knowledge in public health.

New technologies for reporting real-time emergent infections
Chunara, R., Freifeld, C. C., & Brownstein, J. S. (n.d.).
Publication year: 2012
Journal title: Parasitology
Volume: 139
Issue: 14
Page(s): 1843-1851
Abstract: Novel technologies have prompted a new paradigm in disease surveillance. Advances in computation, communications and materials enable new technologies such as mobile phones and microfluidic chips. In this paper we illustrate examples of new technologies that can augment disease detection. We describe technologies harnessing the internet, mobile phones, point of care diagnostic tools and methods that facilitate detection from passively collected unstructured data. We demonstrate how these can all assist in quicker detection, investigation and response to emerging infectious events. Novel technologies enable collection and dissemination of epidemic intelligence data to both public health practitioners and the general public, enabling finer temporal and spatial resolution of disease monitoring than through traditional public health processes.

Online reporting for malaria surveillance using micro-monetary incentives, in urban India 2010-2011
Chunara, R., Chhaya, V., Bane, S., Mekaru, S. R., Chan, E. H., Freifeld, C. C., & Brownstein, J. S. (n.d.).
Publication year: 2012
Journal title: Malaria Journal
Volume: 11
Abstract: Background: The objective of this study was to investigate the use of novel surveillance tools in a malaria endemic region where prevalence information is limited. Specifically, online reporting for participatory epidemiology was used to gather information about malaria spread directly from the public. Individuals in India were incentivized to self-report their recent experience with malaria by micro-monetary payments. Methods: Self-reports about malaria diagnosis status and related information were solicited online via Amazon's Mechanical Turk. Responders were paid $0.02 to answer survey questions regarding their recent experience with malaria. Timing of the peak volume of weekly self-reported malaria diagnosis in 2010 was compared to other available metrics such as the volume over time of and information about the epidemic from media sources. Distribution of Plasmodium species reports were compared with values from the literature. The study was conducted in summer 2010 during a malaria outbreak in Mumbai and expanded to other cities during summer 2011, and prevalence from self-reports in 2010 and 2011 was contrasted. Results: Distribution of Plasmodium species diagnosis through self-report in 2010 revealed 59% for Plasmodium vivax, which is comparable to literature reports of the burden of P. vivax in India (between 50 and 69%). Self-reported Plasmodium falciparum diagnosis was 19% during the 2010 outbreak, and the estimated burden was between 10 and 15%. Prevalence between 2010 and 2011 via self-reports decreased significantly from 36.9% to 19.54% in Mumbai (p = 0.001), and official reports also confirmed a prevalence decrease in 2011. Conclusions: With careful study design, micro-monetary incentives and online reporting are a rapid way to solicit malaria, and potentially other public health information. This methodology provides a cost-effective way of executing a field study that can act as a complement to traditional public health surveillance methods, offering an opportunity to obtain information about malaria activity, temporal progression, demographics affected or Plasmodium-specific diagnosis at a finer resolution than official reports can provide. The recent adoption of technologies, such as the Internet, supports self-reporting mediums, and self-reporting should continue to be studied as it can foster preventative health behaviours.

Participatory disease surveillance in Latin America
Johansson, M., Wojcik, O., Chunara, R., Smolinski, M., & Brownstein, J. (n.d.).
Publication year: 2013
Page(s): 695-696
Abstract: Participatory disease surveillance systems are dynamic, sensitive, and accurate. They also offer an opportunity to directly connect the public to public health. Implementing them in Latin America requires targeting multiple acute febrile illnesses, designing a system that is appropriate and scalable, and developing local strategies for encouraging participation.

Participatory epidemiology: Use of mobile phones for community-based health reporting
Freifeld, C. C., Chunara, R., Mekaru, S. R., Chan, E. H., Kass-Hout, T., Iacucci, A. A., & Brownstein, J. S. (n.d.).
Publication year: 2010
Journal title: PLoS Medicine
Volume: 7
Issue: 12

Phased array systems in silicon
Chunara, R., Hajimiri, A., Komijani, A., Natarajan, A., Chunara, R., Guan, X., & Hashemi, H. (n.d.).
Publication year: 2004
Journal title: IEEE Communications Magazine
Volume: 42
Issue: 8
Page(s): 122-130
Abstract: Phased array systems, a special case of MIMO systems, take advantage of spatial directivity and array gain to increase spectral efficiency. Implementing a phased array system at high frequency in a commercial silicon process technology presents several challenges. This article focuses on the architectural and circuit-level trade-offs involved in the design of the first silicon-based fully integrated phased array system operating at 24 GHz. The details of some of the important circuit building blocks are also discussed. The measured results demonstrate the feasibility of using integrated phased arrays for wireless communication and vehicular radar applications at 24 GHz.

Population-aware hierarchical Bayesian domain adaptation via multi-component invariant learning
Mhasawade, V., Rehman, N. A., & Chunara, R. (n.d.).
Publication year: 2020
Page(s): 182-192
Abstract: While machine learning is rapidly being developed and deployed in health settings such as influenza prediction, there are critical challenges in using data from one environment to predict in another due to variability in features. Even within disease labels there can be differences (e.g. "fever" may mean something different reported in a doctor's office versus in an online app). Moreover, models are often built on passive, observational data which contain different distributions of population subgroups (e.g. men or women). Thus, there are two forms of instability between environments in this observational transport problem. We first harness substantive knowledge from health research to conceptualize the underlying causal structure of this problem in a health outcome prediction task. Based on sources of stability in the model and the task, we posit that we can combine environment and population information in a novel population-aware hierarchical Bayesian domain adaptation framework that harnesses multiple invariant components through population attributes when needed. We study the conditions under which invariant learning fails, leading to reliance on the environment-specific attributes. Experimental results for an influenza prediction task on four datasets gathered from different contexts show the model can improve prediction in the case of largely unlabelled target data from a new environment and different constituent population, by harnessing both environment and population invariant information. This work represents a novel, principled way to address a critical challenge by blending domain (health) knowledge and algorithmic innovation. The proposed approach will have significant impact in many social settings wherein who the data comes from and how it was generated, matters.

Prevalence of familial hypercholesterolemia in a country-wide laboratory network in Pakistan: 10-year data from 988,306 patients
Farhad, A., Noorali, A. A., Tajuddin, S., Khan, S. D., Ali, M., Chunara, R., Khan, A. H., Zafar, A., Merchant, A., Bokhari, S. S., Virani, S. S., & Samad, Z. (n.d.).
Publication year: 2023
Journal title: Progress in Cardiovascular Diseases
Volume: 79
Page(s): 19-27
Abstract: Introduction: Familial hypercholesterolemia (FH) is a modifiable risk factor for premature coronary heart disease but is poorly diagnosed and treated. We leveraged a large laboratory network in Pakistan to study the prevalence, gender and geographic distribution of FH. Methodology: Data were curated from the Aga Khan University Hospital clinical laboratories, which comprise 289 laboratories and collection points spread over 94 districts. Clinically ordered lipid profiles from 1st January 2009 to 30th June 2018 were included and data on 1,542,281 LDL-C values was extracted. We used the Make Early Diagnosis to Prevent Early Death (MEDPED) criteria to classify patients as FH and reported data on patients with low-density lipoprotein-cholesterol (LDL-C) ≥ 190 mg/dL. FH cases were also examined by their spatial distribution. Results: After applying exclusions, the final sample included 988,306 unique individuals, of which 24,273 individuals (1:40) had LDL-C values of ≥190 mg/dL. Based on the MEDPED criteria, 2416 individuals (1:409) had FH. FH prevalence was highest in individuals 10–19 years (1:40) and decreased as the patient age increased. Among individuals ≥40 years, the prevalence of FH was higher for females compared with males (1:755 vs 1:1037, p < 0.001). Median LDL-C for the overall population was 112 mg/dL (IQR = 88-136 mg/dL). The highest prevalence after removing outliers was observed in Rajan Pur district (1.23% [0.70–2.10%]) in Punjab province, followed by Mardan (1.18% [0.80–1.70%]) in Khyber Pakhtunkhwa province, and Okara (0.99% [0.50–1.80%]) in Punjab province. Conclusion: There is high prevalence of actionable LDL-C values in lipid samples across a large network of laboratories in Pakistan. Variable FH prevalence across geographic locations in Pakistan may need to be explored at the population level for intervention and management of contributory factors. Efforts at early diagnosis and treatment of FH are urgently needed.

Preventing Pandemics Via International Development: A Systems Approach
Bogich, T. L., Chunara, R., Scales, D., Chan, E., Pinheiro, L. C., Chmura, A. A., Carroll, D., Daszak, P., & Brownstein, J. S. (n.d.).
Publication year: 2012
Journal title: PLoS Medicine
Volume: 9
Issue: 12

Public health for the people: Participatory infectious disease surveillance in the digital age
Wójcik, O. P., Brownstein, J. S., Chunara, R., & Johansson, M. A. (n.d.).
Publication year: 2014
Journal title: Emerging Themes in Epidemiology
Volume: 11
Issue: 1
Abstract: The 21st century has seen the rise of Internet-based participatory surveillance systems for infectious diseases. These systems capture voluntarily submitted symptom data from the general public and can aggregate and communicate that data in near real-time. We reviewed participatory surveillance systems currently running in 13 different countries. These systems have a growing evidence base showing a high degree of accuracy and increased sensitivity and timeliness relative to traditional healthcare-based systems. They have also proven useful for assessing risk factors, vaccine effectiveness, and patterns of healthcare utilization while being less expensive, more flexible, and more scalable than traditional systems. Nonetheless, they present important challenges including biases associated with the population that chooses to participate, difficulty in adjusting for confounders, and limited specificity because of reliance only on syndromic definitions of disease. Overall, participatory disease surveillance data provides unique disease information that is not available through traditional surveillance sources.

Publisher Correction: Impact of COVID-19 forecast visualizations on pandemic risk perceptions
Padilla, L., Hosseinpour, H., Fygenson, R., Howell, J., Chunara, R., & Bertini, E. (n.d.).
Publication year: 2022
Journal title: Scientific Reports
Volume: 12
Issue: 1
Page(s): 3650
Abstract: The original version of this Article contained an error in Figure 6 where “No Forecast” was incorrectly given as “No Rorecast”. The original Figure 6 accompanying legend appears below. The original Article has been corrected.

Publisher Correction: Impact of COVID-19 forecast visualizations on pandemic risk perceptions (Scientific Reports, (2022), 12, 1, (2014), 10.1038/s41598-022-05353-1)
Padilla, L., Hosseinpour, H., Fygenson, R., Howell, J., Chunara, R., & Bertini, E. (n.d.).
Publication year: 2022
Journal title: Scientific Reports
Volume: 12
Issue: 1
Abstract: The original version of this Article contained an error in Figure 6 where “No Forecast” was incorrectly given as “No Rorecast”. The original Figure 6 accompanying legend appears below. The original Article has been corrected.

Quantifying depression-related language on social media during the COVID-19 pandemic
Davis, B. D., McKnight, D. E., Teodorescu, D., Quan-Haase, A., Chunara, R., Fyshe, A., & Lizotte, D. J. (n.d.).
Publication year: 2020
Journal title: International Journal of Population Data Science
Volume: 5
Issue: 4
Abstract: Introduction: The COVID-19 pandemic had clear impacts on mental health. Social media presents an opportunity for assessing mental health at the population level. Objectives: 1) Identify and describe language used on social media that is associated with discourse about depression. 2) Describe the associations between identified language and COVID-19 incidence over time across several geographies. Methods: We create a word embedding based on the posts in Reddit's /r/Depression and use this word embedding to train representations of active authors. We contrast these authors against a control group and extract keywords that capture differences between the two groups. We filter these keywords for face validity and to match character limits of an information retrieval system, Elasticsearch. We retrieve all geo-tagged posts on Twitter from April 2019 to June 2021 from Seattle, Sydney, Mumbai, and Toronto. The tweets are scored with BM25 using the keywords. We call this score rDD. We compare changes in average score over time with case counts from the pandemic's beginning through June 2021. Results: We observe a pattern in rDD across all cities analyzed: There is an increase in rDD near the start of the pandemic which levels off over time. However, in Mumbai we also see an increase aligned with a second wave of cases. Conclusions: Our results are concordant with other studies which indicate that the impact of the pandemic on mental health was highest initially and was followed by recovery, largely unchanged by subsequent waves. However, in the Mumbai data we observed a substantial rise in rDD with a large second wave. Our results indicate possible un-captured heterogeneity across geographies, and point to a need for a better understanding of this differential impact on mental health.

Quantifying greenspace using deep learning in Karachi, Pakistan
Chunara, R., Zhang, M., Arshad, H., Abbas, M., Jehanzeb, H., Tahir, I., Hassan, J., Chunara, R., & others. (n.d.).
Publication year: 2023

Quantifying greenspace with satellite images in Karachi, Pakistan using a new data augmentation paradigm
Chunara, R., Zhang, M., Arshad, H., Abbas, M., Jehanzeb, H., Tahir, I., Hassan, J., Samad, Z., & Chunara, R. (n.d.).
Publication year: 2025
Journal title: ACM Journal on Computing and Sustainable Societies

Quantifying the localized relationship between vector containment activities and dengue incidence in a real-world setting: A spatial and time series modelling analysis based on geo-located data from Pakistan
Rehman, N. A., Salje, H., Kraemer, M. U., Subramanian, L., Saif, U., & Chunara, R. (n.d.).
Publication year: 2020
Journal title: PLoS Neglected Tropical Diseases
Volume: 14
Issue: 5
Page(s): 1-22
Abstract: Increasing urbanization is having a profound effect on infectious disease risk, posing significant challenges for governments to allocate limited resources for their optimal control at a sub-city scale. With recent advances in data collection practices, empirical evidence about the efficacy of highly localized containment and intervention activities, which can lead to optimal deployment of resources, is possible. However, there are several challenges in analyzing data from such real-world observational settings. Using data on 3.9 million instances of seven dengue vector containment activities collected between 2012 and 2017, here we develop and assess two frameworks for understanding how the generation of new dengue cases changes in space and time with respect to application of different types of containment activities. Accounting for the non-random deployment of each containment activity in relation to dengue cases and other types of containment activities, as well as deployment of activities in different epidemiological contexts, results from both frameworks reinforce existing knowledge about the efficacy of containment activities aimed at the adult phase of the mosquito lifecycle. Results show a 10% (95% CI: 1–19%) and 20% (95% CI: 4–34%) reduction in probability of a case occurring in 50 meters and 30 days of cases which had Indoor Residual Spraying (IRS) and fogging performed in the immediate vicinity, respectively, compared to cases of similar epidemiological context and which had no containment in their vicinity. Simultaneously, limitations due to the real-world nature of activity deployment are used to guide recommendations for future deployment of resources during outbreaks as well as data collection practices. Conclusions from this study will enable more robust and comprehensive analyses of localized containment activities in resource-scarce urban settings and lead to improved allocation of resources of government in an outbreak setting.

Quantitative methods for measuring neighborhood characteristics in neighborhood health research
Duncan, D. T., Goedel, W. C., & Chunara, R. (n.d.).
Publication year: 2018
Page(s): 57-90

Quasi-experimental designs for assessing response on social media to policy changes
Tian, Y., & Chunara, R. (n.d.).
Publication year: 2020
Page(s): 671-682
Abstract: Regulation of tobacco products is rapidly evolving. Understanding public sentiment in response to changes is very important as authorities assess how to effectively protect population health. Social media systems are widely recognized to be useful for collecting data about human preferences and perceptions. However, how social media data may be used in rapid policy change settings, given the challenges of narrow time periods, specific locations and the non-representative population using social media, is an open question. In this paper we apply quasi-experimental designs, which have been used previously in observational data such as social media, to control for time and location confounders on social media, and then use content analysis of Twitter and Reddit posts to illustrate the content of reactions to tobacco flavor bans and the effect of taxation on e-cigarettes. Conclusions distill the potential role of social media in settings of rapidly changing regulation, in complement to what is learned by traditional denominator-based representative surveys.

Race, ethnicity and national origin-based discrimination in social media and hate crimes across 100 U.S. cities
Relia, K., Li, Z., Cook, S. H., & Chunara, R. (n.d.).
Publication year: 2019
Page(s): 417-427
Abstract: We study malicious online content via a specific type of hate speech: race, ethnicity and national-origin based discrimination in social media, alongside hate crimes motivated by those characteristics, in 100 cities across the United States. We develop a spatially-diverse training dataset and classification pipeline to delineate targeted and self-narration of discrimination on social media, accounting for language across geographies. Controlling for census parameters, we find that the proportion of discrimination that is targeted is associated with the number of hate crimes. Finally, we explore the linguistic features of discrimination Tweets in relation to hate crimes by city, features used by users who Tweet different amounts of discrimination, and features of discrimination compared to non-discrimination Tweets. Findings from this spatial study can inform future studies of how discrimination in physical and virtual worlds varies by place, or how physical and virtual world discrimination may synergize.