Assistant Professor of Biostatistics
Dr. Rumi Chunara researches and develops ways to use unstructured data in real-world applications and to understand population health. As a computer engineer and scientist, she has revolutionized how medical and public health researchers collect health information through the Internet and mobile technology.
Driven to understand how and why diseases spread in populations, she has developed cutting-edge research models at HealthMap and the Children's Hospital Informatics Program at Harvard Medical School. Through the GoViral study, Dr. Chunara works closely with students on campus to collect crowdsourced influenza data in real time. GoViral uses the collected data and modeling methods to better understand viral spread, uncover geographical variation in spread and epidemiology, and predict and recommend behaviors that limit disease spread. At NYU, Dr. Chunara also leads the Chunara Lab, which develops computational and statistical methods across data mining, natural language processing, spatio-temporal analysis, and machine learning to study population health.
BS, Electrical Engineering (Honors), Caltech
MS, Electrical Engineering and Computer Science, MIT
PhD, Medical and Electrical Engineering, MIT (Harvard-MIT Division of Health Sciences and Technology)
MIT Technology Review 35 (2014)
MIT Presidential Fellow (2004)
Data mining
Personally-generated data
Spatio-temporal statistics
Characterizing sleep issues using Twitter
McIver, D. J., Hawkins, J. B., Chunara, R., Chatterjee, A. K., Bhandari, A., Fitzgerald, T. P., Jain, S. H., & Brownstein, J. S.
Journal title: Journal of Medical Internet Research
Page(s): e140
Background: Sleep issues such as insomnia affect over 50 million Americans and can lead to serious health problems, including depression and obesity, and can increase risk of injury. Social media platforms such as Twitter offer exciting potential for their use in studying and identifying both diseases and social phenomena. Objective: Our aim was to determine whether social media can be used as a method to conduct research focusing on sleep issues. Methods: Twitter posts were collected and curated to determine whether a user exhibited signs of sleep issues based on the presence of several keywords in tweets such as insomnia, "can't sleep", Ambien, and others. Users whose tweets contained any of the keywords were designated as having self-identified sleep issues (sleep group). Users who did not have self-identified sleep issues (non-sleep group) were selected from tweets that did not contain pre-defined words or phrases used as a proxy for sleep issues. Results: User data such as number of tweets, friends, followers, and location were collected, as well as the time and date of tweets. Additionally, the sentiment of each tweet and average sentiment of each user were determined to investigate differences between non-sleep and sleep groups. It was found that sleep group users were significantly less active on Twitter (P=.04), had fewer friends (P<.001), and had fewer followers (P<.001) compared to others, after adjusting for the length of time each user's account had been active. Sleep group users were more active during typical sleeping hours than others, which may suggest they were having difficulty sleeping. Sleep group users also had significantly lower sentiment in their tweets (P<.001), indicating a possible relationship between sleep and psychosocial issues. Conclusions: We have demonstrated a novel method for studying sleep issues that allows for fast, cost-effective, and customizable data to be gathered.
Estimating influenza attack rates in the United States using a participatory cohort
Chunara, R., Goldstein, E., Patterson-Lomba, O., & Brownstein, J. S.
Journal title: Scientific Reports
Volume: 5
We considered how participatory syndromic surveillance data can be used to estimate influenza attack rates during the 2012-2013 and 2013-2014 seasons in the United States. Our inference is based on assessing the difference in the rates of self-reported influenza-like illness (ILI, defined as presence of fever and cough/sore throat) among the survey participants during periods of active vs. low influenza circulation as well as estimating the probability of self-reported ILI for influenza cases. Here, we combined Flu Near You data with additional sources (Hong Kong household studies of symptoms of influenza cases and the U.S. Centers for Disease Control and Prevention estimates of vaccine coverage and effectiveness) to estimate influenza attack rates. The estimated influenza attack rate for the early vaccinated Flu Near You members (vaccination reported by week 45) aged 20-64 between calendar weeks 47-12 was 14.7% (95% CI: 5.9%, 24.1%) for the 2012-2013 season and 3.6% (-3.3%, 10.3%) for the 2013-2014 season. The corresponding rates for the US population aged 20-64 were 30.5% (4.4%, 49.3%) in 2012-2013 and 7.1% (-5.1%, 32.5%) in 2013-2014. The attack rates in women and men were similar each season. Our findings demonstrate that participatory syndromic surveillance data can be used to gauge influenza attack rates during future influenza seasons.
Flu Near You: Crowdsourced symptom reporting spanning 2 influenza seasons
Smolinski, M. S., Crawley, A. W., Baltrusaitis, K., Chunara, R., Olsen, J. M., Wójcik, O., Santillana, M., Nguyen, A., & Brownstein, J. S.
Journal title: American Journal of Public Health
Page(s): 2124-2130
Objectives. We summarized Flu Near You (FNY) data from the 2012-2013 and 2013-2014 influenza seasons in the United States. Methods. FNY collects limited demographic characteristic information upon registration, and prompts users each Monday to report symptoms of influenzalike illness (ILI) experienced during the previous week. We calculated the descriptive statistics and rates of ILI for the 2012-2013 and 2013-2014 seasons. We compared raw and noise-filtered ILI rates with ILI rates from the Centers for Disease Control and Prevention ILINet surveillance system. Results. More than 61 000 participants submitted at least 1 report during the 2012-2013 season, totaling 327 773 reports. Nearly 40 000 participants submitted at least 1 report during the 2013-2014 season, totaling 336 933 reports. Rates of ILI as reported by FNY tracked closely with ILINet in both timing and magnitude. Conclusions. With increased participation, FNY has the potential to serve as a viable complement to existing outpatient, hospital-based, and laboratory surveillance systems. Although many established systems have the benefits of specificity and credibility, participatory systems offer advantages in the areas of speed, sensitivity, and scalability.
A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives
Nagar, R., Yuan, Q., Freifeld, C. C., Santillana, M., Nojima, A., Chunara, R., & Brownstein, J. S.
Journal title: Journal of Medical Internet Research
Page(s): e236
Background: Twitter has shown some usefulness in predicting influenza cases on a weekly basis in multiple countries and on different geographic scales. Recently, Broniatowski and colleagues suggested Twitter's relevance at the city level for New York City. Here, we look to dive deeper into the case of New York City by analyzing daily Twitter data from temporal and spatiotemporal perspectives. Also, through manual coding of all tweets, we look to gain qualitative insights that can help direct future automated searches. Objective: The intent of the study was first to validate the temporal predictive strength of daily Twitter data for influenza-like illness emergency department (ILI-ED) visits during the New York City 2012-2013 influenza season against other available and established datasets (Google search query, or GSQ), and second, to examine the spatial distribution and the spread of geocoded tweets as proxies for potential cases. Methods: From the Twitter Streaming API, 2972 tweets were collected in the New York City region matching the keywords "flu", "influenza", "gripe", and "high fever". The tweets were categorized according to the scheme developed by Lamb et al. A new fourth category was added as an evaluator guess for the probability of the subject(s) being sick to account for strength of confidence in the validity of the statement. Temporal correlations were made for tweets against daily ILI-ED visits and daily GSQ volume. The best models were used for linear regression for forecasting ILI visits. A weighted, retrospective Poisson model with SaTScan software (n=1484), and a vector map were used for spatiotemporal analysis. Results: Infection-related tweets (R=.763) correlated better than GSQ time series (R=.683) for the same keywords and had a lower mean average percent error (8.4 vs 11.8) for ILI-ED visit prediction in January, the most volatile month of flu.
SaTScan identified a primary outbreak cluster of high-probability infection tweets with a 2.74 relative risk ratio compared to medium-probability infection tweets at P=.001 in Northern Brooklyn, in a radius that includes Barclays Center and the Atlantic Avenue Terminal. Conclusions: While others have looked at weekly regional tweets, this study is the first to stress test Twitter for daily city-level data for New York City. Extraction of personal testimonies of infection-related tweets suggests Twitter's strength both qualitatively and quantitatively for ILI-ED prediction compared to alternative daily datasets mixed with awareness-based data such as GSQ. Additionally, granular Twitter data provide important spatiotemporal insights. A tweet vector map may be useful for visualization of city-level spread when local gold standard data are otherwise unavailable.
Averting the perfect storm: addressing youth substance use risk from social media use
Salimian, P. K., Chunara, R., & Weitzman, E. R.
Journal title: Pediatric Annals
Page(s): 411
Adolescents are developmentally sensitive to pathways that influence alcohol and other drug (AOD) use. In the absence of guidance, their routine engagement with social media may add a further layer of risk. There are several potential mechanisms for social media use to influence AOD risk, including exposure to peer portrayals of AOD use, socially amplified advertising, misinformation, and predatory marketing against a backdrop of lax regulatory systems and privacy controls. Here the authors summarize the influences of the social media world and suggest how pediatricians in everyday practice can alert youth and their parents to these risks to foster conversation, awareness, and harm reduction.
Public health for the people: Participatory infectious disease surveillance in the digital age
Wójcik, O. P., Brownstein, J. S., Chunara, R., & Johansson, M. A.
Journal title: Emerging Themes in Epidemiology
Issue: 1
The 21st century has seen the rise of Internet-based participatory surveillance systems for infectious diseases. These systems capture voluntarily submitted symptom data from the general public and can aggregate and communicate that data in near real-time. We reviewed participatory surveillance systems currently running in 13 different countries. These systems have a growing evidence base showing a high degree of accuracy and increased sensitivity and timeliness relative to traditional healthcare-based systems. They have also proven useful for assessing risk factors, vaccine effectiveness, and patterns of healthcare utilization while being less expensive, more flexible, and more scalable than traditional systems. Nonetheless, they present important challenges including biases associated with the population that chooses to participate, difficulty in adjusting for confounders, and limited specificity because of reliance only on syndromic definitions of disease limits. Overall, participatory disease surveillance data provides unique disease information that is not available through traditional surveillance sources.
Assessing the Online Social Environment for Surveillance of Obesity Prevalence
Chunara, R., Bouton, L., Ayers, J. W., & Brownstein, J. S.
Journal title: PLoS One
Issue: 4
Background: Understanding the social environment around obesity has been limited by available data. One promising approach used to bridge similar gaps elsewhere is to use passively generated digital data. Purpose: This article explores the relationship between the online social environment, via web-based social networks, and population obesity prevalence. Methods: We performed a cross-sectional study using linear regression and cross validation to measure the relationship and predictive performance of user interests on the online social network Facebook to obesity prevalence in metros across the United States of America (USA) and neighborhoods within New York City (NYC). The outcomes, proportion of obese and/or overweight population in USA metros and NYC neighborhoods, were obtained via the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance and NYC EpiQuery systems. Predictors were geographically specific proportions of users with activity-related and sedentary-related interests on Facebook. Results: A higher proportion of the population with activity-related interests on Facebook was associated with a significant 12.0% (95% Confidence Interval (CI) 11.9 to 12.1) lower predicted prevalence of obese and/or overweight people across USA metros and 7.2% (95% CI: 6.8 to 7.7) across NYC neighborhoods. Conversely, a greater proportion of the population with interest in television was associated with a higher prevalence of obese and/or overweight people of 3.9% (95% CI: 3.7 to 4.0) (USA) and 27.5% (95% CI: 27.1 to 27.9, significant) (NYC). For activity-interests and national obesity outcomes, the average root mean square prediction error from 10-fold cross validation was comparable to the average root mean square error of a model developed using the entire data set. Conclusions: Activity-related interests across the USA and sedentary-related interests across NYC were significantly associated with obesity prevalence.
Further research is needed to understand how the online social environment relates to health outcomes and how it can be used to identify or target interventions.
Monitoring Influenza Epidemics in China with Search Query from Baidu
Yuan, Q., Nsoesie, E. O., Lv, B., Peng, G., Chunara, R., & Brownstein, J. S.
Journal title: PLoS One
Issue: 5
Several approaches have been proposed for near real-time detection and prediction of the spread of influenza. These include search query data for influenza-related terms, which have been explored as a tool for augmenting traditional surveillance methods. In this paper, we present a method that uses Internet search query data from Baidu to model and monitor influenza activity in China. The objectives of the study are to present a comprehensive technique for: (i) keyword selection, (ii) keyword filtering, (iii) index composition and (iv) modeling and detection of influenza activity in China. The sequential time series for the selected composite keyword index is significantly correlated with Chinese influenza case data. In addition, one-month-ahead prediction of influenza cases for the first eight months of 2012 has a mean absolute percent error of less than 11%. To our knowledge, this is the first study on the use of search query data from Baidu in conjunction with this approach for estimation of influenza activity in China.
Twitter as a Sentinel in Emergency Situations: Lessons from the Boston Marathon Explosions
Cassa, C. A., Chunara, R., Mandl, K., & Brownstein, J. S.
Journal title: PLoS Currents
Immediately following the Boston Marathon attacks, individuals near the scene posted a deluge of data to social media sites. Previous work has shown that these data can be leveraged to provide rapid insight during natural disasters, disease outbreaks and ongoing conflicts that can assist in the public health and medical response. Here, we examine and discuss the social media messages posted immediately after and around the Boston Marathon bombings, and find that specific keywords appear frequently prior to official public safety and news media reports. Individuals immediately adjacent to the explosions posted messages within minutes via Twitter which identify the location and specifics of events, demonstrating a role for social media in the early recognition and characterization of emergency events. *Christopher Cassa and Rumi Chunara contributed equally to this work.
Using search queries for malaria surveillance, Thailand
Ocampo, A. J., Chunara, R., & Brownstein, J. S.
Journal title: Malaria Journal
Issue: 1
Background: Internet search query trends have been shown to correlate with incidence trends for select infectious diseases and countries. Herein, the first use of Google search queries for malaria surveillance is investigated. The research focuses on Thailand, where real-time malaria surveillance is crucial as malaria is re-emerging and developing resistance to pharmaceuticals in the region. Methods: Official Thai malaria case data were acquired from the World Health Organization (WHO) from 2005 to 2009. Using Google Correlate, an openly available online tool, and by surveying Thai physicians, search queries potentially related to malaria prevalence were identified. Four linear regression models were built from different subsets of malaria-related queries to be used in future predictions. The models' accuracies were evaluated by their ability to predict the malaria outbreak in 2009, their correlation with the entire available malaria case data, and by Akaike information criterion (AIC). Results: Each model captured the bulk of the variability in officially reported malaria incidence. Correlation in the validation set ranged from 0.75 to 0.92 and AIC values ranged from 808 to 586 for the models. While models using malaria-related and general health terms were successful, one model using only microscopy-related terms obtained equally high correlations to malaria case data trends. The model built strictly of queries provided by Thai physicians was the only one that consistently captured the well-documented second seasonal malaria peak in Thailand. Conclusions: Models built from Google search queries were able to adequately estimate malaria activity trends in Thailand, from 2005-2010, according to official malaria case counts reported by WHO. While presenting their own limitations, these search queries may be valid real-time indicators of malaria incidence in the population, as correlations were on par with those of related studies for other infectious diseases.
Additionally, this methodology provides a cost-effective description of malaria prevalence that can act as a complement to traditional public health surveillance. This and future studies will continue to identify ways to leverage web-based data to improve public health.
Why we need crowdsourced data in infectious disease surveillance
Chunara, R., Smolinski, M. S., & Brownstein, J. S.
Journal title: Current Infectious Disease Reports
Page(s): 316-319
In infectious disease surveillance, public health data such as environmental, hospital, or census data have been extensively explored to create robust models of disease dynamics. However, this information is also subject to its own biases, including latency, high cost, contributor biases, and imprecise resolution. Simultaneously, new technologies, including Internet- and mobile phone-based tools, now enable information to be garnered directly from individuals at the point of care. Here, we consider how these crowdsourced data offer the opportunity to fill gaps in and augment current epidemiological models. Challenges and methods for overcoming limitations of the data are also reviewed. As more new information sources mature, incorporating these novel data into epidemiological frameworks will enable us to learn more about infectious disease dynamics.
New technologies for reporting real-time emergent infections
Chunara, R., Freifeld, C. C., & Brownstein, J. S.
Page(s): 1843-1851
Novel technologies have prompted a new paradigm in disease surveillance. Advances in computation, communications and materials enable new technologies such as mobile phones and microfluidic chips. In this paper we illustrate examples of new technologies that can augment disease detection. We describe technologies harnessing the internet, mobile phones, point of care diagnostic tools and methods that facilitate detection from passively collected unstructured data. We demonstrate how these can all assist in quicker detection, investigation and response to emerging infectious events. Novel technologies enable collection and dissemination of epidemic intelligence data to both public health practitioners and the general public, enabling finer temporal and spatial resolution of disease monitoring than through traditional public health processes.
Online reporting for malaria surveillance using micro-monetary incentives, in urban India 2010-2011
Chunara, R., Chhaya, V., Bane, S., Mekaru, S. R., Chan, E. H., Freifeld, C. C., & Brownstein, J. S.
Journal title: Malaria Journal
Volume: 11
Background: The objective of this study was to investigate the use of novel surveillance tools in a malaria endemic region where prevalence information is limited. Specifically, online reporting for participatory epidemiology was used to gather information about malaria spread directly from the public. Individuals in India were incentivized by micro-monetary payments to self-report their recent experience with malaria. Methods: Self-reports about malaria diagnosis status and related information were solicited online via Amazon's Mechanical Turk. Responders were paid $0.02 to answer survey questions regarding their recent experience with malaria. Timing of the peak volume of weekly self-reported malaria diagnoses in 2010 was compared to other available metrics, such as the volume over time of, and information about, the epidemic from media sources. The distribution of Plasmodium species reports was compared with values from the literature. The study was conducted in summer 2010 during a malaria outbreak in Mumbai and expanded to other cities during summer 2011, and prevalence from self-reports in 2010 and 2011 was contrasted. Results: Distribution of Plasmodium species diagnosis through self-report in 2010 revealed 59% for Plasmodium vivax, which is comparable to literature reports of the burden of P. vivax in India (between 50 and 69%). Self-reported Plasmodium falciparum diagnosis was 19% during the 2010 outbreak, compared with an estimated burden of between 10 and 15%. Prevalence via self-reports decreased significantly between 2010 and 2011, from 36.9% to 19.54% in Mumbai (p = 0.001), and official reports also confirmed a prevalence decrease in 2011. Conclusions: With careful study design, micro-monetary incentives and online reporting are a rapid way to solicit malaria and potentially other public health information.
This methodology provides a cost-effective way of executing a field study that can act as a complement to traditional public health surveillance methods, offering an opportunity to obtain information about malaria activity, temporal progression, demographics affected or Plasmodium-specific diagnosis at a finer resolution than official reports can provide. The recent adoption of technologies such as the Internet supports self-reporting mediums, and self-reporting should continue to be studied as it can foster preventative health behaviours.
Preventing Pandemics Via International Development: A Systems Approach
Bogich, T. L., Chunara, R., Scales, D., Chan, E., Pinheiro, L. C., Chmura, A. A., Carroll, D., Daszak, P., & Brownstein, J. S.
Journal title: PLoS Medicine
Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak
Chunara, R., Andrews, J. R., & Brownstein, J. S.
Journal title: American Journal of Tropical Medicine and Hygiene
Page(s): 39-45
During infectious disease outbreaks, data collected through health institutions and official reporting structures may not be available for weeks, hindering early epidemiologic assessment. By contrast, data from informal media are typically available in near real-time and could provide earlier estimates of epidemic dynamics. We assessed correlation of volume of cholera-related HealthMap news media reports, Twitter postings, and government cholera cases reported in the first 100 days of the 2010 Haitian cholera outbreak. Trends in the volume of informal sources correlated significantly in time with official case data and were available up to 2 weeks earlier. Estimates of the reproductive number ranged from 1.54 to 6.89 (informal sources) and 1.27 to 3.72 (official sources) during the initial outbreak growth period, and 1.04 to 1.51 (informal) and 1.06 to 1.73 (official) when Hurricane Tomas afflicted Haiti. Informal data can be used complementarily with official data in an outbreak setting to get timely estimates of disease dynamics.
Suspended microchannel resonators with piezoresistive sensors
Lee, J., Chunara, R., Shen, W., Payer, K., Babcock, K., Burg, T. P., & Manalis, S. R.
Journal title: Lab on a Chip - Miniaturisation for Chemistry and Biology
Page(s): 645-651
Precision frequency detection has enabled the suspended microchannel resonator (SMR) to weigh single living cells, single nanoparticles, and adsorbed protein layers in fluid. To date, the SMR resonance frequency has been determined optically, which requires the use of an external laser and photodiode and cannot be easily arrayed for multiplexed measurements. Here we demonstrate the first electronic detection of SMR resonance frequency by fabricating piezoresistive sensors using ion implantation into single crystal silicon resonators. To validate the piezoresistive SMR, buoyant mass histograms of budding yeast cells and a mixture of 1.6, 2.0, 2.5, and 3.0 μm diameter polystyrene beads are measured. For SMRs designed to weigh micron-sized particles and cells, the mass resolution achieved with piezoresistive detection (∼3.4 fg in a 1 kHz bandwidth) is comparable to what can be achieved by the conventional optical-lever detector. Eliminating the need for expensive and delicate optical components will enable new uses for the SMR in both multiplexed and field deployable applications.
Participatory epidemiology: Use of mobile phones for community-based health reporting
Freifeld, C. C., Chunara, R., Mekaru, S. R., Chan, E. H., Kass-Hout, T., Iacucci, A. A., & Brownstein, J. S.
Journal title: PLoS Medicine
Mass-based readout for agglutination assays
Chunara, R., Godin, M., Knudsen, S. M., & Manalis, S. R.
Journal title: Applied Physics Letters
Issue: 19
We present a mass-based readout for agglutination assays. The suspended microchannel resonator (SMR) is used to classify monomers and dimers that are formed during early stage aggregation, and to relate the total count to the analyte concentration. Using a model system of streptavidin functionalized microspheres and biotinylated antibody as the analyte, we obtain a dose-response curve over a concentration range of 0.63-630 nM and show that the results are comparable to what has been previously achieved by image analysis and conventional flow cytometry.
Phased array systems in silicon
Hajimiri, A., Komijani, A., Natarajan, A., Chunara, R., Guan, X., & Hashemi, H.
Journal title: IEEE Communications Magazine
Page(s): 122-130
Phased array systems, a special case of MIMO systems, take advantage of spatial directivity and array gain to increase spectral efficiency. Implementing a phased array system at high frequency in a commercial silicon process technology presents several challenges. This article focuses on the architectural and circuit-level trade-offs involved in the design of the first silicon-based fully integrated phased array system operating at 24 GHz. The details of some of the important circuit building blocks are also discussed. The measured results demonstrate the feasibility of using integrated phased arrays for wireless communication and vehicular radar applications at 24 GHz.