Yajun Mei
Professor of Biostatistics
-
Professional overview
-
Yajun Mei is a Professor of Biostatistics at the NYU School of Global Public Health (GPH), a position he has held since July 1, 2024. He received his B.S. in Mathematics from Peking University, Beijing, China, in 1996, and his Ph.D. in Mathematics with a minor in Electrical Engineering from the California Institute of Technology, Pasadena, CA, USA, in 2003. He was a postdoctoral fellow in biostatistics at the Fred Hutchinson Cancer Center in Seattle, WA, from 2003 to 2005. Prior to joining NYU, Dr. Mei spent 18 years, from 2006 to 2024, as an Assistant, Associate, and then Full Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology, Atlanta, GA, and served as a co-director of the Biostatistics, Epidemiology, and Research Design (BERD) core of the Georgia CTSA from 2018.
Dr. Mei's research interests span statistics, machine learning, and data science, and their applications in biomedical science and public health, particularly streaming data analysis, sequential decision-making and design, change-point problems, precision/personalized medicine, hot-spot detection for infectious diseases, longitudinal data analysis, bioinformatics, and clinical trials. His work has been recognized with the Abraham Wald Prize in Sequential Analysis in both 2009 and 2024, an NSF CAREER Award in 2010, election as a Fellow of the American Statistical Association (ASA) in 2023, and multiple best paper awards.
-
Education
-
BS, Mathematics, Peking University
PhD, Mathematics, California Institute of Technology
-
Honors and awards
-
Fellow of the American Statistical Association (2023)
Star Research Achievement Award, 2021 Virtual Critical Care Congress (2021)
Best Paper Competition Award, Quality, Statistics & Reliability Section of INFORMS (2020)
Bronze Snapshot Award, Society of Critical Care Medicine (2019)
NSF CAREER Award (2010)
Thank a Teacher Certificate, Center for Teaching and Learning (2011, 2012, 2016, 2020, 2021, 2022, 2023)
Abraham Wald Prize (2009)
Best Paper Award, 11th International Conference on Information Fusion (2008)
New Researcher Fellow, Statistical and Applied Mathematical Sciences Institute (2005)
Fred Hutchinson SPAC Travel Award to attend 2005 Joint Statistical Meetings, Minneapolis, MN (2005)
Travel Award to 8th New Researchers Conference, Minneapolis, MN (2005)
Travel Award to IEEE International Symposium on Information Theory, Chicago, IL (2004)
Travel Award to IPAM workshop on inverse problems, UCLA, Los Angeles, CA (2003)
Fred Hutchinson SPAC Course Scholarship (2003)
Travel Award to the SAMSI workshop on inverse problems, Research Triangle Park, NC (2002)
-
Publications
Large-Scale Multi-Stream Quickest Change Detection via Shrinkage Post-Change Estimation
Wang, Y., & Mei, Y. (2015). IEEE Transactions on Information Theory, 61(12), 6926-6938.
Abstract: The quickest change detection problem is considered in the context of monitoring large-scale independent normally distributed data streams with possible changes in some of the means. It is assumed that for each individual local data stream, either there are no local changes, or there is a big local change that is larger than a pre-specified lower bound. Two different types of scenarios are studied: one is the sparse post-change case, when the unknown number of affected data streams is much smaller than the total number of data streams, and the other is when all local data streams are affected simultaneously, although not necessarily identically. We propose a systematic approach to develop efficient global monitoring schemes for quickest change detection by combining hard thresholding with linear shrinkage estimators to estimate all post-change parameters simultaneously. Our theoretical analysis demonstrates that the shrinkage estimation can balance the tradeoff between the first-order and second-order terms of the asymptotic expression on the detection delays, and our numerical simulation studies illustrate the usefulness of shrinkage estimation and the challenge of Monte Carlo simulation of the average run length to false alarm in the context of online monitoring of large-scale data streams.
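As a rough illustration of the combination described above, the numpy sketch below hard-thresholds and linearly shrinks per-stream sample means before plugging them into a Gaussian likelihood-ratio statistic; the shrinkage factor, threshold, and constants are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def shrinkage_estimate(xbar, threshold=1.0, shrink=0.5):
    """Hard-threshold, then linearly shrink, per-stream mean estimates.
    An illustrative combination of the two estimators in the abstract."""
    est = shrink * xbar                      # linear shrinkage toward 0
    est[np.abs(xbar) < threshold] = 0.0      # hard thresholding for sparsity
    return est

def global_statistic(window):
    """One global monitoring statistic from a (time x streams) window."""
    xbar = window.mean(axis=0)               # per-stream post-change mean estimate
    mu = shrinkage_estimate(xbar)
    n = window.shape[0]
    # Gaussian log-likelihood ratio of "mean mu" vs "mean 0", summed over streams
    return (n * (mu * xbar - 0.5 * mu**2)).sum()

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 200))
data[25:, :5] += 2.0                         # sparse change in 5 of 200 streams
print(global_statistic(data[30:40]))         # alarm when this exceeds a calibrated limit
```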
Linear-mixed effects models for feature selection in high-dimensional NMR spectra
Mei, Y., Kim, S. B., & Tsui, K. L. (2009). Expert Systems with Applications, 36(3, Part 1), 4703-4708.
Abstract: Feature selection in metabolomics can identify important metabolite features that play a significant role in discriminating between various conditions among samples. In this paper, we propose an efficient feature selection method for high-resolution nuclear magnetic resonance (NMR) spectra obtained from time-course experiments. Our proposed approach combines linear-mixed effects (LME) models with a multiple testing procedure based on a false discovery rate. The proposed LME approach is illustrated using NMR spectra with 574 metabolite features obtained for an experiment to examine metabolic changes in response to sulfur amino acid intake. The experimental results showed that classification models constructed with the features selected by the proposed approach resulted in lower rates of misclassification than those models with full features. Furthermore, we compared the LME approach with the two-sample t-test approach that oversimplifies the time-course factor.
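The pipeline in the abstract (one mixed-effects fit per feature, then FDR control) can be sketched with statsmodels; the column names, random-intercept structure, and synthetic data below are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

def select_features(df, features, alpha=0.05):
    """Fit one LME model per feature (random intercept per subject,
    fixed effects for group and time), then apply BH FDR control."""
    pvals = []
    for f in features:
        m = smf.mixedlm(f"{f} ~ group + time", df, groups=df["subject"]).fit()
        pvals.append(m.pvalues["group[T.treatment]"])   # p-value of group effect
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    return [f for f, r in zip(features, reject) if r]

# tiny synthetic time-course experiment: 10 subjects, 4 time points
rng = np.random.default_rng(1)
n, t = 10, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n), t),
    "group": np.repeat(["control", "treatment"] * (n // 2), t),
    "time": np.tile(np.arange(t), n),
})
df["feat1"] = rng.normal(size=n * t) + 2.0 * (df["group"] == "treatment")
df["feat2"] = rng.normal(size=n * t)
print(select_features(df, ["feat1", "feat2"]))          # expect ['feat1']
```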
Monitoring High-Dimensional Streaming Data via Fusing Nonparametric Shiryaev-Roberts Statistics
Zhang, X., & Mei, Y. (2024). pp. 1065-1070.
Abstract: Monitoring high-dimensional streaming data has a wide range of applications in science, engineering, and industry. In this work, we propose an efficient and robust sequential change-point detection algorithm for monitoring high-dimensional streaming data. It has two components. At the local level, we adopt a window-limited nonparametric Shiryaev-Roberts (WL-NPSR) statistic for detecting potential distribution changes at each dimension of the streaming data. At the global level, we fuse local WL-NPSR statistics together to construct a global monitoring statistic via quantile filtering and sum-shrinkage functions. Theoretical analysis and extensive numerical experiments demonstrate the efficiency and robustness of our proposed algorithm.
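A hedged sketch of the two-level architecture: a window-limited Shiryaev-Roberts-type recursion per stream, with a rank-based proxy standing in for the unknown likelihood ratio, fused by quantile filtering and a log-shrinkage sum. The proxy and all constants are assumptions for illustration, not the paper's WL-NPSR construction.

```python
import numpy as np

def wl_npsr(stream, ref, win=50):
    """Window-limited SR-type statistic for one stream; the likelihood
    ratio is replaced by exp(2*(F_ref(x) - 1/2)), a crude rank-based
    proxy that rewards observations looking large versus reference data."""
    ref_sorted = np.sort(ref)
    sr = 0.0
    for x in stream[-win:]:                          # window-limited: recent data only
        u = np.searchsorted(ref_sorted, x) / len(ref)  # empirical CDF value
        sr = (1.0 + sr) * np.exp(2.0 * (u - 0.5))
    return sr

def fuse(local_stats, q=0.9):
    """Global statistic: quantile filtering plus a sum-shrinkage transform."""
    local_stats = np.asarray(local_stats)
    kept = local_stats[local_stats >= np.quantile(local_stats, q)]
    return np.log1p(kept).sum()                      # shrink before summing

rng = np.random.default_rng(1)
streams = rng.normal(size=(100, 300))                # 100 streams, 300 time points
streams[:3, 150:] += 1.5                             # 3 streams shift upward
ref = rng.normal(size=1000)
print(fuse([wl_npsr(s, ref) for s in streams]))      # compare with a calibrated limit
```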
Multi-Stream Quickest Detection with Unknown Post-Change Parameters under Sampling Control
Abstract: The multi-stream quickest detection problem with unknown post-change parameters is studied under a sampling control constraint, where there are M local processes in a system but one is able to take observations from only one of these M local processes at each time instant. The objective is to raise a correct alarm as quickly as possible once the change occurs, subject to both false alarm and sampling control constraints. We propose an efficient myopic-sampling-based quickest detection algorithm under the sampling control constraint and show that it is asymptotically optimal, in the sense of minimizing the detection delay in our context, when the number M of processes is fixed. Simulation studies are conducted to validate our theoretical results.
Nonparametric monitoring of multivariate data via KNN learning
Li, W., Zhang, C., Tsung, F., & Mei, Y. (2021). International Journal of Production Research, 59(20), 6311-6326.
Abstract: Process monitoring of multivariate quality attributes is important in many industrial applications, in which rich historical data are often available thanks to modern sensing technologies. While multivariate statistical process control (SPC) has been receiving increasing attention, existing methods are often inadequate as they are sensitive to the parametric model assumptions of multivariate data. In this paper, we propose a novel, nonparametric k-nearest neighbours empirical cumulative sum (KNN-ECUSUM) control chart, a machine-learning-based black-box control chart for monitoring multivariate data that utilises extensive historical data under both in-control and out-of-control scenarios. Our proposed method utilises the k-nearest neighbours (KNN) algorithm for dimension reduction to transform multivariate data into univariate data, and then applies the CUSUM procedure to monitor the change in the empirical distribution of the transformed univariate data. Extensive simulation studies and a real industrial example based on a disk monitoring system demonstrate the robustness and effectiveness of our proposed method.
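In the spirit of the KNN-ECUSUM chart, the sketch below reduces each multivariate observation to its mean distance to its k nearest in-control neighbours and runs a one-sided CUSUM on that univariate score; the particular reduction, allowance, and control limit are illustrative choices rather than the paper's exact chart.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
ref = rng.normal(size=(500, 10))                         # in-control history
nn = NearestNeighbors(n_neighbors=5).fit(ref)

def knn_score(X):
    """Univariate score: mean distance to 5 nearest in-control neighbours."""
    d, _ = nn.kneighbors(np.atleast_2d(X))
    return d.mean(axis=1)

target = knn_score(rng.normal(size=(500, 10))).mean()    # in-control mean score

online = np.vstack([rng.normal(size=(30, 10)),
                    rng.normal(0.8, 1.0, size=(30, 10))])  # shift at t = 30
s, alarm = 0.0, None
for t, y in enumerate(knn_score(online)):
    s = max(0.0, s + (y - target - 0.1))                 # one-sided CUSUM, allowance 0.1
    if s > 3.0 and alarm is None:
        alarm = t
print("first alarm at t =", alarm)
```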
Online parallel monitoring via hard-thresholding post-change estimation
Abstract: The online parallel monitoring problem is studied when one is monitoring large-scale data streams and an event occurs at an unknown time and affects an unknown subset of data streams. Efficient online parallel monitoring schemes are developed by combining the standard sequential change-point method with hard-thresholding post-change estimation. Theoretical analysis and a simulation study demonstrate the usefulness of hard-thresholding for online parallel monitoring.
Optimal stationary binary quantizer for decentralized quickest change detection in hidden Markov models
Abstract: The decentralized quickest change detection problem is studied in sensor networks, where a set of sensors receive observations from a hidden Markov model X and send sensor messages to a central processor, called the fusion center, which makes a final decision when observations are stopped. It is assumed that the parameter θ in the hidden Markov model for X changes from θ0 to θ1 at some unknown time. The problem is to determine the policies at the sensor and fusion center levels that jointly optimize the detection delay subject to the average run length (ARL) to false alarm constraint. In this article, a CUSUM-type fusion rule with stationary binary sensor messages is studied, and a simple method for choosing the optimal local sensor thresholds is introduced. Directions for further research are also discussed.
Optimal Stopping for Interval Estimation in Bernoulli Trials
Yaacoub, T., Moustakides, G. V., & Mei, Y. (2019). IEEE Transactions on Information Theory, 65(5), 3022-3033.
Abstract: We propose an optimal sequential methodology for obtaining confidence intervals for a binomial proportion θ. Assuming that an independent and identically distributed sequence of Bernoulli(θ) trials is observed sequentially, we are interested in designing: 1) a stopping time T that will decide the best time to stop sampling the process and 2) an optimum estimator θ̂_T that will provide the optimum center of the interval estimate of θ. We follow a semi-Bayesian approach, where we assume that there exists a prior distribution for θ, and our goal is to minimize the average number of samples while we guarantee a minimal specified coverage probability level. The solution is obtained by applying standard optimal stopping theory and computing the optimum pair (T, θ̂_T) numerically. Regarding the optimum stopping time component T, we demonstrate that it enjoys certain very interesting characteristics not commonly encountered in solutions of other classical optimal stopping problems. In particular, we prove that, for a particular prior (beta density), the optimum stopping time is always bounded from above and below; it needs to first accumulate a sufficient amount of information before deciding whether or not to stop, and it will always terminate before some finite deterministic time. We also conjecture that these properties are present with any prior. Finally, we compare our method with the optimum fixed-sample-size procedure as well as with existing alternative sequential schemes.
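The structure of the numerical solution (backward induction over the posterior state) can be illustrated for a Beta prior as follows; the Lagrangian-relaxed objective, fixed interval half-width, and all constants are assumptions made to keep the sketch short, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import beta

A, B = 1.0, 1.0          # Beta prior parameters (uniform prior)
H = 60                   # deterministic horizon (the paper proves one exists)
HALF = 0.1               # half-width of the interval estimate
PEN = 200.0              # Lagrange-style penalty for missing coverage

def coverage(a, b):
    """Posterior probability that θ lies within ±HALF of the posterior mean."""
    m = a / (a + b)
    return beta.cdf(m + HALF, a, b) - beta.cdf(m - HALF, a, b)

# value[n][s]: optimal expected additional cost after n trials with s successes
value = [np.zeros(n + 1) for n in range(H + 1)]
value[H] = np.array([PEN * (1 - coverage(A + s, B + H - s)) for s in range(H + 1)])
stop_now = [np.zeros(n + 1, dtype=bool) for n in range(H + 1)]
for n in range(H - 1, -1, -1):
    for s in range(n + 1):
        a, b = A + s, B + n - s
        stop_cost = PEN * (1 - coverage(a, b))
        p = a / (a + b)                        # predictive P(next trial = 1)
        go_cost = 1.0 + p * value[n + 1][s + 1] + (1 - p) * value[n + 1][s]
        stop_now[n][s] = stop_cost <= go_cost
        value[n][s] = min(stop_cost, go_cost)

print(stop_now[20])   # stopping region after 20 trials, indexed by success count
```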
Optimum Multi-Stream Sequential Change-Point Detection with Sampling Control
Xu, Q., Mei, Y., & Moustakides, G. V. (2021). IEEE Transactions on Information Theory, 67(11), 7627-7636.
Abstract: In multi-stream sequential change-point detection, it is assumed that there are M processes in a system and that, at some unknown time, an occurring event changes the distribution of the samples of a particular process. In this article, we consider this problem under a sampling control constraint, when one is allowed, at each point in time, to sample a single process. The objective is to raise an alarm as quickly as possible subject to a proper false alarm constraint. We show that under sampling control, a simple myopic-sampling-based sequential change-point detection strategy is second-order asymptotically optimal when the number M of processes is fixed. This means that the proposed detector, even by sampling with a rate 1/M of the full rate, enjoys the same detection delay, up to some additive finite constant, as the optimal procedure. Simulation experiments corroborate our theoretical results.
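A minimal sketch of the myopic sampling idea, assuming Gaussian pre-/post-change densities: at each instant only the stream whose local CUSUM is currently largest is observed, with ties broken at random. The tie-breaking rule and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def myopic_detection(sample, M, mu1=1.0, limit=8.0, max_t=10_000):
    """Observe one of M streams per step (the one with the largest local
    CUSUM, ties broken at random); alarm when any CUSUM crosses 'limit'.
    Pre-change N(0,1) vs post-change N(mu1,1) is an illustrative model."""
    cusum = np.zeros(M)
    for t in range(1, max_t + 1):
        best = np.flatnonzero(cusum == cusum.max())
        i = int(rng.choice(best))                  # myopic choice of one stream
        x = sample(i, t)
        cusum[i] = max(0.0, cusum[i] + mu1 * x - mu1**2 / 2)   # Gaussian LLR step
        if cusum[i] > limit:
            return t, i                            # (alarm time, flagged stream)
    return None

change_time, affected = 200, 2
def sample(i, t):
    shift = 1.0 if (t > change_time and i == affected) else 0.0
    return rng.normal(shift, 1.0)

print(myopic_detection(sample, M=5))
```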
Pharmacologic Venous Thromboembolism Prophylaxis in Patients with Nontraumatic Subarachnoid Hemorrhage Requiring an External Ventricular Drain
Ukpabi, C., Sadan, O., Shi, Y., Greene, K. N., Samuels, O., Mathew, S., Joy, J., Mei, Y., & Asbury, W. (2024). Neurocritical Care, 41, 779-787.
Abstract: Background: Optimal pharmacologic thromboprophylaxis dosing is not well described in patients with subarachnoid hemorrhage (SAH) with an external ventricular drain (EVD). Our patients with SAH with an EVD who receive prophylactic enoxaparin are routinely monitored using timed anti-Xa levels. Our primary study goal was to determine the frequency of venous thromboembolism (VTE) and secondary intracranial hemorrhage (ICH) for this population of patients who received pharmacologic prophylaxis with enoxaparin or unfractionated heparin (UFH). Methods: A retrospective chart review was performed for all patients with SAH admitted to the neurocritical care unit at Emory University Hospital between 2012 and 2017. All patients with SAH who required an EVD were included. Results: Of 1,351 patients screened, 868 required an EVD. Of these 868 patients, 627 received enoxaparin, 114 received UFH, and 127 did not receive pharmacologic prophylaxis. VTE occurred in 7.5% of patients in the enoxaparin group, 4.4% in the UFH group (p = 0.32), and 3.2% in the no VTE prophylaxis group (p = 0.08). Secondary ICH occurred in 3.83% of patients in the enoxaparin group, 3.51% in the UFH group (p = 1), and 3.94% in the no VTE prophylaxis group (p = 0.53). As steady-state anti-Xa levels increased from 0.1 units/mL to > 0.3 units/mL, there was a trend toward a lower incidence of VTE. However, no correlation was noted between rising anti-Xa levels and an increased incidence of secondary ICH. When compared, neither enoxaparin nor UFH use was associated with a significantly reduced incidence of VTE or an increased incidence of ICH. Conclusions: In this retrospective study of patients with nontraumatic SAH with an EVD who received enoxaparin or UFH VTE prophylaxis or no VTE prophylaxis, there was no statistically significant difference in the incidence of VTE or secondary ICH. For patients receiving prophylactic enoxaparin, achieving higher steady-state target anti-Xa levels may be associated with a lower incidence of VTE without increasing the risk of secondary ICH.
Pivotal Estimation of Linear Discriminant Analysis in High Dimensions
Fang, E. X., Mei, Y., Shi, Y., Xu, Q., & Zhao, T. (2023). Journal of Machine Learning Research, 24.
Abstract: We consider the linear discriminant analysis problem in high-dimensional settings. In this work, we propose PANDA (PivotAl liNear Discriminant Analysis), a tuning-insensitive method in the sense that it requires very little effort to tune the parameters. Moreover, we prove that PANDA achieves the optimal convergence rate in terms of both the estimation error and the misclassification rate. Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. In comparison with existing methods, we observe that our proposed PANDA yields equal or better performance and requires substantially less effort in parameter tuning.
Precision in the specification of ordinary differential equations and parameter estimation in modeling biological processes
Abstract: In recent years, the use of differential equations to describe the dynamics of within-host viral infections, most frequently HIV-1 or Hepatitis B or C dynamics, has become quite common. The pioneering work described in [1,2,3,4] provided estimates of both the HIV-1 viral clearance rate, c, and the infected cell turnover rate, δ, and revealed that while it often takes years for HIV-1 infection to progress to AIDS, the virus is replicating rapidly and continuously throughout these years of apparent latent infection. In addition, at least two compartments of viral-producing cells that decay at different rates were identified. Estimates of infected cell decay and viral clearance rates dramatically changed the understanding of HIV replication, etiology, and pathogenesis. Since that time, models of this type have been used extensively to describe and predict both in vivo viral and/or immune system dynamics and the transmission of HIV throughout a population. However, there are both mathematical and statistical challenges associated with models of this type, and the goal of this chapter is to describe some of these as well as offer possible solutions or options. In particular, statistical aspects associated with parameter estimation, model comparison, and study design will be described. Although the models developed by Perelson et al. [3,4] are relatively simple and were developed nearly 20 years ago, these models will be used in this chapter to demonstrate concepts in a relatively simple setting. In the first section, a statistical approach for model comparison is described using the model developed in [4] as the null hypothesis model for formal statistical comparison to an alternative model. In the next section, the concept of the mathematical sensitivity matrix and its relationship to the Fisher information matrix (FIM) will be described and used to demonstrate how to evaluate parameter identifiability in ordinary differential equation (ODE) models. The next section demonstrates how to determine what types of additional data are required to address the problem of nonidentifiable parameters in ODE models. Examples are provided to demonstrate these concepts. The chapter ends with some recommendations.
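The sensitivity-matrix/FIM machinery described above can be demonstrated on a toy two-compartment viral decay model; the model, parameter values, and finite-difference scheme below are illustrative assumptions, not the chapter's exact examples.

```python
import numpy as np
from scipy.integrate import solve_ivp

def viral_model(t, y, p, c, delta):
    """Simplified Perelson-type model: infected cells I decay at rate delta;
    virus V is produced by I at rate p and cleared at rate c."""
    I, V = y
    return [-delta * I, p * I - c * V]

def observe(theta, times, y0=(1.0, 10.0)):
    sol = solve_ivp(viral_model, (0, times[-1]), y0, t_eval=times, args=tuple(theta))
    return np.log(sol.y[1])              # log viral load is what is measured

theta0 = np.array([5.0, 3.0, 0.5])       # (p, c, delta), illustrative values
times = np.linspace(0.1, 10, 25)

# sensitivity matrix by central finite differences
S = np.zeros((len(times), len(theta0)))
for j in range(len(theta0)):
    h = 1e-4 * theta0[j]
    tp, tm = theta0.copy(), theta0.copy()
    tp[j] += h; tm[j] -= h
    S[:, j] = (observe(tp, times) - observe(tm, times)) / (2 * h)

FIM = S.T @ S                             # up to the noise-variance factor
print(np.linalg.eigvalsh(FIM))            # near-zero eigenvalue => nonidentifiable direction
```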
Predicting the rheology of limestone calcined clay cements (LC3): Linking composition and hydration kinetics to yield stress through Machine Learning
Canbek, O., Xu, Q., Mei, Y., Washburn, N. R., & Kurtis, K. E. (2022). Cement and Concrete Research, 160.
Abstract: The physicochemical characteristics of calcined clay influence the yield stress of limestone calcined clay cements (LC3), but the independent influences of the clay's physical and chemical characteristics, as well as the effects of other variables on LC3 rheology, are less well understood. Further, a relationship between LC3 hydration kinetics and yield stress – important for informing mixture design – has not yet been established. Here, rheological properties were determined in pastes with varying water-to-solid ratio (w/s), constituent mass ratios (PC:metakaolin:limestone), limestone particle size, and gypsum content. From these data, an ML model was developed that allowed independent examination of the different mechanisms by which metakaolin fraction influences the yield stress of LC3, identifying four predictors – packing index, Al2O3/SO3, total particle density, and metakaolin fraction relative to limestone (MK/LS) – as most significant for predicting LC3 yield stress. A methodology based on kernel smoothing also identified the hydration kinetics parameters best correlated with yield stress.
Private Sequential Hypothesis Testing for Statisticians: Privacy, Error Rates, and Sample Size
Zhang, W., Mei, Y., & Cummings, R. (2022). Proceedings of Machine Learning Research, 151, 11356-11373.
Abstract: The sequential hypothesis testing problem is a class of statistical analyses where the sample size is not fixed in advance. Instead, the decision process takes in new observations sequentially to make real-time decisions for testing an alternative hypothesis against a null hypothesis until some stopping criterion is satisfied. In many common applications of sequential hypothesis testing, the data can be highly sensitive and may require privacy protection; for example, sequential hypothesis testing is used in clinical trials, where doctors sequentially collect data from patients and must determine when to stop recruiting patients and whether the treatment is effective. The field of differential privacy has been developed to offer data analysis tools with strong privacy guarantees, and has been commonly applied to machine learning and statistical tasks. In this work, we study the sequential hypothesis testing problem under a slight variant of differential privacy known as Rényi differential privacy. We present a new private algorithm based on Wald's Sequential Probability Ratio Test (SPRT) that also gives strong theoretical privacy guarantees. We provide theoretical analysis of statistical performance measured by Type I and Type II error as well as the expected sample size. We also empirically validate our theoretical results on several synthetic databases, showing that our algorithms also perform well in practice. Unlike previous work in private hypothesis testing that focused only on the classical fixed-sample setting, our results in the sequential setting allow a conclusion to be reached much earlier, thereby saving the cost of collecting additional samples.
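A minimal sketch of a noisy SPRT in the spirit of the paper, assuming Gaussian observations and Laplace noise added to the running log-likelihood ratio at each step; the paper's actual mechanism and its Rényi-DP accounting are more refined than this stand-in.

```python
import numpy as np

def private_sprt(xs, mu0, mu1, a, b, eps=1.0, sigma=1.0, rng=None):
    """Wald's SPRT on Gaussian data; each boundary check uses a noisy copy
    of the log-likelihood ratio (illustrative privatization, not the
    paper's exact mechanism or privacy accounting)."""
    rng = rng or np.random.default_rng()
    llr = 0.0
    for n, x in enumerate(xs, 1):
        llr += ((mu1 - mu0) * x + (mu0**2 - mu1**2) / 2) / sigma**2
        noisy = llr + rng.laplace(scale=1.0 / eps)   # per-step noise injection
        if noisy >= b:
            return "reject H0", n
        if noisy <= a:
            return "accept H0", n
    return "undecided", len(xs)

rng = np.random.default_rng(4)
data = rng.normal(0.5, 1.0, size=1000)               # truth matches mu1
print(private_sprt(data, mu0=0.0, mu1=0.5, a=-4.6, b=4.6, rng=rng))
```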
Quantitation of lymphatic transport mechanism and barrier influences on lymph node-resident leukocyte access to lymph-borne macromolecules and drug delivery systems
Archer, P. A., Sestito, L. F., Manspeaker, M. P., O’Melia, M. J., Rohner, N. A., Schudel, A., Mei, Y., & Thomas, S. N. (2021). Drug Delivery and Translational Research, 11(6), 2328-2343.
Abstract: Lymph nodes (LNs) are tissues of the immune system that house leukocytes, making them targets of interest for a variety of therapeutic immunomodulation applications. However, achieving accumulation of a therapeutic in the LN does not guarantee equal access to all leukocyte subsets. LNs are structured to enable sampling of lymph draining from peripheral tissues in a highly spatiotemporally regulated fashion in order to facilitate optimal adaptive immune responses. This structure results in restricted nanoscale drug delivery carrier access to specific leukocyte targets within the LN parenchyma. Herein, a framework is presented to assess the manner in which lymph-derived macromolecules and particles are sampled in the LN to reveal new insights into how therapeutic strategies or drug delivery systems may be designed to improve access to dLN-resident leukocytes. This summary analysis of previous reports from our group assesses model nanoscale fluorescent tracer association with various leukocyte populations across relevant time periods post administration, studies the effects of the bioactive molecule NO on access of lymph-borne solutes to dLN leukocytes, and illustrates the benefits to leukocyte access afforded by lymphatic-targeted multistage drug delivery systems. Results reveal trends consistent with the consensus view of how lymph is sampled by LN leukocytes resulting from tissue structural barriers that regulate inter-LN transport, and demonstrate how novel, engineered delivery systems may be designed to overcome these barriers to unlock the therapeutic potential of LN-resident cells as drug delivery targets.
Quantization effect on second moment of log-likelihood ratio and its application to decentralized sequential detection
Abstract: It is well known that quantization cannot increase the Kullback-Leibler divergence which can be thought of as the expected value or first moment of the log-likelihood ratio. In this paper, we investigate the quantization effects on the second moment of the log-likelihood ratio. It is shown that quantization may result in an increase in the case of the second moment, but the increase is bounded above by 2/e. The result is then applied to decentralized sequential detection problems to provide a simpler sufficient condition for asymptotic optimality theory, and the technique is also extended to investigate the quantization effects on other higher-order moments of the log-likelihood ratio and provide lower bounds on higher-order moments.
Quantization effect on the log-likelihood ratio and its application to decentralized sequential detection
Wang, Y., & Mei, Y. (2013). IEEE Transactions on Signal Processing, 61(6), 1536-1543.
Abstract: It is well known that quantization cannot increase the Kullback-Leibler divergence which can be thought of as the expected value or first moment of the log-likelihood ratio. In this paper, we investigate the quantization effects on the second moment of the log-likelihood ratio. It is shown via the convex domination technique that quantization may result in an increase in the case of the second moment, but the increase is bounded above by 2/e. The result is then applied to decentralized sequential detection problems not only to provide simpler sufficient conditions for asymptotic optimality theories in the simplest models, but also to shed new light on more complicated models. In addition, some brief remarks on other higher-order moments of the log-likelihood ratio are also provided.
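Reading the abstract's bound as an additive one, the snippet below compares, under the post-change Gaussian model, the second moment of the raw log-likelihood ratio with that of a binary-quantized message across thresholds; the model and threshold grid are illustrative.

```python
import numpy as np
from scipy.stats import norm

mu = 0.5                                    # H0: N(0,1) vs H1: N(mu,1)

# second moment of the raw LLR under H1, in closed form:
# LLR(X) = mu*X - mu^2/2 with X ~ N(mu,1)  =>  E[LLR^2] = mu^2 + mu^4/4
raw_m2 = mu**2 + mu**4 / 4

def quantized_m2(tau):
    """Second moment (under H1) of the LLR of the binary message 1{X > tau}."""
    q0, q1 = norm.sf(tau), norm.sf(tau - mu)
    lo, hi = np.log((1 - q1) / (1 - q0)), np.log(q1 / q0)
    return q1 * hi**2 + (1 - q1) * lo**2

taus = np.linspace(-3, 5, 400)
diffs = np.array([quantized_m2(t) for t in taus]) - raw_m2
print(diffs.max(), "<= 2/e =", 2 / np.e)    # any increase stays below 2/e
```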
Quickest Change Detection and Kullback-Leibler Divergence for Two-State Hidden Markov Models
Fuh, C. D., & Mei, Y. (2015). IEEE Transactions on Signal Processing, 63(18), 4866-4878.
Abstract: In this paper, the quickest change detection problem is studied in two-state hidden Markov models (HMM), where the vector parameter θ of the HMM changes from θ0 to θ1 at some unknown time, and one wants to detect the true change as quickly as possible while controlling the false alarm rate. It turns out that the generalized likelihood ratio (GLR) scheme, while theoretically straightforward, is generally computationally infeasible for the HMM. To develop efficient but computationally simple schemes for the HMM, we first discuss a subtlety in the recursive form of the GLR scheme for the HMM. Then we show that the recursive CUSUM scheme proposed in Fuh (Ann. Statist., 2003) can be regarded as a quasi-GLR scheme for pseudo post-change hypotheses with a certain dependence structure between pre- and post-change observations. Next, we extend the quasi-GLR idea to propose recursive score schemes in the scenario when the post-change parameter θ1 of the HMM involves a real-valued nuisance parameter. Finally, the Kullback-Leibler (KL) divergence plays an essential role in the quickest change detection problem and many other fields; however, it is rather challenging to compute numerically in HMMs. Here we develop a non-Monte Carlo method that computes the KL divergence of two-state HMMs via the underlying invariant probability measure, which is characterized by the Fredholm integral equation. A numerical study demonstrates an unusual property of the KL divergence for HMMs that implies the severe effects of misspecifying the post-change parameter for the HMM.
Quickest change detection and Kullback-Leibler divergence for two-state hidden Markov models
Abstract: The quickest change detection problem is studied in two-state hidden Markov models (HMM), where the vector parameter θ of the HMM may change from θ0 to θ1 at some unknown time, and one wants to detect the true change as quickly as possible while controlling the false alarm rate. It turns out that the generalized likelihood ratio (GLR) scheme, while theoretically straightforward, is generally computationally infeasible for the HMM. To develop efficient but computationally simple schemes for the HMM, we first show that the recursive CUSUM scheme proposed in Fuh (Ann. Statist., 2003) can be regarded as a quasi-GLR scheme for some suitable pseudo post-change hypotheses. Next, we extend the quasi-GLR idea to propose recursive score schemes in a more complicated scenario when the post-change parameter θ1 of the HMM involves a real-valued nuisance parameter. Finally, our research provides an alternative approach that can numerically compute the Kullback-Leibler (KL) divergence of two-state HMMs via the invariant probability measure and the Fredholm integral equation.
Quickest detection in censoring sensor networks
Abstract: The quickest change detection problem is studied in a general context of monitoring a large number of data streams in sensor networks when the trigger event may affect different sensors differently. In particular, the occurring event could have an immediate or delayed impact on some unknown, but not necessarily all, sensors. Motivated by censoring sensor networks, scalable detection schemes are developed based on the sum of those local CUSUM statistics that are large under either hard thresholding or top-r thresholding rules or both. The proposed schemes are shown to possess certain asymptotic optimality properties.
Quickest Detection in High-Dimensional Linear Regression Models via Implicit Regularization
Xu, Q., Yu, Y., & Mei, Y. (2024). pp. 1059-1064.
Abstract: In this paper, we consider the quickest detection problem in high-dimensional streaming data, where the unknown regression coefficients might change at some unknown time. We propose a quickest detection algorithm based on the implicit regularization algorithm via gradient descent, and provide theoretical guarantees on the average run length to false alarm and detection delay. Numerical studies are conducted to validate the theoretical results.
Rapid detection of hot-spot by tensor decomposition with application to weekly gonorrhea data
Zhao, Y., Yan, H., Holte, S. E., Kerani, R. P., & Mei, Y. (2020). pp. 289-310.
Abstract: In many bio-surveillance and healthcare applications, data sources are measured from many spatial locations repeatedly over time, say, daily/weekly/monthly. In these applications, we are typically interested in detecting hot-spots, which are defined as some structured outliers that are sparse over the spatial domain but persistent over time. In this paper, we propose a tensor decomposition method to detect when and where the hot-spots occur. Our proposed methods represent the observed raw data as a three-dimensional tensor including a circular time dimension for daily/weekly/monthly patterns, and then decompose the tensor into three components: smooth global trend, local hot-spots, and residuals. A combination of LASSO and fused LASSO is used to estimate the model parameters, and a CUSUM procedure is applied to detect when and where the hot-spots might occur. The usefulness of our proposed methodology is validated through numerical simulation and a real-world dataset in the weekly number of gonorrhea cases from 2006 to 2018 for 50 states in the United States.
Rapid detection of hot-spots via tensor decomposition with applications to crime rate data
Zhao, Y., Yan, H., Holte, S., & Mei, Y. (2022). Journal of Applied Statistics, 49(7), 1636-1662.
Abstract: In many real-world applications of monitoring multivariate spatio-temporal data that are non-stationary over time, one is often interested in detecting hot-spots with spatial sparsity and temporal consistency, instead of detecting system-wise changes as in the traditional statistical process control (SPC) literature. In this paper, we propose an efficient method to detect hot-spots through tensor decomposition, and our method has three steps. First, we fit the observed data into a Smooth Sparse Decomposition Tensor (SSD-Tensor) model that serves as a dimension reduction and de-noising technique: it is an additive model decomposing the original data into a smooth but non-stationary global mean, sparse local anomalies, and random noises. Next, we estimate model parameters by a penalized framework that includes Least Absolute Shrinkage and Selection Operator (LASSO) and fused LASSO penalties. An efficient recursive optimization algorithm is developed based on the Fast Iterative Shrinkage Thresholding Algorithm (FISTA). Finally, we apply a Cumulative Sum (CUSUM) control chart to monitor model residuals after removing global means, which helps to detect when and where hot-spots occur. To demonstrate the usefulness of our proposed SSD-Tensor method, we compare it with several other methods, including scan statistics and LASSO-based, PCA-based, and T2-based control charts, in extensive numerical simulation studies and on a real crime rate dataset.
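A toy version of the three-step pipeline, assuming the smooth global trend is simply the cross-location weekly mean rather than the paper's penalized (LASSO + fused LASSO) fit: soft thresholding yields the sparse local-anomaly layer, and a per-location CUSUM flags when and where a hot-spot emerges. All constants are illustrative.

```python
import numpy as np

def detect_hotspots(Y, lam=1.0, h=8.0, k=0.5):
    """Y: (locations x time) array. Returns sparse anomalies and CUSUM alarms."""
    trend = Y.mean(axis=0, keepdims=True)    # smooth global trend shared across locations
    resid = Y - trend
    # sparse local anomalies via soft thresholding
    anomalies = np.sign(resid) * np.maximum(np.abs(resid) - lam, 0.0)
    # one-sided CUSUM per location on the residuals
    alarms = []
    for loc in range(Y.shape[0]):
        s = 0.0
        for t, r in enumerate(resid[loc]):
            s = max(0.0, s + r - k)
            if s > h:
                alarms.append((loc, t))      # (where, when)
                break
    return anomalies, alarms

rng = np.random.default_rng(6)
Y = rng.normal(size=(50, 104))               # 50 locations, 2 years of weekly data
Y[7, 60:] += 2.0                             # hot-spot at location 7 from week 60
print(detect_hotspots(Y)[1])
```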
Repetitive Low-level Blast Exposure and Neurocognitive Effects in Army Ranger Mortarmen
Woodall, J. L., Sak, J. A., Cowdrick, K. R., Muñoz, B. M., McElrath, J. H., Trimpe, G. R., Mei, Y., Myhre, R. L., Rains, J. K., & Hutchinson, C. R. (2023). Military Medicine, 188(3-4), E771-E779.
Abstract: Introduction: Occupational exposure to repetitive, low-level blasts in military training and combat has been tied to subconcussive injury and poor health outcomes for service members. Most low-level blast studies to date have focused on explosive breaching and firing heavy weapon systems; however, there is limited research on the repetitive blast exposure and physiological effects that mortarmen experience when firing mortar weapon systems. Motivated by anecdotal symptoms of mortarmen, the purpose of this paper is to characterize this exposure and its resulting neurocognitive effects in order to provide preliminary findings and actionable recommendations to safeguard the health of mortarmen. Materials and Methods: In collaboration with the U.S. Army Rangers at Fort Benning, blast exposure, symptoms, and pupillary light reflex were measured during 3 days of firing 81 mm and 120 mm mortars in training. Blast exposure analysis included the examination of the blast overpressure (BOP) and cumulative exposure by mortarman position, as well as comparison to the 4 psi safety threshold. Pupillary light reflex responses were analyzed with linear mixed effects modeling. All neurocognitive results were compared between mortarmen (n = 11) and controls (n = 4) and cross-compared with blast exposure and blast history. Results: Nearly 500 rounds were fired during the study, resulting in a high cumulative blast exposure for all mortarmen. While two mortarmen had average BOPs exceeding the 4 psi safety limit (Fig. 2), there was a high prevalence of mTBI-like symptoms among all mortarmen, with over 70% experiencing headaches, ringing in the ears, forgetfulness/poor memory, and taking longer to think during the training week (n ≥ 8/11). Mortarmen also had smaller and slower pupillary light reflex responses relative to controls, with significantly slower dilation velocity (P…
Robust change detection for large-scale data streams
Zhang, R., Mei, Y., & Shi, J. (2022). Sequential Analysis, 41(1), 1-19.
Abstract: Robust change point detection for large-scale data streams has many real-world applications in industrial quality control, signal detection, and biosurveillance. Unfortunately, it is highly nontrivial to develop efficient schemes due to three challenges: (1) the unknown sparse subset of affected data streams, (2) the unexpected outliers, and (3) computational scalability for real-time monitoring and detection. In this article, we develop a family of efficient real-time robust detection schemes for monitoring large-scale independent data streams. For each data stream, we propose to construct a new local robust CUSUM (cumulative sum) statistic that can reduce the effect of outliers by using the Box-Cox transformation of the likelihood function. Then the global scheme will raise an alarm based upon the sum of the shrinkage transformation of these local robust CUSUM statistics, in order to filter out unaffected data streams. In addition, we propose a new concept, called the false alarm breakdown point, to measure the robustness of online monitoring schemes, and we propose a worst-case detection efficiency score to measure the detection efficiency when the data contain outliers. We then characterize the breakdown point and the efficiency score of our proposed schemes. Asymptotic analysis and numerical simulations are conducted to illustrate the robustness and efficiency of our proposed schemes.
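A sketch of the outlier-damping idea, assuming a Gaussian mean-shift model: the CUSUM increment is a Box-Cox transform of the likelihood ratio, which for a negative transform parameter is bounded above, so a single gross outlier contributes only a bounded jump. The transform parameter and limits are illustrative stand-ins for the paper's construction.

```python
import numpy as np

def robust_cusum(x, mu1=1.0, alpha=-0.5, k=0.0, h=5.0):
    """CUSUM whose per-sample increment is a Box-Cox transform of the
    likelihood ratio rather than its logarithm; for alpha < 0 the
    increment is bounded above by -1/alpha, which tames outliers."""
    s = 0.0
    for t, xi in enumerate(x):
        llr = mu1 * xi - mu1**2 / 2                   # Gaussian log-LR
        inc = (np.exp(alpha * llr) - 1.0) / alpha     # Box-Cox transform of the LR
        s = max(0.0, s + inc - k)
        if s > h:
            return t                                   # alarm time
    return None

rng = np.random.default_rng(7)
changed = np.concatenate([rng.normal(size=300), rng.normal(1.0, 1.0, size=50)])
outlier_only = rng.normal(size=350)
outlier_only[100] = 40.0                               # one gross outlier
print(robust_cusum(changed))       # typically alarms soon after t = 300
print(robust_cusum(outlier_only))  # bounded jump: typically no alarm
```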