Yajun Mei
Professor of Biostatistics
-
Professional overview
-
Yajun Mei has been a Professor of Biostatistics at NYU/GPH since July 1, 2024. He received the B.S. degree in Mathematics from Peking University, Beijing, China, in 1996, and the Ph.D. degree in Mathematics with a minor in Electrical Engineering from the California Institute of Technology, Pasadena, CA, USA, in 2003. He was a postdoctoral fellow in Biostatistics at the renowned Fred Hutch Cancer Center in Seattle, WA, from 2003 to 2005. Prior to joining NYU, Dr. Mei spent 18 years, from 2006 to 2024, as an Assistant, Associate, and then Full Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology, Atlanta, GA, and had been a co-director of Biostatistics, Epidemiology, and Research Design (BERD) of the Georgia CTSA since 2018.
Dr. Mei’s research interests are statistics, machine learning, and data science, and their applications in biomedical science and public health, particularly streaming data analysis, sequential decision/design, change-point problems, precision/personalized medicine, hot-spot detection for infectious diseases, longitudinal data analysis, bioinformatics, and clinical trials. His work has received several recognitions, including the Abraham Wald Prize in Sequential Analysis in both 2009 and 2024, the NSF CAREER Award in 2010, election as a Fellow of the American Statistical Association (ASA) in 2023, and multiple best paper awards.
-
Education
-
BS, Mathematics, Peking University
PhD, Mathematics, California Institute of Technology
-
Honors and awards
-
Fellow of American Statistical Association (2023)
Star Research Achievement Award, 2021 Virtual Critical Care Congress (2021)
Best Paper Competition Award, Quality, Statistics & Reliability of INFORMS (2020)
Bronze Snapshot Award, Society of Critical Care Medicine (2019)
NSF CAREER Award
Thank a Teacher Certificate, Center for Teaching and Learning (2011, 2012, 2016, 2020, 2021, 2022, 2023)
Abraham Wald Prize (2009)
Best Paper Award, 11th International Conference on Information Fusion (2008)
New Researcher Fellow, Statistical and Applied Mathematical Sciences Institute (2005)
Fred Hutchinson SPAC Travel Award to attend 2005 Joint Statistical Meetings, Minneapolis, MN (2005)
Travel Award to 8th New Researchers Conference, Minneapolis, MN (2005)
Travel Award to IEEE International Symposium on Information Theory, Chicago, IL (2004)
Travel Award to IPAM workshop on inverse problem, UCLA, Los Angeles, CA (2003)
Fred Hutchinson SPAC Course Scholarship (2003)
Travel Award to the SAMSI workshop on inverse problem, Research Triangle Park, NC (2002)
-
Publications
A comparison of fusion policies for local detection statistics in federated monitoring
Zhang, X., Alexopoulos, C., & Mei, Y.
Publication year: 2026
Journal title: Sequential Analysis: Design Methods and Applications

Evaluating Time-space methodologies to detect clusters of HIV transmission: a comparison of advanced methods in Washington State, 2010-2022
Erly, S., Yan, H., Kerani, R., Mei, Y., & Holte, S.
Publication year: 2026
Journal title: Journal of Acquired Immune Deficiency Syndromes (JAIDS)

Is Grouping Always Detrimental to Monitoring Multinomial Data?
Li, J., & Mei, Y.
Publication year: 2026
Journal title: Technometrics

Quickest detection under weighted sampling
Zhang, X., & Mei, Y.
Publication year: 2026

The average run length to false alarm of a differentially private CUSUM algorithm
Mei, Y., & Yakir, B.
Publication year: 2026

Comprehensive Anatomical Staging Predicts Clinical Progression in Mild Cognitive Impairment: A Data-Driven Approach
Tandon, R., Mei, Y., Lah, J. J., & Mitchell, C. S.
Publication year: 2025
Journal title: International Journal of Molecular Sciences

Exploring the correlation between corrective glucose treatment and long-term patient outcomes: a SHINE secondary analysis
Horton, P., Patel, V., Hall, C. L., Johnston, K. C., Mei, Y., & Sadan, O.
Publication year: 2025
Journal title: Frontiers in Neurology

Impact of compensation coefficients on active sequential change point detection
Xu, Q., Mei, Y., & Shi, J.
Publication year: 2025
Journal title: Sequential Analysis
Volume: 44
Issue: 2
Page(s): 153-177
Abstract: Under a general setting of active sequential change point detection problems, there are p local streams in a system but we are only able to take observations from q out of these p local streams at each time instant owing to the sampling control constraint. At some unknown change time, an undesired event occurs to the system and changes the local distributions from f to g for a subset of s unknown local streams. The objective is to determine how to adaptively sample local streams and decide when to raise a global alarm, so that we can detect the correct change as quickly as possible subject to the false alarm constraint. One efficient algorithm is the TRAS algorithm proposed in Liu et al. (2015), which incorporates an idea of compensation coefficients for unobserved data streams. However, it is unclear how to choose the compensation coefficients suitably from a theoretical point of view to balance the trade-off between the detection delay and false alarm. In this article, we investigate the impact of compensation coefficients on the TRAS algorithm. Our main contributions are twofold. On the one hand, under the general setting, we prove that if the compensation coefficient is larger than (Formula presented.) where I(f, g) is the Kullback-Leibler divergence, then the TRAS algorithm is suboptimal in the sense of having too large detection delays. On the other hand, under the special case of (Formula presented.) if the compensation coefficient is small enough, then the TRAS algorithm is efficient to detect when the change occurs at time ν = 0. Though it remains an open problem to develop general asymptotic optimality theorems, our results shed light on how to tune compensation coefficients suitably in real-world applications, and extensive numerical studies are conducted to validate our results.

Optimal change detection in multi-armed bandit
Mei, Y., & Yakir, B.
Publication year: 2025

Precise False Alarm Rate of the SUM-CUSUM Scheme for High-Dimensional Streaming Data
Mei, Y., & Yakir, B.
Publication year: 2025

Predicting confirmed cases of various epidemics using global temporal-feature-based graph convolutional network
Kang, J., Kim, J. S., Park, H. J., Lee, S., Han, Y. J., Mei, Y., & Han, S. W.
Publication year: 2025
Journal title: Knowledge-Based Systems

Rollout designs for lump-sum data
Xu, Q., Tian, H., Sarkar, A., & Mei, Y.
Publication year: 2025
Journal title: Journal of Applied Statistics
Volume: 52
Issue: 9
Page(s): 1777-1790
Abstract: This work studies rollout design problems with a focus on suitable choices of rollout rate under the standard Type I and Type II error probabilities control framework. The main challenge of rollout design is that data is often observed in a lump-sum manner from a spatio-temporal point of view: (1) temporally, only the sum of data in a given sliding time window can be observed; (2) spatially, there are two subgroups for the data at each time step: control and treatment, but one can only observe the total values instead of individual values from each subgroup. We develop rollout tests of lump-sum data under both fixed-sample-size and sequential settings, subject to the constraints on Type I and Type II error probabilities. Numerical studies are conducted to validate our theoretical results.

Beyond Point Prediction: Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process
Li, Z., Xu, Q., Xu, Z., Mei, Y., Zhao, T., & Zha, H.
Publication year: 2024
Journal title: Proceedings of Machine Learning Research
Volume: 235
Page(s): 29096-29111
Abstract: Spatio-temporal point processes (STPPs) are potent mathematical tools for modeling and predicting events with both temporal and spatial features. Despite their versatility, most existing methods for learning STPPs either assume a restricted form of the spatio-temporal distribution, or suffer from inaccurate approximations of the intractable integral in the likelihood training objective. These issues typically arise from the normalization term of the probability density function. Moreover, existing works only provide point prediction for events without quantifying their uncertainty, such as confidence intervals for the event's arrival time and confidence regions for the event's location, which is crucial given the considerable randomness of the data. To tackle these challenges, we introduce SMASH: a Score MAtching-based pSeudolikeliHood estimator for learning marked STPPs. Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching and predicts confidence intervals/regions for event time and location by generating samples through a score-based sampling algorithm. The superior performance of our proposed framework is demonstrated through extensive experiments on both point and confidence interval/region prediction of events.

Cost-efficient fixed-width confidence intervals for the difference of two Bernoulli proportions
Erazo, I., Goldsman, D., & Mei, Y.
Publication year: 2024
Journal title: Journal of Simulation
Volume: 18
Issue: 5
Page(s): 726-744
Abstract: We study the properties of confidence intervals (CIs) for the difference of two Bernoulli distributions’ success parameters, (Formula presented.), in the case where the goal is to obtain a CI of a given half-width while minimising sampling costs when the observation costs may be different between the two distributions. We propose three different methods for constructing fixed-width CIs: (i) a two-stage sampling procedure, (ii) a sequential method that carries out sampling in batches, and (iii) an (Formula presented.) -stage “look-ahead” procedure. Under diverse scenarios, our proposed algorithms obtain significant cost savings versus their baseline counterparts. Furthermore, for the scenarios under study, our sequential-batches and (Formula presented.) -stage “look-ahead” procedures approximately obtain the nominal coverage while meeting the desired width requirement. Our sequential-batching method is more efficient than the “look-ahead” method computationally, with average running times an order-of-magnitude faster over the scenarios tested. We illustrate our procedures on a case study comparing generic and brand-name drugs.

Directional false discovery rate control in large-scale multiple comparisons
Liang, W., Xiang, D., Mei, Y., & Li, W.
Publication year: 2024
Journal title: Journal of Applied Statistics
Volume: 51
Issue: 15
Page(s): 3195-3214
Abstract: The advance of high-throughput biomedical technology makes it possible to access massive measurements of gene expression levels. An important statistical issue is identifying both under-expressed and over-expressed genes for a disease. Most existing multiple-testing procedures focus on selecting only the non-null or significant genes without further identifying their expression type. Only limited methods are designed for the directional problem, and yet they fail to separately control the numbers of falsely discovered over-expressed and under-expressed genes with only a unified index combining all the false discoveries. In this paper, based on a three-classification multiple testing framework, we propose a practical data-driven procedure to control separately the two directions of false discoveries. The proposed procedure is theoretically valid and optimal in the sense that it maximizes the expected number of true discoveries while controlling the false discovery rates for under-expressed and over-expressed genes simultaneously. The procedure allows different nominal levels for the two directions, exhibiting high flexibility in practice. Extensive numerical results and analysis of two large-scale genomic datasets show the effectiveness of our procedure.

Jugular Venous Catheterization is Not Associated with Increased Complications in Patients with Aneurysmal Subarachnoid Hemorrhage
Akbik, F., Shi, Y., Philips, S., Pimentel-Farias, C., Grossberg, J. A., Howard, B. M., Tong, F., Cawley, C. M., Samuels, O. B., Mei, Y., & Sadan, O.
Publication year: 2024
Journal title: Neurocritical Care
Abstract: Background: Classic teaching in neurocritical care is to avoid jugular access for central venous catheterization (CVC) because of a presumed risk of increasing intracranial pressure (ICP). Limited data exist to test this hypothesis. Aneurysmal subarachnoid hemorrhage (aSAH) leads to diffuse cerebral edema and often requires external ventricular drains (EVDs), which provide direct ICP measurements. Here, we test whether CVC access site correlates with ICP measurements and catheter-associated complications in patients with aSAH. Methods: In a single-center retrospective cohort study, patients with aSAH admitted to Emory University Hospital from January 1, 2012, through December 31, 2020, were included. Patients were assigned by the access site of the first CVC placed. The subset of patients with an EVD was further studied. ICP measurements were analyzed using linear mixed effect models, with a binary comparison between internal-jugular (IJ) versus non-IJ access. Results: A total of 1577 patients were admitted during the study period with CVC access: subclavian (SC) (887, 56.2%), IJ (365, 23.1%), femoral (72, 4.6%), and peripheral inserted central catheter (PICC) (253, 16.0%). Traumatic pneumothorax was the most common with SC access (3.0%, p < 0.01). Catheter-associated infections did not differ between sites. Catheter-associated deep venous thrombosis was most common in femoral (8.3%) and PICC (3.6%) access (p < 0.05). A total of 1220 patients had an EVD, which remained open by default, generating 351,462 ICP measurements. ICP measurements, as compared over the first 24 postinsertion hours and the next 10 days, were similar between the two groups. Subgroup analysis accounting for World Federation of Neurological Surgeons grade on presentation yielded similar results. Conclusions: Contrary to classic teaching, we find that IJ CVC placement was not associated with increased ICP in the clinical context of the largest, quantitative data set to date. Further, IJ access was the least likely to be associated with an access-site complication when compared with SC, femoral, and PICC. Together, these data support the safety, and perhaps preference, of ultrasound-guided IJ venous catheterization in neurocritically ill patients.

Monitoring High-Dimensional Streaming Data via Fusing Nonparametric Shiryaev-Roberts Statistics
Zhang, X., & Mei, Y.
Publication year: 2024
Page(s): 1065-1070
Abstract: Monitoring high-dimensional streaming data has a wide range of applications in science, engineering, and industry. In this work, we propose an efficient and robust sequential change-point detection algorithm for monitoring high-dimensional streaming data. It has two components. At the local level, we adopt a window-limited nonparametric Shiryaev-Roberts (WL-NPSR) statistic for detecting potential distribution changes at each dimension of the streaming data. At the global level, we fuse local WL-NPSR statistics together to construct a global monitoring statistic via quantile filtering and sum-shrinkage functions. Theoretical analysis and extensive numerical experiments demonstrate the efficiency and robustness of our proposed algorithm.

Pharmacologic Venous Thromboembolism Prophylaxis in Patients with Nontraumatic Subarachnoid Hemorrhage Requiring an External Ventricular Drain
Ukpabi, C., Sadan, O., Shi, Y., Greene, K. N., Samuels, O., Mathew, S., Joy, J., Mei, Y., & Asbury, W.
Publication year: 2024
Journal title: Neurocritical Care
Volume: 41
Page(s): 779-787
Abstract: Background: Optimal pharmacologic thromboprophylaxis dosing is not well described in patients with subarachnoid hemorrhage (SAH) with an external ventricular drain (EVD). Our patients with SAH with an EVD who receive prophylactic enoxaparin are routinely monitored using timed anti-Xa levels. Our primary study goal was to determine the frequency of venous thromboembolism (VTE) and secondary intracranial hemorrhage (ICH) for this population of patients who received pharmacologic prophylaxis with enoxaparin or unfractionated heparin (UFH). Methods: A retrospective chart review was performed for all patients with SAH admitted to the neurocritical care unit at Emory University Hospital between 2012 and 2017. All patients with SAH who required an EVD were included. Results: Of 1,351 patients screened, 868 required an EVD. Of these 868 patients, 627 received enoxaparin, 114 received UFH, and 127 did not receive pharmacologic prophylaxis. VTE occurred in 7.5% of patients in the enoxaparin group, 4.4% in the UFH group (p = 0.32), and 3.2% in the no VTE prophylaxis group (p = 0.08). Secondary ICH occurred in 3.83% of patients in the enoxaparin group, 3.51% in the UFH group (p = 1), and 3.94% in the no VTE prophylaxis group (p = 0.53). As steady-state anti-Xa levels increased from 0.1 units/mL to > 0.3 units/mL, there was a trend toward a lower incidence of VTE. However, no correlation was noted between rising anti-Xa levels and an increased incidence of secondary ICH. When compared, neither enoxaparin nor UFH use was associated with a significantly reduced incidence of VTE or an increased incidence of ICH. Conclusions: In this retrospective study of patients with nontraumatic SAH with an EVD who received enoxaparin or UFH VTE prophylaxis or no VTE prophylaxis, there was no statistically significant difference in the incidence of VTE or secondary ICH. For patients receiving prophylactic enoxaparin, achieving higher steady-state target anti-Xa levels may be associated with a lower incidence of VTE without increasing the risk of secondary ICH.

Quickest Detection in High-Dimensional Linear Regression Models via Implicit Regularization
Xu, Q., Yu, Y., & Mei, Y.
Publication year: 2024
Page(s): 1059-1064
Abstract: In this paper, we consider the quickest detection problem in high-dimensional streaming data, where the unknown regression coefficients might change at some unknown time. We propose a quickest detection algorithm based on the implicit regularization algorithm via gradient descent, and provide theoretical guarantees on the average run length to false alarm and detection delay. Numerical studies are conducted to validate the theoretical results.

Active learning-based multistage sequential decision-making model with application on common bile duct stone evaluation
Tian, H., Cohen, R. Z., Zhang, C., & Mei, Y.
Publication year: 2023
Journal title: Journal of Applied Statistics
Volume: 50
Issue: 14
Page(s): 2951-2969
Abstract: Multistage sequential decision-making occurs in many real-world applications such as healthcare diagnosis and treatment. One concrete example is when doctors need to decide which kind of information to collect from subjects so as to make a good medical decision cost-effectively. In this paper, an active learning-based method is developed to model the doctors' decision-making process that actively collects necessary information from each subject in a sequential manner. The effectiveness of the proposed model, especially its two-stage version, is validated on both simulation studies and a case study of common bile duct stone evaluation for pediatric patients.

Adaptive resources allocation CUSUM for binomial count data monitoring with application to COVID-19 hotspot detection
Hu, J., Mei, Y., Holte, S., & Yan, H.
Publication year: 2023
Journal title: Journal of Applied Statistics
Volume: 50
Issue: 14
Page(s): 2889-2913
Abstract: In this paper, we present an efficient statistical method (denoted as ‘Adaptive Resources Allocation CUSUM’) to robustly and efficiently detect the hotspot with limited sampling resources. Our main idea is to combine the multi-arm bandit (MAB) and change-point detection methods to balance the exploration and exploitation of resource allocation for hotspot detection. Further, a Bayesian weighted update is used to update the posterior distribution of the infection rate. Then, the upper confidence bound (UCB) is used for resource allocation and planning. Finally, CUSUM monitoring statistics are used to detect the change point as well as the change location. For performance evaluation, we compare the performance of the proposed method with several benchmark methods in the literature and show that the proposed algorithm is able to achieve a lower detection delay and higher detection precision. Finally, this method is applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA) and demonstrates its effectiveness with very limited distributed samples.

Asymptotic optimality theory for active quickest detection with unknown postchange parameters
Xu, Q., & Mei, Y.
Publication year: 2023
Journal title: Sequential Analysis
Volume: 42
Issue: 2
Page(s): 150-181
Abstract: The active quickest detection problem with unknown postchange parameters is studied under the sampling control constraint, where there are p local streams in a system but one is only able to take observations from one and only one of these p local streams at each time instant. The objective is to raise a correct alarm as quickly as possible once the change occurs subject to both false alarm and sampling control constraints. Here we assume that exactly one of the p local streams is affected, and the postchange distribution involves unknown parameters. In this context, we propose an efficient greedy cyclic sampling-based quickest detection algorithm and show that our proposed algorithm is asymptotically optimal in the sense of minimizing the detection delay under both false alarm and sampling control constraints. Numerical studies are conducted to show the effectiveness and applicability of the proposed algorithm.

Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control
Zhang, W., & Mei, Y.
Publication year: 2023
Journal title: Technometrics
Volume: 65
Issue: 1
Page(s): 33-43
Abstract: In many real-world problems of real-time monitoring of high-dimensional streaming data, one wants to detect an undesired event or change quickly once it occurs, but under the sampling control constraint in the sense that one might be able to only observe or use data from selected components for decision-making per time step in resource-constrained environments. In this article, we propose to incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm based on the limiting Bayesian approach to incorporate prior knowledge of potential changes. Our proposed algorithm, termed Thompson-Sampling-Shiryaev-Roberts-Pollak (TSSRP), consists of two policies per time step: the adaptive sampling policy applies the Thompson Sampling algorithm to balance between exploration for acquiring long-term knowledge and exploitation for immediate reward gain, and the statistical decision policy fuses the local Shiryaev–Roberts–Pollak statistics to determine whether to raise a global alarm by sum shrinkage techniques. Extensive numerical simulations and case studies demonstrate the statistical and computational efficiency of our proposed TSSRP algorithm.

CSSQ: a ChIP-seq signal quantifier pipeline
Kumar, A., Hu, M. Y., Mei, Y., & Fan, Y.
Publication year: 2023
Journal title: Frontiers in Cell and Developmental Biology
Volume: 11
Abstract: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) has revolutionized the studies of epigenomes, and the massive increase in ChIP-seq datasets calls for robust and user-friendly computational tools for quantitative ChIP-seq. Quantitative ChIP-seq comparisons have been challenging due to noisiness and variations inherent to ChIP-seq and epigenomes. By employing innovative statistical approaches specially catered to ChIP-seq data distribution and sophisticated simulations along with extensive benchmarking studies, we developed and validated CSSQ as a nimble statistical analysis pipeline capable of differential binding analysis across ChIP-seq datasets with high confidence and sensitivity and low false discovery rate with any defined regions. CSSQ models ChIP-seq data as a finite mixture of Gaussians that faithfully reflects ChIP-seq data distribution. By a combination of Anscombe transformation, k-means clustering, and estimated maximum normalization, CSSQ minimizes noise and bias from experimental variations. Further, CSSQ utilizes a non-parametric approach and incorporates comparisons under the null hypothesis by unaudited column permutation to perform robust statistical tests to account for fewer replicates of ChIP-seq datasets. In sum, we present CSSQ as a powerful statistical computational pipeline tailored for ChIP-seq data quantitation and a timely addition to the tool kits of differential binding analysis to decipher epigenomes.

Editorial to the special issue: modern streaming data analytics
Mei, Y., Bartroff, J., Chen, J., Fellouris, G., & Zhang, R.
Publication year: 2023
Journal title: Journal of Applied Statistics
Volume: 50
Issue: 14
Page(s): 2857-2861