Assistant Professor, Biostatistics
Dr. Hai Shu is an Assistant Professor in the Department of Biostatistics at New York University. He earned a Ph.D. in Biostatistics from the University of Michigan and a B.S. in Information and Computational Science from Harbin Institute of Technology in China.
His research interests include high-dimensional data analysis (especially data integration), machine and deep learning, and medical image analysis (e.g., PET, MRI, mammography), with applications to Alzheimer’s disease, brain tumors, and breast cancer, among others. He has published papers on these topics in top-tier journals and conferences, such as The Annals of Statistics, Journal of the American Statistical Association, Biometrics, and the AAAI Conference on Artificial Intelligence. He has also served as a reviewer on related topics for venues including Journal of the American Statistical Association, Statistica Sinica, and the International Joint Conference on Artificial Intelligence.
Prior to joining NYU, Dr. Hai Shu was a Postdoctoral Fellow in the Department of Biostatistics at The University of Texas MD Anderson Cancer Center.
View Dr. Hai Shu's website at https://wp.nyu.edu/haishu
Postdoctoral Fellow, Department of Biostatistics, The University of Texas MD Anderson Cancer Center, USA
Ph.D. in Biostatistics, Department of Biostatistics, University of Michigan, Ann Arbor, USA
M.S. in Biostatistics, Department of Biostatistics, University of Michigan, Ann Arbor, USA
B.S. in Information and Computational Science, Department of Mathematics, Harbin Institute of Technology (哈尔滨工业大学), China
Alzheimer’s disease
Brain tumors
Breast cancer
Deep learning
High-dimensional data analysis/integration
Machine learning
Medical image analysis
Spatial/temporal data analysis
D-CCA: A Decomposition-Based Canonical Correlation Analysis for High-Dimensional Datasets
Assessment of network module identification across complex diseases
Journal title: Nature Methods
Page(s): 843-852
Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the ‘Disease Module Identification DREAM Challenge’, an open competition to comprehensively assess module identification methods across diverse protein–protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.
Estimation of large covariance and precision matrices from temporally dependent observations
Shu, H., & Nan, B.
Journal title: Annals of Statistics
Page(s): 1321-1350
We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range so with longer memory than those considered in the current literature. We show that several commonly used methods for independent observations can be applied to the temporally dependent data. In particular, the rates of convergence are obtained for the generalized thresholding estimation of covariance and correlation matrices, and for the constrained ℓ1 minimization and the ℓ1 penalized likelihood estimation of precision matrix. Properties of sparsistency and sign-consistency are also established. A gap-block cross-validation method is proposed for the tuning parameter selection, which performs well in simulations. As a motivating example, we study the brain functional connectivity using resting-state fMRI time series data with long-range temporal dependence.
Multiple testing for neuroimaging via hidden Markov random field
Shu, H., Nan, B., & Koeppe, R.
Page(s): 741-750
Traditional voxel-level multiple testing procedures in neuroimaging, mostly p-value based, often ignore the spatial correlations among neighboring voxels and thus suffer from substantial loss of power. We extend the local-significance-index based procedure originally developed for the hidden Markov chain models, which aims to minimize the false nondiscovery rate subject to a constraint on the false discovery rate, to three-dimensional neuroimaging data using a hidden Markov random field model. A generalized expectation-maximization algorithm for maximizing the penalized likelihood is proposed for estimating the model parameters. Extensive simulations show that the proposed approach is more powerful than conventional false discovery rate procedures. We apply the method to the comparison between mild cognitive impairment, a disease status with increased risk of developing Alzheimer's or another dementia, and normal controls in the FDG-PET imaging study of the Alzheimer's Disease Neuroimaging Initiative.