Yang Feng

Professor of Biostatistics
Professional overview
Yang Feng is a Professor of Biostatistics and Director of the Biostatistics Ph.D. Program in the School of Global Public Health, and an affiliated faculty member in the Center for Data Science, at New York University. He received his Ph.D. in Operations Research from Princeton University in 2010.
Feng's research interests span the theoretical and methodological aspects of machine learning, high-dimensional statistics, social network models, and nonparametric statistics, with practical applications in areas such as Alzheimer's disease, cancer classification, and electronic health records. His research has been funded by multiple grants from the National Institutes of Health (NIH) and the National Science Foundation (NSF), including an NSF CAREER Award.
He currently serves as an Associate Editor for the Journal of the American Statistical Association (JASA), the Journal of Business & Economic Statistics (JBES), the Journal of Computational and Graphical Statistics (JCGS), and the Annals of Applied Statistics (AoAS). He is a Fellow of the American Statistical Association (ASA) and of the Institute of Mathematical Statistics (IMS), and an elected member of the International Statistical Institute (ISI).
Please visit Dr. Yang Feng's website and Google Scholar page for more information.
Education
B.S. in Mathematics, University of Science and Technology of China, Hefei, China
Ph.D. in Operations Research, Princeton University, Princeton, NJ
Areas of research and study
Bioinformatics
Biostatistics
High-dimensional data analysis/integration
Machine learning
Modeling Social and Behavioral Dynamics
Nonparametric statistics
Publications
Binary switch portfolio
How Many Communities Are There?
JDINAC: Joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data
Post selection shrinkage estimation for high-dimensional data analysis
Rejoinder to ‘Post-selection shrinkage estimation for high-dimensional data analysis’
A survey on Neyman-Pearson classification and suggestions for future research
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification
Neyman-Pearson classification under high-dimensional settings
Tuning-parameter selection in regularized estimations of large covariance matrices
Variable selection and prediction with incomplete high-dimensional data
Functional and Parametric Estimation in a Semi- and Nonparametric Model with Application to Mass-Spectrometry Data
APPLE: Approximate path for penalized likelihood estimators
Modified Cross-Validation for Penalized High-Dimensional Linear Regression Models
Regularized principal components of heritability
A road to classification in high dimensional space: The regularized optimal affine discriminant
Nonparametric independence screening in sparse ultra-high-dimensional additive models
Nonparametric estimation of genewise variance for microarray data
The Microarray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models
Shi, L., Campbell, G., Jones, W. D., Campagne, F., Wen, Z., Walker, S. J., Su, Z., Chu, T. M., Goodsaid, F. M., Pusztai, L., Shaughnessy, J. D., Oberthuer, A., Thomas, R. S., Paules, R. S., Fielden, M., Barlogie, B., Chen, W., Du, P., Fischer, M., … Wolfinger, R. D. (2010). Nature Biotechnology, 28(8), 827-838.
Abstract: Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
Local quasi-likelihood with a parametric guide
Network exploration via the adaptive LASSO and SCAD penalties