Yang Feng
Professor of Biostatistics
Professional overview
Yang Feng is a Professor and Ph.D. Program Director of Biostatistics in the School of Global Public Health and an affiliate faculty member in the Center for Data Science at New York University. He obtained his Ph.D. in Operations Research from Princeton University in 2010.
Feng's research covers the theoretical and methodological aspects of machine learning, high-dimensional statistics, social network models, and nonparametric statistics, with practical applications to Alzheimer's disease, cancer classification, and electronic health records. His research has been funded by multiple grants from the National Institutes of Health (NIH) and the National Science Foundation (NSF), notably the NSF CAREER Award.
He is currently an Associate Editor for the Journal of the American Statistical Association (JASA), the Journal of Business & Economic Statistics (JBES), the Journal of Computational & Graphical Statistics (JCGS), and the Annals of Applied Statistics (AoAS). His professional recognitions include being named a fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS), as well as an elected member of the International Statistical Institute (ISI).
Please visit Dr. Yang Feng's website and Google Scholar page for more information.
Education
B.S. in Mathematics, University of Science and Technology of China, Hefei, China
Ph.D. in Operations Research, Princeton University, Princeton, NJ
Areas of research and study
Bioinformatics
Biostatistics
High-dimensional data analysis/integration
Machine learning
Modeling Social and Behavioral Dynamics
Nonparametric statistics
Publications
Mediation effect selection in high-dimensional and compositional microbiome data
Zhang, H., Chen, J., Feng, Y., Wang, C., Li, H., & Liu, L. (n.d.).
Publication year: 2021
Journal title: Statistics in Medicine
Volume: 40
Issue: 4
Page(s): 885-896

Model Averaging for Nonlinear Regression Models
Feng, Y., Liu, Q., Yao, Q., & Zhao, G. (n.d.).
Publication year: 2022
Journal title: Journal of Business and Economic Statistics
Volume: 40
Issue: 2
Page(s): 785-798
Abstract: This article considers the problem of model averaging for regression models that can be nonlinear in their parameters and variables. We consider a nonlinear model averaging (NMA) framework and propose a weight-choosing criterion, the nonlinear information criterion (NIC). We show that up to a constant, NIC is an asymptotically unbiased estimator of the risk function under nonlinear settings with some mild assumptions. We also prove the optimality of NIC and show the convergence of the model averaging weights. Monte Carlo experiments reveal that NMA leads to relatively lower risks compared with alternative model selection and model averaging methods in most situations. Finally, we apply the NMA method to predicting the individual wage, where our approach leads to the lowest prediction errors in most cases.

Multi-label Random Subspace Ensemble Classification
Bi, F., Zhu, J., & Feng, Y. (n.d.).
Publication year: 2024
Journal title: Journal of Computational and Graphical Statistics
Abstract: In this work, we develop a new ensemble learning framework, multi-label Random Subspace Ensemble (mRaSE), for multi-label classification. Given a base classifier (e.g., multinomial logistic regression, classification tree, K-nearest neighbors), mRaSE works by first randomly sampling a collection of subspaces, then choosing the best ones that achieve the minimum cross-validation errors and, finally, aggregating the chosen weak learners. In addition to its superior prediction performance, mRaSE also provides a model-free feature ranking depending on the given base classifier. An iterative version of mRaSE is also developed to further improve the performance. A model-free extension is pursued on the iterative version, leading to the so-called Super mRaSE, which accepts a collection of base classifiers as input to the algorithm. We show the proposed algorithms compared favorably with the state-of-the-art classification algorithm including random forest and deep neural network, via extensive simulation studies and two real data applications. The new algorithms are implemented in an updated version of the R package RaSEn.

NCOG-11. ASSOCIATION OF HYPERGLYCEMIA AND TUMOR SUBCLASS ON SURVIVAL IN IDH-WILDTYPE GLIOBLASTOMA
Liu, E., Vasudevaraja, V., Sviderskiy, V., Feng, Y., Tran, I., Serrano, J., Cordova, C., Kurz, S., Golfinos, J., Sulman, E., & others. (n.d.).
Publication year: 2021
Journal title: Neuro-Oncology
Volume: 23
Issue: Suppl 6
Page(s): vi154

Nested Model Averaging on Solution Path for High-dimensional Linear Regression
Feng, Y., & Liu, Q. (n.d.).
Publication year: 2020
Journal title: Stat

Neyman-Pearson classification: parametrics and sample size requirement
Tong, X., Xia, L., Wang, J., & Feng, Y. (n.d.).
Publication year: 2020
Journal title: Journal of Machine Learning Research

Neyman-Pearson Multi-Class Classification via Cost-Sensitive Learning
Tian, Y., & Feng, Y. (n.d.).
Publication year: 2024
Journal title: Journal of the American Statistical Association
Abstract: Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have varying consequences. To address this asymmetry issue, two popular paradigms have been developed: the Neyman-Pearson (NP) paradigm and the cost-sensitive (CS) paradigm. Previous studies on the NP paradigm have primarily focused on the binary case, while the multi-class NP problem poses a greater challenge due to its unknown feasibility. In this work, we tackle the multi-class NP problem by establishing a connection with the CS problem via strong duality and propose two algorithms. We extend the concept of NP oracle inequalities, crucial in binary classifications, to NP oracle properties in the multi-class context. Our algorithms satisfy these NP oracle properties under certain conditions. Furthermore, we develop practical algorithms to assess the feasibility and strong duality in multi-class NP problems, which can offer practitioners the landscape of a multi-class NP problem with various target error levels. Simulations and real data studies validate the effectiveness of our algorithms. To our knowledge, this is the first study to address the multi-class NP problem with theoretical guarantees. The proposed algorithms have been implemented in the R package npcs, which is available on CRAN. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

Neyman-Pearson Multi-class Classification via Cost-sensitive Learning
Tian, Y., & Feng, Y. (n.d.).
Publication year: 2021
Journal title: arXiv preprint arXiv:2111.04597

Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multimarker in the Strong Heart Study
Domingo-Relloso, A., Feng, Y., Rodriguez-Hernandez, Z., Haack, K., Cole, S. A., Navas-Acien, A., Tellez-Plaza, M., & Bermudez, J. D. (n.d.).
Publication year: 2024
Journal title: American Journal of Epidemiology
Volume: 193
Issue: 7
Page(s): 1010-1018
Abstract: The statistical analysis of omics data poses a great computational challenge given their ultra–high-dimensional nature and frequent between-features correlation. In this work, we extended the iterative sure independence screening (ISIS) algorithm by pairing ISIS with elastic-net (Enet) and 2 versions of adaptive elastic-net (adaptive elastic-net (AEnet) and multistep adaptive elastic-net (MSAEnet)) to efficiently improve feature selection and effect estimation in omics research. We subsequently used genome-wide human blood DNA methylation data from American Indian participants in the Strong Heart Study (n = 2235 participants; measured in 1989-1991) to compare the performance (predictive accuracy, coefficient estimation, and computational efficiency) of ISIS-paired regularization methods with that of a bayesian shrinkage and traditional linear regression to identify an epigenomic multimarker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the enriched pathways in common for at least 2 of all the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, and a bootstrap approach for the estimation of regression coefficients.

Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multi-marker in the Strong Heart Study
Domingo-Relloso, A., Feng, Y., Rodriguez-Hernandez, Z., Haack, K., Cole, S. A., Navas-Acien, A., Tellez-Plaza, M., & Bermudez, J. D. (n.d.).
Publication year: 2024
Journal title: American Journal of Epidemiology
Page(s): kwae006

On the estimation of correlation in a binary sequence model
Weng, H., & Feng, Y. (n.d.).
Publication year: 2020
Journal title: Journal of Statistical Planning and Inference
Volume: 207
Page(s): 123-137

On the sparsity of Mallows model averaging estimator
Feng, Y., Liu, Q., & Okui, R. (n.d.).
Publication year: 2020
Journal title: Economics Letters
Volume: 187
Page(s): 108916

PCABM: Pairwise Covariates-Adjusted Block Model for Community Detection
Huang, S., Sun, J., & Feng, Y. (n.d.).
Publication year: 2023
Journal title: Journal of the American Statistical Association
Page(s): 1-13

Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas
Galbraith, K., Garcia, M., Wei, S., Chen, A., Schroff, C., Serrano, J., Pacione, D., Placantonakis, D. G., William, C. M., Faustin, A., Zagzag, D., Barbaro, M., Del Pilar Guillermo Prieto Eibl, M., Shirahata, M., Reuss, D., Tran, Q. T., Alom, Z., von Deimling, A., Orr, B. A., … Snuderl, M. (n.d.).
Publication year: 2024
Journal title: Neuro-Oncology
Volume: 26
Issue: 6
Page(s): 1042-1051
Abstract: Background. Isocitrate dehydrogenase (IDH) mutant astrocytoma grading, until recently, has been entirely based on morphology. The 5th edition of the Central Nervous System World Health Organization (WHO) introduces CDKN2A/B homozygous deletion as a biomarker of grade 4. We sought to investigate the prognostic impact of DNA methylation-derived molecular biomarkers for IDH mutant astrocytoma. Methods. We analyzed 98 IDH mutant astrocytomas diagnosed at NYU Langone Health between 2014 and 2022. We reviewed DNA methylation subclass, CDKN2A/B homozygous deletion, and ploidy and correlated molecular biomarkers with histological grade, progression free (PFS), and overall (OS) survival. Findings were confirmed using 2 independent validation cohorts. Results. There was no significant difference in OS or PFS when stratified by histologic WHO grade alone, copy number complexity, or extent of resection. OS was significantly different when patients were stratified either by CDKN2A/B homozygous deletion or by DNA methylation subclass (P value = .0286 and .0016, respectively). None of the molecular biomarkers were associated with significantly better PFS, although DNA methylation classification showed a trend (P value = .0534). Conclusions. The current WHO recognized grading criteria for IDH mutant astrocytomas show limited prognostic value. Stratification based on DNA methylation shows superior prognostic value for OS.

Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas
Feng, Y., Galbraith, K., Garcia, M., Wei, S., Chen, A., Schroff, C., Serrano, J., Pacione, D., Placantonakis, D. G., William, C. M., Faustin, A., & others. (n.d.).
Publication year: 2024
Journal title: Neuro-Oncology
Page(s): noae009

Racial distribution of molecularly classified brain tumors
Fang, C. S., Wang, W., Schroff, C., Movahed-Ezazi, M., Vasudevaraja, V., Serrano, J., Sulman, E. P., Golfinos, J. G., Orringer, D., Galbraith, K., Feng, Y., & Snuderl, M. (n.d.).
Publication year: 2024
Journal title: Neuro-Oncology Advances
Volume: 6
Issue: 1
Abstract: Background. In many cancers, specific subtypes are more prevalent in specific racial backgrounds. However, little is known about the racial distribution of specific molecular types of brain tumors. Public data repositories lack data on many brain tumor subtypes as well as diagnostic annotation using the current World Health Organization classification. A better understanding of the prevalence of brain tumors in different racial backgrounds may provide insight into tumor predisposition and development, and improve prevention. Methods. We retrospectively analyzed the racial distribution of 1709 primary brain tumors classified by their methylation profiles using clinically validated whole genome DNA methylation. Self-reported race was obtained from medical records. Our cohort included 82% White, 10% Black, and 8% Asian patients with 74% of patients reporting their race. Results. There was a significant difference in the racial distribution of specific types of brain tumors. Blacks were overrepresented in pituitary adenomas (35%, P < .001), with the largest proportion of FSH/LH subtype. Whites were underrepresented at 47% of all pituitary adenoma patients (P < .001). Glioblastoma (GBM) IDH wild-type showed an enrichment of Whites, at 90% (P < .001), and a significantly smaller percentage of Blacks, at 3% (P < .001). Conclusions. Molecularly classified brain tumor groups and subgroups show different distributions among the three main racial backgrounds suggesting the contribution of race to brain tumor development.

RaSE: A Variable Screening Framework via Random Subspace Ensembles
Tian, Y., & Feng, Y. (n.d.).
Publication year: 2021
Journal title: Journal of the American Statistical Association

RaSE: Random Subspace Ensemble Classification
Tian, Y., & Feng, Y. (n.d.).
Publication year: 2021
Journal title: Journal of Machine Learning Research

Regularization after retention in ultrahigh dimensional linear regression models
Weng, H., Feng, Y., & Qiao, X. (n.d.).
Publication year: 2019
Journal title: Statistica Sinica

Semiparametric Modeling and Analysis for Longitudinal Network Data
He, Y., Sun, J., Tian, Y., Ying, Z., & Feng, Y. (n.d.).
Publication year: 2023
Journal title: arXiv preprint arXiv:2308.12227

Simulation of New York City’s Ventilator Allocation Guideline During the Spring 2020 COVID-19 Surge
Walsh, B. C., Zhu, J., Feng, Y., Berkowitz, K. A., Betensky, R. A., Nunnally, M. E., & Pradhan, D. R. (n.d.).
Publication year: 2023
Journal title: JAMA Network Open
Volume: 6
Issue: 10
Page(s): e2336736

Spectral clustering via adaptive layer aggregation for multi-layer networks
Huang, S., Weng, H., & Feng, Y. (n.d.).
Publication year: 2022
Journal title: Journal of Computational and Graphical Statistics
Issue: just-accepted
Page(s): 1-35

Super RaSE: Super Random Subspace Ensemble Classification
Zhu, J., & Feng, Y. (n.d.).
Publication year: 2021
Journal title: Journal of Risk and Financial Management
Volume: 14
Issue: 12
Page(s): 612

Targeted crisis risk control: A Neyman-Pearson approach
Feng, Y., Tong, X., & Xin, W. (n.d.).
Publication year: 2021
Journal title: Available at SSRN

Targeting predictors via partial distance correlation with applications to financial forecasting
Yousuf, K., & Feng, Y. (n.d.).
Publication year: 2022
Journal title: Journal of Business & Economic Statistics
Volume: 40
Issue: 3
Page(s): 1007-1019