Yang Feng

Professor of Biostatistics

Professional overview

Yang Feng is a Professor and Ph.D. Program Director of Biostatistics in the School of Global Public Health and an affiliated faculty member of the Center for Data Science at New York University. He obtained his Ph.D. in Operations Research at Princeton University in 2010.

Feng's research interests encompass the theoretical and methodological aspects of machine learning, high-dimensional statistics, social network models, and nonparametric statistics, with applications to Alzheimer's disease, cancer classification, and electronic health records. His research has been funded by multiple grants from the National Institutes of Health (NIH) and the National Science Foundation (NSF), including an NSF CAREER Award.

He currently serves as an Associate Editor for the Journal of the American Statistical Association (JASA), the Journal of Business & Economic Statistics (JBES), the Journal of Computational & Graphical Statistics (JCGS), and the Annals of Applied Statistics (AoAS). He has been named a fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS), and an elected member of the International Statistical Institute (ISI).

Please visit Dr. Yang Feng's website and Google Scholar page for more information.

Education

B.S. in Mathematics, University of Science and Technology of China, Hefei, China
Ph.D. in Operations Research, Princeton University, Princeton, NJ

Areas of research and study

Bioinformatics
Biostatistics
High-dimensional data analysis/integration
Machine learning
Modeling Social and Behavioral Dynamics
Nonparametric statistics

Publications

Mediation effect selection in high-dimensional and compositional microbiome data

Zhang, H., Chen, J., Feng, Y., Wang, C., Li, H., & Liu, L.

Publication year

2021

Journal title

Statistics in Medicine

Volume

40

Issue

4

Page(s)

885-896

Model Averaging for Nonlinear Regression Models

Feng, Y., Liu, Q., Yao, Q., & Zhao, G.

Publication year

2022

Journal title

Journal of Business and Economic Statistics

Volume

40

Issue

2

Page(s)

785-798
Abstract
This article considers the problem of model averaging for regression models that can be nonlinear in their parameters and variables. We consider a nonlinear model averaging (NMA) framework and propose a weight-choosing criterion, the nonlinear information criterion (NIC). We show that up to a constant, NIC is an asymptotically unbiased estimator of the risk function under nonlinear settings with some mild assumptions. We also prove the optimality of NIC and show the convergence of the model averaging weights. Monte Carlo experiments reveal that NMA leads to relatively lower risks compared with alternative model selection and model averaging methods in most situations. Finally, we apply the NMA method to predicting the individual wage, where our approach leads to the lowest prediction errors in most cases.

Multi-label Random Subspace Ensemble Classification

Bi, F., Zhu, J., & Feng, Y.

Publication year

2024

Journal title

Journal of Computational and Graphical Statistics
Abstract
In this work, we develop a new ensemble learning framework, multi-label Random Subspace Ensemble (mRaSE), for multi-label classification. Given a base classifier (e.g., multinomial logistic regression, classification tree, K-nearest neighbors), mRaSE works by first randomly sampling a collection of subspaces, then choosing the best ones that achieve the minimum cross-validation errors and, finally, aggregating the chosen weak learners. In addition to its superior prediction performance, mRaSE also provides a model-free feature ranking depending on the given base classifier. An iterative version of mRaSE is also developed to further improve the performance. A model-free extension is pursued on the iterative version, leading to the so-called Super mRaSE, which accepts a collection of base classifiers as input to the algorithm. Via extensive simulation studies and two real data applications, we show that the proposed algorithms compare favorably with state-of-the-art classification algorithms, including random forests and deep neural networks. The new algorithms are implemented in an updated version of the R package RaSEn.
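The three-step procedure in this abstract (sample random subspaces, keep the ones with the smallest cross-validation error, aggregate the chosen weak learners) can be illustrated with a short, pure-Python sketch. It is not the RaSEn implementation: it handles a single class label rather than the multi-label setting, uses a nearest-centroid base classifier swapped in for simplicity, omits the iterative and Super variants, and its function and parameter names (`rase_fit`, `B1`, `B2`, `d`) are hypothetical. The per-feature selection frequency across chosen subspaces is shown only as a rough analogue of the feature ranking the abstract mentions.

```python
import random
from collections import Counter

# Hypothetical base classifier: nearest centroid restricted to a feature subset.
def centroid_fit(X, y, feats):
    """Return {class: centroid} computed on the columns listed in `feats`."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, j in enumerate(feats):
            acc[k] += row[j]
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def centroid_predict(model, feats, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((row[j] - m) ** 2
                                        for j, m in zip(feats, model[c])))

def cv_error(X, y, feats, folds=5):
    """K-fold cross-validation misclassification rate on the subspace `feats`."""
    n, wrong = len(X), 0
    for f in range(folds):
        test = set(range(f, n, folds))  # deterministic interleaved folds
        train = [i for i in range(n) if i not in test]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train], feats)
        wrong += sum(centroid_predict(model, feats, X[i]) != y[i] for i in test)
    return wrong / n

def rase_fit(X, y, B1=20, B2=30, d=2, seed=0):
    """For each of B1 weak learners, draw B2 random size-d subspaces, keep the
    one with the lowest CV error, and refit the base classifier on all data."""
    rng = random.Random(seed)
    p = len(X[0])
    ensemble = []
    for _ in range(B1):
        candidates = [sorted(rng.sample(range(p), d)) for _ in range(B2)]
        best = min(candidates, key=lambda s: cv_error(X, y, s))
        ensemble.append((best, centroid_fit(X, y, best)))
    return ensemble

def rase_predict(ensemble, row):
    """Aggregate the chosen weak learners by majority vote."""
    votes = Counter(centroid_predict(model, feats, row) for feats, model in ensemble)
    return votes.most_common(1)[0][0]

def selection_frequency(ensemble, p):
    """Share of weak learners whose chosen subspace contains each feature."""
    counts = Counter(j for feats, _ in ensemble for j in feats)
    return [counts[j] / len(ensemble) for j in range(p)]

# Toy data: only features 0 and 1 separate the two classes.
gen = random.Random(1)
X, y = [], []
for i in range(120):
    label = i % 2
    row = [gen.gauss(0, 1) for _ in range(10)]
    row[0] += 2.0 * label
    row[1] -= 2.0 * label
    X.append(row)
    y.append(label)

ens = rase_fit(X, y)
freq = selection_frequency(ens, 10)
train_acc = sum(rase_predict(ens, r) == t for r, t in zip(X, y)) / len(X)
```

In this toy run the informative features 0 and 1 should dominate the chosen subspaces; raising `B2` makes the search over subspaces more thorough at a proportional computational cost.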

NCOG-11. Association of hyperglycemia and tumor subclass on survival in IDH-wildtype glioblastoma

Liu, E., Vasudevaraja, V., Sviderskiy, V., Feng, Y., Tran, I., Serrano, J., Cordova, C., Kurz, S., Golfinos, J., Sulman, E., et al.

Publication year

2021

Journal title

Neuro-Oncology

Volume

23

Issue

Suppl 6

Page(s)

vi154

Nested Model Averaging on Solution Path for High-dimensional Linear Regression

Feng, Y., & Liu, Q.

Publication year

2020

Journal title

Stat

Neyman-Pearson classification: parametrics and sample size requirement

Tong, X., Xia, L., Wang, J., & Feng, Y.

Publication year

2020

Journal title

Journal of Machine Learning Research

Neyman-Pearson Multi-Class Classification via Cost-Sensitive Learning

Tian, Y., & Feng, Y.

Publication year

2024

Journal title

Journal of the American Statistical Association
Abstract
Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have varying consequences. To address this asymmetry issue, two popular paradigms have been developed: the Neyman-Pearson (NP) paradigm and the cost-sensitive (CS) paradigm. Previous studies on the NP paradigm have primarily focused on the binary case, while the multi-class NP problem poses a greater challenge due to its unknown feasibility. In this work, we tackle the multi-class NP problem by establishing a connection with the CS problem via strong duality and propose two algorithms. We extend the concept of NP oracle inequalities, crucial in binary classifications, to NP oracle properties in the multi-class context. Our algorithms satisfy these NP oracle properties under certain conditions. Furthermore, we develop practical algorithms to assess the feasibility and strong duality in multi-class NP problems, which can offer practitioners the landscape of a multi-class NP problem with various target error levels. Simulations and real data studies validate the effectiveness of our algorithms. To our knowledge, this is the first study to address the multi-class NP problem with theoretical guarantees. The proposed algorithms have been implemented in the R package npcs, which is available on CRAN. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

Neyman-Pearson Multi-class Classification via Cost-sensitive Learning

Tian, Y., & Feng, Y.

Publication year

2021

Journal title

arXiv preprint arXiv:2111.04597

Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multimarker in the Strong Heart Study

Domingo-Relloso, A., Feng, Y., Rodriguez-Hernandez, Z., Haack, K., Cole, S. A., Navas-Acien, A., Tellez-Plaza, M., & Bermudez, J. D.

Publication year

2024

Journal title

American Journal of Epidemiology

Volume

193

Issue

7

Page(s)

1010-1018
Abstract
The statistical analysis of omics data poses a great computational challenge given their ultra–high-dimensional nature and frequent between-features correlation. In this work, we extended the iterative sure independence screening (ISIS) algorithm by pairing ISIS with elastic-net (Enet) and 2 versions of adaptive elastic-net (adaptive elastic-net [AEnet] and multistep adaptive elastic-net [MSAEnet]) to efficiently improve feature selection and effect estimation in omics research. We subsequently used genome-wide human blood DNA methylation data from American Indian participants in the Strong Heart Study (n = 2235 participants; measured in 1989-1991) to compare the performance (predictive accuracy, coefficient estimation, and computational efficiency) of ISIS-paired regularization methods with that of a Bayesian shrinkage and traditional linear regression to identify an epigenomic multimarker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the enriched pathways in common for at least 2 of all the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, and a bootstrap approach for the estimation of regression coefficients.
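The screening step that ISIS iterates can be illustrated in a few lines. The block below is a minimal, pure-Python sketch of plain (non-iterative) sure independence screening only: it ranks features by absolute marginal Pearson correlation with the response and keeps the top d. It does not reproduce the SIS R package, its iterations, or its pairing with elastic-net; the function name `sis_screen`, the simulated data, and the choice d = n/log(n) are illustrative assumptions.

```python
import math
import random

def sis_screen(X, y, d):
    """Plain (non-iterative) sure independence screening.

    X is a list of n rows, each a list of p feature values; y is a list of n
    responses. Returns the indices of the d features with the largest absolute
    marginal Pearson correlation with y, in decreasing order of |corr|.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in yc))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        xc = [v - m for v in col]
        x_norm = math.sqrt(sum(v * v for v in xc))
        if x_norm == 0 or y_norm == 0:
            corr = 0.0  # a constant column carries no marginal signal
        else:
            corr = sum(a * b for a, b in zip(xc, yc)) / (x_norm * y_norm)
        scores.append(abs(corr))
    # Stable sort by decreasing |corr|; ties keep the lower index first.
    order = sorted(range(p), key=lambda j: -scores[j])
    return order[:d]

# Simulated example: only features 5 and 42 affect the response.
random.seed(0)
n, p = 200, 500
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[5] - 2 * row[42] + random.gauss(0, 1) for row in X]
keep = sis_screen(X, y, d=int(n / math.log(n)))  # illustrative choice of d
```

Screening of this kind only shrinks the candidate set; in ISIS, a penalized fit on the retained features and further screening rounds conditional on the features already selected would follow, which this sketch leaves out.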

Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multi-marker in the Strong Heart Study

Domingo-Relloso, A., Feng, Y., Rodriguez-Hernandez, Z., Haack, K., Cole, S. A., Navas-Acien, A., Tellez-Plaza, M., & Bermudez, J. D.

Publication year

2024

Journal title

American Journal of Epidemiology

Page(s)

kwae006

On the estimation of correlation in a binary sequence model

Weng, H., & Feng, Y.

Publication year

2020

Journal title

Journal of Statistical Planning and Inference

Volume

207

Page(s)

123-137

On the sparsity of Mallows model averaging estimator

Feng, Y., Liu, Q., & Okui, R.

Publication year

2020

Journal title

Economics Letters

Volume

187

Page(s)

108916

PCABM: Pairwise Covariates-Adjusted Block Model for Community Detection

Huang, S., Sun, J., & Feng, Y.

Publication year

2023

Journal title

Journal of the American Statistical Association

Page(s)

1-13

Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas

Galbraith, K., Garcia, M., Wei, S., Chen, A., Schroff, C., Serrano, J., Pacione, D., Placantonakis, D. G., William, C. M., Faustin, A., Zagzag, D., Barbaro, M., Del Pilar Guillermo Prieto Eibl, M., Shirahata, M., Reuss, D., Tran, Q. T., Alom, Z., von Deimling, A., Orr, B. A., … Snuderl, M.

Publication year

2024

Journal title

Neuro-Oncology

Volume

26

Issue

6

Page(s)

1042-1051
Abstract
Background. Isocitrate dehydrogenase (IDH) mutant astrocytoma grading, until recently, has been entirely based on morphology. The 5th edition of the World Health Organization (WHO) classification of Central Nervous System tumors introduces CDKN2A/B homozygous deletion as a biomarker of grade 4. We sought to investigate the prognostic impact of DNA methylation-derived molecular biomarkers for IDH mutant astrocytoma. Methods. We analyzed 98 IDH mutant astrocytomas diagnosed at NYU Langone Health between 2014 and 2022. We reviewed DNA methylation subclass, CDKN2A/B homozygous deletion, and ploidy, and correlated molecular biomarkers with histological grade, progression-free survival (PFS), and overall survival (OS). Findings were confirmed using 2 independent validation cohorts. Results. There was no significant difference in OS or PFS when stratified by histologic WHO grade alone, copy number complexity, or extent of resection. OS was significantly different when patients were stratified either by CDKN2A/B homozygous deletion or by DNA methylation subclass (P value = .0286 and .0016, respectively). None of the molecular biomarkers were associated with significantly better PFS, although DNA methylation classification showed a trend (P value = .0534). Conclusions. The current WHO-recognized grading criteria for IDH mutant astrocytomas show limited prognostic value. Stratification based on DNA methylation shows superior prognostic value for OS.

Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas

Galbraith, K., Garcia, M., Wei, S., Chen, A., Schroff, C., Serrano, J., Pacione, D., Placantonakis, D. G., William, C. M., Faustin, A., et al.

Publication year

2024

Journal title

Neuro-Oncology

Page(s)

noae009

Racial distribution of molecularly classified brain tumors

Fang, C. S., Wang, W., Schroff, C., Movahed-Ezazi, M., Vasudevaraja, V., Serrano, J., Sulman, E. P., Golfinos, J. G., Orringer, D., Galbraith, K., Feng, Y., & Snuderl, M.

Publication year

2024

Journal title

Neuro-Oncology Advances

Volume

6

Issue

1
Abstract
Background. In many cancers, specific subtypes are more prevalent in specific racial backgrounds. However, little is known about the racial distribution of specific molecular types of brain tumors. Public data repositories lack data on many brain tumor subtypes as well as diagnostic annotation using the current World Health Organization classification. A better understanding of the prevalence of brain tumors in different racial backgrounds may provide insight into tumor predisposition and development, and improve prevention. Methods. We retrospectively analyzed the racial distribution of 1709 primary brain tumors classified by their methylation profiles using clinically validated whole genome DNA methylation. Self-reported race was obtained from medical records. Our cohort included 82% White, 10% Black, and 8% Asian patients with 74% of patients reporting their race. Results. There was a significant difference in the racial distribution of specific types of brain tumors. Blacks were overrepresented in pituitary adenomas (35%, P < .001), with the largest proportion of FSH/LH subtype. Whites were underrepresented at 47% of all pituitary adenoma patients (P < .001). Glioblastoma (GBM) IDH wild-type showed an enrichment of Whites, at 90% (P < .001), and a significantly smaller percentage of Blacks, at 3% (P < .001). Conclusions. Molecularly classified brain tumor groups and subgroups show different distributions among the three main racial backgrounds suggesting the contribution of race to brain tumor development.

RaSE: A Variable Screening Framework via Random Subspace Ensembles

Tian, Y., & Feng, Y.

Publication year

2021

Journal title

Journal of the American Statistical Association

RaSE: Random Subspace Ensemble Classification

Tian, Y., & Feng, Y.

Publication year

2021

Journal title

Journal of Machine Learning Research

Regularization after retention in ultrahigh dimensional linear regression models

Weng, H., Feng, Y., & Qiao, X.

Publication year

2019

Journal title

Statistica Sinica

Semiparametric Modeling and Analysis for Longitudinal Network Data

He, Y., Sun, J., Tian, Y., Ying, Z., & Feng, Y.

Publication year

2023

Journal title

arXiv preprint arXiv:2308.12227

Simulation of New York City’s Ventilator Allocation Guideline During the Spring 2020 COVID-19 Surge

Walsh, B. C., Zhu, J., Feng, Y., Berkowitz, K. A., Betensky, R. A., Nunnally, M. E., & Pradhan, D. R.

Publication year

2023

Journal title

JAMA Network Open

Volume

6

Issue

10

Page(s)

e2336736

Spectral clustering via adaptive layer aggregation for multi-layer networks

Huang, S., Weng, H., & Feng, Y.

Publication year

2022

Journal title

Journal of Computational and Graphical Statistics

Issue

just-accepted

Page(s)

1-35

Super RaSE: Super Random Subspace Ensemble Classification

Zhu, J., & Feng, Y.

Publication year

2021

Journal title

Journal of Risk and Financial Management

Volume

14

Issue

12

Page(s)

612

Targeted crisis risk control: A Neyman-Pearson approach

Feng, Y., Tong, X., & Xin, W.

Publication year

2021

Journal title

Available at SSRN

Targeting predictors via partial distance correlation with applications to financial forecasting

Yousuf, K., & Feng, Y.

Publication year

2022

Journal title

Journal of Business & Economic Statistics

Volume

40

Issue

3

Page(s)

1007-1019

Contact

yang.feng@nyu.edu
708 Broadway, New York, NY 10003