Yang Feng

Associate Professor of Biostatistics

Professional overview

Yang Feng is Associate Professor of Biostatistics. He received his B.S. in Mathematics from the University of Science and Technology of China and his Ph.D. in Operations Research from Princeton University.

Dr. Feng's research interests include high-dimensional statistical learning, network models, nonparametric and semiparametric methods, and bioinformatics. He has published in The Annals of Statistics, Journal of the American Statistical Association, Journal of the Royal Statistical Society Series B, Journal of Machine Learning Research, and Science Advances. Feng serves on the editorial boards of the Journal of Business & Economic Statistics, Statistica Sinica, and Statistical Analysis and Data Mining: The ASA Data Science Journal.

Prior to joining NYU, Feng was Associate Professor of Statistics and an affiliated member of the Data Science Institute at Columbia University. He is an elected member of the International Statistical Institute and a recipient of the NSF CAREER award.

Publications

A projection-based conditional dependence measure with applications to high-dimensional undirected graphical models

On the sparsity of Mallows model averaging estimator

Feng, Y., Liu, Q., & Okui, R.

Publication year

2020

Journal title

Economics Letters

Volume

187

Abstract

We show that the Mallows model averaging estimator proposed by Hansen (2007) can be written as a least squares estimation with a weighted L1 penalty and additional constraints. By exploiting this representation, we demonstrate that the weight vector obtained by this model averaging procedure has a sparsity property, in the sense that a subset of models receives exactly zero weight. Moreover, this representation allows us to adapt algorithms developed to efficiently solve minimization problems with many parameters and a weighted L1 penalty. In particular, we develop a new coordinate-wise descent algorithm for model averaging. Simulation studies show that the new algorithm computes the model averaging estimator much faster and requires less memory than conventional methods when there are many models.
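
For context, the Mallows model averaging criterion of Hansen (2007), stated here in its standard form rather than quoted from the paper, chooses a weight vector w over M candidate models by solving

$$
C_n(w) = \Big\| y - \sum_{m=1}^{M} w_m P_m y \Big\|^2 + 2\sigma^2 \sum_{m=1}^{M} w_m k_m,
\qquad w_m \ge 0, \quad \sum_{m=1}^{M} w_m = 1,
$$

where $P_m$ is the hat matrix and $k_m$ the number of parameters of model $m$. The abstract's observation is that this constrained problem can be recast as least squares with a weighted L1 penalty, which is why a subset of the $w_m$ is driven exactly to zero.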

A Kronecker Product Model for Repeated Pattern Detection on 2D Urban Images

Liu, J., Psarakis, E. Z., Feng, Y., & Stamos, I.

Publication year

2019

Journal title

IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume

41

Issue

9

Page(s)

2266-2272

Abstract

Repeated patterns (such as windows, balconies, and doors) are prominent and significant features in urban scenes, so detecting them is important for city scene analysis. This paper attacks the problem of repeated pattern detection in a precise, efficient, and automatic way by combining traditional feature extraction with a Kronecker product based low-rank model. We introduce novel algorithms, with solid theoretical support, that extract repeated patterns from rectified images. Our method is tailored for 2D images of building façades and tested on a large set of façade images.
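
As a toy illustration of the modeling idea (a sketch of the representation, not the authors' detection algorithm): a rectified façade whose motif repeats on a regular grid is exactly a Kronecker product of a placement matrix and the motif, which base R expresses directly. The matrices below are made up.

```r
# A repeated-pattern image as a Kronecker product: 'cells' says where
# the motif sits, 'motif' is the repeated element (a cartoon "window").
motif <- matrix(c(0, 1, 0,
                  1, 1, 1,
                  0, 1, 0), nrow = 3, byrow = TRUE)
cells  <- matrix(1, nrow = 4, ncol = 6)   # 4 x 6 grid of repetitions
facade <- kronecker(cells, motif)         # equivalently cells %x% motif
dim(facade)                               # 12 x 18 synthetic facade image
# The paper's low-rank model works in the reverse direction: recover
# 'cells' and 'motif' from a noisy, partially occluded 'facade'.
```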

Likelihood adaptively modified penalties

Feng, Y., Li, T., & Ying, Z.

Publication year

2019

Journal title

Applied Stochastic Models in Business and Industry

Volume

35

Issue

2

Page(s)

330-353

Abstract

A new family of penalty functions, i.e., penalties adaptive to the likelihood, is introduced for model selection in general regression models. It arises naturally through assuming certain types of prior distribution on the regression parameters. To study the stability properties of the penalized maximum-likelihood estimator, two types of asymptotic stability are defined. Theoretical properties, including parameter estimation consistency, model selection consistency, and asymptotic stability, are established under suitable regularity conditions. An efficient coordinate-descent algorithm is proposed. Simulation results and real data analysis show that the proposed approach has competitive performance in comparison with existing methods.
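
For a concrete picture of the algorithmic ingredient, here is a minimal coordinate descent for plain L1-penalized least squares; this is a generic sketch, and the paper's likelihood-adaptive penalty would replace the fixed soft-threshold level below.

```r
# Coordinate descent for (1/2n)||y - X b||^2 + lambda * ||b||_1.
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)   # soft-thresholding

cd_lasso <- function(X, y, lambda, n_iter = 100) {
  n <- nrow(X); p <- ncol(X)
  beta <- rep(0, p)
  r <- as.vector(y)                       # residual for beta = 0
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      r <- r + X[, j] * beta[j]           # remove coordinate j from the fit
      zj <- sum(X[, j] * r) / n           # marginal least squares update
      beta[j] <- soft(zj, lambda) / (sum(X[, j]^2) / n)
      r <- r - X[, j] * beta[j]           # put coordinate j back
    }
  }
  beta
}
```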

On the estimation of correlation in a binary sequence model

Weng, H., & Feng, Y.

Publication year

2019

Journal title

Journal of Statistical Planning and Inference

Volume

207

Page(s)

123-137

Abstract

We consider a binary sequence generated by thresholding a hidden continuous sequence. The hidden variables are assumed to have a compound symmetry covariance structure with a single parameter characterizing the common correlation. We study the parameter estimation problem under such one-parameter models. We demonstrate that maximizing the likelihood function does not yield consistent estimates for the correlation. We then formally prove the nonestimability of the parameter by deriving a non-vanishing minimax lower bound. This counter-intuitive phenomenon provides the interesting insight that one bit of information from each latent variable is not sufficient to consistently recover their common correlation. On the other hand, we further show that trinary data generated from the hidden variables can consistently estimate the correlation at a parametric convergence rate. Thus we reveal a phase transition phenomenon regarding the discretization of latent continuous variables while preserving the estimability of the correlation. Numerical experiments are performed to validate the conclusions.
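
A minimal simulation of the abstract's setup (illustrative only; the threshold values are arbitrary): a compound-symmetry latent sequence is one shared factor plus independent noise, observed through thresholds.

```r
# One latent sequence with common correlation rho, seen only after
# discretization; per the abstract, the binary version cannot recover
# rho consistently while the trinary version can.
set.seed(1)
n   <- 1e4
rho <- 0.3
z0  <- rnorm(1)                                   # factor shared by the whole sequence
z   <- sqrt(rho) * z0 + sqrt(1 - rho) * rnorm(n)  # Corr(z_i, z_j) = rho for i != j
bin <- as.integer(z > 0)                          # binary observations
tri <- cut(z, c(-Inf, -0.5, 0.5, Inf), labels = FALSE)  # trinary observations
```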

Regularization after retention in ultrahigh dimensional linear regression models

Weng, H., Feng, Y., & Qiao, X.

Publication year

2019

Journal title

Statistica Sinica

Volume

29

Issue

1

Page(s)

387-407

Abstract

In the ultrahigh-dimensional setting, independence screening has been shown, both theoretically and empirically, to be a useful variable selection framework with low computational cost. In this work, we propose a two-step framework that uses marginal information in a different fashion from independence screening. In particular, we retain significant variables rather than screening out irrelevant ones. The method is shown to be model selection consistent in the ultrahigh-dimensional linear regression model. To improve the finite-sample performance, we then introduce a three-step version and characterize its asymptotic behavior. Simulations and data analysis show the advantages of our method over independence screening and its iterative variants in certain regimes.
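
To make the retention idea concrete, here is a hedged sketch (the significance rule and all names are illustrative stand-ins, not the paper's exact procedure): retained covariates can be left unpenalized in a subsequent lasso fit via glmnet's penalty.factor argument.

```r
# Step 1: retain covariates whose marginal regression is significant at
# a conservative level; Step 2: run the lasso with the retained set
# unpenalized (penalty.factor = 0 keeps them in the model).
library(glmnet)
retain <- function(X, y, level = 0.01 / ncol(X)) {
  pvals <- apply(X, 2, function(xj) summary(lm(y ~ xj))$coefficients[2, 4])
  which(pvals < level)
}
set.seed(1)
n <- 100; p <- 400
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] + X[, 2] + rnorm(n)
kept <- retain(X, y)
pf <- rep(1, p); pf[kept] <- 0
fit <- cv.glmnet(X, y, penalty.factor = pf)
```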

The restricted consistency property of leave-nV-out cross-validation for high-dimensional variable selection

Feng, Y., & Yu, Y.

Publication year

2019

Journal title

Statistica Sinica

Volume

29

Issue

3

Page(s)

1607-1630

Abstract

Cross-validation (CV) methods are popular for selecting the tuning parameter in high-dimensional variable selection problems. We show that a misalignment of the CV is one possible reason for its over-selection behavior. To fix this issue, we propose using a version of leave-nv-out CV (CV(nv)) to select the optimal model from a restricted candidate model set for high-dimensional generalized linear models. By using the same candidate model sequence and a proper order for the construction sample size nc in each CV split, CV(nv) avoids potential problems when developing theoretical properties. CV(nv) is shown to exhibit the restricted model-selection consistency property under mild conditions. Extensive simulations and a real-data analysis support the theoretical results and demonstrate the performance of CV(nv) in terms of both model selection and prediction.

A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection

Publication year

2018

Journal title

Nature Communications

Volume

9

Issue

1

Abstract

The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveals little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.

Model Selection for High-Dimensional Quadratic Regression via Regularization

Hao, N., Feng, Y., & Zhang, H. H.

Publication year

2018

Journal title

Journal of the American Statistical Association

Volume

113

Issue

522

Page(s)

615-625

Abstract

Quadratic regression (QR) models naturally extend linear models by considering interaction effects between the covariates. To conduct model selection in QR, it is important to maintain the hierarchical model structure between main effects and interaction effects. Existing regularization methods generally achieve this goal by solving complex optimization problems, which usually demand high computational cost and hence are not feasible for high-dimensional data. This article focuses on scalable regularization methods for model selection in high-dimensional QR. We first consider two-stage regularization methods and establish theoretical properties of the two-stage LASSO. Then, a new regularization method, called regularization algorithm under marginality principle (RAMP), is proposed to compute a hierarchy-preserving regularization solution path efficiently. Both methods are further extended to solve generalized QR models. Numerical results are also shown to demonstrate the performance of the methods. Supplementary materials for this article are available online.
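
A hedged sketch of the two-stage idea mentioned in the abstract (not the RAMP algorithm itself), using glmnet: stage 1 selects main effects; stage 2 refits with pairwise interactions among the survivors, so any selected interaction has its parent main effects available.

```r
library(glmnet)
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + 2 * X[, 1] * X[, 2] + rnorm(n)

fit1 <- cv.glmnet(X, y)                            # stage 1: main effects only
b1   <- as.matrix(coef(fit1, s = "lambda.min"))
sel  <- which(b1[-1, 1] != 0)                      # selected main effects

df    <- as.data.frame(X[, sel, drop = FALSE])
inter <- model.matrix(~ .^2 - 1, data = df)        # mains + pairwise interactions
fit2  <- cv.glmnet(cbind(X, inter), y)             # stage 2 (duplicate main-effect
                                                   # columns are harmless for a sketch)
```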

Neyman-Pearson classification algorithms and NP receiver operating characteristics

Tong, X., Feng, Y., & Li, J. J.

Publication year

2018

Journal title

Science Advances

Volume

4

Issue

2

Abstract

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
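
The umbrella algorithm's threshold rule can be sketched in a few lines of R; the order-statistic formula follows the paper's description, while the code itself is my illustrative rendering. Any scoring classifier is trained on class 1 plus half of class 0, and the threshold is set on the held-out class-0 scores.

```r
# Choose the smallest class-0 score order statistic whose probability of
# violating the type I error bound alpha is at most delta:
# P(Bin(n, 1 - alpha) >= k) <= delta.
np_threshold <- function(scores0, alpha = 0.05, delta = 0.05) {
  n <- length(scores0)
  viol <- pbinom(seq_len(n) - 1, n, 1 - alpha, lower.tail = FALSE)
  k <- which(viol <= delta)[1]
  if (is.na(k)) stop("held-out class-0 sample too small for this (alpha, delta)")
  sort(scores0)[k]                      # classify as class 1 when score > threshold
}
```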

Nonparametric independence screening via favored smoothing bandwidth

Feng, Y., Wu, Y., & Stefanski, L. A.

Publication year

2018

Journal title

Journal of Statistical Planning and Inference

Volume

197

Page(s)

1-14

Abstract

We propose a flexible nonparametric regression method for ultrahigh-dimensional data. As a first step, we propose a fast screening method based on the favored smoothing bandwidth of the marginal local constant regression. Then, an iterative procedure is developed to recover both the important covariates and the regression function. Theoretically, we prove that the favored smoothing bandwidth based screening possesses the model selection consistency property. Simulation studies as well as real data analysis show the competitive performance of the new procedure.

Penalized weighted least absolute deviation regression

Gao, X., & Feng, Y.

Publication year

2018

Journal title

Statistics and its Interface

Volume

11

Issue

1

Page(s)

79-89

Abstract

In a linear model where the data is contaminated or the random error is heavy-tailed, least absolute deviation (LAD) regression has been widely used as an alternative approach to least squares (LS) regression. However, it is well known that LAD regression is not robust to outliers in the explanatory variables. When the data includes some leverage points, LAD regression may perform even worse than LS regression. In this manuscript, we propose to improve LAD regression in a penalized weighted least absolute deviation (PWLAD) framework. The main idea is to associate each observation with a weight reflecting its degree of outlyingness and leverage, and to estimate the weights and the coefficient vector simultaneously and adaptively. The proposed PWLAD is able to provide regression coefficient estimates with strong robustness and perform outlier detection at the same time, even when the random error does not have finite variance. We provide sufficient conditions under which PWLAD is able to identify true outliers consistently. The performance of the proposed estimator is demonstrated via extensive simulation studies and real examples.

SIS: An R package for sure independence screening in ultrahigh-dimensional statistical models

Saldana, D. F., & Feng, Y.

Publication year

2018

Journal title

Journal of Statistical Software

Volume

83

Abstract

We revisit sure independence screening procedures for variable selection in generalized linear models and the Cox proportional hazards model. Through the publicly available R package SIS, we provide a unified environment to carry out variable selection using iterative sure independence screening (ISIS) and all of its variants. For the regularization steps in the ISIS recruiting process, available penalties include the LASSO, SCAD, and MCP, while the implemented variants for the screening steps are sample splitting, data-driven thresholding, and combinations thereof. Performance of these feature selection techniques is investigated by means of real and simulated data sets, where we find considerable improvements in model selection and computational time for our algorithms over traditional penalized pseudo-likelihood methods applied directly to the full set of covariates.
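
A usage sketch for the package (the argument names follow the paper's examples as I recall them; consult ?SIS for the exact signature):

```r
# Variable selection by (iterative) sure independence screening with a
# SCAD-penalized refit; 'fit$ix' is expected to hold the selected indices.
library(SIS)
set.seed(1)
n <- 100; p <- 1000
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.8 * x[, 2] + rnorm(n)
fit <- SIS(x, y, family = "gaussian", penalty = "SCAD", tune = "bic")
fit$ix
```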

Binary switch portfolio

Li, T., Chen, K., Feng, Y., & Ying, Z.

Publication year

2017

Journal title

Quantitative Finance

Volume

17

Issue

5

Page(s)

763-780

Abstract

We propose herein a new portfolio selection method that switches between two distinct asset allocation strategies. An important component is a carefully designed adaptive switching rule, which is based on a machine learning algorithm. It is shown that using this adaptive switching strategy, the combined wealth of the new approach is a weighted average of that of the successive constant rebalanced portfolio and that of the 1/N portfolio. In particular, it is asymptotically superior to the 1/N portfolio under mild conditions in the long run. Applications to real data show that both the returns and the Sharpe ratios of the proposed binary switch portfolio are the best among several popular competing methods over varying time horizons and stock pools.
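
For reference, the 1/N benchmark that the abstract compares against rebalances to equal weights every period, so each period's portfolio return is the cross-sectional mean of the asset returns; a tiny R illustration on hypothetical data:

```r
# Wealth path of the 1/N portfolio; 'rets' is a T x N matrix of simple returns.
wealth_1overN <- function(rets) cumprod(1 + rowMeans(rets))
set.seed(1)
rets <- matrix(rnorm(250 * 5, 0.0003, 0.01), 250, 5)  # hypothetical daily returns
tail(wealth_1overN(rets), 1)                          # terminal wealth, starting from 1
```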

How Many Communities Are There?

Saldaña, D. F., Yu, Y., & Feng, Y.

Publication year

2017

Journal title

Journal of Computational and Graphical Statistics

Volume

26

Issue

1

Page(s)

171-181

Abstract

Stochastic blockmodels and variants thereof are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models and raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution; however, for stochastic blockmodels, the conditional independence assumption among different edges, given the communities of their endpoints, is usually violated in practice. In this regard, we propose the composite likelihood BIC (CL-BIC) to select the number of communities, and we show it is robust against possible misspecifications of the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online.

JDINAC: Joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data

Ji, J., He, D., Feng, Y., He, Y., Xue, F., & Xie, L.

Publication year

2017

Journal title

Bioinformatics

Volume

33

Issue

19

Page(s)

3080-3087

Abstract

Motivation: A complex disease is usually driven by a number of genes interwoven into networks rather than by a single gene product. Network comparison, or differential network analysis, has become an important means of revealing the underlying mechanisms of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. These assumptions are restrictive in real applications.

Results: We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data, as well as to adjust for confounding factors, without assuming a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified are consistent with existing experimental studies. Furthermore, JDINAC discriminates tumor and normal samples with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data.

Post selection shrinkage estimation for high-dimensional data analysis

Gao, X., Ahmed, S. E., & Feng, Y.

Publication year

2017

Journal title

Applied Stochastic Models in Business and Industry

Volume

33

Issue

2

Page(s)

97-120

Abstract

In high-dimensional data settings where p ≫ n, many penalized regularization approaches have been studied for simultaneous variable selection and estimation. However, in the presence of covariates with weak effects, many existing variable selection methods, including the Lasso and its generalizations, cannot distinguish covariates with weak contributions from those with none. Thus, prediction based on a subset model of selected covariates only can be inefficient. In this paper, we propose a post-selection shrinkage estimation strategy to improve the prediction performance of a selected subset model. Such a post-selection shrinkage estimator (PSE) is data adaptive and is constructed by shrinking a post-selection weighted ridge estimator in the direction of a selected candidate subset. Under an asymptotic distributional quadratic risk criterion, its prediction performance is explored analytically. We show that the proposed PSE performs better than the post-selection weighted ridge estimator. More importantly, it significantly improves the prediction performance of any candidate subset model selected by most existing Lasso-type variable selection methods. The relative performance of the PSE is demonstrated by both simulation studies and real-data analysis.

Rejoinder to ‘Post-selection shrinkage estimation for high-dimensional data analysis’

Gao, X., Ejaz Ahmed, S., & Feng, Y.

Publication year

2017

Journal title

Applied Stochastic Models in Business and Industry

Volume

33

Issue

2

Page(s)

131-135

Abstract

This rejoinder to the paper ‘Post-selection shrinkage estimation for high-dimensional data analysis’ discusses different aspects of the study. One fundamental ingredient of the work is to formally split the signals into strong and weak ones. The rationale is that a usual one-step method, such as the least absolute shrinkage and selection operator (LASSO), may be very effective in detecting strong signals while failing to identify some weak ones, which in turn has a significant impact on model fitting as well as prediction. The discussions of both Fan and QYY contain very interesting comments on the separation of the three sets of variables.

Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification

Fan, J., Feng, Y., Jiang, J., & Tong, X.

Publication year

2016

Journal title

Journal of the American Statistical Association

Volume

111

Issue

513

Page(s)

275-287

Abstract

We propose a high-dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called feature augmentation via nonparametrics and selection (FANS). We motivate FANS by generalizing the naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression datasets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
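
A toy rendering of the FANS transform (a sketch under simplifying assumptions; the paper additionally uses sample splitting, estimating densities on one half of the data and fitting the regression on the other, which is omitted here):

```r
# Replace each feature by an estimated marginal log density ratio between
# the two classes, then fit a penalized logistic regression on the result.
library(glmnet)
log_ratio <- function(xj, lab) {
  d1 <- density(xj[lab == 1], from = min(xj), to = max(xj), n = 512)
  d0 <- density(xj[lab == 0], from = min(xj), to = max(xj), n = 512)
  f1 <- approx(d1$x, d1$y, xout = xj)$y
  f0 <- approx(d0$x, d0$y, xout = xj)$y
  log(pmax(f1, 1e-8) / pmax(f0, 1e-8))
}
set.seed(1)
n <- 300; p <- 20
x <- matrix(rnorm(n * p), n, p)
lab <- rbinom(n, 1, plogis(2 * x[, 1]^3))          # nonlinear signal in feature 1
z <- sapply(seq_len(p), function(j) log_ratio(x[, j], lab))
fit <- cv.glmnet(z, lab, family = "binomial")      # penalized logistic on augmented features
```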

Neyman-Pearson classification under high-dimensional settings

Zhao, A., Feng, Y., Wang, L., & Tong, X.

Publication year

2016

Journal title

Journal of Machine Learning Research

Volume

17

Page(s)

1-39

Abstract

Most existing binary classification methods target the optimization of the overall classification risk and may fail to serve some real-world applications, such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. The Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities. It seeks classifiers with a minimal type II error and a constrained type I error under a user-specified level. This article is the first attempt to construct classifiers with guaranteed theoretical performance under the NP paradigm in high-dimensional settings. Based on the fundamental Neyman-Pearson Lemma, we use a plug-in approach to construct NP-type classifiers for Naive Bayes models. The proposed classifiers satisfy the NP oracle inequalities, which are natural NP paradigm counterparts of the oracle inequalities in classical binary classification. Besides their desirable theoretical properties, we also demonstrate their numerical advantages in prioritized error control via both simulation and real data studies.

Tuning-parameter selection in regularized estimations of large covariance matrices

Fang, Y., Wang, B., & Feng, Y.

Publication year

2016

Journal title

Journal of Statistical Computation and Simulation

Volume

86

Issue

3

Page(s)

494-509

Abstract

Recently, many regularized estimators of large covariance matrices have been proposed, and the tuning parameters in these estimators are usually selected via cross-validation. However, there is a lack of consensus on the number of folds for conducting cross-validation. One round of cross-validation involves partitioning a sample of data into two complementary subsets: a training set and a validation set. In this manuscript, we demonstrate that if the estimation accuracy is measured in the Frobenius norm, the training set should consist of the majority of the data, whereas if the estimation accuracy is measured in the operator norm, the validation set should consist of the majority of the data. We also develop methods for selecting tuning parameters based on the bootstrap and compare them with their cross-validation counterparts. We demonstrate that the cross-validation methods with ‘optimal’ choices of folds are more appropriate than their bootstrap counterparts.
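
A self-contained sketch of the kind of split being discussed (illustrative, with soft-thresholding standing in for the regularized covariance estimator): V-fold cross-validation under the Frobenius norm.

```r
# Pick the soft-threshold level for a covariance estimate by V-fold CV,
# scoring candidate levels by squared Frobenius distance to the
# validation-fold sample covariance.
soft_cov <- function(S, t) {
  R <- sign(S) * pmax(abs(S) - t, 0)
  diag(R) <- diag(S)                    # leave variances unthresholded
  R
}
cv_threshold <- function(X, grid, V = 5) {
  n <- nrow(X)
  folds <- sample(rep(seq_len(V), length.out = n))
  err <- sapply(grid, function(t) {
    mean(sapply(seq_len(V), function(v) {
      S_tr <- cov(X[folds != v, , drop = FALSE])
      S_va <- cov(X[folds == v, , drop = FALSE])
      sum((soft_cov(S_tr, t) - S_va)^2)
    }))
  })
  grid[which.min(err)]
}
# Example: cv_threshold(X, grid = seq(0, 0.5, by = 0.05))
```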

Variable selection and prediction with incomplete high-dimensional data

Liu, Y., Wang, Y., Feng, Y., & Wall, M. M.

Publication year

2016

Journal title

Annals of Applied Statistics

Volume

10

Issue

1

Page(s)

418-450

Abstract

We propose a Multiple Imputation Random Lasso (MIRL) method to select important variables and to predict the outcome for an epidemiological study of Eating and Activity in Teens. In this study, 80% of individuals have at least one variable missing. Therefore, using variable selection methods developed for complete data after listwise deletion substantially reduces prediction power. Recent work on prediction models in the presence of incomplete data cannot adequately account for large numbers of variables with arbitrary missingness patterns. We propose MIRL to combine penalized regression techniques with multiple imputation and stability selection. Extensive simulation studies are conducted to compare MIRL with several alternatives. MIRL outperforms other methods in high-dimensional scenarios in terms of both reduced prediction error and improved variable selection performance, and its advantage is greater when the correlation among variables is high and the missing proportion is high. MIRL shows improved performance over other applicable methods when applied to the study of Eating and Activity in Teens for boys and girls separately, and to a subgroup of low socioeconomic status (SES) Asian boys who are at high risk of developing obesity.

Functional and Parametric Estimation in a Semi- and Nonparametric Model with Application to Mass-Spectrometry Data

Ma, W., Feng, Y., Chen, K., & Ying, Z.

Publication year

2015

Journal title

International Journal of Biostatistics

Volume

11

Issue

2

Page(s)

285-303

Abstract

Motivated by the modeling and analysis of mass-spectrometry data, a semi- and nonparametric model is proposed that consists of linear parametric components for individual location and scale and a nonparametric regression function for the common shape. A multi-step approach is developed that simultaneously estimates the parametric components and the nonparametric function. Under certain regularity conditions, it is shown that the resulting estimators are consistent and asymptotically normal for the parametric part and achieve the optimal rate of convergence for the nonparametric part when the bandwidth is suitably chosen. Simulation results are presented to demonstrate the effectiveness and finite-sample performance of the method. The method is also applied to a SELDI-TOF mass spectrometry dataset from a study of liver cancer patients.

APPLE: Approximate path for penalized likelihood estimators

Yu, Y., & Feng, Y.

Publication year

2014

Journal title

Statistics and Computing

Volume

24

Issue

5

Page(s)

803-819

Abstract

In high-dimensional data analysis, penalized likelihood estimators are shown to provide superior results in both variable selection and parameter estimation. A new algorithm, APPLE, is proposed for calculating the Approximate Path for Penalized Likelihood Estimators. Both convex penalties (such as LASSO) and folded concave penalties (such as MCP) are considered. APPLE efficiently computes the solution path for the penalized likelihood estimator using a hybrid of the modified predictor-corrector method and the coordinate-descent algorithm. APPLE is compared with several well-known packages via simulation and analysis of two gene expression data sets.

Modified Cross-Validation for Penalized High-Dimensional Linear Regression Models

Yu, Y., & Feng, Y.

Publication year

2014

Journal title

Journal of Computational and Graphical Statistics

Volume

23

Issue

4

Page(s)

1009-1027

Abstract

In this article, for Lasso penalized linear regression models in high-dimensional settings, we propose a modified cross-validation (CV) method for selecting the penalty parameter. The methodology is extended to other penalties, such as Elastic Net. We conduct extensive simulation studies and real data analysis to compare the performance of the modified CV method with other methods. It is shown that the popular K-fold CV method includes many noise variables in the selected model, while the modified CV works well in a wide range of coefficient and correlation settings. Supplementary materials containing the computer code are available online.

Contact

yf31@nyu.edu
+1 (212) 992-3810
715/719 Broadway
New York, NY 10003