Latent Variable Augmented Sparse Regression

Speaker: Zemin Zheng (郑泽敏)
Venue: Academic Lecture Hall, 4th floor, School of Mathematics and Statistics
Time: Friday, April 14, 2017, 10:30-11:30

Abstract:

As a powerful tool for producing meaningful and interpretable models, sparse modeling has gained increasing popularity for analyzing large-scale data sets. However, the key assumption of sparse feature effects, which underlies high-dimensional statistical inference, has been questioned in real applications. Moreover, most existing methods implicitly assume that all features in a model are observable, yet latent confounding factors may exist in the hidden structure of the original model. In this paper, we consider nonsparse feature effects under a conditional sparsity structure, in which the original coefficient vector is sparse only after the effects of latent factors are taken out. A new framework, latent variable augmented sparse regression (LAVAR), is proposed to simultaneously recover the significant observable predictors and latent factors. In particular, we explore one potential family of latent variables that incorporates the population principal components of the observable features, and we establish asymptotic properties of the sample principal components for a wide class of distributions. With the aid of these properties, we prove that the proposed framework enjoys model selection consistency and oracle inequalities under various prediction and variable selection losses for both observable predictors and latent confounding factors. The new method and results are supported by simulation and real data examples.

About the speaker:

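Dr. Zemin Zheng received his Ph.D. from the University of Southern California and is currently a specially appointed professor in the Department of Statistics and Finance, School of Management, University of Science and Technology of China. His research area is high-dimensional statistical inference, with interests including model selection in high dimensions, classification, inference of diffusion networks, and big data problems. His work has been published in top statistics journals such as Journal of the Royal Statistical Society Series B, The Annals of Statistics, and Journal of Machine Learning Research. He has received the IMS Travel Award, a new researcher award from the Institute of Mathematical Statistics, and the Prize for Excellence in Research from the University of Southern California.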