Statistics Seminar Series

Paradoxes and resolutions for semiparametric fusion of individual and summary data

Speaker: Wang Miao

Venue: Tencent Meeting (ID: 858-165-028)

Time: Friday, November 4, 2022, 14:00-15:00


Abstract:

Suppose we have individual-level data from an internal study and various types of summary statistics from relevant external studies. External summary statistics have been used as constraints on the internal data distribution, with the promise of improving statistical inference on the internal data. However, the additional use of external summary data may lead to paradoxical results: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and estimation bias can emerge if the summary statistics are obtained from a population different from that of the internal study. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution and show that it is no larger than the bound attainable using internal data alone. We propose a data-fused efficient estimator that achieves this bound, which resolves the efficiency paradox. This data-fused estimator is further regularized with an adaptive lasso penalty so that the resulting estimator achieves the same asymptotic distribution as the oracle estimator that uses only the unbiased summary statistics, which resolves the bias paradox. Simulations and an application to a Helicobacter pylori infection dataset illustrate the proposed methods.
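The efficiency paradox can be seen in the simplest possible setting. The sketch below is a generic control-variate-style fusion of one internal estimate with one external scalar summary statistic; it is not the speaker's semiparametric estimator. It assumes that the internal and external samples are independent, that both target the same auxiliary quantity in the same population, and that the (co)variances are known or consistently estimated. The function names `fuse_with_summary` and `naive_true_variance` are hypothetical. Accounting for the external variance keeps the fused variance at or below the internal-only variance. Treating the external statistic as exact can inflate the variance instead.

```python
from dataclasses import dataclass


@dataclass
class FusedResult:
    estimate: float
    variance: float


def fuse_with_summary(theta_int, gamma_int, var_theta, var_gamma,
                      cov_theta_gamma, gamma_ext, var_gamma_ext):
    """Hypothetical helper: adjust an internal estimate using an external summary.

    theta_int       -- internal estimate of the target functional
    gamma_int       -- internal estimate of an auxiliary quantity also reported externally
    var_theta, var_gamma, cov_theta_gamma -- (co)variances of the internal estimates
    gamma_ext       -- external summary statistic for the same auxiliary quantity
    var_gamma_ext   -- its variance (external sample assumed independent of internal)
    """
    # Under a shared population, gamma_int - gamma_ext has mean zero; its
    # variance includes the external sampling uncertainty.
    var_diff = var_gamma + var_gamma_ext
    # Variance-minimizing weight for theta_int - w * (gamma_int - gamma_ext).
    weight = cov_theta_gamma / var_diff
    estimate = theta_int - weight * (gamma_int - gamma_ext)
    # Var(theta_int) minus a nonnegative term, so never worse than internal-only.
    variance = var_theta - cov_theta_gamma ** 2 / var_diff
    return FusedResult(estimate, variance)


def naive_true_variance(var_theta, var_gamma, cov_theta_gamma, var_gamma_ext):
    """Actual variance when the external statistic is wrongly treated as exact."""
    # Weight chosen as if var_gamma_ext were zero.
    weight = cov_theta_gamma / var_gamma
    # Real variance of theta_int - weight * (gamma_int - gamma_ext).
    return (var_theta - 2 * weight * cov_theta_gamma
            + weight ** 2 * (var_gamma + var_gamma_ext))


if __name__ == "__main__":
    # Noisy external summary: variance 3.0 versus 1.0 internally.
    fused = fuse_with_summary(0.5, 0.2, 1.0, 1.0, 0.8, 0.3, 3.0)
    print(fused.variance)                              # 0.84, below 1.0
    print(naive_true_variance(1.0, 1.0, 0.8, 3.0))     # 2.28, above 1.0
```

In this toy example, the internal-only variance is 1.0. The uncertainty-aware fusion gives 0.84. The naive version, which ignores the external variance, gives 2.28. The bias paradox is not addressed here, because the sketch assumes that both studies share a population.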


About the Speaker:


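Wang Miao is a researcher in the Department of Probability and Statistics at Peking University. He completed his undergraduate and doctoral studies at the School of Mathematical Sciences, Peking University, from 2008 to 2017, and was a postdoctoral researcher in the Department of Biostatistics at Harvard University from 2017 to 2018. He joined the Guanghua School of Management at Peking University in 2018 and moved to the School of Mathematical Sciences in 2020. His research interests include causal inference and missing data analysis, with applications in biostatistics, epidemiology, economics, and artificial intelligence. With his collaborators, he has proposed proxy inference methods for confounding analysis and has developed identification and doubly robust estimation theory for data missing not at random. He has also developed semiparametric theory for data fusion.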