Statistics Seminar Series

Nearest-Neighbor Sampling Based Conditional Independence Testing

Speaker: Ziqi Chen (谌自奇)

Venue: Tencent Meeting, meeting ID 481989032

Time: Thursday, May 9, 2024, 10:00-11:00

Abstract:

The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given random variables Z. The CRT assumes that the conditional distribution of X given Z is known under the null hypothesis and compares it with the distribution of the observed samples of the original data. The aim of this paper is to develop a novel alternative to the CRT that uses nearest-neighbor sampling and does not assume the exact form of the distribution of X given Z. Specifically, we use the computationally efficient 1-nearest-neighbor method to approximate the conditional distribution that encodes the null hypothesis. We then show theoretically that the distribution of the generated samples is very close to the true conditional distribution in terms of total variation distance. Furthermore, we take the classifier-based conditional mutual information estimator as our test statistic. As an empirical version of a fundamental information-theoretic quantity, this statistic captures conditional dependence well. We show that the proposed test is computationally very fast while controlling type I and type II errors well. Finally, we demonstrate the efficiency of the proposed test in both synthetic and real data analyses.
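The nearest-neighbor sampling idea in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation, and it makes several assumptions that the abstract does not state: the data are split into a test half and a donor half, each replicate draws a random half of the donor pool before the 1-nearest-neighbor lookup (a hypothetical way to obtain varied pseudo-samples), and an absolute partial correlation stands in for the classifier-based conditional mutual information estimator. The function names `nn_sample_x`, `abs_partial_corr`, and `nn_crt_pvalue` are made up for this example.

```python
import numpy as np


def nn_sample_x(x_pool, z_pool, z_query):
    """For each row of z_query, return the X value of its 1-nearest neighbor in z_pool."""
    # Squared Euclidean distances between every query row and every pool row.
    d2 = ((z_query[:, None, :] - z_pool[None, :, :]) ** 2).sum(axis=2)
    return x_pool[d2.argmin(axis=1)]


def abs_partial_corr(x, y, z):
    """Stand-in statistic (assumption): |correlation| of X and Y after
    removing a linear effect of Z from both. The talk uses a classifier-based
    conditional mutual information estimator instead."""
    design = np.column_stack([np.ones(len(z)), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return abs(np.corrcoef(rx, ry)[0, 1])


def nn_crt_pvalue(x, y, z, n_rep=200, seed=0):
    """CRT-style p-value where pseudo-copies of X come from 1-NN lookups in Z."""
    rng = np.random.default_rng(seed)
    n = len(x)
    half = n // 2
    test, pool = np.arange(half), np.arange(half, n)
    t_obs = abs_partial_corr(x[test], y[test], z[test])
    exceed = 0
    for _ in range(n_rep):
        # Hypothetical randomization: a fresh random half of the donor pool.
        sub = rng.choice(pool, size=len(pool) // 2, replace=False)
        # Pseudo-sample: X values borrowed from donors with the closest Z,
        # which approximates drawing X from its conditional law given Z.
        x_tilde = nn_sample_x(x[sub], z[sub], z[test])
        exceed += int(abs_partial_corr(x_tilde, y[test], z[test]) >= t_obs)
    return (1 + exceed) / (1 + n_rep)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.normal(size=(400, 2))
    x = z[:, 0] + rng.normal(size=400)
    y = x + z[:, 0] + rng.normal(size=400)  # Y depends on X even given Z
    print(nn_crt_pvalue(x, y, z))
```

Here `x` and `y` are one-dimensional arrays and `z` is a two-dimensional array with one row per observation. A small p-value suggests that X and Y remain dependent after conditioning on Z. The brute-force distance computation is chosen for clarity only; a tree-based neighbor search would scale better.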

About the speaker:

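Ziqi Chen is a research professor and doctoral supervisor at the School of Statistics, East China Normal University. Chen received a PhD from Northeast Normal University and was a postdoctoral researcher in the Department of Biostatistics at the MD Anderson Cancer Center in the United States from 2016 to 2019. Chen's work focuses on statistics for complex data and related interdisciplinary research, with interests including high-dimensional matrices, conditional independence testing, causal structure learning, machine learning, and statistical methods in biostatistics. Chen has published more than 20 papers in leading international statistics and computer science journals and conferences, including JASA, Biometrics, and NeurIPS. Chen has led two General Program projects, one Key Program subproject, and one Young Scientists Fund project of the National Natural Science Foundation of China, among others, and has been a core participant in a National Key R&D Program project and in a key applied mathematics project in the basic research area of Shanghai's "Science and Technology Innovation Action Plan".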