Abstract: Zero-inflated count data violate the mean-variance relationship of the Poisson distribution. Such data can be explained by a mixture distribution in which a certain proportion of observations follow a Poisson distribution and the remainder are identically zero (a degenerate distribution). Based on the adaptive elastic-net technique, this paper studies joint modeling and variable selection for zero-inflated count data. For the zero-inflated Poisson distribution, latent variables are introduced to construct the complete likelihood of the zero-inflated Poisson model, which consists of two components: a zero-inflation part and a Poisson part. Because the covariates may be collinear and sparse, the objective function is obtained by adding an adaptive elastic-net penalty to the likelihood function. The EM algorithm is then used to obtain sparse estimators of the regression coefficients, and the Bayesian information criterion (BIC) is used to determine the optimal tuning parameter. The paper also gives a theoretical proof of the estimators' large-sample properties together with a simulation study, and finally applies the proposed method to a real-data problem.
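The ingredients named in the abstract can be illustrated with a minimal sketch for a single observation. It is not the authors' implementation: it assumes the standard zero-inflated Poisson parameterization (mixing probability `pi`, Poisson mean `lam`) and a penalty of the common adaptive elastic-net form (weighted L1 term plus ridge term). All function and parameter names (`zip_pmf`, `e_step`, `complete_loglik`, `adaptive_enet_penalty`, `lam1`, `lam2`, `weights`) are hypothetical.

```python
import math


def zip_pmf(y, pi, lam):
    """P(Y = y) for a zero-inflated Poisson: a structural zero with
    probability pi, otherwise a Poisson(lam) draw (assumed parameterization)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (y == 0) + (1.0 - pi) * poisson


def e_step(y, pi, lam):
    """E-step: posterior probability that y came from the degenerate
    (structural-zero) component. A positive count can only be Poisson."""
    if y > 0:
        return 0.0
    return pi / (pi + (1.0 - pi) * math.exp(-lam))


def complete_loglik(y, z, pi, lam):
    """Complete-data log-likelihood for one observation, where z in [0, 1]
    is the latent indicator (or its E-step expectation). It separates into
    a zero-inflation term in pi and a Poisson term in lam."""
    zero_part = z * math.log(pi) + (1.0 - z) * math.log(1.0 - pi)
    poisson_part = (1.0 - z) * (y * math.log(lam) - lam - math.lgamma(y + 1))
    return zero_part + poisson_part


def adaptive_enet_penalty(beta, weights, lam1, lam2):
    """Assumed adaptive elastic-net form:
    lam1 * sum_j w_j * |beta_j| + lam2 * sum_j beta_j ** 2."""
    l1 = sum(w * abs(b) for w, b in zip(weights, beta))
    l2 = sum(b * b for b in beta)
    return lam1 * l1 + lam2 * l2
```

In a full EM fit under these assumptions, the E-step weights would be recomputed at each iteration and the M-step would maximize the expected complete log-likelihood minus the penalty, separately for the two sets of regression coefficients, with the tuning parameters chosen by BIC over a grid.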
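Basic information:
DOI: 10.19343/j.cnki.11-1302/c.2019.01.009
Chinese Library Classification (CLC) number: O212.1
Citation:
[1] Hu Yanan (胡亚南), Tian Maozai (田茂再). Joint Modeling and Variable Selection for Zero-inflated Count Data [J]. Statistical Research (统计研究), 2019, 36(01): 104-114. DOI: 10.19343/j.cnki.11-1302/c.2019.01.009.
Funding:
Supported by the Renmin University of China Scientific Research Fund (Fundamental Research Funds for the Central Universities), project "Research on Robust Statistical Theory and Applications for Big Data Analysis" (18XNL012).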
2019-01-25