2016, No. 11, Vol. 33 (Serial No. 302), pp. 109-112
Big Data Analysis Still Needs Statistical Thinking: The ARGO Model as an Example
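Foundation: An interim result of project 15XNI011 of the Research Funds of Renmin University of China (supported by the Fundamental Research Funds for the Central Universities)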
DOI: 10.19343/j.cnki.11-1302/c.2016.11.015
Abstract:

In the era of big data, whether traditional statistics still has a role to play has provoked considerable debate. Taking the ARGO model as a case study, this paper introduces the applications and achievements of statistical methods in big data analysis and proposes improvements from a statistical perspective. The results obtained with the ARGO model show that many fundamental problems in big data analysis are still statistical problems, and that the statistical regularities contained in the data remain the greatest value that data analysis seeks to extract. This means that statistical thinking can only become more important in big data analysis. At the same time, for big data with complex structure and diverse sources, statistical methods themselves require new exploration and experimentation, which presents both an opportunity and a challenge for statistics.


Basic information:

Chinese Library Classification (CLC) number: C829

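Citation:

[1] Lin Cunjie, Li Yang. Big data analysis still needs statistical thinking: The ARGO model as an example[J]. Statistical Research (统计研究), 2016, 33(11): 109-112. DOI: 10.19343/j.cnki.11-1302/c.2016.11.015.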

