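Statistical Research (统计研究)

2022, Vol. 39, No. 5, pp. 134-145

A Macroeconomic Data Fusion Method Based on Machine Learning

Foundation: National Social Science Fund of China project "Research on Multi-Domain Data Fusion Methods for Urban Computing" (20XTJ005); National Social Science Fund of China project "Sparse Processing Theory for Factor Analysis and Its Extensions" (18BTJ038); National Social Science Fund of China project "Methods and Applications for Repairing Large-Scale Sparse Functional Data" (19XTJ002); Central Government Guidance Fund for Local Science and Technology Development project "Construction of an Urban Computing Methodology System and Its Smart-City Applications in Gansu" (GSK215115)

Email: noah@lzufe.edu.cn

DOI: 10.19343/j.cnki.11-1302/c.2022.05.010

Release date: 2022-05-30

Publication date: 2022-05-30

Online release date: 2022-05-30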


Abstract:

Big data and machine learning are changing the research paradigm and methods of economic statistics. Macroeconomic data, as a statistical product, describe economic states or economic connections within a given scope. Like multi-source heterogeneous micro-data, macroeconomic data have the potential for fusion and secondary development, and they come with better data quality assurance. Building on a review of machine learning data fusion methods, this paper identifies a class of macroeconomic data fusion tasks and proposes a macroeconomic data fusion method aimed at improving predictive ability. First, by establishing the form in which economic status data and economic connection data can be fused, it gives a model-based representation for extracting the features common to the different types of data. It then proposes a data fusion model, together with an alternating iterative algorithm for solving it; the model handles unsupervised, supervised, and semi-supervised learning tasks on fused data in a unified way. Finally, the method is applied to data from the China Statistical Yearbook (2017), the Input-Output Table of China (2017), and China's economic prosperity monthly reports (中国经济景气月报, 2017–2018). The results show that, compared with non-fusion methods, the data fusion method improves prediction accuracy.

Keywords: data fusion; economic status; economic connections; machine learning
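The abstract describes a fusion model, solved by alternating iterations, that extracts features shared by economic status data and economic connection data. The sketch below illustrates that general idea only; it is not the authors' model, which is specified in the full text. It assumes a joint non-negative factorization X ≈ UV and A ≈ WV with a shared factor V, the objective ‖X − UV‖² + λ‖A − WV‖², and multiplicative updates in the style of Lee and Seung (1999). The names `fuse`, `lam`, `U`, `W`, and the toy data are all hypothetical. Plain Python lists are used so the example has no dependencies.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_sq(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def add_scaled(A, B, lam):
    """Element-wise A + lam * B."""
    return [[a + lam * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mult_update(F, num, den, eps=1e-9):
    """Element-wise multiplicative update F * (num + eps) / (den + eps)."""
    return [[f * (n + eps) / (d + eps) for f, n, d in zip(rf, rn, rd)]
            for rf, rn, rd in zip(F, num, den)]

def random_matrix(rng, rows, cols):
    """Strictly positive random matrix, so multiplicative updates can move it."""
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

def fusion_loss(X, A, U, W, V, lam):
    """Assumed objective: ||X - UV||^2 + lam * ||A - WV||^2."""
    return frob_sq(X, matmul(U, V)) + lam * frob_sq(A, matmul(W, V))

def fuse(X, A, k, lam=1.0, iters=200, seed=0):
    """Alternate updates of U, W, and the shared factor V.

    X: non-negative status data (features x objects).
    A: non-negative connection data (objects x objects).
    Returns U, W, V and the loss recorded before and after each iteration.
    """
    rng = random.Random(seed)
    n = len(X[0])
    U = random_matrix(rng, len(X), k)
    W = random_matrix(rng, len(A), k)
    V = random_matrix(rng, k, n)
    history = [fusion_loss(X, A, U, W, V, lam)]
    for _ in range(iters):
        # Update the data-specific factors with V held fixed.
        Vt = transpose(V)
        VVt = matmul(V, Vt)
        U = mult_update(U, matmul(X, Vt), matmul(U, VVt))
        W = mult_update(W, matmul(A, Vt), matmul(W, VVt))
        # Update the shared factor with U and W held fixed.
        Ut, Wt = transpose(U), transpose(W)
        num = add_scaled(matmul(Ut, X), matmul(Wt, A), lam)
        den = add_scaled(matmul(matmul(Ut, U), V), matmul(matmul(Wt, W), V), lam)
        V = mult_update(V, num, den)
        history.append(fusion_loss(X, A, U, W, V, lam))
    return U, W, V, history

# Toy data: both matrices are generated from one hidden shared factor.
rng = random.Random(42)
V_true = random_matrix(rng, 2, 6)
X = matmul(random_matrix(rng, 4, 2), V_true)   # 4 indicators for 6 objects
A = matmul(random_matrix(rng, 6, 2), V_true)   # 6 x 6 connection matrix
U, W, V, history = fuse(X, A, k=2, lam=0.5)
print(f"loss before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Each column of `V` is a low-dimensional feature vector for one object that draws on both data sources; in a setting like the one the abstract describes, such features could then feed a clustering or classification step. The weight `lam` controls how much the connection data influence the shared features.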

For the full text, visit cnki.net

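References

[1] Hong Yongmiao, Wang Shouyang. How Does Big Data Change the Research Paradigm of Economics?[J]. Management World (管理世界), 2021, 37(10):40–55, 72. (in Chinese)

[2] Huang Hengjun. On Integrating Big Data into the Government Statistics Production System: An Analysis Based on Data Sources and Data Quality[J]. Statistical Research (统计研究), 2019, 36(7):3–12. (in Chinese)

[3] Li Jinchang. Quality Control of Big Data Applications[J]. Statistical Research (统计研究), 2020, 37(2):119–128. (in Chinese)

[4] Li Jinchang. Some Thoughts on Statistical Data[J]. Statistical Research (统计研究), 2017, 34(11):3–14. (in Chinese)

[5] Ma Shuangge, Wang Xiaoyan, Fang Kuangnan. Integrative Analysis Methods for Big Data[J]. Statistical Research (统计研究), 2015, 32(11):3–11. (in Chinese)

[6] Wang Fang, Wang Xuanyi, Chen Shuo. Machine Learning in Economics Research: Review and Prospects[J]. Journal of Quantitative & Technical Economics (数量经济技术经济研究), 2020, 37(4):146–164. (in Chinese)

[7] Zhu Jianping, Feng Chong, Chen Shuzhen. Global Government Data Sharing Models: Implications for China[J]. Journal of Statistics (统计学报), 2020, 1(1):14–25. (in Chinese)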

[8] Alyannezhadi M, Pouyan A, Abolghasemi V. An Efficient Algorithm for Multisensory Data Fusion under Uncertainty Condition[J]. Journal of Electrical Systems and Information Technology, 2017, 4(1):269–278.

[9] Bleiholder J, Naumann F. Data Fusion[J]. ACM Computing Surveys, 2008, 41(1):1–41.

[10] Blumenstock J, Cadamuro G, On R. Predicting Poverty and Wealth from Mobile Phone Metadata[J]. Science, 2015, 350(6264):1073–1076.

[11] Couper M P. Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys[J]. Survey Research Methods, 2013, 7(3):145–156.

[12] Ding C, Li T, Jordan M. Convex and Semi-Nonnegative Matrix Factorizations[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(1):45–55.

[13] Gebru T, Krause J, Wang Y, et al. Using Deep Learning and Google Street View to Estimate the Demographic Makeup of Neighborhoods Across the United States[J]. Proceedings of the National Academy of Sciences, 2017, 114(50):13108–13113.

[14] Gönen M, Margolin A A. Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology[C]//Ghahramani Z, Welling M, Cortes C, et al. Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2014:1305–1313.

[15] Groves R M. Three Eras of Survey Research[J]. Public Opinion Quarterly, 2011, 75(5):861–871.

[16] Guevara Z, Molina-Pérez E, García E, et al. Energy and CO2 Emission Relationships in the NAFTA Trading Bloc: A Multi-Regional Multi-Factor Energy Input-Output Approach[J]. Economic Systems Research, 2019, 31(2):178–205.

[17] Kahou S, Bouthillier X, Lamblin P, et al. EmoNets: Multimodal Deep Learning Approaches for Emotion Recognition in Video[J]. Journal on Multimodal User Interfaces, 2015, 10(2):1–13.

[18] Lee D D, Seung H S. Learning the Parts of Objects by Non-negative Matrix Factorization[J]. Nature, 1999, 401(6755):788–791.

[19] Liang N, Yang Z, Li Z, et al. Semi-supervised Multi-view Clustering with Graph-regularized Partially Shared Non-negative Matrix Factorization[J]. Knowledge-Based Systems, 2020, 190(2):105161.

[20] Liu X, Zhu X, Li M, et al. Late Fusion Incomplete Multi-view Clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(10):2410–2423.

[21] Ng A Y, Jordan M I, Weiss Y. On Spectral Clustering: Analysis and an Algorithm[C]//Dietterich T G, Becker S, Ghahramani Z. Advances in Neural Information Processing Systems 14. MIT Press, 2002:849–856.

[22] Pan S J, Yang Q. A Survey on Transfer Learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10):1345–1359.

[23] Sun S. A Survey of Multi-view Machine Learning[J]. Neural Computing and Applications, 2013, 23(7):2031–2038.

[24] Sun X, Huang Z, Peng X, et al. Building a Model-based Personalised Recommendation Approach for Tourist Attractions from Geotagged Social Media Data[J]. International Journal of Digital Earth, 2019, 12(6):661–678.

[25] Tan F, Cheng C, Wei Z. Modeling and Elucidation of Housing Price[J]. Data Mining and Knowledge Discovery, 2019, 33(3):636–662.

[26] Varian H R. Big Data: New Tricks for Econometrics[J]. Journal of Economic Perspectives, 2014, 28(2):3–28.

[27] Vidal R, Ma Y, Sastry S S. Generalized Principal Component Analysis[M]. New York: Springer, 2016:138–153.

[28] Xiao X, Yu Z, Luo Q, et al. Inferring Social Ties between Users with Human Location History[J]. Journal of Ambient Intelligence and Humanized Computing, 2014, 5(1):3–19.

[29] Zheng Y, Capra L, Wolfson O, et al. Urban Computing[J]. ACM Transactions on Intelligent Systems and Technology, 2014, 5(3):1–55.

[30] Zheng Y. Methodologies for Cross-Domain Data Fusion: An Overview[J]. IEEE Transactions on Big Data, 2015, 1(1):16–34.

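(1) United Nations Statistics Division, Fundamental Principles of Official Statistics, https://unstats.un.org/unsd/dnss/gp/FP-New-C.pdf.

(2) An example of the former is the National Bureau of Statistics (http://www.stats.gov.cn/tjsj); an example of the latter is EPSData (http://www.epsnet.com.cn).

(3) The former, typified by statistical yearbooks, quantitatively describes regions and sectors themselves; the latter, typified by input-output tables, quantitatively describes the relationships between regions and between sectors.

(4) Statistical yearbooks and input-output tables are again taken to represent the two types of data. Joint development of yearbook-type data and input-output tables takes one of two routes: either the input-output table is suitably processed so that it fits the yearbook-type setting, or the input-output table is redesigned and extended so that yearbook-type data are incorporated into it (Guevara et al., 2019). Whichever route is taken, changing the form of one type of data inevitably loses information, and from the standpoint of method construction the incorporation is not natural either.

(1) The research target data have a form similar to that of the economic status data, but the non-negativity constraint need not be imposed. Feature extraction can still be carried out by semi-nonnegative matrix factorization (Semi-NMF) (Ding et al., 2010), and the matrix V is handled exactly as in the NMF used in this paper.

(1) If no non-negativity constraint is imposed on the matrix U, the result is a semi-nonnegative matrix factorization.

(2) The economic status object in question here can be the j-th column x_j of the matrix X, or data related to it.

(1) Of the 42 sectors, "Transport and Storage" and "Postal Services" are merged into "Transport, Storage and Post"; "Accommodation and Catering", "Information Transmission, Computer Services and Software", "Finance", "Real Estate", "Leasing and Business Services", "Research and Experimental Development", "Comprehensive Technical Services", "Water Conservancy, Environment and Public Facilities Management", "Resident Services and Other Services", "Education", "Health, Social Security and Social Welfare", "Culture, Sports and Entertainment", and "Public Administration and Social Organizations" are grouped as "Other Industries".

(1) Accuracy = number of correctly classified samples / total number of samples.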
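The accuracy defined in the last footnote above can be computed as in the following minimal sketch. The function name `accuracy` and the example labels are hypothetical and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels for four samples; two of the four predictions match.
true_labels = ["up", "down", "up", "up"]
predicted_labels = ["up", "up", "up", "down"]
print(accuracy(true_labels, predicted_labels))
```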

Basic information:

CLC number: F124

Citation:

[1] Huang Hengjun, Gao Haiyan, Han Jun. A Macroeconomic Data Fusion Method Based on Machine Learning[J]. Statistical Research (统计研究), 2022, 39(05):134-145. DOI:10.19343/j.cnki.11-1302/c.2022.05.010.

