2022, No. 06, Vol. 39, pp. 148-160
Research on Internet Public Opinion Empowering the Prediction of Fintech Stock Closing Prices
Foundation:
Email: 201801410140@uibe.edu.cn
DOI: 10.19343/j.cnki.11-1302/c.2022.06.010
Abstract:

As Fintech develops, online public opinion may help forecast the industry's indicator data, but research on this question remains limited. This paper treats the trading data of Fintech stocks in the Wind database as a microcosm of the Fintech industry and applies a sentiment classification model to mine investor sentiment from more than 110,000 crawled Weibo posts. The study finds that the proportion of negative investor sentiment has a negative effect on the average closing price of the 84 sample Fintech stocks, and that the two share a stable long-run equilibrium relationship. On this basis, the paper builds a Long Short-Term Memory (LSTM) neural network that takes negative investor sentiment, a weekday variable, and other quantitative indicators of Fintech stocks as inputs and predicts the average closing price of Fintech stocks. The results show that, once the proportion of negative investor sentiment is introduced, the experimental-group LSTM model achieves better forecast evaluation metrics than the control group, which indicates that online public opinion plays an important role in predicting Fintech stock closing prices. The experimental-group LSTM model also outperforms the other control models (random forest, multilayer perceptron, and support vector regression) on the evaluation metrics at every forecast horizon considered, which further supports its predictive performance and robustness. The paper extends research on natural language processing and deep learning in the Fintech field and offers a new approach to forecasting indicator data for the Fintech industry.


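(1) According to the China Academy of Information and Communications Technology's "G20 Countries Digital Economy Development Research Report (2018)", digital industrialization refers to the information and communications industry, including electronic information manufacturing, telecommunications, software and information technology services, and the Internet industry; industrial digitalization refers to the gains in output and production efficiency that traditional industries obtain by applying digital technology, and this additional output forms an important part of the digital economy.

(2) In the 2018 global ranking of Fintech invention patents (TOP 20), six of the listed companies are Chinese, and these six account for as much as 42% of the patent applications filed by the 20 listed companies.

(1) Owing to space limitations, the description of the 84 sample Fintech stocks is given in a supplementary table; see the attachment listed on the website of Statistical Research (《统计研究》).

(2) "Template" refers to the 20,000 manually classified texts together with their numeric labels.

(3) Specifically, the Transformer encoder in this model has L = 12 hidden layers, a hidden size H (that is, the dimension of the Feed Forward output vector) of 768, and 12 attention heads, for a total of about 110M parameters.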
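As a rough check on the 110M figure in note (3), the parameter count of a BERT-base style encoder can be tallied from L, H, and the number of heads. The sketch below is only illustrative: the vocabulary size (30,522), maximum sequence length (512), number of segment types (2), Feed Forward inner dimension (4H = 3072), and the pooler layer are assumptions taken from the commonly published BERT-base English configuration, not values stated in the paper. A Chinese vocabulary of 21,128 tokens, also an assumption, gives a somewhat smaller total.

```python
def bert_param_count(num_layers=12, hidden=768, vocab_size=30522,
                     max_positions=512, type_vocab=2, intermediate=None):
    """Count trainable parameters of a BERT-style encoder.

    num_layers and hidden come from the footnote (L = 12, H = 768).
    vocab_size, max_positions, type_vocab and intermediate are ASSUMED
    defaults from the public BERT-base configuration.
    The number of attention heads (12) does not change the count,
    because the heads split the same H x H projection matrices.
    """
    if intermediate is None:
        intermediate = 4 * hidden  # assumed Feed Forward inner size

    layer_norm = 2 * hidden  # scale and bias

    # Token, position and segment embeddings, plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value and output projections with biases.
    attention = 4 * (hidden * hidden + hidden)

    # Feed Forward: H -> intermediate -> H, with biases.
    feed_forward = (hidden * intermediate + intermediate) \
        + (intermediate * hidden + hidden)

    # Each encoder layer has two LayerNorms (after attention, after FFN).
    per_layer = attention + feed_forward + 2 * layer_norm

    # Pooler: one dense H x H layer applied to the first token (assumed).
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * per_layer + pooler


if __name__ == "__main__":
    print(bert_param_count())                  # assumed English vocabulary
    print(bert_param_count(vocab_size=21128))  # assumed Chinese vocabulary
```

Under these assumptions the total comes to roughly 109.5 million parameters, which is consistent with the rounded 110M quoted in the note.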

(1) A value of 1 means "keep completely"; a value of 0 means "discard completely".

(1) This yields a value in the range [-1, 1].
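The two notes above presumably refer to the gate activations of an LSTM cell: a sigmoid gate outputs a value between 0 (discard) and 1 (keep), and the tanh function outputs a value between -1 and 1. The following is a minimal sketch of one step of a single-unit LSTM cell in its standard textbook form, not the paper's implementation; the function names and the choice to pass pre-activation values directly (instead of computing them from weights and inputs) are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function; squashes x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_update(c_prev, f_pre, i_pre, g_pre, o_pre):
    """One step of a single-unit LSTM cell (hypothetical helper).

    c_prev : previous cell state
    f_pre, i_pre, g_pre, o_pre : pre-activation values of the forget gate,
        input gate, candidate state and output gate. In a full model these
        would be computed from the current input and previous hidden state.
    Returns the new cell state c and the new hidden state h.
    """
    f = sigmoid(f_pre)        # forget gate: 1 keeps c_prev, 0 discards it
    i = sigmoid(i_pre)        # input gate: how much new information enters
    g = math.tanh(g_pre)      # candidate state, between -1 and 1
    o = sigmoid(o_pre)        # output gate
    c = f * c_prev + i * g    # blend old memory with the new candidate
    h = o * math.tanh(c)      # exposed hidden state, between -1 and 1
    return c, h


if __name__ == "__main__":
    # Forget gate near 1, input gate near 0: the old memory is kept.
    print(lstm_cell_update(2.0, 50.0, -50.0, 0.3, 0.0))
    # Forget gate near 0, input gate near 0: the old memory is discarded.
    print(lstm_cell_update(2.0, -50.0, -50.0, 0.3, 0.0))
```

With the forget gate saturated near 1 the cell state is carried over almost unchanged, and with it saturated near 0 the cell state drops to almost zero, which matches the "keep completely" and "discard completely" reading of the note.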

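Basic information:

DOI: 10.19343/j.cnki.11-1302/c.2022.06.010

Chinese Library Classification (CLC) number: F832

Citation:

[1] Cui Yanyan (崔炎炎), Liu Lixin (刘立新). Research on Internet Public Opinion Empowering the Prediction of Fintech Stock Closing Prices[J]. Statistical Research (统计研究), 2022, 39(06): 148-160. DOI: 10.19343/j.cnki.11-1302/c.2022.06.010.

Release date:

2022-06-25

Publication date:

2022-06-25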
