A Study of Vector Representation Methods for Text in Sentiment Analysis

Inna Vasilyevna Bondareva; Dmitry Grigoryevich Lagerev

Information Technologies for Intelligent Decision Making Support (Информационные технологии интеллектуальной поддержки принятия решений), 2018

Last modified: 2018-06-20

Abstract

This article examines word embedding methods for sentiment analysis of text. It outlines the general background and relevance of the task, as well as its specifics with respect to text granularity. The concept of "word embedding" is defined and explained. Existing state-of-the-art methods for sentiment analysis of English-language text are reviewed. Attention is drawn to open problems in sentiment analysis, work on which is a promising direction in text processing. Particular emphasis is placed on the study of word embedding models, from which the authors conclude that such models are effective for sentiment analysis.

Keywords

sentiment analysis; opinion mining; word embeddings; word2vec; GloVe; LSA
