Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (12): 2861-2879. DOI: 10.3778/j.issn.1673-9418.2303083
• Frontiers·Surveys •
HE Dongbin, TAO Sha, ZHU Yanhong, REN Yanzhao, CHU Yunxia
Online: 2023-12-01
Published: 2023-12-01
HE Dongbin, TAO Sha, ZHU Yanhong, REN Yanzhao, CHU Yunxia. Survey of Automatic Labeling Methods for Topic Models[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(12): 2861-2879.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2303083
[1] WANG Shijie, ZHOU Lihua, KONG Bing, ZHOU Junhua. LDA-DeepHawkes Model for Predicting Information Cascade[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(3): 410-425.
[2] LIU Shaoqin, TANG Shuang, ZHAO Junfeng, WANG Yasha, ZHUO Lin. Extended Topic Model Based Abnormal Medical Prescription Detection Method[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(1): 30-39.
[3] HUANG Chang, GUO Wenzhong, GUO Kun. Research on Improved BBTM Model for Microblog Hot Topic Discovery[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(7): 1102-1113.
[4] TANG Shuang, ZHANG Lingxiao, ZHAO Junfeng, XIE Bing, ZOU Yanzhen. Extensible Topic Modeling and Analysis Framework for Multisource Data[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(5): 742-752.
[5] ZHOU Kaiwen, YANG Zhihui, MA Huixin, HE Zhenying, JING Yinan, WANG X. Sean. Design and Development of Partitional Topic Model[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(7): 1036-1046.
[6] YAN Rong, GAO Guanglai. Using Topic Content Ranking for Pseudo Relevance Feedback[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(5): 814-821.
[7] HAN Junming, WANG Wei, LI Tong, HE Yun. Approach of Open Source Software Oriented Evolving Validation[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(4): 539-555.
[8] SHEN Guilan, JIA Caiyan, YU Jian, YANG Xiaoping. Semantic Community Detection Algorithm for Large Scale Information Network[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(4): 565-576.
[9] YIN Chunlin, WANG Wei, LI Tong, HE Yun, XIONG Wenjun, ZHOU Xiaoxuan. Using RNNLM to Conduct Topic Oriented Feature Location Method[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(10): 1599-1608.
[10] HAN Junming, WANG Wei, LI Tong, HE Yun. Feature Location Method of Evolved Software[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(9): 1201-1210.
[11] LI Tianchen, YIN Jianping. Sentiment Polarity Discrimination Method Based on Topic Clustering[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(7): 989-994.
[12] LIU Na, LU Ying, TANG Xiaojun, LI Mingxia. Multi-Document Summarization Algorithm Based on Significance Topic of LDA[J]. Journal of Frontiers of Computer Science and Technology, 2015, 9(2): 242-248.
[13] XU Bin, YANG Dan, ZHANG Yu, LI Feng, GAO Kening. Learners’ Activities Based Study Buddies Recommendation Towards MOOCs[J]. Journal of Frontiers of Computer Science and Technology, 2015, 9(1): 71-79.
[14] WANG Wei, MENG Xiangfu, XIAO Chunjiao. Analysis Approach of Emotional Word Based on Coupling Relationship[J]. Journal of Frontiers of Computer Science and Technology, 2014, 8(9): 1146-1152.
[15] WU Lei, ZHANG Wensheng, WANG Jue. Fusion Probabilistic Graphical Model on Heterogeneous Information Network Data[J]. Journal of Frontiers of Computer Science and Technology, 2014, 8(6): 712-718.