
Table of Contents

    2024-11-01, Volume 18 Issue 11
    Frontiers·Surveys
    Research on Progress of Quantum Computing Simulation of Physical Systems
    LUAN Tian, KUANG Xueheng, WANG Wei, YUE Huanyu
    2024, 18(11):  2787-2797.  DOI: 10.3778/j.issn.1673-9418.2401060
    Quantum computing, as a forefront field in quantum technology, has made significant progress in simulating physical systems, yet it still faces technical challenges such as hardware noise and quantum errors. This review discusses the latest advancements in quantum computing for simulating physical systems, with a focus on the application of quantum-classical hybrid algorithms and error mitigation techniques, exploring their strengths and limitations across various physical systems. The review covers the simulation of molecular systems using superconducting quantum computers, many-body problems in condensed matter systems, solving equations in complex fluid dynamics, and applications in astrophysics and high-energy physics. For molecular systems, the variational quantum eigensolver (VQE) is widely used to solve for the ground-state energy of multi-electron systems, with error mitigation methods improving simulation accuracy. In condensed matter systems, quantum computing has shown high precision and efficiency in simulating strongly correlated spin models, such as the Heisenberg and Ising models, achieving unprecedented accuracy in larger spin chain simulations. In the field of fluid dynamics, research indicates that quantum-classical hybrid algorithms can accelerate the solution of the Navier-Stokes equations to some extent, providing new tools for future fluid dynamics studies. In astrophysical simulations, quantum computing has been used to study the properties of black holes and dark matter, demonstrating potential exponential acceleration, which offers new possibilities for understanding physical phenomena under extreme conditions in the universe. In high-energy physics, quantum computing shows promising applications in solving problems like the Schwinger model and has begun exploring the potential of quantum machine learning in analyzing high-energy experimental data. This review provides a comprehensive perspective on the applications of quantum computing in simulating various physical systems, and outlines future directions and technical challenges.
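    The VQE loop summarized above is a hybrid routine: a parameterized circuit prepares a trial state, the Hamiltonian expectation value is measured, and a classical optimizer updates the parameters. Below is a minimal, hedged sketch that simulates this loop classically with NumPy/SciPy on an assumed toy single-qubit Hamiltonian; it illustrates the idea only and is not the molecular setup used in the surveyed work.

```python
# Minimal classical simulation of the VQE loop: prepare a parameterized trial
# state, evaluate the Hamiltonian expectation value, let a classical optimizer
# update the parameter. Toy single-qubit example, assumed for illustration.
import numpy as np
from scipy.optimize import minimize

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Toy Hamiltonian H = 0.5*Z + 0.3*X (illustrative assumption)
H = 0.5 * Z + 0.3 * X

def ansatz(theta):
    """Ry(theta)|0> -- a one-parameter trial state."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

def energy(params):
    """Expectation value <psi(theta)|H|psi(theta)>."""
    psi = ansatz(params[0])
    return np.real(psi.conj() @ H @ psi)

result = minimize(energy, x0=[0.1], method="COBYLA")
exact = np.linalg.eigvalsh(H)[0]
print(f"VQE estimate: {result.fun:.6f}, exact ground state: {exact:.6f}")
```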
    Overview of Trusted Authentication and Incentive Mechanisms in Blockchain-Based Internet of Vehicles
    GAO Chunqi, LI Leixiao, SHI Jianping
    2024, 18(11):  2798-2822.  DOI: 10.3778/j.issn.1673-9418.2312080
    With the increasing demand for data sharing in the Internet of vehicles, reliable identity authentication protocols and scientifically rational incentive mechanisms have become the primary conditions for ensuring the stable operation of the Internet of vehicles network. Blockchain, as a decentralized distributed ledger, provides a technologically advanced data sharing platform for the Internet of vehicles, making the combination of blockchain technology and the Internet of vehicles a feasible new approach to data sharing. This paper summarizes the requirements of the Internet of vehicles, analyzes the blockchain-based Internet of vehicles architecture, and divides it into the cloud layer, mechanism layer and edge layer. It categorizes and compares the existing solutions for authentication protocols and incentive mechanisms in blockchain-based Internet of vehicles systems by summarizing relevant literature. It summarizes the workflow and implementation solutions of existing trustworthy authentication mechanisms by analyzing the distributed and centralized authentication architectures. It categorizes existing incentive mechanisms into value-based, trust-based and individual decision-based incentive mechanisms. Finally, this paper summarizes the existing problems and solutions of trustworthy authentication and incentive mechanisms from the perspectives of privacy protection and typical attacks. It provides an outlook on future research directions for blockchain-based Internet of vehicles, focusing on data sharing, multi-vehicle cooperation, and the integration of 6G technology.
    Survey of Deep Learning Based Extractive Summarization
    TIAN Xuan, LI Jialiang, MENG Xiaohuan
    2024, 18(11):  2823-2847.  DOI: 10.3778/j.issn.1673-9418.2308100
    Automatic text summarization (ATS) is a popular research direction in natural language processing, and its main implementation methods fall into two categories: extractive and abstractive. Extractive summarization directly uses text from the source document; compared with abstractive summarization, it offers higher grammatical and factual correctness and has broad prospects in domains such as policy interpretation, official document summarization, and the legal and medical fields. In recent years, extractive summarization based on deep learning has received extensive attention. This paper reviews the research progress of deep learning based extractive summarization in recent years, and analyzes the relevant research work on the two key steps of extractive summarization: text unit encoding and summary extraction. Firstly, according to the different model frameworks, text unit encoding methods are divided into four categories: hierarchical sequential encoding, encoding based on graph neural networks, fusion encoding, and pre-training-based encoding. Then, according to the granularity of extraction in the summary extraction stage, summary extraction methods are divided into two categories: text unit-level extraction and summary-level extraction. This paper also introduces commonly used public datasets and evaluation metrics for extractive summarization tasks. Finally, possible future research directions and development trends in this field are summarized.
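    As a concrete illustration of the two-stage pipeline the survey is organized around (text unit encoding followed by summary extraction), the sketch below scores sentences by TF-IDF centrality and extracts the top-k. The encoder and scoring rule are deliberately simple stand-ins for the deep models reviewed in the paper.

```python
# Sketch of the two-stage extractive pipeline: (1) encode each text unit
# (sentence), (2) score the units and extract the top-k as the summary.
# TF-IDF + centrality stands in for the deep encoders surveyed above.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_summary(sentences, k=2):
    # Stage 1: sentence encoding (here, sparse TF-IDF vectors).
    vectors = TfidfVectorizer().fit_transform(sentences)
    # Stage 2: text unit-level scoring by centrality, then top-k extraction.
    scores = cosine_similarity(vectors).sum(axis=1)
    top = sorted(np.argsort(scores)[::-1][:k])   # keep original sentence order
    return [sentences[i] for i in top]

doc = [
    "Extractive summarization selects sentences directly from the source text.",
    "It tends to preserve grammaticality and factual correctness.",
    "Deep encoders such as graph neural networks improve sentence representations.",
    "The weather was pleasant on the day the paper was submitted.",
]
print(extract_summary(doc, k=2))
```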
    Review of Text-Oriented Entity Relation Extraction Research
    REN Anqi, LIU Lin, WANG Hailong, LIU Jing
    2024, 18(11):  2848-2871.  DOI: 10.3778/j.issn.1673-9418.2401033
    Information extraction is the foundation of knowledge graph construction, and relation extraction, as a key process and core step of information extraction, aims to locate entities in text and recognize the semantic links between them. Improving the efficiency of relation extraction therefore improves the quality of information extraction, which in turn affects knowledge graph construction and subsequent downstream tasks. Relation extraction can be categorized into sentence-level and document-level relation extraction according to the length of the extracted text. The two levels of extraction methods have their own advantages and disadvantages in different application scenarios: sentence-level relation extraction is suitable for application scenarios with smaller datasets, while document-level relation extraction is suitable for scenarios such as news event analysis and relation mining in long reports or articles. Unlike existing surveys of relation extraction, this paper first introduces the basic concepts of relation extraction and the recent development of the field, lists the datasets used at the two levels of relation extraction, and summarizes the characteristics of these datasets. Then, this paper elaborates on sentence-level and document-level relation extraction respectively, summarizes the advantages and disadvantages of relation extraction at different levels, and analyzes the performance and limitations of the representative models of each method. Finally, this paper summarizes the open problems in the current research field and looks ahead to the future development of relation extraction.
    Review of Machine Unlearning
    HE Lisong, YANG Yang
    2024, 18(11):  2872-2886.  DOI: 10.3778/j.issn.1673-9418.2405027
    To effectively protect data privacy and implement the “right to be forgotten”, it is necessary to eliminate the influence of specific subsets of training data from machine learning models and ensure that these data cannot be reverse-engineered. To address this issue, the research field of “machine unlearning” has emerged in recent years. This paper reviews the progress in machine unlearning research from three aspects: definitions, metrics, and algorithms. Firstly, it systematically outlines the core concepts, definitions, and evaluation metrics of machine unlearning, emphasizing the critical significance of certifiability metrics. Secondly, it categorizes unlearning algorithms into six major classes based on their design principles: structured initial training, influence function approximation, gradient updates, noise unlearning, knowledge distillation unlearning, and boundary unlearning. It provides detailed descriptions of nine representative machine unlearning algorithms and their evolution. Based on a comparison of existing algorithms’ strengths and weaknesses, this paper discusses the potential and significance of constructing a unified framework for machine unlearning based on certification, and analyzes the theoretical and practical relationships between machine unlearning research and privacy protection. Finally, this paper outlines future research directions for machine unlearning, including the need to extend unlearning algorithms to subfields such as fair machine learning, transfer learning, and reinforcement learning; the potential for integrating various design approaches into future unlearning algorithms; the need for collaboration between technology and regulation in unlearning practices; and the benefits of integrating machine unlearning with incremental learning to improve the management and operation efficiency of machine learning models.
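    Of the six algorithm classes listed above, structured initial training is the easiest to sketch: shard the training data, train one sub-model per shard, and forget a sample by retraining only its shard. The sketch below assumes scikit-learn logistic regression sub-models and majority-vote aggregation; it illustrates the class, not any specific algorithm surveyed.

```python
# Sketch of "structured initial training" style unlearning (SISA-like): the
# training set is sharded, one sub-model is trained per shard, and forgetting a
# sample only requires retraining the shard that contained it.
import numpy as np
from sklearn.linear_model import LogisticRegression

class ShardedEnsemble:
    def __init__(self, n_shards=4, seed=0):
        self.n_shards = n_shards
        self.rng = np.random.default_rng(seed)
        self.shards, self.models = [], []

    def fit(self, X, y):
        idx = self.rng.permutation(len(X))
        self.shards = [list(s) for s in np.array_split(idx, self.n_shards)]
        self.models = [LogisticRegression(max_iter=1000).fit(X[s], y[s])
                       for s in self.shards]
        self._X, self._y = X, y

    def unlearn(self, sample_index):
        # Remove the sample and retrain only the affected shard.
        for i, shard in enumerate(self.shards):
            if sample_index in shard:
                shard.remove(sample_index)
                self.models[i] = LogisticRegression(max_iter=1000).fit(
                    self._X[shard], self._y[shard])
                return i

    def predict(self, X):
        # Majority vote over sub-models; assumes integer class labels.
        votes = np.stack([m.predict(X) for m in self.models])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```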
    Special Issue on Constructions and Applications of Large Language Models in Specific Domains
    Overview of Knowledge Graph Question Answering Enhanced by Large Language Models
    FENG Tuoyu, LI Weiping, GUO Qinglang, WANG Gangliang, ZHANG Yusong, QIAO Zijian
    2024, 18(11):  2887-2900.  DOI: 10.3778/j.issn.1673-9418.2407069
    Knowledge graph question answering (KGQA) is a technology that retrieves relevant answers from a knowledge graph by processing natural language questions posed by users. Early KGQA technologies were limited by the size of knowledge graphs, computational power, and natural language processing capabilities, resulting in lower accuracy. In recent years, with advancements in artificial intelligence, particularly the development of large language models (LLMs), KGQA technology has achieved significant improvements. LLMs such as GPT-3 have been widely applied to enhancing the performance of KGQA. To better study and learn the enhanced KGQA technologies, this paper summarizes various methods using LLMs for KGQA. Firstly, the relevant knowledge of LLMs and KGQA is summarized, including the technical principles and training methods of LLMs, as well as the basic concepts of knowledge graphs, question answering, and KGQA. Secondly, existing methods of enhancing KGQA with LLMs are reviewed from two dimensions: semantic parsing and information retrieval. The problems that these methods address and their limitations are analyzed. Additionally, related resources and evaluation methods for KGQA enhanced by LLMs are collected and organized, and the performance of existing methods is summarized. Finally, the limitations of current methods are analyzed, and future research directions are proposed.
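    A minimal sketch of the information-retrieval style of enhancement discussed above: triples related to the question are retrieved from the knowledge graph, serialized into the prompt, and the LLM answers over that context. The in-memory triple list, lexical-overlap retrieval, and call_llm stub are illustrative assumptions, not an API from the surveyed systems.

```python
# Sketch of retrieval-style LLM-enhanced KGQA: retrieve triples related to the
# question, serialize them as context, and let the LLM answer over that context.
# The tiny triple list and call_llm() are illustrative placeholders.
TRIPLES = [
    ("Alan Turing", "born_in", "London"),
    ("Alan Turing", "field", "computer science"),
    ("London", "capital_of", "United Kingdom"),
]

def retrieve_triples(question, triples, k=3):
    """Score triples by naive lexical overlap with the question."""
    words = set(question.lower().split())
    return sorted(
        triples,
        key=lambda t: -len(words & set(" ".join(t).lower().split())))[:k]

def build_prompt(question, evidence):
    facts = "\n".join(f"({h}, {r}, {t})" for h, r, t in evidence)
    return ("Answer the question using only the knowledge graph facts below.\n"
            f"Facts:\n{facts}\nQuestion: {question}\nAnswer:")

def call_llm(prompt):          # placeholder for an actual LLM call
    return "<model answer>"

question = "Where was Alan Turing born?"
prompt = build_prompt(question, retrieve_triples(question, TRIPLES))
print(prompt)
print(call_llm(prompt))
```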
    Construction Method of Textbook Knowledge Graph Based on Multimodal and Knowledge Distillation
    LIU Jun, LENG Fangling, WU Wangwang, BAO Yubin
    2024, 18(11):  2901-2911.  DOI: 10.3778/j.issn.1673-9418.2406054
    In order to efficiently construct a multimodal subject knowledge graph for the field of education, a textbook text entity-relation extraction algorithm based on large-model knowledge distillation and multi-model collaborative reasoning is proposed. During the training phase, a closed-source model with 100 billion parameters is used to annotate text data, achieving implicit knowledge distillation. The open-source billion-parameter-scale model is then instruction fine-tuned on the domain data to strengthen its instruction-following ability on the entity-relation extraction task. In the inference stage, the closed-source model serves as the guiding model and the open-source billion-parameter-scale model serves as the execution model. Experimental results show that knowledge distillation, multi-model collaboration, and domain instruction fine-tuning are all effective, significantly improving instruction-prompted entity-relation extraction from textbook text. In addition, a multimodal named entity recognition algorithm for textbook diagrams with explicit and implicit knowledge enhancement is proposed. Firstly, techniques such as image OCR (optical character recognition) and visual language models are used to extract textual information and global content descriptions from textbook diagrams. Then, explicit knowledge base retrieval and implicit LLM prompt enhancement are used to obtain auxiliary knowledge that may be associated with each image-title pair, and the knowledge obtained from the explicit knowledge base and the implicit LLM is fused to form the final auxiliary knowledge. Finally, the auxiliary knowledge of a diagram is combined with its title to perform multimodal named entity recognition on textbook diagram titles. Experimental results show that the algorithm is effective and its interpretability is enhanced.
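    A hedged sketch of the implicit-distillation data construction step described above: a teacher model annotates raw textbook sentences with entity-relation triples, and the annotations are written out as instruction-tuning records for the smaller open-source model. The teacher_annotate function and the JSONL record layout are illustrative placeholders, not the paper's pipeline.

```python
# Sketch of the distillation-style data construction step: a teacher model
# labels raw textbook sentences with (head, relation, tail) triples, and the
# labels are stored as instruction-tuning examples for a smaller open model.
# teacher_annotate() is a placeholder for the closed-source model call.
import json

INSTRUCTION = ("Extract all (head entity, relation, tail entity) triples "
               "from the following textbook sentence.")

def teacher_annotate(sentence):
    """Placeholder: in practice this calls the large closed-source model."""
    return [{"head": "Newton's second law", "relation": "formula",
             "tail": "F = ma"}]

def build_sft_dataset(sentences, path="textbook_re_sft.jsonl"):
    with open(path, "w", encoding="utf-8") as f:
        for sent in sentences:
            record = {
                "instruction": INSTRUCTION,
                "input": sent,
                "output": json.dumps(teacher_annotate(sent), ensure_ascii=False),
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

build_sft_dataset(["Newton's second law states that F = ma."])
```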
    PTCR: Knowledge-Based Visual Question Answering Framework Based on Large Language Model
    XUE Di, LI Xin, LIU Mingshuai
    2024, 18(11):  2912-2924.  DOI: 10.3778/j.issn.1673-9418.2406028
    Aiming at the problems of insufficient model input information and poor reasoning performance in knowledge-based visual question answering (VQA), this paper constructs a PTCR knowledge-based VQA framework based on large language model (LLM), which consists of four parts: answer candidate generation, targeted image descriptions, autonomous chain of thought (CoT) construction, and prompted LLM inference. The PTCR framework uses the LLM to guide multimodal large language models to generate targeted image descriptions, which solves the problem of incomplete coverage of previous image captions. It improves the model’s reasoning ability by guiding the LLM to autonomously generate CoT, which provides the thinking process of similar problems during reasoning; it introduces a selection rearrangement technique to eliminate the LLM’s position bias over answer choices during reasoning, and reduces the randomness error of the reasoning by means of majority voting. Experimental results show that the accuracy of the CogVLM model enhanced by the PTCR framework is improved by 16.7 percentage points and 13.3 percentage points on the OK-VQA and A-OKVQA datasets. Meanwhile, compared with Prophet, the accuracy of the PTCR framework is improved by 3.4 percentage points and 5.0 percentage points on the OK-VQA and A-OKVQA datasets. The results of ablation experiments demonstrate that the methods used in this paper, such as targeted image descriptions and autonomous chains of thought, are all effective in improving accuracy. It is evident that the PTCR framework improves the performance of knowledge-based VQA.
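    Two of the PTCR components above, selection rearrangement and majority voting, can be sketched in a few lines: shuffle the answer options on each of several queries so that position bias cancels out, then vote over the sampled answers. The ask_llm stub and the number of rounds are assumptions for illustration.

```python
# Sketch of two PTCR-style tricks described above: shuffle the answer options
# across repeated queries (to suppress position bias) and take a majority vote
# over the sampled answers. ask_llm() is an illustrative stub.
import random
from collections import Counter

def ask_llm(question, options):
    """Placeholder for an LLM call; returns the text of the chosen option."""
    return options[0]

def vote_answer(question, options, n_rounds=5, seed=0):
    rng = random.Random(seed)
    answers = []
    for _ in range(n_rounds):
        shuffled = options[:]
        rng.shuffle(shuffled)            # selection rearrangement
        answers.append(ask_llm(question, shuffled))
    return Counter(answers).most_common(1)[0][0]

print(vote_answer("What is in the image?", ["a dog", "a cat", "a bus"]))
```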
    Multi-stage Reasoning Method for Emotional Support Dialogue Generation Based on Large Language Models
    SANG Chenyang, MA Tinghuai, XIE Xintong, SUN Shengjie, HUANG Rui
    2024, 18(11):  2925-2939.  DOI: 10.3778/j.issn.1673-9418.2406036
    The task of emotional support dialogue requires providing supportive responses based on a thorough understanding of the user’s psychological state, with the aim of alleviating their emotional distress. Most existing studies employ end-to-end generation methods, where small pre-trained language models are fine-tuned to adapt to the emotional support task. However, these methods lack a fine-grained understanding of the user’s psychological state, resulting in insufficient empathy, and the model decision process is opaque, resulting in poor interpretability. To address these issues, inspired by the excellent reasoning capabilities of current large language models, this paper proposes an emotional support dialogue reasoning framework based on large language models called CoES (chain-of-emotional-support). This framework transforms the end-to-end generation problem into a step-by-step reasoning problem, breaking down the complex task of emotional support into simpler subtasks to be solved sequentially. The framework comprises three reasoning chains: the emotional reasoning chain, the strategy reasoning chain, and the response generation chain, which are used for the fine-grained exploration of the user’s psychological state, the selection of emotional support strategies, and the generation and optimization of responses, respectively. Additionally, this paper designs various external knowledge augmentation strategies to improve the reasoning effectiveness of the large model in the psychological state exploration and support strategy selection processes. Both manual and automatic evaluation results on the ESConv dataset demonstrate that the proposed reasoning method achieves advanced performance in terms of the interpretability of emotional support and the quality of content generation.
    Construction and Application of Large Language Model for Public Complaints with Knowledge Reasoning and Similarity Retrieval
    LIU Xin, GAO Huiquan, SHAO Changheng, CHEN Ziliang, LU Wenjuan, YANG Huiru
    2024, 18(11):  2940-2953.  DOI: 10.3778/j.issn.1673-9418.2406057
    Efficiently responding to public complaints is a necessary measure to realize intelligent management and enhance public satisfaction, and the use of intelligent question answering for public complaints can save time and human resources. However, rule-based and retrieval-based models in intelligent question answering rely on preset knowledge. Therefore, they cannot provide effective responses when complaints fall outside the scope of that knowledge, nor can they maintain the coherence of conversations when dealing with multiple rounds of dialogue. Existing large language models can communicate smoothly with users, but general-purpose large language models lack domain knowledge. Because the correct answers in the training data may contain information not covered by the questions, a general-purpose large language model may generate wrong or irrelevant responses, i.e., hallucinations. To address these issues, a large language model (PC-LLM) for intelligent question answering in the domain of public complaints is constructed. Firstly, an entity relationship extraction model based on BERT-BiLSTM-CRF is designed to extract entities and relationships from complaint work orders in order to construct the complaint knowledge graph. The BERT model is used to vectorize the complaint work orders and construct a vector index library of complaint work orders. In the reply generation stage, this paper extracts the entities and relationships of a user’s complaint and conducts knowledge reasoning through entity linking in the complaint knowledge graph to obtain potential relationship hints. Meanwhile, this paper quickly searches the vector index library of complaint work orders to obtain similar complaints. Finally, a more accurate response can be generated by feeding the potential relationship prompts, similar complaint prompts and the complaint itself into the large language model. Experimental analysis shows that the performance of this large language model on the complaints dataset is significantly better than that of GPT-4o, ERNIE Bot, Tongyi Qianwen, and other large language models.
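    A sketch of the similarity-retrieval and prompt-fusion stage described above: complaint work orders are embedded, the most similar ones are retrieved by cosine similarity over a vector index, and they are fused with knowledge-graph hints into the prompt for the large model. The embed function stands in for the BERT encoder, and the data are invented examples.

```python
# Sketch of the similarity-retrieval + prompt-fusion stage: complaint work
# orders are embedded, the most similar ones are retrieved by cosine
# similarity, and they are fused with knowledge-graph hints into the prompt.
# embed() stands in for the BERT encoder used in the paper.
import numpy as np

def embed(text, dim=64):
    """Placeholder embedding; a real system would use BERT vectors."""
    rng = np.random.default_rng(sum(map(ord, text)))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def top_k_similar(query, corpus, k=2):
    q = embed(query)
    index = np.stack([embed(doc) for doc in corpus])   # vector index library
    scores = index @ q                                 # cosine (unit vectors)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(complaint, kg_hints, similar_cases):
    return ("Knowledge-graph hints:\n- " + "\n- ".join(kg_hints) + "\n"
            "Similar past complaints:\n- " + "\n- ".join(similar_cases) + "\n"
            f"New complaint: {complaint}\nDraft a response:")

corpus = ["Street lights broken on Elm Road", "Noise at night near the park",
          "Garbage not collected on time"]
print(build_prompt("Street lamp outage on Oak Street",
                   ["(Elm Road, managed_by, Lighting Bureau)"],
                   top_k_similar("Street lamp outage on Oak Street", corpus)))
```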
    Theory·Algorithm
    Pelican Optimization Algorithm Combining Unscented Sigma Point Mutation and Cross Reversion
    ZUO Fengqin, ZHANG Damin, HE Qing, BAN Yunfei, SHEN Qianwen
    2024, 18(11):  2954-2968.  DOI: 10.3778/j.issn.1673-9418.2308010
    Aiming at the problems of slow search speed, low accuracy and a tendency to fall into local optima in the optimization process of the pelican optimization algorithm (POA), a pelican optimization algorithm combining unscented sigma point mutation and cross reversion (MPOA) is proposed. Firstly, a random opposition-based learning strategy is used to generate a random opposite solution for individuals with poor positions in the population, and unscented sigma points are introduced to mutate the opposite solution, so as to enhance the fine-grained exploitation of the algorithm within the visible range of the search domain and prevent the algorithm from falling into local optima. Secondly, the randomness of Lévy flight is used to improve the crossover and inversion strategy, dynamically enriching the individual search process, maintaining population diversity, and enhancing the global search ability of the algorithm. Thirdly, a nonlinear convergence factor is introduced to balance the exploitation and exploration abilities of the algorithm, and an SPM-based chaotic sequence is utilized to perturb the nonlinear convergence factor in order to increase the diversity of solutions, avoid falling into local optima in the later stage, and enhance the stability of the algorithm. Experiments are carried out on 12 benchmark test functions, the rank-sum test and the CEC2021 functions, and comparative analysis of the optimization results shows that the improved algorithm has stronger global search ability and faster search speed. The MPOA algorithm is used to optimize the parameters of a long short-term memory (LSTM) network model and applied to a climate change prediction task. Compared with LSTM models optimized by six other swarm intelligence algorithms, the MPOA-LSTM model achieves better prediction accuracy.
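    The Lévy-flight perturbation mentioned above can be generated with Mantegna's algorithm, a standard formulation; the sketch below shows the step generator and how a candidate solution might be perturbed toward the current best position. The exponent beta = 1.5 and the 0.01 step scale are common illustrative choices, not values from the paper.

```python
# Sketch of a Lévy-flight step (Mantegna's algorithm), the kind of heavy-tailed
# random perturbation used in the improved crossover/inversion strategy above.
# beta = 1.5 is a common illustrative choice, not the paper's setting.
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    rng = rng or np.random.default_rng()
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size=dim)
    v = rng.normal(0.0, 1.0, size=dim)
    return u / np.abs(v) ** (1 / beta)

# Perturb a candidate solution toward the current best position.
best, candidate = np.zeros(5), np.ones(5)
new_candidate = candidate + 0.01 * levy_step(5) * (candidate - best)
print(new_candidate)
```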
    Recommendation Unlearning Algorithm Combining Fuzzy Clustering and Adaptive Denoising
    WANG Jianfang, CHAI Guangwen, CHEN Yiqing, LIANG Menghao, LUO Junwei
    2024, 18(11):  2969-2979.  DOI: 10.3778/j.issn.1673-9418.2312020
    Privacy protection plays a crucial role in recommender systems as it helps to protect users’ sensitive information from disclosure risks. Recent recommendation unlearning has attracted increasing attention as an effective method of privacy protection. Existing methods often partition data into sub-partitions before training to enhance model training efficiency. However, simply partitioning interactions into sub-partitions can disrupt the integrity of user-item relationships and reduce the availability of data. In addition, the presence of false-positive noise in sub-partitions with implicit feedback can interfere with model training, preventing it from accurately capturing users’ true preferences. To address these challenges, a recommendation unlearning algorithm combining fuzzy clustering and adaptive denoising (FDRU) is proposed. Firstly, fuzzy clustering determines membership by calculating cosine distances between samples and various cluster centers, subsequently dividing the training dataset into several sub-partitions. Then, FDRU designs an adaptive denoising algorithm that dynamically eliminates false positive noise in sub-partitions based on thresholds. Finally, it utilizes dynamic weighted aggregation of sub-models for prediction and top-N recommendations. In order to assess the performance of the proposed algorithm, extensive experiments are carried out on three public datasets. Experimental results indicate that FDRU outperforms other benchmark algorithms on Recall and NDCG.
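    A sketch of the fuzzy partitioning step described above: cosine distances from each sample to the cluster centers are converted into fuzzy membership degrees using the standard fuzzy c-means membership formula, and each sample is routed to the sub-partition where its membership is highest. The fuzzifier m = 2 and the random data are assumptions for illustration.

```python
# Sketch of fuzzy partitioning: cosine distances to the cluster centers are
# turned into fuzzy membership degrees (standard fuzzy c-means membership
# formula), and each sample goes to the sub-partition with highest membership.
import numpy as np

def cosine_distance(X, centers):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Cn = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return 1.0 - Xn @ Cn.T                     # shape: (n_samples, n_centers)

def fuzzy_membership(X, centers, m=2.0, eps=1e-12):
    d = cosine_distance(X, centers) + eps
    power = 2.0 / (m - 1.0)
    ratio = d[:, :, None] / d[:, None, :]      # ratio[i, k, j] = d_ik / d_ij
    return 1.0 / (ratio ** power).sum(axis=2)  # membership u_ik

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                    # interaction embeddings (toy)
centers = rng.normal(size=(3, 8))
U = fuzzy_membership(X, centers)
print(U.round(3))                              # membership degrees
print(U.argmax(axis=1))                        # sub-partition assignment
```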
    Graphics·Image
    Image Data Augmentation Method for Random Channel Perturbation
    JIANG Wentao, LIU Yuwei, ZHANG Shengchong
    2024, 18(11):  2980-2995.  DOI: 10.3778/j.issn.1673-9418.2311022
    Data augmentation strategies that simulate object occlusion set all pixels in a randomly cropped region of the input image to zero, which erases effective texture features and weakens network generalization. Therefore, this paper proposes a novel data augmentation method called ChannelCut, which includes two variants: ChannelCut1 and ChannelCut2. Firstly, three square regions are randomly selected on the input image, and the input image is split into three single-channel images. Secondly, ChannelCut1 selects one square region on each of the three channel images; the regions selected by the three channels differ from one another, and their pixels are set to zero. In contrast, ChannelCut2 retains the pixels of the square region selected on each channel in ChannelCut1 and sets the pixels of the other two square regions on that channel to zero. Finally, both methods merge the three channel images to obtain two randomly channel-perturbed images. The proposed method is integrated into CNN models such as ResNet18, ShuffleNet V2 and MobileNet V3, and experiments are carried out on five datasets including CIFAR-10 and Imagenette. The results show that the proposed method achieves better classification accuracy than mainstream methods on all five datasets and significantly improves over the baselines. The proposed method is advantageous for fine-grained image classification and outperforms reinforcement-learning-based automatic data augmentation methods in terms of time cost. ChannelCut is general and effective: it retains image texture features to different degrees and enriches image diversity, significantly improving the robustness and generalization of convolutional neural network models.
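    A minimal NumPy sketch of the ChannelCut1 idea as described: three random square regions are selected, one is assigned to each channel, and each channel zeroes only its own region before the channels are merged back. The region size and the uint8 test image are illustrative, not the paper's settings.

```python
# Minimal NumPy sketch of the ChannelCut1 idea: three random square regions are
# chosen, one per channel, and each channel zeroes only its own region before
# the channels are merged back. Region size is an illustrative parameter.
import numpy as np

def channelcut1(image, size=8, rng=None):
    """image: H x W x 3 array; returns a channel-perturbed copy."""
    rng = rng or np.random.default_rng()
    h, w, _ = image.shape
    out = image.copy()
    # Select three random square regions, one assigned to each channel.
    regions = [(rng.integers(0, h - size + 1), rng.integers(0, w - size + 1))
               for _ in range(3)]
    for c, (y, x) in enumerate(regions):
        out[y:y + size, x:x + size, c] = 0   # zero this region in channel c only
    return out

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = channelcut1(img, size=8, rng=np.random.default_rng(0))
print(int((aug != img).sum()), "channel values perturbed")
```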
    Research on Adaptive Sample Type Discrimination for Remote Sensing Image Retrieval
    SHAO Huihu, GE Yun, XIONG Junjie, YU Jiejie
    2024, 18(11):  2996-3005.  DOI: 10.3778/j.issn.1673-9418.2402031
    Remote sensing images are complex in content and rich in categories, and many images are difficult to discriminate, resulting in poor remote sensing image retrieval performance. For this reason, an adaptive sample type discrimination (ASTD) method is proposed, which dynamically categorizes samples into simple, ordinary and difficult samples so that the network can learn from each type to a different degree, effectively improving the discriminative ability of the features. Firstly, an SHash network is designed, which takes Swin Transformer as the backbone and adds a hash layer at the end of the network. This network captures the semantic information of images globally, improving feature representation and retrieval efficiency. Secondly, to make images of the same category more compact and better separate images of different categories, a hash center is defined for each category. The center corresponding to the input sample’s own category is specified as the positive center of the sample, and the other centers are the negative centers of the sample. Finally, the sample type discriminative loss STDLoss is proposed to adaptively discriminate the type of samples based on the distance relationships between samples and their positive and negative centers, so as to improve the network’s ability to learn from each type of sample. Compared with five hashing methods such as DSH, CSQ and SHC on two remote sensing datasets, UC-Merced and AID, the network trained with ASTD learns sample features better and improves retrieval performance.
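    The distance-based sample typing behind STDLoss can be sketched as follows: compare a sample's distance to its positive hash center with its distance to the nearest negative center and tag it as simple, ordinary or difficult. The margin thresholds below are assumptions for illustration, not the adaptive rule used in the paper.

```python
# Sketch of distance-based sample-type discrimination: a sample's distance to
# its positive hash center is compared with its distance to the nearest
# negative center, and the sample is tagged simple / ordinary / difficult.
# The margin thresholds are illustrative assumptions.
import numpy as np

def sample_type(feature, centers, label, easy_margin=0.5, hard_margin=0.0):
    d = np.linalg.norm(centers - feature, axis=1)
    d_pos = d[label]                              # distance to the positive center
    d_neg = np.min(np.delete(d, label))           # nearest negative center
    margin = d_neg - d_pos
    if margin >= easy_margin:
        return "simple"
    if margin <= hard_margin:
        return "difficult"
    return "ordinary"

rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 16))                # one hash center per category
feature = centers[2] + 0.1 * rng.normal(size=16)  # a sample near its own center
print(sample_type(feature, centers, label=2))
```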
    3D Point Cloud Object Tracking Based on Multi-level Fusion of Transformer Features
    LI Zhijie, LIANG Bowen, DING Xinmiao, GUO Wen
    2024, 18(11):  3006-3014.  DOI: 10.3778/j.issn.1673-9418.2401071
    During the 3D point cloud object tracking, some issues such as occlusion, sparsity, and random noise often arise. To address these challenges, this paper proposes a novel approach to 3D point cloud object tracking based on multi-level fusion of Transformer features. The method mainly consists of the point attention embedding module and the point attention enhancement module, which are used for feature extraction and feature matching processes, respectively. Firstly, by embedding two attention mechanisms into each other to form the point attention embedding module and fusing it with the relationship-aware sampling method proposed by PTTR (point relation transformer for tracking), the purpose of fully extracting features is achieved. Subsequently, the feature information is input into the point attention enhancement module, and through cross-attention, features from different levels are matched sequentially to achieve the goal of deep fusion of global and local features. Moreover, to obtain discriminative feature fusion maps, a residual network is employed to connect the fusion results from different layers. Finally, the feature fusion map is input into the target prediction module to achieve precise prediction of the final 3D target object. Experimental validation on KITTI, nuScenes, and Waymo datasets demonstrates the effectiveness of the proposed method. Excluding few-shot data, the proposed method achieves an average improvement of 1.4 percentage points in success and 1.4 percentage points in precision in terms of object tracking.
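    The feature matching step described above rests on cross-attention between template and search-region point features. The sketch below builds a generic block of that kind from PyTorch's standard MultiheadAttention; the dimensions, residual connection and LayerNorm wrapper are illustrative assumptions rather than the paper's exact module.

```python
# Sketch of cross-attention matching between template and search-region point
# features, the kind of operation a point attention enhancement module builds
# on, using PyTorch's standard MultiheadAttention. Dimensions are illustrative.
import torch
import torch.nn as nn

class CrossAttentionMatch(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, search_feat, template_feat):
        # Queries come from the search region, keys/values from the template,
        # so search points aggregate matching template information.
        fused, _ = self.attn(search_feat, template_feat, template_feat)
        return self.norm(search_feat + fused)     # residual connection

search = torch.randn(2, 1024, 128)    # (batch, search points, channels)
template = torch.randn(2, 512, 128)   # (batch, template points, channels)
print(CrossAttentionMatch()(search, template).shape)   # torch.Size([2, 1024, 128])
```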
    Artificial Intelligence·Pattern Recognition
    Document-Level Event Detection Method Based on Information Aggregation and Data Augmentation
    TAN Lijun, HU Yanli, CAO Jianwei, TAN Zhen
    2024, 18(11):  3015-3026.  DOI: 10.3778/j.issn.1673-9418.2312040
    Event detection is a key task in the field of natural language processing, aiming to identify event trigger words and correctly classify their event types. Sentence-level event detection methods fail to effectively utilize intra-sentence and inter-sentence event relevance information, facing numerous challenges such as polysemy and event co-occurrence. Additionally, neural network-based event detection models require a large amount of text data for training, but the scarcity of corpus data severely affects the accuracy of results and the stability of the model. To address these issues, this paper proposes a document-level event detection method based on information aggregation and data augmentation, called LGIA (local and global information aggregation). This method adopts an encoder-decoder framework, designing a sentence-level local information extraction module based on dilated convolutional networks and a document-level global information extraction module based on conditional layer normalization, to deeply explore the contextual semantic information and the event correlations of the entire document. Meanwhile, this paper employs a data augmentation strategy of synonym replacement to effectively expand the data samples, thereby alleviating the impact of data scarcity. Experimental results validate that the proposed LGIA method achieves good results on the ACE2005 dataset and significantly improves performance on the augmented TAC-KBP2017 dataset, with F1 scores reaching 77.6% and 65.3%, respectively, demonstrating superior performance compared with existing baseline methods.
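    A sketch of the dilated-convolution idea used in the sentence-level local module: stacked 1D convolutions with growing dilation enlarge the receptive field over token representations while keeping the sequence length. Channel sizes and dilation rates are assumptions for illustration.

```python
# Sketch of a dilated-convolution block for sentence-level local feature
# extraction: stacked Conv1d layers with growing dilation widen the receptive
# field over token embeddings. All sizes are illustrative.
import torch
import torch.nn as nn

class DilatedEncoder(nn.Module):
    def __init__(self, dim=256, dilations=(1, 2, 4)):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel_size=3, dilation=d, padding=d)
             for d in dilations])

    def forward(self, x):                 # x: (batch, seq_len, dim)
        h = x.transpose(1, 2)             # Conv1d expects (batch, dim, seq_len)
        for conv in self.layers:
            h = torch.relu(conv(h)) + h   # residual keeps token alignment
        return h.transpose(1, 2)

tokens = torch.randn(4, 50, 256)          # a batch of encoded sentences
print(DilatedEncoder()(tokens).shape)     # torch.Size([4, 50, 256])
```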
    Fast Multi-view Clustering with Sparse Matrix and Improved Normalized Cut
    YANG Mingrui, ZHOU Shibing, WANG Xi, SONG Wei
    2024, 18(11):  3027-3040.  DOI: 10.3778/j.issn.1673-9418.2309037
    The multi-view clustering algorithm is a novel approach to exploring the inherent clustering structure among data. However, most existing methods suffer from noise when constructing similarity graphs and may lose important information during clustering, leading to lower accuracy. Moreover, the iterative optimization often used by these algorithms can cause memory overflow and is time-consuming. To address these limitations, a fast multi-view clustering algorithm with sparse matrix and improved normalized cut (SINFMC) is proposed. It first constructs similarity graphs for all views and integrates them to form a consensus graph matrix. Then, an l1-norm constraint is applied to the consensus graph matrix to obtain a sparse matrix, which helps to denoise the data and speed up computation. Finally, an improved normalized-cut spectral clustering algorithm is used to cluster the sparse consensus graph and obtain a cluster indicator matrix. This matrix provides clustering results directly and avoids information loss and bias. Unlike other methods, the proposed algorithm does not require iterative optimization and simplifies the computation process through sparse matrix representation, reducing time and space complexity. Experimental results on both artificial and real-world datasets demonstrate that the proposed algorithm outperforms the compared algorithms in terms of quality and efficiency.
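    The overall flow can be sketched as: build a similarity graph per view, average into a consensus graph, sparsify it by soft thresholding (an l1-style shrinkage), and run normalized-cut spectral clustering on the result. In the sketch below, the kNN graphs, the threshold, and scikit-learn's spectral clustering stand in for the paper's specific constructions.

```python
# Sketch of the overall flow: per-view similarity graphs are averaged into a
# consensus graph, the consensus graph is sparsified by soft thresholding, and
# spectral clustering is run on the sparse graph. The kNN graphs, threshold,
# and sklearn's spectral clustering are stand-ins for the paper's versions.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def consensus_graph(views, n_neighbors=10):
    graphs = []
    for X in views:
        G = kneighbors_graph(X, n_neighbors, mode="distance")
        G.data = np.exp(-G.data ** 2)            # similarities on neighbor edges
        W = G.toarray()
        graphs.append((W + W.T) / 2)             # symmetrize
    return np.mean(graphs, axis=0)

def sparse_spectral_clustering(views, n_clusters, tau=0.05):
    S = consensus_graph(views)
    S = np.maximum(S - tau, 0.0)                 # soft threshold -> sparse graph
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(S)

rng = np.random.default_rng(0)
base = np.repeat(np.eye(3), 20, axis=0)          # 3 latent clusters, 60 samples
views = [base @ rng.normal(size=(3, d)) + 0.1 * rng.normal(size=(60, d))
         for d in (8, 12)]                       # two synthetic views
print(sparse_spectral_clustering(views, n_clusters=3))
```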
    Multi-channel Temporal Convolution Fusion for Multimodal Sentiment Analysis
    SUN Jie, CHE Wengang, GAO Shengxiang
    2024, 18(11):  3041-3050.  DOI: 10.3778/j.issn.1673-9418.2309071
    Multimodal sentiment analysis has become a hot research direction in affective computing by extending unimodal analysis to multimodal environments with information fusion. Word-level representation fusion is a key technique for modeling cross-modal interactions by capturing the interplay between different modal elements, and it faces two main challenges: local interactions between modal elements and global interactions along the temporal dimension. When modeling local interactions, existing methods often adopt attention mechanisms to model correlations between overall features across modalities, while ignoring interactions between adjacent elements and local features, and they are computationally expensive. To address these issues, a multi-channel temporal convolution fusion (MCTCF) model is proposed, which uses 2D convolutions to obtain local interactions between modal elements. Specifically, local connections can capture associations between neighboring elements, multi-channel convolutions learn to fuse local features across modalities, and weight sharing greatly reduces computation. On the locally fused sequences, temporal LSTM networks further model global correlations along the temporal dimension. Extensive experiments on the MOSI and MOSEI datasets demonstrate the efficacy and efficiency of MCTCF. Using just one convolution kernel (three channels, 28 weight parameters), it achieves state-of-the-art or competitive results on many metrics. Ablation studies confirm that both local convolution fusion and global temporal modeling are crucial to the superior performance. In summary, this paper enhances word-level representation fusion through feature interactions and reduces computational complexity.
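    A sketch of the multi-channel convolution fusion idea: the three modality sequences are stacked as channels of a 2D feature map, a single shared Conv2d kernel fuses local cross-modal neighborhoods, and an LSTM models global temporal dependencies. All sizes are illustrative assumptions; note that one 3x3 kernel over three input channels has 27 weights plus a bias, consistent with the 28 weight parameters quoted above.

```python
# Sketch of multi-channel convolution fusion: text/audio/visual sequences are
# stacked as channels of a 2D feature map, one shared Conv2d kernel fuses local
# cross-modal neighborhoods, and an LSTM models global temporal dependencies.
import torch
import torch.nn as nn

class MultiChannelFusion(nn.Module):
    def __init__(self, dim=64, hidden=64):
        super().__init__()
        # 3 input channels (one per modality) -> 1 fused channel; the 3x3
        # kernel covers neighboring time steps and feature dimensions
        # (27 weights + 1 bias = 28 parameters).
        self.fuse = nn.Conv2d(3, 1, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # sentiment score

    def forward(self, text, audio, vision):        # each: (batch, seq_len, dim)
        x = torch.stack([text, audio, vision], dim=1)   # (batch, 3, seq, dim)
        fused = torch.relu(self.fuse(x)).squeeze(1)     # (batch, seq, dim)
        _, (h, _) = self.lstm(fused)
        return self.head(h[-1])

b, t, d = 8, 20, 64
out = MultiChannelFusion()(torch.randn(b, t, d), torch.randn(b, t, d),
                           torch.randn(b, t, d))
print(out.shape)                                   # torch.Size([8, 1])
```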
    Adaptive Classification Network for Similar Features Between Classes in Automatic Driving Scenarios
    JIANG Yanji, FENG Yuzhou, DONG Hao, TIAN Jialin
    2024, 18(11):  3051-3064.  DOI: 10.3778/j.issn.1673-9418.2403033
    Addressing inter-class similarity is a challenging task in autonomous driving scene classification, which primarily focuses on learning the distinctive features of targets in real-world complex traffic scenarios with high similarity and constructing the overall correlation between features for scene classification. To this end, a multi-scale adaptive feature selection network for autonomous driving scene classification is proposed. Initially, a dual multi-scale feature extraction module is utilized for preliminary processing to extract inter-class similar features at different scales. Subsequently, a feature differentiation screening module is designed to complete the screening of scene-similar features, enabling the network to focus more on the typical and easily distinguishable features of different scene categories. Then, the feature screening results and multi-scale feature maps are passed to the feature fusion classification module for scene classification, which captures the correlation between scene features. Finally, an adaptive learning algorithm dynamically adjusts the training parameters according to the output results, accelerating the network's convergence and improving accuracy. The proposed method is compared with existing network methods on three datasets: BDD100k, BDD100k+ and a self-built dataset. Compared with the second-best (Top-2) networks, it achieves relative accuracy gains of 3.29%, 5.59% and 12.65%, respectively. Experimental results demonstrate the effectiveness of the proposed method and its strong generalization capability. The scene classification method presented in this paper aims to learn the typical and easily distinguishable features and their correlations under different complex scene categories, reducing the impact of inter-class similarity among multiple targets and thereby making scene classification results on real-world traffic scenario datasets more accurate.