Most Read articles


    Published in last 1 year
    Survey of Multimodal Data Fusion Research
    ZHANG Hucheng, LI Leixiao, LIU Dongjiang
    Journal of Frontiers of Computer Science and Technology    2024, 18 (10): 2501-2520.   DOI: 10.3778/j.issn.1673-9418.2403083
    Although the powerful learning ability of deep learning has achieved excellent results in single-modal applications, the feature representation of a single modality can hardly contain the complete information of a phenomenon. In order to break through the limits of single-modal feature representation and make greater use of the value contained in multiple modalities, scholars have proposed multimodal fusion to improve model learning performance. Multimodal fusion enables a machine to exploit the correlation and complementarity among text, speech, image and video modalities and fuse them into a better feature representation, which provides a basis for model training. At present, research on multimodal fusion is still at an early stage of development. Starting from this active research field of recent years, this paper expounds multimodal fusion methods and the multimodal alignment techniques used in the fusion process. Firstly, the applications, advantages and disadvantages of the joint fusion, cooperative fusion, encoder fusion and split fusion methods are analyzed. The problem of multimodal alignment in the fusion process is then discussed, covering explicit and implicit alignment together with their applications, advantages and disadvantages. Secondly, the popular datasets used for multimodal fusion in different fields in recent years are described. Finally, the challenges and research prospects of multimodal fusion are presented to further promote its development and application.
    Abstract views: 4121 | PDF downloads: 2638
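    The joint (feature-level) fusion idea surveyed above can be reduced to a small sketch: unimodal feature vectors are concatenated and projected into a shared representation that downstream training builds on. The sketch below is illustrative only; the feature sizes and the two-layer projection are arbitrary assumptions, not a method from the survey.

        import torch
        import torch.nn as nn

        class JointFusion(nn.Module):
            """Toy feature-level (joint) fusion: concatenate unimodal features, then project."""
            def __init__(self, text_dim=128, audio_dim=64, image_dim=256, fused_dim=128):
                super().__init__()
                self.proj = nn.Sequential(
                    nn.Linear(text_dim + audio_dim + image_dim, fused_dim),
                    nn.ReLU(),
                    nn.Linear(fused_dim, fused_dim),
                )

            def forward(self, text_feat, audio_feat, image_feat):
                fused = torch.cat([text_feat, audio_feat, image_feat], dim=-1)  # simple concatenation
                return self.proj(fused)  # shared multimodal representation

        model = JointFusion()
        out = model(torch.randn(4, 128), torch.randn(4, 64), torch.randn(4, 256))
        print(out.shape)  # torch.Size([4, 128])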
    Survey of Development of YOLO Object Detection Algorithms
    XU Yanwei, LI Jun, DONG Yuanfang, ZHANG Xiaoli
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2221-2238.   DOI: 10.3778/j.issn.1673-9418.2402044
    In recent years, deep learning-based object detection algorithms have been a hot topic in computer vision research, with the YOLO (you only look once) algorithm standing out as an excellent object detection algorithm. The evolution of its network architecture has played a crucial role in improving detection speed and accuracy. This paper conducts a comprehensive horizontal analysis of the overall frameworks of YOLOv1 to YOLOv9, comparing  the network architecture (backbone network, neck layers and head layers) and loss functions. The strengths and limitations of different improvement methods are thoroughly discussed, with a specific evaluation of the impact of these improvements on model accuracy. This paper also delves into discussions on dataset selection and construction methods, the rationale behind choosing different evaluation metrics, and their applicability and limitations in various application scenarios. It further explores specific improvement methods for YOLO algorithm in five application domains (industrial, transportation, remote sensing, agriculture, biology), and discusses the balance among detection speed, accuracy, and complexity in these application domains. Finally, this paper analyzes the current development status of YOLO in various fields, summarizes existing issues in YOLO algorithm research through specific examples, and in conjunction with the trends in application domains, provides an outlook on the future of the YOLO algorithm. It also offers detailed explanations for four future research directions of YOLO (multi-task learning, edge computing, multimodal integration, virtual and augmented reality technology).
    Abstract views: 963 | PDF downloads: 694
    Critical Review of Multi-focus Image Fusion Based on Deep Learning Method
    LI Ziqi, SU Yuxuan, SUN Jun, ZHANG Yonghong, XIA Qingfeng, YIN Hefeng
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2276-2292.   DOI: 10.3778/j.issn.1673-9418.2306058
    Multi-focus image fusion is an effective image fusion technology, which aims to combine source images from different focal planes of the same scene to obtain a good fusion result. This means that the fused image is in focus on all focal planes, that is, it contains more abundant scene information. The development of deep learning has driven great progress in image fusion, and the powerful feature extraction and reconstruction ability of neural networks makes the fusion results promising. In recent years, more and more multi-focus image fusion methods based on deep learning have been proposed, such as those built on convolutional neural networks (CNN), generative adversarial networks (GAN) and autoencoders. In order to provide an effective reference for relevant researchers and technicians, firstly, this paper introduces the concept of multi-focus image fusion and some evaluation indicators. Then, it analyzes more than ten advanced deep learning-based multi-focus image fusion methods from recent years, discusses the characteristics and innovations of the various methods, and summarizes their advantages and disadvantages. In addition, it reviews the application of multi-focus image fusion technology in various scenes, including photographic visualization, medical diagnosis, remote sensing detection and other fields. Finally, it identifies some challenges faced by current multi-focus image fusion research and looks forward to possible future research trends.
    Abstract views: 628 | PDF downloads: 536
    Review of Research on Multi-agent Reinforcement Learning Algorithms
    LI Mingyang, XU Ke’er, SONG Zhiqiang, XIA Qingfeng, ZHOU Peng
    Journal of Frontiers of Computer Science and Technology    2024, 18 (8): 1979-1997.   DOI: 10.3778/j.issn.1673-9418.2401020
    In recent years, the technique of multi-agent reinforcement learning algorithm has been widely used in the field of artificial intelligence. This paper systematically analyses the multi-agent reinforcement learning algorithm, examines its application and progress in multi-agent systems, and explores the relevant research results in depth. Firstly, it introduces the research background and development history of multi-agent reinforcement learning and summarizes the existing relevant research results. Secondly, it briefly reviews the application of traditional reinforcement learning algorithms under different tasks. Then, it highlights the classification of multi-agent reinforcement learning algorithms and their application in multi-agent systems according to the three main types of tasks (path planning, pursuit and escape game, task allocation), challenges, and solutions. Finally, it explores the existing algorithm training environments in the field of multi-agents, summarizes the improvement of deep learning on multi-agent reinforcement learning algorithms, proposes challenges and looks forward to future research directions in this field.
    Abstract views: 589 | PDF downloads: 484
    Advances of Adversarial Attacks and Robustness Evaluation for Graph Neural Networks
    WU Tao, CAO Xinwen, XIAN Xingping, YUAN Lin, ZHANG Shu, CUI Canyixing, TIAN Kan
    Journal of Frontiers of Computer Science and Technology    2024, 18 (8): 1935-1959.   DOI: 10.3778/j.issn.1673-9418.2311117
    In recent years, graph neural networks (GNNs) have gradually become an important research direction in artificial intelligence. However, the adversarial vulnerability of GNNs poses severe challenges to their practical applications. To gain a comprehensive understanding of adversarial attacks and robustness evaluation on GNNs, related state-of-the-art advancements are reviewed and discussed. Firstly, this paper introduces the research background of adversarial attacks on GNNs, provides a formal definition of these attacks, and elucidates the basic concepts and framework for research on adversarial attacks and robustness evaluation in GNNs. Following this, this paper gives an overview of the specific methods proposed in the field of adversarial attacks on GNNs, and details the foremost methods while categorizing them based on the type of adversarial attack and range of attack targets. Their operating mechanisms, principles, and pros and cons are also analyzed. Additionally, considering the model robustness evaluation's dependency on adversarial attack methods and adversarial perturbation degree, this paper focuses on direct evaluation indicators. To aid in designing and evaluating adversarial attack methods and GNNs' robust models, this paper compares representative methods considering implementation ease, accuracy, and execution time. This paper foresees ongoing challenges and future research areas. Current research on GNNs' adversarial robustness is experiment-oriented, lacking a guiding theoretical framework, necessitating further systematic theoretical research to ensure GNN-based systems' trustworthiness.
    Abstract views: 581 | PDF downloads: 407
    Survey of AIGC Large Model Evaluation: Enabling Technologies, Vulnerabilities and Mitigation
    XU Zhiwei, LI Hailong, LI Bo, LI Tao, WANG Jiatai, XIE Xueshuo, DONG Zehui
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2293-2325.   DOI: 10.3778/j.issn.1673-9418.2402023
    Artificial intelligence generated content (AIGC) models have attracted widespread attention and application worldwide due to their excellent content generation capabilities. However, the rapid development of AIGC large models also brings a series of hidden dangers, such as concerns about the interpretability, fairness, security, and privacy preservation of model-generated content. In order to reduce unknown risks and their harms, it becomes more and more important to carry out comprehensive measurement and evaluation of AIGC large models. Academia has initiated AIGC large model evaluation studies aiming to effectively address the related challenges and avoid potential risks. This paper summarizes and analyzes these AIGC large model evaluation studies. Firstly, an overview of the model evaluation process is provided, covering the preparation for model evaluation and the corresponding measurement indicators, and existing measurement benchmarks are systematically organized. Secondly, representative applications of AIGC large models in finance, politics and healthcare, and their problems, are discussed. Then, the measurement methods are studied in depth from different perspectives, such as interpretability, fairness, robustness, security and privacy; the new issues that require attention in AIGC large model evaluation are analyzed, and ways to cope with the new challenges of large model evaluation are proposed. Finally, the future challenges of AIGC large model evaluation are discussed, and its future development direction is envisioned.
    Abstract views: 481 | PDF downloads: 426
    Review of Neural Network Lightweight
    DUAN Yuchen, FANG Zhenyu, ZHENG Jiangbin
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 835-853.   DOI: 10.3778/j.issn.1673-9418.2403071
    With the continuous progress of deep learning technology, artificial neural network models have shown unprecedented performance in many fields such as image recognition, natural language processing, and autonomous driving. These models often have millions or even billions of parameters and learn complex feature representations through large amounts of training data. However, in resource-constrained environments, such as mobile devices, embedded systems and other edge computing scenarios, the power consumption, memory usage and computing efficiency of the model limit the application of large-scale neural network models. To solve this problem, researchers have proposed a variety of model compression techniques, such as pruning, distillation, neural architecture search (NAS), quantization, and low-rank decomposition, which aim to reduce the number of parameters, computational complexity, and storage requirements of the model while maintaining its accuracy as much as possible. This paper systematically introduces the development of these model compression methods, focusing on the main principles and key technologies of each: the different strategies of pruning, such as structured pruning and unstructured pruning; how knowledge is defined in knowledge distillation; the search space, search algorithm and network performance evaluation in NAS; post-training quantization and in-training quantization; and singular value decomposition and tensor decomposition in low-rank decomposition. Finally, the future development direction of model compression technology is discussed.
    Abstract views: 428 | PDF downloads: 293
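    Two of the compression techniques surveyed above, unstructured magnitude pruning and post-training quantization, can each be illustrated in a few lines. The sparsity ratio and bit width below are arbitrary illustration values, not settings recommended by the review.

        import numpy as np

        def magnitude_prune(w, sparsity=0.5):
            """Zero out the smallest-magnitude weights (unstructured pruning)."""
            k = int(w.size * sparsity)
            if k == 0:
                return w.copy()
            threshold = np.sort(np.abs(w), axis=None)[k - 1]
            return np.where(np.abs(w) <= threshold, 0.0, w)

        def quantize_dequantize(w, num_bits=8):
            """Uniform affine post-training quantization, then dequantization back to float."""
            qmin, qmax = 0, 2 ** num_bits - 1
            scale = (w.max() - w.min()) / (qmax - qmin)
            zero_point = round(qmin - w.min() / scale)
            q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
            return (q - zero_point) * scale  # dequantized approximation of w

        w = np.random.randn(4, 4).astype(np.float32)
        w_pruned = magnitude_prune(w, sparsity=0.5)
        w_quant = quantize_dequantize(w, num_bits=8)
        print("zeros after pruning:", int((w_pruned == 0).sum()))
        print("max quantization error:", float(np.abs(w - w_quant).max()))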
    Comprehensive Review of Physics-Guided Deep Learning: Advancements, Challenges, and Perspectives
    CHEN Chong, ZHU Xiaoyu, WANG Fang, XU Yaqian, ZHANG Wei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 277-294.   DOI: 10.3778/j.issn.1673-9418.2407056
    Although deep learning has achieved significant results in addressing nonlinear and high-dimensional problems, it faces challenges in complex scientific and engineering domains (such as high computational costs and data requirements, the difficulty of interpreting its black-box nature, and the lack of capability to follow physical laws). Therefore, a novel framework called physics-guided deep learning has emerged, which enhances the performance, explainability, and physical consistency of deep learning by integrating domain-specific physical knowledge into the construction and training process of deep learning models. This paper thoroughly reviews and analyzes research on physics-guided deep learning, including its methodologies and applications. Firstly, the main motivations and theoretical foundations of physics-guided deep learning are introduced. Secondly, a detailed discussion is conducted on the two modes: the combination of physical information with deep learning and the fusion of physical information with deep learning. The characteristics, limitations and application scenarios of the two modes are summarized and discussed. Finally, the performance of physics-guided deep learning in various applications is analyzed. Furthermore, the challenges of physics-guided deep learning are discussed from four perspectives: computational complexity and convergence, biases introduced when incorporating governing equations, dependence on observational data, and difficulties in knowledge fusion, based on which an outlook on the future direction of this domain is provided. This paper strives to provide researchers with a reference and multidimensional perspectives on physics-guided deep learning.
    Abstract views: 423 | PDF downloads: 312
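    One common realization of the first mode discussed above, combining physical information with deep learning, is to add a physics-residual term to the training loss. The sketch below fits a small network to a toy ODE du/dt + u = 0 plus a few noisy observations; the network size, the equation and the weighting factor lam are illustrative assumptions, not a setup taken from the review.

        import torch
        import torch.nn as nn

        net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

        def physics_residual(t):
            """Residual of the toy ODE du/dt + u = 0 at collocation points t."""
            t = t.requires_grad_(True)
            u = net(t)
            du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
            return du_dt + u

        # a few noisy observations of the true solution u(t) = exp(-t)
        t_data = torch.tensor([[0.0], [0.5], [1.0]])
        u_data = torch.exp(-t_data) + 0.01 * torch.randn_like(t_data)
        t_colloc = torch.linspace(0, 2, 50).reshape(-1, 1)

        optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
        lam = 1.0  # weight of the physics term
        for step in range(500):
            optimizer.zero_grad()
            loss_data = ((net(t_data) - u_data) ** 2).mean()
            loss_phys = (physics_residual(t_colloc) ** 2).mean()
            loss = loss_data + lam * loss_phys  # physics-guided composite loss
            loss.backward()
            optimizer.step()
        print(float(loss_data), float(loss_phys))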
    Review of Research on Deep Learning in Retinal Blood Vessel Segmentation
    WANG Yousong, PEI Junpeng, LI Zenghui, WANG Wei
    Journal of Frontiers of Computer Science and Technology    2024, 18 (8): 1960-1978.   DOI: 10.3778/j.issn.1673-9418.2310083
    The segmentation results of retinal fundus images can provide auxiliary diagnosis for ophthalmic diseases such as diabetic retinopathy, glaucoma, and age-related macular degeneration. Accurate segmentation of retinal blood vessels provides strong support for diagnosis, treatment, and evaluation, helping doctors better understand the patient’s eye condition. This paper reviews recent papers on fundus vessel segmentation based on deep learning, introducing the most commonly used datasets for fundus vessel segmentation and preprocessing methods. It also classifies recent model algorithms into several categories: single-network models, multi-network models, and Transformer models. This paper introduces various modules within each category of networks, discussing their advantages and limitations in handling fundus vessel segmentation tasks. These analyses help us understand the characteristics and applicable scenarios of different modules. Furthermore, this paper summarizes the retrieved model data, comparing the performance of different algorithms on the same dataset and evaluating their strengths and weaknesses based on scores obtained from the same evaluation metrics. It analyzes the reasons for the advantages of better-scoring algorithms and points out the defects of current algorithms. Finally, it summarizes numerous challenges faced by deep learning methods in retinal vessel segmentation and identifies potential directions for future development of deep learning in fundus vessel segmentation.
    Abstract views: 423 | PDF downloads: 313
    Research on Construction and Application of Knowledge Graph Based on Large Language Model
    ZHANG Caike, LI Xiaolong, ZHENG Sheng, CAI Jiajun, YE Xiaozhou, LUO Jing
    Journal of Frontiers of Computer Science and Technology    2024, 18 (10): 2656-2667.   DOI: 10.3778/j.issn.1673-9418.2406013
    Massive amounts of operational and maintenance (O&M) data from nuclear power distributed control system (DCS) contain rich operational experience and expert knowledge. Effectively extracting DCS alarm response information and forming knowledge service is a current hotspot and frontier research area in rapid DCS response. Due to the lack of clear structure and standards in multi-source heterogeneous data of nuclear power DCS, previous knowledge extraction primarily relied on manual annotation and deep learning methods, which require extensive domain knowledge and information processing capabilities and are constrained by the heavy workload of data annotation. Therefore, this study proposes a knowledge extraction method using large language model (LLM) with a step-by-step prompting strategy, constructing a DCS O&M knowledge graph (KG). Based on large language model technology and secondary intent recognition methods, intelligent question and answer (Q&A) and other knowledge services are developed utilizing the knowledge graph. Using O&M data from a nuclear power plant’s DCS as a case study, the research focuses on knowledge extraction, knowledge graph construction, and intelligent Q&A. The results show that the model achieves an overall precision (P) of 91.24%, recall (R) of 85.85%, and F1-score of 88.43%. The proposed method can comprehensively capture key entities and attribute information from multi-source heterogeneous DCS O&M data, guiding domain knowledge Q&A, assisting O&M personnel in timely responding to DCS alarm anomalies, analyzing fault causes and response strategies, and providing guidance for DCS O&M training and maintenance in power plants.
    Abstract views: 396 | PDF downloads: 386
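    The precision, recall and F1-score reported above are the standard set-level measures over extracted triples; a minimal illustration with made-up alarm triples (not data from the paper) follows.

        def prf1(predicted, gold):
            """Micro precision/recall/F1 over sets of extracted (head, relation, tail) triples."""
            predicted, gold = set(predicted), set(gold)
            tp = len(predicted & gold)
            precision = tp / len(predicted) if predicted else 0.0
            recall = tp / len(gold) if gold else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            return precision, recall, f1

        gold = {("alarm A", "caused_by", "sensor fault"), ("alarm A", "handled_by", "restart card"),
                ("alarm B", "caused_by", "power dip")}
        pred = {("alarm A", "caused_by", "sensor fault"), ("alarm B", "caused_by", "power dip"),
                ("alarm B", "handled_by", "replace fuse")}
        print(prf1(pred, gold))  # (0.666..., 0.666..., 0.666...)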
    Research Progress of Named Entity Recognition Based on Large Language Model
    LIANG Jia, ZHANG Liping, YAN Sheng, ZHAO Yubo, ZHANG Yawen
    Journal of Frontiers of Computer Science and Technology    2024, 18 (10): 2594-2615.   DOI: 10.3778/j.issn.1673-9418.2407038
    Named entity recognition aims to identify named entities and their types from unstructured text, and is an important basic task in natural language processing technologies such as question answering systems, machine translation and knowledge graphs. With the development of artificial intelligence, named entity recognition based on large language models has become a hot research topic. This paper reviews the latest research progress of named entity recognition based on large language models. Firstly, the development of large language models and named entity recognition is summarized, and the commonly used datasets and evaluation methods for named entity recognition tasks are briefly introduced. The paper then sorts out the traditional research work on named entity recognition from three aspects: rule-based and dictionary-based, statistical machine learning-based and deep learning-based. Secondly, how different large language models are applied to named entity recognition tasks in different fields is described in detail according to model architecture, and the existing problems and improvement directions are analyzed. Finally, the challenges faced by named entity recognition tasks based on large language models are summarized, and future research directions are prospected.
    Abstract views: 380 | PDF downloads: 335
    Review of One-Stage Universal Object Detection Algorithms in Deep Learning
    WANG Ning, ZHI Min
    Journal of Frontiers of Computer Science and Technology    2025, 19 (5): 1115-1140.   DOI: 10.3778/j.issn.1673-9418.2411032
    In recent years, object detection algorithms have gradually become a hot research direction as a core task in the field of computer vision. They enable computers to recognize and locate target objects in images or video frames, and are widely used in fields such as autonomous driving, biological individual detection, agricultural detection, and medical image analysis. With the development of deep learning, general object detection algorithms have shifted from traditional object detection methods to deep learning-based methods, which are mainly divided into one-stage and two-stage object detection. Taking one-stage object detection as the starting point, and organized by the two underlying architectures, classical convolution and Transformer, this paper analyzes and summarizes the mainstream one-stage detection algorithms: the YOLO series (YOLOv1 to YOLOv11 and its main improved versions), which was the first one-stage object detection algorithm, as well as SSD and the Transformer-based DETR series. This paper introduces the network structure and research progress of the various algorithms, summarizes their characteristics, advantages, and limitations based on their structures, summarizes the main common datasets and evaluation indicators in the field of object detection, analyzes the performance of the various algorithms and their improvement methods, discusses their application status in different fields, and finally looks forward to the future research directions of one-stage object detection algorithms.
    Abstract views: 377 | PDF downloads: 329
    Survey of Transformer-Based Model for Time Series Forecasting
    MENG Xiangfu, SHI Haoyuan
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 45-64.   DOI: 10.3778/j.issn.1673-9418.2403070
    Time series forecasting (TSF) refers to predicting future values and trends at specific time points or over time periods by analyzing potential information such as trends and seasonality in historical data. Time series data, generated by sensors, play a significant role in numerous fields, including finance, healthcare, energy, transportation, and meteorology. With the development of IoT sensors, the massive amounts of time series data are difficult to handle using traditional machine learning techniques. However, the Transformer model, which has shown excellent performance in various tasks within natural language processing and computer vision, has been effectively utilized by researchers to capture long-term dependencies, leading to rapid advancements in time series forecasting tasks. Therefore, this paper reviews time series forecasting methods based on the Transformer model. It chronologically outlines the development process of time series forecasting, systematically introduces the preprocessing procedures and methods for time series data, and presents commonly used evaluation metrics and datasets for time series forecasting. By focusing on algorithm frameworks, this paper systematically explains the application methods and working principles of various models based on the Transformer in TSF tasks. Through experiments, it compares the performance, advantages, and limitations of different models, and analyzes the experimental results. Finally, considering the challenges present in current work on Transformer models for time series forecasting, this paper proposes future development trends in this direction.
    Abstract views: 367 | PDF downloads: 289
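    A basic step of the preprocessing pipeline described above is slicing a series into lookback/forecast windows before it is fed to a Transformer. The window lengths below are arbitrary illustration values.

        import numpy as np

        def make_windows(series, lookback=24, horizon=6):
            """Slice a 1-D series into (input window, forecast target) training pairs."""
            X, y = [], []
            for start in range(len(series) - lookback - horizon + 1):
                X.append(series[start:start + lookback])
                y.append(series[start + lookback:start + lookback + horizon])
            return np.stack(X), np.stack(y)

        series = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.randn(200)
        X, y = make_windows(series, lookback=24, horizon=6)
        print(X.shape, y.shape)  # (171, 24) (171, 6)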
    Review of Research on CNN and Visual Transformer Hybrid Models in Image Processing
    GUO Jialin, ZHI Min, YIN Yanjun, GE Xiangwei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 30-44.   DOI: 10.3778/j.issn.1673-9418.2403009
    Convolutional neural network (CNN) and vision Transformer are two important deep learning models in the field of image processing, and they have made remarkable achievements in this field after years of continuous research and progress. In recent years, hybrid models of CNN and vision Transformer have gradually emerged. Extensive research has steadily overcome the weaknesses of the two models and effectively leveraged their respective strengths, showing excellent results in image processing tasks. This paper focuses on the hybrid model of CNN and vision Transformer. First of all, the architectures, advantages and disadvantages of the CNN and vision Transformer models are summarized, and the concept and advantages of the hybrid model are outlined. Secondly, this paper comprehensively reviews the research status and actual progress of hybrid models from four aspects: serial structure fusion, parallel structure fusion, hierarchical cross structure fusion and other fusion modes, summarizes the main representative models of each fusion mode, and compares typical hybrid models from various aspects. Then, the application of the hybrid model in specific image processing fields such as image recognition, image classification, object detection and image segmentation is described from multiple perspectives, showing the applicability and high efficiency of the hybrid model in practice. Finally, the future research directions of the hybrid model are analyzed in depth, and future research and application of this model in image processing are prospected.
    Abstract views: 342 | PDF downloads: 232
    Overview of Knowledge Graph Question Answering Enhanced by Large Language Models
    FENG Tuoyu, LI Weiping, GUO Qinglang, WANG Gangliang, ZHANG Yusong, QIAO Zijian
    Journal of Frontiers of Computer Science and Technology    2024, 18 (11): 2887-2900.   DOI: 10.3778/j.issn.1673-9418.2407069
    Knowledge graph question answering (KGQA) is a technology that retrieves relevant answers from a knowledge graph by processing natural language questions posed by users. Early KGQA technologies were limited by the size of knowledge graphs, computational power, and natural language processing capabilities, resulting in lower accuracy. In recent years, with advancements in artificial intelligence, particularly the development of large language models (LLMs), KGQA technology has achieved significant improvements. LLMs such as GPT-3 have been widely applied to enhancing the performance of KGQA. To better study and learn the enhanced KGQA technologies, this paper summarizes various methods using LLMs for KGQA. Firstly, the relevant knowledge of LLMs and KGQA is summarized, including the technical principles and training methods of LLMs, as well as the basic concepts of knowledge graphs, question answering, and KGQA. Secondly, existing methods of enhancing KGQA with LLMs are reviewed from two dimensions: semantic parsing and information retrieval. The problems that these methods address and their limitations are analyzed. Additionally, related resources and evaluation methods for KGQA enhanced by LLMs are collected and organized, and the performance of existing methods is summarized. Finally, the limitations of current methods are analyzed, and future research directions are proposed.
    Abstract views: 331 | PDF downloads: 343
    Multimodal Sentiment Analysis Based on Cross-Modal Semantic Information Enhancement
    LI Mengyun, ZHANG Jing, ZHANG Huanxiang, ZHANG Xiaolin, LIU Luyao
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2476-2486.   DOI: 10.3778/j.issn.1673-9418.2307045
    With the development of social networks, humans express their emotions in different ways, including text, vision and speech, i.e., multimodal. In response to the failure of previous multimodal sentiment analysis methods to effectively obtain multimodal sentiment feature representations and the failure to fully consider the impact of redundant information on experiments during multimodal feature fusion, a multimodal sentiment analysis model based on cross-modal semantic information enhancement is proposed. Firstly, the model adopts BiLSTM network to mine the contextual information within each unimodal mode. Secondly, the information interaction between multiple modalities is modeled through the cross-modal information interaction mechanism to obtain six kinds of information interaction features, namely, text-to-speech and vision, speech-to-text and vision, and vision-to-text and speech, and then the same information interaction features of the target modalities are spliced together to obtain the information-enhanced unimodal feature vectors, which can efficiently obtain the shared and complementary in-depth semantic features between modalities. In addition, the semantic correlations between the original unimodal feature vectors and the information-enhanced unimodal feature vectors are computed separately using the multi-head self-attention mechanism, which improves the ability of identifying the key sentiment features and reduces the negative interference of the redundant information on the sentiment analysis. Experimental results on the public datasets CMU-MOSI (CMU multimodal opinion level sentiment intensity) and CMU-MOSEI (CMU multimodal opinion sentiment and emotion intensity) show that the proposed model can both enhance sentiment feature representation and effectively reduce the interference of redundant information, and it outperforms related works in terms of multimodal sentiment classification accuracy and generalization ability.
    Abstract views: 324 | PDF downloads: 292
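    The cross-modal information interaction described above (for example, text attending to speech) can be sketched with standard multi-head attention followed by concatenation with the original target-modality features. The dimensions and the single interaction direction below are illustrative assumptions, not the authors' exact architecture.

        import torch
        import torch.nn as nn

        d_model, heads = 64, 4
        cross_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

        batch, t_len, a_len = 2, 10, 20
        text = torch.randn(batch, t_len, d_model)   # target modality (query)
        audio = torch.randn(batch, a_len, d_model)  # source modality (key/value)

        # text-to-audio interaction: every text position attends over the audio sequence
        text_from_audio, attn_weights = cross_attn(query=text, key=audio, value=audio)

        # splice the original and interaction-enhanced features of the target modality
        enhanced_text = torch.cat([text, text_from_audio], dim=-1)
        print(enhanced_text.shape)  # torch.Size([2, 10, 128])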
    Survey on Construction Method of Temporal Knowledge Graph
    LU Jiamin, ZHANG Jing, FENG Jun, AN Qi
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 295-315.   DOI: 10.3778/j.issn.1673-9418.2406089
    As a bridge connecting data, knowledge, and intelligence, knowledge graph has been widely applied in fields such as search assistance, intelligent recommendation, question-answering systems, and natural language processing. However, with the expansion of application scenarios, static knowledge graph has shown limitations in handling dynamic knowledge. The emergence of temporal knowledge graph addresses this shortcoming by integrating temporal information into the graph structure, enabling a more accurate representation of dynamic changes in knowledge. This paper provides a comprehensive study on the construction of temporal knowledge graph. It begins by introducing the concept of temporal knowledge graph and clarifying its value in handling dynamic knowledge. Then, it delves into the construction process of temporal knowledge graph, dividing the core process into three key stages: knowledge extraction, knowledge fusion, and knowledge computing. Subsequently, it thoroughly organizes each stage, and each stage is detailed with task definitions, research summaries, and the application of large language models. In the knowledge extraction stage, it focuses on named entity recognition, relation extraction, and time information extraction; in the fusion stage, it discusses entity alignment and entity linking; and in the computation stage, it focuses on knowledge reasoning. Finally, it explores the challenges faced at each stage and looks forward to future research directions.
    Abstract views: 298 | PDF downloads: 233
    Application of Generative Large Language Models in Chinese Radiology Domain
    CHEN Longfei, GAO Xin, HOU Haotian, YE Chuyang, LIU Ya'ou, ZHANG Meihui
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2337-2348.   DOI: 10.3778/j.issn.1673-9418.2406041
    In the Chinese radiology domain, radiology reports serve as a crucial basis for clinical decision-making. Therefore, utilizing natural language processing (NLP) technology to understand and learn from the textual content of radiology reports, thereby aiding radiological clinical work, has become an important research direction in this domain. However, when dealing with the natural language classification and generation tasks based on Chinese radiology reports using traditional methods, there are still challenges such as a lack of training corpora, privacy concerns, and poor model generalization capabilities, leading to insufficient overall performance. To address these issues, a solution for natural language tasks in the Chinese radiology domain based on locally efficient fine-tuning large language models is proposed. By collecting and constructing a large-scale, high-quality dataset for natural language tasks in the Chinese radiology reports, and employing the LoRA efficient fine-tuning method for supervised fine-tuning training of the open-source large language model Baichuan2, the “RadGPT” capable of solving four types of clinical tasks in the Chinese radiology domain simultaneously is proposed. A set of evaluation systems for natural language classification and generation tasks in the Chinese radiology domain is introduced. Multiple sets of experiments are conducted on three types of radiology report datasets from two centers, and comparisons are made with several typical existing methods. The results demonstrate that the proposed method performs better in terms of classification performance, text summarization and expansion capabilities, and model generalization.
    Abstract views: 285 | PDF downloads: 288
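    LoRA, the efficient fine-tuning method used above, freezes a pretrained weight matrix and learns only a low-rank update. The self-contained module below illustrates the idea; the rank, scaling and layer size are arbitrary, and practical fine-tuning would normally go through a library such as PEFT rather than hand-written layers.

        import torch
        import torch.nn as nn

        class LoRALinear(nn.Module):
            """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""
            def __init__(self, in_features, out_features, r=8, alpha=16):
                super().__init__()
                self.base = nn.Linear(in_features, out_features)  # stands in for a pretrained layer
                self.base.weight.requires_grad_(False)            # base weight stays frozen
                self.base.bias.requires_grad_(False)
                self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
                self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
                self.scaling = alpha / r

            def forward(self, x):
                return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

        layer = LoRALinear(768, 768)
        trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
        total = sum(p.numel() for p in layer.parameters())
        print(f"trainable {trainable} / total {total}")  # only the low-rank factors are trainable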
    Knowledge Augmentation on Traditional Chinese Medicine Language Model
    JI Xiangyu, WANG Xin, ZHANG Heyi, MENG Zhaopeng, ZHANG Junhua, ZHUANG Pengwei, JIA Yongzhe, XU Dawei
    Journal of Frontiers of Computer Science and Technology    2024, 18 (10): 2616-2629.   DOI: 10.3778/j.issn.1673-9418.2407082
    Recently, large language models (LLM) have made significant achievements in various fields. However, due to the lack of specialized knowledge and the gap between modern medicine and traditional Chinese medicine (TCM), it is still a challenge to deploy LLM in TCM, and existing methods fail to maintain the structure of TCM prescriptions. To address these problems, a knowledge augmentation pattern is proposed. The method includes model training, knowledge graph construction and knowledge augmentation. In the training phase, a TCM language model is trained on a TCM corpus by a two-stage method combining pre-training and fine-tuning. In the knowledge graph construction phase, a prescription knowledge graph is built from nearly 100000 preprocessed classical TCM prescriptions, including those from ancient books. In the knowledge augmentation phase, outputs are generated by computing over the knowledge graph according to its schema and the retrieved results, which preserves the structure of prescriptions. A set of evaluations specific to prescription optimization is proposed, including objective and subjective indicators, to evaluate the performance of the model on this task. Experiments show that the model improves greatly on both subjective and objective evaluations compared with baselines: BLEU-1 is increased by up to 0.09, while ROUGE-1 is increased by up to 0.21. An ablation study shows that knowledge augmentation is of vital importance to model performance; BLEU-1 of the augmentation-free model is about 37% lower than that of the augmented model.
    Abstract views: 285 | PDF downloads: 223
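    BLEU-1 and ROUGE-1, the metrics reported above, are essentially unigram precision and unigram recall against a reference. The sketch below is a simplified single-reference version (no smoothing) with made-up herb tokens, not the paper's evaluation code.

        import math
        from collections import Counter

        def bleu_1(candidate, reference):
            """Clipped unigram precision with brevity penalty (single reference, no smoothing)."""
            cand, ref = Counter(candidate), Counter(reference)
            clipped = sum(min(c, ref[t]) for t, c in cand.items())
            precision = clipped / max(len(candidate), 1)
            bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / max(len(candidate), 1))
            return bp * precision

        def rouge_1(candidate, reference):
            """Unigram recall: overlapping tokens divided by reference length."""
            cand, ref = Counter(candidate), Counter(reference)
            overlap = sum(min(c, cand[t]) for t, c in ref.items())
            return overlap / max(len(reference), 1)

        ref = "ginseng licorice cinnamon twig".split()
        hyp = "ginseng licorice dried ginger".split()
        print(round(bleu_1(hyp, ref), 3), round(rouge_1(hyp, ref), 3))  # 0.5 0.5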
    Knowledge-aware Recommendation Algorithm Combining Hypergraph Contrast Learning and Relational Clustering
    WANG Yonggui, CHEN Shuming, LIU Yihai, LAI Zhenxiang
    Journal of Frontiers of Computer Science and Technology    2024, 18 (8): 2140-2155.   DOI: 10.3778/j.issn.1673-9418.2305058
    The recommendation algorithm combined with knowledge graph obtains the auxiliary information of items by introducing knowledge graph to achieve better recommendation effect. However, there are problems in the process of recommendation: long-tail distribution of relations in the knowledge graph, sparse user-item interaction data and unbalanced utilization of heterogeneous information. In response to these problems, a knowledge-aware recommendation algorithm combining hypergraph contrast learning and relational clustering (HC-CRKG) is proposed. Firstly, the knowledge graph is reconstructed by the way of relationship clustering, which alleviates the problem of long-tail distribution of relationships in the knowledge graph. Secondly, a user-item-entity heterogeneous graph is constructed, and a graph convolutional network combining attention mechanism is used to learn the heterogeneous graph embeddings of users and items. Meanwhile, a parametric hypergraph convolutional network is used to learn the hypergraph embeddings of users and items. Subsequently, contrast learning is performed between the heterogeneous graph embedding and the hypergraph embedding to introduce a self-supervised signal for the model to alleviate the data sparsity problem. Finally, the heterogeneous graph embedding and hypergraph embedding are combined for subsequent recommendation prediction, which further alleviates the heterogeneous information utilization imbalance problem. The model is tested against baseline models such as CKAN (collaborative knowledge-aware attentive network), KGIC (improving knowledge-aware recommendation with multi-level interactive contrastive learning), and VRKG4Rec (virtual relational knowledge graphs for recommendation) on three publicly available datasets MovieLens-1M, Book-Crossing and Last.FM. Experimental results show that the model achieves different degrees of improvement in AUC, F1 and Recall@K.
    Abstract views: 276 | PDF downloads: 245
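    Recall@K, one of the metrics reported above, measures how many of a user's held-out items appear in the top-K recommendation list; a minimal version with toy data follows.

        def recall_at_k(ranked_items, relevant_items, k=10):
            """Fraction of the user's relevant (held-out) items that appear in the top-k list."""
            if not relevant_items:
                return 0.0
            hits = len(set(ranked_items[:k]) & set(relevant_items))
            return hits / len(relevant_items)

        ranked = ["item42", "item7", "item13", "item99", "item3"]  # model ranking, best first
        relevant = {"item7", "item3", "item50"}                    # held-out test interactions
        print(recall_at_k(ranked, relevant, k=5))  # 2 of 3 relevant items retrieved -> 0.666...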
    Research on Science and Technology Policy and Regulation Q&A System Driven by Large Models
    XIANG Xiaowei, SHEN Yanguang, HU Minghao, YAN Tianwei, LUO Wei, LUO Zhunchen
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2349-2360.   DOI: 10.3778/j.issn.1673-9418.2406023
    A question-and-answer (Q&A) system for science and technology (S&T) policies and regulations plays a critical role in helping the public understand and apply these regulations. Large language models (LLM) can significantly enhance the accuracy and efficiency of such systems. However, current LLM-based S&T policy and regulation Q&A systems face several challenges: the lack of large-scale, high-quality datasets, insufficient methods for automatically constructing datasets with accurate policy and regulation knowledge integration, and issues with the professional accuracy and timeliness of the models’ knowledge updates. To address these challenges, this paper proposes a retrieval-augmented self-prompting method for constructing a high-quality, large-scale S&T policy and regulation Q&A dataset. Additionally, a Q&A system is developed, which combines an LLM optimized by low-rank adaptation (LoRA) techniques with an S&T policy and regulation knowledge base, and employs prompt learning techniques to guide the system in generating accurate answers. Experimental results demonstrate that the constructed Q&A dataset significantly improves the integration of policy and regulation knowledge compared with traditional methods. Furthermore, the proposed Q&A system outperforms general LLM-driven systems across various metrics, highlighting its enhanced performance in the domain of S&T policies and regulations.
    Abstract views: 274 | PDF downloads: 266
    Survey on Application of Homomorphic Encryption in Deep Learning
    YANG Hongchao, YI Mengjun, LI Peijia, ZHANG Hanwen, SHEN Furao, ZHAO Jian, WANG Liuwang
    Journal of Frontiers of Computer Science and Technology    2024, 18 (12): 3065-3079.   DOI: 10.3778/j.issn.1673-9418.2406098
    With the widespread application of deep learning in various fields, data privacy and security issues have become increasingly important. Homomorphic encryption, a technique that allows computations to be performed directly on encrypted data, offers a potential solution to these problems. This paper surveys methods that combine deep learning with homomorphic encryption, exploring how to effectively apply deep learning models in encrypted environments. Firstly, the basics of homomorphic encryption are introduced, covering its basic principles, different classifications (including partially homomorphic encryption, somewhat homomorphic encryption and fully homomorphic encryption), and the development history of fully homomorphic encryption. Key models in deep learning, such as convolutional neural network and Transformer, are then detailed. The steps of combining homomorphic encryption with deep learning and how to adapt various layers of deep learning (e.g., convolutional layers, attention layer and activation function layer) to the homomorphic encryption environments are discussed. Subsequently, existing methods that integrate convolutional neural network and Transformer with homomorphic encryption are focused on. Specific implementation schemes for performing deep learning computations on encrypted data and performance optimization strategies employed to enhance efficiency and accuracy are discussed. The advantages and limitations of each method are summarized. Finally, current research progress is summarized, and an outlook on future research directions is provided.
    Abstract views: 266 | PDF downloads: 227
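    The defining property mentioned above, computing on data while it stays encrypted, can be demonstrated with the additively homomorphic Paillier scheme. The toy key below is far too small to be secure and is purely illustrative; private deep learning typically relies on lattice-based schemes such as BFV or CKKS instead.

        import math
        import random

        # toy Paillier keypair (insecure key size, for illustration only)
        p, q = 293, 433
        n = p * q
        n2 = n * n
        g = n + 1
        lam = math.lcm(p - 1, q - 1)
        mu = pow(lam, -1, n)  # valid simplification because g = n + 1

        def encrypt(m):
            r = random.randrange(1, n)
            while math.gcd(r, n) != 1:
                r = random.randrange(1, n)
            return (pow(g, m, n2) * pow(r, n, n2)) % n2

        def decrypt(c):
            x = pow(c, lam, n2)
            return ((x - 1) // n * mu) % n

        c1, c2 = encrypt(20), encrypt(22)
        # multiplying ciphertexts adds the underlying plaintexts: E(m1) * E(m2) = E(m1 + m2)
        print(decrypt((c1 * c2) % n2))  # 42
        print(decrypt(pow(c1, 3, n2)))  # 60, i.e. 3 * 20 (plaintext-by-constant multiplication)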
    Research on Lightweight Model of Multi-person Pose Estimation Based on Improved YOLOv8s-Pose
    FU Yu, GAO Shuhui
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 682-692.   DOI: 10.3778/j.issn.1673-9418.2403059
    To address the issues of high computational load and slow detection speed in existing human pose estimation models, this paper proposes a lightweight improved algorithm based on the YOLOv8s-Pose model. Firstly, a lightweight module C2f-GhostNetBottleNeckV2 is introduced into the backbone to replace the original C2f, reducing the number of parameters. This paper also introduces the Non_Local attention mechanism to integrate the position information of human key points in the image into the channel dimension, thereby enhancing the efficiency of feature extraction and mitigating the accuracy degradation issues that often occur after model lightweighting. Furthermore, the weighted bidirectional feature pyramid network is incorporated into the neck layer to improve the model’s feature fusion capabilities, ensuring a good balance when processing features of different scales. A small object detection head is then added to the network to reduce the missed detection of small objects. Lastly, the CIOU loss function is replaced with Focal-EIOU to enhance the accuracy of human key point regression. Experimental results show that the improved model reduces the number of parameters by 9.3%, and compared with the original model on the COCO2017 human key points dataset, it achieves an improvement of 0.4 percentage points in mAP@0.50 and an improvement of 0.6 percentage points in mAP@0.50:0.95. Therefore, the proposed lightweight improvement algorithm not only reduces the number of model parameters but also enhances the accuracy of human pose estimation algorithms, especially for small target detection, which provides an effective means to achieve real-time and accurate pose estimation.
    Abstract views: 259 | PDF downloads: 190
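    For keypoint models such as the one above, mAP@0.50 and mAP@0.50:0.95 are obtained by thresholding object keypoint similarity (OKS) rather than box IoU. The sketch below uses made-up coordinates and a single falloff constant for all keypoints, a simplification of the per-keypoint constants used in COCO evaluation.

        import numpy as np

        def oks(pred, gt, visibility, area, k=0.05):
            """Simplified object keypoint similarity.
            pred, gt: (N, 2) keypoint coordinates; visibility: (N,) 0/1 flags;
            area: object area; k: falloff constant shared by all keypoints here."""
            d2 = ((pred - gt) ** 2).sum(axis=1)           # squared keypoint errors
            e = np.exp(-d2 / (2 * area * k ** 2))         # Gaussian score scaled by object size
            v = visibility > 0
            return float(e[v].mean()) if v.any() else 0.0

        gt = np.array([[100.0, 100.0], [120.0, 140.0], [80.0, 150.0]])
        pred = gt + np.array([[2.0, -1.0], [5.0, 3.0], [0.0, 4.0]])
        print(oks(pred, gt, visibility=np.array([1, 1, 1]), area=90.0 * 60.0))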
    Research on Public Security Professional Small Sample Knowledge Extraction Method Based on Large Language Model
    PEI Bingsen, LI Xin, JIANG Zhangtao, LIU Mingshuai
    Journal of Frontiers of Computer Science and Technology    2024, 18 (10): 2630-2642.   DOI: 10.3778/j.issn.1673-9418.2403039
    The rapid development of informatization and digitalization in public security business has generated a large amount of law enforcement case data in public security work. However, due to the various types of text and the large amount of information, front-line police officers often face problems such as low reading efficiency and difficulty in aggregating information when reading case files. In order to further utilize law enforcement case text, it is necessary to conduct intelligent analysis and knowledge extraction. However, due to the professionalism, data sensitivity and confidentiality of public security law enforcement case text, as well as the restrictions on public security data leaving the network, only a small number of training samples can be obtained, and traditional deep learning models have unsatisfactory extraction performance. Therefore, this paper proposes to build a vertical-domain large language model with fewer resources and less data, and to adapt the model to the public security profession. The model uses the knowledge editing technology MEMIT (mass-editing memory in a transformer), the low-resource fine-tuning technology LoRA (low-rank adaptation), and prompt templates to improve the model's understanding of public security knowledge such as police terminology and common sense. Moreover, in order to further improve the knowledge extraction performance of the model, a small-sample law enforcement case text extraction process is designed to better integrate the professional knowledge related to the case into the model. Experimental results show that the accuracy of the public security vertical-domain large language model integrated with the extraction process is significantly improved on various knowledge extraction tasks compared with traditional methods, which helps front-line police officers quickly, objectively and accurately analyze law enforcement case text, dig out potential case information, and support the intelligent development of public security work.
    Abstract views: 258 | PDF downloads: 237
    Dynamic-YOLOX: Detection Model for Apple Leaf Disease in Complex Background
    SHENG Shuai, DUAN Xianhua, HU Weikang, CAO Weijie
    Journal of Frontiers of Computer Science and Technology    2024, 18 (8): 2118-2129.   DOI: 10.3778/j.issn.1673-9418.2307022
    To address the issues of incomplete disease types and the single background of apple leaf images in existing apple leaf disease datasets, this paper constructs a new dataset comprising six common apple leaf diseases with complex backgrounds. Additionally, this paper designs Dynamic-YOLOX based on YOLOX-S (you only look once X-S) for the detection of apple leaf disease, aiming to solve the problems of low accuracy, model complexity, and insufficient real-time performance. Firstly, the ECA-SPPFCSPC (efficient channel attention cross-stage partial fast spatial pyramid pooling module) is devised and employed to replace the SPP (spatial pyramid pooling) and CSPNet (cross-stage partial network) components in the Dark5 segment of the YOLOX-S model backbone, aiming to reinforce the model's ability to focus on deep semantic features, suppress irrelevant information and reduce hardware memory overhead. Secondly, the ODCSP (omni-dimensional dynamic cross-stage partial network) module is designed to replace all the CSPNet modules of the Dark2, Dark3 and Dark4 segments in the YOLOX-S backbone and neck network. This design enhances the model's adaptability to various input features, reducing parameter and computational overhead while improving the average detection accuracy of the model. Finally, the Varifocal Loss is introduced to replace the BCEWithLogits Loss for classification confidence loss in the model to elevate the detection accuracy of dense small-target diseases on apple leaves. On the self-built dataset, Dynamic-YOLOX demonstrates a relative mAP improvement of 4.54 percentage points over the original YOLOX-S model, achieving 84.63%. Simultaneously, the Params and FLOPs of the model decrease by 11.97% and 13.45%, respectively, and the detection speed reaches 44.07 FPS. Dynamic-YOLOX also exhibits a certain degree of superiority compared with mainstream apple leaf disease detection models.
    Abstract views: 258 | PDF downloads: 205
    Review of PCB Defect Detection Algorithm Based on Machine Vision
    YANG Sinian, CAO Lijia, YANG Yang, GUO Chuandong
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 901-915.   DOI: 10.3778/j.issn.1673-9418.2409061
    The printed circuit board (PCB) is a core component of electronic products, and its quality directly affects product reliability. As electronic products become lighter, thinner, and more sophisticated, machine vision-based PCB defect detection faces challenges such as the difficulty of detecting tiny defects. In order to further study PCB defect detection technology, the algorithms of each stage are discussed in detail according to their development history. Firstly, the main challenges in the field are pointed out, and traditional PCB defect detection methods and their limitations are introduced. Then, from the perspectives of traditional machine learning and deep learning, this paper systematically reviews recent PCB defect detection methods and their advantages and disadvantages. Next, this paper summarizes the commonly used evaluation indicators and mainstream datasets for PCB defect detection algorithms, compares the performance of the latest research methods of the past three years on the PCB-Defect, DeepPCB and HRIPCB datasets, and analyzes the reasons for the differences. Finally, based on the current situation and the problems still to be solved, future development trends are prospected.
    Abstract views: 253 | PDF downloads: 158
    Multi-stage Reasoning Method for Emotional Support Dialogue Generation Based on Large Language Models
    SANG Chenyang, MA Tinghuai, XIE Xintong, SUN Shengjie, HUANG Rui
    Journal of Frontiers of Computer Science and Technology    2024, 18 (11): 2925-2939.   DOI: 10.3778/j.issn.1673-9418.2406036
    The task of emotional support dialogue requires providing supportive responses based on a thorough understanding of the user’s psychological state, with the aim of alleviating their emotional distress. Most existing studies employ end-to-end generation methods, where small pre-trained language models are fine-tuned to adapt to the emotional support task. However, these methods lack a fine-grained understanding of the user’s psychological state, resulting in insufficient empathy, and the model decision process is opaque, resulting in poor interpretability. To address these issues, inspired by the excellent reasoning capabilities of current large language models, this paper proposes an emotional support dialogue reasoning framework based on large language models called CoES (chain-of-emotional-support). This framework transforms the end-to-end generation problem into a step-by-step reasoning problem, breaking down the complex task of emotional support into simpler subtasks to be solved sequentially. The framework comprises three reasoning chains: the emotional reasoning chain, the strategy reasoning chain, and the response generation chain, which are used for the fine-grained exploration of the user’s psychological state, the selection of emotional support strategies, and the generation and optimization of responses, respectively. Additionally, this paper designs various external knowledge augmentation strategies to improve the reasoning effectiveness of the large model in the psychological state exploration and support strategy selection processes. Both manual and automatic evaluation results on the ESConv dataset demonstrate that the proposed reasoning method achieves advanced performance in terms of the interpretability of emotional support and the quality of content generation.
    Abstract views: 252 | PDF downloads: 187
    Research on Progress of Quantum Computing Simulation of Physical Systems
    LUAN Tian, KUANG Xueheng, WANG Wei, YUE Huanyu
    Journal of Frontiers of Computer Science and Technology    2024, 18 (11): 2787-2797.   DOI: 10.3778/j.issn.1673-9418.2401060
    Quantum computing, as a forefront field in quantum technology, has made significant progress in simulating physical systems, yet it still faces technical challenges such as hardware noise and quantum errors. This review discusses the latest advancements in quantum computing for simulating physical systems, with a focus on the application of quantum-classical hybrid algorithms and error mitigation techniques, exploring their strengths and limitations across various physical systems. The review covers the simulation of molecular systems using superconducting quantum computers, many-body problems in condensed matter systems, solving equations in complex fluid dynamics, and applications in astrophysics and high-energy physics. For molecular systems, the variational quantum eigensolver (VQE) is widely used to solve for the ground-state energy of multi-electron systems, with error mitigation methods improving simulation accuracy. In condensed matter systems, quantum computing has shown high precision and efficiency in simulating strongly correlated spin models, such as the Heisenberg and Ising models, achieving unprecedented accuracy in larger spin chain simulations. In the field of fluid dynamics, research indicates that quantum-classical hybrid algorithms can accelerate the solution of the Navier-Stokes equations to some extent, providing new tools for future fluid dynamics studies. In astrophysical simulations, quantum computing has been used to study the properties of black holes and dark matter, demonstrating potential exponential acceleration, which offers new possibilities for understanding physical phenomena under extreme conditions in the universe. In high-energy physics, quantum computing shows promising applications in solving problems like the Schwinger model and has begun exploring the potential of quantum machine learning in analyzing high-energy experimental data. This review provides a comprehensive perspective on the applications of quantum computing in simulating various physical systems, and outlines future directions and technical challenges.
    Abstract views: 252 | PDF downloads: 165
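    The variational principle behind VQE, discussed above, can be illustrated entirely classically: a parameterized one-qubit state is tuned to minimize the energy expectation of a small Hamiltonian and checked against exact diagonalization. The Hamiltonian and ansatz below are toy choices, not a system from the review.

        import numpy as np

        # Pauli matrices and a toy one-qubit Hamiltonian H = Z + 0.5 X
        X = np.array([[0, 1], [1, 0]], dtype=float)
        Z = np.array([[1, 0], [0, -1]], dtype=float)
        H = Z + 0.5 * X

        def ansatz(theta):
            """One-parameter ansatz |psi(theta)> = Ry(theta)|0> with real amplitudes."""
            return np.array([np.cos(theta / 2), np.sin(theta / 2)])

        def energy(theta):
            psi = ansatz(theta)
            return float(psi @ H @ psi)  # expectation value <psi|H|psi>

        thetas = np.linspace(0, 2 * np.pi, 2001)
        energies = [energy(t) for t in thetas]
        best = int(np.argmin(energies))
        print("variational minimum:", round(energies[best], 4), "at theta =", round(thetas[best], 3))
        print("exact ground energy:", round(np.linalg.eigvalsh(H)[0], 4))  # both approach -1.118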
    Comprehensive Review of Research on Dynamic Binary Translation Techniques
    ZHANG Jin, SHAN Zehu, LIU Xiaodong, WANG Wenzhu, YU Jie, PENG Long, XIE Qiyou
    Journal of Frontiers of Computer Science and Technology    2024, 18 (10): 2521-2550.   DOI: 10.3778/j.issn.1673-9418.2312021
    Solving compatibility issues in programs is crucial for building a domestic software ecosystem. With the diversification of computer architectures, ensuring software runs smoothly across different platforms and hardware environments has become an urgent task in software development. Against this backdrop, dynamic binary translation (DBT) technology emerges as significant. As a core technology enabling interoperability between different instruction set architectures (ISA), DBT allows for cross-platform compatibility and significantly expands the applicability and flexibility of software through runtime instruction conversion. However, the introduction of DBT also places higher demands on system performance in terms of efficiency and resource utilization. This paper reviews DBT technology, including its basic principles, research progress, key technologies, and optimization methods. It starts with an introduction to the basic principles and history of DBT. Then, it elaborates on the research progress, especially significant achievements in improving translation accuracy and execution efficiency. Furthermore, it introduces six categories of DBT optimization techniques: runtime optimization, control flow optimization, instruction-level optimization, security and isolation optimization, resource management optimization, and hardware-software co-optimization. This paper also summarizes these key technologies, their optimization techniques, and the challenges they face. Finally, from multiple perspectives such as technological trends, application area expansion, and performance improvement strategies, the future research direction and development prospects of DBT technology are discussed.
    Abstract views: 251 | PDF downloads: 200
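    The core DBT loop described in the entry above (translate guest code at runtime, cache the result, then execute it natively) can be caricatured in a few lines. The sketch below assumes a made-up two-opcode guest ISA and "translates" each basic block into a Python closure that stands in for host machine code; the block cache is what real translators use to amortize translation cost.
        # Schematic dynamic binary translator: guest basic blocks -> cached host callables.
        GUEST_PROGRAM = {                       # toy guest ISA: (opcode, operand)
            0: [("addi", 5), ("addi", 7), ("jmp", 8)],
            8: [("addi", 30), ("halt", None)],
        }

        block_cache = {}                        # translation cache: block address -> callable

        def translate(addr):
            """'Compile' one guest basic block into a host-side closure."""
            ops = GUEST_PROGRAM[addr]
            def host_block(state):
                for op, arg in ops:
                    if op == "addi":
                        state["acc"] += arg
                    elif op == "jmp":
                        return arg              # next guest address
                    elif op == "halt":
                        return None
                return None
            return host_block

        def run(entry):
            state, pc = {"acc": 0}, entry
            while pc is not None:
                if pc not in block_cache:       # translate on first execution only
                    block_cache[pc] = translate(pc)
                pc = block_cache[pc](state)
            return state["acc"]

        print(run(0))                           # -> 42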
    Diagnosis of Power System Defects by Large Language Models and Graph Neural Networks
    LI Li, SHI Rongliang, GUO Xu, JIANG Hongxin
    Journal of Frontiers of Computer Science and Technology    2024, 18 (10): 2643-2655.   DOI: 10.3778/j.issn.1673-9418.2405085
    Defect rating, analysis and processing for the various devices and equipment in a power system are often affected by the subjectivity of operation and maintenance (O&M) personnel, resulting in different severity ratings for the same defect text description. Differences in expertise likewise lead to inconsistent diagnostic analysis and uneven diagnostic efficiency. In order to improve the accuracy and efficiency of defect diagnosis, a defect text rating classification method based on graph neural networks and a large-model intelligent diagnosis and analysis assistant are proposed. Firstly, a professional dictionary is constructed to normalize the text descriptions using natural language processing algorithms. Secondly, the semantic representation of defect text is optimized by statistical methods. Then, a graph attention network and the robustly optimized BERT approach (RoBERTa) are integrated to accurately rate and classify defect text. Finally, low-rank adaptation (LoRA) fine-tuning based on the large language model Qwen1.5-14B-Chat is performed to obtain Qwen-ElecDiag, a large model for power equipment diagnosis, which is combined with retrieval-augmented generation to build an assistant for power equipment defect diagnosis. In addition, the curated instruction dataset used for fine-tuning the power equipment diagnosis large model is provided. Comparative experimental results show that the proposed graph neural network-based defect rating classification method improves accuracy by nearly 8 percentage points over the best baseline model, BERT, and that the diagnostic assistant's power-domain knowledge and defect diagnosis capability are also improved. By improving the accuracy of defect ratings and providing comprehensive specialized diagnostic suggestions, the approach not only raises the intelligence level of power equipment O&M, but also provides new solutions for intelligent O&M in other vertical fields.
    Abstract views: 242 | PDF downloads: 188
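    The LoRA fine-tuning step mentioned in the power-equipment entry above keeps the pretrained weight frozen and learns only a low-rank update, W_eff = W + (alpha/r) * B A. The PyTorch sketch below shows that idea on a single linear layer; the layer sizes, rank and scaling are illustrative and unrelated to Qwen1.5-14B-Chat.
        # Minimal LoRA-style linear layer: frozen base weight plus trainable low-rank update.
        import torch
        import torch.nn as nn

        class LoRALinear(nn.Module):
            def __init__(self, in_features, out_features, r=8, alpha=16):
                super().__init__()
                self.base = nn.Linear(in_features, out_features)
                self.base.weight.requires_grad_(False)           # pretrained weight stays frozen
                self.base.bias.requires_grad_(False)
                self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
                self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
                self.scale = alpha / r

            def forward(self, x):
                return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

        layer = LoRALinear(768, 768)
        trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
        print(f"trainable parameters: {trainable}")              # only A and B are updated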
    Survey on Applications of AIGC in Multimodal Scenarios
    YUE Qi, ZHANG Chenkang
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 79-96.   DOI: 10.3778/j.issn.1673-9418.2404009
    Although artificial intelligence generated content (AIGC) has achieved excellent results in single-modal applications, using artificial intelligence to generate text, images, videos and other content, a single-modal feature representation can hardly capture the complete information of a phenomenon. In order to give AIGC greater generation capability, scholars have proposed incorporating multimodal information into AIGC to improve the learning performance and generation capability of models. By processing and integrating multiple modalities, AIGC acquires richer contextual information, which helps models better understand and generate content. The basic architectures, working principles and challenges of AIGC in dealing with multimodal problems are discussed in detail, and the AIGC models that incorporate multimodal information in recent years are classified and summarized. The applications, challenges and development directions of AIGC in multimodal image generation, video generation and 3D shape generation are summarized. For image generation, the applications and limitations of generative adversarial network (GAN) models and diffusion models are discussed. For video generation, diffusion-based video generation is analyzed, and joint audio-video generation methods are discussed. For 3D shape generation, methods guided by diffusion models and neural networks are discussed. Finally, the challenges faced by AIGC in multimodal applications are identified, and future research is prospected.
    Abstract views: 242 | PDF downloads: 157
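    The diffusion models discussed in the AIGC entry above all share the same closed-form forward (noising) process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with the reverse network trained to predict eps. The sketch below computes only that forward step with an illustrative linear noise schedule; it is a generic DDPM-style example, not any specific surveyed model.
        # Forward (noising) step of a DDPM-style diffusion process, q(x_t | x_0).
        import numpy as np

        T = 1000
        betas = np.linspace(1e-4, 0.02, T)            # common linear noise schedule
        alpha_bars = np.cumprod(1.0 - betas)          # cumulative product of (1 - beta_t)

        def noisy_sample(x0, t, rng=np.random.default_rng(0)):
            eps = rng.standard_normal(x0.shape)
            xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
            return xt, eps                            # eps is the denoiser's training target

        x0 = np.ones((8, 8))                          # stand-in for a tiny image
        for t in (0, 500, 999):
            xt, _ = noisy_sample(x0, t)
            print(t, round(float(xt.std()), 3))       # noise level grows as t increases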
    Review of Text-Oriented Entity Relation Extraction Research
    REN Anqi, LIU Lin, WANG Hailong, LIU Jing
    Journal of Frontiers of Computer Science and Technology    2024, 18 (11): 2848-2871.   DOI: 10.3778/j.issn.1673-9418.2401033
    Information extraction is the foundation of knowledge graph construction, and relation extraction, as a key process and core step of information extraction, aims to locate entities in text data and recognize the semantic links between them. Improving the efficiency of relation extraction can therefore effectively improve the quality of information extraction, which in turn affects knowledge graph construction and subsequent downstream tasks. Relation extraction can be categorized into sentence-level and document-level relation extraction according to the length of the extracted text. The two levels of extraction methods have their own advantages and disadvantages in different application scenarios: sentence-level relation extraction is suitable for scenarios with smaller datasets, while document-level relation extraction is suitable for scenarios that require mining relations across longer texts, such as news event analysis and long reports or articles. Unlike existing relation extraction surveys, this paper first introduces the basic concept of relation extraction and the development of the field in recent years, lists the datasets used at the two levels of relation extraction, and gives an overview of their characteristics. Then, this paper elaborates on sentence-level and document-level relation extraction respectively, summarizes the advantages and disadvantages of each level, and analyses the performance and limitations of the representative models in each category of methods. Finally, this paper summarizes the open problems in the field and looks forward to the future development of relation extraction.
    Abstract views: 238 | PDF downloads: 280
    Review of Application of Surface Electromyography Signals in Muscle Fatigue Research
    FANG Boru, QIU Dawei, BAI Yang, LIU Jing
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2261-2275.   DOI: 10.3778/j.issn.1673-9418.2312042
    Muscle fatigue is a physiological phenomenon that occurs when muscles are overused or continuously loaded during exercise or labor. Analyzing the fatigue mechanism remains a complex, multi-layered research problem. In recent years, research methods centered on surface electromyographic (sEMG) signals have garnered significant attention. The application of advanced signal processing techniques and machine learning algorithms has enhanced the precision of interpreting sEMG data, deepening our understanding of the mechanisms underlying muscle fatigue and thereby providing crucial scientific support for improving athletic performance, preventing sports injuries, and enhancing rehabilitation treatments. This comprehensive review of muscle fatigue research based on sEMG signals covers several aspects. Firstly, the definition of muscle fatigue and the currently common detection methods are explained, and the characteristics and application scope of each method are pointed out. Secondly, the EMG features that characterize muscle fatigue are introduced in detail, from linear features in the time domain, frequency domain and time-frequency domain to nonlinear parameters, and the advantages and limitations of these features are discussed. Thirdly, taking fatigue features as input data, the classification algorithms commonly used for muscle fatigue are explored, and the applicable conditions, advantages and disadvantages of each algorithm are summarized from the perspectives of machine learning and deep learning. Finally, the challenges faced by muscle fatigue research at this stage are pointed out, feasible solutions are proposed, and future research directions are prospected.
    Abstract views: 235 | PDF downloads: 183
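    Two of the classic fatigue indicators named in the sEMG entry above are the time-domain RMS amplitude and the frequency-domain median frequency (MDF), which typically drifts downward as a muscle fatigues. The sketch below computes both from a synthetic signal; the sampling rate and signal are placeholders rather than real sEMG data.
        # RMS and median frequency (MDF) of a (synthetic) sEMG segment.
        import numpy as np
        from scipy.signal import welch

        fs = 1000                                     # sampling rate in Hz (illustrative)
        t = np.arange(0, 2.0, 1 / fs)
        rng = np.random.default_rng(0)
        emg = np.sin(2 * np.pi * 80 * t) + 0.5 * rng.standard_normal(t.size)

        rms = np.sqrt(np.mean(emg ** 2))              # time-domain amplitude feature

        freqs, psd = welch(emg, fs=fs, nperseg=512)   # power spectral density
        cum_power = np.cumsum(psd)
        mdf = freqs[np.searchsorted(cum_power, cum_power[-1] / 2)]  # frequency splitting total power in half

        print(f"RMS = {rms:.3f}, MDF = {mdf:.1f} Hz")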
    Multi-agent Self-organizing Cooperative Hunting in Non-convex Environment with Improved MADDPG Algorithm
    ZHANG Hongqiang, SHI Jiahang, WU Lianghong, WANG Xi, ZUO Cili, CHEN Zuguo, LIU Zhaohua, CHEN Lei
    Journal of Frontiers of Computer Science and Technology    2024, 18 (8): 2080-2090.   DOI: 10.3778/j.issn.1673-9418.2310040
    A multi-agent reinforcement learning algorithm based on improved experience replay is proposed to address the hunting efficiency problem of multiple agents in non-convex environments. A residual network (ResNet) is used to alleviate the network degradation problem, and the RW-MADDPG algorithm is proposed by combining it with the multi-agent deep deterministic policy gradient (MADDPG) algorithm. To address the low utilization of experience pool data during multi-agent training, two methods for improving experience pool utilization are proposed. To address the problem that agents become trapped inside non-convex obstacles and cannot reach the target, a reasonable hunting reward function is designed so that the agents can complete the hunting task in non-convex obstacle environments. Simulation experiments are designed based on this algorithm. Experimental results show that the algorithm accumulates reward faster in the training stage and completes the hunting task sooner. Compared with the MADDPG algorithm, the training time is shortened by 18.5% in the static hunting environment and by 49.5% in the dynamic environment. Moreover, the global average reward of the hunting agents trained by this algorithm is higher in the non-convex obstacle environment.
    Abstract views: 234 | PDF downloads: 193
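    The experience-pool issue raised in the hunting entry above concerns how transitions are stored and re-sampled during training. The sketch below is only a plain uniform replay buffer, shown for context; the utilization improvements of RW-MADDPG are not reproduced here.
        # Plain uniform experience replay buffer (baseline, not the paper's improved scheme).
        import random
        from collections import deque

        class ReplayBuffer:
            def __init__(self, capacity=10000):
                self.buffer = deque(maxlen=capacity)   # old transitions are evicted automatically

            def push(self, state, action, reward, next_state, done):
                self.buffer.append((state, action, reward, next_state, done))

            def sample(self, batch_size):
                batch = random.sample(self.buffer, batch_size)
                return list(zip(*batch))               # (states, actions, rewards, next_states, dones)

        buf = ReplayBuffer()
        for i in range(100):
            buf.push(i, 0, 1.0, i + 1, False)
        states, actions, rewards, next_states, dones = buf.sample(8)
        print(len(states))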
    Review of Machine Unlearning
    HE Lisong, YANG Yang
    Journal of Frontiers of Computer Science and Technology    2024, 18 (11): 2872-2886.   DOI: 10.3778/j.issn.1673-9418.2405027
    To effectively protect data privacy and implement the “right to be forgotten”, it is necessary to eliminate the influence of specific subsets of training data from machine learning models and ensure that these data cannot be reverse-engineered. To address this issue, the research field of “machine unlearning” has emerged in recent years. This paper reviews the progress in machine unlearning research from three aspects: definitions, metrics, and algorithms. Firstly, it systematically outlines the core concepts, definitions, and evaluation metrics of machine unlearning, emphasizing the critical significance of certifiability metrics. Secondly, it categorizes unlearning algorithms into six major classes based on their design principles: structured initial training, influence-function approximation, gradient update, noise-based unlearning, knowledge distillation unlearning, and boundary unlearning. It provides detailed descriptions of nine representative machine unlearning algorithms and their evolution. Based on a comparison of existing algorithms’ strengths and weaknesses, this paper discusses the potential and significance of constructing a unified, certification-based framework for machine unlearning, and analyzes the theoretical and practical relationships between machine unlearning research and privacy protection. Finally, this paper outlines future research directions for machine unlearning, including the need to extend unlearning algorithms to subfields such as fair machine learning, transfer learning, and reinforcement learning; the potential for integrating various design approaches into future unlearning algorithms; the need for collaboration between technology and regulation in unlearning practices; and the benefits of integrating machine unlearning with incremental learning to improve the management and operation efficiency of machine learning models.
    Abstract views: 228 | PDF downloads: 125
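    The simplest certifiable baseline behind the unlearning algorithms surveyed in the entry above is exact unlearning: retrain the model from scratch on the training set minus the forget set, so that the removed records provably have no influence. The sketch below does this with a scikit-learn classifier on synthetic data; it illustrates the baseline that approximate methods try to avoid, not any specific surveyed algorithm.
        # Exact unlearning baseline: retrain without the records to be forgotten.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=500, n_features=10, random_state=0)
        model = LogisticRegression(max_iter=1000).fit(X, y)

        forget_idx = np.arange(0, 50)                          # records exercising the "right to be forgotten"
        keep = np.setdiff1d(np.arange(len(X)), forget_idx)

        unlearned = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
        print(model.score(X[keep], y[keep]), unlearned.score(X[keep], y[keep]))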
    Construction Method of Textbook Knowledge Graph Based on Multimodal and Knowledge Distillation
    LIU Jun, LENG Fangling, WU Wangwang, BAO Yubin
    Journal of Frontiers of Computer Science and Technology    2024, 18 (11): 2901-2911.   DOI: 10.3778/j.issn.1673-9418.2406054
    In order to efficiently construct a multimodal subject knowledge graph for the field of education, a textbook text entity-relation extraction algorithm based on large model knowledge distillation and multi-model collaborative reasoning is proposed. In the model training phase, a closed-source model with 100-billion-scale parameters is used to annotate text data, achieving implicit knowledge distillation. Then, domain-data instruction fine-tuning is performed on an open-source billion-parameter-scale model to enhance its instruction-following ability on the entity-relation extraction task. In the model inference stage, the closed-source model serves as the guiding model, and the open-source billion-parameter-scale model serves as the execution model. Experimental results show that knowledge distillation, multi-model collaboration, and domain-data instruction fine-tuning are effective, significantly improving the performance of instruction-prompted textbook text entity-relation extraction. A multimodal named entity recognition algorithm for textbook diagrams with explicit and implicit knowledge enhancement is also proposed. Firstly, techniques such as optical character recognition (OCR) and visual language models are used to extract textual information and global content descriptions from textbook diagrams. Then, explicit knowledge base retrieval and implicit LLM prompt enhancement are used to obtain auxiliary knowledge that may be associated with each image-caption pair. The knowledge obtained from the explicit knowledge base and the implicit LLM is further fused to form the final auxiliary knowledge. Finally, the auxiliary knowledge of a diagram is combined with its caption to achieve multimodal named entity recognition for textbook diagram captions. Experimental results show that the algorithm achieves advanced performance and improved interpretability.
    Abstract views: 226 | PDF downloads: 208
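    The training-phase distillation described in the textbook knowledge graph entry above amounts to letting a large closed-source teacher annotate raw text with entity-relation triples and then instruction-tuning a smaller open model on those annotations. The sketch below shows only the data-preparation shape of that pipeline; teacher_extract_triples is a hypothetical stub standing in for a call to the teacher model, and the passage and triple are invented examples.
        # Building instruction-tuning records from teacher-annotated triples (schematic).
        import json

        def teacher_extract_triples(passage):
            # Hypothetical stub: in the described pipeline this would query the
            # closed-source teacher model to label the passage.
            return [("Newton's second law", "appears_in", "Chapter 3")]

        def to_instruction_record(passage):
            triples = teacher_extract_triples(passage)
            return {
                "instruction": "Extract (head, relation, tail) triples from the textbook passage.",
                "input": passage,
                "output": json.dumps(triples, ensure_ascii=False),
            }

        corpus = ["Chapter 3 introduces Newton's second law and its applications."]
        records = [to_instruction_record(p) for p in corpus]
        print(records[0]["output"])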
    Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
    LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2326-2336.   DOI: 10.3778/j.issn.1673-9418.2406056
    Relation extraction, as a key task in natural language processing, plays a significant role in deepening language understanding, constructing knowledge graphs, and optimizing information retrieval systems. However, traditional supervised learning methods are not well-suited for real-world scenarios due to the continuous emergence of new relations and the lack of large annotated datasets. Although the advent of large language models has significantly improved the performance of many natural language processing tasks, they still cannot effectively address the challenges of few-shot continual relation extraction. To fully leverage the semantic knowledge of large language models to mitigate catastrophic forgetting and overfitting issues, a novel few-shot continual relation extraction method, LAFA (large language model augmentation and feature alignment), is proposed. This method enhances representation alignment through various strategies such as relation instance rewriting, semantic expansion, and enhanced relation representation. It effectively improves the model adaptability to new relations and the retention of old knowledge while maintaining low data and computational costs. Experimental validation on two relation extraction datasets, FewRel and TACRED, demonstrates that LAFA outperforms existing methods in few-shot continual relation extraction tasks, particularly achieving the best results in incremental stages. Ablation experiments further reveal the significant contributions of each module to overall performance. Moreover, the inference efficiency and cost of LAFA are substantially lower than those of existing large language model-based methods, and it boasts strong scalability, being able to adapt to various language models.
    Abstract views: 226 | PDF downloads: 183
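    A common backbone for few-shot continual relation extraction, and the kind of representation the LAFA entry above aligns and enhances, is a prototype classifier: each relation is represented by the mean embedding of its few support instances, and a query is assigned to the nearest prototype. The sketch below uses random vectors in place of real sentence embeddings and is a generic illustration, not the LAFA method itself.
        # Nearest-prototype classification over relation embeddings (generic sketch).
        import numpy as np

        rng = np.random.default_rng(0)
        embeddings = {                                   # few support embeddings per relation
            "founder_of": rng.standard_normal((5, 64)) + 1.0,
            "born_in": rng.standard_normal((5, 64)) - 1.0,
        }
        prototypes = {rel: vecs.mean(axis=0) for rel, vecs in embeddings.items()}

        def classify(query_vec):
            return min(prototypes, key=lambda rel: np.linalg.norm(query_vec - prototypes[rel]))

        query = rng.standard_normal(64) + 1.0            # stand-in for an encoded test sentence
        print(classify(query))                           # -> "founder_of" in this toy setup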
    Review of Multivariate Time Series Clustering Algorithms
    ZHENG Desheng, SUN Hanming, WANG Liyuan, DUAN Yaoxin, LI Xiaoyu
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 582-601.   DOI: 10.3778/j.issn.1673-9418.2405013
    Multivariate time series (MTS) data, serving as a crucial basis for intelligent technologies across numerous domains, record the state changes of multiple variables in a system over time. Clustering, as a core technique in data mining, can partition data into different clusters based on structural similarity, thereby uncovering the structure and internal relationships within the data and revealing systemic development patterns and variable correlations. Faced with challenges such as the complexity of multivariate time series data structures, the interconnections between variables, and high dimensionality, a substantial amount of research has been conducted internationally. This paper provides an overview of clustering analysis algorithms for multivariate time series data. Initially, based on classification criteria such as feature extraction methods, similarity measurement algorithms, and clustering partition frameworks, this paper conducts a comparative analysis of existing multivariate time series clustering algorithms. For each category of algorithms, a detailed summary is provided, covering algorithm principles, representative methods, advantages and disadvantages, and the problems they address. Further discussion covers common evaluation criteria and publicly available datasets for multivariate time series clustering. Lastly, from the perspective of the unique structure of multivariate temporal data, this paper outlines several challenging issues and future research directions.
    Abstract views: 219 | PDF downloads: 172
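    One of the simplest feature-extraction-based pipelines covered by clustering surveys such as the one above is to summarize each multivariate series with per-variable statistics and cluster the resulting fixed-length vectors. The sketch below does exactly that with k-means on synthetic series; it represents only one of the algorithm families discussed, under entirely illustrative data.
        # Feature-based clustering of multivariate time series with k-means.
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        # 40 series, each with 3 variables observed over 100 time steps.
        series = np.concatenate([
            rng.standard_normal((20, 100, 3)),           # one regime
            rng.standard_normal((20, 100, 3)) + 3.0,     # a shifted regime
        ])

        def featurize(x):
            # Per-variable mean and standard deviation -> fixed-length feature vector.
            return np.concatenate([x.mean(axis=0), x.std(axis=0)])

        features = np.array([featurize(s) for s in series])
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
        print(labels)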
    Fusion of Global Enhancement and Local Attention Features for Expression Recognition Network
    LIU Juan, WANG Ying, HU Min, HUANG Zhong
    Journal of Frontiers of Computer Science and Technology    2024, 18 (9): 2487-2500.   DOI: 10.3778/j.issn.1673-9418.2307013
    To suppress the effects of factors such as occlusions and posture variations on facial expression recognition in natural scenes, an expression recognition network fusing global enhancement and local attention features (GE-LA) is proposed. Firstly, to acquire enhanced global context information, a channel-spatial global feature enhancement structure is constructed, which uses a channel flow module (CFM) and a spatial flow module (SFM) to obtain symmetric multi-scale channel semantics and pixel-level spatial semantics, respectively, and combines the two types of semantics to generate globally enhanced features. Secondly, to extract local detail features, the efficient channel attention (ECA) mechanism is extended into a channel-spatial attention (CSA) mechanism, and a local attention module (LAM) is built on it to obtain high-level channel and spatial semantics. Finally, to strengthen the robustness of the proposed network against factors such as occlusions and posture variations, an adaptive strategy is designed to obtain a weighted fusion of the global enhancement features and the local attention features, and expression classification is performed on the adaptively fused features. Experimental results on the natural-scene facial expression datasets RAF-DB and FERPlus show that the expression recognition rates of the proposed network are 89.82% and 89.93%, respectively, which are 13.39 percentage points and 10.62 percentage points higher than those of the baseline network ResNet50. Compared with related methods, the proposed method reduces the influence of occlusions and posture variations and achieves better expression recognition performance in natural scenes.
    Abstract views: 214 | PDF downloads: 157
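    The channel-spatial attention (CSA) idea in the GE-LA entry above can be pictured as two lightweight gates applied in sequence: a channel gate built from globally pooled channel statistics and a spatial gate built from channel-pooled maps. The PyTorch sketch below is a generic module in that spirit, not the paper's exact CSA or LAM design.
        # Generic channel-then-spatial attention gate (illustrative, not the paper's CSA).
        import torch
        import torch.nn as nn

        class ChannelSpatialAttention(nn.Module):
            def __init__(self, channels, reduction=16):
                super().__init__()
                self.channel_gate = nn.Sequential(
                    nn.AdaptiveAvgPool2d(1),                     # global spatial pooling
                    nn.Conv2d(channels, channels // reduction, 1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels // reduction, channels, 1),
                    nn.Sigmoid(),
                )
                self.spatial_gate = nn.Sequential(
                    nn.Conv2d(2, 1, kernel_size=7, padding=3),   # acts on channel-pooled maps
                    nn.Sigmoid(),
                )

            def forward(self, x):
                x = x * self.channel_gate(x)                     # reweight channels
                pooled = torch.cat([x.mean(dim=1, keepdim=True),
                                    x.max(dim=1, keepdim=True).values], dim=1)
                return x * self.spatial_gate(pooled)             # reweight spatial positions

        print(ChannelSpatialAttention(64)(torch.randn(2, 64, 56, 56)).shape)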
    Survey of NLP Data Augmentation Methods Based on Large Language Models
    XU Delong, LIN Min, WANG Yurong, ZHANG Shujun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (6): 1395-1413.   DOI: 10.3778/j.issn.1673-9418.2410054
    Currently, large language models show great potential in the field of natural language processing (NLP), but their training relies on large numbers of high-quality samples. In low-resource scenarios, the available data samples can hardly support the convergence of model training as model size keeps increasing, and this problem has prompted researchers in related fields to investigate data augmentation methods. However, traditional data augmentation methods suffer from limited applicability and data distortion in the context of large NLP models. In contrast, data augmentation methods based on large language models can address this challenge more effectively. This paper offers a comprehensive exploration of data augmentation methods based on large language models in the current NLP field. Firstly, the development history of traditional data augmentation methods and of large language models in NLP is reviewed. Then, the large language model based data augmentation methods currently used in NLP are summarized, and the scope of application, advantages and limitations of each method are discussed in depth. Subsequently, evaluation methods for data augmentation in NLP are introduced. Finally, through comparative experiments and analysis of current methods, future research directions for large language model based data augmentation in NLP are discussed, and prospective suggestions are made.
    Abstract views: 214 | PDF downloads: 237
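    The LLM-based augmentation family surveyed in the entry above typically works by asking a large model to paraphrase or expand existing labeled examples while keeping the label fixed. The sketch below shows only the shape of that loop; llm_paraphrase is a hypothetical stub standing in for a call to whatever model is used, and the seed example is invented.
        # Label-preserving paraphrase augmentation loop (schematic).
        def llm_paraphrase(text, n=2):
            # Hypothetical stub: a real pipeline would prompt a large language model
            # for n paraphrases of `text`; here we return trivial placeholders.
            return [f"{text} (paraphrase {i + 1})" for i in range(n)]

        def augment(dataset, n_per_example=2):
            augmented = list(dataset)
            for text, label in dataset:
                for variant in llm_paraphrase(text, n=n_per_example):
                    augmented.append((variant, label))           # label is carried over unchanged
            return augmented

        seed_data = [("The service was quick and friendly.", "positive")]
        for text, label in augment(seed_data):
            print(label, "|", text)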