Most Read articles

    Published in last 1 year
    Comprehensive Review of Physics-Guided Deep Learning: Advancements, Challenges, and Perspectives
    CHEN Chong, ZHU Xiaoyu, WANG Fang, XU Yaqian, ZHANG Wei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 277-294.   DOI: 10.3778/j.issn.1673-9418.2407056
    Although deep learning has achieved remarkable results on nonlinear and high-dimensional problems, it faces challenges in complex scientific and engineering domains, such as high computational cost and data requirements, the difficulty of interpreting its black-box nature, and the lack of guarantees that learned models follow physical laws. To address these issues, a framework called physics-guided deep learning has emerged, which enhances the performance, explainability, and physical consistency of deep learning by integrating domain-specific physical knowledge into the construction and training of deep learning models. This paper thoroughly reviews and analyzes research on physics-guided deep learning, covering both methodologies and applications. Firstly, the main motivations and theoretical foundations of physics-guided deep learning are introduced. Secondly, a detailed discussion is conducted on its two modes: the combination of physical information with deep learning and the fusion of physical information with deep learning; the characteristics, limitations, and application scenarios of the two modes are summarized. Finally, the performance of physics-guided deep learning across various applications is analyzed, and its challenges are discussed from four perspectives: computational complexity and convergence, biases introduced when incorporating governing equations, dependence on observational data, and difficulties in knowledge fusion. On this basis, an outlook on future directions for this domain is provided. This paper strives to provide a research reference and multidimensional perspectives on physics-guided deep learning for researchers.
    Abstract views: 541 | PDF downloads: 364
    Review of Neural Network Lightweight
    DUAN Yuchen, FANG Zhenyu, ZHENG Jiangbin
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 835-853.   DOI: 10.3778/j.issn.1673-9418.2403071
    With the continuous progress of deep learning technology, artificial neural network models have shown unprecedented performance in many fields, such as image recognition, natural language processing, and autonomous driving. These models often have millions or even billions of parameters and learn complex feature representations from large amounts of training data. However, in resource-constrained environments such as mobile devices, embedded systems, and other edge computing scenarios, power consumption, memory usage, and computing efficiency limit the deployment of large-scale neural network models. To solve this problem, researchers have proposed a variety of model compression techniques, such as pruning, knowledge distillation, neural architecture search (NAS), quantization, and low-rank decomposition, which aim to reduce the number of parameters, the computational complexity, and the storage requirements of a model while preserving its accuracy as much as possible. This paper systematically introduces the development of these model compression methods, focusing on the main principles and key technologies of each: the different strategies of pruning, such as structured and unstructured pruning; how knowledge is defined in knowledge distillation; the search space, search algorithms, and network performance evaluation in NAS; post-training quantization and in-training quantization; and singular value decomposition and tensor decomposition in low-rank decomposition. Finally, future development directions of model compression technology are discussed.
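As one concrete instance of the pruning strategies surveyed above, unstructured magnitude pruning simply zeroes the smallest-magnitude weights. The sketch below is a minimal illustration (a flat weight list, with no retraining step), not a method from any specific paper reviewed:

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude fraction `sparsity` of the
    weights (unstructured magnitude pruning). Ties at the threshold are
    pruned as well."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest weight.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

In practice pruning is usually followed by fine-tuning to recover accuracy, and structured variants remove whole channels or filters instead of individual weights so that standard hardware benefits from the sparsity.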
    Abstract views: 515 | PDF downloads: 330
    Review of One-Stage Universal Object Detection Algorithms in Deep Learning
    WANG Ning, ZHI Min
    Journal of Frontiers of Computer Science and Technology    2025, 19 (5): 1115-1140.   DOI: 10.3778/j.issn.1673-9418.2411032
    In recent years, object detection, a core task in computer vision, has become a hot research direction. It enables computers to recognize and locate target objects in images or video frames, and is widely used in fields such as autonomous driving, biological individual detection, agricultural detection, and medical image analysis. With the development of deep learning, general object detection has shifted from traditional methods to deep learning-based methods, which are mainly divided into one-stage and two-stage object detection. Taking one-stage object detection as its starting point, this paper analyzes and summarizes the mainstream one-stage detection algorithms across two architectures, classical convolution and Transformer: the YOLO series (YOLOv1 to YOLOv11 and major improved variants), SSD, and the Transformer-based DETR series. It introduces the network structure and research progress of each algorithm; summarizes their characteristics, advantages, and limitations based on their structures; surveys the main common datasets and evaluation metrics in object detection; analyzes the performance of the algorithms and their improved variants; discusses the application status of these algorithms in different fields; and finally looks forward to future research directions for one-stage object detection algorithms.
    Abstract views: 500 | PDF downloads: 393
    Survey of Transformer-Based Model for Time Series Forecasting
    MENG Xiangfu, SHI Haoyuan
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 45-64.   DOI: 10.3778/j.issn.1673-9418.2403070
    Time series forecasting (TSF) refers to predicting future values and trends at specific time points or over time periods by analyzing latent information, such as trends and seasonality, in historical data. Time series data, often generated by sensors, play a significant role in numerous fields, including finance, healthcare, energy, transportation, and meteorology. With the spread of IoT sensors, the massive volumes of time series data produced are difficult to handle with traditional machine learning techniques. The Transformer model, which has shown excellent performance across tasks in natural language processing and computer vision, has been effectively adapted by researchers to capture long-term dependencies, leading to rapid advances in time series forecasting. This paper therefore reviews time series forecasting methods based on the Transformer model. It chronologically outlines the development of time series forecasting, systematically introduces preprocessing procedures and methods for time series data, and presents commonly used evaluation metrics and datasets. Focusing on algorithmic frameworks, it explains the application methods and working principles of various Transformer-based models in TSF tasks. Through experiments, it compares the performance, advantages, and limitations of different models and analyzes the results. Finally, considering the challenges in current work on Transformer models for time series forecasting, it proposes future development trends for this direction.
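A typical preprocessing step referenced above, turning a raw series into supervised (lookback, horizon) pairs for a forecaster, along with the standard MSE/MAE evaluation metrics, can be sketched as follows. This is a generic illustration, not a procedure from any specific model in the survey:

```python
def sliding_windows(series, lookback, horizon):
    """Split a series into (input window, forecast target) training pairs."""
    pairs = []
    for i in range(len(series) - lookback - horizon + 1):
        pairs.append((series[i:i + lookback],
                      series[i + lookback:i + lookback + horizon]))
    return pairs

def mse(pred, true):
    """Mean squared error, a standard TSF evaluation metric."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def mae(pred, true):
    """Mean absolute error, robust to occasional large deviations."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)
```

Transformer-based forecasters consume the lookback window (after normalization and positional encoding) and are trained to emit the full horizon, with MSE/MAE reported on held-out windows.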
    Abstract views: 461 | PDF downloads: 354
    Spatial-Frequency Domain Adaptive Graph Neural Network for Heterophilic Social Networks
    ZHANG Lanze, GU Yijun, PENG Jingjie
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 169-186.   DOI: 10.3778/j.issn.1673-9418.2310047
    Traditional GNNs rely on the homophily assumption, implementing low-pass filtering over neighboring nodes to aggregate and embed neighborhood similarity information. In heterophilic graphs, however, nodes of different categories are densely connected to each other, while nodes of the same category are far apart in the graph topology. This characteristic causes two problems for traditional GNNs that aggregate information from the proximal neighborhood: missing information aggregation from distant nodes, and failure of the homophily assumption. This paper therefore designs a heterophilic graph neural network (DA-HGNN) that fuses spatial-domain and frequency-domain adaptive embedding mechanisms to solve these problems. For the first problem, it designs a distant spatial-domain embedding module that supplements cross-neighbor adaptive message passing by selecting and aggregating distant similar nodes via high-order random walk transition probabilities. For the second, it develops a proximal frequency-domain embedding module that separates high-frequency and low-frequency signals with filters, and designs a frequency-domain-guided attention mechanism that adaptively integrates this information according to frequency preferences, reducing the noise introduced by the failure of the homophily assumption. The method achieves the best experimental results on 4 publicly available heterophilic graph datasets, with an average accuracy gain of 6.41 percentage points. Sensitivity analysis and ablation experiments describe the hyperparameter selection mechanism and the actual contribution of each module, and verify the positive correlation among node structural similarity, node attribute vector similarity, and node homophily in heterophilic networks. Finally, the effectiveness of fraud detection is validated on a real-world heterophilic dataset, achieving an improvement of 4.4 percentage points in the AUC metric.
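The separation of low- and high-frequency graph signals mentioned above can be illustrated with the simplest pair of filters: a row-normalized adjacency acting as a low-pass (neighbor-averaging) filter, and its Laplacian-style complement acting as a high-pass filter. This toy sketch only shows why high-pass filtering helps under heterophily; it is not the DA-HGNN module itself:

```python
def row_normalize(A):
    """Row-normalize an adjacency matrix given as a list of lists."""
    out = []
    for row in A:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def low_pass(A, x):
    """Neighbor averaging: smooths the signal (keeps low frequencies)."""
    return matvec(row_normalize(A), x)

def high_pass(A, x):
    """Laplacian-style filter x - A_norm @ x: keeps sign-alternating
    (high-frequency) components, which dominate in heterophilic graphs."""
    lp = low_pass(A, x)
    return [xi - li for xi, li in zip(x, lp)]
```

On a two-node graph whose neighbors disagree (a heterophilic signal such as [1, -1]), the low-pass output flips each node toward its neighbor while the high-pass output preserves and amplifies the disagreement; on an all-agree signal the high-pass response vanishes.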
    Abstract views: 459 | PDF downloads: 68
    Review of Research on CNN and Visual Transformer Hybrid Models in Image Processing
    GUO Jialin, ZHI Min, YIN Yanjun, GE Xiangwei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 30-44.   DOI: 10.3778/j.issn.1673-9418.2403009
    Convolutional neural network (CNN) and vision Transformer are two important deep learning models in image processing, and after years of continuous research they have made remarkable achievements in this field. In recent years, hybrid models combining CNN and vision Transformer have gradually emerged; extensive research has steadily overcome the weaknesses of the two models while effectively exploiting their respective strengths, showing excellent results in image processing tasks. This paper focuses on these hybrid models. First, the architectures, advantages, and disadvantages of the CNN and vision Transformer models are summarized, along with the concept and advantages of hybrid models. Secondly, the research status and progress of hybrid models are comprehensively reviewed from four aspects: serial fusion structures, parallel fusion structures, hierarchical cross-fusion structures, and other fusion modes; the main representative models of each fusion mode are summarized, and typical hybrid models are compared from multiple aspects. Then, the application of hybrid models in specific image processing fields, such as image recognition, image classification, object detection, and image segmentation, is described from multiple perspectives, showing the applicability and efficiency of hybrid models in practice. Finally, future research directions for hybrid models are analyzed in depth, and their future research and application in image processing are prospected.
    Abstract views: 447 | PDF downloads: 268
    Multimodal Unsupervised Entity Alignment Approach with Progressive Strategies
    MA He, WANG Hairong, WANG Yiyan, SUN Chong, ZHOU Beijing
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 245-252.   DOI: 10.3778/j.issn.1673-9418.2310100
    Although current entity alignment methods utilize the structural information between entities in knowledge graphs and achieve good alignment results, they ignore the large amount of side information attached to entities. This information has unique characteristics and can significantly enhance alignment. This paper analyzes the usefulness of entity profile information for entity alignment and proposes an unsupervised entity alignment method that fuses visual and textual information. The method enhances entity feature representations by fusing the literal and visual information of each entity; uses a two-way threshold nearest-neighbor algorithm to filter out entity pairs whose distance is too large; uses a progressive strategy that dynamically increases the similarity threshold to control both the quality and the speed of generating aligned entity pairs; and refines the results with an assignment algorithm to optimize the progressive strategy. To validate the proposed method, experiments are conducted on three sub-datasets of the DBP15K dataset, i.e., ZH_EN, JA_EN, and FR_EN, and the results are compared with 10 methods including PSR, EVA, and DATTI. Experimental results show that Hits@1 reaches 95.7% and 97.4% on the ZH_EN and JA_EN datasets respectively, and Hits@10 reaches 99.9% on the FR_EN dataset, demonstrating the excellent performance of the proposed method.
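The two-way (bidirectional) filtering idea described above can be sketched as mutual nearest-neighbor matching with a similarity threshold. This is a simplified illustration under an assumed input (a dense similarity matrix), not the paper's exact algorithm:

```python
def mutual_nearest_pairs(sim, threshold):
    """sim[i][j] is the similarity between source entity i and target entity j.
    Keep (i, j) only if each is the other's best match and the score clears
    the threshold, so low-confidence candidate pairs are filtered out."""
    n_src, n_tgt = len(sim), len(sim[0])
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    return [(i, best_tgt[i]) for i in range(n_src)
            if best_src[best_tgt[i]] == i and sim[i][best_tgt[i]] >= threshold]
```

A progressive strategy such as the one in the paper would start with a high threshold, accept the resulting pairs as anchors, and then relax or re-raise the threshold over iterations to trade generation speed against pair quality.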
    Abstract views: 391 | PDF downloads: 99
    Survey on Construction Method of Temporal Knowledge Graph
    LU Jiamin, ZHANG Jing, FENG Jun, AN Qi
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 295-315.   DOI: 10.3778/j.issn.1673-9418.2406089
    As a bridge connecting data, knowledge, and intelligence, knowledge graphs have been widely applied in fields such as search assistance, intelligent recommendation, question-answering systems, and natural language processing. However, as application scenarios expand, static knowledge graphs have shown limitations in handling dynamic knowledge. Temporal knowledge graphs address this shortcoming by integrating temporal information into the graph structure, enabling a more accurate representation of dynamic changes in knowledge. This paper provides a comprehensive study of temporal knowledge graph construction. It begins by introducing the concept of the temporal knowledge graph and clarifying its value in handling dynamic knowledge. It then delves into the construction process, dividing it into three key stages: knowledge extraction, knowledge fusion, and knowledge computing. Each stage is thoroughly organized and detailed with task definitions, research summaries, and the application of large language models. The knowledge extraction stage covers named entity recognition, relation extraction, and temporal information extraction; the fusion stage discusses entity alignment and entity linking; and the computing stage focuses on knowledge reasoning. Finally, the challenges faced at each stage are explored and future research directions are discussed.
    Abstract views: 360 | PDF downloads: 255
    Survey on Intelligent Identification of Constitution in Traditional Chinese Medicine
    LIANG Jiexin, FENG Yue, LI Jianzhong, CHEN Tao, LIN Zhuosheng, HE Ying, WANG Songbai
    Journal of Frontiers of Computer Science and Technology    2025, 19 (6): 1455-1475.   DOI: 10.3778/j.issn.1673-9418.2406102
    Traditional Chinese medicine (TCM) has thousands of years of experience in preventing diseases, and TCM constitution, as an important part of TCM, is closely related to individual health and thus plays an important role in disease prevention and treatment. In recent years, the rapid development of information technology and artificial intelligence has promoted the widespread application of intelligent technologies in TCM constitution identification. These technologies not only make the traditional identification process more scientific and systematic, but also provide strong technical support for the modernization of TCM and for personalized medicine, aiming to further improve the accuracy and efficiency of TCM constitution identification. To promote research on intelligent identification of TCM constitution, this paper sorts out and summarizes the research progress of its methods. Firstly, a systematic overview of data analysis-based constitution identification methods is given from the data level. Secondly, traditional machine learning-based methods are reviewed, discussed, and compared from the perspective of classifiers. Lastly, deep learning-based methods are elaborated and categorized into early neural networks, convolutional neural networks, hybrid networks, and other methods from the perspective of network architecture. Each of the three classes of methods is analyzed according to its research approaches and results, their advantages and limitations are compared, and potential development trends for future research are discussed.
    Abstract views: 352 | PDF downloads: 83
    Review of PCB Defect Detection Algorithm Based on Machine Vision
    YANG Sinian, CAO Lijia, YANG Yang, GUO Chuandong
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 901-915.   DOI: 10.3778/j.issn.1673-9418.2409061
    As a core component of electronic products, the printed circuit board (PCB) directly affects product reliability through its quality. As electronic products become lighter, thinner, and more sophisticated, machine vision-based PCB defect detection faces challenges such as the difficulty of detecting tiny defects. To advance research on PCB defect detection, this paper discusses the algorithms of each stage in detail according to their development history. Firstly, the main challenges in the field are pointed out, and traditional PCB defect detection methods and their limitations are introduced. Then, from the perspectives of traditional machine learning and deep learning, recent PCB defect detection methods and their advantages and disadvantages are systematically reviewed. Next, the commonly used evaluation metrics and mainstream datasets for PCB defect detection are summarized; the performance of the latest research methods of the past three years on the PCB-Defect, DeepPCB, and HRIPCB datasets is compared, and the reasons for the differences are analyzed. Finally, based on the current situation and the problems still to be solved, future development trends are prospected.
    Abstract views: 340 | PDF downloads: 189
    Survey on Applications of AIGC in Multimodal Scenarios
    YUE Qi, ZHANG Chenkang
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 79-96.   DOI: 10.3778/j.issn.1673-9418.2404009
    Although artificial intelligence generated content (AIGC) has achieved excellent results in unimodal applications, using artificial intelligence to generate text, images, videos, and other content, a unimodal feature representation can hardly capture the complete information of a phenomenon. To give AIGC greater generative capability, scholars have proposed incorporating multimodal information into AIGC to improve the learning performance and generative capability of models. By processing and integrating multiple modalities, AIGC acquires richer contextual information, which helps models better understand and generate content. This paper discusses in detail the basic architectures, working principles, and challenges of AIGC in dealing with multimodal problems, and classifies and summarizes recent AIGC models that incorporate multimodal information. The applications, challenges, and development directions of AIGC in multimodal image generation, video generation, and 3D shape generation are summarized. For image generation, the applications and limitations of generative adversarial network (GAN) models and diffusion models are discussed. For video generation, diffusion-based video generation is analyzed and joint audio-video generation methods are discussed. For 3D shape generation, methods guided by diffusion models and neural networks are discussed. Finally, the challenges faced by AIGC in multimodal applications are presented, and future research is prospected.
    Abstract views: 332 | PDF downloads: 200
    Research on Lightweight Model of Multi-person Pose Estimation Based on Improved YOLOv8s-Pose
    FU Yu, GAO Shuhui
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 682-692.   DOI: 10.3778/j.issn.1673-9418.2403059
    To address the high computational load and slow detection speed of existing human pose estimation models, this paper proposes a lightweight improved algorithm based on the YOLOv8s-Pose model. Firstly, a lightweight module, C2f-GhostNetBottleNeckV2, is introduced into the backbone to replace the original C2f, reducing the number of parameters. The Non_Local attention mechanism is also introduced to integrate the position information of human key points in the image into the channel dimension, enhancing feature extraction efficiency and mitigating the accuracy degradation that often follows model lightweighting. Furthermore, a weighted bidirectional feature pyramid network is incorporated into the neck layer to improve the model’s feature fusion capability, ensuring a good balance when processing features of different scales, and a small-object detection head is added to reduce missed detections of small objects. Lastly, the CIoU loss function is replaced with Focal-EIoU to enhance the accuracy of human key point regression. Experimental results show that the improved model reduces the number of parameters by 9.3% and, compared with the original model on the COCO2017 human key points dataset, achieves improvements of 0.4 percentage points in mAP@0.50 and 0.6 percentage points in mAP@0.50:0.95. The proposed lightweight algorithm therefore not only reduces the number of model parameters but also enhances the accuracy of human pose estimation, especially for small targets, providing an effective means of achieving real-time and accurate pose estimation.
    Abstract views: 324 | PDF downloads: 246
    Survey of NLP Data Augmentation Methods Based on Large Language Models
    XU Delong, LIN Min, WANG Yurong, ZHANG Shujun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (6): 1395-1413.   DOI: 10.3778/j.issn.1673-9418.2410054
    Large language models currently show great potential in natural language processing (NLP), but their training relies on large numbers of high-quality samples. In low-resource scenarios, the available data can hardly support training convergence as model size keeps increasing, a problem that has inspired research on data augmentation. However, in the context of large models in NLP, traditional data augmentation methods have limited applicability and suffer from data distortion, whereas augmentation methods based on large language models can address this challenge more effectively. This paper offers a comprehensive exploration of data augmentation methods based on large language models in the current NLP field. Firstly, the development history of traditional data augmentation methods and of large language models in NLP is reviewed. Then, the current large language model-based data augmentation methods in NLP are summarized, and the scope of application, advantages, and limitations of each method are discussed in depth. Subsequently, evaluation methods for data augmentation in NLP are introduced. Finally, through comparative experiments and analyses of current methods, future research directions for large language model-based data augmentation in NLP are discussed, and prospective suggestions are made.
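For contrast with the LLM-based methods surveyed, one of the simplest traditional augmentation operations, random token swapping, can be sketched as follows. The exact operation set varies across the literature; this is a generic illustration of a label-preserving perturbation, and its weakness (it can garble meaning) is precisely the data-distortion problem noted above:

```python
import random

def random_swap(tokens, n_swaps, seed=None):
    """Return a copy of `tokens` with `n_swaps` random position swaps,
    a simple label-preserving perturbation from traditional text
    augmentation. The token multiset is unchanged; only order varies."""
    rng = random.Random(seed)
    out = list(tokens)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(out)), rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out
```

LLM-based augmentation instead prompts a model to paraphrase or generate new labeled samples, producing fluent text at the cost of needing quality filtering.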
    Abstract views: 303 | PDF downloads: 282
    Survey of Entity Relation Extraction Based on Large Language Models
    XIA Jianglan, LI Yanling, GE Fengpei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (7): 1681-1698.   DOI: 10.3778/j.issn.1673-9418.2409086
    Entity relation extraction aims to identify entity pairs and their relationships from unstructured text, serving as the foundation for many downstream tasks in natural language processing. With the development of big data and deep learning technologies, significant progress has been made in entity relation extraction research. In recent years, applying large language models to this task has become a new research trend. Large language models, with their ability to automatically extract features and strong generalization capabilities, can significantly enhance the performance of the task. This paper provides a comprehensive review of entity relation extraction methods, categorizing them into two main types based on the evolution of techniques and models. Firstly, the definitions of named entity recognition and relation extraction tasks are introduced. Next, a systematic review of the development of entity relation extraction methods is presented, with an in-depth analysis of the advantages and disadvantages of the corresponding models. On this basis, this paper focuses on the unique advantages of large language model-based methods in addressing entity relation extraction tasks. Furthermore, the characteristics of current mainstream datasets are summarized, along with common evaluation metrics for entity relation extraction, such as precision, recall, and F1 score. Finally, the challenges in current research are analyzed, and future research directions are discussed.
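The evaluation metrics mentioned above, precision, recall, and F1 over extracted triples, reduce to a few lines. The (head, relation, tail) triple format below is an assumption for illustration:

```python
def precision_recall_f1(predicted, gold):
    """Micro-averaged scores over sets of (head, relation, tail) triples."""
    tp = len(predicted & gold)                     # correctly extracted triples
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```

Benchmarks differ on whether a triple counts as correct under exact entity-span match or relaxed match, so reported scores are only comparable under the same matching rule.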
    Abstract views: 294 | PDF downloads: 281
    Review of Multivariate Time Series Clustering Algorithms
    ZHENG Desheng, SUN Hanming, WANG Liyuan, DUAN Yaoxin, LI Xiaoyu
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 582-601.   DOI: 10.3778/j.issn.1673-9418.2405013
    Multivariate time series (MTS) data, a crucial basis for intelligent technologies across numerous domains, record the state changes of multiple variables in a system over time. Clustering, a core technique in data mining, can partition data into clusters based on structural similarity, uncovering the structure and internal relationships within data to reveal systemic development patterns and variable correlations. Faced with challenges such as the complexity of multivariate time series structures, the interconnections between variables, and the high dimensionality of the data, a substantial amount of research has been conducted internationally. This paper provides an overview of clustering analysis algorithms for multivariate time series data. First, based on classification criteria such as feature extraction methods, similarity measures, and clustering partition frameworks, it comparatively analyzes existing multivariate time series clustering algorithms. For each category, a detailed summary is provided, covering algorithm principles, representative methods, advantages and disadvantages, and the problems addressed. Common evaluation standards and publicly available datasets for multivariate time series clustering are then discussed. Lastly, from the perspective of the unique structure of multivariate temporal data, several challenging issues and future research directions are outlined.
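Among the similarity measures referenced above, dynamic time warping (DTW) is a standard choice for time series clustering because it aligns sequences that evolve at different speeds. Below is a minimal univariate sketch of the classic O(nm) dynamic program, without the multivariate extensions the survey covers:

```python
import math

def dtw(a, b):
    """Dynamic time warping distance via the classic O(n*m) DP:
    D[i][j] = |a[i]-b[j]| + min over the three predecessor cells."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # step in a only
                                 D[i][j - 1],      # step in b only
                                 D[i - 1][j - 1])  # step in both
    return D[n][m]
```

A distance matrix built from pairwise DTW values can feed hierarchical or medoid-based clustering directly; for the multivariate case, per-variable DTW distances are typically summed ("independent" DTW) or computed on vector-valued points ("dependent" DTW).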
    Abstract views: 293 | PDF downloads: 194
    Review of Research on Trajectory Prediction of Road Pedestrian Behavior
    YANG Zhiyong, GUO Jieru, GUO Zihang, ZHANG Ruixiang, ZHOU Yu
    Journal of Frontiers of Computer Science and Technology    2025, 19 (5): 1177-1197.   DOI: 10.3778/j.issn.1673-9418.2407029
    In path planning for spaces shared between autonomous vehicles and pedestrians, accurate and efficient pedestrian trajectory prediction is critical for road safety. Pedestrian trajectory prediction relies not only on historical behavior data but also on a comprehensive consideration of the complex dynamic interactions between pedestrians and vehicles, traffic infrastructure, and multi-directional traffic. Significant advances have been made in this field in recent years, making it a focal point of research. This paper provides a systematic review of current research. Firstly, it defines the core concepts of pedestrian trajectory prediction and analyzes the main prediction methods in depth. It then outlines the primary data sources for pedestrian behavior, including LiDAR, cameras, and other multimodal sensing devices, and explores key feature extraction methods, such as pedestrian motion features, contextual scene characteristics, and the influence of traffic infrastructure. Building on these data, it systematically reviews both physics-based and data-driven prediction approaches, focusing on the development of statistical models, deep learning models, and reinforcement learning models. Special emphasis is placed on deep learning methods, categorized by network architecture into sequential models, convolutional neural networks, graph convolutional networks, generative adversarial networks, and others. Commonly used datasets and evaluation metrics are also reviewed, with a thorough evaluation of current algorithmic performance. Finally, the paper addresses the challenges of pedestrian trajectory prediction for autonomous driving, particularly the dynamic coupling of pedestrians with multi-directional traffic and infrastructure, offering potential solutions and discussing future research directions.
    Abstract views: 287 | PDF downloads: 133
    Review on Key Techniques of Video Multimodal Sentiment Analysis
    DUAN Zongtao, HUANG Junchen, ZHU Xiaole
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 539-558.   DOI: 10.3778/j.issn.1673-9418.2404072
    Sentiment analysis is the process of automatically determining an opinion holder's attitude or emotional tendency. It is widely used in business, social media analysis, and public opinion monitoring. In unimodal sentiment analysis, most researchers use text, facial expressions, or audio information. With the development of deep learning, sentiment analysis has expanded from the unimodal to the multimodal field: combining multiple modalities can compensate for the limitations of a single modality and capture expressed emotions more accurately and comprehensively. This paper summarizes the key techniques of multimodal sentiment analysis on the basis of the three kinds of unimodal sentiment analysis. Firstly, the background and research status of multimodal sentiment analysis are briefly introduced. Secondly, the commonly used datasets are summarized. Then, unimodal sentiment analysis based on text, facial expression, and audio information is described. In addition, the key techniques of video multimodal sentiment analysis, including multimodal fusion, alignment, and modal noise processing, are analyzed, with a detailed discussion of the relationships among these techniques and their applications. Next, the performance of different models on three commonly used datasets is analyzed, further validating the effectiveness of these key techniques. Finally, the remaining challenges of multimodal sentiment analysis and future development trends are discussed.
    Reference | Related Articles | Metrics
    Abstract: 280
    PDF: 192
    Improved YOLOv8 Model for Multi-type Lung Nodule Detection
    BAO Qiangqiang, TANG Siyuan, LI Qingqian, WANG Naiyu, YANG Min, GU Yu, ZHAO Jinliang, GAO Jingbo, WANG Jiaxin, QU Yuhan
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 429-442.   DOI: 10.3778/j.issn.1673-9418.2406018
    Currently, lung nodule detection usually targets only a single type, solid nodules, yet different kinds of lung nodules correspond to multiple types of lung cancer, so multi-type detection can help improve the overall detection rate of lung cancer and enhance the cure rate. Targeted improvements to the YOLOv8 model are made to enable the detection of multiple types of lung nodules, including solid, mixed, and ground glass. Firstly, the RepViTCAA module is proposed to improve the C2f module of the backbone, enhancing the accuracy of tiny lung nodule detection and the lightweight design of the model. Secondly, the ECLA-HSFPN module is proposed to reconstruct the feature fusion part of the model to improve scale-invariant lung nodule detection accuracy. Then, the KAN network is integrated into the model to further improve the detection accuracy of tiny lung nodules and enhance the generalization ability of the model, drawing on the strong nonlinear feature learning ability of the KAN network. Finally, based on the Inner-IoU auxiliary-box idea, the CIoU loss function is improved to address the variable scale of lung nodules and enhance the model's detection accuracy. Tested on the LUNA16 dataset, the improved model improves on all evaluation indices compared with the original model and mainstream models such as YOLOv9 and RT-DETR. The improved model also shows better detection performance than the original model on a specialized dataset of four types (solid, ground glass, mixed, and microminiature) of lung nodules. Generalizability is tested on a mixed dataset of LUNA16 and local hospitals, where the improved model shows strong generalization ability. The improved model is more effective for the task of detecting multiple types of lung nodules and can accurately detect different types of lung nodules.
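As an illustration of the Inner-IoU auxiliary-box idea mentioned above, the following minimal sketch computes IoU over auxiliary boxes shrunk about each box centre; the ratio value and the (x1, y1, x2, y2) box format are illustrative assumptions, not the paper's exact formulation.

```python
def inner_iou(box1, box2, ratio=0.75):
    """Inner-IoU sketch: IoU over auxiliary boxes scaled by `ratio`
    about each box centre; a ratio below 1 yields faster-converging
    gradients for small targets such as tiny lung nodules."""
    def shrink(box):
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        hw, hh = (x2 - x1) * ratio / 2, (y2 - y1) * ratio / 2
        return (cx - hw, cy - hh, cx + hw, cy + hh)

    a, b = shrink(box1), shrink(box2)
    # intersection of the two auxiliary boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

The training loss would then combine 1 - inner_iou with the usual CIoU distance and aspect-ratio penalties.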
    Reference | Related Articles | Metrics
    Abstract: 273
    PDF: 175
    Review of False Information Detection Frameworks Based on Large Language Models
    ZHANG Xin, SUN Jingchao
    Journal of Frontiers of Computer Science and Technology    2025, 19 (6): 1414-1436.   DOI: 10.3778/j.issn.1673-9418.2411001
    Globally, the spread of false information on the Internet, especially on social media, has become an urgent issue to be addressed. With the rise of artificial intelligence technology, the application research of large language models in false information detection has become a hot topic. However, in China, related research in this field is relatively scarce and has not yet formed a complete system. To systematically review the current research status and development trends, this paper provides a comprehensive summary of the application of large language models in false information detection. This paper focuses on the false information detection framework based on large language models and deeply explores the innovative applications of large language models in data generation, data augmentation, information extraction, integration with external knowledge and tools, model improvement, final fusion decision-making, explanation and feedback generation during the false information detection process. It outlines the definition of false information and the background of its spread, elaborates on the core detection process in the framework, sorts out the innovation points in each link of the false information detection framework, summarizes the “internal” and “external” detection processes, and expounds on the model improvements such as retrieval enhancement, prompt engineering, fine-tuning, and final decision-making involved in the detection process. Finally, it analyzes the challenges faced by false information detection based on large language models at present and looks forward to future research directions, with the aim of providing references and inspirations for the development of false information detection based on large language models.
    Reference | Related Articles | Metrics
    Abstract: 267
    PDF: 260
    Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models
    HE Jing, SHEN Yang, XIE Runfeng
    Journal of Frontiers of Computer Science and Technology    2025, 19 (5): 1295-1301.   DOI: 10.3778/j.issn.1673-9418.2408080
    With the widespread application of large language models in natural language understanding and generation tasks, their performance in high-precision fields such as healthcare, law, and scientific research has received increasing attention. However, the hallucination phenomenon, as a common problem in large language models, greatly restricts their practical application in these fields. At present, there are significant shortcomings in the evaluation and optimization of hallucination phenomena in large language models. Firstly, there is a lack of high-quality, high-precision domain hallucination evaluation datasets. Secondly, most of the existing hallucination assessment methods rely on a single model, which fails to take full advantage of the differences between multiple models. Finally, there are significant differences in the performance of different models in terms of hallucination types and rates, and there is currently no effective method to reduce the hallucination phenomenon in models with high hallucination rates. This paper adopts a systematic process of dataset construction, swarm intelligence election, hallucination classification and quantification, and prior knowledge optimization to comprehensively evaluate and optimize the hallucination phenomenon of large language models in the field of medical question answering. Firstly, based on the publicly available dataset Huatuo, a large model hallucination evaluation dataset in the medical question answering field is constructed by combining GPT-generated question answers and manual annotation. Secondly, advanced large language models such as GPT4o, GPT4, ChatGLM4, Baichuan-13B, and Claude 3.5 are used to generate answers to questions in the dataset. Using a swarm-intelligence-based method, a LeaderAI is elected, which compares the answers of each model with reference answers to determine the hallucination rate of each model. Finally, hallucinations are further divided into two categories: factual hallucinations and fidelity hallucinations. The research results indicate that under the guidance of LeaderAI, the hallucination rate of the evaluated large models significantly decreases, especially the fidelity hallucination rate.
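The swarm-intelligence election step described above can be sketched as follows; the model names and the agreement test are hypothetical stand-ins for the paper's actual judging procedure.

```python
from itertools import combinations

def elect_leader(answers, agree):
    """Each model earns one point per peer whose answer it agrees with;
    the model with the highest mutual-agreement score is elected as the
    LeaderAI that then judges the other models' hallucination rates."""
    scores = {model: 0 for model in answers}
    for m1, m2 in combinations(answers, 2):
        if agree(answers[m1], answers[m2]):
            scores[m1] += 1
            scores[m2] += 1
    return max(scores, key=scores.get)
```

For example, with answers {"model-A": "Paris", "model-B": "Paris", "model-C": "Lyon"} and exact-match agreement, one of the two mutually agreeing models is elected.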
    Reference | Related Articles | Metrics
    Abstract: 267
    PDF: 105
    Design of University Research Management Question Answering System Integrating Knowledge Graph and Large Language Models
    WANG Yong, QIN Jiajun, HUANG Yourui, DENG Jiangzhou
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 107-117.   DOI: 10.3778/j.issn.1673-9418.2406009
    Scientific research management is a crucial aspect of university management. However, existing scientific research management systems cannot meet the individual needs of users. Oriented towards the intelligent transformation of university scientific research management, this paper combines knowledge graph, traditional models and large language models to jointly build a new university scientific research management question answering system. Firstly, scientific research knowledge is collected to build a scientific research knowledge graph. Then, a multi-task model is used for semantic parsing, simultaneously performing intent classification and entity extraction. Finally, the parsing results are used to generate query statements to retrieve information from the knowledge graph and answer general questions. Additionally, large language models are combined with the knowledge graph to assist in processing open questions. Experimental results on datasets with associated intents and entities show that the F1 values of the adopted multi-task model on intent classification and entity recognition tasks are 0.958 and 0.937, respectively, surpassing other comparison models and single-task models. The Cypher generation test demonstrates the effectiveness of the custom prompt in stimulating the emergent abilities of large language models. The accuracy of Cyphers generated from text using large language models reaches 85.8%, effectively handling open questions based on the knowledge graph. The accuracy of the question answering system built with the knowledge graph, traditional model and large language models is 0.935, which well meets the needs of intelligent question answering.
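The route from semantic parsing to graph retrieval can be sketched as below; the intent labels, node labels, and relation names are hypothetical illustrations, not the system's actual schema.

```python
# Hypothetical intent-to-Cypher templates; labels and relations are
# illustrative only, not the paper's actual knowledge graph schema.
CYPHER_TEMPLATES = {
    "papers_by_author": (
        "MATCH (a:Researcher {name: $name})-[:AUTHORED]->(p:Paper) "
        "RETURN p.title"
    ),
    "project_funding": "MATCH (pr:Project {title: $title}) RETURN pr.funding",
}

def build_query(intent, entities):
    """Fill the template selected by the intent classifier with the
    slots produced by the entity extractor; unknown intents fall
    through to the large language model as open questions."""
    template = CYPHER_TEMPLATES.get(intent)
    if template is None:
        return None
    return template, entities
```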
    Reference | Related Articles | Metrics
    Abstract: 260
    PDF: 173
    Cross-Modal Multi-level Feature Fusion for Semantic Segmentation of Remote Sensing Images
    LI Zhijie, CHENG Xin, LI Changhua, GAO Yuan, XUE Jingyu, JIE Jun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 989-1000.   DOI: 10.3778/j.issn.1673-9418.2403082
    Multimodal semantic segmentation networks can leverage complementary information from different modalities to improve segmentation accuracy. Thus, they are highly promising for land cover classification. However, existing multimodal remote sensing image semantic segmentation models often overlook the geometric shape information of deep features and fail to fully utilize multi-layer features before fusion. This results in insufficient cross-modal feature extraction and suboptimal fusion effects. To address these issues, a remote sensing image semantic segmentation model based on multimodal feature extraction and multi-layer feature fusion is proposed. By constructing a dual-branch encoder, the model can separately extract spectral information from remote sensing images and elevation information from normalized digital surface model (nDSM), and deeply explore the geometric shape information of the nDSM. Furthermore, a cross-layer enrichment module is introduced to refine and enhance each layer's features, making full use of multi-layer feature information from deep to shallow layers. The refined features are then processed through an attention feature fusion module for differential complementarity and cross-fusion, mitigating the differences between branch structures and fully exploiting the advantages of multimodal features, thereby improving the segmentation accuracy of remote sensing images. Experiments conducted on the ISPRS Vaihingen and Potsdam datasets demonstrate mF1 scores of 90.88% and 93.41%, respectively, and mean intersection over union (mIoU) scores of 83.49% and 87.85%, respectively. Compared with current mainstream algorithms, this model achieves more accurate semantic segmentation of remote sensing images.
    Reference | Related Articles | Metrics
    Abstract: 254
    PDF: 141
    Integrated Sensing, Communication and Computing: Key Technologies, Challenges, and Future Trends
    LIU Zhuang, WU Yuhe, CHEN Yuran, LIU Ruitong, DONG Yanning, ZHAO Jun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (9): 2273-2301.   DOI: 10.3778/j.issn.1673-9418.2412035
    In the construction of a future highly integrated physical and digital world, the deep integration of communication, sensing, and computing has become a key technology for next-generation intelligent networks. Focusing on integrated sensing, communication and computing (ISCC) technology, this paper systematically analyzes its theoretical and practical value. Starting from technological evolution and emerging requirements, the paper clarifies the key role of ISCC in enhancing system intelligence, reducing latency, and optimizing resource utilization, particularly its necessity in meeting emerging business requirements such as immersive extended reality (XR), holographic communication, and autonomous driving. The paper deeply explores the core technical architecture of ISCC, including wireless sensing, multimodal sensing, mobile edge computing, and the deep fusion mechanisms of sensing and communication, reveals its innovative application scenarios in digital twin networks, computing power networks, and space-air-ground integrated networks, demonstrates its advantages in high-precision sensing, efficient data processing, and real-time communication. The paper systematically examines the multi-dimensional challenges faced by ISCC technology in actual deployment, such as the complexity of system architecture design, optimization difficulties in air interface protocols, the dynamic nature of resource management and control, the severity of data security and privacy protection, and the complexity of multi-source interference management. The paper also provides a forward-looking perspective on future research directions, and emphasizes the importance of interdisciplinary theoretical innovation, standardization advancement, and systematic simulation validation.
    Reference | Related Articles | Metrics
    Abstract: 252
    PDF: 203
    Advances in Node Importance Ranking Based on Graph Neural Networks
    CAO Lu, DING Cangfeng, MA Lerong, YAN Zhaoyao, YOU Hao, HONG Anqi
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 877-900.   DOI: 10.3778/j.issn.1673-9418.2405056
    Node importance ranking is a critical task in graph analysis, as it plays a crucial role in identifying and prioritizing important nodes within a graph. Graph neural networks (GNNs) serve as an effective framework that leverages deep learning to directly comprehend the structural data of graphs, enabling comprehensive understanding of the internal patterns and deeper semantic features associated with nodes and edges. In the context of node importance ranking, GNNs can effectively harness graph structure information and node features to assess the significance of individual nodes. Compared with traditional node ranking methods, GNNs are better equipped to handle the diverse and intricate nature of graph structural data, capturing complex associations and semantic information between nodes while autonomously learning representations for node features. This reduces reliance on manual feature engineering, thereby enhancing accuracy in node importance ranking tasks. Consequently, approaches based on graph neural networks have emerged as the predominant direction for research into node importance. On this basis, this paper provides a classification of recent advancements in node ranking methods utilizing graph neural networks. This paper begins by revisiting core concepts related to node ranking, graph neural networks, and classical metrics for assessing node importance. It then summarizes recent developments in methods for evaluating node importance using graph neural networks. These techniques are categorized into four groups based on fundamental graph neural networks and their variants: basic GNNs, graph convolutional neural networks (GCNs), graph attention networks (GATs), and graph autoencoders (GAEs). Additionally, this paper analyzes the performance of these methods across various application domains, such as social networks, traffic networks, and knowledge graphs. Finally, it offers a comprehensive overview of existing research by analyzing time complexity along with advantages, limitations, and performance characteristics of current methodologies. Furthermore, it discusses future research directions based on identified shortcomings.
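As a concrete example of the classical importance metrics against which the GNN-based methods are benchmarked, degree centrality can be computed as follows; the edge-list representation is an illustrative assumption.

```python
def degree_centrality(edges, n):
    """Classical baseline: a node's importance is its degree,
    normalised by the maximum possible degree n - 1."""
    deg = [0] * n
    for u, v in edges:  # undirected edges as (u, v) index pairs
        deg[u] += 1
        deg[v] += 1
    return [d / (n - 1) for d in deg]
```

GNN-based rankers replace such hand-picked structural statistics with learned node representations.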
    Reference | Related Articles | Metrics
    Abstract: 242
    PDF: 141
    Survey of Multi-domain Machine Translation Methods for Fine-Tuning Large Models
    CHEN Zijian, WANG Siriguleng, SI Qintu
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 916-928.   DOI: 10.3778/j.issn.1673-9418.2410032
    With the rapid development of machine translation technology, machine translation methods based on pre-trained large models have occupied an important position in the field of natural language processing. However, due to the significant differences in language features, lexical styles and expressions between different domains, it is difficult for a single pre-trained model to achieve efficient and stable performance in multi-domain translation tasks. Therefore, this paper focuses on the key issues of large model fine-tuning technology in multi-domain machine translation tasks, systematically reviews the core principles, main methods and application effects of fine-tuning technology, and focuses on analyzing the performance and applicability scenarios of three types of strategies, namely full-parameter fine-tuning, parameter-efficient fine-tuning, and prompt-tuning. This paper discusses the advantages and limitations of different fine-tuning methods in depth, focusing on how to balance the domain generalization ability and task specificity through efficient fine-tuning strategies under resource-constrained conditions, and demonstrating the significant advantages of parameter-efficient fine-tuning and prompt-tuning in terms of resource utilization efficiency and domain adaptability. The practical effects of different fine-tuning strategies in terms of domain migration and resource utilization are further evaluated through comparative analysis and experimental validation, and their effectiveness is verified through case studies. Future research directions should focus on the efficient utilization of resources, the domain adaptive capability of models, and the improvement of translation quality and robustness, so as to promote the continuous development of multi-domain machine translation systems in terms of performance and adaptability.
    Reference | Related Articles | Metrics
    Abstract: 241
    PDF: 150
    Research Progress on Sequence Recommendation Based on Deep Learning and Large Language Model
    XU Fengru, LI Bohan, XU Shuai
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 344-366.   DOI: 10.3778/j.issn.1673-9418.2407090
    The recommendation system aims to alleviate information overload in information retrieval systems and is committed to recommending content matching users' personalized interests. Human interactions with a system occur in a certain order, and a recommender that takes this order into account when providing recommendations is a sequential recommendation system. The sequential recommendation system analyzes user behavior sequences, captures the dynamic changes of user preferences, and provides accurate personalized recommendation services for many fields such as e-commerce, social media and online videos. This paper provides an overview of the current research progress in sequential recommendation systems and explores their significance and application potential in the field of personalized recommendation. Firstly, the research problem of sequential recommendation is defined, and the core objectives and challenges of recommendation sequences are clarified. Next, the main techniques in sequential recommendation are summarized in detail, including: traditional methods based on Markov chains, which model user behavior sequences by relying on state transition probabilities; deep learning-driven methods, which utilize neural network models to capture long-term dependencies and complex patterns; hybrid models, which combine multiple algorithms to enhance the accuracy and robustness of recommendation systems; and emerging methods based on large language models, which improve the understanding of user behavior and recommendation content through the integration of pre-trained large language models. Finally, future research directions are discussed, with emphasis on the importance of context awareness, multimodal fusion, causal inference, domain-specific large language models for vertical fields, alleviating hallucinations, etc.
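The Markov-chain approach mentioned above can be sketched in a few lines: transition probabilities are estimated from user behavior sequences, and the most probable next items are recommended.

```python
from collections import defaultdict

def fit_transitions(sequences):
    """First-order Markov chain: count item-to-item transitions over
    user behavior sequences and normalise them to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for prev, nxts in counts.items()
    }

def recommend(trans, last_item, k=1):
    """Recommend the k most probable next items given the last one."""
    probs = trans.get(last_item, {})
    return sorted(probs, key=probs.get, reverse=True)[:k]
```

Deep learning and large language model methods replace this state-transition table with learned sequence representations that capture longer-range dependencies.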
    Reference | Related Articles | Metrics
    Abstract: 238
    PDF: 157
    Review of Research on Image Compression Techniques
    ZHOU Kaijun, LIAO Ting, TAN Ping, SHI Changfa
    Journal of Frontiers of Computer Science and Technology    2025, 19 (7): 1699-1728.   DOI: 10.3778/j.issn.1673-9418.2411036
    Image compression is a key technology in the fields of image processing and communications and has long been a research hotspot in academia. This paper systematically reviews the basic concepts and principles of image compression, distinguishing between lossless and lossy compression and introducing various encoding techniques. With regard to traditional compression methods, techniques based on the discrete cosine transform, discrete wavelet transform, vector quantization, and fractal compression are comprehensively analyzed, with discussions on their respective advantages, disadvantages, and applicable scenarios. Although these methods have played significant roles in the field of image compression, their limitations have gradually become apparent with further technological developments. In the context of deep learning-based image compression, this paper focuses on the application of convolutional neural networks, recurrent neural networks, generative adversarial networks, as well as the recently emerging Transformer and diffusion model approaches. These methods achieve more efficient compression and image reconstruction by automatically learning image features. In terms of performance evaluation, key metrics such as compression ratio, peak signal-to-noise ratio, and structural similarity index are analyzed, and this paper discusses both the application prospects and the challenges faced by image compression technologies in various fields. Finally, this paper outlines future development directions and research trends in image compression technology, suggesting that with the integration of deep learning and emerging technologies, intelligent image compression will become a crucial development direction in the future.
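Of the evaluation metrics mentioned above, peak signal-to-noise ratio is the simplest to state; the following sketch computes it for two equal-length pixel sequences (8-bit pixels assumed).

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE), in decibels; higher values mean
    the reconstruction is closer to the original image."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```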
    Reference | Related Articles | Metrics
    Abstract: 236
    PDF: 123
    Review of Smart Contract Vulnerability Detection and Repair Research
    LIU Zhexu, LI Leixiao, LIU Dongjiang, DU Jinze, LIN Hao, SHI Jianping
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 854-876.   DOI: 10.3778/j.issn.1673-9418.2405019
    The smart contract is a fundamental technology of blockchain, as it operates without the need for third-party authorities and can directly provide trusted customized services for users. It represents an important advancement in blockchain technology. As the application range of smart contracts continues to expand, ensuring their safe and reliable operation has become a pressing issue in the field of blockchain security. A research framework for smart contract vulnerability detection and repair is proposed, analyzing and summarizing the current research progress in four key aspects: vulnerability datasets, machine learning methods, vulnerability repair techniques, and patch deployment strategies. Firstly, this paper investigates machine learning-based smart contract vulnerability detection methods, comparing and summarizing 8 types of smart contract vulnerabilities, the current state of 15 open-source datasets, and the advantages and disadvantages of existing models, including traditional machine learning methods, deep learning approaches, and large models. Furthermore, a strategy for constructing high-quality smart contract vulnerability datasets is proposed, combining 5 types of vulnerability detection tools and confidence learning. The 5 types of vulnerability detection tools are symbolic execution, fuzz testing, taint analysis, formal verification, and integrated frameworks. Secondly, 3 categories of smart contract vulnerability repair solutions are systematically introduced: automated repair techniques, machine learning-based repair methods, and Ethereum enhancement technologies. A comprehensive comparison of different solutions is conducted, highlighting their respective advantages and limitations, along with an overview of relevant technologies that can be applied to smart contract vulnerability repair in the future. Finally, this paper analyzes existing security challenges in smart contracts and provides insights into future research directions.
    Reference | Related Articles | Metrics
    Abstract: 234
    PDF: 138
    Heterogeneous Information Network Embedding Learning Based on Attention: a Survey
    TU Jiaqi, ZHANG Hua, CHANG Xiaojie, WANG Ji, YUAN Shuhong
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 1-29.   DOI: 10.3778/j.issn.1673-9418.2404034
    In recent years, graph embedding learning has become one of the most commonly used techniques in the field of information network analysis. It embeds network objects into low-dimensional dense vector spaces while preserving network structure and content characteristics, and the learned embeddings are then applied to downstream analysis tasks. However, most real-world networks are heterogeneous information networks (HIN), which are composed of multiple object types, relationships between objects and content characteristics. Therefore, in order to learn more effective embeddings, researchers integrate attention mechanisms into the embedding learning of HIN to distinguish the degree of influence of different levels of heterogeneity on embedding. To this end, this paper reviews the existing attention-integrated HIN embedding learning models. Firstly, it comprehensively reviews the research progress of HIN embedding in the past five years, summarizes the three challenges faced in handling network heterogeneity: content heterogeneity, structure heterogeneity and semantic heterogeneity, and summarizes a general framework of attention-integrated models. Secondly, in view of the above challenges, the existing attention-integrated models are divided into three categories: meta-path-based, graph-neural-network-based and scenario-oriented, and various representative models are compared in detail. Then the common datasets, benchmark platform tools and evaluation indicators are introduced. Finally, future research directions of HIN embedding learning are discussed.
    Reference | Related Articles | Metrics
    Abstract: 230
    PDF: 164
    Brain Storm Optimization Algorithm Integrating Independent Thinking and Local Escaping
    JIA Heming, RAO Honghua, WU Di, XUE Bowen, WEN Changsheng, LI Yongchao
    Journal of Frontiers of Computer Science and Technology    2025, 19 (6): 1522-1539.   DOI: 10.3778/j.issn.1673-9418.2407113
    The brain storm optimization algorithm (BSO) is a swarm intelligence optimization algorithm proposed to simulate human brain thinking activities. Aiming at the problems of poor accuracy and weak optimization ability of the traditional brain storm optimization algorithm, which is prone to falling into local optima, an improved brain storm optimization algorithm (IBSO) that integrates independent thinking and local escaping is proposed. Firstly, an independent thinking strategy is proposed, which adds a threshold to determine whether to execute the independent thinking strategy when the algorithm is stuck in a local optimal solution. When the algorithm falls into a local optimum and cannot obtain a better solution, it uses the independent thinking strategy to find a new position, assisting the algorithm in seeking a better solution to escape from the local optimum. Secondly, the local escaping operator (LEO) strategy is adopted to enhance the algorithm's global exploration capability and improve its search efficiency. The optimization performance of the IBSO algorithm is tested using the CEC2014 and CEC2020 benchmark test functions, with comparative experiments against 8 optimization algorithms. The results indicate that the improved algorithm has stronger optimization ability, higher stability, and stronger global search capability. Finally, the latest engineering problem evaluation indicators are used to conduct testing experiments on two engineering problems, namely the design of a three-bar truss and the design of tension/compression springs, further verifying the practicality of the IBSO algorithm in engineering problems.
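The independent-thinking threshold described above can be illustrated with the following simplified loop; this is a sketch of the escape mechanism only, with assumed parameter values, not the full IBSO with its clustering and LEO steps.

```python
import random

def independent_thinking_search(f, dim, bounds, iters=200, stall_limit=10, seed=0):
    """When the best fitness has stalled for `stall_limit` iterations,
    a freshly sampled random position ('independent thinking') is tried
    instead of a local perturbation, to escape the local optimum."""
    rng = random.Random(seed)
    lo, hi = bounds
    best = [rng.uniform(lo, hi) for _ in range(dim)]
    best_fit = f(best)
    stall = 0
    for _ in range(iters):
        if stall >= stall_limit:
            cand = [rng.uniform(lo, hi) for _ in range(dim)]  # escape jump
            stall = 0
        else:
            cand = [x + rng.gauss(0, 0.1) for x in best]  # local search
        fit = f(cand)
        if fit < best_fit:
            best, best_fit = cand, fit
        else:
            stall += 1
    return best, best_fit
```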
    Reference | Related Articles | Metrics
    Abstract: 225
    PDF: 75
    Research on Robot Path Planning Based on Improved RRT-Connect Algorithm
    CHEN Zhilan, TANG Haoyang
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 396-405.   DOI: 10.3778/j.issn.1673-9418.2404005
    The proposed improved RRT-Connect algorithm (TRRT-Connect) addresses the issues of path elongation, excessive turns, and inadequate passability encountered in the standard RRT-Connect algorithm for path planning. Firstly, an improved RRT algorithm is employed to search for and add a middle root node, facilitating the simultaneous expansion of four random trees to expedite algorithm convergence. Additionally, a target-biased strategy is employed for random point selection, and an attractive field is superimposed on node generation, along with integration of a greedy search strategy. Furthermore, a novel dynamic step size adjustment method is introduced, which dynamically selects appropriate step sizes by identifying the number of obstacles within the scanning region. Then, a bidirectional pruning optimization method is applied to the generated initial paths to accelerate pruning efficiency and remove redundant nodes along the paths. Finally, path smoothing is conducted at path turning points, reducing the number of turns. Simulation comparative experiments are conducted in three different environmental maps. The results indicate that the TRRT-Connect algorithm shows significant improvements compared with the standard RRT-Connect algorithm in terms of path length, number of iterations, and number of nodes. The paths generated are smoother, with fewer turns, and passability in densely populated obstacle areas is better. Experimental results confirm the effectiveness of this algorithm. Moreover, the application of the TRRT-Connect algorithm in field instance simulations reduces the transportation path length of mobile robots by 15.4% compared with traditional fixed paths, with smoother paths, further confirming the practicality of the algorithm.
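The dynamic step-size idea can be sketched as follows; the inverse-count shrinking rule is an illustrative assumption, and the paper's actual schedule may differ.

```python
def dynamic_step(obstacles, node, radius, base_step):
    """Count obstacles inside the circular scanning region around the
    current node and shrink the expansion step as clutter grows."""
    n = sum(
        1 for ox, oy in obstacles
        if (ox - node[0]) ** 2 + (oy - node[1]) ** 2 <= radius ** 2
    )
    return base_step / (1 + n)  # more obstacles -> smaller, cautious steps
```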
    Reference | Related Articles | Metrics
    Abstract: 223
    PDF: 120
    Survey on Deep Learning Based Trajectory Similarity Measurement Approaches
    MENG Xiangfu, SHI Guangqi, ZHANG Xiaoyan, LENG Qiangkui, FANG Jinfeng
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 623-644.   DOI: 10.3778/j.issn.1673-9418.2404006
    The development and application of mobile communication and sensing device technology have generated a large amount of trajectory data. Such data present characteristics such as high-dimensional heterogeneity, multi-granularity, and uncertainty, making traditional trajectory similarity measurement methods based on point-pair matching difficult to apply. In recent years, deep learning techniques have been applied to measuring trajectory similarity, aiming to mine more trajectory features, improve computational efficiency, and enhance model robustness. This paper systematically reviews recent trajectory similarity measurement methods based on deep learning. Firstly, it explains the relevant definitions of trajectories. Then, it provides an overview of these methods from two perspectives: metric representation forms (i.e., sequence representation and graph representation) and learning strategies (i.e., representation learning, metric learning, and contrastive learning). Furthermore, it conducts a detailed comparative analysis of the implementation principles and characteristics of these methods from three aspects: trajectory data preprocessing, embedding representation learning, and similarity measurement. Afterwards, the commonly used datasets and evaluation metrics for deep learning-based trajectory similarity measurement are analyzed, and the sources, evaluation metrics, time complexity, and application scenarios of the learning models are summarized. Finally, it analyzes the challenges faced by trajectory similarity measurement methods and prospects future research directions.
    Small Object Detection Based on Enhanced Feature Pyramid and Focal-AIoU Loss
    SHI Yu, WANG Le, YAO Yepeng, MAO Guojun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 693-702.   DOI: 10.3778/j.issn.1673-9418.2403006
    Unmanned aerial vehicle (UAV) aerial images are characterized by small target scales and complex backgrounds, making it difficult to achieve satisfactory recognition accuracy by directly applying generic object detection methods to such images. Based on YOLOv8, this paper proposes a small object detection model called CFE-YOLO (cross-level feature-fusion enhanced-YOLO), which incorporates a feature enhancement network and a localized focal loss. Firstly, a cross-level feature-fusion enhanced pyramid network (CFEPN) is designed to improve the traditional feature pyramid structure by fusing attention feature maps: high-resolution feature maps from shallow networks are added and deep detection heads are removed to suit the requirements of small object detection. Secondly, a focal loss function based on area intersection over union is designed by combining the ideas of Complete-IoU and Focal loss, further improving the detection of small objects. Finally, a lightweight spatial pyramid pooling module is implemented by introducing depth-wise separable convolutions, maintaining the detection accuracy of the model while reducing the parameter count. Extensive experiments on the UAV datasets VisDrone and TinyPerson show that CFE-YOLO improves mAP0.50 by 4.72 and 5.58 percentage points respectively compared with the baseline, while reducing the parameter count by 37.74%. Furthermore, it achieves higher accuracy than other advanced algorithms.
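    The combination of an IoU-based regression loss with focal-style reweighting can be sketched as below. The paper's exact Focal-AIoU formulation is not reproduced in this abstract; the sketch follows the general Focal-EIoU reweighting idea (scaling the regression loss by IoU to a power), and all names and the choice of gamma are assumptions:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def focal_iou_loss(pred, gt, gamma=0.5):
    """Focal-style reweighting of the IoU loss: scaling by IoU**gamma
    emphasizes gradients from higher-quality (higher-IoU) boxes."""
    u = iou(pred, gt)
    return (u ** gamma) * (1.0 - u)
```

A complete loss would add center-distance and aspect-ratio penalties in the Complete-IoU style; only the focal reweighting of the overlap term is shown here.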
    Multiscale Difference Feature Enhancement Network for Remote Sensing Image Change Detection
    WANG Jie, JIANG Fusong, JIANG Peng
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 211-222.   DOI: 10.3778/j.issn.1673-9418.2401057
    Remote sensing image change detection aims to identify target differences between remote sensing images of different periods, and methods based on convolutional neural networks have made great progress on this task in recent years. However, the problem of pseudo-changes in images of different periods, caused by illumination and seasonal changes, remains difficult to solve. Meanwhile, most methods do not fully exploit multi-scale features, which limits model performance and accuracy to a certain degree. To address these problems, a multi-scale differential feature-enhanced change detection method is proposed. Firstly, a parallel encoding framework consisting of a twin (Siamese) network encoder and a differential network encoder is used to extract features at different levels, and diachronic features and differential features at the same level are spliced to establish a complementary relationship between them. Then, a differential feature enhancement module is introduced to obtain more discriminative feature maps as supplementary inputs to the differential network encoder, which enriches the change information and increases the model's attention to changed areas, enabling it to accurately distinguish real changes from pseudo-changes. Finally, to enhance the diversity and expressiveness of the features, a feature mismatch fusion module is used to achieve cross-fusion of semantic features, so that the semantic information in each feature interacts fully and distinctively. The F1 score of this method reaches 95.45% and 92.04% on the CDD and LEVIR-CD datasets respectively, and the intersection over union (IoU) reaches 92.26% and 82.93% respectively, outperforming the other eight mainstream methods; the experimental results prove the effectiveness of the method.
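    The splicing of same-level diachronic and differential features can be illustrated on toy single-channel feature maps. This is a minimal sketch of the complementary-feature idea only; the actual modules operate on multi-channel tensors with learned attention, and the names here are assumptions:

```python
def difference_features(feat_t1, feat_t2):
    """Element-wise absolute difference between the two periods' feature
    maps; large values mark candidate change regions."""
    return [[abs(a - b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(feat_t1, feat_t2)]

def fuse_complementary(feat_t1, feat_t2):
    """Splice diachronic features with their difference: each spatial
    location now carries the triple (t1, t2, |t1 - t2|), giving the
    decoder both appearance and change evidence."""
    diff = difference_features(feat_t1, feat_t2)
    return [[(a, b, d) for a, b, d in zip(r1, r2, rd)]
            for r1, r2, rd in zip(feat_t1, feat_t2, diff)]
```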
    Design of AGV Based on Autonomous Controllable Platform and Research on Visual Line-Following Algorithm
    XU Shizhu, GAO Jun, SUN Qiujun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 264-276.   DOI: 10.3778/j.issn.1673-9418.2311095
    This paper designs and implements an AGV based on an autonomous and controllable platform. The AGV meets the requirements of domestically produced core hardware and open-source software, and is compatible with multiple autonomous navigation algorithms. Moreover, the tracking accuracy and safety of the traditional pure pursuit algorithm are improved, and the improved algorithm is implemented on the AGV designed in this paper. Firstly, when the vehicle is far from the reference path, a Dubins path is introduced for fast convergence. Secondly, a look-ahead point offset algorithm is used to correct the tracking path on reference path sections with large curvature and reduce tracking errors. Thirdly, dynamic look-ahead distance and dynamic speed further reduce the tracking errors and safety risks on large-curvature sections. After completing the AGV design and algorithm improvement, real-environment experiments are carried out. At a speed of 0.1 m/s, the average deviation of the improved algorithm on small-curvature sections is reduced by 3.3% and 7.3% compared with two pure pursuit variants respectively, and on large-curvature sections by 8.34% and 23.06%. At a speed of 0.5 m/s, the average deviation on small-curvature sections is reduced by 9.08% and 11.33% respectively, and on large-curvature sections by 2.97% and 24.67%. The overall experimental results of the improved algorithm are better than those of the two variants; the AGV designed in this paper meets the requirements of visual line-following navigation in general environments and can be applied to a variety of practical scenarios.
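    The baseline being improved is the standard geometric pure pursuit law, which can be sketched as follows, together with a speed-scaled look-ahead in the spirit of the dynamic look-ahead distance above. The gains, the wheelbase, and the linear speed-scaling rule are illustrative assumptions, not the paper's parameters:

```python
import math

def pure_pursuit_steering(pose, lookahead_pt, wheelbase=0.3):
    """Standard pure pursuit: steer along the circular arc through the
    look-ahead point. pose = (x, y, heading in radians)."""
    x, y, th = pose
    dx, dy = lookahead_pt[0] - x, lookahead_pt[1] - y
    # express the look-ahead point in the vehicle frame
    lx = math.cos(th) * dx + math.sin(th) * dy
    ly = -math.sin(th) * dx + math.cos(th) * dy
    ld_sq = lx * lx + ly * ly
    curvature = 2.0 * ly / ld_sq  # arc curvature toward the point
    return math.atan(wheelbase * curvature)  # bicycle-model steering angle

def dynamic_lookahead(speed, gain=1.5, min_dist=0.3):
    """Speed-scaled look-ahead distance: short at low speed for tracking
    accuracy, longer at high speed for stability."""
    return max(min_dist, gain * speed)
```

A point dead ahead yields zero steering; a point to the left (positive lateral offset) yields a positive (left) steering command.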
    Review on Application of Machine Learning in Detecting Suicidal Ideation for Social Media Users
    MENG Xiuyang, WANG Shiyi, LI Dudu, WANG Chunling
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 559-581.   DOI: 10.3778/j.issn.1673-9418.2405086
    In recent years, social media platforms have emerged as a novel domain for individuals to express their emotions, including suicidal ideation, attempts, and behaviors. Consequently, these platforms have evolved into crucial data repositories and essential assessment criteria for detecting suicidal ideation. With the advent of artificial intelligence technology, the use of machine learning to detect suicidal ideation among social media users has become a prominent research topic. However, in China, related research is scarce and has not yet formed a comprehensive system. To systematically review the research status and development of suicidal ideation detection, this paper presents a comprehensive summary of machine learning technology for this task. Firstly, it provides an overview of the definition, process, commonly employed methods, and evaluation indicators for detecting suicidal ideation. Secondly, it comprehensively surveys suicidal ideation detection techniques, encompassing both traditional machine learning and deep learning approaches; the key methodologies, fundamental concepts, merits, and limitations of each method are thoroughly compared and analyzed. Furthermore, the urgent issues and innovative solutions in this field are summarized, with a particular focus on the application of large language models such as ChatGPT and multimodal models. Finally, the limitations of machine learning in suicidal ideation detection on social media are comprehensively discussed, and future research directions are proposed, in order to further promote a new paradigm of data-driven, human-computer collaborative, interdisciplinary, and cross-cultural suicidal ideation detection.
    Pedestrian Detection in Fisheye Images Based on Improved YOLOv8 Algorithm
    ZHU Yumin, SUN Guangling, MIAO Fei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 443-453.   DOI: 10.3778/j.issn.1673-9418.2404037
    To address the problems of inaccurate localization and insufficient detection accuracy of existing object detection algorithms for pedestrian detection in fisheye images, an improved YOLOv8 algorithm for fisheye image detection is proposed. The method designs the ProbIoU-r algorithm by adding angle parameters and uses a scaling factor to adjust the impact of the angle difference on the loss, enhancing the model's attention to the angular offset of the bounding box during gradient computation. This solves the problems of inaccurate localization and poor bounding box fitting of the original IoU in rotated object detection, giving the YOLOv8 network model a better ability to perceive rotated targets. To improve the model's feature extraction for distorted targets in fisheye images and raise detection accuracy, a Parnet-gcs module with multi-scale convolution and attention mechanisms as branches is proposed: feature information at different scales is extracted through DWConv with different convolution kernels, and the CA and SA modules are combined to enhance the model's feature expression ability. The experiments use the public fisheye image dataset WEPDTOF. The improved algorithm increases the detection accuracy mAP0.50:0.95 by 2.3 percentage points compared with the original YOLOv8s; the number of parameters is reduced by 38.8% compared with the YOLOv8m algorithm, while mAP0.50:0.95 is also 0.5 percentage points higher, indicating that the improved algorithm based on YOLOv8s is better suited to pedestrian detection tasks in fisheye images.
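    The angle-parameter idea, penalizing the orientation offset of a rotated box with a scaling factor, can be sketched generically. The ProbIoU-r formula itself is not given in this abstract, so the quadratic penalty, the π-periodic angle difference, and all names below are assumptions for illustration only:

```python
import math

def angle_diff(theta_pred, theta_gt):
    """Smallest rotation between two box orientations. The period is pi
    because a rectangle maps onto itself under a 180-degree rotation."""
    d = abs(theta_pred - theta_gt) % math.pi
    return min(d, math.pi - d)

def angle_scaled_penalty(theta_pred, theta_gt, scale=1.0):
    """Quadratic penalty on the orientation offset; `scale` plays the role
    of the scaling factor weighting the angle term in the total loss."""
    return scale * (angle_diff(theta_pred, theta_gt) / (math.pi / 2)) ** 2
```

With this normalization the penalty is 0 for aligned boxes and reaches `scale` at the worst-case 90-degree offset.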
    Research on Development Status of Multimodal Knowledge Graph Fusion Technology in Medical Field
    SHI Zhenpu, LYU Xiao, DONG Yanru, LIU Jing, WANG Xiaoyan
    Journal of Frontiers of Computer Science and Technology    2025, 19 (7): 1729-1746.   DOI: 10.3778/j.issn.1673-9418.2411008
    Multimodal knowledge graphs use text, visual, and other multimodal data to model entities, relationships, and events, demonstrating powerful data processing capabilities and providing richer, deeper understanding for the field of artificial intelligence. They have therefore attracted attention in the medical field and achieved significant results in research areas such as medical data processing and potential value mining. To clarify the research status of multimodal knowledge graphs in the medical field, this paper first elaborates on the fundamentals of multimodal knowledge graphs, the difficulties of constructing them in the medical field, and the related datasets. Secondly, it analyzes the key technologies involved in multimodal knowledge graph fusion, such as multimodal entity alignment and multimodal entity linking, from the perspectives of traditional methods and deep learning methods, focusing on feature extraction and fusion methods for the text, image, and audio modalities. The paper summarizes the advantages and disadvantages of each multimodal fusion method and elaborates on the application of multimodal large language models in multimodal fusion. Finally, it reviews the research progress of multimodal knowledge graphs in fields such as medical visual Q&A, drug development, and medical imaging diagnosis. On this basis, it analyzes the limitations and challenges faced by multimodal knowledge graphs in medical multimodal fusion and datasets, and points out future research directions.
    Multi-level Fusion Knowledge Graph Completion Model
    YE Zhihong, WU Yunbing, DAI Sichong, ZENG Zhihong
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 724-737.   DOI: 10.3778/j.issn.1673-9418.2404032
    Knowledge graph completion aims to expand and enhance knowledge graphs by predicting missing triples. Multimodal knowledge graph completion integrates entity ontology information such as entity descriptions, entity images, and entity attributes to obtain more accurate entity representations. Existing research projects different modalities into a unified space to obtain joint entity representations and then combines knowledge graph structural information for prediction. However, existing methods have difficulty capturing the complex interactions of entity background knowledge when fusing multimodal information, which inevitably leads to information loss and insufficient feature extraction; meanwhile, overfitting and limited entity-relation interaction restrict the performance of 2D convolution models, making it difficult to integrate knowledge graph structural information. Therefore, this paper proposes a multi-level fusion knowledge graph completion model to address these issues from two aspects: the fusion of entity multimodal information and the integration of knowledge graph structural information. To fully integrate entity multimodal information, three different fusion methods are used simultaneously to comprehensively capture the interaction of entity background knowledge, together with decision learning, combining the complementary information provided by the different multimodal fusion methods to obtain rich and diverse entity representations. To fully integrate knowledge graph structural information, feature generalization is proposed to alleviate the overfitting of 2D convolution models, combined with feature reshaping to enhance interactions between entities and relations, thereby improving the contextual perception ability of entities and relations. Experiments on multiple public datasets demonstrate the superior performance of the proposed method.
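    The decision-learning step, combining the scores of several fusion branches, can be sketched in its simplest form. The abstract does not specify the combination rule; the weighted average below (with uniform weights as the default, standing in for learned ones) is an assumption for illustration:

```python
def decision_fusion(branch_scores, weights=None):
    """Combine triple-plausibility scores from several multimodal fusion
    branches by a weighted average. In the model the weights would be
    learned; uniform weights are the illustrative default here."""
    if weights is None:
        weights = [1.0 / len(branch_scores)] * len(branch_scores)
    return sum(w * s for w, s in zip(weights, branch_scores))
```

Each branch (e.g., one fusion method over descriptions, images, and attributes) contributes one score per candidate triple, and the fused score ranks the candidates.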
    Research Review of Deep Learning in Colon Polyp Image Segmentation
    LI Guowei, LIU Jing, CAO Hui, JIANG Liang
    Journal of Frontiers of Computer Science and Technology    2025, 19 (5): 1198-1216.   DOI: 10.3778/j.issn.1673-9418.2408012
    A colorectal polyp is an abnormal tissue growth in the gastrointestinal tract with the potential to develop into colorectal cancer; early detection and removal of colorectal polyps are therefore crucial for preventing colorectal cancer. In recent years, deep learning technology has made significant strides in colorectal polyp image segmentation, substantially enhancing both the accuracy and the degree of automation of segmentation. This paper focuses on research on deep learning for colorectal polyp image segmentation. Firstly, it introduces various imaging techniques for colorectal polyps and commonly used datasets, including both image and video datasets, and elaborates on their characteristics. Subsequently, deep learning based segmentation methods are summarized, covering fully convolutional networks, Mask R-CNN, generative adversarial networks, U-Net, Transformer, and multi-network fusion models. Particular emphasis is placed on the application of U-Net and its variants in colorectal polyp image segmentation, analyzing their structural improvements, performance gains, and practical application outcomes. Furthermore, the review comprehensively compares the main improvements, advantages, disadvantages, and segmentation results of each network model. Finally, it points out the main challenges currently faced by deep learning in this field and provides an outlook on future research directions.