Most Downloaded Articles


    Published in last 1 year
    Review of One-Stage Universal Object Detection Algorithms in Deep Learning
    WANG Ning, ZHI Min
    Journal of Frontiers of Computer Science and Technology    2025, 19 (5): 1115-1140.   DOI: 10.3778/j.issn.1673-9418.2411032
    In recent years, object detection has become a hot research direction as a core task in computer vision: it enables computers to recognize and locate target objects in images or video frames, and is widely used in fields such as autonomous driving, biological individual detection, agricultural inspection, and medical image analysis. With the development of deep learning, general object detection has shifted from traditional methods to deep-learning-based methods, which are mainly divided into one-stage and two-stage object detection. Taking one-stage detection as its starting point, this paper analyzes and summarizes the mainstream one-stage algorithms under two different architectures, classical convolution and the Transformer: the YOLO series (YOLOv1 to YOLOv11, together with its main improved variants), which introduced the one-stage paradigm; SSD; and the Transformer-based DETR series. It introduces the network structure and research progress of each algorithm; summarizes their characteristics, advantages, and limitations from a structural perspective; reviews the main datasets and evaluation metrics in object detection; analyzes the performance of the algorithms and their improved versions; discusses their application status in different fields; and finally looks ahead to future research directions for one-stage object detection.
    Abstract views: 495 | PDF downloads: 392
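The detectors surveyed above are typically compared with IoU-based metrics such as mAP. As a minimal illustration (not taken from the paper), intersection-over-union for two axis-aligned boxes in (x1, y1, x2, y2) format can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes are disjoint)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```

mAP then averages precision over recall levels and IoU thresholds; a prediction counts as a true positive when its IoU with a ground-truth box exceeds the chosen threshold.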
    Comprehensive Review of Physics-Guided Deep Learning: Advancements, Challenges, and Perspectives
    CHEN Chong, ZHU Xiaoyu, WANG Fang, XU Yaqian, ZHANG Wei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 277-294.   DOI: 10.3778/j.issn.1673-9418.2407056
    Although deep learning has achieved notable success on nonlinear and high-dimensional problems, it faces challenges in complex scientific and engineering domains, such as high computational and data costs, the difficulty of interpreting its black-box nature, and a lack of guarantees that its outputs follow physical laws. Physics-guided deep learning has therefore emerged as a framework that enhances the performance, explainability, and physical consistency of deep learning by integrating domain-specific physical knowledge into the construction and training of deep learning models. This paper thoroughly reviews and analyzes research on physics-guided deep learning, covering both methodologies and applications. Firstly, the main motivations and theoretical foundations of physics-guided deep learning are introduced. Secondly, a detailed discussion is conducted on its two modes, the combination of physical information with deep learning and the fusion of physical information with deep learning, and the characteristics, limitations, and application scenarios of each mode are summarized. Finally, the performance of physics-guided deep learning across applications is analyzed, and its challenges are discussed from four perspectives: computational complexity and convergence, biases introduced when incorporating governing equations, dependence on observational data, and difficulties in knowledge fusion; on this basis, an outlook on the future directions of this domain is provided. This paper aims to provide research references and multidimensional perspectives on physics-guided deep learning for researchers.
    Abstract views: 529 | PDF downloads: 362
    Survey of Transformer-Based Model for Time Series Forecasting
    MENG Xiangfu, SHI Haoyuan
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 45-64.   DOI: 10.3778/j.issn.1673-9418.2403070
    Time series forecasting (TSF) refers to predicting future values and trends at specific time points or over time periods by analyzing latent information in historical data, such as trends and seasonality. Time series data, often generated by sensors, play a significant role in numerous fields, including finance, healthcare, energy, transportation, and meteorology. With the spread of IoT sensors, the resulting volumes of time series data have become difficult to handle with traditional machine learning techniques. The Transformer model, which has shown excellent performance across natural language processing and computer vision tasks, has been effectively applied by researchers to capture long-term dependencies, leading to rapid advances in time series forecasting. This paper therefore reviews time series forecasting methods based on the Transformer model. It chronologically outlines the development of time series forecasting, systematically introduces preprocessing procedures and methods for time series data, and presents commonly used evaluation metrics and datasets. Focusing on algorithm frameworks, it systematically explains the application methods and working principles of various Transformer-based models in TSF tasks, compares the performance, advantages, and limitations of different models through experiments, and analyzes the experimental results. Finally, considering the challenges in current work on Transformer models for time series forecasting, the paper proposes future development trends in this direction.
    Abstract views: 451 | PDF downloads: 351
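Forecasting surveys such as the one above typically score models with pointwise error metrics, most commonly MSE and MAE. A minimal sketch of these two metrics (illustrative only, not code from the survey):

```python
def mse(y_true, y_pred):
    """Mean squared error over an aligned forecast horizon."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error over an aligned forecast horizon."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

truth = [1.0, 2.0, 3.0]
pred = [1.0, 2.5, 2.0]
print(mse(truth, pred))  # (0 + 0.25 + 1) / 3, about 0.417
print(mae(truth, pred))  # (0 + 0.5 + 1) / 3 = 0.5
```

MSE penalizes large errors more heavily, while MAE is more robust to outliers; published TSF benchmarks usually report both.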
    Review of Neural Network Lightweighting
    DUAN Yuchen, FANG Zhenyu, ZHENG Jiangbin
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 835-853.   DOI: 10.3778/j.issn.1673-9418.2403071
    With the continuous progress of deep learning technology, artificial neural network models have shown unprecedented performance in many fields, such as image recognition, natural language processing, and autonomous driving. These models often have millions or even billions of parameters and learn complex feature representations from large amounts of training data. However, in resource-constrained environments such as mobile devices, embedded systems, and other edge computing scenarios, the power consumption, memory usage, and computing efficiency of such models limit the application of large-scale neural networks. To address this problem, researchers have proposed a variety of model compression techniques, including pruning, knowledge distillation, neural architecture search (NAS), quantization, and low-rank decomposition, which aim to reduce the number of parameters, the computational complexity, and the storage requirements of a model while preserving its accuracy as much as possible. This paper systematically introduces the development of these model compression methods, focusing on the main principles and key technologies of each: the different strategies of pruning, such as structured and unstructured pruning; how knowledge is defined in knowledge distillation; the search space, search algorithm, and network performance evaluation in NAS; post-training quantization and in-training quantization; and singular value decomposition and tensor decomposition in low-rank decomposition. Finally, future development directions of model compression technology are discussed.
    Abstract views: 513 | PDF downloads: 329
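Two of the compression families discussed above can be sketched in a few lines. The global magnitude threshold and the symmetric single-scale int8 scheme below are illustrative choices, not the specific methods surveyed:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Unstructured pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights):
    """Symmetric post-training quantization to int8 with one global scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale

w = np.array([[0.1, -0.02], [0.5, -0.9]])
pruned = magnitude_prune(w, sparsity=0.5)  # the two smallest magnitudes become 0
q, s = quantize_int8(w)                    # int8 codes plus a float scale
```

In practice these steps are usually followed by fine-tuning to recover the accuracy lost to compression, which is exactly the trade-off the review analyzes.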
    Survey of NLP Data Augmentation Methods Based on Large Language Models
    XU Delong, LIN Min, WANG Yurong, ZHANG Shujun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (6): 1395-1413.   DOI: 10.3778/j.issn.1673-9418.2410054
    Large language models currently show great potential in natural language processing (NLP), but their training relies on large numbers of high-quality samples. In low-resource scenarios, the available data can hardly support model convergence as model size keeps increasing, a problem that has inspired research into data augmentation. However, traditional data augmentation methods have a limited scope of application and suffer from data distortion in the context of large NLP models, whereas augmentation methods based on large language models can address these challenges more effectively. This paper offers a comprehensive exploration of data augmentation methods based on large language models in NLP. Firstly, the development history of traditional data augmentation methods and of large language models in NLP is reviewed. Then, current data augmentation methods based on large language models are summarized, and the scope of application, advantages, and limitations of each method are discussed in depth. Subsequently, evaluation methods for data augmentation in NLP are introduced. Finally, through comparative experiments and analysis of current methods, future research directions for large language model based data augmentation in NLP are discussed and prospective suggestions are made.
    Abstract views: 300 | PDF downloads: 281
    Survey of Entity Relation Extraction Based on Large Language Models
    XIA Jianglan, LI Yanling, GE Fengpei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (7): 1681-1698.   DOI: 10.3778/j.issn.1673-9418.2409086
    Entity relation extraction aims to identify entity pairs and their relationships from unstructured text, serving as the foundation for many downstream tasks in natural language processing. With the development of big data and deep learning technologies, significant progress has been made in entity relation extraction research. In recent years, applying large language models to this task has become a new research trend. Large language models, with their ability to automatically extract features and strong generalization capabilities, can significantly enhance the performance of the task. This paper provides a comprehensive review of entity relation extraction methods, categorizing them into two main types based on the evolution of techniques and models. Firstly, the definitions of named entity recognition and relation extraction tasks are introduced. Next, a systematic review of the development of entity relation extraction methods is presented, with an in-depth analysis of the advantages and disadvantages of the corresponding models. On this basis, this paper focuses on the unique advantages of large language model-based methods in addressing entity relation extraction tasks. Furthermore, the characteristics of current mainstream datasets are summarized, along with common evaluation metrics for entity relation extraction, such as precision, recall, and F1 score. Finally, the challenges in current research are analyzed, and future research directions are discussed.
    Abstract views: 291 | PDF downloads: 281
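The precision, recall, and F1 metrics named in the survey above are usually computed by exact match of predicted (head, relation, tail) triples against the gold set. A minimal illustrative sketch (the example triples are invented):

```python
def triple_prf(gold, pred):
    """Exact-match precision, recall, and F1 over (head, relation, tail) triples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # triples predicted exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("Curie", "born_in", "Warsaw"), ("Curie", "field", "physics")]
pred = [("Curie", "born_in", "Warsaw"), ("Curie", "field", "chemistry")]
p, r, f = triple_prf(gold, pred)  # one of two triples correct: p = r = f = 0.5
```

Benchmarks differ on whether a triple must match entity boundaries, entity types, or both; the exact-match rule here is the strictest common variant.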
    Review of Research on CNN and Visual Transformer Hybrid Models in Image Processing
    GUO Jialin, ZHI Min, YIN Yanjun, GE Xiangwei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 30-44.   DOI: 10.3778/j.issn.1673-9418.2403009
    Convolutional neural network (CNN) and vision Transformer are two important deep learning models for image processing, and after years of continuous research both have achieved remarkable results in the field. In recent years, hybrid models combining CNN and vision Transformer have gradually emerged; extensive research has steadily overcome the weaknesses of the two models while exploiting their respective strengths, showing excellent results in image processing tasks. This paper reviews such hybrid models. First, the architecture, advantages, and disadvantages of the CNN and vision Transformer models are summarized, along with the concept and advantages of hybrid models. Second, the research status and progress of hybrid models are comprehensively reviewed from four aspects: serial fusion, parallel fusion, hierarchical cross-structure fusion, and other fusion modes; the main representative models of each fusion mode are summarized, and typical hybrid models are compared from multiple perspectives. The paper then describes applications of hybrid models in specific image processing tasks such as image recognition, image classification, object detection, and image segmentation, demonstrating their applicability and efficiency in practice. Finally, future research directions for hybrid models are analyzed, and their future research and application in image processing are discussed.
    Abstract views: 438 | PDF downloads: 268
    Overview of Knowledge Graph Construction and Reasoning Enhanced by Large Language Models
    ZHANG Jing, HUANG Wenfeng, WU Chunjiang, TAN Hao
    Journal of Frontiers of Computer Science and Technology    2025, 19 (11): 2855-2872.   DOI: 10.3778/j.issn.1673-9418.2503034
    With the widespread application of knowledge graphs (KGs) in fields such as intelligent question answering and recommender systems, the technical bottlenecks in large-scale construction and efficient reasoning have become increasingly prominent. Traditional manual or semi-automated construction approaches are costly, while issues such as entity disambiguation and relation extraction accuracy continue to hinder the quality of the resulting graphs. Furthermore, knowledge sparsity and the complexity of reasoning rules limit the generalization capability of KG reasoning. Large language models (LLMs), with their powerful semantic understanding and contextual modeling capabilities, offer promising new avenues to address these challenges. However, current research in this area lacks a systematic review, and the applicability and performance boundaries of various methods remain unclear. To bridge this gap, this paper provides a comprehensive survey of LLM-enhanced knowledge graph construction and reasoning methods. Firstly, this paper introduces the foundational theories of knowledge graphs and large language models. The survey then focuses on four core tasks: knowledge extraction, automated construction, knowledge completion, and reasoning. For knowledge extraction, this paper compares zero-shot extraction methods based on LLMs with domain-adapted extraction through fine-tuning. In terms of automated construction, this paper reviews techniques for LLM-driven ontology generation and iterative graph updates. For knowledge completion, this paper summarizes methods involving pseudo-triple generation via LLMs, prompt-based context planning, and the integration of external retrieval mechanisms. Regarding reasoning tasks, this paper analyzes both static LLM-augmented reasoning and actively planned reasoning approaches. 
This paper further presents typical application scenarios in domains such as healthcare and education, and compiles a list of general-purpose and domain-specific knowledge graph datasets in both English and Chinese that support research in this area. Finally, this paper highlights the current limitations of existing methods and proposes several future research directions.
    Abstract views: 154 | PDF downloads: 257
    Review of False Information Detection Frameworks Based on Large Language Models
    ZHANG Xin, SUN Jingchao
    Journal of Frontiers of Computer Science and Technology    2025, 19 (6): 1414-1436.   DOI: 10.3778/j.issn.1673-9418.2411001
    Globally, the spread of false information on the Internet, especially on social media, has become an urgent issue. With the rise of artificial intelligence, research on applying large language models to false information detection has become a hot topic; in China, however, related research is still relatively scarce and has not yet formed a complete system. To systematically review the current research status and development trends, this paper provides a comprehensive summary of the application of large language models to false information detection. It focuses on detection frameworks based on large language models and explores their innovative applications in data generation, data augmentation, information extraction, integration with external knowledge and tools, model improvement, final fusion decision-making, and explanation and feedback generation within the detection process. The paper outlines the definition of false information and the background of its spread, elaborates on the core detection process in the framework, sorts out the innovations at each stage of the detection framework, summarizes the “internal” and “external” detection processes, and expounds on the model improvements involved, such as retrieval augmentation, prompt engineering, fine-tuning, and final decision-making. Finally, it analyzes the challenges currently faced by false information detection based on large language models and looks forward to future research directions, aiming to provide references and inspiration for the development of this area.
    Abstract views: 263 | PDF downloads: 256
    Survey on Construction Method of Temporal Knowledge Graph
    LU Jiamin, ZHANG Jing, FENG Jun, AN Qi
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 295-315.   DOI: 10.3778/j.issn.1673-9418.2406089
    As a bridge connecting data, knowledge, and intelligence, knowledge graphs have been widely applied in fields such as search assistance, intelligent recommendation, question-answering systems, and natural language processing. However, as application scenarios expand, static knowledge graphs show limitations in handling dynamic knowledge. Temporal knowledge graphs address this shortcoming by integrating temporal information into the graph structure, enabling a more accurate representation of dynamic changes in knowledge. This paper provides a comprehensive study of temporal knowledge graph construction. It begins by introducing the concept of the temporal knowledge graph and clarifying its value in handling dynamic knowledge. It then delves into the construction process, dividing it into three key stages: knowledge extraction, knowledge fusion, and knowledge computing. Each stage is systematically organized and detailed with task definitions, research summaries, and the application of large language models. The knowledge extraction stage covers named entity recognition, relation extraction, and temporal information extraction; the fusion stage discusses entity alignment and entity linking; and the computing stage focuses on knowledge reasoning. Finally, the challenges faced at each stage are explored and future research directions are discussed.
    Abstract views: 354 | PDF downloads: 255
    Research on Lightweight Model of Multi-person Pose Estimation Based on Improved YOLOv8s-Pose
    FU Yu, GAO Shuhui
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 682-692.   DOI: 10.3778/j.issn.1673-9418.2403059
    To address the high computational load and slow detection speed of existing human pose estimation models, this paper proposes a lightweight improved algorithm based on the YOLOv8s-Pose model. Firstly, a lightweight module, C2f-GhostNetBottleNeckV2, is introduced into the backbone to replace the original C2f, reducing the number of parameters. The Non_Local attention mechanism is also introduced to integrate the positional information of human keypoints into the channel dimension, enhancing the efficiency of feature extraction and mitigating the accuracy degradation that often follows model lightweighting. Furthermore, a weighted bidirectional feature pyramid network is incorporated into the neck layer to improve feature fusion and maintain a good balance across features of different scales, and a small-object detection head is added to reduce missed detections of small objects. Lastly, the CIoU loss function is replaced with Focal-EIoU to improve the accuracy of human keypoint regression. Experimental results show that the improved model reduces the number of parameters by 9.3% and, compared with the original model on the COCO2017 human keypoint dataset, improves mAP@0.50 by 0.4 percentage points and mAP@0.50:0.95 by 0.6 percentage points. The proposed lightweight algorithm thus not only reduces the number of model parameters but also improves the accuracy of human pose estimation, especially for small targets, providing an effective means of achieving real-time and accurate pose estimation.
    Abstract views: 320 | PDF downloads: 246
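The Focal-EIoU loss adopted in the paper above augments plain IoU with separate center-distance, width, and height penalties and weights the loss by IoU raised to a focusing parameter γ. The sketch below is a generic reconstruction of that idea for boxes in (x1, y1, x2, y2) format, not the paper's implementation; the γ value is an illustrative choice:

```python
def focal_eiou(box_p, box_g, gamma=0.5):
    """Simplified Focal-EIoU loss for a predicted and a ground-truth box."""
    # Plain IoU
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    # Smallest enclosing box, used to normalize the penalty terms
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    # Center-distance penalty plus separate width and height penalties
    dx = (box_p[0] + box_p[2]) / 2 - (box_g[0] + box_g[2]) / 2
    dy = (box_p[1] + box_p[3]) / 2 - (box_g[1] + box_g[3]) / 2
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    eiou = (1 - iou
            + (dx * dx + dy * dy) / (cw * cw + ch * ch)
            + (wp - wg) ** 2 / (cw * cw)
            + (hp - hg) ** 2 / (ch * ch))
    return iou ** gamma * eiou  # focal weighting by overlap quality

print(focal_eiou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 0.0
```

Compared with CIoU, the separate width and height terms give a more direct gradient toward the target aspect, which is why the paper adopts this family for keypoint box regression.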
    Integrated Sensing, Communication and Computing: Key Technologies, Challenges, and Future Trends
    LIU Zhuang, WU Yuhe, CHEN Yuran, LIU Ruitong, DONG Yanning, ZHAO Jun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (9): 2273-2301.   DOI: 10.3778/j.issn.1673-9418.2412035
    In the construction of a future highly integrated physical and digital world, the deep integration of communication, sensing, and computing has become a key technology for next-generation intelligent networks. Focusing on integrated sensing, communication and computing (ISCC), this paper systematically analyzes its theoretical and practical value. Starting from technological evolution and emerging requirements, it clarifies the key role of ISCC in enhancing system intelligence, reducing latency, and optimizing resource utilization, particularly its necessity for emerging services such as immersive extended reality (XR), holographic communication, and autonomous driving. The paper explores the core technical architecture of ISCC in depth, including wireless sensing, multimodal sensing, mobile edge computing, and the deep fusion mechanisms of sensing and communication; reveals its innovative application scenarios in digital twin networks, computing power networks, and space-air-ground integrated networks; and demonstrates its advantages in high-precision sensing, efficient data processing, and real-time communication. It then systematically examines the multi-dimensional challenges ISCC faces in actual deployment, such as the complexity of system architecture design, difficulties in air interface protocol optimization, the dynamic nature of resource management and control, the severity of data security and privacy protection, and the complexity of multi-source interference management. Finally, the paper offers a forward-looking perspective on future research directions, emphasizing the importance of interdisciplinary theoretical innovation, standardization, and systematic simulation validation.
    Abstract views: 247 | PDF downloads: 203
    Survey on Applications of AIGC in Multimodal Scenarios
    YUE Qi, ZHANG Chenkang
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 79-96.   DOI: 10.3778/j.issn.1673-9418.2404009
    Although artificial intelligence generated content (AIGC) has achieved excellent results in unimodal applications, using artificial intelligence to generate text, images, videos, and other content, a unimodal feature representation can hardly capture the complete information of a phenomenon. To give AIGC greater generative capability, scholars have proposed incorporating multimodal information into AIGC to improve the learning performance and generative capability of models: by processing and integrating multiple modalities, AIGC acquires richer contextual information, which helps models better understand and generate content. This paper discusses in detail the basic architecture, working principles, and challenges of AIGC in handling multimodal problems, and classifies and summarizes recent AIGC models that incorporate multimodal information. The applications, challenges, and development directions of AIGC in multimodal image generation, video generation, and 3D shape generation are summarized. For image generation, the applications and limitations of generative adversarial network (GAN) models and diffusion models are discussed. For video generation, diffusion-based video generation is analyzed and joint audio-video generation methods are discussed. For 3D shape generation, methods guided by diffusion models and neural networks are discussed. Finally, the challenges AIGC faces in multimodal applications are identified and future research directions are discussed.
    Abstract views: 325 | PDF downloads: 200
    Review of Multivariate Time Series Clustering Algorithms
    ZHENG Desheng, SUN Hanming, WANG Liyuan, DUAN Yaoxin, LI Xiaoyu
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 582-601.   DOI: 10.3778/j.issn.1673-9418.2405013
    Multivariate time series (MTS) data, a crucial basis for intelligent technologies across numerous domains, record the state changes of multiple variables in a system over time. Clustering, a core tool in data mining, can partition data into clusters based on structural similarity, uncovering the structure and internal relationships within the data to reveal systemic development patterns and variable correlations. Faced with challenges such as the complexity of multivariate time series structures, the interconnections between variables, and high dimensionality, a substantial amount of research has been conducted internationally. This paper provides an overview of clustering algorithms for multivariate time series data. First, based on classification criteria such as feature extraction method, similarity measure, and clustering partition framework, it conducts a comparative analysis of existing multivariate time series clustering algorithms. For each category of clustering technique, a detailed summary is provided, covering algorithm principles, representative methods, advantages and disadvantages, and the problems they address. Common evaluation criteria and publicly available datasets for multivariate time series clustering are then discussed. Finally, from the perspective of the unique structure of multivariate temporal data, several challenging issues and future research directions are outlined.
    Abstract views: 288 | PDF downloads: 194
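Among the similarity measures such surveys cover, dynamic time warping (DTW) is a common choice because it aligns sequences that evolve at different speeds. Below is a minimal univariate version for illustration (for MTS, the pointwise `abs` distance would be replaced by a vector norm across variables):

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # step in a only
                                  dp[i][j - 1],      # step in b only
                                  dp[i - 1][j - 1])  # step in both
    return dp[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0 -- warping absorbs the repeated 2
print(dtw([1, 2, 3], [2, 3, 4]))     # 2.0
```

Clustering methods built on DTW typically plug this distance into k-medoids or hierarchical schemes, since the warped distance does not average cleanly into centroids.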
    Review on Key Techniques of Video Multimodal Sentiment Analysis
    DUAN Zongtao, HUANG Junchen, ZHU Xiaole
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 539-558.   DOI: 10.3778/j.issn.1673-9418.2404072
    Sentiment analysis is the process of automatically determining an opinion holder's attitude or emotional tendency. It is widely used in business, social media analysis, and public opinion monitoring. In unimodal sentiment analysis, most researchers use text, facial expressions, or audio information. With the development of deep learning, sentiment analysis has expanded from the unimodal to the multimodal field: combining multiple modalities can overcome the limitations of any single modality and capture the emotions people express more accurately and comprehensively. This paper summarizes the key techniques of multimodal sentiment analysis on the basis of the three kinds of unimodal sentiment analysis. Firstly, the background and research status of multimodal sentiment analysis are briefly introduced. Secondly, the commonly used datasets are summarized. The paper then describes unimodal sentiment analysis based on text, facial expression, and audio information. In addition, it analyzes the key techniques of video multimodal sentiment analysis, including multimodal fusion, alignment, and modal noise processing, and provides a detailed analysis of the relationships among these techniques and their applications. Next, the performance of different models on three commonly used datasets is analyzed, further validating the effectiveness of these key techniques. Finally, the existing challenges in multimodal sentiment analysis and future development trends are discussed.
    Abstract views: 272 | PDF downloads: 191
    Review of PCB Defect Detection Algorithm Based on Machine Vision
    YANG Sinian, CAO Lijia, YANG Yang, GUO Chuandong
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 901-915.   DOI: 10.3778/j.issn.1673-9418.2409061
    As a core component of electronic products, the printed circuit board (PCB) directly affects product reliability. As electronic products become lighter, thinner, and more sophisticated, machine vision based PCB defect detection faces challenges such as the difficulty of detecting tiny defects. To advance research on PCB defect detection, this paper discusses the algorithms of each stage in detail, following their historical development. Firstly, the main challenges in the field are identified, and traditional PCB defect detection methods and their limitations are introduced. Then, from the perspectives of traditional machine learning and deep learning, recent PCB defect detection methods and their advantages and disadvantages are systematically reviewed. Next, the commonly used evaluation metrics and mainstream datasets for PCB defect detection are summarized; the performance of the latest methods from the past three years on the PCB-Defect, DeepPCB, and HRIPCB datasets is compared, and the reasons for the differences are analyzed. Finally, based on the current state of the field and its open problems, future development trends are discussed.
    Abstract views: 325 | PDF downloads: 188
    Improved YOLOv8 Model for Multi-type Lung Nodule Detection
    BAO Qiangqiang, TANG Siyuan, LI Qingqian, WANG Naiyu, YANG Min, GU Yu, ZHAO Jinliang, GAO Jingbo, WANG Jiaxin, QU Yuhan
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 429-442.   DOI: 10.3778/j.issn.1673-9418.2406018
    Current lung nodule detection usually targets a single type, solid nodules, yet different kinds of lung nodules correspond to multiple types of lung cancer, so multi-type detection can help improve the overall detection rate of lung cancer and raise the cure rate. Targeted improvements to the YOLOv8 model are made to enable detection of multiple types of lung nodules, including solid, mixed, and ground glass. Firstly, the RepViTCAA module is proposed to improve the C2f module of the backbone, enhancing the accuracy of tiny nodule detection and lightening the model. Secondly, the ECLA-HSFPN module is proposed to rebuild the feature fusion part of the model and improve scale-invariant nodule detection accuracy. Then, the KAN network is integrated into the model to further improve the detection accuracy of tiny nodules and enhance the model's generalization, drawing on the KAN network's strong ability to learn nonlinear features. Finally, based on the Inner-IoU auxiliary box idea, the CIoU loss function is improved to handle the variable scale of lung nodules and enhance detection accuracy. Tested on the LUNA16 dataset, the improved model outperforms the original model and mainstream models such as YOLOv9 and RT-DETR on all evaluation indices. On a specialized dataset of four types of lung nodules (solid, ground glass, mixed, and microminiature), it achieves better detection performance than the original model, and on a mixed dataset of LUNA16 and local hospital data it shows strong generalization ability. The improved model is thus more effective for detecting multiple types of lung nodules and can accurately detect the different types.
    Abstract views: 268 | PDF downloads: 174
    Design of University Research Management Question Answering System Integrating Knowledge Graph and Large Language Models
    WANG Yong, QIN Jiajun, HUANG Yourui, DENG Jiangzhou
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 107-117.   DOI: 10.3778/j.issn.1673-9418.2406009
    Scientific research management is a crucial aspect of university administration. However, existing research management systems cannot meet users' individual needs. Taking the intelligent transformation of university research management as the demand orientation, this paper combines a knowledge graph, a traditional model, and large language models to jointly build a new question answering system for university research management. Firstly, scientific research knowledge is collected to build a research knowledge graph. Then, a multi-task model is used for semantic parsing, simultaneously performing intent classification and entity extraction. Finally, the parsing results are used to generate query statements that retrieve information from the knowledge graph and answer general questions. Additionally, large language models are combined with the knowledge graph to assist in handling open questions. Experimental results on datasets with associated intents and entities show that the F1 values of the adopted multi-task model on the intent classification and entity recognition tasks are 0.958 and 0.937, respectively, surpassing other comparison models and single-task models. The Cypher generation test demonstrates the effectiveness of the custom Prompt in stimulating the emergent abilities of large language models: the accuracy of Cypher statements generated from text by large language models reaches 85.8%, effectively handling open questions over the knowledge graph. The accuracy of the question answering system built with the knowledge graph, traditional model, and large language models is 0.935, which well meets the needs of intelligent question answering.
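    The intent-plus-entity parsing described above can drive template-based Cypher generation for the closed questions, with the LLM reserved for open ones. A toy sketch (the intent labels, templates, and graph schema here are hypothetical; the paper's Prompt-driven LLM generation is not reproduced):

```python
# Hypothetical intent-to-template mapping for a research-management graph.
CYPHER_TEMPLATES = {
    "papers_by_author": (
        "MATCH (a:Author {name: $name})-[:PUBLISHED]->(p:Paper) "
        "RETURN p.title"
    ),
    "projects_by_author": (
        "MATCH (a:Author {name: $name})-[:LEADS]->(pr:Project) "
        "RETURN pr.title"
    ),
}

def build_query(intent, entities):
    """Turn (intent, entities) from the multi-task parser into a Cypher query."""
    template = CYPHER_TEMPLATES.get(intent)
    if template is None:
        return None, None  # fall back to the LLM for open questions
    return template, entities

query, params = build_query("papers_by_author", {"name": "Zhang San"})
```

The returned query string and parameter dict would then be sent to the graph database driver; unmatched intents fall through to the LLM path.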
    Abstract views: 252 | PDF downloads: 172
    Heterogeneous Information Network Embedding Learning Based on Attention: a Survey
    TU Jiaqi, ZHANG Hua, CHANG Xiaojie, WANG Ji, YUAN Shuhong
    Journal of Frontiers of Computer Science and Technology    2025, 19 (1): 1-29.   DOI: 10.3778/j.issn.1673-9418.2404034
    In recent years, graph embedding learning has become one of the most commonly used techniques in information network analysis: it embeds network objects into low-dimensional dense vector spaces while preserving network structure and content characteristics, and the learned embeddings are then applied to downstream analysis tasks. However, most real-world networks are heterogeneous information networks (HIN), composed of multiple object types, relationships between objects, and content characteristics. To learn more effective embeddings, researchers integrate attention mechanisms into HIN embedding learning to distinguish the degree of influence that different levels of heterogeneity have on the embeddings. This paper therefore reviews existing attention-integrated HIN embedding learning models. Firstly, it comprehensively reviews the research progress of HIN embedding over the past five years, summarizes the three challenges posed by network heterogeneity (content heterogeneity, structure heterogeneity, and semantic heterogeneity), and presents a general framework for attention-integrated models. Secondly, in view of these challenges, existing attention-integrated models are divided into three categories: meta-path-based, graph-neural-network-based, and scenario-oriented, and representative models of each are compared in detail. Then, common datasets, benchmark platforms and tools, and evaluation metrics are introduced. Finally, future research directions for HIN embedding learning are discussed.
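    Semantic-level attention of the kind this survey covers (e.g., HAN-style models) scores each meta-path view of a node, softmax-normalizes the scores, and takes a weighted sum of the per-meta-path embeddings. An illustrative pure-Python sketch (real models learn the scores; here they are given):

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_semantic(embeddings, scores):
    """Weighted sum of per-meta-path node embeddings (equal-length vectors)."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    return [sum(w * emb[i] for w, emb in zip(weights, embeddings))
            for i in range(dim)]

# Two meta-path views of one node, fused with equal attention scores.
fused = fuse_semantic([[1.0, 0.0], [0.0, 1.0]], scores=[0.0, 0.0])
```

Raising one meta-path's score shifts the fused embedding toward that view, which is exactly how attention lets the model weight heterogeneous semantics differently.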
    Abstract views: 223 | PDF downloads: 164
    Research Progress on Sequence Recommendation Based on Deep Learning and Large Language Model
    XU Fengru, LI Bohan, XU Shuai
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 344-366.   DOI: 10.3778/j.issn.1673-9418.2407090
    Recommendation systems aim to alleviate information overload in information retrieval and are committed to recommending content matching users' personalized interests. Human interactions with a system occur in a certain order, and sequential recommendation systems take this order into account when providing recommendations. A sequential recommendation system analyzes user behavior sequences, captures the dynamic changes of user preferences, and provides accurate personalized recommendation services in many fields such as e-commerce, social media, and online video. This paper provides an overview of the current research progress in sequential recommendation systems and explores their significance and application potential in personalized recommendation. Firstly, the research problem of sequential recommendation is defined, and its core objectives and challenges are clarified. Next, the main techniques in sequential recommendation are summarized in detail, including: traditional methods based on Markov chains, which model user behavior sequences via state transition probabilities; deep-learning-driven methods, which use neural network models to capture long-term dependencies and complex patterns; hybrid models, which combine multiple algorithms to enhance the accuracy and robustness of recommendation; and emerging methods based on large language models, which improve the understanding of user behavior and recommended content by integrating pre-trained large language models. Finally, future research directions are discussed, with emphasis on context awareness, multimodal fusion, causal inference, domain-specific large language models for vertical fields, and alleviating hallucinations.
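    The Markov-chain baseline mentioned above reduces to estimating item-to-item transition probabilities from observed sequences and recommending the most probable successor. A minimal first-order sketch:

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """First-order transition counts: P(next | current) ~ count(current -> next)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts

def recommend_next(item, counts, k=1):
    """Top-k most frequent next items after `item`."""
    return [nxt for nxt, _ in counts[item].most_common(k)]

counts = fit_transitions([["a", "b", "c"], ["a", "b", "d"], ["b", "c"]])
```

Higher-order chains condition on longer histories but suffer from state-space explosion, which is one motivation for the deep-learning methods the survey covers.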
    Abstract views: 234 | PDF downloads: 157
    Small Object Detection Based on Enhanced Feature Pyramid and Focal-AIoU Loss
    SHI Yu, WANG Le, YAO Yepeng, MAO Guojun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 693-702.   DOI: 10.3778/j.issn.1673-9418.2403006
    Unmanned aerial vehicle (UAV) aerial images have characteristics such as small target scale and complex backgrounds, making it difficult for generic object detection methods applied directly to such images to achieve satisfactory recognition accuracy. Based on YOLOv8, this paper proposes a small object detection model called CFE-YOLO (cross-level feature-fusion enhanced YOLO), which incorporates a feature enhancement network and a localized focal loss. Firstly, a cross-level feature-fusion enhanced pyramid network (CFEPN) is designed to improve the traditional feature pyramid structure by fusing attention feature maps: high-resolution feature maps from shallow layers are added and deep detection heads are removed to suit the requirements of small object detection. Secondly, a focal loss function based on area intersection over union is designed by combining the ideas of Complete-IoU and Focal loss, further improving the detection of small objects. Finally, a lightweight spatial pyramid pooling module is implemented by introducing depth-wise separable convolutions, maintaining the detection accuracy of the model while reducing the parameter count. Extensive experiments on the UAV datasets VisDrone and TinyPerson show that CFE-YOLO improves mAP0.50 by 4.72 and 5.58 percentage points respectively over the baseline, while reducing the parameter count by 37.74%. Furthermore, it achieves higher accuracy than other advanced algorithms.
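    The parameter saving from the depth-wise separable convolutions used in the lightweight pooling module follows directly from the factorization into a per-channel spatial filter plus a 1x1 channel mixer. A quick count (illustrative shapes, biases ignored):

```python
def std_conv_params(k, c_in, c_out):
    # A standard k x k convolution mixes channels and space jointly.
    return k * k * c_in * c_out

def dw_sep_params(k, c_in, c_out):
    # Depth-wise k x k per input channel, then 1x1 point-wise channel mixing.
    return k * k * c_in + c_in * c_out

saving = std_conv_params(3, 64, 128) / dw_sep_params(3, 64, 128)
```

For a 3x3 layer with 64 input and 128 output channels this is roughly an 8x reduction, which is why the substitution preserves accuracy-relevant capacity while cutting parameters.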
    Abstract views: 214 | PDF downloads: 155
    Pedestrian Detection in Fisheye Images Based on Improved YOLOv8 Algorithm
    ZHU Yumin, SUN Guangling, MIAO Fei
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 443-453.   DOI: 10.3778/j.issn.1673-9418.2404037
    In view of the problems of inaccurate localization and insufficient detection accuracy when existing object detection algorithms are applied to pedestrian detection in fisheye images, an improved YOLOv8 algorithm for fisheye image detection is proposed. The method designs the ProbIoU-r algorithm by adding angle parameters, uses a scaling factor to adjust the impact of angle differences on the loss, and enhances the model's attention to the angular offset of the bounding box in gradient computation, solving the problems of inaccurate localization of the original IoU in rotated object detection and poor bounding box fitting, so that the YOLOv8 network model better perceives rotated targets. To improve the model's feature extraction for distorted targets in fisheye images and raise detection accuracy, a Parnet-gcs module with multi-scale convolution and attention mechanisms as branches is proposed: feature information at different scales is extracted through DWConv with different convolution kernel sizes, and the CA and SA modules are combined to enhance the model's feature expression ability. Experiments use the public fisheye image dataset WEPDTOF. The improved algorithm increases the detection accuracy mAP0.50:0.95 by 2.3 percentage points compared with the original YOLOv8s; its parameter count is 38.8% lower than that of YOLOv8m while its mAP0.50:0.95 is 0.5 percentage points higher, indicating that the improved algorithm based on YOLOv8s is better suited to pedestrian detection tasks in fisheye images.
    Abstract views: 202 | PDF downloads: 151
    Pedestrian Trajectory Prediction Based on Transformer and Multi-relation Graph Convolutional Networks
    LIU Guihong, ZHOU Zongrun, MENG Xiangfu
    Journal of Frontiers of Computer Science and Technology    2025, 19 (5): 1353-1364.   DOI: 10.3778/j.issn.1673-9418.2405004
    In the field of autonomous navigation, pedestrian trajectories are relatively complex, and accurately predicting them is crucial for safe travel and autonomous driving. Pedestrian trajectories are highly random and dynamic and are influenced by the surroundings, necessitating effective modeling of their temporal and spatial interactions. To address this, a pedestrian trajectory prediction model combining Transformer and a multi-relation graph convolutional network (GCN) is proposed. The model is composed of an interaction capture module, an anchor control module, and a trajectory refinement module. The interaction capture module extracts motion features of each pedestrian over temporal and spatial sequences using T-Transformer and GCN; the anchor control module reduces errors by inferring intermediate destinations; and the trajectory refinement module refines the predictions. Adding inverse relations during feature extraction yields better-optimized results, and Gaussian pruning reduces the generation of spurious paths and improves model efficiency. Experimental results on the ETH and UCY datasets show superior performance in average displacement error (ADE) and final displacement error (FDE) compared with mainstream models. The model's strong performance on pedestrian trajectory prediction minimizes unnecessary trajectory changes and collision risks, offering a promising solution for pedestrian trajectory prediction applications.
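    The ADE/FDE metrics reported above are simple per-trajectory averages of Euclidean displacement; for reference (2D points, a single predicted trajectory):

```python
import math

def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all time steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def fde(pred, gt):
    """Final displacement error: Euclidean distance at the last time step."""
    return math.dist(pred[-1], gt[-1])
```

In the multi-sample setting typical of this literature, models generate several candidate futures and report the minimum ADE/FDE over the candidates.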
    Abstract views: 188 | PDF downloads: 150
    Survey of Multi-domain Machine Translation Methods for Fine-Tuning Large Models
    CHEN Zijian, WANG Siriguleng, SI Qintu
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 916-928.   DOI: 10.3778/j.issn.1673-9418.2410032
    With the rapid development of machine translation technology, machine translation methods based on pre-trained large models have come to occupy an important position in natural language processing. However, due to the significant differences in language features, lexical styles, and expressions between domains, it is difficult for a single pre-trained model to achieve efficient and stable performance across multi-domain translation tasks. This paper therefore addresses the key issues of large model fine-tuning for multi-domain machine translation: it systematically reviews the core principles, main methods, and application effects of fine-tuning technology, and analyzes the performance and applicable scenarios of three types of strategies, namely full-parameter fine-tuning, parameter-efficient fine-tuning, and prompt-tuning. The advantages and limitations of the different fine-tuning methods are discussed in depth, with particular attention to how domain generalization ability and task specificity can be balanced through efficient fine-tuning strategies under resource-constrained conditions, demonstrating the significant advantages of parameter-efficient fine-tuning and prompt-tuning in resource utilization efficiency and domain adaptability. The practical effects of different fine-tuning strategies in domain transfer and resource utilization are further evaluated through comparative analysis and experimental validation, and their effectiveness is verified through case studies. Future research should focus on the efficient utilization of resources, the domain-adaptive capability of models, and improvements in translation quality and robustness, so as to promote the continuous development of multi-domain machine translation systems in performance and adaptability.
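    The resource argument for parameter-efficient fine-tuning is easy to quantify: LoRA-style adapters replace a full d x k weight update with two low-rank factors A (d x r) and B (r x k). A back-of-the-envelope count (dimensions are illustrative, e.g. a 4096-wide attention projection):

```python
def full_update_params(d, k):
    # Full fine-tuning trains the entire d x k weight matrix.
    return d * k

def lora_update_params(d, k, r):
    # LoRA trains A (d x r) and B (r x k), with rank r << min(d, k).
    return d * r + r * k

trainable_ratio = lora_update_params(4096, 4096, 8) / full_update_params(4096, 4096)
```

At rank 8 this layer trains about 0.4% of the parameters that full fine-tuning would, which is the kind of saving that makes per-domain adapters practical under resource constraints.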
    Abstract views: 236 | PDF downloads: 150
    PPSC: High-Precision and Scalable Encrypted Privacy-Preserving Speech Classification
    WANG Leilei, SONG Kao, ZHANG Yuanyuan, BI Renwan, XIONG Jinbo
    Journal of Frontiers of Computer Science and Technology    2025, 19 (2): 528-538.   DOI: 10.3778/j.issn.1673-9418.2311085
    To address the challenges of low computational efficiency and classification accuracy in existing fully homomorphic encryption technology for speech classification tasks, a high-precision and scalable encrypted privacy-preserving speech classification (PPSC) scheme is proposed. First of all, a secure multiplication protocol based on CKKS fully homomorphic encryption technology is designed to avoid the use of expensive bootstrapping operations, which can effectively improve the computational efficiency of deep ciphertext multiplication, so that the scheme can be extended to deeper neural networks. Based on the above architecture, secure non-polynomial protocols such as secure exponent, secure reciprocal and secure comparison are designed. Compared with the method of polynomial approximate fitting of non-polynomial operations, the protocols improve computation accuracy and reduce computation overhead. Secondly, the PPSC scheme securely implements the fundamental modules such as the convolutional layer, ReLU layer, average pooling layer, fully connected layer, and Softmax layer. This ensures the privacy of speech data, speech classification models, and intermediate computing results. Finally, a detailed theoretical analysis of the PPSC scheme is conducted to evaluate its effectiveness and security. The analysis demonstrates that the secure multiplication protocol exhibits higher computational efficiency in deeper multiplication operations. Experimental results on the Speech Command Database validate the effectiveness of the PPSC scheme in achieving accurate speech classification while preserving the privacy of data and model parameters. Furthermore, the proposed scheme achieves an accuracy that is 3.57 percentage points higher than that of the HEKWS scheme.
    Abstract views: 174 | PDF downloads: 145
    Multimodal Rumor Detection Method Based on Multi-granularity Emotional Features of Image-Text
    LIU Xianbo, XIANG Ao, DU Yanhui
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 1021-1035.   DOI: 10.3778/j.issn.1673-9418.2406053
    Rumors involving public safety, disasters, and other mass incidents often contain rich emotional features in text or images, which easily mobilize netizens’ emotional responses, inducing them to like, comment, and share. However, existing multimodal rumor detection methods lack effective extraction techniques for the emotional features contained in multimodal data and fail to consider the interrelationship between modalities during feature fusion, resulting in redundant and less accurate feature representations. To explore the role of cross-modal emotional features in rumor detection, a multimodal rumor detection method that integrates multi-granularity emotional features of image-text is proposed. Without relying on social information such as comments and dissemination patterns, this method integrates multi-granularity emotional features into the multimodal rumor detection process. It employs a cross-modal multi-granularity emotional feature fusion method based on an interactive attention mechanism to fully integrate deep features of multimedia information. To evaluate the effectiveness of the proposed method, comparative and ablation experiments are conducted on two public datasets, Weibo and Twitter. The results indicate that the proposed method improves rumor detection accuracy to 0.912 on the Weibo dataset and 0.839 on the Twitter dataset, showing superior performance across multiple metrics such as F1 value, effectively enhancing rumor detection performance and the interpretability of the model. To some extent, it can assist public security agencies in handling rumors during mass incidents, providing technical support for grassroots police operations.
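    The interactive attention mechanism described above lets features of one modality attend over the other before fusion. A dependency-free sketch of one direction (a text feature querying image-region features via dot-product attention; dimensions and values are purely illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """One text query attends over image-region features (dot-product attention)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# The query aligns with the first image region, so its value dominates the output.
fused = attend([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Running the symmetric direction (image querying text) and concatenating both outputs gives the cross-modal representation that such fusion modules feed to the classifier.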
    Abstract views: 172 | PDF downloads: 143
    Survey on Deep Learning Based Trajectory Similarity Measurement Approaches
    MENG Xiangfu, SHI Guangqi, ZHANG Xiaoyan, LENG Qiangkui, FANG Jinfeng
    Journal of Frontiers of Computer Science and Technology    2025, 19 (3): 623-644.   DOI: 10.3778/j.issn.1673-9418.2404006
    The development and application of mobile communication and sensing technologies have generated massive amounts of trajectory data. Such data exhibits characteristics such as high-dimensional heterogeneity, multi-granularity, and uncertainty, making traditional trajectory similarity measurement methods based on point-pair matching difficult to apply. In recent years, deep learning techniques have been applied to trajectory similarity measurement, aiming to mine more trajectory features, improve computational efficiency, and enhance model robustness. This paper systematically reviews recent deep-learning-based trajectory similarity measurement methods. Firstly, it gives the relevant definitions of trajectories. Then, it provides an overview of these methods from two perspectives: metric representation form (i.e., sequence representation and graph representation) and learning strategy (i.e., representation learning, metric learning, and contrastive learning). Furthermore, it conducts a detailed comparative analysis of the implementation principles and characteristics of these methods from three aspects: trajectory data preprocessing, embedding representation learning, and similarity measurement. Afterwards, the commonly used datasets and evaluation metrics for deep-learning-based trajectory similarity measurement are analyzed, and the sources, evaluation metrics, time complexity, and application scenarios of the learning models are summarized. Finally, it analyzes the challenges faced by trajectory similarity measurement methods and prospects future research directions.
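    The point-pair-matching baselines the survey contrasts against (e.g., dynamic time warping) can be written in a few lines; a classic DP sketch over 1D sequences with absolute-difference point cost:

```python
def dtw(a, b):
    """Dynamic time warping distance between two sequences of scalars."""
    INF = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = best alignment cost of a[:i] and b[:j].
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

The O(nm) cost per trajectory pair is exactly the bottleneck that learned embeddings sidestep: once trajectories are embedded, similarity becomes a constant-time vector distance.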
    Abstract views: 218 | PDF downloads: 142
    Cross-Modal Multi-level Feature Fusion for Semantic Segmentation of Remote Sensing Images
    LI Zhijie, CHENG Xin, LI Changhua, GAO Yuan, XUE Jingyu, JIE Jun
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 989-1000.   DOI: 10.3778/j.issn.1673-9418.2403082
    Multimodal semantic segmentation networks can leverage complementary information from different modalities to improve segmentation accuracy. Thus, they are highly promising for land cover classification. However, existing multimodal remote sensing image semantic segmentation models often overlook the geometric shape information of deep features and fail to fully utilize multi-layer features before fusion. This results in insufficient cross-modal feature extraction and suboptimal fusion effects. To address these issues, a remote sensing image semantic segmentation model based on multimodal feature extraction and multi-layer feature fusion is proposed. By constructing a dual-branch encoder, the model can separately extract spectral information from remote sensing images and elevation information from normalized digital surface model (nDSM), and deeply explore the geometric shape information of the nDSM. Furthermore, a cross-layer enrichment module is introduced to refine and enhance each layer's features, making full use of multi-layer feature information from deep to shallow layers. The refined features are then processed through an attention feature fusion module for differential complementarity and cross-fusion, mitigating the differences between branch structures and fully exploiting the advantages of multimodal features, thereby improving the segmentation accuracy of remote sensing images. Experiments conducted on the ISPRS Vaihingen and Potsdam datasets demonstrate mF1 scores of 90.88% and 93.41%, respectively, and mean intersection over union (mIoU) scores of 83.49% and 87.85%, respectively. Compared with current mainstream algorithms, this model achieves more accurate semantic segmentation of remote sensing images.
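    The mIoU figures reported above follow from the per-class confusion matrix; as a reference computation (rows = ground-truth class, columns = predicted class):

```python
def mean_iou(conf):
    """mIoU from a pixel confusion matrix: conf[i][j] = pixels of class i predicted as j."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # missed pixels of class c
        fp = sum(conf[r][c] for r in range(n)) - tp  # pixels wrongly labeled c
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / n
```

mF1 is computed analogously, averaging per-class 2*tp / (2*tp + fp + fn) instead of tp / (tp + fp + fn).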
    Abstract views: 247 | PDF downloads: 141
    Advances in Node Importance Ranking Based on Graph Neural Networks
    CAO Lu, DING Cangfeng, MA Lerong, YAN Zhaoyao, YOU Hao, HONG Anqi
    Journal of Frontiers of Computer Science and Technology    2025, 19 (4): 877-900.   DOI: 10.3778/j.issn.1673-9418.2405056
    Node importance ranking is a critical task in graph analysis, as it plays a crucial role in identifying and prioritizing important nodes within a graph. Graph neural networks (GNNs) serve as an effective framework that leverages deep learning to directly comprehend the structural data of graphs, enabling comprehensive understanding of the internal patterns and deeper semantic features associated with nodes and edges. In the context of node importance ranking, GNNs can effectively harness graph structure information and node features to assess the significance of individual nodes. Compared with traditional node ranking methods, GNNs are better equipped to handle the diverse and intricate nature of graph structural data, capturing complex associations and semantic information between nodes while autonomously learning representations for node features. This reduces reliance on manual feature engineering, thereby enhancing accuracy in node importance ranking tasks. Consequently, approaches based on graph neural networks have emerged as the predominant direction for research into node importance. On this basis, this paper provides a classification of recent advancements in node ranking methods utilizing graph neural networks. This paper begins by revisiting core concepts related to node ranking, graph neural networks, and classical metrics for assessing node importance. It then summarizes recent developments in methods for evaluating node importance using graph neural networks. These techniques are categorized into four groups based on fundamental graph neural networks and their variants: basic GNNs, graph convolutional neural networks (GCNs), graph attention networks (GATs), and graph autoencoders (GAEs). Additionally, this paper analyzes the performance of these methods across various application domains, such as social networks, traffic networks, and knowledge graphs. Finally, it offers a comprehensive overview of existing research by analyzing time complexity along with advantages, limitations, and performance characteristics of current methodologies. Furthermore, it discusses future research directions based on identified shortcomings.
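    The core operation shared by the GCN-based rankers surveyed here is neighborhood aggregation followed by scoring; a dependency-free single-layer sketch using mean aggregation over an adjacency list (purely illustrative; learned weight matrices and nonlinearities are omitted):

```python
def gcn_layer(features, adj):
    """One propagation step: average each node's feature with its neighbors'."""
    out = []
    for v, fv in enumerate(features):
        neigh = adj[v] + [v]  # include a self-loop, as in standard GCNs
        dim = len(fv)
        out.append([sum(features[u][i] for u in neigh) / len(neigh)
                    for i in range(dim)])
    return out

# Star graph: hub node 0 connected to leaves 1, 2, 3.
feats = [[1.0], [0.0], [0.0], [0.0]]
adj = [[1, 2, 3], [0], [0], [0]]
h = gcn_layer(feats, adj)
```

After one step the hub's signal has diffused to every leaf; stacking such layers (with learned weights) is what lets GNN rankers score importance from multi-hop structure rather than hand-crafted centrality features.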
    Abstract views: 237 | PDF downloads: 141
    Time Series Anomaly Detection Based on Spatio-Temporal Feature Fusion and Sequence Reconstruction
    YANG Bin, MA Tinghuai, HUANG Xuejian, WANG Yubo, WANG Zhaoming, ZHAO Bowen, YU Xin
    Journal of Frontiers of Computer Science and Technology    2025, 19 (9): 2384-2398.   DOI: 10.3778/j.issn.1673-9418.2411060
    Anomaly detection is a critical component of time series analysis for identifying anomalous events. To address the limitations of traditional methods in integrating spatio-temporal correlations, capturing normal data distributions, and handling time-varying characteristics, this paper proposes a time series anomaly detection model based on spatio-temporal feature fusion and sequence reconstruction (AnomNet). The model comprises three main components: a spatio-temporal feature fusion network (STF), a time series reconstruction network (TSR), and an anomaly detection mechanism (ADM). The STF combines temporal convolutional networks and graph attention influence networks to capture long-term temporal dependencies and global attribute associations, achieving joint modeling of spatio-temporal features. The TSR employs a multi-layer encoder-decoder architecture and uses the fused spatio-temporal features together with cyclical information to learn the normal distribution of samples, amplifying the discrepancies between reconstructed data and potential anomalies. The ADM detects anomalies by fitting the tail distribution of the reconstruction deviations: once the anomaly score exceeds a predefined threshold, the mechanism updates the generalized Pareto distribution parameters, providing up-to-date criteria for subsequent detection. Experimental results on five datasets validate that AnomNet achieves state-of-the-art performance in time series anomaly detection. Compared with OmniAnomaly, the proposed model shows an average performance improvement of 11.89%.
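    The ADM's thresholding follows the peaks-over-threshold idea: fit a generalized Pareto distribution (GPD) to the excesses of the anomaly score above a high initial quantile, then derive a risk-q threshold. A simplified method-of-moments sketch in the style of the SPOT algorithm (the paper's fitting procedure may differ; maximum-likelihood fitting is more common in practice, and a nonzero shape estimate is assumed):

```python
import statistics

def pot_threshold(scores, init_quantile=0.98, q=1e-3):
    """Risk-q anomaly threshold via a moment-fit GPD on threshold excesses."""
    n = len(scores)
    s = sorted(scores)
    u = s[int(init_quantile * (n - 1))]       # initial high threshold
    excess = [x - u for x in scores if x > u]
    m = statistics.mean(excess)
    v = statistics.variance(excess)
    xi = 0.5 * (1.0 - m * m / v)              # GPD shape (moment estimate)
    sigma = 0.5 * m * (1.0 + m * m / v)       # GPD scale (moment estimate)
    # SPOT-style quantile: z_q = u + (sigma/xi) * ((q*n/N_t)^(-xi) - 1)
    return u + (sigma / xi) * ((q * n / len(excess)) ** (-xi) - 1.0)
```

Re-running this fit whenever a new excess is observed is what keeps the threshold adapted to time-varying score distributions.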
    Abstract views: 149 | PDF downloads: 141