Journal of Frontiers of Computer Science and Technology

Select

Survey of Multimodal Data Fusion Research

ZHANG Hucheng, LI Leixiao, LIU Dongjiang

Journal of Frontiers of Computer Science and Technology 2024, 18 (10): 2501-2520. DOI: 10.3778/j.issn.1673-9418.2403083

Although the powerful learning ability of deep learning has achieved excellent results in the field of single-modal applications, it has been found that the feature representation of a single modality is difficult to fully contain the complete information of a phenomenon. In order to break through the obstacles of feature representation on a single modality and make greater use of the value contained in multiple modalities, scholars have begun to propose the use of multimodal fusion to improve model learning performance. Multimodal fusion technology is to make the machine use the correlation and complementarity between modalities to fuse into a better feature representation in text, speech, image and video, which provides a basis for model training. At present, the research of multimodal fusion is still in the early stage of development. This paper starts from the hot research field of multimodal fusion in recent years, and expounds the multimodal fusion method and the multimodal alignment technology in the fusion process. Firstly, the application, advantages and disadvantages of joint fusion method, cooperative fusion method, encoder fusion method and split fusion method in multimodal fusion are analyzed. The problem of multimodal alignment in the fusion process is expounded, including explicit alignment and implicit alignment, as well as the application, advantages and disadvantages. Secondly, it expounds the application of popular datasets in multimodal fusion in different fields in recent years. Finally, the challenges and research prospects of multimodal fusion are expounded to further promote the development and application of multimodal fusion.

Reference | Related Articles | Metrics

Abstract （1338）

PDF （782）

Select

YOLOv8-VSC: Lightweight Algorithm for Strip Surface Defect Detection

WANG Chunmei, LIU Huan

Journal of Frontiers of Computer Science and Technology 2024, 18 (1): 151-160. DOI: 10.3778/j.issn.1673-9418.2308060

Currently, in the field of strip steel surface defect detection, the generalized target detection algorithm is highly complex and computationally large, while terminal equipment responsible for the detection of some small and medium-sized enterprises usually does not have strong computational capabilities, and the computational resources are limited, which leads to difficulties in the deployment of detection algorithms. To solve this problem, this paper proposes a lightweight strip steel surface defect detection model YOLOv8-VSC based on the YOLOv8n target detec-tion framework, which uses a lightweight VanillaNet network as the backbone feature extraction network and reduces the complexity of the model by reducing the unnecessary branching structure. Meanwhile, the SPD module is introduced to speed up the inference of the model while reducing the number of network layers. To further improve the detection accuracy, a lightweight up-sampling operator, CARAFE, is used in the feature fusion network to improve the quality and richness of the features. Finally, extensive experiments on the NEU-DET dataset yield a model with parametric and computational quantities of 1.96×106 and 6.0 GFLOPs, which are only 65.1% and 74.1% of the baseline, and the mAP reaches 80.8%, which is an improvement of 1.8 percentage points from the baseline. In addition, experimental results on the aluminum surface defect dataset and the VOC2012 dataset show that the proposed algorithm has good robustness. Compared with advanced target detection algorithms, the proposed algorithm requires fewer computational resources while ensuring high detection accuracy.

Reference | Related Articles | Metrics

Abstract （816）

PDF （685）

Select

Review of Attention Mechanisms in Image Processing

QI Xuanhao, ZHI Min

Journal of Frontiers of Computer Science and Technology 2024, 18 (2): 345-362. DOI: 10.3778/j.issn.1673-9418.2305057

Attention mechanism in image processing has become one of the popular and important techniques in the field of deep learning, and is widely used in various deep learning models in image processing because of its excellent plug-and-play convenience. By weighting the input features, the attention mechanism focuses the model’s attention on the most important regions to improve the accuracy and performance of image processing tasks. Firstly, this paper divides the development process of attention mechanism into four stages, and on this basis, reviews and summarizes the research status and progress of four aspects: channel attention, spatial attention, channel and spatial mixed attention, and self-attention. Secondly, this paper provides a detailed discussion on the core idea, key structure and specific implementation of attention mechanism, and further summarizes the advantages and disadvantages of used models. Finally, by comparing the current mainstream attention mechanisms and analyzing the results, this paper discusses the problems of attention mechanisms in the image processing field at this stage, and provides an outlook on the future development of attention mechanisms in image processing, so as to provide references for further research.

Reference | Related Articles | Metrics

Abstract （794）

PDF （548）

Select

Survey on Visual Transformer for Image Classification

PENG Bin, BAI Jing, LI Wenjing, ZHENG Hu, MA Xiangyu

Journal of Frontiers of Computer Science and Technology 2024, 18 (2): 320-344. DOI: 10.3778/j.issn.1673-9418.2310092

Transformer is a deep learning model based on the self-attention mechanism, showing tremendous potential in computer vision. In image classification tasks, the key challenge lies in efficiently and accurately capturing both local and global features of input images. Traditional approaches rely on convolutional neural networks to extract local features at the lower layers, expanding the receptive field through stacked convolutional layers to obtain global features. However, this strategy aggregates information over relatively short distances, making it difficult to model long-term dependencies. In contrast, the self-attention mechanism of Transformer directly compares features across all spatial positions, capturing long-range dependencies at both local and global levels and exhibiting stronger global modeling capabilities. Therefore, a thorough exploration of the challenges faced by Transformer in image classification tasks is crucial. Taking Vision Transformer as an example, this paper provides a detailed overview of the core principles and architecture of Transformer. It then focuses on image classification tasks, summarizing key issues and recent advancements in visual Transformer research related to performance enhancement, computational costs, and training optimization. Furthermore, applications of Transformer in specific domains such as medical imagery, remote sensing, and agricultural images are summarized, highlighting its versatility and generality. Finally, a comprehensive analysis of the research progress in visual Transformer for image classification is presented, offering insights into future directions for the development of visual Transformer.

Reference | Related Articles | Metrics

Abstract （737）

PDF （505）

Select

Deep Learning-Based Infrared and Visible Image Fusion: A Survey

WANG Enlong, LI Jiawei, LEI Jia, ZHOU Shihua

Journal of Frontiers of Computer Science and Technology 2024, 18 (4): 899-915. DOI: 10.3778/j.issn.1673-9418.2306061

How to preserve the complementary information in multiple images to represent the scene in one image is a challenging topic. Based on this topic, various image fusion methods have been proposed. As an important branch of image fusion, infrared and visible image fusion (IVIF) has a wide range of applications in segmentation, target detection and military reconnaissance fields. In recent years, deep learning has led the development direction of image fusion. Researchers have explored the field of IVIF using deep learning. Relevant experimental work has proven that applying deep learning to achieving IVIF has significant advantages compared with traditional methods. This paper provides a detailed analysis on the advanced algorithms for IVIF based on deep learning. Firstly, this paper reports on the current research status from the aspects of network architecture, method innovation, and limitations. Secondly, this paper introduces the commonly used datasets in IVIF methods and provides the definition of commonly used evaluation metrics in quantitative experiments. Qualitative and quantitative evaluation experiments of fusion and segmentation and fusion efficiency analysis experiments are conducted on some representative methods mentioned in the paper to comprehensively evaluate the performance of the methods. Finally, this paper provides conclusions and prospects for possible future research directions in the field.

Reference | Related Articles | Metrics

Abstract （719）

PDF （602）

Select

Word Embedding Methods in Natural Language Processing: a Review

ZENG Jun, WANG Ziwei, YU Yang, WEN Junhao, GAO Min

Journal of Frontiers of Computer Science and Technology 2024, 18 (1): 24-43. DOI: 10.3778/j.issn.1673-9418.2303056

Word embedding, as the first step in natural language processing (NLP) tasks, aims to transform input natural language text into numerical vectors, known as word vectors or distributed representations, which artificial intelligence models can process. Word vectors, the foundation of NLP tasks, are a prerequisite for accomplishing various NLP downstream tasks. However, most existing review literature on word embedding methods focuses on the technical routes of different word embedding methods, neglecting comprehensive analysis of the tokenization methods and the complete evolutionary trends of word embedding. This paper takes the introduction of the word2vec model and the Transformer model as pivotal points. From the perspective of whether generated word vectors can dynamically change their implicit semantic information to adapt to the overall semantics of input sentences, this paper categorizes word embedding methods into static and dynamic approaches and extensively discusses this classification. Simultaneously, it compares and analyzes tokenization methods in word embedding, including whole and sub-word segmentation. This paper also provides a detailed enumeration of the evolution of language models used to train word vectors, progressing from probability language models to neural probability language models and the current deep contextual language models. Additionally, this paper summarizes and explores the training strategies employed in pre-training language models. Finally, this paper concludes with a summary of methods for evaluating word vector quality, an analysis of the current state of word embedding methods, and a prospective outlook on their development.

Reference | Related Articles | Metrics

Abstract （673）

PDF （491）

Select

Review of Research on Rolling Bearing Health Intelligent Monitoring and Fault Diagnosis Mechanism

WANG Jing, XU Zhiwei, LIU Wenjing, WANG Yongsheng, LIU Limin

Journal of Frontiers of Computer Science and Technology 2024, 18 (4): 878-898. DOI: 10.3778/j.issn.1673-9418.2307005

As one of the most critical and failure-prone parts of the mechanical systems of industrial equipment, bearings are subjected to high loads for long periods of time. When they fail or wear irreversibly, they may cause accidents or even huge economic losses. Therefore, effective health monitoring and fault diagnosis are of great significance to ensure safe and stable operation of industrial equipment. In order to further promote the development of bearing health monitoring and fault diagnosis technology, the current existing models and methods are analyzed and summarized, and the existing technologies are divided and compared. Starting from the distribution of vibration signal data used, firstly, the relevant methods under uniform data distribution are sorted out, the classification, analysis and summary of the current research status are carried out mainly according to signal-based analysis and data-driven-based, and the shortcomings and defects of the fault detection methods in this case are outlined. Secondly, considering the problem of uneven data acquisition under actual working conditions, the detection methods for dealing with such cases are summarized, and different processing techniques for this problem in existing research are classified into data processing methods, feature extraction methods, and model improvement methods according to their different focuses, and the existing problems are analyzed and summarized. Finally, the challenges and future development directions of bearing fault detection in existing industrial equipment are summarized and prospected.

Reference | Related Articles | Metrics

Abstract （662）

PDF （345）

Select

Review of Research on 3D Reconstruction of Dynamic Scenes

SUN Shuifa, TANG Yongheng, WANG Ben, DONG Fangmin, LI Xiaolong, CAI Jiacheng, WU Yirong

Journal of Frontiers of Computer Science and Technology 2024, 18 (4): 831-860. DOI: 10.3778/j.issn.1673-9418.2305016

As static scene 3D reconstruction algorithms become more mature, dynamic scene 3D reconstruction has become a hot and challenging research topic in recent years. Existing static scene 3D reconstruction algorithms have good reconstruction results for stationary objects. However, when objects in the scene undergo deformation or relative motion, their reconstruction results are not ideal. Therefore, developing research on 3D reconstruction of dynamic scenes is essential. This paper first introduces the related concepts and basic knowledge of 3D reconstruction, as well as the research classification and current status of static and dynamic scene 3D reconstruction. Then, the latest research progress on dynamic scene 3D reconstruction is comprehensively summarized, and the reconstruction algorithms are classified into dynamic 3D reconstruction based on RGB data sources and dynamic 3D reconstruction based on RGB-D data sources. RGB data sources can be further divided into template based dynamic 3D reconstruction, non rigid motion recovery structure based dynamic 3D reconstruction, and learning based dynamic 3D reconstruction under RGB data sources. The RGB-D data source mainly summarizes dynamic 3D reconstruction based on learning, with typical examples introduced and compared. The applications of dynamic scene 3D reconstruction in medical, intelligent manufacturing, virtual reality and augmented reality, and transportation fields are also discussed. Finally, future research directions for dynamic scene 3D reconstruction are proposed, and an outlook on the research progress in this rapidly developing field is presented.

Reference | Related Articles | Metrics

Abstract （587）

PDF （502）

Select

Review of Attention Mechanisms in Reinforcement Learning

XIA Qingfeng, XU Ke'er, LI Mingyang, HU Kai, SONG Lipeng, SONG Zhiqiang, SUN Ning

Journal of Frontiers of Computer Science and Technology 2024, 18 (6): 1457-1475. DOI: 10.3778/j.issn.1673-9418.2312006

In recent years, the combination of reinforcement learning and attention mechanisms has attracted an increasing attention in algorithmic research field. Attention mechanisms play an important role in improving the performance of algorithms in reinforcement learning. This paper mainly focuses on the development of attention mechanisms in deep reinforcement learning and examining their applications in the multi-agent reinforcement learning domain. Relevant researches are conducted accordingly. Firstly, the background and development of attention mechanisms and reinforcement learning are introduced, and relevant experimental platforms in this field are also presented. Secondly, classical algorithms of reinforcement learning and attention mechanisms are reviewed and attention mechanism is categorized from different perspectives. Thirdly, practical applications of attention mechanisms in the reinforcement field are sorted out based on three types of tasks including fully cooperative, fully competitive and mixed, with focus on the application in the field of multi-agent. Finally, the improvement of attention mechanisms on reinforcement learning algorithms is summarized. The challenges and future prospects in this field are discussed.

Reference | Related Articles | Metrics

Abstract （569）

PDF （538）

Select

Research Progress in Application of Deep Learning in Animal Behavior Analysis

SHEN Tong, WANG Shuo, LI Meng, QIN Lunming

Journal of Frontiers of Computer Science and Technology 2024, 18 (3): 612-626. DOI: 10.3778/j.issn.1673-9418.2306033

In recent years, animal behavior analysis has become one of the most important methods in the fields of neuroscience and artificial intelligence. Taking advantage of the powerful deep-learning-based image analysis technology, researchers have developed state-of-the-art automatic animal behavior analysis methods with complex functions. Compared with traditional methods of animal behavior analysis, special labeling is not required in these methods, animal pose can be efficiently estimated and tracked. These methods like in a natural environment, which hold the potential for complex animal behavior experiments. Therefore, the application of deep learning in animal behavior analysis is reviewed. Firstly, this paper analyzes the tasks and current status of animal behavior analysis. Then, it highlights and compares existing deep learning-based animal behavior analysis tools. According to the dimension of experimental analysis, the deep learning-based animal behavior analysis tools are divided into two-dimensional animal behavior analysis tools and three-dimensional animal behavior analysis tools, and the functions, performance and scope of application of tools are discussed. Furthermore, the existing animal datasets and evaluation metrics are introduced, and the algorithm mechanism used in the existing animal behavior analysis tool is summarized from the advantages, limitations and applicable scenarios. Finally, the deep learning-based animal behavior analysis tools are prospected from the aspects of dataset, experimental paradigm and low latency.

Reference | Related Articles | Metrics

Abstract （521）

PDF （487）

Select

Critical Review of Multi-focus Image Fusion Based on Deep Learning Method

LI Ziqi, SU Yuxuan, SUN Jun, ZHANG Yonghong, XIA Qingfeng, YIN Hefeng

Journal of Frontiers of Computer Science and Technology 2024, 18 (9): 2276-2292. DOI: 10.3778/j.issn.1673-9418.2306058

Multi-focus image fusion is an effective image fusion technology, which aims to combine source images from different focal planes of the same scene to obtain a good fusion result. This means that the fused image will focus on all focal planes, that is, it contains more abundant scene information. The development of deep learning promotes the great progress of image fusion, and the powerful feature extraction and reconstruction ability of neural network makes the fusion result promising. In recent years, more and more multi-focus image fusion methods based on deep learning have been proposed, such as convolutional neural network (CNN), generative adversarial network (GAN) and automatic encoder, etc. In order to provide effective reference for relevant researchers and technicians, firstly, this paper introduces the concept of multi-focus image fusion and some evaluation indicators. Then, it analyzes more than ten advanced methods of multi-focus image fusion based on deep learning in recent years, discusses the characteristics and innovation of various methods, and summarizes their advantages and disadvantages. In addition, it reviews the application of multi-focus image fusion technology in various scenes, including photographic visualization, medical diagnosis, remote sensing detection and other fields. Finally, it proposes some challenges faced by current multi-focus image fusion related fields and looks forward to future possible research trends.

Reference | Related Articles | Metrics

Abstract （468）

PDF （345）

Select

Survey of Research on SMOTE Type Algorithms

WANG Xiaoxia, LI Leixiao, LIN Hao

Journal of Frontiers of Computer Science and Technology 2024, 18 (5): 1135-1159. DOI: 10.3778/j.issn.1673-9418.2309079

Synthetic minority oversampling technique (SMOTE) has become one of the mainstream methods for dealing with unbalanced data due to its ability to effectively deal with minority samples, and many SMOTE improvement algorithms have been proposed, but very little research existing considers popular algorithmic-level improvement methods. Therefore a more comprehensive analysis of existing SMOTE class algorithms is provided. Firstly, the basic principles of the SMOTE method are elaborated in detail, and then the SMOTE class algorithms are systematically analyzed mainly from the two levels of data level and algorithmic level, and the new ideas of the hybrid improvement of data level and algorithmic level are introduced. Data-level improvement is to balance the data distribution by deleting or adding data through different operations during preprocessing; algorithmic-level improvement will not change the data distribution, and mainly strengthens the focus on minority samples by modifying or creating algorithms. Comparison between these two kinds of methods shows that, data-level methods are less restricted in their application, and algorithmic-level improvements generally have higher algorithmic robustness. In order to provide more comprehensive basic research material on SMOTE class algorithms, this paper finally lists the commonly used datasets, evaluation metrics, and gives ideas of research in the future to better cope with unbalanced data problem.

Reference | Related Articles | Metrics

Abstract （460）

PDF （348）

Select

Survey on Deep Learning in Oriented Object Detection in Remote Sensing Images

LAN Xin, WU Song, FU Boyi, QIN Xiaolin

Journal of Frontiers of Computer Science and Technology 2024, 18 (4): 861-877. DOI: 10.3778/j.issn.1673-9418.2308031

The objects in remote sensing images have the characteristics of arbitrary direction and dense arrangement, and thus objects can be located and separated more precisely by using inclined bounding boxes in object detection task. Nowadays, oriented object detection in remote sensing images has been widely applied in both civil and military defense fields, which shows great significance in the research and application, and it has gradually become a research hotspot. This paper provides a systematic summary of oriented object detection methods in remote sensing images. Firstly, three widely-used representations of inclined bounding boxes are summarized. Then, the main challenges faced in supervised learning are elaborated from four aspects: feature misalignment, boundary discontinuity, inconsistency between metric and loss and oriented object location. Next, according to the motivations and improved strategies of different methods, the main ideas and advantages and disadvantages of each algorithm are analyzed in detail, and the overall framework of oriented object detection in remote sensing images is summarized. Furthermore, the commonly used oriented object detection datasets in remote sensing field are introduced. Experimental results of classical methods on different datasets are given, and the performance of different methods is evaluated. Finally, according to the challenges of deep learning applied to oriented object detection in remote sensing images tasks, the future research trend in this direction is prospected.

Reference | Related Articles | Metrics

Abstract （435）

PDF （482）

Select

Advances of Adversarial Attacks and Robustness Evaluation for Graph Neural Networks

WU Tao, CAO Xinwen, XIAN Xingping, YUAN Lin, ZHANG Shu, CUI Canyixing, TIAN Kan

Journal of Frontiers of Computer Science and Technology 2024, 18 (8): 1935-1959. DOI: 10.3778/j.issn.1673-9418.2311117

In recent years, graph neural networks (GNNs) have gradually become an important research direction in artificial intelligence. However, the adversarial vulnerability of GNNs poses severe challenges to their practical applications. To gain a comprehensive understanding of adversarial attacks and robustness evaluation on GNNs, related state-of-the-art advancements are reviewed and discussed. Firstly, this paper introduces the research background of adversarial attacks on GNNs, provides a formal definition of these attacks, and elucidates the basic concepts and framework for research on adversarial attacks and robustness evaluation in GNNs. Following this, this paper gives an overview of the specific methods proposed in the field of adversarial attacks on GNNs, and details the foremost methods while categorizing them based on the type of adversarial attack and range of attack targets. Their operating mechanisms, principles, and pros and cons are also analyzed. Additionally, considering the model robustness evaluation's dependency on adversarial attack methods and adversarial perturbation degree, this paper focuses on direct evaluation indicators. To aid in designing and evaluating adversarial attack methods and GNNs' robust models, this paper compares representative methods considering implementation ease, accuracy, and execution time. This paper foresees ongoing challenges and future research areas. Current research on GNNs?? adversarial robustness is experiment-oriented, lacking a guiding theoretical framework, necessitating further systematic theoretical research to ensure GNN-based systems' trustworthiness.

Reference | Related Articles | Metrics

Abstract （432）

PDF （313）

Select

Overview of Cross-Chain Identity Authentication Based on DID

BAI Yirui, TIAN Ning, LEI Hong, LIU Xuefeng, LU Xiang, ZHOU Yong

Journal of Frontiers of Computer Science and Technology 2024, 18 (3): 597-611. DOI: 10.3778/j.issn.1673-9418.2304003

With the emergence of concepts such as metaverse and Web3.0, blockchain plays a very important role in many fields. Cross-chain technology is an important technical means to achieve inter-chain interconnection and value transfer. At this stage, traditional cross-chain technologies such as notary and sidechain have trust issues. At the same time, in the field of cross-chain identity authentication, there are problems that the identities of each chain are not unified and users do not have control over their own identities. Firstly, it systematically summarizes the development process and technical solutions of digital identity and cross-chain technology, and analyzes and compares four digital identity models and nine mainstream cross-chain projects. Secondly, by analyzing the main research results of cross-chain identity authentication in recent years, a general model of cross-chain identity authentication is designed, and the shortcomings of existing solutions are summarized. Then, it focuses on the cross-chain identity authentication implementation scheme based on DID, and analyzes the technical characteristics, advantages and disadvantages of different solutions. On this basis, three DID-based cross-chain identity authentication models are summarized, the main implementation steps are functionally described, and their advantages, limitations and efficiency are analyzed. Finally, in view of the shortcomings of the current DID-based cross-chain identity authentication model, its development difficulties are discussed and five possible future research directions are given.

Reference | Related Articles | Metrics

Abstract （429）

PDF （410）

Select

Improved YOLOv4-Tiny Lightweight Target Detection Algorithm

HE Xiangjie, SONG Xiaoning

Journal of Frontiers of Computer Science and Technology 2024, 18 (1): 138-150. DOI: 10.3778/j.issn.1673-9418.2301034

Object detection is an important branch of deep learning. A large number of edge devices need lightweight object detection algorithms, but the existing lightweight universal object detection algorithms have problems of low detection accuracy and slow detection speed. To solve this problem, an improved YOLOv4-Tiny algorithm based on attention mechanism is proposed. The structure of the original backbone network of YOLOv4-Tiny algorithm is adjusted, the ECA (efficient channel attention) attention mechanism is introduced, the traditional spatial pyramid pooling (SPP) structure is improved to DC-SPP structure by using void convolution, and the CSATT (channel spatial attention) attention mechanism is proposed. The neck network of CSATT-PAN (channel spatial attention path aggregation network) is formed with the feature fusion network PAN, which improves the feature fusion capability of the network. Compared with the original YOLOv4-Tiny algorithm, the proposed YOLOv4-CSATT algorithm is significantly more sensitive to information and accurate in classification when the detection speed is basically the same. The accuracy is increased by 12.3 percentage points on VOC dataset and 6.4 percentage points is increased on COCO dataset. Moreover, the accuracy is 3.3，5.5，6.3，17.4，10.3，0.9 and 0.6 percentage points higher than the Faster R-CNN, SSD, Efficientdet-d1, YOLOv3-Tiny, YOLOv4-MobileNetv1, YOLOv4-MobileNetv2 and PP-YOLO algorithms respectively on VOC dataset, and 2.8, 7.1, 4.2, 18.0, 12.2, 2.1 and 4.0 percentage points higher in recall rate, respectively, with an FPS of 94. In this paper, the CSATT attention mechanism is proposed to improve the model’s ability to capture spatial channel information, and the ECA attention mechanism is combined with the feature fusion pyramid algorithm to improve the model’s feature fusion ability and target detection accuracy.

Reference | Related Articles | Metrics

Abstract （423）

PDF （272）

Select

Review of Application of Generative Adversarial Networks in Image Restoration

GONG Ying, XU Wentao, ZHAO Ce, WANG Binjun

Journal of Frontiers of Computer Science and Technology 2024, 18 (3): 553-573. DOI: 10.3778/j.issn.1673-9418.2307073

With the rapid development of generative adversarial networks, many image restoration problems that are difficult to solve based on traditional methods have gained new research approaches. With its powerful generation ability, generative adversarial networks can restore intact images from damaged images, so they are widely used in image restoration. In order to summarize the relevant theories and research on the problem of using generative adversarial networks to repair damaged images in recent years, based on the categories of damaged images and their adapted repair methods, the applications of image restoration are divided into three main aspects: image inpainting, image deblurring, and image denoising. For each aspect, the applications are further subdivided through technical principles, application objects and other dimensions. For the field of image inpainting, different image completion methods based on generative adversarial networks are discussed from the perspectives of using conditional guidance and latent coding. For the field of image deblurring, the essential differences between motion blurred images and static blurred images and their repair methods are explained. For the field of image denoising, personalized denoising methods for different categories of images are summarized. For each type of applications, the characteristics of the specific GAN models employed are summarized. Finally, the advantages and disadvantages of GAN applied to image restoration are summarized, and the future application scenarios are prospected.

Reference | Related Articles | Metrics

Abstract （408）

PDF （427）

Select

MFFNet: Image Semantic Segmentation Network of Multi-level Feature Fusion

WANG Yan, NAN Peiqi

Journal of Frontiers of Computer Science and Technology 2024, 18 (3): 707-717. DOI: 10.3778/j.issn.1673-9418.2209110

In the task of image semantic segmentation, most methods do not make full use of features of different scales and levels, but directly upsampling, which will cause some effective information to be dismissed as redundant information, thus reducing the accuracy and sensitivity of segmentation of some small categories and similar categories. Therefore, a multi-level feature fusion network (MFFNet) is proposed. MFFNet uses encoder-decoder structure, during the encoding stage, the context information and spatial detail information are obtained through the context information extraction path and spatial information extraction path respectively to enhance the inter-pixel correlation and boundary accuracy. During the decoding stage, a multi-level feature fusion path is designed, and the context information is fused by the mixed bilateral fusion module. Deep information and spatial information are fused by high-low feature fusion module. The global channel-attention fusion module is used to obtain the connections between different channels and realize global fusion of different scale information. The MIoU (mean intersection over union) of MFFNet network on the PASCAL VOC 2012 and Cityscapes validation sets is 80.70% and 76.33%, respectively, achieving better segmentation results.

Reference | Related Articles | Metrics

Abstract （404）

PDF （258）

Select

Multi-strategy Improved Dung Beetle Optimizer and Its Application

GUO Qin, ZHENG Qiaoxian

Journal of Frontiers of Computer Science and Technology 2024, 18 (4): 930-946. DOI: 10.3778/j.issn.1673-9418.2308020

Dung beetle optimizer (DBO) is an intelligent optimization algorithm proposed in recent years. Like other optimization algorithms, DBO also has disadvantages such as low convergence accuracy and easy to fall into local optimum. A multi-strategy improved dung beetle optimizer (MIDBO) is proposed. Firstly, it improves acceptance of local and global optimal solutions by brood balls and thieves, so that the beetles can dynamically change according to their own searching ability, which not only improves the population quality but also maintains the good searching ability of individuals with high fitness. Secondly, the follower position updating mechanism in the sparrow search algorithm is integrated to disturb the algorithm, and the greedy strategy is used to update the location, which improves the convergence accuracy of the algorithm. Finally, when the algorithm stagnates, Cauchy Gaussian variation strategy is introduced to improve the ability of the algorithm to jump out of the local optimal solution. Based on 20 benchmark test functions and CEC2019 test function, the simulation experiment verifies the effectiveness of the three improved strategies. The convergence analysis of the optimization results of the improved algorithm and the comparison algorithms and Wilcoxon rank sum test prove that MIDBO has good optimization performance and robustness. The validity and reliability of MIDBO in solving practical engineering problems are further verified by applying MIDBO to the solution of automobile collision optimization problems.

Reference | Related Articles | Metrics

Abstract （404）

PDF （325）

Select

Survey of Transformer-Based Single Image Dehazing Methods

ZHANG Kaili, WANG Anzhi, XIONG Yawei, LIU Yun

Journal of Frontiers of Computer Science and Technology 2024, 18 (5): 1182-1196. DOI: 10.3778/j.issn.1673-9418.2307103

As a fundamental computer vision task, image dehazing aims to preprocess degraded images by restoring color contrast and texture information to improve visibility and image quality, thereby the clear images can be recovered for subsequent high-level visual tasks, such as object detection, tracking, and object segmentation. In recent years, neural network-based dehazing methods have achieved notable success, with a growing number of Transformer-based dehazing approaches being proposed. Up to now, there is a lack of comprehensive review that thoroughly analyzes Transformer-based image dehazing algorithms. To fill this gap, this paper comprehensively sorts out Transformer-based daytime, nighttime and remote sensing image dehazing algorithms, which not only covers the fundamental principles of various types of dehazing algorithms, but also explores the applicability and performance of these algorithms in different scenarios. In addition, the commonly used datasets and evaluation metrics in image dehazing tasks are introduced. On this basis, analysis of the performance of existing representative dehazing algorithms is carried out from both quantitative and qualitative perspectives, and the performance of typical dehazing algorithms in terms of dehazing effect, operation speed, resource consumption is compared. Finally, the application scenarios of image dehazing technology are summarized, and the challenges and future development directions in the field of image dehazing are analyzed and prospected.

Reference | Related Articles | Metrics

Abstract （402）

PDF （389）

Select

Construction and Application of Knowledge Graph for Water Engineering Scheduling Based on Large Language Model

FENG Jun, CHANG Yanghong, LU Jiamin, TANG Hailin, LYU Zhipeng, QIU Yuchun

Journal of Frontiers of Computer Science and Technology 2024, 18 (6): 1637-1647. DOI: 10.3778/j.issn.1673-9418.2311098

With the growth of water conservancy and the increasing demand for information, handling and representing large volumes of water-related data has become complex. Particularly, scheduling textual data often exists in natural language form, lacking clear structure and standardization. Processing and utilizing such diverse data necessitates extensive domain knowledge and professional expertise. To tackle this challenge, a method based on large language model has been proposed to construct a knowledge graph for water engineering scheduling. This approach involves collecting and preprocessing scheduling rule data at the data layer, leveraging large language models to extract embedded knowledge, constructing the ontology at the conceptual layer, and extracting the “three-step” method prompt strategy at the instance layer. Under the interaction of the data, conceptual, and instance layers, high-performance extraction of rule texts is achieved, and the construction of the dataset and knowledge graph is completed. Experimental results show that the F1 value of the extraction method in this paper reaches 85.5%, and the effectiveness and rationality of the modules of the large language model are validated through ablation experiments. This graph integrates dispersed water conservancy rule information, effectively handles unstructured textual data, and offers visualization querying and functionality tracing. It aids professionals in assessing water conditions and selecting appropriate scheduling schemes, providing valuable support for conservancy decision-making and intelligent reasoning.

Reference | Related Articles | Metrics

Abstract （399）

PDF （389）

Select

Survey on Solving Cold Start Problem in Recommendation Systems

MAO Qian, XIE Weicheng, QIAO Yitian, HUANG Xiaolong, DONG Gang

Journal of Frontiers of Computer Science and Technology 2024, 18 (5): 1197-1210. DOI: 10.3778/j.issn.1673-9418.2308044

Recommender systems provide important functions in areas such as dealing with data overload, providing personalized consulting services, and assisting clients in investment decisions. However, the cold start problem in recommender systems has always been in urgent need of solution and optimization. Based on this, this paper classifies the traditional methods and cutting-edge methods to solve the cold start problem, and expounds the research progress and excellent methods in recent years. Firstly, three traditional solutions to the cold start problem are summarized: recommendation based on content filtering, recommendation based on collaborative filtering, and hybrid recommendation. Secondly, the current cutting-edge recommendation algorithms to solve the cold start problem are summarized, and they are classified into the data-driven strategy and the method-driven strategy. The method-driven strategy is divided into algorithms based on meta-learning, algorithms based on context information and session str-ategy, algorithms based on random walk, algorithms based on heterogeneous graph information and attribute graph, and algorithms based on adversarial mechanism. According to the type of cold start problem, the algorithms are divided into two categories: new users and new items. Then, according to the particularity of the recommendation field, the cold start problem of the recommendation in the multimedia information field and the online e-commerce platform field is expounded. Finally, the possible research directions to solve the cold start problem in the future are summarized.

Reference | Related Articles | Metrics

Abstract （378）

PDF （273）

Select

Survey on Natural Scene Text Recognition Methods of Deep Learning

ZENG Fanzhi, FENG Wenjie, ZHOU Yan

Journal of Frontiers of Computer Science and Technology 2024, 18 (5): 1160-1181. DOI: 10.3778/j.issn.1673-9418.2306024

Natural scene text recognition holds significant value in both academic research and practical applications, making it one of the research hotspots in the field of computer vision. However, the recognition process faces challenges such as diverse text styles and complex background environments, leading to unsatisfactory efficiency and accuracy. Traditional text recognition methods based on manually designed features have limited representation capabilities, which are insufficient for effectively handling complex tasks in natural scene text recognition. In recent years, significant progress has been made in natural scene text recognition by adopting deep learning methods. This paper systematically reviews the recent research work in this area. Firstly, the natural scene text recognition methods are categorized into segmentation-based and non-segmentation-based approaches based on character segmentation required or not. The non-segmentation-based methods are further subdivided according to their technical implementation characteristics, and the working principles of the most representative methods in each category are described. Next, commonly used datasets and evaluation metrics are introduced, and the performance of various methods is compared on these datasets. The advantages and limitations of different approaches are discussed from multiple perspectives. Finally, the shortcomings and challenges are given, and the future development trends are also put forward.

Reference | Related Articles | Metrics

Abstract （345）

PDF （360）

Select

Deep Learning Compiler Load Balancing Optimization Method for Model Training

WANG Li, GAO Kai, ZHAO Yaqian, LI Rengang, CAO Fang, GUO Zhenhua

Journal of Frontiers of Computer Science and Technology 2024, 18 (1): 111-126. DOI: 10.3778/j.issn.1673-9418.2209026

For computing-intensive artificial intelligence (AI) training tasks, the computational graph is more complex, and data loading, task division of the computational graph, and load balancing of task scheduling have become the key factors affecting the computing performance. This paper proposes three optimization methods to make the task scheduling of model training in deep learning compilers reach the load balance state. Firstly, the load balance between CPU and back-end computing devices is realized by automatically establishing an efficient pipeline for data loading and model training, which improves the overall energy efficiency of the system. Secondly, the layered optimization technology of computational graph is used to realize the load balance of computational graph when the back-end devices are scheduling. Finally, this paper improves the resource utilization of back-end devices by automatically establishing efficient pipeline between layers. Experimental results show that the proposed optimization method achieves the system load balancing in the process of automatically mapping the training tasks to underlying hardware devices. Compared with traditional deep learning frameworks and compilers such as TensorFlow, nGraph, etc., this paper achieves 2%~10% performance improvement in the training of different AI models, and the overall power consumption of the training system can be reduced by more than 10%.

Reference | Related Articles | Metrics

Abstract （343）

PDF （502）

Select

Survey of Multi-task Recommendation Algorithms

WEN Minwei, MEI Hongyan, YUAN Fengyuan, ZHANG Xiaoyu, ZHANG Xing

Journal of Frontiers of Computer Science and Technology 2024, 18 (2): 363-377. DOI: 10.3778/j.issn.1673-9418.2303014

Single-task recommendation algorithms have problems such as sparse data, cold start and unstable recommendation effect. Multi-task recommendation algorithms can jointly model multiple types of user behaviour data and additional information, to better explore the user’s interests and needs in order to improve the recommendation effect and user satisfaction, which provides a new way of thinking to solve a series of problems existing in single-task recommendation algorithms. Firstly, the development background and trend of multi-task recommendation algorithms are sorted out. Secondly, the implementation steps of the multi-task recommendation algorithm and the construction principle are introduced, and the advantages of multi-task learning with data enhancement, feature identification, feature complementation and regularization effect are elaborated. Then, the application of multi-task learning methods in recommendation algorithms with different sharing models is introduced, and the advantages and disadvantages of some classical models and the relationship between tasks are summarized. Then, the commonly used datasets and evaluation metrics for multi-task recommendation algorithms are introduced, and the differences and connections with other recommendation algorithms in terms of dataset evaluation metrics are elaborated. Finally, it is pointed out that multi-task learning has shortcomings such as negative migration, parameter optimization conflicts, poor interpretability, etc., and an outlook is given to the combination of multi-task recommendation algorithms with reinforcement learning, convex function optimization methods, and heterogeneous information networks.

Reference | Related Articles | Metrics

Abstract （338）

PDF （386）

Select

Survey of AI Painting

ZHANG Zeyu, WANG Tiejun, GUO Xiaoran, LONG Zhilei, XU Kui

Journal of Frontiers of Computer Science and Technology 2024, 18 (6): 1404-1420. DOI: 10.3778/j.issn.1673-9418.2401075

AI painting, as a popular research direction in the field of computer vision, is expanding its application boundaries in the fields of art creation, film and media, industrial design, and art education through natural language processing, graphic pre-training models, and diffusion models. Two types of AI painting, namely, image-to-image and text-to-image, are taken as the main lines, and the representative models and their key technologies and methods are analyzed in depth. For the image-to-image, the development lineage, generation principle, and advantages and disadvantages of each model are explored from two types of models based on AE and GAN, and their effects on the public dataset are summarized. For the text-to-image, the structural differences of the three types of models based on diffusion model and other models, as well as the generation effects of various types of models on three datasets are summarized. It is pointed out that the text-to-image utilizing the diffusion model has become a hot topic nowadays, which predicts the diversified development of image generation in the future. And the current mainstream AI painting platforms are compared and summarized from the perspectives of usage and generation speed. Finally, on the basis of summarizing the problems and controversies faced by AI painting at the technical and social levels, future trends such as the complementary development of AI painting and human artists, the increased interactivity of the painting process, and the emergence of new professions and industries are envisioned.

Reference | Related Articles | Metrics

Abstract （337）

PDF （325）

Select

Survey of Image Adversarial Example Defense Techniques

LIU Ruiqi, LI Hu, WANG Dongxia, ZHAO Chongyang, LI Boyu

Journal of Frontiers of Computer Science and Technology 2023, 17 (12): 2827-2839. DOI: 10.3778/j.issn.1673-9418.2303080

The rapid and extensive growth of artificial intelligence introduces new security challenges. The generation and defense of adversarial examples for deep neural networks is one of the hot spots. Deep neural networks are most widely used in the field of images and most easily cheated by image adversarial examples. The research on the defense techniques for image adversarial examples is an important tool to improve the security of AI applications. There is no standard explanation for the existence of image adversarial examples, but it can be observed and understood from different dimensions, which can provide insights for proposing targeted defense approaches. This paper sorts out and analyzes current mainstream hypotheses of the reason for the existence of adversarial examples, such as the blind spot hypothesis, linear hypothesis, decision boundary hypothesis, and feature hypothesis, and the correlations between various hypotheses and typical adversarial example generation methods. Based on this, this paper summarizes the image adversarial example defense techniques in two dimensions, model-based and data-based, and compares and analyzes the adaptation scenarios, advantages and disadvantages of different technical methods. Most of the existing image adversarial example defense techniques are aimed at defending against specific adversarial example generation methods, and there is no universal defense theory and method yet. In the real application, it needs to consider the specific application scenarios, potential security risks and other factors, optimize and combine the configuration in the existing defense methods. Future researchers can deepen their technical research in terms of generalized defense theory, evaluation of defense effectiveness, and systematic protection strategies.

Reference | Related Articles | Metrics

Abstract （334）

PDF （260）

Select

Survey of Research on Construction Method of Industry Internet Security Knowledge Graph

CHANG Yu, WANG Gang, ZHU Peng, KONG Lingfei, HE Jingheng

Journal of Frontiers of Computer Science and Technology 2024, 18 (2): 279-300. DOI: 10.3778/j.issn.1673-9418.2304081

The industry Internet security knowledge graph plays an important role in enriching the semantic relationships of security concepts, improving the quality of the security knowledge base, and enhancing the ability to visualize and analyze the security situation. It has become the key to recognize, trace and protect against the attacks targeting new energy industry control systems. However, compared with the construction of the general domain knowledge graph, there are still many problems in each stage of the construction of the industry Internet security knowledge graph, which affect its practical application effect. This paper introduces the concept and significance of the industry Internet security knowledge graph and its difference from the general knowledge graph, summarizes the related work and role of the ontology construction of industry Internet security knowledge graph. Under the background of industry Internet security, it focuses on the related work of the three important components of knowledge graph construction, respectively named entity recognition, relationship extraction and reference resolution. For each component, it detailedly reports on the development history and research status of this component in the domain, and deeply analyses the domain challenges in this component, such as non-continuous entity recognition, candidate word extraction, the lack of domain-quality datasets and so on. It predicts the future research directions of this component, provides reference and enlightenment to further improve the quality and usefulness of industry Internet security knowledge graph, so as to deal with emerging threats and attacks more effectively.

Reference | Related Articles | Metrics

Abstract （313）

PDF （361）

Select

Dual Features Local-Global Attention Model with BERT for Aspect Sentiment Analysis

LI Jin, XIA Hongbin, LIU Yuan

Journal of Frontiers of Computer Science and Technology 2024, 18 (1): 205-216. DOI: 10.3778/j.issn.1673-9418.2210012

Aspect-based sentiment analysis aims to predict the sentiment polarity of a specific aspect in a sentence or document. Most of recent research uses attention mechanism to model the context. But there is a problem in that the context information needs to be considered according to different contexts when the BERT model is used to calculate the dependencies between representations to extract features by sentiment classification models, which leads to the lack of contextual knowledge of the modelled features. And the importance of aspect words is not given more attention, affecting the overall classification performance of the model. To address the problems above, this paper proposes a dual features local-global attention model with BERT (DFLGA-BERT). Local and global feature extraction modules are designed respectively to fully capture the semantic association between aspect words and context. Moreover, an improved quasi-attention mechanism is used in DFLGA-BERT, which leads to the model using minus attention in the fusion of attention to weaken the effect of noise on classification in the text. The feature fusion structure of local and global features is designed to better integrate regional and global features based on conditional layer normalization (CLN). Experiments are conducted on the SentiHood and SemEval 2014 Task 4 datasets. Experimental results show that the performance of the proposed model is significantly improved compared with the baselines after incorporating contextual features.

Reference | Related Articles | Metrics

Abstract （308）

PDF （270）

Select

Survey of Entity Relationship Extraction Methods in Knowledge Graphs

ZHANG Xishuo, LIU Lin, WANG Hailong, SU Guibin, LIU Jing

Journal of Frontiers of Computer Science and Technology 2024, 18 (3): 574-596. DOI: 10.3778/j.issn.1673-9418.2305019

Entity-relationship extraction has gained more and more attention from researchers as a basis for knowledge graph construction. Entity-relationship extraction can automatically and accurately obtain knowledge from a large amount of data, and represent and store it in a structured form. Therefore, the correctness of entity-relationship extraction directly affects the accuracy of knowledge graph construction and the effect of subsequent knowledge graph application. However, for different research hotspots such as complex structure, open domain, multi-language, multi-modal, small sample data, and joint extraction of entity-relationships, the existing entity-relationship extraction methods still have some limitations. Based on the current research hotspots of entity-relationship extraction, this paper tries to categorize entity-relationship extraction into six aspects: complex structure, open domain, multilingual, multimodal, small-sample data, and joint entity-relationship extraction, and categorizes each aspect according to the specific problems and lists out some solutions. Not only the current problems and solutions of each category are systematically sorted out, but the research results of each category are summarized, and the advantages and disadvantages of each method are analyzed in detail from the dimensions of quantitative analysis and qualitative analysis. Finally, the problems to be solved in the current hot areas are summarized, and the future development trend of entity-relationship extraction methods in the knowledge graph is also prospected.

Reference | Related Articles | Metrics

Abstract （306）

PDF （264）

Select

Knowledge Graph Completion Algorithm with Multi-view Contrastive Learning

QIAO Zifeng, QIN Hongchao, HU Jingjing, LI Ronghua, WANG Guoren

Journal of Frontiers of Computer Science and Technology 2024, 18 (4): 1001-1009. DOI: 10.3778/j.issn.1673-9418.2301038

Knowledge graph completion is a process of reasoning new triples based on existing entities and relations in knowledge graph. The existing methods usually use the encoder-decoder framework. Encoder uses graph convolutional neural network to get the embeddings of entities and relations. Decoder calculates the score of each tail entity according to the embeddings of the entities and relations. The tail entity with the highest score is the inference result. Decoder inferences triples independently, without consideration of graph information. Therefore, this paper proposes a graph completion algorithm based on contrastive learning. This paper adds a multi-view contrastive learning framework into the model to constrain the embedded information at graph level. The comparison of multiple views in the model constructs different distribution spaces for relations. Different distributions of relations fit each other, which is more suitable for completion tasks. Contrastive learning constraints the embedding vectors of entity and subgraph and enhahces peroformance of the task. Experiments are carried out on two datasets. The results show that MRR is improved by 12.6% over method A2N and 0.8% over InteractE on FB15k-237 dataset, and 7.3% over A2N and 4.3% over InteractE on WN18RR dataset. Experimental results demonstrate that this model outperforms other completion methods.

Reference | Related Articles | Metrics

Abstract （301）

PDF （322）

Select

Survey of Research on Personalized News Recommendation Approaches

MENG Xiangfu, HUO Hongjin, ZHANG Xiaoyan, WANG Wanchun, ZHU Jinxia

Journal of Frontiers of Computer Science and Technology 2023, 17 (12): 2840-2860. DOI: 10.3778/j.issn.1673-9418.2303026

Personalized news recommendation is an important technology to help users obtain the news information they are interested in and alleviate information overload. In recent years, with the development of information technology and society, personalized news recommendation has been increasingly widely studied, and has achieved remarkable success in improving the news reading experience of users. This paper aims to systematically summarize personalized news recommendation methods based on deep learning. Firstly, it introduces personalized news recommendation methods and analyzes their characteristics and influencing factors. Then, the overall framework of personalized news recommendation is given, and the personalized news recommendation methods based on deep learning are analyzed and summarized. On this basis, it focuses on personalized news recommendation methods based on graph structure learning, including user-news interaction graph, knowledge graph and social relationship graph. Finally, it analyzes the challenges of the current personalized news recommendation, discusses how to solve the problems of data sparsity, model interpretability, diversity of recommendation results and news privacy protection in personalized news recommendation system, and puts forward more specific and operable research ideas in the future research direction.

Reference | Related Articles | Metrics

Abstract （298）

PDF （187）

Select

Affection Enhanced Dual Graph Convolution Network for Aspect Based Sentiment Analysis

ZHANG Wenxuan, YIN Yanjun, ZHI Min

Journal of Frontiers of Computer Science and Technology 2024, 18 (1): 217-230. DOI: 10.3778/j.issn.1673-9418.2209033

Aspect-based sentiment analysis is a fine-grained sentiment classification task. In recent years, graph neural network on dependency tree has been used to model the dependency relationship between aspect terms and their opinion terms. However, such methods usually have the disadvantage of highly dependent on the quality of dependency parsing. Furthermore, most existing works focus on syntactic information, while ignoring the effect of affective knowledge in modeling the sentiment-related dependencies between specific aspects and context. In order to solve these problems, an affection enhanced dual graph convolution network is designed and proposed for aspect-based sentiment analysis. The model establishes a dual channel structure based on the dependency tree and attention mechanism, which can more accurately and efficiently capture the syntactic and semantic dependencies between aspects and contexts, and reduce the dependence of the model on the dependency tree. In addition, affective knowledge is integrated to enhance the graph structure and help the model better extract the sentiment-related dependencies of specific aspects. The accuracy of the model on the three open benchmark datasets Rest14, Lap14 and Twitter reaches 84.32%, 78.20% and 76.12% respectively, approaching or exceeding the state-of-the-art perfor-mance. Experiments show that the method proposed can make rational use of semantic and syntactic information, and achieves advanced sentiment classification performance with fewer parameters.

Reference | Related Articles | Metrics

Abstract （293）

PDF （162）

Select

Review of Research on Multi-agent Reinforcement Learning Algorithms

LI Mingyang, XU Ke’er, SONG Zhiqiang, XIA Qingfeng, ZHOU Peng

Journal of Frontiers of Computer Science and Technology 2024, 18 (8): 1979-1997. DOI: 10.3778/j.issn.1673-9418.2401020

In recent years, the technique of multi-agent reinforcement learning algorithm has been widely used in the field of artificial intelligence. This paper systematically analyses the multi-agent reinforcement learning algorithm, examines its application and progress in multi-agent systems, and explores the relevant research results in depth. Firstly, it introduces the research background and development history of multi-agent reinforcement learning and summarizes the existing relevant research results. Secondly, it briefly reviews the application of traditional reinforcement learning algorithms under different tasks. Then, it highlights the classification of multi-agent reinforcement learning algorithms and their application in multi-agent systems according to the three main types of tasks (path planning, pursuit and escape game, task allocation), challenges, and solutions. Finally, it explores the existing algorithm training environments in the field of multi-agents, summarizes the improvement of deep learning on multi-agent reinforcement learning algorithms, proposes challenges and looks forward to future research directions in this field.

Reference | Related Articles | Metrics

Abstract （293）

PDF （251）

Select

Multivariate Time Series Density Clustering Algorithm Using Shapelet Space

SHENG Jinchao, DU Mingjing, SUN Jiarui, LI Yurui

Journal of Frontiers of Computer Science and Technology 2024, 18 (2): 387-402. DOI: 10.3778/j.issn.1673-9418.2211099

Multivariate time series clustering has become an important research topic in the task of time series analysis. Compared with univariate time series, the research of multivariate time series is more complex and difficult. Although many clustering algorithms for multivariate time series have been proposed, these algorithms still have difficulties in solving the accuracy and interpretation at the same time. Firstly, most of the current work does not consider the length redundancy and variable correlation of multivariable time series, resulting in large errors in the final similarity matrix. Secondly, the data are commonly used in the clustering process with the division paradigm, when the numerical space presents a complex distribution, this idea does not perform well, and it does not have the explanatory power of each variable and space. To address the above problems, this paper proposes a multivariate time series adaptive weight density clustering algorithm using Shapelet (high information-rich continuous subsequence) space (MDCS). This algorithm firstly performs a Shapelet search for each variable, and obtains its own Shapelet space through an adaptive strategy. Then, it weights the numerical distribution generated by each variable to obtain a similarity matrix that is more consistent with the characteristics of data distribution. Finally, the data are finally allocated using the shared nearest neighbor density peak clustering algorithm with improved density calculation and secondary allocation. Experimental results on several real datasets demonstrate that MDCS has better clustering results compared with current state-of-the-art clustering algorithms, with an average increase of 0.344 and 0.09 in the normalized mutual information and Rand index, balancing performance and interpretability.

Reference | Related Articles | Metrics

Abstract （290）

PDF （928）

Select

Application Progress of Deep Learning in Imaging Examination of Breast Cancer

WANG Yifan, LIU Jing, MA Jingang, SHAO Runhua, CHEN Tianzhen, LI Ming

Journal of Frontiers of Computer Science and Technology 2024, 18 (2): 301-319. DOI: 10.3778/j.issn.1673-9418.2309033

Breast cancer is the most common malignant tumor in women and its early detection is decisive. Breast imaging plays an important role in early detection of breast cancer as well as monitoring and evaluation during treatment, but manual detection of medical images is usually time-consuming and labor-intensive. Recently, deep learning algorithms have made significant progress in early breast cancer diagnosis. By combing the relevant literature in recent years, a systematic review of the application of deep learning techniques in breast cancer diagnosis with different imaging modalities is conducted, aiming to provide a reference for in-depth research on deep learning-based breast cancer diagnosis. Firstly, four breast cancer imaging modalities, namely mammography, ultrasonography, magnetic resonance imaging and positron emission tomography, are outlined and briefly compared, and the public datasets corresponding to multiple imaging modalities are listed. Focusing on the different tasks (lesion detection, segmentation and classification) of deep learning architectures based on the above four different imaging modalities, a systematic review of the algorithms is conducted, and the performance of each algorithm, improvement ideas, and their advantages and disadvantages are compared and analyzed. Finally, the problems of the existing techniques are analyzed and the future development direction is prospected with respect to the limitations of the current work.

Reference | Related Articles | Metrics

Abstract （277）

PDF （249）

Select

Green Supply Chain Emission Reduction Strategies and Smart Contracts Under Blockchain Technology

WANG Xin, WANG Yasheng, ZHANG Shuhua, WANG Xinyu, XU Shuai

Journal of Frontiers of Computer Science and Technology 2024, 18 (1): 265-278. DOI: 10.3778/j.issn.1673-9418.2302003

Under the background of “double carbon”, the concept of green consumption has been deeply rooted in the hearts of people. However, consumers do not fully trust the greenness of products. The information transparency and traceability mechanism of blockchain technology can well dispel consumers?? doubts. Introducing blockchain technology into traditional green supply chains, considering consumer green preferences and green trust, a game model is constructed among members of the green supply chain before and after the application of blockchain technology, as well as under different power structures, to quantitatively study their emission reduction and pricing strategies, and to explore how to achieve optimal consumer surplus and total social welfare. On the basis of adopting blockchain technology, smart wholesale price contracts and cost sharing smart contracts are designed, and the reasonable range of smart wholesale prices and the optimal cost sharing ratio are calculated, to improve enterprise operational efficiency and achieve supply chain coordination. The results show that when consumers?? green preference is high, the use of blockchain can bring more benefits to all participants in the supply chain. At the same time, the higher the willingness of consumers to buy green products, the greater the benefits. Through numerical analysis, it is found that the smart contracts can better coordinate the supply chain in the case of retailers leading the supply chain. Finally, the validity of the relevant conclusions is verified through empirical cases.

Reference | Related Articles | Metrics

Abstract （274）

PDF （274）

Select

Small Object Detection Based on Two-Stage Calculation Transformer

XU Shoukun, GU Jianan, ZHUANG Lihua, LI Ning, SHI Lin, LIU Yi

Journal of Frontiers of Computer Science and Technology 2023, 17 (12): 2967-2983. DOI: 10.3778/j.issn.1673-9418.2210120

Despite the current small object detection task has achieved significant improvements, it still suffers from some problems. For example, it is a challenge to extract small object features because of little information in the scene of small objects, which may lose the original feature information of small object, resulting in poor detection results. To address this problem, this paper proposes a two-stage calculation Transformer (TCT) based small object detection network. Firstly, a two-stage calculation Transformer is embedded in the backbone feature extraction network for feature enhancement. Based on the traditional Transformer values computation, multiple 1D dilated convolutional layer branches with different feature fusions are utilized to implement global self-attention for the purpose of improving the feature representation and information interaction. Secondly, this paper proposes an effective residual connection module to improve the low-efficiency convolution and activation of the current CSPLayer, which helps to advance the information flow and learn more rich contextual details. Finally, this paper proposes a feature fusion and refinement module for fusing multi-scale features and improving the target feature representation capability. Quantitative and qualitative experiments on PASCAL VOC2007+2012 dataset, COCO2017 dataset and TinyPerson dataset show that the proposed algorithm has better ability of target feature extraction and higher detection accuracy for small target detection, compared with YOLOX.

Reference | Related Articles | Metrics

Abstract （269）

PDF （291）

Select

Named Entity Recognition Model Based on k-best Viterbi Decoupling Knowledge Distillation

ZHAO Honglei, TANG Huanling, ZHANG Yu, SUN Xueyuan, LU Mingyu

Journal of Frontiers of Computer Science and Technology 2024, 18 (3): 780-794. DOI: 10.3778/j.issn.1673-9418.2211052

Knowledge distillation is a general approach to improve the performance of the named entity recognition (NER) models. However, the classical knowledge distillation loss functions are coupled, which leads to poor logit distillation. In order to decouple and effectively improve the performance of logit distillation, this paper proposes an approach, k-best Viterbi decoupling knowledge distillation (kvDKD), which combines k-best Viterbi decoding to improve the computational efficiency, effectively improving the model performance. Additionally, the NER based on deep learning is easy to introduce noise in data augmentation. Therefore, a data augmentation method combining data filtering and entity rebalancing algorithm is proposed, aiming to reduce noise introduced by the original dataset and to enhance the problem of mislabeled data, which can improve the quality of data and reduce overfitting. Based on the above method, a novel named entity recognition model NER-kvDKD (named entity recognition model based on k-best Viterbi decoupling knowledge distillation) is proposed. The comparative experimental results on the datasets of MSRA, Resume, Weibo, CLUENER and CoNLL-2003 show that the proposed method can improve the generalization ability of the model and also effectively improves the student model performance.

Reference | Related Articles | Metrics

Abstract （262）

PDF （192）

Select

Feature Refinement and Multi-scale Attention for Transformer Image Denoising Network

YUAN Heng, GENG Yikun

Journal of Frontiers of Computer Science and Technology 2024, 18 (7): 1838-1851. DOI: 10.3778/j.issn.1673-9418.2308091

In order to enhance the relevance of global context information, strengthen the attention to multi-scale features, improve the image denoising effect while preserving the details to the greatest extent, a Transformer based feature refinement and multi-scale attention image denoising network (TFRADNet) is proposed. The network not only uses Transformer in the codec part to solve the long-term dependence problem of large-scale images and improve the efficiency of model noise reduction, but also adds a position awareness layer after the up-sampling operation to enhance the network’s perception ability of pixel positions in the feature map. To cope with Transformer’s neglect of spatial relationships among pixels, which may result in local detail distortion, a feature refinement block (FRB) is designed at feature reconstruction stage. A serial structure is used to introduce nonlinear transformations layer by layer, to enhance the recognition of local image features with complex noise levels. Meanwhile, a multi-scale attention block (MAB) is designed, which adopts a parallel double-branch structure to jointly model spatial attention and channel attention, effectively capturing and weighting image features of different scales, and improving the model’s perception ability of multi-scale features. Experimental results on real noise datasets SIDD, DND and RNI15 show that TFRADNet can take into account global information and local details, and has stronger noise suppression ability and robustness than other advanced methods.

Reference | Related Articles | Metrics

Abstract （258）

PDF （260）

Most Read articles