Most Download articles


    Most Downloaded in Recent Month
    Review of Deep Learning Applied to Time Series Prediction
    LIANG Hongtao, LIU Shuo, DU Junwei, HU Qiang, YU Xu
    Journal of Frontiers of Computer Science and Technology    2023, 17 (6): 1285-1300.   DOI: 10.3778/j.issn.1673-9418.2211108
    A time series is generally a set of random variables observed and collected at a certain frequency in the course of something's development. The task of time series forecasting is to extract the core patterns from a large amount of data and to make accurate estimates of future data based on known factors. With the deployment of large numbers of IoT data-collection devices, the explosive growth of multidimensional data, and increasingly demanding requirements for prediction accuracy, it is difficult for classical parametric models and traditional machine learning algorithms to meet the efficiency and accuracy requirements of prediction tasks. In recent years, deep learning algorithms represented by convolutional neural networks, recurrent neural networks and Transformer models have achieved fruitful results in time series forecasting tasks. To further promote the development of time series prediction technology, common characteristics of time series data, datasets and model evaluation indexes are reviewed, and the characteristics, advantages and limitations of each prediction algorithm are experimentally compared and analyzed, with time and algorithm architecture as the main research line. Several Transformer-based time series prediction methods are highlighted and compared. Finally, according to the problems and challenges of deep learning applied to time series prediction tasks, this paper provides an outlook on future research trends in this direction.
    Abstract views: 3523 | PDF downloads: 3931
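Whatever model sits on top, almost all of the surveyed approaches share the same supervised framing: slide a fixed window over the series and regress the next value from the window. A minimal dependency-free sketch of that framing (the windowing helper and the linear least-squares baseline are illustrative, not from the survey):

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Frame a 1-D series as supervised (X, y) pairs: each row of X holds
    `window` past values, y the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])
        y.append(series[t + window + horizon - 1])
    return np.array(X), np.array(y)

def fit_ar(X, y):
    """Toy linear autoregressive baseline fit by least squares."""
    coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
    return coef

# A sinusoid satisfies an exact linear recurrence, so the AR fit is near-perfect.
series = np.sin(np.linspace(0, 20, 200))
X, y = make_windows(series, window=8)
coef = fit_ar(X, y)
pred = np.c_[X, np.ones(len(X))] @ coef
```

Deep models replace the linear map `fit_ar` with a learned nonlinear one, but consume the same windowed `(X, y)` pairs.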
    Research on Question Answering System on Joint of Knowledge Graph and Large Language Models
    ZHANG Heyi, WANG Xin, HAN Lifan, LI Zhao, CHEN Zirui, CHEN Zhe
    Journal of Frontiers of Computer Science and Technology    2023, 17 (10): 2377-2388.   DOI: 10.3778/j.issn.1673-9418.2308070
    The large language model (LLM), exemplified by ChatGPT, has shown outstanding performance in understanding and responding to human instructions, and has had a profound impact on natural language question answering (Q&A). However, for lack of training in vertical fields, the performance of LLMs in such fields is not ideal. In addition, owing to their high hardware requirements, training and deploying an LLM remains difficult. To address these challenges, this paper takes the application of traditional Chinese medicine formulas as an example, collects domain-related data and preprocesses them. Based on an LLM and a knowledge graph, a vertical-domain Q&A system is designed. The system has the following capabilities: (1) Information filtering: filter out vertical-domain questions and pass them to the LLM to answer. (2) Professional Q&A: generate answers with more professional knowledge based on the LLM and a self-built knowledge base; compared with fine-tuning on professional data, this technique can deploy large vertical-domain models without retraining. (3) Extraction and conversion: by strengthening the information extraction ability of the LLM and utilizing its generated natural language responses, structured knowledge is extracted and matched with a professional knowledge graph for professional verification; structured knowledge can in turn be transformed into readable natural language, achieving a deep integration of large models and knowledge graphs. Finally, the effectiveness of the system is demonstrated, and its performance is verified from both subjective and objective perspectives through two experiments: subjective evaluation by experts and objective evaluation on multiple-choice questions.
    Abstract views: 2675 | PDF downloads: 2581
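The routing behavior described in capability (1) and the knowledge-base lookup behind (2) can be sketched schematically; every keyword, entity and triple below is a made-up placeholder, not the paper's data or implementation:

```python
# Illustrative placeholders only -- not the paper's keyword list or KG.
DOMAIN_KEYWORDS = {"formula", "herb", "decoction"}

KG = {("Guizhi Tang", "contains"): ["cinnamon twig", "peony root"]}

def in_domain(question):
    """Capability (1): keep only vertical-domain questions."""
    q = question.lower()
    return any(k in q for k in DOMAIN_KEYWORDS)

def answer(question):
    """Capability (2), schematically: prefer a knowledge-graph match,
    otherwise defer to the (not shown) LLM."""
    if not in_domain(question):
        return "out of scope"
    q = question.lower()
    for (entity, rel), objs in KG.items():
        if entity.lower() in q:
            return f"{entity} {rel}: {', '.join(objs)}"
    return "no KG match; fall back to LLM"
```

In the real system the KG match also serves the verification role described in capability (3): structured knowledge extracted from the LLM's response is checked against the graph.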
    Survey on Deep Learning in Oriented Object Detection in Remote Sensing Images
    LAN Xin, WU Song, FU Boyi, QIN Xiaolin
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 861-877.   DOI: 10.3778/j.issn.1673-9418.2308031
    Objects in remote sensing images have arbitrary orientations and dense arrangements, and thus can be located and separated more precisely by using inclined bounding boxes in the object detection task. Nowadays, oriented object detection in remote sensing images is widely applied in both civil and military defense fields, is of great significance in research and application, and has gradually become a research hotspot. This paper provides a systematic summary of oriented object detection methods in remote sensing images. Firstly, three widely used representations of inclined bounding boxes are summarized. Then, the main challenges faced in supervised learning are elaborated from four aspects: feature misalignment, boundary discontinuity, inconsistency between metric and loss, and oriented object location. Next, according to the motivations and improvement strategies of different methods, the main ideas, advantages and disadvantages of each algorithm are analyzed in detail, and the overall framework of oriented object detection in remote sensing images is summarized. Furthermore, the commonly used oriented object detection datasets in the remote sensing field are introduced. Experimental results of classical methods on different datasets are given, and the performance of different methods is evaluated. Finally, according to the challenges of deep learning applied to oriented object detection in remote sensing images, future research trends in this direction are prospected.
    Abstract views: 299 | PDF downloads: 370
    Survey of Causal Inference for Knowledge Graphs and Large Language Models
    LI Yuan, MA Xinyu, YANG Guoli, ZHAO Huiqun, SONG Wei
    Journal of Frontiers of Computer Science and Technology    2023, 17 (10): 2358-2376.   DOI: 10.3778/j.issn.1673-9418.2307065
    In recent decades, causal inference has been a significant research topic in various fields, including statistics, computer science, education, public policy, and economics. Most causal inference methods focus on the analysis of sample observational data and text corpora. However, with the emergence of various knowledge graphs and large language models, causal inference tailored to knowledge graphs and large models has gradually become a research hotspot. In this paper, different causal inference methods are classified based on their orientation towards sample observational data, text data, knowledge graphs, and large language models. Within each classification, this paper provides a detailed analysis of classical research works, including their problem definitions, solution methods, contributions, and limitations. Additionally, this paper places particular emphasis on discussing recent advancements in the integration of causal inference methods with knowledge graphs and large language models. Various causal inference methods are analyzed and compared from the perspectives of efficiency and cost, and specific applications of knowledge graphs and large language models in causal inference tasks are summarized. Finally, future development directions of causal inference in combination with knowledge graphs and large models are prospected.
    Abstract views: 1050 | PDF downloads: 1214
    Multi-strategy Improved Dung Beetle Optimizer and Its Application
    GUO Qin, ZHENG Qiaoxian
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 930-946.   DOI: 10.3778/j.issn.1673-9418.2308020
    Dung beetle optimizer (DBO) is an intelligent optimization algorithm proposed in recent years. Like other optimization algorithms, DBO has disadvantages such as low convergence accuracy and a tendency to fall into local optima. A multi-strategy improved dung beetle optimizer (MIDBO) is proposed. Firstly, it improves the acceptance of local and global optimal solutions by brood balls and thieves, so that the beetles can adjust dynamically according to their own searching ability, which not only improves population quality but also maintains the good searching ability of individuals with high fitness. Secondly, the follower position-updating mechanism of the sparrow search algorithm is integrated to perturb the algorithm, and a greedy strategy is used to update locations, which improves convergence accuracy. Finally, when the algorithm stagnates, a Cauchy-Gaussian mutation strategy is introduced to improve its ability to jump out of local optima. Simulation experiments on 20 benchmark test functions and the CEC2019 test functions verify the effectiveness of the three improvement strategies. Convergence analysis of the optimization results of the improved algorithm and the comparison algorithms, together with the Wilcoxon rank-sum test, shows that MIDBO has good optimization performance and robustness. The validity and reliability of MIDBO in solving practical engineering problems are further verified by applying it to an automobile collision optimization problem.
    Abstract views: 217 | PDF downloads: 239
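The stagnation-escape step described above can be made concrete. Below is a hedged sketch of a Cauchy-Gaussian mutation with a greedy acceptance rule; the time-varying mixing weights are this sketch's assumption, not necessarily MIDBO's exact schedule:

```python
import numpy as np

def cauchy_gaussian_mutate(best, sigma, t, t_max, rng):
    """Mix heavy-tailed Cauchy noise (good for escaping local optima) with
    Gaussian noise (fine local search); the mix shifts toward Gaussian as
    iterations t approach t_max."""
    lam_c = 1 - t / t_max                       # Cauchy weight, decays
    lam_g = t / t_max                           # Gaussian weight, grows
    noise = lam_c * rng.standard_cauchy(best.shape) + lam_g * rng.standard_normal(best.shape)
    return best * (1 + sigma * noise)

def greedy_update(x, x_new, f):
    """Greedy strategy: keep the mutant only if it improves fitness."""
    return x_new if f(x_new) < f(x) else x

# Toy run on the sphere function: fitness can never increase under greedy_update.
f = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(42)
x = np.full(3, 2.0)
history = [f(x)]
for t in range(1, 101):
    x = greedy_update(x, cauchy_gaussian_mutate(x, 0.3, t, 100, rng), f)
    history.append(f(x))
```

The greedy rule guarantees the fitness trace is monotone non-increasing, which is exactly why it raises convergence accuracy at the cost of extra evaluations.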
    Review of Research on 3D Reconstruction of Dynamic Scenes
    SUN Shuifa, TANG Yongheng, WANG Ben, DONG Fangmin, LI Xiaolong, CAI Jiacheng, WU Yirong
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 831-860.   DOI: 10.3778/j.issn.1673-9418.2305016
    As static scene 3D reconstruction algorithms mature, dynamic scene 3D reconstruction has become a hot and challenging research topic in recent years. Existing static scene 3D reconstruction algorithms achieve good results for stationary objects; however, when objects in the scene undergo deformation or relative motion, the reconstruction results are not ideal, so research on 3D reconstruction of dynamic scenes is essential. This paper first introduces the related concepts and basic knowledge of 3D reconstruction, as well as the research classification and current status of static and dynamic scene 3D reconstruction. Then, the latest research progress on dynamic scene 3D reconstruction is comprehensively summarized, and the reconstruction algorithms are classified into dynamic 3D reconstruction based on RGB data sources and dynamic 3D reconstruction based on RGB-D data sources. RGB-based methods can be further divided into template-based, non-rigid structure-from-motion-based, and learning-based dynamic 3D reconstruction. For RGB-D data sources, learning-based dynamic 3D reconstruction is mainly summarized, with typical examples introduced and compared. The applications of dynamic scene 3D reconstruction in medicine, intelligent manufacturing, virtual and augmented reality, and transportation are also discussed. Finally, future research directions for dynamic scene 3D reconstruction are proposed, and an outlook on the research progress in this rapidly developing field is presented.
    Abstract views: 402 | PDF downloads: 392
    Survey on Sequence Data Augmentation
    GE Yizhou, XU Xiang, YANG Suorong, ZHOU Qing, SHEN Furao
    Journal of Frontiers of Computer Science and Technology    2021, 15 (7): 1207-1219.   DOI: 10.3778/j.issn.1673-9418.2012062

    To pursue higher accuracy, deep learning models have become more and more complex, with deeper and deeper networks. The increase in the number of parameters means that more data are needed to train the model. However, manually labeling data is costly, and in some specific fields data collection is constrained by objective factors, so data insufficiency is a very common problem. Data augmentation alleviates this problem by artificially generating new data. The success of data augmentation in computer vision has led people to consider similar methods for sequence data. In this paper, not only time-domain methods such as flipping and cropping but also augmentation methods in the frequency domain are described. In addition to experience-based or knowledge-based methods, detailed descriptions of machine learning models used for automatic data generation, such as GAN, are also included. Methods that have been widely applied to various sequence data such as text, audio and time series are covered, along with their satisfactory performance in problems like medical diagnosis and emotion classification. Despite the difference in data types, these methods are designed with similar ideas. Using these ideas as a clue, various data augmentation methods applied to different types of sequence data are introduced, and some discussions and prospects are given.

    Abstract views: 2385 | PDF downloads: 2372
    Deep Learning-Based Infrared and Visible Image Fusion: A Survey
    WANG Enlong, LI Jiawei, LEI Jia, ZHOU Shihua
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 899-915.   DOI: 10.3778/j.issn.1673-9418.2306061
    How to preserve the complementary information of multiple images so as to represent the scene in a single image is a challenging topic, and various image fusion methods have been proposed around it. As an important branch of image fusion, infrared and visible image fusion (IVIF) has a wide range of applications in segmentation, target detection and military reconnaissance. In recent years, deep learning has led the development of image fusion, and researchers have explored IVIF using deep learning. Relevant experimental work has shown that applying deep learning to IVIF has significant advantages over traditional methods. This paper provides a detailed analysis of advanced deep learning algorithms for IVIF. Firstly, it reports on the current research status from the aspects of network architecture, method innovation, and limitations. Secondly, it introduces the commonly used datasets in IVIF methods and gives the definitions of the evaluation metrics commonly used in quantitative experiments. Qualitative and quantitative evaluation experiments on fusion and segmentation, as well as fusion efficiency analysis experiments, are conducted on representative methods to comprehensively evaluate their performance. Finally, conclusions are drawn and possible future research directions in the field are prospected.
    Abstract views: 453 | PDF downloads: 436
    Review on Multi-label Classification
    LI Dongmei, YANG Yu, MENG Xianghao, ZHANG Xiaoping, SONG Chao, ZHAO Yufeng
    Journal of Frontiers of Computer Science and Technology    2023, 17 (11): 2529-2542.   DOI: 10.3778/j.issn.1673-9418.2303082
    Multi-label classification refers to the classification problem where multiple labels may coexist in a single sample. It has been widely applied in fields such as text classification, image classification, music and video classification. Unlike traditional single-label classification problems, multi-label classification problems become more complex due to the possible correlation or dependence among labels. In recent years, with the rapid development of deep learning technology, many multi-label classification methods combined with deep learning have gradually become a research hotspot. Therefore, this paper summarizes the multi-label classification methods from the traditional and deep learning-based perspectives, and analyzes the key ideas, representative models, and advantages and disadvantages of each method. In traditional multi-label classification methods, problem transformation methods and algorithm adaptation methods are introduced. In deep learning-based multi-label classification methods, the latest multi-label classification methods based on Transformer are reviewed particularly, which have become one of the mainstream methods to solve multi-label classification problems. Additionally, various multi-label classification datasets from different domains are introduced, and 15 evaluation metrics for multi-label classification are briefly analyzed. Finally, future work is discussed from the perspectives of multi-modal data multi-label classification, prompt learning-based multi-label classification, and imbalanced data multi-label classification, in order to further promote the development and application of multi-label classification.
    Abstract views: 900 | PDF downloads: 824
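The problem-transformation family mentioned above is easiest to see in its simplest member, binary relevance: decompose the multi-label task into one independent binary problem per label. A dependency-free sketch, with a nearest-centroid rule standing in for any real base classifier:

```python
import numpy as np

class BinaryRelevance:
    """Problem-transformation baseline: one independent binary classifier
    per label. A nearest-centroid rule keeps the sketch dependency-free."""
    def fit(self, X, Y):
        self.centroids = []
        for j in range(Y.shape[1]):
            pos = X[Y[:, j] == 1].mean(axis=0)   # centroid of positives for label j
            neg = X[Y[:, j] == 0].mean(axis=0)   # centroid of negatives for label j
            self.centroids.append((pos, neg))
        return self

    def predict(self, X):
        Y = np.zeros((len(X), len(self.centroids)), dtype=int)
        for j, (pos, neg) in enumerate(self.centroids):
            closer_pos = np.linalg.norm(X - pos, axis=1) < np.linalg.norm(X - neg, axis=1)
            Y[:, j] = closer_pos.astype(int)
        return Y

# Toy data: label 0 fires on the first feature, label 1 on the second.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
pred = BinaryRelevance().fit(X, Y).predict(X)
```

The known weakness this sketch makes visible is exactly the one the review discusses: each label is predicted independently, so correlations between labels are ignored.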
    Review of Attention Mechanisms in Image Processing
    QI Xuanhao, ZHI Min
    Journal of Frontiers of Computer Science and Technology    2024, 18 (2): 345-362.   DOI: 10.3778/j.issn.1673-9418.2305057
    The attention mechanism has become one of the popular and important techniques of deep learning in image processing, and is widely used in various deep learning models because of its plug-and-play convenience. By weighting the input features, the attention mechanism focuses the model's attention on the most important regions to improve the accuracy and performance of image processing tasks. Firstly, this paper divides the development of the attention mechanism into four stages, and on this basis reviews and summarizes the research status and progress of four categories: channel attention, spatial attention, mixed channel-spatial attention, and self-attention. Secondly, it provides a detailed discussion of the core ideas, key structures and concrete implementations of attention mechanisms, and further summarizes the advantages and disadvantages of the models used. Finally, by comparing the current mainstream attention mechanisms and analyzing the results, this paper discusses the open problems of attention mechanisms in image processing at this stage and provides an outlook on their future development, so as to provide references for further research.
    Abstract views: 477 | PDF downloads: 361
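Of the four categories reviewed, channel attention is the most compact to illustrate: squeeze each channel to a descriptor, excite it through a small bottleneck, and reweight the channels. An SE-style sketch in plain NumPy, with randomly initialized weights standing in for learned ones:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map:
    global-average-pool each channel, pass through a two-layer bottleneck,
    sigmoid-gate, then reweight the channels."""
    squeeze = x.mean(axis=(1, 2))                # (C,) channel descriptors
    hidden = np.maximum(squeeze @ w1, 0.0)       # ReLU bottleneck (C -> C//r)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # (C,) weights in (0, 1)
    return x * gate[:, None, None]               # broadcast over H and W

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))     # C=16 channels
w1 = rng.standard_normal((16, 4))       # reduction ratio r=4
w2 = rng.standard_normal((4, 16))
out = channel_attention(x, w1, w2)
```

Because the sigmoid gate lies strictly in (0, 1), the block can only attenuate channels, never amplify them; spatial attention applies the same idea over positions instead of channels.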
    Review of Research on Rolling Bearing Health Intelligent Monitoring and Fault Diagnosis Mechanism
    WANG Jing, XU Zhiwei, LIU Wenjing, WANG Yongsheng, LIU Limin
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 878-898.   DOI: 10.3778/j.issn.1673-9418.2307005
    As one of the most critical and failure-prone parts of the mechanical systems of industrial equipment, bearings are subjected to high loads for long periods of time. When they fail or wear irreversibly, they may cause accidents and even huge economic losses. Therefore, effective health monitoring and fault diagnosis are of great significance to ensure the safe and stable operation of industrial equipment. To further promote the development of bearing health monitoring and fault diagnosis technology, the existing models and methods are analyzed and summarized, and the existing techniques are categorized and compared. Starting from the distribution of the vibration signal data used, the relevant methods under uniform data distribution are first sorted out; the current research is classified, analyzed and summarized mainly according to signal-based and data-driven approaches, and the shortcomings of fault detection methods in this setting are outlined. Secondly, considering the problem of uneven data acquisition under actual working conditions, the detection methods for such cases are summarized, and the processing techniques in existing research are classified into data processing methods, feature extraction methods, and model improvement methods according to their different focuses, with the remaining problems analyzed and summarized. Finally, the challenges and future development directions of bearing fault detection in industrial equipment are summarized and prospected.
    Abstract views: 468 | PDF downloads: 232
    Few-Shot Image Classification Method with Feature Maps Enhancement Prototype
    XU Huajie, LIANG Shuwei
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 990-1000.   DOI: 10.3778/j.issn.1673-9418.2302015
    Due to the scarcity of labeled samples, the class prototype obtained from support-set samples can hardly represent the real distribution of the whole class in metric-based few-shot image classification methods. Meanwhile, samples of the same class may differ greatly in many aspects, and large intra-class bias may make sample features deviate from the class center. Aiming at these problems, which may seriously affect performance, a few-shot image classification method with feature maps enhancement prototype (FMEP) is proposed. Firstly, some features of the query-set feature maps that are similar to the class prototype are selected by cosine similarity and added to the prototype to obtain a more representative prototype. Secondly, similar features of the query set are aggregated to alleviate the problem caused by large intra-class bias, making the feature distribution of the same class more compact. Finally, the enhanced prototypes and the aggregated features, both closer to the real distribution, are compared to obtain better results. The proposed method is tested on four commonly used few-shot classification datasets, namely MiniImageNet, TieredImageNet, CUB-200 and CIFAR-FS. The results show that it not only improves the performance of the baseline model but also outperforms methods of the same type.
    Abstract views: 143 | PDF downloads: 172
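The prototype-enhancement step can be sketched in a few lines; the 50/50 averaging rule below is this sketch's simplification, not FMEP's exact formulation:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def enhance_prototype(proto, query_feats, k=2):
    """Pick the k query features most cosine-similar to the class prototype
    and average them into it, pulling the prototype toward the true class
    distribution. The 50/50 averaging rule is this sketch's choice."""
    sims = np.array([cosine_sim(proto, q) for q in query_feats])
    top = np.argsort(sims)[-k:]                  # indices of the k best matches
    return (proto + query_feats[top].mean(axis=0)) / 2

proto = np.array([1.0, 0.0])                     # class prototype from the support set
queries = np.array([[2.0, 0.1], [0.9, 0.0], [-1.0, 5.0]])
new_proto = enhance_prototype(proto, queries, k=2)
```

In this toy run the dissimilar query `[-1, 5]` is excluded, so the enhanced prototype stays aligned with the class direction while absorbing the two matching query features.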
    Survey of Deep Learning Based Multimodal Emotion Recognition
    ZHAO Xiaoming, YANG Yijiao, ZHANG Shiqing
    Journal of Frontiers of Computer Science and Technology    2022, 16 (7): 1479-1503.   DOI: 10.3778/j.issn.1673-9418.2112081

    Multimodal emotion recognition aims to recognize human emotional states through different modalities related to human emotion expression, such as audio, vision and text. This topic is of great importance in the fields of human-computer interaction, artificial intelligence and affective computing, and has attracted much attention. In view of the great success of deep learning methods in various tasks in recent years, a variety of deep neural networks have been used to learn high-level emotional feature representations for multimodal emotion recognition. In order to systematically summarize the research advances of deep learning methods in this field, this paper presents a comprehensive analysis and summarization of recent multimodal emotion recognition literature based on deep learning. First, the general framework of multimodal emotion recognition is given, and the commonly used multimodal emotion datasets are introduced. Then, the principles of representative deep learning techniques and their recent advances are briefly reviewed. Subsequently, the paper focuses on the advances of two key steps in multimodal emotion recognition: emotional feature extraction related to audio, vision, text, etc., including hand-crafted and deep feature extraction; and multimodal information fusion strategies integrating different modalities. Finally, the challenges and opportunities in this field are analyzed, and future development directions are pointed out.

    Abstract views: 1429 | PDF downloads: 1157 | HTML views: 982
    Review of Image Super-resolution Reconstruction Algorithms Based on Deep Learning
    YANG Caidong, LI Chengyang, LI Zhongbo, XIE Yongqiang, SUN Fangwei, QI Jin
    Journal of Frontiers of Computer Science and Technology    2022, 16 (9): 1990-2010.   DOI: 10.3778/j.issn.1673-9418.2202063

    The essence of image super-resolution reconstruction technology is to break through the limitation of hardware conditions and reconstruct a high-resolution image from a low-resolution image that contains less information. With the development of deep learning, it has been introduced into the image super-resolution reconstruction field. This paper summarizes deep learning-based image super-resolution reconstruction algorithms, and classifies, analyzes and compares the typical ones. Firstly, the model framework, upsampling methods, nonlinear mapping learning modules and loss functions of single-image super-resolution reconstruction methods are introduced in detail. Secondly, reference-based super-resolution reconstruction methods are analyzed from two aspects: pixel alignment and patch matching. Then, the benchmark datasets and image quality evaluation indices used for image super-resolution reconstruction algorithms are summarized, and the characteristics and performance of the typical super-resolution reconstruction algorithms are compared and analyzed. Finally, future research trends in deep learning-based image super-resolution reconstruction are prospected.

    Abstract views: 1550 | PDF downloads: 847 | HTML views: 383
    Survey of Research on Image Inpainting Methods
    LUO Haiyin, ZHENG Yuhui
    Journal of Frontiers of Computer Science and Technology    2022, 16 (10): 2193-2218.   DOI: 10.3778/j.issn.1673-9418.2204101

    Image inpainting refers to restoring the pixels in damaged areas of an image so that they are as consistent as possible with the original image. It is not only crucial in computer vision tasks but also serves as an important cornerstone of other image processing tasks. However, comprehensive reviews of image inpainting are relatively few. In order to better study and promote research on image inpainting, the classic image inpainting algorithms and representative deep learning inpainting methods of the past ten years are reviewed and analyzed. Firstly, the classical traditional image inpainting methods are briefly summarized and divided into partial differential equation-based and sample-based methods, and the limitations of traditional methods are further analyzed. Deep learning image inpainting methods are divided into single image inpainting and pluralistic image inpainting according to the number of output images of the model, and different methods are analyzed and summarized in combination with their application images, loss functions, types, advantages and limitations. After that, the commonly used datasets and quantitative evaluation indicators of image inpainting methods are described in detail, and quantitative results of inpainting methods on damaged regions of different sizes across different image datasets are given; according to these data, the performance of deep learning-based image inpainting methods is compared and analyzed. Finally, the limitations of existing image inpainting methods are summarized, and new ideas and prospects for future key research directions are proposed.

    Abstract views: 1797 | PDF downloads: 1135 | HTML views: 528
    Survey on 3D Reconstruction Methods Based on Visual Deep Learning
    LI Mingyang, CHEN Wei, WANG Shanshan, LI Jie, TIAN Zijian, ZHANG Fan
    Journal of Frontiers of Computer Science and Technology    2023, 17 (2): 279-302.   DOI: 10.3778/j.issn.1673-9418.2205054
    In recent years, 3D reconstruction, as one of the important tasks of computer vision, has received extensive attention. This paper focuses on recent progress in using deep learning to reconstruct the 3D shape of general objects. Following the pipeline of deep learning-based 3D reconstruction, methods are divided, according to the data representation used in the reconstruction process, into voxel-based, point cloud-based, surface mesh-based and implicit surface-based approaches. Then, according to the number of input 2D images, they can be divided into single-view and multi-view 3D reconstruction, which are subdivided by the network architecture and training mechanism they use. While the research progress of each category is discussed, the development prospects, advantages and disadvantages of each training method are analyzed. This paper examines new hotspots in specific 3D reconstruction fields in recent years, such as 3D reconstruction of dynamic human bodies and 3D completion of incomplete geometric data, compares key papers and summarizes the open problems in these fields. Then the key application scenarios and parameters of 3D datasets at this stage are introduced. The development prospects of 3D reconstruction in specific application fields are illustrated and analyzed, and future research directions are prospected.
    Abstract views: 1223 | PDF downloads: 1252
    Knowledge Graph Completion Algorithm with Multi-view Contrastive Learning
    QIAO Zifeng, QIN Hongchao, HU Jingjing, LI Ronghua, WANG Guoren
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 1001-1009.   DOI: 10.3778/j.issn.1673-9418.2301038
    Knowledge graph completion is the process of inferring new triples from the existing entities and relations in a knowledge graph. Existing methods usually adopt an encoder-decoder framework: the encoder uses a graph convolutional neural network to obtain the embeddings of entities and relations, and the decoder calculates a score for each candidate tail entity from these embeddings, with the highest-scoring entity taken as the inference result. The decoder infers triples independently, without considering graph information. Therefore, this paper proposes a knowledge graph completion algorithm based on contrastive learning, adding a multi-view contrastive learning framework to the model to constrain the embedded information at the graph level. The comparison of multiple views in the model constructs different distribution spaces for relations, and these distributions fit each other, which is more suitable for completion tasks. Contrastive learning constrains the embedding vectors of entities and subgraphs and enhances the performance of the task. Experiments are carried out on two datasets. The results show that MRR is improved by 12.6% over A2N and 0.8% over InteractE on the FB15k-237 dataset, and by 7.3% over A2N and 4.3% over InteractE on the WN18RR dataset, demonstrating that the model outperforms other completion methods.
    Abstract views: 248 | PDF downloads: 263
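The decoder step criticized above for ignoring graph information is itself simple. A DistMult-style scorer is used below purely as a generic stand-in (the paper's own decoder may differ): it scores every candidate tail for a query (h, r, ?) and ranks them best-first:

```python
import numpy as np

def rank_tails(head_emb, rel_emb, entity_embs):
    """DistMult-style decoder: score each candidate tail t for (h, r, ?) as
    the inner product <h * r, t>, and return entity ids sorted best-first."""
    scores = entity_embs @ (head_emb * rel_emb)   # one score per entity
    return np.argsort(scores)[::-1]

# Toy embeddings chosen so entity 1 is the obvious completion.
head = np.array([1.0, 1.0])
rel = np.array([1.0, 0.0])                        # h * r = [1, 0]
entities = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
ranking = rank_tails(head, rel, entities)
```

Each query is scored in isolation here, which is precisely the independence the multi-view contrastive constraint is introduced to compensate for.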
    Review of Medical Image Segmentation Based on UNet
    XU Guangxian, FENG Chun, MA Fei
    Journal of Frontiers of Computer Science and Technology    2023, 17 (8): 1776-1792.   DOI: 10.3778/j.issn.1673-9418.2301044
    As one of the most important semantic segmentation frameworks in convolutional neural networks (CNN), UNet is widely used in image processing tasks such as classification, segmentation, and target detection of medical images. In this paper, the structural principles of UNet are described, and a comprehensive review of UNet-based networks and variant models is presented. The model algorithms are fully investigated from several perspectives, and an attempt is made to establish an evolutionary pattern among the models. Firstly, the UNet variant models are categorized according to the seven medical imaging systems they are applied to, and the algorithms with similar core composition are compared and described. Secondly, the principles, strengths and weaknesses, and applicable scenarios of each model are analyzed. Thirdly, the main UNet variant networks are summarized in terms of structural principles, core composition, datasets, and evaluation metrics. Finally, the inherent shortcomings and solutions of the UNet network structure are objectively described in light of the latest advances in deep learning, providing directions for continued improvement in the future. At the same time, other technological evolutions and application scenarios that can be combined with UNet are detailed, and the future development trend of UNet-based variant networks is further envisaged.
    Reference | Related Articles | Metrics
    Abstract1403
    PDF1514
    Pre-weighted Modulated Dense Graph Convolutional Networks for 3D Human Pose Estimation
    MA Jinlin, CUI Qilei, MA Ziping, YAN Qi, CAO Haojie, WU Jiangtao
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 963-977.   DOI: 10.3778/j.issn.1673-9418.2302065
    Graph convolutional networks (GCN) have increasingly become one of the main research hotspots in 3D human pose estimation. Modeling the relationships between human joints with a GCN has achieved good performance in 3D human pose estimation. However, GCN-based 3D human pose estimation methods suffer from over-smoothing and fail to distinguish the importance of a joint from that of its adjacent joints. To address these issues, this paper designs a modulated dense connection (MDC) module and a pre-weighted graph convolution module, and on this basis proposes a pre-weighted modulated dense graph convolutional network (WMDGCN) for 3D human pose estimation. For the over-smoothing problem, the modulated dense connection better realizes feature reuse through the hyperparameters α and β (α represents the weight proportion of the features of layer L to the previous layers, and β represents the propagation strategy of the features of the previous layers to layer L), thus effectively improving the expressive ability of features. To distinguish the importance of a joint from that of its adjacent joints, the pre-weighted graph convolution assigns a higher weight to the joint itself, using different weight matrices for the joint and its adjacent joints to capture human joint features more effectively. Comparative experiments on the Human3.6M dataset show that the proposed method achieves the best results in terms of both parameter count and accuracy: the parameter count, MPJPE and P-MPJPE of WMDGCN are 0.27 MB, 37.46 mm and 28.85 mm, respectively.
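The separate-weight-matrix idea in the pre-weighted graph convolution can be sketched as follows. This is a minimal NumPy illustration, not the paper's layer: the `self_weight` scalar, mean aggregation, and ReLU are assumptions standing in for the actual WMDGCN formulation.

```python
import numpy as np

def preweighted_gcn_layer(X, A, W_self, W_neigh, self_weight=2.0):
    """One pre-weighted graph-convolution step for joint features.

    X: (num_joints, in_dim) joint features; A: (num_joints, num_joints)
    binary adjacency. The joint itself and its neighbours pass through
    separate weight matrices, and the joint's own contribution is scaled
    up so its importance is distinguished from adjacent joints.
    """
    deg = A.sum(axis=1, keepdims=True)
    neigh = (A @ X) / np.maximum(deg, 1)        # mean of neighbour features
    out = self_weight * (X @ W_self) + neigh @ W_neigh
    return np.maximum(out, 0)                    # ReLU activation
```

Using two weight matrices lets the layer treat "what this joint looks like" and "what its kinematic neighbours look like" as different signals, rather than blending them under one shared transform.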
    Reference | Related Articles | Metrics
    Abstract88
    PDF134
    Review of Human Action Recognition Based on Deep Learning
    QIAN Huifang, YI Jianping, FU Yunhu
    Journal of Frontiers of Computer Science and Technology    2021, 15 (3): 438-455.   DOI: 10.3778/j.issn.1673-9418.2009095

    Human action recognition is one of the important topics in video understanding. It is widely used in video surveillance, human-computer interaction, motion analysis, and video information retrieval. According to the characteristics of the backbone network, this paper introduces the latest research results in the field of action recognition from three perspectives: 2D convolutional neural network, 3D convolutional neural network, and spatiotemporal decomposition network. And their advantages and disadvantages are qualitatively analyzed and compared. Then, from the two aspects of scene-related and temporal-related, the commonly used action video datasets are comprehensively summarized, and the characteristics and usage of different datasets are emphatically discussed. Subsequently, the common pre-training strategies in action recognition tasks are introduced, and the influence of pre-training techniques on the performance of action recognition models is emphatically analyzed. Finally, starting from the latest research trends, the future development direction of action recognition is discussed from six perspectives: fine-grained action recognition, streamlined model, few-shot learning, unsupervised learning, adaptive network, and video super-resolution action recognition.

    Reference | Related Articles | Metrics
    Abstract1474
    PDF1813
    Survey of Camouflaged Object Detection Based on Deep Learning
    SHI Caijuan, REN Bijuan, WANG Ziwen, YAN Jinwei, SHI Ze
    Journal of Frontiers of Computer Science and Technology    2022, 16 (12): 2734-2751.   DOI: 10.3778/j.issn.1673-9418.2206078

    Camouflaged object detection (COD) based on deep learning is an emerging visual detection task, which aims to detect camouflaged objects "perfectly" embedded in their surrounding environment. However, most existing work focuses primarily on building different COD models, and little work has summarized the existing methods. Therefore, this paper summarizes the existing deep learning-based COD methods and discusses the future development of COD. Firstly, 23 existing deep learning-based COD models are introduced and analyzed according to five detection mechanisms: the coarse-to-fine strategy, the multi-task learning strategy, the confidence-aware learning strategy, the multi-source information fusion strategy and the Transformer-based strategy. The advantages and disadvantages of each strategy are analyzed in depth. Then, 4 widely used datasets and 4 evaluation metrics for COD are introduced. In addition, the performance of the existing deep learning-based COD models is compared on the four datasets, including quantitative comparison, visual comparison, efficiency analysis, and the detection effects on camouflaged objects of different types. Furthermore, the practical applications of COD in medicine, industry, agriculture, military, art, etc. are mentioned. Finally, the deficiencies and challenges of existing methods in complex scenes, multi-scale objects, real-time performance, practical application requirements, and COD in other modalities are pointed out, and the potential directions of COD are discussed.

    Table and Figures | Reference | Related Articles | Metrics
    Abstract2330
    PDF1425
    HTML358
    Review of Chinese Named Entity Recognition Research
    WANG Yingjie, ZHANG Chengye, BAI Fengbo, WANG Zumin, JI Changqing
    Journal of Frontiers of Computer Science and Technology    2023, 17 (2): 324-341.   DOI: 10.3778/j.issn.1673-9418.2208028
    With the rapid development of related technologies in the field of natural language processing, improving the accuracy of named entity recognition, an upstream task of natural language processing, is of great significance for subsequent text processing tasks. However, due to the differences between the Chinese and English languages, it is difficult to transfer the research results of English named entity recognition to Chinese research effectively. Therefore, the key issues in current research on Chinese named entity recognition are analyzed from the following four aspects: Firstly, taking the development of named entity recognition as the main thread, the advantages and disadvantages, common methods and research results of each stage are comprehensively discussed. Secondly, Chinese text preprocessing methods are summarized from the perspectives of sequence annotation, evaluation indices, Chinese word segmentation methods and datasets. Then, aiming at methods for fusing Chinese character and word features, current research is summarized from the perspectives of character fusion and word fusion, and the optimization directions of current Chinese named entity recognition models are discussed. Finally, the practical applications of Chinese named entity recognition in various fields are analyzed. This paper discusses the current research on Chinese named entity recognition, aiming to help researchers understand the research direction and significance of this task more comprehensively, so as to provide a reference for proposing new methods and new improvements.
    Reference | Related Articles | Metrics
    Abstract1177
    PDF1217
    Review of Super-Resolution Image Reconstruction Algorithms
    ZHONG Mengyuan, JIANG Lin
    Journal of Frontiers of Computer Science and Technology    2022, 16 (5): 972-990.   DOI: 10.3778/j.issn.1673-9418.2111126

    In the human visual perception system, a high-resolution (HR) image is an important medium for clearly expressing spatial structure, detailed features, edge texture and other information, and it has a very wide range of practical value in medicine, criminal investigation, satellite imaging and other fields. Super-resolution image reconstruction (SRIR) is a key research task in the field of computer vision and image processing, which aims to reconstruct a high-resolution image with clear details from a given low-resolution (LR) image. In this paper, the concept and mathematical model of super-resolution image reconstruction are first described, and the reconstruction methods are systematically classified into three kinds: interpolation-based, reconstruction-based, and learning-based (before and after deep learning). Secondly, the typical, commonly used and latest algorithms among the three kinds of methods and their research are comprehensively reviewed and summarized, and the listed image reconstruction algorithms are examined in terms of network structure, learning mechanism, application scenarios, advantages and limitations. Then, the datasets and image quality evaluation indices used for super-resolution image reconstruction algorithms are summarized, and the characteristics and performance of various deep learning-based super-resolution image reconstruction algorithms are compared. Finally, future research directions for super-resolution image reconstruction are prospected from four aspects.

    Table and Figures | Reference | Related Articles | Metrics
    Abstract1454
    PDF941
    HTML311
    Research Review of Image Semantic Segmentation Method in High-Resolution Remote Sensing Image Interpretation
    MA Yan, Gulimila·Kezierbieke
    Journal of Frontiers of Computer Science and Technology    2023, 17 (7): 1526-1548.   DOI: 10.3778/j.issn.1673-9418.2211015
    Rapid acquisition of remote sensing information has important research significance for the development of image semantic segmentation methods in remote sensing image interpretation applications. With more and more types of data recorded by satellite remote sensing images and increasingly complex feature information, accurate and effective extraction of the information in remote sensing images has become the key to interpreting them with image semantic segmentation methods. In order to explore image semantic segmentation methods for fast and efficient interpretation of remote sensing images, a large number of image semantic segmentation methods for remote sensing images are summarized. Firstly, the traditional image semantic segmentation methods are reviewed and divided into edge detection-based segmentation methods, region-based segmentation methods, threshold-based segmentation methods and segmentation methods combined with specific theories. At the same time, the limitations of traditional image semantic segmentation methods are analyzed. Secondly, the semantic segmentation methods based on deep learning are elaborated in detail, with the basic ideas and technical characteristics of each method as the classification criteria. They are divided into four categories: FCN-based methods, codec-based methods, dilated convolution-based methods and attention-based methods. The sub-methods contained in each category are summarized, and the advantages and disadvantages of these methods are compared and analyzed. Then, the common datasets and performance evaluation indices for remote sensing image semantic segmentation are briefly introduced. Experimental results of classical network models on different datasets are given, and the performance of different models is evaluated. Finally, the challenges of image semantic segmentation methods in high-resolution remote sensing image interpretation are analyzed, and future development trends are prospected.
    Reference | Related Articles | Metrics
    Abstract1093
    PDF838
    Research on Sentiment Analysis of Short Video Network Public Opinion by Integrating BERT Multi-level Features
    HAN Kun, PAN Hongpeng, LIU Zhongyi
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 1010-1020.   DOI: 10.3778/j.issn.1673-9418.2311023
    The era of self-media and the widespread popularity of online social software have made short video platforms ready “incubators” for the origin and fermentation of public opinion events. Analyzing the public opinion comments on these platforms is crucial for the early warning, handling, and guidance of such incidents. In view of this, this paper proposes a text classification model combining BERT and TextCNN, named BERT-MLFF-TextCNN, which integrates multi-level features from BERT for sentiment analysis of relevant comment data from the Douyin short video platform. Firstly, the BERT pre-trained model encodes the input text. Secondly, the semantic feature vectors from each encoding layer are extracted and fused. A self-attention mechanism is then integrated to highlight key features so that they are used effectively. Finally, the resulting feature sequence is input into the TextCNN model for classification. The results demonstrate that the BERT-MLFF-TextCNN model outperforms the BERT-TextCNN, GloVe-TextCNN, and Word2vec-TextCNN models, achieving an F1 score of 0.977. The model effectively identifies emotional tendencies in public opinion on short video platforms. On this basis, topic mining with the TextRank algorithm visualizes the thematic words associated with each sentiment polarity of the public opinion comments, providing a decision-making reference for relevant departments in public opinion management.
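The multi-level fusion and self-attention steps above can be sketched as follows. This is a NumPy toy under assumptions: the abstract does not say how the per-layer features are fused, so simple averaging stands in for the fusion operator, and the attention shown is plain scaled dot-product over one sequence.

```python
import numpy as np

def fuse_layers(layer_outputs):
    """Multi-level feature fusion: average the hidden states from every
    BERT encoder layer. layer_outputs is a list of (seq_len, hidden)
    arrays, one per layer; averaging is an illustrative stand-in for the
    model's actual fusion operator."""
    return np.mean(np.stack(layer_outputs), axis=0)

def self_attention(H):
    """Scaled dot-product self-attention over the fused sequence, used to
    highlight key token features before the TextCNN classifier."""
    d = H.shape[-1]
    scores = H @ H.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ H
```

In the full pipeline, the attended sequence would then be fed to TextCNN's convolution-and-pooling layers for the final sentiment label.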
    Reference | Related Articles | Metrics
    Abstract134
    PDF120
    Overview of Image Denoising Methods
    LIU Liping, QIAO Lele, JIANG Liucheng
    Journal of Frontiers of Computer Science and Technology    2021, 15 (8): 1418-1431.   DOI: 10.3778/j.issn.1673-9418.2101035

    In real scenes, due to the imperfections of equipment and systems or the existence of low-light environments, collected images are noisy. Images are also affected by additional noise during compression and transmission, which interferes with subsequent image segmentation and feature extraction. Traditional denoising methods use the non-local self-similarity (NLSS) characteristics of the image and sparse representation in the transform domain, and the method based on block-matching and three-dimensional filtering (BM3D) shows powerful image denoising performance. With the development of artificial intelligence, image denoising methods based on deep learning have achieved outstanding performance. But so far, there is almost no research comprehensively comparing image denoising methods. Aiming at the traditional image denoising methods and the deep neural network-based image denoising methods that have emerged in recent years, this paper first introduces the basic frameworks of the classic traditional denoising and deep neural network denoising methods, and classifies and summarizes the denoising methods. Then the existing denoising methods are analyzed and compared quantitatively and qualitatively on public denoising datasets. Finally, this paper points out some potential challenges and future research directions in the field of image denoising.
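For the quantitative comparisons mentioned above, peak signal-to-noise ratio is the standard score in this area, though the abstract does not name the metrics it uses; the sketch below is therefore illustrative rather than taken from the paper.

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio between a clean reference image and a
    denoised result, in dB; higher is better, and identical images give
    infinity (zero mean-squared error)."""
    diff = clean.astype(np.float64) - denoised.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(peak ** 2 / mse)
```

A denoiser is typically scored by adding synthetic Gaussian noise to a clean image, running the method, and reporting the PSNR of the output against the original.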

    Reference | Related Articles | Metrics
    Abstract970
    PDF1182
    Self-supervised Hybrid Graph Neural Network for Session-Based Recommendation
    ZHANG Yusong, XIA Hongbin, LIU Yuan
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 1021-1031.   DOI: 10.3778/j.issn.1673-9418.2212043
    Session-based recommendation aims to predict user actions based on anonymous sessions. Most existing session recommendation algorithms based on graph neural networks (GNN) only extract user preferences from the current session, but ignore the high-order multivariate relationships from other sessions, which affects recommendation accuracy. Moreover, session-based recommendation suffers heavily from data sparsity due to the very limited short-term interactions. To solve the above problems, this paper proposes a self-supervised hybrid graph neural network (SHGN) for session-based recommendation. Firstly, the model describes the relationships between sessions and items by constructing the original data into three views. Next, a graph attention network is used to capture the low-order transition information of items within a session, and a residual graph convolutional network is then proposed to mine the high-order transition information of items and sessions. Finally, self-supervised learning (SSL) is integrated as an auxiliary task: by maximizing the mutual information of session embeddings learnt from different views, data augmentation is performed to improve recommendation performance. To verify the effectiveness of the proposed method, comparative experiments with mainstream baseline models such as SR-GNN, GCE-GNN and DHCN are carried out on the four benchmark datasets Tmall, Diginetica, Nowplaying and Yoochoose, and the results show improvements in P@20, MRR@20 and other performance indices.
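The item-level view that the GNN operates on is typically built by turning each anonymous click sequence into a directed item-transition graph. A minimal sketch of that construction (the exact view definitions in the paper are not given in the abstract, so this is only the standard session-graph form used by models like SR-GNN):

```python
from collections import defaultdict

def session_graph(session):
    """Build the directed item-transition graph of one anonymous session.

    Each consecutive click a -> b contributes an edge (a, b); repeated
    transitions accumulate as integer edge weights. Returns the sorted
    unique item list and the weighted edge dictionary.
    """
    edges = defaultdict(int)
    for a, b in zip(session, session[1:]):
        edges[(a, b)] += 1
    nodes = sorted(set(session))
    return nodes, dict(edges)
```

For example, the session `[1, 2, 3, 2, 4]` yields nodes `[1, 2, 3, 4]` and one unit-weight edge per distinct transition; the GNN then propagates messages along these edges to learn item embeddings.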
    Reference | Related Articles | Metrics
    Abstract136
    PDF144
    Approach to Multi-path Coverage Testing Based on Path Similarity Table and Individual Migration
    QIAN Zhongsheng, SUN Zhiwang, YU Qingyuan, QIN Langyue, JIANG Peng, WAN Zilong, WANG Yahui
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 947-962.   DOI: 10.3778/j.issn.1673-9418.2301018
    The application of genetic algorithms in multi-path coverage testing is a research hotspot. In the iteration between old and new populations, the old population may contain excellent individuals from other sub-populations; when these are not fully utilized, resources are wasted. At the same time, the number of individuals in the population is much greater than the number of reachable paths, and each individual traverses one reachable path, so multiple individuals pass through the same path, leading to repeated calculation of the similarity between individuals and the target path. Based on this, a multi-path coverage testing method combining a path similarity table with individual migration is proposed to improve testing efficiency. By storing each calculated path similarity value in the path similarity table, repeated calculation is avoided and testing time is reduced. In the evolutionary process, an individual's path is compared with the other target paths, and if the similarity reaches a threshold, the individual is migrated to the sub-population corresponding to that path, which improves the utilization of individuals and reduces the number of evolutionary generations. Experiments show that, compared with six other classic methods on eight programs, the proposed method reduces the average generation time by up to 44.64% (at least 2.64%) and the average number of evolutionary generations by up to 35.08% (at least 6.13%). Therefore, the proposed method effectively improves testing efficiency.
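The path similarity table is essentially memoization keyed on (individual path, target path) pairs. A minimal sketch of that mechanism; the paper's actual similarity metric is not given in the abstract, so the `node_match_ratio` function below is purely illustrative.

```python
def make_similarity_lookup(similarity_fn):
    """Wrap a path-similarity function with a lookup table so each
    (individual path, target path) pair is computed only once; later
    individuals traversing the same path reuse the stored value."""
    table = {}

    def lookup(path, target):
        key = (tuple(path), tuple(target))
        if key not in table:                 # first individual on this path
            table[key] = similarity_fn(path, target)
        return table[key]                    # every later one hits the table

    return lookup, table

def node_match_ratio(path, target):
    """Illustrative similarity: fraction of target-path nodes matched
    position-by-position (NOT the metric used in the paper)."""
    return sum(a == b for a, b in zip(path, target)) / max(len(target), 1)
```

Because many individuals share one reachable path, the expensive similarity computation runs once per distinct path rather than once per individual, which is where the reported time saving comes from.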
    Reference | Related Articles | Metrics
    Abstract82
    PDF66
    Review on Named Entity Recognition
    LI Dongmei, LUO Sisi, ZHANG Xiaoping, XU Fu
    Journal of Frontiers of Computer Science and Technology    2022, 16 (9): 1954-1968.   DOI: 10.3778/j.issn.1673-9418.2112109

    In the field of natural language processing, named entity recognition is the first key step of information extraction. The named entity recognition task aims to recognize named entities in large amounts of unstructured text and classify them into predefined types. Named entity recognition provides basic support for many natural language processing tasks such as relationship extraction, text summarization and machine translation. This paper first introduces the definition of named entity recognition, its research difficulties and the particularities of Chinese named entity recognition, and summarizes the common Chinese and English public datasets and evaluation criteria for named entity recognition tasks. Then, following the development history of named entity recognition, the existing methods are surveyed: early named entity recognition methods based on rules and dictionaries, methods based on statistics and machine learning, and methods based on deep learning. This paper summarizes the key ideas, advantages and disadvantages, and representative models of each kind of method, and reviews the Chinese named entity recognition methods of each stage. In particular, the latest Transformer-based and prompt learning-based named entity recognition methods, which represent the state of the art among deep learning approaches, are reviewed. Finally, the challenges and future research trends of named entity recognition are discussed.

    Table and Figures | Reference | Related Articles | Metrics
    Abstract2677
    PDF1570
    HTML687