Content of Graphics·Image in our journal

    X-ray Prohibited Items Detection Based on Inverted Bottleneck and Light Convolution Block Attention Module
    DONG Yishan, GUO Jingyuan, LI Mingze, SUN Jia'ao, LU Shuhua
    Journal of Frontiers of Computer Science and Technology    2024, 18 (5): 1259-1270.   DOI: 10.3778/j.issn.1673-9418.2301041
    To resolve the missed and false detections caused by changes in item position and angle, as well as the low accuracy on difficult samples in X-ray luggage images, this paper takes YOLOv5 as the baseline and proposes a model based on an inverted bottleneck and a light convolution block attention module for X-ray prohibited items detection. The inverted bottleneck design is introduced into the backbone to emphasize detailed features and improve the model's ability to cope with large angle changes. The light convolution block attention module is used to suppress background interference and reduce model parameters. The Gaussian error linear unit activation function and an improved loss function are used to enhance the nonlinear expression ability, increasing the penalty on predicted values to optimize the model's detection of difficult samples. The proposed model is trained and tested on three large public datasets, OPIXray, SIXray, and HiXray, achieving mAP of 91.9%, 93.4%, and 82.2%, respectively. The results show that the proposed method effectively handles angle changes in X-ray luggage images, indicating its high accuracy and robustness.
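    As an illustration of the kind of backbone block this abstract describes, the following is a minimal PyTorch sketch of an inverted bottleneck with GELU activation; the class name InvertedBottleneck, the expansion ratio, kernel sizes and residual form are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """Minimal inverted bottleneck: expand channels, depthwise conv, project back.
    Channel expansion ratio and kernel size are illustrative assumptions."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),   # expand
            nn.BatchNorm2d(hidden),
            nn.GELU(),                                                # GELU nonlinearity
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                      groups=hidden, bias=False),                     # depthwise 3x3
            nn.BatchNorm2d(hidden),
            nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),   # project back
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection

x = torch.randn(1, 64, 80, 80)
print(InvertedBottleneck(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```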
    Super-Resolution Reconstruction Algorithm of Remote Sensing Image with Two-Branch Semantic Enhanced Perception
    WANG Chaoxue, DAI Ning
    Journal of Frontiers of Computer Science and Technology    2024, 18 (5): 1271-1285.   DOI: 10.3778/j.issn.1673-9418.2303044
    Aiming at the problem of poor reconstruction caused by blurred feature targets in remote sensing images and the influence of background noise, a super-resolution reconstruction algorithm for remote sensing images incorporating two-branch semantic enhanced perception is proposed in this paper. Firstly, a global-local spatial attention module is designed to enhance the semantic representation of features at different global-local spatial scales, and at the same time to strengthen the discriminative ability of the network for effective feature groups. Secondly, a channel grouping-aggregation attention module is proposed to enhance the model's discriminative ability for ground object features by designing feature grouping-aggregation and channel attention modules, and to strengthen the model's ability to focus on effective feature channels. Experiments show that on the UC Merced dataset, the PSNR reaches 34.397 dB, 29.920 dB and 28.128 dB at the ×2/×3/×4 upscaling factors respectively, and the structural similarity reaches 0.931, 0.834 and 0.791. On the AID dataset, the PSNR reaches 32.524 dB, 29.317 dB and 27.522 dB at the ×2/×3/×4 upscaling factors respectively, and the structural similarity reaches 0.895, 0.829 and 0.721. Compared with other mainstream algorithms, both indices are improved, and the edge and regional details of the reconstructed images are better, effectively overcoming the poor reconstruction caused by blurred ground object feature information and background noise in remote sensing images.
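    For reference, the PSNR figures quoted above follow the standard definition 10·log10(MAX²/MSE); a short NumPy sketch of that computation:

```python
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a reconstruction off by 1 gray level everywhere gives PSNR = 48.13 dB.
ref = np.full((64, 64), 128, dtype=np.uint8)
rec = ref + 1
print(round(psnr(ref, rec), 2))
```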
    Dense Pedestrian Detection Based on Shifted Window Attention Multi-scale Equalization
    YU Fan, ZHANG Jing
    Journal of Frontiers of Computer Science and Technology    2024, 18 (5): 1286-1300.   DOI: 10.3778/j.issn.1673-9418.2303110
    Pedestrian targets in real-world scenarios vary greatly in shape and scale, and traditional methods often achieve low average accuracy in pedestrian detection; Transformer-based networks with attention mechanisms have shown strong performance in this field. However, there are still difficulties in multi-scale detection in dense scenes. Dense scenes usually contain a large number of occluded or small-scale pedestrian targets, leading to many false and missed detections and consuming a significant amount of computing resources. Additionally, accurate detection of all targets becomes extremely difficult when pedestrian targets overlap significantly. To address these issues, a dense-scene multi-scale pedestrian detection algorithm based on shifted window attention is proposed. Using modified Swin blocks in the backbone enables the network to extract more detailed features while reducing the heavy computational burden brought by attention mechanisms. To solve the feature fusion problem effectively, DyHead blocks are used in the neck to unify multiple attention operations, thereby improving feature fusion efficiency. To address the feature balance issue, a feature scale-equalizing module based on full connection is designed, which constructs different residual structures between various levels of the feature pyramid to balance features and assist the model in generating higher-quality feature maps. Experimental results on the WiderPerson dataset show that this algorithm improves the AP value by 1.1 percentage points, with improvements of 1.0 and 0.7 percentage points on the most important small and medium targets, respectively.
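    The shifted-window attention that the backbone relies on partitions the feature map into fixed-size windows (optionally cyclically shifted) before computing self-attention within each window. The function below is a generic sketch of that partitioning step with illustrative sizes, not the paper's modified Swin block.

```python
import torch

def window_partition(x, window_size, shift=0):
    """Split a feature map (B, H, W, C) into non-overlapping windows,
    optionally cyclically shifted as in shifted-window (Swin) attention."""
    if shift > 0:
        x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))  # cyclic shift
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)
    return windows  # (num_windows * B, window_size^2, C), ready for self-attention

feat = torch.randn(1, 56, 56, 96)
print(window_partition(feat, window_size=7, shift=3).shape)  # torch.Size([64, 49, 96])
```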
    UAV Remote Sensing Object Detection Based on 3D Multi-layer Feature Collaboration
    LYU Fu, FU Yuheng, HE Lina, YANG Dongpeng
    Journal of Frontiers of Computer Science and Technology    2024, 18 (5): 1301-1317.   DOI: 10.3778/j.issn.1673-9418.2401007
    UAV (unmanned aerial vehicle) aerial images contain a large proportion of small targets and complex backgrounds, and current object detection models suffer from low accuracy and missed detection of small targets. Based on the YOLOv8s model, this paper proposes a 3D multi-layer feature collaboration UAV remote sensing object detection algorithm. Firstly, building on coordinate attention, this paper proposes 3D multi-branch coordinate attention (MBCA), which improves the global feature extraction ability of the model and reduces the computation in the spatial dimension by increasing the information interaction in the channel dimension and the splitting and fusion of extended branches. Secondly, SPD-Conv is used to replace part of the standard convolutions, which effectively retains more feature information and speeds up inference during downsampling. Then, a more efficient FastDBB_Bottleneck module is used in the C2f module, combining PConv and DBB structural reparameterization to further reduce the computation of the model. Finally, the PG-Detect detection head is introduced to significantly reduce computation and effectively lower the missed detection rate of small targets. Experimental results on the VisDrone2019 dataset show that the mAP50 of the proposed method reaches 44.5%, which is 5.7 percentage points higher than that of the YOLOv8s baseline model. A crack detection verification experiment is also carried out on a self-built dam crack dataset, where the mAP50 of the improved method is 3.3 percentage points higher than that of YOLOv8s and the FPS reaches 289. Experimental results show that the proposed method improves the accuracy and real-time performance of the detection model in complex-scene object detection, and has good adaptability and robustness.
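    SPD-Conv replaces strided downsampling with a space-to-depth rearrangement followed by a non-strided convolution, so no pixel information is discarded. A generic sketch of that idea is given below; the class name SPDConv here and the channel and kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth downsampling: rearrange each 2x2 spatial block into channels,
    then apply a non-strided convolution, so no pixel information is discarded."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(4 * in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Stack the four 2x2 sub-grids along the channel dimension.
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

x = torch.randn(1, 64, 160, 160)
print(SPDConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```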
    Pre-weighted Modulated Dense Graph Convolutional Networks for 3D Human Pose Estimation
    MA Jinlin, CUI Qilei, MA Ziping, YAN Qi, CAO Haojie, WU Jiangtao
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 963-977.   DOI: 10.3778/j.issn.1673-9418.2302065
    Graph convolutional networks (GCN) have increasingly become one of the main research hotspots in 3D human pose estimation. Methods that model the relationships between human joints with GCN have achieved good performance in 3D human pose estimation. However, GCN-based 3D human pose estimation methods suffer from over-smoothing and from failing to distinguish the importance of a joint from that of its adjacent joints. To address these issues, this paper designs a modulated dense connection (MDC) module and a pre-weighted graph convolutional module, and based on these two modules proposes a pre-weighted modulated dense graph convolutional network (WMDGCN) for 3D human pose estimation. For the over-smoothing problem, the modulated dense connection better realizes feature reuse through hyperparameters α and β (α represents the weight proportion of the features of layer L relative to previous layers, and β represents the propagation strategy of the features of previous layers to layer L), thus effectively improving the expressive ability of features. To address the failure to distinguish the importance of a joint from its adjacent joints, the pre-weighted graph convolution assigns a higher weight to the joint itself: different weight matrices are used for the joint and its adjacent joints to capture human joint features more effectively. Comparative experimental results on the Human3.6M dataset show that the proposed method achieves the best trade-off between parameter count and accuracy: the parameter count, MPJPE and P-MPJPE of WMDGCN are 0.27 MB, 37.46 mm and 28.85 mm, respectively.
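    A hypothetical sketch of the pre-weighting idea follows: a joint and its neighbors are transformed with separate weight matrices so the joint itself can be emphasized. The class name, the scalar self-weight and the dimensions are illustrative assumptions, not the authors' exact WMDGCN layer.

```python
import torch
import torch.nn as nn

class PreWeightedGraphConv(nn.Module):
    """Graph convolution with separate transforms for a joint itself and for its
    (normalized) neighbors, so the centre joint can be weighted more heavily."""
    def __init__(self, in_dim, out_dim, self_weight=1.5):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)      # transform for the joint itself
        self.w_neigh = nn.Linear(in_dim, out_dim)     # transform for adjacent joints
        self.self_weight = self_weight                # extra emphasis on the joint

    def forward(self, x, adj):
        # x: (batch, num_joints, in_dim); adj: (num_joints, num_joints), no self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        neigh = (adj / deg) @ x                       # mean-aggregate neighbor features
        return self.self_weight * self.w_self(x) + self.w_neigh(neigh)

x = torch.randn(2, 17, 64)                            # 17 joints, 64-dim features
adj = (torch.rand(17, 17) > 0.7).float()
print(PreWeightedGraphConv(64, 128)(x, adj).shape)    # torch.Size([2, 17, 128])
```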
    Skin Disease Segmentation Method Combining Dense Encoder and Dual-Path Attention
    WANG Longye, XIAO Yue, ZENG Xiaoli, ZHANG Kaixin, MA Ao
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 978-989.   DOI: 10.3778/j.issn.1673-9418.2303122
    Aiming at the problems of different shapes and sizes, discontinuous and blurred boundaries, and high similarity between the lesion area and the background in dermoscopic image lesion areas, a skin lesion segmentation network integrating dense encoder and dual-path attention (DEDA-Net) is proposed. Firstly, the network employs a dense coding module for multi-scale information fusion to enhance network feature extraction capabilities, alleviating blurred edges in dermoscopic images. Skip connection and residual path are used to reduce the semantic gap in the network coding and decoding parts. Secondly, a global normal pooling layer is proposed that weights feature points in the feature map based on their degree of relevance, and a dual-path attention module that extracts feature information in two dimensions, space and channel, is designed to avoid the problem that it is difficult to distinguish the lesion area from the background due to insufficient global information acquisition. Finally, using the idea of an auxiliary loss function, a weighted loss function is employed on both sides of the middle of the network and the final output layer to improve generalization ability of the network. Experimental results show that the algorithm achieves a segmentation accuracy of 96.45%, a specificity of 97.82%, a Dice coefficient of 93.16%, and an IoU of 86.61% on the ISIC2017 dataset, which are 5.93 percentage points, 6.45 percentage points, 6.53 percentage points, and 5.63 percentage points higher than the baseline U-Net, demonstrating the effectiveness of the proposed algorithm in accurately segmenting skin lesion areas.
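    The Dice coefficient and IoU reported above are standard overlap measures for binary segmentation masks; a short NumPy sketch of how they are computed:

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU for binary segmentation masks (values in {0, 1})."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

pred = np.zeros((8, 8), dtype=np.uint8); pred[2:6, 2:6] = 1      # 16 pixels
target = np.zeros((8, 8), dtype=np.uint8); target[3:7, 3:7] = 1  # 16 pixels, 9 overlap
print(dice_and_iou(pred, target))  # (~0.5625, ~0.3913)
```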
    Few-Shot Image Classification Method with Feature Maps Enhancement Prototype
    XU Huajie, LIANG Shuwei
    Journal of Frontiers of Computer Science and Technology    2024, 18 (4): 990-1000.   DOI: 10.3778/j.issn.1673-9418.2302015
    Due to the scarcity of labeled samples, the class prototype obtained from support set samples can hardly represent the real distribution of the whole class in metric-based few-shot image classification methods. Meanwhile, samples of the same class may also differ greatly in many aspects, and the large intra-class bias may make sample features deviate from the class center. Aiming at the above problems, which may seriously affect performance, a few-shot image classification method with feature maps enhancement prototype (FMEP) is proposed. Firstly, this paper selects some similar features from the query-set sample feature maps with cosine similarity and adds them to the class prototypes to obtain more representative prototypes. Secondly, this paper aggregates similar features of the query set to alleviate the problem caused by large intra-class bias and to make the feature distributions of the same class closer. Finally, this paper compares the enhanced prototypes with the aggregated features, both of which are closer to the real distributions, to obtain better results. The proposed method is tested on four commonly used few-shot classification datasets, namely MiniImageNet, TieredImageNet, CUB-200 and CIFAR-FS. The results show that the proposed method not only improves the performance of the baseline model, but also obtains better performance compared with methods of the same type.
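    A minimal sketch of the general idea of enhancing a class prototype with cosine-similar query features follows; the function name, mixing weight and top-k selection are hypothetical choices for illustration, not the exact FMEP procedure.

```python
import torch
import torch.nn.functional as F

def enhance_prototype(support_feats, query_feats, top_k=3, alpha=0.5):
    """Build a class prototype from support features, then mix in the top-k most
    cosine-similar query features to obtain a more representative prototype."""
    proto = support_feats.mean(dim=0)                             # (d,)
    sims = F.cosine_similarity(query_feats, proto.unsqueeze(0))   # (n_query,)
    top = query_feats[sims.topk(min(top_k, len(sims))).indices]   # most similar queries
    return (1 - alpha) * proto + alpha * top.mean(dim=0)          # enhanced prototype

support = torch.randn(5, 64)      # 5-shot support features
queries = torch.randn(15, 64)     # query-set features
print(enhance_prototype(support, queries).shape)  # torch.Size([64])
```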
    Prototype Rectification Few-Shot Classification Model with Dual-Path Cooperation
    LYU Jia, ZENG Mengyao, DONG Baosen
    Journal of Frontiers of Computer Science and Technology    2024, 18 (3): 693-706.   DOI: 10.3778/j.issn.1673-9418.2301070
    In the learning process of metric-based meta-learning, there are some problems, such as insufficient prior knowledge acquired due to the scarce data distribution, the interference of weakly related or unrelated features extracted from single-view samples, and deviations of representative features caused during classification. To solve these problems, a prototype rectification few-shot classification model with dual-path cooperation is proposed in this paper. Firstly, the dual-path cooperation module adaptively highlights key features and weakens weakly related features from a multi-view perspective, and makes full use of feature information to obtain prior knowledge and improve the expressive ability of features. Secondly, the problem of deviated intra-class prototypes is solved by a prototype rectification classification strategy that uses the sample feature information of the query set. Finally, the model parameters are updated through back-propagation of the loss function, and the classification accuracy of the model is improved. Comparative 5-way 1-shot and 5-way 5-shot experiments are conducted on five public datasets. Compared with the baseline model, the accuracy is increased by 5.57 and 3.90 percentage points on the miniImageNet dataset, 5.68 and 3.93 percentage points on the tieredImageNet dataset, 6.93 and 3.13 percentage points on the CUB dataset, 8.03 and 1.65 percentage points on the CIFAR-FS dataset, and 4.25 and 4.89 percentage points on the FC-100 dataset. Experimental results show that the proposed model performs well in few-shot learning, and its modules can be migrated to other models.
    MFFNet: Image Semantic Segmentation Network of Multi-level Feature Fusion
    WANG Yan, NAN Peiqi
    Journal of Frontiers of Computer Science and Technology    2024, 18 (3): 707-717.   DOI: 10.3778/j.issn.1673-9418.2209110
    In the task of image semantic segmentation, most methods do not make full use of features of different scales and levels, but upsample them directly, which causes some effective information to be dismissed as redundant, thus reducing the segmentation accuracy and sensitivity for some small and similar categories. Therefore, a multi-level feature fusion network (MFFNet) is proposed. MFFNet uses an encoder-decoder structure. During the encoding stage, context information and spatial detail information are obtained through a context information extraction path and a spatial information extraction path respectively, to enhance inter-pixel correlation and boundary accuracy. During the decoding stage, a multi-level feature fusion path is designed: context information is fused by a mixed bilateral fusion module, deep information and spatial information are fused by a high-low feature fusion module, and a global channel-attention fusion module is used to capture the connections between different channels and realize global fusion of information at different scales. The MIoU (mean intersection over union) of MFFNet on the PASCAL VOC 2012 and Cityscapes validation sets is 80.70% and 76.33% respectively, achieving better segmentation results.
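    The MIoU metric used for evaluation averages per-class intersection-over-union computed from a confusion matrix; a short NumPy sketch:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union from a per-class confusion matrix,
    as commonly used for semantic segmentation benchmarks."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(pred.ravel(), target.ravel()):
        conf[t, p] += 1
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)
    return iou[union > 0].mean()   # average only over classes that appear

pred = np.array([[0, 0, 1], [1, 2, 2], [2, 2, 2]])
target = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
print(round(mean_iou(pred, target, 3), 4))  # 0.8222
```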
    Lightweight Cross-Gating Transformer for Image Restoration and Enhancement
    XUE Jinqiang, WU Qin
    Journal of Frontiers of Computer Science and Technology    2024, 18 (3): 718-730.   DOI: 10.3778/j.issn.1673-9418.2301050
    Recent image restoration and enhancement methods find it difficult to balance robustness across multiple subtasks with a small number of parameters and low computational cost. To solve this problem, this paper proposes a lightweight cross-gating Transformer (CGT) for efficient image restoration. On the one hand, this paper analyzes the limitations of the traditional global self-attention mechanism in capturing global dependencies, and improves it into a cross-level cross-gating self-attention mechanism. Meanwhile, a lightweight feed-forward neural network is proposed to learn cross-level local dependencies at very small computational cost and to reconstruct clear features in adjacent local regions. On the other hand, in view of the defect that the traditional practice of adding or concatenating encoder and decoder features on equal terms leads to information interference, a long-distance reset-update module is proposed to suppress useless information and update clear features respectively. This paper conducts extensive quantitative experiments and compares with 25 state-of-the-art methods on 9 datasets for image denoising, image deraining and low-light image enhancement. Experimental results prove that the proposed lightweight cross-gating Transformer achieves high peak signal-to-noise ratio and structural similarity in image restoration and enhancement tasks with a small number of parameters and little computation, reconstructs clear images close to real-world scenes, and achieves state-of-the-art image restoration performance.
    Detection and Removal of Noise in Images Based on Amount of Knowledge Associated with Intuitionistic Fuzzy Sets
    GUO Kaihong, ZHOU Yongzhi, WU Zheng, ZHANG Lei
    Journal of Frontiers of Computer Science and Technology    2024, 18 (2): 439-452.   DOI: 10.3778/j.issn.1673-9418.2209019
    In response to the shortcomings of existing image noise detection algorithms that rely on the flawed intuitionistic fuzzy entropy (IFE) theory, a method of image noise detection and removal based on intuitionistic fuzzy amount of knowledge (IFAK) is proposed by introducing the latest knowledge measure (KM) theory and model. In the noise detection stage, the optimal average intensity of the noisy image foreground and background is determined based on the maximum IFAK, and the parametric model of noise detection is constructed accordingly to mark the probability of noise pixels and suspected noise pixels, showing excellent performance of noise detection. In the noise removal stage, a denoising model based on IFAK and probability of noise pixels is proposed by using the noise probability matrix, which can not only effectively denoise, but also better protect the characteristics of image edges and non-noise extreme pixels. Comparative experiments are carried out on standard datasets and classical test images, respectively. Experimental results show that the proposed method can accurately identify the image impulse noise and effectively realize image denoising. The overall performance outperforms other similar algorithms. The key metrics PSNR and SSIM are increased by 14.81% and 11.35%, respectively. In this paper, the latest KM theory is applied to image denoising, and excellent evaluation metrics and visual effects are obtained, while innovative applications of this theory in other related fields are also achieved.
    GUS-YOLO Remote Sensing Target Detection Algorithm Introducing Context Information and Attention Gate
    ZHANG Huawei, ZHANG Wenfei, JIANG Zhanjun, LIAN Jing, WU Baijing
    Journal of Frontiers of Computer Science and Technology    2024, 18 (2): 453-464.   DOI: 10.3778/j.issn.1673-9418.2305005
    At present, remote sensing target detection algorithms based on the general YOLO (you only look once) series still have some problems: they do not make full use of the global context information of the image, do not narrow the semantic gap in the feature fusion pyramid, and do not suppress the interference of redundant information. Combining the advantages of the YOLO algorithms, this paper proposes the GUS-YOLO (network of global context extraction unit and attention gate-based YOLOs) algorithm. It has a backbone network, Global Backbone, that can make full use of global context information. In addition, the algorithm introduces the Attention Gate module into the top-down structure of the fused feature pyramid, which emphasizes necessary feature information and suppresses redundant information. Furthermore, this paper designs the best network structure for the Attention Gate module and proposes the U-Net feature fusion structure of the proposed network. Finally, because the ReLU activation function may cause the model gradient to stop updating, the Attention Gate module uses a learnable SMU (smooth maximum unit) activation function, which improves the robustness of the model. On the NWPU VHR-10 remote sensing dataset, this algorithm achieves performance improvements of 1.64 percentage points and 9.39 percentage points on mAP0.50 and mAP0.75 respectively compared with YOLOv7. Compared with 7 current mainstream detection algorithms, this algorithm achieves better detection performance.
    Emotional Rendering of 3D Indoor Scene with Chinese Elements
    SHENG Jiachuan, HU Guolin, LI Yuzhi
    Journal of Frontiers of Computer Science and Technology    2024, 18 (2): 465-476.   DOI: 10.3778/j.issn.1673-9418.2210096
    Using computer technology to automatically design a virtual indoor scene that is both realistic and matches a target emotion is a challenging task. The subjective nature of emotions brings uncertainty to the results, and at present there is a lack of approaches to identify and evaluate the emotion of indoor scenes. In addition, under the premise of fully considering emotional appeal, the authenticity of the scene is also one of the important factors in indoor scene design. Aiming at the above problems, a novel optimization algorithm combining Chinese elements for indoor scene rendering is proposed. Firstly, an emotion classifier is trained to identify and evaluate emotion, using features extracted via deep learning from an indoor scene dataset containing 25000 images. Secondly, in order to ensure the authenticity of the rendering results, an algorithm is proposed to evaluate how realistic the colors of the objects' textures are. Next, an algorithm is designed to render an indoor scene automatically according to the target emotion. Then, a style transfer algorithm integrating Chinese elements is used to carry out fine-grained refinement of the furnishings in the indoor scene, improving the spatial connotation, cultural connotation and emotional expression of the rendering results and enhancing the visual appeal. Finally, the approach is tested in four indoor scenes, and its correctness and effectiveness are verified through statistical analysis of the results and user survey data.
    Counting Method Based on Density Graph Regression and Object Detection
    GAO Jie, ZHAO Xinxin, YU Jian, XU Tianyi, PAN Li, YANG Jun, YU Mei, LI Xuewei
    Journal of Frontiers of Computer Science and Technology    2024, 18 (1): 127-137.   DOI: 10.3778/j.issn.1673-9418.2209065
    In response to the low recall rate of detection-based methods and the problem of missing target location information in density-based methods, which are the two mainstream dense-counting methods, a detection and counting method based on density map regression is proposed by combining the two tasks, achieving the counting and positioning of target objects in dense scenes. Complementing the advantages of two methods not only improves recall rate but also calibrates all targets. To extract richer feature information to deal with complex data scenarios, a feature pyramid optimization module is proposed, which vertically fuses low-level high-resolution features with top-level abstract semantic features and horizontally fuses same-size features to enrich the semantic expression of target objects. To address the issue of low pixel proportions occupied by target objects in dense counting scenarios, an attention mechanism for small targets is proposed to improve the network’s detection sensitivity, which can enhance the attention of the network to target objects by constructing a mask on the input image. Experimental results demonstrate that the proposed method significantly improves recall rate and accurately locates targets while maintaining accuracy, effectively providing counting and positioning information of input image, which has a wide range of application prospects in various fields such as industry and ecology.
    Improved YOLOv4-Tiny Lightweight Target Detection Algorithm
    HE Xiangjie, SONG Xiaoning
    Journal of Frontiers of Computer Science and Technology    2024, 18 (1): 138-150.   DOI: 10.3778/j.issn.1673-9418.2301034
    Object detection is an important branch of deep learning. A large number of edge devices need lightweight object detection algorithms, but existing lightweight general-purpose object detection algorithms suffer from low detection accuracy and slow detection speed. To solve this problem, an improved YOLOv4-Tiny algorithm based on attention mechanisms is proposed. The structure of the original YOLOv4-Tiny backbone network is adjusted, the ECA (efficient channel attention) attention mechanism is introduced, the traditional spatial pyramid pooling (SPP) structure is improved to a DC-SPP structure using dilated convolution, and a CSATT (channel spatial attention) attention mechanism is proposed. The CSATT-PAN (channel spatial attention path aggregation network) neck is formed with the feature fusion network PAN, which improves the feature fusion capability of the network. Compared with the original YOLOv4-Tiny algorithm, the proposed YOLOv4-CSATT algorithm is significantly more sensitive to information and more accurate in classification at essentially the same detection speed. The accuracy is increased by 12.3 percentage points on the VOC dataset and 6.4 percentage points on the COCO dataset. Moreover, on the VOC dataset the accuracy is 3.3, 5.5, 6.3, 17.4, 10.3, 0.9 and 0.6 percentage points higher than that of the Faster R-CNN, SSD, Efficientdet-d1, YOLOv3-Tiny, YOLOv4-MobileNetv1, YOLOv4-MobileNetv2 and PP-YOLO algorithms respectively, and the recall rate is 2.8, 7.1, 4.2, 18.0, 12.2, 2.1 and 4.0 percentage points higher, respectively, with an FPS of 94. In this paper, the CSATT attention mechanism is proposed to improve the model's ability to capture spatial channel information, and the ECA attention mechanism is combined with the feature fusion pyramid to improve the model's feature fusion ability and target detection accuracy.
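    The ECA module mentioned above gates channels with a 1D convolution over globally pooled channel statistics; a minimal PyTorch sketch follows. The kernel size is fixed here for simplicity, whereas ECA normally derives it adaptively from the channel count.

```python
import torch
import torch.nn as nn

class ECAAttention(nn.Module):
    """Efficient channel attention: global average pooling followed by a 1D
    convolution across channels and a sigmoid gate."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                         # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                    # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # 1D conv over the channel dimension
        return x * self.sigmoid(y)[:, :, None, None]

x = torch.randn(2, 64, 32, 32)
print(ECAAttention()(x).shape)  # torch.Size([2, 64, 32, 32])
```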
    YOLOv8-VSC: Lightweight Algorithm for Strip Surface Defect Detection
    WANG Chunmei, LIU Huan
    Journal of Frontiers of Computer Science and Technology    2024, 18 (1): 151-160.   DOI: 10.3778/j.issn.1673-9418.2308060
    Currently, in the field of strip steel surface defect detection, general-purpose target detection algorithms are highly complex and computationally heavy, while the terminal equipment responsible for detection in some small and medium-sized enterprises usually does not have strong computing power and has limited computational resources, which makes detection algorithms difficult to deploy. To solve this problem, this paper proposes a lightweight strip steel surface defect detection model, YOLOv8-VSC, based on the YOLOv8n target detection framework, which uses the lightweight VanillaNet network as the backbone feature extraction network and reduces the complexity of the model by reducing unnecessary branching structures. Meanwhile, the SPD module is introduced to speed up model inference while reducing the number of network layers. To further improve the detection accuracy, the lightweight upsampling operator CARAFE is used in the feature fusion network to improve the quality and richness of the features. Finally, extensive experiments on the NEU-DET dataset yield a model with 1.96×10^6 parameters and 6.0 GFLOPs of computation, only 65.1% and 74.1% of the baseline, and an mAP of 80.8%, an improvement of 1.8 percentage points over the baseline. In addition, experimental results on an aluminum surface defect dataset and the VOC2012 dataset show that the proposed algorithm has good robustness. Compared with advanced target detection algorithms, the proposed algorithm requires fewer computational resources while ensuring high detection accuracy.
    Object Detection Algorithm with Dynamic Loss and Enhanced Feature Fusion
    ZHAO Qiming, ZHANG Tao, SUN Jun
    Journal of Frontiers of Computer Science and Technology    2023, 17 (12): 2942-2953.   DOI: 10.3778/j.issn.1673-9418.2301006
    Object detection is one of the hottest directions in the field of computer vision. In order to further improve the performance of the object detection algorithm, a dynamic intersection over union loss (DYIoU Loss) based on the intersection over union (IoU) is proposed to solve the limitations of the position loss function in the training process. The relationship between the internal components of the position loss function is fully considered, and the weight of the position loss components can be given dynamically at different stages of the training to more specifically constrain the network. This enables the network to optimize different parts more effectively during the early, middle, and late stages of training to better align with the characteristics of the object detection task. In addition, in order to solve the deficiency of the feature fusion stage in the object detection network, deformable convolution is applied to the PAN (path aggregation network) structure, and a deformable path aggregation network neck (DePAN Neck) that can be plugged in is designed to improve the model’s ability to fuse multi-scale features and improve its detection performance on small objects. The above methods are applied to YOLOv6 models of YOLOv6-N, YOLOv6-T and YOLOv6-S sizes, and rich experiments are designed on the COCO2017 dataset to validate the effectiveness. The results show an average increase of 2.0 percentage points in the average precision (mAP).
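    The DYIoU loss builds on the standard IoU-based position loss; as background, a minimal sketch of a plain IoU loss for axis-aligned boxes is given below (the dynamic re-weighting of loss components itself is not reproduced here).

```python
import torch

def iou_loss(pred, target, eps=1e-7):
    """IoU loss (1 - IoU) for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()

pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(iou_loss(pred, target))  # IoU = 1/7, so loss ≈ 0.8571
```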
    Open World Object Detection Combining Graph-FPN and Robust Optimization
    XIE Binhong, ZHANG Pengju, ZHANG Rui
    Journal of Frontiers of Computer Science and Technology    2023, 17 (12): 2954-2966.   DOI: 10.3778/j.issn.1673-9418.2211068
    Open world object detection (OWOD) requires detecting all known and unknown object categories in an image, and the model must gradually learn new categories to adaptively update its knowledge. Aiming at the problems of the low recall rate for unknown objects and the catastrophic forgetting of incremental learning in the ORE (open world object detection) method, this paper proposes an adjustable robust optimization of ORE based on a graph feature pyramid (GARO-ORE). Firstly, using the superpixel image structure in Graph-FPN and the hierarchical design of the context layer and hierarchical layer, rich semantic information can be obtained and the model can accurately locate unknown objects. Then, using a robust optimization method that comprehensively considers uncertainty, a base-class learning strategy based on flat minima is proposed, which largely ensures that the model avoids forgetting previously learnt category knowledge while learning new categories. Finally, a classification weight initialization method based on knowledge transfer is used to improve the adaptability of the model to new classes. Experimental results on the OWOD dataset show that GARO-ORE achieves better detection results on the recall rate of unknown categories. In the three incremental object detection tasks of 10+10, 15+5 and 19+1, the mAP is increased by 1.38, 1.42 and 1.44 percentage points, respectively. It can be seen that GARO-ORE improves the recall rate of unknown object detection and promotes the learning of subsequent tasks while effectively alleviating the catastrophic forgetting of old tasks.
    Small Object Detection Based on Two-Stage Calculation Transformer
    XU Shoukun, GU Jianan, ZHUANG Lihua, LI Ning, SHI Lin, LIU Yi
    Journal of Frontiers of Computer Science and Technology    2023, 17 (12): 2967-2983.   DOI: 10.3778/j.issn.1673-9418.2210120
    Although small object detection has achieved significant improvements, it still suffers from some problems. For example, it is challenging to extract small object features because small objects carry little information in the scene, which may lose the original feature information of small objects and result in poor detection. To address this problem, this paper proposes a small object detection network based on a two-stage calculation Transformer (TCT). Firstly, a two-stage calculation Transformer is embedded in the backbone feature extraction network for feature enhancement. Based on the traditional Transformer value computation, multiple 1D dilated convolution branches with different feature fusions are utilized to implement global self-attention, improving feature representation and information interaction. Secondly, this paper proposes an effective residual connection module to improve the low-efficiency convolution and activation of the current CSPLayer, which helps to advance the information flow and learn richer contextual details. Finally, this paper proposes a feature fusion and refinement module for fusing multi-scale features and improving the target feature representation capability. Quantitative and qualitative experiments on the PASCAL VOC2007+2012, COCO2017 and TinyPerson datasets show that, compared with YOLOX, the proposed algorithm has better target feature extraction ability and higher detection accuracy for small targets.
    Object Detection Algorithm for 3D Coordinate Attention Path Aggregation Network
    TU Xiaomei, BAO Xiao'an, WU Biao, JIN Yuting, ZHANG Qingqi
    Journal of Frontiers of Computer Science and Technology    2023, 17 (12): 2984-2998.   DOI: 10.3778/j.issn.1673-9418.2211102
    In practical industrial applications, YOLO series algorithms do not locate object prediction boxes accurately enough, making them difficult to apply in realistic scenarios with high positioning requirements. The object detection algorithm YOLO-T with a three-dimensional coordinate attention path aggregation network is proposed. Firstly, shortcut connections are used to fuse the cross-layer features of the path aggregation feature pyramid to retain its shallow semantic information. Secondly, based on the coordinate attention mechanism, a three-dimensional coordinate attention (TDCA) model is proposed, which assigns attention weights to the features in the path aggregation feature pyramid (TPA-FPN, TDCA path aggregation feature pyramid network) to retain useful information and remove redundant information. Thirdly, the loss matrix calculation of SimOTA (simplify optimal transport assignment) in the label allocation strategy is improved, which enhances performance without loss of efficiency. Finally, depthwise separable convolution is used to improve the convolution module in the backbone feature extraction network to make the model lightweight. Experimental results show that the detection accuracy mAP@0.50 of the algorithm is 1.3 percentage points higher than that of YOLOX-S on the PASCAL VOC2007+2012 dataset, and mAP@0.50:0.95 is improved by 3.8 percentage points. The average detection accuracy mAP@0.50:0.95 is improved by 2.4 percentage points on the COCO2017 dataset.
    Fine-Grained Visual Categorization: Deep Pairwise Feature Comparison Interaction Algorithm
    WANG Min, ZHAO Peng, GUO Xinping, MIN Fan
    Journal of Frontiers of Computer Science and Technology    2023, 17 (11): 2663-2675.   DOI: 10.3778/j.issn.1673-9418.2207091
    Fine-grained visual categorization is an important but challenging task in computer vision due to high intraclass and low inter-class variance. Classical fine-grained image recognition methods use a single-input with single-output approach, which limits the ability of the model to learn inference from paired images. Inspired by the behavior of human beings when discriminating fine-grained images, a deep pairwise feature comparison interactive fine-grained classification algorithm (PCI) is proposed to find common or different features between image pairs and effectively improve the fine-grained recognition accuracy. Firstly, PCI establishes a positive-negative pair input strategy to extract pairwise depth features of fine-grained images. Secondly, a deep pairwise feature interaction mechanism is established to realize global information learning, depth comparison and depth adaptive interaction of paired depth features. Finally, a pairwise feature contrastive learning mechanism is established to constrain pairwise deep fine-grained features through contrastive learning, increasing the similarity between positive pairs and reducing the similarity between negative pairs. Extensive experiments are conducted on the popular fine-grained datasets CUB-200-2011, Stanford Dogs, Stanford Cars, and FGVC-Aircraft, and the experimental results show that PCI outperforms current state-of-the-art methods.
    Real-Time Traffic Sign Detection Algorithm Combining Attention Mechanism and Contextual Information
    FENG Aiqi, WU Xiaojun, XU Tianyang
    Journal of Frontiers of Computer Science and Technology    2023, 17 (11): 2676-2688.   DOI: 10.3778/j.issn.1673-9418.2212065
    Traffic sign detection has received widespread concern in recent years. However, existing methods often fail to meet the real-time detection requirements, and there are many cases of missing detection in small-scale traffic sign detection. To solve these problems, a real-time traffic sign detection algorithm combining attention mechanism and contextual information is proposed. Using YOLOv5 as the base model, firstly, spatial attention mechanism is embedded in the backbone to adaptively enhance the features of important positions and suppress interference information to improve the feature extraction capability of the backbone network. Secondly, the cross stage partial window Transformer module is designed to learn correlations of different locations and to capture rich contextual information around traffic signs, which is beneficial to improving the detection accuracy of small-scale traffic signs. Thirdly, the lightweight feature fusion network is proposed to fuse the feature maps of different scales, which can reduce the computational burden and ensure the effective feature fusion. Finally, in the post-processing stage,  Gaussian weighted fusion is used to amend the prediction boxes to improve the positioning accuracy. Experiments on TT100K and DFG traffic sign detection datasets show that the proposed method can effectively improve the missing detection of small-scale traffic signs, with higher accuracy and real-time performance, and can meet the requirements of traffic sign detection in actual scenarios.
    HSKDLR: Lightweight Lip Reading Method Based on Homogeneous Self-Knowledge Distillation
    MA Jinlin, LIU Yuhao, MA Ziping, GONG Yuanwen, ZHU Yanbin
    Journal of Frontiers of Computer Science and Technology    2023, 17 (11): 2689-2702.   DOI: 10.3778/j.issn.1673-9418.2208032
    In order to solve the problems of low recognition rate and heavy computation in lip reading, this paper proposes a lightweight lip reading model named HSKDLR (homogeneous self-knowledge distillation for lip reading). Firstly, the S-SE (spatial SE) attention module is designed to attend to the spatial features of the lip image, and is used to construct the i-Ghost Bottleneck (improved Ghost Bottleneck) module, which extracts the channel and spatial features of the lip image and thereby improves the accuracy of the lip reading model. Secondly, a lip reading model is built on the i-Ghost Bottleneck, which reduces the model computation to a certain extent by optimizing the combination of bottleneck structures. Then, in order to improve the accuracy of the model and reduce time consumption, a model optimization method of homogeneous self-knowledge distillation (HSKD) is proposed. Finally, this paper employs HSKD to train the lip reading model and verifies its recognition performance. Experimental results show that HSKDLR has higher recognition accuracy and lower computational complexity than the compared methods: the accuracy of the proposed method on the LRW dataset is 87.3%, the floating-point computation is as low as 2.564 GFLOPs, and the parameter count is as low as 3.8723×10^7. Moreover, HSKD can be applied to most lip reading models to improve recognition accuracy effectively and reduce training time.
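    As background for the self-distillation step, a minimal sketch of a standard knowledge-distillation objective follows; in homogeneous self-distillation the teacher logits would come from the same model (for example, an earlier snapshot of it). The temperature and weighting values are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Cross-entropy on hard labels plus temperature-softened KL divergence
    against the teacher's predictions (standard knowledge distillation)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    return (1 - alpha) * hard + alpha * soft

student = torch.randn(8, 500)     # e.g. 500 word classes as in LRW
teacher = torch.randn(8, 500)     # in self-distillation: logits from the same model
labels = torch.randint(0, 500, (8,))
print(distillation_loss(student, teacher, labels))
```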
    Distortion-Aware Correlation Filter Object Tracking Algorithm
    JIANG Wentao, REN Jinrui
    Journal of Frontiers of Computer Science and Technology    2023, 17 (11): 2703-2720.   DOI: 10.3778/j.issn.1673-9418.2209093
    A distortion-aware correlation filter object tracking algorithm is proposed to address the problems that existing correlation filters have insufficient ability to deal with target distortion and that accumulated filter model update errors easily lead to tracking failure. Firstly, particle sampling is used to construct a spatial reference weight for enhancing the target information and adapting to changes in target appearance between adjacent frames, so that the filter focuses on the reliable part of the learning target and the interference of background information is suppressed. Meanwhile, to optimize the algorithm and reduce computational complexity, the alternating direction method of multipliers is used to solve for the objective function optimum with fewer iterations. Finally, to further enhance the discrimination ability of the filter, a target distortion-aware strategy is designed, which combines the average peak-to-correlation energy and a temporal constraint on the response map peak to measure the distortion of the target under interference factors and to determine whether the current tracking result is reliable. When the reliability of target tracking and positioning is low, a particle filter is used to selectively re-detect the target. Depending on the extent of distortion of the tracking target at any given time, the filter model is adaptively updated. Compared with various representative correlation filters on the OTB50, OTB100, and DTB70 datasets, the experimental results show that the tracking success rate and precision of the distortion-aware correlation filter object tracking algorithm are the best, and it shows strong robustness when targets are distorted by multiple interference factors in real scenes.
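    The average peak-to-correlation energy (APCE) used in the distortion-aware strategy is a widely used reliability measure for correlation-filter response maps; a short NumPy sketch of the standard definition:

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a correlation-filter response map:
    |F_max - F_min|^2 / mean((F - F_min)^2). A low APCE suggests the target is
    occluded or distorted and the tracking result may be unreliable."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

sharp = np.zeros((31, 31)); sharp[15, 15] = 1.0   # single sharp peak
noisy = np.random.rand(31, 31)                    # multi-peak, noisy response
print(apce(sharp), apce(noisy))                   # a sharp peak gives a much larger APCE
```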
    Image Inpainting Combining Semantic Priors and Deep Attention Residuals
    CHEN Xiaolei, YANG Jia, LIANG Qiduo
    Journal of Frontiers of Computer Science and Technology    2023, 17 (10): 2450-2461.   DOI: 10.3778/j.issn.1673-9418.2208014
    To overcome the shortcomings of existing image inpainting methods, such as the lack of authenticity in the inpainting results, the lack of effective processing of missing region and non-missing region information, and the lack of effective processing of image feature information in different stages, an image inpainting method combining semantic priors and deep attention residual group is proposed. The image inpainting network is mainly composed of semantic priors network, deep attention residual group and full-scale skip connection. The semantic priors network learns the complete semantic priors information of visual elements in the missing region, and uses the learned semantic information to complete the missing region. The deep attention residual group enables the generator not only to pay more attention to the missing area of the image, but also to learn the features of each channel adaptively. The full-scale skip connection can combine the low-level feature map containing the image boundary with the high-level feature map containing the image texture and detail to inpaint the missing area of the image. In this paper, a full comparison experiment is conducted on CelebA-HQ dataset and Paris Street View dataset, and the experimental results show that the proposed method is superior to the current representative advanced image inpainting methods.
    Enhanced Foreground Perception Correlation Filtering Target Tracking
    JIANG Wentao, XU Xiaoqing
    Journal of Frontiers of Computer Science and Technology    2023, 17 (10): 2462-2477.   DOI: 10.3778/j.issn.1673-9418.2207055
    In order to alleviate the problem that the tracking accuracy of the correlation filtering target tracking algorithm is low due to the influence of deformation, fast motion, motion blur and similarity interference, this paper proposes the correlation filtering target tracking with enhanced foreground perception. An improved color histogram interference sensing model is introduced based on the correlation filtering algorithm. Firstly, based on the traditional background object model, the color difference component between foreground histogram and background histogram is enhanced to obtain a more prominent foreground color histogram interference sensing model. The correlation filter algorithm and the color histogram interference perception model are used to extract corresponding features and calculate their respective responses. Then the color histogram interference perception model is used to calculate the average probability that the pixels in the target area belong to the target. The average probability controls the fusion weights of correlation filter response and color histogram response.  The maximum position of the fusion interference perception response graph is used to locate the target. Finally, the discriminant conditions of tracking anomalies are set. When abnormal conditions occur, no model update is carried out. When the tracking confidence is high, the range of target change is judged by frame difference method and Euclidean distance between two frames. The corresponding learning rate of correlation filtering template is set to realize the adaptive updating of tracking template. Experimental comparison with mainstream algorithms is conducted on OTB100 dataset, and the experimental results show that the proposed algorithm has better tracking performance and robustness than other algorithms under complex challenges such as deformation, fast motion, motion blur and similarity interference.
    3D Human Animation Synthesis with Transformer-CVAE
    FENG Wenke, SHI Min, ZHU Dengming, LI Zhaoxin
    Journal of Frontiers of Computer Science and Technology    2023, 17 (9): 2137-2147.   DOI: 10.3778/j.issn.1673-9418.2206060
    3D human animation synthesis is a dominant technology in the domain of 3D animation. Traditional workflows that depend on motion capture cannot generate human animation quickly due to their complicated procedures and long authoring periods. Existing data-driven methods have limited learning capability, so the generated animations lack realism and the categories that can be generated are relatively limited. To that end, this paper presents a 3D human animation synthesis method based on a Transformer-based conditional variational autoencoder (Transformer-CVAE). Firstly, the motion dataset is constructed and classified by motion category. Then, the temporal relationship between different frames in a sequence is established by means of the Transformer architecture, and a variational autoencoder is combined with the Transformer to infer the probabilistic distribution of human motions. Next, to control the body motion being generated, constraints are imposed on the latent space. Finally, a series of experiments are conducted on the AMASS, HumanACT12 and UESTC datasets, and qualitative and quantitative evaluations are made from two aspects: visual effect and performance. Experimental results demonstrate that the method achieves superior performance on metrics such as STED and RMSE compared with the state of the art, while being capable of synthesizing various realistic human animations.
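    At the core of a (conditional) VAE is the reparameterized sampling of the latent code together with a KL regularizer; a minimal PyTorch sketch with illustrative dimensions follows, as a generic illustration rather than the authors' exact Transformer-CVAE.

```python
import torch
import torch.nn as nn

class LatentSampler(nn.Module):
    """Reparameterization step of a (conditional) VAE: map the encoder output to a
    mean and log-variance, then sample z = mu + sigma * eps so gradients can flow
    through the sampling. Dimensions are illustrative."""
    def __init__(self, enc_dim=256, latent_dim=64):
        super().__init__()
        self.to_mu = nn.Linear(enc_dim, latent_dim)
        self.to_logvar = nn.Linear(enc_dim, latent_dim)

    def forward(self, h):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)        # z ~ N(mu, sigma^2)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)
        return z, kl

h = torch.randn(4, 256)            # e.g. Transformer-encoded motion plus action condition
z, kl = LatentSampler()(h)
print(z.shape, float(kl))          # torch.Size([4, 64]) and a scalar KL term
```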
    Research on Crisp Edge Detection with Fusion of Convolutional Features
    WANG Bing, HUANG Gang, ZHANG Xingpeng
    Journal of Frontiers of Computer Science and Technology    2023, 17 (9): 2148-2160.   DOI: 10.3778/j.issn.1673-9418.2206085
    Benefiting from convolutional neural networks (CNN), edge detection has surpassed human performance on several benchmark datasets. However, such algorithms cannot guarantee the crispness of edges and the accuracy of positioning. In order to obtain the target edge map that is refined and clear, suppressing background texture effectively, and locating accurately, a crisp edge detection algorithm with fusion of convolutional features (FCF) is proposed in this paper. The algorithm uses VGG16 as the backbone network for convolutional feature extraction, fuses the convolutional features at different stages after upsampling, and obtains a crisp edge map through the refine fusion block (RFB) designed in this paper. RFB uses multiple GroupNorm refine blocks (GRB) to refine the resulting edge map. In addition, to balance edge pixels and non-edge pixels, this paper also proposes a refine dice loss (RD) function. On the BSDS500 dataset, the method proposed in this paper improves the F-score (ODS) of deep edge detectors such as HED and RCF by 2.8% and 2.1%, respectively. When edge detection evaluation is performed without non-maximal suppression (NMS), the F-score (ODS) and F-score (OIS) reach 0.801 and 0.816, respectively, outperforming other algorithms.
    Transformer Object Tracking Algorithm Based on Spatio-Temporal Template Update
    WANG Qiang, LU Xianling
    Journal of Frontiers of Computer Science and Technology    2023, 17 (9): 2161-2173.   DOI: 10.3778/j.issn.1673-9418.2208034
    Currently, mainstream Transformer tracking algorithms only use the Transformer for feature enhancement and feature fusion, ignoring its feature extraction ability, and lack an effective template update strategy for disturbing factors such as scale change and deformation during tracking. Aiming at the above problems, a Transformer tracking algorithm based on spatio-temporal template updating and bounding box refinement is proposed. Firstly, an improved Swin Transformer is used as the backbone network, and self-attention calculation and global information modeling are performed with shifted windows to enhance the feature extraction ability of the backbone network. Secondly, the Transformer encoder-decoder structure is used to fuse the template region and search region information, and the attention mechanism is used to establish feature correlation. At the same time, the template is dynamically updated every fixed number of frames according to the confidence score, to adjust the appearance state of the template during tracking. Finally, the bounding box refinement module is used to refine the regression range of the bounding box and improve the accuracy of the algorithm. Performance comparison experiments with mainstream advanced algorithms are performed on multiple challenging datasets. The success rate and precision on the OTB2015 dataset reach 70.2% and 91.0% respectively. The average overlap on the GOT-10k dataset is improved by 0.02 compared with the benchmark algorithm TransT, the success rate on the LaSOT dataset is increased by 0.024 compared with TransT, and the algorithm can perform real-time tracking at 42 FPS.
    Face Recognition Method Based on Attention Mechanism and Curriculum Learning
    WANG Haiyong, PAN Haitao, LIU Guinan
    Journal of Frontiers of Computer Science and Technology    2023, 17 (8): 1893-1903.   DOI: 10.3778/j.issn.1673-9418.2209111
    Aiming at the problems that the facial features extracted by current face recognition algorithms are not sufficiently discriminative and that difficult and easy samples are not distinguished well enough, a face recognition algorithm combining an attention mechanism and curriculum learning, called ECACFace (efficient cooperative attention and curriculum face), is proposed. The algorithm proposes an efficient spatial channel attention module (ESCA) and integrates it into the basic module of the feature extraction network. The efficient channel attention module (ECA) is used to obtain channel attention, and a spatial attention module is added after the ECA; on the basis of attending to the channel information of the image, spatial attention is further obtained, yielding a face feature vector with richer information for face classification. At the same time, a loss function based on curriculum learning is introduced to distinguish difficult and easy samples during training: simple samples are emphasized in the early stage and difficult samples in the later stage, realizing discriminative sample learning. When trained on the CASIA-WebFace dataset with lightweight and shallow backbone networks, ECACFace achieves an accuracy improvement of more than 1.5 percentage points over the original networks. ECACFace based on a deep network is trained on the MS1MV2 dataset, which contains millions of images, and the accuracy tested on the CPLFW dataset is 1.14 percentage points higher than that of ArcFace. Experimental results show that the cooperation of the ESCA module and the curriculum-learning-based loss function can further improve face recognition performance.
    Reference | Related Articles | Metrics
    Abstract217
    PDF159
    Low-Light Enhancement Method for Light Field Images by Fusing Multi-scale Features
    LI Mingyue, YAN Tao, JING Huahua, LIU Yuan
    Journal of Frontiers of Computer Science and Technology    2023, 17 (8): 1904-1916.   DOI: 10.3778/j.issn.1673-9418.2202064
    Light field images (LFI) record rich 3D structural and textural details of target scenes, which gives them great advantages in a wide range of computer vision tasks. However, LFI captured under low-light conditions often suffer from low brightness and strong noise, which may seriously degrade their quality. In this paper, a low-light LFI enhancement method fusing multi-scale light field (LF) structural features is proposed, which adopts digital single-lens reflex camera (DSLR) images to supervise the generated enlightened LFI. To explore and exploit LF structural features, angular and spatial Transformers are introduced to extract LF structural features from LFI at different scales, i.e., the complementary information between sub-views and the local and long-range dependencies within each sub-view. A recurrent fusion module is proposed to preserve the long-term memory of features at different scales using a long short-term memory network, while adaptively aggregating LF structural features over the entire feature space through local and global fusion layers. A 4D residual reconstruction module is designed to reconstruct target LFI sub-views from the aggregated features. In addition, a dataset of low-light LFI and normal-light DSLR image pairs is constructed to train the proposed network. Extensive experiments demonstrate that the proposed network can effectively improve the quality of low-light LFI and clearly outperforms other state-of-the-art methods.
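    The recurrent fusion idea, carrying a memory across feature scales with a long short-term memory cell, can be sketched as follows; the module name, pooling choice and dimensions are assumptions used only for illustration.

        import torch
        import torch.nn as nn

        class RecurrentScaleFusion(nn.Module):
            def __init__(self, channels=64):
                super().__init__()
                self.cell = nn.LSTMCell(channels, channels)
                self.proj = nn.Linear(channels, channels)

            def forward(self, multi_scale_feats):
                """multi_scale_feats: list of (B, C, H_i, W_i) tensors, coarse to fine."""
                b, c = multi_scale_feats[0].shape[:2]
                h = torch.zeros(b, c, device=multi_scale_feats[0].device)
                s = torch.zeros_like(h)
                for feat in multi_scale_feats:
                    pooled = feat.mean(dim=(2, 3))       # global descriptor per scale
                    h, s = self.cell(pooled, (h, s))     # carry memory across scales
                return self.proj(h)                      # fused multi-scale descriptor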
    Reference | Related Articles | Metrics
    Abstract364
    PDF427
    Distortion Correction of Two-Dimensional Spectral Image Based on Neural Network
    YIN Qian, WANG Yan, GUO Ping, ZHENG Xin
    Journal of Frontiers of Computer Science and Technology    2023, 17 (7): 1622-1633.   DOI: 10.3778/j.issn.1673-9418.2201072
    Two-dimensional spectral images are generally distorted. The spectrum extraction operation is affected by such distortion, which reduces the quality of the resulting one-dimensional spectral data. To address this problem, an effective neural-network-based correction method for distorted two-dimensional spectral images is proposed. Firstly, by extracting the center line of each fiber from the flat-field spectrum and fitting the equal-wavelength line at each specific wavelength from the calibration lamp spectrum, data representing the distortion characteristics of two-dimensional spectral images are obtained. Training samples are then constructed from these two sets of feature lines. Secondly, a neural network model is designed and trained to fit the relation between the pixel coordinates of the image before and after correction, so that all pixel coordinates of the corrected image can be calculated by the model. Finally, the flux values of the corrected image are filled in one-to-one correspondence with the flux values of the original distorted image. Correction experiments are carried out with the flat-field spectrum, calibration lamp spectrum, and object spectrum, and the spectral extraction results of the object spectrum before and after correction are compared. Experimental results show that the method can effectively correct distorted two-dimensional spectral images and improve the quality of the one-dimensional spectral data to a certain extent.
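    A minimal sketch of the coordinate-mapping step, assuming the network regresses the distorted-image coordinates for each corrected-image pixel; the layer sizes and normalization are illustrative only.

        import torch.nn as nn

        class CoordMapper(nn.Module):
            """Maps corrected-image pixel coordinates (x, y) to the corresponding
            coordinates in the original distorted image."""
            def __init__(self, hidden=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(2, hidden), nn.Tanh(),
                    nn.Linear(hidden, hidden), nn.Tanh(),
                    nn.Linear(hidden, 2),
                )

            def forward(self, coords):          # coords: (N, 2), normalized to [0, 1]
                return self.net(coords)

        # Training pairs would come from the fiber center lines (flat-field spectrum)
        # and the equal-wavelength lines (calibration lamp spectrum) mentioned above.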
    Reference | Related Articles | Metrics
    Abstract101
    PDF71
    Channel Pruning Method for Anchor-Free Detector
    RAN Mengying, YANG Wenzhu, YIN Qunjie
    Journal of Frontiers of Computer Science and Technology    2023, 17 (7): 1634-1643.   DOI: 10.3778/j.issn.1673-9418.2111102
    Aiming at the problems of large parameter redundancy, high computational cost and slow detection speed of anchor-free detectors, a channel pruning method guided by double attention modules (CPDAM) is proposed to compress anchor-free object detectors. The performance of the channel attention and spatial attention submodules is further improved using pooling and group normalization. The improved channel attention and spatial attention submodules are fused using a channel grouping strategy and trained continuously to generate a scale value for each channel that indicates the channel's importance to the classification task. A global scale value is computed from these scale values, and channel pruning of the backbone network is performed based on the evaluation of channel importance by this value. The improved anchor-free object detector is experimentally validated on the PASCAL VOC, ImageNet and CIFAR-100 datasets. The results show that the number of parameters of CenterNet-ResNet101 decreases from 6.995×10⁷ before pruning to 2.238×10⁷ after pruning, and the FPS increases from 27 to 46, with only 0.6 percentage points of mAP loss.
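    The selection step, deriving a global threshold from the per-channel scale values and keeping only the channels above it, might look like the following sketch; the keep ratio and helper name are assumptions.

        import torch

        def select_channels(scale_values, keep_ratio=0.5):
            """scale_values: (C,) tensor of per-channel importance scores.
            Returns a boolean mask of channels to keep."""
            k = max(1, int(scale_values.numel() * keep_ratio))
            threshold = torch.topk(scale_values, k).values.min()   # global scale value
            return scale_values >= threshold

        # Example: keep the half of the channels the attention modules rated most important.
        mask = select_channels(torch.rand(256), keep_ratio=0.5)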
    Reference | Related Articles | Metrics
    Abstract188
    PDF157
    Object Tracking Algorithm with Channel and Anomaly Adaptation
    JIANG Wentao, ZHANG Boqiang
    Journal of Frontiers of Computer Science and Technology    2023, 17 (7): 1644-1657.   DOI: 10.3778/j.issn.1673-9418.2111142
    Spatial regularization tracking algorithms ignore channel and anomaly adaptation, which can easily lead to tracking failure in complex scenarios such as illumination changes, occlusion, and motion blur. To address these problems, this paper proposes a tracking algorithm with channel and anomaly adaptation. Firstly, the appearance model of the target is built from gradient and color features. Secondly, a channel weighting strategy is proposed: an adaptive channel regularizer is constructed to optimize the channel weights during the training phase, reducing the impact of redundant information in multi-channel features and of channel reliability changes on tracking performance. Then, an adaptive anomaly regularizer is constructed to constrain abnormal changes of the response map and improve the tracker's robustness when the target region changes rapidly. Finally, in the detection stage, the filter is correlated with the current sample to obtain the target scale and position. The peak-versus-noise smoothness index of the response map is calculated to judge occlusion and exclude low-quality samples, enhancing anomaly adaptation when occlusion occurs. Comparative experiments with several mainstream methods are performed on the OTB50, OTB100 and TC-128 benchmarks. Experimental results show that the proposed algorithm is more robust in complex scenarios such as illumination variation, occlusion and motion blur, and its tracking success rate and precision are higher than those of similar algorithms.
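    As an illustration of judging sample quality from the response map (not the paper's exact smoothness index), the sketch below combines a peak-sharpness measure with a frame-to-frame change measure to reject likely-occluded samples; the thresholds are placeholders.

        import numpy as np

        def response_quality(response, prev_response):
            """Higher peak sharpness and lower frame-to-frame change indicate a stable target."""
            peak = response.max()
            side = response[response < peak]
            psr = (peak - side.mean()) / (side.std() + 1e-8)        # peak sharpness
            smoothness = np.abs(response - prev_response).mean()    # abrupt change across frames
            return psr, smoothness

        def accept_sample(response, prev_response, psr_min=5.0, smooth_max=0.1):
            psr, smoothness = response_quality(response, prev_response)
            return psr > psr_min and smoothness < smooth_max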
    Reference | Related Articles | Metrics
    Abstract110
    PDF76
    Detection Optimized Labeled Multi-Bernoulli Algorithm for Visual Multi-target Tracking
    JIANG Lingyun, YANG Jinlong
    Journal of Frontiers of Computer Science and Technology    2023, 17 (6): 1343-1358.   DOI: 10.3778/j.issn.1673-9418.2109110
    In video multi-target tracking algorithms that combine a detector with a tracker, the quality of the detector affects the performance of the whole tracking algorithm. Missed and false detections lead to missed and false tracking of targets, produce fragmented trajectories and increase the number of identity label switches. To solve these problems, this paper further optimizes the tracking algorithm within the framework of the labeled multi-Bernoulli filter: a new measurement-driven newborn target recognition method is designed to capture newborn targets more quickly and accurately; a new target recognition method is designed that maintains label invariance over short periods, reducing fragmented trajectories and label jumping; and a new template selection strategy is introduced to avoid polluting the template by adding occluded targets to it. Since the labeled multi-Bernoulli filter is an online inference algorithm, parallelization is adopted to improve its running efficiency. The results show that the proposed algorithm can effectively solve the problems of label jumping and inaccurate tracking caused by target occlusion. It is tested on the challenging MOT17 dataset and achieves good tracking performance compared with other related filtering methods.
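    The template selection strategy, avoiding updates from occluded targets, can be illustrated with a simple overlap test; the IoU threshold and box format below are assumptions for the sketch.

        def box_iou(a, b):
            """a, b: boxes in (x1, y1, x2, y2) format."""
            ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
            ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            return inter / (area_a + area_b - inter + 1e-8)

        def should_update_template(target_box, other_boxes, occlusion_thresh=0.3):
            """Skip the template update when the target overlaps too much with any
            other target, so occluded appearances do not pollute the template."""
            return all(box_iou(target_box, b) < occlusion_thresh for b in other_boxes)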
    Reference | Related Articles | Metrics
    Abstract201
    PDF158
    Dual-channel Quaternion Convolutional Network for Denoising
    CAO Yiqin, RAO Zhechu, ZHU Zhiliang, WAN Sui
    Journal of Frontiers of Computer Science and Technology    2023, 17 (6): 1359-1372.   DOI: 10.3778/j.issn.1673-9418.2109042
    Deep-learning-based color image denoising usually applies convolution to each channel and then merges the multi-channel data into single-channel data. This approach does not fully consider the spectral correlation between color channels, which may cause distortion in the denoising results. Quaternion convolution can solve this problem by treating a color pixel as a whole. However, a single quaternion convolutional network cannot restore image details well. To solve this problem, a dual-channel quaternion convolutional network (DQNet) for color random impulse noise removal is proposed. Firstly, following a strategy of fusing a structure channel and a color channel, a structure detail restoration network based on dilated convolution is proposed to obtain structure and edge features, and a quaternion convolutional network is used to extract cross-channel color information. Secondly, to address the partial loss of global information caused by convolution operations, a long skip connection is used to fuse the input noisy image with the convolution results, and a feature enhancement module based on an attention mechanism is designed to guide the network to extract latent noise features from complex backgrounds. Finally, residual learning is used to restore the image from color random impulse noise. Experimental results show that the proposed algorithm has better denoising performance, especially at moderate or high noise levels.
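    A compact sketch of a quaternion convolution layer, which realizes the Hamilton product with four real convolutions and thus processes the four quaternion components of each pixel jointly; the channel layout, lack of bias terms and kernel size are assumptions for the illustration.

        import torch
        import torch.nn as nn

        class QuaternionConv2d(nn.Module):
            def __init__(self, in_q, out_q, kernel_size=3, padding=1):
                super().__init__()
                conv = lambda: nn.Conv2d(in_q, out_q, kernel_size, padding=padding, bias=False)
                self.wr, self.wi, self.wj, self.wk = conv(), conv(), conv(), conv()

            def forward(self, x):
                # x: (B, 4*in_q, H, W), split into real/i/j/k parts
                r, i, j, k = torch.chunk(x, 4, dim=1)
                out_r = self.wr(r) - self.wi(i) - self.wj(j) - self.wk(k)
                out_i = self.wi(r) + self.wr(i) + self.wk(j) - self.wj(k)
                out_j = self.wj(r) - self.wk(i) + self.wr(j) + self.wi(k)
                out_k = self.wk(r) + self.wj(i) - self.wi(j) + self.wr(k)
                return torch.cat([out_r, out_i, out_j, out_k], dim=1)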
    Reference | Related Articles | Metrics
    Abstract208
    PDF295
    Manifold Background-Aware Correlation Filter Target Tracking
    YUAN Heng, ZHAO Xiaoyi
    Journal of Frontiers of Computer Science and Technology    2023, 17 (6): 1373-1386.   DOI: 10.3778/j.issn.1673-9418.2111023
    To solve the problem that the target is easily lost in complex scenes such as similar backgrounds, occlusion, fast motion and motion blur, a new manifold background-aware correlation filter tracking algorithm is proposed. Firstly, the target tracking region is selected and the appearance features of the target are extracted to establish the object model. Then, taking the target location as the origin, a manifold search area is constructed using a double exponential distribution. According to the target's motion speed and direction, the manifold search range and search angle are dynamically adjusted. The background in the manifold search area is extracted, and the filter template is obtained by training on the background information and the target feature model. Finally, the filter template is used to determine the target position and track the target. Driven by the speed and direction of the target's motion, the proposed manifold background-aware algorithm adopts a dynamic search mechanism that covers the probability space of the target's random motion. It can effectively search for targets in complex scenarios, control the amount of computation, and improve the accuracy and speed of the tracking algorithm. Extensive experiments are carried out on the standard OTB100 dataset. Experimental results indicate that, compared with other mainstream algorithms, the proposed algorithm achieves good accuracy, real-time performance and robustness for target tracking under complex conditions such as similar backgrounds, occlusion, fast motion and motion blur.
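    A toy sketch of adjusting the search range and search direction from the target's motion, in the spirit of the dynamic search mechanism above; the radius formula and constants are placeholders, not the paper's double exponential construction.

        import math

        def search_region(center, velocity, base_radius=30.0, gain=1.5):
            """Grow the search radius with the target's speed and bias the search
            angle towards the motion direction; all constants here are placeholders."""
            speed = math.hypot(velocity[0], velocity[1])
            radius = base_radius * (1.0 + gain * speed / (speed + 1.0))
            angle = math.atan2(velocity[1], velocity[0])   # preferred search direction
            return center, radius, angle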
    Reference | Related Articles | Metrics
    Abstract113
    PDF45
    Object Detection Based on Improved YOLOX-S Model in Construction Sites
    HU Hao, GUO Fang, LIU Zhao
    Journal of Frontiers of Computer Science and Technology    2023, 17 (5): 1089-1101.   DOI: 10.3778/j.issn.1673-9418.2205012
    The existing YOLOX-S model has low object detection average precision (AP) under the complex environmental disturbances found on construction sites and cannot meet the needs of practical applications well. In view of the above problems, the YOLOX-S model is improved in three aspects: the introduction of a structural re-parameterization module, a convolutional attention module, and the AdamW optimization algorithm. Firstly, RepVGGBlock is used to decouple the model structures of the training and testing phases. More residual structures are built in the Backbone and Neck during training to improve the model's feature extraction capability. Secondly, the LKA (large kernel attention) module is used to extract local feature information and long-distance dependencies, providing more effective attention guidance for the subsequent calculation of the position and size of bounding boxes and improving the detection average precision. Thirdly, AdamW is used instead of the Adam optimization algorithm to update the model parameters, which further improves convergence and generalization ability. Finally, experiments are carried out on the MOCS (moving objects in construction sites) dataset, which show that the improved YOLOX-S model's average precision for detecting all targets is increased by 3.3 percentage points, and the average precision for large, medium and small objects is increased by 3.2, 2.3 and 2.2 percentage points, respectively. At the same time, the computational cost of the improved YOLOX-S model does not increase significantly, so it can better meet the average precision requirements of object detection on construction sites under real-time constraints.
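    The structural re-parameterization idea can be illustrated by folding a 3×3 branch and a 1×1 branch used during training into a single 3×3 convolution for inference, as in the simplified sketch below; batch normalization folding, which RepVGGBlock also performs, is omitted for brevity, and the 3×3 branch is assumed to use padding 1.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def fuse_branches(conv3x3: nn.Conv2d, conv1x1: nn.Conv2d) -> nn.Conv2d:
            """Return a single 3x3 conv equivalent to conv3x3(x) + conv1x1(x)."""
            fused = nn.Conv2d(conv3x3.in_channels, conv3x3.out_channels,
                              kernel_size=3, padding=1, bias=True)
            # pad the 1x1 kernel to 3x3 and add it to the 3x3 kernel
            fused.weight.data = conv3x3.weight.data + F.pad(conv1x1.weight.data, [1, 1, 1, 1])
            b3 = conv3x3.bias.data if conv3x3.bias is not None else torch.zeros(conv3x3.out_channels)
            b1 = conv1x1.bias.data if conv1x1.bias is not None else torch.zeros(conv1x1.out_channels)
            fused.bias.data = b3 + b1
            return fused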
    Reference | Related Articles | Metrics
    Abstract313
    PDF299
    Object Detector with Residual Learning and Multi-scale Feature Enhancement
    JIA Tianhao, PENG Li, DAI Feifei
    Journal of Frontiers of Computer Science and Technology    2023, 17 (5): 1102-1111.   DOI: 10.3778/j.issn.1673-9418.2109099
    At present, deep learning has achieved great success in the field of computer vision, but small object detection remains a challenging problem in object detection. Aiming at the problems of low resolution, blurred appearance and limited information carried by small objects, an object detector that introduces residual learning and multi-scale feature enhancement is proposed. Firstly, an enhanced feature mapping block based on residual learning is introduced into the backbone network. Through channel averaging and normalization, the model focuses more on the object area rather than the background, providing additional semantic information for the effective feature layers while keeping detection fast. Then, a context-sensitive feature fusion block enlarges the receptive field of the effective feature maps and fuses the shallow and deep feature layers used for prediction, improving detection performance at low resolution. Finally, a dual attention block is used to suppress background noise and embed key features into the attention; while preserving spatial information, it strengthens the information association between channels and thus enhances the expressive ability of the features. To better detect small objects, the number of prior boxes for shallow feature mapping is also adjusted. Experimental results show that on the PASCAL VOC2007 dataset, the detection accuracy (mAP) of the algorithm at a 300×300 input scale is 79.9%, which is 2.7 percentage points higher than that of SSD, and the detection accuracy for the small-object categories bird, bottle, chair and plant is improved by 5.1, 7.5, 3.9 and 7.2 percentage points, respectively. The detection accuracy (mAP) on the self-made OAP aerial dataset is 82.7%.
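    A generic sketch of fusing a deep, low-resolution feature map with a shallow, high-resolution one, in the spirit of the context-sensitive fusion block mentioned above; the channel dimensions and the choice of bilinear upsampling are assumptions.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ShallowDeepFusion(nn.Module):
            def __init__(self, shallow_c, deep_c, out_c):
                super().__init__()
                self.reduce = nn.Conv2d(deep_c, out_c, kernel_size=1)
                self.refine = nn.Conv2d(shallow_c + out_c, out_c, kernel_size=3, padding=1)

            def forward(self, shallow, deep):
                # upsample the deep features to the shallow resolution, then fuse
                deep = F.interpolate(self.reduce(deep), size=shallow.shape[-2:],
                                     mode="bilinear", align_corners=False)
                return torch.relu(self.refine(torch.cat([shallow, deep], dim=1)))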
    Reference | Related Articles | Metrics
    Abstract184
    PDF208
    Object Detection Algorithm Based on Channel Separation Dual Attention Mechanism
    ZHAO Shan, ZHENG Ailing, LIU Zilu, GAO Yu
    Journal of Frontiers of Computer Science and Technology    2023, 17 (5): 1112-1125.   DOI: 10.3778/j.issn.1673-9418.2109115
    To address the low detection accuracy and high miss rate for small targets in two-stage object detection algorithms, a target detection algorithm based on channel separation and a dual attention mechanism is proposed, which improves the detection accuracy of small targets by improving the Faster+FPN backbone network. Firstly, in response to the problem that neural networks cannot automatically learn the relative importance of features, a dual attention mechanism is proposed to build a deep neural network in the channel separation process, combined with techniques such as group convolution and dilated convolution to reduce network parameters. Secondly, to address the information loss caused by passing high-resolution features through a deep CNN, a detail extraction module and a channel attention feature fusion module are added to extract more detailed features. Finally, considering that a general loss function cannot focus on assessing the confidence of the target's location, the KL divergence is combined with the loss function optimization so that the predicted distribution is brought closer to the real distribution through training, effectively addressing the problems associated with directly using neural networks for object detection. The PASCAL VOC2007, KITTI and Pedestrian datasets are adopted to train the network, and the proposed model is compared with several object detection algorithms. Experimental results show that the proposed algorithm can recognize images efficiently and has high detection accuracy.
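    One common way to combine KL divergence with box regression, which may or may not match the paper's exact formulation, is to let the network predict a mean and a log-variance per coordinate and minimize the KL divergence between that Gaussian and a Dirac label, as sketched below.

        import torch

        def kl_box_loss(pred_mean, pred_log_var, target):
            """Treat the label as a Dirac distribution and the prediction as a Gaussian:
            minimizing the KL divergence reduces to a variance-weighted regression term
            plus a penalty on over-confident (tiny-variance) predictions."""
            var = torch.exp(pred_log_var)
            return ((target - pred_mean) ** 2 / (2.0 * var) + 0.5 * pred_log_var).mean()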
    Reference | Related Articles | Metrics
    Abstract193
    PDF157