Content of Graphics and Image in our journal

    Aerial Image Object Detection of UAV Based on Multi-level Feature Fusion
    XU Guangda, MAO Guojun
    Journal of Frontiers of Computer Science and Technology    2023, 17 (3): 635-645.   DOI: 10.3778/j.issn.1673-9418.2205114
    To address the problem that unmanned aerial vehicle (UAV) aerial images contain many small targets with little feature information and are susceptible to background interference, a multi-level feature fusion detection algorithm for UAV aerial images based on YOLOv5 (you only look once version 5) is proposed. Firstly, the high-resolution feature map of the shallow network is used to enrich the feature information of small targets, and a detection head of the corresponding scale is added to enhance small-target detection. Secondly, considering that features at different levels contribute differently to small object detection, a multi-level feature fusion layer is designed to integrate information from different receptive fields: context information is aggregated by fusing feature maps from different levels, and the output weight of each level's feature map is generated adaptively according to the size of the training target samples, dynamically optimizing the expressive power of the feature maps. Finally, to reduce the conflict between the feature information required by different tasks during prediction, a decoupled head replaces the original coupled head, so that the classification and localization tasks can be completed better. Experimental results on the public dataset VisDrone show that the mean average precision of the method reaches 35.5%, which is 4.4 percentage points higher than that of the baseline method YOLOv5, and the detection accuracy is also higher than that of mainstream detection methods. The results show that the proposed method performs well on small object detection tasks.
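
    The abstract does not specify how the per-level weights are produced, so the following is only a minimal sketch of one common form of weighted feature fusion: softmax-normalized weights applied to feature maps that are assumed to have already been resized to a common shape. The names `softmax` and `fuse_levels`, the flattened-list feature representation, and the use of plain logits are all illustrative assumptions, not the authors' design.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_levels(features, logits):
    """Weighted sum of equally sized feature maps (flattened to lists).

    features: one list of floats per level, all the same length
    logits:   one score per level (hypothetical learnable parameters)
    """
    weights = softmax(logits)
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]

# Two levels with equal scores are simply averaged.
shallow = [1.0, 2.0]
deep = [3.0, 4.0]
fused = fuse_levels([shallow, deep], logits=[0.0, 0.0])
print(fused)
```

    In a trained network the scores would be learned or predicted from the input; here they are fixed numbers so that the arithmetic is easy to follow.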
    DnRFD: Progressive Residual Fusion Dense Network for Image Denoising
    CAO Yiqin, RAO Zhechu, ZHU Zhiliang, ZHANG Hongbin
    Journal of Frontiers of Computer Science and Technology    2022, 16 (12): 2841-2850.   DOI: 10.3778/j.issn.1673-9418.2103030

    Denoising methods based on deep learning can achieve better results than traditional methods, but existing deep learning denoising methods often suffer from excessive computational complexity caused by overly deep networks. To solve this problem, a progressive residual fusion dense network (DnRFD) is proposed to remove Gaussian noise. Firstly, dense blocks are used to learn the noise distribution in the image, which greatly reduces the network parameters while fully extracting the local features of the image. Then, a progressive strategy is used to connect the shallow convolution features with the deep features, forming a residual fusion network that extracts more global features of the noise. Finally, the output feature maps of each dense block are fused and fed to the reconstruction output layer to obtain the final result. Experimental results show that, when the Gaussian white noise level is 25 and 50, the network achieves higher mean peak signal-to-noise ratio (PSNR) and mean structural similarity, and its average denoising time is half that of the DnCNN method and one third that of the FFDNet method. Overall, the denoising performance of the network is better than that of the compared algorithms: it effectively removes white Gaussian noise and natural noise from images and better restores edge and texture details.
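
    For readers unfamiliar with the PSNR metric reported above, the standard definition is 10·log10(MAX² / MSE). The sketch below computes it for two images given as flattened lists of pixel values; the function name `psnr`, the 8-bit peak value of 255, and the sample pixel values are assumptions made for illustration and are not taken from the paper.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

clean = [52, 55, 61, 59]      # hypothetical ground-truth pixels
denoised = [50, 56, 60, 62]   # hypothetical denoiser output
print(round(psnr(clean, denoised), 2))
```

    A higher PSNR means the denoised image is closer to the clean reference.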

    Intra-class Low-Rank Subspace Learning for Face Recognition
    CAI Yuhong, WU Xiaojun
    Journal of Frontiers of Computer Science and Technology    2022, 16 (12): 2851-2859.   DOI: 10.3778/j.issn.1673-9418.2104088

    As a simple and effective tool, linear regression has been widely used in pattern recognition. However, the direct projection from high-dimensional data to binary labels may not be flexible enough, and a suitable data representation for classification problems cannot be obtained. To solve this problem, the label relaxation method has been proposed. Although its effectiveness has been proven, it increases the difference between targets from the same class. Therefore, an intra-class low-rank subspace learning (ICLRSL) method is proposed in this paper, which differs from both the original linear regression and the label relaxation based method. Two projection matrices are used to perform intra-class low-rank subspace projection and label space projection respectively. The intra-class low-rank subspace obtained by ICLRSL serves as a bridge between the high-dimensional data space and the label space, yielding a preliminary coding of the data which, through the intra-class low-rank constraint, has intra-class correlation similar to that of the final regression targets. At the same time, the row sparsity constraint ensures that the subspace projection focuses on the few features most relevant to the intra-class low-rank property, and reduces the negative impact of redundant information to some extent. Through the connection of the intermediate subspace, the method is more flexible than directly learning a single projection matrix and can also obtain a discriminative data representation. Experimental results on four public face datasets demonstrate the effectiveness of the ICLRSL algorithm.

    Fast 3D-CNN Combined with Depth Separable Convolution for Hyperspectral Image Classification
    WANG Yan, LIANG Qi
    Journal of Frontiers of Computer Science and Technology    2022, 16 (12): 2860-2869.   DOI: 10.3778/j.issn.1673-9418.2103051

    When convolutional neural networks are used for feature extraction and classification of hyperspectral images, problems such as insufficient extraction of spatial-spectral features and too many network layers lead to a large number of parameters and complex calculations. A lightweight convolution model based on fast three-dimensional convolutional neural networks (3D-CNN) and depth separable convolutions (DSC) is proposed. Firstly, incremental principal component analysis (IPCA) is used to reduce the dimension of the input data. Secondly, the pixels of the input model are divided into small overlapping 3D convolution blocks, and the ground label is formed on the segmented small blocks based on the center pixel. The 3D kernel function is used for convolution processing to form a continuous 3D feature map, retaining the spatial-spectral features. 3D-CNN is used to extract spatial and spectral features at the same time, and then depth separable convolution is added to the 3D convolution to extract spatial features again, which enriches the spatial-spectral features while reducing the number of parameters, thus reducing the calculation time and improving the classification accuracy. The proposed model is verified on the Indian Pines, Salinas Scene and University of Pavia public datasets, and compared with other classical classification methods. Experimental results show that this method not only greatly reduces the learnable parameters and the complexity of the model, but also shows good classification performance: the overall accuracy (OA), average accuracy (AA) and Kappa coefficient all exceed 99%.
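
    The parameter saving from depth separable convolution can be checked with simple arithmetic. The sketch below counts weights (ignoring biases) for a standard 2D convolution and for its depthwise-plus-pointwise replacement; the function names and the layer sizes in the example are illustrative assumptions, not values from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
print(std, sep, round(std / sep, 2))
```

    For this hypothetical layer the separable form needs roughly one eighth of the weights, which is the kind of reduction the abstract relies on.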

    SSD Object Detection Algorithm with Attention and Cross-Scale Fusion
    LI Qingyuan, DENG Zhaohong, LUO Xiaoqing, GU Xin, WANG Shitong
    Journal of Frontiers of Computer Science and Technology    2022, 16 (11): 2575-2586.   DOI: 10.3778/j.issn.1673-9418.2102001

    To further improve the performance of the SSD (single shot multibox detector) algorithm, and to solve the problems of unbalanced feature map information and difficult small target recognition in its multi-scale prediction, plug-and-play modules are designed in this paper to fully integrate the information contained in feature maps of different scales and to model the relationships within feature maps, enhancing their representation ability. Firstly, a novel feature fusion method is designed to solve the problem of information disparity in cross-scale feature fusion. Secondly, following the idea of the pooling pyramid, a depth feature extraction module is designed to extract information from different receptive fields, improving the model's ability to detect objects of different sizes. Finally, to further optimize the feature map, highlight its effective information for the current task, and establish both the global long-distance relationships between pixels and the importance relationships between channels, a lightweight attention module is proposed. With these mechanisms, the structure of the SSD model is modified, which effectively improves the detection accuracy and robustness of the SSD algorithm. Extensive experiments have been conducted on the PASCAL VOC datasets to verify the efficiency of the proposed method. On the PASCAL VOC2007 test set, the proposed method improves mean average precision (mAP) by 2.9 percentage points over the SSD algorithm, while maintaining real-time detection.

    Improved Siamese Adaptive Network and Multi-feature Fusion Tracking Algorithm
    LI Rui, LIAN Jirong
    Journal of Frontiers of Computer Science and Technology    2022, 16 (11): 2587-2595.   DOI: 10.3778/j.issn.1673-9418.2103044

    In the current target tracking field, tracking accuracy and tracking speed are difficult to balance. For example, a tracker based on correlation filtering can run at a very high speed, but its tracking accuracy is extremely low; a tracker based on deep learning can achieve high tracking accuracy, but its tracking speed is extremely low. To address this problem, a target tracking algorithm based on an improved Siamese adaptive network and multi-feature fusion is proposed. Firstly, the AlexNet network and the improved ResNet network are constructed on each branch of the Siamese network at the same time for feature extraction. Secondly, through simultaneous end-to-end training, the tracking problem is decomposed into the sub-problems of classifying each position label and regressing the bounding box. Finally, the shallow features and deep features are selected adaptively, and target recognition and localization are carried out based on multi-feature fusion. The proposed algorithm and some existing trackers are tested on the standard target tracking dataset. Experimental results show that the proposed algorithm achieves high tracking accuracy and success rate while maintaining tracking speed. The algorithm is also robust in complex situations such as illumination changes, deformations, and background clutter.

    Target Detection of SSD Aircraft Remote Sensing Images Based on Anchor Frame Strategy Matching
    WANG Haotong, GUO Zhonghua
    Journal of Frontiers of Computer Science and Technology    2022, 16 (11): 2596-2608.   DOI: 10.3778/j.issn.1673-9418.2105108

    To address the problem that current target detection algorithms for aircraft remote sensing images cannot balance accuracy and real-time performance, a target detection algorithm based on the single shot MultiBox detector (SSD), with anchor frame scale densification and anchor frame strategy matching, is proposed. The algorithm uses an improved deep residual network to replace the original feature extraction network of the SSD algorithm. Considering the small-scale and dense characteristics of aircraft remote sensing images, this paper redesigns the size and proportion of the anchor frames and adds a feature layer containing two scales. Then, the anchor frame densification operation is performed on each feature layer to make the anchor frame laying density of the feature layers basically equal, and to improve the probability of matching anchor frames of different scales to the real target. To address the large gap in the number of positive sample anchor frames at different scales, an anchor frame strategy matching method is proposed that makes the number of positive sample anchor frames at each scale tend to the overall positive sample average, which improves the effectiveness of training and the robustness of target detection to a certain extent. In experiments on the aircraft remote sensing dataset, the average precision reaches 91.15% at 33.4 frames per second. The results show that the improved algorithm not only increases the detection accuracy while adding few training parameters, but also retains the real-time detection capability of the SSD algorithm.

    Person Re-identification Based on Heterogeneous Branch Correlative Features Fusion
    CHEN Fan, PENG Li
    Journal of Frontiers of Computer Science and Technology    2022, 16 (11): 2609-2618.   DOI: 10.3778/j.issn.1673-9418.2103082

    Most person re-identification (Person Re-ID) methods based on multi-branch networks face the problem of a lack of heterogeneous features when extracting pedestrian features. In this paper, a novel Person Re-ID algorithm based on heterogeneous branch correlative features fusion is proposed. In the training stage, the attention-based OSNet is designed as the shared backbone network, which can extract more significant and distinguishing key features. The pedestrian features from the branch network are segmented equally along the vertical axis. The relevant stripe features are extracted to utilize the synthesis information between different stripes. The heterogeneous features extraction module is designed to increase the structural diversity of the model for learning difference features. In the inference stage, multiple feature vectors are fused into a new feature vector, and the similarity judgment is performed. The effectiveness of the proposed algorithm is verified by experiments on the Market-1501 and DukeMTMC-reID datasets, and the experimental results are analyzed. The proposed algorithm can improve the accuracy of Person Re-ID, and the features extracted by the model have strong robustness and discriminability.

    Rapid and Ultra-lightweight Semantic Segmentation in Urban Traffic Scene
    SHI Min, SHEN Jialin, YI Qingming, LUO Aiwen
    Journal of Frontiers of Computer Science and Technology    2022, 16 (10): 2377-2386.   DOI: 10.3778/j.issn.1673-9418.2203015

    Recently, with the rapid development of automatic driving, more and more researchers have begun to explore lightweight image semantic segmentation networks and apply them to road traffic scenes. However, existing semantic segmentation networks are usually difficult to deploy on edge devices with limited hardware resources because of their large number of parameters. To solve this problem, a rapid and ultra-lightweight dual attention lightweight network (DALNet), composed of a channel attention bottleneck backbone (CABb) network and a spatial attention decoder (SAD) module, is proposed in this paper; it performs well in extracting the context semantic information and spatial information of the image. The CABb network is mainly composed of channel attention bottleneck (CABt) modules. A split strategy is employed in CABt to separate feature channels and process multi-scale feature maps in parallel, and a channel attention mechanism is introduced for channel fusion and multi-scale semantic information extraction. The spatial attention mechanism is adopted in the SAD module to guide the decoder to upsample the feature maps using bilinear interpolation and recover the edge and detail information of the segmentation target. Experimental results show that the proposed DALNet has only 0.48 million parameters and achieves 74.1% and 70.1% mean intersection over union (mIoU) on the popular urban traffic datasets Cityscapes and CamVid, respectively. At a resolution of 512×1024, DALNet achieves an inference speed of 74 frames/s on a GTX 1080Ti card, which adequately meets the speed requirements of real-time semantic segmentation.
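
    The mIoU figure quoted above averages the per-class intersection over union. A minimal sketch of that computation follows, assuming predictions and ground truth are given as flattened lists of integer class labels and that classes absent from both are skipped; the function name `mean_iou` and the toy labels are illustrative, and benchmark toolkits may handle ignored labels differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never appear
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = [0, 0, 1, 1, 2, 2]     # hypothetical predicted labels
target = [0, 1, 1, 1, 2, 0]   # hypothetical ground-truth labels
print(mean_iou(pred, target, num_classes=3))
```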

    Modified Algorithm of Capsule Network for Classifying Small Sample Image
    WANG Feilong, LIU Ping, ZHANG Ling, LI Gang
    Journal of Frontiers of Computer Science and Technology    2022, 16 (10): 2387-2394.   DOI: 10.3778/j.issn.1673-9418.2102026

    To address the problem that the capsule network cannot classify complex small sample images effectively, a classification model is proposed that fuses an improved Darknet with the capsule network. Firstly, the Darknet is upgraded to contain both a shallow level extractor and a deep level extractor. The shallow level extractor adopts a 5×5 convolution kernel to capture long-distance edge contour features and the deep level extractor uses a 3×3 convolution kernel to capture deeper semantic features. Then, the extracted edge features and semantic features are fused to preserve effective features of images. Next, the capsule network is used to vectorize these effective features to work out the loss of spatial representation. Finally, L2 regularization is added to the loss function to avoid over-fitting. Experimental results show that, on the small sample dataset, the classification accuracy of the proposed model is 28.51 percentage points and 24.40 percentage points higher than that of the capsule network and the DCaps respectively, and 21.57 percentage points and 18.02 percentage points higher than that of the ResNet50 and the Xception respectively. This suggests that the proposed method performs better in classifying complex small sample images. Meanwhile, on the large sample dataset, the classification accuracy of the proposed model is also improved to a certain extent.

    Salient Object Detection with Feature Hybrid Enhancement and Multi-loss Fusion
    LI Chunbiao, XIE Linbo, PENG Li
    Journal of Frontiers of Computer Science and Technology    2022, 16 (10): 2395-2404.   DOI: 10.3778/j.issn.1673-9418.2104104

    To tackle the problems of missing features and poor regional consistency in existing salient object detection algorithms, a salient object detection network based on a fully convolutional neural network, which uses feature hybrid enhancement and multi-loss fusion, is proposed. The network includes a context-aware prediction module (CAPM) and a feature hybrid enhancement module (FHEM). First, the context-aware prediction module is used to extract the multi-scale feature information of the image, in which the spatial-aware module (SAM) is embedded to further extract the high-level semantic information of the image. Furthermore, the feature hybrid enhancement module is used to effectively integrate the global feature information and the detailed feature information generated by the prediction module, and the integrated feature is enhanced through the embedded feature aggregation module (FAM). In addition, the multi-loss fusion method is used to supervise the network, which combines the binary cross-entropy (BCE) loss function, the structured similarity (SSIM) loss function and the proposed regional augmentation (RA) loss function. The network with the multi-loss fusion method can maintain the integrity of the foreground region and enhance the regional pixel consistency. The algorithm is verified on five image datasets with multiple salient objects and complex backgrounds. Experimental results demonstrate that the algorithm effectively improves the detection accuracy of salient objects in complex scenes, and alleviates the problems of missing saliency map features and poor regional consistency.

    Improved Two-Branch Capsule Network for Hyperspectral Image Classification
    ZHANG Haitao, CHAI Simin
    Journal of Frontiers of Computer Science and Technology    2022, 16 (10): 2405-2414.   DOI: 10.3778/j.issn.1673-9418.2102073

    The method based on the dual-channel capsule network extracts spectral information and spatial information separately in two channels, which not only retains the feature extraction method of the dual-channel convolutional neural network, but also improves the classification accuracy. However, when the capsule network is trained, the dynamic routing process generates a large number of training parameters because the hyperspectral image (HSI) usually consists of hundreds of channels. To address this limitation, 1D and 2D constraint windows are proposed to reduce the number of capsules from the two extraction channels. The method uses the capsule vector group as the calculation unit to perform convolution operations, reducing the number of parameters and the computational complexity of the capsule network. Based on this parameter reduction optimization method, a new dual-branch capsule neural network (DuB-ConvCapsNet-MRF) is proposed and applied to the task of hyperspectral image classification. In addition, to further improve the classification accuracy, a Markov random field (MRF) is introduced to smooth the spatial region and obtain the final output. Ablation experiments on two representative hyperspectral image datasets and comparisons with six existing classification methods show that DuB-ConvCapsNet-MRF is superior to the other methods in classification performance, and effectively reduces the cost of training the capsule network.

    COVID-19 Detection Algorithm Combining Grad-CAM and Convolutional Neural Network
    ZHU Bingyu, LIU Zhen, ZHANG Jingxiang
    Journal of Frontiers of Computer Science and Technology    2022, 16 (9): 2108-2120.   DOI: 10.3778/j.issn.1673-9418.2105117

    In the detection of COVID-19, chest X-ray (CXR) images and CT scan images are the two main technical methods, providing an important basis for doctors' diagnosis. Currently, convolutional neural networks (CNN) for detecting COVID-19 in medical radiological images suffer from low accuracy, complex algorithms, and an inability to mark feature regions. To solve these problems, this paper proposes an algorithm combining Grad-CAM color visualization and convolutional neural network (GCCV-CNN). The algorithm can quickly classify lung X-ray images and CT scan images of COVID-19-positive patients, COVID-19-negative patients, general pneumonia patients and healthy people. At the same time, it can quickly locate the critical area in X-ray images and CT images. Finally, the algorithm can obtain more accurate detection results through the synthesis of deep learning algorithms. To verify the effectiveness of the GCCV-CNN algorithm, experiments are performed on three COVID-19-positive patient datasets and the algorithm is compared with existing algorithms. The results show that the classification performance of the algorithm is better than that of the COVID-Net algorithm and the DeTraC-Net algorithm. The GCCV-CNN algorithm achieves a high accuracy of 98.06%, and is faster and more robust.

    Co-segmentation of 3D Point Cloud Shape Clusters Based on Weakly Supervised Learning
    YANG Jun, LEI Xiwen
    Journal of Frontiers of Computer Science and Technology    2022, 16 (9): 2121-2131.   DOI: 10.3778/j.issn.1673-9418.2012036

    With the rapid development of 3D acquisition technology, point cloud data have gradually become one of the basic data formats for representing 3D shapes, as they retain more of the raw geometric information of a shape in 3D space. However, in the field of 3D point cloud shape segmentation, most deep learning network architectures rely on high-quality labeled data, which leads to high training cost. Therefore, to solve the problem of co-segmentation of 3D shape clusters using training samples with a small number of labeled points, a consistent segmentation method for 3D point cloud shape clusters based on weakly supervised learning is proposed. Firstly, the local neighborhood graph between points is established by the K-nearest neighbor algorithm. Then, the features of the point cloud model are extracted by a local convolution method, and similar component matrices are constructed using the extracted component features. Finally, reverse iteration of an energy function is used to optimize the network weights and obtain consistent segmentation results for the shape clusters. Experimental results show that the segmentation accuracy of this algorithm is 85.0% on ShapeNet Parts. Compared with existing supervised learning algorithms, the proposed algorithm can still achieve similar or even better results when the number of labeled points in the training samples is reduced to 10%, and compared with current mainstream weakly supervised algorithms, the segmentation accuracy is further improved.
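
    The first step described above, building a local neighborhood graph with K-nearest neighbors, can be illustrated with a brute-force sketch. The function name `knn_graph`, the Euclidean distance metric, and the toy point cloud are assumptions made for illustration; the paper's actual implementation is not described in the abstract, and practical systems typically use a spatial index rather than an all-pairs search.

```python
import math

def knn_graph(points, k):
    """Map each point index to the indices of its k nearest neighbours."""
    graph = {}
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()  # nearest first; ties broken by index
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Hypothetical 3D point cloud with four points.
cloud = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
for index, neighbours in knn_graph(cloud, k=2).items():
    print(index, neighbours)
```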

    Image Semantic Segmentation Method with Fusion of Transposed Convolution and Deep Residual
    LIU Lamei, WANG Xiaona, LIU Wanjun, QU Haicheng
    Journal of Frontiers of Computer Science and Technology    2022, 16 (9): 2132-2142.   DOI: 10.3778/j.issn.1673-9418.2012063

    To address the low segmentation accuracy and high loss of deep learning image semantic segmentation methods, an image semantic segmentation method that fuses transposed convolution and deep residual learning is proposed. Firstly, to solve the problems of decreasing segmentation accuracy and slow convergence caused by increasing the depth of the neural network, a deep residual learning module is designed to improve the training efficiency and convergence speed of the network. Next, to make feature map fusion more accurate during upsampling and feature extraction, the two upsampling methods UpSampling2D and transposed convolution are merged in the deep residual U-net model to form a new upsampling module. Finally, to solve the over-fitting of the weights between the training set and the validation set during network training, Dropout is introduced in the skip connection layer of the improved network, which enhances the learning ability of the model. The performance of the algorithm is verified on the CamVid dataset. The semantic segmentation accuracy of the algorithm reaches 89.93% and the loss is reduced to 0.23. Compared with the U-net model, the validation set accuracy is improved by 13.13 percentage points, and the loss is reduced by 1.20, which is better than current image semantic segmentation methods. The proposed model combines the advantages of U-net, makes image semantic segmentation more accurate, and effectively improves the robustness of the algorithm.

    Small Object Detection Algorithm Based on Weighted Network
    CHEN Haoran, PENG Li, LI Wentao, DAI Feifei
    Journal of Frontiers of Computer Science and Technology    2022, 16 (9): 2143-2150.   DOI: 10.3778/j.issn.1673-9418.2101040

    When observing a picture, people may instinctively pay more attention to its eye-catching objects. Such objects usually occupy a larger proportion of the picture, which leads to small targets being ignored. Because the area where a small target is located is often a weak detection area, the features that the detector can extract are few and are easily lost during feature information transmission after extraction, so small target detection performs poorly. Therefore, on the basis of a single-stage detector, this paper adds a cross-channel interaction mechanism to ensure the integrity of the information between layers, adopts target enhancement of training samples and designs a general loss function. In addition, this paper improves the sample weighting on the basis of the loss function to predict the weights of samples. The mAP of the proposed framework UWN (unified weighted network) on the VOC public dataset is 81.2% and the mAP on the self-made small target aerial photography dataset is 82.3%. Compared with the FSSD algorithm, some speed is sacrificed, but the accuracy is greatly improved.

    XR-MSF-Unet: Automatic Segmentation Model for COVID-19 Lung CT Images
    XIE Juanying, ZHANG Kaiyun
    Journal of Frontiers of Computer Science and Technology    2022, 16 (8): 1850-1864.   DOI: 10.3778/j.issn.1673-9418.2203023

    The COVID-19 epidemic has threatened human beings. Automatic and accurate segmentation of the infected areas in COVID-19 CT images can help doctors make a correct diagnosis and provide treatment in time. However, it is very challenging to achieve perfect segmentation because COVID-19 infections in patients' lungs are diffuse, the infected areas have irregular shapes, and the infected areas look very similar to other lung tissues. To tackle these challenges, the XR-MSF-Unet model is proposed in this paper for segmenting the COVID-19 lung CT images of patients. The XR (X ResNet) convolution module is proposed in this model to replace the two-layer convolution operations of U-Net, so that its multiple branches extract more informative features for good segmentation results. The plug-and-play attention mechanism module MSF (multi-scale features fusion module) is proposed in XR-MSF-Unet to fuse multi-scale features from receptive fields of different scales, together with global, local and spatial features of CT images, so as to strengthen the detail segmentation effect of the model. Extensive experiments on public COVID-19 CT images demonstrate that the proposed XR module strengthens the capability of the XR-MSF-Unet model to extract effective features, and that the MSF module plus the XR module effectively improves the segmentation capability of the XR-MSF-Unet model for the infected areas of COVID-19 lung CT images. The proposed XR-MSF-Unet model obtains good segmentation results. Its segmentation performance is superior to that of the original U-Net model by 3.21, 5.96, 1.22 and 4.83 percentage points in terms of Dice, IOU, F1-Score and Sensitivity, respectively, and it outperforms other models of the same type, realizing automatic segmentation of COVID-19 lung CT images.
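
    Two of the metrics reported above, Dice and IOU, can be computed from a predicted and a ground-truth binary mask as shown in this sketch. It uses the standard definitions Dice = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN); the function name `dice_and_iou`, the flattened 0/1 mask representation, and the convention of returning 1.0 for two empty masks are assumptions for illustration and may differ from the paper's evaluation code.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two flattened binary masks (0/1 values)."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)        # true positives
    fp = sum(1 for p, t in zip(pred, target) if p and not t)    # false positives
    fn = sum(1 for p, t in zip(pred, target) if not p and t)    # false negatives
    dice_denominator = 2 * tp + fp + fn
    iou_denominator = tp + fp + fn
    # Both masks empty: treat as a perfect match (a convention, not a rule).
    dice = 2 * tp / dice_denominator if dice_denominator else 1.0
    iou = tp / iou_denominator if iou_denominator else 1.0
    return dice, iou

predicted = [1, 1, 0, 0, 1]   # hypothetical predicted infection mask
truth = [1, 0, 0, 1, 1]       # hypothetical annotated mask
dice, iou = dice_and_iou(predicted, truth)
print(round(dice, 4), round(iou, 4))
```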

    Salient Instance Segmentation via Multiscale Boundary Characteristic Network
    HE Li, ZHANG Hongyan, FANG Wanlin
    Journal of Frontiers of Computer Science and Technology    2022, 16 (8): 1865-1876.   DOI: 10.3778/j.issn.1673-9418.2012041

    Locating objects of interest is a basic task in computer vision applications. Salient instance segmentation obtains instances of interest by detecting visually salient objects and segmenting them at the pixel level. To exploit the separation of features between a target object and its surrounding background, single stage salient-instance segmentation (S4Net) designs a new region feature extraction layer called ROIMasking. However, in a convolutional neural network, repeated convolution and upsampling cause loss of boundary information, rough boundary segmentation and reduced segmentation accuracy. To solve this problem, a new end-to-end salient instance segmentation method based on S4Net, the multiscale boundary characteristic network (MBCNet), is proposed, drawing on target edge detection. The method designs a multiscale boundary feature extraction branch, in which a boundary refinement block with hybrid dilated convolution and a residual network structure enhances the extraction of instance boundary information. The boundary information is transferred through the shared layers of MBCNet. In addition, to improve segmentation accuracy, a new boundary-segmentation joint loss function is proposed, so that target boundary feature extraction and instance segmentation are trained synchronously in the same network. Experimental results show that the mAP0.5 and mAP0.7 of the proposed method are 88.90% and 67.94% on the saliency instance dataset, improvements of 2.20 and 4.24 percentage points over S4Net, respectively.

    Application Research of Improved U-shaped Network in Detection of Retinopathy
    YANG Zhiqiao, ZHANG Ying, WANG Xinjie, ZHANG Dongbo, WANG Yu
    Journal of Frontiers of Computer Science and Technology    2022, 16 (8): 1877-1884.   DOI: 10.3778/j.issn.1673-9418.2012011

    Analysis of fundus retinal blood vessels and detection of exudates and bleeding points are important methods for judging the degree of diabetic retinopathy. To address the poor segmentation of bifurcations and end points of microvessels, unclear exudate boundaries, and the difficulty of segmenting small, scattered bleeding points, an improved U-shaped network is proposed that extracts richer high-level features by improving the context extraction coding module. In the feature encoding stage, a hybrid attention mechanism (HAM) is added to highlight the features of microvessels and lesions and to reduce the impact of background and noise. Experimental results show that the segmentation accuracy, sensitivity, specificity and AUC of the proposed algorithm on the fundus retinal blood vessel segmentation dataset DRIVE are better than those of U-Net, CE-Net and other existing methods; the sensitivity is 0.0146 higher than that of CE-Net. On the diabetic retinopathy lesion segmentation dataset DIARETDB1, the segmentation of exudates and bleeding points is better than that of U-Net, CE-Net and other existing methods, so the method can effectively assist doctors in diagnosis.

    Medical Image Classification Algorithm Based on Weight Initialization-Sliding Window CNN
    AN Fengping, LI Xiaowei, CAO Xiang
    Journal of Frontiers of Computer Science and Technology    2022, 16 (8): 1885-1897.   DOI: 10.3778/j.issn.1673-9418.2011091

    Deep learning faces two problems in medical image classification: first, a deep learning model hierarchy cannot be constructed for the properties of medical images; second, the initial network weights of the deep learning model have not been optimized. To this end, this paper starts from the perspective of network optimization and improves the nonlinear modeling ability of the network through optimization methods. It then proposes a new network weight initialization method, which alleviates the problem that existing deep learning initialization theory is limited by the type of nonlinear unit, and increases the potential of neural networks to handle different visual tasks. To make full use of the characteristics of medical images, the paper also studies the multi-column convolutional neural network framework in depth and finds that, by changing the number of features and the convolution kernel size at different levels, different convolutional neural network models can be constructed that better adapt to the characteristics of the medical images to be processed; the resulting heterogeneous multi-column convolutional neural networks are then trained. Finally, the proposed method is used to complete the medical image classification task. Based on these ideas, a medical image classification algorithm based on weight initialization-sliding window fusion of multi-layer convolutional neural networks is proposed. The method is applied to breast mass classification, brain tumor tissue classification and medical image database classification. Experimental results show that the proposed method not only has higher average accuracy than traditional machine learning and other deep learning methods, but also has better stability and robustness.

    Target Tracking System Constructed by ELM-AE and Transfer Representation Learning
    YANG Zheng, DENG Zhaohong, LUO Xiaoqing, GU Xin, WANG Shitong
    Journal of Frontiers of Computer Science and Technology    2022, 16 (7): 1633-1648.   DOI: 10.3778/j.issn.1673-9418.2012028

    In the target tracking algorithm, the feature model’s ability to quickly learn image features and the ability to adapt to changes in target features during tracking has always been one of the main research directions of target tracking algorithms. Especially for discriminative target trackers based on image block learning, these two points have become decisive factors affecting the efficiency and robustness of the tracker. However, the performance of most existing similar algorithms on these two abilities cannot achieve satisfactory results. To solve this problem, an efficient and robust feature model is proposed. The feature model first uses extreme learning machine autoencoder (ELM-AE) to quickly perform random feature mapping on complex image features of the target and background image blocks, and then uses the transfer learning ability of transfer representation learning (TRL) to improve the adaptability of random feature space. The feature model is named transfer representation learning with ELM-AE (TRL-ELM-AE). Compared with original complex image features, this model can provide the classifier with more compact and expressive shared features, so that the classifier can learn and classify more quickly and efficiently. In addition, in the target tracking process, the target and background usually change continuously over time. Although the feature migration capability of TRL can already adapt to this, in order to further improve the robustness of the tracker, a strategy of dynamically updating training samples is adopted. Through a large number of experimental and analysis results on the 11 target tracking challenge scenarios proposed by OTB, it is proven that the proposed target tracker has significant advantages over the existing target tracker.

    Multi-scale Selection Pyramid Networks for Small-Sample Target Detection Algorithms
    PENG Hao, LI Xiaoming
    Journal of Frontiers of Computer Science and Technology    2022, 16 (7): 1649-1660.   DOI: 10.3778/j.issn.1673-9418.2109081

    Target detection aims to detect specified targets in an image. The technology is widely used in autonomous driving, face recognition and other fields, and has become a major research hotspot in computer vision at home and abroad. Traditional target detection often requires large annotated datasets, so detecting targets with only a small number of annotated samples is a challenge. To address this problem, this paper proposes a multi-scale selection pyramid network algorithm for small-sample target detection, so that detection no longer relies on large-scale labeled datasets. Firstly, a multi-scale selection pyramid network is designed, consisting of three components: a context layer attention module, a feature scale enhancement module, and a feature scale selection module. Secondly, the RoI features generated by the RPN network are processed with maximum pooling and average pooling and then fused, to improve the correlation between features. Feature subtraction is used to highlight the category information in the features, which improves sensitivity to new class parameters while maintaining the stability of the model with respect to the sample parameters. Finally, an orthogonal mapping loss function constrains the features before the classification layer, so that similarity between features can be measured well even with a small number of samples.

    High Frame Rate Light-Weight Siamese Network Target Tracking
    LI Yunhuan, WEN Jiwei, PENG Li
    Journal of Frontiers of Computer Science and Technology    2022, 16 (6): 1405-1416.   DOI: 10.3778/j.issn.1673-9418.2012016

    With the widespread use of target tracking in everyday scenarios, the demand for high-precision, high-speed tracking algorithms is increasing. In specific scenarios such as mobile terminals and embedded devices, where computing power is relatively limited, the tracker must still achieve good tracking accuracy and high-speed real-time tracking. To solve this problem, a high frame rate tracking algorithm based on a light-weight siamese network is proposed. Firstly, the light-weight convolutional neural network MobileNetV1, which is easy to deploy on embedded devices, is selected as the feature extraction backbone, as a deeper network is more capable of extracting target features. Then, two optimization strategies address the shortcomings of the backbone network: the feature map is cropped and the total network stride is adjusted, making the backbone suitable for tracking tasks. Finally, an ultra-lightweight channel attention module is added after the template branch of the siamese network to weight the important information that highlights target characteristics. The number of parameters of the proposed algorithm is 59.8% lower than that of the current mainstream algorithm SiamFC. Simulation and experimental results on the OTB2015 dataset show that the tracking accuracy is increased by 5.4%, and that the algorithm copes better with the complex and changeable challenges of tracking tasks. Results on the VOT2018 dataset show that the comprehensive index expected average overlap (EAO) is increased by 26.6%, and the average speed of the algorithm on an NVIDIA GTX1080Ti is 120 frames/s, which achieves high frame rate real-time tracking.

    Object Tracking Algorithm with Fusion of Multi-feature and Channel Awareness
    ZHAO Yunji, FAN Cunliang, ZHANG Xinliang
    Journal of Frontiers of Computer Science and Technology    2022, 16 (6): 1417-1428.   DOI: 10.3778/j.issn.1673-9418.2011057

    To solve the problem of drift or overfitting when tracking targets described by deep features, an object tracking algorithm combining multiple features and channel awareness is proposed. The deep features of the tracking target are extracted by a pre-trained model, a correlation filter is built from these features, and the weight coefficient of each channel filter is calculated. The feature channels generated by the pre-trained model are screened according to the weight coefficients. The standard deviation of the retained features is calculated to generate statistical features, which are fused with the original features. The fused features are used to construct correlation filters, and correlation operations yield feature response maps that determine the location and scale of the target. Based on the deep features of the tracking result area, the filter constructed from the fused features is sparsely updated online. The proposed algorithm and several current mainstream tracking algorithms are tested on the public datasets OTB100, VOT2015 and VOT2016. Compared with UDT, the proposed algorithm is more robust and more accurate without affecting tracking speed. The experimental results show strong robustness under the challenges of target scale variation, fast motion and background clutter.

    Multi-discriminator Generative Adversarial Networks Based on Selective Ensemble Learning
    SHEN Ruicai, ZHAI Junhai, HOU Yingzhen
    Journal of Frontiers of Computer Science and Technology    2022, 16 (6): 1429-1438.   DOI: 10.3778/j.issn.1673-9418.2011010

    Generative adversarial networks (GAN) are widely used in image generation. However, there is still a large gap between the samples generated by unsupervised and supervised networks. To address the poor diversity, low quality and long training time of GAN in the unsupervised setting, a new model with selective ensemble learning is proposed. Specifically, the discriminator in GAN takes the form of an ensemble discrimination system, which effectively reduces the discrimination error caused by the poor performance of a single discriminator. If the ensemble discriminant networks all shared one network structure, each base discriminant network would tend toward the same form of expression during training. To encourage diversity in the discriminant results and keep the networks from collapsing into the same one, discriminant networks with different structures are used. A majority voting strategy that dynamically adjusts the voting weight of each base discriminant network is introduced to combine the results of the ensemble. This is shown to promote model convergence and significantly reduce experimental error. Finally, the proposed model and comparable models are evaluated with different evaluation indices on different datasets. Experimental results show that the proposed model is superior to several competitive models in terms of the diversity and quality of generated samples and the convergence speed of the model.
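
    The voting step described in this abstract can be pictured with a small sketch. The abstract does not give the weight-adjustment rule, so the code below swaps in the classic weighted-majority scheme (shrink the weight of each voter that was wrong, then renormalise); the names `weighted_vote` and `update_weights` and the `penalty` factor are hypothetical and are not taken from the paper.

```python
def weighted_vote(votes, weights):
    """Combine binary votes (1 = "real", 0 = "generated") by weighted majority.

    Returns 1 if the weighted share of 1-votes is at least one half, else 0.
    """
    total = sum(weights)
    real_share = sum(w for v, w in zip(votes, weights) if v == 1) / total
    return 1 if real_share >= 0.5 else 0


def update_weights(votes, weights, target, penalty=0.5):
    """Assumed update rule: multiply the weight of every discriminator whose
    vote differs from the known label `target` by `penalty`, then renormalise
    so the weights sum to 1."""
    scaled = [w * (penalty if v != target else 1.0) for v, w in zip(votes, weights)]
    total = sum(scaled)
    return [w / total for w in scaled]


if __name__ == "__main__":
    votes = [1, 0, 0]            # three base discriminators
    weights = [0.6, 0.2, 0.2]    # current voting weights
    print(weighted_vote(votes, weights))
    print(update_weights(votes, weights, target=1))
```

    With these toy numbers the single high-weight voter outweighs the two others, and after one update its share of the total weight grows further.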

    Image Segmentation Algorithm Combining Visual Salient Regions and Active Contour
    HE Yaru, GE Hongwei
    Journal of Frontiers of Computer Science and Technology    2022, 16 (5): 1155-1168.   DOI: 10.3778/j.issn.1673-9418.2011043

    When the traditional region-based active contour model is used to segment weak-edge images, the evolution curve is subject to background interference and easily falls into local extrema, which slows evolution. Moreover, because the local term only considers spatial information, it cannot retain the target boundary well, which affects segmentation accuracy. To solve these problems, this paper first uses an improved saliency detection algorithm to preprocess the original image, obtain the target candidate regions and automatically set the initial contour curve. The prior information obtained about the target is then combined with the bitmap with the maximum contrast in the image to be segmented. An adaptive sign function is designed to weight the optimized LoG (Laplacian of Gaussian) energy term, which is linearly incorporated into the RSF (region-scalable fitting) model to improve its adaptive ability. Secondly, a new local grayscale measure is proposed and combined with a local kernel function to improve the local energy term, which increases the sensitivity of the model at weak edges and locates the target boundary accurately. Experimental results show that the model can automatically set the initial contour and effectively retain target edge details. Visual and quantitative results show that it is superior to some mainstream active contour models.

    Deep Convolutional Neural Network Algorithm Fusing Global and Local Features
    CHENG Weiyue, ZHANG Xueqin, LIN Kezheng, LI Ao
    Journal of Frontiers of Computer Science and Technology    2022, 16 (5): 1146-1154.   DOI: 10.3778/j.issn.1673-9418.2104106

    To further improve the accuracy of facial expression recognition, a deep convolutional neural network algorithm fusing global and local features (GL-DCNN) is proposed. The algorithm consists of two improved convolutional neural network branches, a global branch and a local branch, which extract global and local features respectively. The features of the two branches are weighted and fused, and the fused features are used for classification. Firstly, global features are extracted: the global branch is based on transfer learning and uses an improved VGG19 network model for feature extraction. Secondly, local features are extracted: in the local branch, the center-symmetric local binary pattern (CSLBP) algorithm performs a first feature extraction to obtain the local texture information of the original image, which is then fed into a shallow convolutional neural network for a second feature extraction, so that local features related to facial expressions are extracted automatically. Thirdly, two cascaded fully connected layers reduce the dimension of the features of the two branches, and different weights are assigned to them for weighted fusion. Finally, a softmax classifier is used for classification. The method is validated on the CK+ and JAFFE datasets, where the classification accuracy is over 95% and 93%, respectively. Compared with five other algorithms, this algorithm has good overall performance, recognition effect and robustness, and can provide an effective basis for facial expression recognition.
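
    For readers unfamiliar with the CSLBP texture descriptor used in the local branch, a minimal sketch follows. It uses the common 8-neighbour, radius-1 formulation, in which the four pairs of opposite neighbours are compared and each comparison contributes one bit; the neighbour ordering, the `threshold` default and the function name `cslbp` are assumptions for illustration and may differ from the paper's settings.

```python
# Offsets (row, col) of the 8 neighbours, ordered so that entry i and
# entry i + 4 lie opposite each other across the centre pixel.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def cslbp(image, threshold=0):
    """Return CS-LBP codes (0..15) for the interior pixels of a 2D grey image.

    `image` is a list of equal-length rows; border pixels are skipped, so the
    result has two fewer rows and two fewer columns than the input.
    """
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            code = 0
            for i in range(4):
                dr1, dc1 = NEIGHBOURS[i]
                dr2, dc2 = NEIGHBOURS[i + 4]
                diff = image[r + dr1][c + dc1] - image[r + dr2][c + dc2]
                if diff > threshold:
                    code |= 1 << i  # bit i is set when pair i differs enough
            row_codes.append(code)
        codes.append(row_codes)
    return codes


if __name__ == "__main__":
    patch = [[10, 20, 30],
             [40, 50, 60],
             [70, 80, 90]]
    print(cslbp(patch))
```

    Because only four comparisons are made per pixel, the code takes 16 possible values, which keeps the resulting texture histogram compact.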

    Fully Convolutional Neural Network with Attention Module for Semantic Segmentation
    OU Yangliu, HE Xi, QU Shaojun
    Journal of Frontiers of Computer Science and Technology    2022, 16 (5): 1136-1145.   DOI: 10.3778/j.issn.1673-9418.2105095

    A fully convolutional neural network is a powerful end-to-end model that is widely used in semantic segmentation and has achieved great success, and researchers have proposed a series of methods based on it. However, with the continuous subsampling of convolution and pooling, image contextual information is lost, which affects pixel-level classification. To address the loss of context in a fully convolutional network, a pixel-based attention method is proposed. It calculates the relationship between pixels of the high-level feature map to obtain global information and enhance the correlation between pixels, and is combined with atrous spatial pyramid pooling to further extract image feature information. To address the loss of pixels in the high-level feature map, an attention method based on different levels of the image is proposed. This method uses the information in the high-level feature map as a guide to mine hidden information in the low-level feature map, and then fuses it with the high-level feature map so that both levels of information are fully used. In the experiments, the effectiveness of the proposed method is verified by comparing the effects of different modules on the segmentation results of a fully convolutional neural network. Experiments are also carried out on the widely recognized semantic segmentation dataset Cityscapes, with comparison against current advanced networks. The results show that the proposed method has advantages in both objective evaluation indicators and subjective effects, achieving 69.3% accuracy on the test set of the official Cityscapes website, 3 to 5 percentage points higher than several recent advanced networks.

    Micro-expression Recognition Convolutional Network for Dual-stream Temporal-Domain Information Interaction
    ZHU Weijie, CHEN Ying
    Journal of Frontiers of Computer Science and Technology    2022, 16 (4): 950-958.   DOI: 10.3778/j.issn.1673-9418.2011039

    Current mainstream deep learning methods for micro-expression recognition suffer from very scarce experimental data, which limits the knowledge that neural networks can acquire during learning and makes it difficult to improve accuracy. A dual-stream temporal-domain information interaction method for micro-expression recognition is proposed, and a dual scale temporal interactive convolution neural network (DSTICNN) is constructed to process micro-expression sequences and automatically recognize micro-expressions. The algorithm raises the final recognition rate by improving the deep mutual learning strategy so that it guides the networks to learn different temporal-domain information from the same image sequence. It builds DSTICNN32 and DSTICNN64 on different temporal scales and improves the loss function of deep mutual learning in the training phase. In addition, a mean square error loss is added on the feature maps of the two streams close to the decision-making layer, so that cross-entropy loss, JS divergence loss and mean square error loss jointly supervise training; the two streams thus learn from and strengthen each other, and each improves its ability to predict samples. The algorithm is tested on the CASME Ⅱ and SMIC databases. The results show that it effectively improves the recognition rate of micro-expressions, by 6.83 percentage points on CASME Ⅱ and 1.65 percentage points on SMIC, and that overall it outperforms existing algorithms.
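
    The joint supervision named in this abstract (cross-entropy, JS divergence and mean square error) can be sketched for a single sample as below. The abstract does not state how the three terms are weighted, so the coefficients `alpha` and `beta`, the function name `joint_loss` and the plain-list inputs are all assumptions for illustration rather than the authors' formulation.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def kl(p, q):
    """KL(p || q), with the usual convention that 0 * log(0 / q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mse(a, b):
    """Mean squared error between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(probs_self, probs_peer, feat_self, feat_peer, label,
               alpha=1.0, beta=1.0):
    """Loss for one stream: its own cross-entropy, plus a JS term pulling its
    prediction toward the peer stream, plus an MSE term on late features.
    alpha and beta are hypothetical weighting coefficients."""
    return (cross_entropy(probs_self, label)
            + alpha * js_divergence(probs_self, probs_peer)
            + beta * mse(feat_self, feat_peer))


if __name__ == "__main__":
    print(joint_loss([0.7, 0.3], [0.6, 0.4], [1.0, 2.0], [1.5, 2.0], label=0))
```

    When the two streams agree exactly, the JS and MSE terms vanish and only the cross-entropy remains, which is the sense in which the extra terms act as mutual regularisers.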

    Fine-Grained Image Classification Model Based on Bilinear Aggregate Residual Attention
    LI Kuankuan, LIU Libo
    Journal of Frontiers of Computer Science and Technology    2022, 16 (4): 938-949.   DOI: 10.3778/j.issn.1673-9418.2010031

    In fine-grained image classification, the differences in local information between categories are subtle, which often leaves the model with insufficient ability to capture discriminative features and with poor interdependence between channels during feature extraction. As a result, the network cannot learn salient and diverse category features, which ultimately harms classification performance. This paper therefore proposes a bilinear aggregate residual attention network (BARAN). Firstly, to improve the feature capture ability of the network, the feature extraction sub-network of the original bilinear convolutional neural network model (B-CNN) is replaced with an aggregate residual network that has stronger learning ability. Then, a distraction module is embedded in each aggregate residual block, so that the network focuses on integrating cross-dimensional features and strengthens the association between channels during feature acquisition. Finally, the fused bilinear feature map is fed into the cross-channel attention module, whose discriminative and distinctive sub-components further learn subtle, diverse and mutually exclusive local information that is easily confused between classes. Experimental results show classification accuracies of 87.9%, 92.9% and 94.7% on the fine-grained image datasets CUB-200-2011, FGVC-Aircraft and Stanford Cars, respectively, which is superior to the main mainstream methods. The improvement over the original B-CNN model is 0.038, 0.088 and 0.034, respectively.
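
    The bilinear feature that B-CNN-style models build on can be sketched as follows: the outer product of two feature vectors is accumulated over all spatial locations, then passed through a signed square root and L2 normalisation. This is the generic B-CNN pooling step as commonly described, written with plain lists; the function name `bilinear_pool` is hypothetical, and the sketch does not include the aggregate residual or attention modules proposed in the paper.

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Bilinear pooling of two feature streams.

    feats_a and feats_b are lists with one feature vector per spatial
    location (same number of locations in both). Returns a vector of
    length len(feats_a[0]) * len(feats_b[0]).
    """
    ca, cb = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (ca * cb)
    # Sum the outer product fa * fb^T over all locations.
    for fa, fb in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += fa[i] * fb[j]
    # Signed square root, then L2 normalisation.
    pooled = [math.copysign(math.sqrt(abs(x)), x) for x in pooled]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled


if __name__ == "__main__":
    stream_a = [[1.0, 2.0], [0.5, 0.0]]   # two locations, two channels
    stream_b = [[3.0, 4.0], [1.0, 1.0]]
    print(bilinear_pool(stream_a, stream_b))
```

    The output dimension is the product of the two channel counts, which is why bilinear features capture pairwise channel interactions at the cost of a large descriptor.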

    Deep Small Object Detection Algorithm Integrating Attention Mechanism
    ZHAO Pengfei, XIE Linbo, PENG Li
    Journal of Frontiers of Computer Science and Technology    2022, 16 (4): 927-937.   DOI: 10.3778/j.issn.1673-9418.2108087

    Insufficient feature extraction of the backbone network and lack of semantic information in the shallow convolution layer often lead to poor detection results on small objects. In order to improve the accuracy and robustness of small object detection, this paper proposes a deep small object detection algorithm that integrates attention mechanism. Firstly, to address the problem of insufficient feature extraction capability of the backbone network, Darknet-53 is selected as the network of feature extraction, and a new grouped residual connection is proposed to replace the residual connection structure in the original Darknet-53. This forms a new enhanced backbone network named I-Darknet53. This grouped residual structure can effectively increase the size of the receptive field by interweaving the feature information of different channels. Secondly, in the multi-scale detection phase, a shallow feature enhancement network is proposed to obtain shallow enhanced features by fusing the shallow layer and deep layer. The network including feature enhancement module and an efficient feature fusion strategy guided by channel attention mechanism is used to improve the lack of semantic information of shallow features. Experimental results show that the proposed algorithm has better performance than the SSD algorithm on PASCAL VOC dataset. When the input image size is 300 × 300, the average accuracy of the proposed model is 80.2%; when the input image size is 500 × 500, the average accuracy of the proposed model is 82.3%. In addition, it can effectively improve the detection accuracy of small objects under the premise of ensuring the detection speed.

    Gradient-Guided Object Tracking Algorithm with Channel Selection
    CHENG Shilong, XIE Linbo, PENG Li
    Journal of Frontiers of Computer Science and Technology    2022, 16 (3): 649-660.   DOI: 10.3778/j.issn.1673-9418.2010029

    In object tracking, the target to be tracked is arbitrary and similar distractors may appear around it, so the features extracted by a pre-trained network are often not fully suited to the tracked target. To address this, a gradient-guided object tracking algorithm is proposed within the Siamese tracking framework. Firstly, the pre-trained network extracts the features of the object; to eliminate interference from similar objects, a switch-penalty loss function penalizes similar objects in the background. Secondly, in the feature channel selection stage, the most expressive feature channels are selected according to the gradient information back-propagated from the loss function. Finally, in the cross-correlation between the template branch and the search branch, the accurate target position is obtained from the weighted response score map of multi-channel cross-correlation. The proposed algorithm is compared with mainstream algorithms on the OTB and VOT public datasets. Experimental results show that it has good robustness and resistance to background interference, and that it matches the performance of mainstream tracking algorithms on the main tracking indicators.

    Attention and Texture Feature Enhancement for Person Re-identification
    LI Jie
    Journal of Frontiers of Computer Science and Technology    2022, 16 (3): 661-668.   DOI: 10.3778/j.issn.1673-9418.2010046

    To address the low accuracy of existing person re-identification methods under low image resolution, illumination differences, and diverse postures and viewpoints, this paper proposes a multi-task pedestrian recognition algorithm based on spatial attention and texture feature enhancement. The spatial attention module pays more attention to image areas potentially related to pedestrian attributes, and thereby further explores attribute features. The texture feature enhancement module reduces the interference of illumination and occlusion by fusing the global and local features corresponding to different spatial levels. Finally, a multi-stage weighted loss function integrates the attribute features and pedestrian features to avoid the drop in mean average precision caused by attribute heterogeneity. Experimental results show that the mean average precision reaches 81.1% and 70.1% on the Market-1501 and DukeMTMC-reID datasets, respectively.

    Research on Edge-Guided Image Repair Algorithm
    JIANG Yi, XU Jiajie, LIU Xu, ZHU Junwu
    Journal of Frontiers of Computer Science and Technology    2022, 16 (3): 669-682.   DOI: 10.3778/j.issn.1673-9418.2009091

    The continuous development of deep learning has provided new ideas for image repair research in recent years, as image repair methods can learn the semantic information of images from massive image data. Although existing methods can generate desirable results, they handle details poorly when the missing part of the image is complex: the restored results are excessively smooth or blurry, and complex structural information missing from the image cannot be repaired well. To solve these issues, this paper proposes an edge-guided image repair method based on generative adversarial networks, together with the corresponding algorithm, which divides the repair process into two stages. First, an edge repair model is trained to generate more realistic edge information for the missing area. Then, a content generation model is trained to fill in the missing content based on the repaired edge information. Lastly, experiments are conducted on the CelebA and ParisStreet-View datasets, comparing with the Shift-Net model, the deep image prior (DIP) model and the field factorization machine (FFM) model, with both visual qualitative analysis and quantitative index analysis of the repair results. The experimental results show that the proposed method is superior to the existing methods at repairing missing complex structural information, and that edge information plays a crucial role in image repair.

    Application of Improved U-Net in Retinal Vessel Segmentation
    GU Penghui, XIAO Zhiyong
    Journal of Frontiers of Computer Science and Technology    2022, 16 (3): 683-691.   DOI: 10.3778/j.issn.1673-9418.2010061

    To address two problems in retinal vessel segmentation of fundus images, namely the difficulty of accurately identifying vessel boundaries and the low contrast between blood vessels and the background, an encoder-decoder algorithm is proposed. To improve segmentation at vessel boundaries, the global convolutional network (GCN) and boundary refinement (BR) modules replace the traditional convolution layers in the encoder, and improved position attention (PA) and channel attention (CA) modules are introduced in the skip connections. The aim is to increase the contrast between the blood vessels and the background so that the network can better separate them. In addition, to improve network performance, a dense convolutional network is used in the last layer of the encoder to mitigate overfitting, and a convolutional long short-term memory network is used in each layer of the decoder to alleviate, to a certain extent, exploding and vanishing gradients and to improve the network's ability to obtain feature information. The algorithm is tested on the public datasets DRIVE and CHASE_DB1 with sensitivity, specificity, accuracy, F1-score and AUC as evaluation indicators; the accuracy and AUC reach 96.99% and 98.77%, and 97.51% and 99.01%, on the two datasets, respectively. The algorithm can effectively improve the accuracy of blood vessel segmentation in fundus images.
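
The evaluation indicators named in this abstract can be illustrated with a minimal sketch. It assumes binary vessel masks and uses the standard confusion-matrix definitions; the function name `segmentation_metrics` is hypothetical and is not taken from the paper. AUC is omitted because it needs continuous scores rather than a thresholded mask.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for binary masks (1 = vessel, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = int(np.sum(pred & truth))      # vessel pixels found
    tn = int(np.sum(~pred & ~truth))    # background pixels kept as background
    fp = int(np.sum(pred & ~truth))     # background marked as vessel
    fn = int(np.sum(~pred & truth))     # vessel pixels missed
    total = tp + tn + fp + fn

    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }

if __name__ == "__main__":
    print(segmentation_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Because vessels occupy only a small fraction of a fundus image, accuracy alone can look high even when many vessel pixels are missed, which is one reason sensitivity and F1-score are reported alongside it.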

    Micro-cracks Detection of Solar Cells Based on Few Shot Samples with Multi-loss
    NA Zhixiong, FAN Tao, SUN Tao, XIE Xiangying, LAI Guangzhi
    Journal of Frontiers of Computer Science and Technology    2022, 16 (2): 458-467.   DOI: 10.3778/j.issn.1673-9418.2111036

    To address micro-crack detection of photovoltaic modules on industrial production lines, with the goals of reducing labor cost, improving detection efficiency and quickly adapting to new products with only a small number of samples, a few-shot micro-crack detection algorithm for solar cells based on multiple losses is proposed. Firstly, to enrich the semantic information extracted by the convolutional neural network, the Transformer's multi-head attention mechanism is introduced to alleviate the impact of distribution differences between product batches on crack detection and to encourage the model to focus on crack information across diverse products. Secondly, multiple loss functions are combined to constrain model training and optimize feature extraction. On top of the direct classification loss, a triplet loss is used to shorten the feature distance between cracked samples. In addition, an implicit classification loss is designed to fit the type differences between cells with and without cracks and to fully learn the diversity of historical component data. The algorithm can use a small number of samples to quickly extract the features of new components and accurately detect micro-crack defects in new products. Experimental results on real industrial production datasets show that the recall of the algorithm is up to 10 percentage points higher than that of the baseline models. The algorithm effectively alleviates the scarcity of samples with hidden cracks and greatly reduces the cost of frequent data labeling and model training for each batch of new products.
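
The triplet loss mentioned in this abstract can be sketched as follows. This is the standard margin-based formulation with Euclidean distance, given as an assumption; the abstract does not state the distance metric, the margin, or how triplets are sampled, and the function name is hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean margin-based triplet loss over a batch of feature vectors.

    anchor and positive share a class (e.g. both cracked cells);
    negative comes from the other class. The margin value is an assumption.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)

    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # same-class distance
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # cross-class distance
    # The loss is zero once the negative is farther than the positive by `margin`.
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

if __name__ == "__main__":
    print(triplet_loss([[0.0, 0.0]], [[0.0, 1.0]], [[3.0, 4.0]]))
```

Minimizing this term pulls features of cracked samples together while pushing crack-free samples away, which matches the stated purpose of shortening the feature distance between cracked samples.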

    Hyperspectral Change Detection Using Collaborative Sparsity and Nonlocal Low-Rank Tensor
    ZHAN Tianming, SONG Bo, SUN Le, WAN Minghua, YANG Guowei
    Journal of Frontiers of Computer Science and Technology    2022, 16 (2): 448-457.   DOI: 10.3778/j.issn.1673-9418.2009009

    Hyperspectral image change detection can provide timely information about changes on the surface of the earth, which is essential for urban and rural planning and management. Owing to their higher spectral resolution, hyperspectral images are often used to detect finer changes. To address change detection with hyperspectral images, a method based on collaborative sparsity and nonlocal low-rank tensors is proposed. The method first computes the hyperspectral difference image between time points, and then extracts nonlocal similar block tensor clusters according to the nonlocal distribution characteristics of the image blocks in the difference image. Next, based on collaborative sparse regularization and low-rank regularization, a change detection model using collaborative sparsity and nonlocal low-rank tensors is established, and the representation coefficients are obtained by solving the model with the alternating direction method of multipliers. Finally, the projection residuals of each tensor with respect to the different categories are computed from the representation coefficients, and the minimum projection residual criterion is used to judge whether the tensor has changed. Experiments on the Farmland and Urban Area in San Francisco City datasets demonstrate that the proposed method achieves considerably higher change detection accuracy.

    SSD Object Detection Algorithm with Effective Fusion of Attention and Multi-scale
    WANG Yanni, YU Lixian
    Journal of Frontiers of Computer Science and Technology    2022, 16 (2): 438-447.   DOI: 10.3778/j.issn.1673-9418.2105048

    To solve the problems of weak effective information in feature maps and a high miss rate for difficult objects in the traditional single shot multibox detector (SSD) for multi-scale object detection, an improved SSD object detection algorithm is proposed. Firstly, a lightweight attention mechanism is introduced at the output of the network feature maps. By avoiding dimensionality reduction, using local cross-channel interaction and selecting the kernel size adaptively, it effectively highlights the key information in the feature map while keeping the amount of network computation essentially unchanged. This module helps to enhance the difference between background information and object information, and improves network performance without increasing network complexity. Then, a new feature fusion module is designed to effectively fuse features of different scales, so that the shallow feature layers not only contain rich detail but also make full use of contextual semantic information. The multi-scale fusion module helps to enrich the feature map information and improve the detection performance of the network for difficult objects. Experimental results show that the improved network reaches a detection accuracy of 79.6% on the PASCAL VOC2007 test set, 2.4 percentage points higher than the original SSD algorithm, and improves by 4.7 percentage points on the occlusion target dataset. This demonstrates that the improved method has a certain degree of timeliness and robustness.
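
The three properties listed for the attention module (no dimensionality reduction, local cross-channel interaction, adaptive kernel size) can be illustrated with a minimal NumPy sketch. The abstract does not name the module; the code below assumes an ECA-style design, with the kernel-size rule and its defaults `gamma=2`, `b=1` taken from that design rather than from this paper. The function names and the fixed 1D weights are hypothetical, since in a real network the weights are learned.

```python
import math
import numpy as np

def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd 1D kernel size that grows with the channel count (assumed rule)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def channel_attention(feature_map, weights):
    """Re-weight channels of a (C, H, W) map using a 1D conv across channels."""
    feature_map = np.asarray(feature_map, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k = len(weights)                       # expected to be odd

    pooled = feature_map.mean(axis=(1, 2))           # global average pool -> (C,)
    padded = np.pad(pooled, k // 2)                  # zero padding keeps length C
    mixed = np.correlate(padded, weights, mode="valid")  # local cross-channel mix
    gate = 1.0 / (1.0 + np.exp(-mixed))              # sigmoid gate per channel
    return feature_map * gate[:, None, None]

if __name__ == "__main__":
    x = np.random.rand(64, 8, 8)
    k = adaptive_kernel_size(64)
    print(k, channel_attention(x, np.full(k, 1.0 / k)).shape)
```

Each channel weight depends only on its `k` nearest neighbours, so the module adds just `k` parameters, which is consistent with the claim that the amount of computation stays essentially the same.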

    Improved YOLOv5 Traffic Light Real-Time Detection Robust Algorithm
    QIAN Wu, WANG Guozhong, LI Guoping
    Journal of Frontiers of Computer Science and Technology    2022, 16 (1): 231-241.   DOI: 10.3778/j.issn.1673-9418.2105033

    Traffic light detection, a critical step in realizing automatic driving, is directly related to the driving safety of intelligent vehicles. However, the small size of traffic lights and the complexity of the environment make the problem difficult. This paper puts forward a traffic light detection algorithm based on an optimized YOLOv5. Firstly, a visible label ratio is used to determine the model input. Secondly, the ACBlock structure is introduced to strengthen the feature extraction ability of the backbone network, SoftPool is used to reduce the sampling loss of the backbone network, and the DSConv convolution kernel is used to reduce the number of model parameters. Finally, a memory feature fusion network is designed to efficiently utilize high-level semantic information and low-level features. The improvements to the model input and backbone network directly improve the feature extraction ability of the model in complex environments, while the improved feature fusion network enables the model to make full use of feature information and increases the accuracy of target positioning and boundary regression. Experimental results show that the proposed algorithm achieves 74.3% AP and a detection speed of 111 frame/s on BDD100K, an AP 11.0 percentage points higher than that of YOLOv5. On the Bosch dataset, it achieves 84.4% AP and 126 frame/s, an AP 9.3 percentage points higher than that of YOLOv5. Robustness tests show that the proposed algorithm significantly improves the detection of targets in a variety of complex environments, achieving high-precision real-time detection with increased robustness.
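
To show why SoftPool loses less information than max pooling, here is a minimal NumPy sketch. It follows the commonly published SoftPool definition (a softmax-weighted sum over each pooling window), which is an assumption; the abstract gives no details of how the operator is configured in this network, and the function name is hypothetical.

```python
import numpy as np

def softpool2d(x, k=2):
    """Softmax-weighted pooling over non-overlapping k x k windows of a 2D map."""
    x = np.asarray(x, dtype=float)
    h2, w2 = x.shape[0] // k, x.shape[1] // k
    # Gather each k x k window into the last axis: (h2, w2, k*k).
    windows = (x[:h2 * k, :w2 * k]
               .reshape(h2, k, w2, k)
               .transpose(0, 2, 1, 3)
               .reshape(h2, w2, k * k))
    # Subtracting the window maximum stabilises exp() without changing the weights.
    e = np.exp(windows - windows.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return (weights * windows).sum(axis=-1)

if __name__ == "__main__":
    print(softpool2d(np.arange(16, dtype=float).reshape(4, 4)))
```

Every activation in a window contributes in proportion to its softmax weight, so the output lies between the window mean and the window maximum instead of discarding all but one value.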

    Combining Cascaded Network and Adversarial Network for Object Detection
    LI Zhixin, CHEN Shengjia, ZHOU Tao, MA Huifang
    Journal of Frontiers of Computer Science and Technology    2022, 16 (1): 217-230.   DOI: 10.3778/j.issn.1673-9418.2007059

    Recognizing multi-scale objects and occluded objects is a key difficulty in object detection. To detect objects of different sizes, object detectors usually use a hierarchy of multi-scale feature maps constructed by a convolutional neural network (CNN). However, because of the small convolution layers of the bottom feature maps, the top-down structure lacks the detailed information needed to capture the features of small objects, which limits the performance of these detectors. Therefore, based on the Faster R-CNN (region-convolutional neural network) framework, this paper proposes Collaborative R-CNN. A cascaded network structure is designed that integrates multi-scale feature maps to generate deeply fused feature information, thereby improving the ability to detect small objects. Moreover, the quantization in the RoIPooling process greatly limits the ability to recognize small objects. To further improve the robustness of the method, a multi-scale RoIAlign is designed to eliminate this quantization, and multi-scale pooling improves the network's ability to detect objects of different scales. Finally, an adversarial network is combined with the proposed network to generate training samples with occlusions, which significantly improves the classification ability of the model and its robustness in detecting occluded objects. Experimental results on the PASCAL VOC 2012 and PASCAL VOC 2007 datasets demonstrate the superiority of the proposed approach over several state-of-the-art approaches.
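
The quantization issue that RoIAlign removes can be shown with a small sketch. It contrasts reading a feature map at a rounded-down coordinate, as RoIPooling effectively does, with bilinear interpolation at the exact fractional coordinate, as RoIAlign does. This is the generic single-scale idea only; the multi-scale RoIAlign of the paper is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def quantized_sample(feat, y, x):
    """RoIPooling-style read: fractional coordinates are floored to a grid cell."""
    return float(feat[int(np.floor(y)), int(np.floor(x))])

def bilinear_sample(feat, y, x):
    """RoIAlign-style read: interpolate between the four surrounding cells."""
    h, w = feat.shape
    y = min(max(float(y), 0.0), h - 1.0)   # clamp to the valid range
    x = min(max(float(x), 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return float(feat[y0, x0] * (1 - dy) * (1 - dx)
                 + feat[y0, x1] * (1 - dy) * dx
                 + feat[y1, x0] * dy * (1 - dx)
                 + feat[y1, x1] * dy * dx)

if __name__ == "__main__":
    feat = np.arange(16, dtype=float).reshape(4, 4)
    print(quantized_sample(feat, 1.5, 2.5), bilinear_sample(feat, 1.5, 2.5))
```

For a small object that covers only a few feature-map cells, a half-cell rounding error is a large fraction of the region, which is why removing the quantization matters most for small objects.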
