
Table of Contents

    2024-04-01, Volume 18 Issue 4
    Frontiers·Surveys
    Review of Research on 3D Reconstruction of Dynamic Scenes
    SUN Shuifa, TANG Yongheng, WANG Ben, DONG Fangmin, LI Xiaolong, CAI Jiacheng, WU Yirong
    2024, 18(4):  831-860.  DOI: 10.3778/j.issn.1673-9418.2305016
    As static scene 3D reconstruction algorithms have matured, 3D reconstruction of dynamic scenes has become a hot and challenging research topic in recent years. Existing static scene 3D reconstruction algorithms achieve good results for stationary objects; however, when objects in the scene undergo deformation or relative motion, the reconstruction results are far from ideal, so research on 3D reconstruction of dynamic scenes is essential. This paper first introduces the related concepts and basic knowledge of 3D reconstruction, as well as the research classification and current status of static and dynamic scene 3D reconstruction. Then, the latest research progress on dynamic scene 3D reconstruction is comprehensively summarized, and the reconstruction algorithms are classified into dynamic 3D reconstruction based on RGB data sources and dynamic 3D reconstruction based on RGB-D data sources. Methods based on RGB data sources are further divided into template-based dynamic 3D reconstruction, dynamic 3D reconstruction based on non-rigid structure from motion, and learning-based dynamic 3D reconstruction. For RGB-D data sources, learning-based dynamic 3D reconstruction is mainly summarized, and typical methods are introduced and compared. The applications of dynamic scene 3D reconstruction in medicine, intelligent manufacturing, virtual and augmented reality, and transportation are also discussed. Finally, future research directions for dynamic scene 3D reconstruction are proposed, and an outlook on the research progress in this rapidly developing field is presented.
    Survey on Deep Learning in Oriented Object Detection in Remote Sensing Images
    LAN Xin, WU Song, FU Boyi, QIN Xiaolin
    2024, 18(4):  861-877.  DOI: 10.3778/j.issn.1673-9418.2308031
    Objects in remote sensing images are arbitrarily oriented and densely arranged, so inclined (oriented) bounding boxes can locate and separate them more precisely in object detection tasks. Oriented object detection in remote sensing images is now widely applied in both civil and military defense fields, is of great significance in research and application, and has gradually become a research hotspot. This paper provides a systematic summary of oriented object detection methods in remote sensing images. Firstly, three widely used representations of inclined bounding boxes are summarized. Then, the main challenges faced in supervised learning are elaborated from four aspects: feature misalignment, boundary discontinuity, inconsistency between metric and loss, and oriented object location. Next, according to the motivations and improvement strategies of different methods, the main ideas, advantages, and disadvantages of each algorithm are analyzed in detail, and the overall framework of oriented object detection in remote sensing images is summarized. Furthermore, the commonly used oriented object detection datasets in the remote sensing field are introduced, experimental results of classical methods on different datasets are given, and the performance of different methods is evaluated. Finally, in view of the challenges of applying deep learning to oriented object detection in remote sensing images, future research trends in this direction are discussed.
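    As a concrete illustration of the inclined-bounding-box representations mentioned above (not code from the surveyed works), the sketch below converts the common five-parameter form (cx, cy, w, h, θ) into the four-corner form; the function and parameter names are hypothetical.

```python
# Illustrative sketch: one common representation of an inclined bounding box is
# the five-parameter form (cx, cy, w, h, theta); converting it to the
# four-corner (eight-parameter) form makes the two representations concrete.
import math

def obb_to_corners(cx, cy, w, h, theta):
    """Convert (cx, cy, w, h, theta[rad]) to four (x, y) corner points."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Half-extents along the box's local axes.
    dx, dy = w / 2.0, h / 2.0
    local = [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]
    # Rotate each local corner by theta and translate to the box centre.
    return [(cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t) for x, y in local]

if __name__ == "__main__":
    print(obb_to_corners(50.0, 40.0, 20.0, 10.0, math.pi / 6))
```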
    Review of Research on Rolling Bearing Health Intelligent Monitoring and Fault Diagnosis Mechanism
    WANG Jing, XU Zhiwei, LIU Wenjing, WANG Yongsheng, LIU Limin
    2024, 18(4):  878-898.  DOI: 10.3778/j.issn.1673-9418.2307005
    As one of the most critical and failure-prone parts of the mechanical systems of industrial equipment, bearings are subjected to high loads for long periods of time; when they fail or wear irreversibly, they may cause accidents and even huge economic losses. Effective health monitoring and fault diagnosis are therefore of great significance for ensuring the safe and stable operation of industrial equipment. To further promote the development of bearing health monitoring and fault diagnosis technology, this paper analyzes and summarizes existing models and methods, and divides and compares the existing technologies. Starting from the distribution of the vibration signal data used, the relevant methods under uniform data distribution are first sorted out; the current research status is classified, analyzed, and summarized mainly according to signal-analysis-based and data-driven approaches, and the shortcomings of fault detection methods in this setting are outlined. Secondly, considering the problem of uneven data acquisition under actual working conditions, the detection methods for such cases are summarized, and the processing techniques for this problem in existing research are classified into data processing methods, feature extraction methods, and model improvement methods according to their different focuses, and the remaining problems are analyzed and summarized. Finally, the challenges and future development directions of bearing fault detection for industrial equipment are summarized and discussed.
    Deep Learning-Based Infrared and Visible Image Fusion: A Survey
    WANG Enlong, LI Jiawei, LEI Jia, ZHOU Shihua
    2024, 18(4):  899-915.  DOI: 10.3778/j.issn.1673-9418.2306061
    How to preserve the complementary information of multiple images so that one image can represent the scene is a challenging topic, and various image fusion methods have been proposed to address it. As an important branch of image fusion, infrared and visible image fusion (IVIF) has a wide range of applications in segmentation, target detection, and military reconnaissance. In recent years, deep learning has led the development of image fusion, and researchers have explored the IVIF field with it; experimental work has shown that applying deep learning to IVIF has significant advantages over traditional methods. This paper provides a detailed analysis of advanced deep learning based algorithms for IVIF. Firstly, the current research status is reported from the aspects of network architecture, method innovation, and limitations. Secondly, the commonly used datasets in IVIF methods are introduced and the definitions of commonly used evaluation metrics in quantitative experiments are given. Qualitative and quantitative evaluation experiments on fusion and segmentation, as well as fusion efficiency analysis experiments, are conducted on representative methods to comprehensively evaluate their performance. Finally, conclusions are drawn and possible future research directions in the field are discussed.
    Survey of 3D Model Recognition Based on Deep Learning
    ZHOU Yan, LI Wenjun, DANG Zhaolong, ZENG Fanzhi, YE Dewang
    2024, 18(4):  916-929.  DOI: 10.3778/j.issn.1673-9418.2309010
    With the rapid advancement of three-dimensional visual perception devices such as 3D scanners and LiDAR, 3D model recognition is attracting a growing number of researchers. The domain encompasses two core tasks: 3D model classification and retrieval. Since deep learning has already achieved significant success in two-dimensional visual tasks, its introduction into three-dimensional visual perception not only breaks free from the constraints of traditional methods but also makes notable strides in areas such as autonomous driving and intelligent robotics. However, the application of deep learning techniques to 3D model recognition still faces several challenges, which calls for a comprehensive review of the field. This review begins by discussing commonly used evaluation metrics and public datasets, providing relevant information and sources for each dataset. Subsequently, it delves into representative methods from various angles, including point clouds, views, voxels, and multimodal fusion, and summarizes recent research developments in the field. Through performance comparisons on these datasets, the strengths and limitations of each method are analyzed. Finally, based on the merits and demerits of these approaches, the review outlines the challenges currently faced by 3D model recognition and provides an outlook on future trends in this field.
    Theory·Algorithm
    Multi-strategy Improved Dung Beetle Optimizer and Its Application
    GUO Qin, ZHENG Qiaoxian
    2024, 18(4):  930-946.  DOI: 10.3778/j.issn.1673-9418.2308020
    Dung beetle optimizer (DBO) is an intelligent optimization algorithm proposed in recent years. Like other optimization algorithms, DBO suffers from drawbacks such as low convergence accuracy and a tendency to fall into local optima. A multi-strategy improved dung beetle optimizer (MIDBO) is therefore proposed. Firstly, the acceptance of local and global optimal solutions by brood balls and thieves is improved so that beetles adjust dynamically according to their own searching ability, which improves population quality while maintaining the strong searching ability of individuals with high fitness. Secondly, the follower position updating mechanism of the sparrow search algorithm is integrated to perturb the algorithm, and a greedy strategy is used to update positions, which improves convergence accuracy. Finally, when the algorithm stagnates, a Cauchy-Gaussian mutation strategy is introduced to improve its ability to jump out of local optima. Simulation experiments on 20 benchmark test functions and the CEC2019 test functions verify the effectiveness of the three improvement strategies. Convergence analysis of the optimization results of the improved algorithm and the comparison algorithms, together with the Wilcoxon rank-sum test, shows that MIDBO has good optimization performance and robustness. The validity and reliability of MIDBO in solving practical engineering problems are further verified by applying it to an automobile collision optimization problem.
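    The following minimal sketch illustrates a Cauchy-Gaussian mutation step with greedy acceptance of the kind described above; the weighting scheme and function names are assumptions for illustration, not the authors' exact MIDBO formulation.

```python
# Minimal sketch: perturb the current best solution with mixed Cauchy/Gaussian
# noise when the search stagnates, then keep the candidate only if it improves
# the fitness (greedy acceptance). Not the paper's exact update rule.
import numpy as np

def cauchy_gaussian_mutation(best_x, fitness, t, t_max, rng=None):
    """Return a (possibly) improved solution obtained by mutating best_x."""
    rng = np.random.default_rng() if rng is None else rng
    w = t / t_max                          # shift weight from Cauchy to Gaussian over time
    cauchy = rng.standard_cauchy(best_x.shape)
    gauss = rng.standard_normal(best_x.shape)
    candidate = best_x * (1.0 + (1.0 - w) * cauchy + w * gauss)
    # Greedy strategy: accept the mutated solution only if it is better.
    return candidate if fitness(candidate) < fitness(best_x) else best_x

if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))   # toy minimization objective
    x_best = np.array([0.5, -0.3, 0.8])
    print(cauchy_gaussian_mutation(x_best, sphere, t=10, t_max=100))
```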
    Approach to Multi-path Coverage Testing Based on Path Similarity Table and Individual Migration
    QIAN Zhongsheng, SUN Zhiwang, YU Qingyuan, QIN Langyue, JIANG Peng, WAN Zilong, WANG Yahui
    2024, 18(4):  947-962.  DOI: 10.3778/j.issn.1673-9418.2301018
    The application of genetic algorithms to multi-path coverage testing is a research hotspot. During the iteration between old and new populations, the old population may contain excellent individuals from other sub-populations that are not fully utilized, resulting in wasted resources. At the same time, the number of individuals in the population is usually much larger than the number of reachable paths, and each individual traverses one reachable path, so multiple individuals pass through the same path and the similarity between an individual and the target path is computed repeatedly. To address this, a multi-path coverage testing method combining a path similarity table with individual migration is proposed to improve testing efficiency. Storing computed path similarity values in the path similarity table avoids repeated calculation and reduces testing time. During evolution, an individual's path is compared with the other target paths, and if the similarity reaches a threshold, the excellent individual is migrated to the sub-population corresponding to that path, which improves the utilization of individuals and reduces the number of evolutionary generations. Experiments show that, compared with six classic methods on eight programs, the proposed method reduces the average generation time by up to 44.64% (at least 2.64%) and the average number of evolutionary generations by up to 35.08% (at least 6.13%), effectively improving testing efficiency.
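    A minimal sketch of the path-similarity-table idea described above is given below; the similarity measure and class names are hypothetical stand-ins, not the paper's implementation.

```python
# Illustrative sketch: cache the similarity between an executed path and each
# target path so that individuals traversing the same path do not trigger a
# repeated computation.
def path_similarity(path, target):
    """Length of the common prefix, a simple stand-in for a path similarity."""
    match = 0
    for a, b in zip(path, target):
        if a != b:
            break
        match += 1
    return match / max(len(target), 1)

class PathSimilarityTable:
    def __init__(self, target_paths):
        self.targets = {tid: tuple(p) for tid, p in target_paths.items()}
        self.cache = {}  # (executed path, target id) -> similarity

    def similarity(self, executed_path, target_id):
        key = (tuple(executed_path), target_id)
        if key not in self.cache:              # compute once, reuse afterwards
            self.cache[key] = path_similarity(key[0], self.targets[target_id])
        return self.cache[key]

if __name__ == "__main__":
    table = PathSimilarityTable({"t1": [1, 2, 3, 4], "t2": [1, 3, 5]})
    print(table.similarity([1, 2, 3], "t1"))   # computed
    print(table.similarity([1, 2, 3], "t1"))   # served from the table
```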
    Graphics·Image
    Pre-weighted Modulated Dense Graph Convolutional Networks for 3D Human Pose Estimation
    MA Jinlin, CUI Qilei, MA Ziping, YAN Qi, CAO Haojie, WU Jiangtao
    2024, 18(4):  963-977.  DOI: 10.3778/j.issn.1673-9418.2302065
    Graph convolutional networks (GCN) have increasingly become one of the main research hotspots in 3D human pose estimation, and modeling the relationships between human joints with GCN has achieved good performance. However, GCN-based 3D human pose estimation methods suffer from over-smoothing and fail to distinguish the importance of a joint from that of its adjacent joints. To address these issues, this paper designs a modulated dense connection (MDC) module and a pre-weighted graph convolution module, and proposes a pre-weighted modulated dense graph convolutional network (WMDGCN) for 3D human pose estimation based on the two modules. For over-smoothing, the modulated dense connection achieves better feature reuse through hyperparameters α and β (α denotes the weight proportion of the features of layer L relative to those of previous layers, and β denotes the propagation strategy of the features of previous layers to layer L), thus effectively improving feature expressiveness. To distinguish the importance of a joint from that of its adjacent joints, the pre-weighted graph convolution assigns a higher weight to the joint itself and uses different weight matrices for the joint and its adjacent joints to capture human joint features more effectively. Comparative experiments on the Human3.6M dataset show that the proposed method achieves the best trade-off between parameter count and accuracy: the parameter count, MPJPE, and P-MPJPE of WMDGCN are 0.27 MB, 37.46 mm, and 28.85 mm, respectively.
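    The sketch below illustrates the pre-weighting idea in PyTorch: the joint itself and its adjacent joints are transformed with different weight matrices, and the joint's own contribution receives a higher (assumed) pre-weight. It is a conceptual illustration, not the authors' WMDGCN layer.

```python
# Conceptual pre-weighted graph convolution: separate weight matrices for the
# node itself and its neighbours, with a larger coefficient on the self term.
import torch
import torch.nn as nn

class PreWeightedGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, adj, self_weight=2.0):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim, bias=False)   # for the joint itself
        self.w_neigh = nn.Linear(in_dim, out_dim, bias=False)  # for adjacent joints
        # Row-normalised adjacency without self-loops.
        adj = adj.float()
        self.register_buffer("adj_norm", adj / adj.sum(dim=-1, keepdim=True).clamp(min=1))
        self.self_weight = self_weight

    def forward(self, x):                       # x: (batch, joints, in_dim)
        neigh = torch.matmul(self.adj_norm, self.w_neigh(x))
        return self.self_weight * self.w_self(x) + neigh

if __name__ == "__main__":
    adj = torch.tensor([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
    layer = PreWeightedGraphConv(2, 4, adj)
    print(layer(torch.randn(1, 3, 2)).shape)    # torch.Size([1, 3, 4])
```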
    Skin Disease Segmentation Method Combining Dense Encoder and Dual-Path Attention
    WANG Longye, XIAO Yue, ZENG Xiaoli, ZHANG Kaixin, MA Ao
    2024, 18(4):  978-989.  DOI: 10.3778/j.issn.1673-9418.2303122
    Aiming at the problems of varied lesion shapes and sizes, discontinuous and blurred boundaries, and high similarity between lesion areas and background in dermoscopic images, a skin lesion segmentation network integrating a dense encoder and dual-path attention (DEDA-Net) is proposed. Firstly, the network employs a dense encoding module for multi-scale information fusion to enhance its feature extraction capability and alleviate blurred edges in dermoscopic images, and uses skip connections and residual paths to reduce the semantic gap between the encoding and decoding parts. Secondly, a global normal pooling layer is proposed that weights points in the feature map according to their relevance, and a dual-path attention module that extracts feature information along the spatial and channel dimensions is designed to avoid the difficulty of distinguishing lesion areas from the background caused by insufficient global information. Finally, following the idea of auxiliary loss functions, a weighted loss is applied at the middle of the network and at the final output layer to improve the generalization ability of the network. Experimental results show that the algorithm achieves a segmentation accuracy of 96.45%, a specificity of 97.82%, a Dice coefficient of 93.16%, and an IoU of 86.61% on the ISIC2017 dataset, which are 5.93, 6.45, 6.53, and 5.63 percentage points higher than the baseline U-Net, respectively, demonstrating that the proposed algorithm accurately segments skin lesion areas.
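    A rough PyTorch sketch of a dual-path (channel plus spatial) attention block of the general kind referred to above follows; layer sizes and the 7×7 spatial kernel are assumptions, not the paper's DEDA-Net module.

```python
# Generic dual-path attention: a channel path re-weights channels and a
# spatial path re-weights positions of the feature map.
import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel path: squeeze spatial dimensions, re-weight channels.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial path: squeeze channels, re-weight spatial positions.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)      # channel-wise re-weighting
        return x * self.spatial(x)   # spatial re-weighting

if __name__ == "__main__":
    print(DualPathAttention(32)(torch.randn(2, 32, 64, 64)).shape)
```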
    Few-Shot Image Classification Method with Feature Maps Enhancement Prototype
    XU Huajie, LIANG Shuwei
    2024, 18(4):  990-1000.  DOI: 10.3778/j.issn.1673-9418.2302015
    Due to the scarcity of labeled samples, the class prototype obtained from support set samples in metric-based few-shot image classification methods can hardly represent the true distribution of the whole class. Meanwhile, samples of the same class may differ greatly in many aspects, and large intra-class bias may make sample features deviate from the class center. Aiming at these problems, which can seriously affect performance, a few-shot image classification method with feature map enhanced prototypes (FMEP) is proposed. Firstly, features from query set feature maps that are similar to the class prototype are selected using cosine similarity and added to the prototype to obtain a more representative prototype. Secondly, similar features of the query set are aggregated to alleviate the problem caused by large intra-class bias, making the feature distribution of each class more compact. Finally, the enhanced prototypes are compared with the aggregated features, both of which are closer to the true distributions, to obtain better results. The proposed method is tested on four commonly used few-shot classification datasets, MiniImageNet, TieredImageNet, CUB-200, and CIFAR-FS. The results show that it not only improves the performance of the baseline model but also outperforms methods of the same type.
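    The sketch below illustrates prototype enhancement with cosine similarity in the spirit of the description above; the top-k selection and simple averaging are simplifying assumptions, not FMEP's exact procedure.

```python
# Conceptual prototype enhancement: select the query features most similar to a
# class prototype (cosine similarity) and fuse them into the prototype.
import torch
import torch.nn.functional as F

def enhance_prototype(prototype, query_features, topk=5):
    """prototype: (d,), query_features: (n, d) -> enhanced prototype (d,)."""
    sims = F.cosine_similarity(query_features, prototype.unsqueeze(0), dim=-1)  # (n,)
    idx = sims.topk(min(topk, query_features.size(0))).indices
    selected = query_features[idx]
    # Fuse the original prototype with the mean of the selected query features.
    return (prototype + selected.mean(dim=0)) / 2.0

if __name__ == "__main__":
    proto = torch.randn(64)
    queries = torch.randn(30, 64)
    print(enhance_prototype(proto, queries).shape)   # torch.Size([64])
```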
    Artificial Intelligence·Pattern Recognition
    Knowledge Graph Completion Algorithm with Multi-view Contrastive Learning
    QIAO Zifeng, QIN Hongchao, HU Jingjing, LI Ronghua, WANG Guoren
    2024, 18(4):  1001-1009.  DOI: 10.3778/j.issn.1673-9418.2301038
    Knowledge graph completion is the process of inferring new triples from the existing entities and relations in a knowledge graph. Existing methods usually adopt an encoder-decoder framework: the encoder uses a graph convolutional neural network to obtain the embeddings of entities and relations, and the decoder calculates a score for each candidate tail entity from these embeddings, with the highest-scoring tail entity taken as the inference result. However, the decoder infers triples independently, without considering graph-level information. Therefore, this paper proposes a knowledge graph completion algorithm based on contrastive learning, which adds a multi-view contrastive learning framework to the model to constrain the embeddings at the graph level. The comparison of multiple views constructs different distribution spaces for relations, and these distributions fit each other, which is more suitable for the completion task. Contrastive learning constrains the embedding vectors of entities and subgraphs and enhances task performance. Experiments on two datasets show that MRR is improved by 12.6% over A2N and 0.8% over InteractE on FB15k-237, and by 7.3% over A2N and 4.3% over InteractE on WN18RR, demonstrating that the model outperforms other completion methods.
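    As an illustration of the multi-view contrastive idea, the sketch below computes a standard InfoNCE-style loss between two views of the same entity embeddings; it is a generic formulation, not the paper's exact objective.

```python
# Generic InfoNCE-style contrastive loss: the i-th embedding of view A should
# match the i-th embedding of view B and repel all others.
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(view_a, view_b, temperature=0.1):
    """view_a, view_b: (n, d) embeddings of the same n entities from two views."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature          # (n, n) similarity matrix
    targets = torch.arange(a.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    print(multiview_contrastive_loss(torch.randn(8, 32), torch.randn(8, 32)))
```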
    Research on Sentiment Analysis of Short Video Network Public Opinion by Integrating BERT Multi-level Features
    HAN Kun, PAN Hongpeng, LIU Zhongyi
    2024, 18(4):  1010-1020.  DOI: 10.3778/j.issn.1673-9418.2311023
    In the era of self-media, the widespread popularity of online social software has made short video platforms easy “incubators” for the origin and fermentation of public opinion events, so analyzing the public opinion comments on these platforms is crucial for the early warning, handling, and guidance of such incidents. In view of this, this paper proposes a text classification model combining BERT and TextCNN, named BERT-MLFF-TextCNN, which integrates multi-level features from BERT for sentiment analysis of comment data on the Douyin short video platform. Firstly, the BERT pre-trained model is used to encode the input text. Secondly, the semantic feature vectors of each encoding layer are extracted and fused. A self-attention mechanism is then integrated to highlight key features so that they are used effectively. Finally, the resulting feature sequence is fed into the TextCNN model for classification. The results show that the BERT-MLFF-TextCNN model outperforms the BERT-TextCNN, GloVe-TextCNN, and Word2vec-TextCNN models, achieving an F1 score of 0.977, and effectively identifies the emotional tendencies of public opinion on short video platforms. On this basis, topic mining with the TextRank algorithm is used to visualize the thematic words associated with the sentiment polarity of public opinion comments, providing a decision-making reference for the public opinion management work of relevant departments.
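    A sketch of extracting and fusing multi-level BERT features is given below; it assumes the Hugging Face transformers package and the bert-base-chinese checkpoint are available, and uses a simple mean over layers as a stand-in for the paper's fusion and self-attention steps.

```python
# Request hidden states from every BERT encoder layer and fuse them before a
# downstream classifier such as TextCNN (fusion here is a plain mean).
import torch
from transformers import AutoModel, AutoTokenizer

def multilevel_features(texts, model_name="bert-base-chinese"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # hidden_states: embedding layer output plus one tensor per encoder layer.
    layers = torch.stack(outputs.hidden_states[1:], dim=0)   # (L, B, T, H)
    return layers.mean(dim=0)                                # fused features (B, T, H)

if __name__ == "__main__":
    print(multilevel_features(["这个短视频平台的评论很有意思"]).shape)
```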
    Self-supervised Hybrid Graph Neural Network for Session-Based Recommendation
    ZHANG Yusong, XIA Hongbin, LIU Yuan
    2024, 18(4):  1021-1031.  DOI: 10.3778/j.issn.1673-9418.2212043
    Session-based recommendation aims to predict user actions based on anonymous sessions. Most existing session recommendation algorithms based on graph neural networks (GNN) only extract user preferences from the current session while ignoring the high-order multivariate relationships from other sessions, which hurts recommendation accuracy; session-based recommendation also suffers from data sparsity because short-term interactions are very limited. To solve these problems, this paper proposes a self-supervised hybrid graph neural network (SHGN) for session-based recommendation. Firstly, the model describes the relationships between sessions and items by constructing three views from the original data. Next, a graph attention network is used to capture the low-order transition information of items within a session, and a residual graph convolutional network is proposed to mine the high-order transition information of items and sessions. Finally, self-supervised learning (SSL) is integrated as an auxiliary task: by maximizing the mutual information between session embeddings learnt from different views, data augmentation is performed to improve recommendation performance. To verify the effectiveness of the proposed method, comparative experiments with mainstream baseline models such as SR-GNN, GCE-GNN, and DHCN are carried out on four benchmark datasets, Tmall, Diginetica, Nowplaying, and Yoochoose, and improvements are achieved in P@20, MRR@20, and other performance indices.
    Policy Search Reinforcement Learning Method in Latent Space
    ZHAO Tingting, WANG Ying, SUN Wei, CHEN Yarui, WANG Yuan, YANG Jucheng
    2024, 18(4):  1032-1046.  DOI: 10.3778/j.issn.1673-9418.2211106
    Policy search is an efficient learning method in deep reinforcement learning (DRL); it can solve large-scale problems with continuous state and action spaces and is widely used in real-world problems. However, such methods usually require a large number of trajectory samples and extensive training time, and may generalize poorly, making it difficult to transfer the learned policy to environments with seemingly small changes. To solve these problems, this paper proposes a policy search DRL method based on latent spaces. Specifically, it extends the idea of state representation learning to action representation learning, i.e., a policy is learned in the latent space of action representations and the action representations are then mapped to the real action space. With the introduction of representation learning models, the traditional end-to-end training manner in DRL is abandoned and the whole task is divided into two stages: large-scale representation model learning and small-scale policy model learning, where unsupervised learning methods are employed to learn the representation models and policy search methods are used to learn the small-scale policy model. The large-scale representation models ensure generalization capacity and expressiveness, while the small-scale policy model reduces the burden of policy learning, thereby alleviating the issues of low sample utilization, low learning efficiency, and weak generalization of action selection in DRL to some extent. Finally, the effectiveness of introducing latent state and action representations is demonstrated on the intelligent control tasks CarRacing and Cheetah.
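    The sketch below illustrates acting in a latent action space: a small policy outputs a latent action code and a separately trained decoder maps it back to the real action space. Network sizes and names are assumptions, not the paper's architecture.

```python
# Conceptual two-part setup: a compact policy acting in a latent action space,
# plus a decoder that maps latent action codes to executable real actions.
import torch
import torch.nn as nn

class LatentPolicy(nn.Module):
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim))

    def forward(self, state):
        return self.net(state)          # action in the latent representation space

class ActionDecoder(nn.Module):
    """Maps latent action representations back to real, executable actions."""
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, z):
        return torch.tanh(self.net(z))  # real action, bounded to [-1, 1]

if __name__ == "__main__":
    policy, decoder = LatentPolicy(state_dim=8, latent_dim=4), ActionDecoder(4, 2)
    state = torch.randn(1, 8)
    print(decoder(policy(state)))       # action that could be sent to the environment
```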
    Potential Relationship Based Joint Entity and Relation Extraction
    PENG Yanfei, ZHANG Ruisi, WANG Ruihua, GUO Jialong
    2024, 18(4):  1047-1056.  DOI: 10.3778/j.issn.1673-9418.2301061
    Joint entity and relation extraction identifies entities and their corresponding relations from text, and is the basis for constructing and updating knowledge graphs. Current joint extraction methods ignore information redundancy in the extraction process while pursuing performance. To address this issue, a joint entity and relation extraction model based on potential relations is proposed. This paper designs a new decoding method that reduces the redundant information of relations, entities, and triples during prediction, and divides decoding into two steps: extracting potential entity pairs and decoding relations to complete the extraction of triples. Firstly, a potential entity pair extractor predicts whether a potential relation exists between entities, and entity pairs with high confidence are selected as the final potential entity pairs. Secondly, relation decoding is treated as a multi-label binary classification task, and the confidence of all relations between each potential entity pair is predicted by the relation decoder. Finally, the number and types of relations are determined by the confidence scores to complete the extraction of triples. Experimental results on two general datasets show that the proposed model outperforms the baseline models in terms of accuracy and F1 score, which verifies its effectiveness; ablation experiments also prove the effectiveness of the components of the model.
    Multi-feature Interaction for Aspect Sentiment Triplet Extraction
    CHEN Linying, LIU Jianhua, ZHENG Zhixiong, LIN Jie, XU Ge, SUN Shuihua
    2024, 18(4):  1057-1067.  DOI: 10.3778/j.issn.1673-9418.2302077
    Aspect sentiment triplet extraction is a subtask of aspect-level sentiment analysis which aims to extract aspect terms, the corresponding opinion terms, and sentiment polarities from a sentence. Previous studies focus on designing new paradigms to complete the triplet extraction task in an end-to-end manner, but ignore the role of external knowledge in the model, so semantic information, part-of-speech information, and local context information are not fully explored and utilized. To address these problems, multi-feature interaction for aspect sentiment triplet extraction (MFI-ASTE) is proposed. Firstly, the bidirectional encoder representations from transformers (BERT) model is used to learn contextual semantic features, and a self-attention mechanism is used to strengthen them. Secondly, the semantic features interact with the extracted part-of-speech features and the two learn from each other, strengthening the combination of part-of-speech and semantic information. Thirdly, multiple convolutional neural networks are used to extract several local context features for each word, which are then filtered by a multi-point gate mechanism. Fourthly, the three types of external-knowledge features are fused by two linear layers. Finally, biaffine attention is used to predict the grid tagging, and specific decoding schemes are used to decode the triplets. Experimental results show that the proposed model improves the F1 score by 6.83%, 5.60%, 0.54%, and 1.22% on four datasets compared with existing mainstream models.
    Network·Security
    Image-Text Retrieval Backdoor Attack with Diffusion-Based Image-Editing
    YANG Shun, LU Hengyang
    2024, 18(4):  1068-1082.  DOI: 10.3778/j.issn.1673-9418.2305032
    Deep neural networks are susceptible to backdoor attacks during the training stage. When an image-text retrieval model is trained, if an attacker maliciously injects image-text pairs containing a backdoor trigger into the training dataset, the backdoor is embedded into the model; during inference, the infected model performs well on benign samples, whereas the secret trigger activates the hidden backdoor and maliciously changes the inference result to the one set by the attacker. Existing research on backdoor attacks in image-text retrieval directly overlays trigger patterns on images, which suffers from a low success rate, obvious abnormal features in the poisoned image samples, and low visual concealment. This paper proposes a new backdoor attack method for image-text retrieval models based on diffusion models (Diffusion-MUBA), which designs trigger prompts for the diffusion model. Based on the correspondence between text keywords and regions of interest (ROI) in image-text pairs, the ROI regions of the image samples are edited to generate covert, smooth, and natural poisoned training samples; the pre-trained model is then fine-tuned to establish incorrect fine-grained word-to-region alignment in the image-text retrieval model and embed hidden backdoors into it. This paper designs an attack strategy based on diffusion-model image editing, proposes a backdoor attack model for bidirectional image-text retrieval, and achieves good results in backdoor attack experiments on both image-to-text and text-to-image retrieval. Compared with other backdoor attack methods, it improves the attack success rate and avoids introducing specific trigger patterns, watermarks, perturbations, local distortions, or deformations into the poisoned samples. On this basis, a backdoor attack defense method based on object detection and text matching is proposed. It is hoped that this study of the feasibility, concealment, and implementation of backdoor attacks in image-text retrieval may contribute to the development of multimodal backdoor attack defenses.
    Cryptomining Malware Early Detection Method Based on AECD Embedding
    CAO Chuanbo, GUO Chun, LI Xianchao, SHEN Guowei
    2024, 18(4):  1083-1093.  DOI: 10.3778/j.issn.1673-9418.2307023
    Cryptomining malware compromises system security, shortens hardware lifetime, and causes significant power consumption, so detecting it early and stopping its damage in time is critical to system security. Existing dynamic-analysis-based early detection methods for cryptomining malware struggle to balance timeliness and accuracy. To detect cryptomining malware accurately and in a timely manner, this paper combines a certain length of API (application programming interface) names, API operation categories, and DLLs (dynamic link libraries) called by cryptomining malware in the early stage of execution to describe its behavior in this stage more fully, proposes the AECD (API embedding based on category and DLL) embedding, and further proposes a cryptomining malware early detection method based on AECD embedding (CEDMA). CEDMA takes the API sequence called by software in the early stage of execution as the detection object and uses AECD embedding and TextCNN (text convolutional neural network) to build a detection model for early detection. Experimental results show that, taking the first 3000 API calls made after the software starts running as input, CEDMA detects the known and unknown cryptomining malware samples in the experiments with accuracy values of 98.21% and 96.76%, respectively.
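    The sketch below illustrates the AECD idea of embedding each API call from three views (API name, operation category, calling DLL) and concatenating them into one token embedding for a TextCNN-style classifier; vocabulary sizes and dimensions are assumptions, not the paper's configuration.

```python
# Each API call is represented by three learned embeddings (name, category,
# DLL) concatenated into a single vector per position in the API sequence.
import torch
import torch.nn as nn

class AECDEmbedding(nn.Module):
    def __init__(self, n_api, n_cat, n_dll, d_api=64, d_cat=16, d_dll=16):
        super().__init__()
        self.api = nn.Embedding(n_api, d_api)
        self.cat = nn.Embedding(n_cat, d_cat)
        self.dll = nn.Embedding(n_dll, d_dll)

    def forward(self, api_ids, cat_ids, dll_ids):    # each: (batch, seq_len)
        return torch.cat([self.api(api_ids), self.cat(cat_ids), self.dll(dll_ids)], dim=-1)

if __name__ == "__main__":
    emb = AECDEmbedding(n_api=500, n_cat=20, n_dll=100)
    api = torch.randint(0, 500, (2, 3000))            # toy early-stage API sequence
    cat = torch.randint(0, 20, (2, 3000))
    dll = torch.randint(0, 100, (2, 3000))
    print(emb(api, cat, dll).shape)                   # torch.Size([2, 3000, 96])
```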
    High Performance Computing
    TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU
    WANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu
    2024, 18(4):  1094-1108.  DOI: 10.3778/j.issn.1673-9418.2304039
    Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel in science and engineering. The CSR (compressed sparse row) format is one of the most commonly used storage formats for sparse matrices; when implementing parallel SpMV on the graphics processing unit (GPU), it stores only the non-zero elements of the sparse matrix, avoiding the computational redundancy caused by zero filling and saving storage space, but it suffers from load imbalance, which wastes computing resources. To address this issue, storage formats with good performance in recent years are studied, and a row-by-row decomposition and reorganization storage format, the TEB (threshold-exchange-order block) format, is proposed. The format first uses a heuristic threshold selection algorithm to determine an appropriate segmentation threshold and combines it with a reordering-based row merging algorithm to decompose and reconstruct the sparse matrix so that the numbers of non-zero elements in different blocks are as close as possible. Furthermore, combined with CUDA (compute unified device architecture) thread technology, a parallel SpMV algorithm across sub-blocks based on the TEB storage format is proposed, which allocates computing resources reasonably and solves the load imbalance problem, thus improving the parallel computing efficiency of SpMV. To verify the effectiveness of the TEB storage format, experiments are conducted on the NVIDIA Tesla V100 platform. The results show that, compared with the PBC (partition-block-CSR), AMF-CSR (adaptive multi-row folding of CSR), CSR-Scalar (compressed sparse row-scalar), and CSR5 (compressed sparse row 5) storage formats, TEB improves SpMV time performance by an average of 3.23×, 5.83×, 2.33×, and 2.21×, respectively, and floating-point computing performance by an average of 3.36×, 5.95×, 2.29×, and 2.13×.
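    The NumPy sketch below (assuming SciPy is available) illustrates only the load-balancing intuition behind such block formats: rows of a CSR matrix are grouped so that each block holds roughly the same number of non-zeros before per-block SpMV. It is a conceptual illustration, not the TEB implementation or its CUDA kernels.

```python
# Group CSR rows into blocks with roughly equal non-zero counts, then perform
# SpMV block by block; on a GPU each block would map to a thread block.
import numpy as np
from scipy.sparse import random as sparse_random

def balanced_row_blocks(csr, nnz_per_block):
    """Return (start, end) row ranges with about nnz_per_block non-zeros each."""
    blocks, start, count = [], 0, 0
    for row in range(csr.shape[0]):
        count += csr.indptr[row + 1] - csr.indptr[row]
        if count >= nnz_per_block:
            blocks.append((start, row + 1))
            start, count = row + 1, 0
    if start < csr.shape[0]:
        blocks.append((start, csr.shape[0]))
    return blocks

if __name__ == "__main__":
    A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
    x = np.ones(1000)
    blocks = balanced_row_blocks(A, nnz_per_block=512)
    y = np.concatenate([A[s:e] @ x for s, e in blocks])   # per-block SpMV
    print(len(blocks), np.allclose(y, A @ x))              # matches the full product
```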