Table of Contents

    2024-12-01, Volume 18 Issue 12
    Frontiers·Surveys
    Survey on Application of Homomorphic Encryption in Deep Learning
    YANG Hongchao, YI Mengjun, LI Peijia, ZHANG Hanwen, SHEN Furao, ZHAO Jian, WANG Liuwang
    2024, 18(12):  3065-3079.  DOI: 10.3778/j.issn.1673-9418.2406098
    With the widespread application of deep learning in various fields, data privacy and security issues have become increasingly important. Homomorphic encryption, a technique that allows computations to be performed directly on encrypted data, offers a potential solution to these problems. This paper surveys methods that combine deep learning with homomorphic encryption, exploring how to effectively apply deep learning models in encrypted environments. Firstly, the basics of homomorphic encryption are introduced, covering its basic principles, different classifications (including partially homomorphic encryption, somewhat homomorphic encryption and fully homomorphic encryption), and the development history of fully homomorphic encryption. Key models in deep learning, such as convolutional neural network and Transformer, are then detailed. The steps of combining homomorphic encryption with deep learning and how to adapt various layers of deep learning (e.g., convolutional layers, attention layer and activation function layer) to the homomorphic encryption environments are discussed. Subsequently, existing methods that integrate convolutional neural network and Transformer with homomorphic encryption are focused on. Specific implementation schemes for performing deep learning computations on encrypted data and performance optimization strategies employed to enhance efficiency and accuracy are discussed. The advantages and limitations of each method are summarized. Finally, current research progress is summarized, and an outlook on future research directions is provided.
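    To make the idea of computing directly on encrypted data concrete, the following is a minimal sketch of an additively homomorphic (partially homomorphic) scheme in the style of Paillier. It is not taken from the surveyed paper: the function names, the tiny primes 293 and 433, and the choice of generator g = n + 1 are illustrative assumptions, and such small keys offer no security.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy key generation (assumed small primes; real keys use 1024+ bit primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut holds for g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n with g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible modulo n
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def mul_plain(n, c, k):
    """Raising a ciphertext to a plaintext power k multiplies the plaintext by k."""
    return pow(c, k, n * n)


n, priv = keygen()
c1, c2 = encrypt(n, 1234), encrypt(n, 4321)
print(decrypt(n, priv, add_encrypted(n, c1, c2)))  # sum computed under encryption
print(decrypt(n, priv, mul_plain(n, c1, 3)))       # scalar product under encryption
```

    Only addition and multiplication by a plaintext constant are supported here, which is why such schemes are classed as partially homomorphic; non-linear layers such as activation functions need fully homomorphic schemes or polynomial approximations.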
    Review of Research on Adversarial Attack in Three Kinds of Images
    XU Yuhui, PAN Zhisong, XU Kun
    2024, 18(12):  3080-3099.  DOI: 10.3778/j.issn.1673-9418.2404001
    In recent years, there have been numerous breakthroughs in deep learning, leading to the expansion of applications based on deep learning into a wide range of fields. However, deep neural networks are vulnerable and highly susceptible to threats from adversarial samples, which poses significant security challenges in their application. As a result, adversarial attack has become a hot research area. Since deep neural networks are widely used in image tasks, research on adversarial attacks in the image field is key to enhancing security, and a lot of research from different perspectives has been carried out. Existing studies on image attacks can mainly be categorized by image type into three forms: visible light images, infrared images, and synthetic aperture radar (SAR) images. Firstly, this paper introduces the basic concepts and terminology related to image adversarial samples, and then summarizes the adversarial attack methods for the three types of images according to their attack ideas. The attack success rate (ASR), memory size, and applicable scenarios of the attack methods for the three types of images are also compared and analyzed. In addition, research on defense strategies in the field of image adversarial samples is briefly introduced, mainly summarizing three existing defense methods. Finally, the current status of image adversarial samples is analyzed, possible future research directions for adversarial attacks in the image field are discussed, potential problems that may be encountered in the future are summarized, and corresponding solutions are provided.
    Video Anomaly Detection Methods: a Survey
    WU Peichen, YUAN Lining, GUO Fang, LIU Zhao
    2024, 18(12):  3100-3125.  DOI: 10.3778/j.issn.1673-9418.2404041
    Video abnormal behavior detection is a hot research topic in computer vision. It involves extracting temporal and spatial features from video content to determine the presence of abnormal events and their types within the video, as well as to localize the regions and time where anomalies occur. This paper systematically reviews and categorizes existing methods for video abnormal behavior detection based on supervised/unsupervised learning. This paper categorizes the supervised methods into methods based on deviation mean calculation and multimodal methods. For unsupervised methods, it summarizes various completely unsupervised approaches. Starting from the current mainstream modeling approaches, this paper gives a detailed explanation of deviation mean calculation methods, summarizes multimodal methods based on the utilization and processing of different modal features, and introduces completely unsupervised methods based on two training approaches. By comparing the network architectures of different models, this paper summarizes the test datasets, use cases, advantages, and limitations of various abnormal behavior detection models. Furthermore, it compares and evaluates models using benchmark datasets and common evaluation standards such as frame-level and pixel-level standards, and conducts intra-class comparisons based on performance results, followed by analysis of the outcomes. Lastly, it explores trends in video abnormal behavior detection through five directions: virtual synthetic datasets, multimodal large models, lightweight models, etc.
    Application of Deep Learning in Classification and Diagnosis of Mild Cognitive Impairment
    ZHOU Qixiang, WANG Xiaoyan, ZHANG Wenkai, HE Xin
    2024, 18(12):  3126-3143.  DOI: 10.3778/j.issn.1673-9418.2402004
    Alzheimer's disease is an irreversible neurodegenerative disease that cannot yet be completely cured, but its progression can be delayed by early intervention. Mild cognitive impairment is the initial stage of Alzheimer's disease. Correctly identifying this stage is of great significance for the early diagnosis of and early intervention in Alzheimer's disease. Deep learning has become a research hotspot in assisting the classification and diagnosis of mild cognitive impairment because it can automatically extract image features. In order to better classify mild cognitive impairment, this paper reviews the classification and diagnosis of mild cognitive impairment based on deep learning in recent years. Firstly, the commonly used datasets in the classification and diagnosis of mild cognitive impairment are introduced, and the data quantity, data type and download address of each dataset are sorted out. Secondly, this paper summarizes the commonly used data preprocessing methods and model evaluation indicators. Then it focuses on the application of deep learning models and methods in the classification and diagnosis of mild cognitive impairment, including but not limited to autoencoders, deep belief networks, generative adversarial networks, convolutional neural networks, and graph convolutional neural networks, and points out the model interpretability techniques used in the research. Finally, the main ideas, advantages and disadvantages of various algorithms are summarized, and the classification and diagnosis performance of deep learning-based mild cognitive impairment classification methods on public datasets is compared. The shortcomings in related research are summarized, and future research directions are discussed.
    Research on Blockchain-Based Inter-Domain Routing Security Enhancement
    WANG Qun, LI Fujuan, NI Xueli, XIA Lingling, MA Zhuo
    2024, 18(12):  3144-3174.  DOI: 10.3778/j.issn.1673-9418.2407065
    The border gateway protocol (BGP) is currently the de facto interdomain routing standard in the Internet, with its security based on the authenticity and integrity of autonomous systems (AS) identities and paths. However, BGP itself lacks intrinsic security mechanisms, and its security issues have garnered significant attention. Blockchain, as an innovative technology for building a new generation of information infrastructure, can establish a distributed multi-party trust system in an open Internet environment. It offers technical solutions to the issues and challenges encountered in traditional BGP security enhancement. According to technology development and security iteration, this paper categorizes the BGP security enhancement into three progressive stages: the theoretical exploration stage represented by secure border gateway protocol (S-BGP), the practical application stage with resource public key infrastructure (RPKI) as the security foundation, and the innovative development stage with the blockchain technology. Firstly, this paper analyzes the vulnerabilities in BGP routing propagation methods and routing policies, along with three typical security threats: prefix hijacking, path spoofing, and route leaks. Next, traditional BGP security enhancement techniques and research context are reviewed, with a particular focus on proactive defense and anomaly detection mechanisms to counter erroneous route announcement attacks. Then, following a brief introduction to the fundamental principles of blockchain technology, the paper explores the implementation concepts, paths, and methods of blockchain-based BGP security enhancements, utilizing the blockchain’s characteristics of decentralization, tamper resistance, traceability, and distributed deployment. Finally, the challenges faced in BGP security enhancement with blockchain technology are discussed, and future research directions are outlined.
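    As a rough illustration of how RPKI-style route origin validation can flag prefix hijacking, the sketch below classifies a BGP announcement against a list of route origin authorizations (ROAs). It follows the commonly described valid / invalid / not-found semantics; the function name, the tuple layout of a ROA, and the documentation prefixes and private-use AS numbers are assumptions for illustration, and the blockchain layer discussed in the paper is not modeled.

```python
import ipaddress


def validate_origin(prefix, origin_asn, roas):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'.

    roas is an iterable of (roa_prefix, max_length, asn) tuples (assumed layout).
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # Skip ROAs of another address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches only if the origin AS agrees and the
        # announced prefix is not more specific than max_length allows.
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # legitimate origin
print(validate_origin("192.0.2.0/24", 64501, roas))     # wrong origin AS
print(validate_origin("192.0.2.0/25", 64500, roas))     # too specific
print(validate_origin("198.51.100.0/24", 64500, roas))  # no covering ROA
```

    Origin validation of this kind addresses prefix hijacking only; path spoofing and route leaks require checking the AS path itself, which is part of what the later stages described in the abstract aim to cover.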
    Theory·Algorithm
    Self-Supervised Social Recommendation Algorithm Fusing Residual Networks
    WANG Yujie, YANG Zhe
    2024, 18(12):  3175-3188.  DOI: 10.3778/j.issn.1673-9418.2401006
    Social recommendation based on graph neural networks learns the embedded relationships between users and items through the information of social graphs and interaction graphs to get the final recommendation results. However, the existing algorithms mainly utilize the static social graph structure, which is unable to mine the potential linking relationship between users, and at the same time do not solve the noise problem in the user-item interaction behavior. Therefore, a self-supervised social recommendation algorithm incorporating residual networks is proposed. Firstly, the algorithm employs a variational hypergraph auto-encoder for link prediction in social networks to obtain a reconstructed social graph, which is used to mine the positive link relationships hidden among users. Secondly, an attention mechanism is utilized to assign different attention coefficients to the original and the reconstructed residual social graphs to obtain a more accurate representation of users. Lastly, to alleviate the problem of noise in the data, an adaptive hypergraph global relation extractor is constructed. Self-supervised signals are created using local embedding information and global embedding information in collaboration with this extractor, which optimizes the local embedding representation and thus mitigates the effect of noise. The algorithm is experimentally compared with baseline models such as NGCF, LightGCN, and MHCN on three datasets, Ciao, Epinions and Yelp. On the Ciao dataset, Recall@10 is improved by 17.1% to 48.5%, NDCG@10 is improved by 1.4% to 37.9%; on the Epinions dataset, Recall@10 is improved by 8.3% to 56.2%, NDCG@10 is improved by 3.7% to 29.8%; on the Yelp dataset, Recall@10 is improved by 9.1% to 53.3%, NDCG@10 is improved by 11.2% to 66.6%. Experimental results show that the algorithm has good recommendation performance compared with the benchmark model.
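    For readers unfamiliar with the reported metrics, the following sketch shows one common way to compute Recall@K and NDCG@K for a single user with binary relevance. The function names and the toy item IDs are illustrative assumptions; the paper may use a slightly different convention (for example, some works divide recall by min(K, number of relevant items)).

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain of hits over the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked_items = [3, 1, 7, 5]   # hypothetical model ranking for one user
relevant_items = {1, 5, 9}    # hypothetical held-out interactions
print(recall_at_k(ranked_items, relevant_items, k=3))
print(ndcg_at_k(ranked_items, relevant_items, k=3))
```

    Dataset-level scores are normally the average of these per-user values over all test users.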
    Search Guidance Network Assisted Dynamic Particle Swarm Optimization Algorithm
    LIU Zhi, SONG Wei
    2024, 18(12):  3189-3202.  DOI: 10.3778/j.issn.1673-9418.2312030
    In dynamic optimization problems (DOPs), environmental changes can be characterized as different dynamics, and the adaptation of dynamic optimization algorithms (DOAs) to different dynamic environments is vital. In addition, the loss of local and global diversity is one of the main reasons behind the degradation of the exploitation and exploration capabilities of DOAs. Maintaining local and global diversity in dynamic environments can effectively avoid diversity loss. To this end, a search guidance network-based particle swarm optimization (SGN-PSO) is proposed. The learning target of each input particle is selected based on the hidden layer of SGN, and its acceleration coefficient is adjusted in the output layer to guide the search of particles. Specifically, SGN is a single-hidden layer radial basis function neural network, and each of its hidden layer nodes consists of a center and radius. By setting multiple hidden nodes whose centers, i.e. the subpopulation centers, are far from each other, multiple subpopulations can be obtained. Each particle selects the local learning target from the personal best historical positions that belong to its subpopulation, and selects the global learning target from the subpopulation centers that are far from each other, contributing to maintaining local and global diversity of the population. Reinforcement learning is employed to obtain the desired output of the input particles and extreme learning machine is utilized to pre-train the network. Furthermore, the significance and crowding degree metrics of hidden nodes are designed to obtain a compact network structure, and incremental learning is used to ensure the network approximation ability. Whichever type of dynamic occurs, SGN-PSO can adapt to the environment by learning how to guide the search of particles, and can effectively address DOPs of different dynamics. Compared with five mainstream DOAs on MPB and DRPBG benchmark test suites, the results demonstrate that SGN-PSO achieves significant performance improvement in solving DOPs.
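    The learning targets and acceleration coefficients mentioned above enter through the particle velocity update. The sketch below shows the canonical (static) PSO update on a sphere function, where those quantities are the fixed coefficients c1, c2 and the personal and global bests. It is not SGN-PSO: the network-based target selection, coefficient adjustment, and dynamic environments are omitted, and all names and parameter values are illustrative assumptions.

```python
import random


def sphere(x):
    """Simple static test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Canonical global-best PSO with inertia weight w and coefficients c1, c2."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the local (personal) learning target
                # + pull toward the global learning target.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, best_val = pso(sphere)
print(best, best_val)
```

    In a dynamic setting the stored best values become stale after each environmental change, which is one reason dedicated diversity-maintenance mechanisms such as multiple subpopulations are needed.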
    Comparative Analysis of Convergence and Performance of Improved Northern Goshawk Optimization Algorithm
    ZHENG Xinyu, LI Yuan, LIU Xiaolin
    2024, 18(12):  3203-3218.  DOI: 10.3778/j.issn.1673-9418.2403073
    To address the tendency of the northern goshawk optimization (NGO) algorithm to fall quickly into local optima, an improved northern goshawk optimization (INGO) algorithm is proposed in this paper. Firstly, during the population initialization stage, the good point set method is introduced to map individuals to the search space, improving the population's diversity and avoiding premature convergence. In the position update stage, the osprey local exploration position update strategy and an adaptive inertia weight factor are added to enhance global exploration and local exploitation capabilities and improve the convergence speed and accuracy of the algorithm. Secondly, the Markov chain model of the hunting process of the northern goshawk, based on the INGO algorithm, is established to prove the global convergence. The effectiveness of the INGO algorithm is verified through experimental simulation and comparative analysis with six classical intelligent algorithms. Convergence curve analysis and the Wilcoxon rank sum test are carried out for the INGO algorithm. Experimental results show that the INGO algorithm can effectively avoid falling into local optima and has strong convergence accuracy and robustness. Finally, in order to further characterize the practical application capability of the INGO algorithm, the algorithm is successfully applied to engineering design problems to verify its effectiveness in practical applications.
    Graphics·Image
    Human Uncivilized Behavior Detection Method Integrating Non-uniform Sampling and Feature Enhancement
    YE Hao, WANG Longye, ZENG Xiaoli, XIAO Yue
    2024, 18(12):  3219-3234.  DOI: 10.3778/j.issn.1673-9418.2401064
    In order to solve the problems of misdetection of similar behaviors and low accuracy for detecting local body behaviors in the spatio-temporal action detection of abnormal human behavior, based on the self-made uncivilized behavior spatio-temporal action detection dataset (UBSAD), a method that integrates non-uniform sampling and feature enhancement is proposed. Firstly, this method incorporates the video swin transformer (VST) as the backbone network in the spatio-temporal feature extraction stage to capture long-term temporal dependencies in videos, and enhance the network’s global information learning capability. Additionally, a ringed residual VST block replaces the standard VST block in the final stage of the backbone network, enlarging the difference between target area and background area. Combined with the multi-head self-attention mechanism, the feature extraction of the target area is strengthened. Furthermore, during the video frame collection stage, a unique non-uniform sampling method is proposed to adjust the input data distribution according to task requirements, allowing the model to obtain action change information in a hierarchical manner, effectively improving the network’s attention to detailed features of similar behaviors. Finally, after the feature extraction network, a new cascaded pooling three-dimensional spatial pyramid feature enhancement module incorporating shallow features is embedded to further enhance feature applicability at various scales, reduce the loss of detailed motion information during the feature extraction process, reduce the interference of background information, and achieve the effect of feature enhancement. Experimental results show that the method achieves mAP of 71.93% and 83.09% respectively on the UBSAD dataset and the public dataset UCF101-24. These are 7.39 percentage points and 1.22 percentage points higher than those obtained using the baseline network VST as the spatio-temporal feature extraction model, demonstrating the method’s effectiveness in accurately detecting behavior.
    Point Cloud Action Recognition Method Based on Masked Self-Supervised Learning
    HE Yundong, LI Ping, PING Chenhao
    2024, 18(12):  3235-3246.  DOI: 10.3778/j.issn.1673-9418.2404045
    Point cloud action recognition methods can provide precise 3D motion monitoring and recognition services, with broad application prospects in fields such as intelligent interaction, intelligent security, and medical health. Existing methods typically use a large amount of annotated point cloud data to train models, but point cloud videos contain a large number of 3D coordinates, precise annotation of point clouds is very expensive, and point cloud videos are highly redundant with uneven distribution of point cloud information in the video, all of which increase the difficulty of annotation. To address the aforementioned issue and achieve superior performance in point cloud action recognition, a novel masked self-supervised action recognition method called MSTD-Transformer is proposed, which can capture the spatiotemporal structure of point cloud videos without the need for manual annotation. Specifically, the point cloud video is divided into point tubes and adaptive video-level masks are generated based on importance, learning the appearance and motion features of point cloud videos through self-supervised learning of point cloud reconstruction and motion prediction dual-stream. To better capture motion information, MSTD-Transformer extracts dynamic attention from the displacement of point cloud keypoints and embeds it into a Transformer, using a dual-branch structure for differential learning to capture motion information and global structure separately. Experimental results on the standard dataset MSRAction-3D show that the proposed method achieves an accuracy of 96.17% for 24-frame point cloud video action recognition, which is 2.09 percentage points higher than the best existing method, confirming the effectiveness of the masking strategy and dynamic attention.
    Target Detection Algorithm Based on Global Feature Fusion in Parallel Dual Path Backbone
    QIU Yunfei, XIN Hao
    2024, 18(12):  3247-3259.  DOI: 10.3778/j.issn.1673-9418.2312050
    The active downsampling of the backbone of conventional single path architecture often leads to insufficient feature extraction and information loss. At the same time, simply adding or splicing feature pyramids is not conducive to the integration of shallow to deep features. To solve these problems, a target detection algorithm based on global feature fusion in parallel dual path backbone is proposed. Firstly, the dual path architecture backbone is used to extract spatial and semantic information in parallel, and the dual path fusion module is used to promote the mutual complement between feature information. Secondly, the top feature is added to the pyramid pooled multi-scale pool mapping at the same time, and the attention mechanism is used to gather the multi-scale pooled features, so as to further improve the multi-scale detection performance. Then, the global scale information is gathered, which is integrated into different layers of features by using self-attention mechanism, and repeated many times to construct the neck network structure of global feature fusion, which effectively improves the ability of neck network to fuse global context information. Finally, the head adopts Ghost Conv combined with channel shuffling operation to maintain model performance and reduce parameter redundancy. Experiments on KITTI, BDD100K and PASCAL VOC datasets show that the average accuracy of the proposed algorithm is improved by 3.5, 3.4 and 2.7 percentage points compared with the baseline model (YOLOv7-tiny), respectively. Experimental results show that the proposed algorithm improves the detection performance in complex scenes, and has low requirements for computing power and other resources.
    Artificial Intelligence·Pattern Recognition
    Mixture of Expert Large Language Model for Legal Case Element Recognition
    YIN Hua, WU Zihao, LIU Tingting, ZHANG Jiajia, GAO Ziqian
    2024, 18(12):  3260-3271.  DOI: 10.3778/j.issn.1673-9418.2406047
    Intelligent judicial decision-making is gradually aligning with the logic of legal adjudication. Case element recognition is a fundamental task proposed in recent years. Compared with earlier methods based on deep learning and machine reading comprehension, the generative element recognition approach using large language models (LLM) holds greater potential for complex reasoning. However, the current performance of judicial LLM on these fundamental tasks remains suboptimal. This paper introduces a conversational mixture of expert element recognition LLM. The proposed model first designs specific prompts tailored to the characteristics of cases for the ChatGLM3-6B-base model. The LLM is then fine-tuned with full parameters to acquire basic element recognition capabilities, with its weights shared among subsequent hybrid experts to reduce learning costs. To address different case types and label imbalance scenarios, case-specific DoRA experts and label-specific DoRA experts are integrated into the LLM’s attention layer, enhancing the model’s ability to differentiate between tasks. A learnable gating mechanism is also designed to facilitate the selection of label experts. The proposed model is tested on the CAIL2019 dataset and a desensitized theft case element recognition dataset from a certain province. Nine benchmark models across three types of methods are compared, and ablation experiments are conducted. Experimental results show that the proposed model’s overall performance, measured by the F1 score, exceeds that of the best-performing baseline model by 5.9 percentage points. On the label-imbalanced CAIL2019 dataset, the label expert effectively mitigates the impact of extreme data imbalance. Additionally, without repeated full-parameter fine-tuning, the basic model trained on CAIL2019 achieves optimal results in theft cases of a certain province after lightweight fine-tuning by case and label experts, demonstrating the model’s scalability.
    CFB: Financial Large Models Evaluation Methods
    LI Yi, LI Hao, XU Xiaozhe, YANG Yifan
    2024, 18(12):  3272-3287.  DOI: 10.3778/j.issn.1673-9418.2406055
    As the potential applications of large language models (LLMs) in the financial sector continue to emerge, evaluating the performance of financial LLMs becomes increasingly important. However, current financial evaluation methods face limitations such as singular evaluation tasks, insufficient coverage of evaluation datasets, and contamination of benchmark data. Consequently, the potential of LLMs in the financial domain has not been fully explored. To address these issues, this paper proposes the Chinese financial benchmark (CFB) for evaluating financial LLMs. The CFB encompasses 36 datasets, covers 24 financial tasks, and involves 7 evaluation tasks: question answering, terminology explanation, text generation, text translation, classification task, voice recognition, and predictive decision. It also establishes corresponding benchmarks. The new approach of the CFB includes a broader range of tasks and data, the introduction of a benchmark decontamination method based on LLMs, and three evaluation methods: instruction fine-tuning, knowledge retrieval enhancement, and prompt engineering. The evaluation of 12 LLMs, including GPT-4o, ChatGPT, and Gemini, reveals that though LLMs excel in information extraction and text analysis, they struggle with advanced reasoning and complex tasks. GPT-4o performs exceptionally in information extraction and stock trading, whereas Gemini excels in text generation and prediction. Instruction fine-tuning improves LLMs’ performance in text analysis but offers limited benefits for complex tasks.
    Knowledge-Aware Debiased Inference Model Integrating Intervention and Counter-factual
    SUN Shengjie, MA Tinghuai, HUANG Kai
    2024, 18(12):  3288-3299.  DOI: 10.3778/j.issn.1673-9418.2403018
    The abductive natural language inference task (Abductive NLI) seeks to select more plausible hypothetical events based on given antecedent events and consequent events. However, inherent biases such as “logical defects” and “single-sentence label leakage” stemming from mediator and confounding variables in the inference process pose challenges. To address these issues, this paper proposes a novel knowledge-aware debiased inference model integrating intervention and counterfactual (KDIC). The model comprises three key modules: the mediator modulation module, the hypothesis-only bias module, and the external knowledge fusion module. Firstly, the mediator modulation module consists of causal graph intervention and hypothesis contrast learning. Causal graph intervention constructs a potential causal graph from given events and then extracts mediator variables, standing for the potential feature of unobserved events, via self-attention mechanism and graph convolutional network for guiding deep encoding. Concurrently, hypothesis contrast learning encourages the model to discern key factors affecting hypothesis judgment, rectifying logical inconsistencies. Secondly, the hypothesis-only bias module addresses the counterfactual problem by proactively identifying the inference biases arising from “single-sentence label leakage”. This module reduces the model’s reliance on specific words or phrases in the hypothesis, thereby enhancing robustness. Finally, this paper leverages a pre-trained common sense knowledge graph encoder, ComET, within the external knowledge fusion module. This integration enriches the model’s understanding of observed events’ motivations and potential outcomes, bolstering logical coherence across events. Experimental results on the αNLI dataset demonstrate that the inference ability of KDIC is second only to that of Electra-large-discriminator trained via self-supervised learning. However, KDIC is more robust in alleviating biases in the inference process.
    Speech Emotion Recognition Using Two-Stage Multiple Instance Learning Networks
    ZHANG Shiqing, CHEN Chen, ZHAO Xiaoming
    2024, 18(12):  3300-3310.  DOI: 10.3778/j.issn.1673-9418.2402013
    In the task of speech emotion recognition (SER), each utterance is usually divided into several equal-length segments when processing speech signals of unequal lengths, and the final emotion classification is obtained from the average of the prediction results of all divided segments. However, such processing methods implicitly assume that human emotional expression is evenly distributed throughout the speech signal, which is not consistent with the actual situation. To address this issue, this paper proposes an SER method using two-stage multiple instance learning networks. In the first stage, each utterance is regarded as a “bag”, and is segmented with equal lengths. A variety of acoustic features are extracted from the segmented samples, which are taken as “instances”. Then, they are fed into the relevant local acoustic feature encoder to learn the corresponding deep feature representations. A consistency-attention mechanism is used to perform feature interaction and enhancement on these extracted different feature representations. In the second stage, a hybrid aggregator based on multi-instance learning is designed so that instance predictions and instance features are fused at the global scale to calculate “bag” level prediction scores. Firstly, an instance distillation module is proposed to filter redundant instances with weak emotional information. Then, the distillation results are combined into a pseudo bag. The pseudo bag features are merged through an adaptive feature aggregation scheme, and then the prediction results are obtained through a classifier. Finally, instance-level and bag-level prediction results are combined by using an adaptive decision aggregation scheme so as to obtain the final emotion results. The recognition accuracies achieved on the IEMOCAP and MELD public datasets are 73.02% and 44.92%, respectively. Experimental results demonstrate the effectiveness of the proposed method.
    Network Rumor Detection Based on Enhanced Textual Semantics and Weighted Comment Stance
    ZHU Yi, WANG Gensheng, JIN Wenwen, HUANG Xuejian, LI Sheng
    2024, 18(12):  3311-3323.  DOI: 10.3778/j.issn.1673-9418.2402056
    Social networks, while enabling information exchange among individuals, also serve as fertile grounds for the dissemination of rumors. The succinct nature of social media posts poses a challenge for most rumor detection methods reliant on content semantic features due to the insufficiency of semantic information. Additionally, numerous rumor detection techniques focusing on propagation features often disregard the unique attributes of commenters, leading to inadequate allocation of weights to different user comments. Thus, a network rumor detection approach is proposed, integrating text semantic enhancement and weighted comment stance. Initially, entities and concepts in posts are elucidated via an external knowledge graph to furnish additional contextual information, thereby augmenting semantic comprehension. Subsequently, leveraging pointwise mutual information, the enhanced text is translated into a weighted graph representation, and a weighted graph attention network is employed to assimilate enhanced semantic features of posts. Stance information for each comment within the post is then extracted using a pre-trained stance detection model, with weight values of stance information being learnt based on commenters’ characteristics. Furthermore, temporal data of comment stances and corresponding commenter sequences are fed into a cross-modal Transformer to glean the temporal features of comment stances. Ultimately, the enhanced semantic features are adaptively merged with the weighted temporal features of comment stances and fed into a multi-layer perceptron for classification. Experimental results on the PHEME and Weibo datasets demonstrate that this method not only achieves an accuracy improvement of over 1.6 percentage points compared with the state-of-the-art baseline method but also outperforms the best baseline method by at least 12 hours in early rumor detection.
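    The step that turns text into a weighted graph via pointwise mutual information can be sketched as follows. This is a generic sliding-window PMI construction, given as an assumption about how such graphs are commonly built rather than the authors' exact procedure; the function name, window size, and toy tokens are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(tokens, window=3):
    """Return {(word_a, word_b): pmi} for word pairs with positive PMI.

    Probabilities are estimated as the fraction of sliding windows
    that contain a word (or a pair of words).
    """
    n_windows = max(1, len(tokens) - window + 1)
    windows = [tokens[i:i + window] for i in range(n_windows)]
    word_count = Counter()
    pair_count = Counter()
    for win in windows:
        uniq = sorted(set(win))  # count each word once per window
        word_count.update(uniq)
        pair_count.update(combinations(uniq, 2))

    edges = {}
    for (a, b), c in pair_count.items():
        p_ab = c / n_windows
        p_a = word_count[a] / n_windows
        p_b = word_count[b] / n_windows
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi > 0:  # keep only pairs that co-occur more than chance
            edges[(a, b)] = pmi
    return edges


tokens = ["a", "b", "a", "b", "c", "d"]  # toy token sequence
for pair, weight in sorted(pmi_edges(tokens, window=2).items()):
    print(pair, round(weight, 3))
```

    The resulting edge weights can serve as the adjacency weights of a word graph over which a graph attention network aggregates features; pairs with non-positive PMI are dropped here so that only informative co-occurrences become edges.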
    Fusion of Masked Autoencoder for Adaptive Augmentation Sequential Recommendation
    SUN Xiujuan, SUN Fuzhen, LI Pengcheng, WANG Aofei, WANG Shaoqing
    2024, 18(12):  3324-3334.  DOI: 10.3778/j.issn.1673-9418.2309042
    Abstract ( )   PDF (4389KB) ( )  
    To address the poor quality of the contrast views generated by contrastive learning methods in sequential recommendation tasks, a model called GATSR, which is based on graph attention networks for sequential recommendation, is proposed. Firstly, a global item-item transition graph is built from all user interaction sequences, combining sequential patterns with global collaborative patterns to provide global context for the item representation. Then, an adaptive graph augmentation module is designed to extract important self-supervised signals using an adaptive sampling strategy, learning more accurate item representations and effectively avoiding interference from noise signals. Subsequently, the masked autoencoder module employs a re-masking technique to mask highly semantically related masked items again, enabling the encoder to learn higher-level item representations and achieve a reasonable reconstruction of the masked items. Finally, the sequential recommender module integrates position information, global context, and the personalized user interaction sequence to obtain the final item representation and, based on it, predicts the items the user is likely to interact with next, thereby providing more reliable recommendation results. Experimental results on the Books, Toys, and Retailrocket datasets show that the recommendation accuracy of the proposed model exceeds that of the most advanced baseline algorithms in terms of hit ratio (HR) and normalized discounted cumulative gain (NDCG). For example, it improves HR@5 by 4.59% and NDCG@5 by 8.89% compared with the most advanced baseline.
    Network·Security
    Deepfake Detection Method Integrating Multiple Parameter-Efficient Fine-Tuning Techniques
    ZHANG Yiwen, CAI Manchun, CHEN Yonghao, ZHU Yi, YAO Lifeng
    2024, 18(12):  3335-3347.  DOI: 10.3778/j.issn.1673-9418.2311053
    Abstract ( )   PDF (7321KB) ( )  
    In recent years, as deepfake technology matures, face-swapping software and synthesized videos have become widespread. While these techniques offer entertainment, they also provide opportunities for misuse by malicious actors. Consequently, the significance of deepfake detection technology has grown markedly. Existing methods for deepfake detection commonly suffer from issues including poor cross-compression robustness, weak cross-dataset generalization, and high model training overheads. To address these challenges, this paper proposes a deepfake detection approach that combines multiple parameter-efficient fine-tuning techniques. This method utilizes a visual Transformer model pretrained with the masked image modeling self-supervised method as its backbone. Initially, it employs the low-rank adaptation (LoRA) method for fine-tuning the self-attention module parameters of the pretrained model. Concurrently, it introduces a parallel structure incorporating convolutional adapters to capture local texture information, enhancing the model’s adaptability in deepfake detection tasks. Subsequently, a serial structure introduces classical adapters to fine-tune the feed-forward network of the pretrained model, maximizing the utilization of knowledge acquired during the pretraining phase. Ultimately, a multi-layer perceptron replaces the original pretrained model’s classification head for deepfake detection. Experimental results across six datasets demonstrate that this model achieves an average frame-level AUC of approximately 0.996 with only 2×10^7 trainable parameters. In cross-compression experiments, the average frame-level AUC drop is 0.135. In cross-dataset generalization experiments, the frame-level AUC averages around 0.765.
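As a rough illustration of the low-rank adaptation (LoRA) step named in the abstract above, the sketch below adds a trainable low-rank update to a frozen weight matrix. It shows generic LoRA in plain Python, not the paper's model; the names `lora_forward` and `trainable_params`, the `alpha / r` scaling, and the example dimensions are assumptions for illustration.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha=1.0):
    """Compute (W + (alpha / r) * B @ A) @ x without forming the sum.

    w: frozen pretrained weights, shape d_out x d_in
    a: trainable down-projection, shape r x d_in
    b: trainable up-projection, shape d_out x r (usually zero-initialized)
    """
    r = len(a)
    base = matvec(w, x)               # frozen path
    delta = matvec(b, matvec(a, x))   # low-rank path: B @ (A @ x)
    scale = alpha / r
    return [y + scale * d for y, d in zip(base, delta)]


def trainable_params(d_out, d_in, r):
    """Parameters trained by LoRA for one d_out x d_in weight matrix."""
    return r * (d_in + d_out)
```

With `b` initialized to zeros, the adapted layer reproduces the pretrained layer exactly at the start of fine-tuning. For a hypothetical 768×768 projection and rank 8, `trainable_params(768, 768, 8)` gives 12,288 trainable values against 589,824 in the full matrix, which is the kind of saving that keeps the trainable parameter count small.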
    Forward-Secure Public-Key Encryption Scheme Based on SM9
    HUANG Wenfeng, XU Shengmin, MA Jinhua, NING Jianting, WU Wei
    2024, 18(12):  3348-3358.  DOI: 10.3778/j.issn.1673-9418.2310034
    Abstract ( )   PDF (4142KB) ( )  
    In a traditional hybrid cryptosystem, leakage of the current private key allows an attacker to regenerate previously used session keys and decrypt the session contents encrypted under them. To address this private-key leakage problem, this paper applies the key encapsulation mechanism and proposes a forward-secure public-key encryption scheme (FS-SM9) based on the identity-based cryptosystem SM9. The paper also proves that the scheme is IND-FS-CPA secure under the (q, n)-DBDHI hardness assumption in the standard model. In the scheme, the lifetime of the system is divided into multiple periods managed by a binary tree, which reduces the system overheads to a logarithmic level. The time information is embedded into the ciphertext when a message is encrypted, and only the private key of the corresponding period can decrypt the ciphertext. Each private key is updated through a unidirectional procedure that generates a new private key and deletes the old one, so forward security is preserved. Moreover, as shown by the performance analysis and experimental results, the scheme introduces only negligible overheads in achieving forward security under certain conditions. Therefore, the proposed scheme is practical and can run on specific resource-constrained devices, providing forward security for these devices.
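The binary-tree management of time periods described above can be illustrated with the generic tree-based key-evolution idea used in forward-secure encryption. The sketch below tracks only which tree nodes a key holder would need secrets for; it contains no cryptography and is not the FS-SM9 construction. The leaf-per-period layout and the names `nodes_held` and `derivable` are assumptions for illustration.

```python
def nodes_held(period, depth):
    """Labels of the tree nodes whose secrets are kept during `period`.

    Assumed layout: periods 0 .. 2**depth - 1 are the leaves of a full
    binary tree, each labeled by its root-to-leaf bit string. The holder
    keeps the current leaf plus the right sibling of every path node
    that is a left child, i.e. at most depth + 1 nodes.
    """
    if depth < 1 or not 0 <= period < 2 ** depth:
        raise ValueError("period out of range for this tree depth")
    leaf = format(period, f"0{depth}b")
    held = [leaf]
    for i, bit in enumerate(leaf):
        if bit == "0":
            held.append(leaf[:i] + "1")  # right sibling covers later periods
    return held


def derivable(old, new):
    """True if every node in `new` is a descendant-or-self of a node in `old`.

    In a tree-based scheme a node secret can be turned into secrets for
    its descendants only, which is what makes the update one-way.
    """
    return all(any(n.startswith(o) for o in old) for n in new)
```

For example, with `depth=3`, `nodes_held(3, 3)` is `['011', '1']` and `nodes_held(4, 3)` is `['100', '11', '101']`: every node of period 4 descends from `'1'`, so the update is possible, while no node kept in period 4 is an ancestor of leaf `'011'`, so the earlier period stays protected once the old nodes are deleted. The number of stored nodes grows with the tree depth, that is, logarithmically in the number of periods.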