Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2012, Vol. 6 ›› Issue (5): 397-408. DOI: 10.3778/j.issn.1673-9418.2012.05.002

• Academic Research •

ARCA: Traffic Classification Method Based on Automatic Reverse and Cluster Analysis

LI Chenglong1,2, XUE Yibo1,3+, WANG Dongsheng1,3

  1. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
  2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  3. Research Institute of Information Technology, Tsinghua University, Beijing 100084, China
  • Online: 2012-05-01 Published: 2012-05-09

Abstract: Traffic classification and protocol identification are prerequisites for effective network management. However, the growing number of encrypted protocols has made traditional traffic classification methods ineffective. To identify encrypted protocols, this paper proposes ARCA (automatic reverse and cluster analysis), a classification method that combines automated reverse analysis with cluster analysis of network messages. The method first obtains the structural features of a protocol through automated reverse analysis, then recovers the protocol's interaction process by clustering network messages, and finally uses the structural features and the interaction process together to identify and classify encrypted protocol traffic. Because the method does not rely on inspecting packet content, it remains applicable when the protocol is encrypted. In experiments on real-world traffic from several encrypted protocols, including Thunder (Xunlei), BitTorrent, QQ, and GTalk, the accuracy and recall are above 96.9% and 93.1%, respectively, and only 0.9% of the bytes in the traffic need to be inspected. ARCA can therefore identify encrypted protocol traffic effectively and quickly.

Key words: protocol identification, network message, reverse analysis, correlation
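
The abstract does not state which message features or which clustering algorithm ARCA uses, so the following Python sketch only illustrates the general idea of recovering an interaction process by clustering messages and then matching new flows against it. Everything in it is an assumption made for illustration: message direction and payload length stand in for the features, a greedy one-dimensional clustering on length stands in for the clustering step, and the names `Message`, `fit_clusters`, `interaction`, `train`, and `classify` are hypothetical. The structural features obtained by reverse analysis are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Message:
    direction: int  # +1: client to server, -1: server to client
    length: int     # payload length in bytes; payload content is never read


Cluster = Tuple[int, int, int]               # (direction, min_length, max_length)
Model = Tuple[List[Cluster], Set[Tuple[int, ...]]]


def fit_clusters(messages: Sequence[Message], tolerance: int = 8) -> List[Cluster]:
    """Greedy 1-D clustering (an assumed stand-in): same-direction messages
    whose lengths are within `tolerance` of the cluster's upper bound merge."""
    clusters: List[Cluster] = []
    for m in sorted(messages, key=lambda x: (x.direction, x.length)):
        if clusters:
            d, lo, hi = clusters[-1]
            if d == m.direction and m.length - hi <= tolerance:
                clusters[-1] = (d, lo, m.length)  # extend the current cluster
                continue
        clusters.append((m.direction, m.length, m.length))
    return clusters


def label(m: Message, clusters: Sequence[Cluster]) -> int:
    """Return the index of the cluster containing `m`, or -1 if none does."""
    for i, (d, lo, hi) in enumerate(clusters):
        if d == m.direction and lo <= m.length <= hi:
            return i
    return -1


def interaction(flow: Sequence[Message], clusters: Sequence[Cluster],
                depth: int) -> Tuple[int, ...]:
    """Cluster labels of the first `depth` messages of a flow."""
    return tuple(label(m, clusters) for m in flow[:depth])


def train(flows: Sequence[Sequence[Message]], depth: int = 3,
          tolerance: int = 8) -> Model:
    """Cluster the leading messages of labeled flows and record the
    label sequences that were observed."""
    clusters = fit_clusters([m for f in flows for m in f[:depth]], tolerance)
    return clusters, {interaction(f, clusters, depth) for f in flows}


def classify(flow: Sequence[Message], models: Dict[str, Model],
             depth: int = 3) -> Optional[str]:
    """Return the first protocol whose recorded sequences contain the flow's."""
    for name, (clusters, sequences) in models.items():
        if interaction(flow, clusters, depth) in sequences:
            return name
    return None
```

Because only the first few messages of a flow are examined, a classifier of this shape touches a small fraction of the traffic, which is the kind of property the abstract reports; the specific depth and tolerance values here are arbitrary.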