Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2014, Vol. 8 ›› Issue (3): 305-312. DOI: 10.3778/j.issn.1673-9418.1306023

• Artificial Intelligence and Pattern Recognition •

Application of Deep Learning to Aerial Scene Classification

李晓龙,张兆翔+,王蕴红,刘庆杰   

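  1. Laboratory of Intelligent Recognition and Image Processing, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Publication date: 2014-03-01 Release date: 2014-03-05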

Aerial Images Categorization with Deep Learning

LI Xiaolong, ZHANG Zhaoxiang+, WANG Yunhong, LIU Qingjie   

  1. Laboratory of Intelligent Recognition and Image Processing, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Online: 2014-03-01 Published: 2014-03-05

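Abstract: In recent decades, aerial images and videos have been widely used in urban planning, coastal surveillance, military missions, and other areas. Understanding the content of aerial images and identifying the scene types captured in aerial videos has therefore become very important. Most popular scene classification algorithms target natural scenes; few address high-resolution aerial scenes. This paper presents a hierarchical algorithm for scene classification of high-resolution aerial images. The algorithm first uses the scale-invariant feature transform (SIFT) to extract robust local patch features. Then, building on a bag of visual words, it uses a deep belief network (DBN) initialized by restricted Boltzmann machines (RBMs) to represent the relationship between low-level features and high-level visual features; the DBN also serves as the classifier. Experimental results show that the algorithm performs slightly better than current mainstream algorithms on high-resolution aerial scene classification.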

Key words: aerial photography, scene classification, bag of visual words, deep learning, high resolution

Abstract: In recent decades, aerial image and video processing has been widely studied for urban planning, coastal monitoring, and military tasks. Understanding the content of aerial images and classifying the scenes in aerial videos is therefore very important. However, most popular scene classification algorithms target natural scenes, and few address high-resolution aerial scenes. This paper proposes a hierarchical scene classification model for aerial videos and images. First, scale-invariant feature transform (SIFT) vectors are extracted as patch features. Then, on top of a bag-of-words representation, a deep belief network (DBN) initialized by restricted Boltzmann machines (RBMs) is used to obtain latent variables that describe the relationship between low-level region features and high-level global features. The DBN also serves as the classifier. The proposed method achieves promising performance compared with state-of-the-art scene classification methods.

Key words: aerial image, scene classification, bag of features, deep learning, high resolution
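
The pipeline outlined in the abstract (local descriptors, a bag-of-words histogram, then RBM-based pretraining) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: random vectors stand in for 128-dimensional SIFT descriptors, the codebook is random instead of learned by clustering, the function names `bow_histogram` and `rbm_cd1_step` are hypothetical, and only a single Bernoulli RBM layer is trained with one-step contrastive divergence (CD-1). Stacking layers into a DBN and supervised fine-tuning are omitted.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalized histogram."""
    # Squared Euclidean distances, shape (n_descriptors, n_words)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rbm_cd1_step(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 update for a Bernoulli RBM on a batch v0; returns updated parameters."""
    # Positive phase: hidden probabilities and a binary sample given the data
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: reconstruct the visible units, then recompute hidden probabilities
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid


rng = np.random.default_rng(0)

# Assumption: random vectors stand in for SIFT descriptors of 8 images (200 patches each)
codebook = rng.normal(size=(16, 128))  # 16 hypothetical visual words
images = [rng.normal(size=(200, 128)) for _ in range(8)]
X = np.stack([bow_histogram(d, codebook) for d in images])  # shape (8, 16)

# Unsupervised pretraining of one RBM layer with 10 hidden units
W = 0.01 * rng.normal(size=(16, 10))
b_vis = np.zeros(16)
b_hid = np.zeros(10)
for _ in range(50):
    W, b_vis, b_hid = rbm_cd1_step(X, W, b_vis, b_hid, rng)

# Hidden activations that a next layer (or a classifier) would take as input
hidden = sigmoid(X @ W + b_hid)  # shape (8, 10)
```

In a fuller version, the pretrained weights would initialize the first layer of a deeper network, further RBMs would be trained on `hidden`, and the whole stack would be fine-tuned with scene labels.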