Journal of Frontiers of Computer Science and Technology ›› 2008, Vol. 2 ›› Issue (1): 77-96.

• Academic Research •

BioSeg: a biological sequence data model

ZHU Yangyong1,2+, XIONG Yun1   

  1. Department of Computer and Information Technology, Fudan University, Shanghai 200433, China
  2. Shanghai Center for Bioinformation Technology, Shanghai 201203, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-02-20 Published:2008-02-20
  • Contact: ZHU Yangyong

Abstract: How biological sequence data are stored is critical to accessing and processing them efficiently. Existing database management systems do not efficiently support a biological sequence data type or its operations, so users must either store sequences in a text data type within the database management system or work with text files directly. As a result, biological sequence data are processed inefficiently. This paper investigates the features of biological sequence data, analyzes and summarizes the query requirements, and presents a novel biological sequence data model named BioSeg. The model consists of a description part and a multi-dimensional array. The description part represents annotations and other related information about the sequence data, while the multi-dimensional array stores the concrete sequence (for example, the DNA sequence "ATCCCGA"). Algebraic operations defined on BioSeg implement queries over biological sequence data. Querying through BioSeg is more efficient and feasible than the previous approach of storing sequences as text.

Key words: biological sequence, database management system (DBMS), data model, bioinformatics
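
The two-part structure described in the abstract (a description part plus a multi-dimensional array) can be illustrated with a minimal sketch. The abstract does not specify the array layout or the algebra, so everything below is an assumption made for illustration: the class name `BioSegRecord`, the one-hot layout (one row per position, one column per symbol), and the `subsequence` and `count` operations are hypothetical and are not the definitions given in the paper.

```python
from dataclasses import dataclass

# Assumed alphabet; the abstract only gives a DNA example.
DNA_ALPHABET = "ACGT"


@dataclass
class BioSegRecord:
    """Hypothetical record: a description part plus an array part."""

    description: dict  # annotations and other related information
    array: list        # assumed layout: one one-hot row per sequence position

    @classmethod
    def from_text(cls, sequence, **annotations):
        """Build a record from a plain-text sequence and keyword annotations."""
        rows = []
        for base in sequence:
            if base not in DNA_ALPHABET:
                raise ValueError(f"unsupported symbol: {base!r}")
            rows.append([1 if base == symbol else 0 for symbol in DNA_ALPHABET])
        return cls(description=dict(annotations), array=rows)

    def to_text(self):
        """Decode the array part back into a plain-text sequence."""
        return "".join(DNA_ALPHABET[row.index(1)] for row in self.array)

    def subsequence(self, start, end):
        """Illustrative selection: positions start..end-1, description kept."""
        return BioSegRecord(dict(self.description), self.array[start:end])

    def count(self, symbol):
        """Illustrative aggregate: occurrences of one symbol, via a column sum."""
        column = DNA_ALPHABET.index(symbol)
        return sum(row[column] for row in self.array)


# The example sequence comes from the abstract; the annotations are made up.
record = BioSegRecord.from_text("ATCCCGA", kind="DNA", note="example")
```

In this sketch a query such as `record.count("C")` reduces to a column sum over the array rather than a scan of a text field, which is one way to picture why an array representation might suit sequence queries better than a text type.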

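Abstract (translated from the Chinese version): The representation and storage of biological sequence data are key to processing such data. Current database management systems cannot effectively support biological sequence data types and operations, so users have to store biological sequence data in text data types or directly in text files. This situation makes data processing tasks such as biological sequence alignment and pattern discovery inefficient. This paper studies the characteristics of biological sequence data, analyzes and summarizes users' query requirements for such data, and proposes a new biological sequence data model, BioSeg. The BioSeg model consists of a description part and a multi-dimensional array: the description part represents sequence annotations and other related information, and the multi-dimensional array represents the concrete sequence (for example, the DNA sequence "ATCCCGTA"). The BioSeg model provides algebraic operations for querying biological sequence data. Compared with storing biological sequence data as text, the data queries provided by the BioSeg model offer good efficiency and flexibility.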
