Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (2): 260-273. DOI: 10.3778/j.issn.1673-9418.1901069


Application of Multi-Layered Gradient Boosting Decision Trees in Pharmaceutical Classification

DU Shishuai, QIU Tian, LI Lingqiao, HU Jinquan, ZHENG Anbing, FENG Yanchun, HU Changqin, YANG Huihua   

  1. School of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China
  3. College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  4. National Institutes for Food and Drug Control, Beijing 100050, China
  • Online: 2020-02-01 Published: 2020-02-16





Near-infrared spectroscopy is highly effective in pharmaceutical analysis. Near-infrared data sets, however, are typically small, high-dimensional, and non-linear. On such data, traditional drug identification algorithms have limited feature-learning ability, neural-network-based methods suffer from local optima and over-fitting, and these methods tend to ignore sample imbalance. To address these shortcomings, a pharmaceutical classification approach is proposed that uses multi-layered gradient boosting decision trees combined with feature selection and cost-sensitive learning (CS_FGBDT). First, the raw data are preprocessed by Savitzky-Golay smoothing and first-derivative computation. Second, a random forest adaptively extracts features from the preprocessed spectra, and the feature map is constructed by multi-layered gradient boosting trees. Finally, cost-sensitive learning is incorporated to minimize the negative effect of sample imbalance. In comparative experiments on two imbalanced data sets, one of capsules and one of tablets, the model achieves higher prediction accuracy and stability, indicating that it is an effective method for drug identification.
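
Two of the steps named in the abstract, Savitzky-Golay preprocessing and cost-sensitive weighting, can be illustrated with a minimal sketch. The abstract does not state the window length, polynomial order, or cost scheme, so the 5-point quadratic kernels and the inverse-frequency class weights below are assumptions chosen for illustration, and the function names `sg_filter` and `balanced_class_weights` are hypothetical. The random forest feature selection and the multi-layered gradient boosting trees are not covered here.

```python
from collections import Counter

# Standard Savitzky-Golay convolution coefficients for a 5-point window
# and a quadratic fit (assumed settings; the abstract does not give them).
SG_SMOOTH_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
SG_DERIV_5 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]


def sg_filter(spectrum, coeffs):
    """Apply a Savitzky-Golay kernel to the interior points of a spectrum.

    Edge points without a full window are dropped, so the output is
    len(coeffs) - 1 samples shorter than the input. The derivative kernel
    assumes unit spacing between samples.
    """
    half = len(coeffs) // 2
    return [
        sum(c * spectrum[i + j - half] for j, c in enumerate(coeffs))
        for i in range(half, len(spectrum) - half)
    ]


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    This is one common cost-sensitive scheme, assumed here; rarer classes
    receive larger weights so their errors cost more during training.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy data: a linear "spectrum" and an imbalanced label set.
toy_spectrum = [2.0 * x for x in range(10)]
toy_labels = ["majority"] * 8 + ["minority"] * 2

smoothed = sg_filter(toy_spectrum, SG_SMOOTH_5)
first_derivative = sg_filter(toy_spectrum, SG_DERIV_5)
weights = balanced_class_weights(toy_labels)
```

Such weights could then be passed as per-sample weights to a tree-based learner, so that misclassifying a minority-class sample is penalized more heavily than misclassifying a majority-class one.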

Key words: near-infrared spectroscopy, adaptive feature selection, multi-layered gradient boosting decision trees, cost-sensitive learning


