Identification of DNA-binding Proteins Using Gapped-dipeptide Composition and Recursive Feature Elimination Algorithm
Affiliation:

Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), Ministry of Agriculture, College of Food Science and Technology; College of Information Technology, Shanghai Ocean University; Shanghai Center for Bioinformation Technology

Fund Project:

This work was supported by grants from the National Natural Science Foundation of China (31671946, 11601324) and the Shanghai Municipal Science and Technology Commission Foundation (17050502200).

    Abstract:

    The identification of DNA-binding proteins (DBPs) plays an important role in the functional annotation of genes and proteins in prokaryotic and eukaryotic organisms. This study, for the first time, combined gapped-dipeptide composition (GapDPC) with recursive feature elimination (RFE) to identify DBPs. A position-specific scoring matrix (PSSM) was obtained for each amino acid sequence tested. GapDPC features were extracted from the PSSM, and the optimal features were then selected using the RFE method. A support vector machine (SVM) was used as the classifier, and the datasets PDB396 and LB1068 were evaluated with the jackknife cross-validation test. For PDB396, the accuracy, Matthews correlation coefficient, sensitivity, and specificity were 93.43%, 0.86, 89.04%, and 96%, respectively; for LB1068, they were 86.33%, 0.73, 86.49%, and 86.18%. These results were superior to those of methods previously reported in the literature. The model established in this study thus improves on existing methods for identifying DBPs.
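
As a rough illustration of two steps named in the abstract, the sketch below computes a PSSM-based gapped-dipeptide feature vector and the four reported evaluation measures. The abstract does not give the GapDPC formula, so the definition used here (the average, over all residue pairs separated by `gap` positions, of the product of their PSSM scores, giving 400 features per gap) is an assumption based on a common formulation and may differ from the authors' own. The function names `gap_dpc` and `binary_metrics` and the toy inputs are hypothetical. The RFE selection and SVM training steps are not shown.

```python
import math


def gap_dpc(pssm, gap):
    """Return 400 gapped-dipeptide features from an L x 20 PSSM.

    Assumed definition (not taken from the paper): feature (i, j) is the
    mean over positions k of pssm[k][i] * pssm[k + gap + 1][j].
    """
    span = gap + 1                     # distance between the paired residues
    n_pairs = len(pssm) - span         # number of residue pairs at this gap
    if n_pairs <= 0:
        raise ValueError("sequence too short for this gap")
    feats = [0.0] * 400
    for k in range(n_pairs):
        row_a, row_b = pssm[k], pssm[k + span]
        for i in range(20):
            for j in range(20):
                feats[i * 20 + j] += row_a[i] * row_b[j]
    return [f / n_pairs for f in feats]


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, MCC, sensitivity and specificity from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "sensitivity": tp / (tp + fn),   # fraction of true DBPs recovered
        "specificity": tn / (tn + fp),   # fraction of non-DBPs rejected
    }


# Toy usage with made-up numbers (not data from the study).
toy_pssm = [[0.0] * 20 for _ in range(3)]
toy_pssm[0][0] = toy_pssm[1][1] = toy_pssm[2][2] = 1.0
features = gap_dpc(toy_pssm, gap=0) + gap_dpc(toy_pssm, gap=1)
scores = binary_metrics(tp=8, tn=9, fp=1, fn=2)
```

In a jackknife (leave-one-out) test, each sequence is held out once, the model is trained on the rest, and the confusion counts accumulated over all held-out predictions are what a function like `binary_metrics` would summarize.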

Get Citation

TANG Ya-Dong, LIU Xiao, LIU Tai-Gang, XIE Lu, CHEN Lan-Ming. Identification of DNA-binding Proteins Using Gapped-dipeptide Composition and Recursive Feature Elimination Algorithm[J]. Progress in Biochemistry and Biophysics, 2018, 45(4): 453-459.

History
  • Received: November 07, 2017
  • Revised: February 09, 2018
  • Accepted: March 09, 2018
  • Online: April 19, 2018
  • Published: April 20, 2018