Prediction of E. coli Promoters Based on CNN
Author: PENG Bao-Cheng, ZHANG Xiao-Wei, LIU Yang, FAN Guo-Liang
Affiliation:

1) School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; 2) Department of Rheumatology, the First Affiliated Hospital, Inner Mongolia Medical University, Hohhot 010050, China

Fund Project:

This work was supported by grants from the National Natural Science Foundation of China (62063024), the Scientific Research Program at Universities of Inner Mongolia Autonomous Region of China (NJZY20005) and the Students Innovation Training Program of Inner Mongolia University (201912240).

    Abstract:

    Objective Prediction models based on the position-specific scoring matrix (PSSM) have achieved good results, and optimization methods built on PSSM continue to be developed. However, their accuracy remains relatively low. To further improve prediction accuracy, this paper investigates a classifier based on the convolutional neural network (CNN) algorithm.

    Methods PSSM is used to convert each letter sequence into a numeric matrix, which is then classified by a CNN. Promoter sequences of three types from Escherichia coli K-12 (hereinafter Escherichia coli), namely Sigma38, Sigma54 and Sigma70, serve as the positive sets, and sequences from the coding and non-coding regions of Escherichia coli serve as the negative set.

    Results In two-class prediction of Escherichia coli promoters, the accuracy reaches 99%, and the success rate of promoter prediction is close to 100%. In three-class classification of Sigma38, Sigma54 and Sigma70 promoters, the overall accuracy is 98%, and the accuracy for each promoter type reaches 0.98 or higher. Finally, four-class classification of the three promoter types together with either coding or non-coding sequences gives an accuracy of 0.98. In ten-fold cross-validation on balanced samples of the Sigma promoters, the prediction accuracy exceeds 0.95, the Hamming distance is 0.016, and the Kappa coefficient is 0.97.

    Conclusion Compared with other classification algorithms such as the support vector machine (SVM), the CNN classification algorithm offers advantages, and its classification strength also allows the coding method to be simplified.
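The PSSM step described in the abstract, turning a letter sequence into a numeric matrix that a CNN can consume, can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so everything here is an assumption: the function names `build_pssm` and `encode`, the uniform background frequency of 0.25, the pseudocount of 1, the log-odds scoring, and the 4 x L layout (one row per base) are hypothetical choices and may differ from the authors' method.

```python
import math

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def build_pssm(sequences, background=0.25, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length training sequences.

    Returns a list with one dict per position, mapping base -> score.
    The background frequency and pseudocount are illustrative defaults.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all training sequences must have the same length")

    pssm = []
    for pos in range(length):
        # Start every base at the pseudocount so no frequency is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2((counts[base] / total) / background)
             for base in ALPHABET}
        )
    return pssm


def encode(sequence, pssm):
    """Encode a sequence as a 4 x L matrix (rows ordered A, C, G, T).

    Each column holds the PSSM score of the observed base in that base's
    row and 0.0 elsewhere. This layout is a hypothetical choice.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    matrix = [[0.0] * len(sequence) for _ in ALPHABET]
    for pos, base in enumerate(sequence):
        matrix[ALPHABET.index(base)][pos] = pssm[pos][base]
    return matrix


# Toy data for illustration only, not taken from the paper.
training = ["ACGT", "ACGA", "ACGT"]
pssm = build_pssm(training)
matrix = encode("ACGT", pssm)
```

A matrix of this shape could be fed to a one-dimensional or two-dimensional convolutional layer as a single-channel input; the network architecture itself is not described in the abstract and is left out of the sketch.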

Get Citation

PENG Bao-Cheng, ZHANG Xiao-Wei, LIU Yang, FAN Guo-Liang. Prediction of E. coli Promoters Based on CNN[J]. Progress in Biochemistry and Biophysics, 2022, 49(7): 1334-1347

History
  • Received: May 13, 2021
  • Revised: July 23, 2021
  • Accepted: October 21, 2021
  • Online: July 20, 2022
  • Published: July 20, 2022