High-resolution SNP Ancestry Inference Model and Efficiency Evaluation in Three East Asian Populations
Affiliation:

1) School of Computer Science, Shaanxi Normal University, Xi'an 710119, China; 2) Physical Evidence Evaluation Center of the Ministry of Public Security, Beijing 100038, China; 3) Key Laboratory of Phylogeny and Comparative Genomics of Jiangsu Province, Xuzhou 221116, China; 4) School of Forensic Medicine, Shanxi Medical University, Taiyuan 030001, China

Fund Project:

This work was supported by grants from the Key Research and Development Program of Shanxi Province (2018SF-251), the National Natural Science Foundation of China (81772027), the Open Projects of National Engineering Laboratory (2018NELKFKT15), the Open Projects of the Key Laboratory of Forensic Genetics of the Ministry of Public Security (2020FGKFKT01), and the Major Projects of Universities in Jiangsu Province (17KJA180003).

    Abstract:

    Single nucleotide polymorphism (SNP) profiling is a commonly used tool for individual identification and ancestry inference in forensic genetics. In this study, we collected ancestry-informative SNPs (AISNPs) from the literature and public databases and applied softmax regression, support vector machine and random forest classifiers to infer the ancestry of Northern Han, Japanese and Korean individuals, the three major populations of northern East Asia. We analyzed 428 AISNPs in 103 Northern Han and 104 Japanese samples from the 1000 Genomes Project and 100 Korean samples from the Asian Diversity Project. Multiple linear regression collinearity diagnostics and random forest mean decrease accuracy were used to screen and optimize highly informative AISNP combinations for the linear and nonlinear prediction models, respectively. We constructed softmax regression and support vector machine discriminant models with a 67-plex AISNP panel and a random forest discriminant model with a 42-plex AISNP panel, which discriminated Northern Han, Japanese and Korean individuals with high accuracy. In five repetitions of 10-fold cross-validation, the accuracies of the softmax regression, support vector machine and random forest models were 95.19%, 95.77% and 94.53%, respectively. The 67-plex and 42-plex AISNP prediction models established in this study can be used for ancestry inference among the three major populations of northern East Asia and have high practical value.
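
    The repeated cross-validation protocol reported in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code: the function names `repeated_stratified_kfold` and `mean_cv_accuracy`, the round-robin fold assignment, the random seed and the population label strings are all assumptions made for illustration. Only the class sizes (103, 104 and 100 samples) and the 5 x 10-fold design come from the abstract.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV.

    Hypothetical helper: each repeat reshuffles the samples within every class
    and deals them round-robin into folds, so class proportions stay roughly
    constant across folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0  # continue the round-robin across classes to balance fold sizes
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            for pos, idx in enumerate(members):
                folds[(pos + offset) % n_splits].append(idx)
            offset += len(members)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


def mean_cv_accuracy(fit, predict, X, y, **cv_kwargs):
    """Average per-fold accuracy; `fit` and `predict` are caller-supplied callables."""
    scores = []
    for _, _, train_idx, test_idx in repeated_stratified_kfold(y, **cv_kwargs):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Class sizes taken from the abstract; the label strings are illustrative.
labels = ["Northern Han"] * 103 + ["Japanese"] * 104 + ["Korean"] * 100
splits = list(repeated_stratified_kfold(labels))
```

    Any classifier can be plugged in through the `fit` and `predict` callables, for example a softmax regression, support vector machine or random forest model from a machine learning library, with each sample encoded as a vector of AISNP genotypes. Within each repeat every sample is held out exactly once, so the mean over the 50 folds corresponds to the kind of accuracy figure the abstract reports.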

Get Citation

WEN Hao, WEI Yi-Liang, GUO Xiao-Yuan, SUN Chang-Chun, XUE Si-Yao, LIU Jing, FAN Hong, JIANG Li. High-resolution SNP Ancestry Inference Model and Efficiency Evaluation in Three East Asian Populations[J]. Progress in Biochemistry and Biophysics,2021,48(8):973-981

History
  • Received: September 22, 2020
  • Revised: December 31, 2020
  • Accepted: January 07, 2021
  • Online: August 24, 2021
  • Published: August 20, 2021