Research on Prediction Method of Mouse Gene Loci Based on Machine Learning
FENG Xin 1, LI Yingrui 1, WANG Ping 1, DONG Zheyuan 1, XIN Ruihao 2*
Abstract
DNA N6-methyladenine (6mA) is an important DNA methylation modification that plays a significant role in many biological regulatory processes. This article uses a publicly available mouse dataset to study this modification. First, the mouse gene sequences (composed of A, T, C, and G) are encoded as numerical representations. Chi-square testing is then applied to the encoded data to select the features related to 6mA sites for further study. Seven machine learning algorithms are used to construct classification models, and the predictive results are validated with five-fold cross-validation. The results show that, with sliding-window encoding and the top 20 features selected as training-set features, the random forest model achieved an accuracy of 1 in predicting mouse 6mA sites.
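The encoding and feature-selection steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot scheme stands in for the encoding, whose exact form the abstract does not give, and the function names `one_hot_encode`, `chi_square_score`, and `select_top_k` are hypothetical. The chi-square score is the standard Pearson statistic for a 2x2 table of binary feature against binary label. Classifier training and five-fold cross-validation are omitted.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"  # assumed column order for the one-hot encoding


def one_hot_encode(seq):
    """Encode a DNA sequence as a flat binary vector, four entries per base.

    Assumption: one-hot encoding is used here only as an illustration; the
    paper's actual encoding schemes are not specified in the abstract.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def chi_square_score(feature_column, labels):
    """Pearson chi-square statistic for a binary feature vs. a binary label."""
    n = len(labels)
    observed = Counter(zip(feature_column, labels))
    score = 0.0
    for f in (0, 1):
        row_total = observed[(f, 0)] + observed[(f, 1)]
        for y in (0, 1):
            col_total = observed[(0, y)] + observed[(1, y)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows/columns (constant features)
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest chi-square score."""
    n_features = len(X[0])
    scores = [chi_square_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy usage with made-up sequences (1 = 6mA site, 0 = non-site).
sequences = ["GAC", "GAT", "GAA", "TAC", "TAT", "TAA"]
labels = [1, 1, 1, 0, 0, 0]
X = [one_hot_encode(s) for s in sequences]
top_features = select_top_k(X, labels, k=2)
```

In this toy data only the first position separates the two classes, so the two selected indices correspond to "G at position 0" and "T at position 0". With real data, `k=20` would mirror the top 20 features mentioned in the abstract, and the reduced feature matrix would then be passed to a classifier such as a random forest.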
Published: 25 November 2022