%0 Journal Article
%T Prediction of Protein Expression and Growth Rates by Supervised Machine Learning
%A Simiao Zhao
%J Natural Science
%P 301-330
%@ 2150-4105
%D 2021
%I Scientific Research Publishing
%R 10.4236/ns.2021.138025
%X The DNA sequences of an organism have an important influence on its transcription and translation processes, thus affecting its protein production and growth rate. Due to the complexity of DNA, it has been extremely difficult to predict the macroscopic characteristics of organisms. However, with the rapid development of machine learning in recent years, it has become possible to use powerful machine learning algorithms to process and analyze biological data. Based on the synthetic DNA sequences of a specific microbe, E. coli, I designed a process to predict its protein production and growth rate. By observing the properties of a data set constructed in previous work, I chose to use supervised learning regressors with encoded DNA sequences as input features to perform the predictions. After comparing different encoders and algorithms, I selected three encoders to encode the DNA sequences as inputs and trained seven different regressors to predict the outputs. The hyperparameters were optimized for the three regressors with the best potential prediction performance. Finally, I successfully predicted the protein production and growth rates, with best R² scores of 0.55 and 0.77, respectively, by using encoders to capture the potential features of the DNA sequences.
%K DNA Sequences
%K Protein Production
%K Growth Rate
%K Supervised Machine Learning
%U http://www.scirp.org/journal/PaperInformation.aspx?PaperID=111035