Authors: Fahriye Gemci, Turgay Ibrikci, Ulus Cevik
Affiliations: [1] Kahramanmaras Sutcu Imam University, Kahramanmaras, 46100, Turkey; [2] Adana Alparslan Turkes Science and Technology University, Adana, 01250, Turkey; [3] Çukurova University, Adana, 01330, Turkey
Source: Computer Systems Science & Engineering, 2023, Issue 9, pp. 3703-3713 (11 pages); listed in Chinese as 计算机系统科学与工程 (English-language journal)
Funding: This study was supported by Cukurova University Scientific Research Projects (BAP) under Project No. FDK-2019-11621.
Abstract: This study aims to use computer algorithms to detect remote homologous proteins, a significant problem in bioinformatics. In this experimental study, Structural Classification of Proteins (SCOP) 1.53, the SCOP benchmark, and a newly created SCOP protein database built from Structural Classification of Proteins, extended (SCOPe) 2.07 were used to detect remote homolog proteins. The n-gram method followed by Term Frequency-Inverse Document Frequency (TF-IDF) weighting was applied to extract features from the protein sequences taken from these databases. Next, a smoothing process was applied to the obtained features to avoid misclassification. Finally, the proteins with balanced features were classified as remote homologs using the deep learning architecture built for this study. As a result, remote homologous proteins were detected with a novel deep learning architecture using both negative and positive protein instances, with a mean accuracy of 89.13% and a mean relative operating characteristic (ROC) score of 88.39%. The experiments demonstrated: 1) that success in detecting remote homology is promising for discovering new proteins and thus for drug discovery in medicine; 2) that natural language processing (NLP) techniques can be applied successfully in bioinformatics; 3) the importance of choosing the correct n-value in the n-gram process; 4) the necessity of using negative as well as positive instances in a classification problem; 5) how strongly processes such as smoothing affect classification accuracy on an imbalanced dataset; and 6) that the deep learning architecture gives better results than the support vector machine (SVM) model on the smoothed data for detecting remote protein homology.
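The feature-extraction step described in the abstract (n-grams followed by TF-IDF weighting) can be illustrated with a minimal sketch. The record does not state which TF-IDF variant, n-value, or tooling the authors used, so everything here is an assumption: the function names `ngrams` and `tfidf` are hypothetical, the weighting is the plain tf × log(N/df) form, and the toy sequences are invented for illustration. The smoothing and deep learning stages are not sketched.

```python
import math
from collections import Counter


def ngrams(sequence, n):
    """Return the overlapping length-n substrings of a protein sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def tfidf(sequences, n=3):
    """Weight the n-grams of each sequence, treating each sequence as a document.

    Assumed variant: tf = count / total n-grams in the sequence,
    idf = log(number of sequences / number of sequences containing the n-gram).
    """
    docs = [Counter(ngrams(s, n)) for s in sequences]
    num_docs = len(docs)

    # Document frequency: in how many sequences each n-gram occurs.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    weights = []
    for counts in docs:
        total = sum(counts.values())
        weights.append({
            gram: (c / total) * math.log(num_docs / df[gram])
            for gram, c in counts.items()
        })
    return weights


# Toy sequences, invented for illustration only (not from SCOP or SCOPe).
toy = ["MKVLAAGIV", "MKVLSAGIV", "GGSGGSGGS"]
features = tfidf(toy, n=3)
```

Under this variant, an n-gram that occurs in every sequence gets weight zero, while n-grams specific to one sequence are weighted up, which is one reason the choice of n matters: short n-grams tend to appear everywhere and carry little discriminative weight.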
Keywords: bioinformatics; deep learning; n-gram; remote homolog protein; text classification; TF-IDF weighting
Classification code: TP31 [Automation and Computer Technology: Computer Software and Theory]