Literature mining for non-coding base sequence


Authors: An Jianfu (安建福)[1], Meng Lili (孟丽莉)[1]

Affiliation: [1] Information Center, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200127

Source: Journal of Shanghai Jiao Tong University (Medical Science), 2013, No. 10, pp. 1343-1347 (5 pages)

Abstract: Objective: To improve the recall and precision of retrieving literature on non-coding base sequences by applying a neural network algorithm. Methods: Sample articles were selected from the PubMed database. After the samples were preprocessed, feature terms were selected with the term frequency (TF) × inverse document frequency (IDF) method, and a retrieval model based on the back-propagation (BP) neural network algorithm was built. Results: With 100 feature terms selected, precision was 91.49%, recall 71.23%, area under the receiver operating characteristic curve (ROC-AUC) 0.823, specificity 93.37%, sensitivity 71.23%, and accuracy 82.30%. Conclusion: Compared with common approaches such as keyword and MeSH term searching, the method retrieves topic-related literature both precisely and comprehensively.
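The abstract names the steps (TF × IDF term selection, then a BP network) but gives no implementation details. The Python sketch below is one possible reading, not the authors' code. It assumes that each term is scored by summing TF × IDF over the corpus, that documents become binary term-presence vectors, and that the network has a single sigmoid hidden layer trained with a squared-error loss. The names `select_terms`, `BPNet`, `metrics`, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def select_terms(docs, k):
    """Score each term by summed TF x IDF over the corpus; keep the top k."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    score = Counter()
    for d in docs:
        for t, c in Counter(d).items():
            score[t] += (c / len(d)) * math.log(n / df[t])
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(score.items(), key=lambda x: (-x[1], x[0]))
    return [t for t, _ in ranked[:k]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNet:
    """One-hidden-layer network trained by plain back-propagation (assumed layout)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train(self, X, Y, epochs=300, lr=0.5):
        for _ in range(epochs):
            for x, t in zip(X, Y):
                h, y = self.forward(x)
                d_out = (y - t) * y * (1 - y)  # output delta, squared error
                d_hid = [d_out * self.w2[j] * h[j] * (1 - h[j])
                         for j in range(len(h))]
                for j, v in enumerate(h + [1.0]):
                    self.w2[j] -= lr * d_out * v
                for j, row in enumerate(self.w1):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hid[j] * v


def metrics(tp, fp, tn, fn):
    """Precision, recall (sensitivity), specificity, accuracy from counts."""
    ratio = lambda a, b: a / b if b else 0.0
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy corpus (hypothetical): 1 = on-topic, 0 = off-topic.
docs = [s.split() for s in [
    "noncoding rna sequence regulates expression",
    "intron noncoding sequence conservation",
    "long noncoding rna sequence",
    "protein coding gene expression",
    "hospital information system",
    "protein structure prediction",
]]
labels = [1, 1, 1, 0, 0, 0]

terms = select_terms(docs, 5)
X = [[1.0 if t in d else 0.0 for t in terms] for d in docs]
net = BPNet(len(terms), 3)
net.train(X, labels)
preds = [1 if net.forward(x)[1] >= 0.5 else 0 for x in X]

tp = sum(p == 1 and t == 1 for p, t in zip(preds, labels))
fp = sum(p == 1 and t == 0 for p, t in zip(preds, labels))
tn = sum(p == 0 and t == 0 for p, t in zip(preds, labels))
fn = sum(p == 0 and t == 1 for p, t in zip(preds, labels))
print(terms, metrics(tp, fp, tn, fn))
```

The `metrics` helper also allows a rough consistency check on the reported figures. If the test set is assumed to contain equal numbers of relevant and irrelevant articles (the abstract does not say so), a sensitivity of 71.23% and a specificity of 93.37% give an accuracy of 82.30% and a precision of about 91.5%, in line with the reported 82.30% and 91.49%.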

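Keywords: non-coding base sequence; neural network; back-propagation algorithm; term frequency × inverse document frequency (TF × IDF); literature mining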

Classification: R-5 [Medicine and Health]; G252.7 [Culture and Science: Library Science]

 
