Analysis and Improvement on Retrieval Methods for Traceability Links between Source Code and Documentation

Original title: 代码与文档间关联关系的提取方法研究和改进

Cited by: 3

Authors: Lai Guanhui (赖冠辉)[1], Wang Xiaobo (王晓博)[1], Liu Chao (刘超)[1]

Affiliation: [1] School of Computer Science, Beihang University (北京航空航天大学计算机学院), Beijing 100191

Source: Acta Electronica Sinica (《电子学报》), 2009, Issue B04, pp. 22-30 (9 pages)

Funding: National 863 Program (No. 2006AA01Z176); National Natural Science Foundation of China (No. 90718018)

Abstract: Building on the latent semantic model and incorporating the characteristics of software documentation and program code, this paper proposes four improvement strategies: code clustering based on class inheritance relationships, classified weighting of code feature terms, the introduction of a similarity thesaurus, and classified search based on document type. Experimental results show that the four strategies raise precision by about 15% while keeping recall unchanged. This indicates that, when extracting traceability links between code and documentation, taking their inherent characteristics into account helps improve the recall and precision of the retrieval system.

Software documentation is usually written in natural language as free text, and it captures a large amount of useful information. Establishing traceability links between documentation and source code can be helpful in software engineering management. Currently, the recovery of traceability links is mostly based on information retrieval techniques, e.g., the probabilistic model, the vector space model, and Latent Semantic Indexing (LSI). However, previous work treats documentation and source code only as plain text files, without considering their software engineering features. Four enhancement strategies are proposed to improve the traditional LSI method based on the features of software documentation and source code, namely source code clustering, identifier classification, a similarity thesaurus, and hierarchical structure enhancement. Experimental results show that the four enhancement strategies can increase precision by about 15%. The special characteristics of documentation and source code should therefore be considered carefully when recovering traceability links between them.

Keywords: information retrieval; traceability links; program comprehension; reverse engineering

Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]

 
