Deep-level Mining Methods for Multi-source Heterogeneous Data in Software Development Information Repository

Author: Yu Ping (Guangzhou Huanan Business College, Guangzhou 510000, Guangdong)

Affiliation: [1] Guangzhou Huanan Business College, Guangzhou 510000, Guangdong

Source: Journal of Wuhan Engineering Institute (《武汉工程职业技术学院学报》), 2024, No. 1, pp. 36-41 (6 pages)

Abstract: Software development involves collaboration among multiple teams and individuals, so project documents often contain inconsistencies, errors, or omissions. If these problems are not discovered and addressed promptly, they can seriously impair the efficiency and quality of software development. To retrieve the required data accurately and to improve developer productivity and development speed, a deep-level mining method for multi-source heterogeneous data in software development information repositories is proposed. First, missing values and noisy data in the multi-source heterogeneous data gathered from different sources are handled on the basis of time series. Next, features are extracted from the processed data and fed into a self-organizing feature map (SOM) neural network, which clusters the data. The ATPRK method is then used to predict the demand for multi-source heterogeneous data in the software information repository, and on that basis the clusters output by the SOM network are clustered again, achieving deep-level mining of the data. Experimental results indicate that the method can mine 99% of the multi-source heterogeneous data in the software development information repository and effectively removes unneeded data; clustering accuracy is highest when the number of clusters is 16, and the minimum clustering entropy is only 0.31, indicating good deep-level mining performance.
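
The abstract names the stages but gives no implementation detail, so the following is only a rough sketch of what the preprocessing, SOM clustering, and entropy evaluation steps might look like, not the author's method. Everything in it is an assumption made for illustration: the function names (`fill_missing`, `smooth`, `train_som`, `assign`, `clustering_entropy`) are hypothetical, linear interpolation and a moving median stand in for the unspecified time-series handling of missing values and noise, the 4 x 4 map is chosen only because it yields 16 clusters, and the entropy is one common size-weighted definition that may differ from the one used in the paper. The ATPRK prediction and re-clustering step is not sketched because the abstract does not describe it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def fill_missing(series):
    """Assumed stand-in: fill NaN gaps in a 1-D series by linear interpolation."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    known = ~np.isnan(series)
    return np.interp(idx, idx[known], series[known])


def smooth(series, window=3):
    """Assumed stand-in: damp isolated noise spikes with a centered moving median."""
    series = np.asarray(series, dtype=float)
    padded = np.pad(series, window // 2, mode="edge")  # keep output length (odd window)
    return np.median(sliding_window_view(padded, window), axis=1)


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns a (rows*cols, dim) weight matrix."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Initialize unit weights from randomly chosen samples.
    weights = data[rng.integers(0, n, rows * cols)].astype(float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-2     # neighborhood radius shrinks
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))  # best-matching unit
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def assign(data, weights):
    """Map each sample to the index of its nearest SOM unit (its cluster id)."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def clustering_entropy(cluster_ids, labels):
    """Size-weighted mean Shannon entropy (bits) of the labels inside each cluster."""
    cluster_ids, labels = np.asarray(cluster_ids), np.asarray(labels)
    total = 0.0
    for c in np.unique(cluster_ids):
        members = labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        total += (members.size / labels.size) * -(p * np.log2(p)).sum()
    return total
```

In such a sketch, each raw series would pass through `fill_missing` and `smooth` before feature extraction, the resulting feature matrix would go to `train_som`, and `assign` would give a cluster id per record. Where reference labels are available, `clustering_entropy` returns 0 when every cluster contains a single label and grows as clusters become more mixed, so lower values indicate purer clusters.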

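Keywords: software development; multi-source heterogeneous data; data mining; data preprocessing; feature extraction; data clustering; SOM neural network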

Classification No.: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
