Authors: 王轶博 (WANG Yibo); 潘伟 (PAN Wei); 张海涛 (ZHANG Haitao); 江涛 (JIANG Tao)
Affiliations: [1] Tobacco Economy Information Center, State Tobacco Monopoly Administration, Beijing 100045, China; [2] Information Center, China Tobacco Hubei Industrial Co., Ltd., Wuhan 430040, China; [3] Technology Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming 650231, China
Source: Tobacco Science & Technology (《烟草科技》), 2024, No. 9, pp. 106-112 (7 pages)
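Funding: Key R&D Project of China National Tobacco Corporation, "Research on the Integration and Innovation of New-Generation Information Technology and Cyberspace Governance" (No. 110202102049).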
Abstract: To meet the information classification and coding needs of the National Tobacco Production, Operation and Management Integrated Platform, information systems were decomposed into three types of digital objects: "process", "entity", and "service". Based on the actual business of the tobacco industry, a Hierarchical Mutual Information Clustering (HMIC) algorithm was proposed. Text data are first processed with natural language processing, the mutual information between digital objects is then calculated at each classification level, and hierarchical clustering is applied to group the digital objects. This yields an information classification for the tobacco industry, on which information coding is then based. HMIC was compared with commonly used clustering algorithms. The results showed that: 1) HMIC gave the best classification performance, with an overall information entropy about 8.2% lower than that of a clustering algorithm using Euclidean distance and about 2.5% lower than that of a clustering algorithm using only the mutual information matrix. 2) Studying classification and coding from the perspective of information content better distinguishes the differences between categories and improves the usability of the resulting classification and codes. The technique can support the construction of information system projects throughout their life cycle.
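The abstract names the ingredients of HMIC (mutual information between digital objects, followed by hierarchical clustering) but does not give the algorithm itself. The sketch below is therefore not the authors' method. It is a minimal illustration of the general idea, assuming each digital object is represented by a binary vector marking which vocabulary terms occur in its description. The object names, the toy vectors, and the choice of average linkage are all hypothetical.

```python
import math
from itertools import combinations


def mutual_information(x, y):
    """Mutual information in bits between two equal-length binary sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for u, v in zip(x, y) if u == a and v == b) / n
            if p_ab > 0:
                p_a = sum(1 for u in x if u == a) / n
                p_b = sum(1 for v in y if v == b) / n
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def hierarchical_clusters(objects, k):
    """Agglomerative clustering that repeatedly merges the two clusters
    with the highest average pairwise mutual information until k remain."""
    names = list(objects)
    mi = {
        frozenset((a, b)): mutual_information(objects[a], objects[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean mutual information over all cross-cluster pairs.
        total = sum(mi[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical toy data: term-occurrence vectors over a 12-term vocabulary.
objects = {
    "leaf_purchase_process": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "leaf_supplier_entity":  [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "retail_order_process":  [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    "retail_outlet_entity":  [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
}

print(hierarchical_clusters(objects, 2))
```

One design caveat: mutual information is also high for strongly anti-correlated indicators, so a representation like this only groups objects sensibly when the vocabulary contains terms that neither group uses. How the paper handles this, and how it combines mutual information across classification levels, is not stated in the abstract.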