Authors: LIU Min (刘敏)[1], HUANG Yixiao (黄倚霄)[1], CHEN Zhiyang (陈智扬), ZHANG Zhanmei (张湛梅)[1]
Affiliation: [1] China Mobile Communications Group Guangdong Co., Ltd., Guangzhou 510623, Guangdong, China
Source: Modern Information Technology (《现代信息科技》), 2025, No. 3, pp. 140-145, 152 (7 pages)
Abstract: Customer data held on traditional small and medium enterprises is disorderly and lacks standardization. To address this, an innovative data governance technology is proposed. The technology integrates multi-source heterogeneous data, combines Optical Character Recognition (OCR) with other methods, and constructs a standardized basic-information data lake for small and medium enterprises, improving data quality at the source. It introduces the concept of "entropy decrease" and uses intelligent algorithms to evaluate data quality quantitatively, so that data quality problems can be located and resolved promptly. In addition, a time series database is built and a Markov chain model based on entropy decrease is constructed to predict future data quality trends and to target governance at potential problem areas. The technology not only maximizes the value of the data but also significantly reduces governance cost, improves the efficiency and accuracy of data governance, and provides strong support for enterprises seeking to cut costs and raise efficiency.
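Keywords: entropy decrease; data governance; Markov chain; SME data lake; time series database
Classification: TP311.1 [Automation and Computer Technology: Computer Software and Theory]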
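The abstract describes two steps that a short sketch may make more concrete: scoring data quality with an entropy measure, and forecasting quality trends with a Markov chain. The Python below is only an illustration under stated assumptions, not the authors' method: the record does not give their entropy definition, state space, or model details. The function names (`shannon_entropy`, `to_state`, `transition_matrix`, `predict`), the two-state discretization, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    Assumption: a field whose values are spread over many inconsistent
    variants scores higher than a well-standardized field.
    """
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def to_state(entropy, threshold):
    """Hypothetical discretization: 0 = acceptable quality, 1 = needs attention."""
    return 0 if entropy <= threshold else 1


def transition_matrix(states, n_states):
    """Estimate a row-stochastic transition matrix from an observed state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for i, row in enumerate(counts):
        row_total = sum(row)
        if row_total == 0:
            # State never observed as a source: assume it persists (illustrative choice).
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / row_total for c in row])
    return matrix


def predict(distribution, matrix, steps=1):
    """Propagate a state distribution `steps` transitions forward."""
    for _ in range(steps):
        distribution = [
            sum(distribution[i] * matrix[i][j] for i in range(len(distribution)))
            for j in range(len(matrix))
        ]
    return distribution


if __name__ == "__main__":
    # Hypothetical periodic snapshots of one field (e.g. a company-type column).
    snapshots = [
        ["Ltd", "Ltd", "Ltd", "Ltd"],
        ["Ltd", "ltd.", "LTD", "Limited"],
        ["Ltd", "Ltd", "Ltd", "ltd."],
    ]
    history = [to_state(shannon_entropy(s), threshold=1.0) for s in snapshots]
    matrix = transition_matrix(history, n_states=2)
    print(history, predict([1.0, 0.0], matrix, steps=1))
```

In a real pipeline the snapshot history would presumably come from the time series database mentioned in the abstract, and the entropy score and state definitions would need to follow the paper itself.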