Authors: KONG Jiabin (孔嘉斌); LYU Jianwen (吕剑文); LIU Jiangnan (刘江南) [1]; DU Wenxuan (杜文轩)
Affiliation: [1] State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Hunan University, Changsha 410082, China
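Source: Computer Science (《计算机科学》), 2023, No. 7, pp. 229-236 (8 pages)
Funding: Innovation Method Special Project of the Ministry of Science and Technology of China (2019IM050100); Natural Science Foundation of Hunan Province (2018JJ2039).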
Abstract: Mechanical patent documents contain a large amount of domain knowledge in which component names serve as the information units. Component names are worded flexibly and tend to be unique, complex, and uncommon, so computers have difficulty recognizing them accurately, which hinders patent knowledge mining. To recognize component names efficiently, the word-formation features of component names in patent sentences are analyzed and extracted. Starting from the external words associated with component names, the appended drawing reference signs (ADRS) are marked and the name characters to their left are identified, so that candidate names are retrieved from the text automatically and a set of candidate component names is constructed. A character frequency difference algorithm is proposed to filter redundant characters from the candidate set. An algorithm that dynamically builds a left-segmentation library (LSL) is proposed to remove the redundant characters that survive this filtering. Cross-over experiments test and analyze how the choice of the character frequency difference prior threshold (CFDV-Ⅰ), the word frequency threshold (LSWF), and the character frequency difference threshold (CFDV-Ⅱ) affects the recognition results. Together these steps form a three-stage comprehensive method for recognizing component names in Chinese patents in the mechanical field. Finally, a comparative analysis of the experimental results verifies that the method is effective and efficient.
Classification number: TH122 [Mechanical Engineering: Mechanical Design and Theory]
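
The first stage described in the abstract (marking drawing reference signs and collecting the characters to their left as candidate names) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the function name `extract_candidates`, the `max_len` window, and the assumption that a reference sign is a run of digits with an optional trailing letter and optional parentheses are all hypothetical. The character frequency difference and left-segmentation library stages are not sketched because the abstract does not define them.

```python
import re
from collections import Counter


def extract_candidates(text: str, max_len: int = 6) -> Counter:
    """Collect candidate component names found directly left of reference signs.

    Assumptions (not taken from the paper):
    - a reference sign is one or more digits, optionally followed by one Latin
      letter, optionally wrapped in ASCII or full-width parentheses;
    - a candidate is the run of up to `max_len` CJK characters that ends
      immediately before the sign.
    """
    pattern = re.compile(
        rf"([\u4e00-\u9fff]{{1,{max_len}}})[\((]?(\d+[a-zA-Z]?)[\))]?"
    )
    candidates = Counter()
    for match in pattern.finditer(text):
        # group(1) is the left-side character run, group(2) the reference sign
        candidates[match.group(1)] += 1
    return candidates
```

Calling `extract_candidates("所述齿轮箱体12与传动轴3连接")` yields candidates that still carry leading function words such as 所述 and 与. Redundant characters of this kind are what the later filtering stages in the abstract are meant to remove.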