Technology term recognition with comprehensive constituency parsing


Authors: ZHU Junjie (朱俊杰); YU Li (余丽); LI Shengwen (李圣文); ZHOU Changzheng (周长征)

Affiliations: [1] School of Computer Science, China University of Geosciences (Wuhan), Wuhan, Hubei 430078, China; [2] Center for Strategic Research on Frontier and Interdisciplinary Engineering Science and Technology (Beijing Institute of Technology), Beijing 100081, China; [3] Shiyan Juneng Electric Power Design Company Limited, Shiyan, Hubei 442012, China

Source: Journal of Computer Applications (《计算机应用》), 2024, No. 4, pp. 1072-1079 (8 pages)

Funding: National Natural Science Foundation of China (42071382).

Abstract: Technology terms are the terminology used to communicate information accurately in science and technology. Automatically recognizing technology terms in text helps experts and the public discover, understand, and apply new technologies, which is of great value; however, existing unsupervised recognition methods suffer from complex rules and poor adaptability. To improve the recognition of technology terms in text, an unsupervised technology term recognition method based on comprehensive constituency parsing is proposed. First, a syntactic structure tree is constructed through constituency parsing. Second, candidate technology terms are extracted from both top-down and bottom-up perspectives. Finally, statistical frequency and semantic information are combined to select the best technology terms. In addition, a technology term dataset was constructed to validate the proposed method. Experimental results on this dataset show that the proposed bottom-up method improves the F1 score by 4.55 percentage points over the dependency-based method. A case study in the field of 3D printing further shows that the technology terms recognized by the proposed method are consistent with the development of the field; they can be used to trace the development history of a technology and depict its evolution path, providing a reference for understanding, discovering, and exploring future technologies in the field.
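The extraction steps named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy parse tree is hand-written rather than produced by a parser, "top-down" is read as keeping the first (maximal) noun phrase met on each path from the root, "bottom-up" is read as keeping the lowest noun phrases that contain no nested noun phrase, and the function names `top_down`, `bottom_up`, and `rank_by_frequency` are hypothetical. The semantic scoring step is left out because the abstract does not say how it is computed.

```python
from collections import Counter

# Toy constituency tree as nested tuples: (label, child, ...); leaves are strings.
# Hand-written for illustration, not the output of a real parser.
TREE = (
    "S",
    ("NP",
     ("NP", ("JJ", "selective"), ("NN", "laser"), ("NN", "sintering")),
     ("PP", ("IN", "of"), ("NP", ("NN", "metal"), ("NN", "powder")))),
    ("VP", ("VBZ", "uses"), ("NP", ("NN", "laser"), ("NN", "sintering"))),
)


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def contains_label(node, label):
    """True if any proper descendant of node carries the given label."""
    if isinstance(node, str):
        return False
    return any(
        not isinstance(c, str) and (c[0] == label or contains_label(c, label))
        for c in node[1:]
    )


def top_down(node, label="NP"):
    """Assumed reading: stop at the first (maximal) labelled phrase on each path."""
    if isinstance(node, str):
        return []
    if node[0] == label:
        return [" ".join(leaves(node))]
    found = []
    for child in node[1:]:
        found.extend(top_down(child, label))
    return found


def bottom_up(node, label="NP"):
    """Assumed reading: keep the lowest labelled phrases (no nested same-label phrase)."""
    if isinstance(node, str):
        return []
    found = []
    for child in node[1:]:
        found.extend(bottom_up(child, label))
    if node[0] == label and not contains_label(node, label):
        found.append(" ".join(leaves(node)))
    return found


def rank_by_frequency(candidates):
    """Order candidates by raw count; a stand-in for the statistical-frequency step."""
    return Counter(candidates).most_common()


if __name__ == "__main__":
    print(top_down(TREE))
    print(bottom_up(TREE))
    print(rank_by_frequency(bottom_up(TREE) + top_down(TREE)))
```

On this toy tree the top-down pass yields longer spans (the whole subject phrase), while the bottom-up pass yields shorter, more term-like spans, which is one plausible reason the two views are treated separately.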

Keywords: technology term recognition; constituency parsing; unsupervised method; constituency parse tree; term extraction

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
