Automatic identification of coordinate structures based on statistics and rules

Cited by: 10

Source: Application Research of Computers (《计算机应用研究》), 2009, No. 9, pp. 3403-3406 (4 pages)

Funding: National "863" Program of China (2006AA01Z147); National Natural Science Foundation of China (60673041)

Abstract: Automatic identification of coordinate structures is a difficult problem in language information processing. This paper identifies the boundaries of coordinate structures by combining a statistical model with rules. First, based on the position of the conjunction, a maximum entropy model identifies the left boundary and the right boundary separately, working from the left and from the right respectively. Then, predefined rules derived from the properties of coordinate structures are applied as post-processing to the automatically identified boundaries, giving the final left and right boundaries. The training and test sets contain 12,396 and 1,219 coordinate structures respectively. Experiments show that the method achieves an F1 of 78.1%, with the rule-based post-processing contributing an improvement of 3.4%.
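The two-stage procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the feature names, the hand-set weights (standing in for a trained maximum entropy model), the outward scan from the conjunction, the POS tag labels, and the single post-processing rule are all hypothetical and are not taken from the paper.

```python
import math

# Hand-set weights standing in for a trained maximum entropy model (assumption).
WEIGHTS = {
    "bias": -1.0,
    "mirror_pos_match": 2.0,        # candidate tag equals the tag just across the conjunction
    "outer_is_edge_or_punct": 1.5,  # token beyond the candidate is punctuation or sentence edge
}

# Hypothetical rule: a conjunct should not begin or end with these tags.
STRIP_TAGS = {"P", "AD"}


def boundary_prob(features):
    """Binary maxent (logistic) probability that a candidate position is a boundary."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[f] for f in features)
    return 1.0 / (1.0 + math.exp(-z))


def features_at(tags, i, conj, step):
    """Extract the active (hypothetical) features for candidate position i."""
    feats = []
    if tags[i] == tags[conj - step]:  # compare with the token on the other side of the conjunction
        feats.append("mirror_pos_match")
    j = i + step
    if j < 0 or j >= len(tags) or tags[j] == "PU":
        feats.append("outer_is_edge_or_punct")
    return feats


def find_boundary(tags, conj, step):
    """Scan outward from the conjunction (step=-1: left, step=+1: right)
    and return the most probable boundary position; ties keep the nearest one."""
    best_i, best_p = conj + step, -1.0
    i = conj + step
    while 0 <= i < len(tags) and tags[i] != "PU":
        p = boundary_prob(features_at(tags, i, conj, step))
        if p > best_p:
            best_i, best_p = i, p
        i += step
    return best_i


def apply_rules(tags, conj, left, right):
    """Post-process boundaries: move them inward past tags listed in STRIP_TAGS."""
    while left < conj - 1 and tags[left] in STRIP_TAGS:
        left += 1
    while right > conj + 1 and tags[right] in STRIP_TAGS:
        right -= 1
    return left, right


def identify(tags, conj):
    """Statistical boundary detection followed by rule-based post-processing.
    Assumes the conjunction is neither the first nor the last token."""
    left = find_boundary(tags, conj, -1)
    right = find_boundary(tags, conj, +1)
    return apply_rules(tags, conj, left, right)


# "He bought red apples and green pears ." with illustrative POS tags
tags = ["PN", "VV", "JJ", "NN", "CC", "JJ", "NN", "PU"]
span = identify(tags, conj=4)
```

In this toy sentence the sketch selects positions 2 through 6 ("red apples and green pears"). A real system would learn the weights from annotated data and use a far richer feature set and rule inventory than the single stripping rule shown here.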

Keywords: coordinate structure; coordinate constituent; maximum entropy model

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
