An Improved n-gram-Based Method for Ancient Chinese Sentence Segmentation and Punctuation

Author: QIN Ruilin (秦瑞琳), College of Computer Engineering, Jimei University, Xiamen 361021, China

Source: Journal of Jimei University: Natural Science (《集美大学学报(自然科学版)》), 2025, No. 2, pp. 198-204 (7 pages)

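Funding: Fujian Provincial Education and Research Project for Young and Middle-aged Teachers, "Quantum Computing Model of Emotional Feelings and Its Simulation Implementation" (JAT210243); Xiamen Natural Science Foundation project, "Robot Affective Computing Model Incorporating Quantum Mechanisms and Its Simulation Implementation" (3502Z202473063).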

Abstract: Automatic sentence segmentation and punctuation of ancient Chinese texts is of great significance for raising the level of automation in the collation of Chinese ancient books. Most existing segmentation and punctuation algorithms for ancient Chinese do not account for the mutual influence between neighboring punctuation marks. To address this, the paper proposes an improved n-gram-based method for ancient Chinese sentence segmentation and punctuation. The method combines contextual information from 2-grams through 5-grams, computes the punctuation probability at the current position as a weighted combination, and uses that probability to assist in computing the punctuation probabilities at the preceding and following positions, thereby reflecting the mutual influence between neighboring marks. Experiments on several corpora of ancient books show that the proposed method achieves higher F1 scores than existing n-gram and GRU-RNN models on sentence segmentation, and that it outperforms the BiLSTM+CRF model on both segmentation and punctuation on some corpora.

Keywords: ancient Chinese; sentence segmentation; punctuation; n-gram model; deep learning

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
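
The idea summarized in the abstract (a weighted combination of 2-gram to 5-gram evidence at each position, then a correction based on neighboring positions) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no formulas, so the weights in `WEIGHTS`, the damping rule in `adjust`, the threshold, and all function names are assumptions made for illustration only. The sketch handles segmentation only and does not choose which punctuation mark to insert.

```python
from collections import defaultdict

# Hypothetical weights for 2- to 5-gram contexts; the abstract gives no values.
WEIGHTS = {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}
MARKS = set("，。")  # marks treated as sentence breaks in the training text


def train(punctuated_lines):
    """Count how often each context of 1 to 4 characters is followed by a break."""
    seen = defaultdict(int)    # context -> occurrences
    broken = defaultdict(int)  # context -> occurrences followed by a break
    for line in punctuated_lines:
        chars, breaks = [], []
        for ch in line:
            if ch in MARKS:
                if breaks:
                    breaks[-1] = True  # break falls after the previous character
            else:
                chars.append(ch)
                breaks.append(False)
        for i in range(len(chars)):
            for n in WEIGHTS:
                k = n - 1  # an n-gram here is k context characters plus the gap label
                if i + 1 >= k:
                    ctx = "".join(chars[i + 1 - k:i + 1])
                    seen[ctx] += 1
                    broken[ctx] += breaks[i]
    return seen, broken


def break_probs(text, seen, broken):
    """Weighted average, over the orders with data, of P(break after text[i])."""
    probs = []
    for i in range(len(text)):
        num = den = 0.0
        for n, w in WEIGHTS.items():
            k = n - 1
            if i + 1 >= k:
                ctx = text[i + 1 - k:i + 1]
                if seen.get(ctx):
                    num += w * broken[ctx] / seen[ctx]
                    den += w
        probs.append(num / den if den else 0.0)
    return probs


def adjust(probs, damping=0.5):
    """Assumed neighbor rule: damp a position by its stronger adjacent score."""
    out = []
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else 0.0
        right = probs[i + 1] if i + 1 < len(probs) else 0.0
        out.append(p * (1.0 - damping * max(left, right)))
    return out


def segment(text, seen, broken, threshold=0.5):
    """Insert '/' after every character whose adjusted score reaches the threshold."""
    probs = adjust(break_probs(text, seen, broken))
    return "".join(ch + ("/" if p >= threshold else "") for ch, p in zip(text, probs))


corpus = ["學而時習之，不亦說乎。", "有朋自遠方來，不亦樂乎。"]
seen, broken = train(corpus)
result = segment("人不知而不慍不亦君子乎", seen, broken)
```

The damping step is the simplest way to let one position's score affect its neighbors: two adjacent positions with high scores suppress each other, which discourages breaks after consecutive characters. A real system would tune the weights and the neighbor rule on held-out data.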
