Authors: WANG Runxin, LI Ning [1]
Affiliation: [1] Computer School, Beijing Information Science & Technology University, Beijing 102206, China
Source: Journal of Beijing Information Science and Technology University (Natural Science Edition), 2024, No. 4, pp. 71-80 (10 pages)
Funding: National Natural Science Foundation of China (61672105).
Abstract: Printed textbooks make it hard to locate knowledge concepts quickly, to grasp the logical structure of the writing from the surface text, and to establish links between pieces of knowledge. To address these problems, a textbook move recognition method that incorporates a large language model is proposed. First, a textbook move structure is designed and a textbook move classification dataset is constructed. Then, a generative large language model is used to generate corpus for scarce moves and to enhance features for moves that lack distinctive features. Finally, the move recognition dataset and the augmented move data are combined to fine-tune the initial textbook move recognition model, yielding a move recognition model that incorporates the large language model. Experimental results show that, compared with the initial model BERT-wwm-ext, the move recognition model assisted by the large language model improves overall accuracy by 5.06 percentage points to 95.44% and Macro-F1 by 2.54 percentage points to 93.51%. The move recognition model was then used to automatically construct a textbook knowledge graph and a back-of-book index, which show the logical structure of textbook writing fairly clearly.
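Keywords: digital textbook; move recognition; large language model; knowledge graph; back-of-book index
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]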
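The abstract reports overall accuracy and Macro-F1 for the move classifier. As a reading aid, the following is a minimal sketch of how those two metrics are conventionally computed for a multi-class labelling task. It is not the authors' evaluation code: the function names and the move labels (`definition`, `example`, `theorem`) are hypothetical, and the toy data is made up purely to exercise the functions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data with hypothetical move labels (not taken from the paper).
gold = ["definition", "definition", "example", "theorem"]
pred = ["definition", "example", "example", "theorem"]
acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

Because Macro-F1 averages per-class scores without weighting by class size, it is sensitive to performance on rare classes, which is why it is a natural companion to accuracy when some moves are scarce.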