Building a domain lexicon oriented to behavioral features in depression

Original title: 面向抑郁症行为特征的领域词典构建


Authors: 周若彤 (ZHOU Ruotong), 朱广丽 (ZHU Guangli) [1,2], 李书羽 (LI Shuyu), 段文杰 (DUAN Wenjie), 李嘉伟 (LI Jiawei) (School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China; Institute of Artificial Intelligence Research, Hefei Comprehensive National Science Center, Hefei 230088, China)

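Affiliations: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, Anhui, China; [2] Institute of Artificial Intelligence Research, Hefei Comprehensive National Science Center, Hefei 230088, Anhui, China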

Source: Big Data Research (《大数据》), 2024, No. 5, pp. 96-108 (13 pages)

Funding: National Natural Science Foundation of China (No. 62076006); Anhui University Collaborative Innovation Project (No. GXXT-2021-008).

Abstract: The behavioral manifestations of patients with depression reflect their clinical features and condition, and are therefore useful for diagnosis. However, existing depression lexicons are built without considering the correlation between the behavioral features in depression texts and the patient's condition, which leaves the lexicon short of domain information. To address this problem, a method for building a domain lexicon oriented to behavioral features in depression is proposed, which extends the emotional expressions covered by the domain lexicon. First, seed word sets for sentiment and for behavior are built with the TF-IDF algorithm, and the sentiment word set is obtained by using PMI to compute the similarity between an existing sentiment lexicon and the sentiment seed words. Second, the behavioral seed words are labeled according to the correspondence between behavioral features and the patient's condition; the seed words and the depression texts are then fed into WoBERT to generate dynamic word vectors, and the similarity between the two yields a candidate word set. Third, a semantic graph is built from inter-word similarity, and a label propagation algorithm is applied to obtain the behavioral feature word set. Finally, negative-emotion emoticons from Weibo are collected to build an emoticon word set, and the sentiment word set, the behavioral feature word set, and the emoticon word set are merged into the Chinese Depression Domain Lexicon. Experimental results show that the constructed lexicon improves depression text classification.
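
The abstract names its techniques but gives no formulas or parameters, so the following is only a minimal sketch of two of the steps under stated assumptions, not the authors' implementation. It shows PMI-based expansion of a sentiment seed set, and label propagation over a cosine-similarity graph for behavioral words. The function names, thresholds, toy tokens, the labels `sleep` and `appetite`, and the 2-dimensional vectors standing in for WoBERT embeddings are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def expand_by_pmi(docs, seeds, candidates, min_pmi=0.0):
    """Keep candidates whose best document-level PMI with any seed exceeds min_pmi.

    Assumes `seeds` is non-empty; co-occurrence is counted once per document.
    """
    n_docs = len(docs)
    word_df, pair_df = Counter(), Counter()
    for tokens in docs:
        uniq = sorted(set(tokens))
        word_df.update(uniq)
        pair_df.update(combinations(uniq, 2))  # keys are sorted word pairs

    def pmi(a, b):
        joint = pair_df[tuple(sorted((a, b)))]
        if joint == 0:
            return float("-inf")  # the two words never co-occur
        # log2( p(a,b) / (p(a) * p(b)) ), estimated from document frequencies
        return math.log2(joint * n_docs / (word_df[a] * word_df[b]))

    return {c for c in candidates
            if c not in seeds and max(pmi(c, s) for s in seeds) > min_pmi}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def propagate_labels(vectors, seed_labels, threshold=0.5, iterations=20):
    """Spread seed labels over a graph whose edges are cosine similarities
    at or above `threshold` (assumed positive). Seed labels stay clamped."""
    words = list(vectors)
    labels = sorted(set(seed_labels.values()))
    weights = {w: {} for w in words}
    for a, b in combinations(words, 2):
        s = cosine(vectors[a], vectors[b])
        if s >= threshold:
            weights[a][b] = weights[b][a] = s
    # One-hot distributions for seeds, all zeros for unlabeled words.
    dist = {w: {l: float(seed_labels.get(w) == l) for l in labels} for w in words}
    for _ in range(iterations):
        new = {}
        for w in words:
            total = sum(weights[w].values())
            if w in seed_labels or total == 0:
                new[w] = dist[w]  # clamped seed or isolated node
            else:
                new[w] = {l: sum(s * dist[nb][l] for nb, s in weights[w].items()) / total
                          for l in labels}
        dist = new
    result = {}
    for w in words:
        best = max(labels, key=lambda l: dist[w][l])
        if dist[w][best] > 0:  # words that no label reached are left out
            result[w] = best
    return result


# Toy data (hypothetical): tokenized posts and an existing-lexicon candidate list.
docs = [["sad", "hopeless"], ["sad", "hopeless", "tired"],
        ["happy", "sunny"], ["sad", "tired"]]
sentiment_words = expand_by_pmi(docs, seeds={"sad"},
                                candidates={"hopeless", "tired", "happy"})

# Toy 2-d vectors standing in for WoBERT dynamic word vectors.
vectors = {"insomnia": [1.0, 0.1], "sleepless": [0.9, 0.2],
           "no_appetite": [0.1, 1.0], "skipping_meals": [0.2, 0.9]}
behavior_labels = propagate_labels(
    vectors, seed_labels={"insomnia": "sleep", "no_appetite": "appetite"})

print(sorted(sentiment_words))
print(behavior_labels)
```

In this sketch a candidate joins the sentiment set only if it co-occurs with a seed more often than chance, and an unlabeled behavioral word takes the label with the largest similarity-weighted mass among its neighbors. A real pipeline would replace the toy vectors with contextual embeddings and tune both thresholds on held-out data.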

Keywords: depression; domain lexicon; behavioral features; WoBERT; label propagation algorithm

Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
