Source: Chinese Journal of Preventive Medicine (《中华预防医学杂志》), 2020, No. 6, pp. 685-690 (6 pages)
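Funding: National Natural Science Foundation of China (81673232); Science and Technology Program of the Beijing Municipal Education Commission (KM201810025009).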
Abstract:
Objective To compare word frequencies in two sources: the core HIV/AIDS knowledge that national authorities disseminate to adolescents, and the HIV/AIDS questions posted on "Baidu Zhidao". The comparison was made by text mining and aimed to identify gaps between official dissemination and adolescents' information needs.
Methods A web crawler was used to collect and organize all HIV/AIDS-related questions that online users posted on "Baidu Zhidao" up to June 11, 2018. The national core dissemination messages for adolescents (hereafter "official delivery") comprised three sources: the Questionnaire on HIV/AIDS Knowledge Awareness among the General Population, the Questionnaire on HIV/AIDS Knowledge Awareness among Young Students, and the 14 core messages for HIV/AIDS prevention education targeting young students. Following the official classification, all data were grouped into four categories: prevention; testing and treatment; risk awareness, symptoms and transmission; and laws and regulations, discrimination and policy. After stopwords ("useless words") were removed, the two sources were compared using Chinese word segmentation, word-frequency counting, comparative analysis, and word-frequency visualization (word clouds).
Results In the "Baidu Zhidao" data, the word counts were 18,942 for prevention, 43,140 for testing and treatment, 73,437 for risk awareness, symptoms and transmission, and 33,859 for laws and regulations, discrimination and policy. The corresponding counts in the official delivery were 371, 241, 208 and 136. Among semantically related terms in the official delivery, prevention accounted for the largest share of total word frequency (32.3%, n=162) and laws and regulations for the smallest (14.1%, n=71). Among semantically related terms in "Baidu Zhidao", testing and treatment accounted for the largest share (51.7%, n=51,264) and prevention for the smallest (11.4%, n=11,272). Terms that appeared verbatim in both sources made up 59.3%-63.9% of the word frequency in each category of the official delivery. Their share in the "Baidu Zhidao" categories was lower, because the queries used a more diverse vocabulary. The share exceeded 45% for prevention and for testing and treatment. It was 34.3% (n=14,781) for symptoms and transmission and lowest for laws and regulations, at only 17.0% (n=5,744).
Conclusion The largest word-frequency discrepancies between official delivery and "Baidu Zhidao" queries were in the laws-and-regulations and prevention categories. Official dissemination of core knowledge should therefore add and refine content in these areas, guided by adolescents' needs and interests.
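The abstract's core computations are stopword removal, word-frequency counting, per-category shares, and the share of word frequency carried by terms that appear in both sources. These can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The text is assumed to be already segmented into tokens, for example by a Chinese segmenter. The stopword list is hypothetical, and the token lists are made-up toy data. The overlap measure is one plausible reading of "fully repeated words as a proportion of word frequency".

```python
from collections import Counter

# Hypothetical stopword ("useless word") list; the study's actual list is not given.
STOPWORDS = {"的", "了", "吗", "是", "怎么"}


def term_frequencies(tokens):
    """Count term frequencies after dropping stopwords (tokens assumed pre-segmented)."""
    return Counter(t for t in tokens if t not in STOPWORDS)


def overlap_share(freqs, other_freqs):
    """Share of the total frequency in `freqs` contributed by terms that also occur in `other_freqs`."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    shared = sum(n for term, n in freqs.items() if term in other_freqs)
    return shared / total


def category_shares(freqs_by_category):
    """Each category's share of the summed word frequency across all categories."""
    grand_total = sum(sum(f.values()) for f in freqs_by_category.values())
    return {cat: sum(f.values()) / grand_total for cat, f in freqs_by_category.items()}


# Toy, made-up tokens for illustration only (not study data).
official = term_frequencies(["安全套", "预防", "的", "检测", "预防", "法律"])
baidu = term_frequencies(["检测", "症状", "怎么", "检测", "预防", "窗口期", "症状"])

print(overlap_share(official, baidu))  # 3 of 5 official tokens are shared -> 0.6
print(overlap_share(baidu, official))  # 3 of 6 query tokens are shared -> 0.5
print(category_shares({"prevention": Counter({"预防": 3}), "testing": Counter({"检测": 1})}))
```

The overlap share is asymmetric. The same set of shared terms can account for a large part of a small, focused vocabulary (the official messages) but only a small part of a larger, more varied one (the user queries). This mirrors the direction of the differences reported above.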
Keywords: acquired immunodeficiency syndrome; HIV; knowledge dissemination; Baidu Zhidao; text mining