Authors: HUANG Zan (黄赞), ZHOU Shuang’e (周双娥) (School of Computer and Information Engineering, Hubei University, Wuhan, Hubei 430062, China)
Affiliation: [1] School of Computer and Information Engineering, Hubei University, Wuhan 430062
Source: Journal of Computer Applications (《计算机应用》), 2022, Issue S01, pp. 136-139 (4 pages)
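Funding: Supported by the Open Fund Project of the Hubei Key Laboratory of Big Data in Science and Technology (Wuhan Documentation and Information Center, Chinese Academy of Sciences) (20KF011004).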
Abstract: To address the current lack of effective tools for obtaining images and their descriptions from the literature, a method for acquiring photoelectric image data from SPIE Journals articles was proposed. The method consists of two parts. First, the Web page structure of the SPIE Digital Library was studied and image information was crawled from it, including the image name, the image itself, the article containing the image, and the journal and year of that article; the crawled images were stored as binary streams. Second, while the image information was being acquired, sentence segmentation from natural language processing was used to split out the paragraphs that describe images, regular expressions were used to find the image description sentences, sentences describing the same image were stitched together, and the stitched description text was matched to the corresponding image by image number. After the images and their description texts were obtained, the data were displayed and the input keywords were analyzed statistically. Finally, the data source in the SPIE Digital Library was processed and tested online. Actual test results show that the image data and their corresponding text descriptions are obtained accurately, that image names can be matched by keyword to retrieve image data, and that statistical charts of keywords by year and by journal can be displayed.
Keywords: photoelectric image; data acquisition; text description; natural language processing; regular expression
CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
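The abstract outlines the text side of the method only briefly: split an article into sentences, use regular expressions to find sentences that describe a figure, stitch together the sentences about the same figure, and match the result to the crawled image by figure number. The Python sketch below illustrates those steps under stated assumptions and is not the authors' implementation. The sentence splitter is a naive regular expression standing in for the unspecified NLP segmentation step, the pattern `FIG_REF` assumes references are written as "Fig. N" or "Figure N", and `ImageRecord`, `collect_descriptions`, `attach_descriptions` and `keyword_stats` are hypothetical names. Crawling the SPIE Digital Library is not shown.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# Naive sentence splitter (an assumption; the paper only says NLP sentence
# segmentation was used): split after . ! ? when the next sentence starts with
# an uppercase letter, so "Fig. 3" is not split because "3" is not uppercase.
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Hypothetical pattern for figure references: "Fig. 3", "Fig 3", "Figure 3".
FIG_REF = re.compile(r"\bFig(?:ure)?\.?\s*(\d+)", re.IGNORECASE)


@dataclass
class ImageRecord:
    # Hypothetical fields mirroring the metadata listed in the abstract.
    fig_no: int
    name: str
    journal: str
    year: int
    data: bytes = b""        # image content kept as a binary stream
    description: str = ""


def split_sentences(paragraph: str) -> list[str]:
    return [s.strip() for s in SENT_SPLIT.split(paragraph) if s.strip()]


def collect_descriptions(paragraphs: list[str]) -> dict[int, str]:
    """Stitch every sentence that mentions a figure number into one text per figure."""
    by_fig: dict[int, list[str]] = defaultdict(list)
    for para in paragraphs:
        for sent in split_sentences(para):
            # A sentence naming several figures is attached once to each of them.
            for num in dict.fromkeys(int(n) for n in FIG_REF.findall(sent)):
                by_fig[num].append(sent)
    return {num: " ".join(sents) for num, sents in by_fig.items()}


def attach_descriptions(images: list[ImageRecord], paragraphs: list[str]) -> None:
    """Match stitched descriptions to images by figure number (the image ID)."""
    descs = collect_descriptions(paragraphs)
    for img in images:
        img.description = descs.get(img.fig_no, "")


def keyword_stats(images: list[ImageRecord], keyword: str) -> tuple[Counter, Counter]:
    """Count images whose name contains the keyword, per year and per journal."""
    hits = [img for img in images if keyword.lower() in img.name.lower()]
    return Counter(img.year for img in hits), Counter(img.journal for img in hits)
```

Because matching relies on the figure number alone, the sketch assumes one article at a time; figures from different articles would need to be keyed by article as well. A sentence that describes a figure without naming it (for example "It uses a 532 nm laser." following a reference to Fig. 1) is not captured by this regex-only approach.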