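Affiliation: [1] Editorial Department of the Journal of Northeastern University
Source: Chinese Journal of Scientific and Technical Periodicals (《中国科技期刊研究》), 2016, No. 2, pp. 202-206 (5 pages)
Funding: Liaoning Province Social Science Planning Fund project (L12DXW011)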
Abstract: [Purpose] To automatically extract full-text metadata from science and technology journals and generate HTML files. [Methods] Taking Founder typesetting files as the object, and building on the extraction of metadata such as article titles and abstracts, the body content of the article is itself converted into metadata. A concept of general metadata (GM) covering figures, tables, and formulas is proposed; extraction rules for figure and table metadata are established; and Founder typesetting mathematical formulas are converted into LaTeX expressions. A program written in VB then automatically extracts the GM and recombines it to generate an HTML full-text file. [Findings] Based on the characteristics of the Founder BD typesetting language, the established extraction rules effectively extract the full text as metadata, and the HTML file can be generated directly. [Conclusions] Practical application demonstrates the effectiveness and feasibility of using general metadata to generate HTML files.
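The final step described in the abstract, recombining extracted metadata into an HTML file, can be illustrated with a minimal sketch. This is not the authors' VB program: the dictionary layout, the `kind` values, and the function name `render_html` are all hypothetical, and wrapping formulas in `\(...\)` for a renderer such as MathJax is an assumption, not something the abstract specifies.

```python
from html import escape


def render_html(meta):
    """Assemble a (hypothetical) general-metadata dict into one HTML string."""
    title = escape(meta["title"])
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        "<title>" + title + "</title></head><body>",
        "<h1>" + title + "</h1>",
        '<p class="abstract">' + escape(meta["abstract"]) + "</p>",
    ]
    for item in meta["body"]:
        kind, value = item["kind"], item["value"]
        if kind == "paragraph":
            parts.append("<p>" + escape(value) + "</p>")
        elif kind == "figure":
            parts.append(
                '<figure><img src="' + escape(value["src"]) + '" alt="">'
                "<figcaption>" + escape(value["caption"]) + "</figcaption></figure>"
            )
        elif kind == "table":
            rows = "".join(
                "<tr>" + "".join("<td>" + escape(c) + "</td>" for c in row) + "</tr>"
                for row in value["rows"]
            )
            parts.append(
                "<table><caption>" + escape(value["caption"]) + "</caption>"
                + rows + "</table>"
            )
        elif kind == "formula":
            # Assumption: the formula is already a LaTeX string; \( \) marks
            # inline math for a client-side renderer.
            parts.append('<p class="formula">\\(' + escape(value) + "\\)</p>")
        else:
            raise ValueError("unknown metadata kind: " + kind)
    parts.append("</body></html>")
    return "\n".join(parts)
```

Keeping figures, tables, and formulas as typed items in one ordered list mirrors the abstract's idea that body content is treated as metadata, so the renderer only needs one branch per item type.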