Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2009, No. 4, pp. 582-592 (11 pages)
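Funding: National Key Technology R&D Program project "Common Technology Support System and Application Demonstration Project for the Modern Service Industry" (No. 2006BAH02A10); Guangdong Provincial Key Laboratory fund project (CCNL200601); National Science and Technology Infrastructure Platform project (2005DKA64001), Bioinformatics Network Computing Application System.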
Abstract: A Web resource is content that is shared over the Internet and organized mainly as a file directory, such as a book, a set of lecture notes, or a music collection. It may be a single file or a group of files, possibly with subdirectories, that represents a certain thing, meaning, or entity and is worth preserving over the long term. Each resource is complete and self-contained in its content. Such resources are an important component of digital libraries, educational resource repositories, and specialized content collections. A major characteristic of Web resources is that their naming is not standardized, which greatly hinders retrieval and organization. Taking 16,284 Web resources comprising about 610,000 files collected between 2003 and 2006 as the data set, this paper uses statistical methods to examine how Web resources are named and the user naming habits that these names reflect. The analysis covers the length distributions of resource names, subdirectory names, and file names; the entropy of character types; commonly used symbols; high-frequency snippet patterns; and semantic types. It also examines the user naming habits embedded in this disorderly naming. The results help, on the one hand, to clean chaotic names and extract information useful for retrieval and, on the other hand, to reveal the behavior of Web users who take part in sharing large volumes of Web resources.
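The abstract names "entropy of character types" as one of its statistics but does not define the character classes used. The sketch below is only an illustration under stated assumptions: the five classes (digit, CJK, letter, space, symbol), the helper names `char_type` and `char_type_entropy`, and the sample names are all hypothetical and are not taken from the paper. It computes the Shannon entropy, in bits, of the character-type distribution within a single name.

```python
import math
from collections import Counter


def char_type(ch: str) -> str:
    """Map one character to a coarse class (an assumed scheme, not the paper's)."""
    if ch.isdigit():
        return "digit"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "cjk"
    if ch.isalpha():
        return "letter"
    if ch.isspace():
        return "space"
    return "symbol"


def char_type_entropy(name: str) -> float:
    """Shannon entropy (bits) of the character-class distribution in a name."""
    if not name:
        return 0.0
    counts = Counter(char_type(ch) for ch in name)
    total = len(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical names, for illustration only.
for sample in ["lecture01", "[2005]高等数学_v2.pdf", "readme"]:
    print(sample, round(char_type_entropy(sample), 3))
```

Under this scheme, a name built from a single character class scores 0, and a name that mixes classes evenly scores higher, so the value gives a rough measure of how heterogeneous a name's character makeup is.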