On the Issues of Language Data in the Era of Human-Machine Symbiosis

Times cited: 16


Author: Li Yuming (李宇明)[1] (Institute of Language Policy and Standards, Beijing Language and Culture University, Beijing 100083)

Affiliation: [1] Institute of Language Policy and Standards, Beijing Language and Culture University, Beijing 100083

Source: Journal of Central China Normal University (Humanities and Social Sciences), 2023, No. 5, pp. 135-143 (9 pages)

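Funding: Key Project of the National Social Science Fund of China, "Research on the Construction and Operation of a Corpus of Chinese Preschool Children" (19AYY010); Major Project of the National Social Science Fund of China, "Research on Basic Theoretical Issues of Linguistics with Chinese Characteristics in the New Era" (19VXK06); Major Project of the National Social Science Fund of China, "Research on National Language Situation Surveys and Language Planning in the Context of the 'Two Centenary Goals'" (21&ZD289).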

Abstract: Humans have continually created language technologies to support language use and improve language life. These range from knotted-cord records and ideographic pictures to the invention of writing, the application of printing, and the spread of radio, film, and television. We have now entered the stage of modern language technology represented by the Internet and language intelligence. Direct "human-human" communication is gradually decreasing, and indirect "human-machine-human" communication is becoming the norm. Looking ahead, we are moving into an era of "human-machine symbiosis" in which people are equipped with AI assistants. Large language models, represented by ChatGPT, are the current peak of human language technology and demonstrate the power of big data, especially language data. However, the knowledge deficiencies that large language models show in their output are caused by a shortage of online "special-domain data", that is, data on specialized fields, special populations, special scenarios, and less commonly used languages. Data, including language data, has become a key element in the development of new science and technology and a factor of production in the modern economy. It must therefore be managed through laws, regulations, norms, and standards. Its production, circulation, and use should be promoted through data markets, and data companies should gather "special-domain data" in a planned way to fill the gaps in online data. In addition, language intelligence education should build citizens' ability to work with AI assistants, and job-market forecasting mechanisms should shift workers promptly into the new positions that new technologies create. Data management should strike a balance between leniency and strictness. It should do as much as possible to promote the development of language intelligence while ensuring that the technology is used for good and advances along an ethical track.

Keywords: language technology; language data; language intelligence; AI assistant; language ethics

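Classification numbers: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; F49 [Automation and Computer Technology: Control Science and Engineering]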

 
