Copyright Law Assessment of Using Open-Source Code to Train Large Language Models: A Case Study of the World's First Machine Learning Lawsuit


Author: Zhang Taolue (张韬略)[1]

Affiliation: [1] Law School, Tongji University

Source: Intellectual Property (《知识产权》), 2025, No. 3, pp. 47-70 (24 pages)

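Funding: An interim result of the National Social Science Fund of China project "Research on Rule-of-Law Approaches to Inclusive and Prudent Regulation of Large AI Models" (Project No. 24BFX031).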

Abstract: From the perspective of legal interpretation, assessing whether using open-source code to train large language models (LLMs) is lawful under copyright law should begin with the terms that the prior license agreements set for use of that code. Although LLM developers may have violated open-source license agreements, and may have copied, modified, or disseminated open-source code, or even deleted the source information of works, during the training or output stages, the non-disclosure of training datasets limits findings of copyright infringement in several respects. The judiciary's pragmatic approach of treating the output side of LLMs as the regulatory target and fair use as the regulator of competing interests has sent a friendly signal to the LLM industry, stimulated the development of similarity-reduction techniques, and may further reduce both the likelihood and the theoretical legitimacy of infringement suits by copyright owners. The case analysis also exposes the strengths and weaknesses of China's copyright law in addressing copyright infringement in LLM training. China urgently needs to amend its fair use system to accommodate the data-training needs of LLM development, and should promote transparency of copyright ownership information for training data through both legislation and technology, in order to protect authors' moral rights and electronic rights management information.

Keywords: open-source code; large models; machine learning; copyright infringement; fair use

Classification code: D923.41 [Politics and Law: Civil and Commercial Law]

 
