大模型金融场景能力评测框架研究

Study on Evaluation Framework of Large Language Model’s Financial Scenario Capability

作　　者：程大伟吴佳璇李江彤丁志军[1,2,3] 蒋昌俊 CHENG Dawei;WU Jiaxuan;LI Jiangtong;DING Zhijun;JIANG Changjun(College of Computer Science and Technology,Tongji University,Shanghai 201804,China;Shanghai Artificial Intelligence Laboratory,Shanghai 200030,China;National Collaborative Innovation Center for Internet Financial Security,Shanghai 201804,China)

机构地区：[1]同济大学计算机科学与技术学院,上海201804 [2]上海人工智能实验室,上海200030 [3]国家级网络金融安全协同创新中心,上海201804

出　　处：《计算机科学》2025年第3期239-247,共9页Computer Science

基　　金：国家重点研发计划(2022YFB4501704);国家自然科学基金(62102287,62472317);上海市科技创新行动计划项目(24692118300,22YS1400600)。

摘　　要：随着大模型技术的快速发展,其在金融领域的应用已成为推动行业变革的重要力量。构建标准化、系统化的金融能力评测框架是衡量大模型金融场景能力的重要途径,但是现有的评测方法存在评测数据集泛化性弱、任务场景覆盖面窄等缺点。因此,提出了一种面向大模型金融能力的评测框架CFBenchmark,该框架由金融自然语言处理、金融场景计算、金融分析与解读,以及金融合规与安全四大核心评估模块构成,基于模块内的多任务场景设计和系统化评测指标来为金融领域大模型的能力评估提供标准化、系统化的解决途径。实验结果表明,大模型在金融场景下的表现与模型参数、架构和训练过程息息相关,同时大模型在金融合规与安全领域仍有很大改进空间。未来随着大模型在金融领域的应用愈发广泛,大模型金融能力测评框架需完善更多真实场景的任务设计与高质量测评数据的收集,以提升大模型在多样化金融场景下的泛化能力。With the rapid development of large language models(LLMs),its application in the financial sector has become a dri-ving force for industry transformation.Establishing a standardized and systematic evaluation framework for financial capabilities is a crucial way to assess large language models’abilities in financial scenarios.However,current evaluation methods have limitations,such as weak generalization of evaluation datasets and narrow coverage of task scenarios.To address these issues,this paper proposes a financial large language model benchmark,named CFBenchmark,which consists of four core assessment modules:financial natural language processing,financial scenario computation,financial analysis and interpretation,and financial compliance and security.High-quality tasks and systematic evaluation metrics are designed based on multi-task scenarios within each module,providing a standardized and systematic approach to assessing large models in the financial domain.Experimental results indicate that the performance of large language models in financial scenarios is closely related to their parameters,architecture,and trai-ning process.As the application of LLMs in the financial sector becomes more widespread in the future,the financial LLM benchmark will need to include more real-world application designs and high-quality evaluation data collection to help enhance the generalization ability of LLMs across diverse financial scenarios.

关键词：大模型评测金融大模型金融场景计算金融分析与解读金融合规与安全

分类号：TP391[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

大模型金融场景能力评测框架研究

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

大模型金融场景能力评测框架研究

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索