Authors: CHENG Dawei (程大伟); WU Jiaxuan (吴佳璇); LI Jiangtong (李江彤); DING Zhijun (丁志军) [1,2,3]; JIANG Changjun (蒋昌俊)
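Affiliations: [1] College of Computer Science and Technology, Tongji University, Shanghai 201804, China; [2] Shanghai Artificial Intelligence Laboratory, Shanghai 200030, China; [3] National Collaborative Innovation Center for Internet Financial Security, Shanghai 201804, China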
Source: Computer Science (《计算机科学》), 2025, No. 3, pp. 239-247 (9 pages)
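Funding: National Key Research and Development Program of China (2022YFB4501704); National Natural Science Foundation of China (62102287, 62472317); Shanghai Science and Technology Innovation Action Plan Project (24692118300, 22YS1400600).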
Abstract: With the rapid development of large language models (LLMs), their application in the financial sector has become a driving force for industry transformation. Establishing a standardized and systematic evaluation framework is a crucial way to assess LLMs' abilities in financial scenarios. However, current evaluation methods have limitations, such as weak generalization of evaluation datasets and narrow coverage of task scenarios. To address these issues, this paper proposes CFBenchmark, a benchmark for the financial capabilities of LLMs, which consists of four core assessment modules: financial natural language processing, financial scenario computation, financial analysis and interpretation, and financial compliance and security. Multi-task scenarios and systematic evaluation metrics designed within each module provide a standardized and systematic approach to assessing large models in the financial domain. Experimental results indicate that the performance of LLMs in financial scenarios is closely related to their parameters, architecture, and training process, and that LLMs still have considerable room for improvement in financial compliance and security. As LLMs become more widely applied in the financial sector, financial LLM benchmarks will need to include more task designs drawn from real-world scenarios and collect more high-quality evaluation data, in order to help improve the generalization ability of LLMs across diverse financial scenarios.
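Keywords: large model evaluation; financial large models; financial scenario computation; financial analysis and interpretation; financial compliance and security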
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]