FinGPT: Open-Source Financial Large Language Models

Abstract

Large language models (LLMs) have shown the potential to revolutionize natural language processing tasks in diverse domains, sparking great interest in finance. Accessing high-quality financial data is the first challenge for financial LLMs (FinLLMs). While proprietary models like BloombergGPT have taken advantage of their unique data accumulation, such privileged access calls for an open-source alternative to democratize Internet-scale financial data. In this paper, we present an open-source large language model, FinGPT, for the finance sector. Unlike proprietary models, FinGPT takes a data-centric approach, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. We highlight the importance of an automatic data curation pipeline and the lightweight low-rank adaptation technique in building FinGPT. Furthermore, we showcase several potential applications as stepping stones for users, such as robo-advising, algorithmic trading, and low-code development. Through collaborative efforts within the open-source AI4Finance community, FinGPT aims to stimulate innovation, democratize FinLLMs, and unlock new opportunities in open finance.

Two associated code repos are https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP.

1 Introduction

The continual expansion and evolution of artificial intelligence have provided fertile ground for the proliferation of large language models [Vaswani et al., 2017; Radford et al., 2018; Devlin et al., 2018; Ethayarajh, 2019; Lewis et al., 2019; Lewis et al., 2020; Brown et al., 2020; Thoppilan et al., 2022], transforming the landscape of natural language processing across diverse domains. This sweeping change has generated keen interest in applying these models in the financial realm. It is evident, however, that acquiring high-quality, relevant, and up-to-date data is a critical factor in developing an effective and efficient open-source financial language model.

Using language models in finance reveals intricate hurdles. These range from difficulties in obtaining data, handling diverse data formats and types, and managing inconsistent data quality, to the essential requirement for up-to-date information. In particular, extracting historical or specialized financial data is complex because the data is spread across different media, such as web platforms, APIs, PDF documents, and images.
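To make the format problem concrete, the following is a minimal sketch of how records from two of the media mentioned above (a web page and an API response) might be mapped onto one common shape before any further processing. It is an illustration under stated assumptions, not FinGPT's actual code: the `Document` fields, the function names, and the JSON payload layout are all hypothetical, and only the Python standard library is used.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser


@dataclass
class Document:
    """Common record shape; the field names are illustrative assumptions."""
    source: str
    published: datetime
    text: str


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def from_web_page(html, published):
    """Turn a scraped HTML fragment into a Document."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return Document("web", published, " ".join(parser.parts))


def from_api_payload(payload):
    """Turn a JSON string into a Document.

    Hypothetical payload layout:
    {"headline": str, "body": str, "ts": unix seconds}
    """
    item = json.loads(payload)
    published = datetime.fromtimestamp(item["ts"], tz=timezone.utc)
    return Document("api", published, f'{item["headline"]}. {item["body"]}')
```

For example, `from_web_page("<p>Shares <b>rose</b> 3%.</p>", some_timestamp).text` yields `"Shares rose 3%."`. PDF documents and images would need extra steps (text extraction, OCR) that are outside this sketch.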

In the proprietary sphere, models like BloombergGPT [Wu et al., 2023] have capitalized on their exclusive access to specialized data to train finance-specific language models. However, the restricted accessibility and transparency of their data collections and training protocols have accentuated the demand for a more open and inclusive alternative. In response to this demand, we are witnessing a shift towards democratizing Internet-scale financial data in the open-source domain.

In this paper, we address the aforementioned challenges associated with financial data and introduce FinGPT, an end-to-end open-source framework for financial large language models (FinLLMs). Adopting a data-centric approach, FinGPT underscores the crucial role of data acquisition, cleaning, and preprocessing in developing open-source FinLLMs. By championing data accessibility, FinGPT aspires to enhance research, collaboration, and innovation in finance, paving the way for open finance practices. Our contributions are summarized as follows:

• Democratization: FinGPT, as an open-source framework, aims to democratize financial data and FinLLMs, uncovering untapped potential in open finance.

• Data-centric approach: Recognizing the significance of data curation, FinGPT adopts a data-centric approach and implements rigorous cleaning and preprocessing methods for handling varied data formats and types, thereby ensuring high-quality data.

• End-to-end framework: FinGPT embraces a full-stack framework for FinLLMs with four layers:

– Data source layer: This layer assures comprehensive market coverage, addressing the temporal sensitivity of financial data through real-time information capture.

– Data engineering layer: Primed for real-time NLP data processing, this layer tackles the inherent challenges of high temporal sensitivity and low signal-to-noise ratio in financial data.

– LLMs layer: Focusing on a range of fine-tuning methodologies, this layer mitigates the highly dynamic nature of financial data, ensuring the model’s relevance and accuracy.

– Application layer: Showcasing practical applications and demos, this layer highlights the potential capability of FinGPT in the financial sector.

Our vision for FinGPT is to serve as a catalyst for stimulating innovation within the finance domain. FinGPT is not limited to providing technical contributions; it also cultivates an open-source ecosystem for FinLLMs, promoting real-time processing and customized adaptation for users. By nurturing a robust collaboration ecosystem within the open-source AI4Finance community, FinGPT is positioned to reshape our understanding and application of FinLLMs.
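The four layers listed in the contributions can be pictured as a chain of functions, each feeding the next. The sketch below is a toy illustration only, not the FinGPT codebase: every name is hypothetical, the news items are made up, and the "LLMs layer" is replaced by a keyword rule so that the example stays self-contained and runnable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class NewsItem:
    published: datetime
    text: str


def data_source_layer(now):
    """Stand-in for real-time capture; these in-memory items are made up."""
    return [
        NewsItem(now - timedelta(minutes=5), "  ACME beats quarterly earnings estimates  "),
        NewsItem(now - timedelta(minutes=4), "ACME beats quarterly earnings estimates"),
        NewsItem(now - timedelta(days=30), "ACME announces a new office"),
        NewsItem(now - timedelta(minutes=1), ""),
    ]


def data_engineering_layer(items, now, max_age=timedelta(days=1)):
    """Drop stale, empty, and duplicate items (a crude noise filter)."""
    seen, cleaned = set(), []
    for item in items:
        text = " ".join(item.text.split())  # normalize whitespace
        if not text or now - item.published > max_age or text.lower() in seen:
            continue
        seen.add(text.lower())
        cleaned.append(NewsItem(item.published, text))
    return cleaned


def llm_layer(item):
    """Placeholder for a fine-tuned model: a keyword rule, not an LLM."""
    lowered = item.text.lower()
    if "beats" in lowered:
        return "positive"
    if "misses" in lowered:
        return "negative"
    return "neutral"


def application_layer(now):
    """Wire the layers together into a tiny sentiment feed."""
    cleaned = data_engineering_layer(data_source_layer(now), now)
    return [(item.text, llm_layer(item)) for item in cleaned]


if __name__ == "__main__":
    print(application_layer(datetime.now(timezone.utc)))
```

Of the four made-up items, only one survives the data engineering step: the duplicate, the 30-day-old item, and the empty item are dropped. This mirrors, in miniature, the time-sensitivity and signal-to-noise concerns that the layer descriptions raise.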

2 Related Work

2.1 LLMs and ChatGPT

Large language models (LLMs), such as GPT-3 and GPT-4 [Brown et al., 2020], have been recognized as a technological breakthrough in natural language processing. They use transformer-based architectures and demonstrate impressive performance across various generative tasks.

As an offshoot of the GPT family developed by OpenAI, ChatGPT was designed to produce human-like text based on input prompts. It has shown significant utility in diverse applications, from drafting emails to writing code and creating written content.

2.2 LLMs in Finance

LLMs have been applied to various tasks within the financial sector [Dredze et al., 2016; Araci, 2019; Bao et al., 2021; DeLucia et al., 2022], from predictive modeling to generating insightful narratives from raw financial data. Recent literature has focused on using these models for financial text analysis, given the abundance of text data in this field, such as news articles, earnings call transcripts, and social media posts.

The first example of financial LLMs is BloombergGPT [Wu et al., 2023], which was trained on a mixed dataset of financial and general sources. Despite its impressive capabilities, access to it is limited, and its high training cost has motivated the need for low-cost domain adaptation.

Our FinGPT responds to these challenges by presenting an open-source financial LLM. It employs reinforcement learning from human feedback (RLHF) to understand and adapt to individual preferences, paving the way for personalized financial assistants. We aim to combine the strengths of general-purpose LLMs such as ChatGPT with financial adaptation, exploiting the capabilities of LLMs in finance.
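The abstract names low-rank adaptation as the lightweight technique used in building FinGPT. Why such an approach is cheap can be shown with simple arithmetic. In the standard low-rank adaptation formulation, a frozen weight matrix W of shape d_out x d_in is updated as W + BA, where B has shape d_out x r, A has shape r x d_in, and only A and B are trained. The sketch below just counts trainable weights; the function names and the layer size are illustrative assumptions, not figures from the paper.

```python
def full_finetune_params(d_out, d_in):
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable weights when only B (d_out x rank) and A (rank x d_in) are updated."""
    return rank * (d_out + d_in)


def lora_fraction(d_out, d_in, rank):
    """Share of the full matrix's weights that the low-rank update trains."""
    return lora_params(d_out, d_in, rank) / full_finetune_params(d_out, d_in)


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    d, r = 4096, 8
    print(full_finetune_params(d, d))  # 16777216
    print(lora_params(d, d, r))        # 65536
    print(lora_fraction(d, d, r))      # 0.00390625
```

For this hypothetical 4096 x 4096 layer with rank 8, the low-rank update trains 1/256 of the weights that full fine-tuning of that layer would, which is the kind of saving that makes low-cost domain adaptation practical.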
