LLMs/RAG/Long-Context: Translation and Interpretation of "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach"

Overview: This work proposes an effective hybrid method combining RAG and long-context LLMs that trades off performance against cost, offering practical guidance for long-context applications.

Background and pain points: Large language models (LLMs) face challenges in processing long texts. Retrieval-augmented generation (RAG) is an effective way to help a model handle long texts by bringing in external knowledge. Although RAG reduces cost, its long-context understanding lags behind the latest long-context (LC) models such as Gemini-1.5 and GPT-4.

>> When the input context is too long, LLM performance degrades, so external knowledge must be used effectively to improve understanding.

>> Existing RAG methods tackle long contexts by retrieving relevant information and feeding it to the LLM, but their performance often falls short of directly using a long-context LLM.

Solution: The paper proposes SELF-ROUTE, a simple yet effective method that lets the LLM decide, via self-reflection, whether to answer a query with RAG or with the long-context LLM. By dynamically choosing between the two, it combines the strengths of RAG and LC, reducing cost while preserving performance.

Core ideas and steps

>> RAG vs. LC analysis: systematically compare the performance and efficiency of RAG and LC on a range of public datasets.

>> SELF-ROUTE implementation: at prediction time, first attempt an answer with RAG; if RAG cannot adequately answer the question, fall back to LC over the full context. This self-reflection mechanism reduces the number of queries that require the full long-context model.

>> RAG-and-Route step: given the query and the retrieved chunks, the LLM predicts whether the query is answerable from them; if so, RAG generates the answer. Queries that RAG declines move to the second step, where the full context is handed to the long-context LLM (see the sketch after this list).
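
To make the two-step routing concrete, here is a minimal Python sketch of the procedure described above. Everything in it is illustrative: the prompt wording, the "unanswerable" marker, and the injected llm/retrieve callables are assumptions for the sketch, not the paper's actual prompts or code.

```python
from typing import Callable, List

UNANSWERABLE = "unanswerable"  # marker the routing prompt asks the model to emit


def self_route(
    query: str,
    full_context: str,
    llm: Callable[[str], str],                       # wraps an LLM API call: prompt -> text
    retrieve: Callable[[str, str, int], List[str]],  # (query, context, k) -> top-k chunks
    k: int = 5,
) -> str:
    """Two-step SELF-ROUTE: try RAG first, fall back to the long-context LLM."""
    # Step 1 (RAG-and-Route): show only the top-k retrieved chunks and ask
    # the model either to answer or to flag the query as unanswerable.
    chunks = retrieve(query, full_context, k)
    rag_prompt = (
        "Answer the question using only the passages below. If they are "
        f"insufficient, reply exactly '{UNANSWERABLE}'.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    answer = llm(rag_prompt)

    # Step 2: only declined queries pay for the full long-context call,
    # which is where the token savings come from.
    if UNANSWERABLE in answer.lower():
        answer = llm(f"{full_context}\n\nQuestion: {query}")
    return answer
```

Because most queries are resolved in the first step, the expensive full-context call is issued only for the residual fraction, which is what drives the reported cost reduction.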

Advantages

>> Cost efficiency: SELF-ROUTE substantially reduces computation while performing close to LC; for Gemini-1.5-Pro, it cuts input tokens by 65%.

>> Effectiveness: while lowering cost, SELF-ROUTE's overall performance is comparable to the long-context LLM. On most queries, RAG's prediction agrees with LC's, so computation can be cut significantly without hurting performance.

>> Broad applicability: the method is flexible; the number of retrieved chunks k can be tuned to the task's accuracy and cost requirements, and its effectiveness is validated across multiple datasets and models.

With this approach, the paper offers a practical solution for long-context applications, striking a good balance between accuracy and cost.

Contents

Translation and Interpretation of "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach"

Abstract

1 Introduction

Figure 1: While long-context LLMs (LC) surpass RAG in long-context understanding, RAG is significantly more cost-efficient. Our approach, SELF-ROUTE, combining RAG and LC, achieves comparable performance to LC at a much lower cost.

6 Conclusion


Translation and Interpretation of "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach"

Address

Paper: https://arxiv.org/pdf/2407.16833

Date

2024-07-23

Authors

Google DeepMind et al.

Abstract

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. Based on this observation, we propose SELF-ROUTE, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. SELF-ROUTE significantly reduces the computation cost while maintaining a comparable performance to LC. Our findings provide a guideline for long-context applications of LLMs using RAG and LC.

1 Introduction

Retrieval augmented generation (RAG) has been shown to be both an effective and efficient approach for large language models (LLMs) to leverage external knowledge. RAG retrieves relevant information based on the query and then prompts an LLM to generate a response in the context of the retrieved information. This approach significantly expands LLM's access to vast amounts of information at a minimal cost.
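
As a placeholder for the retrieval step described here (and for the retrieve callable in the earlier routing sketch), the following deliberately naive word-overlap retriever can stand in; production systems would use BM25 or a dense embedding retriever instead.

```python
from typing import List


def retrieve(query: str, context: str, k: int, chunk_words: int = 300) -> List[str]:
    """Naive top-k retrieval: split the context into fixed-size word chunks and
    rank them by word overlap with the query. Purely illustrative; swap in a
    real retriever (e.g., BM25 or dense embeddings) for actual use."""
    words = context.split()
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    query_terms = set(query.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(query_terms & set(c.lower().split())),
        reverse=True,
    )[:k]
```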

However, recent LLMs like Gemini and GPT-4 have demonstrated exceptional capabilities in understanding long contexts directly. For example, Gemini 1.5 can process up to 1 million tokens (Reid et al., 2024). This prompts the need for a systematic comparison between long-context (LC) LLMs and RAG: on one hand, RAG conceptually acts as a prior, regularizing the attention of LLMs onto retrieved segments, thus avoiding the distraction of the irrelevant information and saving unnecessary attention computations; on the other hand, large-scale pretraining may enable LLMs to develop even stronger long-context capabilities. Therefore, we are motivated to compare RAG and LC, evaluating both their performance and efficiency.

In this work, we systematically benchmark RAG and LC on various public datasets, gaining a comprehensive understanding of their pros and cons, and ultimately combining them to get the best of both worlds. Different from findings in previous work (Xu et al., 2023), we find that LC consistently outperforms RAG in almost all settings (when resourced sufficiently). This demonstrates the superior progress of recent LLMs in long-context understanding.

Despite the suboptimal performance, RAG remains relevant due to its significantly lower computational cost. In contrast to LC, RAG significantly decreases the input length to LLMs, leading to reduced costs, as LLM API pricing is typically based on the number of input tokens (Google, 2024; OpenAI, 2024b). Moreover, our analysis reveals that the predictions from LC and RAG are identical for over 60% of queries. For these queries, RAG can reduce cost without sacrificing performance.
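
A back-of-the-envelope illustration of this cost arithmetic (the token counts and the answerable fraction below are made-up placeholders, not the paper's measurements):

```python
# Hypothetical token accounting for routing (illustrative numbers only).
full_ctx_tokens = 100_000  # average prompt length for the long-context call
rag_tokens = 6_000         # average prompt length with top-k retrieved chunks
rag_answerable = 0.60      # fraction of queries resolved by the RAG step

# Every query pays for the cheap RAG step; only declined queries
# additionally pay for the full long-context call.
expected = rag_tokens + (1 - rag_answerable) * full_ctx_tokens
print(f"expected input tokens per query: {expected:,.0f} "
      f"({1 - expected / full_ctx_tokens:.0%} cheaper than always using LC)")
```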

Based on this observation, we propose SELF-ROUTE, a simple yet effective method that routes various queries to RAG or LC based on model self-reflection. With SELF-ROUTE, we significantly reduce the cost while achieving overall performance comparable to LC. For example, the cost is reduced by 65% for Gemini-1.5-Pro and 39% for GPT-4o.

Fig. 1 shows the comparisons of LC, RAG and SELF-ROUTE using three recent LLMs: GPT-4o, GPT-3.5-Turbo and Gemini-1.5-Pro. In addition to quantitative evaluation, we provide a comprehensive analysis comparing RAG and LC, including common failure patterns of RAG, the trade-offs between cost and performance, and the results on additional synthetic datasets. Our analysis serves as a starting point, inspiring future improvements of RAG, and as an empirical guide for building long-context applications using RAG and LC.

Figure 1: While long-context LLMs (LC) surpass RAG in long-context understanding, RAG is significantly more cost-efficient. Our approach, SELF-ROUTE, combining RAG and LC, achieves comparable performance to LC at a much lower cost.

6 Conclusion

This paper presents a comprehensive comparison of RAG and LC, highlighting the trade-offs between performance and computational cost. While LC demonstrates superior performance in long-context understanding, RAG remains a viable option due to its lower cost and advantages when the input considerably exceeds the model's context window size. Our proposed method, which dynamically routes queries based on model self-reflection, effectively combines the strengths of both RAG and LC, achieving comparable performance to LC at a significantly reduced cost. We believe our findings contribute valuable insights for the practical application of long-context LLMs and pave the way for future research in optimizing RAG techniques.
