LLMs/RAG/Long-Context: Translation and Interpretation of "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach"

Overview: This work proposes an effective hybrid method combining RAG and long-context LLMs that trades off performance against cost, offering practical guidance for long-context applications.

Background and pain points: Large language models (LLMs) face challenges in processing long texts. Retrieval-augmented generation (RAG) is an effective way to help a model handle long texts by bringing in external knowledge. Although RAG reduces cost, its long-context understanding lags behind the latest long-context (LC) models such as Gemini-1.5 and GPT-4.

>> When the input context is too long, LLM performance degrades, so external knowledge must be used effectively to improve understanding.

>> Existing RAG methods tackle long contexts by retrieving relevant information and feeding it to the LLM, but their performance often falls short of directly using a long-context LLM.

Solution: The paper proposes SELF-ROUTE, a simple yet effective method that lets the LLM decide, via self-reflection, whether to answer a query with RAG or with the long-context LLM. By dynamically choosing between the two, it combines the strengths of RAG and LC, reducing cost while preserving performance.

Core ideas and steps

>> RAG vs. LC analysis: systematically compare the performance and efficiency of RAG and LC on a range of public datasets.

>> SELF-ROUTE implementation: at prediction time, first attempt an answer with RAG; if RAG cannot adequately answer the question, fall back to LC over the full context. This self-reflection mechanism reduces the number of queries that require the full long-context model.

>> RAG-and-Route step: given the query and the retrieved chunks, the LLM predicts whether the query is answerable from them; if so, RAG generates the answer. Queries that RAG declines move to the second step, where the full context is handed to the long-context LLM (see the sketch after this list).
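
To make the two-step routing concrete, here is a minimal Python sketch of the procedure described above. Everything in it is illustrative: the prompt wording, the "unanswerable" marker, and the injected llm/retrieve callables are assumptions for the sketch, not the paper's actual prompts or code.

```python
from typing import Callable, List

UNANSWERABLE = "unanswerable"  # marker the routing prompt asks the model to emit


def self_route(
    query: str,
    full_context: str,
    llm: Callable[[str], str],                       # wraps an LLM API call: prompt -> text
    retrieve: Callable[[str, str, int], List[str]],  # (query, context, k) -> top-k chunks
    k: int = 5,
) -> str:
    """Two-step SELF-ROUTE: try RAG first, fall back to the long-context LLM."""
    # Step 1 (RAG-and-Route): show only the top-k retrieved chunks and ask
    # the model either to answer or to flag the query as unanswerable.
    chunks = retrieve(query, full_context, k)
    rag_prompt = (
        "Answer the question using only the passages below. If they are "
        f"insufficient, reply exactly '{UNANSWERABLE}'.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    answer = llm(rag_prompt)

    # Step 2: only declined queries pay for the full long-context call,
    # which is where the token savings come from.
    if UNANSWERABLE in answer.lower():
        answer = llm(f"{full_context}\n\nQuestion: {query}")
    return answer
```

Because most queries are resolved in the first step, the expensive full-context call is issued only for the residual fraction, which is what drives the reported cost reduction.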

Advantages

>> Cost efficiency: SELF-ROUTE substantially reduces computation while performing close to LC; for Gemini-1.5-Pro, it cuts input tokens by 65%.

>> Effectiveness: while lowering cost, SELF-ROUTE's overall performance is comparable to the long-context LLM. On most queries, RAG's prediction agrees with LC's, so computation can be cut significantly without hurting performance.

>> Broad applicability: the method is flexible; the number of retrieved chunks k can be tuned to the task's accuracy and cost requirements, and its effectiveness is validated across multiple datasets and models.

With this approach, the paper offers a practical solution for long-context applications, striking a good balance between accuracy and cost.

Contents

Translation and Interpretation of "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach"

Abstract

1 Introduction

Figure 1: While long-context LLMs (LC) surpass RAG in long-context understanding, RAG is significantly more cost-efficient. Our approach, SELF-ROUTE, combining RAG and LC, achieves comparable performance to LC at a much lower cost.

6 Conclusion


Translation and Interpretation of "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach"

Address

Paper: https://arxiv.org/pdf/2407.16833

Date

2024-07-23

Authors

Google DeepMind et al.

Abstract

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. Based on this observation, we propose SELF-ROUTE, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. SELF-ROUTE significantly reduces the computation cost while maintaining a comparable performance to LC. Our findings provide a guideline for long-context applications of LLMs using RAG and LC.

1 Introduction

Retrieval augmented generation (RAG) has been shown to be both an effective and efficient approach for large language models (LLMs) to leverage external knowledge. RAG retrieves relevant information based on the query and then prompts an LLM to generate a response in the context of the retrieved information. This approach significantly expands LLM's access to vast amounts of information at a minimal cost.
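
As a placeholder for the retrieval step described here (and for the retrieve callable in the earlier routing sketch), the following deliberately naive word-overlap retriever can stand in; production systems would use BM25 or a dense embedding retriever instead.

```python
from typing import List


def retrieve(query: str, context: str, k: int, chunk_words: int = 300) -> List[str]:
    """Naive top-k retrieval: split the context into fixed-size word chunks and
    rank them by word overlap with the query. Purely illustrative; swap in a
    real retriever (e.g., BM25 or dense embeddings) for actual use."""
    words = context.split()
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    query_terms = set(query.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(query_terms & set(c.lower().split())),
        reverse=True,
    )[:k]
```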

However, recent LLMs like Gemini and GPT-4 have demonstrated exceptional capabilities in understanding long contexts directly. For example, Gemini 1.5 can process up to 1 million tokens (Reid et al., 2024). This prompts the need for a systematic comparison between long-context (LC) LLMs and RAG: on one hand, RAG conceptually acts as a prior, regularizing the attention of LLMs onto retrieved segments, thus avoiding the distraction of the irrelevant information and saving unnecessary attention computations; on the other hand, large-scale pretraining may enable LLMs to develop even stronger long-context capabilities. Therefore, we are motivated to compare RAG and LC, evaluating both their performance and efficiency.

In this work, we systematically benchmark RAG and LC on various public datasets, gaining a comprehensive understanding of their pros and cons, and ultimately combining them to get the best of both worlds. Different from findings in previous work (Xu et al., 2023), we find that LC consistently outperforms RAG in almost all settings (when resourced sufficiently). This demonstrates the superior progress of recent LLMs in long-context understanding.

Despite the suboptimal performance, RAG remains relevant due to its significantly lower computational cost. In contrast to LC, RAG significantly decreases the input length to LLMs, leading to reduced costs, as LLM API pricing is typically based on the number of input tokens (Google, 2024; OpenAI, 2024b). Moreover, our analysis reveals that the predictions from LC and RAG are identical for over 60% of queries. For these queries, RAG can reduce cost without sacrificing performance.
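
A back-of-the-envelope illustration of this cost arithmetic (the token counts and the answerable fraction below are made-up placeholders, not the paper's measurements):

```python
# Hypothetical token accounting for routing (illustrative numbers only).
full_ctx_tokens = 100_000  # average prompt length for the long-context call
rag_tokens = 6_000         # average prompt length with top-k retrieved chunks
rag_answerable = 0.60      # fraction of queries resolved by the RAG step

# Every query pays for the cheap RAG step; only declined queries
# additionally pay for the full long-context call.
expected = rag_tokens + (1 - rag_answerable) * full_ctx_tokens
print(f"expected input tokens per query: {expected:,.0f} "
      f"({1 - expected / full_ctx_tokens:.0%} cheaper than always using LC)")
```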

Based on this observation, we propose SELF-ROUTE, a simple yet effective method that routes various queries to RAG or LC based on model self-reflection. With SELF-ROUTE, we significantly reduce the cost while achieving overall performance comparable to LC. For example, the cost is reduced by 65% for Gemini-1.5-Pro and 39% for GPT-4o.

Fig. 1 shows the comparisons of LC, RAG and SELF-ROUTE using three recent LLMs: GPT-4o, GPT-3.5-Turbo and Gemini-1.5-Pro. In addition to quantitative evaluation, we provide a comprehensive analysis comparing RAG and LC, including common failure patterns of RAG, the trade-offs between cost and performance, and the results on additional synthetic datasets. Our analysis serves as a starting point, inspiring future improvements of RAG, and as an empirical guide for building long-context applications using RAG and LC.

Figure 1: While long-context LLMs (LC) surpass RAG in long-context understanding, RAG is significantly more cost-efficient. Our approach, SELF-ROUTE, combining RAG and LC, achieves comparable performance to LC at a much lower cost.

6 Conclusion

This paper presents a comprehensive comparison of RAG and LC, highlighting the trade-offs between performance and computational cost. While LC demonstrates superior performance in long-context understanding, RAG remains a viable option due to its lower cost and advantages when the input considerably exceeds the model's context window size. Our proposed method, which dynamically routes queries based on model self-reflection, effectively combines the strengths of both RAG and LC, achieving comparable performance to LC at a significantly reduced cost. We believe our findings contribute valuable insights for the practical application of long-context LLMs and pave the way for future research in optimizing RAG techniques.
