
First, a link to the original paper:

https://arxiv.org/pdf/1706.03762.pdf

Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Summary: mainstream sequence transduction models are mostly built from encoders and decoders based on complex recurrent networks or CNNs, and the best-performing ones also connect the encoder and decoder through an attention mechanism. Here we propose a new, simple network architecture, the Transformer, which needs no recurrence or convolution and relies only on attention. Experiments show the model performs very well, with higher parallelism and shorter training time. (The rest of the abstract reports results on machine translation benchmarks, with absolute and relative scores as well as training cost and time.) We find that the Transformer also generalizes well to other tasks and performs well on both large and small datasets.

The rough structure of the abstract is: the field's dominant approach + an overview of our model + model properties + experimental evidence (absolute scores + relative scores + training cost) + an outlook for the model.

1 Introduction

Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [38, 24, 15].

Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden states h_t, as a function of the previous hidden state h_{t-1} and the input for position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples. Recent work has achieved significant improvements in computational efficiency through factorization tricks [21] and conditional computation [32], while also improving model performance in case of the latter. The fundamental constraint of sequential computation, however, remains.
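To make the sequential bottleneck concrete, here is a minimal NumPy sketch of the recurrence described above: each hidden state h_t is a function of h_{t-1} and the input at position t, so the loop over time steps cannot be parallelized within a single sequence. This is my own illustration, not code from the paper; all names and shapes are placeholders.

```python
import numpy as np

def rnn_forward(x, W_xh, W_hh, b):
    """Plain RNN over one sequence; x has shape (T, d_in).

    Each step needs the previous hidden state, so the loop over t
    is inherently sequential -- the constraint the paper points out
    for recurrent models.
    """
    T, _ = x.shape
    d_h = W_hh.shape[0]
    h = np.zeros(d_h)
    states = []
    for t in range(T):                       # cannot be parallelized across t
        h = np.tanh(x[t] @ W_xh + h @ W_hh + b)
        states.append(h)
    return np.stack(states)                  # (T, d_h)

# Illustrative shapes only.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                  # T=6 tokens, d_in=4
h = rnn_forward(x, rng.normal(size=(4, 8)),
                0.1 * rng.normal(size=(8, 8)), np.zeros(8))
print(h.shape)                               # (6, 8)
```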

Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [2, 19]. In all but a few cases [27], however, such attention mechanisms are used in conjunction with a recurrent network.

In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.
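For contrast, here is a minimal sketch of scaled dot-product attention, the building block the Transformer relies on: it computes softmax(Q K^T / sqrt(d_k)) V with dense matrix products, so every output position attends to every input position at once, with no step-by-step loop like the RNN above. Again this is my own NumPy illustration with placeholder shapes, not the paper's multi-head implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (T_q, d_k), K: (T_k, d_k), V: (T_k, d_v).
    All positions are handled by two matrix products, so the whole
    sequence is processed in parallel.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (T_q, T_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (T_q, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8))
K = rng.normal(size=(10, 8))
V = rng.normal(size=(10, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (6, 16)
```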

RNNs, and LSTM and GRU in particular, have become the state-of-the-art approaches for sequence modeling tasks such as language modeling and machine translation.

RNN sequence models: poor parallelism and slow training; for long sequences, earlier information is badly lost. Despite remedial work, the problem remains.

Attention mechanisms play an important role in sequence modeling, but most existing instances are used together with an RNN.

Our work is based only on attention; our advantages are... (much the same as in the abstract).

The pattern: the current state of research, what contributions prior work has made, what shortcomings remain, and how our model manages to address those shortcomings.

2 Background

The goal of reducing sequential computation ...

Tags: classic paper series, NLP, Attention