Adding AIFI to YOLOv8 (replacing the SPPF module with the Attention-based Intrascale Feature Interaction module)


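1. Introduction

1.1 Background

Module name: Attention-based Intrascale Feature Interaction
Paper: RT-DETR: DETRs Beat Yolos on Real-time Object Detection
The original post showed a figure from the paper here. The AIFI module in that figure is what this post borrows to modify YOLOv8.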

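1.2 Other SPPF replacements

How to make the change: YOLOv8 feature pyramid modification (replacing the SPPF module)
Or see this post: yolov8 improvement: SFFP feature pyramid pooling modification (detailed version)
Code for common feature pyramid modules: Implementations of common feature pyramid modules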

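2. Modifications

2.1 AIFI code

Recent versions of YOLOv8 already ship this module, so this post does not show how to add it to YOLOv8.

If you are using an older YOLOv8 codebase, create a new file AIFI.py under the nn module.
The code is as follows: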

import torch
import torch.nn as nn


class TransformerEncoderLayer(nn.Module):
    """Defines a single layer of the transformer encoder."""

    def __init__(self, c1, cm=2048, num_heads=8, dropout=0.0, act=nn.GELU(), normalize_before=False):
        """Initialize the TransformerEncoderLayer with specified parameters."""
        super().__init__()
        self.ma = nn.MultiheadAttention(c1, num_heads, dropout=dropout, batch_first=True)
        # Implementation of Feedforward model
        self.fc1 = nn.Linear(c1, cm)
        self.fc2 = nn.Linear(cm, c1)

        self.norm1 = nn.LayerNorm(c1)
        self.norm2 = nn.LayerNorm(c1)
        self.dropout = nn.Dropout(dropout)
        self.dropout1 = nn.Dropout(dropout)
        self.dropout2 = nn.Dropout(dropout)

        self.act = act
        self.normalize_before = normalize_before

    @staticmethod
    def with_pos_embed(tensor, pos=None):
        """Add position embeddings to the tensor if provided."""
        return tensor if pos is None else tensor + pos

    def forward_post(self, src, src_mask=None, src_key_padding_mask=None, pos=None):
        """Performs forward pass with post-normalization."""
        q = k = self.with_pos_embed(src, pos)
        src2 = self.ma(q, k, value=src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask)[0]
        src = src + self.dropout1(src2)
        src = self.norm1(src)
        src2 = self.fc2(self.dropout(self.act(self.fc1(src))))
        src = src + self.dropout2(src2)
        return self.norm2(src)

    def forward_pre(self, src, src_mask=None, src_key_padding_mask=None, pos=None):
        """Performs forward pass with pre-normalization."""
        src2 = self.norm1(src)
        q = k = self.with_pos_embed(src2, pos)
        src2 = self.ma(q, k, value=src2, attn_mask=src_mask, key_padding_mask=src_key_padding_mask)[0]
        src = src + self.dropout1(src2)
        src2 = self.norm2(src)
        src2 = self.fc2(self.dropout(self.act(self.fc1(src2))))
        return src + self.dropout2(src2)

    def forward(self, src, src_mask=None, src_key_padding_mask=None, pos=None):
        """Forward propagates the input through the encoder module."""
        if self.normalize_before:
            return self.forward_pre(src, src_mask, src_key_padding_mask, pos)
        return self.forward_post(src, src_mask, src_key_padding_mask, pos)


class AIFI(TransformerEncoderLayer):
    """Defines the AIFI transformer layer."""

    def __init__(self, c1, cm=2048, num_heads=8, dropout=0, act=nn.GELU(), normalize_before=False):
        """Initialize the AIFI instance with specified parameters."""
        super().__init__(c1, cm, num_heads, dropout, act, normalize_before)

    def forward(self, x):
        """Forward pass for the AIFI transformer layer."""
        c, h, w = x.shape[1:]
        pos_embed = self.build_2d_sincos_position_embedding(w, h, c)
        # Flatten [B, C, H, W] to [B, HxW, C]
        x = super().forward(x.flatten(2).permute(0, 2, 1), pos=pos_embed.to(device=x.device, dtype=x.dtype))
        return x.permute(0, 2, 1).view([-1, c, h, w]).contiguous()

    @staticmethod
    def build_2d_sincos_position_embedding(w, h, embed_dim=256, temperature=10000.0):
        """Builds 2D sine-cosine position embedding."""
        grid_w = torch.arange(int(w), dtype=torch.float32)
        grid_h = torch.arange(int(h), dtype=torch.float32)
        grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing='ij')
        assert embed_dim % 4 == 0, \
            'Embed dimension must be divisible by 4 for 2D sin-cos position embedding'
        pos_dim = embed_dim // 4
        omega = torch.arange(pos_dim, dtype=torch.float32) / pos_dim
        omega = 1. / (temperature ** omega)

        out_w = grid_w.flatten()[..., None] @ omega[None]
        out_h = grid_h.flatten()[..., None] @ omega[None]

        return torch.cat([torch.sin(out_w), torch.cos(out_w), torch.sin(out_h), torch.cos(out_h)], 1)[None]
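To see what build_2d_sincos_position_embedding produces without installing torch, here is a dependency-free sketch of the same layout in plain Python. The name sincos_2d is made up for illustration and is not part of Ultralytics. It assumes the row order that meshgrid with indexing='ij' gives after flattening, with the width index varying slowest.

```python
import math


def sincos_2d(w, h, embed_dim=256, temperature=10000.0):
    """Return w*h rows of length embed_dim: [sin(w), cos(w), sin(h), cos(h)]."""
    if embed_dim % 4 != 0:
        raise ValueError("embed_dim must be divisible by 4")
    pos_dim = embed_dim // 4
    # Same frequencies as the torch version: 1 / temperature ** (k / pos_dim)
    omega = [1.0 / temperature ** (k / pos_dim) for k in range(pos_dim)]
    rows = []
    for i in range(w):  # width index varies slowest (assumed 'ij' ordering)
        for j in range(h):
            rows.append(
                [math.sin(i * o) for o in omega]
                + [math.cos(i * o) for o in omega]
                + [math.sin(j * o) for o in omega]
                + [math.cos(j * o) for o in omega]
            )
    return rows


emb = sincos_2d(w=20, h=20, embed_dim=256)
print(len(emb), len(emb[0]))  # 400 256
```

Each row is one token position, so a 20x20 feature map yields 400 rows, and each quarter of a row encodes one of the four sine or cosine terms.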

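2.2 tasks.py

Recent YOLOv8 versions already handle this as well, so nothing needs to change.

With older code, find the chain of elif branches in the parse_model function and add the following: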

        elif m is AIFI:
            args = [ch[f], *args]
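The effect of this branch is to prepend the incoming channel count to the arguments from the model YAML, so that [1024, 8] ends up as AIFI(c1, 1024, 8), meaning cm=1024 and num_heads=8. A minimal sketch of that step follows. The helper build_aifi_args is hypothetical, and the channel list is an assumed stand-in for the ch list that parse_model maintains (the values are what the n scale would plausibly give).

```python
def build_aifi_args(ch, f, args):
    """Mimic the AIFI branch: prepend the input channels ch[f] to the YAML args."""
    return [ch[f], *args]


# Assumed channel history for illustration only; f=-1 means "previous layer".
ch = [16, 32, 32, 64, 64, 128, 128, 256, 256]
args = build_aifi_args(ch, f=-1, args=[1024, 8])
c1, cm, num_heads = args
print(args)  # [256, 1024, 8]
```

Because c1 comes from the previous layer rather than from the YAML, the same model file works across scales without editing the AIFI line.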

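The old version, shown as a screenshot in the original post, has no AIFI branch.

2.3 Model modification

Copy yolov8.yaml to a new file named yolov8-AIFI.yaml and replace the SPPF module with AIFI, as shown below.
The SPPF line becomes: - [-1, 1, AIFI, [1024, 8]] # 9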

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, AIFI, [1024, 8]]  # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)
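AIFI takes its embedding width from the channels of the preceding layer. The code in section 2.1 requires that width to be divisible by 4 for the position embedding, and multi-head attention requires it to be divisible by num_heads. The sketch below estimates the channel count entering layer 9 for each scale and the number of tokens at P5. It assumes the channel rule make_divisible(min(c, max_channels) * width, 8) and a 640x640 input; both are assumptions rather than something stated in this post, and the helper names are made up for illustration.

```python
import math

# (depth, width, max_channels) copied from the scales section of the YAML
SCALES = {
    "n": (0.33, 0.25, 1024),
    "s": (0.33, 0.50, 1024),
    "m": (0.67, 0.75, 768),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.25, 512),
}


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def aifi_channels(scale, c=1024):
    """Channels of the C2f layer feeding AIFI under the assumed scaling rule."""
    _, width, max_channels = SCALES[scale]
    return make_divisible(min(c, max_channels) * width, 8)


def p5_tokens(imgsz=640, stride=32):
    """Number of tokens AIFI attends over on the P5 feature map."""
    side = imgsz // stride
    return side * side


for name in SCALES:
    c1 = aifi_channels(name)
    # Both divisibility conditions must hold for AIFI with num_heads=8
    print(name, c1, c1 % 4 == 0 and c1 % 8 == 0)
print(p5_tokens())  # 400
```

Under these assumptions every scale passes both divisibility checks, and attention runs over only 400 tokens, which is why placing AIFI on the P5 level keeps the cost modest.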

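3. Run results

The modified model builds and runs without errors; the original post showed a screenshot of the run here.

Note: this module needs a fairly recent torch version. The code in section 2.1 uses batch_first=True in nn.MultiheadAttention and the indexing argument of torch.meshgrid, neither of which exists in older releases.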
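One way to fail early on an old installation is to compare the installed version against a minimum before building the model. The sketch below parses a version string by hand. The helper names and the 1.10 threshold are assumptions for illustration (1.10 is the release in which torch.meshgrid gained the indexing argument used in section 2.1). In a real script the string would come from torch.__version__.

```python
import re


def version_tuple(version):
    """Turn '1.13.1+cu117' into (1, 13, 1); build suffixes are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def supports_aifi(version, minimum=(1, 10, 0)):
    """Hypothetical check: is this torch version at least the assumed minimum?"""
    return version_tuple(version) >= minimum


print(supports_aifi("1.8.1"))        # False
print(supports_aifi("2.1.0+cu118"))  # True
```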

