HOTR: End-to-End Human-Object Interaction Detection with Transformers


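[Figure: validation results of the model on V-COCO, Scenario 1]

[Figure: validation results of the model on V-COCO, Scenario 2]

[Figure: validation results of the model on HICO-DET]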

The HOTR model architecture is shown below:
[Figure: HOTR model architecture]
How is this implemented in code?

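In the Backbone:
(1) The image ([bs,3,H,W]) is fed into a CNN (ResNet50) for feature extraction, producing the feature map src ([bs,2048,h,w]).
(2) A positional encoding pos_embed ([bs,256,h,w]) and query embeddings query_embed ([100,256]) are introduced.
Before entering the Transformer, src is projected down to 256 channels ([bs,256,h,w]).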
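The later code accesses `self.detr.class_embed`, which suggests HOTR wraps a DETR model, so this sketch follows DETR's conventions as an assumption: a 1x1 convolution (called `input_proj` in DETR) does the 2048 to 256 channel reduction, `query_embed` is an `nn.Embedding(100, 256)` whose weight is the [100,256] tensor, and `pos_embed` is a sine positional encoding. `sine_pos_embed` is a hypothetical, simplified helper (no padding mask, no coordinate normalization). Random tensors stand in for real ResNet50 features, so only the shapes are meaningful.

```python
import torch
import torch.nn as nn


def sine_pos_embed(bs, h, w, num_pos_feats=128, temperature=10000.0):
    """Hypothetical, simplified DETR-style sine encoding (no mask, no normalization).

    Returns [bs, 2 * num_pos_feats, h, w]: the first half encodes y, the second half x.
    """
    y = torch.arange(1, h + 1, dtype=torch.float32)[:, None].expand(h, w)  # row index
    x = torch.arange(1, w + 1, dtype=torch.float32)[None, :].expand(h, w)  # column index
    i = torch.arange(num_pos_feats)
    dim_t = temperature ** (2 * torch.div(i, 2, rounding_mode="floor") / num_pos_feats)
    pos_x = x[..., None] / dim_t  # [h, w, num_pos_feats]
    pos_y = y[..., None] / dim_t
    # Interleave sin (even channels) and cos (odd channels).
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=-1).flatten(-2)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=-1).flatten(-2)
    pos = torch.cat((pos_y, pos_x), dim=-1).permute(2, 0, 1)  # [2 * num_pos_feats, h, w]
    return pos.unsqueeze(0).expand(bs, -1, -1, -1)


bs, h, w = 2, 25, 34                 # toy sizes; h, w are the downsampled feature-map dims
hidden_dim, num_queries = 256, 100

src = torch.randn(bs, 2048, h, w)                        # stand-in for the ResNet50 output
input_proj = nn.Conv2d(2048, hidden_dim, kernel_size=1)  # 1x1 conv: 2048 -> 256 channels
query_embed = nn.Embedding(num_queries, hidden_dim)      # learned object queries

src_proj = input_proj(src)           # [bs, 256, h, w]
pos_embed = sine_pos_embed(bs, h, w) # [bs, 256, h, w]
queries = query_embed.weight         # [100, 256]
```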
Entering the Transformer:
(1) Encoder:
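A. First, the inputs are reshaped: src and pos_embed are flattened over the spatial dimensions and permuted, and query_embed is repeated along a new batch dimension:
src: [bs,256,h,w] → [hw,bs,256],
pos_embed: [bs,256,h,w] → [hw,bs,256],
query_embed: [100,256] → [100,bs,256].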
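Step A can be written with standard tensor ops. This is a sketch assuming the DETR-style sequence-first layout [length, batch, channels], which is also what `nn.MultiheadAttention` expects by default; random tensors stand in for the real inputs.

```python
import torch

bs, h, w, d, nq = 2, 25, 34, 256, 100
src = torch.randn(bs, d, h, w)          # projected feature map
pos_embed = torch.randn(bs, d, h, w)    # positional encoding
query_embed = torch.randn(nq, d)        # e.g. nn.Embedding(100, 256).weight

# [bs, 256, h, w] -> [bs, 256, hw] -> [hw, bs, 256]: one token per pixel, row-major order
src_seq = src.flatten(2).permute(2, 0, 1)
pos_seq = pos_embed.flatten(2).permute(2, 0, 1)

# [100, 256] -> [100, 1, 256] -> [100, bs, 256]: the same queries are shared by every image
query_seq = query_embed.unsqueeze(1).repeat(1, bs, 1)
```

Pixel (i, j) of image b ends up at token index `i * w + j`, batch slot `b`.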

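B. src and pos_embed (together with the padding mask) are fed into the Encoder, producing memory: [hw,bs,256]. query_embed is not used by the encoder; it is consumed by the Decoder.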
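A minimal sketch of a DETR-style encoder, assuming post-norm layers, 8 attention heads, a 2048-wide feed-forward block and 6 layers (DETR's defaults as far as I know; HOTR's exact configuration is not shown here). `EncoderLayer` is a hypothetical name. Positions are added to the queries and keys of self-attention but not to the values, and the boolean `key_padding_mask` marks padded pixels (none in this toy batch).

```python
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    """Hypothetical post-norm encoder layer; pos is added to queries/keys only."""

    def __init__(self, d_model=256, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)  # [L, bs, d] layout
        self.linear1 = nn.Linear(d_model, dim_ff)
        self.linear2 = nn.Linear(dim_ff, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, pos, key_padding_mask=None):
        q = k = src + pos
        attn_out, _ = self.self_attn(q, k, value=src, key_padding_mask=key_padding_mask)
        src = self.norm1(src + self.dropout(attn_out))
        ff = self.linear2(self.dropout(torch.relu(self.linear1(src))))
        return self.norm2(src + self.dropout(ff))


hw, bs, d = 25 * 34, 2, 256
src = torch.randn(hw, bs, d)                   # flattened feature map
pos_embed = torch.randn(hw, bs, d)             # flattened positional encoding
mask = torch.zeros(bs, hw, dtype=torch.bool)   # True would mark a padded pixel

encoder = nn.ModuleList([EncoderLayer(d) for _ in range(6)])
memory = src
for layer in encoder:
    memory = layer(memory, pos_embed, key_padding_mask=mask)  # stays [hw, bs, 256]
```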

(2) Decoder:
First, a new all-zero tensor tgt is created with the same shape as query_embed ([100,bs,256]).
tgt, memory, pos_embed and query_embed are then fed into the Decoder, producing hs: [6,bs,100,256].

hs has shape [6,bs,100,256] because the Transformer stacks the outputs of all 6 decoder layers (each a Tensor of shape [bs,100,256]) into a single Tensor.
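The decoder pass and the stacking of per-layer outputs can be sketched as follows, again assuming DETR-style post-norm layers (hypothetical `DecoderLayer`, 8 heads, 2048-wide FFN) and a shared final LayerNorm applied to every layer's output. `query_embed` acts as a positional term for the queries, `tgt` starts at zero, and the six intermediate outputs of shape [100, bs, 256] are stacked and then transposed into hs.

```python
import torch
import torch.nn as nn


class DecoderLayer(nn.Module):
    """Hypothetical post-norm layer: query self-attention, then cross-attention into memory."""

    def __init__(self, d_model=256, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.linear1 = nn.Linear(d_model, dim_ff)
        self.linear2 = nn.Linear(dim_ff, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt, memory, pos, query_pos):
        q = k = tgt + query_pos                                    # queries carry query_embed
        x, _ = self.self_attn(q, k, value=tgt)
        tgt = self.norm1(tgt + self.dropout(x))
        x, _ = self.cross_attn(tgt + query_pos, memory + pos, value=memory)  # attend to pixels
        tgt = self.norm2(tgt + self.dropout(x))
        x = self.linear2(self.dropout(torch.relu(self.linear1(tgt))))
        return self.norm3(tgt + self.dropout(x))


hw, bs, nq, d = 25 * 34, 2, 100, 256
memory = torch.randn(hw, bs, d)        # encoder output
pos_embed = torch.randn(hw, bs, d)
query_embed = torch.randn(nq, bs, d)   # queries already repeated over the batch
tgt = torch.zeros_like(query_embed)    # decoder input starts as all zeros

layers = nn.ModuleList([DecoderLayer(d) for _ in range(6)])
final_norm = nn.LayerNorm(d)
out, intermediate = tgt, []
for layer in layers:
    out = layer(out, memory, pos_embed, query_embed)
    intermediate.append(final_norm(out))   # keep every layer's output, each [100, bs, 256]

hs = torch.stack(intermediate)         # [6, 100, bs, 256]
hs = hs.transpose(1, 2)                # [6, bs, 100, 256]
```

In DETR, keeping all six outputs is what allows auxiliary losses on every decoder layer during training; hs[-1] is the final layer's result.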

4. Instance representation: performing object detection

inst_repr = F.normalize(hs[-1], p=2, dim=2)  # L2-normalize the last decoder layer's output to get the instance representations
outputs_class = self.detr.class_embed(hs)  # [6,bs,100,92]; class_embed is nn.Linear(256, num_classes + 1)
outputs_coord = self.detr.bbox_embed(hs).sigmoid()  # [6,bs,100,4]; bbox_embed is MLP(256, 256, 4, 3)

Where:
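The detection heads above can be reproduced in a self-contained form. This sketch assumes `MLP` behaves like DETR's helper (Linear layers with ReLU between them and no activation after the last) and that `num_classes = 91` (DETR's COCO setting, which yields the 92 logits above; the +1 is the "no object" class). A random `hs` stands in for the decoder output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Stack of Linear layers with ReLU between them (no activation after the last)."""

    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        in_dims = [input_dim] + [hidden_dim] * (num_layers - 1)
        out_dims = [hidden_dim] * (num_layers - 1) + [output_dim]
        self.layers = nn.ModuleList([nn.Linear(i, o) for i, o in zip(in_dims, out_dims)])

    def forward(self, x):
        for idx, layer in enumerate(self.layers):
            x = layer(x)
            if idx < len(self.layers) - 1:
                x = F.relu(x)
        return x


num_classes = 91                      # assumed COCO setting; +1 "no object" gives 92 logits
hs = torch.randn(6, 2, 100, 256)      # stand-in decoder output [6, bs, 100, 256]

class_embed = nn.Linear(256, num_classes + 1)
bbox_embed = MLP(256, 256, 4, 3)

inst_repr = F.normalize(hs[-1], p=2, dim=2)   # [bs, 100, 256], each query vector has unit L2 norm
outputs_class = class_embed(hs)                # [6, bs, 100, 92] class logits per decoder layer
outputs_coord = bbox_embed(hs).sigmoid()       # [6, bs, 100, 4] box coordinates squashed into (0, 1)
```

Because every row of inst_repr has unit length, the dot product of two instance representations equals their cosine similarity. `nn.Linear` and the MLP act on the last dimension only, so the heads are applied to all 6 decoder layers at once.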
