Paper Reading - Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation

Abstract

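We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. A separately trained interaction module converts user interactions into an object mask. Our propagation module then propagates this mask temporally, using a novel top-k filtering strategy when reading the space-time memory. To effectively take the user's intent into account, we propose a novel difference-aware module that learns how to properly fuse the masks before and after each interaction. These masks are aligned with the target frames using the space-time memory. We evaluate our method qualitatively and quantitatively on DAVIS with different forms of user interaction (e.g., scribbles, clicks). The results show that it outperforms current state-of-the-art algorithms while requiring fewer frame interactions, and that it generalizes better to different types of user interaction. We contribute a large-scale synthetic VOS dataset with 4.8M frames of pixel-accurate segmentation to accompany our source code and facilitate future research.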

Introduction

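Interactive VOS (iVOS)

Characteristics: interactive VOS methods take user interactions (e.g., scribbles or clicks) as input. The user can iteratively refine the results until satisfied.

It involves two tasks: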

interaction understanding
temporal propagation

Existing Problem

(1) The strong coupling limits the form of user interaction (e.g., scribbles only) and makes training difficult. Previous attempts to decouple the two tasks fail to reach state-of-the-art accuracy, because the user's intent cannot be adequately taken into account in the propagation process.

(2) Naive decoupling may lead to loss of the user's intent, because the original interaction is no longer available in the propagation stage.

Solution

We present a decoupled modular framework to address the iVOS problem.

Contributions

We innovate on the decoupled interaction-propagation framework and show that this approach is simple, effective, and generalizable.
We propose a novel lightweight top-k filtering scheme for the attention-based memory read operation that generates masks during propagation.
We propose a novel difference-aware fusion module that faithfully captures the user's intent. It improves iVOS accuracy and reduces the amount of user interaction.
We contribute a large-scale synthetic VOS dataset with 4.8M frames to accompany our source code and facilitate future research.

Related Work

Figure: progress in iVOS.

Semi-Supervised Video Object Segmentation

Definition: segment a specific object throughout a video, given only a fully annotated mask in the first frame.

Interactive Video Object Segmentation (iVOS)

Focus:

(1) scribble interaction

(2) click interaction

Interactive Image Segmentation

Method

Initial Work

Initially, the user selects and interactively annotates one frame (e.g., using scribbles or clicks) to produce a mask.

MiNet Overview

Notation

(1) The current interaction round is denoted by $r$.

(2) The index of the user-interacted frame in the $r$-th round is $t_r$.

(3) The mask results of the $r$-th round are $M_r$.

(4) The mask of the individual $j$-th frame is denoted as $M_r^j$.

Core Components

Interaction-to-mask: lets the user get real-time feedback and reach a satisfactory result on a single frame.

Mask propagation: propagates the corrected mask bidirectionally (forward and backward in time).

Difference-aware fusion: combines the two mask sequences from before and after the interaction while avoiding possible decay or loss of the user's intent.

How the user's intent is captured: through the difference between the selected frame's mask before and after the user interaction.
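The following sketch shows how the three components connect in a single round $r$. It is a minimal illustration under stated assumptions and is not the MiVOS code:

- `run_round`, `interaction_to_mask`, `propagate` and `fuse` are hypothetical names.
- Masks are flattened lists of probabilities.
- The interaction module is reduced to a function of the previous mask. A real module would also take the image and the user's scribbles or clicks.
- Propagation simply runs to both ends of the video.

```python
from typing import Callable, List

Mask = List[float]  # hypothetical: one flattened probability mask per frame


def run_round(
    prev_masks: List[Mask],
    t_r: int,
    interaction_to_mask: Callable[[Mask], Mask],
    propagate: Callable[[Mask, int, int], Mask],
    fuse: Callable[[Mask, Mask, Mask, Mask], Mask],
) -> List[Mask]:
    """One interaction round r: prev_masks holds M_{r-1}, the return value holds M_r."""
    before = prev_masks[t_r]             # M_{r-1}^{t_r}: mask before the interaction
    after = interaction_to_mask(before)  # M_r^{t_r}: corrected mask (simplified input)
    new_masks = list(prev_masks)
    new_masks[t_r] = after
    forward = range(t_r + 1, len(prev_masks))  # frames after t_r
    backward = range(t_r - 1, -1, -1)          # frames before t_r, moving away from it
    for j in list(forward) + list(backward):
        propagated = propagate(after, t_r, j)  # propagation only sees the corrected mask
        # Fusion sees the old result, the new propagation and the before/after pair at t_r.
        new_masks[j] = fuse(prev_masks[j], propagated, before, after)
    return new_masks


# Toy usage with stub modules (all hypothetical).
visited: List[int] = []


def toy_propagate(mask: Mask, src: int, dst: int) -> Mask:
    visited.append(dst)  # record the order in which frames are reached
    return list(mask)


result = run_round(
    [[0.0] for _ in range(4)],
    t_r=1,
    interaction_to_mask=lambda m: [1.0],
    propagate=toy_propagate,
    fuse=lambda prev, new, before, after: new,
)
```

`propagate` only receives the corrected mask, so any interaction module that outputs a mask can be plugged in. `fuse` is the only place where the before/after pair at $t_r$ is used again. That is where naive decoupling would otherwise lose the user's intent.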

Interaction-to-Mask

Scribble-to-Mask (S2M)

Goal: produce a single-image segmentation in real time from the input scribbles.

Backbone: DeepLabV3+ semantic segmentation network.

Local Control

Previous state-of-the-art approach: it may harm the global result when only a local fine adjustment is needed toward the end of the segmentation process.

Source of the previous state-of-the-art approach:

Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, and Anton Konushin. f-BRS: Rethinking backpropagating refinement for interactive segmentation. In CVPR, 2020.

Our approach: local control is easy to enforce by limiting the interactive algorithm to a user-specified region.

Figure: comparison of the two approaches above.
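Below is a minimal sketch of the local-control rule. It assumes the user-specified region is an axis-aligned box given as (top, left, bottom, right), with exclusive bottom/right bounds. Both the function name `apply_local_control` and this box format are hypothetical. New predictions are accepted only inside the region, so pixels outside it cannot change.

```python
from typing import List, Tuple

Grid = List[List[float]]


def apply_local_control(old_mask: Grid, new_mask: Grid,
                        region: Tuple[int, int, int, int]) -> Grid:
    """Accept new_mask only inside region = (top, left, bottom, right);
    bottom/right are exclusive, and every pixel outside keeps its old value."""
    top, left, bottom, right = region
    out = [row[:] for row in old_mask]  # copy so the caller's mask is untouched
    for y in range(max(top, 0), min(bottom, len(old_mask))):
        for x in range(max(left, 0), min(right, len(old_mask[y]))):
            out[y][x] = new_mask[y][x]
    return out


# Toy usage: the new prediction only takes effect in the top-left 2x2 box.
old = [[0.0] * 3 for _ in range(3)]
new = [[1.0] * 3 for _ in range(3)]
merged = apply_local_control(old, new, (0, 0, 2, 2))
```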

Temporal Propagation

Goal: track the object and produce the corresponding masks in subsequent frames.

Memory Read with Top-k Filtering

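(1) Compute the affinity

$F \in \mathbb{R}^{THW \times HW}$ represents the affinity between each memory position and each query position. Here $T$ is the number of memory frames and $H \times W$ is the feature resolution.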

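(2) Filter the affinities so that only the top-k entries for each query position are kept. Normalize them with a softmax to obtain the weights $W$: $W_{ij} = \frac{\exp(F_{ij})}{\sum_{p \in \mathcal{T}_j} \exp(F_{pj})}$ if $i \in \mathcal{T}_j$, and $W_{ij} = 0$ otherwise. Here $\mathcal{T}_j$ is the set of top-k memory positions for query position $j$.

Effect: effectively removes noise regardless of the sequence length.

Advantage: increases robustness and overcomes the overhead of the top-k operation.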

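(3) For query position $j$, the feature $m_j$ is read from memory as the weighted sum of the memory values: $m_j = \sum_i W_{ij} v^M_i$

(4) Concatenate the read features with $v^Q$, the value features of the query frame.

Figure: the memory read process.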
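The four steps can be sketched in plain Python on tiny inputs. This is an illustration under assumptions, not the authors' implementation:

- The affinity is assumed to be a dot product between memory keys $k^M$ and query keys $k^Q$, as in space-time memory networks.
- The function names `affinity`, `topk_softmax` and `memory_read` are hypothetical.
- A practical implementation would use batched tensor operations on the GPU.

```python
import math
from typing import List

Vec = List[float]
Matrix = List[List[float]]


def affinity(mem_keys: List[Vec], query_keys: List[Vec]) -> Matrix:
    """Step (1): F[i][j] = <k^M_i, k^Q_j>. Rows are the THW memory positions,
    columns the HW query positions (dot-product similarity is an assumption)."""
    return [[sum(a * b for a, b in zip(km, kq)) for kq in query_keys] for km in mem_keys]


def topk_softmax(F: Matrix, k: int) -> Matrix:
    """Step (2): per query column, keep the k largest affinities and softmax them.
    Every other entry of W is exactly zero."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n_mem = len(F)
    n_query = len(F[0]) if F else 0
    W = [[0.0] * n_query for _ in range(n_mem)]
    for j in range(n_query):
        column = [F[i][j] for i in range(n_mem)]
        top = sorted(range(n_mem), key=lambda i: column[i], reverse=True)[:k]
        peak = column[top[0]]  # largest kept value, subtracted for numerical stability
        exps = {i: math.exp(column[i] - peak) for i in top}
        total = sum(exps.values())
        for i, e in exps.items():
            W[i][j] = e / total
    return W


def memory_read(W: Matrix, mem_values: List[Vec], query_values: List[Vec]) -> List[Vec]:
    """Steps (3)-(4): m_j = sum_i W[i][j] * v^M_i, then concatenate [m_j ; v^Q_j]."""
    dim = len(mem_values[0]) if mem_values else 0
    out = []
    for j, vq in enumerate(query_values):
        m_j = [sum(W[i][j] * v[c] for i, v in enumerate(mem_values)) for c in range(dim)]
        out.append(m_j + list(vq))  # channel-wise concatenation
    return out


# Toy usage: three memory positions, one query position.
F = affinity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[2.0, 3.0]])
W = topk_softmax(F, k=2)  # memory position 0 (lowest affinity) is filtered out
features = memory_read(W, [[4.0], [8.0], [0.0]], [[9.0]])
```

With `k=2`, memory position 0 receives exactly zero weight. Positions outside the top-k therefore add nothing to $m_j$, however many of them the memory accumulates.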

Propagation strategy

Figure: our propagation scheme.

Difference-Aware Fusion

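(1) Compute the positive and negative changes separately as two masks $D^+$ and $D^-$: $D^+ = (M_r^{t_r} - M_{r-1}^{t_r})_+$ and $D^- = (M_{r-1}^{t_r} - M_r^{t_r})_+$

Note: $(\cdot)_+$ denotes $\max(\cdot, 0)$.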

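(2) Compute the aligned masks $A^+_j = \sum_i W_{ij} D^+_i$ and $A^-_j = \sum_i W_{ij} D^-_i$. This is a memory read with the interacted frame $t_r$ as memory and the target frame as query.

Note: $W$ comes from step (2) of Memory Read with Top-k Filtering.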

(3) Feed these features into a simple five-layer residual network, terminated by a sigmoid, which outputs the final fused mask.

Figure: mechanism of the difference-aware fusion module.
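Steps (1) and (2) of the fusion module can be sketched on flattened masks as follows. This rests on three assumptions:

- The names `difference_masks` and `align` are hypothetical.
- $W$ comes from a memory read with the interacted frame $t_r$ as memory and the target frame as query.
- The five-layer residual fusion network of step (3) is not sketched.

```python
from typing import List, Tuple

Vec = List[float]
Matrix = List[List[float]]


def difference_masks(before: Vec, after: Vec) -> Tuple[Vec, Vec]:
    """Step (1): D+ = (M_r^{t_r} - M_{r-1}^{t_r})_+ and D- = (M_{r-1}^{t_r} - M_r^{t_r})_+,
    computed per pixel on flattened masks of the interacted frame t_r."""
    d_pos = [max(a - b, 0.0) for b, a in zip(before, after)]
    d_neg = [max(b - a, 0.0) for b, a in zip(before, after)]
    return d_pos, d_neg


def align(W: Matrix, D: Vec) -> Vec:
    """Step (2): A_j = sum_i W[i][j] * D_i. Rows of W index pixels of t_r,
    columns index pixels of the target frame."""
    n_query = len(W[0]) if W else 0
    return [sum(W[i][j] * D[i] for i in range(len(D))) for j in range(n_query)]


# Toy usage: the user adds pixel 0 to the mask of frame t_r.
d_pos, d_neg = difference_masks(before=[0.0, 1.0], after=[1.0, 1.0])
W = [[0.0, 1.0], [1.0, 0.0]]  # toy weights: pixel 0 of t_r matches pixel 1 of the target
a_pos, a_neg = align(W, d_pos), align(W, d_neg)
```

Splitting the change into $D^+$ and $D^-$ keeps additions and removals as separate inputs to the fusion network. A single signed difference would mix the two.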

Experiment

Figure: performance on the DAVIS interactive validation set.

Conclusion

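We propose MiVOS, a novel decoupled approach consisting of three modules: Interaction-to-Mask, Propagation and Difference-Aware Fusion. Because interaction is decoupled from propagation, MiVOS is versatile and not limited by the type of interaction. The proposed fusion module reconciles interaction and propagation by faithfully capturing the user's intent. It also mitigates the information lost in the decoupling process, making MiVOS both accurate and efficient. We hope that MiVOS can inspire future research in iVOS.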
