


  • 1. Paper Overview
  • 2. Feature Alignment in Object Detection
  • References

1. Paper Overview

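This paper starts from the misalignment between anchors and features in one-stage detectors. Two-stage detectors do not suffer from it because their second stage applies RoIPooling or RoIAlign: each proposal is first aligned with the features and then regressed and classified a second time. In a one-stage detector such as SSD, by contrast, a small convolution kernel is applied directly to the feature map, so a single point on the feature map has to serve several anchors of different scales and shapes at once. One-stage detectors handle scale mainly through FPN or multiple feature-map levels, placing small RoIs on large (high-resolution) feature maps and large RoIs on small ones; the FPN paper gives a formula specifically for assigning RoIs of different scales to different pyramid levels.

What one-stage detectors do not handle is that a single feature map must also serve anchors of different shapes; in the paper's words, the learned anchors are not aligned with the features. Quite a few recent papers target this alignment problem, for example the feature adaption module in GA-RPN and the learned offsets in DCN, and the paper has a section dedicated to comparing these alignment methods.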
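For reference, the FPN level-assignment rule works like this: an RoI of width w and height h (in pixels) goes to level k = ⌊k0 + log2(√(wh)/224)⌋ with k0 = 4, where 224 is the canonical ImageNet pre-training size. The sketch below is only an illustration; the function name `fpn_level` is hypothetical, and clamping the result to levels 2..5 is an assumption matching the common P2-P5 setup rather than part of the formula itself.

```python
import math

def fpn_level(w: float, h: float, k0: int = 4, canonical: float = 224.0,
              k_min: int = 2, k_max: int = 5) -> int:
    """Pyramid level for an RoI of w x h pixels:
    k = floor(k0 + log2(sqrt(w * h) / 224)).
    The clamp to [k_min, k_max] is an assumption (a common P2-P5 setup)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))

# Small RoIs land on fine levels, large RoIs on coarse levels.
for size in (32, 112, 224, 448):
    print(size, fpn_level(size, size))
```

Halving the RoI side moves it down exactly one level (224 → P4, 112 → P3), which is the "small RoI on the large feature map" behaviour described above.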

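The approach reads like an improvement on RefineDet. The paper derives im2col and RoIAlign mathematically, compares them, finds them very similar, and from this builds RoIConv. After the first step produces learned anchors, the offset is computed as the difference between each learned anchor and the RoI that its feature-map location naturally corresponds to (computed from the stride); a DCN (deformable convolution) then uses this offset to align the learned anchor with the features before regression and classification. Note that the authors present this as RoI alignment achieved inside a one-stage detector. Even if one does not accept it as truly one-stage, it is at least a simplified two-stage network that blurs the line between one-stage and two-stage detectors, which makes it a meaningful contribution.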
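The offset step can be sketched as follows. This is my own illustration under stated assumptions, not the authors' code: boxes are `(x1, y1, x2, y2)` in image pixels, a feature coordinate is pixel / stride, each of the k×k kernel taps is pointed at the centre of the matching bin of the learned anchor, and offsets are returned as `(dy, dx)` pairs in row-major tap order. The function name `roiconv_offsets` is hypothetical, and packing the result into a particular deformable-convolution op's offset tensor is not shown.

```python
from typing import List, Tuple

def roiconv_offsets(i: int, j: int, box: Tuple[float, float, float, float],
                    stride: float, k: int = 3) -> List[Tuple[float, float]]:
    """(dy, dx) offsets, in feature-map units, for the k*k taps of a k x k
    kernel centred at feature location (row i, col j).

    Assumption: tap (u, v) should sample the centre of bin (u, v) of the
    learned anchor `box`, and feature coordinate = pixel / stride."""
    x1, y1, x2, y2 = box
    bin_w = (x2 - x1) / k
    bin_h = (y2 - y1) / k
    half = (k - 1) / 2
    offsets = []
    for u in range(k):          # kernel row
        for v in range(k):      # kernel column
            # Target sample: centre of bin (u, v) of the anchor, in feature coords.
            ty = (y1 + (u + 0.5) * bin_h) / stride
            tx = (x1 + (v + 0.5) * bin_w) / stride
            # Where a regular (im2col) convolution would sample for this tap.
            ry = i + (u - half)
            rx = j + (v - half)
            offsets.append((ty - ry, tx - rx))
    return offsets

# Anchor equal to the 3s x 3s window centred on location (5, 7) with stride 8:
print(roiconv_offsets(5, 7, (44, 28, 68, 52), 8))
# Same centre, but twice as wide:
print(roiconv_offsets(5, 7, (32, 28, 80, 52), 8))
```

Under these assumptions, an anchor that is exactly the 3s × 3s window centred on its location yields nine zero offsets, so the operation degenerates to an ordinary 3×3 convolution; a box twice as wide gives dx of -1, 0, +1 across each kernel row and dy of 0.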

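For an intuition of what "alignment" means here, see kwduan's answer in the discussion "How do you evaluate the object detection model AlignDet?" (如何评价目标检测模型AlignDet?); after reading it the idea becomes considerably clearer.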

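For a walkthrough of the algorithm, there is a good Zhihu article worth consulting, so I will not repeat the details in this post: "2019 AlignDet (one-stage object detection algorithm, mAP=44.1): paper reading notes" (2019 AlignDet(One-stage目标检测算法,mAP=44.1)论文阅读笔记).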

In this work, by discovering the deep connection between im2col [2] and RoIAlign, we propose a
novel RoI Convolution (RoIConv) which performs accurate feature alignment in one-stage detectors
for the first time. Our RoIConv shares the same computation complexity as the vanilla convolution
and can be seamlessly integrated into any existing one-stage detector in a plug-and-play manner.
Based on that, we also propose a Fully Convolutional AlignDet detector which perfectly combines
the flexibility of learned anchors and the preciseness of our RoIConv. Our method enjoys the explicit
alignment of two-stage detectors while retaining the fully convolutional nature and computation cost
of one-stage detectors, getting the best of both worlds.
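The "deep connection between im2col and RoIAlign" can be checked numerically with a small sketch. The assumptions are mine, not the paper's: a single-channel feature map stored as nested lists, feature coordinate = pixel / stride, RoIAlign with k×k bins and one bilinear sample at each bin centre, and zero padding outside the map. The helper names `bilinear`, `im2col_column` and `roialign_centers` are hypothetical. For a box that is exactly the k·s × k·s window centred on a location, RoIAlign reads the same k² values as the im2col column of a regular convolution; only the k² sampling positions differ for other boxes, which is my reading of why RoIConv keeps the cost of a vanilla convolution.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # single-channel feature map, indexed feat[row][col]

def bilinear(feat: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate feat at fractional (y, x); points outside count as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * feat[yy][xx]
    return val

def im2col_column(feat: Grid, i: int, j: int, k: int = 3) -> List[float]:
    """The k*k values a regular k x k convolution reads at (i, j), zero-padded."""
    h, w = len(feat), len(feat[0])
    half = k // 2
    col = []
    for u in range(k):
        for v in range(k):
            r, c = i + u - half, j + v - half
            col.append(feat[r][c] if 0 <= r < h and 0 <= c < w else 0.0)
    return col

def roialign_centers(feat: Grid, box: Tuple[float, float, float, float],
                     stride: float, k: int = 3) -> List[float]:
    """RoIAlign over box=(x1, y1, x2, y2) in pixels: k x k bins with one
    bilinear sample at each bin centre; feature coordinate = pixel / stride."""
    x1, y1, x2, y2 = box
    bw, bh = (x2 - x1) / k, (y2 - y1) / k
    return [bilinear(feat, (y1 + (u + 0.5) * bh) / stride,
                     (x1 + (v + 0.5) * bw) / stride)
            for u in range(k) for v in range(k)]

# 8x8 toy map with distinct values, stride 8, location (i, j) = (3, 4).
feat = [[float(10 * r + c) for c in range(8)] for r in range(8)]
s, i, j, k = 8, 3, 4, 3
# The "default" box: a k*s x k*s window centred at pixel (j*s, i*s).
default_box = (j * s - k * s / 2, i * s - k * s / 2,
               j * s + k * s / 2, i * s + k * s / 2)
print(im2col_column(feat, i, j) == roialign_centers(feat, default_box, s))  # True
```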

2. Feature Alignment in Object Detection

SPP-net [11] is the first work to extract fixed-length features from candidate windows in convolutional networks. RoIPooling [7] improves over SPP-net by enabling end-to-end training. Both SPP-net and RoIPooling round each sub-region to the nearest integer boundary, which incurs quantization errors. To address the quantization error of RoIPooling, RoIAlign [9] uses bilinear interpolation to compute the exact values at sampled locations in each RoI bin, showing significant gains for localization. Deformable RoIPooling [4] adds an offset to each sub-region of RoIPooling, bringing adaptiveness to the region feature. Guided Anchor [24] tries to adapt features to learned anchors with anchor-guided deformable convolutions.
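To make the quantization error concrete, here is a one-axis sketch comparing RoIPooling-style rounded bin boundaries with RoIAlign's exact ones. The rounding scheme (round the RoI edges to whole feature cells, then floor/ceil each bin edge) is a simplified assumption modeled on common RoIPooling implementations, not any specific library; the function names are hypothetical.

```python
import math
from typing import List, Tuple

def roipool_bins(x1: float, x2: float, stride: float, k: int) -> List[Tuple[int, int]]:
    """RoIPooling-style bins along one axis (simplified): round the RoI edges
    to whole feature cells, then floor/ceil each bin edge (half-open ranges)."""
    start = math.floor(x1 / stride + 0.5)   # round half up; x >= 0 assumed
    end = math.floor(x2 / stride + 0.5)
    size = max(end - start, 1) / k
    return [(start + math.floor(p * size), start + math.ceil((p + 1) * size))
            for p in range(k)]

def roialign_bins(x1: float, x2: float, stride: float, k: int) -> List[Tuple[float, float]]:
    """RoIAlign-style bins along one axis: exact fractional edges, no rounding."""
    start = x1 / stride
    size = (x2 - x1) / stride / k
    return [(start + p * size, start + (p + 1) * size) for p in range(k)]

# RoI covering pixels 13..70 on a stride-16 feature map, split into 2 bins.
print(roipool_bins(13, 70, 16, 2))   # [(1, 3), (2, 4)]
print(roialign_bins(13, 70, 16, 2))  # [(0.8125, 2.59375), (2.59375, 4.375)]
```

In this example the rounded version shifts the RoI to feature cells 1 to 4 and produces overlapping bins, while RoIAlign keeps the exact span 0.8125 to 4.375 and splits it evenly; the values inside each exact bin are then read with bilinear interpolation.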


