admin管理员组

文章数量:1622541

1. 起因

起因是最近有个小朋友同事,把SIFT边缘响应抑制的阈值设置成了4,导致没有任何一个关键点可以通过筛选。帮她把来龙去脉搞清楚了之后,自然知道了阈值应该如何设置。顺便和大家分享一下,也考虑到这不算什么公司机密,都是二十年前的算法了。

2. 原文引用

在SIFT的文章1中,边缘响应抑制的段落是4.1节。鉴于本身段落比较独立的,先在这里引用如下,后面会拆解分析。

2.a For stability, it is not sufficient to reject keypoints with low contrast. The difference-of-
Gaussian function will have a strong response along edges, even if the location along the
edge is poorly determined and therefore unstable to small amounts of noise.
2.b A poorly defined peak in the difference-of-Gaussian function will have a large principal
curvature across the edge but a small one in the perpendicular direction. The principal curva-
tures can be computed from a 2x2 Hessian matrix, H, computed at the location and scale of
the keypoint:

H = [ D x x D x y D x y D y y ] H=\left[ {\begin{array}{c} {D_{xx} } & {D_{xy} } \\ {D_{xy} } & {D_{yy} } \\ \end{array}} \right] H=[DxxDxyDxyDyy]
The derivatives are estimated by taking differences of neighboring sample points.
2.c The eigenvalues of H are proportional to the principal curvatures of D. Borrowing from
the approach used by Harris and Stephens (1988), we can avoid explicitly computing the
eigenvalues, as we are only concerned with their ratio. Let α be the eigenvalue with the
largest magnitude and β be the smaller one. Then, we can compute the sum of the eigenvalues from the trace of H and their product from the determinant:

T r ( H ) = D x x + D y y = α + β D e t ( H ) = D x x D y y − ( D x y ) 2 = α β Tr\left( {\rm{H}} \right) = D_{xx} + D_{yy} = \alpha + \beta \\ Det\left( {\rm{H}} \right) = D_{xx} D_{yy} - \left( {D_{xy} } \right)^2 = \alpha \beta Tr(H)=Dxx+Dyy=α+βDet(H)=DxxDyy(Dxy)2=αβ
2.d In the unlikely event that the determinant is negative, the curvatures have different signs so the point is discarded as not being an extremum. Let r be the ratio between the largest magnitude eigenvalue and the smaller one, so that α = rβ. Then,
T r ( H ) 2 D e t ( H ) = ( α + β ) 2 α β = ( r α + β ) 2 r β 2 = ( r + 1 ) 2 r \frac{{Tr\left( {\rm{H}} \right)^2 }}{{Det\left( H \right)}} = \frac{{\left( {\alpha + \beta } \right)^2 }}{{\alpha \beta }} = \frac{{\left( {r\alpha + \beta } \right)^2 }}{{r\beta ^2 }} = \frac{{\left( {r + 1} \right)^2 }}{r} Det(H)Tr(H)2=αβ(α+β)2=rβ2(rα+β)2=r(r+1)2
2.e which depends only on the ratio of the eigenvalues rather than their individual values. The
quantity
( r + 1 ) 2 r \frac{{\left( {r + 1} \right)^2 }}{r} r(r+1)2 is at a minimum when the two eigenvalues are equal and it increases with r. Therefore, to check that the ratio of principal curvatures is below some threshold, r, we only need to check

T r ( H ) 2 D e t ( H ) < ( r + 1 ) 2 r \frac{{Tr\left( {\rm{H}} \right)^2 }}{{Det\left( H \right)}} < \frac{{\left( {r + 1} \right)^2 }}{r} Det(H)Tr(H)2<r(r+1)2

2.f This is very efficient to compute, with less than 20 floating point operations required to
test each keypoint. The experiments in this paper use a value of r = 10, which eliminates
keypoints that have a ratio between the principal curvatures greater than 10.

3. 鞍点的排除

首先,我们来看一下原文的这句话:In the unlikely event that the determinant is negative, the curvatures have different signs so the point is discarded as not being an extremum. (我标记为2.d节)。原文的一整段都是在说边缘响应的抑制,但这里的这句话,描述的是鞍点的排除。
在高等数学的二维微积分中,判断一个点是鞍点还是极值点的依据,如下列出:
D=fxx(x0,y0)⋅fyy(x0,y0)−[fxy(x0,y0)]2
如果 D>0 且 fxx(x0,y0)>0,则 (x0,y0) 是局部极小值点。
如果 D>0 且 fxx(x0,y0)<0,则 (x0,y0) 是局部极大值点。
如果 D<0,则 (x0,y0) 是鞍点。
如果 D=0,则需要进一步分析(例如,通过更高阶导数或泰勒展开)来确定临界点的类型。
而D其实就是2节中Hessian矩阵H的行列式(determinant)。

4. 主曲率,边缘和关键点

4.1 什么是主曲率

过曲面上某个点上具有无穷个正交曲率,其中存在一条曲线使得该曲线的曲率为极大,这个曲率为极大值Kmax ,垂直于极大曲率面的曲率为极小值Kmin。这两个曲率属性为主曲率。他们代表着法曲率的极值。
法曲率的定义有点复杂,后续有机会再补吧(TODO),反正大家定性地知道,主曲率是一组曲率(有两个值),方向是互相正交的,一个代表着当前点的极大曲率,一个代表着当前点的极小曲率。

4.2 主曲率和边缘


对于上图这样的边缘,假定它是8位图像。那么主曲率很显然,一个水平方向的0,一个是竖直方向的255。
根据2.c节中的第一句话,主曲率的比值等于H矩阵(Hessian矩阵)特征值的比值,则有(2.d节中)α和β的比值r=255/0(无穷大),或者0/255(接近0),总之会非常远离1,因为1的时候,α和β是相等的。由于我们同时考虑了r=255/0(无穷大),或者0/255(接近0),所以刚才的讨论是兼容旋转的。

4.3 主曲率和关键点

我们知道,SIFT的关键点提取,归根结底,使用的是二维Laplace滤波核,在与图像做相关性。
一维的Laplace滤波核是两边凸起来,那么二维的就是一维的转一圈,是一个圆环。

所以,理论上想要SIFT提取出来的关键点,应该是比较中心对称的,而中心对称的图像,主曲率的两个值是比较接近,也就是2.d节说到的α = rβ中的r接近1.

5. 减小主曲率之比的计算量

据4节中的讨论,Hessian矩阵的特征值的比r,越接近1,则越不是边缘;越远离1,则越是边缘。在实际的运算中,求矩阵特征值的运算量是比较大的,能避免就避免。于是SIFT的作者(确切的说是他们引用的Harris角点的作者)就想了一个办法,找到了一个表达式。这个表达式score在r=1时取极小值,r->0或r->∞时取另一个”极大值”:
s c o r e = ( r + 1 ) 2 r = 2 + r + 1 r , r ∈ [ 0 , ∞ ] score = \frac{{\left( {r + 1} \right)^2 }}{r} = 2 + r + \frac{1}{r},r \in \left[ {0,\infty } \right] score=r(r+1)2=2+r+r1,r[0,]
求导可知,score的值在r=1时,取最小值4,此时完全不是边缘;随着r远离1,开始趋向∞,达到∞时完全是边缘。 所以我们的小朋友同事,写了一个score<Threshold=4才不算边缘,那就是要求比理论最小值更小,自然没有任何一个点能通过测试。
而score的计算量很小,等于Hessian矩阵的迹的平方/行列式。
T r ( H ) 2 D e t ( H ) = ( α + β ) 2 α β = ( r α + β ) 2 r β 2 = ( r + 1 ) 2 r \frac{{Tr\left( {\rm{H}} \right)^2 }}{{Det\left( H \right)}} = \frac{{\left( {\alpha + \beta } \right)^2 }}{{\alpha \beta }} = \frac{{\left( {r\alpha + \beta } \right)^2 }}{{r\beta ^2 }} = \frac{{\left( {r + 1} \right)^2 }}{r} Det(H)Tr(H)2=αβ(α+β)2=rβ2(rα+β)2=r(r+1)2
对于一个2*2矩阵来说,计算量非常小。


  1. Distinctive Image Features from Scale-Invariant Keypoints ↩︎

本文标签: 大白话抑制边缘SIFT