SLAM-Visual Navigation学习之SIFT算法与代码详解|电子爱好者

admin管理员组
文章数量:1622542

SIFT算法

文章目录

- SIFT算法
一、特征点，关键点，角点？
二、前置知识
- 1.尺度
- 2.卷积
- 3.高斯函数
- 4.高斯卷积（模糊）
三、SIFT算法的引入
- Harris算法缺陷：
- - 1.尺度问题
  - 2.旋转问题
- SIFT引入：
四、SIFT算法步骤
- 1. Scale-space Extrema Detection——尺度空间极值检测
- 2. Keypoint Localization——关键点定位
- 3. Orientation Assignment——确定主方向
- 4. Keypoint Descriptor——生成描述符
- 5. Keypoint Matching——描述符匹配
五、尺度空间极值检测
- 1. 图像初始化
- 2. 计算高斯金字塔层数
- 3. 生成高斯核
- 4. 生成高斯金字塔
- 5. 生成高斯差分金字塔
- 6. 极值检测
- 7.小结
六、关键点定位与舍弃
- 1. 关键点定位
- 2. 关键点舍弃
- - 1）去除对比度低的点
  - 2）边缘效应
- 3. 小结
七、确定主方向
八、生成描述符
- 1. 前置知识
- - 1）线性插值
  - 2）双线性插值
  - 3）三线性插值
  - 4）插值的应用
- 2. 生成描述符
- - 1）旋转至主方向
  - 2）生成描述符
  - 3）限制
- 3.小结
总结
代码
参考资料

一、特征点，关键点，角点？

特征点：图像中具有某些明显特征的点。一个特征可能是一个像素点，也可能一个特征占据了多个像素点
关键点：图像中具有某些明显特征的点，多指一个像素点
角点：同关键点

这里可能与别处表达有些不同，此处只是想区分一下特征点有时并不等于关键点和角点。

二、前置知识

1.尺度

尺度：用来模拟观察者距离物体的远近的程度，可以大致理解为距离物体远近的距离。
尺度空间：尺度空间就是在多个尺度下观察目标，指的是图像的模糊程度（于不同距离观察物体会有模糊之分）

2.卷积

假设图像为灰度图，仅有灰度值，左侧表格为原始图像中各个像素点的灰度值，中间表格为卷积核，右侧表格为卷积过后的图像。卷积即卷积核与原始图像对应位置相乘再相加，得到新的值。例如左上角的3x3与卷积核进行相乘得到数值139，同理其他也如此。

3.高斯函数

即正态分布函数
一维情况

二维情况

分布情况由中心向外侧分散

4.高斯卷积（模糊）

高斯卷积即用高斯核进行图像卷积操作
高斯核

对图像进行高斯卷积后，图像会变得更加平滑。

解释：卷积前的像素点，在卷积时要根据其中心开始周围的几x几的点进行权值求和，故每一个点的特征都与其周围的点的特征相关，也就是说该点的像素值与周围点的像素值有关，从而使得图像中像素点的变化不那么跳动。

三、SIFT算法的引入

我们知道Harris角点检测算法是用来提取图像中的角点，即图像中相对周围变化比较明显的像素点。假设在相同尺度（相同分辨率下），相同旋转方向（两幅图图像未进行旋转）的情况下，我们利用Harris角点检测算法来探测出两幅图像中的所有角点，再进行比对，即可确定两幅图中角点的匹配，从而达到图像匹配的效果。

Harris算法缺陷：

1.尺度问题

但事实上，在进行图像匹配时，会出现两幅图间尺度（分辨率）不匹配的情况，如下图所示：

在对左侧小图像进行探测时，我们采用了小窗口进行Harris角点检测，从而能够提取中其中的角点，但当我们对右侧放大过后（分辨率不同）的图像进行同样窗口的Harris角点检测时，会发现其不能成功提取到角点。Harris在进行角点检测时，其窗口大小一定的固定的，若根据分辨率进行变动，太过于繁琐。

2.旋转问题

若对两幅图像其中之一进行旋转，则其方向已发生改变，根本无法进行对比，也过于繁琐。

SIFT引入：

因此，2004 年，不列颠哥伦比亚大学的 D.Lowe 在论文Distinctive Image Features from Scale-Invariant Keypoints中提出了一种新算法，即SIFT (Scale-Invariant Feature Transform) ，算法提取关键点并计算其描述符，通过对比描述符来实现图像的匹配，并且具有尺度不变性与旋转不变性。

四、SIFT算法步骤

1. Scale-space Extrema Detection——尺度空间极值检测

2. Keypoint Localization——关键点定位

3. Orientation Assignment——确定主方向

4. Keypoint Descriptor——生成描述符

5. Keypoint Matching——描述符匹配

五、尺度空间极值检测

1. 图像初始化

将图像进行上采样（分辨率扩大一倍），进行一次高斯模糊形成初始图像，作为后续处理的第一幅图

2. 计算高斯金字塔层数

根据图像的分辨率进行计算每一组该生成多少层图像，从而形成金字塔。
将图像的长和宽分别记作m和n，故此图像的分辨率为m*n，则该金字塔的层数N为：

为什么－2的原因后续再谈

3. 生成高斯核

由于要形成不同尺度以及不同模糊程度的图像，所以我们需要不同的高斯核。

可以看出从初始的sigma到后续的s*sigma等，从而形成不同层不同模糊程度的图像。

4. 生成高斯金字塔

一个图像的尺度空间L(x,y,σ)，定义为原始图像I(x,y)与一个可变尺度的2维高斯函数G(x,y,σ)卷积运算，即：

σ是尺度空间因子，它决定了图像的模糊的程度

中间带有色彩的图像为原始图像，进行图像初始化后，对其进行高斯模糊。
由图所示，每一行代表每一组，其中每一列代表一层，从左到右每一层图像模糊程度递增，选取此组倒数第三层图像下采样后的图像，作为下一组的初始图像，在进行高斯模糊，重复此操作，从而生成不同尺度，不同模糊程度的高斯金字塔

5. 生成高斯差分金字塔

本应该使用LOG算子进行高斯金字塔的后续处理，但其处理复杂，DOG算子可以进行替代，在公式推导过后，DOG与LOG算子是非常近似的，推导见大神写的此文https://zhuanlan.zhihu/p/442484545

高斯差分金字塔的形成即高斯金字塔的不同层进行作差操作。

注意：因为进行了作差操作，所以高斯差分金字塔的层数相比高斯金字塔少1层

6. 极值检测

SIFT算法中初步判断该像素点是否为关键点的方法，即将此点与其所在层和上下两层的3x3区域的26点进行对比，若此点像素值大于或小于其他点，则认定此点为关键点。通过遍历不同层与不同位置，从而初步找到所有的关键点。

提取后形成如下带有关键点信息的图像

注意：每一个关键点的比较必须与当前层和上下两层进行比较，所以第一层与最后一层无法拥有关键点，故此后形成的带有初始关键点信息的图像的高斯差分金字塔，会比之前少2层。所以综上，高斯金字塔必须至少拥有4层，从而至少3层高斯差分金字塔，进而在提取关键点时，可以在中间一层图像提取关键点吗信息。（这里有可能有错误，但意思是这个意思）

7.小结

根据以上步骤，我们已经做到在不同尺度下（分辨率下）提取关键点，故此算法拥有了尺度不变形
以下是我忘了出处的两幅生成高斯金字塔的图，被我用到了PPT中（嘻嘻）。再看以下的图，尺度空间极值检测这个过程就比较好理解了。

六、关键点定位与舍弃

1. 关键点定位

此前我们提取到了图像中的关键点，即知道关键点所在像素区域的坐标–像素坐标，但由于像素坐标都是离散的，例如（0,0），（1,1），我们想知道关键点的具体位置–亚像素位置，例如（1.1,2）。举例如图所示（可能图文无关 ^ ^ 但意思是这意思）：

那么怎么找这个具体位置呢，在检测到的关键点处，对其进行三元二阶泰勒展开

解出其具体位置。

2. 关键点舍弃

1）去除对比度低的点

对比度低的点对噪声敏感，高对比度的图像有较好的清晰度和细节，故在处理时，会对比该点的灰度值是否大于某个门限值contrast_threshold，若大于，则留下进一步处理，否则舍弃该点。

2）边缘效应

边缘效应是什么并不好解释，请参考B站大神“会飞的吴克”，其中有边缘效应的解释，在此只是参照大神基础的一些自己的理解，并不一定正确。
我们在上面讲过高斯卷积用到的高斯核具有平滑的作用，即将通过高斯卷积可以将图片进行“模糊”处理，使得我们得到的每一个新的像素点的值，都与其周围像素点的值相关，从而使得图像信息不跳动，更加平缓。
低通滤波器
滤波器在信号处理中很常见，而低频滤波器的作用就是在信号传输中通过低频信号，阻断高频信号，而在图像领域，我们可以透过它来理解边缘效应。图像中比较暗的点，我们称为低频点，比较亮的点，称为高频点，那么下图红点即为高频点，即亮度相比周围有明显变化的像素点，在经过高斯卷积后，图像变为灰色，即高频（亮度变化明显）点被去除，所以高斯核可近似理解为低通滤波器。

原因：高斯核使得周围点的信息影响到中间点的信息，所以高频点被平滑处理了

高通滤波器
高通滤波器即通高频，阻低频。
在卷积操作中，不同的核可以有不同的卷积结果，下面讲sobel核：

sobel核的特点很明显，即中间一行，一列均为0，这在进行卷积时候就会导致其周围3x3区域内，同行和同列的权重为0，所以就会相当于是下面一行减上面一行，右边一列减左边一列，从而形成了一个作差的操作，也就是dx和dy的操作，有什么作用呢？

在Harris角点检测时我们提到角点检测有3种情况：
① 该点的各个方向上梯度变化都很明显–角点
② 该点在某个方向上梯度变化比较明显，其余无变化–边
③ 该点在各个方向上梯度均无明显变化–平坦区域
所以，如若sobel形成了某种作差操作，那么我们就可以很明显的知道该关键点的周围方向是否有明显变化，从而可以用于边缘检测，Canny Edge Detection中就是用到了sobel核来进行的边缘检测。
若运用到下图中，我们可以清楚的看到，平坦区域（灰色区域）在用sobel核进行卷积过后，都变成了黑色，即作差后值均为0，而高频点在作差过后得到了保留，因为其与周围点的像素值不同，所以sobel核起到了近似高通滤波器的作用，保留了高频点，即保留了相对周围变化比较明显的点。

原因：当前点与水平，垂直方向的变化相减，故低频区域卷积后均为0，高频有变化，仅留下高频
是不是所有具有作差性质的核都可作为此处的高通滤波器呢？不清楚

带通滤波器Band Pass

既然高斯卷积过后只留下了低频区域，sobel核卷积之后只留下了高频区域，那么也就是说高斯卷积过后得到平坦区域得到保留，sobel核卷积后关键点得到保留，那么剩下的不就只剩下边了？那么这就是边缘效应（……），什么意思呢？意思就是我们在上一步检测到的关键点，有可能是真正的关键点，即高频区域，但还有一些关键点可能是一些边上的关键点，因为SIFT算法不同于Harris算法，SIFT算法只是根据了其上下两层和本层的26个点进行了关键点的比较，从而确定出其为关键点，但是Harris确定关键点的方式是比较其各个方向的梯度变化。
那么既然如此，SIFT算法就会存在我找到的某个关键点，它其实不是关键点，而只是某一条边上的像素值相比周围26点的像素值比较大或比较小的点，即不是我们想要的点，要去除，所以我们要去除边缘效应

PS:此处与滤波器等的类比可能会有误，但是感觉理解的应该没什么问题，想表达的意思就是加粗的这一句话

那么现在我们已经知道了SIFT算法中用到的DOG算子存在边缘效应，为什么存在边缘效应呢？
因为我们在尺度空间极值检测时，构造过两个金字塔，一个是高斯金字塔，使用不同尺度以及不同的高斯核进行的高斯卷积；另一个是高斯差分金字塔，是通过高斯金字塔相邻层的作差操作得来的（这里理解可能会有误），那么既有卷积又有作差，岂不是会产生边缘效应？

好的，我们已经知道了为什么存在边缘效应，那么边缘效应怎么去除呢？
直接放我的PPT吧不想打字了因为我没彻底完全理解 ^ ^

3. 小结

通过了关键点定位得到关键点的亚像素位置，以及通过去除对比度低的点和去除边缘效应，从而得到我们最终的关键点。

注：此时的关键点即为最终的关键点

七、确定主方向

在前面我们已经彻底解决了尺度不变性，以及关键点的具体位置，那么接下来要解决的就是旋转不变性。我们采用梯度方向直方图的方法来取确定该关键点的主方向。

用一个半径为r的圆圈，圈住的像素范围对其进行梯度方向直方图的统计，梯度方向的统计分为8个方向，即将360分为8份，45度为1份，统计以该关键点为中心的，半径为r的圆锁圈住的像素点的梯度幅值，通过高斯加权到对应的梯度方向上，从而统计出下图右侧的直方图，直方图的值最大的方向即为该关键点的主方向，而若有其他直方图的幅值大于最大幅值的80%（自己设的）时，则认为此方向为该关键点的辅方向。

问题1：半径r的大小怎么确定？

问题2：关键点辅方向的意思？
答：一个关键点可能只有一个主方向，辅方向可以有可以没有。若只有一个主方向没有辅方向，那么该关键点的位置与主方向就都确定了，那如果有辅方向呢？即此刻该关键点的方向就有两个或多个了，我们一般对于有辅方向的关键点采取以下措施处：

即当做多个关键点进行处理

八、生成描述符

1. 前置知识

1）线性插值

如下图，已知两点，求直线上任意一点，很简单我们可以化出右侧红框的一个式子，观察这个式子很容易发现，y0是左侧的点，y1是右侧的点，但y0所乘的值为x到x1之间占x0到x1之间的权重，而y1所乘的值为x0到x之间占x0到x1之间的权重，即乘的是离自己较远的一边的权重。

如下图，记作A，B，P点，A与P之间的权重值记作P。

那么我们可以得出 P = A *（1-t）+ B * t ，在图像处理中，插值的存在很普遍，我们把线性插值记作lerp。
lerp函数的实现很简单，如下

template<typename T>
inline T lerp(const T &lo, const T &hi, float t)
{ return lo * (1 - t) + hi * t; }

故我们可以记作P = lerp（A，B，t）

2）双线性插值

如图所示，求P点。

我们可以利用三次线性插值来进行求取，如下：
nx0 = lerp（c00，c10，tx）
nx1 = lerp（c01，c11，tx）
P = lerp（nx0，nx1，ty）

那么可以很轻易的得出：

P = c00（1-tx） + c10tx（1-ty） + c01（1-tx）ty + c11txty

3）三线性插值

c00 = lerp(c000,c100,tx)
c10 = lerp(c010,c110,tx)
c01 = lerp(c001,c101,tx)
c11 = lerp(c011,c111,tx)
c0 = lerp(c00,c10b, ty)
c1 = lerp(c01, c11, ty)
c = lerp（c0，c1，tz）

4）插值的应用

插值在SIFT中的处理更类似于权重的关系，我们先来看看三线性插值在HOG算法中的应用，进行对比即可。
HOG算法将梯度方向由0到π以20度分为9份，负值分配到正值上
0 20 40 60 80 100 120 140 160

假设P点为蓝框所在像素点，其梯度方向为85度，梯度幅值为100
设其距离左右上下的距离分别为2、6、2、6

则其在x，y方向的梯度插值，根据距离，可知x，y方向上的权重分配系数均为6/8 2/8，故左上角的幅值分配为：100 * 6/8*6/8 = 56.2
梯度方向分配：85介于80与100之间，分配到85的梯度值为 56.2 * 15/20 = 42.1875，风波路带100的梯度值为56.2 * 5/20 = 14.0625

即插值其实用于权重的分配，比如线性插值，我已知了P点，想去求A点？那么P点分配到A点的权重就是P点的赋值*（1-t），
同理，双线性插值时，去求P点分配到左上角C00的值时，就是P点的赋值乘（1-tx）（1-ty）
那么三线性插值的应用其实就是双线性插值x，y方向分配的基础上多了一个方向的分配，不知理解了没有，反正我是理解的透透的了。

那么到此我们就理解了三线性插值在HOG算法中的应用了，其在SIFT算法中其实同理，在下面讲。

2. 生成描述符

在前述我们已经解决了关键点的尺度不变性以及旋转不变性，那么接下来就要生成描述符，首先我们要理解生成描述符是为了什么？当然是为了后续进行图像的匹配比较，仅需对比描述符即可确定两幅图之间关键点的匹配。
类似于主方向的确定，生成关键点的描述符也需要在某个区域内生成，在原论文中给出的半径r为：

将关键点周围的区域，分成44的16个子区域，每个子区域由88 64个像素构成，在每个子区域内的子区域（22），统计其在8个（36）个区域上的梯度（方向，赋值，经过高斯加权的），故每个子区域有8个数，448 = 128个向量，按顺序写出来，即为描述符。
d：4，44的大格
m：3
mσ：4*4的小格
求中每一个最小的方格代表一个像素。

1）旋转至主方向

在上一步我们求得关键点的主方向和辅方向，为什么要进行旋转呢？因为在两幅图之间进行匹配时候，旋转会导致其坐标发生变化，所以我们要将每一个关键点，都要旋转到其主方向，作为其参考坐标系，后续进行匹配的时候就不会受到旋转的影响。

旋转后的坐标记作x’ y’，右侧为一个旋转矩阵和原坐标。

2）生成描述符

那么其梯度方向与幅值怎么进行统计呢？即用到我们之前讲的三线性插值来进行统计，我们已经知道HOG算法中三线性插值的应用，此处的应用与HOG算法相同，只不过方向只有8个，在关键点周围的16个区域内，以各个区域中心为原点（我们记作种子点），计算周围区域分配到种子点的梯度方向与梯度权值，从而加权得出各个方向的梯度权值，形成8个方向的向量。那么中共16个格子，加起来即为128个向量。

注意：此时我们会用一个三维数组来存取此128个向量，行、列、8个方向值，即4x4x8的三维数组。

3）限制

在进行实际应用时，我们会将三维数组转为一维向量，为了排除图像收到光照的影响（会有噪声），我们要对128个向量进行归一化处理。其中描述子向量为H=（h1，h2…，h128），归一化后的向量为L=（L1，L2…，L128）。

此外，为了防止相机饱和度变化对某些方向的梯度值影响过大，而对方向值的影响微弱，因此设置门限值（归一化后），一般取0.2，梯度值大于0.2的令它等于0.2，小于0.2的不变。

3.小结

此刻我们完整的讲述了SIFT算法，后续要进行的即为图像之间的匹配了，而匹配只需要比较其描述符即可，在此就不进行后续讲解了。

总结

理解SIFT算法需要查阅很多很多资料，其中细节之处更是越挖越深，看了将近一个月才得以总结出此文，而SIFT算法如今的应用却没有那么广泛，毕竟它实现起来速度并不快（也可能是我菜），后续的SURF，ORB算法都比SIFT容易理解，SIFT算法的Python代码详解放在下面，有注释！！！！！！

代码

pysift.py

from numpy import all, any, array, arctan2, cos, sin, exp, dot, log, logical_and, roll, sqrt, stack, trace, unravel_index, pi, deg2rad, rad2deg, where, zeros, floor, full, nan, isnan, round, float32
from numpy.linalg import det, lstsq, norm
from cv2 import resize, GaussianBlur, subtract, KeyPoint, INTER_LINEAR, INTER_NEAREST
from functools import cmp_to_key
import logging

####################
# Global variables #
####################

logger = logging.getLogger(__name__)
float_tolerance = 1e-7

#################
# Main function #
#################

def computeKeypointsAndDescriptors(image, sigma=1.6, num_intervals=3, assumed_blur=0.5, image_border_width=5):
    """Compute SIFT keypoints and descriptors for an input image
    """
    image = image.astype('float32') # 转换为float类型
    base_image = generateBaseImage(image, sigma, assumed_blur)  # 原图上采样并模糊后的初始图像
    num_octaves = computeNumberOfOctaves(base_image.shape)  # 计算高斯金字塔层数
    gaussian_kernels = generateGaussianKernels(sigma, num_intervals)  # 生成高斯核
    gaussian_images = generateGaussianImages(base_image, num_octaves, gaussian_kernels)  # 生成高斯金字塔
    dog_images = generateDoGImages(gaussian_images)  # 生成高斯差分金字塔
    keypoints = findScaleSpaceExtrema(gaussian_images, dog_images, num_intervals, sigma, image_border_width)  # 找尺度空间极值点
    keypoints = removeDuplicateKeypoints(keypoints)  # 去除重复的关键点
    keypoints = convertKeypointsToInputImageSize(keypoints)  # 将关键点转换到原图的位置
    descriptors = generateDescriptors(keypoints, gaussian_images)  # 生成描述符
    return keypoints, descriptors

#########################
# Image pyramid related #
#########################

def generateBaseImage(image, sigma, assumed_blur):
    """Generate base image from input image by upsampling by 2 in both directions and blurring
    """
    logger.debug('Generating base image...')
    image = resize(image, (0, 0), fx=2, fy=2, interpolation=INTER_LINEAR) # 上采样放大一倍
    sigma_diff = sqrt(max((sigma ** 2) - ((2 * assumed_blur) ** 2), 0.01))
    return GaussianBlur(image, (0, 0), sigmaX=sigma_diff, sigmaY=sigma_diff)  # the image blur is now sigma instead of assumed_blur

def computeNumberOfOctaves(image_shape):
    """Compute number of octaves in image pyramid as function of base image shape (OpenCV default)
    """
    return int(round(log(min(image_shape)) / log(2) - 1)) # 计算层数

def generateGaussianKernels(sigma, num_intervals):
    """Generate list of gaussian kernels at which to blur the input image. Default values of sigma, intervals, and octaves follow section 3 of Lowe's paper.
    """
    logger.debug('Generating scales...')
    num_images_per_octave = num_intervals + 3  # 保证每组的层数大于3
    k = 2 ** (1. / num_intervals)
    gaussian_kernels = zeros(num_images_per_octave)  # scale of gaussian blur necessary to go from one blur scale to the next within an octave
    gaussian_kernels[0] = sigma

    for image_index in range(1, num_images_per_octave):
        sigma_previous = (k ** (image_index - 1)) * sigma
        sigma_total = k * sigma_previous
        gaussian_kernels[image_index] = sqrt(sigma_total ** 2 - sigma_previous ** 2)
    return gaussian_kernels

def generateGaussianImages(image, num_octaves, gaussian_kernels):
    """Generate scale-space pyramid of Gaussian images
    """
    logger.debug('Generating Gaussian images...')
    gaussian_images = []

    for octave_index in range(num_octaves):
        gaussian_images_in_octave = []  # 存每一组中的不同层的图像
        gaussian_images_in_octave.append(image)  # first image in octave already has the correct blur
        for gaussian_kernel in gaussian_kernels[1:]:  # # 循环高斯模糊，形成不同层，加入数组
            image = GaussianBlur(image, (0, 0), sigmaX=gaussian_kernel, sigmaY=gaussian_kernel)
            gaussian_images_in_octave.append(image)
        gaussian_images.append(gaussian_images_in_octave)  # 以一组加入数组
        octave_base = gaussian_images_in_octave[-3]  # 找倒数第三层 再下采样作为下一层的基层图像
        image = resize(octave_base, (int(octave_base.shape[1] / 2), int(octave_base.shape[0] / 2)), interpolation=INTER_NEAREST)
    return array(gaussian_images, dtype=object)  # 返回高斯金字塔

def generateDoGImages(gaussian_images):
    """Generate Difference-of-Gaussians image pyramid
    """
    logger.debug('Generating Difference-of-Gaussian images...')
    dog_images = []

    for gaussian_images_in_octave in gaussian_images:
        dog_images_in_octave = []
        for first_image, second_image in zip(gaussian_images_in_octave, gaussian_images_in_octave[1:]):
            dog_images_in_octave.append(subtract(second_image, first_image))  # ordinary subtraction will not work because the images are unsigned integers
        dog_images.append(dog_images_in_octave)
    return array(dog_images, dtype=object)

###############################
# Scale-space extrema related #
###############################

def findScaleSpaceExtrema(gaussian_images, dog_images, num_intervals, sigma, image_border_width, contrast_threshold=0.04):
    """Find pixel positions of all scale-space extrema in the image pyramid
    """
    # 参数  高斯金字塔  高斯差分金字塔 层数 sigma border
    logger.debug('Finding scale-space extrema...')
    threshold = floor(0.5 * contrast_threshold / num_intervals * 255)  # from OpenCV implementation
    keypoints = []

    for octave_index, dog_images_in_octave in enumerate(dog_images):
        #  遍历差分金字塔dog_images_in_octave代表一组
        for image_index, (first_image, second_image, third_image) in enumerate(zip(dog_images_in_octave, dog_images_in_octave[1:], dog_images_in_octave[2:])):
            # (i, j) is the center of the 3x3 array
            # 此时 first second third代表第一二三层，
            for i in range(image_border_width, first_image.shape[0] - image_border_width):
                for j in range(image_border_width, first_image.shape[1] - image_border_width):
                    # 定下3x3位置
                    # i,j为3x3中心位置
                    if isPixelAnExtremum(first_image[i-1:i+2, j-1:j+2], second_image[i-1:i+2, j-1:j+2], third_image[i-1:i+2, j-1:j+2], threshold):
                        # 确定下了极值点 second_image[i+1,j+1]
                        # 定位极值点（亚像素位置） 泰勒

                        localization_result = localizeExtremumViaQuadraticFit(i, j, image_index + 1, octave_index, num_intervals, dog_images_in_octave, sigma, contrast_threshold, image_border_width)
                        # 此时得到关键点信息
                        if localization_result is not None:
                            keypoint, localized_image_index = localization_result
                            keypoints_with_orientations = computeKeypointsWithOrientations(keypoint, octave_index, gaussian_images[octave_index][localized_image_index])
                            # 此时得到关键点的主方向以及辅方向（可没有）
                            for keypoint_with_orientation in keypoints_with_orientations:
                                keypoints.append(keypoint_with_orientation)
    return keypoints

def isPixelAnExtremum(first_subimage, second_subimage, third_subimage, threshold):
    """Return True if the center element of the 3x3x3 input array is strictly greater than or less than all its neighbors, False otherwise
    """
    center_pixel_value = second_subimage[1, 1]
    if abs(center_pixel_value) > threshold:
        # 阈值判断
        if center_pixel_value > 0:
            return all(center_pixel_value >= first_subimage) and \
                   all(center_pixel_value >= third_subimage) and \
                   all(center_pixel_value >= second_subimage[0, :]) and \
                   all(center_pixel_value >= second_subimage[2, :]) and \
                   center_pixel_value >= second_subimage[1, 0] and \
                   center_pixel_value >= second_subimage[1, 2]
        elif center_pixel_value < 0:
            return all(center_pixel_value <= first_subimage) and \
                   all(center_pixel_value <= third_subimage) and \
                   all(center_pixel_value <= second_subimage[0, :]) and \
                   all(center_pixel_value <= second_subimage[2, :]) and \
                   center_pixel_value <= second_subimage[1, 0] and \
                   center_pixel_value <= second_subimage[1, 2]
    return False

def localizeExtremumViaQuadraticFit(i, j, image_index, octave_index, num_intervals, dog_images_in_octave, sigma, contrast_threshold, image_border_width, eigenvalue_ratio=10, num_attempts_until_convergence=5):
    """Iteratively refine pixel positions of scale-space extrema via quadratic fit around each extremum's neighbors
    """
    """细化极值点到亚像素级别
        i，j：3x3中心位置
        image_index：高斯差分金字塔中每组的图像索引（此时是中间层）
        octave_index:高斯差分金字塔的组索引
        num_intervals:层数
        dog_images_in_octave:高斯差分金字塔
        sigma：高斯标准差
        contrast_threshold:对比度阈值
        image_border_width:检测区域远离图像边缘5个像素图像
        eigenvalue_ratio:主曲率阈值
        num_attempts_until_convergence:最大尝试次数
    """
    logger.debug('Localizing scale-space extrema...')
    extremum_is_outside_image = False
    image_shape = dog_images_in_octave[0].shape
    for attempt_index in range(num_attempts_until_convergence):
        # need to convert from uint8 to float32 to compute derivatives and need to rescale pixel values to [0, 1] to apply Lowe's thresholds
        first_image, second_image, third_image = dog_images_in_octave[image_index-1:image_index+2]
        pixel_cube = stack([first_image[i-1:i+2, j-1:j+2],
                            second_image[i-1:i+2, j-1:j+2],
                            third_image[i-1:i+2, j-1:j+2]]).astype('float32') / 255.
        # 求关键点的梯度
        gradient = computeGradientAtCenterPixel(pixel_cube)
        # hessian矩阵
        hessian = computeHessianAtCenterPixel(pixel_cube)
        # 最小二乘拟合求偏移量
        extremum_update = -lstsq(hessian, gradient, rcond=None)[0]
        if abs(extremum_update[0]) < 0.5 and abs(extremum_update[1]) < 0.5 and abs(extremum_update[2]) < 0.5:
            break
        j += int(round(extremum_update[0]))
        i += int(round(extremum_update[1]))
        image_index += int(round(extremum_update[2]))
        # 此时i,j,image_index为具体位置？
        # make sure the new pixel_cube will lie entirely within the image
        # 若超出限制(边界限制)
        if i < image_border_width or i >= image_shape[0] - image_border_width or j < image_border_width or j >= image_shape[1] - image_border_width or image_index < 1 or image_index > num_intervals:
            extremum_is_outside_image = True
            break
    if extremum_is_outside_image:
        logger.debug('Updated extremum moved outside of image before reaching convergence. Skipping...')
        return None
    if attempt_index >= num_attempts_until_convergence - 1:
        logger.debug('Exceeded maximum number of attempts without reaching convergence for this extremum. Skipping...')
        return None

    """
    消除边缘响应：在边缘梯度的方向上主曲率比较大，而沿着边缘方向的主曲率值比较小。DoG算子会产生较强的边缘效应，
    需要剔除不稳定的边缘响应点。主曲率通过特征点处的海森矩阵求出。如果一个点不在边缘，那么该点在x，y方向上的曲率差不多。
    海森矩阵的特征值和D（x）曲率是成正比的，但是这样太麻烦了，我们就通过矩阵的迹和行列式来代替。
    首先，当行列式小于零时舍去该点，再者当不满足下式时，也舍去该点。
    
            tr**2 / det < (r+1)**2 / r**2

    """
    functionValueAtUpdatedExtremum = pixel_cube[1, 1, 1] + 0.5 * dot(gradient, extremum_update)
    # 首先去除对比度低的点（对比度低，对噪声敏感----高对比度有好的图像的清晰度、细节表现、灰度层次） contrast_threshold
    if abs(functionValueAtUpdatedExtremum) * num_intervals >= contrast_threshold:
        xy_hessian = hessian[:2, :2]
        xy_hessian_trace = trace(xy_hessian)
        xy_hessian_det = det(xy_hessian)
        # 判断主曲率阈值是否在预设阈值之下
        if xy_hessian_det > 0 and eigenvalue_ratio * (xy_hessian_trace ** 2) < ((eigenvalue_ratio + 1) ** 2) * xy_hessian_det:
            # Contrast check passed -- construct and return OpenCV KeyPoint object
            keypoint = KeyPoint()
            # 坐标 组数 邻域直径 关键点响应强度
            # pt为Point2f类, x y
            keypoint.pt = ((j + extremum_update[0]) * (2 ** octave_index), (i + extremum_update[1]) * (2 ** octave_index))  # *2意思是我下采样后的坐标要乘回去
            keypoint.octave = octave_index + image_index * (2 ** 8) + int(round((extremum_update[2] + 0.5) * 255)) * (2 ** 16)
            keypoint.size = sigma * (2 ** ((image_index + extremum_update[2]) / float32(num_intervals))) * (2 ** (octave_index + 1))  # octave_index + 1 because the input image was doubled
            keypoint.response = abs(functionValueAtUpdatedExtremum)
            return keypoint, image_index
    return None

def computeGradientAtCenterPixel(pixel_array):
    """Approximate gradient at center pixel [1, 1, 1] of 3x3x3 array using central difference formula of order O(h^2), where h is the step size
    """
    # 求梯度
    # With step size h, the central difference formula of order O(h^2) for f'(x) is (f(x + h) - f(x - h)) / (2 * h)
    # Here h = 1, so the formula simplifies to f'(x) = (f(x + 1) - f(x - 1)) / 2
    # NOTE: x corresponds to second array axis, y corresponds to first array axis, and s (scale) corresponds to third array axis
    dx = 0.5 * (pixel_array[1, 1, 2] - pixel_array[1, 1, 0])
    dy = 0.5 * (pixel_array[1, 2, 1] - pixel_array[1, 0, 1])
    ds = 0.5 * (pixel_array[2, 1, 1] - pixel_array[0, 1, 1])
    return array([dx, dy, ds])

def computeHessianAtCenterPixel(pixel_array):
    """Approximate Hessian at center pixel [1, 1, 1] of 3x3x3 array using central difference formula of order O(h^2), where h is the step size
    """
    # With step size h, the central difference formula of order O(h^2) for f''(x) is (f(x + h) - 2 * f(x) + f(x - h)) / (h ^ 2)
    # Here h = 1, so the formula simplifies to f''(x) = f(x + 1) - 2 * f(x) + f(x - 1)
    # With step size h, the central difference formula of order O(h^2) for (d^2) f(x, y) / (dx dy) = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ^ 2)
    # Here h = 1, so the formula simplifies to (d^2) f(x, y) / (dx dy) = (f(x + 1, y + 1) - f(x + 1, y - 1) - f(x - 1, y + 1) + f(x - 1, y - 1)) / 4
    # NOTE: x corresponds to second array axis, y corresponds to first array axis, and s (scale) corresponds to third array axis
    center_pixel_value = pixel_array[1, 1, 1]
    dxx = pixel_array[1, 1, 2] - 2 * center_pixel_value + pixel_array[1, 1, 0]
    dyy = pixel_array[1, 2, 1] - 2 * center_pixel_value + pixel_array[1, 0, 1]
    dss = pixel_array[2, 1, 1] - 2 * center_pixel_value + pixel_array[0, 1, 1]
    dxy = 0.25 * (pixel_array[1, 2, 2] - pixel_array[1, 2, 0] - pixel_array[1, 0, 2] + pixel_array[1, 0, 0])
    dxs = 0.25 * (pixel_array[2, 1, 2] - pixel_array[2, 1, 0] - pixel_array[0, 1, 2] + pixel_array[0, 1, 0])
    dys = 0.25 * (pixel_array[2, 2, 1] - pixel_array[2, 0, 1] - pixel_array[0, 2, 1] + pixel_array[0, 0, 1])
    return array([[dxx, dxy, dxs], 
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])

#########################
# Keypoint orientations #
#########################

def computeKeypointsWithOrientations(keypoint, octave_index, gaussian_image, radius_factor=3, num_bins=36, peak_ratio=0.8, scale_factor=1.5):
    """Compute orientations for each keypoint
    """
    """
    keypoint: 关键点信息
    octave_index: 组数
    gaussian_image: 对应组数对应层的图像
    radius_factor: 半径因子
    num_bins: 直方图柱的数目
    peak_ratio: 保留辅方向的百分比
    scale_factor: 尺度因子
    """
    logger.debug('Computing keypoint orientations...')
    keypoints_with_orientations = []
    image_shape = gaussian_image.shape
    # 根据scale去确定花圈半径以及权重值
    scale = scale_factor * keypoint.size / float32(2 ** (octave_index + 1))  # compare with keypoint.size computation in localizeExtremumViaQuadraticFit()
    radius = int(round(radius_factor * scale))
    weight_factor = -0.5 / (scale ** 2)
    raw_histogram = zeros(num_bins)
    smooth_histogram = zeros(num_bins)
    # 采集在3σ邻域窗口内像素的梯度和方向分布特征，从而确定方向
    for i in range(-radius, radius + 1):
        region_y = int(round(keypoint.pt[1] / float32(2 ** octave_index))) + i
        if region_y > 0 and region_y < image_shape[0] - 1:
            for j in range(-radius, radius + 1):
                region_x = int(round(keypoint.pt[0] / float32(2 ** octave_index))) + j
                if region_x > 0 and region_x < image_shape[1] - 1:
                    # 中心差分求偏导
                    dx = gaussian_image[region_y, region_x + 1] - gaussian_image[region_y, region_x - 1]
                    dy = gaussian_image[region_y - 1, region_x] - gaussian_image[region_y + 1, region_x]
                    # 梯度幅值
                    gradient_magnitude = sqrt(dx * dx + dy * dy)
                    # 梯度方向
                    gradient_orientation = rad2deg(arctan2(dy, dx))
                    # 权重占比
                    weight = exp(weight_factor * (i ** 2 + j ** 2))  # constant in front of exponential can be dropped because we will find peaks later
                    # 直方柱
                    histogram_index = int(round(gradient_orientation * num_bins / 360.))
                    raw_histogram[histogram_index % num_bins] += weight * gradient_magnitude

    for n in range(num_bins):
        # 平滑
        smooth_histogram[n] = (6 * raw_histogram[n] + 4 * (raw_histogram[n - 1] + raw_histogram[(n + 1) % num_bins]) + raw_histogram[n - 2] + raw_histogram[(n + 2) % num_bins]) / 16.
    # 主方向 最大值
    orientation_max = max(smooth_histogram)
    # 找其余的大值     大值>右移1位 且 大值>左移一位  1234  4123
    orientation_peaks = where(logical_and(smooth_histogram > roll(smooth_histogram, 1), smooth_histogram > roll(smooth_histogram, -1)))[0]
    # 找辅方向
    for peak_index in orientation_peaks:
        peak_value = smooth_histogram[peak_index]
        if peak_value >= peak_ratio * orientation_max:
            # Quadratic peak interpolation
            # The interpolation update is given by equation (6.30) in https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation_Spectral_Peaks.html
            left_value = smooth_histogram[(peak_index - 1) % num_bins]
            right_value = smooth_histogram[(peak_index + 1) % num_bins]
            interpolated_peak_index = (peak_index + 0.5 * (left_value - right_value) / (left_value - 2 * peak_value + right_value)) % num_bins
            orientation = 360. - interpolated_peak_index * 360. / num_bins
            if abs(orientation - 360.) < float_tolerance:
                orientation = 0
            new_keypoint = KeyPoint(*keypoint.pt, keypoint.size, orientation, keypoint.response, keypoint.octave)
            keypoints_with_orientations.append(new_keypoint)
    return keypoints_with_orientations

##############################
# Duplicate keypoint removal #
##############################

def compareKeypoints(keypoint1, keypoint2):
    """Return True if keypoint1 is less than keypoint2
    """
    if keypoint1.pt[0] != keypoint2.pt[0]:
        return keypoint1.pt[0] - keypoint2.pt[0]
    if keypoint1.pt[1] != keypoint2.pt[1]:
        return keypoint1.pt[1] - keypoint2.pt[1]
    if keypoint1.size != keypoint2.size:
        return keypoint2.size - keypoint1.size
    if keypoint1.angle != keypoint2.angle:
        return keypoint1.angle - keypoint2.angle
    if keypoint1.response != keypoint2.response:
        return keypoint2.response - keypoint1.response
    if keypoint1.octave != keypoint2.octave:
        return keypoint2.octave - keypoint1.octave
    return keypoint2.class_id - keypoint1.class_id

def removeDuplicateKeypoints(keypoints):
    """Sort keypoints and remove duplicate keypoints
    """
    # 对关键点进行排序，去除重复关键点

    if len(keypoints) < 2:
        return keypoints

    keypoints.sort(key=cmp_to_key(compareKeypoints))
    unique_keypoints = [keypoints[0]]

    for next_keypoint in keypoints[1:]:
        last_unique_keypoint = unique_keypoints[-1]
        if last_unique_keypoint.pt[0] != next_keypoint.pt[0] or \
           last_unique_keypoint.pt[1] != next_keypoint.pt[1] or \
           last_unique_keypoint.size != next_keypoint.size or \
           last_unique_keypoint.angle != next_keypoint.angle:
            unique_keypoints.append(next_keypoint)
    return unique_keypoints

#############################
# Keypoint scale conversion #
#############################

def convertKeypointsToInputImageSize(keypoints):
    """Convert keypoint point, size, and octave to input image size
    """
    converted_keypoints = []
    for keypoint in keypoints:
        keypoint.pt = tuple(0.5 * array(keypoint.pt))
        keypoint.size *= 0.5
        keypoint.octave = (keypoint.octave & ~255) | ((keypoint.octave - 1) & 255)
        converted_keypoints.append(keypoint)
    return converted_keypoints

#########################
# Descriptor generation #
#########################

def unpackOctave(keypoint):
    """Compute octave, layer, and scale from a keypoint
    """
    octave = keypoint.octave & 255
    layer = (keypoint.octave >> 8) & 255
    if octave >= 128:
        octave = octave | -128
    scale = 1 / float32(1 << octave) if octave >= 0 else float32(1 << -octave)
    return octave, layer, scale

def generateDescriptors(keypoints, gaussian_images, window_width=4, num_bins=8, scale_multiplier=3, descriptor_max_value=0.2):
    """Generate descriptors for each keypoint
    """
    """
    keypoints: 关键点
    gaussian_images: 高斯金字塔图像
    window_width: 关键点附近区域长度
    num_bins: 8个方向的梯度直方图
    scale_multiplier: 求取极值的尺度多少  缩放倍数
    descriptor_max_value: 描述符最大值

    """
    logger.debug('Generating descriptors...')
    descriptors = []

    for keypoint in keypoints:
        # 组 层 尺度
        octave, layer, scale = unpackOctave(keypoint)
        gaussian_image = gaussian_images[octave + 1, layer]
        num_rows, num_cols = gaussian_image.shape
        # round四舍五入
        point = round(scale * array(keypoint.pt)).astype('int')
        bins_per_degree = num_bins / 360.
        angle = 360. - keypoint.angle
        # 角度转弧度 用来旋转主方向
        cos_angle = cos(deg2rad(angle))
        sin_angle = sin(deg2rad(angle))
        weight_multiplier = -0.5 / ((0.5 * window_width) ** 2)  # 高斯加权运算
        row_bin_list = []  # 存放每个邻域点对应4x4小窗口的哪一个（行）
        col_bin_list = []  # 存放每个邻域点对应4x4小窗口的哪一个（列）
        magnitude_list = []  # 存放每个邻域点的梯度幅值
        orientation_bin_list = []  # 存放每个邻域点的梯度方向角所处的方向组
        # 三维数组，存储最后的128维向量   长  宽 8个方向
        histogram_tensor = zeros((window_width + 2, window_width + 2, num_bins))   # first two dimensions are increased by 2 to account for border effects

        # Descriptor window size (described by half_width) follows OpenCV convention
        hist_width = scale_multiplier * 0.5 * scale * keypoint.size  # 3×sigma，每个小窗口的边长 即  m sigma 4*3/2 ？
        half_width = int(round(hist_width * sqrt(2) * (window_width + 1) * 0.5))   # sqrt(2) corresponds to diagonal length of a pixel
        half_width = int(min(half_width, sqrt(num_rows ** 2 + num_cols ** 2)))     # ensure half_width lies within image
        # 中心点的半宽
        # 坐标轴旋转至主方向
        # 关键点为中心的4*4=16个子区域内进行遍历像素
        for row in range(-half_width, half_width + 1):
            for col in range(-half_width, half_width + 1):
                # 旋转过后的坐标
                row_rot = col * sin_angle + row * cos_angle
                col_rot = col * cos_angle - row * sin_angle
                # 对应4×4子区域的下标（行）（列）
                row_bin = (row_rot / hist_width) + 0.5 * window_width - 0.5
                col_bin = (col_rot / hist_width) + 0.5 * window_width - 0.5
                # 保证邻域的点在旋转后，仍然处于4×4的区域内
                if row_bin > -1 and row_bin < window_width and col_bin > -1 and col_bin < window_width:
                    # 计算对应原图的row col
                    window_row = int(round(point[1] + row))
                    window_col = int(round(point[0] + col))
                    if window_row > 0 and window_row < num_rows - 1 and window_col > 0 and window_col < num_cols - 1:
                        # 直接在旋转前的图上计算梯度，因为旋转时，都旋转了，不影响大小
                        dx = gaussian_image[window_row, window_col + 1] - gaussian_image[window_row, window_col - 1]
                        dy = gaussian_image[window_row - 1, window_col] - gaussian_image[window_row + 1, window_col]
                        # 梯度幅值
                        gradient_magnitude = sqrt(dx * dx + dy * dy)
                        # 梯度方向
                        gradient_orientation = rad2deg(arctan2(dy, dx)) % 360
                        weight = exp(weight_multiplier * ((row_rot / hist_width) ** 2 + (col_rot / hist_width) ** 2))
                        row_bin_list.append(row_bin)
                        col_bin_list.append(col_bin)
                        magnitude_list.append(weight * gradient_magnitude)
                        orientation_bin_list.append((gradient_orientation - angle) * bins_per_degree)

        # 将magnitude分配到4*4*8（d*d*num_bins）的各区域中，即分配到histogram_tensor数组中
        # 子行 子列 权重 子方向
        for row_bin, col_bin, magnitude, orientation_bin in zip(row_bin_list, col_bin_list, magnitude_list, orientation_bin_list):
            # Smoothing via trilinear interpolation
            # Notations follows https://en.wikipedia/wiki/Trilinear_interpolation

            # Note that we are really doing the inverse of trilinear interpolation here (we take the center value of the cube and distribute it among its eight neighbors)
            # 三线性插值
            row_bin_floor, col_bin_floor, orientation_bin_floor = floor([row_bin, col_bin, orientation_bin]).astype(int)

            row_fraction, col_fraction, orientation_fraction = row_bin - row_bin_floor, col_bin - col_bin_floor, orientation_bin - orientation_bin_floor
            if orientation_bin_floor < 0:
                orientation_bin_floor += num_bins
            if orientation_bin_floor >= num_bins:
                orientation_bin_floor -= num_bins
            # 先按行进行线性插值
            c1 = magnitude * row_fraction  # 幅值 * 插值
            c0 = magnitude * (1 - row_fraction)  # 幅值 * (1 - 插值)
            # 对每一个行向量，进行y方向的线性插值
            c11 = c1 * col_fraction
            c10 = c1 * (1 - col_fraction)
            c01 = c0 * col_fraction
            c00 = c0 * (1 - col_fraction)
            # 再分配到对应的梯度方向上
            c111 = c11 * orientation_fraction
            c110 = c11 * (1 - orientation_fraction)
            c101 = c10 * orientation_fraction
            c100 = c10 * (1 - orientation_fraction)
            c011 = c01 * orientation_fraction
            c010 = c01 * (1 - orientation_fraction)
            c001 = c00 * orientation_fraction
            c000 = c00 * (1 - orientation_fraction)

            # 所有的数值存到4*4*8的128维数组，形成128维向量
            histogram_tensor[row_bin_floor + 1, col_bin_floor + 1, orientation_bin_floor] += c000
            histogram_tensor[row_bin_floor + 1, col_bin_floor + 1, (orientation_bin_floor + 1) % num_bins] += c001
            histogram_tensor[row_bin_floor + 1, col_bin_floor + 2, orientation_bin_floor] += c010
            histogram_tensor[row_bin_floor + 1, col_bin_floor + 2, (orientation_bin_floor + 1) % num_bins] += c011
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 1, orientation_bin_floor] += c100
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 1, (orientation_bin_floor + 1) % num_bins] += c101
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 2, orientation_bin_floor] += c110
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 2, (orientation_bin_floor + 1) % num_bins] += c111

        # histogram_tensor为当前关键点的128维向量，要对其进行归一化处理，下述化成了一维
        descriptor_vector = histogram_tensor[1:-1, 1:-1, :].flatten()  # Remove histogram borders
        # 归一化描述符
        # Threshold and normalize descriptor_vector
        # 描述子的门限化，大于0.2就等于0.2，小于0.2的保持不变
        threshold = norm(descriptor_vector) * descriptor_max_value
        descriptor_vector[descriptor_vector > threshold] = threshold
        # 归一化 norm范数 模 即h1 + ..... hj  归一化处理
        descriptor_vector /= max(norm(descriptor_vector), float_tolerance)
        # Multiply by 512, round, and saturate between 0 and 255 to convert from float32 to unsigned char (OpenCV convention)
        descriptor_vector = round(512 * descriptor_vector)
        descriptor_vector[descriptor_vector < 0] = 0
        descriptor_vector[descriptor_vector > 255] = 255
        descriptors.append(descriptor_vector)
    return array(descriptors, dtype='float32')

template_matching_demo.py – 读取图片和匹配的代码

import numpy as np
import cv2
import pysift
from matplotlib import pyplot as plt
import logging
logger = logging.getLogger(__name__)

MIN_MATCH_COUNT = 10

img1 = cv2.imread('box.png', 0)
img2 = cv2.imread('box_in_scene.png', 0)

# Compute SIFT keypoints and descriptors
kp1, des1 = pysift.computeKeypointsAndDescriptors(img1)
kp2, des2 = pysift.computeKeypointsAndDescriptors(img2)

# Initialize and use FLANN
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)

flann = cv2.FlannBasedMatcher(index_params, search_params)
"""
    返回的俩个DMatch数据结构类型，
        queryIdx，trainIdx，distance
        queryIdx：测试图像的特征点描述符的下标（第几个特征点描述符），同时也是描述符对应特征点的下标。
        trainIdx：样本图像的特征点描述符下标,同时也是描述符对应特征点的下标。
        distance：代表这一次匹配的特征点描述符的欧式距离，数值越小也就说明俩个特征点越相近。----相似性的度量
    这俩个DMatch数据类型是俩个与原图像特征点最接近的俩个特征点（match返回的是最匹配的）
    只有这俩个特征点的欧式距离小于一定值的时候才会认为匹配成功。
"""
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test
good = []
for m, n in matches:
    if m.distance < 0.7 * n.distance:
        # 匹配成功放入good
        good.append(m)

if len(good) > MIN_MATCH_COUNT:
    # Estimate homography between template and scene
    src_pts = np.float32([ kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    M = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)[0]

    # Draw detected template in scene image
    h, w = img1.shape
    pts = np.float32([[0, 0],
                      [0, h - 1],
                      [w - 1, h - 1],
                      [w - 1, 0]]).reshape(-1, 1, 2)
    dst = cv2.perspectiveTransform(pts, M)

    img2 = cv2.polylines(img2, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)

    # 组合两幅图
    h1, w1 = img1.shape
    h2, w2 = img2.shape

    nWidth = w1 + w2
    nHeight = max(h1, h2)
    hdif = int((h2 - h1) / 2)
    newimg = np.zeros((nHeight, nWidth, 3), np.uint8)

    for i in range(3):
        newimg[hdif:hdif + h1, :w1, i] = img1
        newimg[:h2, w1:w1 + w2, i] = img2

    # Draw SIFT keypoint matches
    for m in good:
        pt1 = (int(kp1[m.queryIdx].pt[0]), int(kp1[m.queryIdx].pt[1] + hdif))
        pt2 = (int(kp2[m.trainIdx].pt[0] + w1), int(kp2[m.trainIdx].pt[1]))
        cv2.line(newimg, pt1, pt2, (255, 0, 0))

    plt.imshow(newimg)
    plt.show()
else:
    print("Not enough matches are found - %d/%d" % (len(good), MIN_MATCH_COUNT))

参考资料

全网的所有资料我几乎都查阅过，肯定会有遗漏，若未注到请私信。

OpenCV官方介绍
Lowe D . Distinctive image features from scale-invariant key points[J]. International Journal of Computer Vision, 2003, 20:91-110.
Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05). Ieee, 2005, 1: 886-893.
https://wwwblogs/fcfc940503/p/11492540.html
https://max.book118/html/2015/1221/31774000.shtm
https://blog.csdn/qq_33328642/article/details/109660609
https://blog.csdn/carrine/article/details/123697849
B站大神：会飞的吴克–BV1Qb411W7cK BV1KK4y1t7qs
B站大神：飛鳥的日常记录–BV1cu411z75M
B站大神：十万伏特丘比特–BV1oh411x7cf

本文标签：算法详解代码 Visual SLAM

版权声明：本文标题：SLAM-Visual Navigation学习之SIFT算法与代码详解内容由热心网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：https://m.elefans.com/xitong/1728871563a1177331.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

电子爱好者 - 最新技术资讯及电子产品介绍！

SLAM-Visual Navigation学习之SIFT算法与代码详解

SIFT算法

文章目录

一、特征点，关键点，角点？

二、前置知识

1.尺度

2.卷积

3.高斯函数

4.高斯卷积（模糊）

三、SIFT算法的引入

Harris算法缺陷：

1.尺度问题

2.旋转问题

SIFT引入：

四、SIFT算法步骤

1. Scale-space Extrema Detection——尺度空间极值检测

2. Keypoint Localization——关键点定位

3. Orientation Assignment——确定主方向

4. Keypoint Descriptor——生成描述符

5. Keypoint Matching——描述符匹配

五、尺度空间极值检测

1. 图像初始化

2. 计算高斯金字塔层数

3. 生成高斯核

4. 生成高斯金字塔

5. 生成高斯差分金字塔

6. 极值检测

7.小结

六、关键点定位与舍弃

1. 关键点定位

2. 关键点舍弃

1）去除对比度低的点

2）边缘效应

3. 小结

七、确定主方向

八、生成描述符

1. 前置知识

1）线性插值

2）双线性插值

3）三线性插值

4）插值的应用

2. 生成描述符

1）旋转至主方向

2）生成描述符

3）限制

3.小结

总结

代码

参考资料

更多相关文章

mac简体拼音打出来是英文_Mac OS X自带中文拼音输入法详解

暑假算法训练day3

祖玛java代码_简单祖玛游戏源码

贪心算法

北林oj-算法设计与分析-A dice game(思路+代码）

2018北大暑校acm算法训练课程 Tian Ji -- The Horse Racing 贪心

笔记本网卡wireless-AC-9462驱动设备问题（代码10）导致搜索不到无线网络 -- 已解决

重装系统出现蓝屏代码0X000000a5怎么办

图像相似度算法--SIFT算法详解

尺度不变特征变换（SIFT算法）Matlab程序代码测试例子的说明(Lowe的代码）

一文详解LeGO-LOAM中的点云分割

Asterisk Chan_dahdi中的Group组策略实现代码分析

图像特征匹配方法——SIFT算法原理及实现

TEB算法详解 参数详解

图像处理中经典的特征检测和提取算法

翻译 | ORB-SLAM: a Versatile and Accurate Monocular SLAM System（ORB-SLAM：一种通用的(全能的)精确的单目SLAM系统）

陌生单词-专业英语代码编码符号1

计算机视觉SIFT算法详解

你真的理解图像处理经典算法SIFT吗？最全面综述：尺度不变特征转换

Visual Studio 2010 和 SQL Server 2008 在Win10下的安装

发表评论

推荐文章

报错：Failure to find xx:pom:1.0.0 in was cached in http: the local repository

ESP32学习笔记----WiFi使用(网页控制)

Eclipse安装教程 ——史上最详细安装Java &amp;Python教程说明

win7开启iis服务

AIDL中should be declared in a file错误

热门文章

Flink 异常 - 10.checkpoint Failure reason: Not all required tasks are currently running

TEB算法详解参数详解

Eclipse安装教程 ——史上最详细安装Java &Python教程说明

【鬼泣5（Devil May Cry V）v1.0十四项修改】鬼泣5（Devil May Cry V）v1.0十四项修改官方免费下载