Once upon a time, I was trying to train a speaker recognition model on the TIMIT dataset. I used AlexNet since I wanted to try this with a smaller model first, with a softmax layer at the end. The inputs were spectrograms of the voices of different people, and the labels were the speaker IDs. I used PyTorch's MSELoss as the loss function. I left the model to train for hours, but to no avail. I was wondering why.

I checked the output from the model (the output from the softmax). For all the inputs I tried, the elements of the output array were all equal to each other. This was really annoying. It seemed that the model had not learned anything at all. So I set out to investigate. This article contains some of my findings about the softmax function. First, let's examine the softmax function.

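For an input vector x with n components, the softmax of the i-th component is

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$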

The equation above shows the softmax function of a vector x. As we can see, the softmax function contains exponential terms. The result of the exponential function can get very large as the input grows, so for sufficiently large inputs, overflow errors can occur! We therefore need to make sure the input does not get large enough to cause this. Here, by input I mean the input to the softmax function. So I tried to find when the exponential term gives an overflow error. The largest value without overflow turned out to be 709 (at least on my machine).

import numpy as np
value = 709
sm = np.exp(value)   # np.exp(710) overflows and returns inf with a warning

Note that this value could change from machine to machine and from library to library.

Next, I set out to explore how softmax behaves for large and small inputs. I created input arrays sampled from normal distributions with varying mean values, and then plotted the statistics after taking the softmax. The size of each input array was chosen to be 1000.

The code I used to do this is as follows (in Python):

import numpy as np
from numpy.random import normal

means_list = []
max_list = []
sd_list = []
x_axis = []
sm_list = []
for mean in range(0, 40000):
    mean = mean / 100                   # mean values from 0.00 to 399.99
    sd = mean / 10                      # standard deviation scales with the mean
    feature = normal(mean, sd, 1000)    # sample a 1000-element input vector
    sm = np.exp(feature) / np.sum(np.exp(feature))   # softmax of the vector
    sm_list.append(sm)
    means_list.append(sm.mean())
    max_list.append(sm.max())
    sd_list.append(sm.std())
    x_axis.append(mean)

The following figure shows the mean value of the softmax output plotted against the mean value of the input feature. As expected, it is constant: since the softmax outputs always sum to 1, their mean over 1000 elements is always 0.001. Well, that is good so far.

Now let’s plot the max value of softmax vs the mean value of the input feature vector.

We can see that for very small inputs, every element of the softmax output is 0.001 (I had to print the array values to see this). The input array had 1000 elements. It seems that for small inputs, softmax divides the output probability equally (1/1000) among the components, even though the elements of the input feature array are not equal.

Further insight can be gained by looking at the plot of the standard deviation of the softmax values against the mean value of the input feature vector.

The SD reaches 0 (meaning no variation among the probability values) when the input is small. Well, this is not good. Let's look at a numerical example.

feature = np.array([1.0, 5.0, 6.0, 2.0]) * 1e-8
sm = np.exp(feature) / np.sum(np.exp(feature))
print(sm)
# >> [0.24999999 0.25 0.25000001 0.25 ]

As we can see, inputs on the scale of 1e-8 cause softmax to output nearly identical values, making them useless. Well, this was what happened to my model.

And again, something awful happens when the inputs are very large. The max value of the softmax reaches 1, which means the other values must be close to 0. It can be seen that the SD value also plateaus. Now for large values,

feature = np.array([1.0, 5.0, 6.0, 2.0]) * 100
sm = np.exp(feature) / np.sum(np.exp(feature))
print(sm)
# >> [7.12457641e-218 3.72007598e-044 1.00000000e+000 1.91516960e-174]

Only the 3rd element is 1 in the softmax output. The others are almost zero.

We can see that softmax does not represent the input distribution well when the inputs are too large or too small. So if our model produces values in these ranges before the softmax, the model will not learn anything, because the softmax output is useless.

What can we do about this?

Scaling the input

One solution to these problems is to rescale the inputs before we send them to softmax.

# shift by the max and divide by the range, mapping the features into [-1, 0]
feature = (feature - feature.max()) / (feature.max() - feature.min())

After doing this, the plot of the max value of the softmax against the mean value of the input features looked like the one below.

It looks like those awkward values at very small and very large input values are gone now, which is good.

Using log-softmax

Sometimes taking the log of the softmax can make the operation more stable. The equation for log-softmax is simply the log of the softmax (obviously!). But doing this has some important implications.

On closer inspection, we can see that log-softmax can be converted to the following form.

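For the same vector x with n components:

$$\log \mathrm{softmax}(x)_i = \log \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} = x_i - \log \sum_{j=1}^{n} e^{x_j}$$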

This simplifies things a lot. The second term on the right-hand side of the equation can be computed with the method commonly called the log-sum-exp trick. This prevents overflow and underflow errors, making log-softmax more stable than the bare softmax. Most libraries that calculate log-softmax use this log-sum-exp trick. For more about this, read this article:

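As a rough sketch of the trick (the helper name log_softmax_lse below is my own, not from any particular library), the idea is to subtract the maximum element before exponentiating; the shift keeps np.exp from overflowing and cancels out in the final result:

import numpy as np

def log_softmax_lse(x):
    # log-sum-exp trick: log(sum(exp(x))) = m + log(sum(exp(x - m))), with m = x.max()
    m = x.max()
    return x - (m + np.log(np.sum(np.exp(x - m))))

feature = np.array([1.0, 5.0, 6.0, 2.0]) * 100
print(log_softmax_lse(feature))   # finite values, no overflow warnings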

Let's take a look at the variation of the mean value of log-softmax as we change the mean value of its input.

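The statistics in the following plots can be reproduced by repeating the earlier sweep with a numerically stable log-softmax, roughly as follows (a sketch assuming the same 1000-element normal inputs as before and using scipy.special.logsumexp):

import numpy as np
from numpy.random import normal
from scipy.special import logsumexp

lsm_means, lsm_max, lsm_sd, x_axis = [], [], [], []
for mean in range(0, 40000):
    mean = mean / 100
    feature = normal(mean, mean / 10, 1000)   # same inputs as in the softmax sweep
    lsm = feature - logsumexp(feature)        # stable log-softmax
    lsm_means.append(lsm.mean())
    lsm_max.append(lsm.max())
    lsm_sd.append(lsm.std())
    x_axis.append(mean)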

Next, the max value of log-softmax.

日志softmax的下一个最大值

Then the standard deviation.

然后标准偏差

It looks like the standard deviation increases when we increase the mean input feature value. The standard deviation does not reach zero as the input feature mean increases, unlike in the earlier problematic cases.

In fact, in my experiments I saw that with log-softmax the model trained faster than with the min-max scaling. If you are using PyTorch, CrossEntropyLoss can be used, since it already contains log-softmax.

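A minimal sketch of that setup (the batch size of 8 and the class count of 630 TIMIT speakers are just illustrative assumptions): the model outputs raw logits with no softmax layer, and nn.CrossEntropyLoss applies log-softmax internally.

import torch
import torch.nn as nn

num_speakers = 630                        # illustrative class count
logits = torch.randn(8, num_speakers)     # raw model outputs, no softmax layer
labels = torch.randint(0, num_speakers, (8,))

criterion = nn.CrossEntropyLoss()         # log-softmax + negative log-likelihood
loss = criterion(logits, labels)
print(loss.item())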

For my Jupyter Notebook (with plots and all) go to

Takeaway

Sometimes softmax can be numerically unstable (giving overflow or underflow errors) or useless (all the outputs are the same, or weird). So if your model is reluctant to learn anything, it could be due to this. In that case, solve the problem with something like log-softmax. But some solutions may not work for a particular application, so we may have to experiment a bit.

Translated from: https://medium/swlh/are-you-messing-with-me-softmax-84397b19f399
