Linear Regression Data: Outliers
An in-depth look at how outliers can cause a poor model fit and how to detect them
Linear Regression is without a doubt one of the most widely used machine learning algorithms, because the mathematics behind it is simple and it is easy to implement.
But this simplicity comes with a series of assumptions that have to be met, such as:
1) Linearity
2) Homoscedasticity
3) Normality
4) No Multicollinearity
In some of my previous articles, I have gone through in detail how to make sure these assumptions are met.
In this article, I will go over how outliers can pose a serious problem for a Linear Regression model and how to detect them.
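To make the concern concrete, here is a minimal sketch, not taken from the article's own example, of how a single bad value can shift a least-squares line. The data are made up for illustration (ten points lying exactly on y = 2x + 1, plus one hypothetical erroneous reading), and the fit uses NumPy's `np.polyfit`.

```python
import numpy as np

# Hypothetical clean data: ten points lying exactly on y = 2x + 1
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Ordinary least-squares line through the clean data
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Same data with one erroneous value appended (assumed outlier at x = 10)
x_out = np.append(x, 10.0)
y_out = np.append(y, 100.0)  # the line itself would give 21 here
slope_out, intercept_out = np.polyfit(x_out, y_out, deg=1)

print(f"clean fit:    slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"with outlier: slope={slope_out:.2f}, intercept={intercept_out:.2f}")
```

On this toy data the fitted slope moves from 2 to about 5.6 once the single outlier is included, even though the other ten points are unchanged.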
What are Outliers?
Outliers are data points that fall far away from the major "cluster" of points.
They can be legitimate data points carrying valuable information, or they can be erroneous values altogether. In most of the projects I have worked on, however, the outliers present were mostly erroneous values that made little to no sense.
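One common way to turn "far away from the major cluster" into a rule is the interquartile range (IQR) fence. The sketch below is an illustration under that assumption, not necessarily the detection method this article goes on to use; the function name `flag_outliers_iqr`, the multiplier `k=1.5`, and the sample values are all hypothetical.

```python
import numpy as np

def flag_outliers_iqr(values, k=1.5):
    """Return a boolean mask marking values outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr  # anything below this is flagged
    upper = q3 + k * iqr  # anything above this is flagged
    return (values < lower) | (values > upper)

# Hypothetical measurements: six values near 10-13 and one suspicious reading
data = [10, 12, 11, 13, 12, 11, 95]
mask = flag_outliers_iqr(data)
print(mask)
```

A flagged point is only a candidate. Whether it is an error or a legitimate observation still has to be decided by looking at where the value came from.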
How do Outliers affect the model?
To better understand how outliers can cause problems, I will go over an example Linear Regression problem wit
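Copyright notice: this article, titled "Linear Regression Data: Outliers (How Outliers Can Pose a Problem in Linear Regression)", was contributed by a site user and reflects only the author's views. For reprints, contact the author and cite the source: https://m.elefans.com/dianzi/1728577269a1164625.html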