Outliers in Linear Regression Data

An in-depth look at how outliers can cause a poor model fit and how to detect them

Linear Regression is without a doubt one of the most widely used machine learning algorithms, because the mathematics behind it is simple and it is easy to implement.

But this simplicity comes with a series of assumptions that have to be met, such as:

1) Linearity

2) Homoscedasticity

3) Normality

4) No Multicollinearity

In some of my previous articles, I have gone through in detail how to make sure these assumptions are met.

In this article, I will go over how outliers can pose a serious problem for a Linear Regression model and how to detect them.
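To make the problem concrete, here is a minimal sketch of how a single bad value can drag a least-squares line away from the rest of the data. The data is made up for illustration, and `fit_line` is a hypothetical helper written for this example, not something from a library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2 * x + 1 for x in xs]

clean_slope, clean_intercept = fit_line(xs, ys)

# Corrupt a single observation: the last y value becomes 100 instead of 21.
ys_outlier = ys[:-1] + [100]
out_slope, out_intercept = fit_line(xs, ys_outlier)

print(f"clean fit:        slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"with one outlier: slope={out_slope:.2f}, intercept={out_intercept:.2f}")
```

With nine of the ten points unchanged, the fitted slope roughly triples and the intercept turns negative, so the line no longer describes the bulk of the data.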

What are Outliers?

Outliers are data points that fall far away from the main "cluster" of points.

They can be legitimate data points carrying valuable information, or they can be erroneous values altogether. But in most of the projects I have worked on, the outliers present were mostly erroneous values that made little to no sense.
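One common way to put "far away from the main cluster" into numbers is the 1.5 × IQR rule. The sketch below assumes that rule and uses made-up values. It is only one convention among several, and not necessarily the detection method used later in this article. The helper name `iqr_outliers` and the multiplier `k` are chosen here for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up measurements: nine values close together and one far away.
values = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]
outliers = iqr_outliers(values)
print(outliers)
```

A flagged value is only a candidate. Whether it is an error or a legitimate observation still has to be decided by looking at where the data came from.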

How do Outliers affect the model?

To better understand how outliers can cause problems, I will go over an example Linear Regression problem wit
