
Reference:
Slides of EE4C03, TUD
Hayes, M. H., Statistical Digital Signal Processing and Modeling

Content

    • Steepest Descent Algorithm
      • Stability / Convergence
      • Convergence rate
      • Mean-square error
    • LMS Algorithm
      • Convergence in the mean
      • Convergence in the mean-square
    • Normalized LMS
    • Application: Noise Cancellation
    • Recursive Least Squares

In the previous five chapters, we considered a variety of problems, including signal modeling, Wiener filtering, and spectrum estimation. In each case we made the important assumption that the signals being analyzed were stationary. Unfortunately, the signals that arise in almost every application are nonstationary, so the approaches and techniques considered thus far are not directly appropriate. In this chapter we therefore develop adaptive methods that do not rely on the stationarity assumption at the outset.

To motivate the approach that we will be considering in this chapter, let us consider the Wiener filtering problem within the context of nonstationary processes:
$$\hat d(n)=\sum_{k=0}^p w(k)x(n-k)$$
with the Wiener-Hopf equations
$$\mathbf R_x \mathbf w=\mathbf r_{dx}$$
However, if $d(n)$ and $x(n)$ are nonstationary, then the filter coefficients that minimize $E\{|e(n)|^2\}$ will depend on $n$, and the filter will be shift-varying, i.e.,
$$\hat d(n)=\sum_{k=0}^p w_n(k)x(n-k)\tag{1}$$
where $w_n(k)$ is the value of the $k$th filter coefficient at time $n$. Using vector notation, this estimate may be expressed as
$$\hat d(n)=\mathbf w_n^T \mathbf x(n)\tag{2}$$
where
$$\mathbf w_n=[w_n(0),w_n(1),\cdots,w_n(p)]^T\tag{3}$$
is the vector of filter coefficients at time $n$, and
$$\mathbf x(n)=[x(n),x(n-1),\cdots, x(n-p)]^T\tag{4}$$
The Wiener-Hopf equations then become
$$\mathbf R_x (n) \mathbf w_n=\mathbf r_{dx} (n)\tag{5}$$
where
$$\mathbf{R}_{x}(n)=\left[\begin{array}{cccc} E\{x(n) x^{*}(n)\} & E\{x(n-1) x^{*}(n)\} & \cdots & E\{x(n-p) x^{*}(n)\} \\ E\{x(n) x^{*}(n-1)\} & E\{x(n-1) x^{*}(n-1)\} & \cdots & E\{x(n-p) x^{*}(n-1)\} \\ \vdots & \vdots & & \vdots \\ E\{x(n) x^{*}(n-p)\} & E\{x(n-1) x^{*}(n-p)\} & \cdots & E\{x(n-p) x^{*}(n-p)\} \end{array}\right]\tag{6}$$
is a $(p+1)\times(p+1)$ Hermitian matrix of autocorrelations and
$$\mathbf{r}_{dx}(n)=\left[E\{d(n) x^{*}(n)\}, E\{d(n) x^{*}(n-1)\}, \cdots, E\{d(n) x^{*}(n-p)\}\right]^{T}\tag{7}$$
Solving $(5)$ at each time $n$ would be impractical in most real-time implementations. However, the problem may be simplified considerably if we relax the requirement that $\mathbf w_n$ minimize the mean-square error at each time $n$ and consider, instead, a coefficient update equation of the form
$$\mathbf w_{n+1}=\mathbf w_n+\Delta \mathbf w_n\tag{8}$$
where $\Delta \mathbf w_n$ is a correction that is applied to the filter coefficients $\mathbf w_n$ at time $n$ to form a new set of coefficients, $\mathbf w_{n+1}$, at time $n+1$.

Even for stationary processes, we may still prefer to implement a time-invariant Wiener filter using $(8)$ when $p$ is large or $\mathbf R_x$ is ill-conditioned. In addition, solving the Wiener-Hopf equations requires that $r_x(k)$ and $r_{dx}(k)$ be known. Since these ensemble averages are typically unknown, they must be estimated from measurements of the processes, and this estimation is built into the adaptive filter.

Steepest Descent Algorithm

Let $\mathbf w_n$ be an estimate of the vector that minimizes the mean-square error $\xi(n)$ at time $n$. At time $n+1$ a new estimate is formed by adding a correction to $\mathbf w_n$ that is designed to bring $\mathbf w_n$ closer to the desired solution. The correction involves taking a step of size $\mu$ in the direction of maximum descent down the quadratic error surface.

As shown in Fig. 9.4b, for any vector $\mathbf w$, the gradient is orthogonal to the line that is tangent to the contour of constant error at $\mathbf w$. Since the gradient vector points in the direction of steepest ascent, the direction of steepest descent is the negative gradient direction. Thus, the update equation for $\mathbf w_n$ is
$$\mathbf w_{n+1}=\mathbf w_n-\mu \nabla \xi(n) \tag{SD.1}$$
The step size $\mu$ controls the rate at which the weight vector moves down the quadratic surface and must be a positive number.

Let us now evaluate the gradient vector $\nabla \xi (n)$. Assuming that $\mathbf w$ is complex, the gradient is the derivative of $E\{|e(n)|^2\}$ with respect to $\mathbf w^*$:
$$\nabla \xi(n)=\nabla E\{|e(n)|^2\}=E\{e(n)\nabla e^*(n) \}$$
Since $e(n)=d(n)-\mathbf w^T \mathbf x(n)$, we have
$$\nabla \xi(n)=-E\{e(n)\mathbf x^*(n) \}\tag{SD.2}$$
Thus, $(SD.1)$ becomes
$$\mathbf w_{n+1}=\mathbf w_n+\mu E\{e(n)\mathbf x^*(n) \} \tag{SD.3}$$
To see how this steepest descent update equation for $\mathbf w_n$ performs, let us consider what happens in the case of stationary processes. If $x(n)$ and $d(n)$ are jointly WSS, then
$$\begin{aligned} \mathbf w_{n+1}&=\mathbf w_n+\mu E\{[d(n)-\mathbf w_n^T\mathbf x(n)]\mathbf x^*(n) \}\\ &=\mathbf w_n+\mu(\mathbf r_{dx}-\mathbf R_x\mathbf w_n) \end{aligned}\tag{SD.4}$$
Note that if $\mathbf w_n$ is the solution to the Wiener-Hopf equations, $\mathbf w_n=\mathbf R_x^{-1}\mathbf r_{dx}$, then the correction term is zero and $\mathbf w_{n+1}=\mathbf w_n$ for all $n$.

The steepest descent algorithm may be summarized as follows:

  1. Initialize the steepest descent algorithm with an initial estimate, $\mathbf{w}_{0}$, of the optimum weight vector $\mathbf{w}$.
  2. Evaluate the gradient of $\xi(n)$ at the current estimate, $\mathbf{w}_{n}$, of the optimum weight vector.
  3. Update the estimate at time $n$ by adding a correction formed by taking a step of size $\mu$ in the negative gradient direction:
$$\mathbf{w}_{n+1}=\mathbf w_n+\mu(\mathbf r_{dx}-\mathbf R_x\mathbf w_n)$$
  4. Go back to step (2) and repeat the process (a sketch of this iteration follows below).
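To make the iteration concrete, here is a minimal Python sketch of steepest descent for the stationary case, assuming that $\mathbf R_x$ and $\mathbf r_{dx}$ are known exactly; the function name and the toy 2-tap example are illustrative, not part of the original text.

```python
import numpy as np

def steepest_descent(Rx, rdx, mu, n_iter=200, w0=None):
    """Iterate w_{n+1} = w_n + mu * (rdx - Rx @ w_n)  (step 3 above)."""
    w = np.zeros(len(rdx), dtype=complex) if w0 is None else np.array(w0, dtype=complex)
    for _ in range(n_iter):
        w = w + mu * (rdx - Rx @ w)      # step of size mu in the negative gradient direction
    return w

# toy 2-tap example: eigenvalues of Rx are 0.5 and 1.5, so we need 0 < mu < 2/1.5
Rx = np.array([[1.0, 0.5], [0.5, 1.0]])
rdx = np.array([0.7, 0.3])
w_sd = steepest_descent(Rx, rdx, mu=0.5)
print(w_sd.real, np.linalg.solve(Rx, rdx))   # converges to the Wiener solution Rx^{-1} rdx
```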

Stability / Convergence

Of greater interest, however, is how the weights evolve in time, beginning with an arbitrary initial weight vector $\mathbf w_0$. The following property defines what is required for $\mathbf w_n$ to converge to $\mathbf w$.

For jointly wide-sense stationary processes $d(n)$ and $x(n)$, the steepest descent adaptive filter converges to the solution of the Wiener-Hopf equations,
$$\lim_{n\to \infty} \mathbf w_n=\mathbf R_x^{-1}\mathbf r_{dx}$$
if the step size satisfies the condition
$$0<\mu<\frac{2}{\lambda_{max}}\tag{SD.5}$$
where $\lambda_{max}$ is the maximum eigenvalue of the autocorrelation matrix $\mathbf R_x$.

To prove this, we first define the weight error vector
$$\mathbf c_n=\mathbf w_n-\mathbf w \tag{SD.6}$$
and show that $\lim_{n\to \infty}\mathbf c_n=\mathbf 0$. Applying $(SD.4)$ and the Wiener-Hopf equations $\mathbf R_x \mathbf w=\mathbf r_{dx}$ gives
$$\begin{aligned} \mathbf c_{n+1}&=\mathbf w_{n+1}-\mathbf w=(\mathbf I-\mu \mathbf R_x)\mathbf w_n+\mu \mathbf R_x \mathbf w-\mathbf w\\ &=[\mathbf I-\mu \mathbf R_x](\mathbf w_n-\mathbf w)=[\mathbf I-\mu \mathbf R_x]\mathbf c_n \end{aligned}\tag{SD.7}$$
Since $\mathbf R_x=\mathbf R_x^H$ and $\mathbf R_x \succeq 0$, it can be factored as
$$\mathbf R_x=\mathbf V \boldsymbol {\Lambda} \mathbf V^H$$
with $\mathbf V \mathbf V^H=\mathbf I$ and $\lambda_k\ge0$. Incorporating this factorization into $(SD.7)$ leads to
$$\mathbf c_{n+1}=[\mathbf I-\mu \mathbf V \boldsymbol {\Lambda} \mathbf V^H]\mathbf c_n=\mathbf V(\mathbf I-\mu \boldsymbol \Lambda)\mathbf V^H\mathbf c_n\tag{SD.8}$$
If we define
$$\mathbf u_n=\mathbf V^H \mathbf c_n \tag{SD.9}$$
then $(SD.8)$ becomes
$$\mathbf u_{n+1}=(\mathbf I-\mu \boldsymbol \Lambda)\mathbf u_n\tag{SD.10}$$

Starting from the initial vector $\mathbf u_0=\mathbf V^H\mathbf c_0$, it follows that
$$\mathbf u_{n}=(\mathbf I-\mu \boldsymbol \Lambda)^n\mathbf u_0\tag{SD.11}$$
In order for $\mathbf w_n$ to converge to $\mathbf w$ it is necessary that the weight error vector $\mathbf c_n$ converge to zero and, therefore, that $\mathbf u_n = \mathbf V^H \mathbf c_n$ converge to zero. This will occur for any $\mathbf u_0$ if and only if
$$|1-\mu \lambda_k|<1;\quad k=0,1,\cdots,p \tag{SD.12}$$
which places the following restriction on the step size $\mu$:
$$0<\mu<\frac{2}{\lambda_{max}}\tag{SD.5}$$
as was to be shown.


Convergence rate

From $(SD.11)$, we can see that each entry of $\mathbf u_n$ converges at a rate determined by $|1-\mu \lambda_i|$. If $1-\mu \lambda_{max}>0$, then the slowest mode is determined by $\lambda_{min}$.

Define a time constant $\tau$ such that
$$(1-\mu \lambda_{min})^\tau=\frac{1}{\rm e} \tag{SD.13}$$
For sufficiently small $\mu$,
$$\tau=\frac{-1}{\ln (1-\mu \lambda_{min})}\approx \frac{1}{\mu \lambda_{min}}\tag{SD.14}$$
If $\mu=1/\lambda_{max}$, then
$$\tau \approx \frac{\lambda_{max}}{\lambda_{min}}=\mathrm{cond}(\mathbf R_x)\tag{SD.15}$$
If the eigenvalues of $\mathbf R_x$ are widely spread, convergence will be slow.
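As a quick numerical illustration (the eigenvalues below are assumed values, not taken from the text), the time constant in $(SD.15)$ equals the condition number of $\mathbf R_x$ when $\mu=1/\lambda_{max}$:

```python
import numpy as np

lam = np.array([0.1, 0.4, 1.0])        # assumed eigenvalues of Rx
mu = 1.0 / lam.max()                    # mu = 1/lambda_max
tau = 1.0 / (mu * lam.min())            # slowest time constant, (SD.14)
print(tau, lam.max() / lam.min())       # both equal 10: roughly 10 iterations per factor of e
```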


Mean-square error

For wide-sense stationary processes the minimum mean-square error is
$$\xi_{min}=r_d(0)-\mathbf r_{dx}^H\mathbf w \tag{SD.16}$$
For an arbitrary weight vector $\mathbf w_n$, the mean-square error is
$$\begin{aligned} \xi(n)&=E\{|d(n)-\mathbf w_n^T\mathbf x(n)|^2\}=r_d(0)-\mathbf r_{dx}^H \mathbf w_n-\mathbf w_n^H \mathbf r_{dx}+\mathbf w_n ^H \mathbf R_x \mathbf w_n\\ &=r_d(0)-\mathbf r_{dx}^H (\mathbf w+\mathbf c_n)-(\mathbf w+\mathbf c_n)^H \mathbf r_{dx}+(\mathbf w+\mathbf c_n) ^H \mathbf R_x (\mathbf w+\mathbf c_n)\\ &=r_d(0)-\mathbf r_{dx}^H \mathbf w+\mathbf c_n^H\mathbf R_x\mathbf c_n\\ &=\xi_{min}+\mathbf c_n^H\mathbf R_x \mathbf c_n\\ &=\xi_{min}+\mathbf u_n^H\boldsymbol \Lambda \mathbf u_n\\ &=\xi_{min}+\sum_{k=0}^p \lambda_k(1-\mu \lambda_k)^{2n}|u_0(k)|^2 \end{aligned}\tag{SD.17}$$
where the third line follows because the cross terms cancel when $\mathbf R_x \mathbf w=\mathbf r_{dx}$. Thus, if the step size $\mu$ satisfies the condition for convergence given in $(SD.5)$, then $\xi (n)$ decays exponentially to $\xi_{min}$.

LMS Algorithm

In the previous section, we developed the weight-vector update equation
$$\mathbf{w}_{n+1}=\mathbf w_n-\mu \nabla \xi(n)=\mathbf w_n+\mu(\mathbf r_{dx}-\mathbf R_x\mathbf w_n)$$
In practice, $\mathbf R_x$ and $\mathbf r_{dx}$ are not perfectly known and have to be estimated from the data.

The Least Mean Square (LMS) algorithm uses simple instantaneous estimates based only on the current samples,
$$\hat {\mathbf R}_x=\mathbf x^*(n)\mathbf x^T(n),\quad \hat{\mathbf r}_{dx}=\mathbf x^*(n)d(n)\tag{LMS.1}$$
The resulting instantaneous gradient estimate is
$$\hat {\nabla} \xi(\mathbf w)=\hat{\mathbf R}_x\mathbf w_n-\hat{\mathbf r}_{dx}=\mathbf x^*(n)\mathbf x^T(n)\mathbf w_n-\mathbf x^*(n)d(n)=-\mathbf x^*(n)e(n)\tag{LMS.2}$$
The LMS algorithm (usually initialized with $\mathbf w_0=\mathbf 0$) is then
$$\begin{aligned} y(n) &:=\mathbf{w}_n^T \mathbf{x}(n) \\ e(n) &:=d(n)-y(n) \\ {\mathbf{w}}_{n+1} &:={\mathbf{w}}_n+\mu e(n) \mathbf{x}^{*}(n) \end{aligned}\tag{LMS.3}$$
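A minimal Python sketch of $(LMS.3)$ is given below; the function name, the zero-padding of $\mathbf x(n)$ for $n<p$, and the choice to return the whole weight history are implementation conventions, not prescribed by the text.

```python
import numpy as np

def lms(x, d, p, mu, w0=None):
    """LMS adaptive filter (LMS.3): returns the weight history W and the error signal e."""
    N = len(x)
    w = np.zeros(p + 1, dtype=complex) if w0 is None else np.array(w0, dtype=complex)
    W = np.zeros((N, p + 1), dtype=complex)
    e = np.zeros(N, dtype=complex)
    for n in range(N):
        # data vector x(n) = [x(n), x(n-1), ..., x(n-p)]^T, zero-padded for n < p
        xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(p + 1)])
        y = w @ xn                       # y(n) = w_n^T x(n)
        e[n] = d[n] - y                  # e(n) = d(n) - y(n)
        w = w + mu * e[n] * np.conj(xn)  # w_{n+1} = w_n + mu e(n) x*(n)
        W[n] = w
    return W, e
```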


Convergence in the mean

Since $\mathbf w_n$ is a vector of random variables, the convergence of the LMS algorithm must be considered within a statistical framework. We therefore begin by assuming that $x(n)$ and $d(n)$ are jointly wide-sense stationary processes.

In addition, assume that the data $\mathbf x(n)$ and the LMS weight vector $\mathbf w_n$ are statistically independent (the standard independence assumption).

For jointly wide-sense stationary processes $d(n)$ and $x(n)$, the LMS algorithm converges in the mean, i.e.,
$$\lim_{n\to \infty} E\{ \mathbf w_n\}=\mathbf R_x^{-1}\mathbf r_{dx}\tag{LMS.4}$$
if
$$0<\mu<\frac{2}{\lambda_{max}}\tag{LMS.5}$$
where $\lambda_{max}$ is the maximum eigenvalue of the autocorrelation matrix $\mathbf R_x$.

The LMS coefficient update equation is
$$\mathbf w_{n+1}=\mathbf w_n+\mu[d(n)-\mathbf w_n^T\mathbf x(n)]\mathbf x^*(n)$$
Taking the expected value, and using the independence assumption in the last step, we have
$$\begin{aligned} E\{\mathbf w_{n+1}\}&=E\{\mathbf w_{n}\}+\mu E\{d(n)\mathbf x^*(n)\}-\mu E\{\mathbf w_{n}^T\mathbf x(n)\mathbf x^*(n)\}\\ &=E\{\mathbf w_{n}\}+\mu E\{d(n)\mathbf x^*(n)\}-\mu E\{\mathbf x^*(n) \mathbf x^T(n)\mathbf w_n\}\\ &=(\mathbf I-\mu \mathbf R_x)E\{\mathbf w_{n}\}+\mu \mathbf r_{dx} \end{aligned}$$
Defining the weight error vector
$$\mathbf c_n=\mathbf w_n-\mathbf w\tag{LMS.6}$$
we then have
$$\begin{aligned} E\{\mathbf c_{n+1}\} &=(\mathbf I-\mu \mathbf R_x)E\{\mathbf w_{n}\}+\mu \mathbf R_x \mathbf w-\mathbf w\\ &=(\mathbf I-\mu \mathbf R_x)E\{\mathbf c_{n}\}\\ &=\mathbf V(\mathbf I-\mu \boldsymbol \Lambda)\mathbf V^HE\{\mathbf c_{n}\} \end{aligned}\tag{LMS.7}$$
As before, define
$$\mathbf u_n=\mathbf V^H\mathbf c_n\tag{LMS.8}$$
It follows from $(LMS.7)$ and $(LMS.8)$ that
$$E\{\mathbf u_{n}\}=(\mathbf I-\mu \boldsymbol \Lambda)^n \mathbf u_0 \tag{LMS.9}$$
In order for $E\{\mathbf w_n\}$ to converge to $\mathbf w$ it is necessary that
$$|1-\mu \lambda_k|<1\Longrightarrow0<\mu<\frac{2}{\lambda_{max}}$$
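This statement can also be checked empirically by averaging the weight trajectories of the lms() sketch above over many independent runs; the data model below (a fixed 3-tap filter plus white observation noise) is a hypothetical example, for which the Wiener solution equals the filter $h$.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([1.0, -0.5, 0.25])          # "true" filter; for white x, Rx^{-1} r_dx = h
p, mu, N, trials = 2, 0.01, 1000, 100
W_avg = np.zeros((N, p + 1))
for _ in range(trials):
    x = rng.standard_normal(N)
    d = np.convolve(x, h)[:N] + 0.1 * rng.standard_normal(N)
    W, _ = lms(x, d, p, mu)              # LMS sketch from the previous section
    W_avg += W.real / trials
print(W_avg[-1])                          # the averaged weights approach h
```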


Convergence in the mean-square

The ensemble-average value of the LMS mean-square error is
$$\begin{aligned} \xi(n)&=E\{|e(n)|^2\}=E\{|d(n)-\mathbf w_n^T\mathbf x(n)|^2\}\\ &=E\{|d(n)-(\mathbf c_n+\mathbf w)^T\mathbf x(n)|^2\}\\ &=E\{|d(n)-\mathbf w^T\mathbf x(n)|^2\}+E\{\mathbf c_n^H\mathbf x^*(n)\mathbf x^T(n)\mathbf c_n\}\\ &\triangleq \xi_{min}+\xi_{ex}(n) \end{aligned}\tag{LMS.10}$$
where $\xi_{min}$ is the Wiener error and $\xi_{ex}$ is the excess error. Under the independence assumption,
$$\begin{aligned} \xi_{ex}(n)&=E\{\mathbf c_n^H\mathbf x^*(n)\mathbf x^T(n)\mathbf c_n\}\\ &=E\{\mathrm{tr} [\mathbf x^*(n)\mathbf x^T(n)\mathbf c_n \mathbf c_n^H]\}\\ &= \mathrm{tr}(\mathbf R_x \mathbf K_n) \end{aligned}\tag{LMS.11}$$
where $\mathbf K_n \triangleq E\{\mathbf c_n \mathbf c_n^H\}$.

The mean-square error $\xi (n)$ converges to a steady-state value of
$$\xi (\infty)=\xi_{min}+\xi _{ex}(\infty)=\xi_{min}\frac{1}{1-\mu \sum_{k=0}^p\frac{\lambda_k}{2-\mu \lambda_k}}\tag{LMS.12}$$
and the LMS algorithm is said to converge in the mean-square if and only if the step size $\mu$ satisfies the following two conditions:
$$0<\mu<\frac{2}{\lambda_{max}}\tag{LMS.13}$$
$$\mu \sum_{k=0}^p\frac{\lambda_k}{2-\mu \lambda_k}<1\tag{LMS.14}$$
where the $\lambda_k$ are the eigenvalues of $\mathbf R_x$.

Note that $(LMS.13)$ is the condition required for the LMS algorithm to converge in the mean, and $(LMS.14)$ guarantees that $\xi(\infty)$ is positive.

If $\mu \ll 2/\lambda_{max}$, as is typically the case, then $\mu \lambda_k \ll 2$ and $(LMS.14)$ may be simplified to
$$\frac{1}{2}\mu \sum_{k=0}^p \lambda_k=\frac{1}{2}\mu\, \mathrm{tr}(\mathbf R_x)<1 \tag{LMS.15}$$
From $(LMS.12)$, we have
$$\xi(\infty)=\xi_{min}+\xi _{ex}(\infty)\approx \xi_{min}\frac{1}{1-\frac{1}{2}\mu\, \mathrm{tr}(\mathbf R_x)}\tag{LMS.16}$$
In summary, if
$$0<\mu<\frac{2}{\mathrm{tr}(\mathbf R_x)}\tag{LMS.17}$$
then the LMS algorithm converges in the mean square, and
$$\xi_{ex}(\infty) \approx \mu \xi_{min} \frac{\frac{1}{2}\mathrm{tr}(\mathbf R_x)}{1-\frac{1}{2}\mu\,\mathrm{tr}(\mathbf R_x)}\approx \frac{1}{2}\mu\, \xi_{min}\, \mathrm{tr}(\mathbf R_x) \tag{LMS.18}$$
Thus, for small $\mu$, the excess mean-square error is proportional to the step size $\mu$.
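The following snippet shows how these bounds are typically used to pick a step size; the filter order, input power, and $\xi_{min}$ are assumed values chosen only for illustration.

```python
# Assumed example: 10-tap filter (p = 9), unit input power, xi_min = 0.01.
p, sigma_x2, xi_min = 9, 1.0, 0.01
tr_Rx = (p + 1) * sigma_x2           # tr(Rx) = (p+1) E{|x(n)|^2} for a stationary input
mu_max = 2.0 / tr_Rx                 # mean-square convergence bound (LMS.17)
mu = 0.1 * mu_max                    # a conservative choice
xi_ex = 0.5 * mu * xi_min * tr_Rx    # predicted excess MSE (LMS.18), proportional to mu
print(mu_max, xi_ex)                 # 0.2 and 0.001, i.e. about a 10% misadjustment
```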

Normalized LMS

For stationary processes, the LMS algorithm converges in the mean if $0<\mu<2/\lambda_{max}$, and converges in the mean-square if $0<\mu<2/\mathrm{tr}(\mathbf R_x)$. However, since $\mathbf R_x$ is generally unknown, either $\lambda_{max}$ or $\mathrm{tr}(\mathbf R_x)$ must be estimated in order to use these bounds. One way around this difficulty is to use the fact that, for stationary processes, $\mathrm{tr}(\mathbf R_x) = (p + 1)E\{|x(n) |^2\}$. Therefore, the condition for mean-square convergence may be replaced with
$$0<\mu<\frac{2}{(p+1)E\{|x(n) |^2\}}\tag{NLMS.1}$$
where $E\{|x(n) |^2\}$ is the power in the process $x(n)$. This power may be estimated using a time average such as
$$\hat E\{ | x(n) |^2 \}=\frac{1}{p+1}\sum_{k=0}^p |x(n-k)|^2 \tag{NLMS.2}$$
which leads to the following bound on the step size for mean-square convergence:
$$0<\mu<\frac{2}{\mathbf x^H(n)\mathbf x(n)}\tag{NLMS.3}$$
Therefore, a convenient way to incorporate this bound into the LMS filter is to use a (time-varying) step size of the form
$$\mu(n)=\frac{\beta}{\mathbf x^H(n)\mathbf x(n)}=\frac{\beta}{\|\mathbf x (n)\|^2}\tag{NLMS.4}$$
where $\beta$ is a normalized step size with $0<\beta<2$.

Replacing $\mu$ in the LMS weight vector update equation with $\mu(n)$ leads to the Normalized LMS (NLMS) algorithm,
$$\mathbf w_{n+1}=\mathbf w_n+\beta \frac{\mathbf x^*(n)}{\|\mathbf x(n)\|^2}e(n)\tag{NLMS.5}$$
Compared with the LMS algorithm, the normalized LMS algorithm requires additional computation to evaluate the normalization term $\|\mathbf x(n)\|^2$. However, if this term is evaluated recursively as
$$\|\mathbf x(n+1)\|^2=\|\mathbf x(n)\|^2+|x(n+1)|^2-|x(n-p)|^2\tag{NLMS.6}$$
then the extra computation involves only two squaring operations, one addition, and one subtraction.
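A minimal NLMS sketch follows; the small constant eps added to the denominator is a common implementation safeguard against division by zero and is not part of $(NLMS.5)$.

```python
import numpy as np

def nlms(x, d, p, beta, eps=1e-8, w0=None):
    """Normalized LMS (NLMS.5) with normalized step size 0 < beta < 2."""
    N = len(x)
    w = np.zeros(p + 1, dtype=complex) if w0 is None else np.array(w0, dtype=complex)
    e = np.zeros(N, dtype=complex)
    for n in range(N):
        xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(p + 1)])
        e[n] = d[n] - w @ xn                                   # e(n) = d(n) - w_n^T x(n)
        w = w + beta * e[n] * np.conj(xn) / (np.vdot(xn, xn).real + eps)
    return w, e
```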

Application: Noise Cancellation

An important issue in the implementation of an adaptive filter, apparent from Fig. 9.1, is the requirement that the error signal $e(n)$ be available to the adaptive algorithm. Since it is $e(n)$ that allows the filter to measure its performance and determine how the filter coefficients should be modified, without $e(n)$ the filter would not be able to adapt.

In some applications, acquiring $d(n)$ is straightforward, which makes the evaluation of $e(n)$ easy. An example is the problem of system identification: a plant produces an output $d(n)$ in response to a known input $x(n)$, and the goal is to develop a model for the plant, $W_n(z)$, that produces a response $\hat d(n)$ that is as close as possible to $d(n)$. To account for observation noise in the measurement of the plant output, an additive noise source $v(n)$ is included in the model.

However, in other applications the acquisition of $d(n)$ is not straightforward. Consider, for example, the problem of interference cancellation, in which a signal $d(n)$ is observed in the presence of an interfering signal $v(n)$,
$$x(n) = d(n) + v(n)$$
Since $d(n)$ is unknown, the error sequence cannot be generated directly. However, the system described below will generate a sequence that may be used by the adaptive filter to estimate $d(n)$.

Specifically, suppose that $d(n)$ and $v(n)$ are real-valued, uncorrelated, zero-mean processes. In addition, suppose that $d(n)$ is a narrowband process and that $v(n)$ is a broadband process with an autocorrelation sequence that is approximately zero for lags $k\ge k_0$. The adaptive filter operates on the delayed observation $x(n-k_0)$, so its output is $y(n)=\sum_{k=0}^p w_n(k)x(n-k_0-k)$, and the error used for adaptation is $e(n)=x(n)-y(n)$.

Ideally we would minimize $E\{|d(n)-y(n)|^2\}$, but the system can only form $E\{|d(n)+v(n)-y(n)|^2 \}$. Fortunately, under the assumptions above, we have
$$\begin{aligned} E\{|e(n)|^2\}=&E\{|d(n)+v(n)-y(n)|^2 \}\\ =&E\{|d(n)-y(n)|^2 \}+E\{|v(n)|^2 \}+2\Re\{ E [v(n)(d(n)-y(n))^* ]\} \end{aligned}$$
Note that, since $v(n)$ and $d(n)$ are uncorrelated and zero mean,
$$\begin{aligned} E [v(n)(d(n)-y(n))^* ]&=-E\{ v(n)y^*(n) \}=-E\Big\{v(n) \sum_{k=0}^p w_n^*(k)x^*(n-k_0-k)\Big\}\\ &=-\sum_{k=0}^p w^*_n(k)\big[E\{v(n)d^*(n-k_0-k)\}+E\{v(n)v^*(n-k_0-k)\}\big]\\ &=0 \end{aligned}$$
where the first expectation in the sum vanishes because $d(n)$ and $v(n)$ are uncorrelated, and the second vanishes because the autocorrelation of $v(n)$ is approximately zero for lags $k\ge k_0$. Therefore,
$$E\{|e(n)|^2\}=E\{|v(n)|^2\}+E\{|d(n)-y(n)|^2\}$$
Minimizing $E\{|e(n)|^2\}$ is therefore equivalent to minimizing $E\{|d(n)-y(n)|^2\}$, and the output of the adaptive filter is the minimum mean-square estimate of $d(n)$.
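Below is a small simulation sketch of this noise canceller, built on the lms() function given earlier; the sinusoid frequency, noise level, delay $k_0$, filter order, and step size are all assumed values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k0, p, mu = 5000, 5, 20, 0.005
n = np.arange(N)
d = np.sin(2 * np.pi * 0.01 * n)                 # narrowband desired signal
v = 0.5 * rng.standard_normal(N)                 # broadband interference
x = d + v                                        # observed signal x(n) = d(n) + v(n)
x_del = np.concatenate([np.zeros(k0), x[:-k0]])  # adaptive filter input x(n - k0)
_, e = lms(x_del, x, p, mu)                      # the "desired" signal for adaptation is x(n)
y = x - e.real                                   # filter output y(n), the estimate of d(n)
print(np.mean((y[N // 2:] - d[N // 2:]) ** 2), np.mean(v ** 2))  # output error vs. noise power
```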

Recursive Least Squares

In each of the adaptive filtering methods discussed so far, we have considered gradient descent algorithms for the minimization of the mean-square error
$$\xi(n)=E\{|e(n)|^2\}\tag{RLS.1}$$
The difficulty with these methods is that they all require knowledge of the autocorrelation of the input process, $E\{x(n)x^*(n-k)\}$, and the cross-correlation between the input and the desired output, $E\{d(n)x^*(n-k)\}$. When this statistical information is unknown, we are forced to estimate these statistics from the data. In the LMS adaptive filter, these ensemble averages are estimated using instantaneous values,
$$\hat E\{ e(n)x^*(n-k) \}=e(n)x^*(n-k)\tag{RLS.2}$$
An alternative is to consider error measures that do not include expectations and that may be computed directly from the data, for example the least squares error
$$\varepsilon(n)=\sum_{i=0}^n |e(i)|^2\tag{RLS.3}$$
Another way to motivate this choice is to consider the Wiener receiver
$$\mathbf w_0=\mathbf R_x^{-1}\mathbf r_{dx}$$
with $\mathbf R_x=E\{\mathbf x^*(n)\mathbf x^T(n)\}$ and $\mathbf r_{dx}=E\{\mathbf x^*(n)d(n)\}$.

With finite data, all expectations are estimated from the available samples:
$$\begin{aligned} &\hat {\mathbf R}_x=\frac{1}{N}\sum_{n=0}^{N-1} \mathbf x^*(n)\mathbf x^T(n)=\frac{1}{N}\mathbf X^* \mathbf X^T\\ &\hat {\mathbf r}_{dx}=\frac{1}{N}\sum_{n=0}^{N-1} \mathbf x^*(n)d(n)=\frac{1}{N}\mathbf X^* \mathbf d^T \end{aligned}$$
Therefore,
$$\hat {\mathbf w}_0=\hat{\mathbf R}_x^{-1}\hat{\mathbf r}_{dx}=(\mathbf X^* \mathbf X^T)^{-1}\mathbf X^* \mathbf d^T\tag{RLS.4}$$
which is the optimal finite-sample solution for the cost function
$$\hat \xi (\mathbf w)=\frac{1}{N}\sum_{n=0}^{N-1} |\mathbf w^T \mathbf x(n) -d(n)|^2=\frac{1}{N}\|\mathbf w^T \mathbf X-\mathbf d\|^2=\frac{1}{N} \sum_{n=0}^{N-1} |e(n)|^2 \tag{RLS.5}$$
Therefore, it is reasonable to measure the error as in $(RLS.3)$.
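For comparison with the recursion derived next, the batch solution $(RLS.4)$ can be written directly; here $\mathbf X$ is assumed to hold the data vectors $\mathbf x(n)$ as columns and $\mathbf d$ the corresponding desired samples.

```python
import numpy as np

def batch_ls(X, d):
    """Batch least-squares weights (RLS.4): w = (X* X^T)^{-1} X* d^T."""
    Phi = np.conj(X) @ X.T          # (p+1) x (p+1) matrix X* X^T
    theta = np.conj(X) @ d          # (p+1) vector X* d^T
    return np.linalg.solve(Phi, theta)
```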


Suppose at time $n$ we know
$$\mathbf X_n:=[\mathbf x(0),\cdots,\mathbf x(n)], \quad \mathbf d_n:=[d(0),\cdots,d(n)]\tag{RLS.6}$$
The solution that minimizes $\|\mathbf w^T \mathbf X_n-\mathbf d_n\|^2$ is
$$\hat {\mathbf w}_n=(\mathbf X_n^* \mathbf X_n^T)^{-1}\mathbf X_n^* \mathbf d_n^T =: \boldsymbol{\Phi}_n^{-1} \boldsymbol{\theta}_n \tag{RLS.7}$$
where
$$\boldsymbol{\Phi}_{n}:=\mathbf{X}_{n}^{*} \mathbf{X}_{n}^{\mathrm{T}}=\sum_{i=0}^{n} \mathbf{x}^{*}(i) \mathbf{x}^{\mathrm{T}}(i), \quad \boldsymbol{\theta}_{n}:=\mathbf{X}_{n}^{*} \mathbf{d}_{n}^{\mathrm{T}}=\sum_{i=0}^{n} \mathbf{x}^{*}(i) d(i) \tag{RLS.8}$$
To update $\boldsymbol{\Phi}_{n}^{-1}$, we can apply the matrix inversion lemma:
$$\boldsymbol{\Phi}_{n+1}^{-1}=\left(\boldsymbol{\Phi}_{n}+\mathbf{x}^{*}(n+1) \mathbf{x}^{\mathrm{T}}(n+1)\right)^{-1}=\boldsymbol{\Phi}_{n}^{-1}-\frac{\boldsymbol{\Phi}_{n}^{-1} \mathbf{x}^{*}(n+1) \mathbf{x}^{\mathrm{T}}(n+1) \boldsymbol{\Phi}_{n}^{-1}}{1+\mathbf{x}^{\mathrm{T}}(n+1) \boldsymbol{\Phi}_{n}^{-1} \mathbf{x}^{*}(n+1)}\tag{RLS.9}$$
The update of $\boldsymbol{\theta}_{n}$ is straightforward:
$$\boldsymbol{\theta}_{n+1}=\boldsymbol{\theta}_{n}+\mathbf x^*(n+1)d(n+1)\tag{RLS.10}$$

Recursive Least Squares (RLS) algorithm:
Initialization: $\boldsymbol{\theta}_{0}=\mathbf 0$ and $\mathbf{P}_{0}=\delta^{-1} \mathbf{I}$, where $\delta$ is a very small positive constant.

$$\begin{aligned} \mathbf{P}_{n+1} &:=\mathbf{P}_{n}-\frac{\mathbf{P}_{n} \mathbf{x}^{*}(n+1) \mathbf{x}^{\mathrm{T}}(n+1) \mathbf{P}_{n}}{1+\mathbf{x}^{\mathrm{T}}(n+1) \mathbf{P}_{n} \mathbf{x}^{*}(n+1)} \\ \boldsymbol{\theta}_{n+1} &:=\boldsymbol{\theta}_{n}+\mathbf{x}^{*}(n+1) d(n+1) \\ \hat{\mathbf{w}}_{n+1} &:=\mathbf{P}_{n+1} \boldsymbol{\theta}_{n+1} \end{aligned}$$
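A minimal Python sketch of these three updates is shown below; the zero-padding of $\mathbf x(n)$ and the default value of $\delta$ are implementation choices.

```python
import numpy as np

def rls(x, d, p, delta=1e-3):
    """Recursive least squares: P_n tracks Phi_n^{-1} via the matrix inversion lemma."""
    N = len(x)
    P = np.eye(p + 1, dtype=complex) / delta        # P_0 = delta^{-1} I
    theta = np.zeros(p + 1, dtype=complex)          # theta_0 = 0
    w = np.zeros(p + 1, dtype=complex)
    for n in range(N):
        xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(p + 1)])
        Px = P @ np.conj(xn)                              # P_n x*(n)
        P = P - np.outer(Px, xn @ P) / (1.0 + xn @ Px)    # (RLS.9)
        theta = theta + np.conj(xn) * d[n]                # (RLS.10)
        w = P @ theta                                     # w_n = Phi_n^{-1} theta_n
    return w
```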


Finite horizon: For adaptive purposes, we want an effective window of data.

  • Sliding window: compute $\boldsymbol{\Phi}_n$ and $\boldsymbol{\theta}_n$ from only the last $L$ samples,
$$\begin{aligned} \boldsymbol{\Phi}_{n+1}^{-1}&=\left(\boldsymbol{\Phi}_{n}+\mathbf{x}^{*}(n+1) \mathbf{x}^{\mathrm{T}}(n+1)-\mathbf{x}^{*}(n-L) \mathbf{x}^{\mathrm{T}}(n-L)\right)^{-1}\\ \boldsymbol{\theta}_{n+1}&=\boldsymbol{\theta}_{n}+\mathbf x^*(n+1)d(n+1)-\mathbf x^*(n-L)d(n-L) \end{aligned}\tag{RLS.11}$$
    This requires applying the matrix inversion lemma twice, which roughly doubles the complexity, and the last $L$ data samples must be kept in memory.

  • Exponential window: scale down $\boldsymbol{\Phi}_n$ and $\boldsymbol{\theta}_n$ by a factor $\lambda \approx 1$,
$$\begin{aligned} \boldsymbol{\Phi}_{n+1}^{-1}&=\left(\lambda\boldsymbol{\Phi}_{n}+\mathbf{x}^{*}(n+1) \mathbf{x}^{\mathrm{T}}(n+1)\right)^{-1}\\ \boldsymbol{\theta}_{n+1}&=\lambda \boldsymbol{\theta}_{n}+\mathbf x^*(n+1)d(n+1) \end{aligned}\tag{RLS.12}$$
    This corresponds to the exponentially weighted data matrices
$$\begin{aligned} \mathbf{X}_{n+1}&=[\mathbf{x}(n+1) \quad \lambda^{1 / 2} \mathbf{x}(n)\quad \lambda \mathbf{x}(n-1) \quad \lambda^{3 / 2} \mathbf{x}(n-2) \quad \cdots] \\ \mathbf{d}_{n+1}&=[d(n+1)\quad \lambda^{1 / 2} d(n) \quad \lambda d(n-1) \quad \lambda^{3 / 2} d(n-2) \quad \cdots] \end{aligned}\tag{RLS.13}$$
    Exponentially weighted RLS algorithm (a Python sketch follows the list):
$$\begin{aligned} \mathbf{P}_{n+1} &:=\lambda^{-1} \mathbf{P}_{n}-\lambda^{-2} \frac{\mathbf{P}_{n} \mathbf{x}^{*}(n+1) \mathbf{x}^{\mathrm{T}}(n+1) \mathbf{P}_{n}}{1+\lambda^{-1} \mathbf{x}^{\mathrm{T}}(n+1) \mathbf{P}_{n} \mathbf{x}^{*}(n+1)} \\ \boldsymbol{\theta}_{n+1} &:=\lambda \boldsymbol{\theta}_{n}+\mathbf{x}^{*}(n+1) d(n+1) \\ \hat{\mathbf{w}}_{n+1} &:=\mathbf{P}_{n+1} \boldsymbol{\theta}_{n+1} \end{aligned}$$
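A sketch of the exponentially weighted recursion, written in the same style as the rls() sketch above (the default forgetting factor and $\delta$ are assumed values):

```python
import numpy as np

def ewrls(x, d, p, lam=0.99, delta=1e-3):
    """Exponentially weighted RLS with forgetting factor lam close to 1."""
    N = len(x)
    P = np.eye(p + 1, dtype=complex) / delta
    theta = np.zeros(p + 1, dtype=complex)
    w = np.zeros(p + 1, dtype=complex)
    for n in range(N):
        xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(p + 1)])
        Px = P @ np.conj(xn)
        # P_{n+1} = lam^{-1} P_n - lam^{-2} P_n x* x^T P_n / (1 + lam^{-1} x^T P_n x*)
        P = P / lam - np.outer(Px, xn @ P) / (lam ** 2 + lam * (xn @ Px))
        theta = lam * theta + np.conj(xn) * d[n]
        w = P @ theta
    return w
```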
