Multi-channel Echo Cancellers for Packet Telephony using a low cost DSP
Krishna V V, Jitendra Rayala, Joseph Yau, Brendon Slade
DSP Products Division, LSI Logic
Plan
• Line Echo Cancellation Overview
  o Echo Sources and Cures
  o EC for packet Voice
• Echo Canceller Internals
• Multi-channel EC on LSI403LP
• Summary
Echo Sources in Telephony
Echo arises due to impedance mismatches at hybrids.
Near Echo for A: (side tone)
Leakage at AH1 + Reflection at AH2.
Far Echo for A:
Leakage at BH2 (major component) + Reflection at BH1.
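[Figure: connection between subscribers A and B. Each side has a 2-wire local loop and a Central Office (A, B); the two offices are joined by a 4-wire segment. Hybrids are labelled AH1, AH2, BH1 and BH2; signal paths are labelled Tx-A, Rx-A, Tx-B and Rx-B.]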
When is EC critical?
The need for EC is determined by both
• Echo level, or “Echo Return Loss” (ERL)
• Round-trip path delay
Typical ERL values range from 6 dB to 12 dB.
Typical round-trip network delays:
POTS (Local Calls): less than 10 ms
POTS (LD, terrestrial): 30 - 70 ms
POTS (LD, satellite): 300 - 500 ms
Wireless (GSM, CDMA, …): 100 - 180 ms
Packet Voice: 120 - 200 ms
Delay in Packet Networks
Overall delay break-up:
Speech Codec: 0.2 - 40 ms (low delay for PCM, ADPCM, G.728, BV16, …)
Packetization: 10 - 30 ms
I/O Buffers: 20 - 60 ms
Transmission: 20 - 150 ms
Jitter Buffer: 50 - 150 ms
Total: 100 - 400 ms (typical: 120 - 200 ms)
Revised from “Internet Telephony: Going like crazy”, by G. Thomsen, Y. Jani, IEEE Spectrum, May 2000.
Tackling Echo: Telephony Standards
[Figure: timeline from 1976 to 2004 marking specification releases / revisions of G.164, G.161, G.165 and G.168, first under CCITT and later under ITU, spanning echo suppressors and line echo cancellers.]
EC: The Past Decade
Early 90’s
  EC channels per board: ~ (12 - 24)
  Power per channel*: ~ 120 mW
  Cost per channel*: ~ $20 - 25
  Major markets: Long distance POTS networks (incl. satellite downlinks)
  Other markets: Cellular networks
Late 90’s
  EC channels per board: ~ (24 - 256)
  Power per channel*: ~ 10 - 25 mW
  Cost per channel*: ~ $4 - 5
  Major markets: Long distance POTS networks; Cellular networks
  Other markets: Packet voice networks
Early 00’s
  EC channels per board: ~ (64 - 672+)
  Power per channel*: ~ 5 - 15 mW
  Cost per channel*: ~ $2
  Major markets: Cellular networks; Packet voice networks
  Other markets: Long distance POTS networks
* Excludes overall system power consumption / costs.
EC for Packet Voice
Question: Why EC in the Gateway?
[Figure: CPE, IP Cloud, Gateway, PSTN Cloud and Central Office along the call path, with EC blocks marked at several points; the 2-wire link and the “4-wire” link are labelled.]
EC in Packet Networks
EC at CPE:
• Short tails sufficient (~ 16 ms) on FXS ports
• Longer tails (32 - 64 ms) used on FXO ports
• As few as 2 - 24 channels, as many as a few hundred, depending on the CPE
EC at PVG:
• Longer tail support (32 - 128 ms)
• As many as 8K to 30K channels
Packet Voice: CPE Detail
PCM Interface (A-Law, U-Law, Linear): FXO, FXS, PRI, …
Frame / Packet Interface: to RTP packetization, from Jitter Buffer
Processing blocks:
• G.168 Line Echo Canceller
• Tone Detector: DTMF, V.21
• VAD
• Voice Encoder: G.711, G.726, G.729A, G.723.1, GSM AMR, iLBC, BV16, …
• Voice Decoder / PLC: G.711, G.726, G.729A, G.723.1, GSM AMR, iLBC, BV16, …
• CNG
• Tone Generator: DTMF, CPT
• Caller ID Tx: Type I, II
EC – A Black-box View
[Figure: G.168 EC block with ports RIN (from far-end, level LRIN), ROUT (to near-end), SIN (echo at level LECHO plus the near-end signal) and SOUT (level LRES or LRET), together with Control and Status lines.]
ACOM = LRIN - LSOUT (near-end signal absent)
ERL = LRIN - LECHO
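The two level definitions above can be tried out numerically. The following is a minimal sketch, not part of the original slides: the function names and the test signals are hypothetical, and levels are computed as mean-square power in dB against an arbitrary reference rather than in dBm0.

```python
import math

def level_db(samples):
    """Mean-square level in dB (arbitrary reference, not dBm0)."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power)

def erl_db(rin, echo):
    """ERL = LRIN - LECHO: loss across the hybrid alone."""
    return level_db(rin) - level_db(echo)

def acom_db(rin, sout):
    """ACOM = LRIN - LSOUT, valid only when the near-end signal is absent."""
    return level_db(rin) - level_db(sout)

# Hypothetical signals: a far-end tone, an echo at half its amplitude,
# and a residual at one hundredth of its amplitude after cancellation.
rin = [math.sin(2 * math.pi * 0.05 * n) for n in range(800)]
echo = [0.5 * s for s in rin]
sout = [0.01 * s for s in rin]

print(round(erl_db(rin, echo), 2))   # about 6 dB
print(round(acom_db(rin, sout), 2))  # about 40 dB
```

An echo at half amplitude corresponds to the low end of the typical 6 dB to 12 dB ERL range; the difference between ACOM and ERL is the loss contributed by the canceller itself.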
G.168 EC Internals
[Figure: EC internals between Rin / Rout and Sin / Sout. Blocks shown: Control Logic (Adaptation, NLP); V.25 Tone Detector with Holding-band Logic, driving EC Enable / Disable; Nonlinear Processor (NLP) with Comfort Noise (CNI).]
Some EC Design Options
• “Full tail” vs. “Tail independent” or “Floating window”
• Single filter with robust control vs. double filter with simpler controls
• Time domain, transform domain or subband structure
Full Tail / Floating Window
[Figure: a 128 ms span showing the actual echo path, compared with a full tail solution and a 2-window solution that uses two 12 ms windows.]
Key Performance Issues
• Fast initial convergence
• Low steady-state residual
• Fast tracking (for occasional path changes)
Determined by: Adaptation method
Big questions: How fast? How low?
Key Performance Issues (Cont’d)
• Robust to near-end talk
• Robust to double-talk
  Determined by: Adaptation Control
• Near-end voice quality (measured by PESQ, MOS, ...)
  Determined by: NLP Module
• Near-end background noise contrast
  Determined by: CNI Module
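To illustrate what adaptation control has to decide, here is a sketch of the classic Geigel double-talk test. This is a swapped-in textbook technique, not something the slides say the LSI implementation uses; the function name and the 0.5 threshold (which assumes a worst-case ERL of about 6 dB) are assumptions.

```python
def geigel_double_talk(rin_history, sin_sample, threshold=0.5):
    """Return True when near-end speech is likely present.

    rin_history: recent far-end samples covering the echo tail.
    threshold:   assumed worst-case echo gain (0.5 is roughly 6 dB ERL).
    Echo alone should stay below threshold * peak far-end level, so a
    larger SIN sample is attributed to the near-end talker.
    """
    far_end_peak = max(abs(x) for x in rin_history)
    return abs(sin_sample) > threshold * far_end_peak

recent_rin = [1.0, -0.8, 0.3]
print(geigel_double_talk(recent_rin, 0.4))  # echo-sized sample
print(geigel_double_talk(recent_rin, 0.7))  # too large to be echo alone
```

When the test fires, a controller would typically freeze or slow the filter update so that near-end speech does not corrupt the echo path estimate.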
Adaptation Options
• NLMS (sample rate or block adaptive)
• Enhanced NLMS variants (decorrelation, variable step size, PNLMS, PNLMS++)
• Fast affine projection (FAP)
• Fast RLS (FTRLS, QR-RLS, …)
• Other methods also exist …
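As a concrete reference point for the first option, the following is a minimal floating-point sketch of sample-rate NLMS. It is a generic textbook form, not the fixed-point LSI implementation; the function name, step size `mu`, regularisation `eps` and the 4-tap echo path are all hypothetical.

```python
import random

def nlms_echo_canceller(rin, sin_sig, num_taps, mu=0.5, eps=1e-6):
    """Sample-rate NLMS. Returns (sout, taps)."""
    taps = [0.0] * num_taps
    history = [0.0] * num_taps            # far-end samples, newest first
    sout = []
    for x, d in zip(rin, sin_sig):
        history = [x] + history[:-1]
        # FIR filtering: estimate the echo from the far-end history.
        echo_est = sum(w * h for w, h in zip(taps, history))
        err = d - echo_est                # residual sent to SOUT
        # NLMS update: step normalised by the far-end energy.
        energy = sum(h * h for h in history) + eps
        step = mu * err / energy
        taps = [w + step * h for w, h in zip(taps, history)]
        sout.append(err)
    return sout, taps

# Hypothetical echo path and white far-end signal, no near-end talker.
random.seed(0)
echo_path = [0.0, 0.5, -0.2, 0.1]
rin = [random.uniform(-1.0, 1.0) for _ in range(2000)]
sin_sig = [sum(h * rin[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
           for n in range(len(rin))]

sout, taps = nlms_echo_canceller(rin, sin_sig, num_taps=8)
print([round(w, 3) for w in taps])
```

With white input and no near-end signal the taps settle on the echo path quickly; real speech is strongly correlated, which is the motivation for the decorrelation, PNLMS and affine projection variants listed above.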
Costs of Adaptation
Method           MACs / Sample*          Data Memory / Channel
NLMS             O(2.N)                  ~ O(2.N)
APA (order P)    O(2.PN) + O(7.P^2)      ~ O(2.PN)
FAP (order P)    O(2.N) + O(20.P)        ~ O(2.N)
PNLMS            O(4.N)*                 ~ O(2.N)
FTRLS            O(8.N)*                 ~ O(7.N)
* MACs/sample is not a good cost measure for PNLMS and FTRLS.
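To get a feel for the table, the sketch below plugs numbers into its leading-order terms. It treats the O() expressions as exact counts, which they are not, and it assumes 8 kHz sampling (so a 64 ms tail is N = 512 taps) and a projection order P = 4; both values are assumptions, not figures from the slides.

```python
SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate

def taps_for_tail(tail_ms):
    """Filter length N needed to span a tail of tail_ms milliseconds."""
    return tail_ms * SAMPLE_RATE_HZ // 1000

def macs_per_sample(n, p):
    """Leading-order MACs per sample, taken literally from the table."""
    return {
        "NLMS": 2 * n,
        "APA": 2 * p * n + 7 * p * p,
        "FAP": 2 * n + 20 * p,
        "PNLMS": 4 * n,
        "FTRLS": 8 * n,
    }

def data_words_per_channel(n, p):
    """Leading-order data memory per channel, taken from the table."""
    return {"NLMS": 2 * n, "APA": 2 * p * n, "FAP": 2 * n,
            "PNLMS": 2 * n, "FTRLS": 7 * n}

n = taps_for_tail(64)
for method, macs in macs_per_sample(n, p=4).items():
    print(method, macs, data_words_per_channel(n, p=4)[method])
```

Under these assumptions FAP costs only slightly more than NLMS per sample, while APA and FTRLS cost roughly four times as much, which is consistent with NLMS and its variants being the usual choice on a low cost DSP.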
LSI403LP/LC DSP
Price-Performance Balance:
• 120 MHz - 200 MHz clock
• ZSP400 core, up to 4 instructions per cycle
• Dual MACs can perform two 16x16 or one 32x32 operation(s) per cycle
• 48K words of on-chip SRAM (configurable as 16K:32K or 24K:24K or 32K:16K of PM and DM)
• Two serial ports with TDM support
• As low as $4.00 in volume.
EC Complexity Break-up
An NLMS based EC can be split into 3 functional parts:
1. FIR filtering
   Typically 15-25% of the processor load; varies with tail length.
2. NLMS filter update
   Typically 25-35% of the processor load; varies with tail length.
3. Overall control logic
   This has a few loops for IIR filtering and division, as well as many if-then-else type decisions. It also includes the V.25 tone detector, comfort noise generator, etc. Typically 40-60% of the processor load; varies with tail length.
EC Complexity
                   Ops / Sample     Data Memory
FIR Filtering:     O(N)             O(N)
Filter Updates:    O(N)-O(2N)       O(N)
Other Logic*:      ~ c              ~ M
* Other logic includes several IIR filters, conditional branching, data buffer management, etc. for update control, NLP, CNI and V.25 tone disabler. Its costs are almost constant (they depend very weakly on filter length, N).
For lattice structures, a break-up into filtering and update stages is not possible.
NLMS based Example: Multi-MAC Processors
MACs / cycle:        1               2               4
FIR Filtering:       O(N)            O(N/2)          O(N/4)
Filter Updates:      O(N)-O(2N)      O(N/2)-O(N)     O(N/4)-O(N/2)
Other Logic:         ~ c             ~ c             ~ c
Load for 64ms EC:    ~ 12 - 13 MHz   ~ 8 - 9 MHz     ~ 5 - 6 MHz
Percentage load for “other logic” is significant.
FIR Filtering Loop
ZSP400 Code snippet:
    L_ECFilter_Loop:
        lddu    r2, r14, 2        ! r2 = Y[k], r3 = Y[k+1]
        lddu    r4, r13, 2        ! r4 = A[k], r5 = A[k+1]
        mac2.a  r2, r4            ! r1r0 = r1r0 + r2*r4 + r3*r5
        agn0    L_ECFilter_Loop
Approximately N/2 cycles per sample: each pass through the lddu / lddu / mac2.a sequence loads two samples and two coefficients and accumulates both products, so the loop runs N/2 times for an N-tap filter.
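The arithmetic of the loop can be mirrored in a few lines of Python. This is only a behavioural model with a hypothetical function name: it reproduces the two-taps-per-pass structure, not the 16-bit data formats, accumulator width or pipeline timing of the ZSP400.

```python
def fir_dual_mac(y, a):
    """Dot product of samples y and coefficients a, two taps per pass.

    Returns (accumulator, passes) so the N/2 loop count is visible.
    """
    if len(y) != len(a) or len(y) % 2:
        raise ValueError("need equal, even lengths")
    acc = 0
    passes = 0
    for k in range(0, len(y), 2):
        y0, y1 = y[k], y[k + 1]        # lddu r2, r14, 2
        a0, a1 = a[k], a[k + 1]        # lddu r4, r13, 2
        acc += y0 * a0 + y1 * a1       # mac2.a r2, r4
        passes += 1                    # agn0: loop back
    return acc, passes

print(fir_dual_mac([1, 2, 3, 4], [5, 6, 7, 8]))
```

For a 4-tap example the loop body runs twice, matching the N/2 count quoted for the dual-MAC core.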
LSI403LP/LC EC Implementations
Two Versions:
• Full-tail, 64ms echo canceller
• Windowed version (up to 3 discrete echoes)
                 Full-Tail    Windowed
Code Size:       1.1 K        3.8 K
Data Memory:     0.4 K        1.2 K
Channel Data:    1.3 K        1.6 K
I/O Buffers:     0.12 K       0.12 K
Load (MHz):      8.6          6.1
Notes:
All memory in 16-bit words.
I/O buffers are for 2.5 ms frame size
Numbers subject to change (on-going revisions!).
Multi-channel EC Costs
• Processor load (MHz or gates):
  o Increases almost linearly with channel count
  o For large channel counts savings possible
• Data Memory (Channel object):
  o Increases linearly with channel count
  o On-chip memory is expensive, but reduces power consumption, offers easier scalability with multiple cores
24 Channels on LSI403LP/LC
Resources for 24 channels:
                 Full-Tail    Windowed
Code Size:       1.1 K        3.8 K
Data Memory:     35 K         42.5 K
Load (MHz):      208          147
Notes:
Processor load is worst case (all channels performing adaptation).
Extra data memory requirements can be met by the free program memory on LSI403LP/LC. The required swap operations (for some channels only) are estimated to add an extra load of 6.5 MHz in case of the windowed version.
Multi-chip packaging using LSI403WLP provides higher channel density. For example, a dual-processor package can support 32 channels, at a lower clock, without requiring any memory swaps.
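The 24-channel figures can be cross-checked against the single-channel numbers with simple scaling. The sketch below assumes that "Data Memory" is shared across channels while "Channel Data" and "I/O Buffers" are replicated per channel; that split is an interpretation of the slides, not something they state, and the dictionary keys are hypothetical names.

```python
CHANNELS = 24
ON_CHIP_K = 48.0   # total on-chip SRAM, in K words
MAX_DM_K = 32.0    # largest data-memory split (16K:32K of PM and DM)

# Single-channel figures from the slides (K words of 16 bits, MHz).
versions = {
    "full_tail": {"code": 1.1, "shared": 0.4, "channel": 1.3, "io": 0.12, "mhz": 8.6},
    "windowed":  {"code": 3.8, "shared": 1.2, "channel": 1.6, "io": 0.12, "mhz": 6.1},
}

def scale(v, channels=CHANNELS):
    """Linear scaling: shared data once, channel data and I/O per channel."""
    data_k = v["shared"] + channels * (v["channel"] + v["io"])
    return {
        "data_k": data_k,
        "load_mhz": channels * v["mhz"],
        "total_k": v["code"] + data_k,
    }

for name, v in versions.items():
    r = scale(v)
    print(name, round(r["data_k"], 2), round(r["load_mhz"], 1),
          round(r["total_k"], 2), r["data_k"] > MAX_DM_K)
```

Under this reading the scaled values land close to the quoted 35 K / 208 MHz and 42.5 K / 147 MHz. In both versions the data exceeds the largest 32K data-memory split while code plus data still fits in 48K words, which is why the notes rely on free program memory for the overflow.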
Summary
• LSI403LP or LSI403LC can be used to support as many as 24 channels of LEC with 64ms tail, without any external SRAM.
• Very low cost per channel (under $0.50 per channel).
• Multi-chip packaging for higher channel density.
• Custom ASICs can be built for further cost reduction.
• Higher performance options using ZSP G2 cores also possible.
Thanks!
Questions?
Backup Slides
Delay: G.114 Guidelines
One-way Delay: ITU-T Classification (with echo “adequately controlled”)
< 150ms: Mostly acceptable.
150-400ms: Acceptable (maybe).
> 400ms: Unacceptable (in general).
TYPICAL DELAYS
Terrestrial, national long distance PSTN: < 50ms
Terrestrial, international PSTN: ~ 100ms
Cellular: Mobile to PSTN: ~ 150ms
Cellular: Mobile to Mobile: ~ 300 - 400ms
Echo Level and Delay
[Figure: required ERL (dB, axis from 10 to 30) plotted against delay (ms, axis from 10 to 90).]
ERL data from Table 1.1, “Acoustic Signal Processing for Telecommunication”, S. L. Gay and J. Benesty (Eds.), Kluwer Academic Publishers (2000)
Dealing With Delay (Echo)
• One-way delays in packet voice networks > 100ms
• As recommended in ITU-T G.131, a network echo canceller (EC) is required.
• EC required only for:
  o PSTN interfaces on packet voice gateways (PVGs)
  o Analog phone (SLIC) interfaces on CPEs
• EC not required for digital IP phones
  o AEC may still be needed (for hands-free operation)
• EC tail length: a much misused parameter
• ITU-T G.168 EC was initially developed for PSTN. Can it be applied as-is for packet voice networks?
CPE: Customer Premises Equipment, PVG: Packet Voice Gateway
Quality of Service (QoS)
Voice quality (MOS, PESQ, R-value, etc.) depends on:
• Voice Codec
• Packet Loss Concealment: “lost” packets, PLC quality
• End-End Delay: delay induced quality loss. Contributors: codec delay, Tx / Rx buffers, packetization, network latency, jitter buffer (JB size, JB delay)
• Echo Canceller: line echo, EC quality
• VAD / CNG: speech clipping, comfort noise quality