GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links
> Sino-German Workshop > Chen Tang > 03.2014 > DLR.de • Chart 1
Chen Tang
Institute of Communication and Navigation
German Aerospace Center
Overview
• Introduction and Motivation
• MUD System Design
• GPU CUDA Architecture
• GPU-accelerated Implementation of MUD
• Simulation Result
• Summary
Introduction and Motivation
• Bidirectional satellite communication
• Multi-user access issue
  • MF-TDMA (e.g. DVB-RCS)
• Multiuser Detection (MUD)
  • Increases spectrum efficiency
  • Few practical MUD implementations for satellite systems
    • High complexity
    • Sensitive to synchronization and channel estimation errors
Introduction and Motivation
• The NEXT project (Network Coding Satellite Experiment) paved the way for the GEO research communication satellite H2Sat
  • H2Sat: explore and test new broadband (high-data-rate) satellite communication
  • NEXT Exp 3: multiuser detection (MUD) for satellite return links
[Figure: two packet streams (A1–A3 and B1–B3) superimposed on the channel are separated by the multiuser detector into the individual streams]
• Two users transmit at the same frequency and time
• A transparent satellite return link
• Main objectives:
  • Develop a MUD receiver in SDR
  • Increase decoding throughput for real-time processing
MUD System Design
• Multiuser detection (MUD) complexity
  • Optimal MUD proposed by Verdú: exponential complexity in the number of users
  • Suboptimal MUD algorithms: e.g. PIC, SIC
• We use Successive Interference Cancellation (SIC)
  • Linear complexity in the number of users
  • Straightforward extension to support more users
MUD System Design
• Successive Interference Cancellation (SIC)
  • Sequentially decode users & cancel interference
  • Multi-stage SIC improves PER
  • Drawbacks: error propagation; sensitive to channel estimation errors; phase noise
• Expectation Maximization channel estimation (EM-CE)
[Block diagram: multi-stage SIC receiver with EM-CE and LDPC decoding]
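The SIC principle can be illustrated with a minimal baseband sketch in plain Python (an illustrative toy, not the receiver from this deck — uncoded BPSK with perfect channel knowledge): detect the stronger user first, re-modulate its decisions, subtract them from the received signal, then detect the weaker user from the residual.

```python
def bpsk(bits):
    """Map bits {0, 1} to BPSK symbols {-1, +1}."""
    return [1.0 if b else -1.0 for b in bits]

def hard_decision(samples):
    """Map received samples back to bits by sign."""
    return [1 if s >= 0 else 0 for s in samples]

def sic_two_users(received, a1):
    """Single-stage SIC: detect the stronger user U1, cancel its
    contribution, then detect the weaker user U2."""
    # Stage 1: treat U2 as noise and detect U1 (U1 dominates the sign).
    u1_hat = hard_decision(received)
    # Reconstruct U1's signal and cancel it (ideal channel estimate assumed).
    residual = [r - a1 * s for r, s in zip(received, bpsk(u1_hat))]
    # Stage 2: detect U2 from the cleaned-up residual.
    u2_hat = hard_decision(residual)
    return u1_hat, u2_hat

# Two users transmit at the same frequency and time; U1 is 3 dB stronger.
a1, a2 = 1.0, 10 ** (-3 / 20)   # 3 dB power imbalance in amplitude
u1_bits = [1, 0, 1, 1, 0, 0, 1, 0]
u2_bits = [0, 1, 1, 0, 1, 0, 0, 1]
received = [a1 * s1 + a2 * s2
            for s1, s2 in zip(bpsk(u1_bits), bpsk(u2_bits))]

u1_hat, u2_hat = sic_two_users(received, a1)
```

In this noiseless toy both users are recovered exactly; with noise and imperfect channel estimates, residual errors in stage 1 propagate into stage 2, which is why the deck lists error propagation and channel-estimation sensitivity as drawbacks.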
MUD System Design
• Real-time implementation of MUD is challenging
• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation
• Programmable hardware devices
  • DSP; FPGA (hard to develop, low flexibility)
  • Attractive alternative: GPGPU — high performance, high flexibility
GPGPU
• GPUs are massively multithreaded many-core chips
  • Image and video rendering
  • General-purpose computations
Ref: Nvidia CUDA C Programming Guide, 2013
Nvidia Tesla C2070: 448 cores; 515 GFLOPS of double-precision peak performance
GPGPU
• The GPU is specialized for compute-intensive, highly parallel computation (exactly what graphics rendering is about)
  • More transistors devoted to data processing rather than data caching and flow control
(ALU: Arithmetic Logic Unit)
• CPU: limited number of concurrent threads
  • A server with four hex-core processors: 24 concurrent active threads (or 48, if Hyper-Threading is supported)
• GPU: many more concurrent threads
  • Hundreds of cores; more than a thousand concurrent active threads
CUDA Architecture
• In Nov. 2006, Nvidia released the first GPU built with its CUDA architecture
• CUDA: Compute Unified Device Architecture
  • Each ALU can be used for general-purpose computations
  • All execution units can arbitrarily read and write memory
  • Allows the use of high-level programming languages (C/C++, OpenCL, Fortran; bindings for Java and Python)
CUDA Architecture
• Serial program with parallel kernels
  • Serial code executes in a host (CPU) thread
  • Parallel kernel code executes in many device (GPU) threads
  • Host (CPU) and device (GPU) maintain separate memory spaces
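This host/device split can be mirrored in plain Python (a conceptual sketch only — the "device" buffers and the serialized loop stand in for GPU memory and parallel threads): the host stages data, a kernel runs one logical thread per element, and results are copied back, matching the copy-in → kernel launch → copy-out pattern of CUDA.

```python
# Conceptual sketch of the CUDA host/device workflow in plain Python.
# Host and device keep separate memory spaces, so data is copied
# explicitly in both directions around the kernel launch.

def kernel(thread_id, inp, out):
    """One logical GPU thread: each thread handles exactly one element."""
    out[thread_id] = inp[thread_id] * inp[thread_id]

host_data = [1.0, 2.0, 3.0, 4.0]

# "cudaMemcpy" host -> device: modeled as an explicit copy.
device_in = list(host_data)
device_out = [0.0] * len(device_in)

# "Kernel launch": one thread per element (serialized here, parallel on a GPU).
for tid in range(len(device_in)):
    kernel(tid, device_in, device_out)

# "cudaMemcpy" device -> host.
host_result = list(device_out)
```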
LDPC Decoder on GPU
• Assign one CUDA thread to each edge of each check node
• Speedup: 10x
• Throughput: 1.6 Mbps (code rate: 2/3)
[Figure: Tanner graph with check nodes C1…C(n−k), variable nodes V1…Vn and check-to-variable messages C_j→V_i. Codes: U1 n = 4800, k = 3200; U2 n = 4800, k = 2400]
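The per-edge parallelization can be sketched in plain Python with a min-sum check-node update (min-sum is an illustrative assumption — the deck does not state which update rule is used): the work for each (check node, edge) pair is independent of every other pair, which is exactly what lets one CUDA thread handle one edge.

```python
def check_node_update(H_rows, var_msgs):
    """Min-sum check-to-variable messages for an LDPC decoder.
    H_rows[c] lists the variable nodes connected to check node c;
    var_msgs[(v, c)] is the incoming variable-to-check message (LLR).
    Each (c, v) output is independent, so on a GPU one CUDA thread
    computes one edge."""
    out = {}
    for c, row in enumerate(H_rows):    # serialized here; parallel on the GPU
        for v in row:                   # one logical thread per edge
            sign, mag = 1.0, float("inf")
            for v2 in row:              # combine all *other* incoming edges
                if v2 == v:
                    continue
                m = var_msgs[(v2, c)]
                sign *= 1.0 if m >= 0 else -1.0
                mag = min(mag, abs(m))
            out[(c, v)] = sign * mag
    return out

# Tiny example: one check node connected to variable nodes 0, 1, 2.
H_rows = [[0, 1, 2]]
var_msgs = {(0, 0): 2.0, (1, 0): -0.5, (2, 0): 3.0}
msgs = check_node_update(H_rows, var_msgs)
```

Because every thread reads only the incoming messages of its own check node and writes a single output, no synchronization is needed within one update half-iteration — a good fit for the massively threaded GPU model described above.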
MUD receiver on GPU
• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation
  • Data transfer between host and device memory (144 GB/s on-device bandwidth of the Nvidia Tesla vs. 8 GB/s over PCIe x16)
• All parts of each single-user receiver and the interference cancellation run on the GPU
• Minimize the latency of intermediate data transfer between host and device memory
[Block diagram: receiver processing stages, each moved from the CPU to the GPU]
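The motivation for keeping intermediate data on the device follows directly from the two bandwidth figures above (the 10 MB buffer size below is a hypothetical illustration, not a number from the deck):

```python
# On-device memory bandwidth (Nvidia Tesla C2070) vs. host<->device PCIe x16.
DEVICE_BW = 144e9   # bytes/s, GPU memory bandwidth
PCIE_BW = 8e9       # bytes/s, PCIe x16 bus

buffer_bytes = 10e6  # hypothetical 10 MB intermediate buffer

t_device = buffer_bytes / DEVICE_BW  # time to move it within GPU memory
t_pcie = buffer_bytes / PCIE_BW      # time to move it across the PCIe bus

ratio = t_pcie / t_device            # PCIe transfer is ~18x slower
```

Every intermediate result bounced back to the host pays this ~18x penalty, which is why the receiver keeps all per-user processing and interference cancellation on the GPU.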
Simulation Setup
• GPU: Nvidia Tesla C2070 (1.15 GHz)
• Comparison benchmark: Intel Xeon CPU E5620 (2.4 GHz)
• BPSK modulation
• Two user terminals (power imbalance: U1 3 dB higher than U2)
• Channel coding: LDPC
  • Irregular Repeat-Accumulate
  • Blocklength: 4800 bits
  • U1 code rate: 2/3; U2 code rate: 1/2
• Baud rate: 62500 symbols/second → real-time threshold: ca. 85 ms (66 kbps)
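The 66 kbps figure can be reproduced from the parameters above, assuming (my reading, not stated explicitly in the deck) that the ca. 85 ms threshold is the frame duration including overhead and that both users' information bits count toward throughput, since they share the same time/frequency resource:

```python
# Real-time budget per frame.
baud = 62500             # symbols/s
n = 4800                 # blocklength in bits = BPSK symbols per codeword
t_symbols = n / baud     # 76.8 ms of pure payload symbols

# Information bits carried per frame by the two superimposed users.
k_u1 = int(n * 2 / 3)    # 3200 bits at code rate 2/3
k_u2 = int(n * 1 / 2)    # 2400 bits at code rate 1/2

t_frame = 0.085          # ca. 85 ms real-time threshold (incl. overhead)
throughput_kbps = (k_u1 + k_u2) / t_frame / 1000   # ~66 kbps combined
```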
Simulation Result
[Figure: measured processing times of the MUD receiver on CPU and GPU, compared against the real-time threshold]
Summary
• SDR implementation of the MUD receiver
  • High flexibility and low cost
  • Extensible to support more users
• GPU acceleration
  • 1.8x–3.8x faster than the real-time threshold
  • Still room to improve; newer GPUs offer better performance
• GPU CUDA is very promising for powerful parallel computing
  • Low learning curve
  • Heterogeneous: mixed serial-parallel programming
  • Scalable
• CUDA-powered MATLAB (MATLAB® with Parallel Computing Toolbox; Jacket™ from AccelerEyes)
  • Days/weeks of simulation reduced to hours
• “GNU Radio is a free & open-source software development toolkit that provides signal processing blocks to implement software radios”
• Software architecture
  • The main processing of the blocks is in C++ functions executed by the CPU on the PC
GNU Radio
[Diagram: a Python script or GNU Radio Companion drives a Python module, which wraps the C++ shared library via SWIG]
GNU Radio + CUDA
• Irregular Repeat-Accumulate LDPC (IRA), n = 4800, k = 2400
• CPU LDPC decoder
  • Throughput:
• GPU LDPC decoder
  • Throughput:
Thank you! Q&A?
GPGPU
• Advantages of GPU:
  • High computational processing power
  • High memory bandwidth
  • High flexibility
• Drawbacks of GPU:
  • Not a stand-alone device
  • Bad at serial processing
  • Separate memory space
  • Additional hands-on effort
[Figure: comparison of total MUD processing time between CPU and GPU]