Ultra sound solution Impact of C++ DSP optimization techniques.

Posted on 11-Jan-2016


Ultra sound solution

Impact of C++ DSP optimization techniques

Research team discussion: an ultrasound probe (20 MHz) sends signals into the body, and these reflect off moving blood cells (in an artery? a vein?).

The received ultrasound frequency is Doppler-shifted compared to the transmitted frequency, just like the sound of an ambulance going by: higher if approaching, lower if receding. They get the positive frequencies (flow towards the probe) on the left audio channel and the negative frequencies (flow away) on the right audio channel.
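The Doppler relation behind this description can be sketched in C. The formula f_d = 2 f0 v cos(θ) / c is the standard ultrasound Doppler equation; the function name and the 1540 m/s speed of sound in tissue are assumptions, not values from the slides.

```c
/* Hypothetical helper: Doppler shift (Hz) for ultrasound reflected off
 * blood moving at v_mps.
 *   f0_hz     transmitted frequency (20e6 for the 20 MHz probe)
 *   v_mps     blood speed, positive when moving towards the probe
 *   cos_angle cosine of the angle between the beam and the flow
 *   c_mps     speed of sound in tissue (about 1540 m/s is assumed)
 * The factor 2 appears because the wave travels out to the cell and back.
 * A positive result means a higher received frequency (flow towards). */
double doppler_shift_hz(double f0_hz, double v_mps, double cos_angle,
                        double c_mps)
{
    return 2.0 * f0_hz * v_mps * cos_angle / c_mps;
}
```

With the 20 MHz probe and blood moving at 0.5 m/s straight towards it, this gives roughly 13 kHz, which sits in the audio range; that fits the shifted signal being delivered on audio channels.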

04/21/23.ENCM515 – Ultrasound ProblemCopyright smithmr@ucalgary.ca 2 / 33

The picture looks like this.

Note that the display loses all direction information. Can I help them output the maximum frequency?

Captured audio signal

Engineering Problems

Problem 5 – Different amplitudes in the two channels are common

Problem 6 – Why do the funny dead spots not line up in the left and right channels? We are handling stereo, not mono, signals

Incorrect labeling / misinterpretation

Problem 7 – How to remove the dead spots?

Max frequency, definition 1: the frequency below which X% of the frequencies fall

Noisy signal for large thresholds (> 80%)
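Definition 1 can be sketched as a cumulative sum over a power spectrum. This assumes "X% of the frequencies" is measured as X% of the total spectral power; the function name, the bin layout, and returning the lower edge of the first qualifying bin are all assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical sketch of "maximum frequency, definition 1":
 * the frequency below which pct percent of the spectral power lies.
 *   power  power in each frequency bin (assumed >= 0)
 *   nbins  number of bins
 *   bin_hz width of one bin in Hz
 *   pct    threshold in percent, e.g. 80.0 */
double percentile_max_frequency(const double *power, size_t nbins,
                                double bin_hz, double pct)
{
    double total = 0.0;
    for (size_t k = 0; k < nbins; ++k)
        total += power[k];
    if (total <= 0.0)
        return 0.0;                       /* no signal: report 0 Hz */

    double target = total * pct / 100.0;
    double running = 0.0;
    for (size_t k = 0; k < nbins; ++k) {
        running += power[k];
        if (running >= target)
            return (double)k * bin_hz;    /* first bin reaching the threshold */
    }
    return (double)(nbins - 1) * bin_hz;  /* guard against rounding */
}
```

For a flat spectrum of 10 bins, 100 Hz wide, an 80% threshold is first reached in bin 7. As the slide notes, pushing the threshold above about 80% makes the result noisy.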

After XPI Stage 2: we have a working algorithm concept

Engineering Problem 1 – Complex math (a + jb) on the SHARC!

Engineering Problem 2 – Define the maximum frequency: there are zillions of blood cells, and therefore a distribution of frequencies. Workable prototype – discuss more with the customer

Engineering Problem 3 – The SHARC D/A can't handle a DC signal. Workable prototype – discuss more with the customer

Engineering Problem 4 – Can the SHARC handle all this in real time?

Problem 5 – Are different amplitudes on the input channels common? Yes

Problem 6 – Why do the funny dead spots not line up in the left and right channels? Artifact: the samples were mislabeled and misinterpreted

Problem 7 – How to remove the dead spots? Discuss more with the customer
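Engineering Problem 1 (complex math on the SHARC) comes down to doing (a + jb) arithmetic with ordinary real operations. A minimal sketch, assuming a plain two-float struct; the type and function names are hypothetical, and the vendor run-time library may provide its own complex types instead.

```c
/* Hypothetical complex sample a + jb stored as two floats. */
typedef struct {
    float re;
    float im;
} cplx;

/* (a + jb)(c + jd) = (ac - bd) + j(ad + bc): four real multiplies, two adds. */
cplx cplx_mul(cplx x, cplx y)
{
    cplx r;
    r.re = x.re * y.re - x.im * y.im;
    r.im = x.re * y.im + x.im * y.re;
    return r;
}

/* Power |a + jb|^2 = a^2 + b^2; no square root is needed to compare bins. */
float cplx_power(cplx x)
{
    return x.re * x.re + x.im * x.im;
}
```

Keeping the squared magnitude avoids a square root per bin, which matters when the result only feeds comparisons or a cumulative sum.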

ProcessBlock( ) is done OUTSIDE the INTERRUPT

This AVOIDS a RACE condition
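One common way to keep processing out of the interrupt is ping-pong (double) buffering: the ISR only stores samples and hands over a full buffer, while the main loop processes the buffer the ISR is no longer writing. This is a sketch under that assumption; the buffer size and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 4   /* hypothetical; the real block size is not given */

/* Two buffers: the ISR fills one while the main loop processes the other. */
static float ping[BLOCK_SIZE], pong[BLOCK_SIZE];
static float *volatile fill_buf = ping;    /* written by the ISR       */
static float *volatile ready_buf = NULL;   /* handed to the main loop  */
static volatile size_t fill_count = 0;
static volatile bool block_ready = false;

/* Called once per sample from the (hypothetical) audio interrupt.
 * It only stores the sample and swaps buffers; no processing here. */
void audio_isr(float sample)
{
    fill_buf[fill_count++] = sample;
    if (fill_count == BLOCK_SIZE) {
        ready_buf = fill_buf;
        fill_buf = (fill_buf == ping) ? pong : ping;
        fill_count = 0;
        block_ready = true;              /* signal the main loop */
    }
}

/* Main-loop side: ProcessBlock-style work runs here, outside the ISR.
 * Returns true if a full block was processed. */
bool poll_and_process(float *sum_out)
{
    if (!block_ready)
        return false;
    block_ready = false;
    float sum = 0.0f;
    for (size_t i = 0; i < BLOCK_SIZE; ++i)
        sum += ready_buf[i];             /* stand-in for the real processing */
    *sum_out = sum;
    return true;
}
```

This assumes each block is processed before the ISR finishes filling the other buffer; if processing takes longer than one block period, the race comes back.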

Real life problem -- Stereo

Minor changes to the audio preemptive task
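Handling stereo means separating the two channels consistently. A minimal sketch, assuming the codec delivers interleaved frames with the left sample first (L0, R0, L1, R1, ...); the function name is hypothetical. A slip in this indexing is the kind of mistake that mislabels left and right samples.

```c
#include <stddef.h>

/* Hypothetical helper: split an interleaved stereo stream
 * (L0, R0, L1, R1, ...) into separate left and right buffers. */
void deinterleave_stereo(const float *interleaved, size_t frames,
                         float *left, float *right)
{
    for (size_t i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];       /* even slots: left  */
        right[i] = interleaved[2 * i + 1];   /* odd slots:  right */
    }
}
```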

Make the C code more general: buffer[ ] moved to external files, so the size of the arrays being processed is unknown to the compiler

Switch to Release mode, i.e. switch to the optimizing compiler (ReleaseNWC). This means you can no longer set breakpoints; fix this with these steps

First look at the code

Timing: a software loop with r2 as the loop counter, tested at the end

N * (10 - 1) cycles (the jump is not a delayed branch, db)

The -1 is for 1 parallel instruction

Use the Compiler Info button

3 stalls: 2 on the software jump, 1 on ?

Obvious things to do: we are already processing the left and right channels in one program, so switch to left audio in dm memory and right audio in pm memory

Need to do: make the right buffers ‘pm’, and change the function prototype to take pm pointers
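The two changes can be sketched as follows. The SHARC compiler's `pm` qualifier places data in program memory so a dm and a pm access can be issued together; the guard macro `__ADSP21000__`, the `PM_QUAL` wrapper, and the function name are assumptions so the sketch also builds on a host compiler, where the qualifier expands to nothing.

```c
/* Assumption: __ADSP21000__ identifies the SHARC compiler. Elsewhere the
 * qualifier expands to nothing so the sketch still builds. */
#if defined(__ADSP21000__)
#define PM_QUAL pm
#else
#define PM_QUAL
#endif

#define N_POINTS 128   /* 0x80, the loop count quoted in the slides */

float left_in[N_POINTS], left_out[N_POINTS];   /* dm (the default) */
float PM_QUAL right_in[N_POINTS];              /* right buffers in pm */
float PM_QUAL right_out[N_POINTS];

/* Hypothetical stand-in for the block copy: the prototype now carries
 * the pm qualifier on the right-channel pointers. */
void copy_block_lr(float *l_in, float *l_out,
                   float PM_QUAL *r_in, float PM_QUAL *r_out, int n)
{
    for (int i = 0; i < n; ++i) {
        l_out[i] = l_in[i];   /* dm access */
        r_out[i] = r_in[i];   /* pm access, can pair with the dm access */
    }
}
```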

As expected, 2 cycles saved: parallel dm and pm reads and writes

Why a software loop? The compiler does not know the size of the loop, so it can’t optimize the loop

THIS PRAGMA IS A CONTRACT BETWEEN THE DEVELOPER AND THE COMPILER: DON’T LIE

This does not compile

Variables in the pragma are not handled by the preprocessor
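The slide does not name the pragma; a likely candidate is the SHARC compiler's `loop_count(min, max, modulo)`, which promises the trip count's range and divisibility. Treat that identification, the `__ADSP21000__` guard, and the function name as assumptions. Per the slide, names inside the pragma are not substituted by the preprocessor, so literal numbers are used.

```c
/* Hypothetical loop annotated with a trip-count contract.
 * loop_count(n, n, 2) with a variable n is the "does not compile" case;
 * literal numbers are required. If n is not really 128 on the SHARC,
 * the generated code may be wrong: the pragma is a promise. */
void scale_block(float *buf, int n, float gain)
{
#if defined(__ADSP21000__)
#pragma loop_count(128, 128, 2)
#endif
    for (int i = 0; i < n; ++i)
        buf[i] *= gain;
}
```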

Variable as end of loop: the compiler will not optimize when the loop parameter is declared external, internal or static

Loop parameters are all constants known to the compiler

Drop from 8 cycles to 2 cycles, as the compiler knows enough to switch to hardware loop control: STALLS FROM THE JUMP ARE GONE
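The contrast between the two previous slides can be sketched as two versions of the same copy loop. The global and the function names are hypothetical; whether a given compiler emits a hardware loop is its own decision, and the 8-to-2 cycle figure is the slides' SHARC measurement, not something this host code shows.

```c
/* Version A: the trip count comes from a global the compiler cannot see
 * through, so it may fall back to a software loop with a counter and a
 * conditional jump. */
int g_num_points = 128;   /* hypothetical global set elsewhere at run time */

void copy_variable(const float *in, float *out)
{
    for (int i = 0; i < g_num_points; ++i)
        out[i] = in[i];
}

/* Version B: the trip count is a compile-time constant, which per the
 * slides lets the compiler switch to hardware loop control. */
#define NUM_POINTS 128

void copy_constant(const float *in, float *out)
{
    for (int i = 0; i < NUM_POINTS; ++i)
        out[i] = in[i];
}
```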

Where am I getting all my info?

Can we switch to SIMD mode?

VECTORIZATION

MAY NOT BE POSSIBLE IF THE COMPILER DOES NOT KNOW ABOUT THE ALIGNMENT OF ARRAYS

(how arrays are placed in memory)
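Telling the compiler about alignment can be sketched with standard C11 `_Alignas`, swapped in here as a portable stand-in for the SHARC-specific `#pragma align` / `#pragma all_aligned`. Aligning to two floats is an assumption based on SIMD mode loading register pairs; the array and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ask for the array to start on a two-float boundary, so pairs of
 * samples never straddle an odd address (assumes a C11 compiler). */
_Alignas(2 * sizeof(float)) float left_samples[128];

/* Run-time check of the alignment the compiler was asked for. */
int is_aligned(const void *p, size_t bytes)
{
    return ((uintptr_t)p % bytes) == 0;
}
```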

Impact of vectorization: before, the loop count was 0x80 (128), with memory operations of the form

r2 = dm(i4, m6), where m6 = 1, meaning the code is doing r2 = *i4++;

New instructions: SIMD mode

BIT SET MODE1 0x200000 turns SIMD mode on (BIT CLR MODE1 0x200000 turns it off)

The processor doing r2 = dm(i5, 2)

is the same as r2 = dm(i5, 1) AND s2 = dm(i5, 1)

Loading two registers
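What SIMD mode does to the loop can be modeled in plain C: each iteration handles two adjacent points, one in processing element X (the r registers) and one in processing element Y (the s registers), so one would expect the 0x80-point loop to need only 0x40 iterations. This is a host-side model, not SHARC intrinsics; the function name is hypothetical, and it assumes an even point count and suitably aligned arrays.

```c
#include <stddef.h>

/* Plain-C model of SIMD mode: element i goes to PEx (r2), element i + 1
 * to PEy (s2), both "loaded by one instruction". Assumes n is even.
 * Returns the iteration count so the halved trip count can be checked. */
size_t copy_block_simd_model(const float *in, float *out, size_t n)
{
    size_t iterations = 0;
    for (size_t i = 0; i < n; i += 2) {
        float r2 = in[i];       /* PEx: r2 = dm(i5, ...)             */
        float s2 = in[i + 1];   /* PEy: s2 from the next address      */
        out[i] = r2;
        out[i + 1] = s2;
        ++iterations;
    }
    return iterations;
}
```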

Try using #pragma inline: BEFORE and AFTER (20 cycles faster?)

C++ source showing out-of-order execution

WARNING

Let's do “inline”: ProcessOneBlock( ) is called by four subroutines, so let's inline it
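Inlining can be sketched with the standard C `static inline` keyword, used here as a portable substitute for the slides' `#pragma inline`. With four callers, the compiler may paste four copies of the loop and emit no stand-alone ProcessOneBlock( ) body, which can match the mixed-mode view described next. The signatures and loop body are hypothetical stand-ins.

```c
#define BLOCK 128

/* Request that the body be pasted into each caller, removing the
 * call/return overhead. Whether it is actually inlined is up to the compiler. */
static inline void ProcessOneBlock(const float *in, float *out)
{
    for (int i = 0; i < BLOCK; ++i)
        out[i] = in[i];   /* stand-in for the slides' block copy */
}

/* Four callers, so potentially four inlined copies of the loop. */
void Process0(const float *in, float *out) { ProcessOneBlock(in, out); }
void Process1(const float *in, float *out) { ProcessOneBlock(in, out); }
void Process2(const float *in, float *out) { ProcessOneBlock(in, out); }
void Process3(const float *in, float *out) { ProcessOneBlock(in, out); }
```

The trade-off is code size: each caller now carries its own copy of the loop.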

Mixed mode view is interesting

Mixed mode: out-of-order execution, with 4 copies of the code for DoCopyBlock( ) (one for each of Process0, Process1, Process2, Process3)

NO CODE FOR ProcessOneBlock( )

Speed improvement: moving from the software loop and using the dm and pm memories caused a change from 8 cycles / pt to 2 cycles for two points processed in SIMD (4 CALLS * 7 CYCLES SAVED * N POINTS PROCESSED)

Moving to IN_LINE causes a change of around 120 cycles for each subroutine call (4 CALLS * 120 CYCLES SAVED)

N = 128: from 4 * 1800 to 4 * 120 cycles; on a 480 MHz processor, 15 µs to 1 µs

LESSON LEARNT: SPEND YOUR TIME OPTIMIZING THE LOOPS. THE REST IS SMALLER, AND GETS SMALLER WITH LARGER N
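The arithmetic on this slide can be checked with two small helpers: one for the per-point loop saving (4 calls × 7 cycles × N points) and one converting cycles to microseconds at the 480 MHz clock. The helper names are hypothetical; the figures are the slide's.

```c
/* Cycles saved in the loops: grows linearly with the number of points. */
long loop_cycles_saved(long calls, long cycles_saved_per_point, long n_points)
{
    return calls * cycles_saved_per_point * n_points;
}

/* Convert a cycle count to microseconds at a given core clock. */
double cycles_to_us(double cycles, double clock_hz)
{
    return cycles * 1e6 / clock_hz;
}
```

For N = 128 the loop saving is 4 * 7 * 128 = 3584 cycles and doubles when N doubles, while the inlining saving stays at 4 * 120 cycles whatever N is; that is the lesson above in numbers. At 480 MHz, 4 * 1800 cycles come to 15 µs and 4 * 120 cycles to 1 µs.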

Other improvements depend on code characteristics / specifics

Profile guided optimization

Memory alignment can be important

After the first char fetch, the system can move to moving 8 chars at a time in SIMD
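The idea (handle the unaligned start one char at a time, then move 8 chars per step) can be sketched for a byte-addressed host. This is an illustration of the technique, not the SHARC library's routine; the function name and the 8-byte chunk size are assumptions, and on the SHARC a char occupies a full word, so the details differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes: single bytes until src is 8-byte aligned, then 8 bytes
 * per step (the bulk phase), then the leftover tail bytes. */
void copy_chars_aligned(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    /* Head: one char at a time until src reaches an 8-byte boundary. */
    while (i < n && ((uintptr_t)(src + i) % 8) != 0) {
        dst[i] = src[i];
        ++i;
    }

    /* Bulk: 8 chars per step. memcpy through a uint64_t stays well defined
     * even if dst is unaligned; compilers typically emit one load/store. */
    for (; i + 8 <= n; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, src + i, sizeof chunk);
        memcpy(dst + i, &chunk, sizeof chunk);
    }

    /* Tail: remaining chars. */
    for (; i < n; ++i)
        dst[i] = src[i];
}
```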

Conditional code (manual PGO)

Correct ways to process loops

#pragma all_aligned
#pragma loop_unroll N
#pragma SIMD_for
#pragma align num
#pragma alignment_region(...) and #pragma alignment_region_end
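A loop annotated with two of the pragmas listed above might look like the sketch below. The pragma names come from the list; that they can be stacked on one loop, the `__ADSP21000__` guard that hides them from other compilers, and the function name are assumptions. Like loop_count, each pragma is a promise: the arrays really are aligned, and the iterations really are safe to run two at a time.

```c
#define NPTS 128

void copy_block_pragmas(const float *in, float *out)
{
#if defined(__ADSP21000__)
#pragma all_aligned   /* promise: in and out start on aligned addresses */
#pragma SIMD_for      /* promise: iterations can run pairwise in SIMD   */
#endif
    for (int i = 0; i < NPTS; ++i)
        out[i] = in[i];
}
```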
