Design Exploration of a Human-machine Interface (HMI) Application

Francis Li

Sam Madden

The Application

• Data glove interface
  – Wired, bulky
• SmartDust scenario
  – A mote on each fingertip
• Investigate implementations
• Explore design alternatives

Proof-of-Concept Prototype

• By SmartDust group
  – Atmel AVR Microprocessor
  – RFM TR1000 Radio
  – 6 accelerometers
  – Host PC performs processing
• Analysis
  – Power: 45 mW measured
  – Continuous operation of processor, accelerometers, communication with host

Application Analysis

• Processing (on PC)
  – Do 20 times per second, for each accelerometer
    • Read in X and Y samples (10 bits each)
    • Compute rolling average to smooth input data
    • Convert averages to polar coordinates
      – Dominates cost: sqrt, acos, atan
      – Secondary cost: floating point operations
  – Periodically, calculate gesture via simple template matching (static hand positions)
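The per-accelerometer steps listed above can be sketched roughly as follows. This is an illustration only, not the authors' code: the window length, the zero offset of 512 for a 10-bit sample, the use of `atan2`/`hypot` in place of the sqrt/acos/atan calls the slide mentions, the template format, and all names (`AxisSmoother`, `to_polar`, `match_gesture`) are assumptions.

```python
import math
from collections import deque

WINDOW = 8          # assumed rolling-average length (not given in the slides)
ZERO_OFFSET = 512   # assumed midpoint of a 10-bit sample (0..1023)


class AxisSmoother:
    """Rolling average over the last `window` raw samples of one axis."""

    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)

    def update(self, raw):
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(avg_x, avg_y):
    """Convert smoothed X/Y readings to (magnitude, angle in radians)."""
    dx = avg_x - ZERO_OFFSET
    dy = avg_y - ZERO_OFFSET
    return math.hypot(dx, dy), math.atan2(dy, dx)


def match_gesture(angles, templates):
    """Return the name of the template closest to the current angles.

    Closeness is the sum of squared angle differences, one term per
    accelerometer. Angle wrap-around is ignored to keep the sketch short.
    """
    def error(name):
        return sum((a - t) ** 2 for a, t in zip(angles, templates[name]))

    return min(templates, key=error)
```

In use, each accelerometer would own two `AxisSmoother` instances (X and Y), call `to_polar` on the smoothed pair 20 times per second, and pass the six resulting angles to `match_gesture` whenever a gesture report is due.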

Application Analysis (cont)

• Communication (from Atmel to PC)
  – 20 samples/sec × 6 accelerometers × 4 bytes/sample = 480 bytes/sec
  – 115.6 kb/sec RF link
  – Radio = 12 mA @ 3 V, when transmitting → 1.2 mW for radio alone
• Real world power >> 1.2 mW, due to software and analog overhead (real world analysis later)
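One reading of the 1.2 mW figure is that the radio draws its full transmit power only for the fraction of each second needed to send 480 bytes over the link. The sketch below works through that arithmetic; the duty-cycle interpretation and the function names are assumptions, while the numeric inputs come from the slide.

```python
SAMPLES_PER_SEC = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BITS_PER_SEC = 115_600   # "115.6 kb/sec" as stated on the slide
RADIO_CURRENT_A = 0.012       # 12 mA when transmitting
RADIO_VOLTAGE_V = 3.0


def payload_bytes_per_sec():
    """Raw sensor data rate sent from the Atmel to the host PC."""
    return SAMPLES_PER_SEC * ACCELEROMETERS * BYTES_PER_SAMPLE


def radio_power_mw():
    """Average radio power, assuming it is on only while bits are sent."""
    duty_cycle = payload_bytes_per_sec() * 8 / LINK_BITS_PER_SEC
    transmit_power_mw = RADIO_CURRENT_A * RADIO_VOLTAGE_V * 1000
    return transmit_power_mw * duty_cycle


if __name__ == "__main__":
    print(payload_bytes_per_sec(), "bytes/sec")
    print(round(radio_power_mw(), 2), "mW")
```

Under these assumptions the result lands within a few microwatts of the slide's 1.2 mW, which is why the ideal figure so badly underestimates the measured cost once software and analog overhead are included.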

Optimization Process

• Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
• Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP
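As a rough illustration of the floating point to fixed point substitution named above, the sketch below represents values as integers with 8 fractional bits and computes a vector magnitude with an integer square root. The Q8 format and the helper names are assumptions chosen for the example; the slides do not say which fixed-point format, if any, was evaluated.

```python
import math

FRACTION_BITS = 8                 # assumed Q8 format: value = integer / 256
ONE = 1 << FRACTION_BITS


def to_fixed(value):
    """Convert a float to its nearest Q8 integer representation."""
    return int(round(value * ONE))


def to_float(fixed):
    """Convert a Q8 integer back to a float (for inspection only)."""
    return fixed / ONE


def fixed_mul(a, b):
    """Multiply two Q8 numbers; the raw product is Q16, so shift back."""
    return (a * b) >> FRACTION_BITS


def fixed_magnitude(dx, dy):
    """Magnitude of a Q8 vector using only integer operations.

    dx*dx + dy*dy is a Q16 value, and its integer square root is Q8.
    """
    return math.isqrt(dx * dx + dy * dy)
```

For example, `to_float(fixed_magnitude(to_fixed(3.0), to_fixed(4.0)))` evaluates to 5.0 without any floating-point work inside the magnitude computation itself.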

Communication vs. Computation

• Estimates of local processing cost on Atmel (via simulation of GCC program)
  – Loop, 6 × 20 / sec
    • Average: 2223 instr. × 2
    • CalcPolar: 19017 instr.
    • → 2.83 × 10^6 instructions
  – Report gesture once per second
    • FindGestureError: 5444 instr.
    • 10 gestures, 6 accelerometers → 5444 × 60 = 3.26 × 10^5 instr.
• Memory operations are 2 cycles/instruction
• Total cycles ~ 3.7M → 4 MHz → 13.5 mW
• Communication = 8 bits/sec → negligible cost
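The instruction budget above can be tallied in a few lines. The sketch uses only the per-routine counts from the slide; the function names are hypothetical, and the straightforward product for the loop comes out at about 2.82 × 10^6, marginally below the slide's rounded figure.

```python
LOOPS_PER_SEC = 6 * 20            # 6 accelerometers, 20 samples/sec each
AVERAGE_INSTR = 2223              # run twice per loop (X and Y)
CALC_POLAR_INSTR = 19017
FIND_GESTURE_ERROR_INSTR = 5444
GESTURES = 10
ACCELEROMETERS = 6


def loop_instructions_per_sec():
    """Instructions spent on averaging and the polar transform."""
    per_loop = 2 * AVERAGE_INSTR + CALC_POLAR_INSTR
    return LOOPS_PER_SEC * per_loop


def gesture_instructions_per_sec():
    """Instructions spent on template matching, reported once per second."""
    return FIND_GESTURE_ERROR_INSTR * GESTURES * ACCELEROMETERS


def total_instructions_per_sec():
    return loop_instructions_per_sec() + gesture_instructions_per_sec()


if __name__ == "__main__":
    print(loop_instructions_per_sec())
    print(gesture_instructions_per_sec())
    print(total_instructions_per_sec())
```

The total is about 3.1 million instructions per second; the slide's estimate of roughly 3.7M cycles is higher because memory operations take two cycles each.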

Communication vs. Computation 2

• Cost of communication to Host PC (measured)
  – 4317 nJ/bit
    • From Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.”
  – 4317 nJ/bit × 480 bytes/sec × 8 = 16.57 mW
• Processor still sucks power
  – Current implementation requires 13.5 mW
  – Using sleep, only 1.17 mW → 17.74 mW total
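The measured-cost arithmetic above can be checked with a short sketch. The energy per bit, data rate, and sleep-mode processor power are taken from the slide; the function name is hypothetical.

```python
ENERGY_PER_BIT_NJ = 4317      # measured cost per transmitted bit
BYTES_PER_SEC = 480           # raw sensor data rate to the host PC
PROCESSOR_SLEEP_MW = 1.17     # processor power when sleep mode is used


def link_power_mw(bits_per_sec, nj_per_bit=ENERGY_PER_BIT_NJ):
    """Average communication power: nJ/s is nW, and 1 nW = 1e-6 mW."""
    return bits_per_sec * nj_per_bit * 1e-6


if __name__ == "__main__":
    comm_mw = link_power_mw(BYTES_PER_SEC * 8)
    print(comm_mw)                        # communication alone
    print(comm_mw + PROCESSOR_SLEEP_MW)   # communication plus sleeping CPU
```

The results agree with the slide's 16.57 mW and 17.74 mW to within rounding in the second decimal place.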

Distributed vs. Centralized

• Move some processing to each sensor
  – 6 processors
    • Each computing average, polar transform
    • Transmitting 4 × 8 = 32 bits once/second
• Using Atmel processor on each mote
  – Computation
    • ~ .5M cycles/sec → 2 mA @ 2.7 V → 5.4 mW
  – Communication
    • Very small: 4317 nJ × 32 = .13 mW
  – 5.53 mW/mote → 33.2 mW total (Bad Idea!)
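A small sketch of the per-mote budget shows why distributing the Atmel processors loses: computation dominates, and it is paid six times. All inputs are the slide's figures; `mote_power_mw` is a hypothetical helper name.

```python
ENERGY_PER_BIT_NJ = 4317
MOTES = 6


def mote_power_mw(current_ma, voltage_v, bits_per_sec):
    """Power of one mote: processor draw plus radio cost of its reports."""
    compute_mw = current_ma * voltage_v                 # mA * V = mW
    comm_mw = bits_per_sec * ENERGY_PER_BIT_NJ * 1e-6   # nW -> mW
    return compute_mw + comm_mw


if __name__ == "__main__":
    per_mote = mote_power_mw(current_ma=2.0, voltage_v=2.7, bits_per_sec=32)
    print(per_mote)           # one mote
    print(per_mote * MOTES)   # all six motes
```

Communication contributes only about 0.14 mW of each mote's roughly 5.5 mW, so cutting the data rate further would not rescue this configuration.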

TI Microcontroller Evaluation

• A microcontroller with better specs
  – MSP430P112
    • 330 µA/MHz active mode
    • 1.5 µA standby (6 ns wakeup)
• Used IAR Systems compiler, profiler, development environment
• Analysis
  – Centralized 3.3 V, 4 MHz: 3.8 mW
  – Distributed 2.5 V, 1 MHz: 0.48 mW per mote
    • Six processors → 2.9 mW

TI DSP Evaluation

• TMS320C54x
• Used TI Code Composer Studio, compiler, simulator
• Power
  – Active Mode, 3.3 V, 10 MHz: 33 mW
  – IDLE1, 0.36 mW
• Analysis
  – Centralized: 7.8 mW
  – Distributed: 1.6 mW per mote
    • Six processors = 9.6 mW total

TI DSP Evaluation Part 2

• TMS320C55x (two parallel MACs)
• Same tools, with C55x compiler, simulator
• Power: No details available...
  – Advertised: 0.9 V, 0.05 mW/MHz
• Analysis
  – Centralized: 1170240 cycles (vs 2290440 54x)
    • 2 MHz: 0.1 mW
  – Distributed: 195040 cycles (vs 381740 54x)
    • 1 MHz: 0.05 mW
    • Six processors: 0.3 mW total
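The slides do not say how the DSP power figures were derived. One simple model that happens to reproduce them is sketched below: for the C54x, the processor is assumed active for the fraction of each second needed to execute the profiled cycles at 10 MHz and in IDLE1 for the rest; for the C55x, power is assumed to be the advertised 0.05 mW/MHz times the chosen clock. Both models and the function names are assumptions.

```python
def duty_cycled_power_mw(cycles_per_sec, clock_hz, active_mw, idle_mw):
    """Average power when active for cycles/clock of the time, idle otherwise."""
    duty = cycles_per_sec / clock_hz
    return duty * active_mw + (1.0 - duty) * idle_mw


def clock_scaled_power_mw(clock_mhz, mw_per_mhz=0.05):
    """Power as advertised mW/MHz times the clock needed for the workload."""
    return clock_mhz * mw_per_mhz


if __name__ == "__main__":
    # C54x: 33 mW active at 10 MHz, 0.36 mW in IDLE1 (figures from the slides)
    print(duty_cycled_power_mw(2290440, 10e6, 33.0, 0.36))  # centralized
    print(duty_cycled_power_mw(381740, 10e6, 33.0, 0.36))   # distributed, per mote
    # C55x: 0.05 mW/MHz advertised
    print(clock_scaled_power_mw(2))   # centralized at 2 MHz
    print(clock_scaled_power_mw(1))   # distributed at 1 MHz, per mote
```

Rounded to one decimal place, the C54x model gives 7.8 mW centralized and 1.6 mW per mote, which agrees with the figures reported on the C54x slide.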

Other Explorations

• Hand optimized code
  – Possible to massively reduce computation cost
  – FP/Transcendentals conspicuously painful
  – Outside scope of our exploration
• Radio Hardware
  – Bluetooth ~ 100 times more efficient
• Reconfigurable Computing
• Other circuitry (e.g. accelerometers)

Results Summary

• Cost, in mW, of various implementations

          PC          Centralized   Distributed
Atmel     17.74/28    13.5          33.2
TI        -           3.8           2.9
DSP 1     -           7.8           9.6
DSP 2     -           0.1           0.3

(Atmel/PC: 17.74 using sleep mode, 28 without)

• 31/104 % improvement with same hardware
• 170x improvement with new hardware

Conclusions

• By finding better mappings between SW, HW, and application, big performance gains are possible.

• Effective use of local processor resources can reduce communication overheads, which are significant.

• DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.
