Design Exploration of a Human-machine Interface (HMI) Application
Francis Li
Sam Madden
The Application
• Data glove interface
  – Wired, bulky
• SmartDust scenario
  – A mote on each fingertip
• Investigate implementations
• Explore design alternatives
Proof-of-Concept Prototype
• By SmartDust group
  – Atmel AVR Microprocessor
  – RFM TR1000 Radio
  – 6 accelerometers
  – Host PC performs processing
• Analysis
  – Power: 45 mW measured
  – Continuous operation of processor, accelerometers, communication with host
Application Analysis
• Processing (on PC)
  – 20 times per second, for each accelerometer:
    • Read in X and Y samples (10 bits each)
    • Compute a rolling average to smooth the input data
    • Convert the averages to polar coordinates
  – Dominant cost: sqrt, acos, atan
  – Secondary cost: floating-point operations
  – Periodically, calculate the gesture via simple template matching (static hand positions)
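The per-sample pipeline above (rolling average, polar conversion, template matching) can be sketched as follows. This is a minimal illustration, not the prototype's code: the window size `WINDOW`, the zero-g offset `ZERO_G`, and all class and function names are hypothetical, and the sketch uses `atan2` where the slides list sqrt, acos and atan without giving the exact formulas.

```python
from __future__ import annotations

import math
from collections import deque

# Hypothetical parameters: the slides give neither the averaging window
# nor the zero-g offset of the 10-bit accelerometer readings.
WINDOW = 4
ZERO_G = 512


class AxisSmoother:
    """Rolling average over the last WINDOW raw samples of one axis."""

    def __init__(self, window: int = WINDOW) -> None:
        self.samples: deque[int] = deque(maxlen=window)

    def update(self, raw: int) -> float:
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(x_avg: float, y_avg: float) -> tuple[float, float]:
    """Convert smoothed X/Y readings to (magnitude, angle in radians)."""
    x, y = x_avg - ZERO_G, y_avg - ZERO_G
    return math.hypot(x, y), math.atan2(y, x)


def angle_error(angles: list[float], template: list[float]) -> float:
    """Sum of squared angle differences, each wrapped into [-pi, pi]."""
    return sum(math.remainder(a - t, 2 * math.pi) ** 2
               for a, t in zip(angles, template))


def match_gesture(angles: list[float], templates: dict[str, list[float]]) -> str:
    """Static template matching: return the closest template's name."""
    return min(templates, key=lambda name: angle_error(angles, templates[name]))
```

For example, feeding one axis the samples 600, 610, 620, 630 yields a smoothed value of 615.0, and `to_polar(515, 516)` gives a magnitude of 5.0. Wrapping the angle difference keeps readings near +π and -π from being scored as far apart.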
Application Analysis (cont)
• Communication (from Atmel to PC)
  – 20 samples/sec × 6 accelerometers × 4 bytes/sample = 480 bytes/sec
  – 115.6 kb/sec RF link
  – Radio = 12 mA @ 3 V (36 mW) when transmitting
  – → 1.2 mW for the radio alone (3840 bits/sec is about 3.3% of the link's capacity)
• Real-world power >> 1.2 mW, due to software and analog overhead (real-world analysis later)
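The 1.2 mW radio figure follows from scaling the 36 mW transmit power by the fraction of time the radio is on air. A minimal sketch of that arithmetic, assuming (as the slide implicitly does) that the radio draws nothing between transmissions; the constant and function names are illustrative.

```python
# Figures from the slide. Assumption: the radio draws no power while idle,
# which is why real-world power ends up well above this estimate.
SAMPLES_PER_SEC = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BPS = 115_600        # 115.6 kb/sec RF link
TX_CURRENT_A = 12e-3      # 12 mA while transmitting
SUPPLY_V = 3.0


def radio_power_mw() -> tuple[int, float, float]:
    """Return (bytes/sec, transmit duty cycle, average radio power in mW)."""
    bytes_per_sec = SAMPLES_PER_SEC * ACCELEROMETERS * BYTES_PER_SAMPLE
    duty = bytes_per_sec * 8 / LINK_BPS           # fraction of time on air
    tx_power_mw = TX_CURRENT_A * SUPPLY_V * 1e3   # 36 mW while transmitting
    return bytes_per_sec, duty, tx_power_mw * duty


bytes_per_sec, duty, avg_mw = radio_power_mw()
print(f"{bytes_per_sec} B/s, duty {duty:.1%}, radio {avg_mw:.2f} mW")
```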
Optimization Process
• Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed Point
• Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP
Communication vs. Computation
• Estimates of local processing cost on Atmel (via simulation of a GCC-compiled program)
• Loop: 6 accelerometers × 20 samples/sec
  – Average: 2223 instr. × 2 (X and Y)
  – CalcPolar: 19017 instr.
  – → 2.83×10^6 instructions/sec
• Report gesture once per second
  – FindGestureError: 5444 instr.
  – 10 gestures, 6 accelerometers: 5444 × 60 → 3.26×10^5 instr.
• Memory operations take 2 cycles/instruction
• Total cycles ≈ 3.7M/sec → 4 MHz → 13.5 mW
• Communication = 8 bits/sec → negligible cost
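The instruction budget on this slide can be re-derived from the listed per-routine counts. The sketch below reproduces the 2.83×10^6 and 3.26×10^5 figures to within rounding (the sample loop comes to 2,815,560 instructions). The memory-operation share it backs out of the ~3.7M cycle total is a derived estimate under the stated 2-cycle assumption, not a number from the slides.

```python
# Instruction counts are the slide's GCC/simulator estimates.
AVERAGE_INSTR = 2223       # per axis (X or Y)
CALC_POLAR_INSTR = 19017   # per accelerometer sample
FIND_GESTURE_INSTR = 5444  # per (gesture, accelerometer) pair
ACCELEROMETERS = 6
SAMPLES_PER_SEC = 20
GESTURES = 10
REPORTED_CYCLES = 3.7e6    # slide's total, including 2-cycle memory ops

loops_per_sec = ACCELEROMETERS * SAMPLES_PER_SEC  # 120 iterations/sec
sample_instr = (2 * AVERAGE_INSTR + CALC_POLAR_INSTR) * loops_per_sec
gesture_instr = FIND_GESTURE_INSTR * GESTURES * ACCELEROMETERS  # once per second
total_instr = sample_instr + gesture_instr

# Derived estimate: fraction of instructions that must be memory operations
# (each costing one extra cycle) for the total to reach ~3.7M cycles.
memory_fraction = REPORTED_CYCLES / total_instr - 1
```

The total stays under 4×10^6 cycles per second, which is why a 4 MHz clock suffices.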
Communication vs. Computation 2
• Cost of communication to Host PC (measured)
  – 4317 nJ/bit, from Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.”
  – 4317 nJ/bit × 480 bytes/sec × 8 bits/byte = 16.57 mW
• Processor power is still significant
  – Current implementation requires 13.5 mW
  – Using sleep, only 1.17 mW → 17.74 mW total (16.57 + 1.17)
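The comparison between streaming raw samples and computing locally reduces to energy per bit times bit rate. A minimal sketch using the slides' 4317 nJ/bit, 1.17 mW sleeping-processor and 13.5 mW local-processing figures; the function and variable names are illustrative.

```python
ENERGY_PER_BIT_J = 4317e-9  # measured cost per transmitted bit


def comm_power_mw(bits_per_sec: float) -> float:
    """Average communication power for a given bit rate, in mW."""
    return ENERGY_PER_BIT_J * bits_per_sec * 1e3


# Option 1: stream raw samples to the host, processor sleeps in between.
raw_stream_mw = comm_power_mw(480 * 8)   # about 16.58 mW
baseline_mw = raw_stream_mw + 1.17       # about 17.75 mW

# Option 2: process locally at 13.5 mW, report one byte per second.
local_mw = 13.5 + comm_power_mw(8)       # communication adds about 0.03 mW
```

The 8 bits/sec gesture report costs roughly 0.03 mW, which is what "negligible" means on the previous slide.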
Distributed vs. Centralized
• Move some processing to each sensor
  – 6 processors
  – Each computing average and polar transform
  – Transmitting 4 × 8 = 32 bits once per second
• Using an Atmel processor on each mote
  – Computation: ~0.5M cycles/sec → 2 mA @ 2.7 V = 5.4 mW
  – Communication: very small: 4317 nJ/bit × 32 bits = 0.13 mW
  – 5.53 mW/mote × 6 = 33.2 mW total, vs. 13.5 mW centralized (Bad Idea!)
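The distributed estimate multiplies a per-mote cost by six, so the fixed 5.4 mW compute floor of each Atmel dominates. A minimal sketch of the arithmetic with the slide's figures; the names are illustrative.

```python
ENERGY_PER_BIT_J = 4317e-9  # measured cost per transmitted bit
MOTES = 6
CENTRALIZED_MW = 13.5       # single Atmel doing all the processing

compute_mw = 2e-3 * 2.7 * 1e3             # 2 mA @ 2.7 V = 5.4 mW per mote
comm_mw = ENERGY_PER_BIT_J * 32 * 1e3     # 32 bits once per second
per_mote_mw = compute_mw + comm_mw        # about 5.54 mW
total_mw = per_mote_mw * MOTES            # about 33.2 mW
```

Cutting communication to 32 bits/sec does not help here: six processors each paying the active-mode current outweigh the savings.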
TI Microcontroller Evaluation
• A microcontroller with better specs
  – MSP430P112: 330 µA/MHz in active mode, 1.5 µA standby (6 µs wakeup)
• Used IAR Systems compiler, profiler, development environment
• Analysis
  – Centralized, 3.3 V, 4 MHz: 3.8 mW
  – Distributed, 2.5 V, 1 MHz: 0.48 mW per mote
    • Six processors → 2.9 mW
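The slide does not say how the 3.8 mW and 0.48 mW figures were derived. A simple linear model (active current proportional to clock, standby otherwise) shows both sit below the always-active bound, about 4.36 mW and 0.83 mW, and backs out the active fraction each would imply. The model, its treatment of 330 µA/MHz as voltage-independent, and the helper names are assumptions of this sketch.

```python
# Figures from the slide; the linear current model is an assumption.
ACTIVE_UA_PER_MHZ = 330.0
STANDBY_UA = 1.5


def msp430_power_mw(volts: float, mhz: float, duty: float) -> float:
    """Average power when active a fraction `duty` of the time, else standby."""
    active_ma = ACTIVE_UA_PER_MHZ * mhz / 1000
    standby_ma = STANDBY_UA / 1000
    return volts * (duty * active_ma + (1 - duty) * standby_ma)


def implied_duty(target_mw: float, volts: float, mhz: float) -> float:
    """Active fraction at which the model reaches `target_mw`."""
    p_active = msp430_power_mw(volts, mhz, 1.0)
    p_standby = msp430_power_mw(volts, mhz, 0.0)
    return (target_mw - p_standby) / (p_active - p_standby)


centralized_max = msp430_power_mw(3.3, 4, 1.0)  # always active at 4 MHz
distributed_max = msp430_power_mw(2.5, 1, 1.0)  # always active at 1 MHz
```

Under this model the distributed figure corresponds to each mote being active a little under 60% of the time.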
TI DSP Evaluation
• TMS320C54x
• Used TI Code Composer Studio, compiler, simulator
• Power
  – Active Mode, 3.3 V, 10 MHz: 33 mW
  – IDLE1: 0.36 mW
• Analysis
  – Centralized: 7.8 mW
  – Distributed: 1.6 mW per mote
    • Six processors = 9.6 mW total
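A duty-cycle model reproduces the C54x figures, assuming the C54x cycle counts quoted in the C55x comparison (2,290,440 centralized, 381,740 distributed) are per second and the part runs at 10 MHz, idling in IDLE1 otherwise. The slides do not state this derivation; it is an inference that happens to match.

```python
ACTIVE_MW = 33.0   # active mode, 3.3 V, 10 MHz
IDLE_MW = 0.36     # IDLE1
CLOCK_HZ = 10e6


def c54x_power_mw(cycles_per_sec: float) -> float:
    """Average power: active for the needed cycles, IDLE1 the rest of the time."""
    duty = cycles_per_sec / CLOCK_HZ
    return duty * ACTIVE_MW + (1 - duty) * IDLE_MW


centralized = c54x_power_mw(2_290_440)  # about 7.84 mW
distributed = c54x_power_mw(381_740)    # about 1.61 mW per mote
```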
TI DSP Evaluation Part 2
• TMS320C55x (two parallel MACs)
• Same tools, with C55x compiler and simulator
• Power: No details available...
  – Advertised: 0.9 V, 0.05 mW/MHz
• Analysis
  – Centralized: 1,170,240 cycles (vs. 2,290,440 on the C54x) → 2 MHz: 0.1 mW
  – Distributed: 195,040 cycles (vs. 381,740 on the C54x) → 1 MHz: 0.05 mW
    • Six processors: 0.3 mW total
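With only an advertised mW/MHz figure available, the C55x estimate amounts to picking a clock that covers the cycle count and scaling. The sketch assumes, consistent with the 2 MHz and 1 MHz choices on the slide, that the clock is the smallest whole number of MHz fitting the workload; the function name is illustrative.

```python
import math

MW_PER_MHZ = 0.05  # advertised figure at 0.9 V


def c55x_power_mw(cycles_per_sec: int) -> float:
    """Run at the lowest whole-MHz clock that fits the workload."""
    mhz = math.ceil(cycles_per_sec / 1e6)
    return mhz * MW_PER_MHZ


centralized = c55x_power_mw(1_170_240)       # 2 MHz
distributed = 6 * c55x_power_mw(195_040)     # six motes at 1 MHz each
speedup = 2_290_440 / 1_170_240              # cycle reduction vs. the C54x
```

The dual-MAC C55x needs roughly half the C54x cycle count, and the lower voltage does the rest.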
Other Explorations
• Hand-optimized code
  – Possible to massively reduce computation cost
  – FP/transcendentals conspicuously painful
  – Outside the scope of our exploration
• Radio hardware
  – Bluetooth ~100 times more efficient
• Reconfigurable computing
• Other circuitry (e.g., accelerometers)
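The slides list "Floating point → Fixed Point" as an optimization and note that FP and transcendentals are painful, without showing code. A minimal sketch of two integer-only building blocks a processor without an FPU might use: a shift-based rolling average and a digit-by-digit integer square root for the magnitude. This is an illustration under those assumptions, not the authors' implementation; the names are hypothetical.

```python
from collections import deque


def isqrt_u32(n: int) -> int:
    """Integer square root of an unsigned 32-bit value using only
    shifts, adds and compares (digit-by-digit method)."""
    root, bit = 0, 1 << 30
    while bit > n:
        bit >>= 2
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


class FixedPointAverage:
    """Rolling average of integer samples. The window is a power of two,
    so the division is a right shift. The window starts filled with zeros,
    so the first few outputs are biased low while it warms up."""

    def __init__(self, log2_window: int = 2) -> None:
        self.shift = log2_window
        size = 1 << log2_window
        self.window = deque([0] * size, maxlen=size)
        self.total = 0

    def update(self, sample: int) -> int:
        self.total += sample - self.window[0]  # drop oldest, add newest
        self.window.append(sample)             # deque evicts the oldest
        return self.total >> self.shift
```

With 10-bit X and Y readings, x² + y² stays below 2^21, well inside the 32-bit range `isqrt_u32` handles.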
Results Summary
• Cost, in mW, of various implementations

                   PC (host)    Centralized   Distributed
  Atmel            17.74/28*    13.5          33.2
  TI MSP430        -            3.8           2.9
  DSP 1 (C54x)     -            7.8           9.6
  DSP 2 (C55x)     -            0.1           0.3

  * 17.74 using sleep mode, 28 without

• 31/104 % improvement with same hardware
• 170x improvement with new hardware
Conclusions
• By finding better mappings between software, hardware, and the application, big efficiency gains are possible.
• Effective use of local processor resources can reduce communication overheads, which are significant.
• DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.