Challenges in High-Performance Embedded Designs FINAL
8/8/2019 Challenges in High-Performance Embedded Designs FINAL
Alec Bath, Application Engineer, STMicroelectronics
Markus Mayr, Product Marketing Engineer, STMicroelectronics
-
1D Barcode Scanner
Our current 1D Barcode Scanner is great - but:
We need to reduce cost to stay competitive
We need to add new features to react to customer requests: USB, RTOS, etc.
-
Original 1D Barcode Scanner
Block diagram: ARM7TDMI MCU (50 MHz, 128 KB Flash, 16 KB RAM)
CCD scan sensor and external fast ADC (12-bit)
Host interface: UART / PS2 (GPIO)
User interface: GPIO
-
Von-Neumann Bottleneck
In the ARM7TDMI's von Neumann architecture, instruction fetches and data accesses share one bus, so the CPU cannot fetch code while it moves data.
Block diagram: ARM7TDMI (master) and DMA (master) on AHB1 and AHB2; Flash (up to 128 KB), SRAM and EMI; slow peripherals and user interface; fast peripherals (USB 1.1) and the external fast ADC (12-bit), which raises frequent interrupts (IRQ!!).
-
1D Scanner design then and now
Block diagram: Cortex-M3 (master 1; 72 MHz, 128 KB Flash, 20 KB RAM) with I-bus, D-bus and System bus; GP-DMA (master 2); bus matrix with arbiter; Flash interface and SRAM slaves; AHB-to-APB1 and AHB-to-APB2 bridges.
APB2: GPIOA-E, AFIO, USART1, SPI1, ADC1/2, TIM1, EXTI
APB1: USART2/3, SPI2, I2C1/2, TIM2/3/4, IWDG, WWDG, USB, CAN, BKP, PWR
User interface; USB 1.1
-
Innovative System Architecture
Harvard architecture + BusMatrix allows concurrent Flash execution and DMA transfer
Advanced Peripherals to further offload the CPU
Low-latency deterministic interrupt controller in the Cortex-M3 core
75% lower power at the same clock speed as ARM7
30% smaller code via the Thumb-2 instruction set
Block diagram: the same bus architecture as on the previous slide (Cortex-M3 master 1 at 72 MHz with 128 KB Flash and 20 KB RAM, GP-DMA master 2, bus matrix with arbiter, Flash and SRAM slaves, AHB-to-APB1/APB2 bridges and their peripherals).
-
Cortex-M3 Harvard Architecture
Instructions fetched over I-bus
Block diagram: Cortex-M3 (master 1) with I-bus, D-bus and System bus, and the GP-DMA, connected through the multi-layer bus matrix/arbiter to the Flash interface, SRAM and AHB/APBx bridge slaves; APBx peripherals: USART, SPI, I2C, ADC, TIM.
-
Cortex-M3 Harvard Architecture
Instructions fetched over I-bus
While literals fetched on D-bus
Block diagram: same as the previous slide.
-
Cortex-M3 Harvard Architecture
Instructions fetched over I-bus
While constants fetched on D-bus
While Core reads peripheral
Block diagram: same as the previous slide.
-
DMA & Cortex-M3 Data Flow
Instructions fetched over I-bus
While literals fetched on D-bus
While Core reads peripheral
While DMA reads SRAM!
Block diagram: same as the previous slide.
-
The Cortex-M3 MCU Core
High performance with low dynamic power: Harvard architecture; 30% performance improvement over ARM7TDMI; single-cycle multiply; hardware divide; atomic bit manipulation
Best code density: Thumb-2 brings 32-bit performance with 16-bit code density
Deterministic: interrupt controller inside the core; 12-cycle push / 12-cycle pop; just 6-cycle latency for tail-chained interrupts
Improved debug features: Serial Wire Viewer adds real-time data trace; ETM on select part numbers; 2 data watchpoints, 6 hardware breakpoints
-
What's Thumb-2?
Thumb-2 is a new ARM instruction set that mixes 16-bit and 32-bit instructions
Backward compatible with the previous 16-bit Thumb instruction set
12 new instructions, including several DSP-type instructions
Memory footprint similar to Thumb, performance similar to ARM!
No more interworking between ARM and Thumb modes!
-
Compact Code and Data Memory
Cortex-M3 supports unaligned data accesses to improve constant-data and RAM utilization
Figure (structure management example): the same structure of long (32-bit), int (16-bit) and char (8-bit) fields laid out on a 32-bit machine that does not support unaligned data (data aligned, leaving unused, wasted space) and on Cortex-M3 (data packed, leaving free space for the rest of the application).
Reduces SRAM memory requirements by over 25%
Less memory = lower-cost devices!
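A minimal sketch of the structure-packing idea, assuming a GCC or Clang toolchain (the `packed` attribute is a compiler extension, not standard C) and a hypothetical record type. With natural alignment the fields occupy 16 bytes; packed, they occupy 9. On Cortex-M3, single LDR/STR/LDRH/STRH accesses to the unaligned fields are handled in hardware, although they may take extra bus cycles; LDM/STM and LDRD/STRD still require aligned addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record. Natural layout: the compiler pads each field to an
 * address that is a multiple of its size, as a core without unaligned
 * access support requires. Offsets assume a typical ABI where uint32_t is
 * 4-byte aligned and uint16_t is 2-byte aligned. */
struct sample_natural {
    uint8_t  flags;   /* offset 0, followed by 3 padding bytes */
    uint32_t counter; /* offset 4 */
    uint8_t  id;      /* offset 8, followed by 1 padding byte */
    uint16_t value;   /* offset 10 */
    uint8_t  status;  /* offset 12, followed by 3 bytes of tail padding */
};                    /* sizeof == 16 */

/* Packed layout (GCC/Clang extension): no padding at all, so 'counter' and
 * 'value' sit at unaligned offsets. */
struct __attribute__((packed)) sample_packed {
    uint8_t  flags;   /* offset 0 */
    uint32_t counter; /* offset 1 */
    uint8_t  id;      /* offset 5 */
    uint16_t value;   /* offset 6 */
    uint8_t  status;  /* offset 8 */
};                    /* sizeof == 9 */
```

In this example packing saves 7 of 16 bytes per record; the real saving depends on how the application's structures mix field sizes.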
-
Atomic Bit Manipulation via Bit Banding
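The slide names bit banding but not the mapping itself. On Cortex-M3 parts that implement it (the STM32F1 and F2 families do), each bit in the first 1 MB of SRAM (0x20000000) and of the peripheral region (0x40000000) has its own 32-bit alias word, located at alias_base + byte_offset * 32 + bit * 4, with alias bases 0x22000000 and 0x42000000. Writing 1 or 0 to the alias word sets or clears that one bit in a single bus write, so there is no software read-modify-write sequence for an interrupt to break. The helper below is a hypothetical name; it only computes the address, so it also runs on a host.

```c
#include <stdint.h>

/* Bit-band regions defined by the Cortex-M3 memory map. */
#define SRAM_BB_BASE     0x20000000u  /* 1 MB SRAM bit-band region */
#define SRAM_BB_ALIAS    0x22000000u  /* 32 MB alias for that region */
#define PERIPH_BB_BASE   0x40000000u  /* 1 MB peripheral bit-band region */
#define PERIPH_BB_ALIAS  0x42000000u  /* 32 MB alias for that region */
#define BB_REGION_SIZE   0x00100000u

/* Returns the alias word address for bit 'bit' of the byte at 'addr'
 * (bit 0..7 for a byte address, or 0..31 for a word-aligned address),
 * or 0 if 'addr' lies outside both bit-band regions. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t base;
    uint32_t alias;

    if (addr >= SRAM_BB_BASE && addr < SRAM_BB_BASE + BB_REGION_SIZE) {
        base = SRAM_BB_BASE;
        alias = SRAM_BB_ALIAS;
    } else if (addr >= PERIPH_BB_BASE && addr < PERIPH_BB_BASE + BB_REGION_SIZE) {
        base = PERIPH_BB_BASE;
        alias = PERIPH_BB_ALIAS;
    } else {
        return 0u;
    }
    /* Each byte maps to 8 alias words (32 bytes); each bit to one word. */
    return alias + (addr - base) * 32u + bit * 4u;
}
```

On the target, `*(volatile uint32_t *)bitband_alias((uint32_t)&flags, 3) = 1u;` would set bit 3 of a hypothetical SRAM variable `flags` in one store.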
-
NVIC Interrupt Handling
Interrupts are handled in hardware! There's no instruction overhead.
12-cycle Entry:
Processor state is automatically saved to the stack over the data bus: {PC, xPSR, R0-R3, R12, LR}.
In parallel, the ISR is prefetched on the instruction bus, so it is ready to start executing as soon as the stack PUSH completes.
12-cycle Exit:
Processor state is automatically restored from the stack.
In parallel, the interrupted instruction is prefetched, ready for execution upon completion of the stack POP.
Stack POP can be interrupted, allowing a new ISR to be executed immediately without the overhead of state saving.
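Because the core itself pushes the eight words listed above, an ISR can be an ordinary C function: the registers that the ARM procedure call standard allows a function to clobber (R0-R3, R12, LR) are already saved. A sketch of that frame as a C struct, ordered from the lowest address (where the stack pointer points after stacking) upward. The helper `frame_return_address` is hypothetical and would typically be used in a fault handler that is passed the stacked SP. The core may also insert one padding word above the frame to keep the stack 8-byte aligned; the struct does not model that word.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame pushed by Cortex-M3 hardware on entry,
 * lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address: where the interrupted code resumes */
    uint32_t xpsr;  /* program status at the moment of the interrupt */
} exception_frame_t;

/* Hypothetical helper: address at which the interrupted code will resume. */
static uint32_t frame_return_address(const exception_frame_t *frame)
{
    return frame->pc;
}
```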
-
Fast Interrupt Response and Tail Chaining
Here we have 2 simultaneous interrupts, with IRQ1 having the higher priority. When IRQ1 is finished, IRQ2 is serviced.
ARM7 (interrupt handling in assembler code): PUSH (26 cycles), ISR 1, POP (16 cycles), PUSH (26 cycles), ISR 2, POP (16 cycles); 42 cycles elapse between the end of ISR 1 and the start of ISR 2.
Cortex-M3 (interrupt handling in hardware): PUSH (12 cycles), ISR 1, tail-chaining (6 cycles), ISR 2, POP (12 cycles).
In Cortex-M3, ISR 2 starts after only a 6-cycle delay: it has been tail-chained.
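The cycle counts above can be turned into a small model of the total entry/exit overhead for a burst of n back-to-back interrupts. This sketch reuses only the slide's figures and ignores ISR bodies, Flash wait states and late-arriving interrupts; the function names are hypothetical.

```c
/* Per-interrupt overheads in CPU cycles, taken from the slide. */
#define ARM7_PUSH_CYCLES 26u  /* software context save in assembler */
#define ARM7_POP_CYCLES  16u  /* software context restore */
#define CM3_PUSH_CYCLES  12u  /* hardware stacking on entry */
#define CM3_POP_CYCLES   12u  /* hardware unstacking on exit */
#define CM3_TAIL_CYCLES   6u  /* tail-chain from one ISR to the next */

/* ARM7: every interrupt in the burst pays a full PUSH and POP. */
static unsigned arm7_overhead(unsigned n)
{
    return n * (ARM7_PUSH_CYCLES + ARM7_POP_CYCLES);
}

/* Cortex-M3: one PUSH for the first ISR, one tail-chain for each following
 * ISR, and a single POP when the burst ends. */
static unsigned cm3_overhead(unsigned n)
{
    if (n == 0u) {
        return 0u;
    }
    return CM3_PUSH_CYCLES + (n - 1u) * CM3_TAIL_CYCLES + CM3_POP_CYCLES;
}
```

For the two interrupts on the slide, this gives 84 cycles of PUSH/POP overhead on the ARM7 versus 30 on Cortex-M3, and each further interrupt in the burst widens the gap by 36 cycles (42 versus 6).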
-
System Timer (SysTick)
Flexible system timer is part of the Cortex-M3 Core
24-bit self-reloading down counter with end-of-count interrupt
2 configurable clock sources
Suitable for Real Time OS or other scheduled tasks
Can only be accessed when executing in privileged mode
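Because the counter is 24 bits wide, the reload value limits the longest tick period: with a 72 MHz SysTick clock it is 2^24 / 72 MHz, about 233 ms. A sketch of the reload calculation; `systick_reload` is a hypothetical helper. CMSIS provides `SysTick_Config(ticks)`, which loads `ticks - 1` into the reload register and returns a non-zero value when that does not fit in 24 bits.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0xFFFFFFu  /* 24-bit reload register */

/* Reload value for a periodic SysTick interrupt at tick_hz, given the
 * SysTick input clock. The counter runs reload, reload-1, ..., 0, so one
 * period lasts reload + 1 clock cycles. Integer division truncates, so the
 * period is slightly short when clock_hz is not a multiple of tick_hz.
 * Returns 0 when the arguments are invalid or the period needs more than
 * 24 bits. */
static uint32_t systick_reload(uint32_t clock_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || clock_hz <= tick_hz) {
        return 0u;
    }
    uint32_t ticks = clock_hz / tick_hz;  /* clock cycles per period */
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return 0u;                        /* period too long for 24 bits */
    }
    return ticks - 1u;
}
```

For a 1 ms RTOS tick at 72 MHz, the reload value is 71999; a 1 s period at the same clock does not fit and would need a slower SysTick clock source.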
-
STM32 On-Chip Flash Memory Interface
Mission: support 72 MHz operation directly from Flash memory
64-bit-wide Flash with prefetch (two 64-bit buffers)
Block diagram: the Cortex-M3 CPU reaches the Flash interface over the Instruction bus and the Data/Debug bus; an arbiter in the Flash interface reads the Flash memory array 64 bits at a time. Each 64-bit line holds a mix of 16-bit and 32-bit Thumb-2 instructions; data accesses can be 32-bit, 16-bit or 8-bit.
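The Flash array cannot be read in a single cycle at 72 MHz, so the interface inserts wait states, and the wide line plus prefetch buffers hide most of them: a 64-bit line holds four 16-bit Thumb-2 instructions, so with 16-bit code one line read every 3 cycles (two wait states) delivers more instructions than the core executes in that time. The thresholds in this sketch are the STM32F1 values for the FLASH_ACR LATENCY field as documented in the STM32F1 reference manual (RM0008); check the manual for your part. The function name is hypothetical.

```c
#include <stdint.h>

/* Flash wait states required by STM32F1 for a given SYSCLK:
 *   0 < SYSCLK <= 24 MHz : 0 wait states
 *   24 < SYSCLK <= 48 MHz: 1 wait state
 *   48 < SYSCLK <= 72 MHz: 2 wait states
 * Returns -1 for a frequency outside the supported range. */
static int stm32f1_flash_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz == 0u || sysclk_hz > 72000000u) {
        return -1;
    }
    if (sysclk_hz <= 24000000u) {
        return 0;
    }
    if (sysclk_hz <= 48000000u) {
        return 1;
    }
    return 2;
}
```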
-
Low Power Modes
Low-power modes (STM32L): current consumption and wake-up time
Mode                  | Current consumption | Wake-up time
RUN (from Flash)      | 230 µA/MHz          | -
RUN (from RAM)        | 185 µA/MHz          | -
LP RUN                | 11 µA               | -
LP SLEEP              | 6 µA                | 0.35 µs
STOP with full RTC    | 1.3 µA              | 8 µs
STOP without RTC      | 0.43 µA             | 57 µs
STANDBY with full RTC | 1 µA                | 8 µs
STANDBY without RTC   | 0.27 µA             | 57 µs
For each mode, the slide also indicates which of the CPU, peripherals, high-speed and medium-speed oscillators, RTC calendar, LSI and RAM remain on.
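The table's figures make it easy to estimate the average current of a duty-cycled application. A sketch, assuming a hypothetical device that wakes from STOP with full RTC once per second and runs from Flash at 4 MHz for 1 ms; it assumes current scales linearly with frequency at 230 µA/MHz and ignores the 8 µs wake-up time. The function name is hypothetical.

```c
/* Figures from the STM32L table on this slide. */
#define RUN_FLASH_UA_PER_MHZ 230.0  /* RUN from Flash */
#define STOP_RTC_UA            1.3  /* STOP with full RTC */

/* Average current in µA for a cycle of run_ms in RUN at run_mhz followed by
 * STOP for the rest of period_ms. Returns -1.0 for invalid arguments. */
static double average_current_ua(double run_mhz, double run_ms, double period_ms)
{
    if (period_ms <= 0.0 || run_ms < 0.0 || run_ms > period_ms) {
        return -1.0;
    }
    double run_ua  = RUN_FLASH_UA_PER_MHZ * run_mhz;  /* assumes linear scaling */
    double stop_ms = period_ms - run_ms;
    /* Time-weighted average of the two phases. */
    return (run_ua * run_ms + STOP_RTC_UA * stop_ms) / period_ms;
}
```

For these numbers the result is about 2.2 µA: the 999 ms in STOP (1.3 µA) contribute more to the average than the 1 ms of running at 920 µA.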
-
MCU Platform for Rapid Innovation
Powerful core
Fully compatible product portfolio
Complete ecosystem
Product lines: Connectivity line, Performance line, Access line, Value line, USB Access line, Low-power line, Low-power line with LCD
-
2D Barcode Scanner
Easy and fast product transition thanks to the scalable architecture and compatible product family
-
Innovative System Architecture
Block diagram: Cortex-M3 at 120 MHz with MPU (master 1; I-bus, D-bus, S-bus); dual-port DMA1 and DMA2 (FIFO, 8 streams each); High-Speed USB 2.0 and Ethernet 10/100 bus masters, each with its own FIFO/DMA; multi-AHB bus matrix; ART Accelerator on the I-Code and D-Code paths to Flash (up to 1 Mbyte); SRAM1 (112 KB), SRAM2 (16 KB) and FSMC; AHB1 fast peripherals and GPIOs; dual-port AHB1-APB1 and AHB1-APB2 bridges to the slow peripherals; AHB2: DCMI, crypto, USB Full Speed.
-
2D Barcode Scanner
Block diagram: High-Speed USB device (HID class) to the USB connector; user interface; USART and USART connector; camera interface; GPIO; 802.15.4 radio; DAC / I2S; image processing (DSP calculations).
-
Audio Docking Station
Block diagram: Cortex-M3 CPU with Flash, SRAM, DMA, FSMC, PLL block and XTAL oscillators (32 kHz + 3-25 MHz).
Audio input: microphone and pre-amplifier, 12-bit ADC.
Audio processing: MP3 decoder (2 instances), MP3 encoder, volume/channel mixer, loudness control, 5-band equalizer.
Audio output: audio DAC, amplifier, speakers/headset.
Interfaces: USB, I2S, SDIO, FSMC.
Audio media: USB mass storage device (USB key) via FS USB host *, MS class * and file system *.
HMI control & display: touchscreen QVGA LCD.
-
Architecture: DMA & Multi-Bus Matrix
Block diagram: five bus masters on the multi-AHB bus matrix: Cortex-M3 at 120 MHz with MPU (master 1, with I-Code and D-Code paths through the ART Accelerator to Flash, up to 1 Mbyte), dual-port DMA1 (master 2), dual-port DMA2 (master 3), High-Speed USB 2.0 (master 4) and Ethernet 10/100 (master 5). Slaves: SRAM1 (112 KB), SRAM2 (16 KB), FSMC, AHB1 fast peripherals and GPIOs, AHB2 (DCMI, crypto, USB Full Speed), and dual-port AHB1-APB1/AHB1-APB2 bridges to the slow peripherals. A callout refers to a path for peripheral accesses that bypasses the bus matrix.
-
ART Accelerator™: the bottom line!
Chart: performance (MIPS, 0-160) vs. core frequency (MHz, 0-150) for STM32F200, MCU A and MCU B.
STM32F200 performance is almost linear with frequency.
Impact of wait states: imperfect accelerator, slow Flash.
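The flattening curves can be illustrated with a deliberately simple model, not ST's measurement method: assume one instruction per cycle, plus a stall of `wait_states` cycles on every instruction fetch that misses in the Flash accelerator. The function name, wait-state count and miss ratios are hypothetical inputs.

```c
/* Rough throughput model in MIPS.
 *   miss_ratio = 1.0: no effective accelerator, every fetch stalls
 *   miss_ratio = 0.0: ideal accelerator, Flash behaves like
 *                     zero-wait-state memory */
static double effective_mips(double core_mhz, unsigned wait_states, double miss_ratio)
{
    /* Average cycles per instruction = 1 + expected stall per fetch. */
    double cpi = 1.0 + miss_ratio * (double)wait_states;
    return core_mhz / cpi;
}
```

With 3 wait states at 120 MHz, the model drops to 30 MIPS with no effective accelerator, gives about 92 MIPS at a 10% miss ratio, and keeps the full 120 MIPS at a zero miss ratio, which is the near-linear behavior the chart shows for STM32F200.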
-
What is CoreMark?
Simple, yet sophisticated
Comprehensive documentation and run rules
Free, but not cheap
Open C source code, downloadable from the EEMBC website
Dhrystone Terminator
The benefits of Dhrystone without all the shortcomings
Free, small, easily portable
CoreMark does real work
-
Exposing Dhrystone Weaknesses
Major portions of Dhrystone are susceptible to a compiler's ability to optimize the work away - NOT CoreMark
Library calls are made within the timed portion and dominate the time consumed by the benchmark - NOT CoreMark
Completely synthetic and does not mimic any behavior that can be expected in a real application - NOT CoreMark
No official source code, resulting in different and often undisclosed versions (1.1, 2.0, 2.1) - NOT CoreMark
Very vague and ambiguous run guidelines are not universally known and are not enforced - NOT CoreMark
Results are reported in inconsistent metrics (DMIPS, Dhrystones per second, DMIPS/MHz) - NOT CoreMark
-
CoreMark Workload Features
Matrix manipulation allows the use of MAC and common math ops
State machine operation represents data dependent branches
Cyclic Redundancy Check (CRC) is a very common embedded function
Testing for:
A processor's basic pipeline structure
Basic read/write operations
Integer operations
Control operations
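CRC is a good example of the small, data-dependent integer and control work listed above: shifts, XORs and one branch per bit. A bitwise CRC-16 sketch for illustration only; it uses the reflected 0xA001 polynomial (CRC-16/ARC) as an assumption and is not CoreMark's own source. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/ARC: reflected polynomial 0xA001 (0x8005 bit-reversed),
 * initial value 0, no final XOR. */
static uint16_t crc16_arc(const uint8_t *data, size_t len)
{
    uint16_t crc = 0u;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                        /* integer operation */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {                    /* data-dependent branch */
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            } else {
                crc >>= 1;
            }
        }
    }
    return crc;
}
```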
-
Summary: The Value of CoreMark
Simple to use, yet sufficiently sophisticated for benchmarking a processor core
Freely available, limited usage restrictions
Provides an industry-standard tool that allows users to begin embedded processor analysis
Introduces all processor users to the overall value of EEMBC
-
EEMBC CoreMark 1.0 - Summary
Chart: CoreMark score (iterations/s, 0-250) vs. core frequency (MHz).
STM32F2xx: 190.30 at 100 MHz; 228.6 at 120 MHz
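Dividing each score by its clock frequency gives CoreMark/MHz, a common way to compare results measured at different clock speeds. A minimal sketch using the two STM32F2xx data points from this slide; the function name is hypothetical.

```c
/* CoreMark/MHz: score normalized by core clock frequency. */
static double coremark_per_mhz(double iterations_per_sec, double core_mhz)
{
    return iterations_per_sec / core_mhz;
}
```

Both points come out at about 1.90 CoreMark/MHz (228.6 / 120 ≈ 1.905, 190.30 / 100 ≈ 1.903), so the score scales almost linearly from 100 to 120 MHz.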
-
Let's review!
Not all Cortex-M3 microcontrollers are the same!
Things to consider in your next high-performance embedded design:
An innovative multi-layer bus architecture
Highly optimized Flash memory controller
The efficient Cortex-M3 core
A cutting-edge 90 nm process technology, enabling 120 MHz performance with very low power consumption