THE EMBEDDED COMPUTING PLATFORM
The CPU Bus
The bus is the mechanism by which the CPU communicates with memory and devices.
A bus is, at a minimum, a collection of wires, but the bus also defines a protocol by which the CPU, memory and devices communicate.
One of the major roles of the bus is to provide an interface to memory.
Bus Protocols
The bus protocol determines how devices communicate.
Devices on the bus go through sequences of states. Protocols are specified by state machines, one state machine per actor in the protocol. A protocol may also contain asynchronous logic behavior.
Four-cycle handshake
1. Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data.
2. When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point, devices 1 and 2 can transmit or receive.
3. Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data.
4. After seeing that ack has been released, device 1 lowers its output.
Four-cycle handshake
Figure: enq (driven by device 1) and ack (driven by device 2) plotted against steps 1 to 4.
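The four steps can be sketched as two small state machines, one per device. This is a simulation under assumptions: the wires are plain variables, each device reacts to the wire values sampled at the start of a step (as registered logic would), and device 1 places its data on the lines when it raises enq. The type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { D1_IDLE, D1_WAIT_ACK, D1_WAIT_RELEASE, D1_DONE } dev1_state;
typedef enum { D2_IDLE, D2_ACKED, D2_DONE } dev2_state;

typedef struct {
    bool enq, ack;      /* handshake wires: enq driven by device 1, ack by device 2 */
    int data;           /* shared data lines */
    dev1_state s1;
    dev2_state s2;
    int received;       /* value latched by device 2 */
} handshake;

/* Advance both state machines by one step. */
void handshake_step(handshake *h, int value_to_send)
{
    bool enq = h->enq, ack = h->ack;            /* values sampled at start of step */

    switch (h->s1) {                            /* device 1 */
    case D1_IDLE:                               /* step 1: raise enq */
        h->data = value_to_send;
        h->enq = true;
        h->s1 = D1_WAIT_ACK;
        break;
    case D1_WAIT_ACK:
        if (ack) h->s1 = D1_WAIT_RELEASE;
        break;
    case D1_WAIT_RELEASE:                       /* step 4: ack released, drop enq */
        if (!ack) { h->enq = false; h->s1 = D1_DONE; }
        break;
    case D1_DONE:
        break;
    }

    switch (h->s2) {                            /* device 2 */
    case D2_IDLE:                               /* step 2: ready, raise ack */
        if (enq) { h->ack = true; h->s2 = D2_ACKED; }
        break;
    case D2_ACKED:                              /* step 3: take data, drop ack */
        h->received = h->data;
        h->ack = false;
        h->s2 = D2_DONE;
        break;
    case D2_DONE:
        break;
    }
}
```

Calling handshake_step() four times from a zeroed handshake leaves (enq, ack) at (1,0), (1,1), (1,0), (0,0), matching steps 1 to 4.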
Microprocessor busses
Clock provides synchronization to the bus components.
R/W’ is true when the bus is reading and false when the bus is writing.
Address is an a-bit bundle of signals that transmits the address for an access.
Data is an n-bit bundle of signals that can carry data to or from the CPU.
Data ready’ signals when the values on the data bundle are valid.
A typical microprocessor bus
Timing diagrams
A timing diagram shows how the signals on a bus vary over time. Since values like the address and data can take on many values, some standard notation is used to describe signals.
A signal can be at 0 or 1, or it can be shown as stable (holding some valid value) or changing (in transition between values).
To be sure that signals go to their proper values at the proper time, timing diagrams sometimes show timing constraints.
Timing diagram for the example bus
Timing diagram shown with timing constraints for the example bus.
The diagram shows a read and a write. Timing constraints are shown only for the read operation, but similar constraints apply to the write operation. The bus is normally in read mode, since that does not change any state. During a read, the external device or memory is sending a value on the data lines, while during a write the CPU is controlling the data lines.
Bus read and write
Read Operation on timing diagram
A read or write is initiated by setting address enable high after the clock starts to rise. We set R/W’=1 to indicate a read and the address lines are set to the desired address.
One clock cycle later, the memory or device is expected to assert the data value at that address on the data lines. Simultaneously, the external device specifies that the data are valid by pulling down the data ready’ line. This line is active low, meaning that a logically true value is indicated by a low voltage, in order to provide increased immunity to electrical noise.
The CPU is free to remove the address at the end of the clock cycle and must do so before the beginning of the next cycle. The external device has a similar requirement for removing the data value from the data lines.
Bus wait state
Burst read
The handshake that tells the CPU and devices when data are to be transferred is formed by data ready for the acknowledge side, but is implicit for the inquiry side.
The data ready signal allows the bus to be connected to devices that are slower than the bus.
The cycles between the minimum time at which data can be asserted and the time at which they are actually asserted are known as wait states.
In this burst read transaction, the CPU sends one address but receives a sequence of data values.
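A rough cycle count makes the cost of wait states and the benefit of burst reads concrete. The sketch below assumes the bus expects data one clock after the address (as in the read described earlier) and a simplified cost model per transaction; the function names and the model are illustrative, not taken from a specific bus.

```c
/* Wait states needed when a device's access time exceeds the one-cycle
 * slot the bus allows: ceil(access / clock) - 1, never negative. */
unsigned wait_states(unsigned access_ns, unsigned clock_ns)
{
    unsigned cycles = (access_ns + clock_ns - 1) / clock_ns;  /* ceil(access/clock) */
    return cycles > 1 ? cycles - 1 : 0;
}

/* Simplified model: a transaction costs one address cycle, then its wait
 * states, then one cycle per data word. */
unsigned single_read_cycles(unsigned n_words, unsigned ws)
{
    return n_words * (1 + ws + 1);      /* a full transaction per word */
}

unsigned burst_read_cycles(unsigned n_words, unsigned ws)
{
    return 1 + ws + n_words;            /* one address, wait once, then stream */
}
```

For example, a device with a 70 ns access time on a 25 ns bus clock needs 2 wait states; reading 4 words then costs 16 cycles as single reads but 7 cycles as one burst under this model.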
Bus burst read
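State diagrams for bus read
Figure: CPU and device state machines for a bus read, each beginning in a start state. CPU states include Adrs, Wait, See ack, Get data, and Done; device states include Adrs, Wait, Ack, Send data, and Release ack.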
State diagram
The state machine view of the bus transaction is a helpful complement to the timing diagram.
It shows the transitions of the control signals. When the CPU decides to perform a read transaction, it moves to a new state, sending bus signals that cause the device to behave appropriately.
The device’s state transition graph captures its side of the protocol.
Bus multiplexing
Figure: the CPU and the device share one set of address/data lines; the adrs enable and data enable signals indicate which kind of value is on the lines.
Some buses use multiplexed address and data. Additional control lines are provided to tell
whether the value on the address/data lines is an address or data.
Typically, the address comes first followed by the data.
The address can be held in a register until the data arrive so that both can be presented to the device at the same time.
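A minimal model of a multiplexed bus write, assuming 16 shared address/data lines and a device with a 256-word internal memory (both hypothetical). The device's address latch is the register described above: it captures the address phase so that address and data can be used together in the data phase.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t ad_lines;      /* shared address/data lines */
    bool adrs_enable;       /* lines currently hold an address */
    bool data_enable;       /* lines currently hold data */
} mux_bus;

typedef struct {
    uint16_t adrs_latch;    /* holds the address until the data arrive */
    uint16_t mem[256];
} mux_device;

/* Device reacts to one bus phase: latch the address, or use it with the data. */
void device_clock(mux_device *d, const mux_bus *b)
{
    if (b->adrs_enable)
        d->adrs_latch = b->ad_lines;
    else if (b->data_enable)
        d->mem[d->adrs_latch & 0xFF] = b->ad_lines;   /* address and data together */
}

/* CPU side of a write: address phase first, then data phase on the same lines. */
void cpu_write(mux_bus *b, mux_device *d, uint16_t adrs, uint16_t value)
{
    b->ad_lines = adrs;  b->adrs_enable = true;  b->data_enable = false;
    device_clock(d, b);
    b->ad_lines = value; b->adrs_enable = false; b->data_enable = true;
    device_clock(d, b);
    b->data_enable = false;
}
```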
DMA
Direct memory access (DMA) performs data transfers without executing instructions. The CPU sets up the transfer; the DMA engine then fetches and writes the data. The DMA controller is a separate unit.
Bus mastership
By default, the CPU is the bus master and initiates transfers.
The DMA controller must become bus master to perform its work; the CPU can’t use the bus while the DMA operates.
The bus mastership protocol uses two signals: bus request and bus grant.
DMA operation
The CPU sets the DMA registers for the start address and length. A DMA status register controls the unit. Once the DMA controller is bus master, it transfers data automatically. It may run continuously until the transfer is complete, or it may use only every nth bus cycle.
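The DMA sequence above can be simulated as follows. The register set (source, destination, length, busy flag) and the every-nth-cycle policy are assumptions for illustration, not a particular controller's programming model.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *src;     /* start address */
    uint8_t *dst;
    size_t length;          /* bytes remaining */
    int busy;               /* status: 1 while a transfer is in progress */
} dma_regs;

/* CPU side: program the registers and start the unit. */
void dma_start(dma_regs *r, const uint8_t *src, uint8_t *dst, size_t len)
{
    r->src = src;
    r->dst = dst;
    r->length = len;
    r->busy = (len > 0);
}

/* One bus cycle. every_n == 1 keeps the bus continuously; every_n > 1 takes
 * only every nth cycle. Returns 1 if the DMA used this cycle, 0 if the CPU had it. */
int dma_bus_cycle(dma_regs *r, unsigned long cycle, unsigned every_n)
{
    if (!r->busy || cycle % every_n != 0)
        return 0;                       /* CPU is bus master this cycle */
    *r->dst++ = *r->src++;              /* DMA is bus master: fetch and write one byte */
    if (--r->length == 0)
        r->busy = 0;                    /* transfer complete */
    return 1;
}
```

With every_n = 4, an 8-byte transfer uses bus cycles 0, 4, ..., 28 and leaves the others to the CPU; with every_n = 1 the DMA holds the bus until it finishes.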
Bus transfer sequence diagram
System bus configurations
Multiple busses allow parallelism: slow devices on one bus, fast devices on a separate bus. A bridge connects two busses.
Figure: the CPU, memory, and a high-speed device on one bus, connected through a bridge to a bus carrying slow devices.
Bridge state diagram
ARM AMBA bus
AMBA comes in two varieties: AHB (Advanced High-performance Bus) is high-performance, and APB (Advanced Peripheral Bus) is lower-speed and lower cost. AHB supports pipelining, burst transfers, split transactions, and multiple bus masters. All devices are slaves on APB.
Memory Devices
Several different types of memory:
Read-only memories: flash.
Read/write memories: DRAM, SRAM.
Each type of memory comes in varying capacities and widths.
Memory Device Organization
A 4-Mbit memory may be organized as a 1M x 4-bit array, where a single memory access obtains a 4-bit data item and there are at most 2^20 different addresses, or as a 4M x 1-bit array, where a single access obtains a 1-bit data item and there are at most 2^22 different addresses. The height/width ratio of a memory is known as its aspect ratio.
The data are stored in a 2-D array of memory cells. An n-bit address (n = r + c) is divided into an r-bit row address and a c-bit column address.
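The organization arithmetic can be checked in a few lines. address_bits() computes how many address bits a given depth needs, and split_address() divides an n-bit address into row and column fields; taking the row from the high-order bits is an assumption, since the split is a design choice.

```c
/* Number of address bits needed for a memory `words` deep: ceil(log2(words)). */
unsigned address_bits(unsigned long words)
{
    unsigned bits = 0;
    while ((1UL << bits) < words)
        bits++;
    return bits;
}

/* Split an address into an r-bit row (high bits) and a c-bit column (low bits). */
void split_address(unsigned long adrs, unsigned c,
                   unsigned long *row, unsigned long *col)
{
    *row = adrs >> c;
    *col = adrs & ((1UL << c) - 1);
}
```

address_bits(1UL << 20) gives 20 for the 1M x 4 organization and address_bits(4UL << 20) gives 22 for 4M x 1. With c = 10, address 0x12345 splits into row 0x48 and column 0x345.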
Internal Organization of a Memory Devices
Random-access memory
Dynamic RAM is dense but requires refresh. Synchronous DRAM (SDRAM) is the dominant type; it uses a clock to improve performance and to pipeline memory accesses. Static RAM is faster and less dense, and consumes more power.
Static RAM and its operation
CE is the chip enable input. It is active low. When CE=1 the SRAM’s data pins are disabled, and when CE=0, the data pins are enabled.
R/W controls whether the current operation is a read (R/W=1) or a write (R/W=0). Read and write are normally specified relative to the CPU, so read means reading from RAM and write means writing to RAM.
Adrs specifies the address for the read or write. Data is a bidirectional bundle of signals for data transfer.
When R/W=1, the data pins are outputs, and when R/W=0, the data pins are inputs.
Interface timing diagram
A read operation on the SRAM occurs as follows:
CE is set to 0, enabling the chip, with R/W=1. An address is presented on the address lines. After some delay, data appear on the data lines. A write operation is similar: CE is set to 0, R/W is set to 0 for writing, an address is set on the address lines, and data are set on the data lines.
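A behavioural model of the SRAM interface, assuming a hypothetical 1K x 8 part. It mirrors the pin rules above: CE (active low) gates everything, R/W=1 reads, and R/W=0 writes; the boolean return value stands in for whether the chip is driving the data pins.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_WORDS 1024   /* illustrative capacity */

typedef struct { uint8_t cells[SRAM_WORDS]; } sram;

/* Returns true and fills *data_out on a read. Returns false when the chip
 * is disabled (pins tristated) or on a write (pins are inputs). */
bool sram_access(sram *m, bool ce_n, bool rw, uint16_t adrs,
                 uint8_t data_in, uint8_t *data_out)
{
    if (ce_n)                          /* CE = 1: data pins disabled */
        return false;
    adrs %= SRAM_WORDS;
    if (rw) {                          /* R/W = 1: read, pins are outputs */
        *data_out = m->cells[adrs];
        return true;
    }
    m->cells[adrs] = data_in;          /* R/W = 0: write, pins are inputs */
    return false;
}
```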
DRAM interface timing diagram
Timing diagram for read
First, RAS is set to 0 and the row part of the address is placed on the address lines.
Next, CAS is set to 0 and the column part of the address is placed on the address lines.
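The two-step RAS/CAS addressing can be modelled as below, assuming a hypothetical 256 x 256 array so that the row and column are each 8 bits of a 16-bit address. Strobe timing is ignored; only the order of operations is shown.

```c
#include <stdint.h>

#define ROW_BITS 8
#define COL_BITS 8

typedef struct {
    uint8_t cells[1 << ROW_BITS][1 << COL_BITS];
    uint16_t row_latch;     /* captured when RAS falls */
} dram;

/* RAS goes low: the DRAM latches the row from the multiplexed address pins. */
static void ras_fall(dram *d, uint16_t pins)
{
    d->row_latch = pins & ((1u << ROW_BITS) - 1);
}

/* CAS goes low: the column on the same pins selects a cell in the latched row. */
static uint8_t cas_fall_read(const dram *d, uint16_t pins)
{
    return d->cells[d->row_latch][pins & ((1u << COL_BITS) - 1)];
}

uint8_t dram_read(dram *d, uint16_t adrs)
{
    ras_fall(d, (uint16_t)(adrs >> COL_BITS));   /* row part first */
    return cas_fall_read(d, adrs);               /* then column part (low bits) */
}
```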
Read-only memory
ROM may be programmed at the factory. Flash is the dominant form of field-programmable ROM. It is electrically erasable but must be erased in blocks. It is random access, but writing and erasing are much slower than reading. NOR flash is more flexible; NAND flash is denser.
Flash memory
Flash is non-volatile memory and can be programmed in-circuit.
Reads are random access. To write:
erase a block to all 1s, then write the required bits to 0.
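The erase/write rule means a write can only clear bits. A sketch with a deliberately tiny, hypothetical 16-byte block:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* tiny illustrative block; real blocks are far larger */

typedef struct { uint8_t block[BLOCK_SIZE]; } flash_block;

/* Erasing sets every bit in the block to 1. */
void flash_erase(flash_block *f)
{
    memset(f->block, 0xFF, BLOCK_SIZE);
}

/* Programming can only change bits from 1 to 0, so the stored value becomes
 * old AND new. Returns 0 if the byte now holds `value`, -1 if an erase
 * would have been needed first. */
int flash_program(flash_block *f, unsigned offset, uint8_t value)
{
    f->block[offset] &= value;
    return f->block[offset] == value ? 0 : -1;
}
```

Programming 0xA5 into an erased byte succeeds, but programming 0x5A over it then fails and leaves 0x00, because bits already at 0 cannot return to 1 without erasing the whole block.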
Flash writing
Writing is much slower than reading: 1.6 ms write versus 70 ns read.
Blocks are large (approx. 1 Mb). Writing causes wear that eventually destroys the device; modern lifetime is approx. 1 million writes.
Types of flash
NOR: Word-accessible read. Erase by blocks.
NAND: Read by pages (512-4K bytes). Erase by blocks.
NAND is cheaper and has faster erase and sequential access times.
I/O Devices
• I/O devices are commonly used in embedded computing systems.
• Some devices are often found as on-chip devices in microcontrollers.
• Other devices are interfaced externally.
• We need to understand the requirements of device interfacing and its uses in programming.
Timers and counters
Timers and counters are very similar: a timer is incremented by a periodic signal, while a counter is incremented by an asynchronous, occasional signal. Rollover causes an interrupt.
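A 16-bit counter model showing rollover raising an interrupt flag. Whether tick() is driven by a periodic clock (timer) or by external events (counter) is the only difference. The reload register is an assumption (many timer peripherals have one, but the text does not specify it); loading 65536 - N makes the interrupt fire every N ticks.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t count;
    uint16_t reload;        /* value loaded after rollover (assumed feature) */
    bool irq_pending;       /* simulated interrupt request */
} counter16;

void counter_tick(counter16 *c)
{
    if (c->count == 0xFFFF) {          /* rollover */
        c->count = c->reload;
        c->irq_pending = true;
    } else {
        c->count++;
    }
}
```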
Watchdog timer
The watchdog timer is periodically reset by the system timer.
If the watchdog is not reset, it generates an interrupt to reset the host.
Figure: the host CPU and the watchdog timer, with the watchdog's interrupt driving the host's reset.
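The watchdog interaction can be sketched as a countdown that the host must reload ("kick") before it reaches zero. The timeout value and function names are illustrative; real watchdogs are configured through device-specific registers.

```c
#include <stdbool.h>

typedef struct {
    unsigned timeout;       /* ticks allowed between kicks */
    unsigned remaining;
    bool reset_asserted;    /* drives the host's reset */
} watchdog;

/* Healthy software calls this periodically. */
void wdt_kick(watchdog *w)
{
    w->remaining = w->timeout;
}

/* Called on every watchdog clock tick. */
void wdt_tick(watchdog *w)
{
    if (w->reset_asserted)
        return;
    if (w->remaining > 0)
        w->remaining--;
    if (w->remaining == 0)
        w->reset_asserted = true;       /* host failed to reset it in time */
}
```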
Digital-to-analog conversion
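Use a resistor tree: bits bn, bn-1, bn-2, and bn-3 drive resistors R, 2R, 4R, and 8R respectively, and the resulting currents are summed to produce Vout.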
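The resistor tree's output for a 4-bit code can be computed directly: each set bit contributes a current inversely proportional to its resistor, so bn (on R) carries the most weight. The reference voltage and the summing-amplifier feedback resistor rf are assumptions, and the sign inversion of a real inverting summer is ignored.

```c
/* Ideal 4-bit binary-weighted DAC: bit 3 (MSB) drives R, bit 2 drives 2R,
 * bit 1 drives 4R, bit 0 drives 8R; the currents are summed and converted
 * to a voltage through the feedback resistor rf. */
double dac4_vout(unsigned code, double vref, double r, double rf)
{
    double current = 0.0;
    for (int bit = 3; bit >= 0; bit--) {
        double resistor = r * (double)(1u << (3 - bit));   /* R, 2R, 4R, 8R */
        if (code & (1u << bit))
            current += vref / resistor;
    }
    return current * rf;
}
```

With vref = 5 V, R = 8 kohm, and rf = R/2 = 4 kohm, the output is vref * code / 16: code 8 gives 2.5 V and code 15 gives 4.6875 V.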
Flash A/D conversion
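An N-bit result requires 2^N - 1 comparators (on the order of 2^N), whose outputs feed an encoder.
Figure: Vin is compared against a ladder of reference voltages; the comparator outputs drive the encoder.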
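The comparator bank and encoder can be modelled as follows. The ladder taps at vref * k / 2^N and the counting encoder are a standard textbook arrangement assumed for this sketch.

```c
/* Flash A/D sketch: 2^N - 1 comparators compare Vin with ladder taps,
 * producing a thermometer code (every comparator below Vin outputs 1).
 * The encoder converts it to binary by counting the comparators that fired. */
unsigned flash_adc(double vin, double vref, unsigned n_bits)
{
    unsigned comparators = (1u << n_bits) - 1;
    unsigned code = 0;
    for (unsigned k = 1; k <= comparators; k++) {
        double threshold = vref * k / (double)(1u << n_bits);   /* ladder tap k */
        if (vin >= threshold)          /* comparator k fires */
            code++;                    /* encoder: count the 1s */
    }
    return code;
}
```

With vref = 8 V and N = 3, an input of 3.5 V trips comparators 1 to 3 and yields code 3.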
Dual-slope conversion
Use a counter to time the charging and discharging of a capacitor.
Charging, then discharging, eliminates non-linearities.
Figure: Vin feeds the charge/discharge circuit, and a timer measures the time.
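A discrete-time sketch of dual-slope conversion in arbitrary units. Phase 1 integrates Vin for a fixed count; phase 2 discharges with a known reference and counts until the integrator returns to zero. Because the same integrator is used in both phases, the rc constant cancels, which illustrates why the method tolerates component variation.

```c
/* Returns the phase-2 count, approximately fixed_counts * vin / vref. */
unsigned dual_slope_counts(double vin, double vref, unsigned fixed_counts, double rc)
{
    double v = 0.0;
    for (unsigned i = 0; i < fixed_counts; i++)   /* phase 1: charge from Vin */
        v += vin / rc;

    unsigned counts = 0;
    while (v > 0.0) {                             /* phase 2: discharge with Vref */
        v -= vref / rc;
        counts++;
    }
    return counts;
}
```

dual_slope_counts(0.5, 1.0, 1000, 1.0) and dual_slope_counts(0.5, 1.0, 1000, 4.0) both return 500: changing rc does not change the result.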
Sample-and-hold
Samples data: a sample-and-hold stage captures Vin and holds it for the converter.
Switch debouncing
A switch must be debounced to eliminate the multiple contacts caused by mechanical bouncing.
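One common software debouncing technique (a swapped-in example; the text does not prescribe a method) is to accept a new switch state only after it has been stable for several consecutive samples. The sample rate and STABLE_SAMPLES = 5 are assumptions.

```c
#include <stdbool.h>

#define STABLE_SAMPLES 5

typedef struct {
    bool state;         /* debounced output */
    unsigned count;     /* consecutive samples that disagree with state */
} debouncer;

/* Call once per sample period with the raw switch reading. */
bool debounce(debouncer *d, bool raw)
{
    if (raw == d->state) {
        d->count = 0;                  /* bounce back: start over */
    } else if (++d->count >= STABLE_SAMPLES) {
        d->state = raw;                /* stable long enough: accept it */
        d->count = 0;
    }
    return d->state;
}
```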
Encoded keyboard
An array of switches is read by an encoder. N-key rollover remembers multiple key depressions.
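A switch-matrix scan that keeps one bit per key, which is one simple way to get N-key rollover. The 4 x 4 matrix size is an assumption, and row_cols[] stands in for the column bits that hardware would return as each row is driven.

```c
#include <stdint.h>

#define ROWS 4
#define COLS 4

/* Bit (r * COLS + c) is set when the key at row r, column c is down, so
 * any number of simultaneous presses is remembered. */
uint16_t scan_key_matrix(const uint8_t row_cols[ROWS])
{
    uint16_t pressed = 0;
    for (unsigned row = 0; row < ROWS; row++) {
        unsigned cols = row_cols[row] & ((1u << COLS) - 1);
        pressed |= (uint16_t)(cols << (row * COLS));
    }
    return pressed;
}

/* Keys that are down now but were not down in the previous scan. */
uint16_t new_presses(uint16_t now, uint16_t before)
{
    return now & (uint16_t)~before;
}
```

Row readings {0x1, 0x0, 0x6, 0x8} give the bitmap 0x8601: all four keys held at once are recorded.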
LED
An LED must be used with a resistor to limit its current.
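The current-limiting resistor follows from Ohm's law: it drops whatever voltage the LED does not. The 5 V supply, 2 V forward voltage, and 10 mA current in the usage note are typical assumed values; in practice they come from the LED's datasheet.

```c
/* Series resistor for an LED: R = (Vsupply - Vforward) / I. */
double led_resistor_ohms(double v_supply, double v_forward, double current_a)
{
    return (v_supply - v_forward) / current_a;
}
```

led_resistor_ohms(5.0, 2.0, 0.010) gives 300 ohms.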
7-segment LCD display
May use parallel or multiplexed input.
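Each digit on a 7-segment display maps to a pattern of seven segments. The table below uses bit 0 = segment a through bit 6 = segment g and assumes a common-cathode (1 = on) display; a common-anode display needs the bits inverted. For multiplexed input, the driver lights one digit position at a time; segments_for_position() is a hypothetical helper for that scan.

```c
#include <stdint.h>

/* Segment patterns for digits 0-9 (bit 0 = a ... bit 6 = g). */
static const uint8_t seven_seg[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* Segment pattern for position `pos` of a 4-digit value; the caller cycles
 * pos fast enough that all digits appear lit at once. */
uint8_t segments_for_position(const uint8_t digits[4], unsigned pos)
{
    return seven_seg[digits[pos % 4] % 10];
}
```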
Types of high-resolution display
Liquid crystal display (LCD) is the dominant form; others include plasma and OLED.
A frame buffer holds the current display contents. It is written by the processor and read by the video hardware.
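A minimal frame buffer, assuming a hypothetical 128 x 64 monochrome display with one bit per pixel packed eight to a byte (most significant bit leftmost). The processor changes the display only by writing into this array; the video hardware is assumed to read it out independently.

```c
#include <stdint.h>

#define FB_WIDTH  128
#define FB_HEIGHT 64

static uint8_t framebuffer[FB_HEIGHT][FB_WIDTH / 8];

void fb_set_pixel(unsigned x, unsigned y, int on)
{
    if (x >= FB_WIDTH || y >= FB_HEIGHT)
        return;
    uint8_t mask = (uint8_t)(0x80u >> (x % 8));
    if (on)
        framebuffer[y][x / 8] |= mask;
    else
        framebuffer[y][x / 8] &= (uint8_t)~mask;
}

int fb_get_pixel(unsigned x, unsigned y)
{
    if (x >= FB_WIDTH || y >= FB_HEIGHT)
        return 0;
    return (framebuffer[y][x / 8] >> (7 - x % 8)) & 1;
}
```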
Touchscreen
A touchscreen includes both an input and an output device. The input device is a two-dimensional voltmeter.
Touchscreen position sensing
Figure: the voltage at the touch point is digitized by an ADC.
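Converting the digitized touch voltage into a screen coordinate, assuming a resistive panel, a 10-bit ADC, and no calibration (real panels need an offset and scale per axis). The same function serves both axes: drive one layer, read the other, then swap.

```c
#define ADC_MAX 1023u   /* 10-bit ADC, an assumption */

/* Position is proportional to the voltage picked up at the touch point. */
unsigned touch_to_pixel(unsigned adc_reading, unsigned screen_pixels)
{
    if (adc_reading > ADC_MAX)
        adc_reading = ADC_MAX;
    return (adc_reading * (screen_pixels - 1)) / ADC_MAX;
}
```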
Component Interfacing : Memory interfacing
Static RAM is simpler to interface to a bus than is DRAM, due to both the DRAM’s RAS/CAS multiplexing and the need for refresh.
The R/W on the bus can often be directly connected to the SRAM.
The main issue in interfacing SRAM is decoding the address. The chip enable pin is used in RAMs to simplify the interfacing of large memories. If the required number of memory words fits within the height of an available memory, then the interface is simple: the (active-low) CE signal is permanently wired to ground so that the chip is always enabled.
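When the required memory is taller than one chip, the high-order address bits are decoded into chip enables. The sketch below assumes a 32K-word region built from four hypothetical 8K-word SRAMs with active-low CE.

```c
#include <stdint.h>

#define CHIP_ADDR_BITS 13   /* 8K words per chip */
#define NUM_CHIPS 4

/* Returns the four CE' values packed into bits 0..3 (bit i = chip i's CE');
 * exactly one is 0. */
uint8_t decode_chip_enables(uint16_t adrs)
{
    unsigned chip = (adrs >> CHIP_ADDR_BITS) & (NUM_CHIPS - 1);
    return (uint8_t)(0x0F & ~(1u << chip));
}

/* The low 13 bits go to every chip's address pins. */
uint16_t chip_local_address(uint16_t adrs)
{
    return adrs & ((1u << CHIP_ADDR_BITS) - 1);
}
```

Address 0x2005 selects chip 1 (CE' pattern 0x0D) at local address 0x0005.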
DRAM interfacing
The bus address can be split into row and column addresses with a small amount of logic: a register captures the address, a multiplexer selects the row or column portion of the address, and a state machine generates RAS and CAS.
The refresh signal can be generated with a counter and a state machine as shown.
The counter times the wait between successive refresh actions, and the controller generates the required signals.
In the idle state, the bus signals are passed through to the DRAM to enable reads and writes.
When the counter rolls over, the controller generates CAS and then RAS to induce the next refresh cycle.
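The refresh controller described above can be sketched as a counter plus a small state machine. The interval and the one-step-per-clock state encoding are illustrative assumptions; real DRAMs specify the refresh interval and strobe timing in their datasheets.

```c
#include <stdbool.h>

typedef enum { REF_IDLE, REF_ASSERT_CAS, REF_ASSERT_RAS, REF_RELEASE } ref_state;

typedef struct {
    unsigned interval, counter;
    ref_state state;
    bool ras_n, cas_n;          /* active-low strobes driven to the DRAM */
    unsigned refreshes;
} refresh_ctrl;

/* One clock. Returns true when bus signals pass through to the DRAM. */
bool refresh_clock(refresh_ctrl *r)
{
    switch (r->state) {
    case REF_IDLE:
        if (++r->counter == r->interval) {     /* counter rolls over */
            r->counter = 0;
            r->state = REF_ASSERT_CAS;
            return false;
        }
        return true;                           /* normal reads and writes */
    case REF_ASSERT_CAS:                       /* CAS before RAS */
        r->cas_n = false;
        r->state = REF_ASSERT_RAS;
        return false;
    case REF_ASSERT_RAS:
        r->ras_n = false;
        r->state = REF_RELEASE;
        return false;
    case REF_RELEASE:
        r->ras_n = r->cas_n = true;
        r->refreshes++;
        r->state = REF_IDLE;
        return false;
    }
    return false;
}
```

With interval = 10, the bus gets 9 pass-through cycles; the controller then uses the rollover cycle plus three more to drive CAS low, then RAS low, then release both.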
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces.
But glue logic is required when a device is connected to a bus for which it was not designed.
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.
Some additional logic is required to cause the bus to read and write the device’s register.
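A sketch of the finer address decoding an I/O device needs, assuming a hypothetical device with four 8-bit registers at base address 0xFF40. Every upper address bit must match, and only the low two bits select a register; the read/write hooks play the role of the additional logic that lets the bus access the device's registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEV_BASE 0xFF40u
#define DEV_REGS 4u

typedef struct { uint8_t reg[DEV_REGS]; } io_device;

/* All 14 upper address bits must equal the base address. */
static bool dev_selected(uint16_t adrs)
{
    return (adrs & ~(DEV_REGS - 1u) & 0xFFFFu) == DEV_BASE;
}

/* Return false if the address does not belong to this device. */
bool dev_bus_write(io_device *d, uint16_t adrs, uint8_t value)
{
    if (!dev_selected(adrs))
        return false;
    d->reg[adrs & (DEV_REGS - 1u)] = value;
    return true;
}

bool dev_bus_read(const io_device *d, uint16_t adrs, uint8_t *value)
{
    if (!dev_selected(adrs))
        return false;
    *value = d->reg[adrs & (DEV_REGS - 1u)];
    return true;
}
```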
System architecture
• An architecture is a set of elements and the relationships between them that together form a single unit.
• The architecture of an embedded computing system is the blueprint for implementing that system.
• The architecture of an embedded computing system includes both hardware and software elements.
Hardware:
The hardware architecture of an embedded system is its more obvious manifestation, since you can touch and feel it.
CPU:
• There are many different architectures, and even within an architecture we can select between models that vary in clock speed, integrated peripherals, and so on.
• The choice of the CPU cannot be made without considering the software that will execute on the machine.
System architecture
Bus:
• In applications that make intensive use of the bus due to I/O or other data traffic, the bus may be more of a limiting factor than the CPU.
• Attention must be paid to the required data bandwidths to be sure that the bus can handle the traffic.
Memory:
• The ratio of ROM to RAM and the selection of DRAM versus SRAM can have a significant influence on the cost of the system.
• The speed of memory will play a large part in determining system performance.
I/O devices:
• Networking, sensors, actuators, etc. How big and fast must each one be?
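A quick bandwidth estimate helps with the bus check described above. The width, clock rate, cycles per transfer, and the 50% utilization budget in the usage note are assumed example numbers.

```c
#include <stdbool.h>

/* Peak bandwidth = bytes per transfer * transfers per second. */
double bus_bandwidth_bytes_per_s(unsigned width_bits, double clock_hz,
                                 unsigned cycles_per_transfer)
{
    return (width_bits / 8.0) * clock_hz / cycles_per_transfer;
}

/* True if the required traffic fits within a fraction of the peak. */
bool bus_can_handle(double required_bytes_per_s, double peak_bytes_per_s,
                    double max_utilization)
{
    return required_bytes_per_s <= max_utilization * peak_bytes_per_s;
}
```

A 32-bit bus at 33 MHz needing 2 cycles per transfer peaks at 66 MB/s; with a 50% budget it can carry 30 MB/s of traffic but not 40 MB/s.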
Software architecture
Functional description must be broken into pieces:
division among people; conceptual organization; performance; testability; maintenance.
Hardware and software architectures
Hardware and software are intimately related: software doesn’t run without hardware; how much hardware you need is
determined by the software requirements: speed; memory.
Evaluation boards
Evaluation boards are designed by the CPU manufacturer or others. They include a CPU, memory, and some I/O devices, and may include a prototyping section. CPU manufacturers often give out the evaluation board netlist, which can be used as a starting point for your custom board design.
Adding logic to a board
Programmable logic devices (PLDs) provide low/medium density logic.
Field-programmable gate arrays (FPGAs) provide more logic and multi-level logic.
Application-specific integrated circuits (ASICs) are manufactured for a single purpose.
The PC as a platform
Advantages: cheap and easy to get; rich and familiar software environment.
Disadvantages: requires a lot of hardware resources; not well-adapted to real-time.
Typical PC hardware platform
Figure components: CPU, CPU bus, memory, DMA controller, timers, interrupt controller, bus interfaces, a high-speed bus and a low-speed bus, and attached devices.
The CPU provides basic computational facilities. RAM is used for program storage, and ROM holds the boot program. A DMA controller provides DMA capabilities. Timers are used by the operating system for a variety of purposes. A high-speed bus, connected to the CPU bus through a bridge, allows fast devices to communicate efficiently with the rest of the system.
A low-speed bus provides an inexpensive way to connect simpler devices and may be necessary for backward compatibility as well.
Typical busses
PCI: standard for high-speed interfacing at 33 or 66 MHz; its successor is PCI Express.
USB (Universal Serial Bus), Firewire (IEEE 1394): relatively low-cost serial interface with high speed.
Software elements
IBM PC uses BIOS (Basic I/O System) to implement low-level functions: boot-up; minimal device drivers.
BIOS has become a generic term for the lowest-level system software.
Example: StrongARM
The StrongARM system includes a CPU chip (3.686 MHz clock) and a system control module (32.768 kHz clock).
The system control module provides a real-time clock, an operating system timer, general-purpose I/O, an interrupt controller, a power manager controller, and a reset controller.
Strong ARM SA-1100
Peripheral devices of system control module:
A real-time clock. An operating system timer. 28 general-purpose I/Os (GPIOs). An interrupt controller. A power manager controller. A reset controller that handles resetting the processor.
Debugging embedded systems
Challenges: target system may be hard to observe; target may be hard to control; may be hard to generate realistic inputs; setup sequence may be complex.
Host/target design
Use a host system to prepare software for the target system.
Figure: the host system connects to the target system over a serial line.
Host/target design
The host/target connection is used to:
• load the programs into the target,
• start and stop program execution on the target, and
• examine memory and CPU registers.
Host-based tools
Cross compiler: compiles code on host for target system.
Cross debugger: displays target state, allows target system to be
controlled.
Software debuggers
A monitor program residing on the target provides basic debugger functions.
Debugger should have a minimal footprint in memory.
The user program must be careful not to destroy the debugger program, but the debugger should be able to recover from some damage caused by user code.
Breakpoints
A breakpoint allows the user to stop execution, examine system state, and change state.
Replace the breakpointed instruction with a subroutine call to the monitor program.
ARM breakpoints
Uninstrumented code:
0x400  MUL r4,r6,r6
0x404  ADD r2,r2,r4
0x408  ADD r0,r0,#1
0x40c  B loop
Code with breakpoint:
0x400  MUL r4,r6,r6
0x404  ADD r2,r2,r4
0x408  ADD r0,r0,#1
0x40c  BL bkpoint
Breakpoint handler actions
Save registers. Allow the user to examine the machine. Before returning, restore the system state.
The safest way to execute the breakpointed instruction is to put it back and execute it in place.
Put another breakpoint after the restored instruction so that the original breakpoint can be reinstated.
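The breakpoint mechanics can be simulated over an instruction array. BKPT_OPCODE is a hypothetical placeholder for the "BL bkpoint" word (it is not a real ARM encoding), addresses are word indices rather than byte addresses, and the slot table is illustrative.

```c
#include <stdint.h>

#define MEM_WORDS   64
#define BKPT_OPCODE 0xDEADBEEFu     /* hypothetical placeholder */
#define MAX_BKPTS   4

typedef struct { unsigned addr; uint32_t saved; int active; } breakpoint;

typedef struct {
    uint32_t mem[MEM_WORDS];
    breakpoint bp[MAX_BKPTS];
} target;

/* Replace the instruction at addr with the breakpoint call, saving the original. */
int bkpt_set(target *t, unsigned addr)
{
    if (addr >= MEM_WORDS)
        return -1;
    for (int i = 0; i < MAX_BKPTS; i++) {
        if (!t->bp[i].active) {
            t->bp[i] = (breakpoint){ addr, t->mem[addr], 1 };
            t->mem[addr] = BKPT_OPCODE;
            return i;
        }
    }
    return -1;                          /* no free slot */
}

/* Put the original instruction back. */
void bkpt_clear(target *t, int i)
{
    t->mem[t->bp[i].addr] = t->bp[i].saved;
    t->bp[i].active = 0;
}

/* Breakpoint i was hit: restore its instruction so it can execute in place,
 * and plant a temporary breakpoint on the next instruction. */
int bkpt_hit(target *t, int i)
{
    unsigned addr = t->bp[i].addr;
    bkpt_clear(t, i);
    return bkpt_set(t, addr + 1);
}

/* Temporary breakpoint fired: remove it and reinstate the original. */
void bkpt_rearm(target *t, int temp, unsigned original_addr)
{
    bkpt_clear(t, temp);
    bkpt_set(t, original_addr);
}
```

After bkpt_hit(), the original instruction is back in place and a temporary breakpoint sits on the next word; bkpt_rearm() then removes the temporary one and reinstates the original.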
In-circuit emulators
A microprocessor in-circuit emulator is a specially instrumented microprocessor.
Allows you to stop execution, examine CPU state, modify registers.
Logic analyzers
A logic analyzer is an array of low-grade oscilloscopes:
Logic analyzer architecture
Figure components: unit under test (UUT), sample memory, microprocessor, controller, system clock, clock generator, state or timing mode selection, vector address, display, keypad, and system data samples.
Hardware/software co-verification
An instruction level simulation may be used to debug code running on the CPU.
A cycle-level simulation tool may be used for faster simulation of parts of the system.
A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail.
Bus-Based Computer Systems
Designing with microprocessors. Development and debugging. System-level performance analysis. Example: alarm clock
Design Example: Alarm clock
Figure: front panel with a four-digit display, alarm on and alarm off buttons, an alarm ready light, set time and set alarm buttons, hour and minute buttons, and a PM light.
Operations
Set time: hold set time, depress hour, minute.
Set alarm time: hold set alarm, depress hour, minute.
Turn alarm on/off: depress alarm on/off.
Alarm clock requirements
name: alarm clock
purpose: 24-hour digital clock with one alarm
inputs: set time, set alarm, hour, minute, alarm on/off
outputs: four-digit display, PM indicator, alarm ready, buzzer
functions: keep time, set time, set alarm, turn alarm on/off, activate buzzer by alarm
performance: displays hours and minutes, no seconds; not high precision
manufacturing cost: consumer product
power: AC
physical size/weight: fits on stand
Alarm clock class diagram
Figure: classes Display, Mechanism, Lights*, Buttons*, and Speaker*, linked by one-to-one (1:1) associations.
Alarm clock physical classes
Lights*: digit-val(), digit-scan(), alarm-on-light(), PM-light()
Buttons*: set-time(): boolean, set-alarm(): boolean, alarm-on(): boolean, alarm-off(): boolean, minute(): boolean, hour(): boolean
Speaker*: buzz()
Display class
Display
Attributes: time[4]: integer; alarm-indicator: boolean; PM-indicator: boolean
Operations: set-time(), alarm-light-on(), alarm-light-off(), PM-light-on(), PM-light-off()
Mechanism class
Mechanism
Attributes: seconds: integer; PM: boolean; tens-hours, ones-hours: integer; tens-minutes, ones-minutes: integer; alarm-ready: boolean; alarm-tens-hours, alarm-ones-hours: integer; alarm-tens-minutes, alarm-ones-minutes: integer
Operations: scan-keyboard(), update-time()
Update-time behavior
1. Update seconds with rollover.
2. If the seconds rolled over, update hh:mm with rollover; on an AM->PM change set PM=true, and on a PM->AM change set PM=false.
3. display.set-time(current time).
4. If time >= alarm and alarm-on, alarm.buzzer(true).
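The update-time flow can be written in C as follows. Assumptions: hours are kept as 1 to 12 with a PM flag (matching the PM indicator), the alarm time is stored as hours, minutes, and a PM flag rather than digit fields, and the display and buzzer calls are reduced to a comment and a flag. The names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    int seconds, minutes, hours;   /* hours 1..12 */
    bool pm;
    int alarm_hours, alarm_minutes;
    bool alarm_pm;
    bool alarm_ready;              /* alarm turned on */
    bool buzzer;                   /* stands in for alarm.buzzer(...) */
} clock_state;

/* 12-hour time plus PM flag -> minutes since midnight (12 AM = 0). */
static int minutes_of_day(int hours, int minutes, bool pm)
{
    return ((hours % 12) + (pm ? 12 : 0)) * 60 + minutes;
}

/* Called once per second from the timer interrupt. */
void update_time(clock_state *c)
{
    if (++c->seconds == 60) {                 /* update seconds with rollover */
        c->seconds = 0;
        if (++c->minutes == 60) {             /* update hh:mm with rollover */
            c->minutes = 0;
            if (++c->hours == 12)             /* 11:59 -> 12:00 switches AM/PM */
                c->pm = !c->pm;
            else if (c->hours == 13)          /* 12:59 -> 1:00, no AM/PM change */
                c->hours = 1;
        }
        /* display.set-time(current time) would be called here */
    }
    if (c->alarm_ready &&
        minutes_of_day(c->hours, c->minutes, c->pm) >=
            minutes_of_day(c->alarm_hours, c->alarm_minutes, c->alarm_pm))
        c->buzzer = true;                     /* alarm.buzzer(true) */
}
```

Starting at 11:59:59 AM, one call gives 12:00:00 PM; with the alarm on and set to 6:30 PM, the call that reaches 6:30:00 PM turns the buzzer on.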
Scan-keyboard behavior
Compute button activations, then:
• Alarm-on: alarm-ready=true.
• Alarm-off: alarm-ready=false; alarm.buzzer(false).
• Set-time and not set-alarm and hours: increment time tens with rollover and AM/PM.
• Set-time and not set-alarm and minutes: increment time ones with rollover and AM/PM.
Finally, save button states.
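A matching sketch of scan-keyboard, assuming button activations have already been computed (true only on the sample where a button goes down). The "time tens" and "time ones" increments are read here as advancing the hour and the minute, with a minute rollover carrying into the hour; that interpretation and the struct names are assumptions.

```c
#include <stdbool.h>

typedef struct {
    bool set_time, set_alarm, hour, minute, alarm_on, alarm_off;
} buttons;

typedef struct {
    int hours, minutes;       /* hours 1..12 */
    bool pm;
    bool alarm_ready, buzzer;
} clock_regs;

/* 11 -> 12 switches AM/PM; 12 -> 1 does not. */
static void advance_hour(clock_regs *c)
{
    c->hours = c->hours % 12 + 1;
    if (c->hours == 12)
        c->pm = !c->pm;
}

void scan_keyboard(clock_regs *c, const buttons *b)
{
    if (b->alarm_on)
        c->alarm_ready = true;
    if (b->alarm_off) {
        c->alarm_ready = false;
        c->buzzer = false;                     /* alarm.buzzer(false) */
    }
    if (b->set_time && !b->set_alarm && b->hour)
        advance_hour(c);
    if (b->set_time && !b->set_alarm && b->minute) {
        c->minutes = (c->minutes + 1) % 60;
        if (c->minutes == 0)
            advance_hour(c);                   /* rollover carries into the hour */
    }
}
```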
System architecture
The system has both periodic and aperiodic components: the current time must obviously be updated periodically, and the button commands occur occasionally.
It seems reasonable to have the following two major software components:
• An interrupt-driven routine can update the current time. The current time will be kept in a variable in memory. A timer can be used to interrupt periodically and update the time.
• A foreground program can poll the buttons and execute their commands. Since buttons are changed at a relatively slow rate, it makes no sense to add the hardware required to connect the buttons to interrupts.
System architecture
The foreground code will be implemented as a while loop:
while (TRUE) {
    read_buttons(button_values);     /* read inputs */
    process_command(button_values);  /* do commands */
    check_alarm();                   /* decide whether to turn on the alarm */
}
System architecture
• The loop's first command reads the buttons.
• The buttons will remain depressed for many sample periods since the sample rate is much faster than any person can push and release buttons.
• We want to make sure that clock responds to this as a single depression of the button, not one depression per sample interval.
Testing
Component testing: test interrupt code on the platform; can test foreground program using a mock-up.
System testing: relatively few components to integrate; check clock accuracy; check recognition of buttons, buzzer, etc.
Preprocessing button inputs
As shown in the figure, this can be done using simple edge detection.
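Simple edge detection keeps the previous sample and reports only 0 -> 1 transitions, one bit per button, so a button held across many fast samples counts as a single depression. The bit assignment is an assumption. Saving the current sample at the end corresponds to the "save button states" step of scan-keyboard.

```c
#include <stdint.h>

typedef struct { uint8_t previous; } edge_detector;

/* Returns a mask of buttons that were up on the last sample and are down now. */
uint8_t button_activations(edge_detector *e, uint8_t current)
{
    uint8_t pressed_now = current & (uint8_t)~e->previous;   /* rising edges only */
    e->previous = current;                                    /* save button states */
    return pressed_now;
}
```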