EcoXiP Industry Pubs & In-the-News · 2019-04-23

IoT device designs are relying upon Execute-in-Place (XiP) system architecture to achieve lower controller cost and higher performance. However, commodity memory stands in the way of achieving the cost and performance targets of an XiP implementation. Adesto EcoXiP is specifically designed for XiP, with a high-performance protocol that enables blazingly fast 266 MB/s performance at half the power consumption and lower total cost.

Adesto Technologies Corporation 3600 Peterson Way | Santa Clara, California USA 95054 | Phone: 408-400-0578 FAX: 408-400-0721 www.adestotech.com



Table of Contents:

EcoXiP Overview

EcoXiP Datasheet (requires registration)

EcoXiP e-Bulletin Channel Announcement

EcoXiP EVK Product Sheet

The Linley Processor Report:

Adesto Execute-in-Place

Article: Embedded Computing Design

Is your Quad Device Choking your System Performance? EcoXiP can help http://www.embedded-computing.com/iot/is-your-quad-device-choking-your-system-performance

Article: Embedded Computing Design

Selecting the Optimal Flash for your Embedded Application http://www.embedded-computing.com/guest-blogs/selecting-the-optimal-flash-device-for-your-embedded-application

In-the-News: Adesto and STMicroelectronics Collaborate

In-the-News: EcoXiP Supports New JEDEC Standards

White Paper:

Crossover to Memory Expansion with Adesto EcoXiP and NXP's i.MX RT Crossover Processors


EcoXiP Evaluation Kits Now Available

Adesto EcoXiP

Best Performance and Best Power for XiP Applications

High-speed xSPI-Octal memory designed for Execute-In-Place

System designers are leveraging Execute-in-Place (XiP) as the transformative system architecture that

will deliver higher-performance IoT and edge devices.

Adesto EcoXiP xSPI-Octal memory is specifically designed with the right architecture and features to

meet the requirements of high-speed, low-power, instant-on XiP applications.

Now you can get going even faster with your XiP design by requesting your free EcoXiP product

samples or by ordering the EcoXiP evaluation kit.

EcoXiP Delivers:

Performance

Blazingly fast Octal interface (up to 266Mbytes / sec)

Optimized for microcontroller cache controllers

Power

Up to 50% lower power than other high-speed Octal devices

25% better power efficiency than other Quad devices for same Mbytes / sec

Price

Cheaper than other high-speed Octal devices

Lower solution cost than other Quad devices for same Mbytes / sec


EcoXiP is the only memory that fully supports these JEDEC specs:

xSPI standard JESD251 and 251-1

The new Reset Signaling Protocol JESD252

The latest version of SFDP JESD216D

Evaluation Kits now available

EcoXiP EVKs with NXP's i.MX RT1050 can be ordered directly from Adesto or through an Adesto

distributor.

Third party reference design boards using EcoXiP are also available. Contact Embedded Artists at www.embeddedartists.com.

Learn more and get samples


EcoXiP™

Evaluation Kit ATXPxxx-EVK

NXP i.MX RT1050 Evaluation Kit (EVK) featuring Adesto EcoXiP Execute-In-Place memory devices.

The ATXPxxx-EVK-iMXRT1050 evaluation kit features the NXP i.MX RT1050 crossover controller alongside the Adesto EcoXiP 32-Mbit or 128-Mbit xSPI memory, designed to support NXP's implementation of the Arm® Cortex®-M7 core for high-speed Execute-In-Place (XiP) applications.

Kit Contents

• MIMXRT1050-EVK board

• USB cable (Micro B)

• Adesto EcoXiP Part No: See Product Offerings table below

• Display (optional): RK043FN02H-CT 4.3 inch LCD 480 x 272 pixels with capacitive touch

• User Guide (downloadable) Document # AN106 link: https://www.adestotech.com/products/ecoxip/

• Tool support: MCUXpresso, IAR, Keil, and MDK

Product Offerings

Ordering Code              Adesto EcoXiP Part Number    Display Included
ATXP032-EVK02-iMXRT1050    32 Mb EcoXiP ATXP032         Yes
ATXP128-EVK02-iMXRT1050    128 Mb EcoXiP ATXP128        Yes
ATXP032-EVK01-iMXRT1050    32 Mb EcoXiP ATXP032         No
ATXP128-EVK01-iMXRT1050    128 Mb EcoXiP ATXP128        No

Corporate Office

California | USA

Adesto Headquarters

3600 Peterson Way

Santa Clara, 95054

Phone: (+1) 408.400.0578

Email: [email protected]

© 2019 Adesto Technologies. All rights reserved

Adesto, the Adesto logo, CBRAM and DataFlash are trademarks or registered trademarks of Adesto Technologies Corporation in the United States and other countries. Other company, product, and service names may be trademarks or service marks of others. Adesto products are covered by one or more patents listed at http://www.adestotech.com/patents.

Disclaimer: Adesto Technologies Corporation (“Adesto”) makes no warranties of any kind, other than those expressly set forth in Adesto’s Terms and Conditions of Sale at http://www.adestotech.com/terms-conditions. Adesto assumes no responsibility or obligations for any errors which may appear in this document, reserves the right to change devices or specifications herein at any time without notice, and does not make any commitment to update the information contained herein. No licenses to patents or other intellectual property of Adesto are granted by Adesto herewith or in connection with the sale of Adesto products, expressly or by implication. Adesto’s products are not authorized for use in medical applications (including, but not limited to, life support systems and other medical equipment), weapons, military use, avionics, satellites, nuclear applications, or other high risk applications (e.g., applications that, if they fail, can be reasonably expected to result in personal injury or death) or automotive applications, without the express prior written consent of Adesto.

EcoXiP-EVK–02/2019


NON-VOLATILE MEMORY | OCTAL FLASH

EcoXiP ATXP Series High-performance low-power octal flash

Octal interface optimized for execute-in-place

Best Performance, Best Power

Performance CoreMark® test on NXP’s i.MX RT1050 with 8 instruction cache invalidations every ms to simulate task switching & interrupt handling.

Efficiency CoreMark® score / power consumption

Best Performance blazingly fast eXecute-in-Place (XiP)

As microcontrollers push the performance envelope and use cutting edge technologies, the cost to integrate internal flash quickly

becomes prohibitive. With its blazingly fast performance and low power consumption, EcoXiP allows even time critical software to

be executed directly out of non-volatile memory, reducing boot time and system cost.

Best power high-efficiency low-power design

For all battery powered designs, power consumption is critical. Often this means sacrificing performance. EcoXiP’s intelligent

power management helps you simplify your design without the need for compromises. Special power saving modes and high

efficiency read operations make EcoXiP perfect for any battery operated application.


Read While Write (RWW) simplified OTA updates

Updating program code can be a tedious task. Once an update is downloaded, the system has to pause while it performs the update before finally returning to normal operation. Read While Write (RWW) allows an update to be programmed in the background without any impact on the user. Once the update is ready, the user can be prompted and the upgrade happens immediately. No wait, no fuss: a better user experience and a simpler design.


Applications

• Instant on modules

• Industrial IoT

• Building automation

• Wearables

• Consumer devices

• Smart appliances

• Medical devices

• OTA intensive applications

• Network modules

• Audio subsystems

Density    Part Number          Speed
32Mbit     ATXP032, ATXP032R    133MHz
64Mbit     ATXP064, ATXP064R    133MHz
128Mbit    ATXP128, ATXP128R    133MHz

Other table columns: Quad, QPI DDR, OPI DDR, RWW (per-part entries not recoverable)

Performance for the real world

Today’s Internet of things (IoT), smart devices, and embedded processors demand high performance and instant-on capabilities while keeping power consumption to a minimum. eXecute in Place (XiP) technology is well suited to meet these needs. Adesto’s EcoXiP takes this to the next level.

Specifically designed to work with cache controllers, EcoXiP dramatically reduces latency for cache misses. Unlike other octal flash solutions that sacrifice power consumption for high data rates, EcoXiP maintains low power operation by utilizing Adesto’s proprietary technology.

EcoXiP offers the perfect solution for memory expansion in systems that don’t have enough embedded flash or SRAM. With its high performance, even time-critical code can execute directly out of flash, eliminating the need to add expensive and power hungry external DRAM.

Block diagram: interface control and logic (signals SCK, SI (I/O0), SO (I/O1), WP (I/O2), I/O3 to I/O7, DS, RESET); control and protection logic; I/O buffers and latches; Y-decoder and Y-gating; X-decoder; flash memory array.

Technical Specifications

• eXecute in Place (XiP)

- Instant-on capability

- Lower system cost

• Reduced cache latency

- Critical word first

- Zero latency for additional cache lines

• Up to 266MBytes / sec

- Octal DDR xSPI interface

- Full JESD251, JESD252, and JESD216D compatibility

• Low power consumption / high efficiency

- Low read current

- Variable strength I/O

- Deep sleep / ultra-deep sleep modes

• Read While Write (RWW)

• Flexible erase and program architecture

- Block erase: 4, 32, and 64KBytes

- Byte / page program (1-256 bytes)

- Suspend / resume, erase and program operations

• Hardware and software write protection

• 256 byte OTP security register


• 100K erase / program cycles

• 20 years data retention


• Single 1.8V supply

• Industrial temp range: -40°C to +85°C

• Pb / Halide-free / RoHS compliant
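
As a rough illustration of the flexible erase architecture listed above, the sketch below covers an address range with the largest erase blocks that fit (64, 32, then 4 KBytes). The helper name `plan_erase`, the requirement that each block be naturally aligned, and the 4-KByte granularity of the range are assumptions made for illustration; they are not taken from the EcoXiP datasheet.

```python
# Illustrative sketch only: plan_erase() is a hypothetical helper, and the
# natural-alignment rule for erase blocks is an assumption, not a datasheet fact.
from typing import List, Tuple

KB = 1024
ERASE_SIZES = (64 * KB, 32 * KB, 4 * KB)  # block erase sizes from the spec list


def plan_erase(start: int, length: int) -> List[Tuple[int, int]]:
    """Return (address, block_size) erase operations covering [start, start + length)."""
    smallest = ERASE_SIZES[-1]
    if length <= 0 or start % smallest or length % smallest:
        raise ValueError("range must be 4K-aligned and a positive multiple of 4 KBytes")
    ops = []
    addr, end = start, start + length
    while addr < end:
        # Pick the largest block that is aligned at addr and still fits in the range.
        for size in ERASE_SIZES:
            if addr % size == 0 and addr + size <= end:
                ops.append((addr, size))
                addr += size
                break
    return ops


# Erase 100 KBytes starting at 0x7000: one 4K, one 32K, and one 64K block.
for address, size in plan_erase(0x7000, 100 * KB):
    print(hex(address), size // KB, "KB")
```

Using larger blocks where alignment allows keeps the number of erase commands low, while the 4-KByte blocks handle the unaligned edges of the range.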


© Adesto Technologies 2019 all rights reserved PBEXrev1-0219


© The Linley Group • Microprocessor Report October 2016

ADESTO EXECUTES IN PLACE

New EcoXIP Memory Simplifies IoT Design

By Linley Gwennap (October 10, 2016)

Adesto is a small memory supplier with big plans. After introducing an innovative nonvolatile memory earlier this year, it has taken standard NOR flash and added a new high-speed interface designed specifically for streaming instructions, a technique that designers call execute in place (XIP). The new EcoXIP product is now sampling in 32Mb (4MB) capacity, with production expected in 1Q17. Additional capacity options will follow.

Many small systems employ a microcontroller with embedded flash memory that holds the application code. When these devices add a radio for IoT capability, they require larger storage to hold the wireless protocol and IP stack as well as security software. MCUs normally top out at 1MB of internal flash, but IP-based IoT devices often need more code space, requiring an external flash device.

Commodity flash chips connect through a low-speed SPI, requiring the MCU to copy the code into a large internal SRAM to maintain reasonable performance. Many systems execute directly from the external flash (XIP); they are not only slower but may require a second flash chip to support over-the-air (OTA) code updates. In an XIP design, even writing data (such as log information) to flash can be challenging.

Announced at the recent Linley Processor Conference, EcoXIP solves these problems by enabling simultaneous read and write transactions. It can deliver instructions at a sustained rate of 156MB/s (266MB/s peak), which is fast enough for most MCUs and better than other XIP memories. Because it uses a modified SPI to improve performance, however, the new memory works only with compatible MCUs. At the conference, Adesto CTO Gideon Intrater disclosed that NXP, a leading MCU supplier, will support the EcoXIP interface in future MCU products.

Price and Availability

Adesto is currently sampling a 32Mb EcoXIP product to lead customers; it expects to enter production in 1Q17. The company withheld pricing. To download a free copy of the Adesto presentation from the Linley Processor Conference, access www.linleygroup.com/processor-conference. For more information on EcoXIP, access www.adestotech.com.

Accelerating the Bus

To more efficiently implement XIP, Adesto redesigned the basic SPI protocol to better handle the typical access patterns. SPI is designed for random accesses; it returns the requested cache line (e.g., 16 bytes), then waits for the next request. This approach works well for data storage, but instruction fetches tend to be sequential as the CPU proceeds through a block of code. Therefore, EcoXIP continues to provide sequential bytes until it receives a new command. It calls this approach “command fusing.”

This approach can double the bus throughput, as Figure 1 shows. Using an octal (8-bit) SPI, a transaction typically requires 1 bus cycle for the command, 2 cycles for the address, about 14 cycles to wait for the response, and 8 cycles to transmit 16 bytes of double-clocked data (DDR). Fetching the next 16 bytes requires another 25 bus cycles, or 50 for the two transactions. Using EcoXIP, the first 16 bytes take the same number of cycles, but data then continues to flow, delivering four cache lines in 49 cycles.

Octal SPI at 133MHz: C/A, Wait, Line N, C/A, Wait, Line N+1
EcoXIP at 133MHz: C/A, Wait, Line N, Line N+1, Line N+2, Line N+3

Figure 1. EcoXIP bus timing. C/A=command/address. By chaining data responses using “command fusing,” the Adesto design can deliver twice as many cache lines in the same number of bus cycles. (Source: Adesto)

The benefit of Adesto’s approach comes when the CPU executes a sequential set of instructions. For a 32-bit MCU, each line holds four instructions, and a branch occurs about every seven instructions; about half of all branches are taken. The company estimates that the average number of line fetches per instruction-cache miss is 3.84, or nearly 14 instructions (the first cache line may have fewer than four useful instructions if the target is in the middle of the line). Using this average, Adesto calculates the sustained throughput of its 133MHz EcoXIP at 156MB/s and the average latency at just 57ns. When the CPU reaches a taken branch, it sends a new request to the EcoXIP, which then begins transmitting data from the new address.

Most flash chips have a quad SPI to reduce pin count and cost. These chips generate as little as 58MB/s of sustainable throughput at an 80MHz bus speed. More-expensive parts offer an octal SPI and operate at up to 200MHz, but even they fall well behind the 133MHz EcoXIP in throughput and latency for XIP applications, as Figure 2 shows.

Adesto plans to increase the EcoXIP bus speed to 200MHz in order to boost this performance further. In addition to modifying the protocol, the EcoXIP interface has an extra data strobe signal, which simplifies the implementation of designs that operate at speeds above 80MHz. Current high-speed designs require a dynamic delay line to synchronize the DDR transfers, but the strobe allows the MCU to capture data using a simple fixed delay.
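
The bus-cycle arithmetic in the command-fusing discussion can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the article (1 command cycle, 2 address cycles, about 14 wait cycles, and 8 data cycles per 16-byte line); the function names are made up for illustration, and the model ignores any effect the article does not describe.

```python
# Sketch of the cycle counts quoted in the article; function names are illustrative.
CMD, ADDR, WAIT = 1, 2, 14   # bus cycles per transaction phase, as quoted
LINE_BYTES = 16              # cache-line size used in the article's example
DATA = LINE_BYTES // 2       # octal DDR moves 2 bytes per bus cycle, so 8 cycles


def octal_spi_cycles(lines: int) -> int:
    """Standard octal SPI: every cache line is a separate transaction."""
    return lines * (CMD + ADDR + WAIT + DATA)


def fused_cycles(lines: int) -> int:
    """Command fusing: one command/address/wait phase, then lines stream back to back."""
    return CMD + ADDR + WAIT + lines * DATA


print(octal_spi_cycles(2))  # 50 cycles for two lines
print(fused_cycles(4))      # 49 cycles for four lines
```

Under these assumptions, four sequential lines cost 49 cycles with fusing versus 100 cycles as separate transactions. The sustained figures quoted in the article (156MB/s and 57ns) come from Adesto's own calculation, which this sketch does not attempt to reproduce.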

Two Banks, No Waiting

Flash memory retains data even when a device is powered down, but this feat requires a complex and time-consuming write operation. For NOR flash, this operation involves applying a high voltage (above 5V) to the cell for a period of roughly 1ms. During this period, the MCU cannot fetch instructions from the flash chip. If the flash is solely for code execution (XIP), this situation won’t arise, as no writing is necessary. But many systems use flash to store data, such as configuration parameters and event logs. OTA code updates also require writing data to the flash.

Of course, the system can simply stall for 1ms each time it writes to flash, but that delay hampers performance. Another option is to load a small amount of code into the MCU’s internal memory before starting an OTA update, but if anything unusual occurs (such as an interrupt), the rest of the code will be unavailable. Thus, many designs include two flash chips, so one can be read while the other is written, but this approach adds cost.

EcoXIP separates its internal flash memory into two banks. Doing so allows the MCU to read from one bank while writing to the other. Designers can adjust the boundary between the banks to split the memory 50/50 or put as little as one-eighth in one bank. The former approach enables OTA updates to store a complete set of code without overwriting the original code; the latter is good for systems that just need a small amount of data memory.

As a further enhancement, EcoXIP implements an automatic power down after a write. Most other flash chips require the MCU to stay awake during the 1ms write so it can power down the flash once the write completes. With EcoXIP, the MCU can “fire and forget,” starting the write and immediately going into a low-power mode while the flash chip finishes the write and then puts itself to sleep. The Adesto chip provides a variety of sleep modes that trade off power savings against wake-up time.
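
To make the two-bank split described in this section concrete, the sketch below models a device whose boundary can sit at any one-eighth step of its capacity and checks whether a read can proceed while a write is in flight. The class `DualBankFlash`, the eighths granularity, and the address-based bank lookup are assumptions for illustration; the article states only that the split can range from 50/50 down to one-eighth in one bank.

```python
# Hypothetical model of a two-bank flash; not Adesto's register interface.
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class DualBankFlash:
    capacity_bytes: int    # e.g. 4 MB for the 32Mb part
    eighths_in_bank0: int  # boundary position, 1..7 eighths (assumed granularity)

    def __post_init__(self) -> None:
        if not 1 <= self.eighths_in_bank0 <= 7:
            raise ValueError("each bank must hold at least one-eighth of the array")

    @property
    def boundary(self) -> int:
        """First address that belongs to bank 1."""
        return self.capacity_bytes * self.eighths_in_bank0 // 8

    def bank_of(self, addr: int) -> int:
        if not 0 <= addr < self.capacity_bytes:
            raise ValueError("address out of range")
        return 0 if addr < self.boundary else 1

    def can_read_while_writing(self, read_addr: int, write_addr: int) -> bool:
        """A read proceeds during a write only if it targets the other bank."""
        return self.bank_of(read_addr) != self.bank_of(write_addr)


ota = DualBankFlash(4 * MB, eighths_in_bank0=4)  # 50/50 split for a full OTA image
log = DualBankFlash(4 * MB, eighths_in_bank0=7)  # one-eighth reserved for data

print(ota.can_read_while_writing(0x1000, 3 * MB))  # True: code in bank 0, update in bank 1
print(ota.can_read_while_writing(0x1000, 0x2000))  # False: both addresses in bank 0
```

The 50/50 configuration corresponds to the OTA case in the text, where a complete new image is written to one bank while the running code executes from the other.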

A Zippier XIP

Adesto offers two product lines. One is a unique nonvolatile memory, called conductive-bridge RAM, that is CMOS compatible (see MPR 2/22/16, “Adesto Targets IoT Using CBRAM”). It also acquired a family of standard NOR-flash chips from Atmel in 2012. CBRAM is a lower-power alternative, but NOR flash remains less expensive for storing large amounts of boot code. EcoXIP builds on these standard products, adding a custom interface that improves performance.

Chart axes: Throughput (MB/s), Latency (ns). Devices compared: EcoXIP 133MHz, Octal XIP 200MHz, Octal XIP 133MHz, Quad XIP 80MHz, Quad SPI 80MHz.

Figure 2. Adesto EcoXIP performance. All numbers are for XIP operation and assume 16-byte instruction-cache lines and an average of 3.84 line fetches per instruction-cache miss. (Source: Adesto)

Other vendors also offer fast flash memories using custom interfaces. For example, Cypress (formerly Spansion) offers the proprietary HyperBus interface, which can deliver 333MB/s using a 166MHz DDR octal interface that supports arbitrarily long bursts. The Macronix OctaFlash and Micron XTRMFlash have similar capabilities at speeds of up to 200MHz using modified SPI protocols. But all of these parts are designed for fast boot in systems that copy the code into RAM for execution, so they are available only in sizes of 128Mb (16MB) and larger. These systems employ higher-performance processors instead of microcontrollers and often run complex operating systems.
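
The peak rates quoted for these interfaces follow from clock rate, bus width, and double data rate. A minimal sketch, assuming decimal megabytes and a hypothetical helper name, and ignoring command and wait overhead:

```python
# Illustrative helper; assumes 1 MB = 10**6 bytes and ignores protocol overhead.
def peak_mb_per_s(clock_mhz: float, bus_bits: int = 8, ddr: bool = True) -> float:
    """Peak transfer rate in MB/s for a serial-flash bus."""
    transfers_per_cycle = 2 if ddr else 1
    return clock_mhz * transfers_per_cycle * bus_bits / 8


print(peak_mb_per_s(133))  # 266.0, matching the EcoXIP peak rate
print(peak_mb_per_s(166))  # 332.0, close to the 333MB/s quoted for HyperBus
```

The small gap between 332 and 333MB/s is most likely rounding of the nominal clock rate. These are peak numbers only; sustained XIP throughput depends on how often a new command must be issued.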


Adesto began with a similar concept but optimized it for XIP applications. Products such as XTRMFlash are designed for long predetermined bursts, whereas EcoXIP allows the CPU to inject a new target address into the burst at any time. Furthermore, Adesto targets applications with more than 1MB of code but less than 16MB, a range that encompasses many MCU-based IoT clients that use a basic real-time OS or no OS at all. The company’s dual-bank design is a unique capability that can reduce cost in systems that would otherwise require two separate flash chips.

For MCU-based systems, EcoXIP is less expensive than a large on-die flash memory, since an embedded-flash process adds cost compared with a flash-optimized process. Using XIP eliminates the need for a large and costly on-die SRAM; in fact, EcoXIP can couple with an inexpensive MCU that has minimal on-die memory. The dual-bank chip is also less expensive than two separate flash chips of half the capacity, in part because of package-cost savings. EcoXIP’s unique capabilities should help Adesto gain a foothold in the IoT market. ♦

To subscribe to Microprocessor Report, access www.linleygroup.com/mpr or phone us at 408-270-3772.


In the News

PRESS RELEASE

Adesto’s EcoXiP™ enables ultra-low-power, low-latency XiP system operation on STMicroelectronics’ new STM32L4+ MCUs

Thursday, February 22, 2018.

Combined solution lets designers create sensor-rich IoT devices with long battery life

SANTA CLARA, CA – February 22, 2018 – Adesto Technologies (NASDAQ: IOTS), a leading provider of application-specific, ultra-low power non-volatile memory (NVM) products, announced its collaboration with STMicroelectronics to enable ultra-low-power, low-latency eXecute-in-Place (XiP) system operation on ST’s STM32L4+ microcontrollers (MCUs) through Adesto’s EcoXiP™ system-accelerating NVM. The combination of STM32L4+ MCUs and EcoXiP enables designers to create IoT devices requiring numerous sensors and other advanced capabilities while ensuring long battery life.

STM32L4+ series MCUs build on ST’s popular STM32L4 series MCUs by

increasing performance, adding more embedded memory, and

delivering richer graphics and connectivity features, while maintaining

ultra-low power consumption. The STM32L4+ is the first STM32-family

architecture to offer two Octal SPI ports, which support NOR Flash

including XiP operation.


For many applications, the embedded memory in an MCU is not

sufficient, and external XiP memory provides the natural solution. Built

on an innovative memory and protocol architecture, EcoXiP sets a new

standard for performance, cost and power for devices requiring a XiP

architecture. It delivers high system performance, optimized latency and

throughput, concurrent read/write capability, enhanced security, and the

best standby power for a wide range of connected products including IoT

edge devices, wearables, connected and wireless embedded systems,

medical monitors and POS controllers.

“By pairing ST’s new ultra-low power STM32L4+ MCUs with EcoXiP,

designers can architect high-performance, low-power XiP systems for

more energy-efficient, lower-cost products,” said Gideon Intrater, Adesto

CTO. “Not every application needs a XiP solution, but for those that can

benefit from it, there is no better solution in the market than EcoXiP.”

Demonstration at Embedded World

Adesto will demonstrate EcoXiP running with an STM32L4+ device at

Embedded World, being held February 27 – March 1 in Nuremberg,

Germany. Visit Adesto in Hall 4A, Booth 259. Contact

[email protected] to arrange a personal demonstration.

Availability

Samples of Adesto’s EcoXiP non-volatile system-accelerating memory

are available now. EcoXiP is available in optimized densities from 32Mb

to 128Mb. For more information, see:

https://www.adestotech.com/products/ecoxip.

About Adesto Technologies

Adesto Technologies (NASDAQ:IOTS) is a leading provider of application-specific, ultra-low power non-volatile memory products. The company has designed and built a portfolio of innovative products with intelligent features to conserve energy and enhance performance including Fusion Serial Flash, DataFlash® and products based on Conductive Bridging RAM (CBRAM®) technology. CBRAM® is a breakthrough technology platform that enables 100 times less energy consumption than today’s memory technologies without sacrificing speed and performance. Adesto is focused on delivering differentiated solutions and helping its customers usher in the era of the Internet of Things. See: www.adestotech.com.


In the News

PRESS RELEASE

Adesto’s EcoXiP Supports New JEDEC Standards to Pave the Way for a New Era of Smart Devices and Edge Processing

Tuesday, November 13, 2018.

ELECTRONICA, MUNICH, GERMANY – November 13, 2018 – Adesto

Technologies (NASDAQ: IOTS), a leading provider of innovative

application-specific semiconductors for the IoT era, announced it has

shipped the first serial NOR flash devices supporting the new xSPI

standard (JESD251 and JESD251-1), Serial Flash Reset Signaling Protocol

(JESD252) and the latest version of the SFDP standard (JESD216C).

Adesto’s EcoXiP™ eXecute-in-Place (XiP) non-volatile memory (NVM) fully

supports these specifications, which were recently released by

microelectronics standards body JEDEC. These standards make it easier

for system designers to reap the benefits of EcoXiP in their designs and

deliver smarter, more efficient and user-friendly devices.

Today, many emerging Internet of Things (IoT) and high-end

microcontroller (MCU) designs need more program memory and data

processing storage than can be implemented economically on-chip

using embedded flash or SRAM. The new standards make it simpler for

the industry to adopt NVM devices that use the Octal Serial Peripheral

Interface (SPI), such as Adesto’s EcoXiP, which delivers the higher

performance and storage space needed. EcoXiP eliminates the need for

expensive on-chip embedded flash, and it hits the sweet spot for power,

system cost and performance, with significantly lower power consumption

compared to other Octal devices and dramatically higher performance

versus Quad SPI devices.


“NXP architected crossover processors with no on-board flash and

provided an Octal interface to optimize off-chip NVM performance. This

allows NXP to deliver a class of microcontrollers that boost processing

performance and increase power efficiency at a very competitive price

point,” said Joe Yu, GM of Low-Power MPUs at NXP Semiconductors.

“Ultimately this helps designers add more features to their products and

improve the consumer experience. A low-power external memory device,

such as Adesto’s EcoXIP, is a complementary device for our i.MX RT

series.”

The new xSPI (expanded SPI) standard establishes hardware guidelines

to enable designers to easily add high-throughput Octal and Quad

devices to their systems. The Serial Flash Reset Signaling Protocol

defines a way to reset flash devices without a need for a dedicated reset

pin. The SFDP (Serial Flash Discoverable Parameter) standard provides a

consistent method of describing the functional and feature capabilities of

serial flash devices in a common set of internal parameter tables. With it,

OEMs can speed firmware development and time-to-market. The latest

revision of the SFDP specification adds support for Octal SPI.
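
As a rough sketch of how firmware might begin reading these parameter tables, the code below decodes the 8-byte SFDP header and one 8-byte parameter header from raw bytes. The field layout follows the JESD216 header format as it is commonly documented and should be checked against the specification itself; the function names and the sample bytes are invented for illustration, and no particular flash driver API is implied.

```python
# Sketch only: layout per the commonly documented JESD216 headers; verify against the spec.
import struct
from typing import NamedTuple


class SfdpHeader(NamedTuple):
    minor: int
    major: int
    num_param_headers: int  # the stored field is zero-based, so 1 is added


class ParamHeader(NamedTuple):
    table_id: int       # 16-bit ID built from the MSB and LSB bytes
    minor: int
    major: int
    length_dwords: int  # table length in 32-bit words
    pointer: int        # 24-bit byte address of the parameter table


def parse_sfdp_header(raw: bytes) -> SfdpHeader:
    signature, minor, major, nph, _access = struct.unpack_from("<4sBBBB", raw, 0)
    if signature != b"SFDP":
        raise ValueError("missing SFDP signature")
    return SfdpHeader(minor, major, nph + 1)


def parse_param_header(raw: bytes, index: int = 0) -> ParamHeader:
    offset = 8 + 8 * index  # parameter headers follow the 8-byte SFDP header
    id_lsb, minor, major, length, p0, p1, p2, id_msb = struct.unpack_from(
        "<8B", raw, offset
    )
    pointer = p0 | (p1 << 8) | (p2 << 16)
    return ParamHeader((id_msb << 8) | id_lsb, minor, major, length, pointer)


# Made-up sample bytes for illustration, not read from a real device.
sample = bytes.fromhex("53464450" "0801" "00" "ff" "00" "08" "01" "14" "300000" "ff")

print(parse_sfdp_header(sample))
print(parse_param_header(sample))
```

Because every compliant device exposes the same header layout, a single routine of this kind can locate the parameter tables on any part before any device-specific settings are applied, which is the source of the firmware reuse the standard is meant to enable.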

“Adesto delivers solutions that ignite innovation for next generation IoT

devices. It was important that we help drive the development of these

standards, and we are delighted to be the first company to ship serial

flash devices with full support,” said Gideon Intrater, Adesto’s CTO. “This

is the first time that there is a robust set of standards that defines ways

for serial NOR flash to communicate, making it possible for companies to

easily integrate the latest technology and increase the performance of

their designs.”

Demonstration at Electronica

At Electronica 2018, being held November 13 – 16, 2018, Adesto will

demonstrate its EcoXiP NVM integrated with the NXP i.MX RT1050

crossover processor via the Octal xSPI interface, in compliance with the

new standards. Visit Adesto’s booth: Hall C3, Booth 121 at the Messe

München exhibition center. For more information, contact

[email protected].


CROSSOVER TO MEMORY EXPANSION

WITH ADESTO ECOXiP AND NXP’S

i.MX RT CROSSOVER PROCESSORS

Donnie Garcia, NXP Semiconductor: Solutions Architect

Eyal Barzilay, Adesto Technologies: System and Software

INTRODUCTION

With 8.4 billion connected “things” having shipped in 2017, the internet of tomorrow is clearly upon us. We

have entered a new age of human to machine interactions where technology is guiding many aspects of

our lives. For a variety of end devices such as wearables, home monitoring nodes and industrial controllers,

the capabilities of the embedded processor play a vital role in addressing the insatiable demand for a

higher order of functionality. This has led to industry focus on machine learning enabled by vision and audio

processing to bring the computation needed to make decisions at the edge node. These capabilities require

elevated levels of processing performance and memory space for MCUs. The push for processing has led

to a new breed of semiconductor device which does not fit into a traditional definition of a microcontroller.

The ‘Crossover Processor’ integrates attributes of a microprocessor such as higher CPU speeds, multimedia

interfaces and expandable memory into a microcontroller form factor built for cost effectiveness and fastest

development time. This new crossover processor class of device provides embedded developers the ability to

solve many problems in today’s fast-moving technology markets.

Collaboration between semiconductor manufacturers and memory vendors plays a vital role in ensuring that

the embedded systems that are brought to market achieve performance and usability goals. This is

accomplished by closing the gap between the typical embedded flash device and the crossover MCU with

external memory. Using external memory, crossover processors have the ability to support massive amounts

of software and data memory space. This is done while keeping the same look and feel of a traditional

embedded flash microcontroller. Together, the right serial flash memory coupled with a capable processor

addresses the challenges of performance, security, power consumption and development experience.

For the processor, considering eXecute-in-place (XiP) from the start of the semiconductor chip design brings

together a microarchitecture that is built for memory expansion. For serial flash, there are advancements in

the interface protocol, low energy read of memory, and read while write programming capabilities to address

these challenges. This paper will provide an overview of how performance and usability are addressed for

systems depending on external memory. The following sections will explore how the Adesto EcoXiP serial

flash and the i.MX RT1050 crossover processor pair together to provide the embedded platform needed to

conquer the challenges of future embedded designs.

TABLE OF CONTENTS

Crossover to Memory Expansion with Adesto’s EcoXiP and NXP’s i.MX Crossover Processors

Introduction

Overview of Serial NOR Flash and eXecute in Place (XiP)

Microcontroller Memory Architectures

How XiP is Achieved

FlexSPI Memory Controller

Adesto EcoXiP: Advanced Serial Flash

Application Use cases

i.MX RT: Advanced Processor Architecture

Understanding XIP Performance

Throttling Test Case

Instrumenting Test Case

Examining Example Applications

Development and Debug with XiP

Conclusions

Resources

OVERVIEW OF SERIAL NOR FLASH AND EXECUTE IN PLACE

Serial NOR flash comes in the form of integrated circuits (ICs) with a range of memory size and physical

interface options. These memory devices typically operate at 1.8V or 3.3V, support 100 thousand write erase

cycles, and can easily be placed on printed circuit boards. The serial flash IC allows embedded systems to

easily introduce a non-volatile memory (NVM) with various packages ranging from the basic 8-pin to very

small chip scale. There are many use cases for applying serial NOR flash to a system. Persistent data logging

is one example of a common application use case which benefits from this technology. Another important

use is storing and executing software for the ever growing embedded applications.

The eXecute in Place, or XiP, is a capability that allows a processor to execute code directly from external

flash memory. Many embedded applications require connectivity stacks, audio processing, and vision. The

amount of executable code for these functions has grown to substantial sizes. When considering these

application requirements together for one embedded system, the capability of XiP with external flash is an

essential enabler as it allows nearly limitless data space for the embedded system. In the semiconductor

industry, thousands of capable microcontrollers are already integrating the type of memory controller

needed to support XiP capability from serial NOR flash.

Microcontroller Memory Architectures

For embedded processing, there are several common memory architectures as shown in Figure 1. Starting

from the left, for most microcontrollers, internal non-volatile memory provides the execution space for

the software. Here the NVM is all provided internal to the chip. There are advantages due to the system

integration, but a limitation with regards to scalability. If the system needs more memory than what is

provided internal to the processor, then external memory must be added. Often, external memory (such as

EEPROM) is needed to store persistent data for other uses in the system as shown in the diagram.

The second architecture, in the middle, is a copy-to-execute architecture. This means that the code is stored

in external flash but copied to internal RAM at startup and then executed. In this case external NVM is used

in conjunction with execute memory in RAM. This architecture will be limited by the size of the internal

SRAM memory. If the size of code is larger than internal SRAM, software must bring in portions of code

as needed by the application. This copy to execute has penalties with regards to copy time and software

complexity. Large internal SRAM size could have a significant impact on cost. Alternatively, if external DRAM

is used, system cost can be reduced because of the low cost per bit for DRAM versus internal SRAM.

When using DRAM there are challenges with regards to power consumption. This is due to the volatile

nature of the DRAM memory and the need for self-refresh for low power states of DRAM. Even if the code

fits into SRAM, a low-power system would probably require shutting down the SRAM during sleep mode.

This means that a copy to SRAM would be necessary on each transition from sleep to active mode. In other

words, the system will be slow to wake up.
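To make the wake-up penalty concrete, the following minimal sketch estimates how long a copy-to-execute system spends copying its image from flash to RAM on each wake-up. The image size and read bandwidth are illustrative assumptions, not measurements from this paper.

```python
def copy_time_ms(image_bytes: int, read_bandwidth_bytes_per_s: float) -> float:
    """Time to copy a code image from external flash to RAM, in milliseconds."""
    if read_bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return image_bytes / read_bandwidth_bytes_per_s * 1000.0


# Hypothetical example: a 2 MiB image read at 50 MB/s (both values assumed).
IMAGE_BYTES = 2 * 1024 * 1024
BANDWIDTH_BYTES_PER_S = 50e6

if __name__ == "__main__":
    delay = copy_time_ms(IMAGE_BYTES, BANDWIDTH_BYTES_PER_S)
    print(f"copy time added to every wake-up: {delay:.1f} ms")
```

An XiP system skips this copy entirely, which is why its wake-up time does not grow with the size of the code image.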

Figure 1: Memory architecture diagrams (left to right: Internal NVM, Copy to Execute, External Execute in Place (XiP))

Furthest to the right, the XiP architecture depends on the external memory for the execution of code. This

memory architecture has advantages with regards to scalability. Designers do not have to face issues with

over buying for a larger memory size to protect against software growth. The choice of external memory

can be made for what is needed for the embedded design. This ensures that every penny spent on the

processor components in the system goes towards relevant features for the end product. This architecture

reduces both risk and design cycle times as the XiP system architecture can be scaled with only a change

to the serial NOR flash in the bill of materials for the circuit boards. In addition, XiP brings an advantage in

terms of power and fast wakeup from sleep mode.

Still, there are challenges when using this architecture. In the coming sections, we will discuss how these

challenges are being mitigated by intelligent designs incorporated for both the processor and the serial

flash.

How XIP is achieved

Central to the support of XiP is the integration with a smart SPI (Serial Peripheral Interface) host controller on

the processor. Akin to a standard SPI, these host controller peripherals support a synchronous serial protocol

that depends on data and clock signals. For example, Figure 2 shows the most basic SPI read where an

opcode and address are sent to a slave device via Serial In (SI), and data is returned to the master device via

Serial Out (SO).

Figure 2: Example SPI data transfer
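The basic read described above can be sketched as follows. This is a minimal illustration, assuming the widely used 0x03 read opcode and a 3-byte address; the function names are hypothetical and the opcode for a specific device should be taken from its datasheet.

```python
def build_spi_read_frame(opcode: int, address: int, address_bytes: int = 3) -> bytes:
    """Bytes the master shifts out on SI: the opcode, then the address MSB first."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    if not 0 <= address < (1 << (8 * address_bytes)):
        raise ValueError("address does not fit in the address field")
    return bytes([opcode]) + address.to_bytes(address_bytes, "big")


def clocks_before_data(frame: bytes) -> int:
    """On single-line SPI one bit moves per clock, so each byte costs 8 clocks."""
    return 8 * len(frame)


if __name__ == "__main__":
    frame = build_spi_read_frame(0x03, 0x001234)  # 0x03: assumed read opcode
    print(frame.hex(), "->", clocks_before_data(frame), "clocks before data on SO")
```

With an 8-bit opcode and a 24-bit address, 32 clocks pass before the first data bit appears on SO, which is the overhead the multi-line protocols discussed later are designed to reduce.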

In addition to operating as traditional SPI, in order to better support the XiP use case, these enhanced

peripherals also operate as system memory controllers. They can take internal bus transfers generated in

the chip and translate them into the right serial commands needed to interact with the external memory. In

this way, data transfers from the external memory are accelerated by hardware. The instructions and data

residing in external serial NOR flash are directly fed into the CPU pipeline or other chip peripherals based on

memory transfers occurring inside the microarchitecture of the chip.

FlexSPI Memory Controller

One such memory controller is the FlexSPI. FlexSPI is NXP’s latest generation of the serial flash memory

controllers. The block diagram in Figure 3 represents the FlexSPI which is integrated on the i.MX RT

crossover processors. The 64-bit AHB bus is the interface to the system bus which will come from a CPU or

other on-chip masters such as an LCD controller. The IPS BUS is a separate interface which allows software

to directly send commands to the NOR flash device by way of the FlexSPI register model. This interface is

also used for the initialization and configuration of the external serial flash as it can be used to initiate the

process of sending commands.

The capabilities of the i.MX FlexSPI memory controller enhance XiP. In the diagram, just to the right of the

AHB_CTL block, both transmit (TX) and receive (RX) buffering are shown. This buffering is used for

prefetching data when reading the external memory to improve latency and overall compute performance

for the XiP operation.

Figure 3: FlexSPI Block Diagram (AHB BUS 64-bit and IPS BUS 32-bit inputs; AHB_CTL, IP_CTL, ARB_CTL, SEQ_CTL and IO_CTL blocks; SPI bus outputs)

Shown in the diagram on the right side is the sequence control block. The sequence control block is a

large look-up table which holds preset instructions for different serial flash operations such as read, erase

and program. This block is what links accesses from the 64-bit AHB bus to the read command sequence

which is sent to the external serial flash. Not every flash will have the same command set or I/O interface.

The sequence control engine is programmable for adjusting the SPI transfers based on the command set

defined by the serial flash. This allows processors like the i.MX RT to interface to a broad range of external

flash types and capabilities. This flexibility allows the crossover processor to utilize flash attributes that play

an important role in supporting the most capable XiP embedded systems.
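The idea of a programmable look-up table can be sketched in a few lines. This is a simplified model and not the FlexSPI register encoding: the step names, the table layout and the base address of the flash window are assumptions made for illustration, and the opcodes are the ones commonly used by SPI NOR flash.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    kind: str     # illustrative step names: CMD, ADDR, DUMMY, READ, WRITE
    pads: int     # number of I/O lines used for this step
    operand: int  # opcode, address width in bits, or dummy-cycle count


# Hypothetical look-up table: one preset sequence per flash operation.
# A real design must take opcodes and dummy cycles from the flash datasheet.
LUT = {
    "read": [Step("CMD", 1, 0x0B), Step("ADDR", 1, 24),
             Step("DUMMY", 1, 8), Step("READ", 1, 0)],
    "write_enable": [Step("CMD", 1, 0x06)],
    "erase_4k": [Step("CMD", 1, 0x20), Step("ADDR", 1, 24)],
    "page_program": [Step("CMD", 1, 0x02), Step("ADDR", 1, 24),
                     Step("WRITE", 1, 0)],
}

FLASH_BASE = 0x6000_0000  # assumed start of the memory-mapped flash window


def translate_bus_read(bus_address: int) -> Tuple[int, List[Step]]:
    """Turn a memory-mapped bus read into (flash offset, read sequence)."""
    if bus_address < FLASH_BASE:
        raise ValueError("address is below the flash window")
    return bus_address - FLASH_BASE, LUT["read"]


if __name__ == "__main__":
    offset, sequence = translate_bus_read(0x6000_1000)
    print(hex(offset), [step.kind for step in sequence])
```

Supporting a different flash device then means rewriting the table entries (for example more pads or a different opcode) while the bus-side logic stays the same.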

ADESTO ECOXIP: ADVANCED SERIAL FLASH

Serial flash is not only for storing code and data but also for executing code directly from flash (Execute-

in-Place or XiP). Advancements in serial flash technology have made it possible for newer serial flash to be

used in systems with high performance requirements. These advancements allow serial flash devices such

as Adesto EcoXiP to respond quickly to read requests from the host MCU and deliver instructions and data

with low latency and high throughput.

One advancement is the multi-line SPI interface. Traditionally, communication with a serial device was (as

the name suggests) serial: data was transferred one bit at a time over a single line. For more capable devices,

communication is parallel, and data is transferred over up to eight data lines as shown in the Octal-SPI

transfer diagram in Figure 4. Adesto’s EcoXiP devices are equipped with JEDEC’s latest Octal SPI protocol

(xSPI), making the communication close to 8x faster than a single wire serial flash.

Figure 4: Example Octal-SPI Data Transfer

Supplementing the Octal interface, serial flash can feature double data rate (DDR). This capability is more

common in high-speed DRAMs. With DDR, data bits are sampled on both the rising and falling edges of

the serial clock. Since it takes only half a clock cycle to send out a data bit, this feature has the potential to

double the throughput from the external memory. In addition, modern serial flash devices deliver high clock

speeds above 100 MHz. This is achievable due to a data strobe signal driven by the flash during the data

phase of a read.
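The combined effect of line count, DDR and clock speed can be estimated with simple arithmetic. The sketch below computes the peak data-phase bandwidth only; it deliberately ignores command, address and dummy-cycle overhead, and the function name is hypothetical.

```python
def peak_read_bandwidth_mb_s(clock_mhz: float, data_lines: int, ddr: bool) -> float:
    """Peak data-phase bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bits_per_clock = data_lines * (2 if ddr else 1)  # DDR moves data on both edges
    return clock_mhz * bits_per_clock / 8.0


if __name__ == "__main__":
    configurations = [
        ("Single SDR", 1, False),
        ("Quad SDR", 4, False),
        ("Octal SDR", 8, False),
        ("Octal DDR", 8, True),
    ]
    for name, lines, ddr in configurations:
        bandwidth = peak_read_bandwidth_mb_s(133, lines, ddr)
        print(f"{name:10s} at 133 MHz: {bandwidth:7.3f} MB/s")
```

At 133 MHz, eight lines with DDR move 16 bits per clock, which is where the 266 MB/s peak figure quoted for EcoXiP comes from.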

To address latency, Adesto EcoXiP supports features to reduce the overhead of the command interface.

Latency is the time from when there is a request for data until the time that the data is available to the

requestor. EcoXiP supports special read commands such as Read Array to allow faster access to data by

reducing the number of clocks needed for subsequent reads of data. As shown in Figure 5, the Read Array

command with Octal SPI and DDR reduces the number of clock cycles needed for passing the command

and address data. An 8-bit command and 24-bit address are passed with only 3 clocks. Then subsequent

accesses to sequential data are available. All of these serial flash features (read array command, DDR, fast

clock speeds and Octal SPI) work to support the XiP use case.

Figure 5: Read Array Command
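The clock savings can be checked with a short calculation. This sketch assumes one way the 3 clocks could be split: the 8-bit command occupies one clock on eight lines, and the 24-bit address is sent DDR on eight lines (16 bits per clock, rounded up to whole clocks). That split is an assumption for illustration; the device datasheet is authoritative.

```python
import math


def header_clocks(cmd_bits: int, addr_bits: int, lines: int,
                  cmd_ddr: bool, addr_ddr: bool) -> int:
    """Clocks needed to send the command and address before any data moves."""
    cmd_clocks = math.ceil(cmd_bits / (lines * (2 if cmd_ddr else 1)))
    addr_clocks = math.ceil(addr_bits / (lines * (2 if addr_ddr else 1)))
    return cmd_clocks + addr_clocks


if __name__ == "__main__":
    single = header_clocks(8, 24, lines=1, cmd_ddr=False, addr_ddr=False)
    octal = header_clocks(8, 24, lines=8, cmd_ddr=False, addr_ddr=True)
    print(f"single-line SPI header: {single} clocks")
    print(f"Octal DDR header (assumed split): {octal} clocks")
```

Under these assumptions the header shrinks from 32 clocks on single-line SPI to 3 clocks, consistent with the figure quoted above.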

Application use cases

Beyond addressing the performance of eXecute in Place operation, there are other unique features in

EcoXiP to support application use cases. EcoXiP’s concurrent read-write, also known as read-while-write

or RWW, allows the host processor to continue reading from a partition of the flash memory array while

modifying data on another part. As an example, periodic logging of data which involves erase and program

operations to the serial flash does not put the XiP program on hold. With the RWW feature, instruction and

data fetching during programming continues as usual in a different partition of the flash. This scheme allows

read operations from one bank while the device is busy programming or erasing another bank. The serial

flash device can be configured into two banks: Bank A and Bank B. The border between the banks can be

set with a granularity of 1/8th of the full flash array size. Read commands to one bank can be done while a

write is in progress in the other bank.
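The bank rule can be modeled in a few lines. This is a minimal sketch that assumes Bank A occupies the lower addresses and that the border is expressed as a number of eighths of the array; the function names and the example partitioning are hypothetical.

```python
def bank_of(address: int, array_bytes: int, boundary_eighths: int) -> str:
    """Return 'A' or 'B'. Bank A is assumed to cover the first
    boundary_eighths/8 of the array and Bank B the remainder."""
    if not 0 < boundary_eighths < 8:
        raise ValueError("boundary must leave both banks non-empty")
    if not 0 <= address < array_bytes:
        raise ValueError("address outside the flash array")
    boundary = array_bytes * boundary_eighths // 8
    return "A" if address < boundary else "B"


def read_allowed_during_write(read_addr: int, write_addr: int,
                              array_bytes: int, boundary_eighths: int) -> bool:
    """A read can proceed only if it targets the bank that is not being written."""
    return (bank_of(read_addr, array_bytes, boundary_eighths)
            != bank_of(write_addr, array_bytes, boundary_eighths))


if __name__ == "__main__":
    ARRAY = 16 * 1024 * 1024   # 128 Mbit device
    BORDER = 6                 # hypothetical split: 12 MB of code, 4 MB of log data
    code_fetch = 0x0000_1000
    log_write = 13 * 1024 * 1024
    print(read_allowed_during_write(code_fetch, log_write, ARRAY, BORDER))
```

In this example the XiP code lives in Bank A and the data log in Bank B, so instruction fetches continue while a log entry is being programmed.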

The XiP architecture also provides advantages to systems which leverage power-down modes to save energy.

Unlike execute-from-RAM scenarios, wake up from very low-power modes is much faster. There is no need

to copy from a non-volatile memory device into the SRAM execution memory. The system can be set to start

executing immediately from external flash. The flash standby power consumption is significantly lower than

DRAM systems due to NOR flash memory technology.

In general, the serial flash leakage of Adesto’s memory devices is so low that there is no need to turn the

flash completely off. Devices like EcoXiP offer deep power-down and ultra-deep power-down modes which

result in an extremely low power consumption with only a small impact to wake up time. As shown in Table

1, there are power modes that draw as little as 200 nA. The resulting energy consumption (current over time) is

significantly lower than what would be required to copy the code into RAM for DRAM based architectures

which may require self-refresh.

| Parameter | EcoXiP Specifications |
|---|---|
| Densities | 32 Mbit (4 MByte), 64 Mbit (8 MByte), and 128 Mbit (16 MByte) |
| Interface | Quad/Octal, SDR/DDR |
| Read Bandwidth (max) | 133 MB/s |
| Power Supply | 1.7 V to 1.95 V |
| Max. Operating Frequency | 133 MHz |
| Temperature Range (Ta) | -40 °C to 85 °C |
| Temperature Range (Tj) | -40 °C to 105 °C |
| Supply Current (Ultra Deep Power Down) | 200 nA |
| Supply Current (Deep Power Down) | 4 µA |
| Supply Current (Standby) | 35 µA |
| 1.8V Supply Current (Octal DDR) | 35 mA |
| 1.8V Supply Current (Program/Erase) | 15 mA |

Table 1: Adesto EcoXiP Specifications

When not in power-down mode, EcoXiP offers competitive power consumption for active mode while

reading from memory and sending data to the host processor. The savings can be as much as half compared

to similar Octal SPI devices in the market. For 133MHz Octal SPI reads, the Adesto EcoXiP read current is

typically 35mA.
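The supply currents in Table 1 can be combined into a rough average-current estimate for a duty-cycled system. The sketch below is a simplification: the 1% active fraction is a hypothetical workload, and wake-up time and transition energy are ignored.

```python
# Supply currents from Table 1, in amperes.
READ_OCTAL_DDR_A = 35e-3
STANDBY_A = 35e-6
DEEP_POWER_DOWN_A = 4e-6
ULTRA_DEEP_POWER_DOWN_A = 200e-9


def average_current_a(active_fraction: float, active_a: float, sleep_a: float) -> float:
    """Time-weighted average flash current for a simple two-state duty cycle."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_a + (1.0 - active_fraction) * sleep_a


if __name__ == "__main__":
    ACTIVE_FRACTION = 0.01  # hypothetical: flash is read 1% of the time
    for name, sleep in [("standby", STANDBY_A),
                        ("deep power-down", DEEP_POWER_DOWN_A),
                        ("ultra-deep power-down", ULTRA_DEEP_POWER_DOWN_A)]:
        avg = average_current_a(ACTIVE_FRACTION, READ_OCTAL_DDR_A, sleep)
        print(f"{name:22s}: {avg * 1e6:8.3f} uA average")
```

The lower the active fraction, the more the sleep-mode current dominates the average, which is where the power-down modes pay off.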

Flash devices offer security features as well. For example, EcoXiP contains a specialized OTP (One-Time

Programmable) security register that can be used for purposes such as unique device serialization, system-level

Electronic Serial Number (ESN) storage, locked key storage, etc. This register can be programmed

but not erased, so only a one-direction transition is possible for each bit. In addition, this register can be

permanently locked.
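The one-direction behavior of such a register can be illustrated with a small model. This sketch assumes the usual NOR flash convention that the erased state is all ones and programming can only clear bits (1 to 0); the class name and register size are hypothetical and do not describe the EcoXiP register layout.

```python
class OtpRegister:
    """Toy model of a one-time-programmable register (assumed 1 -> 0 only)."""

    def __init__(self, size_bytes: int = 8):
        self._data = bytearray([0xFF] * size_bytes)  # assumed erased state: all ones
        self._locked = False

    def program(self, offset: int, value: bytes) -> None:
        if self._locked:
            raise PermissionError("register is permanently locked")
        if offset < 0 or offset + len(value) > len(self._data):
            raise ValueError("write outside the register")
        for i, byte in enumerate(value):
            # AND-ing means a bit that is already 0 can never return to 1.
            self._data[offset + i] &= byte

    def lock(self) -> None:
        self._locked = True  # no unlock method: the lock is one-way

    def read(self) -> bytes:
        return bytes(self._data)


if __name__ == "__main__":
    reg = OtpRegister(2)
    reg.program(0, b"\xF0")
    reg.program(0, b"\x0F")   # cannot restore the upper bits; result is 0x00
    print(reg.read().hex())
```

Because there is no erase path, writing a second value over the first does not replace it; the stored result is the bitwise AND of everything ever programmed.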

Flash devices are supported by the embedded development ecosystem in different ways. EcoXiP provides

flash-loader plug-ins for various embedded tool chains. The flash loader is engaged by the integrated

development environment once it detects that a program’s binary image, or part of it, falls into the flash

memory address range. It will initialize the flash and erase and program memory regions on-demand

as requested by the host tool. In this context, it’s worth mentioning a new feature called Serial Flash

Discoverable Parameter (SFDP) which provides useful information about the flash in a standardized way. This

allows the host to automatically figure out flash attributes and set up the interface accordingly. In theory,

one could develop a universal flash loader which would work on all serial flash devices. An update of SFDP

to support the new Octal-SPI (xSPI) standard has been recently ratified by the JEDEC JC42 committee.
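As an illustration of what discovery looks like on the host side, the sketch below decodes the 8-byte SFDP header. It assumes the header layout published in the JEDEC SFDP specification (JESD216): a 4-byte "SFDP" signature, followed by minor revision, major revision, a zero-based count of parameter headers, and an access-protocol byte. The sample bytes are made up for the example and do not come from an EcoXiP device.

```python
import struct


def parse_sfdp_header(raw: bytes) -> dict:
    """Decode the first 8 bytes of the SFDP table (layout assumed per JESD216)."""
    if len(raw) < 8:
        raise ValueError("SFDP header is 8 bytes long")
    signature, minor, major, nph, access = struct.unpack_from("<4sBBBB", raw, 0)
    if signature != b"SFDP":
        raise ValueError("SFDP signature not found")
    return {
        "revision": (major, minor),
        "parameter_headers": nph + 1,  # the stored field is zero-based
        "access_protocol": access,
    }


if __name__ == "__main__":
    # Hypothetical header: revision 1.6, two parameter headers.
    sample = b"SFDP" + bytes([0x06, 0x01, 0x01, 0xFF])
    print(parse_sfdp_header(sample))
```

A flash loader would read these bytes with the SFDP read command, then walk the parameter headers to locate the tables that describe density, erase sizes and supported interface modes.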

i.MX RT: ADVANCED PROCESSOR ARCHITECTURE

Contributing to the support of the external serial flash in embedded systems are the advanced processor

architectures which are now available. For example, the i.MX RT crossover processor is built with the highest-

performance Arm® Cortex-M® processor, the Arm Cortex-M7. This CPU can execute up to two instructions

every clock cycle and supports 6-stage pipelining, improving computational ability versus other CPUs in

its class. The high-performance CPU ensures that even though slower memory accesses may stall the CPU,

the high compute power is delivered when data is made available. In addition to the CPU, the internal bus

system associated with this class of processor is the same as what has previously been used for higher-end

controllers built with Arm Cortex-A family of devices.


The diagram in Figure 6 represents the architectural details of the i.MX RT 1050 crossover processor. With

regards to cache, the i.MX RT integrates 32KB for the instruction and 32KB for the data caches. This is the

largest size in the market and reduces the CPU’s sensitivity to any delays imposed by slower memories. For

the Tightly Coupled Memory (TCM), the i.MX RT has a FlexRAM block of memory. This intelligent RAM

memory controller allows customization of the TCM up to the largest sizes available on the chip. The user

can select the maximum size, or repurpose the FlexRAM to work as on-chip SRAM to be shared with other

chip peripherals. Having a large TCM allows software architects to choose this memory option for the

portions of their code which need the absolute maximum performance. Software placed in the TCM will

achieve the lowest latency access times, producing the highest performance.

Figure 6: i.MX RT Architecture Diagram (600 MHz Arm® Cortex®-M7 processor with 32 KB I-Cache, 32 KB D-Cache and FlexRAM, connected through the AXI and AHB interconnects to the on-chip masters, SEMC, FlexSPI and peripherals)

With regards to the use of the 64-bit AXI on the i.MX RT, there are a broad range of AXI masters which

are integrated onto the chip. The AXI bus is a split-transaction protocol and supports multiple outstanding

transfers. Some specific peripherals to highlight which are relevant to emerging application trends are the

camera interface and cryptographic accelerator (Data Co-Processor-DCP). These components differentiate

the i.MX RT in the market and align with the need for image processing capabilities and security. The FlexSPI

controller allows for these other masters to make use of the receive buffer. This allows the data stored in the

external flash to be quickly accessed as with the case of displaying graphics on a screen.

Finally, most relevant to the computational capabilities of the i.MX RT with external flash is the processor

speed. Reaching 600MHz allows the i.MX RT to be throttled up for the most intensive calculations. Once

data is available to the processor, it is processed at the CPU speed. With all of these capabilities working

together, the end result is a processor using XiP that can achieve high performance and is expandable to

a nearly limitless memory footprint. Figure 7 details how the CPU and the FlexSPI work together to reduce

stalling the flow of application code. Starting at stage 0, the figure represents the case of a full miss of the

target data and subsequent prefetching done by the FlexSPI. The stages show how the levels of cache and

buffers have to be missed to stall the CPU.

0: CPU fetches a target address for instruction or data

1: CPU cache is checked for the target address

2: Cache miss leads to bus access at target address

3: Bus access seeks data from FlexSPI buffer

4: Prefetch buffer miss leads to FlexSPI read sequence for pre-initialized read command to external flash

5: High performance serial flash reduces access latency with high speeds, double transfer rates and up to 8 data lines

6: The prefetch from the FlexSPI accelerates all subsequent reads; even a full miss, with no cached data, will be accelerated by prefetching

Figure 7: XIP Memory Access Stages
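The stages in Figure 7 can be mimicked with a small software model. This is a deliberately simplified sketch: the cache is unbounded, the prefetch buffer is modeled as a single 1 KB window filled instantly, and timing is not modeled at all. The 32-byte cache line and 1 KB buffer follow the sizes given in this paper; the class and method names are hypothetical.

```python
CACHE_LINE = 32        # bytes per Cortex-M7 cache line
PREFETCH_BYTES = 1024  # FlexSPI prefetch buffer size on the i.MX RT1050


class XipModel:
    """Counts where each fetch is served: cache, prefetch buffer, or flash."""

    def __init__(self):
        self.cached_lines = set()   # simplification: cache never evicts
        self.buffer_start = None    # start address of the buffered window
        self.stats = {"cache_hit": 0, "buffer_hit": 0, "flash_read": 0}

    def fetch(self, address: int) -> str:
        line = address // CACHE_LINE
        if line in self.cached_lines:                       # stage 1
            outcome = "cache_hit"
        else:                                               # stage 2: cache miss
            in_buffer = (self.buffer_start is not None and
                         self.buffer_start <= address < self.buffer_start + PREFETCH_BYTES)
            if in_buffer:                                   # stage 3
                outcome = "buffer_hit"
            else:                                           # stages 4 to 6
                self.buffer_start = line * CACHE_LINE
                outcome = "flash_read"
            self.cached_lines.add(line)
        self.stats[outcome] += 1
        return outcome


if __name__ == "__main__":
    model = XipModel()
    for addr in range(0, 2048, 4):  # straight-line code, one 4-byte fetch at a time
        model.fetch(addr)
    print(model.stats)
```

For 2 KB of straight-line code fetched 4 bytes at a time, only two of the 512 fetches in this model start a new flash read; the rest are served by the cache or the prefetch buffer, which is the effect the figure describes.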

UNDERSTANDING XIP PERFORMANCE

As detailed in the previous sections, the technology associated with the processor and the external NOR

Flash memory is built to obscure the latencies involved with using XiP. This presents a challenge with

regards to fully understanding the performance impact for this architecture. For example, with the FlexSPI

receive buffer, each read access made to the serial flash can range from one cache line (32 bytes for the

Arm Cortex-M7) up to the full size of the receive buffer. The receive buffer is 1KB for the i.MX RT1050

processor. The maximum size of the read transaction to completely fill the receive buffer is preset as part of

the configuration of the FlexSPI.

Because of the receive buffer, a small code loop, such as an iterative mathematical calculation or a case

statement, no longer depends on additional data once a few cache lines have been pulled from external

memory. At this point, the processor is executing from buffered data, while receive data continues to be

drawn from the serial flash until the preset buffer size has been filled. Because of this,

traditional methods of monitoring memory accesses as an indication of performance do not apply. High

performance is achieved even with high access rates to the external memory, so performance cannot be

directly correlated to the number of external memory accesses made by the system.

In addition, many standard industry benchmarks are relatively small programs. These programs often fit in

the caches integrated on the processor. As such, they don’t represent full scale applications which push

memory size boundaries. Thus, in order to understand the expected performance levels for XiP, various

methods have to be applied. These are divided into the following three cases: throttling, instrumenting and

evaluating example application code.

Throttling Test Case

The throttling test case simulates a scenario where a change in program execution would result in processor

accesses which are all outside the CPU cached data. For throttling test cases, the industry standard

benchmark EEMBC CoreMark® is used. This benchmark is first placed in zero latency TCM to produce the

ideal case CoreMark score. This is the control measurement. Then, the benchmark is run in external serial

flash while periodically invalidating the instruction cache at set intervals. This method has the advantage of

relating to a standard benchmark (CoreMark). The generated results can be compared to the many

publicly posted results that are hosted by EEMBC.

The drawbacks to this method are that for typical application code, such drastic changes to program flow

would rarely lead to a scenario where all of the CPU instruction cache would be invalidated. Estimating the

rate at which the cache should be invalidated is challenging. Regardless of these limitations, this test case

provides insight into how the technology enables high performance with XIP. The results show that with

feature rich serial NOR flash devices such as the Adesto EcoXiP set for Octal SPI and double data rate,

performance is only slightly affected by the CPU cache invalidation events.

Figure 8 shows measurements taken with various cache invalidation rates (1ms, 500us, 250us and 125us).

There are two different serial flash conditions: the orange line represents a single data rate, 4 I/O serial

flash, and the blue line represents the Adesto EcoXiP set for Octal SPI and DDR. The chart shows the

performance advantage of high performance serial flash like Adesto EcoXiP versus slower, lower pin count

flash. Considering the 1ms invalidation rate, there is just over a 3% impact to the CoreMark benchmark.

The 1ms condition is a relevant test case as the typical RTOS tick rate is set to 1ms. Even lower-performing

serial flash devices represented by the orange line have a minimal impact at this rate, delivering 88% of the

CoreMark score versus the ideal case. When considering more extreme cases where CPU cache invalidation

occurs 8 thousand times per second for example, the higher-performance technology delivers nearly 83% of

the performance compared to the ideal case.

EEMBC CoreMark Throttling

| Test condition | Quad-SDR 102 MHz | Octal-DDR 131 MHz |
|---|---|---|
| Ideal case | 2950 | 2950 |
| CoreMark, 1 ms I$ invalidate | 2599 | 2857 |
| CoreMark, 500 us I$ invalidate | 2241 | 2770 |
| CoreMark, 250 us I$ invalidate | 1868 | 2616 |
| CoreMark, 125 us I$ invalidate | 1490 | 2445 |

Figure 8: Throttling CPU Cache Results

For the case of invalidating the CPU cache every 125 microseconds, the end result still achieves a 2,445

CoreMark score. This is significantly higher than many other processors in the market.
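The percentages quoted in this section can be reproduced from the scores in Figure 8. In the sketch below, the pairing of the 500 us and 250 us values with each flash type is inferred from the chart ordering and should be checked against the original figure; the 1 ms and 125 us values match the percentages stated in the text.

```python
IDEAL_SCORE = 2950  # CoreMark score when running from zero-latency TCM

# Scores per invalidation interval: (Quad-SDR 102 MHz, Octal-DDR 131 MHz).
SCORES = {
    "1 ms": (2599, 2857),
    "500 us": (2241, 2770),   # pairing inferred from the chart
    "250 us": (1868, 2616),   # pairing inferred from the chart
    "125 us": (1490, 2445),
}


def percent_of_ideal(score: int, ideal: int = IDEAL_SCORE) -> float:
    """Score expressed as a percentage of the ideal-case score."""
    return 100.0 * score / ideal


if __name__ == "__main__":
    for interval, (quad, octal) in SCORES.items():
        print(f"{interval:>6s}: Quad-SDR {percent_of_ideal(quad):5.1f}%  "
              f"Octal-DDR {percent_of_ideal(octal):5.1f}%")
```

The Octal-DDR results work out to roughly 96.8% of ideal at 1 ms and 82.9% at 125 us, and the Quad-SDR result to roughly 88.1% at 1 ms, in line with the figures given above.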

Instrumenting Test Case

In order to evaluate performance without using a drastic cache invalidation, code can be instrumented in

a way to allow cache misses to occur more naturally. For the instrumenting test case, a large block of

code is placed in sequential address space which is larger than the size of the CPU cache. So when there

is a cache miss, it is due to a more natural software execution scenario. This method involves creating a

number of smaller loops which can be set to execute a variable number of times (n). These smaller loops

are concatenated together to create a sequential code block that is larger than the CPU cache. When the

smaller loops are executed more frequently, by setting larger values of n, there are more cache hits. When

the smaller loops are executed less frequently, then there are more cache misses.

Figure 9 is a graphical representation of this method. For the purpose of creating a measurement to

evaluate, Fibonacci calculations were used. As shown in the diagram, the processing of each block always

requires one pass of the Fibonacci calculation loop leading to cache misses for that pass. When the CPU first

reaches a Fibonacci block, the first iteration will be cache misses, but all subsequent passes will be executed

from cached data. For the case of n = 10, the first Fibonacci calculation is a miss and the subsequent 9

Fibonacci calculations are cache hits. For the case of n = 30, the first Fibonacci calculation is a miss and the

subsequent 29 Fibonacci calculations are cache hits.
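The relationship between n and the cache hit rate is simple enough to write down directly. A minimal sketch, with hypothetical function names, that treats each pass through a block as either a miss (the first pass) or a hit (every later pass):

```python
def hit_fraction(n: int) -> float:
    """Fraction of passes through one block that execute from cache: (n - 1) / n."""
    if n < 1:
        raise ValueError("each block runs at least once")
    return (n - 1) / n


def passes_per_sweep(blocks: int, n: int) -> tuple:
    """(miss passes, hit passes) for one sweep over all blocks."""
    return blocks, blocks * (n - 1)


if __name__ == "__main__":
    for n in (10, 20, 30):
        misses, hits = passes_per_sweep(1024, n)
        print(f"n = {n:2d}: {misses} miss passes, {hits} hit passes, "
              f"hit fraction {hit_fraction(n):.3f}")
```

The number of miss passes per sweep stays fixed at one per block, so raising n only adds hit passes; this is how the method dials the miss rate down from 10% at n = 10 to about 3.3% at n = 30.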

Figure 9: Instrumented Code (blocks "n Fibonacci 1" through "n Fibonacci 1024" placed sequentially, spanning more than the I-Cache)

Measurements were taken for 10, 20 and 30 iterations of the Fibonacci calculations. Measurements of the

total number of Fibonacci calculations are taken with different memory space location and different types of

serial flash. Higher performance is represented by a higher number of Fibonacci calculations. As shown in

Figure 10, at 30 iterations, the impact to the number of Fibonacci calculations is just over 15% reduction.

Figure 10: Results of Instrumented Code (Fibonacci Comparison: number of Fibonacci calculations at 10, 20 and 30 iterations per loop for Quad-SDR 102 MHz, Octal-DDR 131 MHz and RAM; at 30 iterations per loop the values are 19448, 22529 and 26610 respectively)

As the cache miss rate increases, the data shows that high-performance serial flash is affected less than

standard serial flash (compare the orange and gray bars in Figure 10). Though

this method allows precise control over the cache miss rate, it does not fully represent standard application

code. The cache miss rate on standard application code can vary broadly depending on use case.
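
The "just over 15%" figure quoted for 30 iterations can be checked against the Figure 10 data labels. The sketch below assumes that the three labels in the 30-iteration group follow the legend order (Quad-SDR 102 MHz, Octal-DDR 131 MHz, RAM). That mapping is an inference from the chart layout, and the helper name `reduction_vs_baseline` is hypothetical.

```python
# Data labels for the 30-iterations-per-loop group in Figure 10.
# Assumption: the labels map to the legend in order
# (Quad-SDR 102 MHz, Octal-DDR 131 MHz, RAM).
FIB_30 = {
    "Quad-SDR 102 MHz": 19448,
    "Octal-DDR 131 MHz": 22529,
    "RAM": 26610,
}


def reduction_vs_baseline(count, baseline):
    """Percentage drop in completed Fibonacci calculations relative to baseline."""
    return 100.0 * (baseline - count) / baseline


baseline = FIB_30["RAM"]
for name, count in FIB_30.items():
    print(f"{name}: {reduction_vs_baseline(count, baseline):.1f}% below RAM")
```

With this mapping the Octal-DDR result is about 15.3% below RAM, which agrees with the figure quoted in the text, and the Quad-SDR result would be about 26.9% below RAM.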

Examining example applications

To overcome the limitations of instrumented code and cache throttling, running an example application in

different target memory scenarios offers additional proof points for XiP performance. This test

scenario is easily accomplished because XiP is enabled through the MCUXpresso Integrated Development

Environment (IDE). MCUXpresso IDE projects can be created to place software into the zero-latency

TCM. After performing measurements, the same software can be placed in the external serial

NOR flash space and measured again. There are many example projects to choose from in the software

development kits (SDKs) offered by NXP. The entire process, with the measured results, is detailed in a

step-by-step lab guide (see the link in the Resources section). This guide gives developers the opportunity

to explore these methods themselves. The measurements can be made with the provided SDK application

examples or with the final application software created by the developer.

For the case demonstrated by the lab guide, Arm Mbed TLS benchmarking of the Elliptic Curve Digital

Signature Algorithm (ECDSA) was performed. The results show that with the CPU cache enabled, this specific

benchmark measures no difference between executing from ITCM and executing from external flash. Whether

executing from the best-case memory, the TCM, or from external serial flash with XiP, the ECDSA

benchmark application shows the same results.

In a different case, with the MCUXpresso compiler optimizations set for performance, the measured

ECDSA performance is less than 6% lower for the XiP case. Changing the compiler settings

makes the generated machine code much more compact. The end result is

approximately a 4x improvement for the ECDSA calculations. As the code becomes more optimized, the

throughput provided by the external serial flash begins to affect the measured performance, leading to the

slight impact when using XiP.
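
One way to see why faster code makes the flash more visible is a toy timing model. The sketch below is illustrative only: splitting the run time into a compute part and a fixed fetch-stall part is an assumption, the function name `xip_slowdown_percent` is hypothetical, and the numbers are made up rather than taken from the lab guide.

```python
def xip_slowdown_percent(compute_time, stall_time):
    """Throughput loss (%) when a fixed stall is added to each operation.

    Assumption: executing from TCM costs compute_time per operation, and
    executing from external flash costs compute_time + stall_time.
    """
    if compute_time <= 0 or stall_time < 0:
        raise ValueError("compute_time must be > 0 and stall_time >= 0")
    tcm_rate = 1.0 / compute_time
    xip_rate = 1.0 / (compute_time + stall_time)
    return 100.0 * (tcm_rate - xip_rate) / tcm_rate


# Illustrative values in arbitrary time units, not measurements.
stall = 1.0
for label, compute in (("unoptimized", 100.0), ("4x faster", 25.0)):
    print(f"{label}: {xip_slowdown_percent(compute, stall):.1f}% lower with XiP")
```

In this model the same stall takes a larger share of each operation once the compute time shrinks, which is the direction of the effect described above. It does not attempt to reproduce the measured values.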

DEVELOPMENT AND DEBUG WITH XIP

As demonstrated by the lab guide, other XiP experiments can be performed with the enablement

provided by MCUXpresso. For example, the speed of the external memory can be varied by changing

definitions inside the project. The MCUXpresso platform provides the tools needed to quickly examine this

and other scenarios, allowing the developer to fully leverage the benefits of the expandable XiP architecture.

For downloading and debugging application software, the MCUXpresso IDE is preset to allow a seamless

connection to the serial flash components placed on the i.MX RT Evaluation Kit (EVK). When a debug

session is initiated by the user, the flash loader scripts are automatically used by the debug tool. In addition

to the development tools, the off-the-shelf configuration of the i.MX RT EVK hardware has both a

high-performance 8-wire SPI and a 4-wire SPI. With both of these serial flash options placed on the board,

the user can choose the right attributes for their end design.

Figure 11: Selecting Adesto Serial Flash

When importing SDK projects into MCUXpresso, the choice of the serial flash hardware is made based

on the memory settings in the memory configuration editor. The lab guide provides the detailed steps to

choose the Adesto flash during the import as highlighted in Figure 11. With a special edition of the i.MX RT

EVK that has the Adesto EcoXiP placed on the board, nearly all of the SDK examples can be run and debugged

with the Adesto external flash. The operation of the enablement tools with the crossover processor is just as

it would be for a traditional microcontroller which contains embedded flash.

CONCLUSIONS

External memory for an embedded processor offers a scalable platform aligning to the challenges of

today’s embedded systems. When using external serial flash memory, success can be achieved with the

right processor and memory technology. Modern Arm CPUs integrate cache that greatly enhances the use

of external memory. In addition, processor designs are architected to use execute in place with memory

controllers, such as the FlexSPI memory controller which provides buffering and prefetch. Coupling this

with the enhanced capabilities offered by serial NOR flash addresses cost, power, performance and security

challenges. Furthermore, the infrastructure provided by tools such as MCUXpresso gives developers the

ability to get from concept to deployment quickly and efficiently.

RESOURCES

The following table includes links to resources which support developer investigation into using XIP.

Resource                    Description
Processor summary page      The i.MX RT1050 family summary page provides links to chip documents (Data Sheet and Reference Manual)
Hardware evaluation kit     The i.MX RT EVK provides a platform for embedded development. Multiple boot interfaces are supported
Software SDK                The MCUXpresso SDK is the software enablement which provides drivers and middleware for the i.MX RT
Arm Cortex-M7 Whitepaper    Detailed description of the Arm Cortex-M7 CPU
MCUXpresso IDE training     Training material to understand the MCUXpresso Integrated Development Environment features
Using XIP Lab Guide         This is the lab guide mentioned in this paper which provides the detailed steps for experimenting with XIP

CONTRIBUTOR

Wim Rouwet

Systems and Architecture Engineer

www.nxp.com

NXP and the NXP logo are trademarks of NXP B.V. All other product or service names are the property of their respective owners. The

Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks

licensed by Power.org. Arm is a registered trademark of Arm Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved.

© 2018 NXP B.V.

Document Number: NXPADESTOWP REV 0

Release Date: September 2018
