TECHMAN Electronics
1
Confidential
TECHMAN XC100 NVMe SSD
Technical White Paper
v1.0
April 2016
Techman reserves the right to change products, information and specifications
without notice.
Information in this document is provided in connection with Techman products.
No license, express or implied, by estoppel or otherwise, to any intellectual property
rights is granted by this document. Except as provided in Techman's terms and
conditions of sale for such products, Techman assumes no liability whatsoever and
Techman disclaims any express or implied warranty, relating to sale and/or use of
Techman products including liability or warranties relating to fitness for a particular
purpose, merchantability, or infringement of any patent, copyright or other
intellectual property right. Unless otherwise agreed in writing by Techman, the
Techman products are not designed nor intended for any application in which the
failure of the Techman product could create a situation where personal injury or
death may occur.
All brand names, trademarks and registered trademarks belong to their
respective owners.
Revision History
Version 1.0
Date Apr, 2016
Author Ilong.Hsiao, Ted.Hsieh
Approver Ilong.Hsiao
Amendment Robert.Hsiao
Contents
Overview
Part 1: High Performance Hardware
1-1: Multi-core Computing
1-2: Multi-channel Flash Controller
1-3: Multi-queue Engines
1-4: Embedded XOR & Randomizer
1-5: Strong BCH ECC
Part 2: Advanced NAND Flash Management
2-1: Bad Block Management
2-2: Read Disturb Policy
2-3: Data Retention Policy
2-4: Smart Read Retry Policy
Part 3: Data Integrity Guarantee
3-1: End-to-end Data Protection
3-2: Adaptive RAID Data Protection
3-3: Thermal Throttling Protection
3-4: Power Loss Protection
3-5: Firmware & Metadata Protection
Part 4: Intelligent Firmware Management
4-1: High Performance FTL
4-2: Global Wear Leveling
4-3: Efficient Garbage Collection
4-4: Fast Power-on Rebuild
4-5: TRIM support
4-6: Intelligent Write Flow Control
4-7: Intelligent Read Sequence Control
Part 5: Dual Port for High Availability
OVERVIEW
The digital universe is exploding. The data in the whole world is expected to
reach 17 zettabytes in 2017 and 44 zettabytes in 2020, driven by the emergence of the IoT.
Some 90% of the data on Earth was generated within the last 2 years. According to IDC,
every 60 seconds there will be: 72 hours of video uploaded to YouTube, 350 GB of
data generated on Facebook, 571 new websites created, 277,000 tweets on Twitter,
100 million emails sent, and over 2 million Google search queries.
Whether it is a YouTube broadcaster streaming a live game or a seismology
professor analyzing earthquake data, both require fast and stable processing, i.e.,
consistent, low-latency IO. To fulfill such requirements, the server
systems in charge of these workloads must enhance the capability of both their computing
cores and their storage devices. The traditional HDD is becoming a performance
bottleneck due to its extremely high latency. A PCIe SSD, whose latency is
second only to DRAM, is the best option to avoid such an IO bottleneck.
To be ready for the era of High Speed Computing, covering Cloud services, Big Data
Analysis, Online Transaction Processing, and High-Frequency Financial Trading, storage
devices must evolve in step with computing processors to avoid
becoming an obstacle to overall performance. As a result, Techman SSD has
decided to focus on Enterprise-grade storage design and development. Based on PCI
Express Gen3 x4, Techman XC100 further builds on the NVM Express (NVMe) protocol,
supporting a much higher volume of command queues. With our optimized designs,
Techman XC100 guarantees high-speed processing with very stable response times
over long operation periods.
The following chapters describe the technologies Techman XC100 has designed and
delivered to achieve the highest performance with great consistency. Please
enjoy.
PART 1: HIGH PERFORMANCE HARDWARE
Multi-core Computing
The evolution of the SSD controller closely mirrors that of the whole
computer industry: the more processors an SSD supports, the higher the performance it
delivers. To keep up with the system's increasing performance, an SSD must
increase its processor/core count accordingly.
The controller adopted by XC100 supports a 16-core architecture. With these 16
cores, the commands/threads from the host system can be processed in parallel at
high speed. Among the 16 cores, certain cores have their own dedicated
managers to handle specific functions, e.g., the Boot Processor Core with its ROM
manager. All threads communicate via Inter-Process Communication (IPC), which
transfers at high speed and allows information sharing.
Furthermore, the SRAM and DRAM inside XC100 are shared among all 16
cores and threads, freeing up cache memory and CPU resources.
Together, the 16-core architecture, IPC, dedicated manager functions, and
the shared cache design allow multiple requests and commands from the host system
to be handled quickly and efficiently, meeting the high-performance requirement.
Multi-channel Flash Controller
For an SSD, the more flash channels it controls, the higher the performance it delivers.
XC100 supports up to 16 channels of NAND Flash control.
The XC100 controller utilizes all 16 channels simultaneously during
operation. All the Read/Program/Erase commands and data from the host system will
be coordinated and distributed evenly, through XC100's Flash Interface, across all
16 channels. With multi-channel coordination and distribution, Quality of
Service (QoS) is guaranteed.
Multi-queue Engines
Today, deploying multiple high-performance processors is a basic requirement
not only for a server system but also for a personal computer. Thanks to the
development of NVM Express (NVMe), the interface protocol can now carry
far more command queues from host to storage device than before.
Without careful design, the storage device itself risks becoming the bottleneck
of overall performance.
To absorb this rapid flood of IOs from the host, a storage device must be capable
of handling them at maximum speed. XC100 has already adopted a 16-core
controller and supports the NVMe protocol. Between the controller and the protocol,
XC100 further adds Multi-queue Engines to process these high-speed,
frequent IOs.
The Multi-queue Engines of XC100 include 1 Admin Queue, 128 Submission
Queues, and 128 Completion Queues. Each queue supports up to 1024 entries
(queue depth 1024).
IOs from the multiple cores of the host system are first latched in the Submission
Queue Engine and then distributed to XC100's multi-core controller for
processing. Once a command completes, the controller submits a completion queue entry to
the Completion Queue Engine and rings a notification interrupt (doorbell) to the host.
Finally, the host processes the completion queue entry in the Completion Queue Engine and
releases its resources. Simply put, the frequent IOs issued by the host's multiple cores,
passing through the NVMe protocol, are handled rapidly and efficiently by the
Multi-queue Engines of XC100.
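The submission/completion flow above can be sketched as a toy model. All names, the queue depth, and the status codes here are illustrative only, not XC100 internals:

```python
from collections import deque

class QueuePair:
    """Toy model of one NVMe-style submission/completion queue pair."""
    def __init__(self, depth=1024):
        self.depth = depth
        self.sq = deque()          # submission queue entries
        self.cq = deque()          # completion queue entries

    def submit(self, command):
        """Host side: latch a command; the doorbell ring is implicit here."""
        if len(self.sq) >= self.depth:
            raise RuntimeError("submission queue full")
        self.sq.append(command)

    def process_one(self):
        """Device side: pop a command, execute it, post a completion."""
        cmd = self.sq.popleft()
        self.cq.append({"cid": cmd["cid"], "status": 0})  # 0 = success

    def reap(self):
        """Host side: consume completion entries, freeing their slots."""
        done = list(self.cq)
        self.cq.clear()
        return done
```

With 128 such pairs and a multi-core controller draining them in parallel, many host cores can issue IOs without contending on a single queue.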
Embedded XOR & Randomizer
One interesting characteristic of NAND is that repeatedly storing identical
data patterns into the flash degrades data integrity and accuracy.
To avoid this symptom, a randomizer scheme helps:
a well-randomized data pattern stored in NAND Flash reduces
data errors during read-back. In addition, XC100 provides an XOR Calculator and an XOR
Engine for flash-aware RAID functionality, offering extra protection to increase
data integrity. The XOR Calculator computes the parity information for each Flash
RAID stripe, and the XOR Engine delivers high-performance Flash RAID rebuild
operations.
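The XOR parity idea is simple enough to show directly. This is a minimal sketch of stripe parity and rebuild; the real XOR Engine does this in hardware at line rate:

```python
def xor_parity(pages):
    """Compute byte-wise XOR parity over a RAID stripe of equal-size pages."""
    parity = bytearray(len(pages[0]))
    for page in pages:
        for i, b in enumerate(page):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_pages, parity):
    """Recover one lost page by XOR-ing the parity with the survivors."""
    return xor_parity(list(surviving_pages) + [parity])
```

Because XOR is its own inverse, any single lost page in the stripe can be reconstructed from the remaining pages plus the parity page.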
Strong BCH ECC
For error correction at the bit/byte level, XC100 adopts
the Bose-Chaudhuri-Hocquenghem (BCH) ECC scheme. This function supports
error correction of up to 100 bits within 4320 bytes of data.
With such capability, XC100 can easily fulfill: (1) the 40-bit/1000-byte ECC
requirement of the TOSHIBA 15nm MLC adopted in XC100; (2) the UBER ≤ 10⁻¹⁶
requirement in the JEDEC Enterprise SSD specification.
PART 2: ADVANCED NAND FLASH MANAGEMENT
Bad Block Management
There are always some unhealthy cells in NAND flash memory, whether present
from birth or developed through use. Blocks containing such cells are called
"Bad Blocks" and are no longer suitable for storing data. An SSD must therefore
continuously monitor and record the health of all blocks, from the beginning of
its life to its end.
There are 2 types of Bad Blocks: Original Bad Blocks (OBB) and Growth Bad Blocks
(GBB). OBB are those that already exist after the SSD manufacturing process, while
GBB are those generated during SSD runtime operation.
XC100 builds in processes and functions to manage Bad Blocks well. During SSD
manufacturing, the BURN-IN process of XC100 locates Bad Blocks by scanning
all cells in the NAND. Together with those flagged during the NAND vendor's
wafer and package process, all these OBB are marked before the SSD ships, so
customers never use them.
Once XC100 begins runtime operation in the field, it activates a
real-time monitor function to mark and record blocks that encounter: (1) Block Erase
failures; (2) Page Program failures. These 2 types of blocks are categorized as
GBB.
Via such Bad Block Management, XC100 marks and records all possible Bad
Blocks to assure the health of the whole SSD throughout its life span.
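The OBB/GBB bookkeeping described above amounts to two sets and a failure hook. This is a minimal sketch (the class and method names are illustrative, not XC100 firmware symbols):

```python
class BadBlockManager:
    """Track Original Bad Blocks (burn-in) and Growth Bad Blocks (runtime)."""
    def __init__(self, obb):
        self.obb = set(obb)        # found by BURN-IN scan / vendor marking
        self.gbb = set()           # accumulated during runtime operation

    def is_usable(self, block):
        """A block is usable only if it is in neither bad-block set."""
        return block not in self.obb and block not in self.gbb

    def report_failure(self, block, kind):
        """Mark a block as GBB after an erase or program failure."""
        if kind in ("erase", "program"):
            self.gbb.add(block)
```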
Read Disturb Policy
One of the most interesting characteristics of NAND flash memory is the Read
Disturb phenomenon. The electrons of cells adjacent to the cell BEING READ will be
influenced, resulting in data loss in the adjacent cells. This is the so-called "Read
Disturb". For example, when reading cell B, the NAND circuitry also applies 5V to its
adjacent cells A and C. After reading cell B 10,000 times or more, the data in
cell A or cell C might no longer be readable.
To avoid this phenomenon, XC100 will: (1) monitor and record the read count
of each block; (2) detect the error bits of the block being read; (3) refresh the block
with the Garbage Collection (GC) function based on the information from (1) and (2).
With these operations, XC100 keeps the Read Disturb phenomenon at bay.
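Steps (1)-(3) above can be sketched as a simple monitor. The 10,000-read limit comes from the example in the text; the error-bit limit of 50 is an illustrative assumption, not an XC100 parameter:

```python
READ_DISTURB_LIMIT = 10_000   # from the cell-B example above
ERROR_BIT_LIMIT = 50          # illustrative refresh threshold

class ReadDisturbMonitor:
    def __init__(self):
        self.read_counts = {}

    def on_read(self, block, error_bits):
        """Count the read; trigger a GC refresh when limits are exceeded."""
        self.read_counts[block] = self.read_counts.get(block, 0) + 1
        if (self.read_counts[block] >= READ_DISTURB_LIMIT
                or error_bits > ERROR_BIT_LIMIT):
            self.refresh(block)
            return True
        return False

    def refresh(self, block):
        """Data is rewritten to a fresh block via GC; counter restarts."""
        self.read_counts[block] = 0
```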
Data Retention Policy
The retention phenomenon is another interesting characteristic of NAND flash.
Under some conditions, e.g., a high-temperature environment or long power-off,
the data inside the NAND may disappear after a period of time.
The root cause is charge leakage from the floating gate after every page program.
The number of Program/Erase cycles (P/E cycles) also influences the retention time.
Even when certain cells have reached their P/E cycle limit (i.e., end of life, as
specified in the NAND datasheet), the data retention time of the SSD must still
fulfill the requirement defined by JEDEC: for client-grade MLC, 1 year at 30℃;
for enterprise-grade MLC, 3 months at 40℃.
So how does an SSD make sure its data retention fulfills the JEDEC requirements?
In XC100, we: (1) monitor the retention period of each block; (2) detect the bit error
rate of the block being read; (3) refresh the block with the GC function based on the
information from (1) and (2). Via these operations, XC100 assures that data retention
meets the JEDEC specification.
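The retention check reduces to an age-or-errors decision per block. A minimal sketch, where the 90-day limit reflects the enterprise-MLC JEDEC figure quoted above and the error-bit limit is an illustrative assumption:

```python
RETENTION_LIMIT_DAYS = 90     # enterprise-grade MLC: 3 months at 40 C

def needs_retention_refresh(programmed_day, today, error_bits,
                            error_limit=50):
    """Decide whether a block must be refreshed (rewritten via GC)
    before its data fades: either it has sat too long since being
    programmed, or its read-back error bits are climbing."""
    age_days = today - programmed_day
    return age_days >= RETENTION_LIMIT_DAYS or error_bits > error_limit
```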
Smart Read Retry Policy
NAND flash cell quality degrades with all operations, e.g., P/E cycling,
Read/Write disturbance, retention, and temperature. The cell voltage distribution
also shifts, which means the read threshold voltage (Vth) may require adjustment
to determine whether a cell holds 0 or 1. Under such circumstances, the read process
may require several retries to complete.
However, whenever a Read Retry occurs, it impacts the SSD's performance, so
minimizing the retry impact is an important task for SSD designers. XC100 supports
a Smart Read Retry scheme to protect data integrity. Our scheme includes:
(a) Apply a fast, adjustable Vth setting to read back data even with error bits.
(b) Apply the previous Vth as the initial (near-optimal) value to reduce retry time and latency.
(c) Refresh the data block once error bits exceed preset limits.
(d) Use the flash-level RAID function as the last resort for data recovery.
Via this scheme, Read Retry will not influence the overall performance of
XC100.
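Points (a), (b), and (d) above can be sketched as a retry loop. The function and parameter names are illustrative; `read_fn` stands in for a raw NAND read at a given Vth that returns `None` when ECC fails:

```python
def read_with_retry(read_fn, vth_candidates, last_good_vth=None):
    """Try the last successful Vth first (point b), then sweep the
    remaining candidates (point a). Returns (data, vth) on success;
    raises when all thresholds fail, so the caller can fall back to
    flash-level RAID recovery (point d)."""
    order = ([last_good_vth] if last_good_vth is not None else []) + \
            [v for v in vth_candidates if v != last_good_vth]
    for vth in order:
        data = read_fn(vth)
        if data is not None:          # ECC succeeded at this threshold
            return data, vth
    raise IOError("uncorrectable: fall back to flash-level RAID rebuild")
```

Caching `last_good_vth` per block means the common case completes on the first attempt, which is why retries need not hurt steady-state latency.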
PART 3: DATA INTEGRITY GUARANTEE
End-to-End Data Protection
Data integrity is extremely important for both service providers and service users.
To protect the data in the storage device, XC100 supports End-to-End Data Protection to
maintain data accuracy and integrity. The End-to-End Data Protection
adopted by XC100 includes:
(a) Protections similar to the T10-DIF/DIX specifications
(b) XTS-AES-256 data encryption
(c) XOR data protection on DRAM; the DRAM bus carries 64 bits of data and 8 bits
of ECC
(d) BCH ECC with 4176 bytes of data and 200 bytes of parity
(e) Flash-based RAID protection
With such End-to-End Data Protection, the data on every path within the SSD is
integrity-guaranteed.
Adaptive RAID Protection
End-to-End Data Protection covers bit/byte-level data protection. For
page/block-level protection, XC100 adopts Adaptive RAID Protection.
This protection is similar to RAID-5 at the device level, except that XC100
operates it across all flash channels. The concept is to store the parity information
in 1 randomly selected page among the n+1 pages of each stripe. Since the parity page
is distributed, the same protection as a RAID-5 scheme is achieved.
Furthermore, the RAID stripe size is not fixed. XC100 dynamically shrinks the
stripe once a Bad Block symptom occurs. However, if the number of Bad Blocks
in a stripe exceeds 8, XC100 marks the corresponding stripe as bad and activates the
refresh function accordingly.
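The two ideas above, random parity placement and adaptive stripe sizing, can be sketched as follows. The function names and the deterministic `slot` parameter are illustrative; only the bad-block limit of 8 comes from the text:

```python
import random

MAX_BAD_BLOCKS_PER_STRIPE = 8   # beyond this, the stripe is marked bad

def place_parity(data_pages, parity, slot=None):
    """Insert the parity page at a randomly selected position among the
    n+1 pages of the stripe (deterministic when `slot` is given)."""
    if slot is None:
        slot = random.randrange(len(data_pages) + 1)
    stripe = list(data_pages)
    stripe.insert(slot, parity)
    return stripe, slot

def adjust_stripe(stripe_size, bad_blocks_in_stripe):
    """Shrink the stripe as bad blocks appear; return None when the
    stripe must be abandoned and its data refreshed elsewhere."""
    if bad_blocks_in_stripe > MAX_BAD_BLOCKS_PER_STRIPE:
        return None
    return stripe_size - bad_blocks_in_stripe
```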
Thermal Throttle Protection
All electronic devices generate heat, and an SSD is no exception. Under high
performance demand, the temperature of a PCIe/NVMe SSD operating at full speed
will ramp up very rapidly. Thanks to air-flow design, this seldom causes trouble
in a well-ventilated system; however, a good designer must hope for
the best and prepare for the worst. Therefore, XC100 adopts Thermal Throttle
Protection to prevent any possible thermal damage to the SSD device.
There are 3 preset temperature thresholds in the XC100 design. When the embedded
thermal sensor reading exceeds one of these thresholds, XC100 throttles the data
transfer rate to the corresponding level to reduce heat generation. Once the
internal temperature drops back below the threshold, the limitation is lifted.
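The three-threshold scheme maps a temperature reading to an allowed transfer rate. A minimal sketch; the threshold temperatures and rate factors below are invented for illustration, since the paper does not publish XC100's actual values:

```python
# (threshold_celsius, fraction_of_full_rate), hottest first.
# Illustrative values only; not XC100 specifications.
THRESHOLDS = [(85, 0.25), (78, 0.50), (70, 0.75)]

def throttle_factor(temp_c):
    """Return the fraction of the full transfer rate allowed at temp_c.
    Below all thresholds, the limitation is lifted entirely."""
    for limit, factor in THRESHOLDS:
        if temp_c >= limit:
            return factor
    return 1.0
```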
Power Loss Protection
Data integrity in an Enterprise system is critically important even when
encountering an unexpected power shutdown. A system or a storage device must
guarantee data integrity by all means.
XC100 provides a Power Loss Protection (PLP) function to avoid data loss
when an ungraceful power shutdown occurs. With PLP, XC100 can operate normally
for a limited period of time without its original power source. The concept is
depicted in the figure below.
(1) In normal mode (green path), XC100 operates on the normal power source while the
PLP capacitors are fully charged as a backup power source.
(2) In abnormal mode (red dotted path), XC100 opens the switch (SW), activating
the previously charged PLP capacitors as the backup power source to keep
XC100 operating normally for a short while.
The backup power must last long enough for the SSD to flush all important data
into NAND flash. Thus, the PLP function must be well designed and optimized with all
other SSD functions, such as the FTL, WL, and GC, to prevent data loss.
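How long the capacitors can hold the drive up is a simple energy calculation. This is a back-of-envelope sizing sketch with wholly illustrative numbers, not XC100's actual capacitance, voltages, or power draw:

```python
def holdup_time_s(capacitance_f, v_start, v_min, power_w):
    """Usable hold-up time of a PLP capacitor bank: the energy stored
    between the charged voltage and the minimum regulator input voltage
    (E = 1/2 * C * (V1^2 - V2^2)), divided by the flush power draw."""
    energy_j = 0.5 * capacitance_f * (v_start**2 - v_min**2)
    return energy_j / power_w
```

For example, a hypothetical 10 mF bank charged to 35 V with a 10 V regulator floor and an 8 W flush load yields roughly 0.7 s, which must exceed the worst-case time to flush cached data and metadata to NAND.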
Metadata & Firmware Protection
Metadata mainly includes (I) FTL table info, (II) Wear Leveling info, (III)
Write/Read/Erase counts of every block, (IV) bad and free block info, and (V)
firmware info. In other words, beyond user data, metadata contains a great deal of
extremely important information. To protect metadata and firmware, XC100 adopts
2 schemes: (1) pseudo-SLC mode and (2) multi-copy backup.
(1) Pseudo-SLC (pSLC) is a transformation from MLC to SLC. SLC NAND has much
better endurance (P/E cycles ≈ 60,000) and faster processing times
than MLC NAND (P/E cycles ≈ 3,000). By configuring some MLC blocks into pSLC
mode, XC100 extends the P/E cycles of these pSLC blocks to 30,000. The
metadata and firmware in pSLC blocks are therefore far better protected, and also much
faster to access.
(2) The multi-copy concept is depicted in the figure below. By distributing metadata
across different LUNs and different blocks, XC100 can still operate normally even if
errors occur in some copies of the metadata.
As for firmware protection, since the NVMe 1.1 protocol defines the SLOT
concept for storing firmware images, XC100 adopts this multi-slot scheme to store
its firmware images. Up to 3 versions of the firmware image can be stored in XC100,
and each version has another 3 backup copies (figure below).
Furthermore, all these firmware images are distributed across LUNs, similar to
the metadata protection concept. Via such protection, the firmware stays intact even
through an ungraceful power-off.
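The multi-copy read path reduces to "return the first copy that verifies". A minimal sketch using a toy checksum purely for illustration (a real implementation would use CRC or ECC):

```python
def read_metadata(copies):
    """Multi-copy backup: try each (payload, checksum) copy in turn and
    return the first one that verifies. Because copies live in different
    LUNs/blocks, one corrupted copy is not fatal."""
    for payload, checksum in copies:
        if sum(payload) % 256 == checksum:   # toy checksum, not the real one
            return payload
    raise IOError("all metadata copies corrupted")
```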
PART 4: INTELLIGENT FIRMWARE MANAGEMENT
High Performance FTL
During some operations, such as Garbage Collection, the SSD moves valid user
data from a block about to be erased to another location without notifying the user.
This means the Physical Block Address (PBA) of the valid data changes while its
Logical Block Address (LBA) stays the same. Such operations require the Flash
Translation Layer (FTL), which monitors and records the mapping between LBA and
PBA. Moreover, due to the frequent IO commands from host to SSD, the FTL is
updated rapidly and frequently, so FTL performance heavily influences overall
SSD performance.
XC100 has designed and supports an optimized, high-speed Flash Translation
Layer (FTL) scheme. The scheme includes:
(1) High-speed direct mapping between LBA and PBA;
(2) 4 KB-based mapping granularity, the most common IO size across operating systems;
(3) Keeping the FTL in SRAM/DRAM for fast, frequent update operations;
(4) Optimization with WL and GC for better endurance and lower latency;
(5) A periodically-saved Snapshot algorithm balancing system performance and
faster rebuild time.
Via these intelligent designs and detailed verification, XC100's data mapping
function, the FTL, operates not only at high speed but also with consistency.
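At its core, a page-level FTL is a logical-to-physical map whose updates invalidate old physical pages. A minimal sketch of that mapping (class and method names are illustrative):

```python
class FTL:
    """4 KB-granularity logical-to-physical mapping, kept in DRAM."""
    def __init__(self):
        self.l2p = {}                  # LBA -> PBA

    def write(self, lba, new_pba):
        """Point the LBA at its new physical page; return the old PBA,
        which is now invalid and awaits Garbage Collection."""
        old_pba = self.l2p.get(lba)
        self.l2p[lba] = new_pba
        return old_pba

    def read(self, lba):
        """Translate a host LBA to the physical page to read."""
        return self.l2p[lba]
```

Every host overwrite thus produces one invalid physical page, which is exactly the "garbage" that GC (next sections) exists to reclaim.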
Global Wear-Leveling
Consider the unique behaviors of NAND flash: programming is page-based;
erasing is block-based; a block consists of many pages; erase must precede
program; P/E cycles are limited; reads induce disturbance; data has retention
limits; and so on. Almost every operation applied to a NAND cell impacts its life span. To
avoid uneven usage of NAND cells, "Wear Leveling" (WL) must be adopted and
carefully designed.
Wear Leveling is the function that tracks the P/E counts of all blocks and
moves user data from block to block to assure that all blocks are evenly used, i.e.,
with similar P/E cycles. Needless to say, such actions involve the FTL and GC
mentioned previously. Again, the WL, GC, and FTL functions must all be well
designed and optimized together to avoid impacting SSD performance.
Data from the host can be roughly separated into 2 categories: Hot Data and Cold Data.
Hot Data are updated very frequently, while Cold Data might not be updated for a
very long time. Based on these 2 categories, XC100 implements 2 corresponding types
of WL:
(1) Dynamic WL: mainly applied to Hot Data. XC100 dynamically prioritizes the
blocks with the lowest P/E counts to store Hot Data. Via the Global FTL, the original
PBA of the Hot Data is marked invalid, waiting for the Garbage Collection (GC)
function to collect, erase, and release it.
(2) Static WL: mainly applied to Cold Data. As previously mentioned, XC100
monitors the P/E counts of all blocks. When a Cold Data block has
the minimum P/E count, XC100 activates Static WL, moving the Cold Data to
another area and releasing the block.
XC100's outstanding endurance specifications (3 and 7 DWPD) reflect the
careful design of its WL scheme and its optimization together with GC and the FTL.
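The two WL policies above can be sketched as selection rules over per-block P/E counters. The function names and the `margin` heuristic are illustrative assumptions, not the XC100 algorithm:

```python
def pick_block_for_hot_data(free_blocks, pe_counts):
    """Dynamic WL: direct hot writes to the free block with the fewest
    P/E cycles, so lightly-worn blocks absorb the churn."""
    return min(free_blocks, key=lambda b: pe_counts[b])

def needs_static_wl(cold_block, pe_counts, margin=100):
    """Static WL: when a cold block's P/E count lags far behind the
    average, its data should be moved so the fresh block can be reused.
    The margin is an illustrative hysteresis to avoid churn."""
    avg = sum(pe_counts.values()) / len(pe_counts)
    return pe_counts[cold_block] + margin < avg
```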
Efficient Garbage Collection
Many versions of the same data may be stored in NAND flash, but there is ONLY
ONE up-to-date version; the others are out of date. Such out-of-date data
are usually referred to as invalid data. Garbage Collection (GC) reclaims this
invalid data and releases free space for further use. However, overly frequent GC
operations increase overhead and P/E cycles, impacting overall
performance and endurance. Also, GC always operates in the background; if GC,
along with the FTL and WL, is not well designed, host commands and device
responses will be severely affected.
GC operations of XC100 include:
(1) Select the GC target block;
(2) Acquire all valid/invalid page information of the GC block;
(3) Select free blocks as the GC destination;
(4) Copy valid data to the destination block, leaving only invalid data on the GC target block;
(5) Erase the GC target block to release it as free.
With a unique, smart selection algorithm for GC blocks, the GC scheme of XC100
has been efficiently optimized together with the FTL and WL to reach the highest
performance.
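Steps (1)-(5) above can be sketched end to end. This uses a simple greedy victim choice (most invalid pages); XC100's actual selection algorithm is not disclosed:

```python
def garbage_collect(blocks, free_blocks):
    """One GC pass. `blocks` maps block id -> {"valid": [...pages...],
    "invalid": [...pages...]}; `free_blocks` is a list of erased blocks.
    Greedy policy: reclaim the block holding the most invalid pages."""
    victim = max(blocks, key=lambda b: len(blocks[b]["invalid"]))   # (1)(2)
    dest = free_blocks.pop()                                        # (3)
    moved = list(blocks[victim]["valid"])                           # (4)
    blocks[dest] = {"valid": moved, "invalid": []}
    del blocks[victim]                                              # (5) erase
    free_blocks.append(victim)                                      # ...and free
    return victim, dest, moved
```

Greedy selection minimizes the valid data copied per reclaimed block, which is what keeps GC overhead (write amplification) low.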
Fast Power-On Rebuild
Compared to an HDD, an SSD is much faster not only in operation but also in
rebooting, because the native processing speed of NAND is much higher than that
of an HDD. However, there are still some topics a good SSD design must cover,
e.g., the FTL rebuild speed during power-on.
At power-on (rebooting), the SSD must first acquire and rebuild all FTL
information and load it into DRAM for upcoming operations. When power is shut down
gracefully, the system waits while the SSD flushes the complete FTL from DRAM
back to NAND. With a complete FTL in NAND, the power-on process is very fast. But
after an abnormal power-off, the FTL in DRAM is usually lost immediately, so the
SSD would need to scan every page in every block to rebuild the complete FTL during
power-on. Such a "scan-everything" process takes much longer than the normal case.
In XC100, a Snapshot scheme with some special designs is adopted to avoid
long power-on duration.
During normal operation:
(1) The Snapshot function periodically saves and updates the FTL data back to NAND;
(2) The Snapshot update frequency is optimized to avoid performance impact;
(3) Snapshot data is stored in the pSLC area for better endurance and faster access.
At power-on:
(4) Retrieve information from the latest Snapshot data first;
(5) Scan any data not yet captured by the Snapshot to retrieve the remaining information;
(6) Rebuild the whole FTL from (4) and (5).
With this Snapshot scheme, XC100 assures that the power-on rebuild is fast after
both normal and abnormal power-off.
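Steps (4)-(6) amount to "restore the snapshot, then replay only what came after it". A minimal sketch, where the post-snapshot updates are represented as a journal of (LBA, PBA) pairs recovered by the limited scan in step (5):

```python
def rebuild_ftl(snapshot, journal_since_snapshot):
    """Power-on rebuild: start from the last periodic snapshot of the
    L2P table, then replay only the mapping updates made after it,
    instead of scanning every page in every block."""
    l2p = dict(snapshot)                       # step (4): load snapshot
    for lba, pba in journal_since_snapshot:    # step (5): replay the delta
        l2p[lba] = pba
    return l2p                                 # step (6): complete FTL
```

The rebuild cost is proportional to the writes since the last snapshot, not to the drive's capacity, which is why the snapshot interval trades runtime overhead against recovery time.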
TRIM Command Support
Unlike an HDD, which allows data to be overwritten in place, an SSD must erase a
flash cell before programming new data into it. In this case, the SSD must activate
WL to determine which block contains the most invalid data and use it as the data
target block. Furthermore, the host system usually does not reveal to the SSD
which data (LBA) is no
longer valid. As a result, the more invalid data there is, the less free NAND space
remains. To release free NAND space, GC is activated, and once GC is active,
performance starts to decrease gradually.
The TRIM command removes this inconvenience. The host can issue TRIM
to the SSD, indicating which data (LBA) is no longer valid. The SSD can then activate
GC in the background to collect and erase the invalid data and release more
space. Thus SSD performance is sustained at a certain level instead of
continuously decreasing.
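On the device side, handling TRIM reduces to dropping mappings and marking their physical pages invalid so GC can reclaim them early. A minimal sketch (the structure names are illustrative):

```python
def trim(l2p, invalid_pages, lbas):
    """Host-issued TRIM: remove the L2P mappings for the given LBAs and
    mark their physical pages invalid, so GC can reclaim them without
    waiting for the host to overwrite those LBAs."""
    for lba in lbas:
        pba = l2p.pop(lba, None)      # unknown/already-trimmed LBAs are no-ops
        if pba is not None:
            invalid_pages.add(pba)
```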
Intelligent Write Data Flow Control
XC100 has designed an intelligent scheme for each of Read and Write data flow
management. For Write flow management, XC100 treats GC data and host data both
as input and adaptively balances the two, keeping XC100's performance consistent
while maintaining sufficient free blocks.
Intelligent Read Sequence Control
For Read flow management, XC100 adopts a Re-scheduler function to
rearrange command sequences so as to utilize as many channels simultaneously
as possible. The advantages of this function are: (1) Read commands are not
jammed on individual flash channels; (2) Read latency is much lower thanks to the
Pending Queue mechanism of the Re-scheduler.
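The re-scheduling idea can be sketched as per-channel pending queues drained round-robin, so no single busy channel blocks the others. A minimal sketch; the real Re-scheduler's policy is not disclosed:

```python
from collections import defaultdict, deque

def reschedule_reads(commands, num_channels=16):
    """Sort incoming reads into per-channel pending queues, then drain
    them round-robin so all 16 flash channels stay busy in parallel.
    `commands` is a sequence of (cmd_id, channel) pairs."""
    pending = defaultdict(deque)
    for cmd in commands:
        pending[cmd[1]].append(cmd)            # cmd[1] is the flash channel
    order = []
    while any(pending.values()):
        for ch in range(num_channels):         # one command per channel per round
            if pending[ch]:
                order.append(pending[ch].popleft())
    return order
```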
PART 5: DUAL PORT SUPPORT
The PCI Express bus can run with 2, 4, 8, or 16 lanes; the more lanes it supports,
the more bandwidth it provides. So why would a PCIe storage device with 4 lanes
decide to split itself into two 2-lane ports? The answer is High Availability.
High Availability (HA) ensures a certain degree of operational continuity during a
given measurement period and avoids a Single Point of Failure (SPOF). A system
with HA provides: (1) a certain amount of uptime; (2) access to critical functions
of the system; (3) redundancy.
For example, suppose only one server system with one XC100 provides service
to customers. If a failure occurs in this server, its service must be shut down for
repair; although the data itself is intact, customers still have to wait until
the repair is finished. This is a SPOF. If instead 2 server systems are connected
to one single XC100, then when one of the two paths fails, the other takes over the
failed one's jobs using the same data storage (the XC100) and continues working.
As a result, service users experience no downtime while the system maintainer
repairs the failed path in the meantime.
Currently, the Techman design and validation teams are working together with our
server partners on evaluations of this Dual Port feature. Expected in early June,
Techman SSD will introduce our dual-port series, the XC200, to the market.