[IEEE Telecommunication Systems (MASCOTS) - Baltimore, MD, USA (2008.09.8-2008.09.10)] 2008 IEEE...

Using Non-Volatile RAM as a Write Buffer for NAND Flash Memory-based Storage Devices*

Sungmin Park, Hoyoung Jung, Hyoki Shim, Sooyong Kang†, Jaehyuk Cha
Division of Information and Communications, Hanyang Univ., Seoul, 133-791, Korea

{syrilo, horong, dahlia, sykang, chajh}@hanyang.ac.kr

*This work was supported by grant No. R01-2007-000-20649-0 from the Basic Research Program of the KOSEF.
†Corresponding author

978-1-4244-2818-2/08/$25.00 ©2008 IEEE

Abstract

Recent development of next-generation non-volatile memory types such as MRAM, FeRAM and PRAM gives Non-Volatile RAM (NVRAM) higher commercial value. In this paper, we suggest using a small next-generation NVRAM as a write buffer to improve the overall performance of NAND flash memory-based storage systems. We propose a novel block-based NVRAM write buffer management policy, CLC. Simulation results show that the CLC policy outperforms the traditional policies.

1. Introduction

Using non-volatile random access memory (NVRAM) as a write buffer for a slow storage device has long been an active research area. In particular, many algorithms that manage NVRAM write buffers for hard disks have been proposed [1, 2, 4]. However, they are not suitable for NAND flash memory-based storage systems, since they consider neither the characteristics of NAND flash memory nor the behavior of the flash translation layer (FTL), which enables flash memory-based storage to be used as an ordinary block device such as a hard disk. In this paper, we propose a novel NVRAM write buffer management policy for NAND flash memory-based storage systems and show the performance gain that can be achieved using an NVRAM write buffer.

2. Write Buffer Management Policy

2.1. Design Considerations

Write buffer management schemes for hard disks have been developed over the past decade. According to those schemes, the performance of the write buffer management scheme for a hard disk depends on the following factors:

• Total number of destages to the hard disk. As more write requests are accommodated by the write buffer, fewer requests are issued to the hard disk. By exploiting the temporal locality of the data access pattern, we can decrease the number of destages to the hard disk.
• Average access cost of each destage. Since read and write operations on a hard disk have symmetric speeds, the access cost can be modeled as the sum of the seek time, rotational delay and transfer time. By exploiting the spatial locality of the data access pattern, we can decrease the access cost.

Those two factors also apply when NAND flash memory is used instead of a hard disk. First, the number of destages to flash memory should be decreased to increase the overall performance. To decrease the number of destages to flash memory, the write buffer hit ratio should be increased. Therefore, we can use traditional buffer management schemes that exploit the temporal locality of the data access pattern. However, the access cost factor makes it necessary to devise a novel write buffer management scheme. While the access cost of a data block on a hard disk varies with its physical location, the physical location of a data block in flash memory does not affect the access time to the block. Spatial locality is therefore no longer an important factor for flash memory. Instead of the seek time and rotational delay, another important factor should be considered to estimate the access cost for flash memory: the extra operations issued by the FTL. To decrease the number of extra operations, the write buffer management scheme is required to 1) decrease the number of merge operations by clustering pages in the same block and destaging them at the same time, 2) destage pages such that the FTL may invoke switch or switch copy operations, which have relatively low cost, rather than the smart copy operation, which is very expensive, and 3) detect sequential page writes and destage those sequential pages preferentially and simultaneously.

Figure 1. Cluster structure (a cluster list of flash block numbers, each with its page list).
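The three requirements above can be illustrated with a rough merge cost model. The sketch below is not from the paper: it assumes a BAST-like FTL with 64-page blocks, maps the paper's "switch", "switch copy" and "smart copy" onto what the FTL literature commonly calls switch, partial and full merges, assumes each destaged cluster is written in ascending page order into an empty log block whose data block was fully valid, and uses hypothetical latencies (`T_COPY_US`, `T_ERASE_US`). The function name `merge_cost` is likewise illustrative.

```python
PAGES_PER_BLOCK = 64  # 64 x 2 KB pages per erase block (large-block NAND)
T_COPY_US = 225.0  # hypothetical cost of one valid-page copy (read + program)
T_ERASE_US = 1500.0  # hypothetical cost of one block erase


def merge_cost(offsets):
    """Estimate the merge triggered by destaging one block's pages.

    Assumptions (illustrative only): the pages are written in ascending
    order into an empty log block, and the old data block is fully valid.
    Returns (merge kind, valid-page copies, erases, cost in microseconds).
    """
    pages = sorted(set(offsets))
    k = len(pages)
    if pages == list(range(k)):  # in-order prefix of the block
        if k == PAGES_PER_BLOCK:  # whole block: the log block simply replaces it
            kind, copies, erases = "switch", 0, 1
        else:  # copy the missing tail from the data block, then switch
            kind, copies, erases = "switch copy", PAGES_PER_BLOCK - k, 1
    else:  # not a prefix: merge log and data blocks into a new block
        kind, copies, erases = "smart copy", PAGES_PER_BLOCK, 2
    return kind, copies, erases, copies * T_COPY_US + erases * T_ERASE_US


print(merge_cost(range(64)))  # ('switch', 0, 1, 1500.0)
print(merge_cost([0, 1, 2, 3]))  # ('switch copy', 60, 1, 15000.0)
print(merge_cost([5, 9]))  # ('smart copy', 64, 2, 17400.0)
```

Under these assumptions a complete, in-order block costs a single erase, while a small scattered cluster costs 64 copies and two erases, which is why clustering pages per block and destaging sequential runs whole pays off.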

Figure 2. Data structure for CLC policy (a size-independent LRU cluster list and size-dependent LRU cluster lists, one per cluster size).

2.2. Cold and Largest Cluster (CLC) Policy

In this section, we propose a novel write buffer management policy, named the Cold and Largest Cluster (CLC) policy. We assume that block-level address mapping is used in the FTL, since page-level address mapping is not widely used in practice. Hence, data movement between NVRAM and flash memory follows the block-mapping algorithm in the FTL. Since the page size of large-block NAND flash memory is 2 KB, we assume the page size in the NVRAM is also 2 KB. The CLC policy clusters pages in the NVRAM by their block number in flash memory, and those page clusters are maintained in a linked list. Within each page cluster, the pages with the same flash block number are kept in a linked list. The size of a cluster is defined as the number of pages in the cluster, which varies from 1 to 64. Figure 1 shows the structure of the page cluster.

In the CLC policy, both temporal locality and cluster size are considered simultaneously. The replacement unit of the CLC policy is also a page cluster, and the page cluster with the largest cluster size among the cold clusters is selected as a victim. To accommodate both temporal locality and cluster size, CLC maintains two kinds of cluster lists: 1) a size-independent LRU cluster list and 2) a size-dependent LRU cluster list for each cluster size.

Figure 2 shows the data structure for the CLC policy. When a page cluster is first created, it is inserted into the MRU position of the size-independent LRU cluster list, and whenever the cluster is accessed, it moves to the MRU position of that list. When the size-independent LRU cluster list is full and a new page cluster arrives, the page cluster in the LRU position of the list is evicted from it and inserted into the size-dependent LRU cluster list for its cluster size. If a page cluster residing in any of the size-dependent LRU cluster lists is accessed, it is moved to the MRU position of the size-independent LRU cluster list. In this manner, hot clusters gather in the size-independent LRU cluster list and cold clusters in the size-dependent LRU cluster lists. The victim cluster is selected from the size-dependent LRU cluster lists; therefore, only a cold and large cluster is selected as a victim. To detect sequential page writes, when 64 consecutive page writes occur in a page cluster, the CLC policy moves the cluster into the size-dependent LRU cluster list of size 64, so that the cluster is preferentially selected as a victim.

The buffer space is partitioned between the two kinds of lists by the number of page clusters, not by physical size, using a partition parameter α (0 < α < 1). If α = 0.1, then 10% of the page clusters in the write buffer are maintained in the size-independent LRU cluster list and the remaining 90% are maintained in the size-dependent LRU cluster lists, regardless of the size of each page cluster.

3. Performance Evaluation

In this section, we evaluate the performance of the CLC policy. We used BAST and FAST as the underlying FTL algorithms and a Samsung K9NBG08U5A 32 Gbit large-block flash memory as the storage device.

Figure 3 compares the performance of the CLC policy with traditional write buffer management policies, using extra overhead as the performance metric. Extra overhead is the time overhead induced by the extra operations. Extra operations occur while a merge operation is performed, and a merge operation consists of valid page copies and erases. The Y-axis of the figure represents the normalized extra overhead and the X-axis represents the write buffer size. The CLC policy is compared with three traditional schemes: LRU-P, LRU-C and LC. LRU-P and LRU-C are LRU-based policies: LRU-P is the pure LRU policy, which does not use page clusters, while LRU-C is an LRU policy modified to use page clusters. LC (Largest Cluster) selects the largest cluster in the buffer as a victim. It originates from FAB (Flash-Aware Buffer replacement) [3], a DRAM buffer replacement policy for flash memory that clusters buffer pages and selects the largest cluster as a victim.
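The two-list mechanism of Section 2.2 can be sketched in a few dozen lines of Python. This is a minimal illustration, not the authors' simulator, and several details are hypothetical: the buffer capacity is counted in 2 KB pages, a destage is only recorded rather than sent to an FTL, the size-independent list may hold at most floor(α × number of clusters) clusters (but at least one), and when no cold cluster exists the LRU hot cluster is destaged as a fallback. The names `CLCBuffer`, `write` and `destaged` are illustrative.

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 64  # 2 KB pages per large-block NAND erase block


class CLCBuffer:
    """Sketch of a CLC write buffer (hypothetical simplifications: capacity is
    counted in pages, destages are only recorded, no FTL is modelled)."""

    def __init__(self, capacity_pages, alpha=0.1):
        self.capacity = capacity_pages
        self.alpha = alpha  # share of clusters kept in the size-independent list
        self.pages = 0
        self.indep = OrderedDict()  # block -> set of page offsets, MRU last
        # one size-dependent LRU list per cluster size 1..64
        self.dep = {s: OrderedDict() for s in range(1, PAGES_PER_BLOCK + 1)}
        self.where = {}  # block -> 0 (size-independent list) or its cluster size
        self.run = {}  # block -> (last page offset, consecutive-write run length)
        self.destaged = []  # (block, sorted page offsets) of every victim

    def write(self, block, page):
        loc = self.where.get(block)
        if loc is None:
            cluster = set()
        elif loc == 0:
            cluster = self.indep.pop(block)
        else:  # hit on a cold cluster: it becomes hot again
            cluster = self.dep[loc].pop(block)
        self.indep[block] = cluster  # (re)insert at the MRU position
        self.where[block] = 0
        if page not in cluster:
            if self.pages >= self.capacity and not self._evict(exclude=block):
                # this cluster is all that is buffered: destage it, start afresh
                self._destage(block, self.indep.pop(block))
                cluster = set()
                self.indep[block] = cluster
                self.where[block] = 0
            cluster.add(page)
            self.pages += 1
        self._detect_sequential(block, page, cluster)
        self._rebalance()

    def _detect_sequential(self, block, page, cluster):
        last, run = self.run.get(block, (None, 0))
        run = run + 1 if last is not None and page == last + 1 else 1
        self.run[block] = (page, run)
        if run == PAGES_PER_BLOCK:  # 64 consecutive writes: the whole block
            self.indep.pop(block)
            self.dep[PAGES_PER_BLOCK][block] = cluster
            self.where[block] = PAGES_PER_BLOCK

    def _rebalance(self):
        # Partition by number of clusters: at most alpha of them stay "hot".
        limit = max(1, int(self.alpha * len(self.where)))
        while len(self.indep) > limit:
            block, cluster = self.indep.popitem(last=False)  # LRU hot -> cold
            self.dep[len(cluster)][block] = cluster
            self.where[block] = len(cluster)

    def _evict(self, exclude):
        # Largest cold cluster first (LRU within a size); hot list as last resort.
        lists = [self.dep[s] for s in range(PAGES_PER_BLOCK, 0, -1)] + [self.indep]
        for lst in lists:
            victim = next((b for b in lst if b != exclude), None)
            if victim is not None:
                self._destage(victim, lst.pop(victim))
                return True
        return False

    def _destage(self, block, cluster):
        self.pages -= len(cluster)
        del self.where[block]
        self.run.pop(block, None)
        self.destaged.append((block, sorted(cluster)))


buf = CLCBuffer(capacity_pages=6, alpha=0.5)
for block, page in [(1, 0), (2, 0), (2, 1), (2, 2), (3, 0), (4, 0), (5, 0)]:
    buf.write(block, page)
print(buf.destaged)  # [(2, [0, 1, 2])]: the largest cold cluster, not block 1

seq = CLCBuffer(capacity_pages=128)
for page in range(PAGES_PER_BLOCK):
    seq.write(7, page)
print(seq.where[7])  # 64: moved to the size-64 list after a full sequential run
```

In the first usage example, block 1 is the least recently written cluster, so an LRU-C-style policy would destage it first; the sketch instead destages block 2, the largest cold cluster. The toy example uses α = 0.5 rather than 0.1 only so that a six-page buffer shows the effect.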

Figure 3. Performance comparison: (a) BAST, number of log blocks = 16; (b) FAST, number of log blocks = 16. Each panel plots the normalized extra overhead, split into valid page copy and erase, against NVRAM size (0.5 MB to 16 MB). Extra overhead is normalized such that the overhead when there is no NVRAM write buffer is 1.
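Read literally, the metric plotted in Figure 3 can be written as follows. The symbols are introduced here for illustration and are not the authors' notation: for an NVRAM write buffer of size S, let N_copy(S) and N_erase(S) be the valid page copies and erases incurred by merges, and let t_copy and t_erase be the per-operation latencies.

```latex
E(S) = N_{\mathrm{copy}}(S)\,t_{\mathrm{copy}} + N_{\mathrm{erase}}(S)\,t_{\mathrm{erase}},
\qquad
\hat{E}(S) = \frac{E(S)}{E(0)}, \qquad \hat{E}(0) = 1
```

Here E(0) is the overhead without a write buffer, and the two terms of E(S) correspond to the valid page copy and erase portions shown in each bar.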

We can see the effect of page clustering by comparing the performance of the LRU-P and LRU-C policies. The overall performance of the LRU-C policy is about 110.4% higher in the BAST case and 120.4% higher in the FAST case. This shows that clustering pages of the same erasure unit (i.e., block) can decrease the number of valid page copy and erase operations. Also, since page clustering turns random page writes into sequential page writes, the effect of page clustering is larger when BAST is used as the FTL algorithm than when FAST is used, since FAST handles random page writes better than BAST. Furthermore, the effect of page clustering increases as the write buffer size increases, since a cluster can stay in the buffer for a longer time, during which more pages can be gathered in the cluster. Hence, the performance gap between LRU-P and LRU-C increases as the buffer size increases.

Since LRU-C shows a far better page hit ratio than the LC policy, the overall performance of the LRU-C policy is better than that of the LC policy. In addition, the average number of victim pages per destaged cluster in the LC policy is smaller than that in the LRU-C policy, and the number of destaged clusters in the LC policy is larger than that in the LRU-C policy. Writing a small cluster is more likely than writing a large one to invoke the smart copy operation, which requires more valid page copies and erasures than the other types of merge operation (switch or switch copy). Hence, frequently writing small clusters makes the overall performance of the LC policy worse even than that of the LRU-P policy. Therefore, considering only the size of a page cluster can be the worst choice for victim selection.

We also found that not only is the number of destaged clusters in the CLC policy smaller than that in the LRU-C policy, but the average size of the victim cluster is also larger than that in the LRU-C policy. The CLC policy could harvest those beneficial effects merely by reserving part of the buffer space for the pure LRU cluster list (the size-independent LRU cluster list). We can see the effect of the pure LRU cluster list in Figure 3, where the CLC policy outperforms the others in all cases.

4. Conclusion

In this paper, we proposed a novel NVRAM write buffer management policy, CLC, which is based on page clustering. The proposed policy clusters pages belonging to the same block in the flash memory so that the page cluster matches the erasure unit of flash memory. It not only exploits temporal locality but also maximizes the number of simultaneously destaged pages. Simulation results have shown that the CLC policy outperforms the traditional page-level LRU policy (LRU-P) by a maximum of 50%.

References

[1] M. Baker, S. Asami, E. Deprit, J. Ousterhout, and M. Seltzer. Non-volatile memory for fast, reliable file systems. Operating Systems Review, 26:10-22, Oct. 1992.
[2] B. Gill and D. S. Modha. WOW: Wise ordering for writes - combining spatial and temporal locality in non-volatile caches. In Proc. of the USENIX File and Storage Technologies (FAST), Dec. 2005.
[3] H. Jo, J.-U. Kang, S.-Y. Park, J.-S. Kim, and J. Lee. FAB: Flash-aware buffer management policy for portable media players. IEEE Transactions on Consumer Electronics, 52(2), 2006.
[4] J.-F. Pâris, T. R. Haining, and D. D. E. Long. A stack model based replacement policy for a non-volatile write cache. In Proc. of the IEEE Symposium on Mass Storage Systems, pages 217-224, Mar. 2000.
