Posted on 16-Apr-2017
Why do we need ext4?
Robin Dong
ext2 global layout
Image from: http://learn.akae.cn/media/ch29s02.html
Super Block (1 block)
GDT (multiple blocks)
Block Bitmap (1 block)
Inode Bitmap (1 block)
Inode table (multiple blocks)
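The superblock and the GDT (group descriptor table) are vital, so other block groups store copies of them.
If the filesystem is created with sparse_super (the default), not every group holds a copy of the superblock and GDT: only groups 0 and 1 and the groups whose numbers are powers of 3, 5, or 7 (3, 5, 7, 9, 25, 49, 27, 125, 343, ...) have one.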
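The sparse_super rule can be sketched in a few lines of Python. This is only an illustration of the rule stated above; the function names are made up for this example and are not taken from e2fsprogs.

```python
def is_power_of(n, base):
    """Return True if n == base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock_backup(group):
    """Groups 0 and 1, plus powers of 3, 5, and 7, keep a copy."""
    if group in (0, 1):
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


def backup_groups(group_count):
    """List every group below group_count that holds a superblock/GDT copy."""
    return [g for g in range(group_count) if has_superblock_backup(g)]


if __name__ == "__main__":
    print(backup_groups(400))
```

For a filesystem with 400 groups this lists only 13 groups, which shows how few copies sparse_super keeps.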
There is a structure called the Reserved GDT, which is placed after the GDT and before the block bitmap; it is also treated as a large file.
It is used by the resize feature, which can expand the size of the whole filesystem.
ext2 file layout
Image from: http://e2fsprogs.sourceforge.net/ext2intro.html
An ext2 directory is laid out just like a regular file, but its data blocks contain a sequence of struct ext2_dir_entry records.
ext2 directory layout
Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html
Each ext2_dir_entry has a different length (it depends on the file name), so to find a file in a directory ext2 has to compare file names one by one; it cannot use an algorithm such as binary search. If a directory holds a large number of files, lookups become inefficient.
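The linear scan can be illustrated with a small Python sketch. It assumes the common on-disk record layout (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name padded to 4 bytes). The helper names and the sample block are invented for this example, and the block is simplified: a real directory block stretches the last rec_len to the end of the block.

```python
import struct

HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type


def pack_entry(inode, name):
    """Build one variable-length directory record for the sample block."""
    raw = name.encode()
    rec_len = (HEADER.size + len(raw) + 3) & ~3  # round up to 4 bytes
    return HEADER.pack(inode, rec_len, len(raw), 1) + raw.ljust(rec_len - HEADER.size, b"\0")


def lookup(block, wanted):
    """Walk the records one by one; return (inode or None, records examined)."""
    target = wanted.encode()
    offset = 0
    examined = 0
    while offset + HEADER.size <= len(block):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            break  # corrupted record, stop instead of looping forever
        examined += 1
        name = block[offset + HEADER.size: offset + HEADER.size + name_len]
        if inode != 0 and name == target:
            return inode, examined
        offset += rec_len  # the only way to reach the next record
    return None, examined


names = ["a", "readme.txt", "some_long_file_name.c", "z"]
block = b"".join(pack_entry(100 + i, n) for i, n in enumerate(names))

if __name__ == "__main__":
    print(lookup(block, "z"))
    print(lookup(block, "missing"))
```

Because the next record can only be located through the current record's rec_len, both the last name and a missing name force a walk over every record.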
ext2 directory remove
Image from: http://blog.csdn.net/anghlq/archive/2011/05/17/6427052.aspx
ext2 directory pack
e2fsck -D
Optimize directories in filesystem. This option causes e2fsck to try to optimize all directories, either by reindexing them if the filesystem supports directory indexing, or by sorting and compressing directories for smaller directories, or for filesystems using traditional linear directories.
Regular symlink: the link path is stored in a data block
Fast symlink: the link path is stored in the inode itself, in the 60-byte block-pointer area (if the link path is shorter than 60 bytes)
ext2 symlink
Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html
ext2 hard link
ext2 xattr
+--------------------+
| header             |
| entry 1            | |
| entry 2            | | growing downwards
| entry 3            | v
| four null bytes    |
| . . .              |
| value 1            | ^
| value 3            | | growing upwards
| value 2            | |
+--------------------+
ext2: badblock
e2fsck uses the badblocks program to detect bad blocks and marks these blocks as used in the block bitmap.
If metadata is located in bad blocks, e2fsck will try to allocate new blocks for it.
Enhancements in ext3
Journal: ext3 can be viewed as an ext2 filesystem with a journal file
dir_index: more efficient directory searching
ext3: journal
An ext2 filesystem may be corrupted after an abnormal reboot, such as a sudden power reset.
The journal keeps the filesystem consistent, or lets it be recovered at system boot.
Journal modes:
Writeback
Ordered
Journal
ext3: dir_index
Compute the hash value of the file name
Find the dx_entry matching the hash value in the root block by binary search
Find the ext3_dir_entry in the leaf block by checking the entries one by one
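The three steps above can be sketched as follows. This is a simplified single-level model, not the ext3 implementation: zlib.crc32 stands in for the real hash function, the "blocks" are plain Python lists, hash collisions that straddle two leaves are ignored, and all names are hypothetical.

```python
import bisect
import zlib


def name_hash(name):
    # Stand-in hash for illustration only; ext3 uses its own hash functions.
    return zlib.crc32(name.encode()) & 0xFFFFFFFF


def build_index(names, leaf_capacity=4):
    """Sort names by hash and cut them into leaves.

    The index holds the lowest hash of each leaf, like dx_entry records.
    """
    ordered = sorted(names, key=name_hash)
    leaves = [ordered[i:i + leaf_capacity]
              for i in range(0, len(ordered), leaf_capacity)]
    index = [name_hash(leaf[0]) for leaf in leaves]
    return index, leaves


def lookup(index, leaves, name):
    h = name_hash(name)                      # step 1: hash the name
    pos = bisect.bisect_right(index, h) - 1  # step 2: binary search the index
    if pos < 0:
        return False                         # hash below every leaf's range
    return name in leaves[pos]               # step 3: linear scan of one leaf


names = ["file%d" % i for i in range(100)]
index, leaves = build_index(names)

if __name__ == "__main__":
    print(lookup(index, leaves, "file42"), lookup(index, leaves, "missing"))
```

Only one leaf is scanned per lookup, so the cost no longer grows linearly with the number of files in the directory.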
Advantage: a dir_index tree has at most two levels of index blocks, so finding a file in a directory requires reading at most 3 blocks. Imagine an ext3 filesystem with a 4K block size: a directory can hold about 5 million files (with 100-byte file names).
Disadvantage: when files are added to a directory the B-tree splits, but when files are deleted it does not merge again. A directory with only a few files left can therefore still occupy many blocks.
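One way to arrive at a figure of this order is the back-of-the-envelope calculation below. The header sizes (32 bytes in the root block, 8 bytes in an interior index block), the 8-byte index entry, and the assumption that leaf blocks end up about half full after splits are assumptions of this sketch, not numbers taken from the slides.

```python
BLOCK_SIZE = 4096
DX_ENTRY_SIZE = 8   # assumed: 4-byte hash + 4-byte block number
NAME_LEN = 100

# Index entries that fit in the root block and in a second-level index block
# (assumed headers: 32 bytes in the root, 8 bytes in an interior block).
root_entries = (BLOCK_SIZE - 32) // DX_ENTRY_SIZE
node_entries = (BLOCK_SIZE - 8) // DX_ENTRY_SIZE

# A directory record: 8-byte header plus the name, rounded up to 4 bytes.
dirent_size = (8 + NAME_LEN + 3) & ~3
names_per_leaf = BLOCK_SIZE // dirent_size

leaf_blocks = root_entries * node_entries
capacity_full = leaf_blocks * names_per_leaf
capacity_half_full = capacity_full // 2  # assumed: leaves half full after splits

if __name__ == "__main__":
    print(root_entries, node_entries, names_per_leaf)
    print(capacity_full, capacity_half_full)
```

With completely full leaves the result is roughly 9.6 million names; with half-full leaves it is roughly 4.8 million, which is in line with the "about 5 million" estimate.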
ext3 xattr
Put xattr into inode.
Less IO
mkfs.ext3 -I 256 /dev/sda
limits of ext2/ext3
Block size        Max file size    Max filesystem size
1KB               16GB             2TB
2KB               256GB            8TB
4KB               2TB              16TB
8KB (ppc arch)    2TB              32TB
Reading data through a file's indirect blocks costs extra I/O
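The "Max file size" column can be reproduced with a short calculation. The sketch assumes the classic mapping scheme (12 direct pointers plus single, double, and triple indirect blocks with 4-byte block numbers) and a 2TB cap that comes from counting a file's size in 512-byte sectors with a 32-bit counter; both are assumptions of this example rather than statements from the slides.

```python
def max_file_size(block_size):
    """Return the maximum file size in bytes for a given block size."""
    ptrs = block_size // 4  # 4-byte block numbers per indirect block
    # direct + single + double + triple indirect
    mappable_blocks = 12 + ptrs + ptrs ** 2 + ptrs ** 3
    limit_by_mapping = mappable_blocks * block_size
    limit_by_sector_count = 2 ** 32 * 512  # assumed 32-bit count of 512-byte sectors
    return min(limit_by_mapping, limit_by_sector_count)


if __name__ == "__main__":
    for bs in (1024, 2048, 4096, 8192):
        print(bs, max_file_size(bs) // 2 ** 30, "GiB")
```

For 1KB and 2KB blocks the indirect-block scheme is the limit; for 4KB and 8KB blocks the sector counter is, which is why both rows show 2TB.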
ext4
ext4 inherits all the features of ext2/ext3
Larger filesystem
Max file size: 16TB
Max filesystem size: 1EB (1048576TB)
ext4: meta_bg
Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/
Group Descriptor size is 64 bytes
Imagine an ext4 filesystem with block_size = 1K: 1K/64 = 16, so a meta group contains 16 groups.
The meta-GDT (1 block) is stored in the first, second, and last group of each meta group:
Group 0, Group 1, Group 15
Group 16, Group 17, Group 31
Group 32, Group 33, Group 47
...
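The placement rule can be written down as a short sketch. The function name is hypothetical; the rule it encodes (descriptor block copies in the first, second, and last group of every meta group) is the one described above.

```python
def meta_bg_descriptor_groups(block_size, desc_size, group_count):
    """Return, per meta group, the groups that hold its descriptor block."""
    groups_per_meta = block_size // desc_size  # descriptors fitting in one block
    placement = []
    for first in range(0, group_count, groups_per_meta):
        last = min(first + groups_per_meta, group_count) - 1
        second = min(first + 1, last)
        # A set removes duplicates when the final meta group is very short.
        placement.append(sorted({first, second, last}))
    return placement


if __name__ == "__main__":
    for groups in meta_bg_descriptor_groups(1024, 64, 48):
        print(groups)
```

With 1K blocks and 64-byte descriptors, 48 groups form three meta groups whose descriptor blocks live in groups 0/1/15, 16/17/31, and 32/33/47.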
ext4: flex_bg
Merge the block bitmaps, inode bitmaps, and inode tables of several groups into Group 0
The positions of the superblock and GDT still follow the sparse_super rule
Advantage: the space in Group 1, Group 2, Group 3, ... is kept free and contiguous (which especially benefits ext4 extents)
ext4: uninit_bg
mkfs.ext4 -O uninit_bg
Create a filesystem without initializing all of the block groups. This feature also enables checksums and highest-inode-used statistics in each blockgroup. This feature can speed up filesystem creation time noticeably (if lazy_itable_init is enabled), and can also reduce e2fsck time dramatically.
When is a block group initialized?
When lazy_itable_init runs
ext4_new_inode ext4_read_block_bitmap
ext4: extent
Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/
A single ext4_extent can map up to 128MB of contiguous space (with a 4KB block size).
Example: a 300GB file in ext3 occupies about 300MB of metadata blocks, but in ext4 it occupies only 36KB.
ext4: delayed allocation
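The numbers in this example can be checked with a rough calculation. The sketch assumes 4KB blocks, a perfectly contiguous file, 4-byte block pointers in ext3 (ignoring the few double and triple indirect blocks), 12-byte extent records with a 12-byte header per extent block, and room for 4 entries inside the inode; these figures are assumptions of the sketch.

```python
BLOCK = 4096
FILE_SIZE = 300 * 2 ** 30          # 300GB file
data_blocks = FILE_SIZE // BLOCK

# ext3: one 4-byte pointer per data block lives in indirect blocks.
ext3_meta_bytes = data_blocks * 4

# ext4: each extent maps at most 128MB (32768 blocks of 4KB).
blocks_per_extent = (128 * 2 ** 20) // BLOCK
extents = (data_blocks + blocks_per_extent - 1) // blocks_per_extent

# Assumed layout: 12-byte header plus 12-byte records in each extent block.
extents_per_block = (BLOCK - 12) // 12
leaf_blocks = (extents + extents_per_block - 1) // extents_per_block

# Assumed: the inode itself can reference only 4 blocks, so more leaves
# than that need one extra index block between the inode and the leaves.
index_blocks = 1 if leaf_blocks > 4 else 0
ext4_meta_bytes = (leaf_blocks + index_blocks) * BLOCK

if __name__ == "__main__":
    print(ext3_meta_bytes // 2 ** 20, "MB of ext3 metadata")
    print(extents, "extents in", leaf_blocks, "leaf blocks")
    print(ext4_meta_bytes // 1024, "KB of ext4 metadata")
```

Under these assumptions the file needs 2400 extents, stored in 8 leaf blocks plus 1 index block, which gives the 36KB quoted above against 300MB of ext3 indirect blocks.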
It consists of delaying block allocation until the data is going to be written to the disk
This improves performance and reduces fragmentation by improving block allocation decisions based on the actual file size
Q & A
Thanks!