Introduction to ACPI Based Memory Hot-Plug · 2017-12-14 · •ACPI: Advanced Configuration and...
Introduction to ACPI Based Memory Hot-Plug
Tang Chen<[email protected]>
Agenda
1. Why we need Memory Hot-Plug
2. ACPI & Memory Hot-Plug
3. Memory hot-add
4. Memory hot-remove
5. Movable node
6. System initialization
7. Future work
Why we need Memory Hot-Plug
1. Need load balancing.
2. Need power saving.
3. Need error handling.
[Figure: system boards in use, idle, and in error, under light and heavy load.]
Why we need Memory Hot-Plug
Dynamic Resource Configuration
• Power saving.
• Load balancing.
• Error handling.
[Figure: system boards are hot-removed and the load is balanced across the remaining boards.]
Why we need Memory Hot-Plug
• Need System Board hot-plug
• All devices support hot-plug
• Need memory hot-plug
[Figure: a System Board carrying CPU, Memory, Network card, Hard disk, ……]
ACPI & Memory Hot-Plug
• ACPI: Advanced Configuration and Power Interface
ACPI is an interface specification of Operating System-directed motherboard device configuration and Power Management.
-- ACPI Specification 5.0
• Kernel (Software)
  – ACPI Driver: OS layer framework. Event handling, API.
• ACPI BIOS (Firmware)
  – ACPI Tables (static): Static info used only at boot time. DSDT, SRAT, ……
  – Methods (dynamic): Dynamic methods used at run time. _EJ0, _STA, ……
  – ACPI Registers: Event-driven model. Event registers, control registers, ……, SCI (System Control Interrupt).
• Hardware
ACPI & Memory Hot-Plug
• ACPI and Memory Hot-Plug
Boot time process:
  – Read ACPI Tables (static).
  – Install event handler.
Run time process:
  – Hot-Plug happens (hardware operation).
  – Generate SCI (System Control Interrupt); event info is available from the ACPI Registers.
  – Call event handler.
  – Call device dependent code (Memory Device Driver).
  – Call API (Memory Hot-Plug Subsystem).
  – Call ACPI Method.
[Figure: Kernel (Memory Hot-Plug Subsystem, Memory Device Driver, ACPI Driver) on top of ACPI BIOS (Tables, Methods, Registers) on top of Hardware.]
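The boot time and run time flow on this slide follows an install-then-dispatch pattern, which can be sketched in a few lines. This is an illustration only, not kernel code: the class `AcpiDriver`, its two methods, and the event name `"memory-hot-add"` are hypothetical names chosen for the sketch.

```python
from typing import Callable, Dict, List


class AcpiDriver:
    """Toy model of the event-driven flow; every name here is hypothetical."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[dict], None]] = {}

    def install_handler(self, event: str, handler: Callable[[dict], None]) -> None:
        # Boot time: a device driver registers its event handler.
        self.handlers[event] = handler

    def on_sci(self, event: str, info: dict) -> bool:
        # Run time: an SCI arrives with event info; dispatch to the handler.
        handler = self.handlers.get(event)
        if handler is None:
            return False
        handler(info)
        return True


log: List[str] = []


def memory_device_handler(info: dict) -> None:
    # Device dependent code: a real driver would call the hot-plug subsystem here.
    log.append(f"hot-add block at {info['start']:#x}")


driver = AcpiDriver()
driver.install_handler("memory-hot-add", memory_device_handler)
handled = driver.on_sci("memory-hot-add", {"start": 0x100000000})
```

An event with no installed handler is simply reported as unhandled, which is why the handler has to be installed at boot time, before any hot-plug event can arrive.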
ACPI & Memory Hot-Plug
• Main jobs of Memory Hot-Plug
  – Event handler (Memory Device Driver)
  – Node data
  – Direct mapping
  – Virtual memory mapping
  – Page online & offline
  – Page migration
[Figure: Kernel (Memory Hot-Plug Subsystem, Memory Device Driver, ACPI Driver) on top of ACPI BIOS (Tables, Methods, Registers) on top of Hardware.]
ACPI & Memory Hot-Plug
• Things associated with Memory Hot-Plug
1. Memory block to be hot-plugged.
2. User processes’ pagetable.
3. Kernel direct mapping pagetable.
4. Virtual memory mapping pages.
5. Virtual memory mapping pagetable.
[Figure: physical space (block X with used movable pages, the page_structs of block X, other page_structs); user space (process, 128TB); kernel space (hole, direct mapping 64TB, hole, vmalloc/ioremap space, hole, virtual memory map 1TB, kernel text mapping, module mapping space). The numbers 1 to 5 mark the items listed above.]
Memory hot-add
• Minimum unit of Memory Hot-Plug
  – A section is 128MB by default (32768 regular pages).
  – A block is 128MB by default (contains only 1 section).
  – The block is the minimum unit of memory hot-plug.
[Figure: a physical memory range from start to end is divided into blocks; each block contains sections; each section contains pages.]
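The sizes on this slide can be checked with a little arithmetic. The sketch below assumes a regular page of 4KB (the slide does not state the page size, but 128MB / 32768 pages implies it); the helper names are made up for the example.

```python
PAGE_SIZE = 4 * 1024                # assumed regular page size: 4KB
SECTION_SIZE = 128 * 1024 * 1024    # 128MB section, as stated on the slide
SECTIONS_PER_BLOCK = 1              # default: a block contains one section
BLOCK_SIZE = SECTION_SIZE * SECTIONS_PER_BLOCK


def pages_per_section() -> int:
    """Number of regular pages in one section."""
    return SECTION_SIZE // PAGE_SIZE


def blocks_in_range(start: int, end: int) -> range:
    """Block indices covering [start, end); both ends must be block aligned."""
    if start % BLOCK_SIZE or end % BLOCK_SIZE or end <= start:
        raise ValueError("range must be non-empty and block aligned")
    return range(start // BLOCK_SIZE, end // BLOCK_SIZE)


print(pages_per_section())                                   # 32768
print(list(blocks_in_range(4 << 30, (4 << 30) + 3 * BLOCK_SIZE)))  # [32, 33, 34]
```

Because the block is the minimum unit, a range that is not block aligned is rejected here rather than rounded.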
Memory hot-add
• When adding
1. Initialize direct mapping pagetable.
2. Allocate virtual memory mapping pages.
3. Initialize virtual memory mapping pagetable.
[Figure: physical space with block X (pages invalid) and newly allocated, still empty page_structs; kernel space with NEW entries in the direct mapping (64TB) and the virtual memory map (1TB). The numbers 1 to 3 mark the steps listed above.]
Memory hot-add
• After adding
  – The newly added pages are offline and not present.
  – To online block X:
echo online >/sys/devices/system/memory/memoryX/state
[Figure: physical space with block X (pages offline) and the page_structs of block X; kernel space with the virtual memory map (1TB) and the direct mapping (64TB).]
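The sysfs write on this slide can be wrapped in a small helper. This is a minimal sketch: the function names and the `root` parameter are made up for the example (the parameter exists only so the helper can be tried against a scratch directory), and writing to the real sysfs path normally requires root privileges.

```python
from pathlib import Path

SYSFS_MEMORY = Path("/sys/devices/system/memory")
VALID_STATES = {"online", "offline"}


def get_block_state(block: int, root: Path = SYSFS_MEMORY) -> str:
    """Read the state file of memory block `block`."""
    return (root / f"memory{block}" / "state").read_text().strip()


def set_block_state(block: int, state: str, root: Path = SYSFS_MEMORY) -> None:
    """Equivalent of: echo STATE > .../memoryX/state"""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    (root / f"memory{block}" / "state").write_text(state)
```

For example, `set_block_state(32, "online")` would do the same as the `echo` command above for `memory32`.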
Memory hot-add
• After onlining pages
[Figure: physical space with block X (pages online) and the page_structs of block X; kernel space with the virtual memory map (1TB) and the direct mapping (64TB); user space with process 1 … process n (128TB each).]
Memory hot-add
• Configuration
  – mm/Kconfig

    config MEMORY_HOTPLUG
        bool "Allow for memory hot-add"
        depends on SPARSEMEM || X86_64_ACPI_NUMA
        depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
        depends on (IA64 || X86 || PPC_BOOK3S_64 || SUPERH || S390)

    config MEMORY_HOTPLUG_SPARSE
        def_bool y
        depends on SPARSEMEM && MEMORY_HOTPLUG
Memory hot-remove
• Before removing
[Figure: physical space with block X (movable pages, used), free pages elsewhere, and the page_structs of block X; kernel space with the virtual memory map (1TB) and the direct mapping (64TB); user space with process 1 … process n (128TB each).]
Memory hot-remove
• Preparation: Migrate & Offline pages
1. Unmap user pages.
2. Allocate new pages.
3. Copy data from old pages to new pages.
4. Update user processes’ pagetable.
5. Isolate old pages.
[Figure: the movable pages (used) in block X are copied to newly allocated pages outside block X; the pagetables of process 1 … process n are updating; the old pages in block X end up isolated.]
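The five preparation steps can be modelled on toy data structures. This is a sketch for illustration, not the kernel's migration code: `Page`, `Process`, and `migrate_page` are hypothetical names, and a pagetable is reduced to a dictionary from virtual address to physical frame number.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    frame: int               # physical frame number
    data: bytes = b""
    isolated: bool = False


@dataclass
class Process:
    # virtual address -> physical frame number (None while unmapped)
    pagetable: Dict[int, Optional[int]] = field(default_factory=dict)


def migrate_page(old: Page, free_frames: List[int], procs: List[Process]) -> Page:
    # 1. Unmap user pages: clear every mapping of the old frame, remembering it.
    users = []
    for proc in procs:
        for va, frame in list(proc.pagetable.items()):
            if frame == old.frame:
                proc.pagetable[va] = None
                users.append((proc, va))
    # 2. Allocate a new page (from frames outside the block being removed).
    new = Page(frame=free_frames.pop())
    # 3. Copy data from the old page to the new page.
    new.data = old.data
    # 4. Update user processes' pagetable to point at the new frame.
    for proc, va in users:
        proc.pagetable[va] = new.frame
    # 5. Isolate the old page so that nobody allocates it again.
    old.isolated = True
    return new
```

The virtual address seen by the process never changes; only the frame behind it does, which is what makes user pages migratable.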
Memory hot-remove
• Real job: Remove pages
6. Free kernel direct mapping pagetable.
7. Free virtual memory mapping pages.
8. Free virtual memory mapping pagetable.
[Figure: the direct mapping (64TB) entries, the virtual memory map (1TB) entries, and the page_structs of block X are being freed; block X is being removed. The numbers 6 to 8 mark the steps listed above.]
Memory hot-remove
• After removing
[Figure: block X is removed from physical space; its direct mapping entries, virtual memory map entries, and page_structs are freed; the pagetables of process 1 … process n (128TB each) are updated and point to the new pages (used).]
Memory hot-remove
• Post work: automatically remove the node
  – After removing CPUs or removing memory from node i, check: are all CPUs and memory removed?
  – NO: the node stays.
  – YES: node hot-remove. Free node associated data (wait_table, ……).
[Figure: node i with CPUs (in use or idle), normal memory, and movable memory (ZONE_MOVABLE); memory and CPUs are removed step by step until nothing is left on the node.]
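The check on this slide can be written down as a short sketch. The `Node` class and the function names are hypothetical; setting `removed` stands in for freeing the node associated data.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    cpus: Set[int] = field(default_factory=set)
    memory_blocks: Set[int] = field(default_factory=set)
    removed: bool = False


def try_remove_node(node: Node) -> bool:
    """Remove the node only when it has neither CPUs nor memory left."""
    if node.cpus or node.memory_blocks:
        return False          # NO: something is still on the node
    node.removed = True       # YES: free node associated data (modelled as a flag)
    return True


def remove_cpu(node: Node, cpu: int) -> bool:
    node.cpus.discard(cpu)
    return try_remove_node(node)


def remove_memory(node: Node, block: int) -> bool:
    node.memory_blocks.discard(block)
    return try_remove_node(node)
```

The check runs after every CPU or memory removal, so whichever removal empties the node last is the one that triggers node hot-remove.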
Memory hot-remove
• Configuration
  – mm/Kconfig

    config MEMORY_HOTREMOVE
        bool "Allow for memory hot remove"
        select MEMORY_ISOLATION
        select HAVE_BOOTMEM_INFO_NODE if X86_64
        depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
        depends on MIGRATION
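Memory hot-remove
• Kernel pages cannot be hot-removed
  – User pages are reached through the user mapping (128TB). They can be migrated first (1. migrate) and then hot-removed (2. hot-remove): hot-removable.
  – Kernel pages are reached through the direct mapping (64TB), which is 1-1 mapped: va = pa + offset. A variable that holds such an address cannot follow the page to a new physical address, so kernel pages are not migratable: not hot-removable.
[Figure: physical space with User pages and Kernel pages; user space (user mapping, 128TB) points to User pages; kernel space (direct mapping, 64TB) points to Kernel pages.]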
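The relation va = pa + offset explains the restriction, and a few lines make it concrete. The offset value below is an assumption for illustration (a direct mapping base that x86_64 kernels of that period commonly used); the function names are made up.

```python
PAGE_OFFSET = 0xFFFF880000000000   # assumed direct mapping base, for illustration


def direct_map_va(pa: int) -> int:
    """Kernel virtual address of a physical address in the direct mapping."""
    return pa + PAGE_OFFSET


def direct_map_pa(va: int) -> int:
    """Inverse translation: the physical address behind a direct-mapped va."""
    return va - PAGE_OFFSET


old_pa, new_pa = 0x1000, 0x200000

# A kernel variable stores the virtual address of its data.
pointer = direct_map_va(old_pa)

# If the data were copied to new_pa, the stored pointer would still translate
# to old_pa, because the translation is fixed by arithmetic, not by a pagetable
# entry that could be redirected.
still_old = direct_map_pa(pointer) == old_pa
moved_va_differs = direct_map_va(new_pa) != pointer
```

A user mapping has no such constraint: the same user virtual address can be pointed at a different frame by updating the pagetable.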
Movable node
• Problem
  – Kernel allocates ZONE_NORMAL on each node evenly.
  – Each node has ZONE_NORMAL.
  – Kernel may use ZONE_NORMAL.
  – No node is hot-removable.
[Figure: node 0 (ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE), node 1 … node i (ZONE_NORMAL, ZONE_MOVABLE), four CPUs per node. Legend: CPU in use, CPU idle, normal memory, movable memory.]
Movable node
• Solution
  – Configure a node to have only ZONE_MOVABLE.
  – Movable node has no ZONE_NORMAL.
  – Kernel cannot use ZONE_MOVABLE.
  – The node is hot-removable.
[Figure: node 0 (ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE), node 1 (ZONE_NORMAL, ZONE_MOVABLE), node i (ZONE_MOVABLE only), four CPUs per node. Legend: CPU in use, CPU idle, normal memory, movable memory.]
Movable node
• Drawback
  – Kernel cannot use local memory on a movable node.
  – Accessing memory on other nodes is slow.
  – Performance down!
[Figure: two unmovable nodes (ZONE_NORMAL, ZONE_MOVABLE) and one movable node (ZONE_MOVABLE only), four CPUs per node. Legend: CPU in use, CPU idle, CPU in use by kernel, normal memory, movable memory.]
Movable node
• Dynamic configuration
  – Offline memory: memory offline, CPU offline.
  – Online memory with one of three commands:
    1. online_kernel (NEW): Set to ZONE_NORMAL.
    2. online_movable (NEW): Set to ZONE_MOVABLE.
    3. online (Improved): Keep the original state.
echo COMMAND >/sys/devices/system/node/nodei/memoryXXX/state
  – ZONE_MOVABLE should always be after ZONE_NORMAL and never overlap it.
[Figure: node i with four CPUs, mem_section XXX and mem_section XXX+1, ZONE_NORMAL and ZONE_MOVABLE. Legend: CPU in use, CPU idle, normal memory, movable memory.]
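The ordering rule for the two zones can be expressed as a small check. This sketch models only the rule stated on the slide (ZONE_MOVABLE after ZONE_NORMAL, no overlap); the real kernel applies further conditions, and the function names are hypothetical.

```python
from typing import Dict

ZONE_FOR_COMMAND = {
    "online_kernel": "ZONE_NORMAL",
    "online_movable": "ZONE_MOVABLE",
}


def layout_is_valid(zones: Dict[int, str]) -> bool:
    """zones maps a block index to its zone; NORMAL blocks must precede MOVABLE ones."""
    normal = [b for b, z in zones.items() if z == "ZONE_NORMAL"]
    movable = [b for b, z in zones.items() if z == "ZONE_MOVABLE"]
    return not normal or not movable or max(normal) < min(movable)


def can_online(zones: Dict[int, str], block: int, command: str) -> bool:
    """Would onlining `block` with `command` keep the zone layout valid?"""
    if command not in ZONE_FOR_COMMAND:
        raise ValueError(f"unsupported command: {command!r}")
    trial = dict(zones)
    trial[block] = ZONE_FOR_COMMAND[command]
    return layout_is_valid(trial)
```

With blocks 0 and 1 in ZONE_NORMAL and block 3 in ZONE_MOVABLE, block 2 may go to either zone, while onlining block 4 with `online_kernel` would place ZONE_NORMAL after ZONE_MOVABLE and is rejected.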
Movable node
• Static configuration
  – SRAT: System Resource Affinity Table
  – Static information of NUMA architecture.
Mainly useful information in SRAT:
  – Processor Local APIC/SAPIC Affinity: APIC ID or SAPIC ID/EID, PXM (proximity domain), ……
  – Processor Local x2APIC Affinity: Local x2APIC ID, PXM (proximity domain), ……
  – Memory Affinity: Memory range, PXM (proximity domain), Hotpluggable flag, ……
[Figure: Kernel (ACPI Driver) on top of ACPI BIOS (Tables, Methods, Registers) on top of Hardware; the SRAT is one of the Tables and holds several entries of each affinity type.]
Movable node
• movablecore = acpi
  – Use SRAT to arrange ZONE_MOVABLE.
  – Only for memory hotplug users. (New)
SRAT memory affinities:
  – Node 0: unhotpluggable (ZONE_DMA, ZONE_DMA32, ZONE_NORMAL)
  – Node 1: hotpluggable (ZONE_MOVABLE)
  – ……
  – Node i: hotpluggable (ZONE_MOVABLE)
  – ……
  – Node n: unhotpluggable (ZONE_NORMAL)
Legend: normal memory, movable memory.
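How the Hotpluggable flag in the SRAT memory affinities could decide which nodes become movable is sketched below. Several things here are assumptions made for the example: the `MemoryAffinity` record keeps only four fields, the proximity domain is used directly as the node number, and a node counts as movable only when all of its ranges are hotpluggable.

```python
from dataclasses import dataclass
from typing import Dict, List

GIB = 1 << 30


@dataclass(frozen=True)
class MemoryAffinity:
    start: int           # start of the memory range
    length: int          # length of the memory range
    pxm: int             # proximity domain, used as the node number here
    hotpluggable: bool   # Hotpluggable flag


def movable_nodes(entries: List[MemoryAffinity]) -> List[int]:
    """Nodes whose memory ranges are all flagged hotpluggable."""
    flags_by_node: Dict[int, List[bool]] = {}
    for entry in entries:
        flags_by_node.setdefault(entry.pxm, []).append(entry.hotpluggable)
    return sorted(node for node, flags in flags_by_node.items() if all(flags))


srat = [
    MemoryAffinity(0 * GIB, 4 * GIB, pxm=0, hotpluggable=False),
    MemoryAffinity(4 * GIB, 4 * GIB, pxm=1, hotpluggable=True),
    MemoryAffinity(8 * GIB, 4 * GIB, pxm=2, hotpluggable=True),
    MemoryAffinity(12 * GIB, 4 * GIB, pxm=3, hotpluggable=False),
]
print(movable_nodes(srat))   # [1, 2]
```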
Movable node
• The old way (no performance loss)
  – kernelcore / movablecore = nn {G|M|K} (Old)
  – Allocate ZONE_NORMAL in each node evenly.
  – For regular users.
[Figure: node 0 (ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE), node 1 … node i … node n (ZONE_NORMAL, ZONE_MOVABLE); ZONE_NORMAL is the same size on every node. Legend: normal memory, movable memory.]
Movable node
• Configuration
  – mm/Kconfig

    config MOVABLE_NODE
        boolean "Enable to assign a node which has only movable memory"
        depends on HAVE_MEMBLOCK
        depends on NO_BOOTMEM
        depends on X86_64
        depends on NUMA
        default n
System initialization
• Problem at boot time
  – Memblock may allocate hotpluggable memory for the kernel.
  – Boot time order:
    1. Memblock is ready.
    2. Parse SRAT (too late).
    3. Initialize ZONEs.
  – Hotpluggable memory ranges are unknown while memblock is already allocating.
  – Kernel data allocated by memblock ends up in hotpluggable memory, so the movable node (ZONE_MOVABLE) is not hot-removable.
System initialization
• Solution
1. Parse SRAT earlier, before memblock starts to work.
2. Reserve hotpluggable memory with flags (DEFAULT, HOTPLUGGABLE) in memblock.
  – Boot time order:
    1. Memblock is ready. (No memory allocation.)
    2. Parse SRAT earlier.
    3. Initialize ZONEs.
  – Kernel data stays outside hotpluggable memory, so the movable node (ZONE_MOVABLE) is hot-removable.
System initialization
Boot time
1. Memblock is ready.
  – Memblock has not started to work. memblock.reserve[] is empty.
  – Any node the kernel resides in is unhotpluggable. (Not necessarily node 0.)
[Figure: memblock.memory[] holds low memory and memory1 … memory7, spread over node0 … node3 and marked as unhotpluggable or hotpluggable memory; memblock.reserve[] is empty.]
System initialization
Boot time
2. Before parsing SRAT.
  – Reserve kernel _data, _text, …, with flag MEMBLK_DEFAULT.
  – No new memory allocation, so no hotpluggable memory could be used by the kernel.
[Figure: memblock.memory[] as before; memblock.reserve[] holds a region flagged DEFAULT.]
System initialization
Boot time
3. Parsing SRAT.
  – Reserve hotpluggable memory with flag HOTPLUGGABLE.
[Figure: memblock.memory[] as before; memblock.reserve[] holds a DEFAULT region and regions flagged HOTPLUGGABLE.]
System initialization
Boot time
4. After parsing SRAT, hotpluggable memory has been reserved.
  – No hotpluggable memory used by kernel.
[Figure: memblock.memory[] as before; memblock.reserve[] holds DEFAULT regions and HOTPLUGGABLE regions.]
System initialization
Boot time
5. Memory initialization has been finished.
  – Free hotpluggable memory to buddy system. (NEW)
[Figure: the HOTPLUGGABLE regions in memblock.reserve[] are released to the buddy system free lists (order 0, 1, ……, MAX); the DEFAULT regions stay reserved.]
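The five boot time steps can be replayed on a toy allocator. This is a simplified model, not the kernel's memblock: the `Memblock` class, its first-fit `alloc`, and the address ranges are assumptions made for the example. It only demonstrates the idea that ranges reserved with a HOTPLUGGABLE flag are skipped by early allocations and handed back once memory initialization is finished.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT, HOTPLUGGABLE = "DEFAULT", "HOTPLUGGABLE"


@dataclass
class Region:
    start: int
    end: int              # exclusive
    flag: str = DEFAULT


class Memblock:
    def __init__(self, memory: List[Tuple[int, int]]) -> None:
        self.memory = memory                 # models memblock.memory[]
        self.reserved: List[Region] = []     # models memblock.reserve[]

    def reserve(self, start: int, end: int, flag: str = DEFAULT) -> None:
        self.reserved.append(Region(start, end, flag))

    def alloc(self, size: int) -> Optional[int]:
        """First-fit: lowest address that overlaps no reserved region."""
        for m_start, m_end in sorted(self.memory):
            addr = m_start
            for r in sorted(self.reserved, key=lambda r: r.start):
                if r.end <= addr or r.start >= m_end:
                    continue                 # region does not block the candidate
                if r.start - addr >= size:
                    break                    # the gap before this region is big enough
                addr = max(addr, r.end)      # skip past the reserved region
            if addr + size <= m_end:
                self.reserve(addr, addr + size)
                return addr
        return None

    def free_hotpluggable(self) -> List[Tuple[int, int]]:
        """Step 5: release HOTPLUGGABLE reservations (to the buddy system)."""
        freed = [(r.start, r.end) for r in self.reserved if r.flag == HOTPLUGGABLE]
        self.reserved = [r for r in self.reserved if r.flag != HOTPLUGGABLE]
        return freed


mb = Memblock([(0, 100), (100, 200)])    # step 1: node0 = [0,100), node1 = [100,200)
mb.reserve(0, 10)                        # step 2: kernel _text, _data with DEFAULT
mb.reserve(100, 200, HOTPLUGGABLE)       # step 3: SRAT says node1 is hotpluggable
first = mb.alloc(50)                     # step 4: kernel allocation lands on node0
second = mb.alloc(50)                    # node0 is full; node1 is off limits
freed = mb.free_hotpluggable()           # step 5
```

The second allocation fails instead of spilling into the hotpluggable node, which is the property that keeps that node hot-removable.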
Future work
Thank you!
Q&A
Memory hot-add
• Before adding
  – Blocks in the memory range are hot-added one by one.
[Figure: physical space with block X (pages invalid), free pages, and existing page_structs; in kernel space the slots for block X in the direct mapping (64TB) and the virtual memory map (1TB) are unmapped and empty.]