Memory Management Dave Eckhardt de0u@andrew.cmu.edu.

Memory Management

Dave Eckhardtde0u@andrew.cmu.edu

Synchronization

● “Pop Quiz”– What does “ld” do?

● Outline– ~ Chapter 9 (with occasional disagreement)

● Section 9.6 is particularly useful for P3

– Also read Chapter 10

Who emits addresses?

● Program counter (%cs:%eip): code area– Straight-line code– Loops, conditionals– Procedure calls

● Stack pointer (%ss:%esp, %ss:%ebp): stack area● Registers: data/bss/heap

– %ds:%eax– %es:%esi

Initialized how?

● Program counter– Set to “entry point” by OS program loader

● Stack pointer– Set to “top of stack” by OS program loader

● Registers– Code segment (“immediate” constants)– Data/BSS/heap– Computed from other values

Birth of an Address

int k = 3;int foo(void) {

return (k);}

int a = 0;int b = 12;int bar (void) {

return (a + b);

...retleavemovl _k,%eaxcode 0

12 3data 4096

0bss 8192

Image File vs. Memory Image

12 3data 4096

0bss 8192

12 3data 4096

header

Executable File Header

● Defines how image file gets loaded into memory– Sections

● Type● Memory address

● Common flavors– ELF – Executable and Linking Format (Linux)

● DWARF - Debugging With Attribute Record Format

– Mach-O – Mach Object (MacOS 10)– a.out - assembler output (primeval Unix format)

a.out File Header

● Key fields– File type (“magic”)– Text size– Data size– BSS size– Entry point

● Implicit, system-dependent: base address of text– NetBSD/i386: 0x1000 aka 4096

Other Executable File Contents

● Symbol table– Maps _main 0x1038

● Debugging information– Maps 0x1038 (file = test.c, line = 10)

● Relocation information– (below)

● Not necessary to run program– Removed via “strip” command

Multi-file Programs?

● “Link editor” combines into one image file– Unix “link editor” called “ld”

● Two jobs– Re-allocate each file's address space– Fill in references to symbols in other files

Every .o uses same address space

Combining .o's Changes Addresses

Linker uses relocation information

● Field– address, bit field size

● Field type– relative, absolute

● Field reference– symbol name

● Example– bytes 1024..1027 = absolute address of _main

Static Linking

● Must link a program before running– User program– Necessary library routines

● Duplication on disk– Every program uses printf()!

● Duplication in memory● Hard to patch every printf() if there's a bug

Dynamic Linking

● Defer “final link” as much as possible– The instant before execution

● Program startup invokes “shared object loader”– Locates library files– Adds into address space– Links, often incrementally

● Self-modifying “stub” routines

“Shared libraries”

● Extension/optimization of dynamic linking● Basic idea

– Why have N copies of printf() in memory?– Allow processes to share memory pages

● “Intelligent” mmap()

– Must avoid address-map conflicts● Can issue each library an address range● Can build libraries from position-independent code

Swapping

● Multiple user processes– Sum of memory demands exceeds system memory– Don't want say “no” too early

– Allow each process 100% of system memory

● Take turns– Temporarily evict process(es) to disk

● Not runnable● Blocked on implicit I/O request (e.g., “swaprd”)

Swapping vs. CPU Scheduling

● Textbook claims– Dispatcher notices swapped-out process

● Just before resuming execution!● Implication: huge stalls

● Common: two-level scheduling process– CPU scheduler schedules in-core processes– Swapper decides when to evict/reinstate

● [Cannot swap a process with pending DMA]

Contiguous Memory Allocation

● Goal: share system memory among processes

● Approach: concatenate in memory

Process 3

Process 4

Process 1

OS Kernel

Process 2

Logical vs. Physical Addresses

● Logical address– According to programmer, compiler, linker

● Physical address– Where your program ends up in memory– They can't all be loaded at 0x1000!

● How to reconcile?– Use linker to relocate “one last time”? - very slow– Use hardware!

Contiguous Memory Allocation

● Goal: share system memory among processes

● Approach: concatenate in memory

● Need two new CPU control registers– Memory base– Memory limit

Process 3

Process 4

Process 1

OS Kernel

Process 2

Mapping & Protecting Regions

● Program uses logical addresses 0..8192

● Memory Management Unit (MMU) maps to physical addresses

If L < limit P = base + L;

Else ERROR

Process 3

Process 4

Process 1

OS Kernel

819211009292

Allocating Regions

● Swapping out creates “holes”

● Swapping in creates smaller holes

● Various policies– First fit– Best fit– Worst fit

Process 3

Process 4

Process 1

OS Kernel

Process 2

Fragmentation

● External fragmentation– Scattered holes can't be

combined● Without costly

“compaction” step

– Some memory is unusable

Process 4

Process 1

OS Kernel

Process 2

Fragmentation

● Internal fragmentation– Allocators often round

up● 8K boundary (some

power of 2!)

– Some memory is wasted inside each segment

Process 3

Process 4

Process 1

OS Kernel

11009292

Paging

● Solve two problems– External memory fragmentation– Long delay to swap a whole process

● Divide memory more finely– Page = small logical memory region (4K)– Frame = small physical memory region– [I will get this wrong, feel free to correct me]

● Any page can map to any frame

Paging – Address Mapping

Logical Address

Page Offset

....f29f34....

Frame Offset

Page tablePhysical Address

Paging – Address Mapping

● User view– Memory is a linear array

● OS view– Each process requires N frames

● Fragmentation?– Zero external fragmentation– Internal fragmentation: maybe average ½ page

Bookkeeping

● One page table for each process● One frame table

– Manage free pages– Remember who owns a page

● Context switch– Must “activate” process's page table

● Simple on x86, still slow

Hardware Techniques

● Small number of pages?– “Page table” can be a few registers

● Typical case– Large page tables, live in memory

● Where? Processor has “Page Table Base Register”

Double trouble?

● Program requests memory access● Processor makes two memory accesses!

– Split address into page number, intra-page offset– Add to page table base register– Fetch page table entry (PTE) from memory– Add frame address, intra-page offset– Fetch data from memory

Translation Lookaside Buffer (TLB)

● Problem– Cannot afford double memory latency

● Observation - “locality of reference”– Program accesses “nearby” memory

● Solution– Cache virtual-to-physical mappings

● Small, fast on-chip memory● Don't forget context switch!

Page Table Entry (PTE) mechanics

● PTE flags– Protection

● Read/Write/Execute bits

– Valid bit– Dirty bit

● Page Table Length Register (PTLR)– Programs don't use entire virtual space– On-chip register detects out-of-bounds reference

● Allows small PTs for small processes

Page Table Structure

● Problem– Assume 4 KByte pages, 4 Byte PTEs– Ratio: 1000:1

● 4 GByte virtual address (32 bits) -> 4 MByte page table● For each process!

● Solutions– Multi-level page table– Hashed page table– Inverted page table

Multi-level page table

P1 Offset

....f29f34f25

Frame Offset

PageTables

....f99f87....

P2 ....f08

f07....

PageDirectory

Multi-level page table

P1 Offset

....f29f34f25

Frame Offset

PageTables

....f99f87....

Hashing & Clustering

● Hashed Page Table– PT is “just” a hash table

● Bucket chain entries: virtual page #, frame #, next-pointer

– Useful for sparse PTs (64-bit addresses)● Clustering

– Hash table entry is a miniature PT● e.g., 16 PTEs● Entry can map 1..16 (aligned) pages

Inverted page table

● Problem– Page table size depends on virtual address space– N processes * large fixed size

● Observation– Physical memory (# frames) is a boot-time constant– No matter how many processes!

● Approach– One PTE per frame, maps (process #, page#) to index

Inverted Page Table

Logical Address

Page Offset

....#1 p29#3 p34....

Frame i Offset

Hash table Physical AddressPid #

Segmentation

● Physical memory is (mostly) linear● Is virtual memory linear?

– Typically a set of regions● “Module” = code region + data region● Region per stack● Heap region

● Why do regions matter?– Natural protection boundary– Natural sharing boundary

Segmentation: Mapping

Seg # Offset

Physical AddressLimit Base

Segmentation + Paging

● 80386 (does it all!)– Processor address directed to one of six segments

● CS: Code Segment, DS: Data Segment– CS register holds 16-bit selector

● 32-bit offset within a segment -- CS:EIP

– Table maps selector to segment descriptor– Offset fed to segment descriptor, generates linear

address– Linear address fed through segment's page table

● 2-level, of course

Is there another way?

● Could we have no page tables?● How would hardware map virtual to physical?

Software TLBs

● Reasoning– We need a TLB for performance reasons– OS defines each process's memory structure

● Which memory ranges, permissions

– Why impose a semantic middle-man?● Approach

– TLB miss generates special trap– OS quickly fills in correct v->p mapping

Software TLB features

● Mapping entries can be computed many ways– Imagine a system with one process memory size

● TLB miss becomes a matter of arithmetic

● Mapping entries can be locked in TLB– Great for real-time systems

● Further reading– http://yarchive.net/comp/software_tlb.html

Summary

● Processes emit virtual addresses– segment-based or linear

● A magic process maps virtual to physical● No, it's not magic

– Address validity verified– Permissions checked– Mapping may fail temporarily (trap handler)– Mapping results cached in TLB

Memory Management Dave Eckhardt de0u@andrew.cmu.edu.

Documents

Transcript of Memory Management Dave Eckhardt de0u@andrew.cmu.edu.

LEMAN AKOGLU - andrew.cmu.edu

1 Introduction to CS 438 Nathaniel Wesley Filardo nwf@cs.jhu.edu (Slides lovingly stolen from Dave Eckhardt )de0u@andrew.cmu.edu.

1 Transactions Dave Eckhardt de0u@andrew.cmu.edu.

JEREMY AVIGAD - andrew.cmu.edu

Memory Management Dave Eckhardt

Carnegie Mellon University ms35@andrew.cmu.edu University ...

Software - MITweb.mit.edu/sis07/www/forman.pdfSoftware Ashish *Arora, Chris Forman Carnegie Mellon University ashish@andrew.cmu.edu, cforman@andrew.cmu.edu Jiwoong Yoon University

Abstract arXiv:1707.09641v2 [cs.LG] 16 Aug 2017mmv/papers/17icmlw-sandeep.pdf · Lengerich , Sandeep Konam . fore, machine

1 Host naming Dave Eckhardt de0u@andrew.cmu.edu. 1 Synchronization ● P3 survey: shell? ● No class Monday – Faculty candidate talk – Click, From Extensible.

In-situ materials characterization for planetary exploration1 Eng Keong Chua echua@andrew.cmu.edu echua@andrew.cmu.edu 16722 20090427.

1 Security Overview Dave Eckhardt de0u@andrew.cmu.edu.

1 Deadlock (2) Dave Eckhardt de0u@andrew.cmu.edu.

Dave Eckhardt de0u@andrew.cmu - Carnegie Mellon University

1 Review Dave Eckhardt de0u@andrew.cmu.edu. 1 Synchronization ● Exam will be closed-book ● Who is reading comp.risks? ● Some homework questions on.qa.

STRENG - andrew.cmu.edu

1 Memory Hierarchy Dave Eckhardt

Don Slater dslater@andrew.cmu.edu Wanda Dann wpdann@andrew.cmu.edu Joint Operations Copyright 2012 Wanda Dann, Don Slater All rights reserved.

1 Review 2 Dave Eckhardt de0u@andrew.cmu.edu. 1 Synchronization ● Exam will be closed-book – But you may bring a 1-sided 8.5x11 sheet of notes ● 6 point.

1 File System (Internals) Dave Eckhardt de0u@andrew.cmu.edu.

1 Protection Dave Eckhardt de0u@andrew.cmu.edu. 1 Synchronization ● Please fill out P3/P4 registration form – We need to know whom to grade when ● Debugging.