Introduction to Software
(As Needed to Understand Virtualization)
Latency Comparison Numbers
--------------------------
L1 cache reference                         0.5 ns
Branch mispredict                            5 ns
L2 cache reference                           7 ns
Mutex lock/unlock                           25 ns
Main memory reference                      100 ns
Compress 1K bytes with Zip               3,000 ns
Send 1K bytes over 1 Gbps network       10,000 ns
Read 4K randomly from SSD*             150,000 ns  (~1 GB/s)
Read 1 MB sequentially from memory     250,000 ns
Round trip within same datacenter      500,000 ns
Read 1 MB sequentially from SSD*     1,000,000 ns  (~1 GB/s)
Disk seek                           10,000,000 ns
Read 1 MB sequentially from disk    20,000,000 ns
Send packet CA->Netherlands->CA    150,000,000 ns
[Diagram: Processor (CPU), Memory (RAM), and I/O devices (disk controller, keyboard controller, network controller/NIC) connected over the system BUS]
NUMA latencies???
Hardware (CPU, Memory, Disk, I/O)
Operating System Kernel
Userland Applications
Hardware
Software
System Call Interface
What does this mean? A type of software sitting on top of another??
Operating System Kernel
Userland Applications (these are “running”)
System Call Interface
Scheduler
Drivers
File system services
Memory management
CPU management
• All this means is that OS/Kernel has more direct access to lower layers (i.e., the hardware)
• This gives no real indication of what is happening temporally (in terms of time) or spatially (how things are laid out)…
Process 2
Process 3
Process 4
Process N
More Accurate Picture
Operating System Kernel
Process 1
System Call Interface
Scheduler
Drivers
File system services
Memory management
CPU management
…
• Now, what does this mean?
• I count N+1 programs running (N processes, 1 kernel).
• Are they running at the same time? How?
MultiProgramming (Not MultiProcessing)
REVIEW: MultiProcessing enabled HARDWARE/PHYSICAL parallelism
– Multiple cores ran different programs at the same time
– Simultaneous multithreading extends multiprocessing (by up to 2x) by making each physical core simulate 2 logical cores (in hardware)
• But even before MultiProcessing was implemented in computers, an innovation already existed that allowed running multiple programs concurrently
– How?
– (At this point, please figure out the difference between the meanings of the two purple words used above)
Magic with even 1 core/processor
• Idea: Time-division multiplex multiple programs (aka processes) onto the single core.
Core/Processor
Memory
Registers
PC
Program 1
Program 2
Program 2
Kernel
The Program Counter (PC) holds the memory address of the next instruction of the currently running program. The registers in the core AND the data portions of the program in memory hold the “state” of the program.
Data
Code
Other Logic (e.g. ALU)
MultiProgramming
• Have many programs loaded in main memory
– Call the set of all such programs the “running” programs
– They are “apparently running at the same time”
– Give these “running” programs small time slices of execution time on the processor/core
• Each such switch is called a “context switch”
• Note: the state of any program is roughly:
PC (inside core) + Registers (inside core) + Data (in mem)
• Thus we must save the PC + registers of a process when it is context switched out, so that later resuming it from the exact same state is possible (the kernel saves each process’s “context” within its own memory)
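The save-context/restore-context cycle above can be sketched as a toy round-robin scheduler. This is a hypothetical model, not real kernel code: a process’s “context” is reduced to just a PC and a few registers, and a time slice is simulated rather than driven by a real timer interrupt.

```python
# Toy sketch of multiprogramming on a single core (hypothetical model):
# each process's "context" is just its PC and registers; the "kernel"
# saves/restores that context on every switch.

from collections import deque

class Process:
    def __init__(self, name):
        self.name = name
        self.pc = 0          # saved program counter
        self.regs = [0] * 4  # saved general-purpose registers

def run_slice(proc, ticks):
    """Pretend to execute `ticks` instructions: the PC advances and a
    register accumulates some work."""
    proc.pc += ticks
    proc.regs[0] += ticks
    return proc

def schedule(processes, slices):
    """Round-robin scheduler: give each process one time slice per turn,
    context-switching (save/restore) between them."""
    ready = deque(processes)
    trace = []
    for _ in range(slices):
        proc = ready.popleft()      # "restore" the next process's context
        run_slice(proc, ticks=10)   # it runs until the timer tick
        trace.append(proc.name)     # then its context is "saved" ...
        ready.append(proc)          # ... and it rejoins the ready queue
    return trace

procs = [Process("P1"), Process("P2"), Process("P3")]
print(schedule(procs, slices=6))   # ['P1', 'P2', 'P3', 'P1', 'P2', 'P3']
print(procs[0].pc)                 # 20: P1 got two slices of 10 "instructions"
```

Note that because each `Process` object survives between slices, resuming really does continue from the exact saved state, which is the whole point of saving the context.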
Context Switches on Timeline

[Timeline diagram: over TIME, processes P1, P2, P3, P4 take turns on the CPU. Most context switches happen on timer “ticks”; some happen on system calls, including a blocking “sys call (block)”; one happens when P1 exits.]

Notice: Some system calls “block” a process and immediately cause a context switch. Can you imagine which types of system calls lead to this?
MultiProgramming == Apparent Running of Multiple Programs
• In single processor (prehistoric) scenario, there is only ever 1 process running at any moment in time (i.e., in running state).
• But all the processes in ready state plus the single running state process are “sort of” “running” at the same time (i.e., concurrently) because of context switches of CPU state plus the simultaneous existence of their CODE/DATA segments in memory.
• Moreover, the asynchronous behaviour of certain IO requests (interleaved with other processes’ CPU-bound activity) almost allows for intermittent true parallelism (think hard about this…)
Interlude: Virtual Memory
• Virtual memory is not about virtualization
• It was introduced long before the wide-scale adoption of virtualization, and is more “required” by and/or “intertwined” with MultiProgramming
Memory
Program 1
Program 2
Program 2
Kernel
One of many problems solved by virtual memory:
COMPILE TIME vs. LOAD TIME PROBLEM

A program starts its life LONG before it is actually “run” and “loaded” into memory/CPU. It starts its life when it is compiled from source code to create an executable binary. But how does the compiler convert source into assembly language instructions (which have to reference specific locations in memory) if it has no idea where the program will actually be loaded in memory when it runs?
Virtual Memory
• We compile programs as if they own the entire memory space (i.e., all the addresses), and thus can start referring to memory addresses from address 0x0000 (instead of guessing where the “start” of the process’s memory region is)
• i.e., the assembly/machine language instructions coded into executables assume that there is no kernel, no multiprogramming, and that the process will see the entire memory space.
E.g., each process “thinks” this is what is loaded in memory (i.e., it does not know about, nor does it have access to, anything else in memory, and thinks its memory space starts at the first address):
Program 1 Program 2 Program 3
Process 1’s thought of what’s in memory
Process 2’s thought of what’s in memory
Process 3’s thought of what’s in memory
0x0000 0x0000 0x0000
Address Translation
• The assembly/machine instructions inside a process, after it is loaded, refer to virtual addresses
• Translation is done each time memory is accessed during the execution of a process
• Virtual addresses are position independent (can start at 0x0000)
• The kernel maintains a “page table” for each process, which is used to convert virtual memory addresses to physical addresses:
Program 1
Process 1’s virtual memory
Memory
Program 1
Program 2
Program 2
Kernel
0x0000 → 0xF5AB (address translation)
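The translation step in the diagram above can be sketched in a few lines. This is a toy model, not real MMU hardware: the page table is just a dictionary with made-up frame numbers, and the 4 KB page size matches the Linux default mentioned later in these slides.

```python
# Toy address translation (a sketch, not real MMU hardware): split the
# virtual address into (virtual page number, offset), look the page
# number up in the per-process page table, and glue the physical frame
# number back onto the unchanged offset.

PAGE_SIZE = 4096              # 4 KB pages, as on Linux by default
OFFSET_BITS = 12              # 4096 == 2**12

# Hypothetical page table for one process: virtual page -> physical frame.
page_table = {0: 5, 1: 9, 2: 3}

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS           # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)     # byte offset within the page
    if vpn not in page_table:
        raise LookupError("page fault")  # page not in RAM (e.g., on swap)
    return (page_table[vpn] << OFFSET_BITS) | offset

# Virtual address 0x0000 can land anywhere in physical RAM:
print(hex(translate(0x0000)))   # 0x5000 (frame 5, offset 0)
print(hex(translate(0x1004)))   # 0x9004 (frame 9, offset 4)
```

Note how the offset passes through untranslated: only the page number changes, which is exactly why translating at page granularity is enough.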
Page Tables
• In fact, now that we have this ADDRESS TRANSLATION map, we can go further:
– Relax the need for a process’s virtual memory to map onto contiguous regions of real memory
– Not all virtual memory of a process has to map to actual RAM
• i.e., we can map some parts of a process’s virtual memory to disk
– Further, this translation map is constantly changing during the execution of processes (e.g., moving less-used regions of virtual memory to disk requires a modification of this map/table)
Reality
Region of disk that stores excess pages not currently residing in main memory is called SWAP space (or swap file(s)).
Notice actual physical memory (i.e., RAM) holds memory related to many programs (processes and kernel), but that’s transparent to any single process’s virtual memory space.
More Real Reality
• This map having a granularity of “per memory address” would lead to a map/table that has 2^64 entries on 64-bit systems (big deal?)
• Better to map “pages” (i.e., blocks) of memory:
– By default, 4 KBytes is the page size on Linux
– How does this work? 4K = 2^12. So only map the highest (64-12) = 52 bits of the virtual address.
On older 32-bit systems, the top (32-12) = 20 bits were used to map:
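The bit arithmetic on this slide can be checked directly; the snippet below just reproduces the slide’s numbers for 64-bit and 32-bit addresses with 4 KB pages.

```python
# The slide's arithmetic: mapping every byte would need 2**64 entries,
# but mapping 4 KB pages only needs the top bits of the address.

PAGE_SIZE = 4 * 1024
offset_bits = PAGE_SIZE.bit_length() - 1   # log2(4096) == 12

for addr_bits in (64, 32):
    vpn_bits = addr_bits - offset_bits     # bits the page table must map
    print(f"{addr_bits}-bit: map top {vpn_bits} bits "
          f"({2 ** vpn_bits} possible pages)")
# 64-bit: map top 52 bits; 32-bit: map top 20 bits (older systems)
```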
Extras: Terminology
• A “page fault” refers to an address lookup/translation that lands on swap space (i.e., on disk) instead of main memory
– This triggers the kernel to do several things, including loading the page from swap (called “swapping in”) into memory and updating the page table
– Occasionally, this requires swapping out a page that is residing in memory
• On some computing systems, a feature called HUGE PAGES exists, where some pages have sizes in the MBs or GBs
– This leads to smaller page tables (more likely to fit into TLBs…)
• Page tables exist in kernel memory, so it is inefficient to look in RAM on every address translation. Instead, a special-purpose cache is used to hold heavily used portions of a process’s page table (Translation Lookaside Buffer, aka TLB)
MultiProcessing + Simultaneous MultiThreading + MultiProgramming:

[Diagram: one Socket/CPU containing 2 physical Cores; each Core exposes 2 Logical Cores, for 4 total (LCORE1, LCORE2, LCORE3, LCORE4)]
Discuss: How many processes can run at same time?
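One way to frame the discussion question: the number of processes truly running at the same instant is bounded by the number of logical cores; any concurrency beyond that comes from multiprogramming. The snippet below uses the numbers from the slide’s diagram (1 socket, 2 cores, 2-way SMT, all assumptions from the figure) and then asks the OS what it actually sees.

```python
# The slide's picture: 1 socket x 2 physical cores x 2 SMT threads
# gives 4 logical cores, so at most 4 processes run at the same
# instant; all further concurrency comes from multiprogramming.

import os

sockets, cores_per_socket, smt = 1, 2, 2   # numbers from the slide's diagram
logical_cores = sockets * cores_per_socket * smt
print(logical_cores)                        # 4

# On a real machine, ask the OS how many logical cores it sees:
print(os.cpu_count())
```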
Now do we understand what this really means spatially/temporally?
Process 2
Process 3
Process 4
Process N
Operating System Kernel
Process 1
System Call Interface
Scheduler
Drivers
File system services
Memory management
CPU management
…
Next Time
• Another evolution of parallelism/concurrency via “software threads”
i.e., the software layer of the computing system is more like this (another small modification in abstraction):
Thread 2 of Process 1
Process 2 (Single thread)
Thread k-1 of Process N
Thread k of Process N
Operating System Kernel
Thread 1 of Process 1
System Call Interface
Scheduler
Drivers File system services
Memory management
CPU management
…
• Jump into the big revelation: How OS virtualization works…
• Quick intro to KVM