Lecture 10 – Memory Operation and Performance


Transcript of Lecture 10 – Memory Operation and Performance

Page 1: Lecture 10 –  Memory Operation  and Performance

1

Lecture 10 – Memory Operation and Performance

•Caches – review of some key concepts

•Virtual Memory (VM)

Page 2: Lecture 10 –  Memory Operation  and Performance

2

Example of a matrix

int data[M][N];

for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        sum += data[i][j];
    }
}

This is an M×N matrix.

Page 3: Lecture 10 –  Memory Operation  and Performance

3

Row-major and column-major – note the sequence

[Figure: the order in which the elements of the array data are visited under row-major and column-major access]

Page 4: Lecture 10 –  Memory Operation  and Performance

4

Accessing an array in column-major order

Page 5: Lecture 10 –  Memory Operation  and Performance

5

Accessing data by row is faster

It is faster because once the program accesses [0,0], the whole cache line is loaded, bringing [0,1], [0,2], … up to [1,3] into the cache as well (here, a 32-byte line holding eight 4-byte ints of a 4-column matrix).

Row major is faster than column major

Page 6: Lecture 10 –  Memory Operation  and Performance

6

Changing the order of the iterations is not always better. Below is an example.

int original[M][N];

int transposed[N][M];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        transposed[i][j] = original[j][i];
    }
}

Page 7: Lecture 10 –  Memory Operation  and Performance

7

Effect of rotating the shape

Rotate by 90 degrees

Page 8: Lecture 10 –  Memory Operation  and Performance

8

Insufficient Temporal Locality

// The solution is to process the matrix in square blocks (tiles) that fit in the cache
int original[M][N];
int transposed[N][M];

for (k = 0; k < N / m; k++) {
    for (l = 0; l < M / n; l++) {
        for (i = k*m; i < (k+1)*m; i++) {
            for (j = l*n; j < (l+1)*n; j++) {
                transposed[i][j] = original[j][i];
            }
        }
    }
}

Page 9: Lecture 10 –  Memory Operation  and Performance

9

Blocked transpose gets around cache misses

m and n define a square block (m = n), whose size is determined by the cache line size – say, 32 bytes.

Page 10: Lecture 10 –  Memory Operation  and Performance

10

Virtual memory – Glossary

thrashing (n.) A phenomenon of virtual memory systems that occurs when the program, by the manner in which it references its data and instructions, regularly causes the next memory locations referenced to be overwritten by recent or current instructions. The result is slow performance.

thread (n.) A lightweight or small-granularity process.

tiling (n.) A regular division of a mesh into patches, or tiles. Tiling is the most common way to do geometric decomposition.

Page 11: Lecture 10 –  Memory Operation  and Performance

11

Virtual Memory

virtual memory (n.) A system that stores portions of an address space that are not being actively used. When a reference is made to a value not presently in main memory, the virtual memory manager must swap some values in main memory for the values required. Virtual memory is used by almost all uniprocessors and multiprocessors, but not by array processors and multicomputers. Multicomputers still employ only real memory storage on each node.

Page 12: Lecture 10 –  Memory Operation  and Performance

12

Virtual Memory (VM)

The term virtual memory refers to a combination of hardware and operating system software that solves several computing problems. It receives a single name because it is a single mechanism, but it meets several goals:

•To simplify memory management and program loading, by providing virtual addresses.

•To allow multiple large programs to be run without the need for large amounts of RAM, by providing virtual storage.

Page 13: Lecture 10 –  Memory Operation  and Performance

13

Virtual Addresses

Segmentation – group pages together into segments of different sizes

Memory Protection – because more than one process is supported, each process's memory must be protected from corruption by the others

Paging – use the same fixed page size on disk and in memory, loading pages into memory and writing them back to disk. Computers hold several programs in memory at the same time.

Page 14: Lecture 10 –  Memory Operation  and Performance

14

Paging and Segmentation

[Figure: a region of memory divided into 16K pages]

Page 15: Lecture 10 –  Memory Operation  and Performance

15

Memory Protection

If there is more than one process (program) in memory, the programs must be protected from being modified by one another.

[Figure: Program 1 and Program 2 resident in memory]

Page 16: Lecture 10 –  Memory Operation  and Performance

16

Contradictory facts about VM:

•The compiler determines the address at which a program will execute, by hard-wiring many addresses of variables and instructions into the machine code it generates.

•The location of the program is not determined until the program is executed, and may be anywhere in main memory.

[Figure: Program 1 and Program 2 resident in memory]

Page 17: Lecture 10 –  Memory Operation  and Performance

17

Solution to contradictory facts

Code Relocation: Have the compiler generate addresses relative to a base address, and change the base address when the program is executed. This means that the address of each reference is calculated explicitly by adding the relative address to the base address. The drawback is the cost of this extra addition on every memory reference.

Address Translation: At run time, provide programs the illusion that there are no other programs in memory. Compilers can then generate any absolute address they wish. Two programs may contain references to the same address without interference.

Page 18: Lecture 10 –  Memory Operation  and Performance

18

Virtual and Physical Addresses

The addresses issued by the compiler are called virtual addresses.

The addresses that result from the translation are called physical addresses, because they refer to an actual memory chip.

Page 19: Lecture 10 –  Memory Operation  and Performance

19

Multiple programs without relocation

Page 20: Lecture 10 –  Memory Operation  and Performance

20

Relocatable code can share memory

Page 21: Lecture 10 –  Memory Operation  and Performance

21

Segment

A segment is a region of the address space of varying length. In the next figure, there are two segments, one used to store program A and the other, program B. Each segment can be mapped to a region of physical memory independently, as shown, but the whole segment has to be translated as one contiguous (continuous) chunk.

Page 22: Lecture 10 –  Memory Operation  and Performance

22

Segment address translation

Page 23: Lecture 10 –  Memory Operation  and Performance

23

Memory Protection

Memory protection keeps one program's memory from being modified by others. This is important not only to prevent malicious attacks or eavesdropping but also to contain unintended catastrophic errors. If a computer has ever frozen or crashed on you, you have probably experienced a bug in one program careening out of control and trampling over the memory of other programs as well as that of the operating system. Address translation is the foremost tool in preventing such behavior.


Page 24: Lecture 10 –  Memory Operation  and Performance

24

Paging

The allocation of memory in chunks of varying size causes external fragmentation.

To solve this problem, we can change the nature of the address translation so that, instead of mapping virtual to physical addresses in big chunks of varying size, it maps them in small chunks of constant size, called pages.

Page 25: Lecture 10 –  Memory Operation  and Performance

25

An example of Paging

Page 26: Lecture 10 –  Memory Operation  and Performance

26

Page fault

The page needed is not in memory. The operating system will load it from the disk (virtual memory). It takes time to load from disk, so performance drops. Performance is measured in terms of the number of page faults: a program with 10 page faults is better than a program with 20 page faults.


Page 27: Lecture 10 –  Memory Operation  and Performance

27

Page fault – the page is not in main memory and has to be loaded from disk

Page 28: Lecture 10 –  Memory Operation  and Performance

28

Working Sets

The working set of a program is the set of memory pages that the program is currently using actively. The principle of locality suggests that the working set of a program will be, at any given time, much smaller than the memory used by the program over its lifetime. The working set will change as the program executes, both in the exact pages that are members of it and in the number of pages. The working set will expand and contract as the program's locality becomes more or less constrained. It is the size of the working set that is important in choosing a victim program to swap out.


Page 29: Lecture 10 –  Memory Operation  and Performance

29

Thrashing

When the available memory is smaller than the working set, the operating system must repeatedly re-load pages into the same memory locations. Performance suffers because this creates many collisions.

The computer will be doing a lot of work moving pages back and forth between memory and disk, but no useful work will get done. This situation is often referred to as thrashing.

The CPU is busy but not productive, as it spends its time loading data rather than executing it.

Page 30: Lecture 10 –  Memory Operation  and Performance

30

Thrashing – here, the program has insufficient memory to execute, so pages must repeatedly be loaded from disk

Pages are swapped in and out continually

Page 31: Lecture 10 –  Memory Operation  and Performance

31

Relationship between working set and page fault

It is better to keep the number of page faults small

Page 32: Lecture 10 –  Memory Operation  and Performance

32

Impact of VM on Performance

int data[M][N];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];   // column major – more page faults
    }
}

Page 33: Lecture 10 –  Memory Operation  and Performance

33

Impact of VM on Performance

int data[M][N];

for (j = 0; j < M; j++) {
    for (i = 0; i < N; i++) {
        sum += data[j][i];   // row major – fewer page faults
    }
}

Page 34: Lecture 10 –  Memory Operation  and Performance

34

Summary

Make use of the cache line size – each memory access loads a full line of 32 or 64 bytes into the cache, so arrange data to use all of it

Understand row-major versus column-major access to gain performance

Try to reduce page faults (a page fault means the page is not in main memory and the CPU has to load it from disk)

Page 35: Lecture 10 –  Memory Operation  and Performance

35

Operating System Interaction

Dynamic Linking

Time-Sharing

Threads

Page 36: Lecture 10 –  Memory Operation  and Performance

36

Dynamic Linking

•Libraries

•Dynamic-Link Libraries (DLLs)

Example of DLL

Page 37: Lecture 10 –  Memory Operation  and Performance

37

Libraries

Almost all programs are composed from many separately compiled units. When you write a single-file program, it is compiled to a representation of machine instructions called an object file. For example, Visual C++ creates an .obj file from your C++ source code. The .obj file may seem to be a complete program, but there is much more code required to make it complete.

[Figure: your code combined with library code]

Page 38: Lecture 10 –  Memory Operation  and Performance

38

Reasons for using libraries

1. Many functions, such as memory allocation, do not require special privileges to perform, and they do not take much CPU time. If these functions were invoked using a time-consuming system call, it would have a dramatic impact on performance. It is much faster to implement them as simple functions.

2. These functions are language specific. OSs are language independent, and it would greatly complicate the OS to provide run-time support for all languages, even if that were possible.

3. Even when system calls are required, some additional "glue" code is needed to translate between the standard language interface, such as printf() or operator <<, and the calling convention that is needed to set up parameters and invoke a trap instruction.

Don’t memorise

Page 39: Lecture 10 –  Memory Operation  and Performance

39

Library in Visual C++

Page 40: Lecture 10 –  Memory Operation  and Performance

40

Example of linking

Page 41: Lecture 10 –  Memory Operation  and Performance

41

Explanation – static linking

In the above diagram, the application object file has to be linked with malloc and the code that calls main() to form an executable (exe) file.

[Figure: static linking – your code and the library are combined before run time]

Page 42: Lecture 10 –  Memory Operation  and Performance

42

Dynamic-Link Libraries (DLLs)

Dynamic linking means that linking is performed on demand at run time. An advantage of dynamic linking is that executable files can be much smaller than statically linked executables. Of course, the executable is not complete without all of the associated library files, but if many executables share a set of libraries, there can be a significant overall savings.

[Figure: dynamic linking – the library is combined with your code at run time]

Page 43: Lecture 10 –  Memory Operation  and Performance

43

Advantage of DLL (1)

In most systems, the space savings extend to memory.

When libraries are dynamically linked, the operating system can arrange to let applications share the library code so that only one copy of the library is loaded into memory.

With static linking, each executable is a monolithic binary program. If several programs are using the same libraries, there will be several copies of the code in memory.

[Figure: several programs sharing one copy of the library at run time]

Don’t memorise

Page 44: Lecture 10 –  Memory Operation  and Performance

44

Advantage of DLL(2)

Another potential memory savings comes from the fact that dynamically linked libraries do not necessarily need to be loaded. For example, an image editor may support input and output of dozens of file formats. It could be expensive (and unnecessary) to link conversion routines for all of these formats.

With dynamic linking, the program can link code as it becomes useful, saving time and memory. This can be especially useful in programs with ever growing lists of features.

Page 45: Lecture 10 –  Memory Operation  and Performance

45

Disadvantage of DLL

First, there are version problems. Like all software, libraries tend to evolve. New libraries may be incompatible with old libraries in subtle ways. If you update the libraries used by a program, it may have good, bad, or no effects on the program's behavior. In contrast, a statically linked program will never change its behavior unless the entire program is relinked and installed.

Page 46: Lecture 10 –  Memory Operation  and Performance

46

Summary

Dynamic linking combines the program with its libraries at run time.

It reduces program size, but can cause version problems.