Lecture 10 – Memory Operation and Performance

Post on 16-Jan-2016


Transcript of Lecture 10 – Memory Operation and Performance

1

Lecture 10 – Memory Operation and Performance

•Caches – repeat some concepts

•Virtual Memory (VM)

2

Example of a matrix

int data[M][N];
for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        sum += data[i][j];
    }
}

This is an MxN matrix (M rows by N columns).

3

Row-major and Column-major – note the sequence

[Diagram: sequence of access through the data array – row major versus column major]

4

Accessing data in column-major order

5

Accessing row data – is faster

It is faster because once the program accesses [0,0], the cache loads [0,1], [0,2], … up to [1,3] into the same cache line, so the following accesses are cache hits.

Row-major access is faster than column-major access.

6

Changing the order of the iterations does not always help. Below is an example.

int original[M][N];
int transposed[N][M];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        transposed[i][j] = original[j][i];
    }
}

7

Effect of rotating the shape

Transposing rotates the access pattern by 90 degrees: reading original row by row means writing transposed column by column, so one of the two arrays is always accessed with poor locality.

8

Insufficient Temporal Locality

// solution: process the arrays in square blocks (tiles) that fit in the cache
int original[M][N];
int transposed[N][M];

for (k = 0; k < N / m; k++) {
    for (l = 0; l < M / n; l++) {
        for (i = k*m; i < (k+1)*m; i++) {
            for (j = l*n; j < (l+1)*n; j++) {
                transposed[i][j] = original[j][i];
            }
        }
    }
}

9

Blocked transpose gets around cache misses

m and n are chosen so that an m-by-n block fits in the cache; they are determined by the cache line size, say 32 bytes (8 ints per line).

10

Virtual memory – Glossary

thrashing (n.) A phenomenon of virtual memory systems that occurs when a program, by the manner in which it references its data and instructions, regularly causes the next memory locations referenced to overwrite recently loaded ones. The result is slow performance.

thread (n.) A lightweight or small-granularity process.

tiling (n.) A regular division of a mesh into patches, or tiles. Tiling is the most common way to do geometric decomposition.

11

Virtual Memory

virtual memory (n.) A system that keeps portions of an address space that are not being actively used out of main memory. When a reference is made to a value not presently in main memory, the virtual memory manager must swap some values in main memory for the values required. Virtual memory is used by almost all uniprocessors and multiprocessors, but not by array processors and multicomputers. Multicomputers still employ real memory storage only on each node.

12

Virtual Memory (VM)

The term virtual memory refers to a combination of hardware and operating system software that solves several computing problems. It receives a single name because it is a single mechanism, but it meets several goals:

•To simplify memory management and program loading by providing virtual addresses.

•To allow multiple large programs to be run without the need for large amounts of RAM, by providing virtual storage.

13

Virtual Addresses

Segmentation – groups pages together into regions of different sizes.

Memory Protection – because more than one process is supported, each process's memory must be protected from corruption by the others.

Paging – uses the same fixed chunk size on disk and in memory, loading pages from disk into memory and writing them back from memory to disk. Computers hold several programs in memory at the same time.

14

Page and Segmentation

[Diagram: memory divided into fixed-size 16K pages]

15

Memory Protection

If there are two or more processes (programs in memory), each program must be protected from being modified by the others.

[Diagram: Program 1 and Program 2 sharing memory]

16

Contradictory facts about VM:

The compiler determines the address at which a program will execute by hard-wiring many addresses of variables and instructions into the machine code it generates. Yet the location of the program is not determined until the program is executed, and it may be anywhere in main memory.


17

Solution to contradictory facts

Code Relocation: Have the compiler generate addresses relative to a base address, and change the base address when the program is loaded. The drawback is that the address of each reference must then be calculated explicitly by adding the relative address to the base address.

Address Translation: At run time, give programs the illusion that there are no other programs in memory. Compilers can then generate any absolute addresses they wish, and two programs may contain references to the same address without interference.

18

Virtual and Physical Addresses

The addresses issued by the compiler are called virtual addresses.

The addresses that result from the translation are called physical addresses, because they refer to an actual memory chip.

19

Multiple programs without relocation

20

Relocatable code can share memory

21

Segment

A segment is a region of the address space of varying length. In the next figure, there are two segments, one used to store program A and the other, program B. Each segment can be mapped to a region of physical memory independently, as shown, but the whole segment has to be translated as one contiguous chunk.

22

Segment address translation

23

Memory Protection

Memory protection keeps one program from modifying the memory of another. This is important not only to prevent malicious attacks or eavesdropping but also to contain unintended catastrophic errors. If a computer has ever frozen or crashed on you, you have probably experienced a bug in one program careening out of control and trampling over the memory of other programs as well as that of the operating system. Address translation is the foremost tool in preventing such behavior.


24

Paging

The allocation of memory into chunks of varying size causes external fragmentation.

To solve this problem we can change the nature of the address translation so that, instead of mapping virtual to physical addresses in big chunks of varying size, it maps them in small chunks of constant size, called pages.

25

An example of Paging

26

Page fault

A page fault occurs when the page needed is not in memory. The operating system must load it from disk, which takes time, so performance drops. Performance is often measured in terms of the number of page faults: a program with 10 page faults is better than a program with 20 page faults.


27

Page fault – the page is not in main memory and has to be loaded from disk

28

Working Sets

The working set of a program is the set of memory pages that the program is currently using actively. The principle of locality suggests that the working set of a program will be, at any given time, much smaller than the memory used by the program over its lifetime. The working set will change as the program executes, both in the exact pages that are members of it and in the number of pages. It will expand and contract as the program's locality becomes more or less constrained. It is the size of the working set that is important in choosing a victim program to swap out.


29

Thrashing

When available memory is smaller than the working set, the operating system must repeatedly reuse the same memory locations for different pages. Performance suffers because pages that will soon be needed again keep being evicted.

The computer will be doing a lot of work moving pages back and forth between memory and disk, but no useful work will get done. This situation is often referred to as thrashing.

CPU is busy but is not productive, as it loads data without executing

30

Thrashing – here, the program has insufficient physical memory to execute, so it continually swaps pages in and out between memory and disk

31

Relationship between working set and page faults

It is better to keep the number of page faults small by keeping the working set in memory

32

Impact of VM on Performance

int data[M][N];
for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];
    }
} // column major – more page faults

33

Impact of VM on Performance

int data[M][N];
for (j = 0; j < M; j++) {
    for (i = 0; i < N; i++) {
        sum += data[j][i];
    }
} // row major – fewer page faults

34

Summary

Make use of the cache line size – each memory access loads 32 or 64 bytes into the cache, so arrange accesses to use the whole line

Understand row-major versus column-major access to gain performance

Try to reduce page faults (a page fault means the page is not in main memory and the CPU has to load it from disk)

35

Operating System Interaction

Dynamic Linking

Time-Sharing

Threads

36

Dynamic Linking

•Libraries

•Dynamic-Link Libraries (DLLs)

Example of DLL

37

Libraries

Almost all programs are composed from many separately compiled units. When you write a single-file program, it is compiled to a representation of machine instructions called an object file. For example, Visual C++ creates an .obj file from your C++ source code. The .obj file may seem to be a complete program, but there is much more code required to make it complete.


38

Reasons for using a library

1. Many functions, such as memory allocation, do not require special privileges to perform, and they do not take much CPU time. If these functions were invoked using a time-consuming system call, it would have a dramatic impact on performance. It is much faster to implement them as simple functions.

2. These functions are language specific. OSs are language independent, and it would greatly complicate the OS to provide run-time support for all languages, even if that were possible.

3. Even when system calls are required, some additional "glue" code is needed to translate between the standard language interface, such as printf() or operator <<, and the calling convention that is needed to set up parameters and invoke a trap instruction.

Don’t memorise

39

Library in Visual C++

40

Example of linking

41

Explanation – static linking

In the above diagram, the application object file has to be linked with malloc and the code calling main() to form an executable (exe) file.


42

Dynamic-Link Libraries (DLLs)

Dynamic linking means that linking is performed on demand at run time. An advantage of dynamic linking is that executable files can be much smaller than statically linked executables. Of course, the executable is not complete without all of the associated library files, but if many executables share a set of libraries, there can be a significant overall savings.


43

Advantage of DLL (1)

In most systems, the space savings extend to memory.

When libraries are dynamically linked, the operating system can arrange to let applications share the library code so that only one copy of the library is loaded into memory.

With static linking, each executable is a monolithic binary program. If several programs are using the same libraries, there will be several copies of the code in memory.


Don’t memorise

44

Advantage of DLL (2)

Another potential memory savings comes from the fact that dynamically linked libraries do not necessarily need to be loaded. For example, an image editor may support input and output of dozens of file formats. It could be expensive (and unnecessary) to link conversion routines for all of these formats.

With dynamic linking, the program can link code as it becomes useful, saving time and memory. This can be especially useful in programs with ever growing lists of features.

45

Disadvantage of DLL

The main disadvantage is version problems. Like all software, libraries tend to evolve. New libraries may be incompatible with old libraries in subtle ways. If you update the libraries used by a program, it may have good, bad, or no effects on the program's behavior. In contrast, a statically linked program will never change its behavior unless the entire program is relinked and installed.

46

Summary

Dynamic linking combines the library with the program at run time

It reduces program size, but can cause version problems