Applications of Non-Blocking Data Structures to Real-Time Systems


Håkan Sundell, [email protected]

Chalmers University of Technology

Applications of Non-Blocking Data Structures to Real-Time Systems

Seminar for the degree of Licentiate of Philosophy

Håkan Sundell

Computing Science

Chalmers University of Technology


Background

• ARTES project: “Applications of wait/lock-free protocols to real-time systems”

• Started in March 1999.

• One active Ph.D. student.

• Project leader: Philippas Tsigas


Schedule

• Introduction
– Real-Time Systems
– Synchronization
• Shared Data Objects: Snapshots
– Evaluation
• The Effect of Using Timing Information
– Snapshot
– Shared Register
• Software engineering part
• Conclusions & Future Work


Real-Time Systems

• Uni- or Multi-processor system

• Interconnection Network
– e.g. the Controller Area Network (CAN)

[Figure: CPU nodes connected by an interconnection network]


Real-Time Systems

• Shared Memory
– Uniform Memory Access (UMA)
– Non-Uniform Memory Access (NUMA)

[Figure: CPUs, caches, buses, and memory in the UMA and NUMA configurations]


Real-Time Systems

• Cooperating Tasks
– Timing Constraints
• Inter-task Communication: Shared Data Objects
– Needs Synchronization

[Figure: tasks T1, T2, and T3 accessing shared data objects]


Synchronization

• Synchronization using Locks
– Uses semaphores, spinning, disabling interrupts
– Negative
  • Blocking
  • Priority inversion
  • Risk of deadlock
– Positive
  • Execution time guarantees are easy to derive, but pessimistic

Take lock
... do operation ...
Release lock
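The take/operate/release pattern can be sketched in C with a POSIX mutex. This is only an illustration: the shared counter, the function names, and the thread count are assumptions, not part of the talk.

```c
#include <pthread.h>

/* Hypothetical shared object: one counter protected by one mutex. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

static void locked_increment(void)
{
    pthread_mutex_lock(&lock);     /* take lock (may block the caller) */
    shared_counter++;              /* ... do operation ... */
    pthread_mutex_unlock(&lock);   /* release lock */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        locked_increment();
    return NULL;
}

/* Runs two threads against the counter and returns the final value. */
static long run_locked_demo(void)
{
    pthread_t a, b;
    shared_counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

A thread that is preempted while holding the mutex delays every other caller, which is where the blocking and priority inversion listed above come from.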


Non-blocking Synchronization

• Lock-Free Synchronization
– Retries until not interfered with by other operations
  • Usually detects interference by using some kind of shared variable indicating busy-state or similar.

Change flag to unique value, or remember current state
... do the operation while preserving the active structure ...
Check for same value or state and then validate changes, otherwise retry
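The remember/operate/validate loop can be sketched with a C11 compare-and-swap. The shared variable and the function name are assumptions chosen for illustration; a real shared data object would apply the same loop to a pointer or a more complex state.

```c
#include <stdatomic.h>

/* Hypothetical shared state: a single atomic word. */
static _Atomic long shared_value = 0;

/* Lock-free add: retries until no other operation interfered. */
static long lockfree_add(long delta)
{
    long seen = atomic_load(&shared_value);   /* remember current state */
    long wanted;
    do {
        wanted = seen + delta;                /* do the operation on a private copy */
        /* Validate: the swap succeeds only if shared_value still equals
           `seen`. On failure `seen` is refreshed and the loop retries. */
    } while (!atomic_compare_exchange_weak(&shared_value, &seen, wanted));
    return wanted;
}
```

No caller ever waits for another one here, but under constant interference the loop has no upper bound on the number of retries.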


Non-blocking Synchronization

• Lock-Free Synchronization
– Negative
  • No execution time guarantees: an operation can keep retrying forever and can thus starve
– Positive
  • Avoids blocking and priority inversion
  • Avoids deadlock
  • Fast execution on average


Non-blocking Synchronization

• Non-blocking Synchronization
– Uses atomic synchronization primitives
– Uses shared memory
• Wait-Free Synchronization
– Always finishes in a finite number of its own steps
– Negative
  • Complex algorithms
  • Memory consuming

[Figure labels: Test&Set, Compare&Swap, Copying, Helping, Announcing, Split operation, ???]
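The two atomic primitives named on this slide map onto C11 operations. The sketch below only shows their semantics; the variable and wrapper names are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag busy = ATOMIC_FLAG_INIT;   /* hypothetical flag  */
static _Atomic int cell = 0;                  /* hypothetical word  */

/* Test&Set: atomically set the flag and return its previous value. */
static bool test_and_set(void)
{
    return atomic_flag_test_and_set(&busy);
}

static void clear_flag(void)
{
    atomic_flag_clear(&busy);
}

/* Compare&Swap: store `desired` only if the cell still holds `expected`.
   Returns true if the swap happened. */
static bool compare_and_swap(int expected, int desired)
{
    return atomic_compare_exchange_strong(&cell, &expected, desired);
}
```

Both calls complete in a bounded number of steps regardless of what other threads do, which is why they serve as building blocks for wait-free algorithms.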


Non-blocking Synchronization

• Wait-Free Synchronization
– Positive
  • Execution time guarantees
  • Fast execution
  • Avoids blocking and priority inversion
  • Avoids deadlock
  • Avoids starvation
  • Same implementation on both single- and multiprocessor systems


Shared Data Objects

• Correctness criteria for concurrent operations: linearizability
– All concurrent executions can be transformed into an equivalent serial sequence of atomic operations preserving the partial order

[Figure: time axis t with overlapping Read and Write operations and the points t_i, t_j, t_k of the equivalent serial order]


Snapshot

• Snapshot
– A consistent instantaneous state of a set of several shared variables that are logically related
– One reader (scanner)
  • Reads the whole set of variables in one atomic step
– Many writers (updaters)
  • Each writes to only one variable at a time


Snapshot: Correctness

• Atomicity / Linearizability criteria

[Figure: three timelines of Write and Read operations, where c_i is the value returned by the scanner; the first two cases are marked YES and the third NO]


Snapshot: Correctness

• Atomicity / Linearizability criteria

[Figure: two further timelines of Write and Read operations with values c_i and c_j returned by the scanner; both cases are marked NO]


What are we evaluating

• Wait-free snapshot algorithm by Ermedahl et al.
– 3 register copies for each component
– Uses the Test&Set atomic primitive for synchronization

[Figure: register copies labeled "Used by writer" and "Used by reader"]


Analysis

• Real-Time System: Measured schedulability
• Created “realistic” scenarios on a theoretical 68020 uni-processor system
– Real RTOS parameters
– Manual WCET analysis at cycle level
– 1 scanner (5 components), 24 updaters (10 real-time tasks, 15 interrupts)
– Fixed-priority response time analysis
– Schedulable without any synchronization
– Adding lock/wait-free or semaphore synchronization


Analysis: Schedulability (%)


Experiments

• Simulation
– RT-simulator written in Erlang by Ermedahl and Sjödin
  • Fixed priority preemptive scheduler
  • Semaphores
  • Messages
– Subset of scenarios used in analysis


Experiments: Schedulability (%)


Experiments

• Multi-node: Simulation of CAN-bus 1 MHz
– 10 nodes connected using messages
– Local snapshots on each node
– 1 super-snapshot task on 1 node
– Subset of scenarios used for single-node analysis


Experiments: Rsnap for multi-node


Timing Information

• Previously used by Chen and Burns in 1999.
– Assuming system with periodic fixed-priority scheduling
– Notations from Standard Real-Time Response Time Analysis
– Use information about
  • Periods, T
  • Worst-case computation times, C
  • Worst-case response times, R

$R_i = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j$
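The response-time recurrence R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ C_j of standard fixed-priority analysis is usually solved by fixed-point iteration. The sketch below assumes tasks are stored in priority order (index 0 highest) and that the deadline equals the period; the struct and function names are illustrative.

```c
#include <stddef.h>

/* Task parameters: worst-case computation time C and period T. */
struct task { long C; long T; };

/* Worst-case response time of task i, or -1 if the iteration
   exceeds the period (deadline assumed equal to the period). */
static long response_time(const struct task *ts, size_t i)
{
    long r = ts[i].C;
    for (;;) {
        long next = ts[i].C;
        for (size_t j = 0; j < i; j++)   /* higher-priority tasks hp(i) */
            next += ((r + ts[j].T - 1) / ts[j].T) * ts[j].C;  /* ceil(r/T_j) * C_j */
        if (next > ts[i].T)
            return -1;                   /* not schedulable */
        if (next == r)
            return r;                    /* fixed point reached */
        r = next;
    }
}
```

For the task set (C, T) = (1, 4), (2, 6), (3, 12) the iteration for the lowest-priority task runs 3, 6, 7, 9, 10 and settles at 10.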


Snapshot

• Back to Basics: Unbounded Memory Protocol
– The reader increases global index and scans backwards.

[Figure: components c1 ... ci ... cc along a time axis t, each drawn as a row of cells "v ? ? ? ? w nil nil", with the snapshot index marked. Legend: ? = previous values / nil, w = writer position]
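A minimal single-scanner sketch of this idea follows, under several assumptions that are not in the slides: a fixed-size array stands in for unbounded memory, values are non-negative so that -1 can represent nil, and each updater writes into the slot selected by the current snapshot index. All names are hypothetical.

```c
#include <stdatomic.h>

#define COMPONENTS 3
#define SLOTS 64          /* stands in for "unbounded" memory */
#define NIL (-1)          /* assumes real values are non-negative */

static _Atomic int snapshot_index;
static _Atomic int buffer[COMPONENTS][SLOTS];

static void snapshot_init(int initial)
{
    atomic_store(&snapshot_index, 0);
    for (int c = 0; c < COMPONENTS; c++) {
        for (int s = 0; s < SLOTS; s++)
            atomic_store(&buffer[c][s], NIL);
        atomic_store(&buffer[c][0], initial);
    }
}

/* Updater: write the value at the writer position (current index). */
static void update(int component, int value)
{
    int w = atomic_load(&snapshot_index);
    atomic_store(&buffer[component][w], value);
}

/* Scanner: increase the global index, then scan each component
   backwards from the previous index to the first non-nil value. */
static void scan(int out[COMPONENTS])
{
    int old = atomic_fetch_add(&snapshot_index, 1);
    for (int c = 0; c < COMPONENTS; c++) {
        int s = old;
        while (s > 0 && atomic_load(&buffer[c][s]) == NIL)
            s--;
        out[c] = atomic_load(&buffer[c][s]);
    }
}
```

Neither operation waits for the other. The sketch supports at most SLOTS scans, which is the limitation that bounded cyclical buffers are meant to remove.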


Snapshot

• Bounded Memory: Cyclical Buffers
– Needed buffer length depends on how fast the updaters are compared to the scanner
– Each component can have a different buffer length


Timing Information

• Bounding
– Needed buffer length for component k:

$l_k = \left\lceil \max_{i \in wr(k)} \frac{T_{W_i}}{T_S} \cdot 2 \right\rceil + 2$

where T_S is the period of the snapshot task and T_{W_i} is the period of writer task i
– Can be refined even further


Experiments

• Using a Sun Enterprise 10000 multiprocessor computer

• 1 scanner task and 10 updater tasks, one on each CPU

• Comparing two wait-free snapshot algorithms
– Using timing information
– Using Test-and-Set synchronization


Experiments

• Scenarios with different ratios between scanner/updater:
– Measuring response time for scan versus update operations

Ratio          500/50  200/50  100/50  50/50  50/100  50/200  50/500
Buffer length  3       3       3       4      6       10      22
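The buffer lengths in this table are consistent with the bound l_k = ⌈2 · max T_W / T_S⌉ + 2, if each ratio is read as scanner period / updater period. Both the exact placement of the ceiling and this reading of the ratio are assumptions; they happen to reproduce all seven entries. The function name is illustrative.

```c
/* Buffer length for one component, assuming
   l_k = ceil(2 * max(T_W) / T_S) + 2 with integer periods. */
static long buffer_length(long t_scanner, const long *t_writers, int n)
{
    long worst = 0;
    for (int i = 0; i < n; i++) {
        /* integer ceil(2 * T_W / T_S) */
        long need = (2 * t_writers[i] + t_scanner - 1) / t_scanner;
        if (need > worst)
            worst = need;
    }
    return worst + 2;
}
```

With a scanner period of 50 and an updater period of 500 this gives 22, and with the periods swapped it gives 3.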


Experiments

• Scan operation – Average Response Time


Experiments

• Update operation – Average Response Time


Shared Register

• Target domain: Shared Memory (even without cache coherency)
• Wait-Free Atomic Shared Buffer by Vitanyi et al.
– A matrix of 1-reader 1-writer registers
– Each register contains a value/tag pair encoded as one value

[Figure: matrix of registers R11, R12, ... / R21, R22, ... with rows for writers and columns for readers; Rij is written by processor i and read by processor j; each register holds a tag and a value]


Shared Register

• Algorithm:
– A reader scans its column for the highest tag and returns the corresponding value
– A writer scans its column and writes the next tag together with the new value to its row
• Unbounded maximum size for the tag field in the value/tag pair
– Assume 8 writer tasks with 10 ms period
  • Maximum tag after one hour is 8 × 100 × 3600 = 2880000, which needs 22 bits!
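The read and write steps as described on this slide can be sketched over a matrix of atomic words. This follows the slide's simplified description only; the packing of tag and value into one 64-bit word, the matrix size, and all names are assumptions, and tie-breaking between writers that pick the same tag is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

#define N 4   /* number of processors; hypothetical */

/* reg[i][j] is written by processor i and read by processor j.
   Tag in the high 32 bits, value in the low 32 bits. */
static _Atomic uint64_t reg[N][N];

static uint64_t pack(uint32_t tag, uint32_t value)
{
    return ((uint64_t)tag << 32) | value;
}
static uint32_t tag_of(uint64_t r)   { return (uint32_t)(r >> 32); }
static uint32_t value_of(uint64_t r) { return (uint32_t)r; }

/* Scan column p and return the pair with the highest tag. */
static uint64_t scan_column(int p)
{
    uint64_t best = atomic_load(&reg[0][p]);
    for (int i = 1; i < N; i++) {
        uint64_t r = atomic_load(&reg[i][p]);
        if (tag_of(r) > tag_of(best))
            best = r;
    }
    return best;
}

static uint32_t reg_read(int p)
{
    return value_of(scan_column(p));
}

/* Write the next tag together with the new value to row p. */
static void reg_write(int p, uint32_t value)
{
    uint32_t next = tag_of(scan_column(p)) + 1;
    for (int j = 0; j < N; j++)
        atomic_store(&reg[p][j], pack(next, value));
}
```

Every write increases the highest tag by one, so the tag field grows without bound unless tags are recycled.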


Timing Information

• Analyzing the maximum difference between tags possibly observable by a task at two consecutive invocations of the algorithm
– In any possible execution:

$MaxTagDiff = \sum_{i=1}^{n} \frac{T_{max}}{T_{Wr_i}} + \sum_{i=1}^{n} \frac{R_{max}}{T_{Wr_i}}$ [rounding operators, if any, not recoverable]

  • T_max is the longest period
  • R_max is the longest response time
  • T_{Wr_i} is the period of writer task i
• Recycling tags:
– Newer tags can restart from zero when we reach a certain tag value
– In order to be able to decide whether a tag is newer, we need to have:

$TagFieldSize = MaxTagDiff \cdot 2$

[Figure: cyclic tag field from 0 to N holding values v1 to v4, with v3 and v4 wrapped around to the start]
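Deciding which of two recycled tags is newer can be sketched with modular arithmetic, assuming tags wrap at TagFieldSize = 2 · MaxTagDiff. The function names are hypothetical, and treating a forward distance of exactly half the field as "not newer" is a choice of this sketch rather than something stated in the slides.

```c
#include <stdbool.h>

/* Is tag a newer than tag b, when tags wrap modulo 2 * max_tag_diff? */
static bool tag_is_newer(unsigned a, unsigned b, unsigned max_tag_diff)
{
    unsigned field_size = 2 * max_tag_diff;
    /* forward distance from b to a around the cyclic tag field */
    unsigned forward = (a + field_size - b) % field_size;
    return forward != 0 && forward < max_tag_diff;
}

/* Next tag value, restarting from zero at the end of the field. */
static unsigned next_tag(unsigned tag, unsigned max_tag_diff)
{
    return (tag + 1) % (2 * max_tag_diff);
}
```

With a field of 76 tags, tag 2 counts as newer than tag 70 because the forward distance is 8, while tag 70 is not newer than tag 2 because the forward distance is 68.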


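Examples

• Example Task Scenario on 8 processors:

Task  Period    Task  Period
Wr1   1000      Rd1   500
Wr2   900       Rd2   450
Wr3   800       Rd3   400
Wr4   700       Rd4   350
Wr5   600       Rd5   300
Wr6   500       Rd6   250
Wr7   400       Rd7   200
Wr8   300       Rd8   150

• Unbounded algorithm would have reached tag 68400 in one hour, needing >16 bits

$MaxTagDiff = \sum_{i=1}^{n} \frac{T_{max}}{T_{Wr_i}} + \sum_{i=1}^{n} \frac{R_{max}}{T_{Wr_i}}$ [rounding operators, if any, not recoverable]

$T_{max} = R_{max} = 1000$

$MaxTagDiff = \sum_{i=1}^{8} \left( \frac{1000}{T_{Wr_i}} + \frac{1000}{T_{Wr_i}} \right) = 38$

$TagFieldSize = MaxTagDiff \cdot 2 = 76$

$TagFieldBits = \lceil \log_2 76 \rceil = 7$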
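The step from a tag count to a field width in bits can be checked with a small helper. Only the numbers 38, 76, and 68400 come from the slide; the function name is an assumption.

```c
/* Bits needed to distinguish `count` different tag values,
   i.e. ceil(log2(count)) for count >= 1. */
static unsigned bits_for(unsigned long count)
{
    unsigned bits = 0;
    while ((1UL << bits) < count)
        bits++;
    return bits;
}
```

A recycled field of 2 · 38 = 76 tags fits in 7 bits, while counting up to 68400 exceeds 2^16 = 65536 and therefore needs 17 bits.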


Background

• Multithreaded programming needs communication.

• Communicating using shared data structures like stacks, queues, lists and so on.

• This needs synchronization!

• Locks (mutual exclusion) have several drawbacks, especially for Real-Time Systems.

• Non-blocking solutions are often complex to implement and have non-standard interfaces.


NOBLE: A Non-Blocking Inter-Process Communication Library

• Designed with the following properties:
– Functionality – Stacks, Queues, Lists, Snapshot, Register… with clear specifications
– Programmer friendly – #include <noble.h>, NBL<function>
– Easy to adapt existing solutions – Provides locks as well as non-blocking synchronization


NOBLE: A Non-Blocking Inter-Process Communication Library

• Designed with the following properties (cont.):
– Efficient – Object-oriented design: “virtual functions and inheritance with base classes” in C
– Portable – Modular design, platform-dependent code separated
– Adaptable for different programming languages – C, C++, standard dynamically linked library
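"Virtual functions and inheritance with base classes" in plain C is commonly done with a base struct of function pointers that each implementation embeds as its first member. The sketch below shows that general technique with hypothetical names; it is not NOBLE's actual code, and the one implementation shown is an unsynchronized array stack used only to keep the example short.

```c
#include <stdlib.h>

/* Base "class": the operations every stack implementation provides. */
typedef struct Stack Stack;
struct Stack {
    int   (*push)(Stack *self, void *item);  /* 1 on success, 0 if full */
    void *(*pop)(Stack *self);               /* NULL if empty */
    void  (*destroy)(Stack *self);
};

/* Derived "class": base must be the first member so that a Stack *
   can be cast back to ArrayStack *. */
typedef struct {
    Stack  base;
    void **items;
    int    count;
    int    capacity;
} ArrayStack;

static int array_push(Stack *self, void *item)
{
    ArrayStack *s = (ArrayStack *)self;
    if (s->count == s->capacity)
        return 0;
    s->items[s->count++] = item;
    return 1;
}

static void *array_pop(Stack *self)
{
    ArrayStack *s = (ArrayStack *)self;
    return s->count > 0 ? s->items[--s->count] : NULL;
}

static void array_destroy(Stack *self)
{
    ArrayStack *s = (ArrayStack *)self;
    free(s->items);
    free(s);
}

/* Constructor: the only place that names the concrete implementation. */
static Stack *stack_create_array(int capacity)
{
    ArrayStack *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->items = malloc((size_t)capacity * sizeof *s->items);
    if (s->items == NULL) {
        free(s);
        return NULL;
    }
    s->count = 0;
    s->capacity = capacity;
    s->base.push = array_push;
    s->base.pop = array_pop;
    s->base.destroy = array_destroy;
    return &s->base;
}

/* Generic front end: dispatches through the function pointers. */
static int   stack_push(Stack *s, void *item) { return s->push(s, item); }
static void *stack_pop(Stack *s)              { return s->pop(s); }
static void  stack_free(Stack *s)             { s->destroy(s); }
```

Callers use only the generic front end, so swapping in another implementation means changing the single constructor call.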


Examples

• #include <noble.h>
• First create a global variable handling the shared data object, for example a stack:
    NBLStack *stack;
    stack = NBLStackCreateLF(10000);
• When some thread wants to do some operation:
    NBLStackPush(stack, item);
  or
    item = NBLStackPop(stack);


Examples

• When the data structure is not in use anymore:
    NBLStackFree(stack);
• To change the synchronization mechanism, only one line of code has to be changed!
    stack = NBLStackCreateLF(10000);
  replaced with
    stack = NBLStackCreateLB();


Experiment

• Set of 50000 random operations performed multithreaded on each data structure, with either low or high contention.

• Comparing the different synchronization mechanisms and implementations available.

• Varying number of threads from 1 to 30.
• Performed on multiprocessors:
– Sun Enterprise 10000 with 64 CPUs, Solaris
– Compaq PC with 2 CPUs, Win32


Experiments: Linked List (high)


Status

• Multiprocessor support
– Sun Solaris (Sparc)
– Win32 (Intel x86)
– SGI (Mips) – Evaluation stage
– Linux (Intel x86) – Evaluation stage
• Extensive Manual
• Web site up and running, http://www.cs.chalmers.se/~noble


Conclusions

• Contributions:
– Evaluations of snapshot
  • Non-blocking performs better than lock-based in all cases. Lock-free performs best on uni-processor systems.
– The effect of using Timing Information
  • Snapshot and Shared Register
  • Algorithms can be simplified, and their performance increases significantly.
  • Efficient recycling of time-stamps is possible


Conclusions

• Contributions (cont.):
– A library of non-blocking protocols
  • Easy to use, efficient and portable
  • Non-blocking protocols always perform better than lock-based, especially on multi-processor systems.
• Concluding judgment:
– Non-blocking protocols are highly applicable to real-time systems. Lock-free protocols seem very promising and will be applicable to real-time systems with applied analysis


Future work

• NOBLE
– Adapt to commercial RTOS (Enea OSE).
– Extend to embedded systems
  • Simpler uni- and multi-processor systems, including 8-bit processors with, without, or with different support for atomic synchronization primitives.
• Timing Information
– Create lock-free translations to fulfill real-time systems properties
– General time-stamp recycling scheme
– More non-blocking protocols