RCU


Transcript of RCU

Page 1: RCU

R(ead) C(opy) U(pdate)

[email protected]

Page 2: RCU

Agenda

What is RCU? Why?
RCU Primitives
RCU List Operations
Sleepable RCU
User Level RCU
Q&A

Page 3: RCU

What is RCU?

Read-copy-update
An alternative to rwlock
Allows low-overhead, wait-free reads
Updates can be expensive: old copies must be maintained while readers still use them

Page 4: RCU

Why RCU?

Without a lock, this is broken due to compiler optimization and CPU out-of-order execution:

struct foo {
        int a;
        int b;
        int c;
};
struct foo *gp = NULL;

/* . . . */

p = kmalloc(sizeof(*p), GFP_KERNEL);
p->a = 1;
p->b = 2;
p->c = 3;
gp = p;

Page 5: RCU

Why RCU?

mutex: no concurrent readers
spin_lock: ditto
rwlock: allows concurrent readers

The right choice?

Page 6: RCU

Why RCU?

rwlock is expensive
Even read_lock has more overhead than spin_lock
If write_lock is not really rare, rwlock contention is much worse than spin_lock contention

Page 7: RCU

RCU Basis

Split an update into removal and reclamation phases
Removal is performed immediately, while reclamation is deferred until all readers active during the removal phase have completed

Takes advantage of the fact that writes to single aligned pointers are atomic on modern CPUs

Page 8: RCU

RCU Terminology

read-side critical section: code delimited by rcu_read_lock() and rcu_read_unlock(); MUST NOT sleep
quiescent state: any code not within an RCU read-side critical section
grace period: any time period during which each thread passes through at least one quiescent state

Page 9: RCU

RCU Terminology

More on grace period: after a full grace period, all pre-existing RCU read-side critical sections are completed.

Page 10: RCU

RCU Update Sequence

Remove pointers to a data structure, so that subsequent readers cannot gain a reference to it

Wait for all previous readers to complete their RCU read-side critical sections (AKA, a grace period passes)

At this point, there cannot be any readers who hold references to the data structure, so it now may safely be reclaimed (e.g., in another thread)

Page 11: RCU

When Does a Grace Period Pass?

RCU readers are not permitted to block, switch to user-mode execution, or enter the idle loop.

As soon as a CPU is seen passing through any of these three states, we know that that CPU has exited any previous RCU read-side critical sections.

If we remove an item from a linked list, and then wait until all CPUs have switched context, executed in user mode, or executed in the idle loop, we can safely free up that item.

Page 12: RCU

Core RCU APIs

rcu_read_lock()
rcu_read_unlock()
synchronize_rcu() / call_rcu()
rcu_assign_pointer()
rcu_dereference()

Page 13: RCU

Wait for Readers

synchronize_rcu(): waits only for all ongoing RCU read-side critical sections to complete

call_rcu(): registers a function and argument which are invoked after all ongoing RCU read-side critical sections have completed

Page 14: RCU

Assign & Retrieve

rcu_assign_pointer(): assign a new value to an RCU-protected pointer

rcu_dereference(): fetch an RCU-protected pointer, which is safe to use until rcu_read_unlock()
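
A minimal sketch of how these two primitives repair the broken publication code from Page 4, reusing the same struct foo and global gp; do_something_with() is a placeholder for whatever the reader does with the fields:

/* Updater: initialize first, then publish.
 * rcu_assign_pointer() orders the initialization before the pointer store. */
p = kmalloc(sizeof(*p), GFP_KERNEL);
p->a = 1;
p->b = 2;
p->c = 3;
rcu_assign_pointer(gp, p);

/* Reader: the pointer fetched by rcu_dereference() is safe to use
 * only until the matching rcu_read_unlock(). */
rcu_read_lock();
p = rcu_dereference(gp);
if (p != NULL)
        do_something_with(p->a, p->b, p->c);    /* placeholder */
rcu_read_unlock();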

Page 15: RCU

RCU List Insert

list_add_rcu()
list_add_tail_rcu()
list_replace_rcu()

Updates must be protected by some lock.

Page 16: RCU

Sample Code

struct foo {
        struct list_head list;
        int a;
        int b;
        int c;
};
LIST_HEAD(head);
DEFINE_SPINLOCK(list_lock);

/* . . . */

p = kmalloc(sizeof(*p), GFP_KERNEL);
p->a = 1;
p->b = 2;
p->c = 3;
spin_lock(&list_lock);
list_add_rcu(&p->list, &head);
spin_unlock(&list_lock);

Page 17: RCU

RCU List Traversal

list_for_each_entry_rcu()
rcu_read_lock() and rcu_read_unlock() must be called around the traversal, but they never spin or block
Allows list_add_rcu() to execute concurrently
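
A minimal reader sketch, reusing head and struct foo from the Page 16 sample; do_something_with() is again a placeholder:

struct foo *p;

rcu_read_lock();
list_for_each_entry_rcu(p, &head, list) {
        /* Elements may be added concurrently by list_add_rcu();
         * each element seen here stays valid at least until rcu_read_unlock(). */
        do_something_with(p->a, p->b, p->c);    /* placeholder */
}
rcu_read_unlock();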

Page 18: RCU

RCU List Removal

list_del_rcu() removes an element from the list; must be protected by some lock
But when can the element be freed?
synchronize_rcu() blocks until all read-side critical sections that began before the call have completed
call_rcu() invokes its callback after all read-side critical sections that began before the call have completed

Page 19: RCU

Sample Code

spin_lock(&mylock);
p = search(head, key);
if (p == NULL) {
        spin_unlock(&mylock);
} else {
        list_del_rcu(&p->list);
        spin_unlock(&mylock);
        synchronize_rcu();
        kfree(p);
}
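
For comparison, a sketch of the non-blocking call_rcu() variant; it assumes struct foo also embeds a struct rcu_head member named rcu, which is not shown in the Page 16 definition:

/* Invoked only after a grace period has elapsed. */
static void foo_reclaim(struct rcu_head *rh)
{
        struct foo *p = container_of(rh, struct foo, rcu);

        kfree(p);
}

spin_lock(&mylock);
p = search(head, key);
if (p == NULL) {
        spin_unlock(&mylock);
} else {
        list_del_rcu(&p->list);
        spin_unlock(&mylock);
        call_rcu(&p->rcu, foo_reclaim); /* returns immediately */
}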

Page 20: RCU

Sleepable RCU

Why? Realtime kernels that require spinlock critical sections to be preemptible also require that RCU read-side critical sections be preemptible.

Page 21: RCU

SRCU Implementation Strategy

Prevent any given task sleeping in an SRCU read-side critical section from blocking an unbounded number of RCU callbacks, by:
refusing to provide asynchronous grace-period interfaces, such as Classic RCU's call_rcu() API
isolating grace-period detection within each subsystem using SRCU

Page 22: RCU

SRCU Grace Period?

Grace periods are detected by summing per-CPU counters.
Readers manipulate only CPU-local counters.
Two sets of per-CPU counters are kept, so the current set can be flipped while the old set drains.

Page 23: RCU

SRCU Data Structure

struct srcu_struct {
        int completed;
        struct srcu_struct_array __percpu *per_cpu_ref;
        struct mutex mutex;
};

struct srcu_struct_array {
        int c[2];
};
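
A rough sketch of how readers use those two counter sets (illustrative only; the real code also needs memory barriers): srcu_read_lock() snapshots the low bit of completed, bumps this CPU's counter in that set, and returns the index so srcu_read_unlock() can decrement the matching counter later.

int srcu_read_lock(struct srcu_struct *sp)
{
        int idx;

        preempt_disable();
        idx = sp->completed & 0x1;      /* counter set that new readers join */
        per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
        preempt_enable();
        return idx;
}

void srcu_read_unlock(struct srcu_struct *sp, int idx)
{
        preempt_disable();
        per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
        preempt_enable();
}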

Page 24: RCU

Wait for Grace Period

synchronize_srcu():
Flip the completed counter, so new readers will use the other set of per-CPU counters.
Wait for the old counter set to drain to zero.
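
An illustrative sketch of that flip-and-drain logic, built on the Page 23 structure; it omits memory barriers and is not the actual kernel implementation:

static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
{
        int cpu, sum = 0;

        for_each_possible_cpu(cpu)
                sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
        return sum;
}

void synchronize_srcu(struct srcu_struct *sp)
{
        int idx;

        mutex_lock(&sp->mutex);
        idx = sp->completed & 0x1;      /* counter set used by pre-existing readers */
        sp->completed++;                /* flip: new readers use the other set */
        while (srcu_readers_active_idx(sp, idx) != 0)
                schedule_timeout_interruptible(1);      /* wait for old readers to drain */
        mutex_unlock(&sp->mutex);
}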

Page 25: RCU

SRCU APIs

int init_srcu_struct(struct srcu_struct *sp);
void cleanup_srcu_struct(struct srcu_struct *sp);
int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void srcu_read_unlock(struct srcu_struct *sp, int idx);
void synchronize_srcu(struct srcu_struct *sp);
void synchronize_srcu_expedited(struct srcu_struct *sp);
long srcu_batches_completed(struct srcu_struct *sp);
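
A usage sketch, assuming shared is some SRCU-protected pointer and use(), old, and new_p are placeholders; the index returned by srcu_read_lock() must be handed back to srcu_read_unlock():

static struct srcu_struct my_srcu;

init_srcu_struct(&my_srcu);                     /* once, at setup time */

/* Reader: unlike plain RCU, it may sleep inside the critical section. */
idx = srcu_read_lock(&my_srcu);
p = rcu_dereference(shared);                    /* shared: placeholder pointer */
if (p != NULL)
        use(p);                                 /* use(): placeholder */
srcu_read_unlock(&my_srcu, idx);

/* Updater: unpublish, wait only for this srcu_struct's readers, then reclaim. */
old = shared;
rcu_assign_pointer(shared, new_p);
synchronize_srcu(&my_srcu);
kfree(old);

cleanup_srcu_struct(&my_srcu);                  /* once, at teardown time */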

Page 26: RCU

Userspace RCU

Available at http://lttng.org/urcu
git clone git://git.lttng.org/userspace-rcu.git
Debian: aptitude install liburcu-dev
Examples
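
A minimal user-level sketch, assuming the classic liburcu API exposed by <urcu.h> (the library's default flavor); each reader thread registers itself before entering read-side critical sections:

#define _LGPL_SOURCE    /* optional: use the inline LGPL implementations */
#include <urcu.h>
#include <stdio.h>
#include <stdlib.h>

struct foo { int a; };
static struct foo *gp;

void reader(void)
{
        struct foo *p;

        rcu_register_thread();          /* register this thread before its first read */
        rcu_read_lock();
        p = rcu_dereference(gp);
        if (p)
                printf("%d\n", p->a);
        rcu_read_unlock();
        rcu_unregister_thread();
}

void updater(struct foo *newp)
{
        struct foo *old = gp;

        rcu_assign_pointer(gp, newp);
        synchronize_rcu();              /* wait for pre-existing readers */
        free(old);
}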

Page 27: RCU

Q & A