Linux Device Driver parallelism using SMP and Kernel Pre-emption
![Page 1: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/1.jpg)
Slide 1
Driver Parallelism using SMP and Kernel Pre-emption
Hemanth V
![Page 2: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/2.jpg)
Slide 2
Prerequisites
• Understanding of Linux Device Drivers
• Basic understanding of Linux synchronization mechanisms such as semaphores, mutexes and spin locks
![Page 3: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/3.jpg)
Slide 3
Contents
What's Driver Parallelism
Kernel Pre-emption Feature
SMP Architecture
USB Usecase Analysis
Driver Scenarios
Summary
![Page 4: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/4.jpg)
Slide 4
Driver Parallelism
• Parallelism or concurrency arises when the system tries to do more than one thing at once
– Concurrency is when two tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant.
– Parallelism is when tasks literally run at the same time
• The goal of parallelism/concurrency is to improve system performance
• The side effect is that it can also lead to race conditions
• The following slides highlight the sources of parallelism/concurrency in Linux device drivers, and how to improve performance while avoiding race conditions
http://www.fasterj.com/cartoon/cartoon106.shtml
![Page 5: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/5.jpg)
Slide 5
Kernel Preemption
• CONFIG_PREEMPT
– This kernel config option reduces the latency of the kernel by making all kernel code (that is not executing in a critical section) preemptible.
– This allows reaction to interactive events by permitting a low priority process to be preempted involuntarily even if it is in kernel mode executing a system call.
– After an asynchronous event such as an interrupt handler has run, the current process is replaced if a higher priority process is ready to run.
– Useful for embedded systems with latency requirements in the milliseconds range.
![Page 6: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/6.jpg)
Slide 6
SMP Architecture
• Evolution of multiprocessor architectures
– The late 1960s saw a need for more CPU processing power for scientific and compute-intensive applications.
– Two or more CPUs were combined to form a single computer.
• SMP (Symmetric Multiprocessing) is one of the multiprocessor architectures.
• AMP (Asymmetric Multiprocessing) and clusters are others.
• Basic idea: run more tasks in parallel per unit time.
![Page 7: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/7.jpg)
Slide 7
SMP Architecture
[Figure: four CPUs, each with its own cache, attached to a shared bus together with Memory and I/O]
Fig 1: Logical view of SMP. In an actual hardware implementation, the cache is not directly connected to the bus.
![Page 8: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/8.jpg)
Slide 8
SMP Architecture Contd
• The diagram shows a 4-CPU SMP system. All CPUs are symmetric, i.e. they have the same architecture, frequency, etc.
• CPUs, memory and I/O are tightly coupled by a high-speed interconnect bus, allowing any unit connected to the bus to communicate with any other unit
• A single globally accessible memory is used by all CPUs. There is no local RAM in the CPUs, and data changes are visible to all CPUs
• Access to the global shared memory is symmetric (equal). Contents are fully shared, and all CPUs use the same address when referring to the same piece of data
• I/O access is also symmetric, i.e. any CPU can initiate I/O
![Page 9: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/9.jpg)
Slide 9
SMP Architecture Contd
• Interrupts are distributed across CPUs by the PIC (programmable interrupt controller)
• Access to the bus and memory has to be arbitrated so that no two CPUs step on each other and all have guaranteed fair access
• The maximum number of CPUs that can be used depends on the bus bandwidth
• Only one instance of the operating system (OS) is loaded in main memory
• Kernel data structures are accessed concurrently, hence the kernel needs to be SMP aware
![Page 10: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/10.jpg)
Slide 10
SMP Intricacies: Cache Coherency
![Page 11: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/11.jpg)
Slide 11
SMP Intricacies: Cache Coherency
• CPU stores data into cache in most implementations to improve system performance.
• Consider the case of 2 threads running on 2 different CPUs in an SMP system. Both use the global variable “Data”. If one of them modifies it to 1, the change is reflected in its own cache only. The values in main memory and in the other CPU's cache are stale, and if the other CPU reads them, results could be unpredictable. Hence the need to maintain consistency, or coherency, of caches.
• This problem is typically solved by Hardware cache consistency protocols, which include snooping and write-update/write-invalidate
![Page 12: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/12.jpg)
Slide 12
SMP Intricacies: Atomic operations
• Consider two threads trying to obtain the same semaphore simultaneously. Both read a value of 0, think it's available, and set it to 1.
• These issues are solved by using atomic instructions provided by each architecture
• Special instructions provide atomic test-and-set operations, for example load-linked and store-conditional in MIPS, and load-exclusive and store-exclusive in ARM
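The test-and-set idea above can be sketched in portable C11, where the compiler maps the atomic operation onto whatever instructions the target architecture provides. This is a minimal userspace illustration, not kernel code: `demo_lock_t`, `demo_trylock()` and `demo_unlock()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: flag clear = free, flag set = taken. */
typedef struct {
    atomic_flag taken;
} demo_lock_t;

/* Atomically set the flag and learn its previous value in one step.
 * Only the caller that saw "clear" becomes the owner, so two callers
 * can never both believe the lock was free. */
bool demo_trylock(demo_lock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->taken,
                                              memory_order_acquire);
}

void demo_unlock(demo_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->taken, memory_order_release);
}

/* Usage: a second attempt fails while the first owner holds the lock. */
void demo_lock_usage(void)
{
    demo_lock_t lock = { ATOMIC_FLAG_INIT };
    bool first = demo_trylock(&lock);
    bool second = demo_trylock(&lock);

    assert(first && !second);
    demo_unlock(&lock);
}
```

Because the read and the write happen as one indivisible operation, the "both read 0 and both set 1" interleaving from the first bullet cannot occur.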
![Page 13: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/13.jpg)
Slide 13
USB Subsystem Analysis
Linux Host stack (hardware upward): USB Host Controller; EHCI Driver; USB Core; USB Print Class Driver and USB Mass Storage Class Driver; USB Print APP and USB Mass Storage APP
Linux Device stack (hardware upward): USB Device Controller; UDC Driver; Mass storage gadget Driver and Print gadget Driver; USB Print App
Fig: Simplified view of USB Subsystem
![Page 14: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/14.jpg)
Slide 14
USB Subsystem Analysis: No preempt
• Assume the Linux host has initiated a large transfer for USB mass storage.
• The in-kernel transfer would not be pre-empted until the available data is exhausted.
• A high-priority Print transfer with a small amount of data would get scheduled only after the mass storage transfer is complete.
• This affects the end-user experience
![Page 15: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/15.jpg)
Slide 15
USB Subsystem Analysis: Preempt Enabled
• Assume the same scenario with kernel preemption enabled.
• The in-kernel mass storage transfer can be preempted and replaced by the Print data transfer, for example after processing a keyboard or timer interrupt
• This opens another parallel path into both the USB core and the EHCI driver, since the mass storage transfer is not complete and the Print transfer has started.
• Print transfer could re-open the same device, access the same data structures for initiating transfer, and could even disconnect the device.
![Page 16: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/16.jpg)
Slide 16
USB Subsystem Analysis: Preempt Enabled
• Hence driver design needs to identify all parallel paths and the points at which it's safe to be pre-empted, while still enabling parallelism.
• For example, it could be safe to pre-empt once a URB request is queued, but it might not be safe to pre-empt while DMA is in progress, since the DMA configuration registers could be overwritten.
![Page 17: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/17.jpg)
Slide 17
USB Subsystem Analysis: SMP
• Assume the previous scenario on an SMP system
• In this case the scheduler need not pre-empt the running mass storage transfer; it can schedule the Print transfer on another CPU.
• This too opens a new parallel path into the drivers, and both paths would be executing at the same instant.
• Hence if a driver already handles parallelism correctly, it is to a large extent SMP safe.
• On SMP systems, the interrupt handler and driver code could run concurrently on different CPUs.
• Hence the need to protect data shared with interrupt handlers using spin locks
![Page 18: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/18.jpg)
Slide 18
Driver Scenarios
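#include <linux/list.h>
#include <linux/interrupt.h>
static LIST_HEAD(ts_list);
int process_ts_entries(void)
{
	struct ts_entry *ts, *tmp;	/* entry type name assumed; the slide omits it */
	local_irq_disable();
	/* the _safe variant is required because entries are deleted while iterating */
	list_for_each_entry_safe(ts, tmp, &ts_list, node) {
		/* Process list elements */
		list_del(&ts->node);
	}
	local_irq_enable();
	return 0;
}
irqreturn_t ts_isr(int irq, void *dev_id)
{
	struct ts_entry *ts = dev_id;	/* assumed: dev_id carries the entry to queue */
	/* Process interrupt */
	list_add_tail(&ts->node, &ts_list);
	return IRQ_HANDLED;
}
local_irq_disable() protects against both the interrupt handler and preemption, but only on the local CPU.
To be SMP safe, both the driver code and the ISR need to take a spin lock with spin_lock_irqsave().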
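Why a spin lock is needed once the handler can run on another CPU can be illustrated in userspace. The sketch below is an analogy only, assuming POSIX threads and C11 atomics: one thread stands in for the ISR, another for the driver code, and a busy-wait flag stands in for the kernel spin lock. All names (`fake_isr`, `fake_driver`, `ts_pending`, `run_ts_demo`) are hypothetical, and nothing here disables interrupts the way spin_lock_irqsave() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define TS_EVENTS 20000L

static atomic_flag ts_lock = ATOMIC_FLAG_INIT; /* stand-in for a spinlock_t */
static long ts_pending;                        /* stand-in for ts_list */

static void ts_spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&ts_lock, memory_order_acquire))
        ; /* busy-wait until the other side releases the lock */
}

static void ts_spin_unlock(void)
{
    atomic_flag_clear_explicit(&ts_lock, memory_order_release);
}

/* Plays the role of ts_isr(): queues events from "another CPU". */
static void *fake_isr(void *unused)
{
    (void)unused;
    for (long i = 0; i < TS_EVENTS; i++) {
        ts_spin_lock();
        ts_pending++;
        ts_spin_unlock();
    }
    return NULL;
}

/* Plays the role of process_ts_entries(): drains whatever is queued. */
static void *fake_driver(void *arg)
{
    long *handled = arg;

    while (*handled < TS_EVENTS) {
        ts_spin_lock();
        *handled += ts_pending;
        ts_pending = 0;
        ts_spin_unlock();
    }
    return NULL;
}

/* Runs both sides concurrently and returns the number of events handled. */
long run_ts_demo(void)
{
    pthread_t isr, drv;
    long handled = 0;

    if (pthread_create(&isr, NULL, fake_isr, NULL) != 0)
        return -1;
    if (pthread_create(&drv, NULL, fake_driver, &handled) != 0) {
        pthread_join(isr, NULL);
        return -1;
    }
    pthread_join(isr, NULL);
    pthread_join(drv, NULL);
    assert(ts_pending == 0); /* every queued event was drained */
    return handled;
}
```

Without the lock, the increment in one thread and the drain in the other could interleave and lose events, which is the same hazard the list faces when the ISR and the driver code run on different CPUs.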
![Page 19: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/19.jpg)
Slide 19
Driver Scenarios: Cont
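Locking with a mutex or semaphore does not disable pre-emption, but it guarantees that the data structure is not corrupted if the holder is pre-empted.
This approach is both SMP safe and pre-empt safe.
static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);	/* one lock for the list; ts->lock cannot be used before ts is set */
int process_ts_entries(void)
{
	struct ts_entry *ts, *tmp;	/* entry type name assumed; the slide omits it */
	if (mutex_lock_interruptible(&ts_lock))
		return -ERESTARTSYS;	/* interrupted by a signal */
	list_for_each_entry_safe(ts, tmp, &ts_list, node) {
		/* Process list elements */
		list_del(&ts->node);
	}
	mutex_unlock(&ts_lock);
	return 0;
}
int process_rest_entries(void)
{
	struct ts_entry *ts;
	if (mutex_lock_interruptible(&ts_lock))
		return -ERESTARTSYS;
	list_for_each_entry(ts, &ts_list, node) {
		/* Process remaining elements */
	}
	mutex_unlock(&ts_lock);
	return 0;
}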
![Page 20: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/20.jpg)
Slide 20
Driver Scenarios: Cont
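process_ts_entries() and process_rest_entries() can deadlock: if one is pre-empted (or, on SMP, runs alongside the other) while holding its first lock, each ends up waiting for the lock the other holds.
To avoid the deadlock, every path must obtain the locks in the same order.
static LIST_HEAD(ts_list);
static LIST_HEAD(tc_list);
static DEFINE_MUTEX(ts_lock);
static DEFINE_MUTEX(tc_lock);
/* mutex_lock() is used here for brevity; the order of the calls is the point */
int process_ts_entries(void)
{
	mutex_lock(&ts_lock);
	/* Some processing */
	mutex_lock(&tc_lock);		/* order here: ts_lock, then tc_lock */
	mutex_unlock(&tc_lock);
	mutex_unlock(&ts_lock);
	return 0;
}
int process_rest_entries(void)
{
	mutex_lock(&tc_lock);
	/* Some processing */
	mutex_lock(&ts_lock);		/* opposite order: can deadlock against process_ts_entries() */
	mutex_unlock(&ts_lock);
	mutex_unlock(&tc_lock);
	return 0;
}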
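One way to enforce a single lock order is to route every acquisition through one helper. The following is a userspace sketch assuming POSIX threads rather than kernel mutexes; the counters stand in for the two lists, and the helper names (`lock_ts_then_tc`, `unlock_tc_then_ts`, `run_ordered_locking`) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MOVES 20000

static pthread_mutex_t ts_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tc_lock = PTHREAD_MUTEX_INITIALIZER;
static int ts_count = 8; /* entries on the "ts" list */
static int tc_count = 0; /* entries on the "tc" list */

/* The only place that takes both locks, so the order is always ts, then tc. */
static void lock_ts_then_tc(void)
{
    pthread_mutex_lock(&ts_lock);
    pthread_mutex_lock(&tc_lock);
}

/* Release in the reverse order of acquisition. */
static void unlock_tc_then_ts(void)
{
    pthread_mutex_unlock(&tc_lock);
    pthread_mutex_unlock(&ts_lock);
}

/* Moves entries from ts to tc. */
static void *process_ts_entries(void *unused)
{
    (void)unused;
    for (int i = 0; i < MOVES; i++) {
        lock_ts_then_tc();
        if (ts_count > 0) {
            ts_count--;
            tc_count++;
        }
        unlock_tc_then_ts();
    }
    return NULL;
}

/* Moves entries from tc back to ts, using the same lock order. */
static void *process_rest_entries(void *unused)
{
    (void)unused;
    for (int i = 0; i < MOVES; i++) {
        lock_ts_then_tc();
        if (tc_count > 0) {
            tc_count--;
            ts_count++;
        }
        unlock_tc_then_ts();
    }
    return NULL;
}

/* Runs both paths concurrently; returns the total number of entries. */
int run_ordered_locking(void)
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, process_ts_entries, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, process_rest_entries, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(ts_count >= 0 && tc_count >= 0);
    return ts_count + tc_count;
}
```

Both threads finish and the total number of entries is unchanged, because neither thread can ever hold tc_lock while waiting for ts_lock.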
![Page 21: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/21.jpg)
Slide 21
Driver Scenarios: Cont
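In some cases it is better to access a resource through a single function than to spread locking throughout the code.
static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);
int process_ts_entries(void)
{
	struct ts_entry *ts, *tmp;	/* entry type name assumed; the slide omits it */
	if (mutex_lock_interruptible(&ts_lock))
		return -ERESTARTSYS;	/* interrupted by a signal */
	list_for_each_entry_safe(ts, tmp, &ts_list, node) {
		/* Process list elements */
		list_del(&ts->node);
	}
	mutex_unlock(&ts_lock);
	return 0;
}
/* Callers (names are placeholders) never touch ts_lock themselves */
void caller_a(void) { /* Process list elements */ process_ts_entries(); }
void caller_b(void) { /* Process list elements */ process_ts_entries(); }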
![Page 22: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/22.jpg)
Slide 22
Driver Scenarios
• Don't use one big lock for everything; it reduces concurrency
• Locks that are too fine-grained increase overhead
• Need to balance both aspects
• Reader-writer locks
– Useful if data structures are read more often than they are updated
– Allow multiple read locks to be held simultaneously
– Allow a single write lock to be held, and prevent any read lock from being obtained while the write lock is held
– Available for both spin locks and semaphores
• Stack variables/structures don't need locking, since each thread of execution has its own stack and a pre-empting task works on its own instance
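The reader-writer rules listed above can be observed with the POSIX `pthread_rwlock_t`, used here as a userspace stand-in for the kernel's reader-writer locks. This is a sketch under that assumption; `cfg_lock`, `cfg_value` and the function names are hypothetical. The non-blocking `try` variants are used so the admission rules show up as return codes instead of blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_rwlock_t cfg_lock = PTHREAD_RWLOCK_INITIALIZER;
static int cfg_value = 1; /* read often, updated rarely */

/* Readers share the lock with each other. */
int cfg_read(void)
{
    int v;

    pthread_rwlock_rdlock(&cfg_lock);
    v = cfg_value;
    pthread_rwlock_unlock(&cfg_lock);
    return v;
}

/* A writer needs the lock exclusively. */
void cfg_write(int v)
{
    pthread_rwlock_wrlock(&cfg_lock);
    cfg_value = v;
    pthread_rwlock_unlock(&cfg_lock);
}

/* Usage: shows which lock requests are admitted and which are refused. */
void rwlock_usage(void)
{
    int r1 = pthread_rwlock_tryrdlock(&cfg_lock); /* first read lock */
    int r2 = pthread_rwlock_tryrdlock(&cfg_lock); /* second read lock */
    int w1 = pthread_rwlock_trywrlock(&cfg_lock); /* refused: readers active */

    assert(r1 == 0 && r2 == 0 && w1 == EBUSY);
    pthread_rwlock_unlock(&cfg_lock);
    pthread_rwlock_unlock(&cfg_lock);

    int w2 = pthread_rwlock_trywrlock(&cfg_lock); /* admitted: lock is free */
    int r3 = pthread_rwlock_tryrdlock(&cfg_lock); /* refused: writer active */

    assert(w2 == 0 && r3 == EBUSY);
    pthread_rwlock_unlock(&cfg_lock);
}
```

The trade-off is the one the slide names: readers no longer serialize against each other, at the cost of a heavier lock than a plain mutex.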
![Page 23: Linux Device Driver parallelism using SMP and Kernel Pre-emption](https://reader034.fdocuments.in/reader034/viewer/2022052505/5562f931d8b42a6f598b4911/html5/thumbnails/23.jpg)
Slide 23
Summary
• Concurrency/parallelism needs to be one of the criteria during the driver design phase
• Analysis is required to determine the parallel paths and the protection needed for critical sections
• Drivers that handle concurrency with appropriate locking techniques not only avoid race conditions but also improve performance
• Unit testing can exercise some of the parallel paths in the driver
– Two different applications that open parallel paths into the same driver
– Two instances of the same application
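The "two instances" test idea can be sketched as a small harness that drives the same entry point from several threads and then checks an invariant. This is a userspace illustration assuming POSIX threads; `dev_entry()` is a hypothetical stand-in for a driver entry point such as open or ioctl, and a real test would call into the driver through its device node instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CALLERS 2
#define CALLS_PER_CALLER 50000

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static long dev_calls; /* shared state touched by every caller */

/* Hypothetical entry point; the lock is what the test is exercising. */
static void dev_entry(void)
{
    pthread_mutex_lock(&dev_lock);
    dev_calls++;
    pthread_mutex_unlock(&dev_lock);
}

/* One "application instance" hammering the entry point. */
static void *caller(void *unused)
{
    (void)unused;
    for (int i = 0; i < CALLS_PER_CALLER; i++)
        dev_entry();
    return NULL;
}

/* Starts the callers in parallel and returns the number of calls recorded. */
long run_parallel_callers(void)
{
    pthread_t threads[CALLERS];
    int started = 0;

    dev_calls = 0;
    while (started < CALLERS &&
           pthread_create(&threads[started], NULL, caller, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    if (started < CALLERS)
        return -1; /* could not start every caller */
    assert(dev_calls >= 0);
    return dev_calls;
}
```

If the lock in `dev_entry()` were removed, the final count would usually fall short of the expected total, which is the kind of symptom such a test is meant to expose.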