Pycon11: Python threads: Dive into GIL!
-
Upload
chetan-giridhar -
Category
Technology
-
view
1.614 -
download
3
description
Transcript of Pycon11: Python threads: Dive into GIL!
![Page 1: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/1.jpg)
Python threads: Dive into GIL!
PyCon India 2011Pune, Sept 16-18
Vishal Kanaujia & Chetan Giridhar
![Page 2: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/2.jpg)
Python: A multithreading example
![Page 3: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/3.jpg)
Setting up the context!!
Python v2.7 Execution Time
Single Core 74 s
Dual Core 116 s
Single Core Dual Core0
20406080
100120140
Execution Time
Execution Time
57 % dip in Execution Time on dual core!!
Hmm…My threads should be twice as faster
on dual cores!
![Page 4: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/4.jpg)
Getting in to the Problem Space!!
Getting into the problem space!!
GIL
Python v2.7
![Page 5: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/5.jpg)
Python Threads
• Real system threads (POSIX/ Windows threads)• Python VM has no intelligence of thread
management – No thread priorities, pre-emption
• Native operative system supervises thread scheduling
• Python interpreter just does the per-thread bookkeeping.
![Page 6: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/6.jpg)
What’s wrong with Py threads?
• Each ‘running’ thread requires exclusive access to data structures in Python interpreter
• Global interpreter lock (GIL) provides the synchronization (bookkeeping)
• GIL is necessary, mainly because CPython's memory management is not thread-safe.
![Page 7: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/7.jpg)
GIL: Code Details
• A thread create request in Python is just a pthread_create() call
• The function Py_Initialize() creates the GIL• GIL is simply a synchronization primitive, and can
be implemented with a semaphore/ mutex.../Python/ceval.cstatic PyThread_type_lock interpreter_lock = 0; /* This is the GIL */
• A “runnable” thread acquires this lock and start execution
![Page 8: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/8.jpg)
GIL Management• How does Python manages GIL?– Python interpreter regularly performs a check on the running thread– Accomplishes thread switching and signal handling
• What is a “check”? A counter of ticks; ticks decrement as a thread runs A tick maps to a Python VM’s byte-code instructions Check interval can be set with sys.setcheckinterval
(interval) Check dictate CPU time-slice available to a thread
![Page 9: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/9.jpg)
GIL Management: Implementation
• Involves two global varibales:• PyAPI_DATA(volatile int) _Py_Ticker;• PyAPI_DATA(int) _Py_CheckInterval;
• As soon as ticks reach zero:• Refill the ticker• active thread releases the GIL• Signals sleeping threads to wake up• Everyone competes for GIL
File: ../ceval.c
PyEval_EvalFrameEx () { --_Py_Ticker; /* Decrement ticker */ /* Refill it */ _Py_Ticker = _Py_CheckInterval; if (interpreter_lock) { PyThread_release_lock(interpreter_lock); /* Other threads may run now */
PyThread_acquire_lock(interpreter_lock, 1); } }
![Page 10: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/10.jpg)
Two CPU bound threads on single core machine
Running Suspended
Suspended Wakeup Running
waiting for GIL
Signals thread2
GIL released
waiting for GIL
GIL acquired
time check1
thread2
thread1
check2
![Page 11: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/11.jpg)
GIL impact• There is considerable time lag with – Communication (Signaling)– Thread wake-up– GIL acquisition
• GIL Management: Independent of host/native OS scheduling
• Result– Significant overhead– Thread waits if GIL in unavailable– Threads run sequentially, rather than concurrently
Try Ctrl + C. Does it stop execution?
![Page 12: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/12.jpg)
Curious case of multicore system
thread1, core0
thread2, core1
time
thread
• Conflicting goals of OS scheduler and Python interpreter• Host OS can schedule threads concurrently on multi-core• GIL battle
![Page 13: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/13.jpg)
The ‘Priority inversion’
• In a [CPU, I/O]-bound mixed application, I/O bound thread may starve!
• “cache-hotness” may influence the new GIL owner; usually the recent owner!
• Preferring CPU thread over I/O thread• Python presents a priority inversion on multi-
core systems.
![Page 14: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/14.jpg)
Getting in to the Problem Space!!
Understanding the new story!!
GIL
Python v3.2
![Page 15: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/15.jpg)
New GIL: Python v3.2• Regular “check” are discontinued• We have new time-out mechanism.
• Default time-out= 5ms• Configurable through sys.setswitchinterval()
• For every time-out, the current GIL holder is forced to release GIL
• It then signals the other waiting threads• Waits for a signal from new GIL owner
(acknowledgement).• A sleeping thread wakes up, acquires the GIL, and
signals the last owner.
![Page 16: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/16.jpg)
time
Running Wait Suspended
Suspended Wakeup Running
sig
na
lwaiting for GIL
Time Out
Waiting for the GIL
GIL released
GIL acquired
la
ng
is
Curious Case of Multicore: Python v3.2
CPU Thread core1
CPU Thread core0
![Page 17: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/17.jpg)
Positive impact with new GIL
• Better GIL arbitration• Ensures that a thread runs only for 5ms
• Less context switching and fewer signals• Multicore perspective: GIL battle eliminated!• More responsive threads (fair scheduling)• All iz well
![Page 18: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/18.jpg)
I/O threads in Python• An interesting optimization by interpreter– I/O calls are assumed blocking
• Python I/O extensively exercise this optimization with file, socket ops (e.g. read, write, send, recv calls)./Python3.2.1/Include/ceval.h
Py_BEGIN_ALLOW_THREADS Do some blocking I/O operation ...Py_END_ALLOW_THREADS
• I/O thread always releases the GIL
![Page 19: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/19.jpg)
Convoy effect: Fallout of I/O optimization
• When an I/O thread releases the GIL, another ‘runnable’ CPU bound thread can acquire it (remember we are on multiple cores).
• It leaves the I/O thread waiting for another time-out (default: 5ms)!
• Once CPU thread releases GIL, I/O thread acquires and releases it again
• This cycle goes on => performance suffers
![Page 20: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/20.jpg)
Convoy “in” effectConvoy effect- observed in an application comprising I/O-bound and CPU-bound threads
Running Wait Suspended
waiting for GIL
GIL released GIL released
Suspended Running Wait Suspended
Running Wait
Time Out
GIL acquired
time
I/O Thread core0
GIL acquired
CPU Thread core1
![Page 21: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/21.jpg)
Performance measurements!• Curious to know how convoy effect translates
into performance numbers• We performed following tests with Python3.2:
• An application with a CPU and a I/O thread• Executed on a dual core machine• CPU thread spends less than few seconds
(<10s)!
I/O thread with CPU thread I/O thread without CPU thread
97 seconds 23 seconds
![Page 22: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/22.jpg)
Comparing: Python 2.7 & Python 3.2
v2.7 v3.20
1020304050607080
Execution Time – Single Core
Execution Time
v2.7 v3.20
20406080
100120140
Execution Time – Dual Core
Execution Time
Python v3.2 Execution Time
Single Core 55 s
Dual Core 65 s
Python v2.7 Execution Time
Single Core 74 s
Dual Core 116 s
Single Core Dual Core 505254565860626466
Execution Time – Python v3.2
Execution Time
Performance dip still
observed in dual cores
![Page 23: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/23.jpg)
Getting in to the Problem Space!!
Getting into the problem space!!
GIL
Python v2.7 / v3.2
Solution space!!
![Page 24: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/24.jpg)
GIL free world: Jython
• Jython is free of GIL • It can fully exploit multiple cores, as per our
experiments• Experiments with Jython2.5– Run with two CPU threads in tandem
• Experiment shows performance improvement on a multi-core system
Jython2.5 Execution time
Single core 44 s
Dual core 25 s
![Page 25: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/25.jpg)
Avoiding GIL impact with multiprocessing• multiprocessing — Process-based “threading” interface• “multiprocessing” module spawns a new Python interpreter
instance for a process.• Each process is independent, and GIL is irrelevant;
– Utilizes multiple cores better than threads.– Shares API with “threading” module.
Python v2.7 Single Core Dual Core
threading 76 s 114 s
multiprocessing 72 s 43 s
Single Core Dual Core 0
20
40
60
80
100
120
threadingmultiprocessing
Cool! 40 % improvement in Execution Time on dual core!!
![Page 26: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/26.jpg)
Conclusion• Multi-core systems are becoming ubiquitous• Python applications should exploit this
abundant power• CPython inherently suffers the GIL limitation• An intelligent awareness of Python interpreter
behavior is helpful in developing multi-threaded applications
• Understand and use
![Page 27: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/27.jpg)
Questions
Thank you for your time and attention
• Please share your feedback/ comments/ suggestions to us at:• [email protected] , http://technobeans.com • [email protected], http://freethreads.wordpress.com
![Page 28: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/28.jpg)
References
• Understanding the Python GIL, http://dabeaz.com/talks.html• GlobalInterpreterLock,
http://wiki.python.org/moin/GlobalInterpreterLock • Thread State and the Global Interpreter Lock,
http://docs.python.org/c-api/init.html#threads• Python v3.2.2 and v2.7.2 documentation,
http://docs.python.org/• Concurrency and Python,
http://drdobbs.com/open-source/206103078?pgno=3
![Page 29: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/29.jpg)
Backup slides
![Page 30: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/30.jpg)
Python: GIL
• A thread needs GIL before updating Python objects, calling C/Python API functions
• Concurrency is emulated with regular ‘checks’ to switch threads
• Applicable to only CPU bound thread• A blocking I/O operation implies relinquishing the GIL
– ./Python2.7.5/Include/ceval.hPy_BEGIN_ALLOW_THREADS Do some blocking I/O operation ...Py_END_ALLOW_THREADS
• Python file I/O extensively exercise this optimization
![Page 31: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/31.jpg)
GIL: Internals
• The function Py_Initialize() creates the GIL• A thread create request in Python is just a
pthread_create() call• ../Python/ceval.c• static PyThread_type_lock interpreter_lock = 0; /*
This is the GIL */• o) thread_PyThread_start_new_thread: we call it for
"each" user defined thread.• calls PyEval_InitThreads() ->
PyThread_acquire_lock() {}
![Page 32: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/32.jpg)
GIL: in action
• Each CPU bound thread requires GIL• ‘ticks count’ determine duration of GIL hold• new_threadstate() -> tick_counter• We keep a list of Python threads and each
thread-state has its tick_counter value• As soon as tick decrements to zero, the
thread release the GIL.
![Page 33: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/33.jpg)
GIL: Details
thread_PyThread_start_new_thread() ->void PyEval_InitThreads(void){ if (interpreter_lock) return; interpreter_lock = PyThread_allocate_lock(); PyThread_acquire_lock(interpreter_lock, 1); main_thread = PyThread_get_thread_ident();}
![Page 34: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/34.jpg)
Convoy effect: Python v2?
• Convoy effect holds true for Python v2 also• The smaller interval of ‘check’ saves the day!– I/O threads don’t have to wait for a longer time (5
m) for CPU threads to finish– Should choose the setswitchinterval() wisely
• The effect is not so visible in Python v2.0
![Page 35: Pycon11: Python threads: Dive into GIL!](https://reader035.fdocuments.in/reader035/viewer/2022062307/5549f5fbb4c905507a8b4583/html5/thumbnails/35.jpg)
Stackless Python
• A different way of creating threads: Microthreads!
• No improvement from multi-core perspective• Round-robin scheduling for “tasklets”• Sequential execution