PyCon India 2011: Python Threads: Dive into GIL!
-
Upload
chetan-giridhar -
Category
Technology
-
view
1.350 -
download
7
description
Transcript of PyCon India 2011: Python Threads: Dive into GIL!
![Page 1: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/1.jpg)
Python threads: Dive into GIL!
PyCon India 2011
Pune, Sept 16-18Pune, Sept 16-18
Vishal Kanaujia & Chetan Giridhar
![Page 2: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/2.jpg)
Python: A multithreading example
![Page 3: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/3.jpg)
Setting up the context!!Hmm…My threads
should be twice as faster
on dual cores!
Python v2.7 Execution Time
Single Core 74 s
Dual Core 116 s0
50
100
150
Single Core Dual Core
Execution Time
Execution Time
57 % dip in Execution Time on dual core!!
![Page 4: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/4.jpg)
Getting in to the Problem Space!!
Getting into the
problem space!!
Python
v2.7Getting in to the Problem Space!!
GIL
v2.7
![Page 5: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/5.jpg)
Python Threads
• Real system threads (POSIX/ Windows threads)
• Python VM has no intelligence of thread management management
– No thread priorities, pre-emption
• Native operative system supervises thread scheduling
• Python interpreter just does the per-thread bookkeeping.
![Page 6: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/6.jpg)
What’s wrong with Py threads?
• Each ‘running’ thread requires exclusive
access to data structures in Python interpreter
• Global interpreter lock (GIL) provides the
synchronization (bookkeeping) synchronization (bookkeeping)
• GIL is necessary, mainly because CPython's
memory management is not thread-safe.
![Page 7: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/7.jpg)
GIL: Code Details
• A thread create request in Python is just a
pthread_create() call
• The function Py_Initialize() creates the GIL
• GIL is simply a synchronization primitive, and can • GIL is simply a synchronization primitive, and can
be implemented with a semaphore/ mutex.
• A “runnable” thread acquires this lock and start
execution
![Page 8: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/8.jpg)
GIL Management• How does Python manages GIL?
– Python interpreter regularly performs a
check on the running thread
– Accomplishes thread switching and signal handling
• What is a “check”?• What is a “check”?
� A counter of ticks; ticks decrement as a thread runs
� A tick maps to a Python VM’s byte-code instructions
� Check interval can be set with sys.setcheckinterval
(interval)
� Check dictate CPU time-slice available to a thread
![Page 9: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/9.jpg)
GIL Management: Implementation
• Involves two global varibales:• PyAPI_DATA(volatile int) _Py_Ticker;
• PyAPI_DATA(int) _Py_CheckInterval;
• As soon as ticks reach zero:
• Refill the ticker• Refill the ticker
• active thread releases the GIL
• Signals sleeping threads to wake up
• Everyone competes for GIL
![Page 10: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/10.jpg)
Two CPU bound threads on single core machine
Running Suspended
Signals thread2
GIL
released
waiting for
GILthread1
Suspended Wakeup Running
waiting
for GIL
GIL
acquired
time check1
thread2
check2
![Page 11: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/11.jpg)
GIL impact• There is considerable time lag with
– Communication (Signaling)
– Thread wake-up
– GIL acquisition
• GIL Management: Independent of host/native OS • GIL Management: Independent of host/native OS scheduling
• Result
– Significant overhead
– Thread waits if GIL in unavailable
– Threads run sequentially, rather than concurrently
Try Ctrl + C. Does it stop
execution?
![Page 12: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/12.jpg)
Curious case of multicore systemthread1, core0
thread2, core1
thread
time
• Conflicting goals of OS scheduler and Python interpreter
• Host OS can schedule threads concurrently on multi-core
• GIL battle
![Page 13: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/13.jpg)
The ‘Priority inversion’
• In a [CPU, I/O]-bound mixed application, I/O
bound thread may starve!
• “cache-hotness” may influence the new GIL
owner; usually the recent owner!owner; usually the recent owner!
• Preferring CPU thread over I/O thread
• Python presents a priority inversion on multi-
core systems.
![Page 14: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/14.jpg)
Getting in to the Problem Space!!
Understanding the
new story!!
Python
v3.2Getting in to the Problem Space!!
GIL
v3.2
![Page 15: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/15.jpg)
New GIL: Python v3.2
• Regular “check” are discontinued
• We have new time-out mechanism.
• Default time-out= 5ms
• Configurable through sys.setswitchinterval()
• For every time-out, the current GIL holder is forced to • For every time-out, the current GIL holder is forced to
release GIL
• It then signals the other waiting threads
• Waits for a signal from new GIL owner
(acknowledgement).
• A sleeping thread wakes up, acquires the GIL, and
signals the last owner.
![Page 16: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/16.jpg)
Running Wait Suspended
Time Out
Waiting for
the GIL
GIL
released
Curious Case of Multicore: Python v3.2CPU Thread
core0
time
Suspended Wakeup Running
waiting
for GIL
GIL
acquired
CPU Thread
core1
![Page 17: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/17.jpg)
Positive impact with new GIL
• Better GIL arbitration
• Ensures that a thread runs only for 5ms
• Less context switching and fewer signals
• Multicore perspective: GIL battle eliminated!• Multicore perspective: GIL battle eliminated!
• More responsive threads (fair scheduling)
• All iz well☺
![Page 18: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/18.jpg)
I/O threads in Python
• An interesting optimization by interpreter
– I/O calls are assumed blocking
• Python I/O extensively exercise this optimization with file, socket ops (e.g. read, write, send, recv calls)write, send, recv calls)./Python3.2.1/Include/ceval.h
Py_BEGIN_ALLOW_THREADS
Do some blocking I/O operation ...
Py_END_ALLOW_THREADS
• I/O thread always releases the GIL
![Page 19: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/19.jpg)
Convoy effect: Fallout of I/O
optimization
• When an I/O thread releases the GIL, another
‘runnable’ CPU bound thread can acquire it
(remember we are on multiple cores).
• It leaves the I/O thread waiting for another • It leaves the I/O thread waiting for another
time-out (default: 5ms)!
• Once CPU thread releases GIL, I/O thread
acquires and releases it again
• This cycle goes on => performance suffers �
![Page 20: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/20.jpg)
Convoy “in” effectConvoy effect- observed in an application comprising I/O-
bound and CPU-bound threads
Running Wait Suspended
GIL releasedGIL
released
Running Wait
I/O Thread
core0GIL acquired
Running Wait Suspended
waiting for
GIL
Suspended Running Wait Suspended
Running Wait
Time Out
GIL acquired
time
CPU Thread
core1
![Page 21: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/21.jpg)
Performance measurements!
• Curious to know how convoy effect translates
into performance numbers
• We performed following tests with Python3.2:
• An application with a CPU and a I/O thread• An application with a CPU and a I/O thread
• Executed on a dual core machine
• CPU thread spends less than few seconds
(<10s)!
I/O thread with CPU thread I/O thread without CPU thread
97 seconds 23 seconds
![Page 22: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/22.jpg)
Comparing: Python 2.7 & Python 3.2
60
80
Execution Time – Single Core
150
Execution Time – Dual Core
Python v3.2 Execution Time
Single Core 55 s
Dual Core 65 s
Python v2.7 Execution Time
Single Core 74 s
Dual Core 116 s
0
20
40
60
v2.7 v3.2
Execution Time
0
50
100
v2.7 v3.2
Execution Time
50
52
54
56
58
60
62
64
66
Single Core Dual Core
Execution Time – Python v3.2
Execution Time
Performance
dip still
observed in
dual cores �
![Page 23: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/23.jpg)
Getting in to the Problem Space!!
Getting into the
problem space!!
Python
v2.7 / v3.2Getting in to the Problem Space!!
GILSolution
space!!
![Page 24: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/24.jpg)
GIL free world: Jython
• Jython is free of GIL ☺
• It can fully exploit multiple cores, as per our experiments
• Experiments with Jython2.5• Experiments with Jython2.5
– Run with two CPU threads in tandem
• Experiment shows performance improvement on a multi-core system
Jython2.5 Execution time
Single core 44 s
Dual core 25 s
![Page 25: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/25.jpg)
Avoiding GIL impact with multiprocessing
• multiprocessing — Process-based “threading” interface
• “multiprocessing” module spawns a new Python interpreter
instance for a process.
• Each process is independent, and GIL is irrelevant;
– Utilizes multiple cores better than threads.
– Shares API with “threading” module.– Shares API with “threading” module.
Python v2.7 Single Core Dual Core
threading 76 s 114 s
multiprocessing 72 s 43 s
0
20
40
60
80
100
120
Single Core Dual Core
threading
multiprocessing
Cool! 40 % improvement in Execution
Time on dual core!! ☺☺☺☺
![Page 26: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/26.jpg)
Conclusion
• Multi-core systems are becoming ubiquitous
• Python applications should exploit this
abundant power
• CPython inherently suffers the GIL limitation• CPython inherently suffers the GIL limitation
• An intelligent awareness of Python interpreter
behavior is helpful in developing multi-
threaded applications
• Understand and use ☺
![Page 27: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/27.jpg)
Questions
Thank you for your time and attention ☺Thank you for your time and attention ☺
• Please share your feedback/ comments/ suggestions to us at:
• [email protected] , http://technobeans.com
• [email protected], http://freethreads.wordpress.com
![Page 28: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/28.jpg)
References
• Understanding the Python GIL, http://dabeaz.com/talks.html
• GlobalInterpreterLock,
http://wiki.python.org/moin/GlobalInterpreterLock
• Thread State and the Global Interpreter Lock,
http://docs.python.org/c-api/init.html#threadshttp://docs.python.org/c-api/init.html#threads
• Python v3.2.2 and v2.7.2 documentation,
http://docs.python.org/
• Concurrency and Python, http://drdobbs.com/open-
source/206103078?pgno=3
![Page 29: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/29.jpg)
Backup slides
![Page 30: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/30.jpg)
Python: GIL
• A thread needs GIL before updating Python objects, calling C/Python API functions
• Concurrency is emulated with regular ‘checks’ to switch threads
• Applicable to only CPU bound thread• Applicable to only CPU bound thread
• A blocking I/O operation implies relinquishing the GIL– ./Python2.7.5/Include/ceval.h
Py_BEGIN_ALLOW_THREADS
Do some blocking I/O operation ...
Py_END_ALLOW_THREADS
• Python file I/O extensively exercise this optimization
![Page 31: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/31.jpg)
GIL: Internals
• The function Py_Initialize() creates the GIL
• A thread create request in Python is just a pthread_create() call
• ../Python/ceval.c
• static PyThread_type_lock interpreter_lock = 0; • static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */
• o) thread_PyThread_start_new_thread: we call it for "each" user defined thread.
• calls PyEval_InitThreads() -> PyThread_acquire_lock() {}
![Page 32: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/32.jpg)
GIL: in action
• Each CPU bound thread requires GIL
• ‘ticks count’ determine duration of GIL hold
• new_threadstate() -> tick_counter
• We keep a list of Python threads and each • We keep a list of Python threads and each
thread-state has its tick_counter value
• As soon as tick decrements to zero, the
thread release the GIL.
![Page 33: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/33.jpg)
GIL: Details
thread_PyThread_start_new_thread() ->
void PyEval_InitThreads(void)
{
if (interpreter_lock)
return;return;
interpreter_lock = PyThread_allocate_lock();
PyThread_acquire_lock(interpreter_lock, 1);
main_thread = PyThread_get_thread_ident();
}
![Page 34: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/34.jpg)
Convoy effect: Python v2?
• Convoy effect holds true for Python v2 also
• The smaller interval of ‘check’ saves the day!
– I/O threads don’t have to wait for a longer time (5
m) for CPU threads to finishm) for CPU threads to finish
– Should choose the setswitchinterval() wisely
• The effect is not so visible in Python v2.0
![Page 35: PyCon India 2011: Python Threads: Dive into GIL!](https://reader033.fdocuments.in/reader033/viewer/2022052821/5549f4e2b4c90512488b56fd/html5/thumbnails/35.jpg)
Stackless Python
• A different way of creating threads:
Microthreads!
• No improvement from multi-core perspective
• Round-robin scheduling for “tasklets”• Round-robin scheduling for “tasklets”
• Sequential execution �