Multithreading vs Multiprocessing
Python offers two main models for concurrent execution: threads (via threading) and processes (via multiprocessing).
The GIL — Global Interpreter Lock
The GIL is a mutex in CPython that allows only one thread to execute Python bytecode at a time.
- Impact on threads: CPU-bound threads do not truly run in parallel.
- Impact on I/O: threads do release the GIL while waiting for I/O (file, network, sleep), so multithreading IS effective for I/O-bound work.
- Multiprocessing bypasses the GIL by using separate processes with separate interpreters.
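The I/O case above can be checked with a small timing sketch. It uses time.sleep as a stand-in for blocking I/O. The helper names io_task and run_in_threads are illustrative and do not come from the original notes. Because the sleeping threads release the GIL, the total wall time should be close to one sleep rather than the sum of all four.

```python
import threading
import time

def io_task(delay):
    # time.sleep releases the GIL, much like a blocking socket or file read
    time.sleep(delay)

def run_in_threads(func, arg, n_threads):
    """Run func(arg) in n_threads threads and return the elapsed wall time."""
    threads = [threading.Thread(target=func, args=(arg,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_in_threads(io_task, 0.2, 4)
    # The four sleeps overlap, so this should be far below 4 * 0.2 seconds
    print(f"4 sleeping threads took {elapsed:.2f}s")
```

Replacing io_task with a pure-Python computation would be expected to show the opposite effect in standard CPython: adding threads does not shorten the total time.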
When to Use Which
| Scenario | Best Choice |
|---|---|
| I/O-bound (network, disk, sleep) | threading or asyncio |
| CPU-bound (computation, parsing) | multiprocessing |
| Mixed or simplicity needed | concurrent.futures |
threading Module
import threading

def worker(name):
    print(f"Thread {name} running")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all to finish
Thread Synchronization
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:  # only one thread enters at a time
        counter += 1

# Other primitives:
# threading.RLock(): reentrant lock
# threading.Event(): signal between threads
# threading.Semaphore(n): allow at most n threads
# threading.Condition(): wait/notify
# queue.Queue(): thread-safe FIFO (preferred for producer/consumer)
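Since queue.Queue is listed as the preferred tool for producer/consumer work, here is a minimal sketch of that pattern. The names SENTINEL, producer, consumer and run_pipeline are hypothetical, and doubling each item stands in for real processing. It assumes one producer and one consumer, so the output order matches the input order.

```python
import queue
import threading

SENTINEL = object()  # hypothetical marker telling the consumer to stop

def producer(q, items):
    for item in items:
        q.put(item)   # blocks while the queue is full
    q.put(SENTINEL)   # signal that no more items will arrive

def consumer(q, results):
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * 2)  # placeholder for real work

def run_pipeline(items):
    q = queue.Queue(maxsize=10)  # bounded, so a fast producer cannot pile up work
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The queue does its own locking, so neither function needs an explicit Lock. With several consumers, one sentinel per consumer would be needed.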
multiprocessing Module
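from multiprocessing import Process, Pool

def cpu_task(n):
    return sum(i**2 for i in range(n))

# The guard is required where child processes are started with "spawn"
# (the default on Windows and macOS), because children re-import this module.
if __name__ == "__main__":
    # Individual process (the target's return value is discarded)
    p = Process(target=cpu_task, args=(10_000_000,))
    p.start()
    p.join()

    # Pool: distribute work across CPU cores and collect the return values
    with Pool(processes=4) as pool:
        results = pool.map(cpu_task, [10**6, 2*10**6, 3*10**6])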
concurrent.futures — High-Level API
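from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import urllib.request

urls = ["http://example.com", "http://python.org"]

def fetch(url):
    # Read and close the response inside the worker thread
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

def cpu_task(n):
    return sum(i**2 for i in range(n))

if __name__ == "__main__":  # needed for ProcessPoolExecutor under "spawn"
    # I/O-bound: use threads
    with ThreadPoolExecutor(max_workers=4) as executor:
        pages = list(executor.map(fetch, urls))
    # CPU-bound: use processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(cpu_task, [10**6, 2*10**6]))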
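One limitation of executor.map is that an exception raised by a task is re-raised when its result is retrieved from the iterator, which ends the loop over the results. A common alternative is submit combined with as_completed, sketched below. The task parse_int and the wrapper run_all are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def parse_int(text):
    # Hypothetical task: raises ValueError on bad input
    return int(text)

def run_all(inputs, max_workers=4):
    """Run parse_int on every input; return (results, errors) keyed by input."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the input that produced it
        future_to_input = {executor.submit(parse_int, s): s for s in inputs}
        for future in as_completed(future_to_input):
            source = future_to_input[future]
            try:
                results[source] = future.result()  # re-raises the task's exception
            except ValueError as exc:
                errors[source] = str(exc)
    return results, errors
```

as_completed yields futures in completion order, not submission order, which is why the results are keyed by input here. Swapping in ProcessPoolExecutor should work the same way, provided the task function is defined at module level so it can be pickled.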
asyncio — Cooperative Concurrency
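asyncio suits I/O-bound workloads with many concurrent operations (e.g., thousands of HTTP requests). The example below uses aiohttp, a third-party package that must be installed separately.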
import asyncio
import aiohttp

urls = ["http://example.com", "http://python.org"]

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        # gather runs the coroutines concurrently and keeps input order
        return await asyncio.gather(*tasks)

results = asyncio.run(main())
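With thousands of requests it is usually wise to cap how many are in flight at once. The sketch below shows one way to do that with asyncio.Semaphore. To stay runnable with only the standard library, it replaces the HTTP call with a hypothetical fake_fetch that just sleeps. The names fake_fetch, fetch_all and the limit parameter are assumptions, not part of the original example.

```python
import asyncio

async def fake_fetch(url, delay=0.1):
    # Stand-in for a network call; asyncio.sleep yields control to the event loop
    await asyncio.sleep(delay)
    return f"body of {url}"

async def fetch_all(urls, limit=10):
    semaphore = asyncio.Semaphore(limit)  # at most `limit` fetches run at once

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(bounded(u) for u in urls))

if __name__ == "__main__":
    demo_urls = [f"http://example.com/{i}" for i in range(20)]
    bodies = asyncio.run(fetch_all(demo_urls, limit=10))
    print(len(bodies), "responses")
```

In a real client, fake_fetch would be replaced by the aiohttp-based fetch, with the semaphore wrapped around it in the same way.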
Key Interview Points
- The GIL makes CPU-bound multithreading ineffective in CPython; use multiprocessing or external C extensions that release the GIL (e.g. NumPy).
- Threads share memory → use locks; processes have separate memory → use Queue/Pipe or shared memory objects.
- concurrent.futures is the clean, high-level choice for both models.
- asyncio is single-threaded but handles thousands of concurrent I/O operations via an event loop.
- Rule of thumb: threads for I/O, processes for CPU, asyncio for high-concurrency I/O.
- queue.Queue is thread-safe; use it for producer-consumer patterns instead of raw shared variables.