Concurrency in Python
When writing Python, there are several levers we can pull for concurrency. Each has different strengths and weaknesses, and the right choice depends on the needs and use case of the program you are writing. Unlike JavaScript or Go, in Python you have to explicitly choose which async/concurrency mechanism to reach for.
1. multiprocessing Module:
The multiprocessing module allows you to create multiple processes, each with its own Python interpreter. This can take full advantage of multi-core processors, making it suitable for CPU-bound tasks.
import multiprocessing

def worker_function(param):
    # Placeholder logic: square the input.
    return param * param

if __name__ == "__main__":  # Guard required when processes are spawned
    params_list = [1, 2, 3, 4]
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(worker_function, params_list)
    pool.close()
    pool.join()
2. concurrent.futures Module:
The concurrent.futures module provides a high-level interface for asynchronously executing functions using threads or processes, making it suitable for both CPU-bound and I/O-bound tasks.
from concurrent.futures import ThreadPoolExecutor

def worker_function(param):
    # Placeholder logic: square the input.
    return param * param

params_list = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(worker_function, params_list))
3. threading Module:
The threading module allows you to create multiple threads within a single process. Threads are lightweight and suitable for I/O-bound tasks, but they do not provide true parallelism for CPU-bound work because of Python's Global Interpreter Lock (GIL).
import threading

results = []

def worker_function(param):
    # Thread targets cannot return values directly, so collect
    # results in a shared list (list.append is thread-safe).
    results.append(param * param)

params_list = [1, 2, 3, 4]
threads = []
for param in params_list:
    thread = threading.Thread(target=worker_function, args=(param,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
4. Third-party Libraries:
There are several third-party libraries that provide additional functionality for parallelism, such as joblib for parallel processing, ray for distributed computing, and dask for parallel computing on larger-than-memory datasets.
Here's an example using joblib:
from joblib import Parallel, delayed

def worker_function(param):
    # Placeholder logic: square the input.
    return param * param

params_list = [1, 2, 3, 4]
results = Parallel(n_jobs=4)(delayed(worker_function)(param) for param in params_list)
Conclusion?
Generally, reach for ProcessPoolExecutor (or multiprocessing) for anything CPU-bound, and ThreadPoolExecutor or plain threading for lightweight I/O-bound work. Anything beyond that, consider scaling your app horizontally with infrastructure instead of leaning on Python's fraught multithreading.