Concurrency in Python

When writing Python, there are several levers we can pull for concurrency. Each has different strengths and weaknesses, and the right choice depends on the needs and use case of the program you are writing. Unlike JavaScript or Go, in Python you have to actively choose which concurrency lever to reach for.

1. multiprocessing Module:

The multiprocessing module allows you to create multiple processes, each with its own Python interpreter. This can take full advantage of multi-core processors. It’s suitable for CPU-bound tasks.

import multiprocessing

def worker_function(param):
    # Example CPU-bound work: square the input
    return param * param

if __name__ == "__main__":
    params_list = [1, 2, 3, 4]
    # The with-block closes and joins the pool for us
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(worker_function, params_list)

2. concurrent.futures Module:

The concurrent.futures module provides a high-level interface for asynchronously executing callables using pools of threads or processes. Reach for ThreadPoolExecutor for I/O-bound tasks and ProcessPoolExecutor for CPU-bound ones.

from concurrent.futures import ThreadPoolExecutor

def worker_function(param):
    # Example work: double the input
    return param * 2

params_list = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(worker_function, params_list))

3. threading Module:

The threading module allows you to create multiple threads within a single process. Threads are lightweight and are suitable for I/O-bound tasks but may not provide true parallelism due to Python’s Global Interpreter Lock (GIL).

import threading

def worker_function(param):
    # Example I/O-bound work would go here; note that Thread
    # discards the target's return value
    print(f"processing {param}")

params_list = [1, 2, 3, 4]
threads = []
for param in params_list:
    thread = threading.Thread(target=worker_function, args=(param,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
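Because Thread discards the target's return value, a common pattern is to collect results through a thread-safe queue.Queue. A sketch with an illustrative worker:

```python
import queue
import threading

def worker(param, result_queue):
    # Illustrative I/O-bound stand-in: record the doubled input
    result_queue.put(param * 2)

params_list = [1, 2, 3]
result_queue = queue.Queue()
threads = [threading.Thread(target=worker, args=(p, result_queue))
           for p in params_list]
for t in threads:
    t.start()
for t in threads:
    t.join()

results = sorted(result_queue.get() for _ in params_list)
print(results)  # [2, 4, 6]
```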

4. Third-party Libraries:

There are several third-party libraries that provide additional functionality for parallelism, such as joblib for parallel processing, ray for distributed computing, and dask for parallel computing on larger-than-memory datasets.

Here’s an example using joblib:

from joblib import Parallel, delayed

def worker_function(param):
    # Example work: square the input
    return param * param

params_list = [1, 2, 3, 4]
results = Parallel(n_jobs=4)(delayed(worker_function)(param) for param in params_list)

Conclusion?

Generally, reach for ProcessPoolExecutor (or multiprocessing) for anything CPU-bound, and threading or ThreadPoolExecutor for lightweight I/O-bound work. Anything beyond that, consider scaling your app horizontally with infra instead of leaning on Python’s fraught multithreading.
