
Python Multiprocessing: Syntax, Usage, and Examples

Python multiprocessing allows you to run multiple processes in parallel, leveraging multiple CPU cores for improved performance. Unlike multithreading, which is limited by Python's Global Interpreter Lock (GIL), multiprocessing runs CPU-bound tasks truly in parallel, each in its own interpreter. The multiprocessing module provides tools to create, manage, and synchronize processes, making it ideal for heavy computations, parallel data processing, and task automation.

How to Use Python Multiprocessing

The multiprocessing module provides the Process class, which you can use to create and run independent processes. Here’s how you can create a new process:

python
import multiprocessing

def worker_function():
    print("Process running")

if __name__ == "__main__":
    # Creating a process
    process = multiprocessing.Process(target=worker_function)
    # Starting the process
    process.start()
    # Waiting for the process to finish
    process.join()
  • multiprocessing.Process(target=function_name): Creates a new process that runs the specified function.
  • .start(): Starts the process execution.
  • .join(): Makes the main process wait for the child process to finish.

The if __name__ == "__main__": guard matters here: on platforms that use the spawn start method (Windows, and macOS by default), each child process re-imports the main module, and the guard prevents it from recursively creating new processes.

When to Use Multiprocessing in Python

Multiprocessing is ideal when your code needs to handle CPU-intensive tasks. Here are three common situations where multiprocessing improves performance:

Parallelizing CPU-Intensive Tasks

When processing large datasets, using multiple CPU cores can significantly speed up execution.

python
import multiprocessing

def calculate_square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    # Distribute the inputs across three worker processes
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(calculate_square, numbers)
    print(results)  # Output: [1, 4, 9, 16, 25]

Handling Multiple Processes in Web Scraping

Web scraping involves sending many requests, which can be parallelized to speed up data retrieval. (This work is I/O-bound, so threads are often the lighter-weight choice, as discussed later, but multiprocessing works too.)

python
import multiprocessing
import requests

def fetch_url(url):
    response = requests.get(url)
    return f"{url}: {response.status_code}"

if __name__ == "__main__":
    urls = ["https://example.com", "https://google.com", "https://github.com"]
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(fetch_url, urls)
    print(results)

Efficiently Processing Large Files

If you need to process large files, multiprocessing lets you split the work across multiple CPU cores.

python
import multiprocessing

def count_lines(file_name):
    # Stream the file instead of reading it all into memory
    with open(file_name, 'r') as file:
        return sum(1 for _ in file)

if __name__ == "__main__":
    files = ["file1.txt", "file2.txt", "file3.txt"]
    with multiprocessing.Pool() as pool:
        line_counts = pool.map(count_lines, files)
    print(line_counts)
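This parallelizes across files. To parallelize within a single large file, pool.imap can stream lines to workers without loading the whole file into memory. A minimal sketch, assuming a line-oriented file file1.txt and an illustrative process_line helper:

python
import multiprocessing

def process_line(line):
    # Placeholder work: compute the length of each line
    return len(line.strip())

if __name__ == "__main__":
    with open("file1.txt") as f, multiprocessing.Pool() as pool:
        # imap streams lines to workers in chunks as the file is read
        total = sum(pool.imap(process_line, f, chunksize=1000))
    print(total)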

Examples of Python Multiprocessing

Running Multiple Processes Concurrently

python
import multiprocessing

def worker(task_id):
    print(f"Task {task_id} is running")

if __name__ == "__main__":
    tasks = [multiprocessing.Process(target=worker, args=(i,)) for i in range(3)]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()

Using a Shared Memory Variable

When multiple processes need to share data, you can use Value or Array from the multiprocessing module.

python
import multiprocessing

def increment(shared_counter):
    for _ in range(100000):
        # Lock around the read-modify-write so no updates are lost
        with shared_counter.get_lock():
            shared_counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value('i', 0)  # Shared integer value
    processes = [multiprocessing.Process(target=increment, args=(counter,)) for _ in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Final counter value: {counter.value}")  # Output: Final counter value: 200000
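Array works the same way for shared sequences. A minimal sketch, assuming two workers write to disjoint slices (double_slice is an illustrative helper, not part of the multiprocessing API):

python
import multiprocessing

def double_slice(shared_array, start, end):
    # Each process doubles its own, non-overlapping slice
    for i in range(start, end):
        shared_array[i] *= 2

if __name__ == "__main__":
    arr = multiprocessing.Array('i', [1, 2, 3, 4, 5, 6])  # Shared integer array
    p1 = multiprocessing.Process(target=double_slice, args=(arr, 0, 3))
    p2 = multiprocessing.Process(target=double_slice, args=(arr, 3, 6))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(list(arr))  # Output: [2, 4, 6, 8, 10, 12]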

Using a Process Pool for Task Distribution

If you have many small tasks, a process pool distributes the workload efficiently.

python
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(square, numbers)
    print(results)  # Output: [1, 4, 9, 16, 25]
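map works on a ready-made list; when tasks arrive one at a time, apply_async submits them individually and returns handles you can collect later. A minimal sketch:

python
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Submit tasks individually and collect AsyncResult handles
        handles = [pool.apply_async(square, (n,)) for n in [1, 2, 3, 4, 5]]
        results = [h.get() for h in handles]
    print(results)  # Output: [1, 4, 9, 16, 25]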

Learn More About Python Multiprocessing

Python ThreadPool vs. Multiprocessing

ThreadPool is best for I/O-bound tasks like web scraping or file I/O, while multiprocessing excels at CPU-bound tasks like numerical computations or image processing. A thread pool shares one interpreter (and the GIL), so it cannot speed up pure-Python computation; the choice depends on whether your bottleneck is CPU time or waiting on I/O.
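Both pools expose the same map interface, so switching between them is mostly a one-line change. A minimal sketch comparing the two, using the standard multiprocessing.pool.ThreadPool:

python
import multiprocessing
from multiprocessing.pool import ThreadPool

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    # Threads: cheap to start; best when tasks spend time waiting on I/O
    with ThreadPool(processes=4) as tpool:
        print(tpool.map(square, numbers))
    # Processes: heavier, but CPU-bound work runs on separate cores
    with multiprocessing.Pool(processes=4) as ppool:
        print(ppool.map(square, numbers))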

Implementing a Rate Limit for Python Multiprocessing Requests

When making multiple HTTP requests, you can prevent overloading a server by implementing a rate limit.

python
import multiprocessing
import time

def fetch_data(url):
    time.sleep(1)  # Stand-in for a delay; this is not a real shared rate limit
    print(f"Fetched data from {url}")

if __name__ == "__main__":
    urls = ["https://example.com", "https://api.github.com", "https://google.com"]
    with multiprocessing.Pool(processes=2) as pool:
        pool.map(fetch_data, urls)
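The example above only simulates a delay inside each worker. A sketch of an actual limit shared across all workers, assuming at most one request per second; the names MIN_INTERVAL, init_worker, and next_time are illustrative, not part of the multiprocessing API:

python
import multiprocessing
import time

MIN_INTERVAL = 1.0  # Assumed limit: at least one second between requests

def init_worker(shared_next_time):
    # Each pool worker stores the shared timestamp in a module-level global
    global next_time
    next_time = shared_next_time

def fetch_data(url):
    # Reserve the next free time slot under the shared lock
    with next_time.get_lock():
        now = time.monotonic()
        wait = max(0.0, next_time.value - now)
        next_time.value = max(now, next_time.value) + MIN_INTERVAL
    time.sleep(wait)  # Sleep until the reserved slot arrives
    print(f"Fetched data from {url}")

if __name__ == "__main__":
    urls = ["https://example.com", "https://api.github.com", "https://google.com"]
    shared = multiprocessing.Value('d', time.monotonic())
    with multiprocessing.Pool(processes=2, initializer=init_worker,
                              initargs=(shared,)) as pool:
        pool.map(fetch_data, urls)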

Managing Shared Memory in Python Multiprocessing

Processes do not share ordinary Python objects. multiprocessing.Manager() bridges that gap by hosting proxy objects (lists, dicts, and more) in a manager process that multiple processes can safely read and write.

python
import multiprocessing

def add_to_list(shared_list, value):
    shared_list.append(value)

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_list = manager.list()
        processes = [multiprocessing.Process(target=add_to_list, args=(shared_list, i)) for i in range(5)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        print(list(shared_list))  # Contains 0-4; order depends on scheduling

Python Multiprocessing Contexts and Fork Issues

Python provides three multiprocessing contexts:

  1. fork (Unix only; the child starts as a copy-on-write copy of the parent's memory)
  2. spawn (starts a fresh Python interpreter; the default on Windows and macOS)
  3. forkserver (forks children from a small, clean server process)

If you run into segmentation faults due to fork, use spawn instead:

python
import multiprocessing

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")
    process = multiprocessing.Process(target=print, args=("Hello from a spawned process!",))
    process.start()
    process.join()
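set_start_method can only be called once per program. To pick a start method for just one part of a program, get_context returns a context object whose Process and Pool use that method:

python
import multiprocessing

def greet():
    print("Hello from a spawn-context process")

if __name__ == "__main__":
    # Select the start method locally, without changing the global default
    ctx = multiprocessing.get_context("spawn")
    process = ctx.Process(target=greet)
    process.start()
    process.join()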

Preventing Deadlocks with Python Multiprocessing Lock

When multiple processes modify shared resources, use multiprocessing.Lock() to prevent conflicts.

python
import multiprocessing

def task(lock, name):
    # Pass the lock explicitly so the code works under every start method
    with lock:
        print(f"{name} is running")

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    processes = [multiprocessing.Process(target=task, args=(lock, f"Process {i}")) for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
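With a single lock, deadlock cannot occur; deadlocks typically appear when processes acquire several locks in different orders. One common prevention strategy is a fixed global acquisition order. A minimal sketch, with illustrative names transfer, lock_a, and lock_b:

python
import multiprocessing

def transfer(lock_a, lock_b, name):
    # Every process acquires lock_a before lock_b, never the reverse,
    # so no circular wait (and hence no deadlock) can form
    with lock_a:
        with lock_b:
            print(f"{name} acquired both locks")

if __name__ == "__main__":
    lock_a = multiprocessing.Lock()
    lock_b = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(target=transfer, args=(lock_a, lock_b, f"Process {i}"))
        for i in range(2)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()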

Using Multiprocessing Logging

Logging from multiple processes can interleave or lose records when every process writes to the same handler; routing records through a multiprocessing.Queue() avoids this.

python
import logging
import logging.handlers
import multiprocessing

def worker(log_queue):
    logger = logging.getLogger()
    # QueueHandler pickles each record and puts it on the shared queue
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.setLevel(logging.INFO)
    logger.info("Logging from process")

if __name__ == "__main__":
    log_queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=(log_queue,))
    process.start()
    process.join()
    while not log_queue.empty():
        record = log_queue.get()
        print(record.getMessage())
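Instead of draining the queue by hand, the standard library's logging.handlers.QueueListener can run in the main process and forward records to ordinary handlers:

python
import logging
import logging.handlers
import multiprocessing

def worker(log_queue):
    logger = logging.getLogger()
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.setLevel(logging.INFO)
    logger.info("Logging via QueueListener")

if __name__ == "__main__":
    log_queue = multiprocessing.Queue()
    # The listener drains the queue and hands records to a StreamHandler
    listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
    listener.start()
    process = multiprocessing.Process(target=worker, args=(log_queue,))
    process.start()
    process.join()
    listener.stop()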

When to Avoid Multiprocessing

Multiprocessing has some downsides:

  • High Overhead: Creating and managing processes requires more memory than threads.
  • Not Always Faster: If a task involves a lot of data transfer between processes, the overhead can slow things down.
  • Not Suitable for Short Tasks: The time taken to start and terminate processes can outweigh the benefits.

If your program is mostly I/O-bound or requires frequent inter-process communication, consider using multithreading instead.

Python multiprocessing is a powerful tool for optimizing performance when working with CPU-bound tasks. By leveraging multiple CPU cores, you can significantly speed up data processing, image transformations, and complex calculations.

However, multiprocessing comes with trade-offs, including higher memory usage and potential synchronization challenges.