Python Multiprocessing

Summary: in this tutorial, you’ll learn how to run code in parallel using the Python multiprocessing module.

Introduction to the Python multiprocessing

Generally, programs deal with two types of tasks:

  1. I/O-bound tasks: if a task does a lot of input/output operations, it’s called an I/O-bound task. Typical examples of I/O-bound tasks are reading from files, writing to files, connecting to databases, and making a network request. For I/O-bound tasks, you can use multithreading to speed them up.
  2. CPU-bound tasks: when a task does a lot of operations using a CPU, it’s called a CPU-bound task. For example, number calculation, image resizing, and video streaming are CPU-bound tasks. To speed up the program with lots of CPU-bound tasks, you use multiprocessing.

Multiprocessing allows two or more processors to simultaneously process two or more different parts of a program.

In Python, you use the multiprocessing module to implement multiprocessing.

Python multiprocessing example

See the following program:

import time def task(): result = 0 for _ in range(10**8): result += 1 return result if __name__ == '__main__': start = time.perf_counter() task() task() finish = time.perf_counter() print(f'It took {finish-start:.2f} second(s) to finish')Code language: Python (python)

Output:

It took 5.55 second(s) to finishCode language: Python (python)

How it works.

First, define the task() function is a CPU-bound task because it performs a heavy computation by executing a loop for 100 million iterations and incrementing a variable result:

def task(): result = 0 for _ in range(10**8): result += 1 return resultCode language: Python (python)

Second, call the task() functions twice and record the processing time:

if __name__ == '__main__': start = time.perf_counter() task() task() finish = time.perf_counter() print(f'It took {finish-start: .2f} second(s) to finish')Code language: Python (python)

On our computer, it took 5.55 seconds to complete.

Using multiprocessing module

The following program uses the multiprocessing module but takes less time:

import time import multiprocessing def task() -> int: result = 0 for _ in range(10**8): result += 1 return result if __name__ == '__main__': start = time.perf_counter() p1 = multiprocessing.Process(target=task) p2 = multiprocessing.Process(target=task) p1.start() p2.start() p1.join() p2.join() finish = time.perf_counter() print(f'It took {finish-start:.2f} second(s) to finish')Code language: Python (python)

Output:

It took 3.43 second(s) to finishCode language: Python (python)

How it works.

First, import the multiprocessing module:

import multiprocessingCode language: Python (python)

Second, create two processes and pass the task function to each:

p1 = multiprocessing.Process(target=task) p2 = multiprocessing.Process(target=task)Code language: Python (python)

Note that the Process() constructor returns a new Process object.

Third, call the start() method of the Process objects to start the process:

p1.start() p2.start()Code language: Python (python)

Finally, wait for the processes to complete by calling the join() method:

p1.join() p2.join()Code language: Python (python)

Python multiprocessing practical example

We’ll use the multiprocessing module to resize the high-resolution images.

First, install the Pillow library for image processing:

pip install PillowCode language: Python (python)

Second, develop a program that creates thumbnails of the pictures in the images folder and save them to the thumbs folder:

import time import os from PIL import Image, ImageFilter filenames = [ 'images/1.jpg', 'images/2.jpg', 'images/3.jpg', 'images/4.jpg', 'images/5.jpg', ] def create_thumbnail(filename, size=(50,50), thumb_dir ='thumbs'): # open the image img = Image.open(filename) # apply the gaussian blur filter img = img.filter(ImageFilter.GaussianBlur()) # create a thumbnail img.thumbnail(size) # save the image img.save(f'{thumb_dir}/{os.path.basename(filename)}') # display a message print(f'{filename} was processed...') if __name__ == '__main__': start = time.perf_counter() for filename in filenames: create_thumbnail(filename) finish = time.perf_counter() print(f'It took {finish-start:.2f} second(s) to finish')Code language: Python (python)

On our computer, it took about 4.06 seconds to complete:

images/1.jpg was processed... images/2.jpg was processed... images/3.jpg was processed... images/4.jpg was processed... images/5.jpg was processed... It took 4.06 second(s) to finishCode language: Python (python)

Third, modify the program to use multiprocessing. Each process will create a thumbnail for a picture:

import time import os from PIL import Image, ImageFilter import multiprocessing filenames = [ 'images/1.jpg', 'images/2.jpg', 'images/3.jpg', 'images/4.jpg', 'images/5.jpg', ] def create_thumbnail(filename, size=(50,50), thumb_dir ='thumbs'): # open the image img = Image.open(filename) # apply the gaussian blur filter img = img.filter(ImageFilter.GaussianBlur()) # create a thumbnail img.thumbnail(size) # save the image img.save(f'{thumb_dir}/{os.path.basename(filename)}') # display a message print(f'{filename} was processed...') def main(): start = time.perf_counter() # create processes processes = [multiprocessing.Process(target=create_thumbnail, args=[filename]) for filename in filenames] # start the processes for process in processes: process.start() # wait for completion for process in processes: process.join() finish = time.perf_counter() print(f'It took {finish-start:.2f} second(s) to finish') if __name__ == '__main__': main()Code language: Python (python)

Output:

images/5.jpg was processed... images/4.jpg was processed... images/1.jpg was processed... images/3.jpg was processed... images/2.jpg was processed... It took 2.92 second(s) to finishCode language: Python (python)

In this case, the output shows that the program processed the pictures faster.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *