Python Multithreading Example

Summary: in this tutorial, you’ll learn how to use the Python threading module to develop a multithreaded program.

Extending the Thread class

We’ll develop a multithreaded program that scraps the stock prices from the Yahoo Finance website.

To do that, we’ll use two third-party packages:

  • requests – to get the contents of a webpage.
  • lxml – to select a specific element of an HTML document.

First, install the requests and lxml modules using the pip command:

pip install request lxmlCode language: Python (python)

Next, define a new class called Stock that inherits from the Thread class of the threading module. We’ll place the Stock class in stock.py module:

import threading class Stock(threading.Thread): passCode language: Python (python)

Then, implement the __init__() method that accepts a symbol and initializes the url instance variable based on the symbol:

import threading import requests from lxml import html class Stock(threading.Thread): def __init__(self, symbol: str) -> None: super().__init__() self.symbol = symbol self.url = f'https://finance.yahoo.com/quote/{symbol}' self.price = NoneCode language: Python (python)

For example, if you pass the symbol GOOG to the __init__() method, the URL will be:

https://finance.yahoo.com/quote/GOOGCode language: Python (python)

After that, override the run() method of the Thread class. The run() method gets the contents from the self.url and grabs the stock price:

class Stock(threading.Thread): def __init__(self, symbol: str) -> None: super().__init__() self.symbol = symbol self.url = f'https://finance.yahoo.com/quote/{symbol}' self.price = None def run(self): headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" } response = requests.get(self.url, headers=headers) if response.status_code == 200: # parse the HTML tree = html.fromstring(response.text) # get the price in text price_text = tree.xpath( '//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()') if price_text: try: self.price = float(price_text[0].replace(',', '')) except ValueError: self.price = None def __str__(self): return f'{self.symbol}\t{self.price}'Code language: Python (python)

How it works.

Make a request to the URL using the requests.get() method:

headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" } response = requests.get(self.url, headers=headers)Code language: Python (python)

Notice that without valid headers, Yahoo will return 404 instead of 200.

If the request is successful, the HTTP status code is 200. In this case, we get the HTML contents from the response and pass it to the fromstring() function of the html module from the lxml package:

if response.status_code == 200: tree = html.fromstring(response.text)Code language: Python (python)

Every element on a webpage can be selected using something called XPath.

To get the XPath of an element using Google Chrome, you inspect the page, right-click the element, select copy, and Copy XPath.

The XPath of the stock price at the time of writing this tutorial is as follows:

//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]Code language: Python (python)

To get the text of the element, you append the text() at the end of the XPath:

//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()Code language: Python (python)

Notice that if Yahoo changes the page structure, you need to change the XPath accordingly. Otherwise, the program won’t work as expected:

price_text = tree.xpath('//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()')Code language: Python (python)

Once getting the price as text, we remove the comma and convert it to a number:

if price_text: try: self.price = float(price_text[0].replace(',', '')) except ValueError: self.price = NoneCode language: Python (python)

Finally, add the __str__() method that returns the string representation of the Stock object:

import threading import requests from lxml import html class Stock(threading.Thread): def __init__(self, symbol: str) -> None: super().__init__() self.symbol = symbol self.url = f'https://finance.yahoo.com/quote/{symbol}' self.price = None def run(self): headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" } response = requests.get(self.url, headers=headers) if response.status_code == 200: # parse the HTML tree = html.fromstring(response.text) # get the price in text price_text = tree.xpath( '//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()') if price_text: try: self.price = float(price_text[0].replace(',', '')) except ValueError: self.price = None def __str__(self): return f'{self.symbol}\t{self.price}' Code language: Python (python)

Using the Stock class

The following main.py module uses the Stock class from the stock.py module:

from stock import Stock symbols = ['MSFT', 'GOOGL', 'AAPL', 'META'] threads = [] for symbol in symbols: t = Stock(symbol) threads.append(t) t.start() for t in threads: t.join() print(t) Code language: Python (python)

Output:

MSFT 253.67 GOOGL 2280.41 AAPL 145.86 META 163.27Code language: Python (python)

How it works.

First, import the Stock class from the stock.py module:

from stock import StockCode language: Python (python)

Second, initialize a list of symbols:

symbols = ['MSFT', 'GOOGL', 'AAPL', 'META']Code language: Python (python)

Third, create a thread for each symbol, start it, and append the thread to the threads list:

threads = [] for symbol in symbols: t = Stock(symbol) threads.append(t) t.start() Code language: Python (python)

Finally, wait for all the threads in the threads list to complete and print out the stock price:

for t in threads: t.join() print(t)

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *