How to Scrape Amazon Product Data [Step By Step Guide]

June 25, 2021 | 4 mins read

ScrapeOwl is an easy-to-use web scraping tool that can be used for monitoring real-time data, tracking financial data or fetching product data.

A few days ago, I was looking to purchase a new monitor, but I was short on budget. I was visiting Amazon multiple times a week to check the price, eagerly waiting for a drop. So I came up with the idea of writing a Python script that checks the price of my monitor listing daily and notifies me by email once the price drops to my budget. It turns out it’s very easy to scrape Amazon through ScrapeOwl, and you won’t need to manage anything yourself, including IPs or headers.

In this post, I’ll show how to use Python, Requests and ScrapeOwl to scrape Amazon quickly and effectively. Python is the programming language, Requests sends the HTTP request, and the ScrapeOwl API does the scraping work. This is a beginner-friendly tutorial; no advanced coding knowledge is required. Let’s dive in.

Step 1 - Find the URL

For this tutorial, we will scrape the Amazon website to extract the Name and Price of the monitor. The URL for this page is https://www.amazon.com/LG-32GN50T-B-Ultragear-Monitor-Compatibility/dp/B08KTN8KFH/ref=sr_1_13?crid=3QJW8HC4OILVV&dchild=1&keywords=monitor+32+inch&qid=1617046975&sprefix=monitor+%2Caps%2C459&sr=8-13
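The URL above carries a lot of search-tracking parameters. As an optional tidy-up, here is a minimal sketch that shortens it to the /dp/ form. It assumes the 10-character code after /dp/ is the product identifier and that the shorter URL resolves to the same listing; canonical_product_url is a hypothetical helper name, not part of ScrapeOwl.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def canonical_product_url(url):
    """Return a short scheme://host/dp/<id> URL, or the input unchanged."""
    parts = urlsplit(url)
    # Assumption: the product id is the 10-character segment after /dp/.
    match = re.search(r"/dp/([A-Z0-9]{10})(?:/|$)", parts.path)
    if match is None:
        return url
    # Drop the query string and fragment, keep only /dp/<id>.
    return urlunsplit((parts.scheme, parts.netloc, f"/dp/{match.group(1)}", "", ""))


long_url = (
    "https://www.amazon.com/LG-32GN50T-B-Ultragear-Monitor-Compatibility"
    "/dp/B08KTN8KFH/ref=sr_1_13?crid=3QJW8HC4OILVV&dchild=1"
    "&keywords=monitor+32+inch&qid=1617046975"
)
print(canonical_product_url(long_url))
```

The rest of this tutorial keeps using the full URL, so this step is purely cosmetic.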

Step 2 - Inspect the Page

Next, we need to inspect the source code of the web page. The data we want usually sits in elements identified by id or class attributes. Move the pointer over the title, right-click it and choose “Inspect”, then do the same for the price. Copy the CSS selector of each element from the page.
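To make the idea of a selector concrete: a CSS selector such as #productTitle simply means "the element whose id attribute is productTitle". The sketch below illustrates that with Python's standard-library HTML parser on a tiny hand-written snippet. The HTML is a made-up stand-in for the real page, and IdTextExtractor is a hypothetical helper; ScrapeOwl does this matching for you.

```python
from html.parser import HTMLParser


class IdTextExtractor(HTMLParser):
    """Collect the text inside the element with a given id.

    Simplified sketch: it assumes well-formed markup in which every
    opened tag inside the target element is also closed.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, element_id):
    parser = IdTextExtractor(element_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


# Made-up markup that mimics the two elements we care about.
sample_html = (
    '<div><span id="productTitle"> LG Monitor </span>'
    '<span id="priceblock_ourprice">$219.99</span></div>'
)
print(text_by_id(sample_html, "productTitle"))
print(text_by_id(sample_html, "priceblock_ourprice"))
```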

[Screenshot: inspecting the title element]

[Screenshot: inspecting the price element]

Step 3 - Building an API Request

To build an API request, sign in at scrapeowl.com and head over to the dashboard to create a new API request.

  • Change the Element type to CSS.
  • Paste the CSS Selector for title and price copied from the listings page.

[Screenshot: the request in the ScrapeOwl dashboard]

The API request is ready to be used. Now, all we need is to write a small Python program to send the request to ScrapeOwl.

Step 4 - Write the Code

Let’s create a Python file and import the necessary libraries.

import requests
import json
import smtplib

The requests module sends HTTP requests from Python. The json module is used to work with JSON data. The smtplib module sends email over the SMTP protocol.

Next, we write a function that sends the request to scrape the title and price of the product from the web page.

def fun():
    base_url = "https://api.scrapeowl.com/v1/scrape"
    # Request body created in the ScrapeOwl dashboard (Step 3)
    payload = {
        "api_key": "YOUR_API_KEY",  # Replace with the key from your ScrapeOwl dashboard
        "url": "https://www.amazon.com/LG-32GN50T-B-Ultragear-Monitor-Compatibility/dp/B08KTN8KFH/ref=sr_1_13?crid=3QJW8HC4OILVV&dchild=1&keywords=monitor+32+inch&qid=1617046975&sprefix=monitor+%2Caps%2C459&sr=8-13",
        "elements": [
            {
                "type": "css",
                "selector": "#productTitle"
            },
            {
                "type": "css",
                "selector": "#priceblock_ourprice"
            }
        ]
    }
    data = json.dumps(payload)  # Serialize the dictionary to a JSON string
    response = requests.post(base_url, data)
    response = response.json()  # Parse the JSON response body
    print(response)

  • The base_url variable holds the ScrapeOwl endpoint the request is sent to.
  • The payload variable holds the request we created earlier in the ScrapeOwl dashboard, written as a Python dictionary.
  • The data variable holds the result of json.dumps(), which converts that dictionary into a JSON string.
  • The response variable first holds the result of the POST request to the ScrapeOwl API and is then replaced by the response body parsed from JSON.
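If you plan to track more than one listing, it may help to build the request body with a small function instead of writing it out by hand each time. The following is only a sketch of that idea: build_payload and the SCRAPEOWL_API_KEY environment variable are names I made up for illustration, and the shortened product URL is an assumption. The body has the same shape as the dashboard request above.

```python
import json
import os


def build_payload(api_key, url, selectors):
    """Build a request body with one CSS element entry per selector."""
    return {
        "api_key": api_key,
        "url": url,
        "elements": [{"type": "css", "selector": s} for s in selectors],
    }


# Hypothetical environment variable; keeps the key out of the source file.
api_key = os.environ.get("SCRAPEOWL_API_KEY", "YOUR_API_KEY")

payload = build_payload(
    api_key,
    "https://www.amazon.com/dp/B08KTN8KFH",  # assumed short form of the listing URL
    ["#productTitle", "#priceblock_ourprice"],
)
body = json.dumps(payload)
print(body)
```

The resulting string can be passed to requests.post() exactly as in the function above.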

Running the above code produces output similar to this:

[{'type': 'css', 'selector': '#productTitle', 'results': [{'text': 'LG 32GN50T-B 32" Class Ultragear FHD Gaming Monitor with G-SYNC Compatibility (Renewed)'}]}, {'type': 'css', 'selector': '#priceblock_ourprice', 'results': [{'text': '$219.99'}]}]
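Indexing into this structure by position works, but it is easy to mix up the order of the elements. Here is a small sketch that maps each selector to its first text result instead. It assumes the response always has the list shape printed above and that a selector with no match comes back with an empty or missing results list, which I have not verified against the API; extract_texts is a hypothetical helper.

```python
def extract_texts(response):
    """Map each selector to its first result text, or None if there is none."""
    texts = {}
    for element in response:
        # Assumption: an unmatched selector yields an empty or missing "results".
        results = element.get("results") or []
        texts[element["selector"]] = results[0]["text"].strip() if results else None
    return texts


# The sample output shown above, plus one hypothetical unmatched selector.
sample_response = [
    {"type": "css", "selector": "#productTitle",
     "results": [{"text": 'LG 32GN50T-B 32" Class Ultragear FHD Gaming Monitor '
                          "with G-SYNC Compatibility (Renewed)"}]},
    {"type": "css", "selector": "#priceblock_ourprice",
     "results": [{"text": "$219.99"}]},
    {"type": "css", "selector": "#doesNotExist", "results": []},
]

texts = extract_texts(sample_response)
print(texts["#priceblock_ourprice"])
```

Checking for None before converting the price avoids an IndexError when Amazon changes its markup and a selector stops matching.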

So far the script prints the whole response. Let’s refine it to pull out just the title and price, convert the price to a number, and compare it with our target price.

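target_price = 200.00  # Example budget; set your own

title = response[0]['results'][0]['text']
price = response[1]['results'][0]['text']
price = float(price[1:].replace(',', ''))  # Drop the leading "$" and any thousands separators
print(title)
print(price)

if price < target_price:
    print('Price Match')
    send_email()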

Final output:

LG 32GN50T-B 32" Class Ultragear FHD Gaming Monitor with G-SYNC Compatibility (Renewed)
219.99
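Slicing off the first character works for a string like "$219.99", but it is fragile if the text has extra whitespace or a thousands separator. As a rough sketch, a regular expression can pull out the number instead. This assumes US-style formatting (comma for thousands, dot for decimals); parse_price is a hypothetical helper, and Decimal is used here in place of float to avoid rounding surprises when comparing money values.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Extract the first number from a price string such as '$1,219.99'."""
    # Assumption: comma is the thousands separator, dot is the decimal point.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"No price found in {text!r}")
    return Decimal(match.group(0).replace(",", ""))


print(parse_price("$219.99"))
print(parse_price(" $1,219.99 "))
```

To compare with a budget, write the target as a Decimal too, for example Decimal("200.00").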

Step 5 - Sending Emails in a standard format

An automated email is sent once the price drops below the target price.

  • Establish the SMTP connection at port 587
  • Start TLS based SMTP Session
  • Insert the secret credentials along with the email address.
  • Set email parameters
  • Terminate the SMTP session
def send_email():
    sender = 'lalbikhan2014@gmail.com'  # Sender email here
    recipient = 'bilal.khan2014@hotmail.com'  # Recipient email here
    subject = "Pricedown alert"
    body = "This is an automated message to inform you that the monitor price is now below your target price. Thanks!"
    msg = f"Subject: {subject}\n\n{body}"
    try:
        # The with block terminates the SMTP session on exit, even after an error
        with smtplib.SMTP('smtp.gmail.com', 587) as server:
            server.ehlo()
            server.starttls()
            server.ehlo()
            server.login(sender, 'Your App password here')
            server.sendmail(sender, recipient, msg)
        print('\nEmail sent successfully\n')
    except (smtplib.SMTPException, OSError) as error:
        print(f"Error, email not sent: {error}")
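The alert above is a fixed string with only a Subject header. If you would like the email to name the product and price, one option is to build it with the standard library's EmailMessage class, which also fills in the From and To headers. This is a sketch rather than part of the original script: build_alert_message and the example.com addresses are placeholders I made up.

```python
from email.message import EmailMessage


def build_alert_message(sender, recipient, title, price, target_price):
    """Build a plain-text price alert with proper headers."""
    msg = EmailMessage()
    msg["Subject"] = "Pricedown alert"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"{title}\n"
        f"is now {price}, below your target price of {target_price}."
    )
    return msg


# Placeholder addresses and example values for illustration only.
alert = build_alert_message(
    "sender@example.com",
    "recipient@example.com",
    "LG 32GN50T-B Ultragear Monitor",
    199.99,
    200.0,
)
print(alert)
```

Inside the SMTP session, such a message can be sent with server.send_message(alert) in place of the server.sendmail() call.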

The complete code is available here. Feel free to check it out.

Thanks for reading. I hope you found it helpful.

Till next time :) Happy scraping!

Bilal K.
