VBA Tutorial 6: Web Scraping Pt 2

Topics covered in this Tutorial:

  • Inspecting a Web Page
  • Scraping Data from a Web Page

Inspecting a Web Page

Let’s take a look at the Car Listings on Carousell: https://carousell.com/categories/cars-32/cars-for-sale-1173/

1) To view the HTML code of a part of the website, Right Click on the first listing and select “Inspect”

01 Inspect Element.gif

2) Try to move your cursor over the higher level tags and find the line which covers the entire listing

Tutorial5 03 Find Encompassing Line.gif

3) Now you can observe that the line which encompasses one listing is:

div class="col-lg-3 col-md-4 col-sm-4 col-xs-6"

Tag Name: div
Class Name: col-lg-3 col-md-4 col-sm-4 col-xs-6

4) Find the exact location where the information we need (Car Name, Price, Reg Year, Depreciation) can be found. Right-click on the info and select “Inspect”.

Tutorial5 04 Find Details.gif

The listing name, price, registration year and depreciation can be found in this block of code:

Listing Details Screenshot.png

Tag Name: figcaption
Class Name: A-N

It is important to take note of the Tag Name & Class Name as we will use them to identify which parts of the page to scrape. Now that we have inspected the page and know what to look for we are all set to scrape data!

Scraping Data

1) Loop through each Car listing on the Website 

The first 9 lines are from Section 1 of this Tutorial. Let’s explore the remaining lines.

Dim CarListings As IHTMLElementCollection

This line introduces CarListings as an object that can hold multiple HTML elements (all the car listings on the website).

Dim CarListing As IHTMLElement

This line introduces CarListing as an object that holds just 1 HTML element.

Set CarListings = IE.Document.body.getElementsByClassName("col-lg-3 col-md-4 col-sm-4 col-xs-6")

We observed that each listing is found under the class name “col-lg-3 col-md-4 col-sm-4 col-xs-6”. Therefore we assign all the elements (all listings) with this class name to CarListings. CarListings now has all 40 cars assigned to it.

For Each CarListing In CarListings
     Debug.Print CarListing.innerText
Next CarListing

With this chunk of code, we are looping through each listing in the collection and printing out the innerText of each listing.

Try this out!
Change line 15 in the code to "Debug.Print CarListing.InnerHTML"

Compare the results for .innerHTML with .innerText
.innerHTML will be print the HTML tags you see in the Inspect Window.

2) Get the Car Name from each Car Listing

Now that we have gotten Text from all the Car listings, we need to filter out just the information we need.

Listing Details Screenshot.png

The Listing Names are found under Class=”A-i” under each listing. The remaining information are found under tag <dl>. Let’s filter out the Listing Names first.

Dim ItemName As IHTMLElementCollection

This line introduces ItemName as an object that will hold a collection of elements.

Set ItemName = CarListing.getElementsByClassName("A-i")
Debug.Print ItemName(0).innerText

Notice that instead of IE.Document.Body.getElementsByClassName we used CarListing.getElementsbyClassName to find out the Item Name. This is because the Item Name we want to find is already inside CarListing and we don’t need to search the entire Document again.

We use ItemName(0).innerText to print out the Car Name. ItemName(0) refers to the 1st element in the ItemName collection. As we know that there is only 1 Car Name per listing, we use ItemName(0). ItemName(2) refers to the 3rd element in the ItemName collection.

3) Get the Car Price, Car Registration Year and Depreciation from each Car Listing

Dim ItemDetails As IHTMLElementCollection
Dim ItemDetail As IHTMLElement

Similar to CarListings and CarListing, we introduce ItemDetails to hold a collection of elements (Price, Reg Year & Depreciation) and ItemDetail to hold just one element.

Other Information Screenshot

Set ItemDetails = CarListing.getElementsByTagName("dd")

Notice that the Item Details (boxed up in red) are in “<dd>” tags and do not have Class Names. Hence we use .get ElementsbyTagName(“dd”) instead.

For Each ItemDetail In ItemDetails
     Debug.Print ItemDetail.innerText
Next ItemDetail

Similar to CarListing in CarListings, we print each Item Detail in the immediate window.

Your output should look like this:


4) Instead of Debug.Print, write the values to Excel Workbook

Writing values to cells was covered in Tutorial 2. Transferring data from the immediate window to the Excel file is extremely important. Not everyone knows Excel VBA and thus the data must be displayed on the Excel surface.

Want simple, easy-to-follow & streamlined tutorials for any other projects in VBA, Python or R? Leave us a comment below!

Previous: Tutorial 5 Web Scraping Pt. 1

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s