VBA Tutorial 5: Web Scraping Pt 1

Topics covered in this Tutorial:

  • VBA Objects
  • Navigating to a Site w/ Internet Explorer (IE) Object
  • Understanding the basics of HTML
  • Inspecting a Web Page

Note: Sadly, the parts covered hereafter will only work on Windows. 

VBA Objects

Python, Java and VBA are object-oriented languages. A Car is an object. The language recognizes the Car as an object, its properties, and methods.

object-oriented-programming.jpg

Object: Car

Properties: License Plate No., No. of Seats, Colour, Brand, Engine Capacity
A property is an attribute of an object that defines one of the object’s characteristics.

Method: Accelerate, Brake, Steer Left / Right, Reverse, Open / Close Door
A method is an action that an object can perform.

Events: Unlocking Car, Stepping on Accelerator, Pulling Hand Brake Lever
An event is an action recognized by an object, such as clicking the mouse or pressing a key, and for which you can write code to respond. 

Object-Oriented programming focuses on manipulating the object (Car) instead of the logic required to manipulate them. You do not have to instruct how to Accelerate or Brake as the Car’s properties and methods available are already known.

Navigating to a Site w/ Internet Explorer (IE) Object

Like the Car, Worksheets & Workbooks (Tutorial 3) are Objects that Excel recognises. Excel VBA also recognises Internet Explorer (IE) as an object. We will be using the IE Object, its properties and methods to scrape data.

Before we start,
On the VBA editor go to Tools > References and Check Microsoft HTML Object Library & Microsoft Internet Controls

Tutorial5 01 References.gif

The following code can be used to navigate to a website and then print the website’s name.

Let’s break down the code and understand each line.

Dim IE as Internet Explorer
Set IE = CreateObject("InternetExplorer.Application")

This introduces the variable IE as an Internet Explorer Object and sets IE as an Internet Explorer object. (Similar to how we introduce strings in VBA)

IE.Visible = True
IE.Navigate "http://www.google.com"

.Visible is a property of IE. When the code is executed you will see the Internet Explorer open up when Visible is set as True.

.Navigate is a method of IE. The browser will navigate to the URL specified above after executing this line.

Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop

Do While…Loop is another alternative to the For Loop we learnt in Tutorial 2.

.ReadyState is a property of IE. It specifies the state which the browser is currently at. We only want the code to proceed to the next stage when the website is fully loaded (READYSTATE_COMPLETE).

In simple terms, we are telling the code to loop (wait) as long as the browser is not ready. When it is ready, it can proceed to the next line of code.

Debug.Print IE.LocationName

.LocationName is a property of IE. This line will print the website name in the Immediate Window (Can’t see the Immediate Window? Press Ctrl + G on Windows).

Basics of HTML

Before we scrape data from a website, we need to understand the basics of HTML. HTML stands for Hypertext Markup Language and describes the structure of Web Pages in blocks.

Anything in (angle brackets) are tags. Tags usually come in pairs like <html> and </html>

  • The <html> element is the root element of an HTML page
  • The <head> element contains meta information about the document
  • The <title> element specifies a title for the document
  • The <body> element contains the visible page content
  • The <h1> element defines a large heading
  • The <p> element defines a paragraph
<a href = "https://google.com">This is a link</a>

The tag used here is <a>, which is used to attach a hyperlink to text (“This is a link”). Href is an attribute that provides additional information about the tag. 

Want simple, easy-to-follow & streamlined tutorials for any other projects in VBA, Python or R? Leave us a comment below!

Next: Tutorial 6 Web Scraping Pt 2

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s