Topics covered in this Tutorial:
- VBA Objects
- Navigating to a Site w/ Internet Explorer (IE) Object
- Understanding the basics of HTML
- Inspecting a Web Page
Note: Sadly, the parts covered hereafter will only work on Windows.
Python, Java and VBA are object-oriented languages. A Car is an object. The language recognizes the Car as an object, its properties, and methods.
Properties: License Plate No., No. of Seats, Colour, Brand, Engine Capacity
A property is an attribute of an object that defines one of the object’s characteristics.
Method: Accelerate, Brake, Steer Left / Right, Reverse, Open / Close Door
A method is an action that an object can perform.
Events: Unlocking Car, Stepping on Accelerator, Pulling Hand Brake Lever
An event is an action recognized by an object, such as clicking the mouse or pressing a key, and for which you can write code to respond.
Object-Oriented programming focuses on manipulating the object (Car) instead of the logic required to manipulate them. You do not have to instruct how to Accelerate or Brake as the Car’s properties and methods available are already known.
Navigating to a Site w/ Internet Explorer (IE) Object
Like the Car, Worksheets & Workbooks (Tutorial 3) are Objects that Excel recognises. Excel VBA also recognises Internet Explorer (IE) as an object. We will be using the IE Object, its properties and methods to scrape data.
Before we start,
On the VBA editor go to Tools > References and Check Microsoft HTML Object Library & Microsoft Internet Controls
The following code can be used to navigate to a website and then print the website’s name.
Let’s break down the code and understand each line.
Dim IE as Internet Explorer Set IE = CreateObject("InternetExplorer.Application")
This introduces the variable IE as an Internet Explorer Object and sets IE as an Internet Explorer object. (Similar to how we introduce strings in VBA)
IE.Visible = True IE.Navigate "http://www.google.com"
.Visible is a property of IE. When the code is executed you will see the Internet Explorer open up when Visible is set as True.
.Navigate is a method of IE. The browser will navigate to the URL specified above after executing this line.
Do While IE.ReadyState <> READYSTATE_COMPLETE Loop
Do While…Loop is another alternative to the For Loop we learnt in Tutorial 2.
.ReadyState is a property of IE. It specifies the state which the browser is currently at. We only want the code to proceed to the next stage when the website is fully loaded (READYSTATE_COMPLETE).
In simple terms, we are telling the code to loop (wait) as long as the browser is not ready. When it is ready, it can proceed to the next line of code.
.LocationName is a property of IE. This line will print the website name in the Immediate Window (Can’t see the Immediate Window? Press Ctrl + G on Windows).
Basics of HTML
Before we scrape data from a website, we need to understand the basics of HTML. HTML stands for Hypertext Markup Language and describes the structure of Web Pages in blocks.
Anything in (angle brackets) are tags. Tags usually come in pairs like <html> and </html>
- The <html> element is the root element of an HTML page
- The <head> element contains meta information about the document
- The <title> element specifies a title for the document
- The <body> element contains the visible page content
- The <h1> element defines a large heading
- The <p> element defines a paragraph
Want simple, easy-to-follow & streamlined tutorials for any other projects in VBA, Python or R? Leave us a comment below!