So you’d like to scrape data from a website but aren’t sure how to get started. Luckily, the ImportFromWeb tool makes it simple for people without a technical background to scrape website data into Google Sheets.
To get set up, first install the ImportFromWeb add-on in Google Sheets. The process takes about a minute, and once it’s done you can extract large volumes of data from most websites.
Follow these steps:
1. Install the add-on from the Google Workspace Marketplace, following the step-by-step instructions it provides.
2. Open any spreadsheet and activate the add-on from the menu: Extensions >> ImportFromWeb >> Activate.
3. A sidebar opens with helpful demos that show how to use the tool in your web data research.
Once you’re set up, it’s time to learn how to tell ImportFromWeb which information to scrape, using CSS selectors or XPaths.
If you already have a grasp of CSS, you can learn how to write your own CSS selectors and XPaths for data scraping in those posts. Otherwise, let’s jump into the basics of CSS!
If you’re asking yourself what CSS and XPath are, don’t worry: we’ve got you covered. This post focuses on CSS, but if you’d like to learn more about XPath, head right over here.
CSS is a stylesheet language that controls how the elements of a webpage look. To apply its styles, CSS uses selectors: patterns that pick out specific elements on the page. With a basic understanding of these selectors, you can use the characteristics of a page’s elements to point at exactly the information you’d like to scrape.
Say, for example, that you’d like to gather the product names from an online store. Once you know how the product names on that page are described in CSS, you can tell ImportFromWeb to extract them with just a ‘CSS selector’, the CSS expression that matches the product names on that page.
If it sounds complicated, we promise it’s not. After going through the process once, you’ll easily be instructing ImportFromWeb to scrape website data with CSS selectors! To make this step even easier, we’ve also created ready-made solutions that handle it for you.
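To make the idea concrete, here is a minimal Python sketch of what a class selector does. It assumes, purely for illustration, that product names sit in elements with the class product-name, so the matching CSS selector would be .product-name. The HTML is made up, and the sketch uses Python’s standard html.parser module rather than anything inside ImportFromWeb; it only demonstrates the matching idea, not how the add-on works internally.

```python
from html.parser import HTMLParser

# Made-up markup for a webshop listing; the class name "product-name"
# is a hypothetical example, not taken from any real site.
SAMPLE_HTML = """
<ul class="products">
  <li><h2 class="product-name">Blue Mug</h2><span class="price">12.00</span></li>
  <li><h2 class="product-name">Red Teapot</h2><span class="price">30.00</span></li>
</ul>
"""

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given class,
    roughly what the CSS selector '.<class_name>' would match."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0        # > 0 while inside a matching element
        self.current = []     # text fragments of the element being read
        self.results = []     # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1                # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1                 # a new matching element starts
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:                # the matching element just closed
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def select_by_class(html, class_name):
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


print(select_by_class(SAMPLE_HTML, "product-name"))  # ['Blue Mug', 'Red Teapot']
```

On a real page you would fetch the HTML first, and real selectors can combine tags, classes, and positions, which this small sketch does not handle.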
Let’s take a look at how you can do this yourself with the ImportFromWeb tool.
Once you’ve opened the page you’d like to extract data from, the first step is to find out how the information you want is described in CSS. In Google Chrome, the Developer Tools give you access to this information; other browsers include similar tools.
If you’re using Google Chrome, open the Chrome menu in the upper-right corner of the browser window and select More Tools, then Developer Tools. You can also use the shortcut Option + Command + J (on macOS) or Shift + Ctrl + J (on Windows/Linux).
The important step here is to click the arrow icon in the top-left corner of the Developer Tools panel, labeled “Select an element in the page to inspect it.” With this enabled, clicking an element on the page shows you the information related to it.
When you click an element, its related code is highlighted in the Developer Tools view. Right-click the highlighted code, then choose Copy, and finally Copy selector. Now you’re ready to pass this selector to the =IMPORTFROMWEB() function so it can extract your data.
Back in Google Sheets, it’s time to write your first formula with the copied CSS selector. Simply enter:
=IMPORTFROMWEB([the url you want to search], [the copied CSS selector that describes the information you want to import])
Then watch as the tool scrapes the data from the website you chose and copies it directly into your spreadsheet.
If you’re still unsure, feel free to browse our ready-to-use solutions, which make this process easier on popular websites such as Amazon, Google, Instagram, Yahoo Finance, and more.
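If you build many formulas at once (for example, one per URL in a list), it can help to generate the formula text programmatically. Here is a minimal Python sketch under a few stated assumptions: the helper name importfromweb_formula is hypothetical, the URL and selector are passed as quoted text strings, and your spreadsheet locale uses commas as argument separators (some locales use semicolons). Inside a Google Sheets string, a literal double quote is written as two double quotes, which matters for copied selectors that contain quoted attribute values.

```python
def importfromweb_formula(url: str, selector: str) -> str:
    """Return the text of an =IMPORTFROMWEB() formula (hypothetical helper)."""

    def sheets_string(text: str) -> str:
        # Google Sheets string literals are wrapped in double quotes,
        # and a double quote inside the string is escaped by doubling it.
        return '"' + text.replace('"', '""') + '"'

    # Assumes a comma argument separator; some locales expect semicolons.
    return "=IMPORTFROMWEB(" + sheets_string(url) + ", " + sheets_string(selector) + ")"


print(importfromweb_formula("https://www.example.com", "h1"))
# =IMPORTFROMWEB("https://www.example.com", "h1")
print(importfromweb_formula("https://www.example.com", 'a[href="/about"]'))
# =IMPORTFROMWEB("https://www.example.com", "a[href=""/about""]")
```

You can paste the generated text into a cell just as if you had typed it by hand.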
Here are a few common selectors to get you started:
- h1 → The main heading of the page, which usually states its topic and key information.
- h2 or h3 → Subheadings that introduce subtopics within the overall topic, or the steps of a process.
- body → The entire visible content of the page. Individual paragraphs of body text are usually p elements, so p is often the more precise selector.
- a/href → Links embedded in the page. The a selector matches the link elements, and the URL each one points to is stored in its href attribute.
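To see how the selectors in this list relate to a page’s HTML, here is a minimal Python sketch that collects the text of h1, h2, and h3 elements and the href value of every a element. It runs on a small made-up page using Python’s standard html.parser module; it only illustrates what these selectors match and is not how ImportFromWeb extracts data.

```python
from html.parser import HTMLParser

# A small made-up page used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <h1>Getting Started</h1>
  <h2>Install the add-on</h2>
  <p>Read the <a href="https://example.com/guide">guide</a> first.</p>
  <h3>Activate it</h3>
</body></html>
"""


class PageOutline(HTMLParser):
    """Gather heading text and link targets, mirroring the selectors
    h1, h2, h3 and a[href]."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headings = {tag: [] for tag in self.HEADINGS}
        self.links = []
        self._open_heading = None   # heading tag currently being read
        self._buffer = []           # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._open_heading = tag
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:                # skip anchors without a link target
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            self.headings[tag].append("".join(self._buffer).strip())
            self._open_heading = None

    def handle_data(self, data):
        if self._open_heading:
            self._buffer.append(data)


outline = PageOutline()
outline.feed(SAMPLE_PAGE)
outline.close()
print(outline.headings["h1"])  # ['Getting Started']
print(outline.headings["h2"])  # ['Install the add-on']
print(outline.links)           # ['https://example.com/guide']
```

Note that the paragraph text "Read the ... first." is ignored here; matching it would call for a p selector, as described in the list above.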
If you’d like to expand your knowledge of web scraping with CSS selectors, you can read more about how to use them in this blog post. Plus, check out this full list of CSS selectors, or take a deeper dive into the theory behind them.
And if you have any questions or concerns along the way, feel free to reach out to us. We’d be happy to provide personal support as you work with ImportFromWeb and CSS selectors!