With our custom-built function, ImportFromWeb, you can easily scrape and extract data from Yelp. It can be interesting to compare your business with similar ones in the same area to be sure that you can retain customers and attract new ones.
In this tutorial, we’ll explain how you can scrape useful data quickly without having to worry to write complex code. It will be as easy as using a regular Excel function! Ready?
How does it work?
Our function, build on top of Google Sheets extending its functionalities, can extract any element available on a webpage. You can retrieve directly in Google Sheets any information you need and cross it with other data sources. It also supports JavaScript rendering to ensure that you can scrape any visible element, even if we won’t need this functionality for Yelp.
The only pieces of information you need are:
- An URL or a list of URLs you want to scrape data from
- The Xpath of the elements you want to extract. If you are not familiar with this concept, don’t be afraid because we’ll provide them to you during this tutorial.
What are we going to scrape from Yelp?
To compare similar businesses from Yelp, you need the following details:
- Name
- Address
- Phone number
- Rating
- Number of ratings
1. Get the businesses URLs we are going to scrape
The first step to scrape data from a list of local businesses from Yelp is to get the list of URLs for the businesses we want to scrape data from.
In this tutorial, we are going to compare several burger restaurants located in Barcelona. You can obviously pick the businesses that are the most relevant for you.
2. Include them in a Google spreadsheet
Once you have defined all the product URLs you want to scrape, include them in a Google spreadsheet:
Our formula will use this list as a crawler does: crawl them one by one and retrieve the information identified by the Xpath provided at the next step.
3. List the Xpath you need
As explained at the beginning, if you want to scrape data from Yelp using ImportFromWeb, you need to find the Xpath of the elements you want.
We created the following for you:
- Name: //h1
- Rating: //div[contains(@class,”photo-header-content-container”)]//div[contains(@aria-label, “star rating”)]/@aria-label
- Number of reviews: //div[contains(@class,”photo-header-content-container”)]//span[contains(text(), “reviews”)]
- Price Range : //div[contains(@class,”photo-header-content-container”)]/div/div/div/span[2]
- Category: //div[contains(@class,”photo-header-content-container”)]/div/div/div/div[3]//span[2]
- Location: //section//a[text()=”Get Directions”]/following-sibling::p
- Phone Number: //p[text()=”Phone number”]/following-sibling::p
- Website: //section//p[text()=”Business website”]/following-sibling::p/a/@href
If you need to extract something else, you can follow our Xpath tutorial or get in touch with us and we’d be happy to help.
4. Add the Xpath to the Google Sheets
These Xpath will be used by our formula, hence you need to add it to the Google Sheets you use to extract your data.
5. Install and activate ImportFromWeb
To use our custom formula, you need to install it from the Google Marketplace and then activate it under Add-ons > IMPORTFROMWEB > Activate add-on.
This step is mandatory to be able to use the formula.
6. Run the function to scrape data from Yelp
We can now run our function, which needs only two parameters to work correctly:
=IMPORTFROMWEB(urls_to_crawl;Xpath_of_elements_to_scrape)
How simple and cool is that?
To compare easily the scraped information, just add headers to your table:
And then use the function. Here we use a TRANSPOSE() function with our ImportFromWeb because, by default, results are stacked vertically and not horizontally.
As you can see, you can build quickly a local dashboard to track or benchmark your competition. Moreover, you can schedule the execution of this extraction to track changes that can affect your business.