Find information from JSONs hidden in HTML code (advanced)

Sometimes webpages hide more information than it shows.

The modern web being dynamic, it uses Javascript extensively to display information. Therefore, data is often pulled from an API, but in many cases it is already available in the source code.

On the other hand, developers are rather lazy. They often leave more data than what the page uses.

ImportFromWeb helps you retrieve this data and display it as a well-structured table that you can manipulate with ease

From something like this:

Get this:

Search into the source code

Use the Chrome Developer Tool to search for some data.

If you’re not sure how to use the Google Developer tool, check the guide:

https://playground.nodatanobusiness.com/resources/find-an-xpath-with-little-html-knowledge/

Get the selector corresponding to the wrapping tag

Call IMPORTFROMWEB()

Open a Google Spreadsheet and call the IMPORTFROMWEB() function (Make sure the add-on has been previously activated).

=IMPORTFROMWEB(A1,"//*[@type='application/ld+json'][contains(text(),'Product')]")

A1 is a reference to the url of the page

//*[@type='application/ld+json'][contains(text(),'Product')] is the xPath retrieved in the previous step

You should see the JSON text displayed on a. single cell

Call IMPORTFROMWEB() again

IMPORTFROMWEB() accepts a series of options that give it superpowers. Here will use the isJSON option.

=IMPORTFROMWEB( your_data_source, your_json_filters, "isJSON")

The data source being the output we received in the previous step, we can wrap it like this:

=IMPORTFROMWEB(IMPORTFROMWEB(A1,"//*[@type='application/ld+json'][contains(text(),'Product')]"), ,"isJSON, showHeaders")

As a result, we obtain a structured table with all the data from our JSON

Sharp eyes may have seen that we also added the showHeaders option. It will display the keys of the JSON on the first column

The isJSON option provides the same features as ImportJSON straight into your IMPORTFROMWEB function

Filter the table

In the previous formula, we left an empty space

=IMPORTFROMWEB(IMPORTFROMWEB(A1,"//*[@type='application/ld+json'][contains(text(),'Product')]"), [EMPTY SPACE] ,"isJSON, showHeaders")

This is the place for filters. Filters are optional which is why we left this space free. Filters work the same way as selectors

You may want to fill the space to only show the rows you need. In this case, let’s say we need to identify the product name and the price. The function will look like:

=IMPORTFROMWEB(IMPORTFROMWEB(A1,"//*[@type='application/ld+json'][contains(text(),'Product')]"), "/name, offers/price" ,"isJSON, showHeaders")

It is not compulsory to add a slash before the filter, but doing so indicates that we want to get the name at the root. Otherwise it would display all the filters that contain a key called name!