Scrape Web URL (Dynamic)
Extract data from web pages by scraping their text content. This node works well for dynamic and complex sites that rely on JavaScript to load their content.
Using the Scrape Web URL Node (Dynamic)
The Scrape Web URL node fetches the web page at the provided URL and extracts its text content. Under the hood, it uses Puppeteer, which drives a real browser, so content rendered by JavaScript is available for scraping. The node accepts these parameters:
- URL (required): The URL of the page you want to scrape. Example: https://docs.buildship.com.
- Selector (optional): The HTML selector whose text content you want to extract. Defaults to body.
- Steps (optional): A list of actions to perform after the page at the given URL has loaded.
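The three parameters can be pictured as a TypeScript shape. This is a hypothetical illustration: the names `ScrapeInput`, `ScrapeStep`, and `withDefaults` are not part of BuildShip, and treating a missing Steps value as an empty list is an assumption. Only the `body` default for Selector comes from the description above.

```typescript
// Hypothetical shape of the node's inputs; names mirror the parameter list
// above but are NOT BuildShip's actual internal types.
interface ScrapeStep {
  action: string;    // name of a Puppeteer Page method, e.g. "type"
  params: unknown[]; // arguments passed to that method, in order
}

interface ScrapeInput {
  url: string;          // required
  selector?: string;    // optional; documented default is "body"
  steps?: ScrapeStep[]; // optional; assumed to mean "no steps" when omitted
}

// Fill in defaults for the optional fields and reject a missing URL.
function withDefaults(input: ScrapeInput): Required<ScrapeInput> {
  if (!input.url) {
    throw new Error("url is required");
  }
  return {
    url: input.url,
    selector: input.selector ?? "body",
    steps: input.steps ?? [],
  };
}

const resolved = withDefaults({ url: "https://docs.buildship.com" });
console.log(resolved.selector); // prints: body
```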
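Usage Example: Suppose you want to scrape the search result stats that Google displays for the query "buildship".
To begin, set up the node with the URL https://www.google.com/ and the selector #result-stats. This extracts the text content of the search result stats. The Steps parameter lets you interact with the page before the text content is extracted; here it is used to type the query and run the search. The Steps input value would look something like the following. The selectors reflect Google's page markup at the time of writing and may change, and the // comments are explanatory only, since standard JSON does not allow comments.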
```jsonc
[
  // type "buildship" into the Google search input box
  {
    "action": "type",
    "params": ["#APjFqb", "buildship"]
  },
  // click the Google search button
  {
    "action": "click",
    "params": [".gNO89b"]
  },
  // wait for the results page to load; if it doesn't load within 3 seconds, move on to the next step
  {
    "action": "waitForNavigation",
    "params": [
      {
        "timeout": 3000,
        "waitUntil": "load"
      }
    ]
  }
]
```
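If you generate the Steps value from code, writing it as a typed value and serializing it with `JSON.stringify` yields comment-free JSON. A minimal sketch, assuming the Steps field takes JSON text; the `Step` interface is a hypothetical helper type, not something BuildShip exports.

```typescript
// Hypothetical helper type for one step.
interface Step {
  action: string;
  params: unknown[];
}

const googleSearchSteps: Step[] = [
  // type "buildship" into the Google search box
  { action: "type", params: ["#APjFqb", "buildship"] },
  // click the Google search button
  { action: "click", params: [".gNO89b"] },
  // wait up to 3 seconds for the results page to finish loading
  { action: "waitForNavigation", params: [{ timeout: 3000, waitUntil: "load" }] },
];

// JSON.stringify cannot emit comments, so the output is strict JSON.
const stepsJson: string = JSON.stringify(googleSearchSteps, null, 2);
console.log(stepsJson);
```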
After execution, the node returns the text content of the #result-stats element, which is the information we were looking for.
The root Selector value is the selector of the element whose text content is extracted once all steps have been executed.
Each step object in the Steps list consists of an action and its params.
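Because every step needs both fields, it can help to check a Steps value before running the node. A minimal sketch, assuming the value is strict JSON (without the explanatory comments); `parseSteps` is a hypothetical helper, not a BuildShip API.

```typescript
interface Step {
  action: string;
  params: unknown[];
}

// Parse a raw Steps value and check that every entry has a string
// `action` and an array `params`; throws on anything else.
function parseSteps(raw: string): Step[] {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on comments
  if (!Array.isArray(data)) {
    throw new Error("Steps must be a JSON array");
  }
  return data.map((entry, i) => {
    const { action, params } = (entry ?? {}) as { action?: unknown; params?: unknown };
    if (typeof action !== "string" || !Array.isArray(params)) {
      throw new Error(`Step ${i} needs a string "action" and an array "params"`);
    }
    return { action, params };
  });
}

const steps = parseSteps('[{"action":"click","params":[".gNO89b"]}]');
console.log(steps[0]?.action); // prints: click
```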
The action is the name of any method in the Puppeteer Page methods list, and params is the list of arguments that method takes, in the order the method expects them.
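One way to picture how an action and its params map onto Puppeteer is a dynamic method call: look up the method on the page object by name and spread params as its positional arguments. This is a sketch of the idea, not BuildShip's actual implementation; `runStep` and `runSteps` are hypothetical, and a fake page object stands in for a real Puppeteer Page so the example runs without a browser.

```typescript
interface Step {
  action: string;    // Puppeteer Page method name, e.g. "click"
  params: unknown[]; // positional arguments for that method
}

// Call page[step.action](...step.params), keeping `this` bound to the page.
function runStep(page: object, step: Step): unknown {
  const method = (page as Record<string, unknown>)[step.action];
  if (typeof method !== "function") {
    throw new Error(`Unknown page method: ${step.action}`);
  }
  return method.apply(page, step.params);
}

// Run the steps strictly in order, awaiting each result before the next.
async function runSteps(page: object, steps: Step[]): Promise<void> {
  for (const step of steps) {
    await runStep(page, step);
  }
}

// Fake page that records calls instead of driving a browser.
const calls: string[] = [];
const fakePage = {
  async type(selector: string, text: string): Promise<void> {
    calls.push(`type(${selector}, ${text})`);
  },
  async click(selector: string): Promise<void> {
    calls.push(`click(${selector})`);
  },
};

runSteps(fakePage, [
  { action: "type", params: ["#APjFqb", "buildship"] },
  { action: "click", params: [".gNO89b"] },
]).then(() => console.log(calls));
```

With a real Puppeteer Page, `runStep(page, { action: "type", params: ["#APjFqb", "buildship"] })` would amount to `page.type("#APjFqb", "buildship")`.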
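For example, for the type action, the parameters of the Puppeteer type method are:
- selector: the selector of the element to type into. If several elements match, the first one is used.
- text: the text to type into the element.
- options (optional): keyboard options such as delay, the time to wait between key presses in milliseconds (defaults to 0).
Hence, the step object for the type action would look like: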
```jsonc
{
  // Puppeteer method name
  "action": "type",
  // "#APjFqb" is the "selector" (the selector used to find the <input>)
  // "buildship" is the "text" (the value typed into the <input>)
  // Per the parameter list of the "type" method, the third parameter (options) is optional,
  // so it can be either included in or left out of the "params" list
  "params": ["#APjFqb", "buildship"]
}
```
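To show the optional third parameter in use, here is a small sketch that builds the same step with or without Puppeteer's `delay` option. `typeStep` is a hypothetical helper, and the assumption is that the node passes the options object through to `page.type` unchanged, as the description of params suggests.

```typescript
interface Step {
  action: string;
  params: unknown[];
}

// Build a "type" step. Puppeteer's page.type(selector, text, options?) takes
// an optional third argument; here it is only added when a delay is given.
function typeStep(selector: string, text: string, delayMs?: number): Step {
  const params: unknown[] = [selector, text];
  if (delayMs !== undefined) {
    params.push({ delay: delayMs }); // milliseconds between key presses
  }
  return { action: "type", params };
}

console.log(JSON.stringify(typeStep("#APjFqb", "buildship")));
// prints: {"action":"type","params":["#APjFqb","buildship"]}
console.log(JSON.stringify(typeStep("#APjFqb", "buildship", 100)));
// prints: {"action":"type","params":["#APjFqb","buildship",{"delay":100}]}
```

Leaving the options out when they are not needed keeps the step identical to the example above.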
Need Help?
- 💬Join BuildShip Community
An active and large community of no-code / low-code builders. Ask questions, share feedback, showcase your project and connect with other BuildShip enthusiasts.
- 🙋Hire a BuildShip Expert
Need personalized help to build your product fast? Browse and hire from a range of independent freelancers, agencies and builders - all well versed with BuildShip.
- 🛟Send a Support Request
Got a specific question on your workflows / project or want to report a bug? Send us a request using the "Support" button directly from your BuildShip Dashboard.
- ⭐️Feature Request
Something missing in BuildShip for you? Share on the #FeatureRequest channel on Discord. Also browse and cast your votes on other feature requests.