Scrape Web URL (Dynamic)

Extract data from web pages by scraping their text content. This node works well for dynamic and complex sites that rely on JavaScript to load their content.

Using the Scrape Web URL Node (Dynamic)

The Scrape Web URL (Dynamic) node fetches the content of the web page at the provided URL. Under the hood, it uses Puppeteer to scrape the page, which allows it to handle dynamically loaded content. The node accepts these parameters:

  • URL (required): The URL of the page you want to scrape. Example: https://docs.buildship.com.

  • Selector (optional): The HTML selector whose text content you want to extract (defaults to body).

  • Steps (optional): A list of steps to perform after the page at the given URL has loaded.

    Usage example: Suppose you want to scrape the information shown below:

    [Image: BuildShip Google search]

To begin, set up the node: set the URL to https://www.google.com/ and the selector to #result-stats. This extracts the text content of the search result stats. The steps parameter is used to interact with the page before the text content is extracted.

[Image: BuildShip Google search]

The steps input value would then look something like this:

[
  // type "buildship" in google search input box
  {
    "action": "type",
    "params": ["#APjFqb", "buildship"]
  },
 
  // click on google search button
  {
    "action": "click",
    "params": [".gNO89b"]
  },
 
  // wait for searched query to load and if page doesn't load within 3 seconds, move to next step
  {
    "action": "waitForNavigation",
    "params": [
      {
        "timeout": 3000,
        "waitUntil": "load"
      }
    ]
  }
]

After execution, we get the information we're looking for:

[Image: BuildShip Google search]
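
The steps listing above uses // comments for explanation. Whether the Steps input tolerates comments is not stated here. Assuming it expects strict JSON, which has no comment syntax, a comment-free version of the same list could be generated with a short script like the one below. The variable names are illustrative only.

```javascript
// Same three steps as in the listing above, written as plain JavaScript objects.
// Assumption: the Steps input expects strict JSON (no comments allowed).
const steps = [
  { action: "type", params: ["#APjFqb", "buildship"] },
  { action: "click", params: [".gNO89b"] },
  { action: "waitForNavigation", params: [{ timeout: 3000, waitUntil: "load" }] },
];

// Serialize with two-space indentation so the result is easy to read and paste.
const stepsJson = JSON.stringify(steps, null, 2);
console.log(stepsJson);
```

The printed text contains the same actions and params as the commented listing, just without the explanatory lines.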

The Selector value is the root selector: the element from which the text content is extracted once all steps have been executed.

Each step object in the steps list consists of an action and params.
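
To make that shape concrete, here is a minimal sketch of a check you could run on your own steps before pasting them into the node. isValidStep is a hypothetical helper, not part of BuildShip, and treating both fields as required is an assumption.

```javascript
// Hypothetical helper: returns true when a value has the step shape
// described above, i.e. a non-empty string `action` and an array `params`.
function isValidStep(step) {
  return (
    typeof step === "object" &&
    step !== null &&
    typeof step.action === "string" &&
    step.action.length > 0 &&
    Array.isArray(step.params)
  );
}

console.log(isValidStep({ action: "click", params: [".gNO89b"] })); // true
console.log(isValidStep({ action: "click" })); // false (params is missing)
```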

The action is the name of any method from the Puppeteer Page methods list, and params is the list of arguments required by that method.
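
One way to picture this mapping is as a loop that looks up each action on the page object and calls it with params as its arguments. The sketch below is an illustration under assumptions, not BuildShip's actual implementation: runStepsAndExtract is a hypothetical name, page is assumed to be a Puppeteer Page (or any object exposing the same methods), and error handling such as continuing after a timeout is left out.

```javascript
// Hypothetical sketch, not BuildShip's source code.
// `page` is assumed to behave like a Puppeteer Page.
async function runStepsAndExtract(page, steps, selector = "body") {
  for (const { action, params = [] } of steps) {
    // The action must name a method that exists on the page object.
    if (typeof page[action] !== "function") {
      throw new Error(`Unknown page method: ${action}`);
    }
    // Spread params so each entry becomes one positional argument,
    // e.g. { action: "type", params: ["#APjFqb", "buildship"] }
    // becomes page.type("#APjFqb", "buildship").
    await page[action](...params);
  }
  // After all steps have run, read the text content of the selector.
  return page.$eval(selector, (el) => el.textContent);
}
```

Because the function only relies on the methods it calls, it can be tried out with a small stand-in object before being pointed at a real browser page.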

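For example, for the type action, the parameters of the Puppeteer type method are:

  • selector: A selector for the element to type into.

  • text: The text to type into that element.

  • options (optional): An object that may include delay, the time to wait between key presses in milliseconds.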

Hence, the step object for the type action would look like this:

{
  // puppeteer method name
  "action": "type",
 
  // "#APjFqb" is the "selector" (the selector to find <input>)
  // "buildship" is the "text" (the value to be typed in <input>)
  // As per parameters list of "type" method, the third parameter is optional,
  // hence we can either include or exclude it from "params" list
  "params": ["#APjFqb", "buildship"]
}
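
When the optional third parameter is used, it goes at the end of params. In Puppeteer, that parameter is an options object whose delay field sets the pause between key presses in milliseconds. A minimal sketch follows, where typeStep is a hypothetical helper and 100 is an arbitrary example value:

```javascript
// Hypothetical helper that builds a step object for the "type" action.
// `options` is appended to params only when the caller provides it.
function typeStep(selector, text, options) {
  const params = [selector, text];
  if (options !== undefined) {
    params.push(options);
  }
  return { action: "type", params };
}

// Without the optional parameter: params has two entries.
console.log(JSON.stringify(typeStep("#APjFqb", "buildship")));

// With it: wait 100 ms between key presses.
console.log(JSON.stringify(typeStep("#APjFqb", "buildship", { delay: 100 })));
```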
