shot-scraper

Created
Oct 19, 2023 6:54 PM
URL
https://shot-scraper.datasette.io/en/stable/index.html

A command-line utility for taking automated screenshots of websites

Quick start:

pip install shot-scraper
shot-scraper install
shot-scraper https://github.com/simonw/shot-scraper -h 900

This produces a screenshot in a file called github-com-simonw-shot-scraper.png.

Contents

  • Installation
    • shot-scraper install --help
  • Taking a screenshot
    • Adjusting the browser width and height
    • Screenshotting a specific area with CSS selectors
    • Specifying elements using JavaScript filters
    • Waiting for a delay
    • Waiting until a specific condition
    • Executing custom JavaScript
    • Using JPEGs instead of PNGs
    • Retina images
    • Transparent background
    • Interacting with the page
    • Logging all requests
    • Taking screenshots of local HTML files
    • Tips for executing JavaScript
    • Viewing console.log() output
    • shot-scraper shot --help
  • Websites that need authentication
    • shot-scraper auth --help
  • Taking multiple screenshots
    • shot-scraper multi --help
  • Scraping pages using JavaScript
    • Running more than one statement
    • Using async/await
    • Running JavaScript from a file
    • Using this for automated tests
    • Example: Extracting page content with Readability.js
    • shot-scraper javascript --help
  • Saving a web page to PDF
    • shot-scraper pdf --help
  • Dumping the HTML of a page
    • Retrieving the HTML for a specific element
    • shot-scraper html --help
  • Dumping out an accessibility tree
    • shot-scraper accessibility --help
  • Using shot-scraper with GitHub Actions
    • shot-scraper-template
    • Building a workflow from scratch
    • Optimizing PNGs using Oxipng
  • Contributing
    • Documentation
    • Tweeting the release notes

shot-scraper

For background on this project, see "shot-scraper: automated screenshots for documentation, built on Playwright".

Documentation

  • Full documentation for shot-scraper
  • Tutorial: Automating screenshots for the Datasette documentation using shot-scraper
  • Release notes

Get started with GitHub Actions

To get started without installing any software, use the shot-scraper-template template to create your own GitHub repository that takes screenshots of a page using shot-scraper. See "Instantly create a GitHub repository to take screenshots of a web page" for details.

Quick installation

You can install the shot-scraper CLI tool using pip:

pip install shot-scraper
# Now install the browser it needs:
shot-scraper install

Taking your first screenshot

You can take a screenshot of a web page like this:

shot-scraper https://datasette.io/

This will create a screenshot in a file called datasette-io.png.
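
The default filename appears to be derived from the URL. The sketch below approximates that naming using only the two examples shown on this page (datasette-io.png and github-com-simonw-shot-scraper.png). The function name default_screenshot_name is hypothetical and is not part of shot-scraper; the tool's actual logic may differ, for example when a file with that name already exists.

```shell
# Hypothetical helper, not part of shot-scraper: guesses the default
# output filename for a URL, based only on the examples in this document.
default_screenshot_name() {
  # 1. Drop the scheme (https://).
  # 2. Collapse each run of non-alphanumeric characters into one hyphen.
  # 3. Trim leading and trailing hyphens.
  # 4. Append the .png extension.
  printf '%s\n' "$1" \
    | sed -E 's#^[a-zA-Z]+://##; s#[^A-Za-z0-9]+#-#g; s#^-+##; s#-+$##; s#$#.png#'
}

default_screenshot_name 'https://datasette.io/'
# prints: datasette-io.png
default_screenshot_name 'https://github.com/simonw/shot-scraper'
# prints: github-com-simonw-shot-scraper.png
```

A helper like this can be handy in a script that needs to know in advance which file a screenshot will land in, but treat its result as a guess and check the file that was actually written.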

Many more options are available; see Taking a screenshot for details.

Examples

  • The shot-scraper-demo repository uses this tool to capture recently spotted owls in El Granada, CA according to this page, and to generate an annotated screenshot illustrating a Datasette feature as described in my blog.
  • The Datasette Documentation uses screenshots taken by shot-scraper running in the simonw/datasette-screenshots GitHub repository, described in detail in Automating screenshots for the Datasette documentation using shot-scraper.
  • Ben Welsh built @newshomepages, a Twitter bot that uses shot-scraper and GitHub Actions to take screenshots of news website homepages and publish them to Twitter. The code for that lives in palewire/news-homepages.
  • scrape-hacker-news-by-domain uses shot-scraper javascript to scrape a web page. See Scraping web pages from the command-line with shot-scraper for details of how this works.