Running Screaming Frog on GCP with Cloud Run Jobs
Running Screaming Frog on a VM means paying for idle time between crawls. Cloud Run Jobs let you spin up a container, run the crawl, and shut down - you only pay for actual compute time.
At Precis, we built an internal service that runs Screaming Frog crawls at scale using GCP. This post walks through the core setup - a simplified version you can deploy in about 30 minutes.
For this proof-of-concept, we’re going to use Cloud Run Jobs, Cloud Storage, and a simple Dockerfile combined with a bash script used as its entrypoint.
What we’re building
The setup is straightforward:
- Dockerfile that installs Screaming Frog
- Entrypoint script that runs the crawl
- Cloud Run Job that executes the container
- GCS bucket to store exports
Prerequisites
You’ll need:
- Google Cloud Project with billing enabled
- gcloud CLI installed and configured
- Screaming Frog license
- Basic Docker knowledge
Building the Docker image
Create a Dockerfile:
FROM ubuntu:22.04

# Install dependencies
RUN apt-get update && apt-get install -y \
    openjdk-11-jre \
    xvfb \
    wget \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Download and install Screaming Frog
RUN wget -O /tmp/screamingfrog.deb \
    https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_23.1_all.deb && \
    apt-get update && \
    apt-get install -y /tmp/screamingfrog.deb && \
    rm /tmp/screamingfrog.deb

# Copy entrypoint script
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]
A few things to note:
- Ubuntu 22.04: Screaming Frog’s official .deb package works best on Ubuntu
- xvfb: Provides a virtual display for headless operation (Screaming Frog technically needs X11)
- OpenJDK 11: Screaming Frog runs on Java
Now create entrypoint.sh:
#!/bin/bash
set -e

URL="${CRAWL_URL}"
OUTPUT_DIR="${OUTPUT_DIR:-/mnt/crawl-output}"

# Setup license and EULA
mkdir -p /root/.ScreamingFrogSEOSpider
echo "${SF_LICENSE}" > /root/.ScreamingFrogSEOSpider/licence.txt
cat > /root/.ScreamingFrogSEOSpider/spider.config << 'EOF'
eula.accepted=1
EOF

# Run the crawl
xvfb-run screamingfrogseospider \
  --headless \
  --crawl "$URL" \
  --output-folder "$OUTPUT_DIR" \
  --export-tabs "Internal:All,External:All,Response Codes:All,Page Titles:All,Meta Description:All,H1:All,Images:All" \
  --overwrite \
  --save-crawl

echo "Crawl completed successfully"
The script does three things:
- License setup: the Screaming Frog CLI requires a license file at /root/.ScreamingFrogSEOSpider/licence.txt
- EULA acceptance: required for it to run
- Runs the crawl: xvfb-run provides the virtual display, and we export a few selected tabs - feel free to edit
So now in your directory you should have:
.
├── Dockerfile
└── entrypoint.sh
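With both files in place, you can sanity-check the image locally before touching GCP. A minimal sketch, assuming you have Docker installed; the image tag and the local ./output folder are just illustrative, and SF_LICENSE must contain your actual licence details:

# Build the image and run a local test crawl, writing exports to ./output
docker build -t screamingfrog-crawler .

mkdir -p output
docker run --rm \
  -e CRAWL_URL="https://example.com" \
  -e SF_LICENSE="YOUR-LICENSE-DETAILS" \
  -v "$(pwd)/output:/mnt/crawl-output" \
  screamingfrog-crawler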
Setting up GCP resources
The next steps assume you have the gcloud SDK installed and that you are somewhat familiar with GCP.
Note that many of the steps below require these two environment variables to be set:
PROJECT_ID="your-gcp-project"
REGION="your-preferred-region"
Enable the required APIs:
gcloud services enable artifactregistry.googleapis.com run.googleapis.com
Create a storage bucket for the CSV exports that Screaming Frog will generate. We’ll later mount this bucket to the Cloud Run Job instance.
gsutil mb -l ${REGION} gs://${PROJECT_ID}-crawl-output
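If you prefer the newer gcloud storage commands over gsutil, the equivalent should be:

gcloud storage buckets create gs://${PROJECT_ID}-crawl-output --location=${REGION}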
Create a service account for the job and give it permission to read, write and delete files:
gcloud iam service-accounts create screamingfrog-runner \
  --display-name="ScreamingFrog Runner"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:screamingfrog-runner@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
Store your Screaming Frog license in Secret Manager:
# Enable Secret Manager API
gcloud services enable secretmanager.googleapis.com

# Create secret
echo -n "YOUR-LICENSE-KEY" | gcloud secrets create screamingfrog-license \
  --data-file=-

# Grant access to service account
gcloud secrets add-iam-policy-binding screamingfrog-license \
  --member="serviceAccount:screamingfrog-runner@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
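To double-check that the secret was stored correctly, you can read it back (this prints your license key to the terminal, so only do it somewhere trusted):

gcloud secrets versions access latest --secret=screamingfrog-license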
Deploying the Cloud Run Job
Deploy the job directly from source (this builds the image using Cloud Build and deploys in one command):
gcloud run jobs deploy screamingfrog-crawler \
  --source . \
  --region=${REGION} \
  --service-account=screamingfrog-runner@${PROJECT_ID}.iam.gserviceaccount.com \
  --cpu=4 \
  --memory=16Gi \
  --max-retries=0 \
  --task-timeout=3600 \
  --set-env-vars=OUTPUT_DIR=/mnt/crawl-output \
  --set-secrets=SF_LICENSE=screamingfrog-license:latest \
  --add-volume name=crawl-storage,type=cloud-storage,bucket=${PROJECT_ID}-crawl-output \
  --add-volume-mount volume=crawl-storage,mount-path=/mnt/crawl-output
This command will:
- Build your Docker image using Cloud Build
- Push it to Artifact Registry automatically
- Create (or update) the Cloud Run Job
- Mount the bucket we created as a volume
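To verify the result - volume mount, secret, CPU and memory - you can inspect the job after deployment:

gcloud run jobs describe screamingfrog-crawler --region=${REGION}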
Some notes on configuration:
I found 4 CPU / 16GB RAM to be a good starting point for most crawls. Scale up to 8 CPU / 32GB for large sites (100K+ URLs).
Important: Screaming Frog periodically checks available disk space and stops the crawl if it detects 5GB or less remaining. On Cloud Run, available memory serves as disk space - there’s no separate disk allocation, even with the GCS mount. So while 2 CPU / 8GB technically works, you’re cutting it close to Screaming Frog’s 5GB limit.
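If you want visibility into this at runtime, a small optional addition to entrypoint.sh logs the available space before the crawl starts:

# Optional: log free disk space before starting the crawl,
# so the job logs show how close you are to the 5GB threshold.
df -h / "$OUTPUT_DIR"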
Cost: With 4 CPU / 16GB, expect roughly $0.15-0.20 per hour of crawl time. A typical 10K URL crawl takes 15-30 minutes, so around $0.05-0.10 per crawl. Storage costs are negligible for CSV exports.
Running crawls
Manual execution:
gcloud run jobs execute screamingfrog-crawler \
  --region=${REGION} \
  --update-env-vars=CRAWL_URL=https://example.com
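By default the command returns as soon as the execution starts. If you want it to block until the crawl finishes (handy in scripts or CI), recent gcloud versions support a --wait flag:

gcloud run jobs execute screamingfrog-crawler \
  --region=${REGION} \
  --update-env-vars=CRAWL_URL=https://example.com \
  --wait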
Check execution status:
gcloud run jobs executions list \
  --job=screamingfrog-crawler \
  --region=${REGION}
View logs:
gcloud logging read \
  "resource.type=cloud_run_job AND resource.labels.job_name=screamingfrog-crawler" \
  --limit=50 \
  --format=json
Accessing crawl results
You can browse and download files directly from the GCP Console by navigating to your bucket. Or use the CLI:
List crawl outputs:
gsutil ls gs://${PROJECT_ID}-crawl-output/
Download a specific crawl (see the note below on per-execution folders):
gsutil -m cp -r gs://${PROJECT_ID}-crawl-output/execution-abc123/* ./local-output/
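Note that the minimal entrypoint above writes straight to the bucket root and overwrites the previous exports on each run. If you want per-execution folders like the path above, one option is the CLOUD_RUN_EXECUTION environment variable that Cloud Run Jobs sets automatically - a sketch of the change in entrypoint.sh:

# Write each run's exports to its own subfolder named after the execution.
# CLOUD_RUN_EXECUTION is set by Cloud Run Jobs; "manual" is a fallback for local runs.
OUTPUT_DIR="${OUTPUT_DIR:-/mnt/crawl-output}/${CLOUD_RUN_EXECUTION:-manual}"
mkdir -p "$OUTPUT_DIR"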
The output directory includes:
- internal_all.csv - All internal URLs discovered
- external_all.csv - External links
- response_codes_all.csv - HTTP status codes
- page_titles_all.csv - Page titles
- meta_description_all.csv - Meta descriptions
- h1_all.csv - H1 tags
- images_all.csv - Image inventory
- crawl.seospider - Full crawl file (open in the Screaming Frog GUI)
What’s next
This gives you a working setup for running Screaming Frog crawls serverless. From here, you could:
- Add configuration files: Use Screaming Frog’s config files to standardize crawl settings across executions (see the sketch after this list)
- Implement progress tracking: Parse log output to report crawl progress in real-time
- Build a web UI: Create a simple interface for managing crawls and viewing results (hint: this is what we built)
- Add notifications: Send alerts when crawls complete or fail
- Track changes: Compare crawls over time to detect new issues
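For the configuration file idea, the Screaming Frog CLI accepts a saved config via --config. A sketch of how the crawl command in entrypoint.sh could look - the /config/crawl-config.seospiderconfig path is illustrative; you would export the file from the GUI and COPY it into the image:

# Run the crawl with a saved Screaming Frog configuration file
xvfb-run screamingfrogseospider \
  --headless \
  --crawl "$URL" \
  --config /config/crawl-config.seospiderconfig \
  --output-folder "$OUTPUT_DIR" \
  --export-tabs "Internal:All,Response Codes:All" \
  --overwrite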
Once deployed, crawls run unattended and you only pay for what you use.