Seeding data

Populate preview databases with seed data or a production dump — once — using PULLPREVIEW_FIRST_RUN and state that persists across deploys.

Staging environments usually need data to be useful. PullPreview preserves the state of your environment between deployments to the same pull request, so any data loaded once persists across redeploys — the only question is how to load it the first time. This guide covers seeding from framework seeds and restoring a production dump, gated so it runs only once. The examples below target the Compose deployment target.

How state persists

PullPreview keeps Docker volumes between deployments on the same pull request. Once you load data into a database volume, it stays there across every redeploy of that PR. That means you only need to seed on the very first deployment.

To make first-run logic easy, PullPreview sets PULLPREVIEW_FIRST_RUN to true on the first deployment to an instance and false on every deployment after that. This variable, along with the other PULLPREVIEW_* variables, is written to /etc/pullpreview/env on the server and is available to your pre-script and for Compose interpolation. See environment variables for the full list.
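
As a minimal sketch of how a pre-script might use this flag, the helper below runs a command only when PULLPREVIEW_FIRST_RUN is true. The function name run_on_first_run is hypothetical, and treating an unset variable as false is an assumption rather than documented PullPreview behavior.

```shell
#!/bin/sh
# Sketch of first-run gating for a pre-script.
# Assumes PULLPREVIEW_FIRST_RUN is present in the script's environment;
# an unset variable is treated as "false" (an assumption).

# Run the given command only on the first deployment. On later deployments
# do nothing and return 0 so the deploy continues.
run_on_first_run() {
  if [ "${PULLPREVIEW_FIRST_RUN:-false}" = "true" ]; then
    "$@"
  else
    return 0
  fi
}

# Example call: replace the echo with your own seeding command.
run_on_first_run echo "first deployment: load seed data here"
```

Returning 0 on later deployments keeps the gate from being mistaken for a failure by whatever invokes the script.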

Seeding from framework seeds

If your framework ships a seed task (for example Rails db:seed), run it as a one-off Compose service with a restart: on-failure policy. On each deploy the service runs once, retries only if it fails, then exits.

docker-compose.yml
services:
  db:
    image: postgres
  web:
    build: .
    command: bundle exec rails s
    depends_on: [db, seeder]
  seeder:
    # Reuse the application image so the seed task is available
    build: .
    command: bundle exec rails db:seed
    restart: on-failure
    depends_on: [db]

Because state persists between deployments, the seeded data remains available on subsequent deploys. If running db:seed again would create duplicate data, gate the command on PULLPREVIEW_FIRST_RUN (see the dump example below for the pattern).

Seeding from a production dump

For more realistic previews, restore a dump of your production database. There are two approaches.

Restore manually over SSH

Admins can SSH into the preview server, so after the first deploy you can copy a dump up and restore it. The SSH user is ec2-user on AWS Lightsail and root on Hetzner — adjust the commands accordingly. For a Postgres service named db:

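# On your machine: copy the dump to the server.
scp my-dump.gz ec2-user@SERVER_IP:/tmp/
# On the server: -T disables TTY allocation so the dump can be piped in on stdin.
zcat /tmp/my-dump.gz | docker compose exec -T -u postgres db pg_restore -d DBNAME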

The aws CLI is preinstalled on every preview server, so you can also pull the dump straight from S3 on the server instead of copying it from your machine.
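
A rough sketch of that S3 variant, to be run on the preview server, might look like the following. The function name fetch_and_restore is hypothetical. The sketch assumes the server has credentials that can read the bucket, that it runs from the directory holding the Compose project, and that the dump is a gzip-compressed archive that pg_restore can read, as in the manual example above.

```shell
#!/bin/sh
# Sketch: pull a dump from S3 on the preview server and restore it.
# Assumptions: AWS credentials are available on the server, the current
# directory holds the Compose project, and the Postgres service is named db.

fetch_and_restore() {
  s3_url="$1"    # e.g. s3://my-backup-bucket/latest-dump.gz
  db_name="$2"   # target database, e.g. DBNAME
  dump_file="$(mktemp)" || return 1

  aws s3 cp "$s3_url" "$dump_file" || return 1

  # -T disables TTY allocation so the dump can be piped in on stdin.
  gunzip -c < "$dump_file" | docker compose exec -T -u postgres db pg_restore -d "$db_name"
  status=$?

  rm -f "$dump_file"
  return "$status"
}
```

It could then be called as fetch_and_restore s3://my-backup-bucket/latest-dump.gz DBNAME, with the bucket and database names swapped for your own.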

Auto-fetch from S3 and restore on first run

To fully automate this, fetch the dump in your workflow before the PullPreview step, then have a seeder service restore it only on the first run.

Add a step to your workflow that downloads the dump into a directory that your Compose file mounts:

# .github/workflows/pullpreview.yml: extra step before the pullpreview step
- name: Fetch dump
  env:
    AWS_ACCESS_KEY_ID: "${{ secrets.AWS_ACCESS_KEY_ID }}"
    AWS_SECRET_ACCESS_KEY: "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
  run: |
    mkdir -p dumps/
    aws s3 cp s3://my-backup-bucket/latest-dump.gz dumps/

Then define a seeder service that restores the dump only when PULLPREVIEW_FIRST_RUN is true:

docker-compose.yml
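services:
  seeder:
    image: postgres
    # Compose interpolates ${PULLPREVIEW_FIRST_RUN}; later deploys exit 0, so on-failure does not loop.
    command:
      - sh
      - -c
      - |
        if [ "${PULLPREVIEW_FIRST_RUN}" = "true" ]; then
          gunzip -c /dumps/latest-dump.gz | pg_restore -h db -d DBNAME
        fi
    restart: on-failure
    volumes:
      - ./dumps:/dumps
    depends_on: [db]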

Since PULLPREVIEW_FIRST_RUN is true only on the first deployment, the restore is skipped on every later deploy while the data stays in place.

Next steps

  • See environment variables for the full list of PULLPREVIEW_* variables available during deploys.
  • The pre-script page covers another place to run first-run seeding logic, gated the same way on PULLPREVIEW_FIRST_RUN.
  • For more on troubleshooting previews, see the FAQ.