
How I Accidentally Polluted My Production Storage Bucket (And How to Fix It)



A cautionary tale about Active Storage, environment credentials, and an unexpected Rails gotcha.

The Setup

Like many Rails developers, I occasionally need to debug production issues locally. The usual workflow: download a production database dump, load it into my local MySQL, and poke around. Nothing unusual there.

My config/environments/development.rb was configured to use local disk storage:

config.active_storage.service = :local

And my config/storage.yml looked like this:

local:
  service: Disk
  root: <%= Rails.root.join('storage') %>

production:
  service: GCS
  project: my-project
  credentials: <%= Rails.root.join('config/storage.json') %>
  bucket: my_prod_bucket

So far, so good. Development uses local disk, production uses Google Cloud Storage. Totally separate, right?

Wrong.

The Problem

After loading the production database, I fired up my development server and noticed something strange: I could see production images. Actual images from the production bucket were rendering in my local environment.

My first thought was that Rails was somehow resolving URLs from the database. Weird, but maybe that's how it works?

Then I noticed something alarming in my Rails logs. When the app generated image variants (thumbnails, resized versions), they were being uploaded to the cloud. My development environment was writing to production storage.

I was baffled. My development config clearly specified local storage. How could it have write access to production GCS buckets?

The Gotcha: service_name in the Database

Here's what I didn't know: Active Storage stores the service name in the database, not just in your environment config.

Every record in active_storage_blobs has a service_name column. When you upload a file in production, it saves service_name: "production". When Rails retrieves that blob later, it looks up the service by that name in storage.yml, regardless of what config.active_storage.service says.
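In plain Ruby terms, the resolution works roughly like this (a conceptual sketch, not Active Storage's actual internals; the `Blob` struct and `SERVICES` hash are illustrative stand-ins for `ActiveStorage::Blob` and the services defined in storage.yml):

```ruby
# Stand-ins for the services declared in config/storage.yml.
SERVICES = {
  "local"      => :disk_service, # would be an ActiveStorage::Service::DiskService
  "production" => :gcs_service   # would be an ActiveStorage::Service::GCSService
}

# What config.active_storage.service = :local effectively sets.
DEFAULT_SERVICE = "local"

Blob = Struct.new(:service_name, keyword_init: true)

# Existing blobs resolve through the name stored in their database row...
def service_for(blob)
  SERVICES.fetch(blob.service_name)
end

# ...while only *new* uploads pick up the environment default.
def service_for_new_upload
  SERVICES.fetch(DEFAULT_SERVICE)
end

prod_blob = Blob.new(service_name: "production") # row loaded from the prod dump
service_for(prod_blob)   # => :gcs_service, despite the :local default
service_for_new_upload   # => :disk_service
```

The key point: the environment setting is only a default for writes; reads always follow the per-row `service_name`.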

So when I loaded my production database:

  1. All blobs had service_name: "production"

  2. My storage.yml had a production service defined with GCS credentials

  3. My config/storage.json contained valid production service account keys

  4. Rails happily connected to production GCS for every blob operation

The config.active_storage.service = :local setting only affects new uploads. Existing blobs use whatever service name is stored in their database record.

The Damage

Every time my development environment generated a variant (a resized image, a thumbnail), it uploaded that variant directly to the production bucket. I ended up with hundreds of orphaned files in production storage: files that existed in GCS but had no corresponding blob records in the actual production database.

The Lesson: Separate Your Credentials

The real problem wasn't Active Storage's behaviour. It was having production credentials accessible in development at all.

Use environment-specific credential files. Rails supports this out of the box:

config/credentials/development.yml.enc
config/credentials/production.yml.enc

If my development credentials file didn't contain production GCS keys, none of this would have happened. Even with the production database loaded, Rails would have failed to connect to GCS (no credentials) and I would have noticed immediately.
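One way to wire this up is to read the keyfile from the per-environment encrypted credentials in storage.yml itself. This is a sketch: the key names (gcs, keyfile) are my own choice, and the inline-hash form assumes you have stored the parsed keyfile under that key with `bin/rails credentials:edit --environment production`:

```yaml
# config/storage.yml — only production.yml.enc contains real GCS keys,
# so a development boot has nothing with which to reach the bucket.
production:
  service: GCS
  project: my-project
  credentials: <%= Rails.application.credentials.dig(:gcs, :keyfile).to_json %>
  bucket: my_prod_bucket
```

Because JSON is valid YAML, the emitted object is parsed back into a hash and handed to the GCS client; in development the `dig` simply returns nil and any blob operation fails loudly instead of silently writing to production.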

The Fix: Cleaning Up Orphaned Files

To clean up the mess, I wrote a rake task that compares files in the GCS bucket against records in the active_storage_blobs table. Any file in the bucket without a matching database record is orphaned and can be deleted.

Preview Orphaned Files

RAILS_ENV=production bin/rake active_storage:preview_orphaned_files

This scans the entire bucket and shows you what would be deleted, without actually deleting anything. The output includes file names, sizes, content types, and creation dates.

Delete Orphaned Files

RAILS_ENV=production bin/rake active_storage:delete_orphaned_files

This prompts for confirmation before deleting. You have to type "DELETE" to proceed.
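The confirmation guard is simple to sketch (illustrative, not the gist's exact code):

```ruby
# Destructive work only proceeds when the operator types exactly "DELETE".
def confirmed?(input)
  input.to_s.strip == "DELETE"
end

def delete_orphans!(orphans, response)
  return "aborted" unless confirmed?(response)
  orphans.each { |file| puts "deleting #{file}" } # would call delete on the GCS object
  "deleted #{orphans.size}"
end
```

Requiring an exact typed word (rather than a y/n prompt) makes it hard to confirm by reflex, which matters when the task is pointed at a production bucket.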

How It Works

The script:

  1. Fetches all blob keys from the active_storage_blobs table

  2. Fetches all variant paths from the active_storage_variant_records table

  3. Iterates through every file in the GCS bucket

  4. For original files, checks against the blobs table

  5. For variant files, first checks active_storage_variant_records (Rails 6.1+), then falls back to checking if the parent blob exists (for pre-6.1 variants)

  6. Files without matching records are flagged as orphaned

  7. Only files older than 2 hours are considered (to avoid catching in-progress uploads)
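Stripped of the GCS and ActiveRecord plumbing, the comparison at the heart of the task looks roughly like this (a sketch under my own naming; the real task streams the bucket listing rather than loading everything into memory):

```ruby
require "set"

# Sketch of the orphan check: a bucket file survives if its key matches a
# blob, a variant record, or (for variant-style keys) an existing parent
# blob; anything else older than the grace window is flagged.
GRACE_PERIOD = 2 * 60 * 60 # seconds; skip files that may still be uploading

def orphaned_files(bucket_files, blob_keys, variant_paths, now: Time.now)
  blob_keys = blob_keys.to_set
  variant_paths = variant_paths.to_set

  bucket_files.select do |file|
    next false if now - file[:created_at] < GRACE_PERIOD

    key = file[:key]
    if key.start_with?("variants/")
      # Variant keys look like "variants/<blob_key>/<digest>": keep the file
      # if it is tracked in active_storage_variant_records, or if the parent
      # blob exists (pre-Rails 6.1 variants have no variant record).
      parent = key.split("/")[1]
      !variant_paths.include?(key) && !blob_keys.include?(parent)
    else
      !blob_keys.include?(key)
    end
  end
end
```

For example, with one live blob `abc123`, the file `variants/abc123/deadbeef` is kept via its parent, while `zzz999` and `variants/gone000/cafef00d` come back flagged as orphans.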

The Code

You can view the complete rake task via my GitHub Gist:

🔗 cleanup_orphaned_storage.rake

Takeaways

  1. Active Storage uses the service_name from the database, not your environment config. This is by design: it allows blobs to be migrated between services. But it can bite you when loading production data locally.

  2. Never put production credentials in development. Use Rails' environment-specific credential files. If you need to test against production services, create a separate read-only service account.

  3. Be careful when loading production databases locally. Consider scrubbing sensitive data and resetting the service_name column to match your local environment.

  4. Have a cleanup script ready. Orphaned files can accumulate for many reasons (failed uploads, deleted records, migrations gone wrong). A simple comparison between your bucket and database can identify them.
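For takeaway 3, the reset after loading a production dump can be as small as one console line. This assumes your local disk service is named local in storage.yml, as in the config above:

```ruby
# Run in `bin/rails console` in development, right after loading the dump:
# point every blob at the local disk service so no blob operation can
# resolve to production GCS.
ActiveStorage::Blob.update_all(service_name: "local")
```

The blob files themselves won't exist on local disk, so images will 404 locally, but that is a far better failure mode than writing to the production bucket.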
