Wiped my company's production DB last week.
Preface: 8 YoE, Big company (where I work) acquired a small but very successful product last year. I recently moved over to this product to help integrate it into our suite of software.
Story: Unfortunately, this product lacked any sort of staff tooling, so support requests were more often than not accomplished by running SQL directly on the production database (💀).
One of the most standard requests was updating product codes that were specific to a user's account, i.e. a given product code for one user would not work for another user. The SQL boiled down to:
UPDATE "users"
SET "product_access_codes" = "..."
WHERE "users"."id" = '289571032';
Last week, while on-call, I wake up to an "urgent" request to enable a user's product codes in time for a demo "very soon". Having done this countless times, I whip up and run the following:
UPDATE "users"
SET "product_access_codes" = "...";
WHERE "users"."id" = '289571032';
Notice anything? Well I didn't until I saw the dreaded "12857294 rows affected" result. There is truly no stronger stimulant than the realizing that you just bricked the production database by overwriting the user table with bad data.
After coming to terms with the reality of my situation over the next 10 seconds (felt like 10 hours) I hit up our SRE team and give them the bad news.
Outcome: Luckily for me, our SRE team had backups configured such that we were able to restore the database to the state ~2 minutes before my mishap. Total downtime ended up being ~20 minutes while we ran the restore.
After the dust settled I'm glad to report I did not in fact lose my job. I did feel incredibly embarrassed, but equally thankful for my coworkers being empathetic and understanding that mistakes can happen. My EM blamed the situation more on our lack of tooling, so we sliced up some time last week to write our first version of staff tools.
Takeaway: Doesn't matter how many times you've done something or how long you've been in the game, fuck-ups do happen and often when you feel the most complacent. This was a query I'd written many times over; the early morning request plus the urgency led me to get complacent and cut corners.
More importantly though, in retrospect, always turn off autocommit in your production DB sessions. I could have avoided the entire situation had my SQL instead been
\set AUTOCOMMIT off
BEGIN;
UPDATE "users"
SET "product_access_codes" = "...";
WHERE "users"."id" = '289571032';
Upon seeing the syntax error and rows affected output I could have just ran ROLLBACK
and avoided the whole situation. I honestly wanted to write this post mainly just to call out the fact that anytime you run SQL in production it should be wrapped in an explicit transaction.