How AWS avoided millions in fines from one Leadership Principle
There's more to it, but you get the drift.
Every organization has some sort of literature that attempts to embody their culture and values. And it is no surprise that Amazon’s 14 leadership principles have captured peoples imagination well outside the boundaries of the company.
Among those, one leadership principle stands out as a very interesting one. Have a backbone; Disagree and commit. What makes it interesting is two seemingly unrelated sentences are combined to bring a bigger meaning.
At the surface it is first about passionately advocating for what you believe in (by having a backbone). And in the face of disagreements to commit to the decision, even if it's not what you advocated for. The group is left to decide what to commit to and how, but the expectation is there has to be commitment at the end of the engagement.
When AWS had to comply with GDPR and begin deleting customer data after customer accounts are closed, we had a mere 4 months to get hundreds of teams to align and execute a destructive action that the company had never done before, let alone at high scale. The way we went about corralling all those teams would not have been possible if not for this humble principle.
Upbeat roadmap
It was the start of 2018, and we were busy planning our roadmap. The team I inherited had accumulated a good amount tech debt we had to pay off, and we were on the clock as a critical database was in its last legs. We were making steady progress to migrate our dependency off the database, and this year was going to be when we were fully off it. So much was the risk that a fully built customer feature was put on hold from launch until we were off this database. We were excited to hit that final nail in the coffin.
The jinx was materializing
One evening as I was settling down with a nice cup of coffee, and my phone rang. It was my manager. "Hey, I got pulled in by legal. Looks like there is a major regulation coming up in May, GD..something and it may impact our plans. There's a meeting at 8 AM tomorrow that I'll pull you into." I was conflicted. A big part of me wanted to protect my roadmap at all costs, at the same time I was curious to learn more about this "GD..something".
Knocked my socks off!
At the meeting the next day, I sure learned what "GD..something" was. And it knocked my socks off. Going forward, AWS had to delete customer data en-masse on closed accounts, for the first time in its history, ever. Failing which, we were faced with hefty fines and regulatory wrist slapping. And we had to pull this off as a company in 5 months as the GDPR regulation was going to be passed in May that year.
Being in the cloud division expanded the scope of data for us. 'Data' in this context included customer's cloud resources. Compute instances, storage, network infrastructure everything that entire online businesses depended on. Deleting the wrong resource, could spell doom. A closed customer's account can be reopened as well, and they'd want their resources running at the time. How do we restore what was deleted? AWS had hundreds of cloud services operating as truly distributed systems. How would they all delete their resources on time, and how do we know they've all done it? The questions kept piling up.
Cross the line or die trying
At the time, I owned the foundational systems that controlled the lifecycle of customer accounts. If anyone would know for sure if an account is closed, it'd be my systems. We could signal to the rest of the company when an account gets closed, and if we did it right, the hundreds and thousands of systems could act in a timely manner to safely delete customer data. I had to compartmentalize my thoughts for a while. The 'beloved' migration project to get off the old database had turned into a fervor-filled mission for my team. Was I going to undo all the hard work that went into motivating the team to get us until here?
Time for some backbone - I thought to myself. Problems like these come rarely, where your systems are uniquely positioned to solve a problem for the entire company you work for. I was grounded to this truth. And in hindsight I realize that this truth helped me navigate many bends down the road. Including the hard decision to peel away precious engineering resources from the migration initiative to this death march toward a cliff called GDPR. We will either be successful or die trying. And it was weirdly motivating.
The godsend called Principal Engineers
Besides the timeline pressure, what I was proposing was a centralized solution in a company that had an aversion to anything centralized. It was perceived as single point of failure. And it was. "A wrong event could take down Netflix and Amazon.com at the same time" - that was an oft-repeated reminder among many principal engineers that read my proposal. Besides the technical challenges, I now had cultural challenges to reckon with as well. The plot was certainly thickening.
My manager was supportive of me taking this on, but she was clear that it cannot push out the database migration timelines. The project had slipped in the past and the risks were too great to ignore. However she knew the stakes were high with GDPR and used her network to rally around Principal Engineers to help me.
It was a shot in the arm to have a veteran Sr. Principal Engineer (PE) that could now convince other Principals on why the approach had to be different with this project. Seeing them in action was a learning experience I couldn't imagine getting elsewhere. The meetings were passion-filled, as we debated on engineering safety. At the same time, there was zero deadlock. For every disagreement, there was a commitment that eventually followed. We walked away from meetings more aligned than we walked into it.
Scaling the solution
Once we had alignment among the Distinguished Engineers (DEs), it came time to scale the buy-in among engineering teams. The neat thing about convincing DEs and PEs are that their VPs are also convinced, given they are their advisors. This helped scale the buy-in across hundreds of teams as they resolved ideological differences locally within their organization, allowing my lean time to focus solely on the technical challenges.
It took more than a village to finally align on all the safety features we had to build for the account events. The event themselves had very little information on what to do which protected teams from acting on stale ones. Teams had to call a verify API to really learn if they had to act on that event and what that action would be, and an acknowledge API to indicate the action has been taken. This helped my team expire events in case a closed account was reopened after the event was sent out, and audit if all teams have acknowledged. Before an event was published several checks occurred like evidence of API access or console logins or active subscriptions - any of which if tripped meant the event would not be sent.
Committing to alignment
Along the way we navigated several disagreements with teams near and far in the company. Some of them being show-stopping in nature. But each time, either the individual or their leadership demonstrated "Disagree and Commit" to move forward. It was a prime leadership principle that really enabled AWS to achieve GDPR compliance in 4 short months, and not one infraction from the European Union. Even when Amazon was dealing with fuzzy fines, AWS was bullet-proof.
My manager had one more charm up her sleeve. To have the backbone to pushback to executive leadership on the migration project. They could disagree all day long about priorities, but they had to commit to either increasing funding of the migration project or reduce its scope since we needed resources to solve GDPR for the entire company. Once again, the humble little leadership principle had its outsized impact, punching well above its weight. We were off the database and launched the customer feature.