Dropbox, to our customers, needs to be a reliable and responsive service. As a company, we've had to scale constantly since our start, today serving more than 700M registered users in every time zone on the planet who generate at least 300,000 requests per second. Systems that worked great for a startup hadn't scaled well, so we needed to devise a new model for our internal systems, and a way to get there without disrupting the use of our product.

In this post, we'll explain why and how we developed and deployed Atlas, a platform that provides the majority of the benefits of a Service Oriented Architecture while minimizing the operational cost that typically comes with owning a service.

The majority of software developers at Dropbox contribute to server-side backend code, and all server-side development takes place in our server monorepo. We mostly use Python for our server-side product development, with more than 3 million lines of code belonging to our monolithic Python server. It works, but we realized the monolith was also holding us back as we grew. Developers wrangled daily with unintended consequences of the monolith. Every line of code they wrote was shared code, whether they wanted it to be or not: they didn't get to choose what was smart to share and what was best to keep isolated to a single endpoint. Likewise, in production, the fate of their endpoints was tied to every other endpoint, regardless of the stability, criticality, or level of ownership of those endpoints.

In 2020, we ran a project to break apart the monolith and evolve it into a serverless managed platform, which would reduce code tangles and free services, and the engineering teams behind them, from being entwined with one another. To do so, we had to innovate on both the architecture (e.g. standardizing on gRPC and using Envoy's gRPC-HTTP transcoding) and the operations (e.g. introducing autoscaling and canary analysis). This blog post captures key ideas and learnings from our journey.

Because the codebase had multiple teams working on it, no single team felt strong ownership over codebase quality. For example, to unblock a product feature, a team would introduce import cycles into the codebase rather than refactor code. Even though this let us ship code faster in the short term, it left the codebase much less maintainable, and problems compounded.

We push Metaserver to production for all our users daily. Unfortunately, with hundreds of developers effectively contributing to the same codebase, the likelihood of at least one critical bug being added every day had become fairly high. Each such bug would necessitate rollbacks and cherry-picks of the entire monolith, which made the push cadence inconsistent and unreliable for developers. Common best practices (for example, from Accelerate) point to fast, consistent deploys as the key to developer productivity. We were nowhere close to ideal on this dimension.

An inconsistent push cadence adds unnecessary uncertainty to the development experience. For example, if a developer is working towards a product launch on day X, they can't be sure whether their code should be submitted to our repository by day X-1, X-2, or even earlier. Another developer's code might cause a critical bug in an unrelated component on day X and force a rollback of the entire cluster, for reasons completely unrelated to their own code.
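The point that hundreds of contributors made a daily critical bug "fairly high" in likelihood follows from the complement rule: P(at least one bug) = 1 - (1 - p)^n. The sketch below is a back-of-the-envelope model, not Dropbox's own analysis. It assumes each developer's changes independently introduce a critical bug on a given day with a small probability p. The 0.5% rate, the team sizes, and the function name `p_at_least_one_critical_bug` are all hypothetical.

```python
"""Back-of-the-envelope model of daily push risk in a shared monolith.

Hypothetical assumption (not from the post): each contributor's changes on a
given day independently introduce a critical bug with probability p.
"""


def p_at_least_one_critical_bug(p_per_dev: float, num_devs: int) -> float:
    """Probability that at least one of num_devs contributors lands a critical bug today.

    Complement rule: P(at least one) = 1 - P(none) = 1 - (1 - p) ** n.
    """
    if not 0.0 <= p_per_dev <= 1.0:
        raise ValueError("p_per_dev must be a probability in [0, 1]")
    if num_devs < 0:
        raise ValueError("num_devs must be non-negative")
    return 1.0 - (1.0 - p_per_dev) ** num_devs


if __name__ == "__main__":
    # Hypothetical per-developer daily bug rate of 0.5%; team sizes are illustrative.
    for n in (10, 100, 300):
        risk = p_at_least_one_critical_bug(0.005, n)
        print(f"{n:>4} devs -> {risk:.1%} chance of a rollback-worthy bug today")
```

With these made-up numbers, the daily risk grows from roughly 5% with 10 developers to about 39% with 100 and about 78% with 300. A small per-person rate compounds quickly when everyone ships in the same daily push. Real bug rates are neither uniform nor independent, so treat this as intuition for why a shared push train becomes unreliable, not as a measurement.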