Managing 15K Databases Simultaneously – Before Breakfast
Curator’s Note: This article was originally published by Yuval Lubowich on the Xeround blog back in July of 2012.
A couple of days ago I had breakfast with a friend of mine. In between courses we talked about the food, the level of service we received and how hard it must be for the (very) few waiters we saw to cater to everyone.
Towards the end of the meal our conversation shifted towards work and I found myself talking about Xeround. I quickly explained what we do – hey, it’s not that hard explaining the notion of a database as a service and why it makes sense to “outsource” this aspect of development to experts :).
What I had to spend more time talking about was how we do it. How can a relatively small group of individuals manage such a large customer base? Sure, we have great people, all experts in their fields, all super-professional and motivated. But even for a great team like ours, managing Xeround’s ever-growing user base could have been a handful.
As the popularity of our service grows (we appreciate your support!), Xeround’s fleet of active databases continues to grow exponentially.
Today, Xeround’s database-as-a-service boasts more than 15,000 databases live at any given time. Sure, some databases are more active than others: we’ve all been through those early stages of development when nothing much happens in your DB, and we all know developers can put a project on the back burner and come back to it at any time. So to be even stricter with the numbers: our current stats show that over 9,000 apps are served DAILY using Xeround as their DB, with more than 50,000 concurrent connections.
That’s a lot of databases. A lot of people counting on our service.
So how do we do it? Automation. If I had to sum it all up in one word, automation would be it.
Automating DB Management in the Cloud
I went on to explain that Xeround uses extensive business logic and employs automated mechanisms in order to manage our customers’ databases. There’s a lot of magic that goes on behind the scenes – a lot of know-how and a lot of experience (building on our roots providing database solutions to Telco operators, where downtime is simply not an option). But, basically, we knew from day one that the only way to succeed in managing data in the cloud is to be prepared to handle EVERY eventuality automatically, be it availability issues, replication, a burst in demand or what not. The brain of the system needs to know how to handle it.
Xeround’s ‘system’ is unique. I personally see it as the mother-of-all-“M”s in RDBMS :).
It’s unique in the sense that there’s nothing centralized, no single or centralized brain that’s taking care of everything, no single point of failure, no middleware or hardware for that matter. The system is made up of virtualized MySQL frontends, distributed data stores, quorums and quite a few other constructs, all of which work together to ensure Xeround’s high availability and elasticity. The Xeround system manages itself automatically; it grows and shrinks; it heals itself and even buys/terminates virtual machines (VMs) as needed.
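To make the “grows and shrinks” idea a bit more concrete, here is a minimal sketch of how a capacity-based scaling decision might look. It is only an illustration and not Xeround’s actual logic: the function names, the target utilization and the minimum node count are all assumptions made for the example.

```python
import math


def desired_node_count(total_load: float,
                       capacity_per_node: float,
                       target_utilization: float = 0.5,
                       min_nodes: int = 2) -> int:
    """Smallest node count that keeps average utilization at or below the target.

    All parameters are hypothetical; a real system would derive them from
    its own metrics (connections, throughput, storage, and so on).
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    needed = math.ceil(total_load / (capacity_per_node * target_utilization))
    # Never shrink below a floor, so that a replica can survive a node failure.
    return max(min_nodes, needed)


def scaling_action(current_nodes: int,
                   total_load: float,
                   capacity_per_node: float,
                   target_utilization: float = 0.5,
                   min_nodes: int = 2) -> int:
    """Positive result: VMs to start. Negative: VMs to terminate. Zero: no change."""
    wanted = desired_node_count(total_load, capacity_per_node,
                                target_utilization, min_nodes)
    return wanted - current_nodes
```

In this sketch, a cluster of 6 nodes facing a load of 420 units, with 100 units of capacity per node and a 50% utilization target, would ask for 3 more VMs; the same load on 12 nodes would release 3.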
The algorithmic notions that lie at the heart of Xeround revolve around distributed decision making, quorums and leader elections. Xeround users are able to access their databases using a DNS name that is mapped to Xeround frontend machines. Through these machines data is automatically partitioned and replicated across existing data stores. Each data store can manage this process by becoming the ‘leader’ of this activity. If something “bad” happens to a leader, a new one is selected in its place. The selection process is distributed and is based on quorums and leader election algorithms. And while, naturally, our OPS center is manned 24×7 and we receive alerts and notifications in real time when anything “bad” happens, allowing us to monitor the situation closely – we allow the system to continue to run automatically and heal itself.
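For readers who have not met quorums and leader election before, here is a deliberately simplified sketch of the general idea. It is not Xeround’s algorithm: the “vote for the lowest visible node id” rule, the function names and the data store ids are assumptions chosen to keep the example short.

```python
from collections import Counter
from typing import Dict, Optional, Set


def has_quorum(votes: int, cluster_size: int) -> bool:
    """A quorum is a strict majority of ALL configured nodes, alive or not."""
    return votes > cluster_size // 2


def elect_leader(cluster: Set[str],
                 reachable: Dict[str, Set[str]]) -> Optional[str]:
    """Return the elected leader, or None if no candidate reaches a quorum.

    cluster   -- every configured node id
    reachable -- for each live node, the nodes it can currently see
                 (including itself); failed nodes simply have no entry
    """
    votes: Counter = Counter()
    for voter, visible in reachable.items():
        if voter not in cluster:
            continue  # ignore votes from unknown nodes
        candidates = visible & cluster
        if candidates:
            # Hypothetical deterministic rule: vote for the lowest visible id.
            votes[min(candidates)] += 1
    if not votes:
        return None
    candidate, count = votes.most_common(1)[0]
    # Without a majority nobody leads, which prevents two leaders
    # from emerging on opposite sides of a network partition.
    return candidate if has_quorum(count, len(cluster)) else None
```

With five data stores `ds1` to `ds5` that all see each other, `ds1` is elected. If `ds1` fails, the remaining four agree on `ds2`. If the survivors are then split into two groups of two, neither group holds a majority of five, so no leader is elected until connectivity returns.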
Simply put, Xeround’s database apparatus isn’t a case of some watchdogs and scripts that instantiate the database. It’s an orchestra of sorts, designed to handle it all automatically so that our users won’t have to, and neither will our employees.
What DB Automation Means for Your Bottom Line
We usually say in-house that what we invested in R&D to get things just right, we save on Operations and IT. So this is how we come to manage over 15,000 databases across 4 public clouds, 6 data centers worldwide and various PaaS integrations – all with only four people in charge of our production environment. Yes. Four.
I can tell you I get to meet with a lot of cloud providers, different cloud platforms, IT services operators and even managed hosting providers, all with thousands of customers and DB deployments, and none of them has that ratio of DBs per admin. So not only does a database service save end users operational costs and management headaches (waking up at 4am trying to solve some problem), but it makes sense for those running the infrastructure as well.
And to sum up: it suddenly hit me that we’re closing in on our one-year anniversary in production, after launching commercially in July of last year. And, sure, I get to boast a bit in this post, but this is what proud fathers do ;)