Don't ever, ever write your own Sharding code
There are two types of successful applications – those written with scaling in mind, and those that had to deal with scaling after their initial success (of course, one can argue that even when one designs for scale, scaling issues will occur, since “premature optimization is the root of all evil”).
But Application Scaling, albeit hard, is a walk in the park compared to scaling databases. There’s just a limit to how many transactions you can pull from a single MySQL instance. Even after adding multiple caching layers – the database quickly becomes the bottleneck.
The answer to this problem is quite well known: shard the database. And usually this task gets added to one of the near-future sprints, with an optimistic development schedule.
WAIT! That’s NUTS! There’s much more to sharding than just a task in a sprint. As discussed in the following paragraphs sharding has the tendency to start small, and end up as a huge obstacle for both the future application features and the operations team.
Home grown sharding solutions, especially those developed as parts of a sprint, have a tendency to grow out of proportion. The original sharding mechanism usually only supports single shard inserts and selects, and as the application evolves, additional features are limited by the home grown sharding code. Does your home grown sharding solution support SQL features like GROUP BY, ORDER BY or JOIN? Improving the sharding is an expensive (and sometimes impossible) task that often introduces delays and has serious schedule implications for releases of new functionality.
Other problems include supporting transactions across multiple shards (In a 2-Phase commit form, which is not even supported on MySQL), and merging results from multiple database servers when requests need to spawn multiple databases. All in all, you’ll quickly find yourself developing a new database – not a task for the faint of heart.
Sharding has major effects on operations, since the ops team now needs to manage multiple databases instead of just one. On top of that those databases are identical, and need to remain so. This means that every DBA operation needs to run on all servers, and almost always simultaneously. This includes database tuning, index manipulation, schema modifications etc. These operations are extremely error prone.
Backups too, become very complex. With sharding, backup operations have to be synchronized across all shards at the same time – otherwise data inconsistencies will occur.
Even with Sharding, scalability is not guaranteed. Why? Because the original code most probably assumed a static number of shards, and did not support resharding. Well – what happens if the application needs additional shards? Usually that translates to long downtime periods that can be measured in hours and days.
The most important thing to remember is that there’s no longer any reason to develop an internal sharding solution. ScaleBase has done it for you. Full SQL support, dynamic resharding, automatic analysis (so you’ll know what’s the best sharding configuration), and support for external tools like backup, management, and MySQL Workbench all translate to time saved from writing infrastructure code, allowing you to concentrate on writing your business logic rather than dealing with infrastructure code!
Suffering database scaling issues is so 2000. It’s time to enter the age of dynamically scaling databases without writing sharding logic code. Download ScaleBase Database Load Balancer here.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)