20 Real-Life Challenges of Cloud Computing
Here is a short list, in no particular order, that I have accumulated over the past year. It drives many of the improvements to GridGain that are currently in the works:
- Most likely you do NOT need cloud computing – but if you do, you would know it for sure by now; people who have legitimate technical and business use cases for cloud computing have been trying to do it internally for many years
- The best way to think about cloud computing is “Data Center with API” – that should clarify most of the questions…
- Creating the image for something like Amazon EC2 takes about 45 minutes of your effort – but you will spend weeks and months after that fine-tuning your application and developing additional functionality; plan accordingly
- You are about to deal with 100s and 1000s of remote nodes… Things that worked on 10s of nodes often “mysteriously” don’t work at “cloud” scale. We were surprised by the number of configuration tweaks we had to make to run GridGain on 512 nodes on Amazon EC2 under load. Proven grid middleware is essential (quite obviously)
- You cannot rely on the environment being homogeneous – most likely it will not be: different CPUs, different amounts of memory, etc.
- Debugging problems at that scale requires a pretty deep understanding of distributed computing; the learning curve is very steep; trial and error is often the only solution; plan accordingly
- IP multicast will likely not work, or will work only with significant networking limitations. For example, you may not be able to get all the computers in your cloud into the same IP multicast group. QoS on IP multicast is unknown, at best.
- Traffic inside is very cheap or free – but traffic outside is expensive and can “get you” very quickly
- If you have to use the cloud all the time, the economics get worse, and in many cases it is cheaper to rent traditionally in a data center; that means that in many cases clouds are best used as an option to “outsource” peak loads – in such cases the economic effect can be dramatic
- Uptime and per-computer reliability are low – comprehensive failover support in grid middleware is a must
- Static IPs are not guaranteed – this kills automatic deployment for 90% of the grid frameworks out there
- Almost always plan on having multiple clouds, at least one internal and one or many external; you are always going to have data and processing that cannot cross the boundaries of your internal data center; without comprehensive support from grid middleware for location transparency (a.k.a. virtual cloud) this is a show stopper
- External clouds (i.e. NOT hosted by you) present the problem of sharing data:
  - Do you copy data to the cloud?
  - Usually there is no access to your local DB from the cloud
  - Can you legally copy the data?
  - Double storage of data locally and on the cloud? Synchronization?
  - Data affinity?
  - Local data is removed once the image is undeployed…
  - Etc., etc.
- Carefully think through the dev/qa/prod layout and how it is all organized – things get way hairy with multiple clouds, etc.
- Clunky (re)deployment of your application onto the cloud can slow the development process to a halt – support from grid middleware is absolutely essential here
- Often connections are one-directional, i.e. you can connect to the cloud – but NOT from the cloud back to you – comprehensive communication capabilities in grid middleware supporting one-directional connectivity and disjoint clouds are a must
- Clouds are implemented on top of hardware virtualization – make sure your grid middleware can dynamically provision images on demand, i.e. basically start the image (start paying) when certain conditions are met and stop the image (stop paying) when other conditions in your system are met
- Stick with an open source stack (no, this is not a plug) – having the source code helps greatly during debugging in such unusual situations
- Linear scalability can only be achieved in a controlled test environment (like in our recent test) – real-world applications will exhibit some sort of non-linear scalability; it is essential to have at least a ballpark number for the scalability and performance you expect when you run your application on the cloud – a battery of performance and scalability tests developed upfront is usually the best option
- Personal recommendation: use Amazon EC2/S3 services – the best offering at this point by a long, long mile
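
To make the point about always-on usage versus peak loads a bit more concrete, here is a minimal Python sketch of the break-even arithmetic. The prices are hypothetical placeholders picked only for illustration (they are not Amazon's or anybody else's actual rates), and the function names are mine; plug in real quotes before drawing any conclusions.

```python
# Hypothetical prices, chosen only for illustration; substitute real quotes.
CLOUD_PRICE_PER_NODE_HOUR = 0.10    # assumed pay-per-hour cloud price, USD
RENTED_PRICE_PER_NODE_MONTH = 50.0  # assumed flat data-center rental, USD
HOURS_PER_MONTH = 730


def monthly_cloud_cost(nodes: int, utilization: float) -> float:
    """Cost of paying per hour for `nodes` machines that are running
    only `utilization` (0.0 to 1.0) of the month."""
    return nodes * HOURS_PER_MONTH * utilization * CLOUD_PRICE_PER_NODE_HOUR


def monthly_rented_cost(nodes: int) -> float:
    """Flat cost of renting `nodes` machines for the whole month."""
    return nodes * RENTED_PRICE_PER_NODE_MONTH


def break_even_utilization() -> float:
    """Fraction of the month above which renting beats the cloud."""
    return RENTED_PRICE_PER_NODE_MONTH / (
        CLOUD_PRICE_PER_NODE_HOUR * HOURS_PER_MONTH
    )


if __name__ == "__main__":
    print(f"break-even utilization: {break_even_utilization():.0%}")
    for utilization in (0.05, 0.50, 1.00):
        cloud = monthly_cloud_cost(100, utilization)
        rented = monthly_rented_cost(100)
        print(f"{utilization:.0%} busy: cloud ${cloud:,.0f} vs rented ${rented:,.0f}")
```

With these assumed numbers, nodes that are busy only a small part of the month are far cheaper on the cloud, while nodes that run around the clock cost more than renting. Outbound traffic charges are left out of the sketch and would push the break-even point lower.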
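
The “start the image when certain conditions are met, stop it when other conditions are met” idea can be sketched as a small decision function. This is only an illustration in Python under my own assumptions: the `ScalingPolicy` name, the thresholds and the load metric are all hypothetical and are not GridGain or EC2 APIs. Actually starting and stopping images would go through whatever your cloud provider or middleware exposes.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical thresholds; not GridGain or EC2 settings."""
    start_above: float = 0.80  # start an image when average load exceeds this
    stop_below: float = 0.30   # stop an image when average load drops below this
    min_nodes: int = 1
    max_nodes: int = 16


def desired_node_count(current_nodes: int, avg_load: float,
                       policy: ScalingPolicy) -> int:
    """Return how many nodes should be running after this check.

    The gap between the two thresholds keeps the system from starting
    and stopping images on every small load fluctuation.
    """
    if avg_load > policy.start_above and current_nodes < policy.max_nodes:
        return current_nodes + 1  # start one more image (start paying)
    if avg_load < policy.stop_below and current_nodes > policy.min_nodes:
        return current_nodes - 1  # stop one image (stop paying)
    return current_nodes


if __name__ == "__main__":
    policy = ScalingPolicy()
    nodes = 4
    for load in (0.90, 0.95, 0.50, 0.10):
        nodes = desired_node_count(nodes, load, policy)
        print(f"load={load:.2f} -> run {nodes} node(s)")
```

Using two separate thresholds rather than one is a deliberate choice here: since you pay for every started image, you do not want a load hovering around a single threshold to flip images on and off.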
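
For the ballpark scalability number mentioned above, one simple starting point is Amdahl's law. To be clear, this is my choice of model for illustration, not something the tests described in this list were based on, and the 95% parallel fraction below is a made-up figure. A minimal Python sketch:

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Upper bound on speedup from Amdahl's law.

    `parallel_fraction` is the share of the work (0.0 to 1.0) that can be
    spread across nodes; the rest is assumed to run serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # 0.95 is a hypothetical parallel fraction, used only as an example.
    for nodes in (8, 64, 512):
        print(f"{nodes:>3} nodes: at most {amdahl_speedup(0.95, nodes):.1f}x")
```

Even with 95% of the work parallelizable, the bound stays below 20x no matter how many nodes you add, which is why it pays to write down the expected number before the first run and then compare it against what the performance tests actually measure.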
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)