
Nikita Ivanov is a founder and CEO of GridGain Systems – developer of one of the most innovative real-time big data platforms in the world. He has almost 20 years of experience in software development, a vision and pragmatic view of where development technology is going, and high quality standards in software engineering and entrepreneurship. Nikita is a DZone MVB, is not an employee of DZone, and has posted 27 posts at DZone.

20 Real-Life Challenges of Cloud Computing

We’ve done extensive work on cloud computing (i.e., deploying Java grid applications in cloud environments) here at GridGain, and that work comes with a distinct set of problems and challenges, some related to cloud computing in general and some to the grid middleware you run on it.

Here is a short list, in no particular order, that I have accumulated over the past year; it drives many of the improvements to GridGain that are currently in the works:

  1. Most likely you do NOT need cloud computing – but if you do, you would know it for sure by now; people who have legitimate technical and business use cases for cloud computing have been trying to do it internally for many years
  2. The best way to think about cloud computing is “Data Center with API” – that should clarify most of the questions…
  3. Creating the image for something like Amazon EC2 is worth about 45 minutes of your effort – but you will spend weeks and months after that fine-tuning your application and developing additional functionality; plan accordingly
  4. You are about to deal with 100s and 1000s of remote nodes… Things that worked on 10s of nodes often “mysteriously” don’t work at “cloud” scale. We were surprised by the number of configuration tweaks we had to make to run GridGain on 512 nodes on Amazon EC2 under load. Proven grid middleware is essential (quite obviously)
  5. You cannot rely on the environment being homogeneous – most likely it will not be: different CPUs, different amounts of memory, etc.
  6. Debugging problems at that scale requires a pretty deep understanding of distributed computing; the learning curve is very steep; trial and error is often the only solution; plan accordingly
  7. IP multicast will likely not work, or will work with significant networking limitations. For example, you may not get all the computers in your cloud into the same IP multicast group. QoS on IP multicast is unknown at best.
  8. Traffic inside the cloud is very cheap or free – but traffic to and from the outside is expensive and can “get you” very quickly
  9. If you have to use the cloud all the time, the economics get worse, and in many cases it is cheaper to rent traditionally in a data center; that means that in many cases clouds are best used as an option to “outsource” peak loads – in such cases the economic effect can be dramatic
  10. Uptime and per-computer reliability are low – comprehensive failover support in the grid middleware is a must
  11. Static IPs are not guaranteed – this kills automatic deployment for 90% of the grid frameworks out there
  12. Almost always plan on having multiple clouds, at least one internal and one or more external; you are always going to have data and processing that cannot cross the boundaries of your internal data center; without comprehensive support from grid middleware for location transparency (a.k.a. virtual cloud) this is a show stopper
  13. External clouds (i.e. hosted NOT by you) present the problem of sharing data:
    • Do you copy data to the cloud?
    • Usually no local-DB access from the cloud
    • Can you legally copy the data?
    • Double storage of data locally and on the cloud? Synchronization?
    • Security?
    • Data affinity?
    • Local data is removed once the image is undeployed…
    • Etc, etc.
  14. Carefully think through the dev/QA/prod layout and how it is all organized – things get way hairy with multiple clouds, etc.
  15. Clunky (re)deployment of your application onto the cloud can slow the development process to a halt – support from grid middleware is absolutely essential here
  16. Often connections are one-directional, i.e. you can connect to the cloud – but NOT from the cloud back to you – comprehensive communication capabilities in the grid middleware, supporting one-directional connectivity and disjoint clouds, are a must
  17. Clouds are implemented on top of hardware virtualization – make sure your grid middleware can dynamically provision such images on demand, i.e. basically start the image (start paying) when certain conditions are met and stop the image (stop paying) when other conditions in your system are met
  18. Stick with an open source stack (no, this is not a plug) – having the source code helps greatly during debugging in such unusual situations
  19. Linear scalability can only be achieved in a controlled test environment (like in our recent test) – real-world applications will exhibit some sort of non-linear scalability; it is essential to have at least a ballpark number for the scalability and performance you expect when you run your application on the cloud – a battery of performance and scalability tests developed up front is usually the best option
  20. Personal recommendation: use Amazon EC2/S3 services – the best offering at this point by a long, long mile
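To make item 17 a bit more concrete, here is a minimal sketch of the kind of start/stop rule it describes: start images when the job backlog grows, stop them when it shrinks. Everything in it is hypothetical and for illustration only. `ProvisioningPolicy`, the backlog-per-node target, and the node bounds are my assumptions; they are not GridGain or Amazon EC2 APIs, and a real setup would plug the result into whatever start/stop calls your middleware or cloud provider offers.

```java
// Hypothetical illustration: these names are not part of GridGain or any cloud API.
final class ProvisioningPolicy {
    private final int jobsPerNode; // assumed backlog one node is expected to absorb
    private final int minNodes;    // never run fewer nodes than this
    private final int maxNodes;    // never pay for more nodes than this

    ProvisioningPolicy(int jobsPerNode, int minNodes, int maxNodes) {
        if (jobsPerNode <= 0 || minNodes < 0 || maxNodes < minNodes) {
            throw new IllegalArgumentException("invalid policy bounds");
        }
        this.jobsPerNode = jobsPerNode;
        this.minNodes = minNodes;
        this.maxNodes = maxNodes;
    }

    /** Node count wanted for the given backlog, clamped to [minNodes, maxNodes]. */
    int desiredNodes(int pendingJobs) {
        int backlog = Math.max(0, pendingJobs);
        // Ceiling division done in long arithmetic to avoid int overflow.
        long needed = ((long) backlog + jobsPerNode - 1) / jobsPerNode;
        return (int) Math.max(minNodes, Math.min(maxNodes, needed));
    }

    /** Positive: start that many images. Negative: stop that many. Zero: no change. */
    int delta(int runningNodes, int pendingJobs) {
        return desiredNodes(pendingJobs) - runningNodes;
    }

    public static void main(String[] args) {
        ProvisioningPolicy policy = new ProvisioningPolicy(50, 2, 64);
        // 1000 pending jobs at 50 per node asks for 20 nodes; 4 are already running.
        System.out.println("images to start: " + policy.delta(4, 1000));
    }
}
```

The upper bound matters as much as the rule itself: since starting an image means starting to pay, a cap keeps a runaway backlog from turning into a runaway bill.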
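For the ballpark number mentioned in item 19, one common way to put a figure on scalability is to compare a run on a small baseline grid with a run on the larger one and compute speedup and parallel efficiency. The sketch below uses the standard definitions (speedup is baseline time divided by measured time; efficiency is speedup divided by the ratio of node counts). The class name, the sample timings, and the 0.7 threshold are made up for illustration and are not measurements from our tests.

```java
// Hypothetical helper for a scalability test battery; all numbers are illustrative.
final class ScalabilityCheck {
    /** Speedup of the measured run relative to the baseline run. */
    static double speedup(double baselineSeconds, double measuredSeconds) {
        if (baselineSeconds <= 0 || measuredSeconds <= 0) {
            throw new IllegalArgumentException("run times must be positive");
        }
        return baselineSeconds / measuredSeconds;
    }

    /** Parallel efficiency: 1.0 means perfectly linear scaling from the baseline. */
    static double efficiency(int baselineNodes, double baselineSeconds,
                             int nodes, double measuredSeconds) {
        if (baselineNodes <= 0 || nodes <= 0) {
            throw new IllegalArgumentException("node counts must be positive");
        }
        double nodeRatio = (double) nodes / baselineNodes;
        return speedup(baselineSeconds, measuredSeconds) / nodeRatio;
    }

    /** True when the measured run scales at least as well as the agreed ballpark. */
    static boolean meetsTarget(int baselineNodes, double baselineSeconds,
                               int nodes, double measuredSeconds, double minEfficiency) {
        return efficiency(baselineNodes, baselineSeconds, nodes, measuredSeconds) >= minEfficiency;
    }

    public static void main(String[] args) {
        // Made-up timings: 1200 s on 8 nodes, 200 s on 64 nodes.
        double e = efficiency(8, 1200.0, 64, 200.0);
        System.out.println("efficiency = " + e);
        System.out.println("meets 0.7 target: " + meetsTarget(8, 1200.0, 64, 200.0, 0.7));
    }
}
```

With these sample numbers the job runs 6 times faster on 8 times the nodes, so efficiency comes out at 0.75. Agreeing on a threshold like this up front gives the test battery something concrete to pass or fail against.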




Published at DZone with permission of Nikita Ivanov, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Vadim Pesochinskiy replied on Fri, 2008/09/05 - 3:30pm

Yes, multicast does not work well in enterprise environments. What would you recommend for discovery/load balancing and remoting/serialization? How do you analyze the load of the grid? We have a grid of ~500 services splitting large computing tasks, and I can see the load using Windows Performance Monitor (perfmon.msc). We achieved nearly 100% load when the granularity is right, but that took a long time to tune, and the visualization of CPU load helped. I bet you cannot get similar visualization of the load on Amazon.

Patrick Kerpan replied on Sun, 2008/11/09 - 1:51am

Excellent overview of a large percentage of the issues that are prevalent in utilizing cloud computing today.

 Probably the only item I disagree on is "Most likely you do NOT need cloud computing – but if you do you would know it for sure by now".  Even people who haven't been pursuing dynamic data center initiatives for years can quickly come to appreciate and value the "elasticity" of clouds.

 Many of the items you list can be mitigated or eliminated by CohesiveFT's recently announced VPN-Cubed (think of it as a Cloud VPN).

 As you make quite clear above, one's "mileage may vary" given all the variables, but VPN-Cubed (and its associated services from the Elastic Server platform) can mitigate or eliminate problems numbered #3, #7, #11, #12, #13, #14, #16.

 Would be happy to chat with you GridGain folks about the use-cases.


Pat K
CTO, CohesiveFT

Sirikant Noori replied on Fri, 2012/03/30 - 12:50pm


Nikita, debugging issues have always been frustrating, and understanding them well requires learning distributed computing. In this area, trial and error is necessary and is the only way to come up with something great.

The second thing that, in my opinion, helps meet these challenges is thinking of the cloud as a data center with an API, which answers most of the questions.



Marc Wallbergg replied on Fri, 2012/05/18 - 6:36am

If you have to use the cloud all the time, the economics get worse, and in many cases it is cheaper to rent traditionally in a data center; that means that in many cases clouds are best used as an option to “outsource” peak loads – in such cases the economic effect can be dramatic
