Dmitriy Setrakyan manages daily operations at GridGain Systems and brings over 12 years of experience spanning all areas of application software development, from design and architecture to team management and quality assurance. His experience includes architecture and leadership in the development of distributed middleware platforms, financial trading systems, CRM applications, and more.
With the emergence of cloud computing, the terms "Hybrid Topology" and "Hybrid
Deployment" are becoming more and more common. Let me first start with
a definition: a "Hybrid Topology" is formed when you join different cloud
deployments into one connected cluster. For example, your local data
center can form a joint cluster with several images deployed on a cloud.
Here is a simple use case.
Let's say that you have an application deployed in your local data
center that stays idle most of the time and peaks for only 3 hours
a day, say from 4pm to 7pm. To make it cost efficient, you want to
keep as few nodes as possible most of the time and, once load
peaks, automatically detect that and bring up a few more
nodes. These new nodes that you bring up to help your existing cluster
may be in a totally different data center, or a different cloud (e.g.
Amazon EC2 or GoGrid), yet you want them to join your cluster and
participate in load balancing, job collision resolution, job execution,
and so on.
Here are some challenges to consider when setting up hybrid clouds:
1. On-Demand Startup and Shutdown

Your infrastructure must be able to start up and shut down cloud nodes on
demand. Usually you should implement a policy that listens to
some of your application's characteristics and reacts to them by starting
or stopping cloud nodes. In the simplest case, you can react to CPU
utilization: start new nodes if the main cloud gets overloaded and
stop nodes if it gets underloaded.
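A minimal sketch of such a CPU-based policy might look like the following. The class name and thresholds are illustrative, not part of any specific product:

```java
// Sketch of a CPU-based scaling policy. CpuScalePolicy is a hypothetical
// name; a real deployment would feed it load metrics and wire its decisions
// to the cloud provider's start/stop API.
public class CpuScalePolicy {
    private final double startThreshold; // e.g. 0.8 = 80% CPU
    private final double stopThreshold;  // e.g. 0.2 = 20% CPU

    public CpuScalePolicy(double startThreshold, double stopThreshold) {
        this.startThreshold = startThreshold;
        this.stopThreshold = stopThreshold;
    }

    /** Returns +1 to start a node, -1 to stop one, 0 to do nothing. */
    public int decide(double avgCpuLoad) {
        if (avgCpuLoad > startThreshold) return 1;  // overloaded: add a node
        if (avgCpuLoad < stopThreshold)  return -1; // underloaded: remove a node
        return 0;                                   // within the comfort band
    }
}
```

In practice you would smooth the CPU samples over a window before deciding, so a momentary spike does not trigger an expensive node startup.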
2. Cloud-Based Node Discovery

The main challenge in setting up regular discovery protocols on clouds is
that IP Multicast is not enabled by most cloud vendors (including
Amazon and GoGrid). Your node discovery protocol has to work
over TCP. However, you do not know the IP addresses of new nodes
started on the cloud either. To mitigate that, you can utilize the
cloud's storage infrastructure, such as S3 or SimpleDB on Amazon, to
store the IP addresses of new nodes for automatic node detection.
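The idea can be sketched as follows. The `AddressStore` interface is a hypothetical abstraction standing in for S3 or SimpleDB; the in-memory implementation exists only so the sketch is self-contained:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of TCP discovery via shared cloud storage: each node publishes
 *  its own address, then reads everyone else's. */
public class StorageDiscovery {
    /** Hypothetical abstraction over a shared store such as S3 or SimpleDB. */
    public interface AddressStore {
        void put(String address);   // register this node's IP:port
        List<String> list();        // read all registered addresses
    }

    /** Trivial in-memory stand-in for the real cloud store. */
    public static class InMemoryStore implements AddressStore {
        private final List<String> addrs = new ArrayList<>();
        public synchronized void put(String address) { addrs.add(address); }
        public synchronized List<String> list() { return new ArrayList<>(addrs); }
    }

    /** A starting node publishes itself, then learns its peers. */
    public static List<String> join(AddressStore store, String selfAddress) {
        store.put(selfAddress);
        List<String> peers = store.list();
        peers.remove(selfAddress); // everyone else is a peer to dial over TCP
        return peers;
    }
}
```

A real implementation would also have to expire stale entries, since nodes that crash cannot remove their own addresses from the store.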
3. One-Directional Communication

One of the challenges in big enterprises is opening up new ports in
firewalls for connectivity with clouds. Quite often you will only be
allowed to make outgoing connections to a cloud. Your middleware
should support such cases. On top of that, sometimes you may run into
the scenario of *disconnected clouds*, where cloud A can talk to cloud B,
and cloud B can talk to cloud C, but cloud A cannot talk to cloud C
directly. Ideally, in such a case, cloud A should be able to talk to
cloud C through cloud B.
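Relaying through an intermediate cloud amounts to finding a delivery path over the known connectivity links. A small sketch, with illustrative names (no real product API is assumed):

```java
import java.util.*;

/** Sketch of routing a message through an intermediate cloud when two
 *  clouds cannot reach each other directly. */
public class CloudRouter {
    private final Map<String, Set<String>> links = new HashMap<>();

    /** Record that 'from' can open a connection to 'to'. */
    public void addLink(String from, String to) {
        links.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** Breadth-first search for a delivery path, e.g. A -> B -> C. */
    public List<String> route(String src, String dst) {
        Map<String, String> prev = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(src);
        prev.put(src, src);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(dst)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = dst; !n.equals(src); n = prev.get(n))
                    path.addFirst(n);
                path.addFirst(src);
                return path;
            }
            for (String next : links.getOrDefault(cur, Set.of()))
                if (!prev.containsKey(next)) {
                    prev.put(next, cur);
                    queue.add(next);
                }
        }
        return List.of(); // no path: the clouds are fully disconnected
    }
}
```

Note that the links are directional on purpose: with outgoing-only firewall rules, A reaching B does not imply B can reach A.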
4. Latency

Communication between clouds may take longer than communication between
nodes within the same cloud. Often, communication within the same cloud is
in turn significantly slower than communication within a local data center.
Your middleware layer should properly react to and handle such delays
without breaking the cluster into pieces.
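One common way to tolerate slow links is to derive the failure-detection timeout from the latencies actually observed, rather than using a fixed value. A minimal sketch, assuming a simple exponential moving average (the class name and smoothing factor are illustrative):

```java
/** Sketch: derive a failure-detection timeout from observed heartbeat
 *  round-trip times, so a slow inter-cloud link does not cause healthy
 *  nodes to be declared dead and the cluster to split. */
public class AdaptiveTimeout {
    private double avgRttMs;           // smoothed round-trip time
    private final double safetyFactor; // how many RTTs before suspecting a node

    public AdaptiveTimeout(double initialRttMs, double safetyFactor) {
        this.avgRttMs = initialRttMs;
        this.safetyFactor = safetyFactor;
    }

    /** Feed in each heartbeat round-trip; exponential moving average. */
    public void sample(double rttMs) {
        avgRttMs = 0.75 * avgRttMs + 0.25 * rttMs;
    }

    /** A node is only suspected dead after several average round-trips. */
    public double timeoutMs() {
        return avgRttMs * safetyFactor;
    }
}
```

With this in place, nodes on a fast LAN get tight timeouts while nodes across a slow inter-cloud link automatically get more slack.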
5. Reliability and Atomicity

Many operations on the cloud are unreliable and non-transactional. For
example, if you store something in Amazon S3, there is no
guarantee that another application can read the stored data right away.
There is also no way to ensure that data is not overwritten, or to
implement some sort of file locking. The only way to provide such
functionality is at the application or middleware layer.
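At the application layer, the usual workaround for such eventual consistency is to retry a read with backoff until the expected value becomes visible. A sketch, assuming a generic `Supplier`-based read (no specific storage SDK is implied):

```java
import java.util.function.Supplier;

/** Sketch of application-level handling of eventual consistency:
 *  retry a read until the expected value becomes visible, or give up. */
public class ConsistentRead {
    public static String readWithRetry(Supplier<String> read, String expected, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String val = read.get();
            if (expected.equals(val))
                return val;            // the write has propagated
            try {
                Thread.sleep(50L << i); // exponential backoff before retrying
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;                   // caller must handle the stale case
    }
}
```

This only papers over the read-after-write gap; anything resembling locking or compare-and-swap still has to be built on a component that actually offers atomicity.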
There are certainly other
things that could go wrong, but these turned out to be the main
challenges we had to resolve while working on GridGain 3.0.
Some of the cool features we plan to support are On Demand Startup and Shutdown Policies (including Cost-based policies) and Disconnected Clouds. GridGain 3.0 is planned to be released this summer, so stay tuned!