Eric is the Editorial Manager at DZone, Inc. Eric has posted 804 posts at DZone.

The Director of Education at Cloudera Offers an Insider View of Hadoop

Last week, I attended a Hadoop tutorial presented in Durham, NC by Sarah Sproehnle, the Director of Educational Services at Cloudera.  The tutorial offered an informative, high-level history of the software framework that has been generating a lot of buzz in surprisingly different industries.  When I caught up with Sarah after the tutorial, she told me that one goal of its design was to cut through the buzz surrounding Hadoop, giving a diverse audience a sense of "what Hadoop is really about" and showing how they can use it.  I still had a few more questions about Hadoop, Big Data, and how Hadoop appeals to the Java community.  Here's what she had to say.

DZone:  How do you think that Hadoop (and Big Data processing / analytics) will impact the overall developer space over the next few years?

Sarah Sproehnle:  We're seeing a tremendous investment in developers moving from traditional back-end database development to the Hadoop space. Processes that used to be coded in PL/SQL, or that relied on large in-memory state, are now being written using Hadoop for data processing and HBase for real-time applications. A lot of applications that were built on top of databases, where developers struggled to fit non-relational paradigms into relational stores, are now being built more quickly and with access to data at any scale.

DZone: You covered some common implementations of Hadoop in your presentation - which of these do you think is the most innovative or interesting?

Sarah Sproehnle:  A lot of people use Hadoop to do complex data processing such as billing mediation and transaction reconciliation. Similarly, Hadoop is a popular tool for recommendation engines and predictive modeling. At the forefront, though, are people building real-time interactive applications on top of HBase. These applications drive data serving (such as user profiles or POIs) and also serve as the basis for incremental analytics, where businesses can monitor how their systems are behaving in real time.

DZone:  What are some cool tools (or uses) of Hadoop in development or coming up that Java developers should be aware of?

Sarah Sproehnle:  We're seeing a lot of interest in higher-level libraries that make Hadoop much more accessible to Java developers. For example, Crunch is a FlumeJava-inspired library. We've been hearing some very positive feedback from Java developers who want a lot of the mechanics of MapReduce taken care of but don't want to write in Hive or Pig.
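
For readers unfamiliar with the "mechanics of MapReduce" mentioned above, here is a minimal plain-Java sketch of the three phases (map, shuffle, reduce) applied to a word count. It is only an illustration: it runs in a single JVM and does not use the Hadoop or Crunch APIs, and the class and method names (WordCountSketch, map, shuffle, reduce, wordCount) are hypothetical names chosen for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map, shuffle, and reduce phases.
// Assumption: everything here is hypothetical example code, not Hadoop or Crunch API.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted, via TreeMap).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values collected for one key into a single count.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wire the three phases together for a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, buzz=1, data=1}
        System.out.println(wordCount(Arrays.asList("big data", "Big buzz")));
    }
}
```

In a real Hadoop job the framework distributes these phases across a cluster and handles grouping, sorting, and data movement itself; the driver-style wiring shown here is roughly the kind of plumbing that higher-level libraries aim to take off the developer's hands.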
Published at DZone with permission of its author, Eric Genesky.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Fahmeed Nawaz replied on Tue, 2012/06/12 - 11:05am

"This is needed less since the 2.2 version of the driver. The connection pool is now thread-aware and will use the same connection (in the thread) as long as the pool isn't depleted between operations."

Unless the thread calls db.requestDone(), how would this pool know it is safe to reuse the connection? Too much time between API calls by the thread?

John Smith replied on Tue, 2013/02/19 - 5:17am

This is actually the kind of information I have been trying to find. Thank you for writing this information.

