
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a REST API wrapper for the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and enjoys finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB and is not an employee of DZone and has posted 60 posts at DZone.

Neo4j on Heroku - Part 1

01.17.2012
On his blog Marko A. Rodriguez showed us how to make A Graph-Based Movie Recommender Engine with Gremlin and Neo4j.

In this two-part series, we are going to take his work from the Gremlin shell and put it on the web, using the Heroku Neo4j add-on and altering the Neovigator project for our use case. Heroku has a great article on their Dev Center on how to get an example Neo4j application up and running, and Michael Hunger shows you how to add JRuby extensions and provides sample code using the Neo4j.rb Gem by Andreas Ronge.

We are going to follow their recipe, but we are going to add a little spice. Instead of creating a small 2 node, 1 relationship graph, I am going to show you how to leverage the power of Gremlin and Groovy to build a much larger graph from a set of files.

Let’s start by cloning the Neoflix Sinatra application, and instead of installing and starting Neo4j locally, we are going to create a Heroku application, and add Neo4j.

 

git clone git@github.com:maxdemarzi/neoflix.git
cd neoflix
bundle install
heroku apps:create neoflix --stack cedar
heroku addons:add neo4j
git push heroku master


Let’s make sure that Neo4j was successfully added to our project:

$ heroku addons
logging:basic
neo4j:test
releases:basic


Great, there it is (if you are reading this in the future it might say neo4j:basic or neo4j:silver or something like that). So where is our Neo4j database exactly?

$ heroku config
GEM_PATH       => vendor/bundle/ruby/1.9.1
LANG           => en_US.UTF-8
NEO4J_HOST     => 70825a524.hosted.neo4j.org
NEO4J_INSTANCE => 70825a524
NEO4J_LOGIN    => xxxxxxxx
NEO4J_PASSWORD => yyyyyyyy
NEO4J_PORT     => 7014
NEO4J_REST_URL => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014/db/data
NEO4J_URL      => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014
PATH           => bin:vendor/bundle/ruby/1.9.1/bin:/usr/local/bin:/usr/bin:/bin
RACK_ENV       => production


The xs and ys are our username and password. We can use the address given in NEO4J_URL to take a look at the server. For part two, it would be wise to keep an eye on the “dashboard” as we create new nodes and relationships.

The Neoflix project layout:

neoflix.rb
public/movies.dat
public/users.dat
public/ratings.dat


Let’s take a look at the source code in neoflix.rb. We require our gems and use the NEO4J_URL variable to tell Neography how to reach the Neo4j server, falling back to a local server when the variable is not set.

require 'rubygems'
require 'neography'
require 'sinatra'

neo = Neography::Rest.new(ENV['NEO4J_URL'] || "http://localhost:7474")
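
Because NEO4J_URL packs the login, password, host and port into one string, it can be handy to pull the pieces apart, for example to print where the app is connecting without echoing the password. Here is a minimal sketch using Ruby's standard URI library. The helper name neo4j_settings is hypothetical and is not part of Neography or the Neoflix code, and the sample URL simply reuses the placeholder values from the heroku config output.

```ruby
require 'uri'

# Hypothetical helper (not part of Neography): split a NEO4J_URL-style
# string into its parts. Falls back to a local server when no URL is given,
# mirroring the fallback used in neoflix.rb.
def neo4j_settings(url = ENV['NEO4J_URL'])
  uri = URI.parse(url || 'http://localhost:7474')
  {
    host:     uri.host,
    port:     uri.port,
    login:    uri.user,
    password: uri.password
  }
end

settings = neo4j_settings('http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014')
puts "#{settings[:host]}:#{settings[:port]} as #{settings[:login]}"
```

Passing nil (or leaving NEO4J_URL unset) yields the localhost settings with no login or password.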


Then we create a route in Sinatra that will clear and populate the graph when we visit it.

get '/create_graph' do
  neo.execute_script("g.clear();")
  create_graph(neo)
end


We use a Gremlin shortcut to delete the graph before creating it.

g.clear();


The Backup and Restore feature of the Heroku Add-on lets you reload your graph as well, but the Neo4j instance will be down temporarily during the exchange.

If you want to permanently delete the Neo4j instance (once you are done with this example application) you can simply remove the heroku addon.

heroku addons:remove neo4j:test
Removing neo4j:test from neoflix...done.


Let’s see part of the create_graph method.

We do not want to create the graph if it already exists. So we check to see if there are any Movie nodes before starting.

 

def create_graph(neo)
  return if neo.execute_script("g.idx('vertices')[[type:'Movie']].count();").to_i > 0


Since we wiped everything clean, we set up automatic indexing on all vertices and all properties.

if neo.execute_script("g.indices;").empty?  
  neo.execute_script("g.createAutomaticIndex('vertices', Vertex.class, null);") 
end


We are going to create a lot of data, so we set our graph to commit every 1000 changes in an automatic transaction.

g.setMaxBufferSize(1000);


Here comes some magic. We do not have access to the file system of the server running our Neo4j instance, but since we have the full power of Groovy at our disposal, we simply grab the file from Sinatra instead. Anything you put in the public directory will be automatically served for you. The fields of movies.dat are delimited by “::” and the genres are delimited by “|”.

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
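
To make the file format concrete, here is a minimal Ruby sketch that splits one of these lines into its fields. The helper name parse_movie_line is hypothetical and only illustrates the format; the actual loading is done on the server by a Gremlin script.

```ruby
# Hypothetical helper: parse one line of movies.dat into a Hash.
def parse_movie_line(line)
  movie_id, title, genres = line.chomp.split('::')
  {
    movie_id: movie_id.to_i,
    title:    title,
    # A plain String separator is taken literally in Ruby, so | needs no escaping here.
    genres:   genres.split('|')
  }
end

movie = parse_movie_line("1::Toy Story (1995)::Animation|Children's|Comedy")
puts movie[:title]
puts movie[:genres].join(', ')
```

Note that Ruby's String#split treats a String argument literally, while Groovy's split takes a regular expression, which is why the pipe needs escaping in the Gremlin script.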


So for each line in our file, we are going to create a movie vertex and link it to one or more genres. We are sending this Gremlin script inside a Ruby String, so the backslash that escapes the | in the final script must itself be escaped at each level. As we go along, we also create vertices for the genres if they don’t already exist.

'http://neoflix.heroku.com/movies.dat'.toURL().eachLine { def line ->
  def components = line.split('::');
  def movieVertex = g.addVertex(['type':'Movie', 
                                 'movieId':components[0].toInteger(), 
                                 'title':components[1]]);
  components[2].split('\\\\|').each { def genera ->
    def hits = g.idx(Tokens.T.v)[[genera:genera]].iterator();
    def generaVertex = hits.hasNext() ? hits.next() : g.addVertex(['type':'Genera', 
                                                                   'genera':genera]);
    g.addEdge(movieVertex, generaVertex, 'hasGenera');
  }
};


If you are a Rubyist, you should be able to read that Groovy code, but let me point out a few things. In Groovy, a variable definition must either give a type name explicitly or use “def” in its place.

And this funky piece of code is the unfortunate result of layered escaping: the pipe character is escaped by a backslash for the regular expression, that backslash must be escaped in the Groovy string, and both backslashes must be escaped again in our Ruby String.

components[2].split('\\\\|').each { def genera ->
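
The layers can be peeled off one at a time in plain Ruby. This is only an illustration: the gsub step imitates what Groovy's string parsing does to a doubled backslash, and the constant names are made up for this sketch.

```ruby
# What we type in the Ruby source: four backslashes inside a double-quoted String.
IN_RUBY_STRING = "\\\\|"

# Ruby has already collapsed each \\ pair into one backslash, so the script
# sent to the server holds two backslashes followed by the pipe.
# Groovy collapses its own \\ pair next; this gsub only imitates that step.
SEEN_BY_REGEX = IN_RUBY_STRING.gsub("\\\\") { "\\" }

puts IN_RUBY_STRING.length   # 3 characters: backslash, backslash, pipe
puts SEEN_BY_REGEX.length    # 2 characters: backslash, pipe

# The remaining backslash tells the regex engine to treat | literally.
genres = "Animation|Children's|Comedy".split(Regexp.new(SEEN_BY_REGEX))
puts genres.inspect
```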


This next bit of code looks up the genre in our index, and if it doesn’t exist, it creates it.

def hits = g.idx(Tokens.T.v)[[genera:genera]].iterator();
def generaVertex = hits.hasNext() ? hits.next() : g.addVertex(['type':'Genera', 
                                                               'genera':genera]);


This construct, which looks like a Hash inside an Array inside an Array, is Gremlin’s way of querying the index. We are telling it to return a node if it has a property genera that matches the genera variable we parsed after splitting the components[2] field.

g.idx(Tokens.T.v)[[genera:genera]].iterator();
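
The look-up-or-create pattern is easy to see in a toy model. The sketch below is plain Ruby with a Hash standing in for the index; the ToyGraph class and its methods are hypothetical and have nothing to do with the real Gremlin or Neography APIs.

```ruby
# Toy in-memory stand-in for the graph: an index Hash, a vertex list and an edge list.
class ToyGraph
  attr_reader :vertices, :edges

  def initialize
    @vertices = []
    @edges = []
    @genera_index = {} # genera name => vertex
  end

  def add_vertex(properties)
    @vertices << properties
    properties
  end

  # Return the indexed genera vertex, creating it only the first time it is seen.
  def find_or_create_genera(name)
    @genera_index[name] ||= add_vertex(type: 'Genera', genera: name)
  end

  # Add a movie vertex and link it to each of its genera vertices.
  def add_movie(movie_id, title, generas)
    movie = add_vertex(type: 'Movie', movieId: movie_id, title: title)
    generas.each do |name|
      @edges << [movie, find_or_create_genera(name), 'hasGenera']
    end
    movie
  end
end

graph = ToyGraph.new
graph.add_movie(1, 'Toy Story (1995)', ['Animation', "Children's", 'Comedy'])
graph.add_movie(2, 'Jumanji (1995)', ['Adventure', "Children's", 'Fantasy'])
puts graph.vertices.size # 2 movies plus 5 distinct genera vertices
puts graph.edges.size    # one edge per movie/genera pair
```

Both movies share a single Children's vertex, which is exactly what the index lookup buys us in the real graph.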


We do this a few more times to load the users and ratings into our graph and end with this:

g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);


This commits any items left over in our transaction buffer.
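
To see why that final call matters, here is a toy model of a buffered transaction in plain Ruby. It only counts changes and does not talk to Neo4j. The BufferedWriter class is hypothetical, and it assumes the buffer commits exactly when it reaches the configured size, which is a simplification of what the real graph does.

```ruby
# Toy model of a buffered transaction; it only counts, it does not talk to Neo4j.
class BufferedWriter
  attr_reader :commits, :pending

  def initialize(max_buffer_size)
    @max_buffer_size = max_buffer_size
    @pending = 0
    @commits = 0
  end

  # Record one change and commit automatically once the buffer is full.
  def change!
    @pending += 1
    commit! if @pending >= @max_buffer_size
  end

  # Commit whatever is buffered, playing the role of the final stopTransaction call.
  def commit!
    return if @pending.zero?
    @commits += 1
    @pending = 0
  end
end

writer = BufferedWriter.new(1000)
2500.times { writer.change! }
puts writer.pending # 500 changes are still waiting in the buffer
writer.commit!
puts writer.commits # 3: two automatic commits plus the final one
```

Without the last commit, the 500 buffered changes in this model would never be written.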

In Part Two, we’ll bring up our Heroku app, load the data, possibly add movie posters from a third-party API, and visualize some of the implicit relationships in the graph as outlined in the original blog post. I’ll probably also do a Part Three, which will use the fresh-off-the-presses CSV File Importer to reload the graph with a bigger set of movie data using Heroku. In between, however, I think it’s time we looked at Neo4j Spatial. You’ll know when new posts are published by following me on Twitter.


Source: http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part-one/

Published at DZone with permission of Max De Marzi, author and DZone MVB.
