
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby gem, a REST API wrapper for the Neo4j graph database. He is addicted to learning new things and loves a challenge and finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 60 posts at DZone. You can read more from him at his website.

Neo4j on Heroku - Part 2

01.17.2012

We are picking up where we left off in Neo4j on Heroku - Part One, so make sure you’ve read it or you’ll be a little lost. So far, we have cloned the Neoflix project, set up our Heroku application and added the Neo4j add-on to it. We are now ready to populate our graph.

Bring up two browser windows. In the first, open your Neo4j instance running on Heroku (the heroku config command shows its URL):

$ heroku config
NEO4J_URL      => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014


In the second, go to the create_graph route of your app. If you named your app neoflix, that would be neoflix.herokuapp.com/create_graph.

This will run the create_graph method and you’ll see nodes and relationships being created on the Neo4j Dashboard. It’s just over a million relationships, so it will take a few minutes. There are faster ways to load data into Neo4j (wait for part three of this series), but this will work in our case.



 

The fine folks at themoviedb.org provide an API for developers who want to integrate movie and cast data, along with posters or fan art. You can request an API key, and they’ll respond very quickly. Let’s add the key to our Heroku config:

$ heroku config:add TMDB_KEY=XXXXXXX
Adding config vars and restarting app... done, vXX
  TMDB_KEY => XXXXXXX


To test locally, set the same variable in your shell:

export TMDB_KEY=XXXXXXX
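Since get_poster depends on TMDB_KEY being set, it can help to check for the key up front instead of discovering a missing key on the first poster lookup. Here is a minimal sketch, assuming you want a clear error when the variable is absent. The tmdb_key helper is hypothetical and not part of the Neoflix code.

```ruby
# Hypothetical helper: read the TMDb key from the environment (or any
# Hash-like object, which makes it easy to test) and fail fast with a
# clear message if it has not been configured.
def tmdb_key(env = ENV)
  env.fetch('TMDB_KEY') do
    raise KeyError, "TMDB_KEY is not set; run `heroku config:add TMDB_KEY=...` " \
                    "on Heroku or `export TMDB_KEY=...` locally"
  end
end
```

You would then write Tmdb.api_key = tmdb_key in place of reading ENV['TMDB_KEY'] directly.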


We can now use this environment variable in our application, along with the ruby-tmdb gem by Aaron Gough:

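require 'cgi'
require 'ruby-tmdb'

# Configure the ruby-tmdb gem with the key stored in the Heroku config
Tmdb.api_key = ENV['TMDB_KEY']
Tmdb.default_language = "en"

# Look up the movie on themoviedb.org by title and return an HTML snippet
# with its poster (linked to movie.url), tagline, rating and overview.
def get_poster(data)
  movie = TmdbMovie.find(:title => CGI::escape(data["title"] || ""), :limit => 1)
  if movie.empty?
    "No Movie Poster found"
  else
    # Single-quoted HTML attributes, so they don't terminate the Ruby string
    "<a href='#{movie.url}' target='_blank'><img src='#{movie.posters.first.url}'></a>
     <h3>#{movie.tagline}</h3>
     <p>Rating: #{movie.rating} <br />
        Rated: #{movie.certification}</p><p>#{movie.overview}</p>"
  end
end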
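The tagline and overview come straight from the API and are interpolated into HTML, so a stray < or & would reach the page unescaped. Below is a minimal sketch of an escaping variant that uses CGI.escapeHTML from Ruby's standard library. A Struct stands in for the movie object, and poster_url plays the role of movie.posters.first.url. Movie and poster_html are hypothetical names. The real object comes from TmdbMovie.find.

```ruby
require 'cgi'

# Hypothetical stand-in for the object returned by TmdbMovie.find;
# only the fields used by get_poster are modeled.
Movie = Struct.new(:url, :poster_url, :tagline, :rating, :certification, :overview)

# Build the poster snippet, escaping every API-supplied value so quotes or
# angle brackets in a tagline or overview cannot break the markup.
def poster_html(movie)
  h = ->(value) { CGI.escapeHTML(value.to_s) }
  "<a href='#{h.(movie.url)}' target='_blank'><img src='#{h.(movie.poster_url)}'></a>" \
    "<h3>#{h.(movie.tagline)}</h3>" \
    "<p>Rating: #{h.(movie.rating)} <br />Rated: #{h.(movie.certification)}</p>" \
    "<p>#{h.(movie.overview)}</p>"
end
```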


We will visualize the graph with Neovigator, as I showed you earlier, but instead of retrieving the properties of our node (they’re pretty bland), we’ll request a movie poster.


We will not visualize the explicit relationships we created. Instead, we will visualize the implicit movie recommendation graph. Let’s take a look at that method now:

def get_recommendations(neo, node_id)
  rec = neo.execute_script("m = [:];
                            x = [] as Set;
                            v = g.v(node_id);

                            v.
                            out('hasGenera').
                            aggregate(x).
                            back(2).
                            inE('rated').
                            filter{it.getProperty('stars') > 3}.
                            outV.
                            outE('rated').
                            filter{it.getProperty('stars') > 3}.
                            inV.
                            filter{it != v}.
                            filter{it.out('hasGenera').toSet().equals(x)}.
                            groupCount(m){\"${it.id}:${it.title.replaceAll(',',' ')}\"}.iterate();

                            m.sort{a,b -> b.value <=> a.value}[0..24];",
                            {:node_id => node_id.to_i})

  return [{"id" => node_id,
           "name" => "No Recommendations",
           "values" => [{"id" => "#{node_id}",
                         "name" => "No Recommendations"}]
          }] if rec == "{}"

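  # Strip both surrounding braces and split on commas (commas in titles were
  # replaced earlier); split each entry on its first colon only, so titles
  # such as "Star Wars: Episode IV" keep their colons.
  values = rec[1..-2].split(',').collect{ |v| {:id => v.split(':', 2)[0].strip,
                                               :name => v.split(':', 2)[1] } }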

  [{"id" => node_id ,"name" => "Recommendations","values" => values }]
end


Let’s go through the code. In Groovy, [:] is an empty map (the equivalent of a Ruby Hash). A map is ultimately what we want to return, so we create an empty one now and fill it later. Then we create a Set “x”, which is an unordered collection (see Groovy List for ordered collections). Finally, we get our starting vertex and assign it to “v”.

 

m = [:];
x = [] as Set;
v = g.v(node_id);
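For readers coming from Ruby, here are the same three starting values in plain Ruby. This is only an analogy to illustrate the Groovy types, not code the app runs, and the node id 42 is made up. Notice that Set equality ignores order, which the genre comparison later relies on.

```ruby
require 'set'

m = {}          # Groovy [:]        -> Ruby Hash
x = Set.new     # Groovy [] as Set  -> Ruby Set (unordered, no duplicates)
v = 42          # stand-in for g.v(node_id); a hypothetical node id

x << 'Drama' << 'Comedy' << 'Drama'   # the duplicate 'Drama' is ignored
same = Set['Comedy', 'Drama'] == x    # true: element order does not matter
```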


We fill the empty Set with the genres of our movie (reached through its hasGenera relationships), and later we compare the genres of other movies against it.

v.
out('hasGenera').
aggregate(x).
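In plain Ruby terms, this step just collects the starting movie's genres into a set. Here is a minimal sketch over a hypothetical in-memory genre table. GENRES and aggregate_genres are illustrative names, not part of Neoflix.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the hasGenera relationships:
# movie title => genres it belongs to.
GENRES = {
  'Toy Story' => ['Animation', 'Comedy'],
  'Shrek'     => ['Comedy', 'Animation'],
  'Heat'      => ['Action', 'Crime']
}.freeze

# Analogue of v.out('hasGenera').aggregate(x): gather every genre
# reachable from the starting movie into a Set.
def aggregate_genres(movie, genres = GENRES)
  Set.new(genres.fetch(movie, []))
end
```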


We then go back two steps, which puts us at our starting movie again, and follow the incoming rated relationships from users who gave the movie more than 3 stars.

back(2).
inE('rated').
filter{it.getProperty('stars') > 3}.
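The same idea applies to this step and to the outV that starts the next one. From the starting movie's incoming ratings, keep those above 3 stars and collect the users. The [user, movie, stars] triples and the fans_of helper are hypothetical stand-ins for the rated relationships.

```ruby
# Hypothetical 'rated' relationships as [user, movie, stars] triples.
RATINGS = [
  ['alice', 'Toy Story', 5],
  ['bob',   'Toy Story', 3],
  ['carol', 'Toy Story', 4],
  ['alice', 'Shrek',     4]
].freeze

# Analogue of back(2).inE('rated').filter{stars > 3}.outV:
# the users who gave the starting movie more than 3 stars.
def fans_of(movie, ratings = RATINGS)
  ratings.select { |_user, m, stars| m == movie && stars > 3 }
         .map(&:first)
         .uniq
end
```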


From these users, we step out to find all the movies they have also rated with more than 3 stars.

outV.
outE('rated').
filter{it.getProperty('stars') > 3}.


We keep only the movies that are not our starting movie (remember, we assigned it to the variable “v”).

inV.
filter{it != v}.
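Continuing the in-memory analogy, this sketch gathers every movie those users rated above 3 stars and drops the starting movie. Duplicates are kept on purpose, because each traversal path counts as one vote when groupCount runs. RATINGS and liked_by are hypothetical.

```ruby
# Hypothetical 'rated' relationships as [user, movie, stars] triples.
RATINGS = [
  ['alice', 'Toy Story', 5],
  ['alice', 'Shrek',     4],
  ['alice', 'Heat',      2],
  ['carol', 'Toy Story', 4],
  ['carol', 'Shrek',     5],
  ['carol', 'Cars',      4]
].freeze

# Analogue of outE('rated').filter{stars > 3}.inV.filter{it != v}:
# every movie the given users rated above 3 stars, minus the starting movie.
# One entry per path, so a movie liked by two users appears twice.
def liked_by(users, start_movie, ratings = RATINGS)
  ratings.select { |user, movie, stars| users.include?(user) && stars > 3 && movie != start_movie }
         .map { |_user, movie, _stars| movie }
end
```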


…and we check that these movies have exactly the same genres as our starting movie (remember, we filled the Set “x” with them).

filter{it.out('hasGenera').toSet().equals(x)}.
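This check requires the genre sets to be equal, not merely overlapping, so a movie with one extra genre is dropped. Here is a small Ruby sketch of the same test. GENRES and same_genres are hypothetical.

```ruby
require 'set'

# Hypothetical genre table for a few candidate movies.
GENRES = {
  'Toy Story' => ['Animation', 'Comedy'],
  'Shrek'     => ['Comedy', 'Animation'],
  'Cars'      => ['Animation', 'Comedy', 'Family']
}.freeze

# Analogue of filter{it.out('hasGenera').toSet().equals(x)}: keep a movie
# only if its genre set equals the starting movie's set exactly.
def same_genres(candidates, start_genres, genres = GENRES)
  candidates.select { |movie| Set.new(genres.fetch(movie, [])) == start_genres }
end
```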


groupCount does what it sounds like: it counts how many times each movie shows up and stores the counts in the map “m” we created earlier. Since we want the id, title and count, we do a little string wrangling to build a key containing both the id and the title (minus commas… I’ll tell you why in a minute), and then we call iterate(). The Gremlin shell iterates pipelines automatically for you, but a Gremlin script sent over the REST API is not iterated unless you do it yourself. One day you’ll be pulling your hair out trying to figure out what’s wrong, and you’ll curse “iterate” once you figure it out…

groupCount(m){\"${it.id}:${it.title.replaceAll(',',' ')}\"}.iterate();
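Outside of Gremlin, groupCount amounts to a counting hash. Below is a minimal Ruby sketch, assuming each candidate arrives as an [id, title] pair (a hypothetical shape). It builds the same “id:title” keys, with commas replaced by spaces. group_count is a hypothetical name.

```ruby
# Analogue of groupCount(m){"${it.id}:${it.title.replaceAll(',',' ')}"}:
# count each candidate (one entry per traversal path), keyed by "id:title"
# with commas swapped for spaces so the key can later be split on commas.
def group_count(candidates)
  candidates.each_with_object(Hash.new(0)) do |(id, title), counts|
    counts["#{id}:#{title.tr(',', ' ')}"] += 1
  end
end
```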


Here we sort our map by count in descending order (comparing b.value to a.value puts the larger counts first) and take the top 25 entries.

m.sort{a,b -> b.value <=> a.value}[0..24];",
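The Groovy comparator sorts by count in descending order. Here is the equivalent in Ruby, written as a hypothetical top_entries helper that works on a Hash of key => count.

```ruby
# Analogue of m.sort{a,b -> b.value <=> a.value}[0..24]:
# order [key, count] pairs by count, highest first, and keep at most `limit`.
def top_entries(counts, limit = 25)
  counts.sort { |a, b| b[1] <=> a[1] }.first(limit)
end
```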


Since Neo4j will execute this script many times over, you want to parameterize it. node_id is passed in as a parameter rather than pasted into the script text, so the script only has to be parsed once.

{:node_id => node_id.to_i})
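Why does parameterizing help? Here is a toy model, assuming (as the paragraph above says) that the server can reuse a parsed script when it sees the same script text again. The hypothetical ScriptCache class only counts how many distinct scripts would have to be parsed. Interpolating the id produces a new script text for every movie, while passing it as a parameter does not.

```ruby
# Hypothetical model of a server that caches parsed scripts keyed by their text.
class ScriptCache
  attr_reader :parses

  def initialize
    @compiled = {}
    @parses = 0
  end

  # Return the "compiled" script, parsing only when this exact text is new.
  def fetch(script)
    @compiled[script] ||= begin
      @parses += 1
      script.dup.freeze # placeholder for a real parse/compile step
    end
  end
end

# Interpolating the id yields a different script text for every movie...
interpolated = ScriptCache.new
(1..100).each { |id| interpolated.fetch("g.v(#{id}).out('hasGenera')") }

# ...while a parameterized script has one text, whatever node_id is.
parameterized = ScriptCache.new
(1..100).each { |_id| parameterized.fetch("g.v(node_id).out('hasGenera')") }

puts interpolated.parses   # prints 100
puts parameterized.parses  # prints 1
```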


If we get an empty map back (the string “{}”), we return an unfortunate “No Recommendations” message:

return [{"id" => node_id,
         "name" => "No Recommendations",
         "values" => [{"id" => "#{node_id}",
                       "name" => "No Recommendations"}]
        }] if rec == "{}"


Finally, we turn our Groovy map (which comes back as a string) into an array of hashes, which we use in our visualization as I showed you with Neovigator. Notice that I’m splitting the result on commas, which is why we replaced the commas in titles earlier. This piece won’t be necessary very soon, as the final version of Neo4j 1.6 will have JSON support for Groovy maps.

values = rec[1..-2].split(',').collect{ |v| {:id => v.split(':', 2)[0].strip,
                                             :name => v.split(':', 2)[1] } }
[{"id" => node_id ,"name" => "Recommendations","values" => values }]
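The splitting depends on the exact string that comes back, so here is a slightly more defensive parser. It assumes the map arrives in Java's Map toString form, e.g. {1:Toy Story=12, 7:Star Wars: Episode IV=9}, with the count after the last “=”. That format and the parse_recommendations name are assumptions, not confirmed output. The parser strips both braces and splits each entry on its first colon only, so titles that contain colons survive.

```ruby
# Hypothetical parser for the string form of the Groovy map, assumed to look
# like "{1:Toy Story=12, 7:Star Wars: Episode IV=9}" (Java's Map#toString).
# Commas in titles were replaced earlier, so splitting on ", " is safe.
def parse_recommendations(rec)
  return [] if rec == '{}'
  rec[1..-2].split(', ').map do |entry|
    id, rest = entry.split(':', 2)           # first colon separates the id
    name, _, count = rest.rpartition('=')    # last '=' separates the count
    { id: id.strip, name: name, count: count.to_i }
  end
end
```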

 


We cache the movie poster and recommendations for 30 days (max-age=2592000 seconds), taking advantage of the Varnish cache that Heroku provides. We then get our starting node either by id or by title.

get '/resources/show' do
  response.headers['Cache-Control'] = 'public, max-age=2592000'
  content_type :json

  if params[:id].is_numeric?
    node = neo.get_node(params[:id])
  else
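    # Pass the title as a script parameter rather than interpolating it, so
    # titles containing quotes (e.g. Schindler's List) can't break the script
    node = neo.execute_script("g.idx(Tokens.T.v)[[title:title]].next();",
                              {:title => CGI::unescape(params[:id])})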
  end

  id = node_id(node)

  {:details_html => "<h2>#{get_name(node["data"])}</h2>" + get_poster(node["data"]),
   :data => {:attributes => get_recommendations(neo, id),
             :name => get_name(node["data"]),
             :id => id}
   }.to_json
end
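The max-age value is simply 30 days expressed in seconds (30 × 24 × 60 × 60 = 2,592,000). Here is a hypothetical helper that computes it, so the cache lifetime reads as days.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: a Cache-Control value that lets a shared cache
# (such as the Varnish layer mentioned above) keep a response for `days` days.
def public_cache_control(days)
  "public, max-age=#{days * SECONDS_PER_DAY}"
end
```

Inside the route you would then write response.headers['Cache-Control'] = public_cache_control(30).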


By title? Yes: we are adding jQuery UI autocomplete to our application, which passes the name of the movie so we can look it up in the automatic index we created.

node = neo.execute_script("g.idx(Tokens.T.v)[[title:title]].next();",
                          {:title => CGI::unescape(params[:id])})
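The route chooses between the two lookups with params[:id].is_numeric?. String#is_numeric? is not part of core Ruby, so the application has to define it somewhere, and that definition isn't shown here. Below is a minimal hypothetical version that treats an all-digit string as a node id.

```ruby
# Hypothetical String#is_numeric?: true only when the whole string is digits,
# which is what a node id passed in the URL looks like.
class String
  def is_numeric?
    match?(/\A\d+\z/)
  end
end
```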


…and there you have it: your very own movie recommendation website on Heroku. See the complete code at github.com/maxdemarzi/neoflix.

UPDATE: It looks like a few of you, dear readers, ran create_graph multiple times, and it made a mess. I will try to fix it and get the app back up soon. Note to future self: remove the create_graph route on Heroku before publishing the post.


Source: http://maxdemarzi.com/2012/01/16/neo4j-on-heroku-part-two/

Published at DZone with permission of Max De Marzi, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)