How-to Guide: Explore your Twitter Network with Neo4j
The following two articles will help you start exploring your social network through your tweets using Neo4j.
Quick Start With Neo4j Using YOUR Twitter Data
When approaching a graph database technology like Neo4j, if you’re as avid a Twitter user as I am, then POOF, you already have the best possible data set for becoming familiar with the technology: your own social network. This blog post will help you download and set up Neo4j, set up a Twitter app (needed to access the Twitter API), and pull down your social network as well as any other social network you might be interested in. At that point we’ll interrogate the network using Neo4j and the Cypher query language. Let’s go!
Installing And Setting Up Neo4j
Since we’re not setting Neo4j up for production use, this part’s real easy. Just go to the Neo4j download page, click on that giant blue download button, and 36.1M later you’ll have your very own copy of Neo4j. Unzip it to some reasonable place on your machine, cd into that directory, and simply issue the command bin/neo4j start. (Once you’re finished, a bin/neo4j stop will shut Neo4j down.) Now if you point your browser at http://localhost:7474 and see stuff (rather than lack of stuff), then you’re ready to start shoveling data into Neo4j.
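If you’d rather check from a script than from a browser, here is a minimal sketch that only asks whether anything answers over HTTP at that address. The function name neo4j_is_up is my own invention, and the check says nothing about whether the database itself is healthy.

```python
import http.client
import urllib.request


def neo4j_is_up(url="http://localhost:7474", timeout=2.0):
    """Return True if an HTTP server answers at `url` with a success status.

    This is only a reachability check. Any connection failure, timeout, or
    HTTP error status (for example, if authentication is required) counts
    as "not up".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    if neo4j_is_up():
        print("Neo4j is answering on port 7474")
    else:
        print("Nothing there yet; did you run bin/neo4j start?")
```

Run it right after bin/neo4j start; the server can take a few seconds to come up, so a first "nothing there" is not necessarily a problem.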
You’ll need to create a Twitter app before you can start pulling down your connections, because you need the app’s credentials in order to access Twitter’s API. But don’t sweat it, this literally takes less than a minute. Just go to the Twitter developer apps page, sign in, and there will be yet another big blue button, this time labeled “Create a new application”. Click it! After filling out a really short form, checking the “I blindly agree to whatever is included in this legal contract” checkbox, entering a CAPTCHA string, and clicking the “Create your own Twitter application” button, you will indeed have your very own Twitter app. You’ll be taken to a screen that contains the details for your new app, most importantly the OAuth credentials. Initially, you won’t have the access tokens, but you can click the “Create access tokens” button at the bottom, and the next time you refresh the page (wait a few seconds) you’ll see that the access tokens are available. Keep track of the credentials here because you’ll need to refer to them soon.
Scraping Your Social Circles From Twitter
Check out my Python TwitterScraper script. Though it’s not yet the most beautiful code, it doesn’t really matter, because there’s not much here! Let’s take a moment to walk through it. The first section is where you set up Twitter and Neo4j. Naturally you’ll need to pip install the Tweepy and Py2Neo libraries, but they don’t have any weird dependencies, so this shouldn’t be a problem. Also note that this is where the access keys for your Twitter app go. Go ahead and copy and paste your credentials there. Now you should be ready to go.
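To give a feel for what that setup section involves, here is a rough sketch of the Twitter half. It is not the TwitterScraper code itself. The environment variable names and both function names are hypothetical, and reading the keys from the environment is a swapped-in alternative to pasting them into the script. The Tweepy calls are the classic OAuth 1 ones; newer Tweepy releases prefer the name OAuth1UserHandler for the same handler.

```python
import os

# Hypothetical environment variable names, one per credential shown on the
# Twitter app's details page.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Collect the four OAuth values, failing loudly if any are missing."""
    env = os.environ if env is None else env
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}


def make_twitter_api(creds):
    """Build an authenticated Tweepy client from the loaded credentials."""
    import tweepy  # third-party: pip install tweepy

    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    return tweepy.API(auth)
```

With the four variables exported in your shell, make_twitter_api(load_credentials()) should hand back a client you can start querying with.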
Analysing Twitter Data with Neo4j
As part of a larger project we’ve been working on to build a simulation of social influence (more on that in a later post), we’ve been exploring ways of ingesting and analysing Twitter data.
There are a lot of moving parts to this task, of course, but one of the more interesting ones is how best to store and query the data. Given that Twitter is a part of the social graph, it made sense to us to take a look at what graph databases, such as Neo4j, have to offer.
Graph databases are particularly well suited to capturing social data because they store data as nodes and edges. In our Neo4j Twitter database we have four kinds of node: Twitter users, tweets, retweets and mentions.
If a user tweets a new tweet, this is stored as a user node, a tweet node, and a ‘tweeting’ relationship between the nodes.
If someone retweets the original tweet, the retweet and the user who posted it are added as new nodes. The retweeting user is connected to the retweet via the ‘tweeting’ relationship, and the retweet is connected to the original tweet via a ‘retweet’ relationship.
If a user mentions another user, the mention tweet is connected to both the user who tweets it and the user who is mentioned.
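The three rules above can be sketched in plain Python, with no database involved, to make the shape of the graph concrete. This is an illustration under assumptions, not our actual schema: the class name TwitterGraph, the node kind names, and the relationship names TWEETS, RETWEETS and MENTIONS are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TwitterGraph:
    """Toy in-memory graph: nodes keyed by id, edges as labelled triples."""

    nodes: dict = field(default_factory=dict)  # node id -> node kind
    edges: set = field(default_factory=set)    # (source id, relationship, target id)

    def _node(self, node_id, kind):
        # Keep the first kind recorded for a node; never overwrite it.
        self.nodes.setdefault(node_id, kind)

    def tweet(self, user, tweet_id, kind="Tweet"):
        """A user node, a tweet node, and a tweeting relationship."""
        self._node(user, "User")
        self._node(tweet_id, kind)
        self.edges.add((user, "TWEETS", tweet_id))

    def retweet(self, user, retweet_id, original_id):
        """The retweet is tweeted by its user and points at the original."""
        self.tweet(user, retweet_id, kind="Retweet")
        self._node(original_id, "Tweet")
        self.edges.add((retweet_id, "RETWEETS", original_id))

    def mention(self, user, tweet_id, mentioned_user):
        """The mention tweet links to both its author and the mentioned user."""
        self.tweet(user, tweet_id, kind="Mention")
        self._node(mentioned_user, "User")
        self.edges.add((tweet_id, "MENTIONS", mentioned_user))


graph = TwitterGraph()
graph.tweet("alice", "t1")
graph.retweet("bob", "rt1", "t1")
graph.mention("carol", "m1", "alice")

for source, relationship, target in sorted(graph.edges):
    print(f"({source})-[{relationship}]->({target})")
```

Each method adds exactly the nodes and relationships named in the corresponding rule, so the three calls at the bottom produce six nodes and five relationships.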