Fascinating Food Networks, in Neo4j
Rik Van Bruggen creates an interesting food network graph using Neo4j!
When you’re passionate about graphs like I am, you start to see them everywhere. And as we are getting closer to the food-heavy season of the year, it’s perhaps no coincidence that the graph I will be introducing in this blog post is about food.
A couple of weeks ago, when I woke up early (!) on a Sunday morning to get “pistolets” and croissants for my family from our local bakery, I immediately took notice when I saw a graph behind the bakery counter. It was a “foodpairing” graph, sponsored by the people of Puratos – a wholesale provider of bakery products, grains, etc. So I got home and started googling, and before I knew it I had found some terribly interesting research by Yong-Yeol (YY) Ahn, featured in a Wired article, in Scientific American, and in Nature. This researcher had done some fascinating work in understanding all 57k recipes from Epicurious, Allrecipes and Menupan: their composing ingredients and ingredient categories, their origin and – perhaps most fascinating of all – their chemical compounds.
And best of all: he made his datasets (this one and this one) available, so that I could spend some time trying to get them into Neo4j and take them for a spin.
The dataset: some graph cleanup required
The dataset was there, but clearly wasn’t ready for import yet. I would have to do some work. And like always, that work starts with a model. Time to use Arrows again, and start drawing. I ended up with this:
The challenge really was in the recipes. As you can see from the screenshot below, that data is hugely denormalised in the dataset that I found, and logically so: some recipes have only a very limited number of ingredients, others have lots and lots:
So what do you do, especially when, like me, you’re not a programmer? Indeed, MS Excel to the rescue!
It turned out to be a bit of manual work, but in the end I found it very easy to create the sheet that I needed. It was even less than 500k rows long in the end, so Excel didn’t really blink. You can find the final Excel file that I created over here.
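For readers who would rather script this step than do it by hand in Excel, here is a minimal Python sketch of the same normalisation idea. It is not what I actually did, and it rests on an assumption about the input: that each recipe is one comma-separated line, with the cuisine first and a variable number of ingredients after it. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def normalise_recipes(lines):
    """Turn wide recipe rows into (recipe_id, cuisine, ingredient) rows.

    Assumes each line looks like: cuisine,ingredient1,ingredient2,...
    """
    rows = []
    recipe_id = 0
    for fields in csv.reader(lines):
        if not fields:
            continue  # skip blank lines
        recipe_id += 1  # hypothetical id: the recipe's position in the file
        cuisine, ingredients = fields[0], fields[1:]
        for ingredient in ingredients:
            if ingredient:  # ignore empty trailing cells
                rows.append((recipe_id, cuisine, ingredient))
    return rows


def to_csv(rows):
    """Render the normalised rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["recipe_id", "cuisine", "ingredient"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample in the assumed layout, not taken from the real dataset.
sample = [
    "African,chicken,cinnamon,soy_sauce",
    "EastAsian,rice,egg",
]
rows = normalise_recipes(sample)
print(to_csv(rows))
```

Each output row holds exactly one recipe-to-ingredient link, which is the shape you want when every row is going to become one relationship in the graph.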
Then it was really just a matter of exporting the Excel sheets to CSV files, and getting them ready for import into Neo4j with neo4j-shell-tools. Again: easy enough, since I had been through this a couple of times before. You can find the zip file with all the CSV files over here, and the neo4j-shell instructions are in this gist.
Read the Full Article Here.