Open Gov Data talk at Neo4j London User Group

A call for lightning talks went out to the Neo4j London User Group, so I put forward a few ideas. My Open Gov Data proposal was selected ahead of Authentication (and adding extra users) for the March 2017 meetup.

My talk focused on the suitability of Neo4j for rapidly exploring data sets.

If you’re just after the slides, they are available here: Exploring Open Government Data using Neo4j

Query profiling & tuning

In the interests of time, I decided to update some of my previous work, which used a GraphGist with a cut-down dataset. This time I used the full version of the latest (2015) UK road safety dataset for makes & models, which is licensed under the Open Government Licence v3.0. Out of curiosity, I wanted to see how the LOAD CSV query from my GraphGist performed with ~142k rows. The answer wasn’t pretty: it took around an hour on my laptop.

Profiling revealed that the absence of indexes on the ‘lookup’ nodes was causing a number of ‘NodeByLabelScan’ operations, as can be seen in figure 1.

Figure 1 – Query profile of the unoptimised query
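To make the missing-index fix concrete, here is a minimal Python sketch that generates index statements for the lookup nodes and prefixes a query with PROFILE. The label and property names (`Make`, `Model`, `name`) are assumptions for illustration, since the actual schema isn’t listed here. The `CREATE INDEX ON` form is the Neo4j 3.x syntax that was current at the time of the talk; later versions use `CREATE INDEX FOR (n:Label) ON (n.prop)` instead.

```python
# Hypothetical lookup labels and their key properties; the real schema
# from the talk is not shown in this post.
LOOKUP_KEYS = {"Make": "name", "Model": "name"}


def index_statements(lookup_keys):
    """Build one CREATE INDEX statement per lookup label (Neo4j 3.x syntax)."""
    return [
        f"CREATE INDEX ON :{label}({prop})"
        for label, prop in sorted(lookup_keys.items())
    ]


def profiled(query):
    """Prefix a Cypher query with PROFILE so Neo4j reports the operators it ran."""
    return "PROFILE " + query.strip()


for statement in index_statements(LOOKUP_KEYS):
    print(statement)
```

With an index in place, a lookup on an indexed property would be expected to show up in the profile as an index seek rather than a ‘NodeByLabelScan’.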

The developer guide section on importing CSV files has some useful pointers (as well as links to great resources by Michael Hunger & Mark Needham). The most relevant one in this case was to “Avoid merging nodes and relationships in the same query”; as you can see, figure 1 contains both “MergeCreateNode” and “MergeCreateRelationship”. Consequently, I made a slight simplification to the graph model by retaining the car model as a property on the Vehicle node instead of using a separate Model node. This removed a relationship merge and meant I could run the data load during the live demo. Plus, the Model nodes could always be reconstituted later by a separate query.
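As a rough illustration of how the Model nodes might be reconstituted while still following the “nodes first, relationships second” advice, the sketch below keeps the two steps as separate Cypher statements held in Python strings. This is not the query from the talk: the `model` property on Vehicle, the `name` property on Model and the `IS_MODEL` relationship type are all assumed names.

```python
# Hypothetical names throughout: the Vehicle.model property, the Model.name
# property and the IS_MODEL relationship type are assumptions, not the real schema.

# Pass 1: create one Model node per distinct model value (nodes only).
CREATE_MODEL_NODES = """
MATCH (v:Vehicle)
WHERE v.model IS NOT NULL
WITH DISTINCT v.model AS name
MERGE (:Model {name: name})
"""

# Pass 2: link each Vehicle to its Model node (relationships only).
LINK_VEHICLES_TO_MODELS = """
MATCH (v:Vehicle)
WHERE v.model IS NOT NULL
MATCH (m:Model {name: v.model})
MERGE (v)-[:IS_MODEL]->(m)
"""

PASSES = [CREATE_MODEL_NODES, LINK_VEHICLES_TO_MODELS]


def run_passes(run_query, passes=PASSES):
    """Run each pass in order through the supplied callable and return how many ran."""
    for statement in passes:
        run_query(statement.strip())
    return len(passes)
```

With the official Neo4j Python driver, `run_query` could be a session’s `run` method; it is left as a plain callable here so the sketch does not depend on a running database.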

Rather than leaving dead time, I talked through what the LOAD CSV query was doing and let it run in the background. On the night it took 38 seconds, which is almost two orders of magnitude faster than the unoptimised query. As I put on slide 15 – profile your queries!
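The “almost two orders of magnitude” figure can be checked with a quick calculation. The sketch below assumes “around an hour” means exactly 3600 seconds, so the result is only approximate.

```python
import math

# Assumption: "around an hour" is taken as exactly 3600 seconds.
unoptimised_seconds = 60 * 60
optimised_seconds = 38  # measured on the night

speedup = unoptimised_seconds / optimised_seconds
orders_of_magnitude = math.log10(speedup)

print(f"{speedup:.0f}x faster, {orders_of_magnitude:.2f} orders of magnitude")
# prints "95x faster, 1.98 orders of magnitude"
```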
