My first results were quite good - I was able to quickly import data from our Teiid relational federated database and build a graph with 322591 vertices and 322636 edges. The complete load process took about 40 minutes, mostly because slowness on our side. Our federated database consist from many different parts: fast and small H2 tables, custom data translators to a legacy data, and large and slow DB2 tables with hundreds of thousand records on mainframe.
I used both Native and Tinkerpop API and found it useful and easy to use. In my POC I took Jung for graph visualization, and for a collection of graph algorithms.
Here are just some results for the graph above:
- "Find Shortest Path" (Dijkstra): 4781.0 ms
- "Check Nodes Connected" using Dijkstra algo: 7392.0 ms
- "Check Nodes Connected" using SQL TRAVERSE extension from OrientDB: 23829.0
My next step was the integration of OrientDB into Teiid. Since we have a huge federated database, the way to use SQL to send and retrieve data from graphDB considered as convenient but somewhat unnatural approach. I can explain benefits and drawbacks of using this approach separately, it deserves a whole new post. Unfortunately that time (6 months ago) I had to step back: the JDBC implementation of OrientDB was very raw and did not provide enough metadata for Teiid 7.6 to be included in federated database.
I got back to this task recently, trying a new Teiid 8.1 (it has a useful notion of defining metadata in DDL along with NATIVE support like the old Teiid) and with latest OrientDB 1.2.0-SNAPSHOT. I took the latest orientdb JDBC driver sources from git and compiled jar file by myself. Taking the latest JDBC sources was probably my biggest mistake. I've tried to connect to running OrientDB instance using Eclipse Database Explorer and run some queries. Unfortunately even simple and innocent queries against my imported graph or even against built-in "tinkerpop" database provided inconsistent results and even caused Eclipse to freeze sometime:
select * from V where o_id like 'demodata.s%'; select from 6:6 where any() traverse(0,10) (o_id = 'demodata.stats');
Something definitely wrong in the latest JDBC driver code. Obviously, the JDBC driver is not a top priority feature for a graph database. I probably can implement the needed functionality by myself - write a custom Teiid translator (might be a considerably big task, depends of the level of SQL support we want to achieve), or try another graphDB implementations. I will be looking also on neo4j - they also have a jdbc driver. The query language they support in JDBC is not SQL, but (I think) it can be worked out.
This task is on hold for now anyway, I'll make an update in two-three months.