Tuesday, August 7, 2012

My Experience with OrientDB JDBC

I did some graph DB research and POC about six month ago for a "configuration management" task. The short description: the system might have a different set of modules, each consist from a bigger set of different parts. Almost all parts has different types of relations inside one module and across them. The task is to easily store this data, and have some complex search and retrieval logic. A typical graph database task. My weapon of choice at that point was OrientDB.
My first results were quite good - I was able to quickly import data from our Teiid relational federated database and build a graph with 322591 vertices and 322636 edges. The complete load process took about 40 minutes, mostly because slowness on our side. Our federated database consist from many different parts: fast and small H2 tables, custom data translators to a legacy data, and large and slow DB2 tables with hundreds of thousand records on mainframe.
I used both Native and Tinkerpop API and found it useful and easy to use. In my POC I took Jung for graph visualization, and for a collection of graph algorithms.
Here are just some results for the graph above:
  • "Find Shortest Path" (Dijkstra): 4781.0 ms
  • "Check Nodes Connected" using Dijkstra algo: 7392.0 ms
  • "Check Nodes Connected" using SQL TRAVERSE extension from OrientDB: 23829.0
Note that results were obtained from in-memory graph, with no optimization (raw quality java code, no indexes defined in OrientDB, unlimited walk depth in traversal). I am confident the results can be improved very easily.
My next step was the integration of OrientDB into Teiid. Since we have a huge federated database, the way to use SQL to send and retrieve data from graphDB considered as convenient but somewhat unnatural approach. I can explain benefits and drawbacks of using this approach separately, it deserves a whole new post. Unfortunately that time (6 months ago) I had to step back: the JDBC implementation of OrientDB was very raw and did not provide enough metadata for Teiid 7.6 to be included in federated database.
I got back to this task recently, trying a new Teiid 8.1 (it has a useful notion of defining metadata in DDL along with NATIVE support like the old Teiid) and with latest OrientDB 1.2.0-SNAPSHOT. I took the latest orientdb JDBC driver sources from git and compiled jar file by myself. Taking the latest JDBC sources was probably my biggest mistake. I've tried to connect to running OrientDB instance using Eclipse Database Explorer and run some queries. Unfortunately even simple and innocent queries against my imported graph or even against built-in "tinkerpop" database provided inconsistent results and even caused Eclipse to freeze sometime:

select * from V where o_id like 'demodata.s%';
select from 6:6 where any() traverse(0,10) (o_id = 'demodata.stats');

Something definitely wrong in the latest JDBC driver code. Obviously, the JDBC driver is not a top priority feature for a graph database. I probably can implement the needed functionality by myself - write a custom Teiid translator (might be a considerably big task, depends of the level of SQL support we want to achieve), or try another graphDB implementations. I will be looking also on neo4j - they also have a jdbc driver. The query language they support in JDBC is not SQL, but (I think) it can be worked out.
This task is on hold for now anyway, I'll make an update in two-three months.

4 comments:

  1. Hi,
    yes the JDBC driver is quite raw, but please report issues to the OrientDB Group (http://groups.google.com/group/orient-database). The OrientDB team is very fast on fix issues!

    ReplyDelete
  2. Thanks Luca, as far as I see the JDBC driver is not included in the official OrientDB codebase yet. I'll gather more information and probably start with the discussion thread on community forum.

    ReplyDelete
  3. I am just starting to play with OrientDB alongside TeiiD. I am wondering if you were able to populate Orient through TeiiD (i.e. INSERT into select x,y,z from ?

    ReplyDelete
  4. Sorry for the delay with my reply. I haven't touched OrientDB for a long time, but I quickly looked up the latest documentation. It seems that they did a great job, lots of improvements. However I believe the OrientDB SQL dialect would not allow you to seamlessly plug it in to Teiid. See the differences here: http://www.orientechnologies.com/docs/last/orientdb.wiki/SQL.html#orientdb-sql-dialect (not only INSERTs are affected as you can see).
    If this is a show-stopper for you, the missing support can be added on Teiid side. It should not be very hard to build a custom translator which will extend the standard JDBC translator and add not supported features. I have built several custom translators already, some of them are available in my Github repository ( https://github.com/rokhmanov/teiid-translators ), feel free to get back to me if you need help.

    ReplyDelete