Useful Bits and Bytes: August 2012

Friday, August 17, 2012

Embedded Teiid

When developing a new Teiid translators, I found very convenient to use a new 8.1 feature: Embedded Teiid. Basically this is a working Teiid server, which does not require JBoss AS as a host platform. Of course some functionality provided by Application Server has to be added manually (like transactions, clustering or data connection pooling), but in many cases it is unnecessary, especially when you need just a simple federated database or have to quickly test your new translator.

Basic functionality can be unit-tested with frameworks like Mockito. But only integration test can give you a full picture. For integration tests I have to run a JBoss and Teiid instance as Cargo container with translator jars deployed - basically a mini-scaled copy of our application. This is a fragile and slow way - you have to emulate the environment on which your application is running, provide a proper configs and data sources, make sure the integration tests executed as part of your local build. Better if you have a flexibility of unit tests and completeness of integration tests combined together.

Below is a sample code how to instantiate an Embedded server:

 EmbeddedConfiguration ec = new EmbeddedConfiguration();  
 server = new EmbeddedServer();  
 server.start(ec);  
 final JdbcDataSource h2ds = new JdbcDataSource();  
 h2ds.setURL("jdbc:h2:file:target/accounts");  
 EmbeddedServer.ConnectionFactoryProvider<DataSource> jdbcProvider =   
 new EmbeddedServer.ConnectionFactoryProvider<DataSource>()   
 {  
      public DataSource getConnectionFactory() throws TranslatorException   
      {  
           return h2ds;  
      }  
 };  
 server.addConnectionFactoryProvider("source-jdbc", jdbcProvider);  
 server.addTranslator(new H2ExecutionFactory());  
 ModelMetaData jdbcModel = new ModelMetaData();  
 jdbcModel.setName("Accounts");  
 jdbcModel.setSchemaSourceType("native");  
 jdbcModel.addSourceMapping("h2-connector", "h2", "source-jdbc");       
 teiidServer.deployVDB("example", jdbcModel);

Create server, define model and data sources, deploy VDB - in a less then twenty lines of code. Now you can connect and run your queries:

 TeiidDriver td = server.getDriver();  
 Connection c = td.connect("jdbc:teiid:example", null);  
 final String sql = "select * from Product";            
 List<String> results = execute(c, sql, true);

Really, cannot be simpler. The complete listing of unit-test class attached for reference. Also take a look on "embedded-portfolio" example from Teiid 8.1 sources, my post was greatly inspired by it.

Tuesday, August 7, 2012

My Experience with OrientDB JDBC

I did some graph DB research and POC about six month ago for a "configuration management" task. The short description: the system might have a different set of modules, each consist from a bigger set of different parts. Almost all parts has different types of relations inside one module and across them. The task is to easily store this data, and have some complex search and retrieval logic. A typical graph database task. My weapon of choice at that point was OrientDB.
My first results were quite good - I was able to quickly import data from our Teiid relational federated database and build a graph with 322591 vertices and 322636 edges. The complete load process took about 40 minutes, mostly because slowness on our side. Our federated database consist from many different parts: fast and small H2 tables, custom data translators to a legacy data, and large and slow DB2 tables with hundreds of thousand records on mainframe.
I used both Native and Tinkerpop API and found it useful and easy to use. In my POC I took Jung for graph visualization, and for a collection of graph algorithms.
Here are just some results for the graph above:

"Find Shortest Path" (Dijkstra): 4781.0 ms
"Check Nodes Connected" using Dijkstra algo: 7392.0 ms
"Check Nodes Connected" using SQL TRAVERSE extension from OrientDB: 23829.0

Note that results were obtained from in-memory graph, with no optimization (raw quality java code, no indexes defined in OrientDB, unlimited walk depth in traversal). I am confident the results can be improved very easily.
My next step was the integration of OrientDB into Teiid. Since we have a huge federated database, the way to use SQL to send and retrieve data from graphDB considered as convenient but somewhat unnatural approach. I can explain benefits and drawbacks of using this approach separately, it deserves a whole new post. Unfortunately that time (6 months ago) I had to step back: the JDBC implementation of OrientDB was very raw and did not provide enough metadata for Teiid 7.6 to be included in federated database.
I got back to this task recently, trying a new Teiid 8.1 (it has a useful notion of defining metadata in DDL along with NATIVE support like the old Teiid) and with latest OrientDB 1.2.0-SNAPSHOT. I took the latest orientdb JDBC driver sources from git and compiled jar file by myself. Taking the latest JDBC sources was probably my biggest mistake. I've tried to connect to running OrientDB instance using Eclipse Database Explorer and run some queries. Unfortunately even simple and innocent queries against my imported graph or even against built-in "tinkerpop" database provided inconsistent results and even caused Eclipse to freeze sometime:

select * from V where o_id like 'demodata.s%';
select from 6:6 where any() traverse(0,10) (o_id = 'demodata.stats');

Something definitely wrong in the latest JDBC driver code. Obviously, the JDBC driver is not a top priority feature for a graph database. I probably can implement the needed functionality by myself - write a custom Teiid translator (might be a considerably big task, depends of the level of SQL support we want to achieve), or try another graphDB implementations. I will be looking also on neo4j - they also have a jdbc driver. The query language they support in JDBC is not SQL, but (I think) it can be worked out.
This task is on hold for now anyway, I'll make an update in two-three months.

Thursday, August 2, 2012

Poll vs Push in Database using JBoss Teiid

Sometimes you have to monitor a specific table or view in your database, and do some action when INSERT or UPDATE happens. There are two approaches - you can periodically poll the table using Quartz or any other scheduling tool. Or you can be notified by database itself when your table modified - database will push you the new data.
The second approach even more beneficial when your table get updates rarely (so you not waste CPU by returning empty results most of the time), or when you want to be notified immediately when the new data available.

The good solution to the task above on my mind is database trigger. The commercial database vendors (like Oracle, if I am not mistaken) provide a triggers which (when fired up) are capable to run an arbitrary java code. But what if you don't have Oracle, and the data source of your choice is mySQL, Apache Hive, H2, or even arbitrary CSV file, laying down somewhere on filesystem?

Here is my approach: I made an UPDATABLE VIEW for my table in H2 database under JBoss Teiid, then created a VIRTUAL FUNCTION (Teiid virtual functions are capable to run specified java class and method). Last step - define two triggers: INSTEAD OF UPDATE and INSTEAD OF INSERT. When my table in H2 got a new record, or existing one updated - my java class gets immediately executed by Teiid.

Implementation Details:

Below is the Teiid 8.1 model definition with VIEW, VIRTUAL FUNCTION and two TRIGGERs defined. Note a property "lib" - it is a name of JBoss 7 Module with your java class in it (yes, you have to create a JBoss module. It is a simple task and well documented).

 <vdb name="DynamicPortfolio" version="1">  
   ...  
   <property name="lib" value="com.test.teiid.translator.async" />  
   <model name="TrigView" type="VIRTUAL">  
      <metadata type="DDL"><![CDATA[  
             CREATE VIRTUAL FUNCTION RepHealth(  
                     healthTime varchar,  
                     policyKey varchar,  
                     objKey varchar,  
                     healthState varchar)   
                RETURNS integer   
                OPTIONS (JAVA_CLASS 'com.test.teiid.udf.HealthStatesTeiidFunction',   
                     JAVA_METHOD 'reportHealthState');  
           CREATE VIEW HS_VIEW OPTIONS(UPDATABLE TRUE)  
                as select * from Accounts.HEALTHSTATE;  
             CREATE TRIGGER ON HS_VIEW INSTEAD OF INSERT  
                AS FOR EACH ROW  
                BEGIN ATOMIC  
                     SELECT RepHealth(  
                          New.HEALTHTIME, New.POLICYKEY, New.OBJKEY, New.HEALTHSTATE)   
                     from HS_VIEW;       
                END;  
             CREATE TRIGGER ON HS_VIEW INSTEAD OF UPDATE  
                AS FOR EACH ROW  
                BEGIN ATOMIC  
                     SELECT RepHealth(  
                          New.HEALTHTIME, New.POLICYKEY, New.OBJKEY, New.HEALTHSTATE)   
                     from HS_VIEW;       
                END;  
      ]]></metadata>  
   </model>  
 ...  
 </vdb>

The VDB changes above should work with Teiid's "dynamicvdb-portfolio" example, I installed this example on my Teiid in advance and created schema for additional "Accounts.HEALTHSTATE" table in H2. The DDL is below. You can add a new model right into portfolio-vdb.xml from the example.

 CREATE TABLE HEALTHSTATE  
 (  
   HS_ID integer,  
   HEALTHTIME varchar(8),  
   POLICYKEY varchar(8),  
   OBJKEY varchar(8),  
   HEALTHSTATE varchar(8),  
   CONSTRAINT HEALTHSTATE_PK PRIMARY KEY(HS_ID),  
 );

Write your java class and package it as JBoss Module. The called java method should be static.

 package com.test.teiid.udf;  
 public class HealthStatesTeiidFunction {  
      public static int reportHealthState(String healthTime, String policyKey, String objKey, String healthState)  
      {  
           String ret = "At:" + healthTime + " for Policy:" + policyKey + " and Object:" + objKey + " the State is:" + healthState;  
           System.out.println(ret);  
           return 1;  
      }  
 }

Start JBoss, make sure your module and VDB are deployed and active. Connect to your database and issue INSERT or UPDATE statement against your defined view.

 insert into HS_VIEW(HS_ID, HEALTHTIME, POLICYKEY, OBJKEY, HEALTHSTATE)  
 values(4, '10:28 AM', 'policy1', 'obj1', 'RED');

See the java output in console:

 13:51:18,336 INFO [stdout] (Worker0_QueryProcessorQueue0) At:10:28 AM for Policy:policy1 and Object:obj1 the State is:RED

Note that since triggers defined as "INSTEAD OF..", the actual insert or update does not happen. Also, in order to be notified by the database changes, your underlying system should issue inserts or updates not to the actual H2 table, but to Teiid UPDATEABLE VIEW we defined (HS_VIEW in example above). Teiid itself is a federated database, available by JDBC, but connecting to it might be an added complexity to your solution.

Wednesday, August 1, 2012

JBoss 7 and Maven Plugin

This days I play a lot with the latest version of JBoss Teiid (8.1.Beta2). As part of my development activity I have JBoss 7.1.1.Final running on my workstation, and I want to shorten my development cycle: modify code -> build war -> deploy. Yes, I know about JRebel, but today I want not to use it.
My project build tool is Maven, so solution is straight: use jboss-as-maven-plugin. I want to generate war file from maven and deploy it to standalone JBoss without restarting it.

Steps to follow

Define plugin in pom.xml. The definition below attaches to "clean" and "package" maven life cycles.

<project...>
    <build>
        ...
        <plugin>
                    <groupId>org.jboss.as.plugins</groupId>
                    <artifactId>jboss-as-maven-plugin</artifactId>
                    <version>7.1.1.Final</version>
            <configuration>
                <filename>${project.build.finalName}.war</filename>
            </configuration>
                    <executions>
                           <execution>
                                <id>undeploy</id>
                                <phase>clean</phase>
                                <goals>
                                    <goal>undeploy</goal>
                                </goals>
                                <configuration>                                           <ignoreMissingDeployment>
                                       true
                                   </ignoreMissingDeployment>
                                </configuration>
                           </execution>
                           <execution>
                                <phase>package</phase>
                                <goals>
                                    <goal>deploy</goal>
                                </goals>
                            </execution>
                    </executions>
            </plugin>
        ...
    </buiid>
</project>

Optionally - create Eclipse launch configuration to execute "mvn package" or "mvn clean package".
Start JBoss, run launch configuration in Eclispe, check messages in your server log that new context is regustered and war file is deployed / replaced.

What happens behind the scenes?

Usually deployed war files placed in <jboss7>/standalone/deployments folder (when standalone configuration used). After JBoss restart the war file gets deployed (new "xxxx.deployed" flag file gets created in the same folder).
In case of "jboss-as-maven-plugin" deployment, the <jboss7>/standalone/configuration/standalone.xml file gets modified: a new "<deployment>...</deployment>" added to it.
The actual deployed war file can be found now under one of the <jboss7>/standalone/data/content/ subfolders.