domingo, 26 de outubro de 2014

Kettle Database Value Lookup ignore case hack

Technology credits:
Java & Oracle RDBMS
Spoon (a.k.a Kettle)



Today i want to share with you a hack on the Pentaho ETL engine: Kettle.  I was investigating this tool, exploring the capabilities of Kettle to see what it could do for me. I've came to the conclusion that this is a very powerful tool that can do a lot, but also has it's 'shortcomings' especially when it comes to performance when dealing with a large amounts of data (around 1Gb).

Problem:
As i was testing this tool, i was trying to compare an input field from a file to a database column on the Oracle DB. As sometimes happens input data can sometimes be RaNdOm CaSe. That means the data can match if we ignore the upper/lower casing.

Solution:
Easy! Do a case insensitive match of the input! (you might say...) As i've come to the conclusion, it's not that easy to do that using Kettle... I wandered the web in search of clean and straighforward solution but i didn't find any so here is my solution:

First of all, convert the input field from file to upper case. This is easy, you can just run a simple JavaScript step to the input field and you are done.
Then you have to compare the uppercased field with the column from the database also in uppercase mode. Not so easy when the Spoon UI only enables you to select the table and column name on the 'Database Value Lookup' step, unless... you hack it!!!

Here's a screenshot of the hack:
What's the secret?
I've found out that the field 'Lookup table' is added to an SQL statement after the FROM keyword but...it's not as simple as that. You have to trick Kettle: Make sure your query has double quotes in it (such as "ID"), that's how the hack works. Kettle ends up injecting that SQL, requesting a column in uppercase mode from the database :)

Happy Coding!