Tableau supports initial SQL for Hadoop Hive connections, which allows you to define a collection of SQL statements to perform immediately after the connection is established. For example, you can set Hive and Hadoop configuration variables for a given connection from Tableau to tune performance characteristics. Refer to the Designing for Performance article for more information. You can also register custom UDFs as scripts, JAR files, etc., that reside on the Hadoop cluster. Registering these allows you, other developers, and analysts to collaborate on developing custom data processing logic and to quickly incorporate that logic into Tableau views.
Because initial SQL supports arbitrary Hive query statements, you can use Hive to accomplish a variety of interesting tasks when connecting from Tableau.
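As a rough illustration of the configuration tuning mentioned above, an Initial SQL block might look like the following sketch. The property names are standard Hive and Hadoop settings of this era, but the values shown are placeholders chosen for illustration, not recommendations; appropriate values depend on your cluster and data.

```sql
-- Illustrative values only; tune for your own cluster.
-- Allow independent stages of a query to run in parallel.
SET hive.exec.parallel=true;
-- Suggest a fixed number of reduce tasks for jobs on this connection.
SET mapred.reduce.tasks=16;
```

Each statement ends with a semicolon, and the settings apply only to the session that Tableau opens for this connection.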
Custom analysis with UDFs and Map/Reduce
Although Hive offers additional UDFs that Tableau does not yet support as functions for you to use in calculated fields, Tableau does offer "Pass Through" functions for using UDFs, UDAFs (for aggregation), and arbitrary SQL expressions in the SELECT list. For example, to determine the covariance between two fields 'f1' and 'f2', the following Tableau calculated field takes advantage of a UDAF in Hive:
RAWSQLAGG_REAL("covar_pop(%1, %2)", [f1], [f2])
Similarly, Tableau allows you to take advantage of custom UDFs and UDAFs built by the Hadoop community or by your own development team. Often these are built as JAR files that Hadoop can easily copy across the cluster to support distributed computation. To take advantage of JAR files or scripts, inform Hive of the location of these files and Hive will take care of the rest.
Note: You can do this with Initial SQL, using one or more SQL statements separated by semicolons:
add JAR /usr/lib/hive/lib/hive-contrib-0.7.1-cdh3u1.jar;
add FILE /mnt/hive_backlink_mapper.py;
For more information, refer to the Hive language manual section on CLI: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources.
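Adding a JAR only makes its classes available to Hive; a UDF inside it typically still needs to be given a function name before it can be called. A minimal sketch of that step in Initial SQL follows. The JAR path, the function name `normalize_url`, and the class `com.example.hive.udf.NormalizeUrl` are all hypothetical and stand in for whatever your own team has built.

```sql
-- Hypothetical JAR path and class name, for illustration only.
add JAR /mnt/my_custom_udfs.jar;
-- Bind a session-scoped function name to the UDF class in the JAR.
CREATE TEMPORARY FUNCTION normalize_url AS 'com.example.hive.udf.NormalizeUrl';
```

Assuming the function returns a string, a calculated field such as RAWSQL_STR("normalize_url(%1)", [url]) could then pass a field (here a hypothetical [url]) through to it.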
Hive supports explicit control over how to perform the Map/Reduce operations. While Tableau allows you to perform sophisticated analysis without having to learn the Hive query language, as an advanced Hive and Hadoop user, you can take full advantage of this knowledge in Tableau. Using Custom SQL, you can define arbitrary Hive query expressions, including the MAP, REDUCE, and TRANSFORM operators described in the Hive language manual. As with custom UDFs, using custom transform scripts may require you to register the location of those scripts using Initial SQL.
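To make the Custom SQL case concrete, here is a sketch of a query that streams rows through a registered script using TRANSFORM. It assumes the script has already been added through Initial SQL (for example with add FILE). The table `web_logs`, its columns, and the output columns declared in the AS clause are hypothetical; the actual output schema depends on what the script emits.

```sql
-- Hypothetical table and column names; the script must already be
-- registered with "add FILE" in Initial SQL.
SELECT TRANSFORM (page_url, referrer_url)
       USING 'python hive_backlink_mapper.py'
       AS (page_url STRING, backlink STRING)
FROM web_logs
```

The trailing semicolon is left off on the assumption that Tableau embeds Custom SQL inside a larger query rather than running it as a standalone statement.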
Refer to the following blog for an interesting example of using custom scripts and explicit Map/Reduce transforms: http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop.