
Friday, December 27, 2013

Elasticsearch Indexing using Talend ETL



Talend Open Studio is an open source ETL tool with 450+ components (read/write/transform). It supports most databases, file formats, and other data sources such as web services, FTP, SMTP, some NoSQL stores, and HDFS/Hadoop (in Talend Big Data). However, it has no component for working with full-text search engines like Lucene, Solr, or Elasticsearch.
Often, we need full-text search in an application alongside analytics. A dedicated component lets the index be created or updated as data is loaded into the warehouse. So I have created a component that indexes (creates/updates) documents in Elasticsearch. Elasticsearch is a distributed, open source full-text search engine built on top of Lucene. It is well documented and supports a REST API over JSON, as well as native client APIs in many languages, for indexing and querying.
NOTE: This component has been tested with Elasticsearch version 0.90.7. To support add or update, at least one field in the schema must be designated as a key so that each row can be identified uniquely. Multiple columns can be designated as keys (for composite primary keys).
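To show why a key is needed for add or update, here is a minimal Python sketch of the general idea, not the component's actual code. It assumes the key column values are joined into a single document ID and that the row is sent with a PUT to the Elasticsearch REST endpoint `/{index}/{type}/{id}`, which creates the document if the ID is new and replaces it otherwise. The function names, the `|` separator, and the `orders` / `order_line` index and type names are all made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def make_doc_id(row, key_columns, sep="|"):
    """Join the key column values into one document ID (hypothetical scheme)."""
    missing = [c for c in key_columns if row.get(c) is None]
    if missing:
        raise ValueError("key column(s) missing a value: " + ", ".join(missing))
    return sep.join(str(row[c]) for c in key_columns)


def build_index_request(base_url, index, doc_type, row, key_columns):
    """Build (but do not send) a PUT request that adds or replaces one row."""
    doc_id = make_doc_id(row, key_columns)
    # Percent-encode each path segment so separators in the ID stay URL-safe.
    url = "/".join([
        base_url.rstrip("/"),
        quote(index, safe=""),
        quote(doc_type, safe=""),
        quote(doc_id, safe=""),
    ])
    body = json.dumps(row, sort_keys=True).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


# Example row with a composite key (order_id, line_no); values are invented.
row = {"order_id": 1001, "line_no": 7, "product": "widget"}
req = build_index_request("http://localhost:9200", "orders", "order_line",
                          row, ["order_id", "line_no"])
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Elasticsearch node. Loading the same row twice then results in one document rather than two, because both loads map to the same ID. Note that a simple separator like this can produce colliding IDs if the key values themselves contain the separator.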

Download the component from Talend Exchange.
Download sample jobs for your Talend version from https://github.com/sylentprayer/sample-projects/releases. Two sample jobs are available, for Talend versions v5.0.3 and v5.3.1.

A tutorial that walks through these jobs is here.
