Hello,
We currently use Cognos Data Manager as an ETL tool but we have come many across many issues and feel it is no longer beneficial to continue creating work arounds for the product inadequacies.
We have future requirements for DI across the board which includes federated data, MDM, Metadata Management and schema creation. I am keen to understand which tools take advantage of power of the NPS using push down processing.
Does anybody have any advice or experiences they wish to share regarding the pros and cons of other tools on the market. Any assistance would be most welcomed.
I am specifically interested in Informatica, SAS, BODI and IBM Information Server
Cheers
Guy
Hello Guy, good to see you appearing on this forum.. (we spoke a few months back).
At Edge we've exeperience of using Informatica, Ab Initio, Sunopsis (now Oracle ), DataStage on client projects with Netezza. We're also currently working with Expressor to optimise their Netezza interface.
Not sure if this forum is the right place for potentially long wide-ranging discussions, but there are already some comments in other threads here if you search for ETL.
Do you have any specific questions at this stage?
Hope things are going well with your Netezza implementation
Rgds
D.
Dave Shuttlerworth
Principal Consultants
Edge Associates
I don't think there are any ETL tools that really utilize the ability to push transforms down into Netezza very well. Informatica is supposed to have a feature which does this but based on the 2nd hand information I've heard it does a pretty poor job of it when it works at all. I have no direct experience with Informatica so that is pretty much all I can say about it.
We use DataStage (IBM Information Server) version 7.5.2 which has very limited integration with Netezza. You are basically limited to ODBC connectivity to pull data out of Netezza and nzload/external tables to load data into Netezza (it works well with these interfaces, but no "push down" capability). It is a solid tool and easy to find resources familiar with it, but there are many things which ought to work much better than they do in general -- not just with Netezza but with Oracle as well.
Parsing a string into a date or timestamp in DataStage is a royal pain in the rear and sometimes NULL column values can be a bit painful too. The primary way we use DataStage is to pull data from external sources, do transformations inside of the tool, and then push data into a staging table. It does these things well enough that we are not looking to move to a different tool, but I'm a "glass half full" type person so I'd love to see if other tools make certain things easier than DataStage does (also maybe DataStage 8 addresses some of these issues, but I keep hearing that it still has a lot of bugs).
Anyway I think that you will find pretty much every ETL tool has pretty major flaws, especially with their Netezza integration, but if you pick one of the major tools (Informatica, DataStage, Ab Initio) you'll be able to get the job done with the tool and you'll be able to find plenty of people who know those tools for contract work and/or new hires. I don't believe any ETL tool does a good job of "push down" of transformations into Netezza at this time, others can correct me if my understanding is wrong. We just use the ETL tool to push data into a staging table which looks just like the target table and then run the nzreplace.ksh script which I wrote a few years back, see post/attachment here if you want a copy of this script: http://www.enzeecommunity.com/message/2701#2701
I've not personally used either Informatica or Ab Initio though, so hopefully others with actual experience using those tools can enlighten us a bit as to how well they work.
Also I think you should take a serious look at the open source ETL tools. I've tested Pentaho Data Integrator (aka Kettle or PDI) and the performance is actually pretty good, it is also much easier to do many things with PDI than it is with DataStage. I think Talend is also good but I dislike the Eclipse based development environment. The biggest advantage of PDI is that you can download it and try it out for a bit. If you don't like it then don't use it. If you pay for a license from Pentaho you get support plus some additional features in the tool... can probably also get them to build a direct loader for PDI instead of relying on Java bulk inserts (pretty fast but considerably slower than you could get with nzload).
One other thing, in general since DataStage doesn't do any push down of transformations into the database, what we do is just push the data into a staging table and then run a hand coded SQL statement which either pushes data into another staging table or pulls data back into DataStage. From what I have heard from others this is the most common way that people utilize Netezza for transformation type work using other ETL tools as well as DataStage. The ETL tools in general just do not do a good job of it so people just hand code those types of transformations in pure SQL.
Very strong SQL knowledge is a absolute requirement for anyone doing serious ETL work anyway so I think that in general this is a good methodology. As long as the SQL is well written it is just as easy to understand as it is to look at a bunch of code written inside of the ETL tool. You can do a lot with dragging lines and pictures around inside of the ETL tool GUI but in the end you always have to write a bit of code with complex if/then type constructs, function calls, etc anyway. It is just as easy to read the code in SQL as it is in whatever language the ETL tool has, plus the SQL is generally more portable since you can keep the SQL if you ever decide to replace the ETL tool.
Hi,
We use the Oracle Data Integrator, formally Sunopsis, to run our Elt on Netezza and it runs fantastic. The Odi tool is simply a repository that runs native sql on the target data source. That target can be Netezza, Db2, Oracle and many,many more databases. There is no special server for the tool. It also contains a progamming language, Jython, to teek the tool to do just about any thing you want. I do not know much about the other tools nut this tool is the best for running your Elt processes on the NPS server and they do run fast.
If you want more information let me know.
Thanks.
ODI used to license by number of CPUs on the server, thus if you have a big Netezza box the cost could be huge, last time I talked to them was nearly 3 years ago though so maybe they have altered their pricing structure somewhat. We didn't bother looking at the tool itself due to the pricing issues at the time.
Yeah, I don't know what it costs now but when we bought it, it was still Sunopsis. They sold it with Designer licenses and we bought 5 of them. So our total cost plus the Netezza connector was about $150,000. Since Oracle bought them I think they charge buy the cpu but I am not sure that is the same for the Netezza connection because of all of the Spu's. Oracle actually called me when they bought this tool to ask about the Netezza connection. They were trying to figure out priceing and were concerned that there "by cpu" pricing scheme would put them out of the market. So they may price it differently that other databases. We were grandfathered in and Oracle opened up the licences to no limit. I know that Oracle has made this tool the backbone of there middleware going forward using J Developer. It should replace Oracle Warehuse Builder in the near future with some of the best features of each. I am on the Customer Advisory Board for the Odi product so I get the future news.
The thing is if you go with Informatica (Expensive) or Data Stage you will have to buy hardware to run the Etl engine, unless you already have a server to run them on. We have another group who bought Informatica, after we bought Sunopsis, and the cost went well over $2 Million once you add in the high powered server. So the cost for Odi may not be that significant. With Odi you run all of your Etl on a server that you already have. like Netezza, using SQL.
It probably would make sense to check them out again. Let me know if you need more information.