Recently I was asked about choosing an ETL tool. That was probably the sixth time somebody has asked me that question. I have written at length about all the considerations when choosing an ETL tool; see my article here: https://dwbi1.wordpress.com/2015/10/10/six-most-important-features-of-an-etl-tool/.
But it is wrong to consider every single factor because most of them will be irrelevant. When choosing an ETL tool, or indeed any other system/software, you should only consider what is relevant to you.
When comparing ETL software, many people look at things which are not relevant to them. We must not fall into this trap. For example, suppose you have to choose between Pentaho, SSIS and Informatica. You should not look at all the features of Pentaho, SSIS and Informatica, but only at the features that you need. For example, you should not look at the transformations that you will never use.
In most cases, you should not look at performance either, because most ETL software will be able to satisfy your performance requirements. You won’t be loading 1 TB of data in 30 minutes! You are nowhere near the performance limit of most ETL tools. All three I mentioned (Pentaho, SSIS, Informatica) will be able to satisfy your performance requirements, unless you work for one of the top companies doing extreme data movements.
Your primary considerations should not be functionality, but these 3 things:
a) Cost
I know both Pentaho and SSIS are free, but there are hidden costs as well as actual costs. Prices vary according to usage: the more money you spend, the lower the unit price. So most vendors have tailored prices, and very few have standard prices across all usages/volumes. You have to ask around; most vendors will be only too pleased to answer your enquiries. My point is: know your limits. If an ETL tool is $1.2 million initial outlay + $200k annually, and your budget is only $200k initial + $20k annually, quickly strike it off your list. So the first step is to establish which ETL tools are within your budget.
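The budget check above can be sketched in a few lines of Python. This is only an illustration: the tool names, the quote figures and the `within_budget` helper are all made up, and real quotes would come from the vendors themselves. All amounts are assumed to be in the same currency.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """A hypothetical vendor quote; all figures are invented for illustration."""
    tool: str
    initial: int  # initial outlay, including hidden costs such as training
    annual: int   # recurring annual cost (licence, support, maintenance)


def within_budget(quotes, max_initial, max_annual):
    """Return the names of the tools whose initial AND annual costs fit the budget."""
    return [
        q.tool
        for q in quotes
        if q.initial <= max_initial and q.annual <= max_annual
    ]


quotes = [
    Quote("Tool A", 1_200_000, 200_000),  # far above budget: strike it off
    Quote("Tool B", 150_000, 15_000),     # fits both limits
    Quote("Tool C", 0, 30_000),           # "free" to buy, but annual costs are too high
]

shortlist = within_budget(quotes, max_initial=200_000, max_annual=20_000)
print(shortlist)
```

Note that the "free" tool drops out here because of its running costs, which is the hidden-cost point made above.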
b) Compatibility with the existing technologies in your company
I would say that this is the most important factor. Are you a SQL Server shop? If so, then don’t look at Pentaho, Oracle ODI, BODI (or whatever SAP calls it these days), or WebSphere. Look at SSIS and Oracle. If you are an Oracle shop, look at ODI, Informatica, and probably Ab Initio. Don’t just look at your RDBMS, but at your middleware/messaging too. Is it WebSphere MQ, MSMQ or Tibco? If your data movement will be dealing with messages, then you need to choose a tool that works with it. Look at your BI tools too, e.g. if you use SAS BI then you should consider SAS ETL first, as well as vendor-neutral tools such as Informatica and Ab Initio.
c) Vendor financial strength
If the ETL vendor goes bankrupt, your company will be in trouble. So choose a vendor which will still be there in 10 years’ time. This applies not only to ETL, but to choosing any system/software. Choose a vendor with 50 employees and $1m in the bank, not one with 5 employees and $50k. Choose a vendor with 100 customers, not 5 customers. Asking for the last 5 years’ balance sheets is a normal process. You can’t risk one vendor bringing your company down.
Your secondary considerations should be:
a) The features relevant to you
Once again, we must not look at the candidates with a “general view”, but specifically “in your context”. For example: SSIS can do A and B, Pentaho can’t do C and D. Ask yourself: do you use A, B, C or D? If you don’t use them, then they are irrelevant! It does not matter whether SSIS or Pentaho can or cannot do them. Look at the tools within your context.
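To make the “in your context” comparison concrete, here is a minimal Python sketch. The tool names, the feature labels A to D and the `relevant_gaps` helper are hypothetical; the point is only that features you do not need never enter the comparison.

```python
def relevant_gaps(needed, supported):
    """For each tool, list the needed features it lacks.

    Features outside `needed` are ignored entirely, however many a tool has.
    """
    return {
        tool: sorted(needed - features)
        for tool, features in supported.items()
    }


# Hypothetical feature matrix: Tool X supports everything, Tool Y lacks C and D.
supported = {
    "Tool X": {"A", "B", "C", "D"},
    "Tool Y": {"A", "B"},
}

# Suppose you only ever use A and B.
needed = {"A", "B"}

gaps = relevant_gaps(needed, supported)
print(gaps)
```

With these made-up inputs, both tools come out with no gaps, so the missing C and D in Tool Y should not influence the decision. If you did need C, Tool Y would show it as a gap.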
b) User friendliness
Download them (or ask for an evaluation copy) and use them for your daily tasks to find out which one is easier to use. One or two vendors refuse to give an evaluation copy before they see a real opportunity to sell. Again this is simply a “level mismatch”, i.e. if your budget is $200k the $1.2m vendor would not be interested in talking to you. But neither should you be interested in talking to them!
c) Your skill set
You and your colleagues already have certain skills (and are looking forward to developing them). Which tool are you more capable with? If you are a SQL Server shop, naturally your skill set will be in SSIS, and not Pentaho or Ab Initio. The same goes for SAS, IBM and SAP.
Factors you should not care about
Those are the primary and secondary considerations. But there are a few factors which you should not spend any time on. They are:
- Don’t look at what they are made of. You should not care whether they are written in Java or .NET or R. This factor is irrelevant.
- Don’t think about what other companies think. Don’t bother asking for a reference because the vendor will always refer you to their best client, who will always sing their praises. The vendor will never put you in touch with a failed client who might say negative things about the software.
- Don’t look at the position in the Magic Quadrant. Do read the details, but don’t look at the quadrant itself. The details specify the strengths and weaknesses of each ETL tool, but the position in the quadrant will blind you to your actual needs. Again, only look at the features which are relevant to you, not the whole feature set. The quadrant is constructed to reflect the whole feature set, hence it is irrelevant to you.
EII vs ETL
In addition to ETL tools, you could consider an EII (Enterprise Information Integration) tool. The best in this field is Composite Software. EII does not move the data, but integrates it and feeds it directly to the BI tools. It is probably irrelevant to your company’s needs, but in special cases it is relevant. So don’t get blinded by ETL, ETL, ETL; consider the alternative too.