Netezza Director of Product Marketing Razi Raziuddin is blogging today.
I’ve been at the 2010 TDWI World Conference in San Diego this week, where the theme is “agile BI that delivers data (I would use the term ‘insights’) at the speed of thought.” Timing is everything when it comes to making decisions – and influencing others to make the decisions we’d like to see.
We’ve all experienced Red Car Syndrome at some point or another. You test drive a red car. You like it. Suddenly, you start noticing red cars everywhere – not because the number of red cars has increased, but because the experience of driving a red car is now personalized. Online advertisers use Red Car Syndrome to connect consumers with the products they genuinely want, as I was reminded first-hand recently. While searching for kitchen fixtures online, I noticed that many of the ads featured a pair of pricey fixtures that had initially caught our eye, but that we had rejected as exceeding our budget. The ads seemed to know our tastes better than we did, and ultimately we succumbed and made the purchase.
The experience brought home the power of right-time analytics. Speed is critical in making analytics actionable and delivering real value to the business. The trifecta of huge data volumes, complex analytics and query performance is an increasingly common thread in the BI and data warehousing world. This holds not just for online marketers but across industry lines. Whether it is an insurance provider trying to prevent fraud, a telco determining the cheapest and best path to route a call or a government agency unearthing criminal activity, time to insight from big data makes the difference in every case.
Doug Henschen recently wrote a good article on this topic for InformationWeek in which he calls out success in the Big Data era as the ability to get faster insights from huge data sets. The article highlights Catalina Marketing’s petascale data warehouse environment and the fast insights they derive from a huge database of 195 million consumers.
Although not every enterprise has a data warehouse environment quite that large, the need to perform complex analytics and derive insight in the shortest time possible is common in every environment, big or small. While scalable MPP architectures address the big data problem quite well, the big math problem associated with complex and advanced analytics is what many customers still wrestle with. There’s general agreement that in-database processing, especially in scalable MPP systems, is the right solution to the big math problem. Doug’s article again highlights Catalina’s use of in-database analytics to radically streamline their analytic modeling environment and gain efficiencies of 10X as a result.
However, not every data warehouse platform is geared up for the challenges of performing in-database analytics at scale. The first and obvious challenge is the additional processing overhead required to run advanced analytic algorithms alongside the traditional data warehouse workload. You need a system architecture that is not overwhelmed by the data volumes typical of data warehouses in the Big Data era. Then there is the question of what analytics you want to perform. The majority of commonly available analytic libraries are written for in-memory processing in SMP systems and need to be parallelized in order to take advantage of MPP architectures. The analytic system should not only offer parallelized versions of the analytics you desire, but also provide primitives to easily parallelize advanced analytic algorithms while hiding the complexity of parallel programming from developers.
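To make the parallelization point concrete, here is a minimal Python sketch of how a statistic normally computed in memory on one machine (sample variance) can be recast for an MPP-style system. Each data partition produces a small partial summary, and the summaries are then merged. The names `Partial`, `summarize`, `merge` and `parallel_variance` are hypothetical and chosen purely for illustration; this is not Netezza's API or any vendor's, and the partitions here are plain Python lists standing in for data slices on separate nodes.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Partial:
    """Summary of one partition: row count, mean, and sum of squared deviations."""
    n: int
    mean: float
    m2: float


def summarize(values: Iterable[float]) -> Partial:
    """Runs independently on each partition (Welford's one-pass update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Partial(n, mean, m2)


def merge(a: Partial, b: Partial) -> Partial:
    """Combines two partition summaries without revisiting the raw rows."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2)


def parallel_variance(partitions: Sequence[Iterable[float]]) -> float:
    """Sample variance over all partitions, computed from merged summaries."""
    # In a real MPP system the summarize step would run on each node in
    # parallel; here it runs sequentially to keep the sketch self-contained.
    total = reduce(merge, (summarize(p) for p in partitions), Partial(0, 0.0, 0.0))
    if total.n < 2:
        raise ValueError("need at least two values to compute a sample variance")
    return total.m2 / (total.n - 1)
```

Only the three numbers in each `Partial` travel between nodes, never the raw rows. That is the kind of primitive the paragraph above has in mind: the developer writes a per-partition step and a merge step, and the platform takes care of distribution.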
Finally, the dearth of universally accepted standards in the advanced analytics world poses yet another challenge. A typical analytic environment may consist of a mish-mash of commercially available tools such as SAS and SPSS, open source ones such as R and Hadoop (which are gaining popularity), and tons of application code written in various languages such as Java and Python. The underlying system must offer tremendous flexibility in integrating with a wide array of analytic tools and support for a variety of frameworks and languages.
In subsequent posts, I’ll talk about Netezza’s advanced analytic capabilities to enable big math on big data. In the meantime, as you plan your analytic infrastructures for the Big Data era, tell us what challenges you are coming up against.







