
Thinking Inside the Box

19 Posts authored by: Administrator

April 28, 2008

 

"'To be is to do.' - Immanuel Kant

'To do is to be.' - Jean-Paul Sartre

'Do-be-do-be-do' - Frank Sinatra"

--Kurt Vonnegut, Jr. (Nov 1922 - Apr 2007)

In the news today: the Compress Engine

 

In 1783 Immanuel Kant wrote, "David Hume woke me up from my dogmatic slumbers," and revolutionized the way humanity thinks about metaphysics. Almost 220 years later, Netezza set out to achieve a similar goal - redefine analytics. When the first NPS® data warehouse appliance was introduced, the market released itself from yet another dogmatic slumber and realized that there is a different, better way to do data warehousing; a way without compromise, a way without limits.

 

Netezza has helped to reenergize the data warehouse market in creating and leading the data warehouse appliance category.

 

  • "Every time you turn around you see another industry that's facing a tidal wave of data and they need to understand what this data is saying. Many of them have data volumes in this range that they haven't been able to afford to analyze, as much as they'd like to. ... Netezza can deliver that analytic capability, and at a very attractive price." - Richard Winter, Winter Corporation, from Netezza will scale its appliance to petabyte range, InfoWorld (January 2008)

  • "This is what Netezza has done in the data warehousing market: it has totally changed the way that we think about data warehousing... So the bottom line is not just that Netezza's entry into the market was a black swan event but that that event has not ceased to unfold." - from Netezza: a black swan by Philip Howard, Bloor Group (October 2007)

  • "Appliances are here to stay and are revolutionizing the data warehouse industry." - from Business Analytics Appliances Are Here to Stay, by Dan Vesset, IDC (June 2006)

  • "The term data warehouse appliance was coined by Netezza, and this vendor has blazed a trail by proving the concept and educating the market." - from Defining the Data Warehouse Appliance, by Philip Russom, TDWI (August 2005)

 

Since 2002, Netezza has been repeatedly breaking the latency barrier and challenging the boundaries of data analytics. Since our first release, we have been continuously refuting the alleged mutual dependencies that became the building blocks of the industry's dogmatic misconceptions; namely the expensive nature of performance, the necessary complexity of the analytics architecture and the unavoidable limits of scalability. With today's announcement of the Compress Engine, Netezza disproves yet another myth - the inverse relationship between data compression and query performance.

 

The architectures of traditional data warehouses, steeped in a legacy of serving OLTP applications, were not designed to handle the ever-growing amounts of data combined with larger and more complex user workloads and shrinking data latency requirements that characterize the modern enterprise. Regulatory compliance, electronic commerce and the need to process and analyze all data in a matter of seconds have pushed the capabilities of traditional data warehouse systems to their limits. In reaction to the data capacity pressures, vendors introduced compression, not as an enhancement but as a compromise solution that allows for further data growth at the cost of processing performance.

 

Traditional compression approaches, used by several of the competing data warehouse vendors, typically degrade performance in exchange for the compression effect. Netezza's addition to the FPGA-Accelerated Streaming Technology (FAST) Engines framework - Compress Engine - utilizes its innovative streaming architecture™ not only to increase the system's storage capacity by 2-4X but actually to boost overall streaming query performance by a factor of about 2X (100%). All this is achieved without requiring any tuning or administration; enabling Compress Engine on the Netezza appliance is in fact a software-only upgrade.

 

It's actually really cool technology, obviously something we love to rave about. Late last year, I wrote about FAST Engines in this blog. We'll use that as a starting point and dig a level deeper into how Compress Engine works. I'm sure it will tickle the fancy of the geek in you!

 

 

The NPS system employs a patent-pending method for compiling (yes, compiling) columnar data in all the tables of the database as it is being written to disk (e.g., during load, insert or update operations). The process converts row-based data into column streams that are independently compiled to replace the original data in the columns with a stream of "instruction sets" for the FPGA. The "instructions" themselves are much smaller in size than the data they replace, resulting in a highly compressed data stream emerging from the process.

 

While the compression occurs on columnar data because of the inherent compressibility within database columns, the compressed data is reassembled in rows before being written to disk. Row-wise storage of tables avoids the data scan complexity associated with columnar stores and ensures that scanned data can be efficiently parsed and processed without the need to reconstitute it from multiple sources. The compressed data uses disk much more efficiently and increases the data density of NPS systems by 2-4X - in some cases substantially higher - allowing customers to scale their NPS data warehouse systems into the hundreds of terabytes of user data.

 

But if the NPS system's data compression and scale brought the system's performance to its knees or severely limited performance speedup due to compression (as it does on many of those other systems), it wouldn't be so great, would it? The beauty of the Netezza way of providing data compression is that not only does it have no negative impact on performance, but it actually increases query performance by up to 100%!

 

 

 

As the compressed data is read off the disk, it is passed through the Compress Engine which applies the instructions embedded in the data stream to restore it to its original form. Our compilation algorithm ensures that this decompression process can be performed entirely in silicon, at wire speeds. Each physical block scanned from the disk can mushroom into 2 to 4 or more times its size in memory without incurring any overhead in processing time - i.e. 2 to 3 times more data is scanned in the same amount of time without any increase in system hardware! Our internal benchmark testing reflecting real customer configurations and workloads has shown an overall 2.2X increase in streaming query performance through the use of Compress Engine.
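
To make the compile-then-replay idea concrete, here is a toy sketch in Python. It is not Netezza's patent-pending method, whose details aren't disclosed in this post; it swaps in plain run-length encoding as a stand-in. The function names and the ("REPEAT", value, count) instruction format are invented purely for illustration, and for simplicity the sketch keeps the compiled columns separate rather than reassembling them into rows as the NPS system is described as doing.

```python
from itertools import groupby


def compile_column(values):
    """Replace a column's values with hypothetical ("REPEAT", value, count) instructions."""
    return [("REPEAT", value, len(list(run))) for value, run in groupby(values)]


def replay_column(instructions):
    """Apply the instructions in order to regenerate the original column values."""
    for op, value, count in instructions:
        if op != "REPEAT":
            raise ValueError(f"unknown instruction: {op}")
        for _ in range(count):
            yield value


def compile_rows(rows):
    """Split rows into columns and compile each column independently."""
    return [compile_column(column) for column in zip(*rows)]


def replay_rows(compiled_columns):
    """Replay every column and stitch the values back into rows."""
    return list(zip(*(replay_column(col) for col in compiled_columns)))


# Illustrative data only: repeated values within a column are what make it compressible.
rows = [
    ("MA", "2008-04-28", 10),
    ("MA", "2008-04-28", 10),
    ("MA", "2008-04-28", 12),
    ("NH", "2008-04-28", 12),
]

compiled = compile_rows(rows)
restored = replay_rows(compiled)

print(compiled[0])       # instructions for the first column
print(restored == rows)  # the round trip is lossless
```

The point of the sketch is only that decoding is a simple, sequential replay of small instructions, which is the kind of work that can plausibly be done in hardware as data streams past.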

 

 

This software-only enhancement, enabled by our unique architecture, is only the beginning. As we continue to develop our platform, we are investigating further enhancements to the Compress Engine or the addition of new FAST engine(s), aimed at directly increasing streaming performance on the NPS system.

 

 

Our philosophy and aim is to continue to shake the industry out of its dogmatic slumbers by extending the price/performance advantages of our products; showing that there's a different way to do data warehousing and advanced analytics. One where performance and scalability are neither the result of expense nor complexity, where you can get more performance from compression, where you do have the power to question everything™ ...

 

 

 

 


April 21, 2008

"Imitation is the sincerest of flattery."

- Charles Caleb Colton (1780-1832), from his Lacon, Vol. I, published in 1820

Welcome to the Data Warehouse Appliance club - another validation of an important, growing market segment

 

Well, well, well! "Only" eight years after Netezza coined the term and invented the market segment, Teradata today finally officially entered the Data Warehouse Appliance market. Though it's a bit late, and certainly behind a number of other vendors, perhaps today's entry will put an end to Teradata's vacillating over whether they 'invented' the concept or not, were an appliance or not, or whatever. In the past couple of years, it seems Teradata spokespeople have gone out of their way to say their product was simultaneously a data warehouse appliance and absolutely not one - even booking appearances on panels of data warehouse appliance "vendors". Certainly their announcement is another validation that the role of Data Warehouse Appliances is an important and growing one not only in the current market, but for the future as well.

 

Derivative Marketing and a "Repackaged, Warmed-over" Product?

 

 

Teradata is positioning this new product as being "simple, powerful and cost-effective" - which to our way of thinking sounds more than a little derivative of Netezza's ["Performance, Value and Simplicity"|http://www.netezza.com/customers/data-warehouse-appliance-customers.cfm], but I'll leave it to the reader to decide. To our reading, the Teradata announcement sounds like just another larger vendor's "repackaging" response to the competition. Like others before them such as IBM and Oracle, it appears that with the 2500 model Teradata has done nothing more than cobble together a collection of elements from the company's model 5500 systems, repackaged and sold as an appliance.

 

 

Powerful. Um, How's That Again?

 

 

And while anyone who is serious about the appliance segment of the data warehouse market (like Netezza) has focused on delivering systems that can scale to highly complex, enterprise-wide, high performance systems, we think the 2500 will struggle to deliver even modest performance for just 6 TB in a single equipment rack.

 

 

While Teradata is quoting just over 6 TB of user capacity per two nodes in this new system, let's remember that they have been advising customers for the past year not to put more than 1.5 TB against each of those same dual-core CPU nodes. Which is it? Is the 2500 underpowered for its 6 TB data capacity per dual-node rack, or has Teradata been advising its model 5500 customers to pay at least 2X too much for their data warehouse systems for the past year?

 

 

Time will tell whether Teradata has made other compromises to the 2500 model in an attempt to limit its impact on its flagship products (5500 and the new 5550). Beyond its underpowered nodes, have they sacrificed anything else like workload management or system availability, or even the system's ability to handle highly-interactive, operational applications? As the days and weeks help raise the shroud covering the model 2500 further, we'll know more. For now though, it just feels like "me-too" imitation.

 

 

 

 


by Jit Saxena, Netezza Chairman and CEO

"If you are a Big Dog and you are not persuaded by data, then in God we trust...but everyone else, bring data." - Jane E. Shaw, retired Chairman and CEO of Aerogen, Inc. and current member of the Intel Board of Directors (quoted from PowerSpeaking Inc.)

More and more companies recognize the power of analytics as part of their competitive strategy. But most solutions only provide a glimpse of what can be achieved. What is the potential impact when performance barriers fall away? In this post, I’d like to explore the possibilities and introduce a few examples of companies leveraging the intelligence in their data in new and unexpected ways. After all, competing is good, but winning is better.

In finance, the term arbitrage refers to the ability to find and exploit market disparities (hedging strategies monitoring currency or securities fluctuations being prime examples). Most arbitrage opportunities are very time-sensitive - you have to recognize value in an overlooked stock then swoop in to buy it before others take notice, get the same idea and drive up the price. On Wall Street, an arbitrage virtuoso, able to consistently spot untapped potential that others miss, is worth his or her weight in gold.

Leaping through Tiny Windows

 

The term Information Arbitrage has many similarities to its finance equivalent, and it’s a good way to think about the impact that analytics can have on a company or even an entire industry. Information arbitrage is about finding game-changing intelligence buried in vast, unappreciated data assets, and exploiting it to leap ahead of the competition. Like a financial investor, the Information Arbitrager takes advantage of an opportunity before the window slams shut (which can be very fast indeed).

Companies in certain industries make particularly good arbitrage candidates. These are companies dealing with "Big Data" - tera-scale or even peta-scale databases, and a constant flood of incoming data. Telecommunications, eBusiness, RFID retail applications and online advertising are a few segments that come to mind. Often the operational data is changing very quickly, and key insights are only found at a very granular level. Now suppose analysis at that level normally takes hours or days, and one company can suddenly do it in minutes, seconds or even sub-seconds. As Netezza customers well know, this kind of intelligence disparity can have dramatic implications, both for that company and its market.

 

 

 

 

For example, telecommunications is a high-volume, low-margin business. Constant changes in network utilization demand real-time decisions about rating and pricing structures for an operator to stay competitive. By running pricing scenarios against billions of call data records, and by examining individual customers to determine their current calling patterns and preferences, iBasis, a major telco provider, knows exactly what options to offer each customer. In contrast, competitors might only see that customer as part of a larger segment measured at some time in the past, and come up short with their offers and pricing.

 

 

There are several challenges to Big Data analytics that make arbitrage opportunities hard to pursue. Predictive modeling, optimization and other analytic applications are much more processor intensive than the SQL queries used in standard business intelligence applications. When complex algorithms and gargantuan databases converge with real-time business demands, something usually has to give.

 

 

Many companies find they are unable to fully exploit their growing data holdings, and have to make do with sampling or high-level summaries rather than the complete, granular data they often want to examine. But using partial or high-level data can be dangerous; even the most powerful algorithms can suggest spurious or meaningless conclusions when they are applied to insufficient data. Companies may also lose hours offloading data from the data warehouse to an external cluster of processors to run the analysis. With all these approaches, the result is an incomplete solution that provides just a hint of the possibilities of analytics, because that’s all the current technology is capable of delivering.

 

 

Consider the problem of optimization, for example. Optimization solutions play a key role in helping companies target the right customers, make the right offers, determine manufacturing volumes or accurately price products to take full advantage of market conditions while minimizing expenses. Depending on the problem being addressed, an accurate optimization solution needs to account for many variables and constraints such as products, branches, budget, time, contact channels, offer history, market segmentation and privacy preferences, to name a few.

 

 

Due to the multiple permutations and combinations among the different elements, even a simplified optimization model limited to only a month of data, a thousand customers and ten different offers results in an astronomical solution search space of 2 to the power of 10,000. Just to put things in perspective, the number of atoms in the observable universe is about 10^81, or roughly 2^269; a model with only a few hundred yes/no variables already has a search space that large.
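
For readers who want to check the scale of that figure, here is a quick back-of-the-envelope calculation in Python. It assumes (my reading, not something the post spells out) that the 2 to the power of 10,000 figure comes from treating each customer/offer pair as an independent yes/no decision; the variable names are illustrative.

```python
import math

customers = 1_000
offers = 10

# Assumption: one yes/no decision per (customer, offer) pair.
binary_decisions = customers * offers

# Python integers are arbitrary precision, so the search space can be computed exactly.
search_space = 2 ** binary_decisions
digits = len(str(search_space))

# How many yes/no variables give a search space as large as 10^81?
atoms_exponent = 81
equivalent_bits = atoms_exponent * math.log2(10)

print(f"yes/no decisions: {binary_decisions}")
print(f"decimal digits in 2^{binary_decisions}: {digits}")
print(f"10^{atoms_exponent} is about 2^{equivalent_bits:.1f}")
```

Under that assumption, the search space is a number with just over three thousand decimal digits, while the atom count has 82.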

 

 

The "Big Math" at the heart of this kind of analysis pushes most processing technology to its limits and beyond. As the number of variables and restrictions increases linearly, the search space grows exponentially, and the underlying problem is often NP-complete. As a result, companies are forced to compromise in the thoroughness of the analysis and/or the response time they are willing to tolerate. Most optimization efforts look at small snapshots of the total data available (for example, only the last month’s data), and make use of a range of techniques such as Linear, Dynamic and Integer Programming, Lagrange Multipliers and Cluster Analysis that reduce the level of complexity in various ways, all in an attempt to reach an actionable result in a realistic timeframe. But even with these approaches, companies are faced with costly infrastructure requirements, incomplete views of their data and lengthy response times resulting in stale data or missed arbitrage opportunities.

 

But what if you could bypass the existing performance limitations and get crucial intelligence much faster than before? For example, what if a database marketing company could use complex algorithms to get accurate optimization results days before the market could adjust? Or a retail franchise could precisely adjust the prices of thousands of products daily for each of its stores? Or a credit card company could run customer scoring algorithms one hundred times faster than its competitors? Or a financial services firm could run real-time Monte Carlo simulations on terabytes of data to manage risk? What impact could advantages like these have on a business? It’s fair to say the difference would be game-changing, providing a major competitive advantage and the ability to enter new markets previously out of reach.

These capabilities are not just marketing fantasies or future visions - they’re in use today.

 

Making these Information Arbitrage opportunities possible is precisely what Netezza does. Our streaming analytic appliances are built for running complex mathematical models on huge data sets, with results in a fraction of the time required by other technologies. Sophisticated analytic applications run "on stream" in the data warehouse, against all the records and detail that need to be examined. There’s no need to settle for summary data or aggregations, or ship data to another system for analysis. (We’re also constantly making our appliances better. Our recent doubling of performance is just the latest Netezza breakthrough.)

 

Through the Netezza Developer Network, we’re helping developers worldwide use the Netezza architecture to create a new generation of analytic applications that were previously impractical, unaffordable or simply impossible. When exploiting an arbitrage opportunity means leveraging Big Data and Big Math, Netezza’s streaming architecture is simply inherently faster and more efficient than other technologies. Of course, our customers already know this - and with appliance simplicity and low purchase price, information arbitrage pays off even more.

 

The bottom line is: when Big Data meets Big Math, great things become possible for our customers and their businesses, enabling them to:

 

 

 

  • Use Information Arbitrage to take advantage of time-sensitive opportunities

  • Rapidly run multiple scenarios and sensitivity analyses in near real-time

  • Make use of all the available data, all the time, while their competitors are still struggling with reduced visibility from sampled or aggregated data

When the first Netezza appliances burst on the scene in 2002, their ability to query giant databases with unprecedented speed upset a lot of preconceived notions about the limitations of technology and what companies can do with their data. Advanced analytic applications take processing complexity to a much more challenging level, and once again the capabilities of our appliances are revolutionizing the market and capturing the imagination of our customers.

 

Jit Saxena


 


"Fig Newton: The force required to accelerate a fig 39.37 inches per sec."

from a "Wiley's Dictionary" definition appearing in the B.C. comic strip, by Johnny Hart (1931-2007), cartoonist and creator of both B.C. and The Wizard of Id

 

In the news today: FAST Engines

 

In case you missed it, today Netezza has both a new press release and a brief White Paper up on the topic of our FPGA-Accelerated Streaming Technology (FAST) Engines™ framework, a key enabler of the high performance of the NPS® appliance. What we've done is provide a little more public insight into the inner workings of the NPS system and just how it is able to provide the industry-leading price/performance that it does.

 

We're very bullish on the extensibility of the NPS system architecture, and in particular, the use of FPGA technology and the extensibility of the FAST Engine framework into the future.

 

FAST Engines (IMO, a particularly appropriate and descriptive geek-technology acronym) already help deliver the "performance multiplier" for the NPS system that we've discussed previously by removing unnecessary records and columns from a given stream of data before the system has to expend even a single CPU clock cycle or byte of memory worrying about them.

 

 

 

As you can see in the block diagram above, the framework currently includes five engines: Control, Parse, Visibility, Project and Restrict. Since they're described fairly well in the White Paper, I won't go into detail here. But I will repeat some of the critical characteristics of the FAST Engines. They are:

 

  • basic analytic functions electronically programmed into the FPGA to accelerate query performance;

  • dynamically reconfigurable - each of them can be modified, disabled or extended by the NPS system in real time; and

  • customized at run-time for each snippet executed in the SPU - each engine can incorporate parameters passed to it to optimize the behavior of the FPGA for a particular query snippet.

 

From the above, what you should take away is that the hardware on each of the NPS system's hundreds of intelligent storage nodes, known affectionately as SPUs (pronounced: "SPOOz"), for Snippet Processing Units, is not just "optimally customized" for each query. Instead, as manifest in the FAST Engines, the SPUs' hardware configurations are optimally customized for each sub-step of each query, in real-time, allowing the system to maximize the streaming flow of data.

 

In parallel within the FPGA, these engines eliminate records outside of the ACID-compliant purview of a given query; project away columns that a given SQL statement's SELECT clause doesn't need; and restrict away rows that don't satisfy the statement's WHERE predicate. All of this is done at the speed with which data is being read (or "streamed") off the disk drive on each intelligent storage node in the Netezza system, and replicated in parallel across hundreds of those nodes.

 

As a result, the remaining data stream for on-going query processing is typically reduced by 95% or more before it needs to be interrogated any further by the CPU on our intelligent storage nodes, or moved from one node to another. That translates directly into performance acceleration.
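
As a rough software analogy for what the Visibility, Restrict and Project Engines do to a stream, here is a small Python sketch using generators. It is only an illustration: the real engines run in the FPGA, in parallel rather than chained one after another, and the record layout, transaction-id fields and function names below are all assumptions made up for this example.

```python
def visibility(records, snapshot_txn):
    """Keep rows created at or before the snapshot and not deleted as of it."""
    for rec in records:
        created_ok = rec["created_txn"] <= snapshot_txn
        deleted = rec["deleted_txn"]
        if created_ok and (deleted is None or deleted > snapshot_txn):
            yield rec


def restrict(records, predicate):
    """Drop rows that fail the WHERE predicate."""
    for rec in records:
        if predicate(rec):
            yield rec


def project(records, columns):
    """Keep only the columns the query asks for."""
    for rec in records:
        yield {col: rec[col] for col in columns}


# Hypothetical table of 100 wide rows; row 0 was deleted in transaction 3.
table = [
    {
        "id": i,
        "region": "east" if i % 20 == 0 else "west",
        "amount": i * 10,
        "note": "x" * 50,
        "created_txn": 1,
        "deleted_txn": 3 if i == 0 else None,
    }
    for i in range(100)
]

# Roughly: SELECT id, amount FROM table WHERE region = 'east', as of transaction 5.
stream = visibility(iter(table), snapshot_txn=5)
stream = restrict(stream, lambda rec: rec["region"] == "east")
result = list(project(stream, ["id", "amount"]))

print(len(table), "rows scanned,", len(result), "rows passed on")
```

In this made-up example only 4 of the 100 scanned rows, and 2 of their 6 fields, survive to the next stage, which is the kind of early reduction the paragraph above describes.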

 

 

Want to rev up your FAST Engines? Install a turbocharger!

 

So where do we take this next? Well, for starters, Netezza will essentially be providing a "turbocharger" for our FAST Engines framework.

 

What do I mean by that? Perhaps this quote will help:

 

"The turbofan compresses the air fuel mixture so more molecules are squeezed into the cylinder. When the mixture is ignited, more energy is released. Thus, a turbocharged engine will provide more shaft work out than a naturally aspirated engine of the same size.

 

<snip>

 

"The advantage of a turbocharged engine is that about 35% more work can be done by a turbocharged engine as compared to a naturally aspirated engine of the same size."

--from a primer on Natural Gas Engines.

 

There's only one thing wrong with the above quote. The new addition to the NPS system's FAST Engines framework doesn't just boost performance by 35%; it could boost streaming query performance by as much as 100-200%! That's the potential performance upside customers are going to see with the new Compress Engine that is being added to the FPGAs.

 

Rather than the cumbersome, compute-intensive compression efforts employed by other vendors to reduce disk usage that also result in reduced performance, the Compress Engine boosts performance by decompressing data inside the FPGA as fast as it streams from disk.

 

As data is written to disk (e.g., during data load, insert or update operations) it is compressed into a compiled format, column-by-column with the original data replaced by the Compress Engine "instruction set" for decompilation. Then, when data is read from the disk, the Compress Engine reads its instruction set and reassembles the original data as it streams from the disk, effectively raising the streaming data rate by as much as 200% - lifting the effective scanning rate per SPU node from over 60 MB/sec to approximately 200 MB/sec. With 108 active SPUs doing this in parallel in each rack of the NPS system, that's the equivalent of a persistent (i.e., not 'burst') scan speed of about 70 TB/hour per rack, or well over 500 TB/hour for today's largest NPS system configuration, the 8-rack NPS 10800.
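
The scan-rate arithmetic above is easy to reproduce. The short Python calculation below uses the figures quoted in this post (approximately 200 MB/sec per SPU, 108 active SPUs per rack, 8 racks) and assumes decimal units (1 TB = 1,000,000 MB); the variable names are mine.

```python
MB_PER_TB = 1_000_000      # assumption: decimal units
SECONDS_PER_HOUR = 3_600

per_spu_mb_per_sec = 200   # approximate effective rate with Compress Engine
spus_per_rack = 108        # active SPUs per rack
racks = 8                  # 8-rack NPS 10800

rack_mb_per_sec = per_spu_mb_per_sec * spus_per_rack
rack_tb_per_hour = rack_mb_per_sec * SECONDS_PER_HOUR / MB_PER_TB
system_tb_per_hour = rack_tb_per_hour * racks

print(f"per rack:   {rack_tb_per_hour:.2f} TB/hour")
print(f"8-rack system: {system_tb_per_hour:.2f} TB/hour")
```

With these inputs the per-rack result lands in the high 70s of TB/hour, in the same neighborhood as the "about 70 TB/hour" quoted above (binary units would bring it a little lower), and the 8-rack total is comfortably over 500 TB/hour.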

 

 

And that's not all, folks!

 

The FAST Engines framework is extensible into the future - and we're already hard at work looking into things that will rev up performance even further, extend the applications set of the NPS appliance more broadly or both. Again, the White Paper sets out what some of these are in fairly clear language so I don't need to repeat it here.

 

Wherever the evolution of the NPS appliance takes us, we're very bullish on the notion that the performance acceleration and potential to extend the application space that FPGA provides will give Netezza that much more headroom in maintaining its leadership position in the market.


 

November X, 2007

Issue 14: Supercomputing Conference Brings Life to Reno

 

 

"Computing is not about computers anymore. It is about living."

 

 

- Nicholas Negroponte

 

 

Negroponte's words certainly rang true, in more ways than one, at the recent SC07 conference in Reno. In a city known for little other than its lackluster "old Vegas" character, its proximity to Tahoe and the Reno 911! Comedy Central series (which is actually filmed in southern California), the Supercomputing conference breathed new life into the city from November 12-15, while providing extensive displays of technologies that are changing many facets of life as we know it.

 

 

Moments after I stepped off the plane, the fluorescent "Welcome to SC07" signs that followed me from the Reno airport all the way to my hotel made it clear that this conference was going to be a pretty big deal. My intuitions were confirmed when I entered the Reno-Sparks convention center, with its impressive array of meeting space, food stations, souvenir "shops" and an exhibit hall encompassing 200+ rows of endless exhibitor booths. The annual Supercomputing conference does a great job of bringing together growing numbers of cutting edge technologies, commercial enterprises, government organizations, national labs and graduate students, year after year. With its extensive technical program consisting of several awards, Birds-of-a-Feather sessions, "[challenges|http://sc07.supercomp.org/?pg=challenges.html]" and more, along with a plethora of industry and research exhibits and sessions, the show continuously delivers something of value to everyone who attends. The result: increased momentum and record-breaking numbers of both attendees and industry exhibitors at SC07. The collaboration of masterminds from all different backgrounds, locations, age groups and organizations produces an amazing fusion of ideas, solutions, partnerships and technologies.

 

 

Netezza participated in the conference with booth space in addition to a session in the SC07 exhibitor forum, featuring our very own Justin Lindsey along with John Johnson from Lawrence Livermore National Lab (better known as LLNL). Despite the modest space Netezza occupied with our 10x20' booth in the massive exhibit hall, our presence felt considerably larger as we attracted a continuous stream (no pun intended) of visitors throughout the week, including those who had never heard of Netezza along with those who'd set out on a resolute mission to find our booth.

 

 

Our bright purple SPUBox, featuring an animated motorcyclist and "Netezza Speed" written in graffiti, was by far the biggest draw to the booth. It was exciting to see so many passers-by stop for a closer look at the SPUBox, often asking a question or two about the box, how they might get their hands on one, and what exactly the Netezza Developer Network is all about. Several visitors applied to join the NDN right then and there, hoping for a chance to win a SPUBox at the end of Justin and John's exhibitor forum session. All the NDN buzz created quite a "Netezza high," if I do say so myself.

 

 

 

 


 

Issue 14: And then there were none...

 

by Vishal Daga - Netezza, Director of Partner Marketing

"Danger and delight grow on one stalk." -- English Proverb

 

With the acquisitions of Hyperion, Business Objects and Cognos this year, the BI landscape has finally taken a turn that many have predicted for some time now. Given the growing prevalence and importance of BI, and the consolidating software ecosystem, this has not been a stretch prediction to make, by any means. The question is, what now? Is this a good thing or bad thing for the BI user? The answer, as to most things in life, is a bit of both in my view, and only time will really tell.

 

 

First, the potential for good. The tighter integration possibilities that this consolidation creates between the BI apps and the other components of their parent company's product portfolios -- including ERP and data integration applications, middleware technologies, and/or databases -- could ultimately result in a much richer and overall seamless experience for the BI user. For example, this shift has the potential to catalyze the adoption of BI capabilities within ERP applications, and accelerate the arrival of an operational BI experience, i.e., a BI world that does not involve a user switching to a different application and/or a reliance on power-users. Furthermore, the resources that the larger organizations can bring to bear can help advance product capabilities and customer BI adoption at faster rates than what the relatively smaller companies could have supported on their own.

 

 

Now, the potentially not so good. With size, comes, well, size, and that many times can be not such a good thing. The distractions, conflicting priorities and layers of bureaucracy that come along with size will make it harder for the BI businesses to be managed effectively inside of the larger organizations. It's not a coincidence that innovation and product adaptation to changing market needs are usually driven by smaller, more focused companies that are highly motivated to be entrepreneurial. In addition, the biases that the new organizations will create towards preferential integration with their own portfolio technologies can dilute the independence that most customers need and demand. If this happens, then customers will face hurdles in deploying best-of-breed technologies that are best suited to their needs.

 

 

Looking forward, whether the positives end up outweighing the negatives is something that remains to be seen. It's a promising sign that many of the acquirers have committed to keeping the BI companies as independent operating entities. In large part, how things ultimately net out will depend on the discipline that the acquirers demonstrate in adhering to this strategy. The good news is that while it's still very early in the game, based on our relationships and interactions with the BI players since the announcements, there are reasons to be optimistic. Perhaps you can have your cake and eat it, too.

 

 

Vishal Daga

 

 


 

"What Netezza is doing is... going a step further: score the data as it is streamed into the appliance and before it even hits the database... However, it is not just the performance gain that is significant. This initiative means that developers are embedding analytic software into the Netezza Data Warehouse Appliance so that it becomes, in effect, an application appliance."

 

 

Philip Howard, Director of Research, Technology, Bloor Research - from his 5th October posting, "The Netezza Developer Network"

 

 

Pardon the title's riff on the late-1970s Elvis Costello hit song What's So Funny 'Bout Peace, Love and Understanding, but a recent mini-dustup got me to thinking about providing a bit of insight into why Netezza's approach to "Streaming Analytic™ Appliances" is different from others' entries in the market. It seems the recasting of Netezza's mission in terms of streaming analytics rather than the more-limiting data warehouse appliances, along with the launch of the Netezza Developer Network (NDN), has caused something of a hullabaloo among some of our competitors (refer to recent stories from Teradata/SAS, Greenplum, IBM/SPSS and industry analyst, Curt Monash).

 

 

And well it should. While some would seem to dismiss Netezza's positioning on the topic as 'nothing more than UDFs', and argue that what matters is supporting them effectively, we must beg to differ (and to differentiate). In short, we feel the Netezza approach to Streaming Analytics opens the door to dramatically changing the way data warehouse systems are viewed, used and even deployed.

 

 

 

The positioning of some of our larger, (recently) publicly-traded competitors may suggest that they see themselves not just as experts in the domain of data warehouse systems, but also as experts in the ways of CRM, advanced scoring and analytics, etc. They seem to have bolted on homegrown software packages as extensions of their data warehouse offerings in the market. That may well be the case, but we don't really see how it's possible for one vendor to "corner the market" on innovation - a view that we think is borne out in recent announcements of closer UDF-based partnerships. Still others, more from the new-entrant category, claim that the only thing required is simply to support basic UDF functionality as an extension to the database. We think both ends of that argument are incorrect.

 

 

 

Instead, we at Netezza think it best to "stick to our knitting". Our aim is to provide the high-performance infrastructure along with a technical and community foundation to enable others much more expert than we are to drive the algorithmic and application-level innovation by their ability to exploit the performance of our streaming analytic appliance. To again provide a riff on something from the late-70s and early-80s BASF ad (a campaign that has recently been rekindled in that company's marketing), our vision could be summarized as, "We don't make your advanced applications; we make your advanced applications 'run like rockets'**."

 

 

 

** "Raw SPU functions are called like any other SQL function... ...and they run like rockets by exploiting the Netezza architecture."

 

 

Justin Lindsey, Chief Technology Officer, Netezza, speaking at the 2007 International Netezza User Conference, 26th September, 2007

 

 

<font color="#008000">What Netezza Provides</font>

 

 

What Netezza provides in this mix is an extremely high-performance system that is particularly well-suited to storage-intensive operations such as data warehousing and BI. These operations benefit from a data streaming architecture in which unnecessary data is filtered out as rapidly as it is read from the storage elements, allowing for greater processing efficiencies. We've written extensively about this before (see Spotlighting FPGAs, parts one, two and three) and won't repeat the arguments here.

 

 

Another key lever that Netezza provides by way of the NPS® appliance is the fact that our intelligent storage elements, known as Snippet Processing Units (SPUs), are each really compute nodes. They are capable of running compiled C or Java code, with the added task-by-task "customizability" of an FPGA that can further accelerate performance, operating in an MPP compute grid but with the simplicity of our appliance approach.

 

 

 

Consider this: if those 100s of SPUs in an NPS appliance could be used to run C code to execute SQL query processing tasks, why couldn't they equally be tasked to perform tasks that go well above and beyond those enabled (encumbered?) by the set-based, structured-data logic of SQL? Where others may use UDF or even UDA functionality in the data warehouse systems to collect up and standardize use of SQL functionality across users, the streaming analytics enabled by Netezza allows users to "draw outside the lines" of SQL.

 

 

 

Another thing Netezza provides for NDN members is a set of basic building blocks - functions and an algorithmic work area that form the foundation for more advanced work. Some of these appear to have much in common with the standard fare of "traditional" SQL functional extensions: record-level functions, or User-Defined Functions (UDFs), and aggregate-level functions, or User-Defined Aggregates (UDAs), are part of the foundation. But some of the other parts go far beyond those definitions, allowing developers to implement functions that retain a sense of state or to cascade multiple complex algorithmic processes to build even more powerful solutions, all making use of the streaming nature of the NPS analytic appliance to push performance even further.
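To make the three kinds of building block concrete, here is a minimal conceptual sketch in plain Python. It is not the Netezza or NDN API; every name in it (`to_celsius`, `MeanAggregate`, `running_delta`, `parallel_mean`) is hypothetical and chosen only for illustration. It shows a record-level function, an aggregate with accumulate/merge/finalize steps so that partial results from parallel nodes can be combined, and a function that keeps state from one record to the next.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


def to_celsius(fahrenheit: float) -> float:
    """Record-level function (UDF-style): one value in, one value out."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


@dataclass
class MeanAggregate:
    """Aggregate-level function (UDA-style): state plus three steps."""
    total: float = 0.0
    count: int = 0

    def accumulate(self, value: float) -> None:
        # Called once per record on a single processing node.
        self.total += value
        self.count += 1

    def merge(self, other: "MeanAggregate") -> None:
        # Combines a partial state that was built on another node.
        self.total += other.total
        self.count += other.count

    def finalize(self) -> float:
        # Turns the combined state into the final answer.
        return self.total / self.count if self.count else 0.0


def running_delta(values: Iterable[float]) -> Iterator[float]:
    """Stateful streaming function: remembers the previous record."""
    previous = None
    for value in values:
        yield 0.0 if previous is None else value - previous
        previous = value


def parallel_mean(partitions: List[List[float]]) -> float:
    """Cascade: UDF per record, aggregate per partition, then merge."""
    combined = MeanAggregate()
    for partition in partitions:  # each partition stands in for one node
        partial = MeanAggregate()
        for reading in partition:
            partial.accumulate(to_celsius(reading))
        combined.merge(partial)
    return combined.finalize()
```

Calling `parallel_mean([[32.0, 212.0], [122.0]])` converts each reading and averages across both partitions, while `running_delta` illustrates the kind of record-to-record state that a plain record-level function cannot hold.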

 

 

And finally, what Netezza provides is a simple development appliance platform on which NDN members can develop and verify their algorithms, including the performance impacts of operating in parallel. Affectionately known by the decidedly non-marketing name, SPUBox, the platform is a fully-functional version of the NPS appliance, with four Snippet Processing Units, a host processor and network connectivity. Weighing in at a little over 40 pounds (18 kilos), one might call it a "0th generation luggable analytic appliance" but one that only consumes about as much energy as two 75W light bulbs. We granted more than ten of them to new NDN members at our September global user conference in Boston, some with special decal "wraps" to stand out above the ordinary compute platforms you may be used to.

 

 

 

 

<font color="#008000">It Takes a Village...</font>

 

 

I think the word potential is important when applied to streaming analytics, because what we're doing is opening the door to the potential of the data warehouse to be used in a very different way. Those extended uses are being made possible by Netezza and our community of users, developers and partners that is being fostered and growing, virtually with each passing day.

 

What we are seeking to unleash is a new level of performance and innovation in the use of storage-intensive analytical computing, but the important bit is that Netezza is not looking to do this alone. In fact, we do not picture ourselves as having cornered the market on analytical algorithm writers. Instead what we launched with the NDN is intended to evolve as a cooperative and competitive global web of experts who will build on their own and one another's innovations. Here, the term, "coopetition" seems trite; I'd prefer to think of the NDN as an opportunity for innovative "mashups" at the building-block, advanced algorithmic and applications levels.

 

 

 

Foundational elements are used together to enable basic value-add functions to be built. Those are mixed and matched, typically but not always with standard SQL fare, to enable more complex algorithms to be realized. And, in turn, the algorithms enable very high-performance applications and new uses of the NPS appliance to be realized. In some cases, one entity may do most or all of the above work. In many others, we are already seeing cooperation among members to use and reuse modules developed elsewhere to extend the capabilities.

 

Like what has been accomplished by artists with a humble plastic child's toy such as Lego, these capabilities can be mixed and built upon to create innovations we may not even be able to imagine today.

 

 

 

 

The opportunity is helped along by the network effects of an open community and members (by last count, in excess of 50 spread around the world) spanning entities from university professors and graduate students to BI applications providers to end-customers of the Netezza Performance Server® appliance - and everything in between. This is where the true industry expertise lies. This is also the source of innovation for what can be possible with the opening up of Netezza's architecture to more than "just" data warehouse and BI.

 

 

 

Where will all of this lead? To advanced text, image, bioinformatics or video processing? Perhaps. Into the domain of the 'what if' Monte Carlo or Genetic algorithm simulations for risk analysis and predictive resource optimization? That's another possibility. But we're confident that people are going to use the NPS appliance in new and innovative ways as a result of Streaming Analytics and the NDN - and in ways which may well help shape the features and functionality of the appliance in releases to come.

 

 

<font color="#008000">What's So Special?</font>

 

What's so special about all that? Well with these foundational building blocks, imagine being able to develop customer- or threat-scoring algorithms that could be accomplished in as little as one pass through record data in a data warehouse instead of multiple passes required to denormalize or pivot data, or worse still, large extracts of the data from the warehouse to an off-board computing complex in order to perform the denormalization and scoring tasks. What if this single-pass technique yielded a 10X speedup in processing? What if it could be more than 100X - perhaps even allowing a task that formerly was accomplished in over 10 hours to be done in less than 20 minutes? Might that change the way that particular analytical task was used? Might that change someone's business? We think it could. More importantly, so do many of our customers, partners and prospects.
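As a rough illustration of the single-pass idea, the sketch below contrasts scoring customers while streaming over event records once with a baseline that first pivots the events into one wide row per customer and then scores those rows. It is plain Python, not Netezza code; the event layout, the `WEIGHTS` table and both function names are assumptions made purely for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, float]  # (customer_id, event_type, amount)

# Hypothetical weights for an illustrative linear score.
WEIGHTS = {"purchase": 2.0, "return": -3.0, "support_call": -1.0}


def score_single_pass(events: Iterable[Event]) -> Dict[str, float]:
    """Score every customer while reading each event exactly once.

    No pivoted intermediate table is built; the running score per
    customer is the only state that is kept.
    """
    scores: Dict[str, float] = defaultdict(float)
    for customer_id, event_type, amount in events:
        scores[customer_id] += WEIGHTS.get(event_type, 0.0) * amount
    return dict(scores)


def score_multi_pass(events: List[Event]) -> Dict[str, float]:
    """Baseline: pivot into wide rows first, then score the rows."""
    # Pass 1: build one wide row per customer (the denormalization step).
    wide: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for customer_id, event_type, amount in events:
        wide[customer_id][event_type] += amount
    # Pass 2: walk the pivoted rows and compute the score.
    return {
        customer_id: sum(WEIGHTS.get(t, 0.0) * total for t, total in row.items())
        for customer_id, row in wide.items()
    }
```

Both functions return the same scores; the difference is that the first never materializes the pivoted table, which is the step that forces extra passes or large extracts in the scenario described above.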

 

 

 

To date, the Netezza Developer Network has dozens of active partners participating in the program globally, with more than 100 applications to become part of the program pending [note: if you're thinking of your own really exciting "on stream" application ideas, you can apply online at http://www.netezza.com/ndn]. We think from the combined innovation and expertise of this group, the NDN has the potential to take the NPS analytic appliance to new levels of performance and new applications domains that will continue to include, but may go far beyond, the standard Data Warehouse Appliance of our roots.

 

 


<font color="#ff9900">by Ellen Rubin - Netezza, Vice President of Marketing</font>

"The best vision is insight." - Malcolm S. Forbes, former publisher of Forbes magazine (1919-1990)

Leadership conferences are generally a mixed bag. They tend to take on weighty topics and raise interesting questions, but have a tough time providing any real insights or doing more than re-hashing mass-market ideas. The Forbes Leadership Networks Forum in Chicago this past week seemed at first glance like it might fall victim to this tendency. The title of the event was "America the Innovator: The New Rules of Global Market Growth," and the marketing claimed that it would help attendees learn about a staggering range of subjects, including but not limited to:

 

  • Global disruption by fast-growing economies like India and China

  • America’s role in this global economy

  • Innovation for Corporate America

  • Using analytics for competitive advantage

  • Social networks as the new holy grail

I was exhausted just reading the brochure.

 

Happily, the conference included some terrific speakers who managed to provide hours of real insight and entertainment, and to stimulate lots of discussion among attendees. Steve Forbes, President & CEO of Forbes, former candidate for U.S. President, and now co-chair of the Rudy Giuliani campaign, kicked off the day and had a lot to say about America in the global economy. He stated his positions and biases upfront, including the need to lower taxes to make the U.S. more competitive for corporations, and to open immigration and get better at "letting in non-terrorists." Whether or not these opinions resonated with everyone in the room (doubtful), Steve was very clear on his bottom line: the only way for America to succeed is for its companies and institutions to become even more innovative and stay on the cutting edge.

 

To explain how, Professor Clayton Christensen (http://www.claytonchristensen.com/) - world expert on disruptive innovation - took the stage and wowed the audience for the next couple of hours. Professor Christensen has written several famous books on the subject, including The Innovator’s Dilemma and The Innovator's Solution. At last year’s Netezza User Conference, he was a keynote speaker and dazzled the crowd with his vision, brilliance and dry wit. I won’t restate some of his widely-known ideas and insights, but I wrote down a short quote that resonated strongly for Netezza: "A disruptive technology is one that simplifies a complex problem."

 

 

Professor Christensen also shared some fascinating and contrarian views about Apple and the Harvard Business School (where he is a professor, but not afraid to say some unpopular things, it appears).

 

 

On Apple: they may be on top of the world right now, with over 100 million iPods sold and a great stock price, but they’re being disrupted by non-proprietary, standards-based, inexpensive mobile phones. Apple has won so far in the early stages of market disruption through its integrated, proprietary approach, with iTunes and all its sleek and beautiful products. Over time, however, Christensen predicts that the mobile phone players will carry the day and to compete, Apple will need to embed itself inside (à la "Intel Inside") and let the other devices pull content from iTunes. He feels the real opportunity is in the personalization of content from iTunes, not the devices themselves, although Steve Jobs hardly seems likely to agree.

 

 

On HBS: Professor Christensen said that a common problem that limits corporate innovation is that companies optimize their performance based on Wall Street driven statistics, such as gross margin percentage, that turn out not to be the ones that matter for their competitive survival. In the case of gross margin, this drives companies to build a broad range of product lines of high-margin niche products and to rule out innovative new products and technologies that don’t meet the hurdle rate. In the case of HBS, the "wrong statistic" is the high starting salaries of HBS graduates. Although this metric gets HBS ranked as the top business school on many lists, it is in fact making the graduates too expensive for most potential hiring companies. As a result, the hiring companies are forming their own corporate "universities" that allow them to hire cheaper talent and train them for specific corporate skills and knowledge. This has led to dramatic reductions in the number of applicants and recruiting companies at HBS. As an HBS alum, I had to chuckle at the thought of Professor Christensen presenting these ideas and meeting with stony silence in some oak-paneled room.

 

 

Next up was Professor Tom Davenport, world expert and author on the subject of "Competing on Analytics" - something very near and dear to us at Netezza. Professor Davenport lectured at Netezza University, our continuing education program, and his ideas have become a mantra for us as well as many other vendors and corporations in the world of analytics, including SAS, Accenture and others. Professor Davenport’s main thesis is that companies that use analytics for strategic competitive advantage outperform those that don’t. A common tendency in Corporate America is to[, by Malcolm Gladwell|http://www.gladwell.com/blink/] - Davenport’s wry comment: "As with overeating and other American habits, we don’t need any encouragement on this.") Instead, we need to look at examples like Amazon, Best Buy, Capital One, Google, Wal-Mart and others, where competing on analytics is the corporate strategy with commitment from the CEO down. (It was great to note that many of the companies Professor Davenport profiled as the case studies for competing on analytics are already Netezza customers!)

 

 

Academic theories are always interesting to hear, especially when presented by someone as dynamic and fun as Professor Davenport. But what made the case were the customer examples shared by him and his panel, which included Netezza customer, Rob Holland, SVP of U.S. Retail Measurement at ACNielsen (a service of The Nielsen Company), and Carol J. McCall, VP of Research & Development at Humana. Some highlights:

 

 

  • Best Buy segments their stores based on customer profiles, such as "Jill, the soccer mom," and sells targeted merchandise for the specific segments. The segmented stores earn twice as much as the non-segmented ones.

  • Humana uses predictive models for high-deductible health-care products to offer different pricing and options based on what customers need. They also use models to predict individual health, which enables them to build better relationships with customers who are at-risk and help them change their lifestyle behaviors to improve their health. These uses of analytics have returned more than $600 million to Humana and helped over 400,000 people!

  • ACNielsen analyzes data from tens of thousands of retail locations and grocery scans from over 100,000 grocery families. Based on this analysis, the company can break down the data to specific clusters of stores and tailor programs to help their retail and CPG customers be more competitive.

I won’t even try to cover the other sessions or content, but it was definitely a packed schedule. The day ended with baseball; specifically, Moneyball. Wearing a Red Sox cap in celebration of the recent World Championship win, Davenport tied the day together with the story of how Billy Beane, general manager of the Oakland A’s, exploited an arbitrage opportunity by analyzing baseball statistics to find the real metrics that predicted success (batting average turns out not to matter much, while on-base percentage matters a lot). This let Beane pick the undervalued players and compete effectively against much-wealthier teams. (At the risk of being repetitive, Billy Beane was the keynote speaker at Netezza’s first user conference; we definitely have hit the trifecta with great speakers on innovation and analytics!)
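For readers who don't follow baseball, the difference between the two metrics is easy to see with a small sketch. The formulas below are the standard definitions of batting average and on-base percentage; the two players and their season totals are invented purely for illustration and do not come from the talk.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per official at-bat; walks are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """How often the batter reaches base, counting walks and hit-by-pitch."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return reached / chances


# Hypothetical season totals for two made-up players.
CONTACT_HITTER = dict(hits=150, walks=20, hit_by_pitch=0,
                      at_bats=500, sacrifice_flies=0)
PATIENT_HITTER = dict(hits=130, walks=100, hit_by_pitch=5,
                      at_bats=500, sacrifice_flies=5)

for name, p in (("contact", CONTACT_HITTER), ("patient", PATIENT_HITTER)):
    avg = batting_average(p["hits"], p["at_bats"])
    obp = on_base_percentage(**p)
    print(f"{name}: AVG {avg:.3f}  OBP {obp:.3f}")
```

The made-up "patient" hitter has the lower batting average but reaches base more often, which is the kind of gap between a popular metric and a predictive one that the Moneyball story turns on.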

 

In a sense, you could boil the whole day down to one major point: Make analytics a key aspect of your corporate strategy and leverage data to determine the critical metrics for your business - and don’t delay. As Davenport recently told a crowd at the SPSS user conference, "There's not much time to spare because somebody's going to become your analytics competitor." Or better yet, in the words of Bill James, the sabermetrics genius who inspired Billy Beane:

 

"There will always be people who are ahead of the curve, and people who are behind the curve. But knowledge moves the curve."

 

 

Ellen Rubin

 

 


by Ellen Rubin - Netezza, Vice President of Marketing

 

 

"Hang on - It's starting again
Hang on - There's no shelter from the wind
Hang on - Like a fire from the sky
Winds of change are blowin' by"

- closing chorus from "Winds of Change", by The Jefferson Starship and Grace Slick, 1982
(YouTube video: http://www.youtube.com/watch?v=Z50G07A6HzE)

Witches, Elvis impersonators, bikers and the odd clown rush by, while outside, strong winds are blowing and the sky is dark and stormy.

 

Yes, it's Halloween at The Data Warehousing Institute's Orlando conference, and as 600 attendees try to celebrate without their families (actually, no one seemed too upset), Hurricane Noel is blowing in.

In fact, for the data warehouse community the storm has already hit. The appliance revolution has taken place and the impact is causing some extreme after-shocks. The talk around the halls was about appliances, and there were two half-day courses dedicated just to this topic. One was led by Richard Winter and Rick Burns of WinterCorp, providing an overview on appliances as well as a comparison of the different architectures and products on the market.

 

There’s certainly a need for this kind of information. Since Netezza launched the appliance category, built a community with well over 100 large customers and became publicly-listed, pretty much every major vendor in the industry has launched its version of an appliance, and at the TDWI show, several new players were clamoring for attention. It's pretty confusing for people who are just beginning to consider the appliance approach. Unlike the attendees at Netezza user conferences and events, the typical TDWI attendee is probably thinking something like this: "Boy, I’ve been hearing a lot about appliances lately, there seems to be a lot of news about them from TDWI and in the press, and Gartner says they're becoming mainstream. I better find out more about what they really can do and whether I need to think about this for my organization."

 

I guess when you've been evangelizing about the appliance concept for more than five years it's good to be reminded that in many ways, this is still a new frontier.

 

Back at the WinterCorp course, Richard talked about the market trends that have created a need for appliances, and made the point that "what you really want is to answer any question on any level of your data at any time." I couldn't agree more. Actually, he told a joke that really made the point: A doctor, a lawyer and a statistician went deer hunting. The doctor shot at a deer and was two feet too high and to the right. The lawyer shot and was two feet too low and to the left. The statistician said, "No need for me to shoot - according to the statistics, I've already hit the deer!" In case you missed the punchline, Richard added, "Sometimes, highly aggregated data does not get you the right answer for many business questions." Again, couldn't agree more: all the data, all the time is what appliances are about.
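The joke can be reduced to a few lines of arithmetic. The sketch below is only an illustration: the coordinates encode the two misses from the story, while the half-foot target radius and the function names are assumptions added for the example.

```python
import math
from statistics import mean
from typing import List, Tuple

Shot = Tuple[float, float]  # (horizontal, vertical) miss distance in feet

# The doctor: two feet high and right. The lawyer: two feet low and left.
SHOTS: List[Shot] = [(2.0, 2.0), (-2.0, -2.0)]


def aggregate_offset(shots: List[Shot]) -> Shot:
    """The statistician's view: the average miss across all shots."""
    return (mean(x for x, _ in shots), mean(y for _, y in shots))


def shots_on_target(shots: List[Shot], radius: float = 0.5) -> int:
    """The detail view: how many individual shots landed within radius."""
    return sum(1 for x, y in shots if math.hypot(x, y) <= radius)


print("average offset:", aggregate_offset(SHOTS))
print("shots on target:", shots_on_target(SHOTS))
```

The averaged offset sits exactly on the target even though no individual shot came close, which is the gap between aggregated and detailed data that the joke is pointing at.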

 

I also had a chat with Wayne Eckerson, a leader at TDWI and expert on predictive analytics. I was describing how, through our Netezza Developer Network, Netezza is opening up our appliance to developers all over the world who are doing new and cutting-edge analytics "on stream," leveraging Netezza’s streaming architecture. Wayne pointed out that there has really been an evolution over time from purpose-specific desktop systems just for the analysts in an organization who do the heavy quant work, to now, embedding some of that functionality in the data warehouse, and eventually, being able to combine it with the more traditional reporting and analysis work done by BI users. The new frontier in appliances is all about this broader role of analytics - including more groups of users, types of data and analytic algorithms - that can be done "inside the appliance," and as usual, Netezza customers and partners are at the forefront of the revolution.

 

 

Ellen Rubin

 

 


by Vishal Daga - Netezza, Director of Partner Marketing

 

"You have a little bit of talent, a certain amount of good fortune and a lot of hard work in pursuit of whatever truth you can find in it, and if you are really lucky, a terrific partner and I have that and those four things worked out for me."
- Donald Sutherland, film and television actor and star in such hits as The Dirty Dozen,
M*A*S*H, Cold Mountain and Pride and Prejudice (1935-)

Netezza was a sponsor of the recent Business Objects Insight User Conference in Orlando, and I wanted to share some of my quick impressions of the event.

In listening to the executive keynotes and various track sessions, what crystallized for me was a new level of appreciation for Business Objects’ (and in general the BI industry’s) drive towards improving the BI user experience. I refer to ‘user experience’ in a very broad sense here encompassing themes such as functionality (e.g. allowing users easy access to unstructured data alongside structured data), ease-of-use (e.g. interface design and adoption of data visualization techniques), flexibility (e.g. better integration with other desktop tools such as Adobe, new business models to consume BI as a service) and performance (e.g. use of smart caching). This to me, more than anything else, marks the impending arrival of mainstream operational BI. As BI adoption and penetration in organizations increases, there is real demand/need to expand the use cases that the BI tools can address, while minimizing the level of user expertise required to operate the tools.

And the SAP acquisition of Business Objects also provides a glimpse into the future of operational BI. With Business Objects’ BI expertise, and SAP’s know-how of the world of business processes, these companies have the opportunity to accelerate the creation of an entirely new functional experience for the organization that combines their areas of expertise in one seamless user experience, and renders BI truly operational.

Something else also struck home for me. This is more of an observation of what I thought wasn’t said, or better put, wasn’t underscored with enough emphasis. While it’s really exciting to see some of the new functionality that is in development and think through the possibilities, BI users today are hindered by significant challenges. These manifest themselves in several forms – slow-running reports and queries, inability to access data at the right depth to run analyses that are of interest, and long delays associated with waiting for the technical infrastructure to be adapted to changing business needs. Across the Netezza customer and prospect base, and regardless of BI tools in use, these are themes that we hear over and over again. As I sat there and listened to the demos and presentations of what’s coming tomorrow, I could not help but think about how much more powerful the message could be, if users weren’t stuck struggling to use what they had access to today.

The point above may seem a bit Netezza or database-centric, but I had the chance to talk to several people who attended the conference and they all shared the same feeling on one level or another. At the end of the day, the BI tool sits on top of a database, and if the foundation is weak – i.e. the database cannot keep up with the user – it becomes the choke-point. The only options at the tool level then are for the BI tool to work around the database (drives complexity and cost), and/or impose analytic limitations on the user (dilutes the value of BI). The tradeoffs inherent to these approaches are very limiting and/or inefficient and cannot scale effectively.

Looking forward on the BI horizon, the mainstream arrival of operational BI will add to the existing challenges that organizations face as the number of users increases. Therefore it will become even more imperative for organizations to think through their end-to-end BI infrastructure, from database to user BI tools, and ensure that the right pieces are in place to capitalize on the true promise of BI.

Vishal Daga



"The 'Core' of the Matter"

 

"Give me insight into today and you may have the antique and future worlds."
- Ralph Waldo Emerson (1803-1882), lecture, August 31, 1837, delivered before the Phi Beta Kappa Society, Harvard University

In my last post, I covered the kickoff of the 3rd annual International Netezza User Conference (NUC07) sessions. This posting and one to follow it will cover days two and three, hopefully giving readers a sense for the high-level discussion points and content of the conference.

By early Tuesday morning, nearly all of the 500 attendees to this year's user conference were on scene and by the time we began the morning's activities, there was a buzz of excitement in the air. Some people had already seen some of the things Netezza was unveiling around Streaming Analytics with an early-morning tour of the "Netezza Developer Network Showcase" area and were anticipating what might be discussed in terms of Netezza's "Company Vision" and "Technology Direction" presentations to follow.

Day 2: Tuesday, 25th September
Tuesday morning's formal agenda kicked off with an opening address from Netezza President and COO, Jim Baum. Jim provided an overview and insight into Netezza's vision for what is possible in analytic appliances, now and into the future.

[Image: Jim Baum]

With a combination of statements about present trends, a vision for the future and some interactive, live demos, Jim talked about "the art of the possible" and made the case for a new approach spanning "traditional BI" and "mission critical analytics". He discussed Netezza's family of streaming analytic appliances in a vision fulfilled by Netezza's work with partners and others in the broader Netezza Community.

[Images: three video stills of Bob Doyle]

The live demos spanned scoring, geospatial and image analysis. Jim's demo even managed to help nab the "culprit" of SPUBox-gate, catching Bobby "White Shoes" Doyle in the act of making off with a SPUBox (more on them later), as you can see in the above three clippings from the video that captured the act in progress. Nearly all of the demos came off without a hitch, and the audience was treated to a taste of just what is possible with streaming analytics.

Michael Sporer

Following Jim was VP of Technology Michael Sporer, with his presentation of technology direction and how it may impact Netezza's product portfolio direction in the days ahead. Michael provided the perspective of the keystone hardware, software and networking technologies and innovations on which Netezza relies, their relative potential for advancement over the next several years, and the influence we anticipate them having not only on Netezza's product direction but, more importantly, on the emancipation and dissemination of advanced analytics from the cloistered sanctum of the data center to the edges of the enterprise.

Most of the remainder of Tuesday was filled with business and technical track sessions, provided by Netezza customers, partners and employees. But a room that received a lot of attention throughout the day on Tuesday and into Wednesday was the Netezza Developer Network (NDN) showcase. The showcase, with its ten SPUBoxes suspended from neo-industrial scaffolding, was transformed from a store of potential energy around streaming analytics in the early morning hours to a bustling center of kinetic activity and energy by noon. Attendees were able to see first-hand some of the ideas and applications that are part of the Netezza Developer Network, spanning ten different functioning demonstrations.



NDN Showcase - Before & After

After the morning general sessions had concluded, we spent the bulk of the remainder of the day in business and technical track sessions. Customer case studies were presented by eight customers, including Virgin Media, Nationwide, NYSE Euronext, Ross Stores and Guy Carpenter alongside four partner case studies presented by Business Objects, MicroStrategy, Cognos and Unica. These business tracks included information about customers with the following characteristics:

  • loading billions of rows of data per day, in excess of 1 TB in total
  • growing data volumes at 100%-to-200% annualized rates
  • realizing 10s of millions of dollars in revenue returned to their businesses
  • supporting up to 10,000 users accessing the NPS appliance
  • slashing SLA data availability times by over two hours
  • performing near-real time operational analytics on data loaded every five minutes into the data warehouse
  • making real-time least-cost traffic routing decisions based on the freshest possible data
  • delivering new enterprise-wide applications & reports at better than a two-per-month rate

In addition, Netezza staff - in some cases with customers and/or partners jointly presenting - hosted eight technical track presentations. These ranged from current capabilities, such as how customers are using the NPS appliance in large, innovative, operational BI deployments, to recent Release 4.0 enhancements and performance measurements, to best practices for migrating to Netezza. These sessions even included an in-depth view of a moment in the (brief) life of a query inside the NPS system and Netezza's near-term product direction.

One of the pleasant surprises, for me at least, was the level of interest in the technical track sessions, many of which were filled to capacity. Based on this and feedback we received from the conference, I'm sure we'll be looking at ways to provide more possibilities for attendees to attend more of the technical track sessions in 2008.

We finished the day Tuesday with the party/event of the conference as attendees were taken by motorcoach to Cyclorama in Boston's South End for a night of food, music and spirited participation in the video games. The games included virtual skateboarding, auto racing, skiing and several instances of Nintendo Wii games of golf, tennis, baseball, bowling and boxing. All told, at least five Wii consoles along with other sundry electronic "tools" were given away at random to lucky winners that night.

 

Cyclorama

One of the most enjoyable parts of the Cyclorama event for me was watching the various interactions of people with the Wii electronic game "appliance". Because of the easy interactivity it enables, this device is disrupting the video gaming industry - and its captivating attraction was quite evident at our event. Gaming "pros" and neophytes were interacting with it, and providing all the body English one would expect on a tennis court, baseball field or bowling alley. Observing the panoply of those motions across 5-6 Wii stations from about 100 feet away was pure hilarity.

Following Cyclorama, it was back to the hotel (or on to other Boston evening venues to be discovered), in preparation for Wednesday's day-long activities. More on those in my next "Gateway to Insight" posting...

 


"So Good, So Good!"

"Where it began,
I can't begin to knowin'
But then I know it's growing strong

"Was in the spring
And spring became the summer
Who'd have believed you'd come along.

"Hands, touchin' hands
Reachin' out, touchin' me touchin' you

"Sweet Caroline
Good times never seemed so good
I've been inclined
To believe they never would."

- Neil Diamond, lyrics from "Sweet Caroline", 1969

After a long, self-imposed quiet period hiatus, "Thoughts from..." is officially back on the air. We've been away for quite some time and in our absence, as you may have noticed, Netezza (ticker symbol: NZ) enjoyed a successful initial public offering on the NYSE Arca exchange back on the 19th of July. More recently, Netezza has just come off four major successes, including its 3rd annual International Netezza User Conference sessions as well as the recent launches of the NPS® Release 4.0 system software, the redefined & broadened role of analytic appliances and the Netezza Developer Network (NDN).

[Oh, and in case I forget to mention it, we are in a bit of a celebratory mood around Netezza headquarters these days as a result of the Boston Red Sox' return as champions of baseball a few days ago. Since Sunday evening, there's been nearly wall-to-wall Red Sox news and parade coverage around here in the local media, interspersed with the occasional news about the New England Patriots. If you were wondering about the title of this posting and the opening quote, perhaps this YouTube video I found online will help: Opening Day at Fenway Park, April 2007. "Sweet Caroline" has become as much a good luck charm/tradition in the middle of the 8th inning at Fenway Park as "Take Me Out to the Ballgame" is at every park during the '7th inning stretch'.]

In coming days I'll discuss the strengths and benefits of release 4.0, Netezza's streaming analytic™ appliances and the NDN, but in this and the next one or two postings, I'll concentrate specifically on the highlights from the 2007 User Conference.

Jit Saxena - Welcome to 2007 Netezza International User Conference

There were over 500 customer, partner, prospect and analyst attendees at this year's conference (about 50% growth over 2006), spending 2 1/2 information-packed days learning, sharing ideas, networking and "gaming" with us in the glorious 'Indian Summer' sunshine of late-September in Boston.

Jit Saxena
NUC2007 Registration
Sharing Ideas

Day 1: Monday, 24th September
The event kicked off with a "Welcome" message from Netezza CEO & Co-founder, Jit Saxena, followed by two strong keynote speakers, Catherine Kinney of the NYSE and author Geoffrey Moore:

Catherine discussed the growing global nature of NYSE's business along with the demands this globalization of exchange markets is putting on business intelligence and near real time analytics in her business. Through acquisition, partnership and organic growth, the NYSE has demonstrated significant market expansion over the past several years and their strong movement into the markets has ratcheted up their need for robust, high-performance business analytics on a global basis. She also spoke of Netezza's listing as the first technology company on the NYSE Arca exchange as another sign of the increasingly robust actions NYSE has taken to bring innovative, high-tech companies into their trading platforms.

Catherine Kinney

Geoffrey put the primary themes of his recent book, Dealing with Darwin: How Great Companies Innovate at Every Phase of Their Evolution, into the framework of technology companies in his talk, entitled "Business Network Transformation: The Next Big Challenge for IT". His discussion included the importance of understanding the difference between core- and context-level capabilities across a business' domain of goods and services and the need to continue to innovate at the core while working within a broader business network to deliver on the context. Furthermore, he told the audience of 500 that it is incumbent on corporate leadership to understand that over time, everything that is "core" eventually becomes "context" and is neutralized, so the engine of innovation cannot sit idle; continued focus on differentiated innovation is key. "It's crucial," he said, "for companies to focus on the core to invent new, differentiated offerings and then to deploy that differentiation at scale."

Geoffrey Moore

I'll be coming back to Geoffrey's keynote presentation with a posting containing some additional reflections on some of his key points, particularly as they pertain to analytic appliances and Netezza, specifically. For now let's just say that he got the conference off to an outstanding, spot-on start for the evening, followed by noshes, drinks and networking 'by the boatload' at the Reception/Partner Pavilion just outside the main hall.

Networking with Friends Noshing at the Sushi Bar Networking

My next posting will cover the "core" of the conference - days two and three - with some of the highlights from the more than 30 Netezza, guest, customer and partner presentations that were covered.


by Vishal Daga - Netezza, Director of Partner Marketing

 

"Give me golf clubs, fresh air and a beautiful partner, and you can keep the clubs and the fresh air." - Jack Benny, comedian, author and actor (1894-1974)

There are a few partner user conferences and groups coming up, which prompted me to reflect on ones we had attended at the end of 2006. Netezza participated in the Business Objects and SPSS annual user conferences and both of these events were quite successful - they sparked conversations and fostered introductions. Events like these are really beneficial because they provide a forum that lets us build greater awareness within our partner communities. Ultimately, this results in the development of stronger alliances and accelerates the development of more compelling joint value propositions that leverage the performance and/or simplicity of Netezza in new ways. I have provided some highlights from these events below:

  • Business Objects Insight Americas 2006: We had one of our appliances at this event and were able to showcase a concept solution that demonstrated what an integrated BI solution could look like. The particular solution we demoed packaged Business Objects Reporting and Analysis Applications along with Netezza's data warehouse appliance and delivered in one system a pre-integrated complete BI environment that was capable of addressing the needs of many mid-market customers. In addition, Durgesh Das, BI Manager at CompuCredit also presented a compelling case study as to why his organization selected Netezza. Durgesh touched on many user anecdotes around performance improvements and administrative simplicity that highlight the real impact of Netezza.

    The links below lead to video vignettes of CompuCredit users in different roles - Business Analyst, DBAs, IT managers - talking about the value of Netezza from their individual perspectives. Pretty compelling stuff!

  • SPSS Directions 2006: According to many, the next big thing in BI will be around data mining and predictive analytics - not just reporting on data but actually using historical data to predict what will happen in the future. Clementine, SPSS's data mining product, delivers optimized connectivity to Netezza today, so Clementine customers can really take advantage of Netezza's performance to tackle their predictive analytic needs. We had many engaging conversations with both Netezza and SPSS customers at this event who were looking to develop/deploy deeper more effective predictive analytic capabilities. Netezza and SPSS continue to work closely together to develop tighter integration capabilities that will help to put a new level of predictive analytic capabilities in the reach of enterprise organizations.

 

 

On to the next ones!

Vishal Daga

 


Performance Multipliers for Data Stream Processing

 

Yes, star crossed in pleasure the stream flows on by
Yes, as we're sated in leisure, we watch it fly.

And time waits for no one, and it won't wait for me
And time waits for no one, and it won't wait for me.

Time can tear down a building or destroy a woman's face
Hours are like diamonds, don't let them waste.

Time waits for no one, no favours has he
Time waits for no one, and he won't wait for me.


- The Rolling Stones, Time Waits for No One (Jagger/Richards),
from the album, "It's Only Rock 'n Roll" (1974)

We've dedicated the last several postings to the Field Programmable Gate Array (FPGA) - a key performance multiplier in the NPS® system architecture. Last time out I talked about the market growth of FPGAs as a mainstream technology in multiple applications settings outside of data warehousing.

This is the last of a three-part series on FPGAs, spanning the following topics:
  • "So, What Is an FPGA?" - aimed at providing a most-basic introductory primer of the technology, its capabilities and its promise (posted 28th November).
  • "FPGAs in the Mainstream & Some of Their Practical Uses" - a look at the use of FPGA technology across a broad swathe of market applications. (posted 20th December)
  • "OK - How Does Netezza Get a Performance Edge from FPGAs & What Does the Future Hold?" - linking FPGA capabilities to the benefits it brings to the NPS system and possible future directions it could enable.

Today, we'll dive in a bit into how FPGAs enable high performance at low cost in the NPS appliance, and what types of applications the technology may enable for the NPS in the future.

OK - So how does Netezza get a performance edge from FPGAs?
A critical element of Netezza's architecture is the implementation of direct-attach storage in a massively parallel array of query processing elements. Called Snippet Processing Units (SPUs), these query processing elements collocate CPU, memory and FPGA with each disk drive. The SPUs are arranged in an array that can be as small as several dozen or as large as nearly a thousand in today's NPS systems.

A critical component of overall data warehouse performance lies in the disk bandwidth that can be applied to a given problem and in turn, the level of processing horsepower that can be applied to that data. In short-hand terms, Netezza refers to its architectural approach as "bringing the query to the data." Rather than moving vast amounts of data across high-speed interconnecting (and sometimes non-blocking) networks as other systems do, the NPS system reduces the data to the information essential to the query as close to the disk source as possible.

The focus of the architecture is to enable streaming processing of the data: eliminating unneeded data as early as possible and processing the rest as rapidly as it can be read from the disk drives. That's where the FPGA comes in. The FPGA in a Netezza SPU has two primary roles.

In the first, it acts as the disk controller, controlling all of the disk read and write activities on the SPU.

In the second, the FPGA efficiently applies low-level database primitives, offloading significant work from other processing elements in the system. As table data streams from the disk on the SPU, the FPGA applies the transaction visibility list (only transactions that were current in the database at the start of the query are visible to it) and then applies the appropriate column projection and row restriction rules. Then only data that satisfies the visibility, projection and restriction rules is sent from the FPGA to the memory and CPU on the SPU for additional processing, if necessary.
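The three steps described above (visibility, restriction, projection) can be sketched as a streaming filter. This is only an illustration in Python, not Netezza's FPGA logic: `StoredRow`, `stream_filter`, the `created_txn` field and the sample data are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Set


@dataclass
class StoredRow:
    created_txn: int            # hypothetical: id of the transaction that wrote the row
    values: Dict[str, object]   # column name -> value


def stream_filter(
    rows: Iterable[StoredRow],
    visible_txns: Set[int],
    restrict: Callable[[Dict[str, object]], bool],
    project: List[str],
) -> Iterator[Dict[str, object]]:
    """Yield only the projected columns of rows that pass visibility and restriction."""
    for row in rows:
        if row.created_txn not in visible_txns:   # transaction visibility check
            continue
        if not restrict(row.values):              # row restriction (WHERE clause)
            continue
        yield {col: row.values[col] for col in project}  # column projection


rows = [
    StoredRow(1, {"state": "FL", "gender": "F", "age": 7, "zip": 32605}),
    StoredRow(2, {"state": "NY", "gender": "M", "age": 7, "zip": 10001}),
    StoredRow(9, {"state": "FL", "gender": "M", "age": 6, "zip": 32605}),  # written after the query began
]
survivors = list(
    stream_filter(
        rows,
        visible_txns={1, 2},
        restrict=lambda v: v["zip"] == 32605,
        project=["state", "gender", "age"],
    )
)
print(survivors)
```

Only the first row survives: the second fails the restriction and the third was written by a transaction the query cannot see. Everything downstream of the filter handles one narrow record instead of three wide ones.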

Adding to the performance boost provided by the FPGA in general, another important system feature known as "Zone Map" is realized in a software module of the NPS system known as the storage manager. We think of Zone Maps as an anti-Index in Netezza, telling the system what data not to read. For each numerical column, the Zone Map can take advantage of any natural ordering of the data in the table (e.g., date, customer number, order number, etc.) and reduce the number of data blocks read in response to a query to only those required. For example, if a query were looking for information about transactions that took place between the beginning of September and end of October, the Zone Map function of the storage manager would direct the FPGA to read only those data blocks containing records from September or October, thereby eliminating the need to perform a full disk scan for each query.
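The "anti-Index" idea can be sketched as a per-block minimum/maximum lookup. This is a simplified illustration under stated assumptions, not the storage manager's actual data structure: `ZoneEntry`, `blocks_to_read`, the one-block-per-month layout and the year 2006 are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ZoneEntry:
    block_id: int
    min_value: date   # smallest column value stored in the block
    max_value: date   # largest column value stored in the block


def blocks_to_read(zone_map: List[ZoneEntry], low: date, high: date) -> List[int]:
    """Return ids of blocks whose [min, max] range overlaps the query range [low, high]."""
    return [z.block_id for z in zone_map if z.max_value >= low and z.min_value <= high]


# Hypothetical table whose rows happen to be stored in date order, one block per month.
zone_map = [
    ZoneEntry(0, date(2006, 8, 1), date(2006, 8, 31)),
    ZoneEntry(1, date(2006, 9, 1), date(2006, 9, 30)),
    ZoneEntry(2, date(2006, 10, 1), date(2006, 10, 31)),
    ZoneEntry(3, date(2006, 11, 1), date(2006, 11, 30)),
]
needed = blocks_to_read(zone_map, date(2006, 9, 1), date(2006, 10, 31))
print(needed)
```

In this sketch a September-to-October query touches two of the four blocks. The saving depends on the natural ordering mentioned above: if the dates were scattered randomly across blocks, every block's range would overlap the query and nothing could be skipped.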

The FPGA implements the read of the appropriate disk blocks and additionally filters and projects only data relevant to the query. This can improve query-processing rates by two or more orders of magnitude.

FPGA as performance multiplier: an example
As an example, consider the following simple SQL query:

-- eight_billion_row_table is a placeholder name for the 8-billion-row table
SELECT state, gender, age, COUNT(*)
FROM eight_billion_row_table
WHERE dob < '04/01/2000' AND dob > '12/31/1999' AND zip = 32605
GROUP BY state, gender, age;

In this example, the storage manager and FPGA would use Zone Maps to first limit the disk read to only those disk extents with dates of birth occurring in the three-month period of January through March 2000, rather than the full table. Then, as the data was read from the disk, the FPGA would further restrict the rows returned to those records within the three-month range and with a zip code matching the query. Finally, the column data projected to the memory and CPU would be limited to only the state, gender and age information of each record. If the table in question contained 100 or more columns for each record, this could represent 3% or less of the column data. If one assumes the table contained birthdate information for just the last seven years, this would dramatically reduce the row count of data delivered to memory/CPU as well - specifically by more than 25:1, or 3 months out of 84.

Overall, for this example, the combination of Zone Maps with FPGA projecting and filtering of the data would result in just 0.1% of the full table data being sent to the memory and CPU for additional processing.
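The 0.1% figure can be checked with a few lines of arithmetic. The sketch below assumes exactly 100 equally sized columns and birthdates spread evenly over 84 months, and it ignores the additional reduction from the zip code filter, so it gives an upper bound for this example rather than a measurement.

```python
from fractions import Fraction

column_fraction = Fraction(3, 100)   # state, gender, age out of an assumed 100 columns
row_fraction = Fraction(3, 84)       # 3 months of birthdates out of 84 (seven years)

combined = column_fraction * row_fraction   # share of table data passed to memory/CPU
row_reduction = 1 / row_fraction            # row-count reduction from the date range

print(f"rows reduced {row_reduction}:1, data passed on: {float(combined):.3%}")
```

Under those assumptions the date range alone cuts the rows 28:1, and the combined share comes to roughly 0.107% of the table, in line with the figure quoted above.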

From this, you can see how the FPGA acts as a Performance Multiplier for query processing. Before a single CPU cycle or RAM memory location has been used, the FPGA has reduced the overall data required for processing by multiple orders of magnitude.

And what does the future hold?
As suggested by Keith Underwood of Sandia National Labs, the price-performance and power efficiency of FPGAs look like they will enjoy an order-of-magnitude advantage over the 'x86' CPU technology roadmaps by the end of the decade. Using their performance and I/O advantages, FPGA vendors are already able to embed CPU core technology (Xilinx - "Embedded Processing" & DSP-FPGA.com - "FPGAs - Poised to play in embedded applications") directly inside an FPGA.

 

Projected FPGA Roadmap Capabilities

Source: Composite of FPGA Vendors' Historical & Roadmap Data

We at Netezza fully expect the FPGA advantage to increase over time. Based on suppliers' and research technology roadmaps, by the end of the decade we are anticipating 5X enhancements in each of the following areas:

  • cost
  • available logic
  • functionality per unit of power
  • speed

 

Xilinx' Powerful Virtex2Pro FPGA

Source: JPL/NASA Tech Brief, p. 12

The result will be extended, differentiated functionality introduced into current and/or future versions of FPGA technology, further increasing the price/performance and capability advantages of the NPS data warehouse appliance. Possibilities for expanded functionality include, but are certainly not limited to, in-line, streaming data compilation or encoding, advanced filtering and analytic logic operations ("Legacy FPGA Designs Can Be Migrated to Achieve Better Performance"); and even much more powerful pre-processing of query data by embedding CPU processing capabilities directly within the device ("FPGA Advances Pave The Way Toward True SoC Solutions"). If, how and when these may appear in the Netezza technology roadmap is still to be seen. However, on the strength of the FPGA technology roadmap and the technology's significant benefit to the streaming processing needs of data warehousing, it's clear to us that the FPGA will continue to play a major role with Netezza for the foreseeable future.

The technology trend for high-performance systems is clear. In more and more industry domains ("In Praise of FPGAs"), low-power programmable logic devices are going to act as either performance accelerators or even the primary performance engine. By offering high performance, low power requirements and highly flexible reprogrammability, the use of FPGAs promises to continue as a strong industry trend.

In short, we believe that the advantages that FPGA technology brings to the NPS system have 'legs'. We plan to continue to exploit those advantages for the benefit of our customers and don't intend to hide them "under a bushel" any longer.

 



Data Warehouse Appliances: definition & evolving place in the market

 

"It depends on what the meaning of the word 'is' is." - Bill Clinton, President of the USA (1993-2001)

As our first act of commenting on the industry, we'd like to address a topic that has seemingly stirred up quite a bit of emotion and controversy of late: just what IS a Data Warehouse Appliance (or "DWA" for the acronym-inclined)?

But first, the punch line: it's not the definition of what a DWA is that matters, but - taking things a bit further - what deploying a DWA will mean to customers who use them in their analytics and BI scenarios.

Plenty of opinions to go around
In the world of BI and data warehousing, if there's one area that's nearly become an industry segment unto itself these days, it is the field of those industry analysts, pundits and other experts trying to define just what a "Data Warehouse Appliance" really is.

It's no wonder. Over the past three-plus years, the Data Warehouse Appliance market has blossomed, indeed. It has become a significant and growing segment of the data warehouse systems market. Since Netezza's initial entry & coining of the terminology for this space in 2002/2003, a number of new entrants (from industry behemoths to the smallest, new start-ups) have tried to stake their claims to it. Enter the industry pundits, to help us all by defining and making sense of things.

A growing market segment that's "here to stay"
Claims from long-standing incumbent data warehouse systems providers notwithstanding, I would hazard that in analyzing clippings from before 2002, you would be hard-pressed to find any references to a DWA in the media or analysts' market predictions about the future. Certainly, companies have built systems expressly for use as data warehouses in the past but my searches have not revealed any claims on the notion of an "appliance" in this space before the dawn of the 21st century.

Now we have an established and growing market category for data warehouse systems. According to IDC's Dan Vesset, "IDC expects the market for DW appliances to grow at a CAGR of 70% over the next 5 years from the estimated 2005 level of $75 million."
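To put that forecast in concrete terms, the growth can be compounded out. This is a back-of-the-envelope sketch that assumes the 70% rate compounds annually from the 2005 base for exactly five years; the variable names are illustrative only.

```python
base_2005_musd = 75.0   # estimated 2005 market size, in millions of dollars
cagr = 0.70             # compound annual growth rate from the IDC quote
years = 5

# Compound growth: size_n = base * (1 + rate) ** n
projected = base_2005_musd * (1 + cagr) ** years
print(f"${projected:,.0f} million")
```

Under those assumptions the quoted rate implies a market of a little over $1 billion at the end of the five-year period.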

How does Netezza see the definition?
Today, the definition of a data warehouse appliance is seemingly crying out for clarity, with a growing number of vendors' marketing claims making things hazier. As the pioneer and the recognized global leader in the DWA market with over 75 paying customers under our belt, when it came to defining just what a DWA is we felt, "Who is more qualified than us?" - so we decided to weigh in with our views, as follows.

 

We define data warehouse appliances as follows:
  • Purpose-built for performance - from a single vendor; combining server, database, storage and network in an architecturally-integrated system built specifically for high-performance data warehousing. This includes dedicated hardware for processing large data volumes faster than any other data warehouse solution in the market.
  • Simple to use - like a kitchen appliance, this should be dramatically easier than traditional systems. Easy to install, deploy and maintain - with installation in hours and the ability to have a large DW up and running in a day or so. No tuning, indexing, partitioning, aggregations, etc. required.
  • Low acquisition and ongoing costs - appliances are just less costly to own and maintain - even for a large EDW implementation of 100 terabytes or more.
  • Enterprise compatibility - high availability; plug n' play integration; standards-based interfaces; fully integrated with all major Data Integration, Business Intelligence and advanced Data Analytics vendors.
  • Low power, cooling and space consumption - delivering high-performance in a compact footprint without blowing your data center's budget for electrical power and without forcing your IT director to implement "skip-a-row" equipment patterns to manage the data center cooling.

The key operative point here is that a DWA is fundamentally performance-driven. It allows businesses to have more clarity and more depth of analysis across ALL of their data much faster than they have been able to in the past. The fact that a DWA also delivers simplicity and economy puts that performance well within reach for most enterprises.

Simply put, a true data warehouse appliance will put the high performance of a supercomputer into an enterprise's data center at a cost-effective price point. And it will do so with an ease of installation, use and maintenance that makes possible much more powerful analyses and more rapid development of ideas than other systems can provide.

DWA Measures - according to the experts
A recent TDWI survey indicated a majority of members surveyed understand that a data warehouse appliance is defined as server hardware and database software built specifically for data warehousing - not just a bundle of commodity hardware and generic software - and that the benefits of this approach are greater performance and lower cost. But there have been many attempts to define and measure the "goodness" of DWAs. The table below contains but a few.

Furthermore, Robin Bloor and Philip Howard of the Bloor Group have set off down a path to make the definition and benefits of various DWA approaches more clear - aiming to do so even more completely in early November.

 

Source: Philip Russom, TDWI
Characteristics (survey results):
  • Server h/w & DBMS s/w built specifically to be a DW platform (53%)
  • Any server h/w & DBMS s/w bundled to create a DW platform (14%)
  • Either definition (13%)
  • Don't know (19%)
Benefits:
  • Pre-tuned for DW Use
  • Fast Query Performance
  • Reduced System Integration
  • Fast Installation

Source: Dan Vesset, IDC
Characteristics (two primary types for data warehouses):
  • Complete Stack DWA (combined h/w & DBMS s/w)
  • Virtual DWA (DBMS s/w bundled with clustered commodity h/w)
Benefits:
  • High Performance & Scalability
  • Lower Total Cost of Ownership
  • Lower Maintenance Costs
  • Highly Scalable Business Analytics Platform

Source: Dan Linstedt, TDWI/Myers-Holum (Mar06) & TDWI/Myers-Holum (Sep06)
Characteristics and benefits (multiple entries, but most recently):
  • Web-based Thin client GUI admin
  • API for reporting, logging, admin, etc.
  • Embed s/w at h/w & firmware levels
  • Capable of transformation, data mining, loading & reporting
  • Notify admins & end-users of suspected security breaches
  • Web-enabled firmware updates
  • Truly plug & play
  • NOT part of a cluster, IS part of grid
  • Self-contained
  • Nine 9's uptime
  • (Near) linear scalability
  • High availability
  • Fast loading
  • Compression & Encryption
  • Plug & Play MPP units
  • SQL query interfaces
  • Super fast data access
  • Low cost per TB options
  • Plug & play fail-over
  • Automatic self-updating
  • Remote monitoring
  • Compliance for data

Source: Charles Garry, DMReview
Characteristics:
  • Combined price/performance of...processors, open-source software and low cost disk storage in a single cabinet
  • Purpose-built with massive #s of CPUs to handle analysis against terabytes of data quickly and simply
Benefits:
  • Total Cost of Ownership: The Key Differentiator
  • Faster time-to-production & time-to-value
  • Easier maintenance with "Load and Go" simplicity - with no required physical db design, tuning, hints or indexes

Source: Kim Stanick, Baseline Consulting
Characteristics:
  • Packaged solution of h/w & s/w that is pre-configured to perform DW workloads consistently well, out of the box
  • Acquired as a single unit rather than a collection of components to be assembled
  • Communicates via open standards (i.e., ODBC & SQL-92)
Benefits:
  • "Pre-integrated high performance": engineered for optimal performance on typical DW workloads
  • Enables enterprise IT group to offload engineering & tuning burden to the DWA vendor's design
  • "Data warehousing hitting its stride" means DWAs appeal to a broader set of companies
  • Just like the evolution of the auto: "You can build your own car, but most people don't because they are readily available, affordable and get the basic job done."

Source: Mike Schiff, TDWI/MAS Strategies
Characteristics:
  • Pre-integrated h/w, DBMS s/w & storage
  • Optimized for very rapid query & retrieval
Benefits:
  • High performance
  • Low-cost
  • Quick to implement
  • Ease of use
  • Reduced DBA Support Requirements
  • "A proven offering"

Why not bundles or "balanced" blade-servers?
Simply grouping multiple systems in loose affiliations won't really answer the mail here. The inefficient movement of huge blocks of data for analysis limits performance, and the complexity of managing disparate systems, each with its own upgrade and compliance path, will alone make this approach difficult to manage. So will the fact that the systems will evolve independently and not necessarily in alignment with one another.

Unless it is rearchitected specifically to address data warehousing, a "shrink-wrapped" bundling of products from among a major player's broader suite of systems will be similarly limited in performance and operation. And it too will have to deal with the effects of each product's evolution pulling in a different direction.

Where is this all going & why does the DWA definition even matter?
The real issue of course, is that, to enterprise customers, the "true definition" of a DWA doesn't really matter at all; what matters is the impact that taking a DWA approach to their data warehousing needs can have on their businesses.

What we've seen from customers' use of the NPS product family is that DWAs are changing the way businesses use their warehouse data today and in the near term, including the following -

  • enabling deep, unconstrained analytics on all of their business data, even in extremely busy mixed-workload scenarios;
  • changing the way they think about the staffing to support it and opening up the development of whole new advanced analytics applications;
  • changing the way they purchase data warehouse infrastructure; and
  • helping mid-tier businesses solve critical data warehouse needs in compact, fully-contained business solution appliances.

In the longer term, DWAs will fundamentally change the way people operate their businesses.

Look for us to provide more on this and our other views about the future of DWAs in upcoming postings.
