Thoughts from Inside the Box


April 28, 2008

"‘To be is to do.' - Immanuel Kant
"‘To do is to be.' - Jean Paul Sartre
"‘Do-be-do-be-do' - Frank Sinatra"

--Kurt Vonnegut, Jr. (Nov 1922 - Apr 2007)
In the news today: the Compress Engine

In 1783 Immanuel Kant wrote, "David Hume woke me up from my dogmatic slumbers," and revolutionized the way humanity thinks about metaphysics. Almost 220 years later, Netezza set out to achieve a similar goal - redefine analytics. When the first NPS® data warehouse appliance was introduced, the market released itself from yet another dogmatic slumber and realized that there is a different, better way to do data warehousing; a way without compromise, a way without limits.

Netezza has helped to reenergize the data warehouse market by creating and leading the data warehouse appliance category.

  • "Every time you turn around you see another industry that's facing a tidal wave of data and they need to understand what this data is saying. Many of them have data volumes in this range that they haven't been able to afford to analyze, as much as they'd like to. ... Netezza can deliver that analytic capability, and at a very attractive price." - Richard Winter, Winter Corporation, from Netezza will scale its appliance to petabyte range, InfoWorld (January 2008)
  • "This is what Netezza has done in the data warehousing market: it has totally changed the way that we think about data warehousing... So the bottom line is not just that Netezza's entry into the market was a black swan event but that that event has not ceased to unfold." - from Netezza: a black swan by Philip Howard, Bloor Group (October 2007)
  • "Appliances are here to stay and are revolutionizing the data warehouse industry." - from Business Analytics Appliances Are Here to Stay, by Dan Vesset, IDC (June 2006)
  • "The term data warehouse appliance was coined by Netezza, and this vendor has blazed a trail by proving the concept and educating the market." - from Defining the Data Warehouse Appliance, by Philip Russom, TDWI (August 2005)

Since 2002, Netezza has been repeatedly breaking the latency barrier and challenging the boundaries of data analytics. Since our first release, we have been continuously refuting the alleged mutual dependencies that became the building blocks of the industry's dogmatic misconceptions; namely the expensive nature of performance, the necessary complexity of the analytics architecture and the unavoidable limits of scalability. With today's announcement of the Compress Engine, Netezza disproves yet another myth - the inverse relationship between data compression and query performance.

The architectures of traditional data warehouses, steeped in a legacy of serving OLTP applications, were not designed to handle the ever-growing amounts of data, the larger and more complex user workloads and the shrinking data latency requirements that characterize the modern enterprise. Regulatory compliance, electronic commerce and the need to process and analyze all data in a matter of seconds have pushed the capabilities of traditional data warehouse systems to their limits. In reaction to the data capacity pressures, vendors introduced compression; not as an enhancement but as a compromise solution that allows for further data growth at the cost of processing performance.

Traditional compression approaches, used by several competing data warehouse vendors, typically pay for their space savings with degraded query performance. Netezza's addition to the FPGA-Accelerated Streaming Technology (FAST) Engines framework, the Compress Engine, uses its innovative streaming architecture™ not only to increase the system's storage capacity by 2-4X but also to boost overall streaming query performance by a factor of about 2X (a 100% increase). All this is achieved without any tuning or administration: Compress Engine is enabled on the Netezza appliance by a software-only upgrade.

It's actually really cool technology, obviously something we love to rave about. Late last year, I wrote about FAST Engines in this blog. We'll use that as a starting point and dig a level deeper into how Compress Engine works. I'm sure it will tickle the fancy of the geek in you!

http://www.netezzacommunity.com/servlet/JiveServlet/downloadImage/38-1042-1055/Picture3.png

The NPS system employs a patent-pending method for compiling (yes, compiling) columnar data in all the tables of the database as it is being written to disk, e.g. during load, insert or update operations. The process converts row-based data into column streams, each of which is compiled independently: the original data in the column is replaced with a stream of "instruction sets" for the FPGA. The "instructions" themselves are much smaller than the data they replace, so a highly compressed data stream emerges from the process.
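To make the idea of "compiling" a column into instructions a little more concrete, here is a minimal sketch in Python. It is not Netezza's patent-pending algorithm, which is not described here; it substitutes plain run-length encoding as a stand-in. The instruction names `REPEAT` and `LITERAL`, the helper functions and the sample rows are all hypothetical and exist only for illustration.

```python
from itertools import groupby


def rows_to_columns(rows):
    """Transpose a list of equal-length row tuples into a list of column lists."""
    return [list(col) for col in zip(*rows)]


def compile_column(values):
    """Encode one column as a list of instructions (illustrative only).

    ("REPEAT", value, count) stands for a run of identical values.
    ("LITERAL", value) stands for a single value.
    These instruction names are assumptions, not the real FPGA instruction set.
    """
    program = []
    for value, run in groupby(values):
        count = sum(1 for _ in run)
        if count > 1:
            program.append(("REPEAT", value, count))
        else:
            program.append(("LITERAL", value))
    return program


def execute(program):
    """Replay the instructions to restore the original column values."""
    out = []
    for instr in program:
        if instr[0] == "REPEAT":
            _, value, count = instr
            out.extend([value] * count)
        else:
            out.append(instr[1])
    return out


# Hypothetical sample data: (date, country, quantity)
rows = [
    ("2008-04-28", "US", 10),
    ("2008-04-28", "US", 10),
    ("2008-04-28", "US", 12),
    ("2008-04-28", "UK", 12),
]

columns = rows_to_columns(rows)
programs = [compile_column(col) for col in columns]
restored = [execute(p) for p in programs]

for col, prog in zip(columns, programs):
    print(len(col), "values ->", len(prog), "instructions:", prog)
```

Values within a single column tend to repeat far more than values across a row, which is why even this toy encoder shrinks each column stream; a production scheme would use a richer instruction set than the two shown here.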

While the compression occurs on columnar data because of the inherent compressibility within database columns, the compressed data is reassembled in rows before being written to disk. Row-wise storage of tables avoids the data scan complexity associated with columnar stores and ensures that scanned data can be efficiently parsed and processed without the need to reconstitute it from multiple sources. The compressed data uses disk much more efficiently and increases the data density of NPS systems by 2-4X - in some cases substantially higher - allowing customers to scale their NPS data warehouse systems into the hundreds of terabytes of user data.

But if the NPS system's data compression and scale brought performance to its knees, or if the compression overhead severely limited any speedup (as it does on many of those other systems), it wouldn't be so great, would it? The beauty of the Netezza way of providing data compression is that not only does it have no negative impact on performance, it actually increases query performance by up to 100%!

http://www.netezzacommunity.com/servlet/JiveServlet/downloadImage/38-1042-1054/Picture2.gif

As the compressed data is read off the disk, it passes through the Compress Engine, which applies the instructions embedded in the data stream to restore the data to its original form. Our compilation algorithm ensures that this decompression can be performed entirely in silicon, at wire speed. Each physical block scanned from the disk can mushroom to 2 to 4 or more times its size in memory without incurring any overhead in processing time - i.e. 2 to 4 times as much data is scanned in the same amount of time _without any increase in system hardware_! Our internal benchmark testing, reflecting real customer configurations and workloads, has shown an overall 2.2X increase in streaming query performance through the use of Compress Engine.
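The arithmetic behind that claim can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a benchmark: it assumes the scan is limited purely by disk bandwidth and that decompression adds no time at all. The disk rate and table size below are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def effective_scan_rate(disk_mb_per_s, compression_ratio):
    """Uncompressed MB/s delivered downstream.

    Assumes decompression keeps up with the disk, so every MB read
    from disk expands to `compression_ratio` MB at no extra time cost.
    """
    return disk_mb_per_s * compression_ratio


def scan_seconds(table_mb, disk_mb_per_s, compression_ratio=1.0):
    """Time to scan a table of `table_mb` uncompressed MB."""
    return table_mb / effective_scan_rate(disk_mb_per_s, compression_ratio)


DISK_MB_PER_S = 60.0   # hypothetical per-disk scan rate
TABLE_MB = 120_000.0   # hypothetical uncompressed table size

baseline = scan_seconds(TABLE_MB, DISK_MB_PER_S)
for ratio in (2.0, 4.0):
    compressed = scan_seconds(TABLE_MB, DISK_MB_PER_S, ratio)
    print(f"ratio {ratio:.0f}X: {baseline:.0f}s -> {compressed:.0f}s "
          f"({baseline / compressed:.1f}X faster scan)")
```

In this idealized model the scan speedup equals the compression ratio. A real query does more than scan, which may be one reason an overall figure such as the 2.2X quoted above need not match the raw compression ratio.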

This software-only enhancement, enabled by our unique architecture, is only the beginning. As we continue to develop our platform, we are investigating further enhancements to the Compress Engine or the addition of new FAST engine(s), aimed at directly increasing streaming performance on the NPS system.

Our philosophy and aim are to continue to shake the industry out of its dogmatic slumbers by extending the price/performance advantages of our products, showing that there's a different way to do data warehousing and advanced analytics: one where performance and scalability are the result of neither expense nor complexity, where you can get more performance from compression, where you do have the power to question everything™ ...



May 15, 2008 4:41 AM spotnis

To the admin,

It seems that every time I read a new announcement from Netezza I expect a paradigm shift, and the Compress Engine lives up to the expectations.

On a separate note, having heard Kurt Vonnegut's audio collection, the quote at the top of the page cracked me up.

-Sandeep