
Thinking Inside the Box

6 Posts tagged with the exadata tag

      

In a recent blog, Greg Rahn of Oracle responded to Phil’s “Oracle Exadata and Netezza TwinFin Compared” eBook; before commenting on an Oracle engineer’s views, I’ll restate the eBook’s larger themes.

 

Exadata connects Oracle’s RAC database, an architecture designed for online transaction processing (OLTP), via a fast network to a massively parallel processing storage tier. Because Exadata is an OLTP database paired with a specialized storage subsystem, tuning it to function as a data warehouse is complicated and demands skilled, highly trained, experienced technical staff. Mitigating the shortcomings of an OLTP database pressed into service as an analytic database requires expensive network and storage hardware, which makes Exadata costly to acquire; costly to design, tune and maintain as an optimally configured data warehouse; and costly to run in the data center.

 

Netezza TwinFin, designed as an analytic database, brings the power of massively parallel processing to manage and exploit data at terabyte-to-petabyte scale. TwinFin is an appliance: easy to install, easy to operate and easy to manage. TwinFin offers value: fast performance for advanced analytics at an affordable price.

 

Now I’ll discuss the detail of Greg’s blog and respond from a Netezza perspective.

 

Claim: Exadata Smart Scan does not work with index-organized tables or clustered tables.

 

Greg responds that “IOTs and clustered tables are both structures optimized for fast primary key access, like the type of access in OLTP workloads, not data warehousing” and suggests our intent was to mislead by quoting from an old Oracle datasheet. It wasn’t. Oracle 11g Release 2 documentation reads “Index-organized tables are suitable for modeling application-specific index structures. For example, content-based information retrieval applications containing text, image and audio data require inverted indexes that can be effectively modeled using index-organized tables.” Elsewhere the documentation states “Index-organized tables are useful when related pieces of data must be stored together or data must be physically stored in a specific order. This type of table is often used for information retrieval, spatial and OLAP applications.” In the eBook Phil discusses first- and second-generation data warehouses; many of the applications described by Oracle as candidates for IOTs are typical of those our customers run on TwinFin – these are second-generation data warehouse applications. Greg believes that Exadata Smart Scan not working with index-organized tables has zero impact on Exadata customers. Is it reasonable to conclude that Exadata is not being used for second-generation data warehousing?

 

Claim: Exadata Smart Scan does not work with the TIMESTAMP datatype.

 

Since we published the first edition of the eBook, Christian Antognini, the original source of this information, has gone to the heart of the matter in his blog: “The essential thing to understand is that this limitation is due to bug 9682721. The fix is expected to be part of 11.2.0.2. According to my test cases (that Greg Rahn was so kind to execute against an early release of 11.2.0.2), offloading works correctly for all datetime functions but for the following three predicates.

 

  • months_between(d,sysdate) = 0
  • months_between(d,current_date) = 0
  • months_between(d,to_date('01-01-2010','DD-MM-YYYY')) = 0”


Note that the MONTHS_BETWEEN function can basically be offloaded. The problem in these cases is that the offloading does not work when, for example, SYSDATE is used as a parameter.

While happy to let this one pass, I have a question. Do organizations accrue value or cost from a technology that requires its administrators to understand all combinations of functions, their predicates and their parameters before they are capable of designing queries to be processed in parallel?

 

Claim: When transactions (insert, update, delete) are operating against the data warehouse concurrent with query activity, smart scans are disabled. Dirty buffers turn off smart scan.

 

In my opening comments I compared TwinFin’s simplicity to the complexity of Exadata. All queries submitted to TwinFin are processed in its massively parallel grid; no tuning, no special database design. This is appliance simplicity. In Exadata whether a query benefits from smart scans (massively parallel processing) can depend on the state of the data being read. Exadata requires developers to understand at great depth the physical path a query takes to access data. This is complexity.

 

While Greg concedes that Exadata’s MPP processing is disabled for those blocks containing an active transaction, he is confident that “Not having Smart Scan for small number of blocks will have a negligible impact on performance”. My experience with Netezza’s customers and their applications prompts me to take a more circumspect view. I’ll explain why in the next section.

 

Claim: Using [a shared-disk] architecture for a data warehouse platform raises concern that contention for the shared resource imposes limits on the amount of data the database can process and the number of queries it can run concurrently.

 

Greg argues contention for shared disk is not a problem for Exadata and cites Daniel Abadi’s blog in his defense. Let’s take a look at what Daniel says on this subject: “If you are going to make an argument that shared-disk causes scalability problems, you have to make the argument that contention for the one shared resource in a shared-disk system is high enough to cause a performance bottleneck in the system - namely, you have to argue that the network connection between the servers and the shared-disk is a bottleneck.” This is the argument Phil makes in our eBook. Consider a query analyzing correlations between equity trades in a sector of a stock market. The algorithm calculates Spearman’s rank correlation coefficient (Spearman’s rho), measuring statistical dependence between two variables by assessing how well the relationship between them can be described using a monotonic function. This analysis creates valuable insight into whether specific equities influence the behavior of other equities in the same market sector within a window of one to ten minutes.

 

The customer loads a massive volume of trading data into TwinFin and constantly trickle feeds data from live markets into the warehouse. The query is run and re-run constantly to assess behavior of different equities in dynamic markets. Each time TwinFin completes a Cartesian join between all the equities in the sector while at the same time calculating a Volume-Weighted Average Price and a Return From Previous Close value for the equity under investigation. The results pass to Spearman’s rank correlation coefficient function to calculate the Population Covariance and the standard deviation of every equity combination for the time period. Netezza executes every step of the query in parallel utilizing all TwinFin’s hardware and software resources. Netezza’s intelligent storage selects only the rows needed for that market sector and projects only the columns needed for assessment. The join result is directly streamed to the code implementing the statistical analysis which TwinFin downloads to every processor in its MPP grid, running the complex calculations in parallel. Results from each node in the MPP grid are returned via the network to the host for final assembly and rendering back to the requesting application. TwinFin completes the analysis in a few minutes, and then runs it again, and again for as long as the market is open.
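
The calculations named above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on a single machine, not the code TwinFin runs: the function names, the tie-handling rule (tied values share the average rank) and the sample figures are all assumptions made for the example.

```python
from itertools import combinations
from statistics import mean, pstdev


def vwap(prices, volumes):
    """Volume-Weighted Average Price: sum(price * volume) / sum(volume)."""
    return sum(p * v for p, v in zip(prices, volumes)) / sum(volumes)


def return_from_previous_close(price, previous_close):
    """Fractional change of a price relative to the previous close."""
    return (price - previous_close) / previous_close


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def population_covariance(xs, ys):
    """Population covariance (divides by n, not n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def spearman_rho(xs, ys):
    """Spearman's rho: covariance of the ranks over the product of their
    population standard deviations. Undefined if either series is constant."""
    rx, ry = ranks(xs), ranks(ys)
    return population_covariance(rx, ry) / (pstdev(rx) * pstdev(ry))


def pairwise_rho(series_by_equity):
    """Rho for every pair of equities (hypothetical input: name -> series)."""
    return {
        (a, b): spearman_rho(series_by_equity[a], series_by_equity[b])
        for a, b in combinations(sorted(series_by_equity), 2)
    }


if __name__ == "__main__":
    # Made-up per-minute returns for three hypothetical equities.
    sample = {
        "AAA": [0.01, 0.03, 0.02, 0.05],
        "BBB": [0.10, 0.30, 0.20, 0.90],
        "CCC": [0.04, 0.03, 0.02, 0.01],
    }
    for pair, rho in pairwise_rho(sample).items():
        print(pair, round(rho, 3))
```

The number of pairs grows quadratically with the number of equities in the sector, which is why the pairing step dominates the cost of a workload like this one.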

 

After several hours Oracle 10g was still attempting to complete its first round of analysis. What difference will a new version of the Oracle database paired with an MPP storage system and a fast network make? Exadata’s MPP storage grid is unable to process Cartesian joins, the first step in this analytic process, meaning it brings no performance gain but must put all records on the network and send them across to Oracle RAC. Even if it were able to process the join, Exadata cannot push down user defined functions, used to implement the calculations, to MPP - in Oracle functions always execute on the RAC servers. In processing the algorithms Oracle must create and manage temporary data sets and write these out of memory for storage. Exadata’s flash cache may play some role here, but the size of the data sets and the complexity of the algorithms will force database processes to write to disk. This flow from Oracle RAC is back across a network still clogged with data coming from the MPP storage tier, queued and unprocessed, waiting for attention from a fully-consumed Oracle RAC. I contend that Exadata’s network connection between the servers and the shared-disk is a bottleneck. Not Exadata’s only bottleneck. TwinFin demonstrates how a true MPP architecture excels in calculating Spearman’s rank correlation coefficient - a real workload on a real dataset. Oracle’s OLTP database, simply not designed to process large-scale analytics, is overwhelmed. Exadata suffers contention on its network and in its database system’s shared disk architecture.

 

Back to the previous point about Exadata’s MPP processing being disabled for blocks containing an active transaction – the customer is constantly loading new market data and analyzing it in comparison with a massive volume of historic data. While entirely appropriate for transaction processing, Exadata’s architecture of disabling an entire block from parallel processing when a single record in the block is being updated can only hinder and never help in the data warehouse. The very point of a data warehouse is that all data should be available to the business as quickly as extract-transform-load processing allows. By pressing an OLTP database into service as an analytical database, Oracle unnecessarily burdens customers with creating database designs to work around this complexity and developing a thorough understanding of how each query accesses the data model. While not having Smart Scan for a small number of blocks may or may not impact performance, as an unnecessary complexity demanding the attention of database specialists, it costs customers real money.

 

Claim: Analytical queries, such as “find all shopping baskets sold last month in Washington State, Oregon and California containing product X with product Y and with a total value more than $35” must retrieve much larger data sets, all of which must be moved from storage to database.

 

Greg shows some nice SQL to demonstrate how Exadata processes the beer and pizza query. Give the business an answer and they always come back with a new question: “Greg, what was the total value of ‘Brand #42 beer’ sold in each basket?” Greg can now update his SQL with the clause:

 

sum(case when p.product_description in ('Brand #42 beer') then td.sales_dollar_amt else 0 end) sum_productX,

 

and re-run the query. Business users love IT when we give them a fast performing system but are less forgiving when a query that yesterday ran blazingly fast today slows to a snail’s pace. Exadata cannot push down the newly introduced sum for parallel processing by its storage nodes as the join must be processed first, and the storage nodes cannot process joins. Any function or calculation that uses columns from two or more tables must be evaluated on the RAC database servers. The query performance is going to degrade significantly, sending the database expert back to the Oracle documentation in an attempt to find a new way to resolve the amended query so it completes at a time acceptable to the business.
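
To see why the new clause depends on the join, here is a small self-contained sketch using Python's built-in sqlite3 module. Only the aliases p and td and the columns product_description and sales_dollar_amt come from the clause above; the table names, the basket_id and product_id columns, and the sample rows are assumptions invented for the illustration, and SQLite stands in for a real warehouse.

```python
import sqlite3

# Hypothetical sample data: two products and three baskets.
PRODUCTS = [(1, "Brand #42 beer"), (2, "Pizza")]
DETAILS = [
    (1, 1, 6.0),  # basket 1: beer
    (1, 2, 9.0),  # basket 1: pizza
    (2, 2, 9.0),  # basket 2: pizza only
    (3, 1, 6.0),  # basket 3: beer
    (3, 1, 6.0),  # basket 3: beer again
]

QUERY = """
SELECT td.basket_id,
       sum(case when p.product_description in ('Brand #42 beer')
                then td.sales_dollar_amt else 0 end) AS sum_productX
FROM transaction_detail td
JOIN product p ON p.product_id = td.product_id
GROUP BY td.basket_id
ORDER BY td.basket_id
"""


def basket_totals(products, details):
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
        "product_description TEXT)"
    )
    conn.execute(
        "CREATE TABLE transaction_detail (basket_id INTEGER, "
        "product_id INTEGER, sales_dollar_amt REAL)"
    )
    conn.executemany("INSERT INTO product VALUES (?, ?)", products)
    conn.executemany("INSERT INTO transaction_detail VALUES (?, ?, ?)", details)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for basket_id, total in basket_totals(PRODUCTS, DETAILS):
        print(basket_id, total)
```

The case expression reads p.product_description from one table and td.sales_dollar_amt from the other, so it can only be evaluated on rows that have already been joined.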

 

Claim: To evenly distribute data across Exadata’s grid of storage servers requires administrators trained and experienced in designing, managing and maintaining complex partitions, files, tablespaces, indices, tables and block/extent sizes.

 

While Oracle Automatic Storage Management does automate the task of striping partitions across all available disks, the ASM administration team must still create partitions, configure and manage disk groups for shared storage across instances, choose and implement either 2-way mirroring or 3-way mirroring, and configure Allocation Unit sizes. Additionally, Exadata configuration requires administrators to create and manage tablespaces, index spaces, temp spaces, logs and extents.

 

In conclusion, Netezza entered the data warehouse market convinced the products offered by the dominant vendors, in particular Oracle, were ill-suited to meet the challenges of Big Data and of such complexity as to make them exorbitantly expensive to acquire and use. Exadata only increases the complexity and expense of an Oracle warehouse. Greg draws his readers’ attention to the excellent blog at http://dbmsmusings.blogspot.com/ where Daniel Abadi muses “Both Oracle and Teradata are too expensive for large parts of the analytical database market.”

 

Greg’s blog reveals one path available to organizations wishing to generate greater value from their data: CIOs willing to build, train and permanently assign a team of technical experts can keep that team continuously employed choosing just the right combination from a myriad of settings, coercing a database designed for OLTP to function as a data warehouse. I’ll close this blog with a manager’s perspective, from someone who focuses an organization’s limited resources on its highest priorities. Peter Drucker, who introduced us to the concept of the knowledge worker, gave us a pragmatic measure to evaluate our own and our team members’ activity - am I merely efficient (doing things right) or truly effective (doing the right thing)? All the workarounds and clever tuning demanded by Exadata simply don’t exist in TwinFin; Netezza has proven them unnecessary.


Today Netezza is launching a new eBook entitled, “Oracle Exadata and Netezza TwinFin™ Compared”. As the name implies, this eBook provides a comparison of the Netezza TwinFin data warehouse appliance and Oracle’s “appliance-like” database machine offering.

 

Certainly Netezza is not the first company to compare/contrast its flagship system with Oracle’s most recent entry. Richard Burns, a consultant over at Teradata, did a laudable job exposing the technical shortcomings of the Exadata v2 machine as they pertain to data warehousing in a May 2010 whitepaper. And there have been several recent pieces written on Oracle’s apparent success, although the publicly named customer list has struck some as a bit underwhelming.

 

Netezza continues to compete (and win) against Oracle regularly in the marketplace, including in competition with the Exadata v2 product, and so we felt it was high time to put our own comparison story together with today’s eBook and with this little blog posting. Let me know what you think.

 

So where to begin? Let’s start with the fact that the Netezza TwinFin is built to excel at a specific purpose – as the best price/performance platform for Data Warehousing and Analytics in the market. Conversely, Oracle has tried to “kill two birds with one stone” in the Exadata v2 – aiming it primarily at the On-Line Transaction Processing applications space, but also making bold claims to performance as a Data Warehouse with its Sun-based Oracle Database Machine (DBM) and Exadata Storage Server, version 2 (Exadata).

 

So why does it matter that Oracle is aiming to do both OLTP and DW in the same system – apart, that is, from at least two decades of people trying-and-failing to do exactly that with the likes of Oracle in previous software and hardware instantiations? Let’s start with the workload requirements of the two application areas:

  • OLTP systems execute many short transactions, typically of extremely small scope (touching only a handful of records) and in extremely predictable, well-understood access and query patterns. They need to excel at handling these small transactions in very high volume, combined with equally small writes to the database in the form of updates, insertions and deletions. This limited scope, high throughput and “regularity” of the access patterns make OLTP systems great candidates for intelligent caching and (multiple) secondary data structures, such as indices to speed their processing.

 

  • Conversely, DW systems are typically asked to perform “read-heavy” queries and operations against the current and deep historical data sets. Rather than analyzing just a few records, a DW query might look at millions, even billions, of rows from a single table, combined with join logic with multiple other tables. Data warehouse systems are used by company analysts and managers to find the “needle in the haystack” in guiding enterprise decision-making in a more comprehensive and often ad-hoc manner – frequently limiting the ability to use “tricks of the trade” such as results caching and/or indices.

 

So the two applications tend to lead to very different system/platform implications. No special “news” there – as I said earlier, people have been trying-and-failing to use a single system for both applications for years.

 

Without stealing any more of the thunder of our electronic publication today, let me just lay out what I believe are the fundamental differences between Netezza’s TwinFin and the Oracle Database Machine/Exadata as simply and plainly as I can:

 

Netezza TwinFin                         | Oracle Database Machine / Exadata v2
True MPP                                | Hybrid "SMP-plus" Approach
Data Streaming with a Hardware Assist   | CPU-intensive Processing for Basic DB Operations
Deep Analytics Processing               | Central Cluster-based Approach
No-Tuning-Required Simplicity           | Complex Array of Knobs and Levers

 

In my view, these are "big deal" differences. They're not the result of a simple feature gap to be closed in an upcoming point-release, but rather go directly to limitations at the heart of the Oracle DBM/Exadata system architecture and/or business culture. To address them would require a major rearchitecting, or at least refactoring, of Oracle's decades-old DBMS code base. They also happen to be highly visible to customers and prospects, which makes for some interesting comparisons in head-to-head on-site Proofs of Concept (POCs).

 

1) True MPP vs. a Hybrid "SMP-plus" Approach

Netezza’s TwinFin uses a full MPP approach to data warehousing, pushing all of the processing down as close as possible to where the data is stored and maximizing the processing horsepower of MPP for scalability, throughput and performance – for even the most complex workloads. Using the MPP method of dividing the workload and attacking query problems in parallel, Netezza has been able to demonstrate market-leading data warehouse price-performance across four generations of data warehouse appliances.

 

Oracle’s DBM/Exadata takes a hybrid approach adding Exadata Storage nodes largely to handle data decompression and predicate filtering tasks, but still relying primarily on the SMP cluster of Oracle RAC to handle most of the data warehouse tasks, including complex joins. In addition the SMP cluster also must act as the central distribution point for any data that needs to be redistributed between and across Exadata nodes. To try to minimize this, Oracle and Sun’s solution was to “throw hardware at the problem” (quoting Teradata’s Mr. Burns), over-engineering interconnections, processor rates and other elements required because of all of this data movement, rather than refactoring and solving a fundamental software architecture issue.

 

The difference between the two is akin to an 8-lane continuous streaming superhighway in the TwinFin instance versus multiple freeways converging on and necking down to a two-lane country road via a “traffic roundabout”. I live in Massachusetts and can attest to the negative impact of taking multiple highways down to a single road – it happens every weekend at the gateway to and from Route 6 on Cape Cod.

 

2) Data Streaming with a Hardware Assist vs. CPU-intensive Work for Basic DB Operations

In addition to the advantages of the MPP architecture for data warehousing, the TwinFin system makes use of hardware acceleration for increased query and analytics performance. Coming in the form of the "DB Accelerator" that is part of each S-Blade in the TwinFin system architecture, providing four dual-core Field-Programmable Gate Arrays (FPGAs) on each DB Accelerator, this hardware acceleration takes care of fundamental processing steps such as decompression, predicate filtering and ACID-compliant data visibility at the full scan rate of the data from disk. The fact that this device is placed as close as it is to the disks for which it is performing its processing gives the TwinFin system much more performance leverage because data can be filtered, processed and value-added before undergoing any unnecessary CPU processing or having to be transported across an expensive network.

 

And the fact that it is a field programmable device means that Netezza can use it to introduce additional features and performance through a simple upgrade to our NPS software/firmware – as Netezza has with the introduction of two phases of hybrid column/row-level compression technology (with Release 6.0, scaling as high as 32:1 compression, depending on data patterns) first introduced in 2005, and our high-performance implementation of row-level security. Because it's performed in the FPGA in TwinFin, "Compression = Performance"; so if a customer's data is compressed by a 4:1 factor, the effective data streaming rate for processing queries is increased four-fold.
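
The "Compression = Performance" arithmetic is simple enough to write down. The sketch below assumes, as the paragraph above claims, that decompression keeps pace with the disk scan rate; the function name and the 100 MB/s figure are made up for the example and are not TwinFin specifications.

```python
def effective_scan_rate(physical_mb_per_s, compression_ratio):
    """User-data rate delivered when compressed data streams off disk.

    Assumes decompression is not the bottleneck, so every physical
    megabyte read expands to `compression_ratio` megabytes of user data.
    """
    if physical_mb_per_s < 0 or compression_ratio < 1:
        raise ValueError("rate must be >= 0 and ratio must be >= 1")
    return physical_mb_per_s * compression_ratio


if __name__ == "__main__":
    # Hypothetical disk delivering 100 MB/s of compressed data at 4:1.
    print(effective_scan_rate(100.0, 4.0))
```

If decompression were done on a CPU that could not keep up with the disk, the delivered rate would be capped by the CPU rather than multiplied by the ratio, which is the contrast the next paragraph draws.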

 

Conversely, the DBM/Exadata system relies entirely on CPU processing. In fact, the great majority of the functionality provided for by the Exadata nodes in the DBM/Exadata system is to replicate the functionality included in each FPGA core of the TwinFin - data decompression and predicate filtering. Because of the CPU-intensive nature of decompressing data in the DBM/Exadata system, Oracle "strongly suggests" lesser compression when data is required for high-performance data warehousing vs. "cooler" queryable archive purposes. Again, the heavy-lifting for query processing and analytics is left to the central SMP cluster nodes rather than parallel Exadata nodes, forcing Oracle to "throw hardware at the problem".

 

3) Deep Analytics Processing vs. Central Cluster Analytics

Netezza brings analytics to where the data is stored – as close as possible to where it is stored to do the processing – not just to decompress it and do predicate filtering, but to complete as much of the complex analytics as is possible, in parallel. That’s as true of the “traditional” OLAP analytics of SQL-based data warehousing as it is of the advanced and predictive analytics enabled by the new capabilities of i-Class in the “Second Wave of TwinFin”.

 

With i-Class, Netezza introduces a comprehensive, scalable and high-performance approach to advanced analytics for both our customers and partners, spanning Linear Algebra/Matrix manipulation, and engines for R and Hadoop along with several programming languages including C, C++, Java, Python and even Fortran. The i-Class functionality also offers plug-ins and packages for the Eclipse IDE and R GUI, and pre-built, analytic functions engineered to deliver performance at scale spanning data preparation, mining, predictive analytics and spatial functions together with access to analytics functions from the GNU Scientific Library and R CRAN repository. Extended by the i-Class embedded analytics capabilities, TwinFin allows our partners and customers to push-down applications, functions and algorithms going well beyond standard set-based SQL, at scale with high performance, freeing them of the latency and sampling requirements demanded by off-board processing platforms for advanced analytics.

 

The Oracle DBM/Exadata performs the majority of the OLAP analytics in the central cluster (RAC) nodes, after traversing the "traffic roundabout". And apart from basic scoring functionality, virtually ALL of the advanced analytics are performed in the cluster nodes as well. Placing the predominance of processing in the central SMP cluster means that both the functionality and scale of the analytics are limited by the capacity and performance that the SMP cluster can provide - typically limited to the elements included in Oracle's own "Data Mining" package.

 

The DBM/Exadata’s requirement for shipping the data from the storage arrays to the central cluster for analytics is akin to backhauling massive truckloads of materials from a mining site to pick out the gold at a central headquarters, rather than sifting out the most important nuggets in parallel and sending only those valuable elements back, as is the case with TwinFin.
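
The mining analogy can be put in toy form by counting how many rows cross the network under each approach. This is a conceptual sketch only: the node contents, the predicate and the function names are invented, and it models neither product's actual data path.

```python
def ship_then_filter(nodes, predicate):
    """Move every row to a central point, then filter there."""
    moved = [row for node in nodes for row in node]
    kept = [row for row in moved if predicate(row)]
    return kept, len(moved)


def filter_then_ship(nodes, predicate):
    """Filter on each node first, then move only the surviving rows."""
    moved = [row for node in nodes for row in node if predicate(row)]
    return moved, len(moved)


if __name__ == "__main__":
    # Three hypothetical storage nodes holding eight rows in total.
    nodes = [[1, 50, 99], [2, 98], [3, 4, 97]]

    def is_nugget(row):
        return row > 90

    central, rows_moved_central = ship_then_filter(nodes, is_nugget)
    local, rows_moved_local = filter_then_ship(nodes, is_nugget)
    print(central == local, rows_moved_central, rows_moved_local)
```

Both approaches return the same answer; the difference lies only in how much data has to travel to produce it.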

 

4) No-Tuning-Required Simplicity vs. a Complex Array of Knobs and Levers

For a long time, the simplicity of the Netezza data warehouse appliance has shone through most strongly in the extremely limited tuning requirements it imposes on administrators of the system, particularly as compared to Oracle-based systems. Simplifying the system management is core to Netezza’s “appliantization” of the data warehouse and analytics platform. Rather than managing a “coordinated collection” of technology assets, the system and database administrators of TwinFin interact with a single appliance and use the redundant Linux-based SMP host nodes as the interaction point for all activities. Everything from database configuration, data distribution, data mirroring and monitoring to software upgrade and day-to-day management is simplified (in the words of one TwinFin customer, “It’s Netezza-easy – it just works.”).

 

No indexing is necessary (or even supported) in TwinFin to achieve high performance. Just about the only requisite “tuning” of the system is the definition of the distribution key for spreading data across all the S-Blades – typically the primary keys of the tables. Even in the internal management structure of TwinFin, our system management has been configured to get the maximum performance from the commodity subsystems (blades, chassis, disk arrays and network) by connecting them in novel ways and then managing them at a system level, rather than at the subsystem or rack-level.
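
To illustrate why the choice of distribution key matters, here is a minimal sketch of hash distribution. The hash function (CRC32), the number of blades and the sample keys are stand-ins chosen for the example; the text above does not describe the hashing TwinFin actually uses.

```python
import zlib
from collections import Counter


def blade_for(key, num_blades):
    """Map a distribution-key value to a blade number (illustrative hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_blades


def distribution(keys, num_blades):
    """Count how many rows land on each blade for the given key values."""
    return Counter(blade_for(key, num_blades) for key in keys)


if __name__ == "__main__":
    blades = 8
    # A unique key, such as a primary key, gives every row its own hash value.
    unique_keys = range(1000)
    # A low-cardinality key sends all rows to at most two blades.
    skewed_keys = ["CA", "WA"] * 500
    print("unique key:", sorted(distribution(unique_keys, blades).items()))
    print("skewed key:", sorted(distribution(skewed_keys, blades).items()))
```

A key with many distinct values lets the rows spread across the blades, while a key with only a few distinct values leaves most blades idle, which is why primary keys are the typical choice.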

 

While it is true that Oracle has simplified some of the tuning knobs and levers in the DBM/Exadata, prospective customers should ask them if they really have moved into the domain of requiring only a small handful of tuning knobs & settings; or whether they still require, or more colloquially, “strongly suggest” the use of dozens or even hundreds of settings (depending upon the number of objects being maintained and optimized). How many dozens of IP addresses are needed to configure and manage the DBM/Exadata (TwinFin requires only two)? Oracle even have a special service to help DBM/Exadata customers migrate and tune their systems and databases for performance and some of their leading Performance Architects even talk about the requirement of using functions like the Oracle SQL Tuning Advisor as an inevitable fait accompli.

 

By Oracle’s own admission, the time customers can expect to spend managing and tuning the DBM/Exadata system in Oracle 11g r2 is only 26% less than in Oracle 11g. Contrast that with installation after installation of Netezza appliances where hundreds of terabytes of data under management in one or more data warehouses are being maintained by two, or even less than one, FTE rather than a team of Oracle specialists. It all depends on one’s perspective and philosophy in building a real appliance for the data warehouse market. Where others may see the need to tune, partition, index and sub-index data sets for performance purposes as an inevitability, Netezza sees that same need as reason to enhance TwinFin’s capabilities in order to obviate it.

 

All of this really adds up quickly to a significant price-performance advantage for customers of TwinFin – and with our limited tuning and simplified operations, also translates into much more rapid time-to-value for Netezza’s customers, too. So that’s it – four simple fundamental differences that really set the TwinFin appliance apart from the DBM/Exadata. Agree? Disagree? Let me know what you’re thinking. And now, go over and have a look at today’s eBook release for the rest of the story.


It may have been the result of a misunderstanding or a comment heard out of context. But whatever the background for the commentary, let me simply state that Netezza is completely committed to the success of the Netezza Migrator and all the other Netezza products and functionality launched at Enzee Universe 2010 this past week. Migrator eliminates a potential barrier to TwinFin™ adoption (i.e., migration costs) and logically should lead to easier acceptance and broader system sales for Netezza. Furthermore, our partnership with EnterpriseDB at both the corporate and technical levels has been and remains extremely solid and strong.

As I stated in the announcement of the product, “The Netezza Migrator product allows organizations to make data warehouse migration decisions independent of proprietary software lock-in. Organizations using data integration and BI applications with embedded Oracle-proprietary database constructs, interfaces and utilities can now more easily manage their migration from Oracle to a TwinFin appliance. The Netezza Migrator will allow our customers to achieve the performance, scale and cost advantages of their TwinFin systems while maintaining their prior investment in proprietary software.” The Netezza Migrator is specifically designed to reduce the time, complexity and costs required of our customers to move their IT applications to the Netezza TwinFin platform.

With Migrator, Netezza’s customers will be able to extract themselves from the dreaded “Oracle lock-in” of functions and procedures written using Oracle-proprietary techniques and they can decide which of their applications to migrate directly to Netezza and just when, at their own pace. Its capabilities go well beyond the extremely limited capabilities provided by Oracle’s own ‘Database Gateway for ODBC’. Migrator provides an Oracle compatible wrapper around Netezza that is optimized in ways that Oracle could never hope, nor deign, to provide with its "Heterogeneous Services" functionality: including support for Netezza syntax pushdown, high speed API, and Netezza user defined functions.

Migrator makes it even easier for Netezza’s customers to move all of their data warehouse from Oracle to Netezza. In short, this is something we feel is extremely valuable for Netezza and particularly “liberating” for our customers.


A loyal customer alerted us to an Oracle blog by Jean-Pierre Dijcks earlier today that showed the Oracle FUD machine is fully revved-up and ready to go. I'd like to offer a rebuttal; however, in the interest of not intruding on Jean-Pierre's entry with an overly-long comment, I've just put a short response on his blog post with a pointer to this one.


Misconceptions and Misunderstandings, or Errors and Plain-old FUD?

I’m writing to correct *just a few* of the points in Jean-Pierre's posting earlier today, whether they are misconceptions about what is really important in high-performance, scalable data warehouse systems, errors, or just plain-old pure “competitive FUD”. We certainly have posted some information recently about the TwinFin product, and Curt Monash’s postings late Thursday provided more info. If his readers are interested in learning more, or even signing up for a “Test Drive”, they should visit www.netezza.com.

First off, I think this is a “banner day” for Netezza. We believe that TwinFin (and the other products in the new product family) extend both our performance and price-performance advantage over our competitors. We stand by our marketing statements that we regularly demonstrate 10-100X performance advantages over our competitors, particularly competitive offerings of the major incumbent DW system vendors (“Just who are those incumbents?” Jean-Pierre's readers may ask. Well let’s just say that we see Oracle as the incumbent system and/or a challenger system in over 50% of our deal flow.).

Regarding his claims about DBM being “faster than Netezza” (and I can only assume he meant at “real” data warehouse tasks) - we’re ready whenever Oracle feels up to actually taking one of their Database Machines onsite to a customer for a fair, open customer benchmark. So far, Oracle have been, shall we say, “a little reticent” to do on-site benchmark testing against Netezza.

Next, given the large number of incorrect points in the original posting, I think perhaps that just a few of them will be useful enough for readers to get the gist of just how far afield some of the ‘facts’ are:

  • “It all comes down to data scan rates per rack”: If it were true that all of data warehousing boiled down to full-stream data scans (as if the entire world of analytics relied on “select count(*) from lineitem” types of queries), then we could all measure “goodness” by how many GB/sec of data could be burst-scanned in our systems. But that’s not the case. So we build Netezza’s data and analytic appliances to deliver the best possible overall performance at the best price and power requirements. As a consequence, and following from those same numbers as-posted, a single rack of TwinFin can process (not just scan) about 400 million rows of data per second. That’s process, as in: “scan, decompress, project, restrict, AND join, etc.”. Need more processing firepower? Netezza’s system performance scales linearly with the addition of more S-Blades: at the low-end, the TwinFin 3 can deliver as much as 100M rows/second of processing horsepower, while the TwinFin 120 can provide you with 4 billion rows/second. Does a system that still relies on using SMP-based servers running “plain old” Oracle 11g RAC scale similarly for data warehousing?


  • “Non-open Linux running on FPGAs”: I’m really not sure what (if anything) was meant by this, but saying that Netezza’s FPGAs “are apparently running non-open Linux” is oxymoronic on at least two different levels (FPGAs don’t typically “run” an OS and, “non-open Linux” - really?)


  • “User data & compression”: I also enjoyed the accounting of all that “user data” available to DBM users in the Oracle table and the various comments about compression. When Netezza quotes user data capacities in our systems, the numbers reflect real raw user data space, not space that will be further reduced because of required indexes in an attempt to boost performance. Furthermore, Netezza’s compression & decompression techniques allow us to extract “pure performance” from their use. Rather than relying on CPU cycles to decompress the data before it can be processed any further, the FPGA engines decompress the data on the fly, as fast as it streams off the disk drives. Can Oracle make either of those claims?


  • “Tolerating node failures without downtime”: In perhaps the most bald-faced inaccuracy, the Oracle blog claimed that Netezza “continues to lack the ability to tolerate node failures without downtime”. This I can only chalk up to pure competitive “FUD-ism”, as our capabilities in this area have been quite strong throughout the four generations of Netezza appliances and are further strengthened in TwinFin. Netezza is a fully-redundant system with no single point of failure, even in our smallest systems. Failover in the presence of failures of the disk drives, S-Blades, internal networking or host processors (in short, everything) is automatic and done in-service, with hot-swappable replacement throughout.


  • “Appliance simplicity”: One thing Jean-Pierre didn’t address, and on which his take might have been humorous to see, is the notion of “appliance simplicity” - basically the ability to build, support and maintain large to very large-sized data warehouses, with heavy workloads, with no or minimal tuning, partitioning, indexing or other “performance duct tape” required. Routinely, this capability in the Netezza systems is what delights our customers most, and we have customers managing systems with several hundreds of terabytes of user data (not indexes + data, mind you - real data) with fractions of an FTE (full-time employee) devoted to them.
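
As a rough sanity check on the linear-scaling figures quoted in the first bullet above, the short sketch below divides each quoted throughput by its model number. It assumes, purely for illustration, that the TwinFin model number tracks the number of processing units in the system; the post itself does not say so, and the quoted figures are approximate ("as much as", "about").

```python
# Throughput figures quoted in the post: model number -> rows processed per second.
# Assumption (not stated in the post): the model number tracks the number of
# processing units, so rows/sec divided by model number is a per-unit rate.
QUOTED_ROWS_PER_SEC = {3: 100e6, 120: 4e9}
SINGLE_RACK_ROWS_PER_SEC = 400e6  # "about 400 million rows of data per second"


def rows_per_unit(model_number, rows_per_sec):
    """Per-unit processing rate implied by a quoted system throughput."""
    return rows_per_sec / model_number


# If scaling is linear, both quoted systems should imply the same per-unit rate.
rates = {model: rows_per_unit(model, rps) for model, rps in QUOTED_ROWS_PER_SEC.items()}

# Number of units a single rack would need to reach its quoted throughput
# at the per-unit rate implied by the TwinFin 3 figure.
implied_rack_units = SINGLE_RACK_ROWS_PER_SEC / rates[3]

for model, rate in rates.items():
    print(f"TwinFin {model}: {rate / 1e6:.1f}M rows/sec per unit")
print(f"Implied units in a single rack: {implied_rack_units:.1f}")
```

Under that assumption the two quoted systems imply the same per-unit rate, which is what linear scaling would predict.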


I hope that clears up some of the misconceptions. If any of Jean-Pierre's readers or Oracle customers would like to see or hear more about TwinFin for themselves, we definitely would invite them to come stop by our booth (#207) at TDWI or come to one of our regional Enzee Universe events coming to a location near you.

0 Comments Permalink
5

 

We came across a series of blog posts the other day which seemed to insinuate that Netezza and other competitors might be trying to shape our 10-100X performance message on the backs of comparisons to antiquated, end-of-service-life systems rather than to current competitors' platforms. When we got to this one - "Database Customer Benchmarketing Reports" - I felt we just had to correct the record, so I wrote a response to Greg Rahn's posting to give Netezza's side of the story, namely that

 

  • we are as up-front as possible with prospective customers and use the customer benchmark testing/POC process to prove out Netezza's performance, value and simplicity propositions;

  • the results of other products' performance come from our prospects/customers and not the result of Netezza running the tests on those platforms;

  • not only do we test against the incumbent systems, but there is almost always at least one other current competitive system that is included in the POC process;

  • the PowerPoint deck on which Greg was doing his analysis contained some rather ancient (in enzee-years, anyway) comparisons with versions of the NPS appliance that we have not sold in as much as 4.5 years & was really not much of a data set on which to base his analysis; and

  • the "proof of the pudding is in the tasting" - Netezza's success rate of converting prospects to customers through the customer benchmarking process remains very strong.

 

In short - we make every effort to keep the Netezza website contents both accurate and clear and we definitely feel confident in standing by our 10-100X performance claims. It would be great to have more than the Netezza "product marketing guy" clarify things - while I know some of the excellent results recent customers have seen in POC, no one knows them better than our customers & SI partners themselves.

 

 

5 Comments Permalink
0

It was an odd email exchange. Only 30 minutes earlier, at approximately 3:04pm US-PDT, Oracle CEO Larry Ellison, head of one of the most powerful database technology companies on Earth, had publicly launched Oracle's entrée into the Data Warehouse Appliance marketplace: "the HP Oracle Database Machine and the Oracle Exadata Data Storage Server" - while simultaneously "sporting a curiously Romanesque hair style".

 

 

Larry Ellison & Julius Caesar - separated at birth? (Wikipedia: Julius Caesar)

 

 

Perhaps we should have been cowed by such a goliathan announcement? Perhaps we should have quivered? Well that's when the email showed up. You see, Netezza had a booth (or "stand" - as I'm writing this from London tonight) in the exposition area of Oracle's big OpenWorld show in San Francisco. Within minutes of Larry's presentation, in which Netezza figured prominently albeit with substantially erroneous information across Mr. Ellison's charts, the Netezza stand was completely deluged with people saying things like, "I had never talked to your company about data warehousing before, but if Larry is going to spend 10 minutes talking about you, I need to know more." And the Netezza product brochures started flowing - not in a trickle like a leaky pipe, but like water through a burst dam.

 

 

Larry hadn't just brought up Netezza but had spent some "quality time" extolling the strengths of the Netezza architecture - moving query processing horsepower as close as possible to the storage elements of the system - and his commentary had marked Netezza as the leader in the Data Warehouse Appliance (DWA) approach. Within the hour, our team's supply had run out. Undeterred by the lack of product brochures, the team had moved on to distributing our glossy fold-out "BI Emergency Survival Guide".

 

 

But what this anecdote from the floor of a 50,000-person trade show really meant was that a sea-change had happened in the industry. No less than Larry Ellison had put his imprimatur on the DWA industry segment and in so doing had also summarily marked Netezza as the industry's leading vendor in the segment.

 

 

Since then, phones have rung off the hook and email exchanges have approached the immediacy of Instant Messaging, with in-bound requests for more information about the Netezza Performance Server®. Whatever doubt existed in the market that DWAs were a force in the marketplace was eradicated yesterday... at approximately 3:04 pm US-PDT.

 

 

"Please send more product brochures," indeed! Thanks for all the sales leads, Larry! We'll get around to correcting all your misconceptions about our product shortly.

 

 

0 Comments Permalink