Hi,
I tried to post this in the "Prospective Users" list, but it didn't give me that option.
We're on MySQL, and we'd like to port over to Netezza. I have several questions:
1) Is Netezza strictly designed for only DW type applications? Can it also handle hi rates of DB transactions? Does it handle batch write operations?
2) Can you point me Netezza's doc for supported data and table manipulation/queries commands and functions? We need this reference to properly port over our queries.
Thank you.
Netezza is a not a transactional system. Intake for a Netezza machine is best served in bulk, at a gigabyte of data at-a-time or more. Several-records-in and several-records-out are a transactional model, where Netezza deals in bulk, millions and billions-at-a-time.
When you say you want to port your SQL statements, this is probably not completely true. You want to port your functionality to Netezza, but generally speaking by the time a system has
so run-its-course and is in need of performance boosts, several iterations of performance tuning have already been applied. And these are entirely artificial when moving over to Netezza.
Here's an example: If you take the standard cursor-based stored procedure, let's say you have a cursor pulling from a table and passing it through four business rules before putting it back. In one
transactional system using 3400 members of the cursor, it took 30 seconds per operation, translating into hours of time. In Netezza, we would take these rules and "stand them on their side", passing
all 3400 records through one rule, then the next and so forth. Each operation takes mere seconds, effectively deriving the same answer in less than a minute.
So in the end, you will have to think about what in your existing system is being processed "in bulk" and this is what you would push over to Netezza. Some of it, your transactional stuff, may remain
behind and require you to pump data between the two. That's okay, because that's how most reporting system work.
David,
Thanks for the followup.
Our apps don't have any transactions. We perform antonymous reads and writes without any rollbacks or commits. We have no stored procedures. Our business rules reside on the application so we depend on extracting information from the db for business rules execution. The apps are tuned to cache configurable data and queue up batch(or bulk) writes. A typical usecase would be to retrieve a record from a large table, churning through the business rules, and repeat for two or three more times on different tables before returning the response. This type of execution won't work on MySQL as the table approaches several hundred million rows.
We already use batch operations for MySQL, and it sounds like this is encouraged with Netezza, which is fine. For reads, our entire DAO level consists of translating and executing SQL statements, many involving nested inline views and joins, and returning objects. For writes, we create a list of objects(records) and batch execute them at a higher level. We want to know if it'll work with our current framework at the DAO level (Spring JDBC). How much porting will it take to integrate? This is the question we'd like to be answered by reviewing the Netezza docs, which we can't seem to locate.
You can't do anything 1 row at a time on Netezza. There are no indexes to allow you to get good performance. At best you might get 10 queries per second. Netezza is designed to process massive amounts of data in bulk. You can process millions of rows per second, but your application has to be written to process sets of data at a time, not 1 row at a time.
Typically you would bulk load a big set of data into one or more staging tables, process the data using SQL in bulk operations, then load the results into your target tables. It is actually far easier to write an application in this way than it is to do row at a time processing, but it is a very different mindset. Netezza handles all the performance issues for you because every SQL statement always runs in parallel across 10s to 100s of cpus and disks.
The Netezza docs won't help you determine this.
What you really need to do is engage with the sales staff and get a POC underway.
So are you saying that if I have 100 threads connect and execute a query at the same time on a big Netezza table, Netezza can do roughly 10 queries per second?
If so I'm starting to understand now. Netezza is truley for aggregating large amounts of data instead, and is more suited for Data Warehouse specific applications.
For a second there I thought there was a product where I could port my application too where I wouldn't need to worry about scale anymore. No matter how large my relational tables grew, I wouldn't need to worry about how slow singular granular reads / writes take. Does anyone out there know of a product that does this?
Netezza can handle quite a bit more load than a relational database, But it handles the load in bulk
Keep in mind that relational databases are designed to get data "in", they are not designed to get data out in quantity, such as reporting and batch processing. They are geared for small-size, high-frequency activity, where data warehousing is a discipline focusing on large-scale, low-frequency activity.
It is a data warehouse appliance, and circumscribes decades of best-practices in data warehousing. Not transactional processing. At all.
So yes, single granular reads and writes will have a hard time.
Think of it this way. Netezza can intake 1 record in 1 second. Netezza can intake millions of records in 1 second. How do we want to spend this 1 second? 900 seconds for 900 rows, or 900 seconds for 900 million rows?
It is "purpose built" toward this end. An SMP/RDBMS is general purpose, and cannot hope to compete with a purpose-built machine for this activity.
A product that does this? Teradata claims to do this, but does it so poorly that people actually port their active transactional stuff to any other platform but Teradata. I know of one group that spent a year porting their transactions to it, only to spend the next year undoing what they had done.
I've had Oracle consultants claim that Oracle can do this, but to match Netezza's scale on the bulk side you're talking about a multiple of many more millions of dollars and careful tuning, engineering and long-term maintenance of the machine to achieve and retain this kind of performance. Netezza does not have this kind of uptake or maintenance.
Short answer - reporting'bulk and transactional are like two dogs. One is a shepherd that does large-scale wrangling while one is a retriever with a singleton purpose. Put them both in the same cage, and the reporting system will starve the transactional system for resources. Your users will not stop complaining. Two dogs fighting in a cage. It is simply better engineering to have them in separate cages. It is why data warehousing was born.
You would also want to do a POC - no matter what the technology.
Thanks David. We definitely have products where we need to aggregate over a bulky chunk of data in large tables which we can use Netezza for.
From your experience, because everything is done in parallel on large datasets, are there concerns of data corruption caused by asynchronous aggregation processes?
