"Fig Newton: The force required to accelerate a fig 39.37 inches per sec."
- from a "Wiley's Dictionary" definition appearing in a comic strip by Johnny Hart (1931-2007), cartoonist & creator of both B.C. and The Wizard of Id
We're very bullish on the extensibility of the NPS system architecture, and in particular on the use of FPGA technology and the ability to extend the FAST Engines framework into the future.
FAST Engines (IMO, a particularly appropriate and descriptive geek-technology acronym) already help deliver the "performance multiplier" for the NPS system that we've discussed previously by removing unnecessary records and columns from a given stream of data before the system has to expend even a single CPU clock cycle or byte of memory worrying about them.
As you can see in the block diagram above, the framework currently includes five engines: the Control, Parse, Visibility, Project and Restrict Engines. Since they're described fairly well in the White Paper, I won't go into detail here. But I will repeat some of the critical characteristics of the FAST Engines. They are:
- basic analytic functions electronically programmed into the FPGA to accelerate query performance;
- dynamically reconfigurable: each of them can be modified, disabled or extended by the NPS system in real time; and
- customized at run-time for each snippet executed in the SPU: each engine can incorporate parameters passed to it to optimize the behavior of the FPGA for a particular query snippet.
From the above, what you should take away is that the hardware on each of the NPS system's hundreds of intelligent storage nodes, known affectionately as SPUs (pronounced "SPOOz"), for Snippet Processing Units, is not just "optimally customized" for each query. Instead, as manifested in the FAST Engines, the SPUs' hardware configurations are optimally customized for each sub-step of each query, in real time, allowing the system to maximize the streaming flow of data.
In parallel within the FPGA, these engines eliminate records outside of the ACID-compliant purview of a given query; project away columns that aren't named in a given SQL statement's SELECT clause; and restrict away rows that don't satisfy the statement's WHERE predicate. All of this is done at the speed with which data is being read (or "streamed") off the disk drive on each intelligent storage node in the Netezza system, and replicated in parallel across hundreds of those nodes.
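To make those three steps a little more concrete, here's a minimal Python sketch of a streaming filter chain. It is only an illustration of the idea, not how the FPGA is actually programmed: the function names, the `_txid` field and the sample records are all hypothetical, and the stages run one after another in software rather than in parallel in hardware.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set

Record = Dict[str, object]  # hypothetical in-memory stand-in for a row on disk


def visibility(rows: Iterable[Record], visible_txids: Set[int]) -> Iterator[Record]:
    """Drop records written by transactions this query is not allowed to see."""
    for row in rows:
        if row["_txid"] in visible_txids:
            yield row


def restrict(rows: Iterable[Record], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Drop rows that fail the WHERE predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Record], columns: List[str]) -> Iterator[Record]:
    """Keep only the columns the query asked for."""
    for row in rows:
        yield {name: row[name] for name in columns}


def scan(rows: Iterable[Record], visible_txids: Set[int],
         predicate: Callable[[Record], bool], columns: List[str]) -> Iterator[Record]:
    """Chain the stages so each record is filtered as it streams past."""
    return project(restrict(visibility(rows, visible_txids), predicate), columns)


# Made-up records standing in for a block streamed off one disk.
disk_block = [
    {"_txid": 1, "id": 10, "region": "EU", "amount": 250, "notes": "a"},
    {"_txid": 1, "id": 11, "region": "US", "amount": 75, "notes": "b"},
    {"_txid": 2, "id": 12, "region": "EU", "amount": 900, "notes": "c"},
    {"_txid": 9, "id": 13, "region": "EU", "amount": 400, "notes": "d"},
]

# Roughly: SELECT id, amount FROM t WHERE region = 'EU'
result = list(scan(disk_block,
                   visible_txids={1, 2},
                   predicate=lambda r: r["region"] == "EU",
                   columns=["id", "amount"]))
```

Because every stage is a generator, nothing is buffered: a record is either dropped or trimmed the moment it is read, which is the software analogue of filtering at streaming speed.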
As a result, the remaining data stream for on-going query processing is typically reduced by 95% or more before it needs to be interrogated any further by the CPU on our intelligent storage nodes, or moved from one node to another. That translates directly into performance acceleration.
Want to rev up your FAST Engines? Install a turbocharger!
So where do we take this next? Well, for starters, Netezza will essentially be providing a "turbocharger" for our FAST Engines framework.
What do I mean by that? Perhaps this quote will help:
"The turbofan compresses the air fuel mixture so more molecules are squeezed into the cylinder. When the mixture is ignited, more energy is released. Thus, a turbocharged engine will provide more shaft work out than a naturally aspirated engine of the same size.
<...snip>
"The advantage of a turbocharged engine is that about 35% more work can be done by a turbocharged engine as compared to a naturally aspirated engine of the same size."
--from a primer on Natural Gas Engines.
There's only one thing wrong with the above quote. The new addition to the NPS system's FAST Engines framework doesn't just boost performance by 35%; it could boost streaming query performance by as much as 100-200%! That's the potential performance upside customers are going to see with the new Compress Engine that is being added to the FPGAs.
Other vendors' compression schemes reduce disk usage, but they are cumbersome and compute-intensive, so they also reduce performance. The Compress Engine does the opposite: it boosts performance by decompressing data inside the FPGA as fast as it streams from disk.
As data is written to disk (e.g., during data load, insert or update operations), it is compressed into a compiled format, column by column, with the original data replaced by the Compress Engine "instruction set" for decompilation. Then, when data is read from the disk, the Compress Engine reads its instruction set and reassembles the original data as it streams from the disk. That effectively raises the streaming data rate by as much as 200% - lifting the effective scanning rate per SPU node from over 60 MB/sec to approximately 200 MB/sec. With 108 active SPUs doing this in parallel in each rack of the NPS system, that's the equivalent of a persistent (i.e., not 'burst') scan speed of about 70 TB/hour per rack, or well over 500 TB/hour for today's largest NPS system configuration, the 8-rack NPS 10800.
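If you'd like to check the scan-rate arithmetic yourself, here's a quick back-of-the-envelope sketch in Python. The per-SPU rates, SPU count and rack count come from the paragraph above; the decimal unit conversion (1 TB = 1,000,000 MB) and the helper name are my assumptions, and since the baseline is quoted only as "over 60 MB/sec" the uncompressed figure is a floor.

```python
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000      # assumption: decimal units
SPUS_PER_RACK = 108        # active SPUs per rack, from the text
RACKS_IN_LARGEST = 8       # the 8-rack NPS 10800, from the text


def rack_scan_tb_per_hour(mb_per_sec_per_spu: float,
                          spus: int = SPUS_PER_RACK) -> float:
    """Sustained scan rate for one rack, in TB/hour (hypothetical helper)."""
    return mb_per_sec_per_spu * spus * SECONDS_PER_HOUR / MB_PER_TB


per_rack_uncompressed = rack_scan_tb_per_hour(60)    # roughly 23 TB/hour
per_rack_compressed = rack_scan_tb_per_hour(200)     # roughly 78 TB/hour
eight_rack = RACKS_IN_LARGEST * per_rack_compressed  # roughly 620 TB/hour

print(f"per rack, uncompressed: {per_rack_uncompressed:.1f} TB/hour")
print(f"per rack, compressed:   {per_rack_compressed:.1f} TB/hour")
print(f"8 racks, compressed:    {eight_rack:.1f} TB/hour")
```

With these assumptions the compressed rate comes out a little above the "about 70 TB/hour" quoted above; the gap most likely reflects rounding or a different unit convention, and the eight-rack total is comfortably "well over 500 TB/hour" either way.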
And that's not all, folks!
The FAST Engines framework is extensible into the future - and we're already hard at work looking into things that will rev up performance even further, extend the application set of the NPS appliance more broadly, or both. Again, the White Paper sets out what some of these are in fairly clear language, so I don't need to repeat them here.
Wherever the evolution of the NPS appliance takes us, we're very bullish that the performance acceleration FPGA technology provides, and its potential to extend the application space, will give Netezza that much more headroom in maintaining its leadership position in the market.
