by Jit Saxena, Netezza Chairman and CEO

"If you are a Big Dog and you are not persuaded by data, then in God we trust...but everyone else, bring data." - Jane E. Shaw, retired Chairman and CEO of Aerogen, Inc. and current member of the Intel Board of Directors (quoted from PowerSpeaking Inc.)

More and more companies recognize the power of analytics as part of their competitive strategy. But most solutions only provide a glimpse of what can be achieved. What is the potential impact when performance barriers fall away? In this post, I’d like to explore the possibilities and introduce a few examples of companies leveraging the intelligence in their data in new and unexpected ways. After all, competing is good, but winning is better.

In finance, the term arbitrage refers to the ability to find and exploit market disparities (hedging strategies monitoring currency or securities fluctuations being prime examples). Most arbitrage opportunities are very time-sensitive - you have to recognize value in an overlooked stock, then swoop in to buy it before others take notice, get the same idea and drive up the price. On Wall Street, an arbitrage virtuoso, able to consistently spot untapped potential that others miss, is worth his or her weight in gold.

Leaping through Tiny Windows

 

The term Information Arbitrage has many similarities to its finance equivalent, and it’s a good way to think about the impact that analytics can have on a company or even an entire industry. Information arbitrage is about finding game-changing intelligence buried in vast, unappreciated data assets, and exploiting it to leap ahead of the competition. Like a financial investor, the Information Arbitrager takes advantage of an opportunity before the window slams shut (which can happen very fast indeed).

Companies in certain industries make particularly good arbitrage candidates. These are companies dealing with "Big Data" - tera-scale or even peta-scale databases, and a constant flood of incoming data. Telecommunications, eBusiness, RFID retail applications and online advertising are a few segments that come to mind. Often the operational data is changing very quickly, and key insights are only found at a very granular level. Now suppose analysis at that level of detail normally takes hours or days, and one company can suddenly do it in minutes, seconds or even sub-seconds. As Netezza customers well know, this kind of intelligence disparity can have dramatic implications, both for that company and its market.

 

 

 

 

For example, telecommunications is a high-volume, low-margin business. Constant changes in network utilization demand real-time decisions about rating and pricing structures for an operator to stay competitive. By running pricing scenarios against billions of call data records, and by examining individual customers to determine their current calling patterns and preferences, iBasis, a major telco provider, knows exactly what options to offer each customer. In contrast, competitors might only see that customer as part of a larger segment measured at some time in the past, and come up short with their offers and pricing.

 

 

There are several challenges to Big Data analytics that make arbitrage opportunities hard to pursue. Predictive modeling, optimization and other analytic applications are much more processor intensive than the SQL queries used in standard business intelligence applications. When complex algorithms and gargantuan databases converge with real-time business demands, something usually has to give.

 

 

Many companies find they are unable to fully exploit their growing data holdings, and have to make do with sampling or high-level summaries rather than the complete, granular data they often want to examine. But using partial or high-level data can be dangerous; even the most powerful algorithms can suggest spurious or meaningless conclusions when they are applied to insufficient data. Companies may also lose hours offloading data from the data warehouse to an external cluster of processors to run the analysis. With all these approaches, the result is an incomplete solution that provides just a hint of the possibilities of analytics, because that’s all the current technology is capable of delivering.

 

 

Consider the problem of optimization, for example. Optimization solutions play a key role in helping companies target the right customers, make the right offers, determine manufacturing volumes or accurately price products to take full advantage of market conditions while minimizing expenses. Depending on the problem being addressed, an accurate optimization solution needs to account for many variables and constraints such as products, branches, budget, time, contact channels, offer history, market segmentation and privacy preferences, to name a few.

 

 

Due to the multiple permutations and combinations among the different elements, even a simplified optimization model limited to only a month of data, a thousand customers and ten different offers results in an astronomical solution search space of 2 to the power of 10,000 (roughly 10^3,010). Just to put things in perspective, the number of atoms in the observable universe is about 10^81, a figure that a model with only about 270 yes/no variables would already exceed.
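As a rough check on the arithmetic, here is a minimal sketch that counts the yes/no decisions in the simplified model and compares the resulting search space with the atom count quoted in the text. Treating every customer/offer pair as one independent binary variable is an assumption made for illustration, and all names in the code are hypothetical.

```python
import math

customers = 1_000
offers = 10

# Assumption: each (customer, offer) pair is one yes/no decision.
binary_variables = customers * offers      # 10,000 decisions
search_space = 2 ** binary_variables       # exact big integer

# Number of decimal digits needed to write out 2**10,000.
digits = len(str(search_space))

# Smallest number of yes/no variables whose search space exceeds 10**81,
# the atom count quoted in the text.
atoms_exponent = 81
variables_to_exceed_atoms = math.ceil(atoms_exponent / math.log10(2))

print(f"binary variables: {binary_variables}")
print(f"2**{binary_variables} has {digits} decimal digits")
print(f"variables needed to exceed 10**{atoms_exponent}: {variables_to_exceed_atoms}")
```

The point of the comparison is that the search space passes the atom count long before the model reaches a realistic size, which is why exhaustive enumeration is off the table.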

 

 

The "Big Math" at the heart of this kind of analysis pushes most processing technology to its limits and beyond. As the number of variables and constraints increases linearly, the solution search space grows exponentially, and many of these problems fall into the complexity class NP-complete. As a result, companies are forced to compromise on the thoroughness of the analysis and/or the response time they are willing to tolerate. Most optimization efforts look at small snapshots of the total data available (for example, only the last month’s data), and make use of a range of techniques such as linear, dynamic and integer programming, Lagrange multipliers and cluster analysis that reduce the level of complexity in various ways, all in an attempt to reach an actionable result in a realistic timeframe. But even with these approaches, companies are faced with costly infrastructure requirements, incomplete views of their data and lengthy response times resulting in stale data or missed arbitrage opportunities.

 

But what if you could bypass the existing performance limitations and get crucial intelligence much faster than before? For example, what if a database marketing company could use complex algorithms to get accurate optimization results days before the market could adjust? Or a retail franchise could precisely adjust the prices of thousands of products daily for each of its stores? Or a credit card company could run customer scoring algorithms one hundred times faster than its competitors? Or a financial services firm could run real-time Monte Carlo simulations on terabytes of data to manage risk? What impact could advantages like these have on a business? It’s fair to say the difference would be game-changing, providing a major competitive advantage and the ability to enter new markets previously out of reach.

These capabilities are not just marketing fantasies or future visions - they’re in use today.

 

Making these Information Arbitrage opportunities possible is precisely what Netezza does. Our streaming analytic appliances are built for running complex mathematical models on huge data sets, with results in a fraction of the time required by other technologies. Sophisticated analytic applications run "on stream" in the data warehouse, against all the records and detail that need to be examined. There’s no need to settle for summary data or aggregations, or ship data to another system for analysis. (We’re also constantly making our appliances better. Our recent doubling of performance is just the latest Netezza breakthrough.)

 

Through the Netezza Developer Network, we’re helping developers worldwide use the Netezza architecture to create a new generation of analytic applications that were previously impractical, unaffordable or simply impossible. When exploiting an arbitrage opportunity means leveraging Big Data and Big Math, Netezza’s streaming architecture is inherently faster and more efficient than other technologies. Of course, our customers already know this - and with appliance simplicity and a low purchase price, information arbitrage pays off even more.

 

The bottom line is: when Big Data meets Big Math, great things become possible for our customers and their businesses, enabling them to:

 

 

 

  • Use Information Arbitrage to take advantage of time-sensitive opportunities

  • Rapidly run multiple scenarios and sensitivity analyses in near real-time

  • Make use of all the available data, all the time, while their competitors are still struggling with reduced visibility from sampled or aggregated data

When the first Netezza appliances burst on the scene in 2002, their ability to query giant databases with unprecedented speed upset a lot of preconceived notions about the limitations of technology and what companies can do with their data. Advanced analytic applications take processing complexity to a much more challenging level, and once again the capabilities of our appliances are revolutionizing the market and capturing the imagination of our customers.

 

Jit Saxena


"Fig Newton: The force required to accelerate a fig 39.37 inches per sec."

 

 

- from a "Wiley's Dictionary" definition appearing in the B.C. comic strip, by Johnny Hart (1931-2007), cartoonist & creator of both B.C. and The Wizard of Id

 

 

 

 

In the news today: FAST Engines

 

 

In case you missed it, today Netezza has both a new press release and a brief White Paper up on the topic of our FPGA-Accelerated Streaming Technology (FAST) Engines™ framework, a key enabler of the high performance of the NPS® appliance. What we've done is provide a little more public insight into the inner workings of the NPS system and just how it is able to provide the industry-leading price/performance that it does.

 

 

We're very bullish on the extensibility of the NPS system architecture, and in particular, the use of FPGA technology and the extensibility of the FAST Engine framework into the future.

 

 

FAST Engines (IMO, a particularly appropriate and descriptive geek-technology acronym) already help deliver the "performance multiplier" for the NPS system that we've discussed previously by removing unnecessary records and columns from a given stream of data before the system has to expend even a single CPU clock cycle or byte of memory worrying about them.

 

 

 

 

As you can see in the block diagram above, the framework currently includes five engines: the Control, Parse, Visibility, Project and Restrict Engines. Since they're described fairly well in the White Paper, I won't go into detail here. But I will repeat some of the critical characteristics of the FAST Engines. They are:

 

  • basic analytic functions electronically programmed into the FPGA to accelerate query performance;

  • dynamically reconfigurable: each of them can be modified, disabled or extended by the NPS system in real time; and

  • customized at run-time for each snippet executed in the SPU: each engine can incorporate parameters passed to it to optimize the behavior of the FPGA for a particular query snippet.

 

From the above, what you should take away is that the hardware on each of the NPS system's hundreds of intelligent storage nodes, known affectionately as SPUs (pronounced "SPOOz"), for Snippet Processing Units, is not just "optimally customized" for each query. Instead, as manifest in the FAST Engines, the SPUs' hardware configurations are optimally customized for each sub-step of each query, in real time, allowing the system to maximize the streaming flow of data.

 

 

 

 

In parallel within the FPGA, these engines eliminate records outside the ACID-compliant purview of a given query, project away columns that aren't named in the SQL statement's SELECT clause, and restrict away rows that don't satisfy the statement's WHERE predicate. All of this is done at the speed with which data is being read (or "streamed") off the disk drive on each intelligent storage node in the Netezza system, and replicated in parallel across hundreds of those nodes.

 

 

As a result, the remaining data stream for on-going query processing is typically reduced by 95% or more before it needs to be interrogated any further by the CPU on our intelligent storage nodes, or moved from one node to another. That translates directly into performance acceleration.
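To make the filtering steps concrete, here is a minimal software sketch of a visibility, restrict and project pipeline over a stream of rows. It is only an illustration: the real engines are FPGA logic running in parallel, whereas this chains Python generators one after another, and the `Record` layout, the transaction-number visibility rule and the synthetic table are all assumptions rather than Netezza's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Record:
    created_txn: int            # transaction that wrote the row
    deleted_txn: int            # 0 means the row has not been deleted
    columns: Dict[str, object]  # column name -> value


def visibility(rows: Iterable[Record], snapshot_txn: int) -> Iterator[Record]:
    """Drop rows the querying transaction is not allowed to see."""
    for row in rows:
        created_in_time = row.created_txn <= snapshot_txn
        not_yet_deleted = row.deleted_txn == 0 or row.deleted_txn > snapshot_txn
        if created_in_time and not_yet_deleted:
            yield row


def restrict(rows: Iterable[Record],
             predicate: Callable[[Dict[str, object]], bool]) -> Iterator[Record]:
    """Drop rows that fail the WHERE predicate."""
    for row in rows:
        if predicate(row.columns):
            yield row


def project(rows: Iterable[Record], wanted: List[str]) -> Iterator[Dict[str, object]]:
    """Keep only the columns the query asks for."""
    for row in rows:
        yield {name: row.columns[name] for name in wanted}


def make_rows(count: int) -> Iterator[Record]:
    """Synthetic table: every tenth row was deleted right after it was written."""
    for i in range(1, count + 1):
        yield Record(
            created_txn=i,
            deleted_txn=i + 1 if i % 10 == 0 else 0,
            columns={
                "customer_id": i,
                "region": "EU" if i % 4 == 0 else "US",
                "amount": i,
                "notes": "n/a",
            },
        )


# Roughly: SELECT customer_id, amount FROM t WHERE region = 'EU' AND amount > 800
stream = make_rows(1000)
stream = visibility(stream, snapshot_txn=900)
stream = restrict(stream, lambda c: c["region"] == "EU" and c["amount"] > 800)
result = list(project(stream, ["customer_id", "amount"]))

values_in = 1000 * 4                      # rows x columns entering the pipeline
values_out = sum(len(row) for row in result)
print(f"rows out: {len(result)}, values kept: {values_out} of {values_in}")
```

Because each stage is a generator, no stage holds the whole table in memory; rows flow through one at a time, which loosely mirrors the streaming idea. The reduction this toy query achieves depends entirely on the made-up data and predicate, so it says nothing about real workloads.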

 

 

Want to rev up your FAST Engines? Install a turbocharger!

 

 

So where do we take this next? Well, for starters, Netezza will essentially be providing a "turbocharger" for our FAST Engines framework.

 

 

What do I mean by that? Perhaps this quote will help:

 

 

 

"The turbofan compresses the air fuel mixture so more molecules are squeezed into the cylinder. When the mixture is ignited, more energy is released. Thus, a turbocharged engine will provide more shaft work out than a naturally aspirated engine of the same size.

 

 

<...snip>

 

 

"The advantage of a turbocharged engine is that about 35% more work can be done by a turbocharged engine as compared to a naturally aspirated engine of the same size."

 

 

 

 

--from a primer on Natural Gas Engines.

 

 

There's only one thing wrong with the above quote. The new addition to the NPS system's FAST Engines framework doesn't just boost performance by 35%; it could boost streaming query performance by as much as 100-200%! That's the potential performance upside customers are going to see with the new Compress Engine that is being added to the FPGAs.

 

 

 

Rather than the cumbersome, compute-intensive compression efforts employed by other vendors to reduce disk usage that also result in reduced performance, the Compress Engine boosts performance by decompressing data inside the FPGA as fast as it streams from disk.

 

 

 

As data is written to disk (e.g., during data load, insert or update operations), it is compressed into a compiled format, column by column, with the original data replaced by the Compress Engine "instruction set" for decompilation. Then, when data is read from the disk, the Compress Engine reads its instruction set and reassembles the original data as it streams from the disk, effectively raising the streaming data rate by as much as 200% - lifting the effective scanning rate per SPU node from over 60 MB/sec to approximately 200 MB/sec. With 108 active SPUs doing this in parallel in each rack of the NPS system, that's the equivalent of a persistent (i.e., not 'burst') scan speed of about 70 TB/hour per rack, or well over 500 TB/hour for today's largest NPS system configuration, the 8-rack NPS 10800.
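For anyone who wants to check the scan-rate figures, here is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000,000 MB) and that every active SPU sustains the quoted per-node rate; the function and variable names are hypothetical.

```python
MB_PER_TB = 1_000_000      # assumption: decimal units
SECONDS_PER_HOUR = 3600


def scan_rate_tb_per_hour(mb_per_sec_per_spu: float,
                          active_spus: int,
                          racks: int = 1) -> float:
    """Aggregate sustained scan rate in TB/hour across all active SPUs."""
    mb_per_hour = mb_per_sec_per_spu * active_spus * racks * SECONDS_PER_HOUR
    return mb_per_hour / MB_PER_TB


uncompressed_per_rack = scan_rate_tb_per_hour(60, 108)
compressed_per_rack = scan_rate_tb_per_hour(200, 108)
compressed_eight_racks = scan_rate_tb_per_hour(200, 108, racks=8)

print(f"per rack at 60 MB/sec per SPU:  {uncompressed_per_rack:.1f} TB/hour")
print(f"per rack at 200 MB/sec per SPU: {compressed_per_rack:.1f} TB/hour")
print(f"8 racks at 200 MB/sec per SPU:  {compressed_eight_racks:.1f} TB/hour")
```

Under these assumptions the per-rack and 8-rack results land a little above the rounded "about 70" and "well over 500" TB/hour figures quoted in the post.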

 

 

And that's not all, folks!

 

 

 

The FAST Engines framework is extensible into the future - and we're already hard at work looking into things that will rev up performance even further, extend the applications set of the NPS appliance more broadly or both. Again, the White Paper sets out what some of these are in fairly clear language so I don't need to repeat it here.

 

 

 

Wherever the evolution of the NPS appliance takes us, we're very bullish on the notion that the performance acceleration and potential to extend the application space that FPGA provides will give Netezza that much more headroom in maintaining its leadership position in the market.

 

 


 

November X, 2007

Issue 14: Supercomputing Conference Brings Life to Reno

 

 

"Computing is not about computers anymore. It is about living."

 

 

- Nicholas Negroponte

 

 

Negroponte's words certainly rang true, in more ways than one, at the recent SC07 conference in Reno. In a city known for little other than its lackluster "old Vegas" character, its proximity to Tahoe and the Reno 911! Comedy Central series (which is actually filmed in southern California), the Supercomputing conference breathed new life into the city from November 12-15, while providing extensive displays of technologies that are changing many facets of life as we know it.

 

 

Moments after I stepped off the plane, the fluorescent "Welcome to SC07" signs that followed me from the Reno airport all the way to my hotel made it clear that this conference was going to be a pretty big deal. My intuition was confirmed upon entering the Reno-Sparks Convention Center, with its impressive array of meeting space, food stations, souvenir "shops" and an exhibit hall encompassing 200+ rows of endless exhibitor booths. The annual Supercomputing conference does a great job of bringing together growing numbers of cutting-edge technologies, commercial enterprises, government organizations, national labs and graduate students, year after year. With its extensive technical program consisting of several awards, Birds-of-a-Feather sessions, "challenges" (http://sc07.supercomp.org/?pg=challenges.html) and more, along with a plethora of industry and research exhibits and sessions, the show consistently delivers something of value to everyone who attends. The result: increased momentum and record-breaking numbers of both attendees and industry exhibitors at SC07. The collaboration of masterminds from all different backgrounds, locations, age groups and organizations produces an amazing fusion of ideas, solutions, partnerships and technologies.

 

 

Netezza participated in the conference with booth space in addition to a session in the SC07 exhibitor forum, featuring our very own Justin Lindsey along with John Johnson from Lawrence Livermore National Lab (better known as LLNL). Despite the modest space Netezza occupied with our 10x20' booth in the massive exhibit hall, our presence felt considerably larger as we attracted a continuous stream (no pun intended) of visitors throughout the week, including those who had never heard of Netezza along with those who'd set out on a resolute mission to find our booth.

 

 

Our bright purple SPUBox, featuring an animated motorcyclist and "Netezza Speed" written in graffiti, was by far the biggest draw to the booth. It was exciting to see so many passers-by stop for a closer look at the SPUBox, often asking a question or two about the box, how they might get their hands on one, and what exactly the Netezza Developer Network is all about. Several visitors applied to join the NDN right then and there, hoping for a chance to win a SPUBox at the end of Justin and John's exhibitor forum session. All the NDN buzz created quite a "Netezza high," if I do say so myself.

 

 

 

 
