Thoughts from Inside the Box


Okay, I'll admit that my first posting about the new Oracle Data Warehouse Appliance (DWA) tonight was a tad on the "snarky" side. But I have to say that I think it was because of the influences in the environment all around me. Ever since the announcement yesterday afternoon, there's been a healthy degree of skepticism from industry insiders.

Beyond his commentary on Larry Ellison's hairstyle, Gavin Clarke of the UK's Channel Register virtually flogged Larry for flogging the "Oracle server appliance alliance with HP". Some of the best snippets included:

  • Gavin's subtitle: "(Not) a hardware provider"
  • "And so to chief executive Larry Ellison, who Wednesday afternoon announced Oracle's third effort in 10 years bundling his company's software with someone else's hardware. This time, it's a high-performance, Oracle data and storage server stack locking arms with old favorite Hewlett-Packard."

And after taking several informative paragraphs to expound on Oracle's two previously failed attempts at 'appliantization' - most recently the "Network Computer" initiative circa 2000 - and to draw the clear analogy to yesterday's announcement, Clarke closed out his piece with this stinger:

  • "In a telling sign of how much faith Ellison places in his latest appliance, he did not sit down for his traditional, open-mic smack-down session with OpenWorld attendees to field questions."


Analyst/blogger Curt Monash summarized more than a few skeptical digs in his Oracle Exadata and Oracle data warehouse appliance sound bites posting earlier today. For example, here are a few "bites" from Curt's post:


VP & Global Marketing CTO Chuck Hollis of EMC weighed in with a couple of good shots in his Chuck's Blog post: Oracle does hardware (emphasis mine):

  • "Of course, there's little in the way of performance comparisons to help us evaluate just how fast this beast might go, except the ‘Up To 10x Faster' which smells a bit optimistic, never mind that it's Oracle comparing with itself, rather than other data warehousing appliances."
  • "Every year at Oracle Open World, we hear about many "new initiatives" from Oracle. Well, not to be harsh here, but it's my impression that very few of them get talked about at next year's Oracle Open World. I routinely dig up past announcements from previous years, and it's relatively consistent pattern. I think it's fair to ask the question -- *just how serious is Oracle about all of this*?"

http://i111.photobucket.com/albums/n148/nzfrisco/Miscellaneous%20Figures/ellisonOpenWorldExadata.jpg
But the lead cynic was none other than Oracle CEO Larry Ellison himself. After years of denying performance issues at scale with various generations of Oracle DBMS software for data warehousing, Larry dropped this 11g-megaton bombshell about Oracle's data warehouse scalability, pre-Exadata - laying out the fundamental reason why Netezza has become the industry leader in Data Warehouse Appliances (source: ZDNet's Larry Dignan):

"Ellison, speaking at Oracle's OpenWorld conference, said large databases are creating a fundamental problem: Disk storage systems can't cope with data that has to be moved off of drives to database servers. He called it a ‘data bandwidth problem.'
"As data gets larger the slowdowns become more unbearable. At one terabyte you will notice data bandwidth slippage. At 10 terabytes, storage systems crawl. ‘At one terabyte the problem rears its ugly head and it gets worse every year,' said Ellison."

And that's not all - the barbs, skepticism and "bites" go on, on site after site and in commentary after commentary. So please forgive my snarkiness - I blame it on the "nurture" of my environment, not my personal "nature", per se.


It was an odd email exchange. Only 30 minutes earlier, at approximately 3:04 pm US-PDT, Oracle CEO Larry Ellison, head of one of the most powerful database technology companies on Earth, had publicly launched Oracle's entrée into the Data Warehouse Appliance marketplace: "the HP Oracle Database Machine and the Oracle Exadata Data Storage Server" - while simultaneously "sporting a curiously Romanesque hair style".

http://i111.photobucket.com/albums/n148/nzfrisco/Miscellaneous%20Figures/LarryEllison-ORCLWorld.jpg http://i111.photobucket.com/albums/n148/nzfrisco/Miscellaneous%20Figures/JuliusCaesar.png

Larry Ellison & Julius Caesar - separated at birth? (Wikipedia: Julius Caesar)

Perhaps we should have been cowed by such a goliathan announcement? Perhaps we should have quivered? Well, that's when the email showed up. You see, Netezza had a booth (or "stand" - as I'm writing this from London tonight) in the exposition area of Oracle's big OpenWorld show in San Francisco. Within minutes of Larry's presentation, in which Netezza figured prominently albeit with substantially erroneous information across Mr. Ellison's charts, the Netezza stand was completely deluged with people saying things like, "I had never talked to your company about data warehousing before, but if Larry is going to spend 10 minutes talking about you, I need to know more." And the Netezza product brochures started flowing - not in a trickle like a leaky pipe, but like water through a burst dam.

Larry hadn't just brought up Netezza but had spent some "quality time" extolling the strengths of the Netezza architecture - moving query processing horsepower as close as possible to the storage elements of the system, and his commentary had marked Netezza as the leader in the Data Warehouse Appliance (DWA) approach. Within the hour, our team's supply had run out. Undeterred by the lack of the product brochures - the team had moved on to distributing our glossy fold out "BI Emergency Survival Guide".

But what this anecdote from the floor of a 50,000-person trade show really meant was that a sea-change had happened in the industry. No less than Larry Ellison had put his imprimatur on the DWA industry segment and in so-doing had also summarily marked Netezza as the industry's leading vendor in the segment.

Since then, phones have rung off the hook and email exchanges have approached the immediacy of Instant Messaging, with in-bound requests for more information about the Netezza Performance Server®. Whatever doubt existed in the market that DWAs were a force in the marketplace was eradicated yesterday... at approximately 3:04 pm US-PDT.

"Please send more product brochures," indeed! Thanks for all the sales leads, Larry! We'll get around to correcting all your misconceptions about our product shortly.


"In theory there is no difference between theory and practice. In practice there is."
-- Jan van de Snepscheut, 1953-1994, computer scientist and educator, California Institute of Technology

So, yesterday I wrote about "the Netezza's" transformation into a platform for deep analytics. Now I know a platform is only as good as the applications available on it, which brings me to our announcement this morning.

Last September, we got together with a handful of visionary partners and customers and created the Netezza Developer Network (NDN) with the goal of developing truly innovative analytic applications. We announced the first wave of these offerings today, with 5 NDN partners delivering game-changing applications built using Netezza's OnStream analytics. Let me highlight a couple of them here.

  • Systech Solutions' profitability analysis application for retail and CPG companies provides cost and revenue analysis at the detailed SKU and customer level. It gives business users the ability to build and run profitability models using a GUI, instead of relying on IT to do it for them. This is pretty unique, because traditionally something like this would take huge amounts of time - measured in many months - not to mention the resources required. Their app cuts this down by orders of magnitude! So you not only get very fine-grained profitability analysis, but it's available very, very quickly. That makes all the difference between gut-feel decisions and data-based ones about which products and customers to keep, which prices to re-negotiate and how to truly impact the bottom line.

  • Imagine if telco service providers could analyze each and every one of the many millions of call detail records they collect and store, before making very important decisions - the kinds that can dramatically alter their earnings statements. That's what RateIntegration's app offers - a tool for business users that allows them to model the impact of competitors' pricing and regulatory changes to figure out the optimal rate plans. Business analysts can also directly implement custom scoring algorithms for customer segmentation and profiling using its flexible rules engine.

Apart from these, we have Multi-Threaded Inc's fuzzy name and text matching app for critical anti-terrorism, money laundering and digital forensics operations; HCL Technology's implementation of Monte Carlo simulations for pricing derivatives; and Edge Associate's library of SQL functions to speed up migrations to "the Netezza". Make sure you check out the brand spanking new applications webpage to get more details about each of these members and their applications, and don't forget to stop by their booths at the User Conference.

"As long as one does not have to wait minutes to hours between computational gestures, something amazing happens; one gets problem solving at the speed of human insight"
-- Data-Centric Computing with the Netezza Architecture, Sandia National Laboratories

While the new applications developed by NDN members are unique and serve very different markets - retail, telecommunications, financial services and government - they have remarkable similarities in the value they offer to customers. The applications power complex analytics orders of magnitude faster than economically feasible before, allowing users to perform "what-if" analyses to more accurately predict future outcomes. These analyses can be performed on large volumes of detailed data, providing unique business insights that would otherwise be lost in sampled and summarized data. The deployment and management of the overall solution is greatly simplified, freeing up business users to focus on results rather than worrying about tuning and maintaining the system.

What's really neat about Netezza's open platform approach is the ideas and innovation it is generating and the differentiated applications it is helping launch. Now that's what platform innovation is all about, isn't it? At Netezza, it's about bringing the power of analytics to the mainstream.


"The milk of disruptive innovation doesn't flow from cash-cows"
-- David Isenberg, Blogger, Musings About Loci of Intelligence and Stupidity

Dare I say ... "orders of magnitude performance" for data warehouse applications is old news as far as Netezza customers are concerned! It became fairly obvious to me at the Netezza European User Conference, held a few months ago. In presentation after presentation, customers talked about the performance and simplicity benefits they got from "the Netezza" - how the proof-of-concept (against their favorite legacy data warehouse vendor) seemed unbelievable at first, but certainly proved true in production; the fact that they did indeed get orders of magnitude better performance; and how all this changed the way they did business. Brian Ganly of The Carphone Warehouse used this chart to highlight Netezza performance during his talk about the "Netezza Experience." I think it captures the sentiment really well ...

http://i437.photobucket.com/albums/qq91/mraziudd/Carphoneperformance.gif

It's not that data warehouse performance is not important any more, or that somehow the 100X performance that Netezza delivers is "enough". In fact, what the Netezza customers were alluding to, in a customer's own words, is: "Netezza does what it says on the tin!" We talk about blisteringly fast performance without requiring tuning and aggregations at half the cost of other systems, and we deliver. Once customers see for themselves what "the Netezza" can do for their data warehouse, they get intrigued about the possibility of what else it could do for their business. And that quickly leads them to look beyond raw performance for data warehouses and apply "the Netezza" to new and interesting big-data analytic problems.

"... the best products become platforms at some point."
-- Bob Warfield, author of the SmoothSpan Blog

As the data warehouse market continues to evolve, more and more companies are looking to use information as a competitive lever across their organizations. The most successful will be those that make use of information to exploit arbitrage windows in the marketplace and predict future outcomes more accurately. These companies will differentiate themselves by making high performance analytics pervasive, providing employees, partners and vendors access to the kinds of analytics that are only available to a select few in the enterprise today.

What's needed to deliver on the promise of advanced analytics is a platform that can overcome the challenges of doing deep analytics on large data volumes - performance, complexity and cost. Let's look at how advanced analytics are done on traditional systems. In most cases, these poor data warehouses are so overtaxed that adding any more processing is a certain way to bring them to their knees. And so the usual approach is to extract huge data sets onto an outsized SMP server or compute grid, perform the analytic computation on it and load result sets back to the data warehouse for querying. You can clearly see the problems with this approach. It's expensive, especially when you're talking about a large SMP or grid; it's complex since you have more systems to maintain; but most importantly you get poor performance even if you spend tons of time and money on the infrastructure. The data movement back and forth introduces the same latency and performance bottlenecks that still plague traditional data warehouse architectures.

What we've done with "the Netezza" is create just such a platform, one that overcomes these complex analytics challenges. The idea is quite simple actually. Algorithms for analytic tasks such as scoring, text and spatial processing, image and video analysis and financial simulations can be run directly on the intelligent nodes inside the Netezza. So these algorithms can act on the data where it resides, rather than sending it off-board for processing. You not only get the benefit of fully parallelized execution across hundreds of processors, resulting in orders of magnitude better performance for analytics, but also the simplicity and economy of an appliance. Plus, the Netezza is able to handle all this extra processing because of the spare processing capacity built into each of its intelligent nodes. Let me refer you to Phil Francisco's blog for a blow-by-blow version of how "OnStream analytics" works in practice.
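To make the contrast concrete, here is a minimal Python sketch. It is not Netezza code: the `Node` class, the function names and the simple threshold "scoring" rule are all hypothetical stand-ins chosen for illustration. It compares the traditional approach (ship every row to an outside server, then compute) with running the same computation where the data lives and moving only small partial results.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical storage node holding one slice of a table's values."""
    rows: list

    def local_score(self, threshold):
        """Runs on the node itself: return (count, total) of rows above threshold."""
        hits = [r for r in self.rows if r > threshold]
        return len(hits), sum(hits)


def extract_then_compute(nodes, threshold):
    """Traditional approach: move every row off-board, then compute centrally."""
    moved = [r for node in nodes for r in node.rows]  # all rows cross the network
    hits = [r for r in moved if r > threshold]
    return (len(hits), sum(hits)), len(moved)


def compute_where_data_lives(nodes, threshold):
    """In-place approach: each node computes locally; only partials are moved."""
    partials = [node.local_score(threshold) for node in nodes]
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return (count, total), len(partials)  # one small partial result per node


if __name__ == "__main__":
    nodes = [Node([1, 5, 9]), Node([2, 8]), Node([7])]
    print(extract_then_compute(nodes, 4))      # same answer, 6 items moved
    print(compute_where_data_lives(nodes, 4))  # same answer, 3 items moved
```

Both functions return the same answer; what differs is the second value, a rough count of how many items had to leave the nodes. In this toy case the gap is small, but it grows with the number of rows per node while the number of partial results stays fixed at one per node.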

This is all great so far - I mean any platform that provides these kinds of advantages has to be quite extraordinary! But the true value of a platform is determined by the applications that run on it and how innovative and differentiated they are. That's where there is a lot of interest and excitement in the enzee community. More on that very soon ...


"And one of his partners asked, 'Has he vertigo?' and the other glanced out and down and said, 'Oh no, only about ten feet more.'"
— Ogden Nash, American writer and humorous poet (1902-1971)

Today's News: Netezza and EMC Partner to Simplify Data Warehousing for the Enterprise
This is hot! No, I mean it's really hot. I'm here in Las Vegas this week, attending the EMC World show in support of today's partnership announcement with EMC, and yesterday the temperature crossed 110 degrees Fahrenheit (43 Celsius). Oh yes, and our announcement here this morning is hot as well - stirring up interest around the EMC World show floor. We had several great discussions with EMC field staff, their partners and some customers yesterday. In the imperfect domain of trade show tchotchkes as metric, in just four short hours yesterday we ran through nearly 400 tee-shirt give-aways. Of course, as an aside, I would subjectively say our shirt was clearly a "best-in-show" candidate here - witness these front & back photos:

EMC-Netezza tee front EMC-Netezza tee both EMC-Netezza tee back


Complementary Technology and Co-marketing for Success
"So what is this announcement all about?" you ask. From the viewpoint of our customers and prospects, it's primarily about Netezza partnering with the industry-leader in information infrastructure to bring the performance horsepower of the NPS® data warehouse appliance into the enterprise data center in an even simpler, more efficient way than we already have been. It's consistent with their continuing evolution of data warehouse appliance deployments - as business-critical systems that are used enterprise-wide. And it provides them with both operational flexibility and performance in supporting the requisite data management functions.

From Netezza's business perspective this makes the NPS appliance even better suited to a broader sweep of customers. We'll provide familiar, enterprise-class data backup, replication and disaster recovery capabilities that extend Netezza's simple appliance approach by embedding EMC's CLARiiON® storage arrays and Navisphere® & MirrorView™ software. But just as important, we'll work with EMC on co-marketing initiatives in bringing these configurations to market. As a result we anticipate being able to extend our market penetration, both by winning new customers and by broadening our footprint in current accounts.


How does this partnership change an NPS system?
The partnering initiative with EMC involves two basic configurations of the CLARiiON AX4 storage arrays. Both configurations will be deployed within the NPS data warehouse systems themselves and will require no additional data center footprint, with very minimal impact on the system's very low power and cooling requirements.

First off, let me say straight away that this partnership in no way changes the basic NPS system architecture that Netezza has made use of through three generations of its data warehouse appliance. The NPS system will still achieve its scalable, high performance through the unique AMPP™ architecture, including the Snippet Processing Units (SPUs), whose design is unaffected by this partnership.

Instead, the EMC storage arrays will be used to provide near-line storage for the NPS systems in order to stage data for all of the primary bulk data movement operations: loading, unloading, backup, data replication and disaster recovery.

In short, the most basic functional block diagram of the NPS appliance will evolve
from this: NPS-DBMS, Server and Storage to this: NPS with EMC CLARiiON

The two types of embedded CLARiiON configurations are as follows:

Storage Pad™
Deployed as part of the standard equipage in our two-rack (NPS 10200) and larger systems, the Storage Pad configuration will support up to 5 TB of near-line data capacity for staging ETL data loads, data unloads and incremental backup images. Applying an approach that I've come to call "TiVo for data warehousing", the Storage Pad allows customers to time-shift data management functions to suit their operational requirements. For example, customers might make use of the Storage Pad to move backup data from the data warehouse rapidly and then move the backup data from the Storage Pad to a tape or disk archive at the rate that the data center network, media and operations scheduling will allow.

By comparison, other vendors may charge as much as $100,000 for just 1.5 TB of capacity for similar functionality.

Storage Pad XL™
As the name suggests, this optional configuration scales to higher capacities, in line with the NPS systems in which it will be deployed. The Storage Pad XL configurations will be available on all NPS 10000 series models and will support up to 10 TB of near-line data capacity per NPS rack - up to 80 TB for the 8-rack NPS 10800 system. Just as above, the Storage Pad XL can be used for data staging, but now full images of the NPS tables or databases can be captured for high-speed backup.

In addition, this configuration will also support the deployment of EMC's MirrorView software package for enterprise-class data replication and DR.

"Netezza - Now with added CLARiiON!" is the fun spin we put on the marketing look of our booth at EMC World this week but we think this partnership will provide our enterprise customers with an excellent, simple toolset for bringing the appliance paradigm to data management functions; and through our co-marketing initiatives with EMC, it will bring the NPS data warehouse appliance into more of our customers' data centers.


April 28, 2008

"‘To be is to do.' - Immanuel Kant
"‘To do is to be.' - Jean Paul Sartre
"‘Do-be-do-be-do' - Frank Sinatra"

--Kurt Vonnegut, Jr. (Nov 1922 - Apr 2007)
In the news today: the Compress Engine

In 1783 Immanuel Kant wrote, "David Hume woke me up from my dogmatic slumbers," and revolutionized the way humanity thinks about metaphysics. Almost 220 years later, Netezza set out to achieve a similar goal - redefine analytics. When the first NPS® data warehouse appliance was introduced, the market released itself from yet another dogmatic slumber and realized that there is a different, better way to do data warehousing; a way without compromise, a way without limits.

Netezza has helped to reenergize the data warehouse market in creating and leading the data warehouse appliance category.

  • "Every time you turn around you see another industry that's facing a tidal wave of data and they need to understand what this data is saying. Many of them have data volumes in this range that they haven't been able to afford to analyze, as much as they'd like to. ... Netezza can deliver that analytic capability, and at a very attractive price." - Richard Winter, Winter Corporation, from Netezza will scale its appliance to petabyte range, InfoWorld (January 2008)
  • "This is what Netezza has done in the data warehousing market: it has totally changed the way that we think about data warehousing... So the bottom line is not just that Netezza's entry into the market was a black swan event but that that event has not ceased to unfold." - from Netezza: a black swan by Philip Howard, Bloor Group (October 2007)
  • "Appliances are here to stay and are revolutionizing the data warehouse industry." - from Business Analytics Appliances Are Here to Stay, by Dan Vesset, IDC (June 2006)
  • "The term data warehouse appliance was coined by Netezza, and this vendor has blazed a trail by proving the concept and educating the market." - from Defining the Data Warehouse Appliance, by Philip Russom, TDWI (August 2005)

Since 2002, Netezza has been repeatedly breaking the latency barrier and challenging the boundaries of data analytics. Since our first release, we have been continuously refuting the alleged mutual dependencies that became the building blocks of the industry's dogmatic misconceptions; namely the expensive nature of performance, the necessary complexity of the analytics architecture and the unavoidable limits of scalability. With today's announcement of the Compress Engine, Netezza disproves yet another myth - the inverse relationship between data compression and query performance.

The architectures of traditional data warehouses, steeped in a legacy of serving OLTP applications, were not designed to handle the ever-growing amounts of data combined with larger and more complex user workloads and shrinking data latency requirements that characterize the modern enterprise. Regulatory compliance, electronic commerce and the need to process and analyze all data in a matter of seconds have pushed the capabilities of traditional data warehouse systems to their limits. In reaction to the data capacity pressures, vendors introduced compression; not as an enhancement but as a compromise solution that allows for further data growth at the cost of processing performance.

Traditional compression approaches, used by several of the competing data warehouse vendors, typically degrade performance in order to achieve the compression effect. Netezza's addition to the FPGA-Accelerated Streaming Technology (FAST) Engines framework - Compress Engine - utilizes its innovative streaming architecture™ not only to increase the system's storage capacity by 2-4X but actually to boost overall streaming query performance by a factor of about 2X (a 100% increase). All this is achieved without requiring any tuning or administration, and enabling Compress Engine on the Netezza appliance is in fact a software-only upgrade.

It's actually really cool technology, obviously something we love to rave about. Late last year, I wrote about FAST Engines in this blog. We'll use that as a starting point and dig a level deeper into how Compress Engine works. I'm sure it will tickle the fancy of the geek in you!

http://www.netezzacommunity.com/servlet/JiveServlet/downloadImage/38-1042-1055/Picture3.png

The NPS system employs a patent-pending method for compiling (yes, compiling) columnar data in all the tables of the database as it is being written to disk e.g. during load, insert or update operations. The process converts row-based data into column streams that are independently compiled to replace the original data in the columns with a stream of "instruction sets" for the FPGA. The "instructions" themselves are much smaller in size than the data they replace, resulting in a highly compressed data stream emerging from the process.

While the compression occurs on columnar data because of the inherent compressibility within database columns, the compressed data is reassembled in rows before being written to disk. Row-wise storage of tables avoids the data scan complexity associated with columnar stores and ensures that scanned data can be efficiently parsed and processed without the need to reconstitute it from multiple sources. The compressed data uses disk much more efficiently and increases the data density of NPS systems by 2-4X - in some cases substantially higher - allowing customers to scale their NPS data warehouse systems into the hundreds of terabytes of user data.
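The actual compilation method is patent-pending and not described here, so the following is only a toy Python sketch of the general idea of turning column streams into a smaller stream of "instructions". It swaps in plain run-length encoding as the stand-in technique; the function names and the `("repeat", value, count)` instruction format are assumptions made for illustration, and the row-wise reassembly before writing to disk is not modeled.

```python
from itertools import groupby


def compile_column(values):
    """Encode one column as ("repeat", value, count) instructions.

    This is ordinary run-length encoding, used here only as a stand-in
    for the real (undisclosed) compilation step.
    """
    return [("repeat", value, len(list(run))) for value, run in groupby(values)]


def run_instructions(instructions):
    """Replay the instructions to restore the original column values."""
    out = []
    for op, value, count in instructions:
        if op != "repeat":
            raise ValueError(f"unknown instruction: {op}")
        out.extend([value] * count)
    return out


def compress_block(rows):
    """Split a block of rows into column streams and compile each one."""
    columns = zip(*rows)  # row-based data -> one stream per column
    return [compile_column(column) for column in columns]


def decompress_block(block):
    """Replay every column's instructions and stitch the rows back together."""
    columns = [run_instructions(instructions) for instructions in block]
    return [tuple(row) for row in zip(*columns)]


if __name__ == "__main__":
    rows = [("UK", "open", 1), ("UK", "open", 2),
            ("UK", "closed", 3), ("US", "closed", 4)]
    block = compress_block(rows)
    print(block[0])  # the country column collapses to two instructions
    assert decompress_block(block) == rows
```

The sketch shows why compression is applied per column: values within a column tend to repeat (countries, status codes), so a column stream collapses into far fewer instructions than a stream of whole rows would.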

But if the NPS system's data compression and scale brought the system's performance to its knees or severely limited performance speedup due to compression (as it does on many of those other systems), it wouldn't be so great, would it? The beauty of the Netezza way of providing data compression is that not only does it have no negative impact on performance, but it actually increases query performance by up to 100%!

http://www.netezzacommunity.com/servlet/JiveServlet/downloadImage/38-1042-1054/Picture2.gif

As the compressed data is read off the disk, it is passed through the Compress Engine which applies the instructions embedded in the data stream to restore it to its original form. Our compilation algorithm ensures that this decompression process can be performed entirely in silicon, at wire speeds. Each physical block scanned from the disk can mushroom into 2 to 4 or more times its size in memory without incurring any overhead in processing time - i.e. 2 to 3 times more data is scanned in the same amount of time _without any increase in system hardware_! Our internal benchmark testing reflecting real customer configurations and workloads has shown an overall 2.2X increase in streaming query performance through the use of Compress Engine.
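A back-of-the-envelope way to see where the speedup comes from is sketched below in Python. It rests on two assumptions of mine, not on anything from the benchmark itself: that decompression adds no time (the "wire speed" claim above), and that only the disk-scan portion of a query benefits, which is simply Amdahl's law applied to scanning. The function names and the sample figures are hypothetical.

```python
def effective_scan_rate(physical_mb_per_s, compression_ratio):
    """Uncompressed data delivered per second when decompression is free."""
    return physical_mb_per_s * compression_ratio


def overall_speedup(scan_fraction, compression_ratio):
    """Amdahl's law: only the scan-bound fraction of query time shrinks."""
    if not 0.0 <= scan_fraction <= 1.0:
        raise ValueError("scan_fraction must be between 0 and 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return 1.0 / ((1.0 - scan_fraction) + scan_fraction / compression_ratio)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(effective_scan_rate(100.0, 2.5))
    print(overall_speedup(scan_fraction=0.8, compression_ratio=3.0))
```

Under these assumptions a fully scan-bound query speeds up by the full compression ratio, while a query that spends part of its time elsewhere speeds up by less, which is one plausible reason an overall figure would sit below the raw 2-4X compression range.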

This software-only enhancement, enabled by our unique architecture, is only the beginning. As we continue to develop our platform, we are investigating further enhancements to the Compress Engine or the addition of new FAST engine(s), aimed at directly increasing streaming performance on the NPS system.

Our philosophy and aim is to continue to shake the industry out of its dogmatic slumbers by extending the price/performance advantages of our products; showing that there's a different way to do data warehousing and advanced analytics. One where performance and scalability are neither the result of expense nor complexity, where you can get more performance from compression, where you do have the power to question everything™ ...


April 21, 2008

"Imitation is the sincerest of flattery."
- Charles Caleb Colton (1780-1832), from his Lacon, Vol. I, published in 1820
Welcome to the Data Warehouse Appliance club - another validation of an important, growing market segment

Well, well, well! "Only" eight years after Netezza coined the term and invented the market segment, Teradata today finally officially entered the Data Warehouse Appliance market. Though it's a bit late, and certainly behind a number of other vendors, perhaps today's entry will put an end to Teradata's vacillating over whether they 'invented' the concept or not, were an appliance or not, or whatever. In the past couple of years, it seems Teradata spokespeople have gone out of their way to say their product was simultaneously a data warehouse appliance and absolutely not one - even booking appearances on panels of data warehouse appliance "vendors". Certainly their announcement is another validation that the role of Data Warehouse Appliances is an important and growing one not only in the current market, but for the future as well.

Derivative Marketing and a "Repackaged, Warmed-over" Product?

http://www.netezzacommunity.com/servlet/JiveServlet/downloadImage/1052/Picture1.gif

Teradata is positioning this new product as being "simple, powerful and cost-effective" - which to our way of thinking sounds much more than a little derivative of Netezza's "Performance, Value and Simplicity", but I'll leave it to the reader to decide. To us, the Teradata announcement reads like just another larger vendor's "repackaging" response to the competition. Like others before them such as IBM and Oracle, it appears that with the 2500 model Teradata has done nothing more than cobble together a collection of elements from the company's model 5500 systems, repackaged and sold as an appliance.

Powerful. Um, How's That Again?

And while anyone who is serious about the appliance segment of the data warehouse market (like Netezza) has focused on delivering systems that can scale to highly complex, enterprise-wide, high performance systems, we think the 2500 will struggle to deliver even modest performance for just 6 TB in a single equipment rack.

While Teradata is quoting just over 6 TB of user capacity per two nodes in this new system, let's remember that they have been advising customers for the past year not to put more than 1.5 TB against each of those same dual-core CPU nodes. Which is it? Is the 2500 underpowered for its 6 TB data capacity per dual-node rack, or has Teradata been advising its model 5500 customers to pay at least 2X too much for their data warehouse systems for the past year?
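The arithmetic behind that "at least 2X" can be checked in a few lines of Python. The figures are the ones quoted above (6 TB per two nodes, 1.5 TB advised per node); the function names are just illustrative.

```python
def capacity_per_node_tb(quoted_capacity_tb, nodes):
    """User data capacity each node is being asked to serve."""
    return quoted_capacity_tb / nodes


def load_versus_advice(quoted_capacity_tb, nodes, advised_tb_per_node):
    """How many times the earlier per-node guidance the new quote represents."""
    return capacity_per_node_tb(quoted_capacity_tb, nodes) / advised_tb_per_node


if __name__ == "__main__":
    per_node = capacity_per_node_tb(6.0, 2)
    ratio = load_versus_advice(6.0, 2, 1.5)
    print(f"{per_node} TB per node, {ratio}x the earlier guidance")
```

Since Teradata quotes "just over" 6 TB, the real ratio is slightly above 2, which is where "at least 2X" comes from: either each node now carries twice the advised load, or the earlier guidance sold twice the nodes needed.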

Time will tell whether Teradata has made other compromises to the 2500 model in an attempt to limit its impact on its flagship products (5500 and the new 5550). Beyond its underpowered nodes, have they sacrificed anything else like workload management or system availability, or even the system's ability to handle highly-interactive, operational applications? As the days and weeks help raise the shroud covering the model 2500 further, we'll know more. For now though, it just feels like "me-too" imitation.


by Jit Saxena, Netezza Chairman and CEO

"If you are a Big Dog and you are not persuaded by data, then in God we trust...but everyone else, bring data." - Jane E. Shaw, retired Chairman and CEO of Aerogen, Inc. and current member of the Intel Board of Directors (quoted from *PowerSpeaking Inc.*)
More and more companies recognize the power of analytics as part of their competitive strategy. But most solutions only provide a glimpse of what can be achieved. What is the potential impact when performance barriers fall away? In this post, I’d like to explore the possibilities and introduce a few examples of companies leveraging the intelligence in their data in new and unexpected ways. After all, competing is good, but winning is better.
In finance, the term arbitrage refers to the ability to find and exploit market disparities (hedging strategies monitoring currency or securities fluctuations being prime examples). Most arbitrage opportunities are very time-sensitive - you have to recognize value in an overlooked stock then swoop in to buy it before others take notice, get the same idea and drive up the price. On Wall Street, an arbitrage virtuoso, able to consistently spot untapped potential that others miss, is worth his or her weight in gold.



Leaping through Tiny Windows


The term Information Arbitrage has many similarities to its finance equivalent, and it’s a good way to think about the impact that analytics can have on a company or even an entire industry. Information arbitrage is about finding game-changing intelligence buried in vast, unappreciated data assets, and exploiting it to leap ahead of the competition. Like a financial investor, the Information Arbitrager takes advantage of an opportunity before the window slams shut (which can be very fast indeed).

Companies in certain industries make particularly good arbitrage candidates. These are companies dealing with "Big Data" - tera-scale or even peta-scale databases, and a constant flood of incoming data. Telecommunications, eBusiness, RFID retail applications and online advertising are a few segments that come to mind. Often the operational data is changing very quickly, and key insights are only found at a very granular level. Now suppose that finding those insights normally takes hours or days, and one company can suddenly do it in minutes, seconds or even sub-seconds. As Netezza customers well know, this kind of intelligence disparity can have dramatic implications, both for that company and its market.

For example, telecommunications is a high-volume, low-margin business. Constant changes in network utilization demand real-time decisions about rating and pricing structures for an operator to stay competitive. By running pricing scenarios against billions of call data records, and by examining individual customers to determine their current calling patterns and preferences, iBasis, a major telco provider, knows exactly what options to offer each customer. In contrast, competitors might only see that customer as part of a larger segment measured at some time in the past, and come up short with their offers and pricing.

There are several challenges to Big Data analytics that make arbitrage opportunities hard to pursue. Predictive modeling, optimization and other analytic applications are much more processor intensive than the SQL queries used in standard business intelligence applications. When complex algorithms and gargantuan databases converge with real-time business demands, something usually has to give.

Many companies find they are unable to fully exploit their growing data holdings, and have to make do with sampling or high-level summaries rather than the complete, granular data they often want to examine. But using partial or high-level data can be dangerous; even the most powerful algorithms can suggest spurious or meaningless conclusions when they are applied to insufficient data. Companies may also lose hours offloading data from the data warehouse to an external cluster of processors to run the analysis. With all these approaches, the result is an incomplete solution that provides just a hint of the possibilities of analytics, because that’s all the current technology is capable of delivering.

Consider the problem of optimization, for example. Optimization solutions play a key role in helping companies target the right customers, make the right offers, determine manufacturing volumes or accurately price products to take full advantage of market conditions while minimizing expenses. Depending on the problem being addressed, an accurate optimization solution needs to account for many variables and constraints such as products, branches, budget, time, contact channels, offer history, market segmentation and privacy preferences, to name a few.

Due to the multiple permutations and combinations among the different elements, even a simplified optimization model limited to only a month of data, a thousand customers and ten different offers results in an astronomical solution search space. If each customer-offer pairing is treated as a yes/no decision, that is 10,000 binary variables and a search space of 2 to the power of 10,000, or roughly 10^3010. Just to put things in perspective, the number of atoms in the observable universe is about 10^81, a figure the search space already passes with fewer than 300 such variables.
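The size of that search space is easy to check with a few lines of Python. This is only a sketch: it assumes every customer-offer pairing is an independent yes/no decision, which is one reading of the simplified model described above, and the variable names are purely illustrative.

```python
# Assumption: each (customer, offer) pairing is an independent yes/no decision.
customers = 1_000
offers = 10
binary_variables = customers * offers        # 10,000 yes/no choices

# Python integers are arbitrary precision, so the count can be computed exactly.
search_space = 2 ** binary_variables
digits = len(str(search_space))              # number of decimal digits

# Atom-count estimate quoted in the post.
atoms_estimate = 10 ** 81

# For a positive integer N, N.bit_length() is the smallest n with 2**n > N,
# i.e. the number of yes/no variables needed to exceed the atom estimate.
variables_to_exceed_atoms = atoms_estimate.bit_length()

print(f"{binary_variables} binary variables")
print(f"search space has {digits} decimal digits")
print(f"2**n exceeds 10**81 once n reaches {variables_to_exceed_atoms}")
```

Under those assumptions the search space dwarfs the atom count long before the model reaches its full 10,000 variables, which is why exhaustive search is off the table and the approximation techniques mentioned in the post come into play.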

The "Big Math" at the heart of this kind of analysis pushes most processing technology to its limits and beyond. As the number of variables and constraints increases linearly, the search space grows exponentially, and many of these problems fall into the NP-complete or NP-hard complexity classes. As a result, companies are forced to compromise on the thoroughness of the analysis and/or the response time they are willing to tolerate. Most optimization efforts look at small snapshots of the total data available (for example, only the last month's data), and make use of a range of techniques such as Linear, Dynamic and Integer Programming, Lagrange Multipliers and Cluster Analysis that reduce the level of complexity in various ways, all in an attempt to reach an actionable result in a realistic timeframe. But even with these approaches, companies are faced with costly infrastructure requirements, incomplete views of their data and lengthy response times resulting in stale data or missed arbitrage opportunities.

But what if you could bypass the existing performance limitations and get crucial intelligence much faster than before? For example, what if a database marketing company could use complex algorithms to get accurate optimization results days before the market could adjust? Or a retail franchise could precisely adjust the prices of thousands of products daily for each of its stores? Or a credit card company could run customer scoring algorithms one hundred times faster than its competitors? Or a financial services firm could run real-time Monte Carlo simulations on terabytes of data to manage risk? What impact could advantages like these have on a business? It's fair to say the difference would be game-changing, providing a major competitive advantage and the ability to enter new markets previously out of reach.
These capabilities are not just marketing fantasies or future visions - they’re in use today.

Making these Information Arbitrage opportunities possible is precisely what Netezza does. Our streaming analytic appliances are built for running complex mathematical models on huge data sets, with results in a fraction of the time required by other technologies. Sophisticated analytic applications run "on stream" in the data warehouse, against all the records and detail that need to be examined. There’s no need to settle for summary data or aggregations, or ship data to another system for analysis. (We’re also constantly making our appliances better. Our recent doubling of performance is just the latest Netezza breakthrough.)

Through the Netezza Developer Network, we’re helping developers worldwide use the Netezza architecture to create a new generation of analytic applications that were previously impractical, unaffordable or simply impossible. When exploiting an arbitrage opportunity means leveraging Big Data and Big Math, Netezza’s streaming architecture is simply inherently faster and more efficient than other technologies. Of course, our customers already know this - and with appliance simplicity and low purchase price, information arbitrage pays off even more.

The bottom line is: when Big Data meets Big Math, great things become possible for our customers and their businesses, enabling them to:


  • Use Information Arbitrage to take advantage of time-sensitive opportunities
  • Rapidly run multiple scenarios and sensitivity analyses in near real-time
  • Make use of all the available data, all the time, while their competitors are still struggling with reduced visibility from sampled or aggregated data

When the first Netezza appliances burst on the scene in 2002, their ability to query giant databases with unprecedented speed upset a lot of preconceived notions about the limitations of technology and what companies can do with their data. Advanced analytic applications take processing complexity to a much more challenging level, and once again the capabilities of our appliances are revolutionizing the market and capturing the imagination of our customers.

Jit Saxena


"Fig Newton: The force required to accelerate a fig 39.37 inches per sec."

  • from a "Wiley's Dictionary" definition appearing in a comic strip by Johnny Hart (1931-2007), cartoonist & creator of both B.C. and The Wizard of Id

In the news today: FAST Engines

In case you missed it, today Netezza has both a new press release and a brief White Paper up on the topic of our FPGA-Accelerated Streaming Technology (FAST) Engines™ framework, a key enabler of the high performance of the NPS® appliance. What we've done is provide a little more public insight into the inner workings of the NPS system and just how it is able to provide the industry-leading price/performance that it does.

We're very bullish on the extensibility of the NPS system architecture, and in particular, the use of FPGA technology and the extensibility of the FAST Engine framework into the future.

FAST Engines (IMO, a particularly appropriate and descriptive geek-technology acronym) already help deliver the "performance multiplier" for the NPS system that we've discussed previously by removing unnecessary records and columns from a given stream of data before the system has to expend even a single CPU clock cycle or byte of memory worrying about them.

http://i111.photobucket.com/albums/n148/nzfrisco/Miscellaneous%20Figures/FASTEngines.png

As you can see in the block diagram above, the five engines currently in the framework are the Control, Parse, Visibility, Project and Restrict Engines. Since they're described fairly well in the White Paper, I won't go into detail here. But I will repeat some of the critical characteristics of the FAST Engines. They are:

  • basic analytic functions electronically programmed into the FPGA to accelerate query performance;
  • dynamically reconfigurable - each of them can be modified, disabled or extended by the NPS system in real time; and
  • customized at run-time for each snippet executed in the SPU - each engine can incorporate parameters passed to it to optimize the behavior of the FPGA for a particular query snippet.

From the above, what you should take away is that the hardware on each of the NPS system's hundreds of intelligent storage nodes, known affectionately as SPUs (pronounced: "SPOOz"), for Snippet Processing Units, is not just "optimally customized" for each query. Instead, as manifest in the FAST Engines, the SPUs' hardware configurations are optimally customized for each sub-step of each query, in real-time, allowing the system to maximize the streaming flow of data.

In parallel within the FPGA, these engines eliminate records outside of the ACID-compliant purview of a given query, project away columns that the SQL statement does not need, and restrict away rows that don't satisfy the statement's WHERE predicate. All of this is done at the speed with which data is being read (or "streamed") off the disk drive on each intelligent storage node in the Netezza system, and replicated in parallel across hundreds of those nodes.

As a result, the remaining data stream for on-going query processing is typically reduced by 95% or more before it needs to be interrogated any further by the CPU on our intelligent storage nodes, or moved from one node to another. That translates directly into performance acceleration.
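For readers who think in code, here is a rough software analogy of the visibility, restrict and project steps described above, written as chained Python generators. It is only a sketch: the real engines are FPGA logic, the White Paper is the authority on how they work, and everything here (the function names, the transaction-id visibility rule, the contrived sample rows) is a hypothetical stand-in chosen for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def visibility(rows: Iterable[Row], snapshot_txid: int) -> Iterator[Row]:
    """Drop rows the query should not see (hypothetical transaction-id rule)."""
    for row in rows:
        created, deleted = row["created_txid"], row["deleted_txid"]
        if created <= snapshot_txid and (deleted is None or deleted > snapshot_txid):
            yield row


def restrict(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Keep only rows that satisfy the WHERE-style predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keep only the columns the query actually needs."""
    for row in rows:
        yield {name: row[name] for name in columns}


def sample_rows(count: int) -> Iterator[Row]:
    """Contrived table: one row in twenty belongs to region EAST."""
    for i in range(count):
        yield {
            "id": i,
            "region": "EAST" if i % 20 == 0 else "WEST",
            "amount": i * 1.5,
            "notes": "x" * 50,          # wide column the query never touches
            "created_txid": i,
            "deleted_txid": None,
        }


def scan(count: int = 1000, snapshot_txid: int = 899) -> List[Row]:
    # Each stage consumes the previous one lazily, one row at a time,
    # so nothing is materialized until the final list() call.
    stream: Iterable[Row] = sample_rows(count)
    stream = visibility(stream, snapshot_txid)
    stream = restrict(stream, lambda r: r["region"] == "EAST")
    stream = project(stream, ["id", "amount"])
    return list(stream)


result = scan()
print(f"{len(result)} of 1000 rows survive, each with columns {sorted(result[0])}")
```

The sketch filters before it projects simply so the predicate can still see the `region` column; the post describes the hardware engines as working in parallel, which a sequential Python pipeline cannot capture. The point is only that rows and columns are discarded while the data is still streaming, before any downstream stage has to touch them.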

Want to rev up your FAST Engines? Install a turbocharger!

So where do we take this next? Well, for starters, Netezza will essentially be providing a "turbocharger" for our FAST Engines framework.

What do I mean by that? Perhaps this quote will help:

http://i111.photobucket.com/albums/n148/nzfrisco/Miscellaneous%20Figures/turbocharger.gif

"The turbofan compresses the air fuel mixture so more molecules are squeezed into the cylinder. When the mixture is ignited, more energy is released. Thus, a turbocharged engine will provide more shaft work out than a naturally aspirated engine of the same size.

<...snip>

"The advantage of a turbocharged engine is that about 35% more work can be done by a turbocharged engine as compared to a naturally aspirated engine of the same size."

--from a primer on Natural Gas Engines.

There's only one thing wrong with the above quote. The new addition to the NPS system's FAST Engines framework doesn't just boost performance by 35%; it could boost streaming query performance by as much as 100-200%! Because that's the potential upside performance customers are going to see with the new Compress Engine that is being added to the FPGAs.


Rather than the cumbersome, compute-intensive compression efforts employed by other vendors to reduce disk usage that also result in reduced performance, the Compress Engine boosts performance by decompressing data inside the FPGA as fast as it streams from disk.


As data is written to disk (e.g., during data load, insert or update operations) it is compressed into a compiled format, column by column, with the original data replaced by the Compress Engine "instruction set" for decompilation. Then, when data is read from the disk, the Compress Engine reads its instruction set and reassembles the original data as it streams from the disk, effectively raising the streaming data rate by as much as 200% - lifting the effective scanning rate per SPU node from over 60 MB/sec to approximately 200 MB/sec. With 108 active SPUs doing this in parallel in each rack of the NPS system, that's the equivalent of a persistent (i.e., not 'burst') scan speed of about 70 TB/hour per rack, or well over 500 TB/hour for today's largest NPS system configuration, the 8-rack NPS 10800.
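The per-rack and 8-rack figures follow from straightforward multiplication, which the short Python sketch below reproduces. The post does not say whether its megabytes and terabytes are decimal or binary, so the sketch assumes decimal megabytes (10^6 bytes) on the way in and reports both decimal terabytes and binary tebibytes on the way out; the function and variable names are illustrative only.

```python
SECONDS_PER_HOUR = 3600
BYTES_PER_MB = 1e6          # assumption: decimal megabytes


def rack_scan_bytes_per_hour(spus: int, mb_per_sec_per_spu: float) -> float:
    """Sustained scan volume for one rack, in bytes per hour."""
    return spus * mb_per_sec_per_spu * BYTES_PER_MB * SECONDS_PER_HOUR


# Figures quoted in the post: 108 active SPUs per rack, ~200 MB/sec each, 8 racks.
per_rack_bytes = rack_scan_bytes_per_hour(spus=108, mb_per_sec_per_spu=200)

per_rack_tb = per_rack_bytes / 1e12        # decimal terabytes per hour
per_rack_tib = per_rack_bytes / 2 ** 40    # binary tebibytes per hour
eight_rack_tb = 8 * per_rack_tb
eight_rack_tib = 8 * per_rack_tib

print(f"per rack: {per_rack_tb:.2f} TB/hour ({per_rack_tib:.2f} TiB/hour)")
print(f"8 racks:  {eight_rack_tb:.2f} TB/hour ({eight_rack_tib:.2f} TiB/hour)")
```

Either unit convention lands in the neighborhood of the "about 70 TB/hour per rack" and "well over 500 TB/hour" quoted above.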

And that's not all, folks!


The FAST Engines framework is extensible into the future - and we're already hard at work looking into things that will rev up performance even further, extend the applications set of the NPS appliance more broadly or both. Again, the White Paper sets out what some of these are in fairly clear language so I don't need to repeat it here.


Wherever the evolution of the NPS appliance takes us, we're very bullish on the notion that the performance acceleration and potential to extend the application space that FPGA provides will give Netezza that much more headroom in maintaining its leadership position in the market.


http://www.netezzacommunity.com/servlet/JiveServlet/downloadImage/38-1034-1036/SC1.jpg

November X, 2007
Issue 14: Supercomputing Conference Brings Life to Reno

"Computing is not about computers anymore. It is about living."

- Nicolas Negroponte

Negroponte's words certainly rang true, in more ways than one, at the recent SC07 conference in Reno. In a city known for little other than its lackluster "old Vegas" character, its proximity to Tahoe and the Reno 911! Comedy Central series (which is actually filmed in southern California), the Supercomputing conference breathed new life into the city from November 12-15, while providing extensive displays of technologies that are changing many facets of life as we know it.

Moments after stepping off the plane, the fluorescent "Welcome to SC07" signs that followed me from the Reno airport all the way to my hotel made it clear that this conference was going to be a pretty big deal. My intuitions were confirmed upon entering the Reno-Sparks convention center, with an impressive array of meeting space, food stations, souvenir "shops" and an exhibit hall encompassing 200+ rows of endless exhibitor booths. The annual Supercomputing conference does a great job of bringing together growing numbers of cutting edge technologies, commercial enterprises, government organizations, national labs and graduate students, year after year. With its extensive technical program consisting of several awards, Birds-of-a-Feather sessions, "challenges" and more, along with a plethora of industry and research exhibits and sessions, the show continuously delivers something of value to everyone who attends. The result: increased momentum and record-breaking numbers of both attendees and industry exhibitors at SC07. The collaboration of masterminds from all different backgrounds, locations, age groups and organizations produces an amazing fusion of ideas, solutions, partnerships and technologies.

Netezza participated in the conference with booth space in addition to a session in the SC07 exhibitor forum, featuring our very own Justin Lindsey along with John Johnson from Lawrence Livermore National Lab (better known as LLNL). Despite the modest space Netezza occupied with our 10x20' booth in the massive exhibit hall, our presence felt considerably larger as we attracted a continuous stream (no pun intended) of visitors throughout the week, including those who had never heard of Netezza along with those who'd set out on a resolute mission to find our booth.

Our bright purple SPUBox, featuring an animated motorcyclist and "Netezza Speed" written in graffiti, was by far the biggest draw to the booth. It was exciting to see so many passers-by stop for a closer look at the SPUBox, often asking a question or two about the box, how they might get their hands on one, and what exactly the Netezza Developer Network is all about. Several visitors applied to join the NDN right then and there, hoping for a chance to win a SPUBox at the end of Justin and John's exhibitor forum session. All the NDN buzz created quite a "Netezza high," if I do say so myself.

http://www.netezzacommunity.com/servlet/JiveServlet/downloadImage/38-1034-1041/Picture1.jpg



Issue 14: And then there were none...

by Vishal Daga - Netezza, Director of Partner Marketing

"Danger and delight grow on one stalk." -- English Proverb

With the acquisitions of Hyperion, Business Objects and Cognos this year, the BI landscape has finally taken a turn that many have predicted for some time now. Given the growing prevalence and importance of BI, and the consolidating software ecosystem, this has not been a stretch prediction to make, by any means. The question is, what now? Is this a good thing or bad thing for the BI user? The answer, as with most things in life, is a bit of both in my view, and only time will really tell.

First, the potential for good. The tighter integration possibilities that this consolidation creates between the BI apps and the other components of their parent company's product portfolios -- including ERP and data integration applications, middleware technologies, and/or databases -- could ultimately result in a much richer and overall seamless experience for the BI user. For example, this shift has the potential to catalyze the adoption of BI capabilities within ERP applications, and accelerate the arrival of an operational BI experience, i.e., a BI world that does not involve a user switching to a different application and/or a reliance on power-users. Furthermore, the resources that the larger organizations can bring to bear can help advance product capabilities and customer BI adoption at faster rates than what the relatively smaller companies could have supported on their own.

Now, the potentially not so good. With size, comes, well, size, and that many times can be not such a good thing. The distractions, conflicting priorities and layers of bureaucracy that come along with size will make it harder for the BI businesses to be managed effectively inside of the larger organizations. It's not a coincidence that innovation and product adaptation to changing market needs are usually driven by smaller, more focused companies that are highly motivated to be entrepreneurial. In addition, the biases that the new organizations will create towards preferential integration with their own portfolio technologies can dilute the independence that most customers need and demand. If this happens, then customers will face hurdles in deploying best-of-breed technologies that are best suited to their needs.

Looking forward, whether the positives end up outweighing the negatives is something that remains to be seen. It's a promising sign that many of the acquirers have committed to keeping the BI companies as independent operating entities. In large part, how things ultimately net out will depend on the discipline that the acquirers demonstrate in adhering to this strategy. The good news is that while it's still very early in the game, based on our relationships and interactions with the BI players since the announcements, there are reasons to be optimistic. Perhaps you just might be able to have your cake and eat it too.

Vishal Daga



"What Netezza is doing is... going a step further: score the data as it is streamed into the appliance and before it even hits the database... However, it is not just the performance gain that is significant. This initiative means that developers are embedding analytic software into the Netezza Data Warehouse Appliance so that it becomes, in effect, an application appliance."

Philip Howard, Director of Research, Technology, Bloor Research - from his 5th October posting, "The Netezza Developer Network"

Pardon the title's riff on the late-1970s Elvis Costello hit song What's So Funny 'Bout Peace, Love and Understanding, but a recent mini-dustup got me to thinking about providing a bit of insight into why Netezza's approach to "Streaming Analytic™ Appliances" is different from others' entries in the market. It seems the recasting of Netezza's mission in terms of streaming analytics rather than the more-limiting data warehouse appliances, along with the launch of the Netezza Developer Network (NDN), has caused something of a hullabaloo among some of our competitors (refer to recent stories from Teradata/SAS, Greenplum,