
Gather 'round the Grill


Manhattan Skylines

Posted by David Birmingham Mar 4, 2010

Marcus Gray watched in consternation as the viral program cranked up. He knew that in moments the band of hackers would once again take over the Manhattan power grid. For now, they were doing it as a prank. But he also realized it could be a test run for something even bigger. Like a grid-by-grid shutdown of the entire system, opening the door for untold mayhem on the darkened streets.

 

Moments later, messages from the hacker gang started appearing on all their terminals. Taunting barbs letting everyone know that they were in complete control and nobody could stop them. Gray shook his head and closed his eyes, hoping that this would pass quickly. Losing power even in one part of the grid could spell pandemonium and place lives and fortunes at risk. The weight on his shoulders was crushing.

 

"I think I can help," said a voice from behind. Lane McBride, from the Federal Counter-Terrorism Unit based in Manhattan, leaned over to regard Gray's terminal.

 

Gray turned to the voice, recognizing it with hope in his eyes, and said, "They're at it again."

 

"I saw the precursors," McBride noted, "that they were entering the system."

 

"Yeah, but it doesn't matter if we can't find exactly where they are," Gray sighed, shaking his head, "They're in a hundred different buildings, including the Empire State. You guys have agents standing by at all of them, but they have to search the buildings floor-by-floor to find them. The problem is, we have to shut down communications for the building so that they can't warn each other. So even if we could catch a few, do you have any idea how long a floor-to-floor search takes in the Empire State? We can't keep that building offline from communication for that long."

 

"Not to worry," McBride grinned, "I have an algorithm that will directly pinpoint their floors. All we have to do is send our officers up to the floor, and I bet we can round them up in minutes."

 

"Wow," Gray whistled, "I'd like to see that."

 

McBride whipped out a flash stick, plugged it in and let the program do its work. Within seconds, it had pinpointed each hacker, the building their signal was coming from and the floor of the building. "Here we go."

 

"I like it," Gray grinned.

 

McBride touched several buttons on his phone to dispatch the information, then monitored as each of the officers acknowledged it and the plan. "We'll know soon enough."

 

Gray noted, "The problem has always been that they could hear us coming and could shift floors anytime they wanted."

 

"Not this time," McBride smirked, "At least, not if we do it right."

 

The first officer to report back was from the Empire State. Two of the hackers had been stationed there on separate floors. Both were now in custody and unable to warn their cohorts in the other buildings. Gray listened in awe as one by one, the officers reported in, having captured their respective quarries with minimal effort.

 

"That was brilliant," Gray stared at the screen as the weight seemed to lift from his shoulders, "How did you come up with the algorithm?"

 

"Simple process of elimination. I just looked at the problem as a very-large-scale search. The most important information is where the perps aren't, not where they are. The algorithm zeroes in on the candidate floor by understanding which floors are not candidates. Process of elimination leads the way. So we can search the Empire State and Chrysler buildings just as quickly as a single-story building: capture the floor number and we're done."


---------------------

Some of you already see the parallels. It's how a zone map works. But how does it apply?

 

When we look at the Record Distribution option in the Netezza Admin GUI, we're usually content with a slightly "ragged edge" across the SPUs, and a "flat top" is the ticket. But what about a "Manhattan Skyline", with high peaks and low valleys? This is higher-than-normal skew (something we're supposed to avoid, right?), and people who see it tend to shun it. However, these skylines are often the natural result of an intermediate table produced by an ELT operation, or of multi-pass queries in a BI tool. Both usually lean on the mainstay workhorse CTAS (Create-Table-As-Select), so in many cases people are tempted to turn on "random" for all CTAS operations. Or perhaps one of our regular static supporting tables is deliberately distributed as a Manhattan Skyline, simply because we want to regularly perform co-located joins between it and a larger master table using the same distribution key.

 

In any case, a primary reason we get this kind of Manhattan Skyline distribution is that we are trying to preserve an existing distribution in order to perform a follow-on operation with tables on the same distribution. Whew! So why would we allow this to continue? Isn't a random distribution better than a Manhattan Skyline? Our problem remains: if the table has such a distribution, we have higher-than-normal skew. Any full scan of the table will run only as fast as the "tallest bar" (the SPU holding too much of the table's data) allows. As the table grows, the problem worsens. It is not a scalable distribution in its latent form, so don't embrace one without a plan.
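To see why the tallest bar sets the pace, here is a toy model in Python. It is a sketch under stated assumptions, not anything Netezza ships. It assumes each SPU scans its own slice in parallel, so a full scan finishes when the busiest SPU finishes. The `crc32` placement stands in for the appliance's real hash, and the SPU count and the `store-6` hot key are hypothetical.

```python
import zlib
from collections import Counter


def spu_for(key, num_spus):
    # Toy stand-in for hash distribution; NOT Netezza's actual hash function.
    return zlib.crc32(str(key).encode()) % num_spus


def rows_per_spu(keys, num_spus):
    # Height of each bar in a Record Distribution-style chart.
    counts = Counter(spu_for(k, num_spus) for k in keys)
    return [counts.get(spu, 0) for spu in range(num_spus)]


def full_scan_cost(per_spu):
    # SPUs scan their slices in parallel, so a full scan finishes
    # only when the busiest SPU (the tallest bar) finishes.
    return max(per_spu)


def skew_ratio(per_spu):
    # 1.0 is a perfectly flat top; larger values mean taller peaks.
    return max(per_spu) / (sum(per_spu) / len(per_spu))


NUM_SPUS = 8  # hypothetical SPU count
flat = rows_per_spu(range(8000), NUM_SPUS)
skyline = rows_per_spu(["store-6"] * 7000 + [f"store-{i}" for i in range(1000)], NUM_SPUS)
print(full_scan_cost(flat), full_scan_cost(skyline))
print(round(skew_ratio(skyline), 1))
```

Both tables hold 8,000 rows. In the skyline, 7,000 of them pile onto one key, so its full-scan cost is at least 7,000 rows. The flat top costs roughly 1,000.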

 

Well, random distribution carries a risk too, especially at the BI level: it can hurt concurrency. Even if our individual queries are not hindered by the data broadcast that random distribution incurs, they could be one-hit wonders, because running many of these operations side by side can saturate the inter-SPU fabric. If we can keep the processing on the SPUs, we can avoid this problem entirely. So the issue is one of user scalability, something all of us care about and that other vendors (sometimes) turn a blind eye to. Netezza has it covered, and as usual, it's so simple a caveman could do it (now I'll get mail!).

 

So now we have two options, neither of which seems good: (a) keep the Manhattan Skyline distribution or (b) use a random one. Let me say that random is not always bad, but it poses a potential danger for concurrency. Likewise, the Manhattan Skyline is often a latent result of an intermediate CTAS, so it is unavoidable anyhow. And why would we want to preserve an existing distribution on a CTAS? The answer: because it will be a co-located write (blazingly fast). But wait! Don't we get a co-located write by default?

 

Maybe.

 

I have noted in prior posts how the default distribution for a CTAS might not be what we want or expect, so here's a quick recap:

 

(a) For a simple single-table CTAS, it preserves the source distribution key (co-located write).

(b) For a simple multi-table-join CTAS, it uses the first column in the select clause (maybe a co-located write).

(c) For a CTAS using summaries/group functions in the select, it uses the columns in the group-by clause (rarely a co-located write).
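The three rules above can be written down as a rough mental model. This Python sketch encodes only this recap. The real optimizer may decide differently, so check the resulting table's distribution after creating it. The `sales`/`customers` tables and their columns are hypothetical.

```python
def default_ctas_distribution(source_tables, select_columns,
                              group_by=None, source_dist_key=None):
    """Rough model of the CTAS default-distribution recap (a), (b), (c).

    A mental model only; the actual optimizer may choose differently.
    """
    if group_by:                      # (c) summaries: group-by columns
        return list(group_by)
    if len(source_tables) == 1:       # (a) single table: keep the source key
        return list(source_dist_key)
    return [select_columns[0]]        # (b) multi-table join: first select column


def is_colocated_write(result_key, source_dist_key):
    # Data stays on its SPU only if the result keeps the source's distribution key.
    return list(result_key) == list(source_dist_key)


# Hypothetical source: table "sales" distributed on (customer_id).
src_key = ["customer_id"]
simple = default_ctas_distribution(["sales"], ["customer_id", "amount"],
                                   source_dist_key=src_key)
joined = default_ctas_distribution(["sales", "customers"],
                                   ["region", "customer_id", "amount"])
summary = default_ctas_distribution(["sales"], ["region", "total"],
                                    group_by=["region"], source_dist_key=src_key)
for key in (simple, joined, summary):
    print(key, is_colocated_write(key, src_key))
# ['customer_id'] True
# ['region'] False
# ['region'] False
```

The join case shows the "maybe" in rule (b). If `customer_id` had happened to be the first selected column, the write would have stayed co-located.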

 

If any of the above differs from the original distribution of the source(s), we could inadvertently sacrifice our co-located write. But we can preserve it if we explicitly use "distribute on" when executing the CTAS. With a co-located write, the data never leaves its SPU. If we distribute the CTAS on anything else, the data must leave its current SPU and find its way to another one. This initiates a data broadcast (and can negatively affect concurrency). By preserving the distribution, we get the benefit of a co-located write (avoiding the broadcast to make the table) and set up the next operation for a co-located read (also avoiding the broadcast to leverage the table). Short answer: preserving the distribution preserves concurrency performance. Now the SPUs are working for us at physics-speed.
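Here is a toy sketch of what "the data never leaves the SPUs" means. It counts how many rows a CTAS would ship to a different SPU when it keeps the source key, compared with re-keying on another column. The `crc32`-based placement, the SPU count and the rows are assumptions for illustration only.

```python
import zlib


def spu_for(values, num_spus):
    # Toy hash placement on the distribution-key values; NOT Netezza's real hash.
    return zlib.crc32(repr(values).encode()) % num_spus


def rows_moved(rows, source_key, target_key, num_spus=8):
    """Count rows a CTAS would have to ship to a different SPU."""
    moved = 0
    for row in rows:
        here = spu_for(tuple(row[c] for c in source_key), num_spus)
        there = spu_for(tuple(row[c] for c in target_key), num_spus)
        if here != there:
            moved += 1
    return moved


# Hypothetical source rows, distributed on (store_id).
rows = [{"txn_id": t, "store_id": t % 10, "amt": t * 3} for t in range(1000)]
print(rows_moved(rows, ["store_id"], ["store_id"]))  # 0: co-located write
print(rows_moved(rows, ["store_id"], ["txn_id"]))   # typically most rows cross the fabric
```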

 

Rather than just live with the latent effects, let's embrace and harness them for the good of all mankind. Well, er, at least for our user base.

 

What we really want is threefold -

 

(1) preserve the distribution with a co-located write (preserve concurrency, potential Manhattan Skyline as latent artifact)
(2) leverage the result with a co-located read (preserve concurrency, potential penalty from Manhattan Skyline)
(3) mitigate the Manhattan Skyline with a zone map (ahh, best of all worlds)

 

So to get the first two, we can simply preserve the distribution with a "distribute on (key)" clause and make sure the distribution key is part of the "where/join" operations. This is the simple part.

 

To get the third, we need to either (a) sort the data as it is created, or (b) make a materialized view after the fact to get the zone map effect for selected columns. The first one (sorting) is often easier than it sounds, and with strongly filtered intermediate tables it is also very scalable. The second one (materialized view) has some caveats but is very fast to create. What does the zone map actually do? It records the range of values held in each section of a SPU's portion of the table, so that only the sections whose range can satisfy the filter are actually read. Like McBride's algorithm, it's as though the rest of the data isn't even there, because the zone map lets the system skip it entirely. So whether a SPU's data is a tall bar or a short bar, the performance is about the same. We need all three of the above, and the zone map mitigates the potential problem of unexpectedly high skew from an intermediate distribution, or from an outlier table that we need to distribute on a common key. Even if (1) and (2) above give us a good distribution today, it could always "go Manhattan" in the future.
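The process of elimination that a zone map performs can be sketched on one SPU's slice. This Python model hypothetically assumes 100 rows per extent and uniformly random values. It counts how many extents a range filter must touch before and after sorting the same data.

```python
import random

EXTENT_ROWS = 100  # hypothetical rows per storage extent


def build_zone_map(values, extent_rows=EXTENT_ROWS):
    # Record (min, max) for each extent, as a zone map does.
    return [(min(values[i:i + extent_rows]), max(values[i:i + extent_rows]))
            for i in range(0, len(values), extent_rows)]


def extents_to_scan(zone_map, lo, hi):
    # Process of elimination: an extent is read only if its [min, max]
    # range could contain a value in [lo, hi]; every other extent is skipped.
    return sum(1 for zmin, zmax in zone_map if zmax >= lo and zmin <= hi)


rng = random.Random(42)
values = [rng.randrange(1000) for _ in range(10_000)]  # one SPU's slice

unsorted_scan = extents_to_scan(build_zone_map(values), 500, 509)
sorted_scan = extents_to_scan(build_zone_map(sorted(values)), 500, 509)
print(unsorted_scan, sorted_scan)  # nearly all 100 extents vs. only a few
```

Sorting does not change how many rows the SPU holds. It only changes how tightly each extent's min/max brackets its values, which is why a tall bar and a short bar end up reading about the same handful of extents.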

 

Another obvious question is "If this is an intermediate result, why bother? Just filter out the stuff I don't want and then there's no issue, right?" Well, technically yes, for a single operation, but I know of at least a dozen cases where the intermediate table is used for a lot of downstream activity, not just a one-off throwaway. So our stewardship rule is: make the data better. For the next downstream process or the ultimate data consumer, the data should get better every time we touch it.

 

Rather than rewrite or redesign a carefully tested and detailed process, adding a simple "order by" or MV is easy and preserves the existing logic and data model, with little impact and high return. This is especially true of a static supporting table, because we can install what we need when the table is created. The consuming processes all benefit from it with no more than regular query execution (materialized views are transparent).

 

In the end, we can still leverage the plain-vanilla parts of the Netezza performance model (zone maps, co-location) without having to over-engineer the data with indexes, intersection tables or summaries. This preserves something more: the ongoing resilience and adaptability of the model itself.

 

Recap:

 

  • Apply the "distribute on" clause of the CTAS to avoid the latent effect of default distribution.
  • Preserve co-location for reads and writes in intermediate tables.
  • If a potential Manhattan Skyline distribution is the CTAS result, rather than go random, sort the CTAS result by a selected column or use a materialized view.
  • As always, apply strong filters to the CTAS creation so that it's not simply copying one table's contents to another (carve the data out).
  • Experiment for the best fit, but remember that Netezza is an appliance.
  • We don't need to engineer the queries, only apply simple performance model alignments in the data itself, to leverage the machine's physics
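The first and third recap items can be combined in a small, hypothetical Python helper that composes the statements as text. The clause placement (`ORDER BY` inside the select, `DISTRIBUTE ON` at the end) reflects my understanding of Netezza's CREATE TABLE AS form, so verify it against your release's documentation. The table and column names are made up. The helper does no quoting, so don't feed it user input.

```python
def build_ctas(table, select_sql, distribute_on, order_by=None):
    """Compose a CTAS that names its distribution instead of relying on defaults."""
    if not distribute_on:
        raise ValueError("name the distribution key explicitly")
    sql = f"CREATE TABLE {table} AS {select_sql}"
    if order_by:  # sort as we create, so zone maps on these columns are tight
        sql += " ORDER BY " + ", ".join(order_by)
    return sql + " DISTRIBUTE ON (" + ", ".join(distribute_on) + ");"


def build_sorted_mv(view, table, columns, order_by):
    """Compose a sorted materialized view over a single base table."""
    return (f"CREATE MATERIALIZED VIEW {view} AS SELECT {', '.join(columns)} "
            f"FROM {table} ORDER BY {', '.join(order_by)};")


print(build_ctas("t_store6",
                 "SELECT store_id, txn_date, amt FROM sales WHERE store_id = 6",
                 distribute_on=["store_id"], order_by=["txn_date"]))
print(build_sorted_mv("mv_sales_by_date", "sales",
                      ["store_id", "txn_date", "amt"], ["txn_date"]))
```

Refusing an empty `distribute_on` is a deliberate choice. It makes "apply the distribute on clause" the path of least resistance rather than an afterthought.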

"You didn't kill it," fumed the customer, "You said you would kill it."

 

"We've had some, er, labor setbacks," said Bjorn, head of DragonSlayers Inc, a startup boutique firm from several valleys away.

 

"I don't see an excuse clause in the contract," the customer shot back, "Kill the dragon or we're done."

 

"The dragon can't be killed," said a rich Scottish voice striding up to meet them.

 

Bjorn recognized the stealthy character, a man by the name of Connery, from the Information Superhighway Roadside Assistance Service.

 

"I didn't realize that RAS was in the area," Bjorn quipped, offering a hand to Connery.

 

Connery grasped Bjorn's hand and shook it once, "We're all over. Been doing a little cleanup of this or that."

 

"What's this about the dragon," asked the customer, "That it can't be killed?"

 

"Of course not," Connery smiled, "It's a dragon. It's immortal."

 

"Did you know this?" the customer glared at Bjorn, "Have you been stringing us along?"

 

"No," Bjorn defended, "We kill dragons. It's what we do."

 

"Well," Connery chuckled, "Not real dragons, anyhow."

 

The customer's lackey approached them with a small flagon of tea, poured a stein for each of them, and departed.

 

"The dragon is immortal," Connery muttered, sipping his tea.

 

"That's impossible," Bjorn said through a long gasp, "We've killed dragons before. We..."

 

"But of course you have," Connery smiled dismissively, drawing another casual sip.

 

Bjorn stared at him, unable to form another word.

 

"If the dragon can't be killed," asked the customer, "Then what?"

 

"In the nether worlds, beyond the mapped regions, you'll see the little notation 'There Be Dragons,'" Connery said softly, "And whether there be dragons or not, it's uncharted territory. Places no man has ventured, but rest assured danger lurks, unknown to the uninitiated."

 

"So you know what lies in the uncharted territories?" Bjorn sneered.

 

"It's why I'm a guide and you're a dragonslayer," Connery huffed, "Whether you know your way or not, dragon chow comes in many shapes and sizes." He put his hands up as if to size up Bjorn. "Many shapes and sizes."


"Funny," Bjorn quipped, but it clearly wasn't funny, "All we have to do is get close enough."


"Reminds me of a time," Connery said wistfully, "Once I knew a man who you could skewer a hundred times and he'd still get right back up."

 

"Ahh, the Highlander," said Bjorn, "I've heard of him."

 

"Well, he never lost his head," Connery huffed, "Or that would've been the end of him."

 

"What are you saying?"

 

"The treacheries of the lands beyond are many. You have to keep your wits about you. Keep your head."

 

"Keep my head, got it," Bjorn said sarcastically, "Anything else?"

 

"You need to deal with the whole dragon," Connery advised, "Not just the part you wrap with that silly leash. It won't hold the dragon. Only a dungeon will."

 

"So we need an enchanter?" Bjorn smirked.

 

"In no uncertain terms," Connery said, laughing, "You have a go at that dragon on your own. Go in there with no more than an enchanter's bag of tricks, and he'll make an ash out of you!"

 

Bjorn gulped, "We'll see about that!"

 

One of the lackeys turned to the other and chortled, "He thinks he's James Bond!"

 

"What do you know about it?" Connery shot back with piercing eyes, "The dragon sends your consultants to the street and you send the dragon to the morgue. Is that how it's done in data warehousing?"


"Basically, yes," snickered a lackey.


Connery whirled, "No morgue will hold him." He turned to the customer and glared hotly, "What are you prepared to do?"


"Sign the contract," said the customer, quickly applying a signature. He stuffed the papers into Connery's hands and hastily departed, leaving the men to set sail and dispatch the dragon as soon as possible.


The boat ride to the dragon's coast was uneventful until the boat ran aground near the shore, screeching loudly against the rocks as its keel protested with a deep, guttural groan.

 

"That noise will stir the dragon," Bjorn bemoaned. He'd hoped for a stealthier entrance.

 

"Hopefully only stirred," Connery quipped as he snatched up his bag, "Not shaken. Won't do to have him awake when we approach, right?"

 

"Coastline is enormous," Bjorn complained, "How will we ever pinpoint his location?"

 

"To find the dragon, you'll need to think his thoughts. Know your adversary. Know his heart."

 

"Yeah, Dragonheart," chuckled a lackey, "Seen the movie."

 

Connery ignored him and leapt from the boat onto the dry shore. "Welcome to the Rock," he said, then looked out over the vast, scorched wasteland, a product of the dragon's handiwork. He led the team up the rocky slope to the first rise, whipped out his spyglass and waved to the others to belay their ascent.

 

"What's he doing?" asked one lackey to another.

 

"Lookin' around, I guess," snickered one, "Guess nobody told him that the dragon sleeps all day."

 

"What was that?" Connery whispered loudly enough for them to hear, "You think the dragon sleeps all day? Who are you kidding? Maybe you only struggle with him in his lair at night, but he breathes fire all day long. He never sleeps. He never dies."

 

"Where did we find this kook?" asked another, "He's as nutty as a fruitcake."

 

"He'll eat you alive," Connery sneered, trying to spot motion anywhere along the landscape before proceeding. In the distance, a dank mist arose from the ground near some caves. Connery zoomed in and spied dragon scales littering the ground. "Let's go."


The team made the tedious crossing without incident, until they stood before the open, reeking maw of the dragon's lair.

 

"Who wants to go first?" Connery chuckled.

 

"I will," said a lackey fearlessly, "I've taken down enough of these."

 

"But of course you have," Connery strode to the nearest large boulder while the others scattered for cover. After several tedious minutes, all of them could now feel the impact tremors shaking the ground, growing in intensity as the beast ascended from his lair to the cave's mouth.


Then the horns appeared, fifty feet from point to point, as they slowly rose from the hole. Then the head, larger than a common city bus and almost twice as long. The dragon stared down the lackey for a long moment, then continued to ascend from the hole, growing larger and more hideous with each passing second until his entire upper body was revealed, from his head down to his midsection, standing over ten stories tall. He burst-extended his massive wing membranes with a deafening snap, and then pointed his head straight up to gather a deep breath of air.

 

Connery reached down to pick up one of the many dragon scales scattered all over the ground. Five inches across and eight inches long, made of the most impervious stuff on earth. He flipped it over and shuddered to realize the dragon's age, betrayed in the scale's growth rings. Four thousand years, this animal had been eating and breathing fire.


The young lackey had forgotten to breathe. This dragon was orders-of-magnitude larger than any dragon he'd ever dealt with. In fact, the sheer scale of the dragon made him feel light-headed. Gathering his presence of mind, he took a defiant stance and shouted, "Begone, Dragon!"

 

Connery turned away, trying to hold back a snicker that could reveal his location to the dragon's attuned senses.

 

The dragon pointed his nose straight down, cocked his head to the side, opened his mouth and released his breath. The column of high-intensity chemical fire blasted downward on the lackey, instantly reducing him to ash and causing the rocks all around where he'd stood to glow and almost melt.

 

Connery glanced over to the rest of the team, cowering behind the rocks in hiding, not believing that the dragon was so huge and powerful, and feeling completely beyond their depth. They stared, partly in awe and partly in concern, as Connery stepped out from behind his hiding place and boldly strode up to the dragon's cave.

 

The dragon once again drew breath into his nostrils to recharge his furnace, when Connery simply placed his hands behind his back and stared deeply into the dragon's eyes.

 

The dragon stared back, unable to comprehend the feeling of drowsiness suddenly overtaking him. He slowly lowered his head, then his body, down to the ground to lie gently next to Connery, unable to break his eyes away from Connery's deep, mesmerizing gaze.

 

Once completely settled, Connery reached out to tap the dragon's front jawbone as it drifted off to sleep, "There now," Connery said soothingly, "That's a good lad."

 

"How is this possible?" Bjorn gasped, stunned at how easily Connery had mastered the beast.

 

"Your friend told the dragon to leave," Connery huffed, "But the dragon isn't going anywhere. He lives here and you people don't. In fact, he's been around so long, and you people come and go so often, that he sees you as decorations, not even permanent fixtures in his home."

 

"But he just laid his head down and went to sleep," Bjorn noted, "How did you do it?"


"The dragon serves me," Connery said slowly, "Not the other way around. If the dragon needs to breathe fire, it's because we've not done a good job harnessing the dragon, not just because the dragon is mean."

 

"So dragons aren't mean?"

 

"Oh, they're born mean," Connery chuckled, "And they bite. Whom they bite, and when, is ours to control. That's why we have dungeons. Places where the dragon will survive but under our control. Think about putting that dragon's breath to work boiling water, making steam to run a turbine. Now the dragon is working for us."

 

"Can't be a happy existence for him."

 

"Happy? Perhaps not. Necessary? Most definitely. You came here to kill him or banish him. He knows his place. He only responds to someone who knows it as well as he does."

 

"You're an enchanter, aren't you?" Bjorn said, realizing Connery's identity.

 

"Some call me, Tim."


Many of those who integrate the mainstream BI tools with various underpinning data sources find subtle nuances, not the least of which is how the database will respond to the queries presented. In Netezza data access especially, the power is found not in the query but in the hardware. We can certainly degrade our experience with bad queries, but we would not tune queries the same way we would with an SMP/RDBMS.

 

For example, I've watched RDBMS engineers work black magic with a query by simply rearranging this-or-that in the monolithic query, yielding orders-of-magnitude boosts. This is because the query is being used to guide the general-purpose physics. In Netezza, however, the purpose-driven physics snips the query apart. The physics then guides the query's mechanics. I've watched newbie Netezza folks nearly pull their hair out (and their eyelashes too!) when trying to "make the machine do what I want". Hmm, no, the machine does what it does. It's an appliance. We get what we want when we conform the data to the physics. The query is just along for the ride.

 

How does all this apply to multi-pass SQL in a BI tool? Well, most BI tools come to the table with a preconceived notion that all databases are created equal. Unless they have specific VLDB hooks, and unless those hooks fully embrace VLDB principles, the BI tool will not experience the expected lift, and we'll likely have to help it out. In fact, little about a BI tool is purpose-built with regard to its data source. It treats data sources as general-purpose interfaces so it can be as vendor-neutral as possible.


Unlike in a standard star schema, many VLDB tables are fact-sized tables containing billions of rows, as are their dimensional counterparts. So a single one-shot query will sometimes provide the functional answer but with unacceptable performance. Many of us have seen multi-page (hey, 100+ page) queries that try to do everything in one shot. The average RDBMS leaves us few options. The VLDB, and especially Netezza, is not so constrained. We can often make multiple passes on the data with little penalty. The danger here is in the inefficiency of the passes, not in whether multi-pass is okay. Multi-pass, or more appropriately multi-stage, SQL is a necessary approach with large-scale tables. Netezza makes it simple and fast, using built-in concepts of its performance model.

 

Here is a spot case study: a BI tool needed to access several tables, each holding many billions of records. The end result was a summary of user-selected values. The temp-table creation here is done automatically by the BI tool, so we may have limited options in getting it to shape the tables as needed. In the examples below, I'll label the queries so we can reference them later.


A typical BI tool, upon realizing it needs a summary, will often divide the answer into multiple stages of work. Each stage stores its result in a temporary table using a CTAS, which is leveraged in one or more following passes. Unfortunately, these passes are sometimes inefficient, as in the case below (this is pseudo-SQL, so bear with me):


(1a) create t1 as select region, district, store, sum(transaction_amt) sumtran, sum(transaction_tax) sumtax from transactions where district_id=4 group by region, district, store;  (1 million records)

-

(1b) create t2 as Select  employee_id, employee_name, t2.store_id from employee_master t2, employee_lookup t3 where store_id=6 and t2.store_id=t3.store_id                   (500 records)

-

(1c) select store_id, employee_id, employee_name, sumtran, sumtax from  t1, t2 where t1.store_id = t2.store_id and t2.region_id in (41,42) and t1.store_id = 6;                     (450 records)

 

Note how, in the above, the filter effects are largely applied last (1b and 1c), with the summaries applied first (1a). In this case, it is summarizing over a million values, but it throws away well over 99 percent of this result in the last operation, reducing 1 million records to 450. It is still accessing the larger table (transactions) only once. It just does it at the wrong time.

 

If we invert this chain and regard the filters first, we might see queries like this:

 

(2a) create t1 as select region, district, store, transaction_amt, transaction_tax from transactions where district_id=4 and region_id in (41,42) and store_id=6;            (15,000 raw records)

-

(2b) create t2 as Select  employee_id, employee_name, t2.store_id from employee_master t2, employee_lookup t3 where store_id=6 and t2.store_id=t3.store_id              (500 records)

-

(2c) select store_id, employee_id, employee_name, sum(transaction_amt) sumtran, sum(transaction_tax) sumtax from  t1, t2 where t1.store_id = t2.store_id;       (450 records)


In the above, the filters are pushed into the first part of the query chain (2a) to squeeze down the data sizes, but also to glean the raw values for the final summary (transaction_amt, transaction_tax). The (2b) query is still a filter, but by the time we get to (2c), all we really need to do is summarize the intermediate table results. We don't have to "go back to the well" of the larger table. Everything we need for the final result is already in our hands, in a much smaller workload.
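To check that the inversion changes the workload but not the answer, here is a toy Python version of both chains over hypothetical data: 50 stores in district 4, with three transactions each. For brevity it leaves out the employee join in (1b)/(2b). It reports the size of each chain's intermediate table alongside the result.

```python
from collections import defaultdict


def make_transactions():
    """Hypothetical data: 50 stores in district 4, three transactions each.
    Tuple layout: (region_id, district_id, store_id, amount_cents, tax_cents)."""
    rows = []
    for store in range(1, 51):
        region = 41 if store % 2 == 0 else 43
        for i in range(3):
            rows.append((region, 4, store, store * 10 + i, i))
    return rows


def summary_first(rows, district, regions, store):
    # Like (1a): summarize the whole district into an intermediate table...
    t1 = defaultdict(lambda: [0, 0])
    for region, dist, st, amt, tax in rows:
        if dist == district:
            t1[(region, dist, st)][0] += amt
            t1[(region, dist, st)][1] += tax
    # ...then, like (1c), discard most of that work with the late filters.
    result = {k: tuple(v) for k, v in t1.items() if k[0] in regions and k[2] == store}
    return result, len(t1)


def filter_first(rows, district, regions, store):
    # Like (2a): capture only the raw rows that survive every filter...
    t1 = [r for r in rows if r[1] == district and r[0] in regions and r[2] == store]
    # ...then, like (2c), summarize the small intermediate at the end.
    totals = defaultdict(lambda: [0, 0])
    for region, dist, st, amt, tax in t1:
        totals[(region, dist, st)][0] += amt
        totals[(region, dist, st)][1] += tax
    return {k: tuple(v) for k, v in totals.items()}, len(t1)


rows = make_transactions()
late, late_size = summary_first(rows, 4, {41, 42}, 6)
early, early_size = filter_first(rows, 4, {41, 42}, 6)
print(late == early, late_size, early_size)  # True 50 3
```

Both chains return `{(41, 4, 6): (183, 3)}`. The summary-first chain materializes 50 summary rows to keep one. The filter-first chain carries only the three raw rows it needs.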

 

The simple inversion of the query order has significantly reduced the workload of the entire chain of events. This, of course, does not answer whether our BI tool will actually implement the query in this order or manner. Anecdotally, with the above tables, the original "transactions" table was over 30 billion very wide rows. The first query chain (1a-1c) takes no less than a minute, and it is only that fast because key1 is zone mapped. The second query chain (2a-2c) takes 6 seconds or less, and it better represents a flow of data from larger to smaller, like a common source-to-target flow. It is easier to visualize and manage, and it is more efficient.

 

Note: Can our BI tool shape a query chain in this manner? Can it glean the raw columns into an intermediate table, later summarizing on the intermediate? Or will it always require us to summarize at the outset and then squeeze down from there? Some BI tools are very close to this model already.

 

Yet another pernicious issue is not obvious from the above - temp table distribution. This last query chain, though 6 seconds in duration, is still a one-hit wonder. Once two or more users start hitting the machine, concurrency will reveal all. The machine is quickly saturated and all of the queries start to take more and more time. In one case of just five users on the machine, all of the queries took over a minute, and one took over five minutes. Concurrency tuning is a bread-and-butter issue, too, so what's going on here?

 

In both query chains, the CTAS is not being given explicit instructions on how to distribute its results. The outcome is unpredictable from the BI tool's perspective, but very predictable for us. When the CTAS result remains on its original distribution, we get a co-located write. If the CTAS does not use the original distribution, it has to redistribute the data, broadcasting it all over the SPUs. We want to avoid this, because co-located writes are desirable and muy caliente.

 

The original distribution key for the transactions table is (transaction_id). This doesn't do us much good if we later focus on store_id (2b, 2c) as the primary distribution. For the final activities to be as quick as possible, we need to bridge the transactions into store_id. We could set up data structures to do this, but in the end, with so few records coming off the transactions table in the (2a-2c) chain, an intermediate broadcast is already in the mix. We can do it deliberately under our control, or allow it to use CTAS defaults. In this case, the CTAS default is worse.

-

In the first chain of queries (1a-1c), we would expect to see the following CTAS defaults:

 

(1a) - distributed on (region, district, store) because this is the group-by clause. It cannot use transaction_id for a co-located write because it's not even in the result set. Those who understand distribution keys know that this is not an optimal state of affairs.
-

(1b) - distributed on (employee_id) because it happens to be the first column in the select clause. This query joins two tables, so CTAS will opt for the first column in the select clause.

 

So in this case, the CTAS will preserve neither the original distribution nor even a useful one. Don't get me wrong here: CTAS defaults are acceptable in over 90 percent of cases. This example is offered as a typical one-off of BI automated query construction. The first query (1a) will produce a million records (and honestly, in some cases it produced a couple of billion records), so we really need some optimization here.


If we were to modify (2a) and (2b) above to enforce the distribution deliberately, we would use "distribute on (store_id)", but we would have to include store_id in the result set. In each case, this would prepare both tables for a co-located join in the final query (2c).
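Here is a toy sketch of why distributing both intermediates on store_id sets up a co-located join. It places rows with a stand-in hash (`crc32`, not Netezza's) and counts the matching pairs that would sit on different SPUs. The tables, columns and SPU count are hypothetical.

```python
import zlib


def spu_for(value, num_spus):
    # Stand-in hash placement; NOT Netezza's real distribution hash.
    return zlib.crc32(str(value).encode()) % num_spus


def cross_spu_pairs(left, right, left_dist, right_dist, join_col, num_spus=8):
    """Count matching (left, right) row pairs that sit on different SPUs."""
    crossings = 0
    for lrow in left:
        for rrow in right:
            if lrow[join_col] == rrow[join_col]:
                if spu_for(lrow[left_dist], num_spus) != spu_for(rrow[right_dist], num_spus):
                    crossings += 1  # this match needs data shipped between SPUs
    return crossings


# Hypothetical intermediates shaped loosely like (2a) and (2b).
t1 = [{"store_id": s, "amt": a} for s in range(1, 21) for a in (10, 20)]
t2 = [{"employee_id": e, "store_id": e % 20 + 1} for e in range(100)]

print(cross_spu_pairs(t1, t2, "store_id", "store_id", "store_id"))     # 0: co-located join
print(cross_spu_pairs(t1, t2, "store_id", "employee_id", "store_id"))  # many pairs must cross SPUs
```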

 

Note: This brings up another BI tool issue, in that we need to influence the order of the sequence, and also provide for columns that are administrative (like store_id) but not part of the final result. Some BI tools are picky this way: if a column is not required in the final reporting output, the tool trims or ignores it in the intermediate tables.

 

To continue, we have now pushed the workload into the physics, not the query itself. But as noted, concurrency is the test. This final chain of co-located queries returned in less than 3 seconds, did not grow beyond 4 seconds until 20 users were running the same query at the same time, and even then tended to hover between 3 and 5 seconds as even more users were added. Isn't this the kind of scalable performance we want?

 

Additional note: If we really want to push this harder, it would be best to manufacture a "store_transactions" table that is already distributed on store_id (for the 2a query). This would be a report-facing table that essentially mirrors the transactions table, but carries only the high-traffic reporting columns. In this way, store_id becomes the universal distribution even for the very first query. Keep in mind that while this strategy may cost disk space, it will further eliminate concurrency issues. I am not a big fan of preserving disk space when performance issues are in play. We will still need to perform a "distribute on (store_id)" for each of (2a, 2b), but it will preserve the distribution with a co-located write.

 

So we can see that the BI tool needs to put two protocols in play: use capture-filtration-summary, and deliberately apply distribution keys to the first passes to preserve distribution. We often apply these very same protocols in ELT because they make sense. But in ELT we have complete, detailed control of query construction; not so in the BI tool world.

 

Conclusion: Rather than use a BI tool's default summary-filter chain, we need a capture-filter-summary chain. This lets us leverage the VLDB physics, and also moves the data from larger sets to smaller ones in the most efficient manner.

 

Recap for Multi-Stage SQL:

  • Especially for summary data, the chain should perform the summary as the final operation, with capture and filtration in the first passes. This allows the final operation to be a simple summary, since all the filtration has already been applied. In other words, no more where-clause activity apart from the join criteria.
  • Organize the tables (including any additional tables) on the distribution key in play. Bridging one distribution to another can give us the performance, but broadcasting can eventually create a concurrency problem.
  • The chain should not address the same large table more than once. Get everything we will need and get out; don't keep coming back for something the first pass did not get.
  • The chain should capture raw information into an intermediate table, foregoing the summary until the final operation.
  • The chain should provide a means to bridge one distribution key into another, for maximum efficiency, rather than using CTAS defaults.
  • The chain should perform filtration at the outset, attacking the larger table(s) with zone maps, etc. Move from larger data sets to smaller ones.
  • The chain should preserve distribution to leverage co-located writes and reads where possible. This maximizes overall performance and also optimizes concurrency.
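The difference between the two orderings in the recap can be sketched in plain Python. This illustrates only the order of operations, not how a BI tool or Netezza actually executes SQL; the row layout, the `region` filter and both function names are hypothetical. Both chains return the same totals, but the capture-filter-summary chain hands its final summary step far fewer rows.

```python
from collections import defaultdict


def summary_then_filter(rows, region):
    """BI-tool default: summarize every row first, then filter the summary."""
    totals = defaultdict(int)
    regions = {}
    for r in rows:                                  # every row goes through the summary
        totals[r["store_id"]] += r["amount"]
        regions[r["store_id"]] = r["region"]
    result = {s: t for s, t in totals.items() if regions[s] == region}
    return result, len(rows)                        # (totals, rows fed to the summary)


def capture_filter_then_summary(rows, region):
    """Capture and filter at the outset, then summarize only what survived."""
    captured = [r for r in rows if r["region"] == region]   # first pass: filtration
    totals = defaultdict(int)
    for r in captured:                                       # final pass: simple summary
        totals[r["store_id"]] += r["amount"]
    return dict(totals), len(captured)


# Hypothetical fact rows: stores 0 and 1 are "east", the rest are "west".
fact_rows = [{"store_id": i % 8, "region": "east" if i % 8 < 2 else "west", "amount": i}
             for i in range(1000)]

a, rows_a = summary_then_filter(fact_rows, "east")
b, rows_b = capture_filter_then_summary(fact_rows, "east")
print(a == b, rows_a, rows_b)  # True 1000 250
```

Same answer, a quarter of the rows in the summary step; on a table with billions of rows, that difference is the whole game.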


What if the BI tool, as a general-purpose tool, will not perform these deliberate and purposeful query chains? At this point, we need to have a heart-to-heart with the BI tool vendor about our concerns. Assume the best: the vendor may eventually fix the issue, just not in time to help us now. We then need to consider two purpose-built options, each of which has its own issues. These are offered in the spirit of temporary adaptation until the BI tool is smart enough to make them unnecessary.

 

Summary tables: These are often constructed to prop up database performance issues. They are just as viable for functional reasons, such as providing data in a form that is only available and most efficient when summarized, or to intersect details with pre-summarized data. But if used as a performance prop or BI Tool helper, put some effort into making it an adaptation that could be deprecated when the BI Tool is smarter. This way, we're not committed to it forever.

 

Stored procedures: Used in an appliance as an adaptation mechanism (in this context). Effectively bridges the BI tool to the data with a temporary procedural construct (the procedure) rather than a more permanent structure (like a summary table). Stored procedures pull application features down to the database level and adapt the BI tool into the Netezza performance model.

 

When or whether to use either of the above is always a design decision, not necessarily dictated by the tools themselves. But keep in mind the idea of temporary adaptation. I am always of the mindset that the warehouse and BI environment must exist with the expectation of change, so in general, adaptability and adaptation concepts are always desirable. They allow us to be more responsive to future requirements.


Riding the Waves

Posted by David Birmingham Jan 20, 2010

I've been noticeably quiet over the past weeks as I've switched horses, so to speak, and joined Brightlight Consulting. I had already been following Brightlight for a number of years, encountering their significantly talented people at various Netezza sites across the fruited plain.

 

The press release is here:

 

http://www.brightlightconsulting.com/news_2010_DavidBirmingham.htm

 

 

At the Netezza conferences this year, many of you saw the slow-motion videos of surfers mastering those monstrous waves. Also during this season, I happened to attend another conference where the speaker shared some famous words from Shakespeare's Julius Caesar in a similar context to what I was now experiencing right there on the conference room floor:

 

 

There is a tide in the affairs of men,
Which, taken at the flood, leads on to fortune;
Omitted, all the voyage of their life
Is bound in shallows...

 

 

I was standing on the top of the wave, so to speak, and had a choice before me. Ride the wave, or return to the shallows.

 

Now, I don't put a lot of stock in epiphany-styled revelations, but in this case a tingle went up my spine, realizing that the TwinFin had completely changed the game - it was time to seriously get on the wave and ride it, or commit to the shallows of the everyday. As many of you know, Brightlight has stood out for a number of years as being a go-to partner for all-things Netezza, and their VLDB consultants have solved large-scale problems where others feared to tread.

 

"You have some serious thrill issues, dude," Crush the Turtle, Finding Nemo

 

As I have been inundated with pings and kudos from many of you who already know the story, I thought it was worth sharing, especially for you Shakespeare and surfing aficionados, a rare breed indeed.

 

And to surfin' Enzees everywhere - here's to a "so totally awesome" 2010 and all the promises it offers. All the best.

 

See you on the waves!


A Tastier Float

Posted by David Birmingham Nov 5, 2009

One of our primary tables (we'll call it a fact table) contained a number of columns that had arrived through some pretty hairy ELT-based math algorithms. In all the crunching, we would see spontaneous overflow errors, so we converted some of them to float. More explosions occurred, and we converted more to float. After several more iterations, we converted them all to float. Then we discovered that the reporting layer also had to perform some hairy on-demand calculations, so it was a good thing we had float values to give them. Now everyone was safe.

 

However, as this table grew (and they always do), the floats became "bloats". Netezza does not compress a float data type. One day we looked up and the table was approaching 20 TB in size, with no end in sight. The theory was that if we converted these float values to numeric data types, we could save half the storage right away, and even more with Netezza's compression, but this would put the reporting layer in danger of a spontaneous overflow explosion.

 

Once we performed the conversion of the table (as a test case) and saw it reduce in size to about 7 TB, we were hooked on the possibilities of compression but vexed as to the impact this would have on the consumers of the data.
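The arithmetic behind that test case can be sketched in a few lines. It assumes, for simplicity, that the table's size is driven entirely by these columns, and it takes the 16-byte float and 8-byte numeric figures from the old-way/new-way comparison in this post at face value. The 20 TB and roughly 7 TB figures come from the test itself.

```python
def projected_size_tb(original_tb: float, old_bytes: int, new_bytes: int) -> float:
    """Table size after narrowing every value from old_bytes to new_bytes (no compression)."""
    return original_tb * new_bytes / old_bytes


original_tb = 20.0                                    # float-based table, uncompressed
narrowed_tb = projected_size_tb(original_tb, 16, 8)   # 10.0: the "save half" step
observed_tb = 7.0                                     # measured after the test conversion
compression_factor = observed_tb / narrowed_tb        # about 0.7 remains after compression
total_savings = 1 - observed_tb / original_tb         # about 0.65 of the original size saved

print(f"narrowed: {narrowed_tb:.1f} TB, compression keeps {compression_factor:.0%}, "
      f"total savings {total_savings:.0%}")
```

Under these assumptions, narrowing the type accounts for the first half of the savings, and compression removes roughly another 30 percent of what remains.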

 

We had experimented with surgically casting the data from numeric to float on the fly, but this would create a lot of headaches for the users if they always had to wrap every field in a cast. It did, however, prove one thing: the time to cast numeric to float is inconsequential compared to the I/O required to pull a float value from the SPU's disk "as is". In essence, we traded the I/O time saved by compression for a much smaller amount of time spent casting.

 

So the next step was to put a special view on top of the fact table that would automatically cast every numeric column to a floating-point value. Whenever a reporting-layer query required data, it would automatically and transparently use the view, pull less data from the disk, convert it to float in the CPU, and then use it as float in memory. We eliminated most of the I/O cycles spent ripping the float values from the disk drive, spent a little of that savings casting the data to float, and made the whole operation transparent to the reporting layer.

 

old way:

 

FLOAT ->>>>>>>>>>>>>>>>>>>>>>>CPU -> QUERY MATH

(16 bytes, no compression)

 

new way:

 

NUMERIC ->>>>>>>>>>>>>>>>>>>>FLOAT -> CPU -> QUERY MATH

(8 bytes or less, with compression)

 

All of the CPU-level math then becomes inconsequential when we move to the TwinFin, since it has its own floating-point processor and can handily deal with the float type. But we can continue to mitigate the I/O hit for the data by storing it in a compressible numeric format and converting it on demand to a float at the CPU level.
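The casting-view approach can be modeled in a few lines of Python. In the database itself, this would be a view whose select list wraps each numeric column in a cast such as `CAST(revenue AS FLOAT)`. Here, as an illustrative assumption, `Decimal` stands in for the stored NUMERIC values and a generator plays the role of the view. The column names are hypothetical.

```python
from decimal import Decimal

# Base table as stored: compact fixed-point values (hypothetical columns).
fact_rows = [
    {"sale_id": 1, "revenue": Decimal("1234.56"), "cost": Decimal("789.10")},
    {"sale_id": 2, "revenue": Decimal("99.99"), "cost": Decimal("45.00")},
]

NUMERIC_COLUMNS = {"revenue", "cost"}


def float_view(rows, numeric_columns=NUMERIC_COLUMNS):
    """Yield rows with numeric columns cast to float on read, like a casting view."""
    for row in rows:
        yield {col: float(val) if col in numeric_columns else val
               for col, val in row.items()}


# Reporting-layer math sees floats without writing a single cast itself.
margins = [r["revenue"] - r["cost"] for r in float_view(fact_rows)]
print(margins)
```

The storage stays in the compact fixed-point form. Only the consumer-facing layer changes type, which is what keeps the change transparent to the reports.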


Famous words, or some such like, uttered by Orson Welles as he launched into a scary parody of alien terror on national radio. Really scary for some. And proffered on the eve of Halloween in 1938, so dare I say, 'tis the season (almost).

 

Ahh, not to fear, this purports to be a painless foray. But I do have a story to tell.

 

Several projects ago (I always start this way, so you won't think I'm talking about you!), I worked with some really sharp data engineers on boiling out a solution for retail operational reporting. The data arrived roughly every five minutes, sometimes in parallel loads, with 24x7 regularity. More and more Netezza implementations are going this way, and you, too, should look into processing data at the speed of thought. In any case, the reporting users wanted to plumb the depths of this data store, to the tune of eighty billion records and growing. (Okay, small for some of you, I know, but humor me.)

 

Well and good, except rather late in the game, the reporting users spontaneously expressed a desire to review the detail through a metadata-based "lens": that is, set up some drilling levels and other metadata-based entry points, such that the entire operational model would be seen through this reporting "lens", which would provide all the context for the consumers.

 

Now, such a model would require so much power from a standard SMP/RDBMS-styled system that we might well cause structural damage to the raised floor from the sheer physical weight of said system. That is, if we really expected a report to return within a day or two of the request. Ahem! as I facetiously clear my literary throat.

 

But the worst case for any given query in the above was around 8 minutes, and over 99 percent of the thousands of queries submitted returned in less than 30 seconds. Oh, yeah, it was smokin' hot. In most queries using zone maps and the like, we saw returns in a few seconds. Pshaw! says the tick-tock-man, chocolate and vanilla, don't waste my time.
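Zone maps did a lot of the work here, so a quick illustration may help. The sketch below is a simplified model, not Netezza's implementation. It assumes a tiny hypothetical extent size of four rows and a column (say, a load time encoded as an integer) whose values arrive in ascending order, as five-minute loads naturally do. For each extent it records the column's minimum and maximum; a range predicate then reads only the extents whose range can overlap it.

```python
EXTENT_ROWS = 4  # hypothetical rows per extent; real extents are far larger


def build_zone_map(values, extent_rows=EXTENT_ROWS):
    """Record (min, max) of the column for each extent of consecutive rows."""
    return [(min(values[i:i + extent_rows]), max(values[i:i + extent_rows]))
            for i in range(0, len(values), extent_rows)]


def extents_to_scan(zone_map, low, high):
    """Indexes of extents whose [min, max] could hold a value in [low, high]."""
    return [i for i, (lo, hi) in enumerate(zone_map) if hi >= low and lo <= high]


load_times = list(range(100, 116))          # 16 rows in arrival order
zone_map = build_zone_map(load_times)       # [(100, 103), (104, 107), (108, 111), (112, 115)]
print(extents_to_scan(zone_map, 105, 106))  # [1]: the other three extents are skipped
```

The design choice worth noticing is that the skipping comes from how the data lands on disk, not from anything clever in the query.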

 

However (and there's always a catch), many of the larger reports were actually conglomerations of these smaller queries, and their aggregate time would occasionally reach ten minutes or more. And even though this was a far cry from the "days away" we would expect from an SMP/RDBMS system, it was still 'too slow' for the users. Now, this is true adrenalin-junkie stuff, sort of like the old Far Side cartoon of a young man standing with a fork in front of a waffle iron, captioned "Wendell Zurkowitz, slave to the waffle light". I recall how one man noted that many years ago we would wait hours for a traditional oven to finish cooking, and now we get impatient when the microwave instructions call for more than five minutes.

 

Perspective.

 

And rather than punt to the users and say, "Hey guys, this is just unrealistic" and degenerate into "expectation management" - the challenge was to actually achieve faster turnaround times on the reports. And here, I'm talking about getting these ten-minute reports into the 30-second zone. Would we have to embrace some extreme engineering for this feat? Methinks not - but the form of the process to get there was quite instructive.

 

Now recall I noted that the above model had operational tables, which were to be the detailed source, and a retail reporting hierarchy that was largely metadata-based. This reporting hierarchy had some significant size as well, perhaps a fourth the size of the eighty-billion-record fact table it had to link into. Yet the two were distributed on different keys. Querying one meant broadcasting the other.

 

And now, for broadcasting.

 

Whenever two tables are distributed on different keys, a join between them cannot be initially co-located. To support the co-location, Netezza will broadcast the salient information from one table's context to the other. This means the physical data has to move from its home SPU, out onto the inter-SPU network fabric, and find its way to the target SPU where it will be further examined. Broadcasting for small tables is inconsequential and barely a blink on the radar. For larger tables it can have strange effects. For example, we saw one query return consistently in ten seconds. Yet when running side-by-side with itself (multiple users) it could take several times longer.


The reason is that both queries were competing for bandwidth on the inter-SPU fabric, among other things. The simplest solution, of course, is to get our metadata table distributed on the same key as the operational tables. The problem was simply in the complexity of this metadata table and how it mapped to the core information. "Blowing it out" into a materialized form of information would require significant planning and design, because a misstep could easily make the reports turn out wrong, and this was unthinkable. In all this, the maintainability had to be considered, because if our initial complexity is too high, the maintainability is in jeopardy - by design.
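A back-of-the-envelope model shows why the concurrent runs slow down. It assumes, purely for illustration, that a broadcast copies every row of the broadcast table from its home slice to every other slice, that all concurrent queries share one fixed fabric bandwidth, and that a co-located join puts nothing on the fabric. The 108-slice count borrows the 10100's SPU count; the row count and bandwidth are hypothetical.

```python
def fabric_rows(table_rows: int, num_slices: int, colocated: bool) -> int:
    """Rows one query pushes across the inter-SPU fabric for a single join."""
    if colocated:
        return 0                            # each slice joins its own rows locally
    return table_rows * (num_slices - 1)    # broadcast: every row copied to every other slice


def fabric_seconds(users: int, table_rows: int, num_slices: int,
                   colocated: bool, rows_per_second: float) -> float:
    """Time to move the data when `users` identical queries share one fabric bandwidth."""
    return users * fabric_rows(table_rows, num_slices, colocated) / rows_per_second


# Hypothetical figures: a 20-million-row table, 108 slices, 1e9 rows/s of fabric bandwidth.
for users in (1, 2, 4, 8):
    bcast = fabric_seconds(users, 20_000_000, 108, False, 1e9)
    local = fabric_seconds(users, 20_000_000, 108, True, 1e9)
    print(f"{users} users: broadcast {bcast:.2f} s, co-located {local:.2f} s")
```

In this model, broadcast time grows linearly with the number of users sharing the fabric, while the co-located join stays at zero fabric time no matter how many users pile on.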

 

Of course, we would spend most of our time in testing this scenario. Coding and implementation in most BI shops is a nit compared to the testing we have to execute to validate the outcome. Netezza is no different, except we can close the testing loop sooner if we have more power. And of course, for something of this magnitude, to test the change from minutes to seconds, we would need a powerful machine to measure the difference. Whenever we ran the new solution on a smaller machine, the difference couldn't even be measured. No, the power of the machine makes the testable difference visible and measurable.

 

As I noted, the form of this exercise was the most instructive part. Rather than finding a means to align these two tables for co-located joins, the first effort went into tuning the queries. You know, "query engineering", which is the mainstay of performance engineering on an SMP/RDBMS platform, and old habits are hard to break. The data engineers were somehow in denial that they could get extraordinary power from configuring the data. Instead, they trusted their instincts and chose to attack the queries.

 

Now, in any platform, regardless of shape, size or vendor, power is always and forever the domain of hardware. Software cannot manufacture more CPUs or network speed. If the physical plant is not ready, the software can only use what it has at its disposal. The software itself is largely a cost center, because it can only drain the machine's energy through inefficiency. In an SMP/RDBMS machine, the only option we have is to engineer the queries, because the physical plant is configured to be general purpose.

 

In a purpose-built machine, however, the query is simply a controlling mechanism for Netezza's resources. The host will chop it apart into snippets and dispatch these to the components that will execute them. Extreme query engineering, on the other hand, assumes that jockeying around with the query can actually affect our fate. (By contrast, fixing a poorly written query is a different thing from over-engineering a well-written one.) And besides, do we really want to spend our time carefully engineering the query to the point of functional brittleness? In an SMP/RDBMS machine we will see queries that extend for tens of pages in very daunting complexity. Maintaining these is a full-time job for our consultants. They swarm on the machine, and carefully tune their handiwork to avoid breakage.

 

Yet we purchased a Netezza machine to get away from this complexity: to reduce, clarify and simplify our administration and consumption of the data. So as I watched these engineers bat themselves against the problem, no differently than a fly batting against a window, I watched them pull out their hair in generous tufts when little of what they did offered the significant gains they expected. This outcome was entirely counter-intuitive to their training. They were accustomed to using and tuning software to make things work faster.


Sweeping the hair from the floor one evening, I mentioned (for the x-teenth time) that the broadcast effect was killing them. Once our engineers grasped the broadcasting problem, I thought we would make headway, but things actually got worse. They started trying to control the broadcast as if it were the root cause rather than the symptom. In one test, I saw one of the largest tables leap into a broadcast, and we just killed the query outright (it would probably still be running, even today). The engineers lamented: How do we make sure the larger table doesn't broadcast? How do we control the broadcasting to our benefit? Answers exist to all of these, but it's like talking to a drug addict, one who is addicted to the drug of SMP/RDBMS and claims he can 'quit anytime'.

 

And then the truth came out, "David, if we can make this 10100 machine process data like a 10400 machine, we'll look like heroes!" To which I ask "How?" to which the response is: "We can save them all that money they would have spent on the hardware..." Well, not really. You've just chosen something else to spend the money on, namely performance engineering, the cost of time-to-market, the cost of a marginal implementation and the cost of human labor (the most expensive asset you have, by the way). But since the only way to get a 10100 to perform like a 10400 is to actually be a 10400, well, you see the futility. 432 SPUs versus 108 SPUs? And they really, truly thought they could - I mean - seriously. Let's keep in mind that the opposite is true. If we can't make the 10100 process data like a 10400, perhaps our approach is flawed? Heroes or goats. Take your pick. In my estimation, there's only one hero in the room. The big black box.

 

So the broadcast is the symptom, not the root cause. How about, we quit broadcasting, cold turkey? Take the data model through a detox program and the engineers through a series of deprogramming seminars to - well - it's not that bad. Typically the average engineer only has to see it operate in an adverse manner to become a believer. But a believer they must be, or they will not take action to correct the problem, correctly.

 

So one of them finally decided to produce a map table, one that would map the metadata into the operational tables such that all core joins would become co-located, with a common distribution. And lo, the first test of this blew their minds. Even the complex reports were now coming back in single-digit times, and the reports that had been running ten minutes or longer were now under a minute, even with multiple users. In fact, they saw the performance and scalability practically handed to them - simply because they configured the data correctly. It had little to do with query engineering.
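One common way to build such a map table is to flatten the reporting hierarchy into one row per (store, ancestor) pair, keyed on the operational distribution key. The sketch below assumes a simple parent-child hierarchy keyed by a hypothetical `store_id`. It is an illustration of the general technique, not the actual design those engineers used. Because every map row carries `store_id`, a map table distributed on `store_id` meets the fact rows on their own slice, and any hierarchy level rolls up through that single co-located join.

```python
from collections import defaultdict

# Hypothetical reporting hierarchy: child -> parent (stores are ints, levels are names).
parent_of = {
    101: "East District", 102: "East District", 201: "West District",
    "East District": "North America", "West District": "North America",
}


def build_map_table(parent_of, store_ids):
    """Flatten the hierarchy into (store_id, node) rows: the store plus all its ancestors."""
    rows = []
    for store_id in store_ids:
        node = store_id
        while node is not None:
            rows.append((store_id, node))
            node = parent_of.get(node)
    return rows


def rollup(facts, map_rows):
    """Total fact amounts per hierarchy node through a single join on store_id."""
    nodes_for_store = defaultdict(list)
    for store_id, node in map_rows:
        nodes_for_store[store_id].append(node)
    totals = defaultdict(int)
    for fact in facts:                       # join key is the fact table's distribution key
        for node in nodes_for_store[fact["store_id"]]:
            totals[node] += fact["amount"]
    return dict(totals)


facts = [{"store_id": 101, "amount": 10}, {"store_id": 102, "amount": 5},
         {"store_id": 201, "amount": 7}, {"store_id": 101, "amount": 3}]
totals = rollup(facts, build_map_table(parent_of, [101, 102, 201]))
# totals["North America"] == 25, totals["East District"] == 18, totals[101] == 13
```

The trade-off is the one the story describes: the flattened map costs some disk and some careful design up front, in exchange for joins that never leave the SPU.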

 

Now one may ask the obvious question, and please do so now: Why don't you just build out some user-facing tables and forget leveraging the operational tables? After all, we don't build our non-Netezza reporting systems on top of operational data, do we? We build out dimensional models and other handy structures to positively affect the user experience and simplify the flow (and the maintenance). This functional decoupling is a mainstay of reporting environments. (Okay, the next entry will focus on this.) But in this case, suffice it to say that the owner of the machine had put down a hard mandate on disk utilization. At no time could we foray into replicated detail, or even a summary of detail, without a plan to access the operational detail on a drill-down and the like. Interestingly, the required reporting tables would have cost, in disk, only a fraction of what was spent in time, labor and effort making the operational tables viable. This is why it deserves its own treatment in a separate rant - er - essay. Stay tuned, and don't touch that radio dial.


Back to the drama: a telltale symptom that we're doing something wrong is when we start down the engineering path. It's an appliance. We don't engineer toasters, blenders or laundry machines. But the difference here seems to be subtle. It's not. In this case, the culprit was the broadcast, something to be eliminated rather than managed. And no amount of creative query hoop-jumping would overcome this. Get the joins onto the SPUs. It seems obvious to those who have been around the machine for a bit. But for those who have not, the learning curve is upon them. Be patient with them for as long as it takes to get it right. Once we have a believer, we'll never have the conversation again. As long as we stay in a theoretical zone, however, expect them to stay in the spin cycle. This is like many things scientific. Seeing is believing.

 

Whenever I (and others like me) observe a ritual of performance engineering, each participant holding out the hope that "just one thing" will offer a stratospheric boost so they can all wipe their foreheads and go home, this is the surest sign of one of two things: either the data is poorly configured and is causing the queries to be inefficient, or the data is properly configured and the machine does not have enough physics to achieve the goal. If the focus is on query engineering, they are wasting time. If the focus is on data engineering, at some point it will reach a "diminishing return". Either the machine has the power or it doesn't. Time to switch to Netezza, or if using Netezza, time to add some physics (a frame or two) to make it happen.


Moral of the story: Performance is found in the physics, not the carefully engineered queries. If we find ourselves "engineering" our queries for performance reasons - we should take a step back, take a deep breath - click our heels together and say softly: "There's no power like SPU power. There's no power like SPU power." Repeat as necessary.

 

And pay no attention to the man behind the curtain. I'll bet he and Orson Welles never even met.


A number of months ago I wrote about how the World Tour Awaits, and all the buzz in the air about the new TwinFin. I was honored to moderate the best practices forums in North America and London, and many thanks for the rather effervescent participation of the panelists. Kudos go out to David from Brightlight, David from Edge Associates, and Jeff from Quantisense, each of whom has the kind of over-the-top personality that turns the session into an "experience" more than just a discussion.

 

But all in all, the sessions flew like lightning. If any of you have additional questions or insights, may I invite you to post them here on the Netezza community. The discussion never ends, you know.

 

It is interesting to note that many of the questions coming from Enzees in every venue struck a common chord and followed a common thread: Enzees are unique, have a rarefied problem and solution domain, and are able to approach it with the confidence of Spartacus in the arena, or Jackie Chan on the streets of New York. Comments often began with, "I have a table with <seventy, eighty, ninety, your number here> billion records and I want to..." I mean, seriously, those on the outside lookin' in will look askance at such an opening statement, and marvel at the ensuing, rather casual discussion about it. Nothing is casual about these data sizes in the outside world.

 

It goes like this: Bring it on, baby. Because the question of whether it can be done is behind me, now I just want to know how to do it well. The audacity!

 

Kudos also for the Enzee crowd members who injected their insights and wisdom into the discussion, freely sharing their technical and political battleground knowledge for the betterment of all. This was not the same as "iron sharpening iron", because at this scale of data processing, iron crumbles. No, this was a lot like titanium sharpening titanium, and was exciting to participate in, to say the least.

 

Many thanks also to Netezza for inviting me to the tour. It was a whirlwind to be sure, but well worth the ride. Tim, Olga, Courtney and Karina made it easy for me (actually all of us) to participate. Thanks to all for your hard work and a World Tour Well Done!



As the sunrise peeked over the horizon, it cast long shadows over the four cars awaiting the break of dawn. Stretching before them, the expanse of the salt flat beckoned, nay taunted them, to accelerate across its ancient surface. Not caring for the winner or loser, it merely provided a level playing field for them to test their wares and technology. But yawned at the futility of the race itself. The salt flat had always been, and always would be. Come one, come all, it invited daily, almost mockingly.

 

The leader for team-Exa sat in his racer's driver seat, eyes closed. When he felt the warmth of the morning touch his face, he raised an eyelid to examine the time. Now thirty minutes from flag-down, the sun would still be at his back when he won the race. And he would win the race.

 

The lead for team-Terra pushed back into her driver's chair to stretch her legs as her eyes fluttered open. She glanced to her left at the Exa racer, gleaming in the morning sun, and then to her right at the NZ racer, whose plain black lines and nondescript exterior, she knew, hid the power under its frame. It was nothing to be trifled with.

 

The fourth car on the end, entered at the eleventh hour, was a plain vanilla Volkswagen Beetle with a rocket engine attached to its backside. No frills, no nonsense and nothing hidden. Five men from Redmond had delivered it last evening. They hadn't even had time to take a test run on the flat.

 

Minutes later all four drivers and their lackeys met in front of the four cars, partly to wish each other luck and partly to offer last minute trash-talk. Dominic Toretto, the driver of the NZ machine, ran his hands over his bald scalp and rubbed it vigorously, as if massaging the sleep from his head, then yawned and said, "Okay gentlemen. We're fifteen minutes from flag-down. Anyone want to back out? I swear we won't hold it against you."

 

"Dude," laughed Excel, the driver for the Redmond machine, "In your dreams. I have investors watching."

 

"As do I," smiled Tara, the only female driver, and would command the blue-streamlined Terra racer, named for its ability to master the earth and its elements. "We're all in this for keeps." She batted her eyes and tilted her head flirtatiously, "You want to see under my hood?"

 

"Out here in the open?" Toretto laughed, drawing chuckles from the others, "Sure, let's see what you have."

 

She ignored the innuendo and pointed her keytag toward the Terra racer and pressed a button, causing both side doors to slide away and the hood to pop open. Toretto strolled over to examine the engine. He'd seen these before.

 

"Lot of power under that hood," he quipped.

 

"Yeah," she said, expecting a bit more enthusiasm for her machine. She wouldn't find it among any of these drivers, though. They lived and breathed adrenalin, and knew as much about her machine as she did. And weren't in denial about its weaknesses, either.

 

"Looks plain," said Jeff, driver for the Exa-car, "And as you can see, not enough control."

 

"So let's look at yours," Toretto said, a twinkle in his eye.

 

As they sauntered to the next car, Jeff's lackey whispered in Toretto's ear, "We've radar-mapped the entire flat between here and the finish line. Every bump is programmed into the machine. You think that's a competitive advantage?" He slapped Toretto on the back and laughed loudly.

 

"Bumps don't matter," Toretto muttered, with the strength and experience of someone who would know.

 

Jeff spun to face him, "What was that?" he laughed, "Bumps don't matter. Did you hear that?" he looked around him to the others, with his lackey already laughing, "He says bumps don't matter." He crossed his arms, "Would it matter to you if I said that ignoring bumps at these speeds is like a death wish?"

 

"No."

 

"No, what? No it won't matter what I say, or bumps still don't matter?"

 

"Either way," Toretto said with a wry grin, "Bumps don't matter."

 

Jeff threw up his hands in frustration as Toretto poked his head into the Exa-racer's driver side window. Jeff asked, "What do you think, huh?"

 

Toretto examined the interior, laid out like a Boeing 757 cockpit. Three LCD screens loaded with controls and meters, flashing lights all around the dashboard and dozens of knobs and gears. "Got a lot of moving parts," Toretto sighed, "Think you'll need all that?"

 

"No more, no less," Jeff said, "Our investors are very demanding. All the tires and wheels are measured for pressure and impact, the dual-redundant monitors compensate for any detected differences, and the pre-mapped radar anticipates every bump and turn."

 

"It's a salt flat," Toretto grinned, patting him on the side of his shoulder, "There are no turns. And bumps don't matter."

 

Jeff nearly bit his tongue, but instead smiled and shook his head while Toretto continued his examination.

 

"Looks to me like," Toretto finally said, "You decked out the car just for this ride."

 

"Yeah. So?"

 

"Well, it might work for a salt flat under controlled conditions, but it's not streetworthy."

 

"We're not testing on a street," Jeff fired back, "All that matters is who makes it to the other side."

 

"Really?" Toretto raised an eyebrow, "You think people will be knocking on your door to buy a few of these to come out here to run on salt flats?" He laughed, "Your investors will expect to see the performance you show here," he pointed toward the West, "Out there. Or they can't make any money. Optimizing your car, just for this test, doesn't mean anything."

 

"We'll see," Jeff snapped.

 

"I'd like an assessment of my car, if you don't mind," said Less, the driver for the Redmond car.

 

Toretto simply said, "Not much different from the Exa. Except you don't make any bones about the fact that you've strapped a jet engine to an underpowered car. You think those wheels and frame can handle the stress of the race? We'll see how you do on the flats. That's all I can say."

 

"Gentlemen," intoned a voice all around them, coming from well-placed speakers, "We're five minutes from flags-down so anything you need for warm-up, do it now."

 

Jeff punched a button on his keytag to remotely initiate his computers into a final pre-race system check. Toretto slowly strolled back to his car, opened the door and flopped into the driver's seat. His lackey Mark, younger than he but the sharpest of his crew, brushed back a long black lock of hair and positioned it over his ear, then silently joined Toretto in the passenger seat. After Toretto punched several buttons to start the engine, Mark could no longer hold it in.

 

"Don't you think we're about to get smoked here?" Mark said, glancing to the Exa car, "I mean, radar mapping, all those controls and - I mean - "

 

"I know what you mean," Toretto said casually, engaging the first gear, "Just trust the machine."

 

"I know what your philosophy is," Mark sighed, shaking his head, "Put it all under the hood, make it self contained, but what if you need to get creative in the middle of the race?"

 

"Would one of our customers have the option to get creative?" Toretto asked, allowing the car to roll ahead to the starting line. "Do we let them add stuff to the machine? Do we require them to know a lot about what's under the hood?"

 

"No, but -"

 

"But what?"

 

"I don't know what! It just seems like they have more, you know, more -"

 

"More what?"

 

"I don't know what! It just seems like more."

 

"More to break. More to maintain and watch - when the real mission is to go fast on the flats. And everywhere else."

 

"You think we'll win?"

 

"Trust the machine."

 

Presently a racing judge appeared with a flag in each hand, and took his place between the two middle cars. Watching the clock count down, he raised the flags high, then started counting down loudly.

 

"Hold on to your chair," Toretto mumbled, "It's a little rough out of the gate."

 

"I'm ready," Mark said, holding tightly to the chair, pushing against the floorboard to press his back into the chair's leather. He'd made the mistake of eating a meal just prior to the first test runs the week before, and had spent an hour cleaning his half-digested meal from the dashboard and interior windshield. This time, he'd fasted for twenty-four hours. Nothing remained in his stomach, he was sure of it.

 

Over in the Exa-racer, Jeff had strapped himself into his seat, and his onboard systems had finished their run-through only seconds before the flags would fall. The carefully tuned machine would master the flats today. The machine, and his name, would soon be synonymous with extreme speed and power. He would win this race. He was sure of it.

 

Each driver sat in breathless anticipation as the judge counted down to zero, and watched almost in slow motion as the flags went down. But that's where anything "slow motion" utterly ended. Each of the machines engaged its own form of acceleration. The Redmond driver simply turned a valve and flooded the rocket engine with fuel. Its ignition was like an explosion of TNT, and it blasted from the line like, well, like a rocket.

 

"They're getting ahead of us," Mark complained as the NZ car's acceleration pulled him deeper into the leather.

 

"It's just a side effect of packaging," Toretto said, his pulse not quickening by a single beat. "Just be patient."

 

Without warning, the Redmond machine sputtered and fishtailed. Mark spun his head as they flew past it and left it in a wall of salty dust. He then looked back at the Exa racer, and at Jeff, his eyes riveted forward, set like flint against the Western sky.

 

"How did you -" Mark began.

 

"Know it would run out of power?" Toretto lifted one side of his mouth, "Get real."

 

"We're still ahead of the others," Mark noted pensively, glancing around toward Tara, who seemed oblivious to everything around her.

 

"It will stay that way," Toretto said simply.

 

"So that's it," said Mark, "We stay in these race positions until the end?"

 

"No, they will think the race is over soon, and make their move."

 

Suddenly Tara's car started gaining ground, like something pushing it from behind. Mark saw her pulling up behind them fast, and faster still, "She's coming. She's coming really fast."

 

"Naah, she's just changed her fuel mix. Thinks going from 55/50 to 25/50 will actually matter."

 

Mark spun toward the Exa racer, now closing the distance. "He's coming too. Are we slowing down, or are they -"

 

"Making their move," Toretto said quietly.

 

"Aren't you going to do something? They're gaining!"

 

"Let them burn out," Toretto chided as the two competitor machines passed them and gained their respective leads, "And besides, the race is won in the architecture, not the gadgets."

 

"What difference does it make if we're behind?"

 

Toretto watched as the odometer slowly ticked over, and over again. "We're almost there. Are you strapped in?"

 

"Yes, I'm strapped in, but almost where? Where is there?"

 

"There," Toretto pointed to a tinted stain in the salt flat, and watched the odometer tick over to the prescribed reading. "Here we go. Hold on."

 

"What are you doing?"

 

Toretto ignored him and pressed a switch on the dashboard. They could hear a whining mechanical noise coming from the rear as two gleaming foils slowly rose from the tail of their accelerating vehicle.

 

"What are those?"

 

"What did the Exa driver say?" Toretto reminded him. "That at these speeds, bumps count. Actually, at these speeds, what counts is stabilization."

 

"How will those make us more stable? It looks like they're slowing us down!"

 

"Brace yourself," Toretto said, and punched the second button. "Accelerators engaged."

 

In that instant, the air inside the car seemed to grow thin, and the air around them seemed to change radically, buffeting the racer with increasing intensity. Then Mark felt it: the pull of g-force as the acceleration pressed him deep into the leather of his chair and drained the blood from his face into the back of his head. With a whoosh-whoosh, they passed the other two cars as though they were standing still.

 

Jeff watched helplessly as the NZ racer flew past them. When he glanced down across the controls, every gauge stood at the max, pinned almost into the red line. Even if he could make it go faster, they would incur irreversible structural stress, and possibly crack apart on the flats, spinning into a million pieces. Jeff furiously spun dials and adjusted controls, attempting to squeeze just a bit more power from the machine. If he couldn't come in first, second place would have to do. Jeff now cursed his own racer as it entered the NZ racer's dust trail. His investors would be livid.


Tara furiously slammed her palm into the steering wheel, repeatedly cursing as the NZ car disappeared into the distance. Switching her fuel mixture from 55/50 to 25/50 had made her car lighter and more agile, but had not offered the additional speed. At least, not that kind of speed.


As the NZ racer crossed the sound barrier, something rushed toward both their cars: a shockwave that ripped up the surface of the salt flat and met them head-on. The Terra car was more stable, so the wave simply bounced its wheels. The Exa car was not so lucky. When the shockwave hit, the passengers heard the sonic boom before they felt it lift the racer's front end and flip it backwards into a barrel-roll as it tried to find its footing again. Its back wheels landed first, then the front, causing the back wheels to lift off again, then the front, the car rocking violently back and forth at least five times before the right front tire blew out and sent the vehicle into a wild spin.

 

Jeff could hear and feel the car's structure releasing and popping from the stress. At this speed and rate of rotation, the Exa-racer's uncontrolled spin would rapidly develop enough centrifugal force to turn human brains to scrambled eggs. Jeff felt the red-out coming as an automatic release triggered and both ejection seats activated, separately catapulting the two occupants hundreds of feet into the air. Their parachutes deployed at the apex, and Jeff watched his car disintegrate on the salt flat.

 

Jeff lifted his gaze into the West, watching the NZ car disappear like a speck in the wake of its own shockwave, churning up the ground behind it. It would likely reach the finish line before his parachute even set him down.

 

Toretto casually glanced at his rear-view mirror, watching the salt flat behind him, the ground practically corrugated in his wake. "Hmmm," he finally said, "Maybe bumps do count. Just not for us. And I don't mind giving them a bumpy ride." He settled into his seat. "No sir." And with that, he fully understood the frustrated rage building in the minds of his competitors, and soon their investors.

 

And he more fully understood the difference between being fast and being furious.


Rick Deckard wiped the sweat from his brow as he holstered his high-powered weapon. Lifting the communicator from his belt, he muttered several codes and closed the transceiver.

 

"Skin jobs," he said to himself, surveying the replicant sprawled on the floor, amazed at the technology's ability to mimic the most complex entities on earth. He softly kicked the replicant's front panel, observing the large hole his weapon had blasted through the technology's logo. The surviving half of the "T" and the "ata" told him he'd scored big. Another wannabe down for the count.

 

His communicator buzzed for attention. He lifted it, beeped-in and said "Deckard" like he really didn't want to be bothered, but knew such sentiments were useless. Apparently more replicants were on the prowl, having stolen their way into enterprises with myopic POCs, NDAs and a variety of other three-letter-acronyms. He so longed to go Solo.

 

"We've spotted another one," said the dispatcher on the other end, "People are dying."

 

"Dying?" Deckard raised an eyebrow. "That's new."

 

"Dying to get their jobs back after a misfired deployment with a replicant," said the dispatcher, "Get with the program Deckard. You were called from retirement, but you can't be this rusty. Not with this much at stake."

 

"You wanna come out here and be my backup?" Deckard shot back, irritated, "It's easy to criticize from behind a desk."

 

"Keep on talkin'," laughed the dispatcher, "But the day's slippin' by - and so will your replicant if you don't get on the stick."

 

"Yeah, yeah, whatever," Deckard beeped out, sighed and replaced the communicator. The steam rising from the replicant's body reminded him of why his work was important. Stolen money. Stolen dreams.

 

Less than fifteen minutes later, Deckard found himself crouching behind a stack of crates, one eye on the replicant and one eye on his pistol as he wrested it from its holster. Time was, he could draw, shoot and replace it before a replicant could take one mechanical breath. Now, countless CPU clocks dishonored his rustiness, and he needed a new weapon if he ever intended to win.

 

Too late he realized that he'd spent too much time fiddling with the pistol, and upon looking up, found the replicant nowhere in sight. In that moment, he felt the replicant's mechanical breath on the back of his neck, and he whirled to confront it.

 

"Deckard!" shouted the replicant as he delivered a hard backfist that sent Deckard reeling over the crates to fall hard on the other side. "You should never have returned! You know I can't be beaten in a toe-to-toe comparison!" He then split the crates apart and tossed them to each side.

 

Deckard had already reached for his pistol, but it had been just loose enough to fall from the holster when the replicant ambushed him. As he glanced around feverishly, fear rose in his throat; the replicant took one step forward, grabbed him by the shirt and shook him once. He pulled his fist back and Deckard heard it hitch, meaning that some special spring had latched in preparation for release. If the replicant's fist now threw a punch, the impact would take his head clean off his shoulders.

 

"Sleep tight," said the replicant wickedly.

 

But the punch never came. Instead the replicant's eyes widened, his breath shortened and his strength seemed to instantly leave his body. He dropped Deckard like a sack of potatoes, and Deckard wasted no time in scrambling clear. The replicant fell to his knees with a bone-crunching impact, his eyes vacuous, and fell forward with a whump.

 

Deckard glanced around for his weapon, only to be met face to face with another, much younger Blade Runner, holding a smoking weapon, clearly more advanced than his own.

 

"I'm TwinFin," said the Blade Runner meekly, pointing to the twitching mass that was the replicant. "I see you've just run across a more advanced model than you're accustomed to."

 

"Stronger than before," Deckard rasped, wiping the sweat from his face with both hands. "It's been a while."

 

"Yes," he said, "This one's name is A-Data. He is the most advanced of his kind. A front-loader and high-volume storage capability. Also fast response. Almost as fast as yours, even with age."

 

"Thanks," Deckard responded flatly, unamused, "A-Data, eh?" he smirked, tapping the replicant's leg with his foot, "Well, now he's just an ex A-Data."

 

"True," smiled TwinFin, "But you'll need more power if you want to stay ahead of them," he held out his weapon, a POC-killer if ever Deckard had seen one. On the weapon's barrel, in old-Gothic script, he read the weapon's name "The Closer."

 

"Nice," Deckard quipped.

 

TwinFin suddenly produced an auto-ject unit with the "enzee" logo emblazoned on it, snatched Deckard's hand, and before Deckard could object, injected the enzee accelerant into Deckard's bloodstream.

 

"What the?" Deckard now snatched his hand back, but suddenly felt the chemical's surge of power, "What's in that stuff?"

 

"Secret sauce," TwinFin smiled. "Your response will be five-X faster than theirs, or more. Your next replicant will go down for the count before the count even begins."

 

"Tight."

 

"You have no idea," he smiled, "And by the way, I'll be right behind you."

 

"I hear some of them are looking for their makers," Deckard posited.

 

"Wouldn't you?" TwinFin said, "I'd sure wonder why I was made that way. Changed from one purpose to another in the middle of my cycle."

 

"I wonder if anyone has noticed that the replicants are always trying to be like us?"

 

"It's because we're the only standard they know, by which they are measured."


"I also wonder," mused Deckard, "If these replicants dream of electric customers."


"Blade?" Hannibal King touched the sleeping warrior gently on the shoulder, "Wake up, dude."

 

Blade raised one eyebrow, then slowly opened his left eye. Unafraid of the day or night, the warrior moved his hand ever so slightly to verify the presence of his sword. King could see the tautness of Blade's shoulder sinews as he slowly shifted his weight on the pallet.

 

"This had better be good," Blade rasped. "I was in the middle of a dream. Kickin' bloodsucker tail." He wiped his hand over his face as though it would wipe away the sleep from his eyes, or the fatigue in his body, but it did neither.

 

"We have some news," King said with a low voice, "The upgrades have arrived."

 

Blade's other eye slowly opened, "Oh?"

 

"Yeah," King laughed, "You're gonna like it."

 

"I'll be there in five," Blade said, half of him wanting to roll over and sleep, and half of him curious about the upgrades. Blade always had a half-and-half approach to life. The bloodsuckers hated him for it.

 

A number of minutes later, the warrior strolled slowly into the main atrium of his personal lair, only to find it strewn with boxes, styrofoam and bubble wrap, "What's all this mess?" he rasped.

 

King appeared from behind one of the largest boxes, a vertical package over eight feet tall, holding a swatch of bubble wrap, "Don't you just love this stuff?" he quipped, violently popping several dozen bubbles with vigorous manipulation.

 

"Stop that!" Blade commanded, ever-despising King's cheeky nature, "Tell me what all this is."

 

"All this," King pointed to a far wall where the apparatus had been installed, "is just for you. At your service."

 

"Blade servers, eh?" Blade took two short steps toward the machines, "What does it do?"

 

"Only slices, dices and makes Julie-Anne cry!" King cackled.

 

Blade was not amused.

 

"Okay, seriously," King began. "Remember how some of our - er - clients had run-ins with the bloodsuckers? Their real problem was that they were working with too little information. Or it was inaccurate, or not arriving in time. So the BI bloodsuckers swooped in to save the day."

 

"I hate bloodsuckers," Blade seethed.

 

"Oookay, so they fell prey to the wiles of the bloodsuckers, promising a better mousetrap and all that."

 

"They always promise."

 

"Moving right along, they promise but don't deliver. Here's where we come in, and help them get on the right track."

 

"How do these machines do that?"

 

"The Blade servers include a special sauce - "

 

"Special sauce. Is it red?"

 

"Uhh, no. But it's all painted in your favorite color. The better part is that you can use this machinery during the day to find opportunities, and still let it work at night, you know, when you're - uh - out."

 

"Hunting bloodsuckers."

 

"Uhh, yeah, so let's focus here. The new server has a special accelerator that basically lights up the night."

 

"Is it ultra-violet light?"

 

"No, but it's ultra-clear light. The kind of light we need to shine on business priorities, SLAs and how to leverage the machine at the enterprise level. You know, best practices."

 

"I don't need any practice. When the sun goes down - "

 

"Okay, look," King interrupted, "The accelerator sits on the blade and does all the analytic streaming work. The server then allows for cache RAM to sit between the disk drives and the processor, so we can keep stuff in memory longer."

 

"I have a long memory for bloodsuckers."

 

"And some clients," King rolled his eyes, "May need long memory for lookup tables, oft-used dimensions and the like."

 

"Are you starting all that other-dimension talk again? I thought I'd made a deal with Stan that we would never introduce - "

 

"No, not alternative dimensions in spacetime," King smirked, "But multidimensional analysis."

 

"I don't follow."

 

"Data analysis."

 

"To what purpose? What are we looking for?"

 

King thought about the question for a moment, realizing that the answer could capture Blade's attention or lose him forever. He finally said, "Bloodsuckers."

 

Blade's eyes flashed, "If this will help us find the bloodsuckers, why do we only have one? Why not more?"

 

"Now, now, we should start small and grow tall - "

 

"Platitudes," Blade huffed, "Time is short. Will it find the bloodsuckers or not?"

 

King knew that when he said bloodsuckers, he'd meant the broken processes and data that drain the lifeblood from a company, "Yes, it can help us find them."

 

"Good," Blade finally said, slowly strolling toward the machines. He stared at them for a long moment and finally said, "You work for me, now."

 

"Uhh, Blade," King said, "They can't hear you, they're machines."

 

Blade didn't say anything.

 

"Oh, and I have this," King produced a small metal plate and held it out to Blade.

 

The warrior turned and stared at the object, curious as to its nature. "And this?"

 

"Is a Final Interrogation Node," King said, "For use when you are about to dispatch a bloodsucker."

 

"How does it work?"

 

"You wrap the wrist-strap here." He applied the strap to his own wrist, held the plate in his hand, then flicked his wrist. The plate flew to the nearest stone column, remaining connected to King's wrist by a tether of high-tensile filament. The plate sank into the stone with a dull rrrriiiiinggg. King then flicked his wrist again and the plate dismounted, the tension in the tether returning it immediately to his open palm.

 

"That was fun, but what does it do, really?"

 

"When you're done asking questions that anyone can get answers for, the FIN takes it to the next level. And if you have one in each hand - "

 

"Twin Fins, very funny."

 

"You'll still get the answers you're looking for."

 

"I'll always get the answer I want eventually."

 

"Uhh, well, isn't that what the bloodsuckers say? Anyone can give the right answer slow. But these," he held up the FINs, "get the right answers faster than anything."

 

"Even faster than me?"

 

"Faster than Blade alone," King smiled. "Yep, even faster than a blade and all its servers. You still need the FINs and special sauce. Bloodsuckers don't have those."

 

"Competitive advantage," Blade said in a low whisper, "I like it."


When I was a young lad, my Dad purchased a 1946 Willys Jeep. For any of you who are Jeep aficionados, you know that this is a direct, post-war Jeep complete with a starter button (war Jeeps didn't have car keys) and four-wheel shift gears. Dad had this thing refitted with a power take-off (a rear gear for attaching appliances) and had purchased a bush-hog to attach to it. Off my Dad went on our property, Jeep in full tilt and bush-hog in tow, slicing and dicing bushes and small trees from our property like a veteran landscape engineer.

 

One day the trailer hitch had an issue - the towing ball had somehow become bent and needed replacement. Yes, Dad worked these machines to their extreme. Now, if you feel a bit out of place with all these odd terms, imagine my hubris in thinking I knew everything about them just by watching my Dad work with them from the sidelines.

 

In any case, he took the Jeep in to a shop to get the thing fixed, and this mechanic started working on the trailer hitch to loosen it up. Strange thing, though, he was turning the bolt clockwise to get it undone. And everyone knows that in order to undo a bolt, you turn it counterclockwise, right? Of course, those in Australia and Brazil might not turn it this way, but that's an inside joke, too. So I quipped, "You're turning it the wrong way."

 

To which this mechanic, willing to engage an uppity kid while my Dad just offered me a hot stare, simply replied, "Are you sure?" To which I responded, thinking that the mechanic actually thought I was a viable entity, "Yep, I'm sure." To which the mechanic said, "You want to bet ten dollars on it?" To which I immediately responded, thinking easy money, "You bet."

 

At this point my Dad simply leaned into me and said the words I would never forget, even to this day, as I share them with you.

 

"Never bet on the other man's game."

 

This initially had a hollow ring, considering that I was on the brink of winning ten dollars, but in that moment the mechanic wrested the object free from its mooring in spite of having turned it the wrong way all that time. And I learned something new, that some devices actually do unscrew in a clockwise direction. Lesson learned, and I did not lose ten dollars. The mechanic was merciful.

 

Licking my wounds and grateful to have dodged a bullet, I gained a new appreciation of knowledge, learned in a simple way: the other man's game is something to approach with high trepidation and respect. If it really is the other man's game, he knows it better than I do, so what business do I have betting on it? It's a sucker bet at best.

 

So it is with the appliance wannabes who have attempted to bet on Netezza's game. They bet that the appliance is the way to go, and have invested many millions of dollars attempting to topple Netezza, or at least steal its market share. But this is yet another case of betting on the other man's game, and nobody knows this game better than Netezza.

 

And now Netezza has changed the game, leaving the competition in the dust to once again lick its wounds and wonder why it ever bet on the other man's game, and what game it is really in now.

 

The new Netezza architecture has upped the ante on the existing game and moved it in a direction that, in no uncertain terms, changes both the game and the stakes required to play it.

 

Apart from browsing the white papers and gathering your own general specification insights into the environment, I can say as a veteran who has worked with this technology extensively that I had a short wish list of things that would be really nice to have. I also had a short list of what I thought were functional shortcomings, which I had found simple workarounds for and could painlessly ignore. With the new architecture, those few shortcomings are washed away. The wish list was fulfilled, and so much more. And in the end, I am a happy clam.

 

On the short runway of things I am looking forward to: the capacity to cache whole tables, Linux on the lower deck, the Intel-programmability of the parallel environment, and the additional capacity in both storage and processing power. And these are just a few of my favorite things.

 

Once upon a time, I worked with real-time engines for embedded systems, and was enamored with one software vendor's ability to stay ahead of the pack by simply assimilating the innovations of its competitors. One can imagine that once a vendor is out in front, it can maintain its position through this assimilation process. If it is not out in front, assimilating other vendors' innovations doesn't have the same impact, because it only keeps the vendor level with the pack.

 

That Netezza can take the innovations of other (major) vendors such as IBM and leverage them through simple assimilation, is yet another testimony to Netezza's position as the well-in-front frontrunner. While other vendors attempt to duplicate or imitate, Netezza just moves on, changes the game and leaves them in the dust. Innovations from the vendor remain ensconced (and enhanced) in the new architecture, while other technologies are easily assimilated. That this has given the architecture a stratospheric boost is a testimony to the original architects and visionaries, as well as the existing ones.

 

All that's a lot of gushy sentiment, though, compared to the tailspins the wannabe competitors have been in since they first got news that the winds were changing. I could use a lot of sailor/sailing analogies here, but I'll spare you. The fact remains, the competitors are scrambling all-hands-on-deck to reset their goal for market share they never really achieved. Could this mean that they are sunk altogether and don't know it yet? Who has a crystal ball? Then again, we could pump these quantities into the Netezza architecture and get an answer back faster than they could.

 

Right answer faster: Priceless.


I heard this sentiment from a senior manager at a large-scale data processing facility, so I thought I'd post it as a provocative talking point. In his mind, when something went really south in the scheme of things, he had to evaluate people as to whether they were incompetent or immoral. Or something in between. You never know what a manager is thinking, apparently.

 

You see, in his mind, he needed a means to label a person rather than an activity. On the other hand, I like the sentiment of another famous consultant, who when asked how things could get so bad, would simply quip "Because honest, hard-working people did the best they could with what they had." Hmm - there's no incompetence or immorality there, just the realization that things can and do go wrong. Case in point, last year we had a project where the workload grossly exceeded the headcount to make-it-happen. With a thousand spinning plates in the air, and not enough people to keep them spinning, invariably a plate would fall to the floor and crash. The manager above would call a meeting and review why the plate crashed, and find someone to blame for it. But in the end, the plate crashed because it's what plates do.

 

Debating the "why" is a waste of time.

 

We see this when the "critical mass" exists to switch horses, sometimes in midstream, from a powerhouse, legacy and mainstay kind of technology to a new, shining future in another, more promising technology. Ahh - you see where this leads. Someone now has turf to protect, and the review of a new technology - or even the hint of replacement - is viewed as an indictment of the existing technology. And, you guessed it, an indictment of the existing technologists. Because of managers like the one above, the people in the mix begin to wonder if they are being labeled as incompetent, for defending an inadequate technology, or immoral, for having another motive, like defending the technology because it will help keep their job, regardless of whether it is the best choice for their company or team.

 

And if they perceive this labeling, they too will fire off their own labels, and soon we see the makings of a classic conflict. I spoke with a leader who had just weathered such a conflict, and he said he couldn't believe how quickly his seemingly objective, science-minded technologists were reduced to feral animals, practically overnight. He didn't really have to muscle through the process like some other extreme cases, but it is important to note that the conflict is real. The drama brings out interesting colors in people, and shows us what they are made of. And like the second consultant above, it's usually not bad stuff. Just human stuff.

 

People who make an investment in one technology find themselves with an emotional and professional attachment to it. Like hanging on to a stock even when it's in free-fall, hope springs eternal: our investment isn't really for nought, if we can just wait it out. My challenge to the average Joe out there is to do what you've been doing: stay the course and keep the high road. Bad-mouthing the existing technology, or the existing people running the technology, is not a profitable path. When we think about it, the "Enzee way" is to let the machine's power and architecture speak for itself. After all, if we have to resort to the same nefarious activities as a wannabe competitor, doesn't it speak volumes about what we really think of our favored son?

 

Some time ago, I was helping with a competitive POC, and when we finally reported the metrics for loading, query and whatnot, we decided to show Netezza in its best possible light, and we agreed with everyone that this would be the case. We watched across the way as the competitors stayed late nights, carefully tuning their machine and its attendant parts, while we just tossed data into the Netezza box, did some basic distribution tuning and that was that. In fact, after getting some initial metrics, we took the worst times and reported on them, not the best times for the loads and queries.

 

When we reported our final numbers, our metrics blew away the competition by a factor of five or more, in some cases much more. When we told the decision-makers that we'd only spent a few hours on the POC, and even then only reported the worst-case numbers, they were stunned. Primarily because the competitive team had spent so much energy on tuning their technology, only to reach 20 percent or less of what the Netezza machine could do.

 

And doesn't this kind of story speak volumes, and show Netezza in the best possible light? After all, it's not really a good story to tell if we have to spend countless hours tuning the machine. The decision-makers know that for the sake of competition, we might spend a lot of time to "get that benchmark", but it will be the only time they ever see that benchmarked metric, because they know we won't sustain that kind of intensity once we've actually deployed the technology.

 

I saw a "famous" benchmark on the internet, touted by vendors other than Netezza using technologies that were carefully tuned for the outcome. You know, like an Olympic athlete trains the daylights out of his body to get that one-shining-moment. But catch up with the same athlete years later and find them out-of-the-game, no longer the feared competitor for one primary reason - they can meet the bar once. But they can't sustain it. And this is true of the famous benchmark. They can tune the daylights out of those technologies, and take them to new, never-known heights, and break world records. But if you really want to deploy these at your site, you'll get the standard disclaimer.

 

Your actual mileage may vary.

 

This variability is not found in the Netezza experience. The machine delivers the kind of sheer power and turnaround we need just by breaking the plastic and plugging it in. We don't have to spend countless hours tuning the machine under a hot lamp and after gallons of Red Bull. Power - effortless power - at our fingertips - really is the best possible light for showcasing what the machine can do. One of the customer decision makers said just that - if it requires a swarm of people to get the competitive technology to remotely the same level as a Netezza machine can reach just by powering-on, what kind of story does this tell? That we are committing to high-intensity deployment and maintenance for the life of the technology?

 

Cost of ownership has a lot of different meanings, no?


On a more recent project, we did the common "light" organization of the data, and then the report developers cut their BI tool loose on it. When the smoke cleared, the report turnaround times were abysmal. Some reports executed in minutes, some in tens of minutes, and some never came back at all. Then the finger-pointing started, and the reporting team could not lay enough blame at the foot of the Netezza machine. But soft, what light through yonder window breaks? It is the Marlboro Man, here to carefully show the reporting gurus that the Netezza machine is not an SMP/RDBMS machine, and that it needs a few additional hints (e.g. zone map refs at the query level) to make the reports turn around at keyboard speed. Honestly, if this were any other technology, such as an SMP/RDBMS, and we encountered such abysmal turnaround, the answer really would be to fix the database: the data structures, the indexing, or even the hardware. How amazing is it that rather than "going back to formula" we can just tweak a query or two and, lo, we have stratospheric performance?

 

As it should be.
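The "zone map refs" hint pays off because of what a zone map is: for eligible columns, the appliance records the minimum and maximum value in each storage extent, so a restrictive filter on such a column lets the scan skip any extent whose range cannot match. The Python below is a simplified, hypothetical model of that pruning, not Netezza's implementation; the `Extent` class, the extent size, and the day-number data are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    """A hypothetical storage extent holding one column's values."""
    values: List[int]

    @property
    def lo(self) -> int:
        return min(self.values)

    @property
    def hi(self) -> int:
        return max(self.values)


def build_extents(values: List[int], extent_size: int) -> List[Extent]:
    """Split a column into fixed-size extents (a stand-in for disk extents)."""
    return [Extent(values[i:i + extent_size]) for i in range(0, len(values), extent_size)]


def scan_between(extents: List[Extent], low: int, high: int):
    """Return matching values and how many extents were actually read.

    Extents whose [lo, hi] range cannot overlap [low, high] are skipped,
    which is the essence of zone-map pruning in this toy model.
    """
    matches, extents_read = [], 0
    for ext in extents:
        if ext.hi < low or ext.lo > high:
            continue  # the min/max says nothing here can match: skip the extent
        extents_read += 1
        matches.extend(v for v in ext.values if low <= v <= high)
    return matches, extents_read


if __name__ == "__main__":
    # Data loaded in date order (day numbers), so each extent covers a narrow range.
    day_numbers = list(range(1, 3651))  # roughly ten years of daily rows
    extents = build_extents(day_numbers, extent_size=100)
    rows, read = scan_between(extents, 3600, 3650)
    print(len(rows), "rows from", read, "of", len(extents), "extents")
```

Because the sample data is loaded in order, the filter touches only 2 of the 37 extents. Shuffle `day_numbers` before building the extents and nearly every extent's min/max range spans the filter, so almost nothing is skipped; a filter on a well-ordered column is what lets the hint do its work.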


There is a temptation, you see, to protect the turf one loves so well by telling a story that does not square with reality. It's no different than claiming our toaster or blender doesn't work in our kitchen when we aren't using them as described in the owner's manual. Netezza is an appliance. It has measurable, deterministic behavior and does not deviate from its prescribed, self-contained nature. When someone claims that a kitchen toaster doesn't work, we only have to ask a few simple questions to determine whether it really doesn't work, or whether it's just not being used correctly.

 

In our case, the Netezza machine is a more complex horse. But the interface to the horse is still the same: a pair of reins and a pat on the neck, and the horse behaves just as we expect it to. Of course, getting four hundred horses to behave the same way in lockstep is a matter of architecture. But imagine how much work you could get done if you could harness four hundred horsepower for useful work. It's the difference between a 32-horsepower (SMP/RDBMS) oatmeal-mobile and a 400-horsepower street machine with an e-brake for those drifting stunts.

 

Yeah-man, give me the street machine any day.


Now here's a luxury we don't see every day. After all, if we're a car manufacturer, or perhaps an aircraft builder, we have to get it right and make it fast, both as part of the original design.

 

Which is not to say that we should just throw together a bunch of data structures and expect Netezza to cover our backs. Oh wait, perhaps we really can do this, but it's not always practical.

 

Early in my career I worked with embedded/real-time systems, and while some people really believe they work in "real time", here's the purist definition: a robot balancing a broom on its open palm. The robot has to make infinitesimal adjustments, in real time at the microsecond level, to keep the broom from falling. In "business" real time, however, we have the luxury of whole seconds to make a decision!

 

In these embedded systems, we had to take things through to a complete functional shakeout. Only then could we see where events collided or didn't make sense. So-called race conditions and metastable conditions (for those of you who know what these things mean) can cause you to go gray early and stay that way.

 

Ahem.

 

So the maxim here was always "get it right, and then make it fast". I took this approach, "RTI" (a little three-letter acronym, or TLA, for Run-Time Improvement), into a commercial venture where I developed an expert system engine for medical claims processing. Only when the system was behaving correctly could we pinpoint the hot spots. In one case, one percent of the claims took over 90 percent of the processing. We dug deep to find the issue, optimized for it and, voila, the system screamed like a banshee. In fact, these improvements boosted processing speed even in the areas that were already fast, so it was a double win.
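Finding that kind of skew is mostly a matter of measuring once the system behaves correctly. A minimal Python sketch, assuming per-claim processing times have already been captured; the function name `top_share` and the timings are hypothetical, chosen only to mirror the one-percent/90-percent pattern described above:

```python
from typing import Sequence


def top_share(durations: Sequence[float], top_fraction: float) -> float:
    """Fraction of total processing time consumed by the slowest `top_fraction` of items."""
    if not durations:
        raise ValueError("no durations to analyze")
    ordered = sorted(durations, reverse=True)          # slowest items first
    k = max(1, round(len(ordered) * top_fraction))     # always look at least at one item
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    # Hypothetical timings: 990 claims at 0.01 s each, 10 claims at 10 s each.
    timings = [0.01] * 990 + [10.0] * 10
    print(f"slowest 1% of claims: {top_share(timings, 0.01):.1%} of total time")
```

In this made-up sample, the slowest 10 of 1,000 claims account for about 91 percent of the total time, which points the optimization effort squarely at those ten.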

 

People asked us at the time why we allowed development to "suffer" with "poor" performance until the very end, when we knew full well that attempting to optimize while the system was still in functional flux didn't make any sense. Sweeping improvements can go only so far and only so deep. And what if we spent all our time improving something that, three weeks later, the client says they no longer want, or they decide to go in another direction, and all our improvements are for naught?

 

Nay, wait a little longer - make it right, then make it fast.

 

Fortunately - and quite luxuriously - in our space the Netezza platform dovetails directly into this approach. It already has all the juice to help us succeed even if we do things badly or inefficiently. But when the smoke clears and we have a working prototype, it's time to roll up sleeves and pop-the-hood, so to speak, to do some RTI on a working system. This is where it gets fun.

 

The following short list is by no means comprehensive:

 

(A) Line up the operations in their natural sequence. Find the longer-running ones and optimize them. This will provide some relief, but it's still just low-hanging fruit.

 

(B) Locate in-line calculations in the where-clause or join-clause, and reduce or eliminate them. In one case, we needed to join on a date-time where the "drift" allowed the timestamps to differ by plus-or-minus three seconds. Rather than put the time+3 and time-3 calculations in the actual query, we precalculated two additional columns, one with the time minus three seconds and one with the time plus three seconds. We then used a simple "between" operator to get the answer. The result: a 1000:1 difference between the two run times. In-line calculations in the where-clause are an invisible power drain. Get them outta there.

 

(C) Find the processing patterns and consolidate them. For example, we had a BI tool executing eight queries to produce a report, and each query took approximately 20 seconds. Not bad for having to plumb the equivalent of 53 terabytes of information. Yet eight such queries delayed the report's display by 160 seconds, almost three minutes. Users don't like to wait that long. So to optimize, we focused on the pattern: the same basic query sat at the core of each of the eight. By performing the "hard part" in a single up-front query that did all the heavy lifting at once, we took the 20-second penalty only once, and the eight downstream queries each returned in 4 seconds or less. So: 160 seconds down to 34 seconds with a simple logistical change.

 

(D) Pre-filter or precalculate when consolidating. This means taking a common downstream operation, especially a repetitive one, and moving it upstream into another, even unrelated, operation. The time-drift columns above are one example. Calculate and filter as early as possible, and all the downstream operations benefit. This offers more than a "spot" boost: if it shaves a few seconds off every downstream query, the savings quickly add up to tens of minutes, even hours, off the overall processing time.

 

(E) Mind the gap between what we understand about how an RDBMS works and how Netezza works. For example, if we want to leverage filter power in Netezza, we could use a "where exists" clause rather than a regular join. If the regular join cannot leverage a distribution key, the where-exists is a highly performant option. The same applies if the join includes a view that does more than just serve up data, such as performing a sub-join of its own. This can be very costly, another hidden drain on performance that we can pull into a where-exists. Another "gap" is merging two data sets that have no common distribution key. The where-exists and similar operations force the machine to follow our optimizations, because we actually know the data, whereas the machine simply exchanges it with us.

 

(F) Avoid squeezing blood from a stone. It is tempting for the adrenaline junkie in all of us to see a cool way to reduce 2 minutes to several seconds, and after a rush like this, it becomes addictive. We should not let it go to our heads. In one case of nightly batch processing, one of the client's 14-hour processes had been reduced to less than fifteen minutes. Yet some in the room still groused for more. They came up with an outrageous plan to reduce the processing to less than five minutes, but only four people on the entire planet could ever maintain it, much less enhance or improve it. At some point we have to agree that enough is enough, especially when we would be sacrificing valuable things (like extensibility, flexibility, etc.) for a few more minutes of adrenaline rush. We must resist!

 

(G) Focus on the target system, not the one you are leaving behind. I've never known anyone to move into a new home and refuse to use its appliances just because they didn't exist in the old one. Nor have I ever seen someone buy a new home and then fill it with new furniture that would only fit in their former home. Who does stuff like this? No, we should use the former system as a functional baseline (describing what it does) but not fixate on how we implemented that baseline. Our new technology gives us the ability to spread our wings and fly in ways we never could in the old environment. Example: one client had over 400 stored procedures to convert, and regarded these stored procedures as the baseline for the actual work, rather than as the baseline for the functionality alone. When re-characterized, they reduced to four flows with a handful of core operations each, all with very simple implementations. Trust me, when it takes its final form in Netezza, it will look nothing like its former self.
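The time-drift rewrite from item (B), which item (D) generalizes, can be sketched with Python's built-in `sqlite3` module standing in for the appliance. The table and column names (`feed_a`, `feed_b`, `feed_a_staged`, `ts_lo`, `ts_hi`) are hypothetical, and SQLite will not reproduce the 1000:1 speedup, which depends on how Netezza executes the query; the sketch only shows that moving the arithmetic upstream leaves the answer unchanged while the join becomes a plain `between` on stored columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical event feeds whose clocks may drift by up to 3 seconds.
cur.execute("CREATE TABLE feed_a (a_id INTEGER, ts INTEGER)")  # ts in epoch seconds
cur.execute("CREATE TABLE feed_b (b_id INTEGER, ts INTEGER)")
cur.executemany("INSERT INTO feed_a VALUES (?, ?)", [(1, 100), (2, 200), (3, 300)])
cur.executemany("INSERT INTO feed_b VALUES (?, ?)", [(10, 102), (20, 198), (30, 310)])

# Before: the +/- 3 second window is recalculated inside the join condition.
inline = cur.execute("""
    SELECT a.a_id, b.b_id
    FROM feed_a a JOIN feed_b b
      ON b.ts BETWEEN a.ts - 3 AND a.ts + 3
    ORDER BY a.a_id
""").fetchall()

# After: calculate the bounds once, upstream, while staging the data...
cur.execute("""
    CREATE TABLE feed_a_staged AS
    SELECT a_id, ts, ts - 3 AS ts_lo, ts + 3 AS ts_hi
    FROM feed_a
""")

# ...so the join itself is a plain BETWEEN on stored columns.
precalc = cur.execute("""
    SELECT a.a_id, b.b_id
    FROM feed_a_staged a JOIN feed_b b
      ON b.ts BETWEEN a.ts_lo AND a.ts_hi
    ORDER BY a.a_id
""").fetchall()

print(inline, precalc)  # identical results either way
```

On Netezza itself, the bound columns would more naturally be produced in the create-table-as-select that stages the data, as the sketch does, rather than by a later update pass.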

 

Haven't yet experienced a run-time improvement cycle, or committed time to making it happen? It's worth it - even if only for a sanity-check on an existing implementation or a proof-of-concept on an upcoming one.

 

Either way, we can reach functional closure in a fraction of the time of any other system, and once the functionality is stable (not necessarily locked down, just stable) we can find surprising and dramatic boosts with little additional effort.
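Item (E)'s advice to prefer a where-exists over a regular join rests partly on semantics: a join carries one output row per matching pair, while `WHERE EXISTS` is a semi-join that keeps each outer row at most once and only probes the inner table. A small `sqlite3` sketch with hypothetical `customer` and `activity` tables shows the difference in results; it says nothing about Netezza's planner or distribution keys, which is where the performance benefit described above comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: customers, and an activity table with many rows per customer.
cur.execute("CREATE TABLE customer (cust_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE activity (cust_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customer VALUES (?, ?)",
                [(1, "Acme"), (2, "Birch"), (3, "Cobalt")])
cur.executemany("INSERT INTO activity VALUES (?, ?)",
                [(1, 10.0), (1, 20.0), (1, 5.0), (3, 7.5)])

# A regular join returns one row per matching activity row (Acme appears 3 times),
# so using it as a filter usually needs a DISTINCT or GROUP BY afterwards.
joined = cur.execute("""
    SELECT c.cust_id, c.name
    FROM customer c JOIN activity a ON a.cust_id = c.cust_id
    ORDER BY c.cust_id
""").fetchall()

# WHERE EXISTS is a semi-join: each customer is kept at most once,
# and activity rows are only probed, never carried into the result.
filtered = cur.execute("""
    SELECT c.cust_id, c.name
    FROM customer c
    WHERE EXISTS (SELECT 1 FROM activity a WHERE a.cust_id = c.cust_id)
    ORDER BY c.cust_id
""").fetchall()

print(joined)    # [(1, 'Acme'), (1, 'Acme'), (1, 'Acme'), (3, 'Cobalt')]
print(filtered)  # [(1, 'Acme'), (3, 'Cobalt')]
```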

 

 

Make it right (functionally speaking), and then make it fast.


Before a professional visit to London last year, a friend of mine told me "Mind the Gap", and said it was something I would hear a lot. He did not mention the primary context of the phrase. It seems that in some London Tube (subway) stations, there is a significant gap between the platform and the door of the subway car. Westminster station and Kensington station even have "Mind the Gap" engraved on the platform, with a pleasant voice intoning the phrase over the PA. That gap can be as much as 8 inches, too wide to simply drag a roll-aboard into the car, and too wide to expect a small child to get it right.

 

I can say that in standing up a new Netezza environment, there is a common gap between what new users expect to see and what they actually experience. Closing this gap will not only accelerate productivity, it will also get you to closure as quickly as humanly possible, without the humans doing the heavy lifting.

 

SPUs do the work: in the NPS, the SPUs are the workhorses, not the overall machine. Push all of the work to the SPUs and avoid hitting the host with a lot of work.

 

What does this mean? Typically you will find a new user implementing the machine the same way they would implement an RDBMS, that is, thinking about the problem in a single-record-at-a-time model. It is sometimes a challenge for people to restructure their thinking into a "bulk" approach rather than a singleton approach. Here's an example:

 

Let's say I have an RDBMS stored procedure with four rules. The common approach is to open a cursor, pull one record at a time, process each entity in the context of the four rules, and then persist the results. One of our customers has a procedure that follows this protocol, and each of these 4-rule entity passes takes about 30 seconds. For 3500 entities, this can eventually add up to hours of processing as the table grows.

 

How would we solve this in Netezza?

 

We would focus the work so that each rule is applied en masse to all the records. We know that Netezza can process 3500 records in the blink of an eye. So if we simply take these rules and "stand them on their side", we get Rule 1 for 3500 records, Rule 2 for 3500 records, and so on, and we finish all the rules in less than a minute.

 

When we say - "apply rule 1 to 3500 records" - this means persisting the data to a temporary table to be consumed by Rule #2.

 

Yikes, some of you will say: temp tables? You're kidding, right? This knee-jerk reaction is expected from those who shun temp tables in the RDBMS because they are so expensive there. In Netezza space, the temp table is your friend. In fact, it is a most significant ally.
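Standing the rules on their side can be sketched with `sqlite3` as well. The four rules below are hypothetical stand-ins (the customer's actual rules aren't described), and the 3500 entities are synthetic. The first half mimics the cursor approach, one entity at a time; the second applies each rule to every row at once and persists the result in a temp table that the next rule consumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 3500 synthetic entities; amount and region values are made up.
cur.execute("CREATE TABLE entity (id INTEGER, amount INTEGER, region TEXT)")
cur.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(i, (i * 37) % 600, "EAST" if i % 2 == 0 else "WEST") for i in range(1, 3501)],
)


def apply_rules(amount: int, region: str):
    """Hypothetical four rules applied to ONE entity (the cursor-style approach)."""
    adjusted = amount + 10 if region == "EAST" else amount   # Rule 1
    capped = min(adjusted, 500)                               # Rule 2
    tier = "HIGH" if capped >= 300 else "LOW"                 # Rule 3
    flag = 1 if tier == "HIGH" and region == "WEST" else 0    # Rule 4
    return capped, tier, flag


# Record-at-a-time: fetch each entity and run all four rules on it.
row_results = [
    (ent_id, *apply_rules(amount, region))
    for ent_id, amount, region in cur.execute(
        "SELECT id, amount, region FROM entity ORDER BY id"
    ).fetchall()
]

# Set-based: each rule runs once over ALL rows and persists to a temp table
# that the next rule consumes.
for stmt in [
    """CREATE TEMP TABLE r1 AS
       SELECT id, region,
              CASE WHEN region = 'EAST' THEN amount + 10 ELSE amount END AS adjusted
       FROM entity""",
    """CREATE TEMP TABLE r2 AS
       SELECT id, region,
              CASE WHEN adjusted > 500 THEN 500 ELSE adjusted END AS capped
       FROM r1""",
    """CREATE TEMP TABLE r3 AS
       SELECT id, region, capped,
              CASE WHEN capped >= 300 THEN 'HIGH' ELSE 'LOW' END AS tier
       FROM r2""",
    """CREATE TEMP TABLE r4 AS
       SELECT id, capped, tier,
              CASE WHEN tier = 'HIGH' AND region = 'WEST' THEN 1 ELSE 0 END AS flag
       FROM r3""",
]:
    cur.execute(stmt)

set_results = cur.execute("SELECT id, capped, tier, flag FROM r4 ORDER BY id").fetchall()

print(row_results == set_results)  # both approaches agree
```

On Netezza, each create-temp-table-as-select would typically also carry a `DISTRIBUTE ON` clause so that consecutive steps stay co-located; SQLite has no equivalent, so the sketch omits it.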

 

Queries do the work? In an RDBMS setting, it is typical to see "big fat" queries that span pages and pages. The owner of this kind of query knows that the data coming off the disk had better come off only once and be written back only once, and all the work that needs to get done had better get done in transit, while the data is on the move.

 

The maxim of this approach is transactional-thinking - that "If the data is in my hands, I should do as much as possible before sending it back to storage".

 

But this maxim is anathema to a Netezza implementation, where we might see dozens upon dozens of "ELT" queries that manufacture intermediate results toward a conclusion in a fraction of the time of their "big fat" counterparts.

 

In short, when we force the query to do the work, we dogpile all of our logic into a single query. When we let the SPUs do the work, we snap apart the query into smaller, more digestible chunks, and the data never leaves the SPUs until we want to consume the final product.

 

SPU means something: in the Netezza machine, SPU stands for Snippet Processing Unit, a reminder that the machine already intends to break the SQL apart into manageable chunks called snippets. Each snippet finds a home in various parts of the architecture. What we want to make sure of is that the SPUs are getting the majority of the snippets in the query (not the host or the network fabric between SPUs), and one of the best ways to do this is to avoid dogpiling a lot of logic into a single SQL query in the expectation that Netezza will just sort it all out. Oh, Netezza will give you an answer, usually in a fraction of the time of its RDBMS counterpart.

 

But when we really want to optimize the machine, we need to think like the machine does, and this often means injecting simpler SQL statements into the machine, capturing intermediate work in parallel tables, and deriving the same conclusion in yet another fraction of the time.

 

As an example, one of our clients has a query structure like this:

 

select
    sum(a), sum(b), sum(c), ... etc.
from
    (lots of join conditions here)
where
    (lots of filter conditions here)
group by
    (lots of group-by columns here)

 

This query, executed by a BI tool, came back in about 10 minutes, because one of the tables was over 8 terabytes in size and two others were over 4 terabytes each. We had applied all the zone maps we could and had generally tuned the query for the best fit, bringing it down from a lot more than 10 minutes to 10 minutes. Yet 10 minutes still seemed like an eternity.

 

One of the problems: the joins crossed distribution boundaries. The larger tables were distributed on one key and the smaller tables on another, so bridging them was problematic.

 

The simplest fix was to divide this query into two, splitting the heavy lifting across them.

 

Query 1:

select
    (raw columns)
from
    (heavy-lifting tables)
where
    (applicable filters)
into a temp table containing only the needed subset of data,
    distributed on the join key of the Query 2 tables

Query 2:

select
    sum(raw columns)
from
    (temp table above)
    joined to the additional tables, leveraging co-location
where
    (final filters for the additional tables)

 

 

The first query executed in about 15 seconds, the second in about 5 seconds.

 

So in a single implementation, we reduced the time from 10 minutes to 20 seconds just by breaking up the "big fat query" into more digestible parts.
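Why the split pays off comes down to co-location. The appliance hashes each row to a data slice by its distribution key, and a join on that key can run locally on each SPU; a join on any other key forces rows across the network fabric. The Python below is a toy model, assuming eight slices and a simple modulo in place of the real hash; the tables and columns (`txn_id`, `cust_id`, `year`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

NUM_SLICES = 8  # assumption: eight data slices (SPUs) in this toy model


def slice_for(key: int) -> int:
    """Stand-in for the appliance's hash distribution: maps a key to a data slice."""
    return key % NUM_SLICES


def distribute(rows: List[dict], dist_col: str) -> Dict[int, List[dict]]:
    """Place each row on the slice chosen by its distribution column."""
    slices: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        slices[slice_for(row[dist_col])].append(row)
    return slices


def rows_to_move(slices: Dict[int, List[dict]], join_col: str) -> int:
    """Count rows that are not on the slice where a join on `join_col` needs them."""
    return sum(
        1
        for slice_id, rows in slices.items()
        for row in rows
        if slice_for(row[join_col]) != slice_id
    )


# Hypothetical big table: distributed on txn_id, but reports join it on cust_id.
big = [{"txn_id": t, "cust_id": t % 100, "year": 2000 + t % 10} for t in range(10_000)]
big_by_txn = distribute(big, "txn_id")

# One big query: joining the whole table on cust_id drags mismatched rows across the fabric.
moved_full = rows_to_move(big_by_txn, "cust_id")

# "Query 1": filter first, so only the subset is redistributed on the join key.
subset = [row for row in big if row["year"] == 2009]
moved_subset = rows_to_move(distribute(subset, "txn_id"), "cust_id")

# "Query 2": the temp table is distributed on cust_id, so the join is co-located.
temp_by_cust = distribute(subset, "cust_id")
moved_after = rows_to_move(temp_by_cust, "cust_id")

print(moved_full, moved_subset, moved_after)
```

In this toy model, joining the full table on `cust_id` would move 5,000 of its 10,000 rows, whereas filtering first and redistributing the 1,000-row subset moves 500 rows once, after which the join is entirely local. That is the same logistics as Query 1 feeding Query 2.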

 

 

Does it always work? In most cases, yes, emphatically so: if we approach the problem the way the machine will ultimately solve it at the SPU level, and keep the data on the SPUs for as long as possible, it works.

 

Is it always necessary? Hmm, well, you tell me. If your users are okay with a 10-minute turnaround on a query, then no, it's not necessary. What is necessary, however, is to be a proper steward of the data and of the processing resources hosting it. In practically every case, running a big fat query is inefficient and wasteful, and largely born of the transactional maxims and constraints of the RDBMS approach.

 

Unhook your brain from RDBMS-style problem solving, and get away from transactional thinking. This is bulk data processing. Everything we do should address data millions of records at a time, not one record at a time.


What's heating up about as fast as summer here in Texas is the excitement over the upcoming EnZee World Tour.

 

I am especially excited this year because I've been tapped to host/emcee the Best Practices sessions in each of the cities, which means that I'll get a front-row seat to hear how the masters of the technology ply their trade and make the Netezza machine sing.

 

After all, my fellow Enzees, you are the ones gathered 'round the grill, the ones who make it happen. The rest of us are often in awe of the inspired means and outcomes you so deftly achieve with the technology, and of how you integrate it with the technologies around you.

 

For all the questions I hear at customer sites about the basic workings of the machine, there's nothing like sharing war stories with the people who pull all those things together and instantiate an operational environment, especially when you do it by utterly eclipsing the performance of Netezza's displaced predecessor. That's where we really want to hear the lowdown on how things used to be versus how things are.

 

In many cases, I hear that once the box was in, you had an easy time making it go. Making the technology go wasn't nearly as difficult as bringing in the box, especially if you had to wheel it past the sneering eyes of doubters or political players who wanted to see it fail, or at least see it be not-so-wildly successful as current expectations might dictate.

 

But Netezza really does meet those lofty expectations, doesn't it? And one of the stories we all love to hear is that type of victory - the dark horse so to speak - championing the cause amidst the pressure of anything-but-technology. The odd thing about new, better technologies is that they are so much better than old technologies that the older technologists cannot believe their own ears. Orders-of-magnitude more power you say? Tish tosh, you must be mad.

 

So when we get into best-practice sessions, we speak of things like scanning a terabyte, or 2, or 10, and complain that our query can't seem to cross the X-number-of-seconds boundary. Seconds, mind you. People hear this and wonder what the complaint really is. After all, we can't be working with real data, because queries on terabyte-sized tables always take hours to run, or hadn't you heard?

 

I recall sitting in on a session with a group of people who honestly had money to burn. One of them complained that they could not get up to New York often enough, and every time they went, their favorite restaurant/play/whatever seemed to be oversold. Another complained about a broken drawer in his private jet, while another complained about the drafty interior of one of his summer homes. Still another said they had spent 150k on custom teak wood in their 140-foot sailboat, and had it all ripped out and replaced because it "didn't look right". Ahh, money to burn. People with a completely different list of priorities than an average Joe like me.

 

I say this for contrast, because the things we speak of as Enzees, with the power available at our fingertips in the machine, are utterly foreign to people who have never experienced that power themselves. And it's interesting, in best-practice space, when we talk about squeezing 9-hour processes into 9 minutes and then hear our business counterparts wonder whether we could squeeze out just a few more. A best-practice balancing act is getting to the solution without over-engineering it, and some of you consider this an art form.

 

So Enzees, artists and those who would kick the tires, gather 'round the grill and let's fire up those steaks, veggies and what-have-you. Then the only thing hotter than summer will be the ideas coming off the cooker.
