Blog Posts


I have arrived at my Forward Operating Base. It took a couple of weeks to get here, having left the USA on a chartered commercial flight. There is a certain pleasure in bypassing TSA altogether and boarding a plane armed to the teeth! I brought an automatic rifle, a grenade launcher, a bayonet, screwdriver, leatherman and a FULLSIZE bottle of shampoo on my flight...that's a felony stop at BOS!

 

Had a brief stay at an airbase where I lived in a huge open barracks in a tent. We have one of those tents here at the FOB. They park helicopters in it, so you can get a sense of the scale of it. Just you and ~300 other folks crammed together in bunks. They didn't give me any Marriott points either. Major downer.

 

From there we crossed into Afghanistan and stayed in Kandahar. HUGE airbase in the southwest. Rocket attack on the first night I was there, but apparently they miss the base almost every time. Even when they do hit it, the base is so spread out and sparse that they still have trouble threatening the troops...

 

Saw a little bit of everything at Kandahar. Canadian, British, French, Belgian, Italian, Romanian, Dutch and Afghan troops. A virtual airshow of the world's aircraft. A hockey game (no skates). TGIFridays (no free refills!). Want Asian style food for lunch? They had a DFAC (Dining Facility) to satisfy that hankering. Jonesin' for a croissant? Head to the Parisian style cafe. KAF is quite the place.

 

Oh, and that base smells. Bad. The prevailing winds cross over the cesspools and sewage and deliver a serious stank. I was happy to leave.

 

I am now happily ensconced at a very very small base in a desert area. Top temp last year, in the shade, was 147. Accommodations are spartan but livable (I definitely have it better than the Infantry at the moment). The showers DO have water, most of the time, and are dangerously close to being warm. We have a PX here but it's been closed for months. The terrain looks a lot like Nevada, believe it or not.

 

Food is unexceptional but tolerable. If you're not picky that is. Thought it was great when a local dignitary presented us with a goat in thanks for all the work the previous team had accomplished with him. It was OK barbecued.

 

It's a pretty rustic lifestyle here at the FOB.


Marcus Gray watched in consternation as the viral program cranked up. He knew that in moments the band of hackers would once again take over the Manhattan power grid. For now, they were doing it as a prank. But he also realized it could be a test run for something even bigger. Like a grid-by-grid shutdown of the entire system, opening the door for untold mayhem on the darkened streets.

 

Moments later, messages from the hacker gang started appearing on all their terminals. Taunting barbs letting everyone know that they were in complete control and nobody could stop them. Gray shook his head and closed his eyes, hoping that this would pass quickly. Losing power even in one part of the grid could spell pandemonium and place lives and fortunes at risk. The weight on his shoulders was crushing.

 

"I think I can help," said a voice from behind. Lane McBride, from the Federal Counter-Terrorism Unit based in Manhattan, leaned over to regard Gray's terminal.

 

Gray turned to the voice, recognizing it with hope in his eyes, and said, "They're at it again."

 

"I saw the precursors," McBride noted, "That they were entering the system."

 

"Yeah, but it doesn't matter if we can't find exactly where they are," Gray sighed, shaking his head, "They're in a hundred different buildings, including the Empire State. You guys have agents standing by at all of them, but they have to search the buildings floor-by-floor to find them. The problem is, we have to shut down communications for the building so that they can't warn each other. So even if we could catch a few, do you have any idea how long a floor-to-floor search takes in the Empire State? We can't keep that building offline from communication for that long."

 

"Not to worry," McBride grinned, "I have an algorithm that will directly pinpoint their floors. All we have to do is send our officers up to the floor, and I bet we can round them up in minutes."

 

"Wow," Gray whistled, "I'd like to see that."

 

McBride whipped out a flash stick, plugged it in and let the program do its work. Within seconds, it had pinpointed each hacker, the building their signal was coming from and the floor of the building. "Here we go."

 

"I like it," Gray grinned.

 

McBride touched several buttons on his phone and dispatched the information, and monitored as each of the officers acknowledged the information and the plan. "We'll know soon enough."

 

Gray noted, "The problem has always been that they could hear us coming and could shift floors anytime they wanted."

 

"Not this time," McBride smirked, "At least, not if we do it right."

 

The first officer to report back was from the Empire State. Two of the hackers had been stationed there on separate floors. Both were now in custody and unable to warn their cohorts in the other buildings. Gray listened in awe as one by one, the officers reported in, having captured their respective quarries with minimal effort.

 

"That was brilliant," Gray stared at the screen as the weight seemed to lift from his shoulders, "How did you come up with the algorithm?"

 

"Simple process of elimination. I just looked at the problem from a very-large-scale search. The most important information is where the perps aren't - not where they are. The algorithm zones in on the candidate floor by understanding which floors are not candidates. Process of elimination leads the way. So we can search the Empire State and Chrysler buildings just as quickly as a single-story, capture the floor number and we're done."


---------------------

Some of you already see the parallels. It's how a zone map works. But how does it apply?

 

When we take a look at the Record Distribution option in the Netezza Admin GUI, we're often happy with a "ragged edge" for all the SPUs. And a "flat top" is the ticket. But what about the case of a "Manhattan Skyline", where we have high peaks and low valleys? This is higher than normal skew (something we're supposed to avoid, right?). People see those and shun them. However, these are often the natural result of an intermediate table produced by an ELT operation, and often a result of multi-pass queries in a BI tool. These usually leverage the mainstay workhorse CTAS (Create-Table-As-Select), so in many cases, people are tempted to turn on "random" for all CTAS operations. Or just maybe - one of our regular static supporting tables is deliberately distributed as a Manhattan Skyline just because we want to regularly perform co-located joins between it and a larger master table using the same distribution key.

 

In any case, a primary reason we would get this kind of Manhattan Skyline distribution is if we are trying to preserve an existing distribution in order to perform a follow-on operation with tables on the same distribution. Whew! And why would we allow this to continue? Isn't a random distribution better than a Manhattan Skyline? Our problem remains: if the table has such a Manhattan Skyline distribution, we have higher than normal skew. Any full scan on the table will cause the query to perform as slowly as the "tallest bar" (the SPU with too much of the table's data). As the table grows in size, the problem worsens. It is not a scalable distribution in its latent form, so don't embrace one without a plan.
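To put a number on the "tallest bar", here is a small Python sketch (Python rather than Netezza SQL, so it can run anywhere). It is only a toy: a plain modulo stands in for the appliance's real hash, and the names distribute and skew_ratio are made up for illustration.

```python
def distribute(keys, num_spus):
    """Count rows per SPU; a plain modulo stands in for the real hash (assumption)."""
    rows_per_spu = [0] * num_spus
    for key in keys:
        rows_per_spu[key % num_spus] += 1
    return rows_per_spu


def skew_ratio(rows_per_spu):
    """Tallest bar divided by the average bar; 1.0 is a perfect flat top."""
    average = sum(rows_per_spu) / len(rows_per_spu)
    return max(rows_per_spu) / average


# Flat top: 100,000 distinct keys spread over 8 SPUs
flat_top = distribute(range(100_000), num_spus=8)

# Manhattan Skyline: the same row count, but only 5 distinct key values
skyline = distribute([k % 5 for k in range(100_000)], num_spus=8)

print(flat_top, skew_ratio(flat_top))
print(skyline, skew_ratio(skyline))
```

If we assume scan time is roughly proportional to rows per SPU, the skew ratio is a rough multiplier on full-scan time, since the scan is done only when the busiest SPU is done.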

 

Well, random distribution has a risk too, especially at the BI level, of negatively affecting concurrency performance. Even if our individual queries are not hindered by the data-broadcast incurred by the random distribution, they could just be a one-hit-wonder, because running many of these operations side-by-side can sometimes saturate the inter-SPU fabric, affecting concurrency. If we can keep the processing on the SPUs, we can avoid this problem entirely. So the issue is one of user scalability, something that all of us care about and that the other vendors (sometimes) turn a blind eye to. Netezza has it covered, and as usual, it's so simple a cave man could do it (now I'll get mail!)

 

So now we have two options, neither of which seems good - (a) keep the Manhattan Skyline distribution or (b) use a random one. Let me say that random is not always bad, but it poses a potential danger for concurrency. Likewise the Manhattan Skyline can often be a latent result of an intermediate CTAS so is unavoidable anyhow. And why would we want to preserve an existing distribution on a CTAS? The answer - because it will be a co-located write (blazingly fast). But wait! Don't we get a co-located write by default?

 

Maybe.

 

I have noted in prior posts how the default distribution for a CTAS might not be what we want or expected, so here's a quick recap:

 

(a) For simple single-table CTAS, it will preserve the source distribution key - (co-located write)

(b) For simple multi-table-join CTAS, it will leverage the first column result in the "select" clause (maybe a co-located write)

(c) For CTAS using summaries/group functions in the select, it will leverage the columns in the "group-by" clause (rarely a co-located write)

 

If any of the above are not the original distribution of the source(s), we could inadvertently sacrifice our co-located write. But we can preserve it if we specifically use "distribute on" with the CTAS execution. With co-located writes, this means the data never leaves the SPUs. If we distribute the CTAS on anything else, the data must leave its current SPU and find its way to another one. This initiates a data broadcast (and can negatively affect concurrency). Preserving the distribution, we get the benefit of a co-located write (avoiding broadcast to make the table) and set up the next operation for a co-located read (also avoiding the broadcast to leverage the table). Short answer: preserving the distribution preserves concurrency performance. Now the SPUs are working for us at physics-speed.
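A quick way to picture the difference is to count how many rows would have to change SPUs. The Python sketch below is a simulation under loud assumptions: modulo stands in for the real hash, and the columns customer_id and store_id, along with the helper names, are hypothetical.

```python
def spu_for(value, num_spus):
    """Home SPU for a key value; modulo is a stand-in for the real hash (assumption)."""
    return value % num_spus


def rows_moved(rows, source_key, target_key, num_spus):
    """Count rows whose home SPU changes when the new table is distributed on target_key."""
    return sum(
        1
        for row in rows
        if spu_for(row[source_key], num_spus) != spu_for(row[target_key], num_spus)
    )


# Hypothetical source table, currently distributed on customer_id
rows = [{"customer_id": i, "store_id": i // 7} for i in range(10_000)]

# CTAS with "distribute on (customer_id)": every row stays where it already lives
co_located = rows_moved(rows, "customer_id", "customer_id", num_spus=8)

# CTAS distributed on store_id instead: most rows must travel to another SPU
redistributed = rows_moved(rows, "customer_id", "store_id", num_spus=8)

print(co_located, redistributed)
```

The first count is zero by construction, which is the whole point of a co-located write. The second is the traffic that has to cross the fabric, and it is that traffic, multiplied by many users, that threatens concurrency.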

 

Rather than just live with the latent effects, let's embrace and harness them for the good of all mankind. Well - er - at least for our user base.

 

What we really want is threefold -

 

(1) preserve the distribution with a co-located write (preserve concurrency, potential Manhattan Skyline as latent artifact)
(2) leverage the result with a co-located read (preserve concurrency, potential penalty from Manhattan Skyline)
(3) mitigate the Manhattan Skyline with a zone map (ahh, best of all worlds)

 

So to get the first two, we can simply preserve the distribution with a "distribute on (key)" clause and make sure the distribution key is part of the "where/join" operations. This is the simple part.

 

To get the third, we need to either (a) sort the data as it is created, or (b) make a materialized view after-the-fact to get the zone map effect for selected columns. The first one (sorting) is often easier than it sounds, and with strongly filtered intermediate tables is also very scalable. The second one (materialized view) has some caveats but is very fast to create. What does the zone map actually do? It effectively stripes each SPU's portion of the table so that only the section in the zone is actually addressed. Like McBride's algorithm, it's as though the rest of the data isn't even there, because the zone map has guided the optimizer to completely ignore it. So whether the SPU's data has a tall bar or a short bar, the performance is the same. We need all three of the above and the zone map mitigates the potential problem of unexpectedly high skew from an intermediate distribution - or an outlier table that we need to distribute on a common key. Even if (1) and (2) above give us a good distribution today, it could always "go Manhattan" in the future.
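The process-of-elimination idea can be sketched in a few lines of Python. This is a toy model, not Netezza's actual storage format: it assumes the zone map is simply a (min, max) pair per fixed-size extent, and build_zone_map and extents_to_scan are invented names.

```python
def build_zone_map(values, extent_size):
    """Record (min, max) for each fixed-size extent of a column as stored on disk."""
    return [
        (min(values[i:i + extent_size]), max(values[i:i + extent_size]))
        for i in range(0, len(values), extent_size)
    ]


def extents_to_scan(zone_map, wanted):
    """Process of elimination: keep only extents whose range could hold the value."""
    return [idx for idx, (low, high) in enumerate(zone_map) if low <= wanted <= high]


# One SPU's slice of a column: 10,000 rows, store ids 0..99 interleaved
unsorted_column = [i % 100 for i in range(10_000)]
# The same rows written in sorted order (the "order by" on the CTAS)
sorted_column = sorted(unsorted_column)

unsorted_hits = extents_to_scan(build_zone_map(unsorted_column, 500), wanted=6)
sorted_hits = extents_to_scan(build_zone_map(sorted_column, 500), wanted=6)

print(len(unsorted_hits), "extents scanned without sorting")
print(len(sorted_hits), "extent scanned after sorting")
```

In this toy case the unsorted slice cannot rule out any extent, because every extent spans the full range of store ids. After sorting, the map knows where the value is not, so the height of the bar stops mattering.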

 

Another obvious question is "If this is an intermediate result, why bother? Just filter out the stuff I don't want and then there's no issue, right?" Well, technically yes, for a single operation, but I know of at least a dozen cases where the intermediate table is used for a lot of downstream activity, not just a one-off throwaway. So our stewardship rule is: make the data better. For the next downstream process or the ultimate data consumer, the data should get better every time we touch it.

 

Rather than rewrite or re-design a carefully tested and detailed process, adding a simple "order by" or MV is easy and preserves the existing logic, and data model, with little impact and high return. This is especially true of a static supporting table, because we can install what we need on the table's creation. The consuming processes all benefit from it with no more than regular query execution (materialized views are transparent).

 

In the end, we can still leverage the plain-vanilla parts of the Netezza performance model (zone maps, co-location) without having to over-engineer the data using indexes, intersection tables or summaries. This preserves something more  - the ongoing resilience and adaptability of the model itself.

 

Recap:

 

  • Apply the "distribute on" clause of the CTAS to avoid the latent effect of default distribution.
  • Preserve co-location for reads and writes in intermediate tables.
  • If a potential Manhattan Skyline distribution is the CTAS result, rather than go random, sort the CTAS result by a selected column or use a materialized view.
  • As always, apply strong filters to the CTAS creation so that it's not simply copying one table's contents to another (carve the data out).
  • Experiment for the best fit, but remember that Netezza is an appliance.
  • We don't need to engineer the queries, only apply simple performance model alignments in the data itself, to leverage the machine's physics.

Enzee Universe 2010 attendees can sign up to participate in a pre-conference Best Practices forum led by David Birmingham - author of the Netezza Underground book and Gather 'round the Grill blog, and Senior Principal Consultant at Brightlight Consulting.

During this half-day session from 8:30am to 2:30pm EST on Monday, June 21, 2010, David will share insights and best practices on the topics outlined below and then open each for discussion and Q&A.

Space is limited, so be sure to sign up for the Pre-conference Best Practices session when you register for Enzee Universe.

Topics for discussion will include:

  • Migrating an Existing Data Warehouse to Netezza
  • Super-charged Data Models
  • Holistic Performance Planning
  • Flexible Hierarchies, Dynamic Alignment, Snapshot Fact Views
  • How-to Examples
  • MViews, Views, and Aggregations
  • ELT How-to's
  • Recovery
  • Data Intake and Staging Strategies
  • Schema Switching, Replica Tables
  • Data Management, Horizontal and Vertical Partitioning, High-traffic Denormalization

Register today to hold your spot.


Training is now complete. Our tribe is formed and we are ready to perform our mission. Over these first few months of active duty, I have received specialized training on how to perform my specific job, as well as more 'general' training that allows me to work with the rest of the team when we are outside the wire. Now I'm itching to get to work.

 

I can best describe that training time as building an airplane while we flew it. Seemed like every week, new people joined the team from somewhere else, and had to integrate with the rest of the group. Of course, the group was already running at a fast pace, so the noobs had to work hard to catch up! This is where we learned what to do and what NOT to do. Or in Rumsey-speak, we learned a little about the 'known unknowns'.

 

From previous posts, it should be obvious that I've become very fond of my team. It's a special group, with people from various commands, different parts of the US (and some parts not in the USA too!), active and reserve military services, even different cultures. We got it together quickly and rocked this training from start to finish. I LIKE these people. We work hard, we play hard and when the time comes, we'll fight hard. These are the kind of people who, when thrown together by random circumstance, become lifelong friends. That doesn't always happen. I'm glad it happened here.

 

We can't finish our mission until we start. Fortunately, we've got the right people, in the right place, with the right gear and the right attitude, to make it a great start.


"You didn't kill it," fumed the customer, "You said you would kill it."

 

"We've had some, er, labor setbacks," said Bjorn, head of DragonSlayers Inc, a startup boutique firm from several valleys away.

 

"I don't see an excuse clause in the contract," the customer shot back, "Kill the dragon or we're done."

 

"The dragon can't be killed," said a rich Scottish voice striding up to meet them.

 

Bjorn recognized the stealthy character, by name of Connery, from the Information Superhighway Roadside Assistance Service.

 

"I didn't realize that RAS was in the area," Bjorn quipped, offering a hand to Connery.

 

Connery grasped Bjorn's hand and shook it once, "We're all over. Been doing a little cleanup of this or that."

 

"What's this about the dragon," asked the customer, "That it can't be killed?"

 

"Of course not," Connery smiled, "It's a dragon. It's immortal."

 

"Did you know this?" the customer glared at Bjorn, "Have you been stringing us along?"

 

"No," Bjorn defended, "We kill dragons. It's what we do."

 

"Well," Connery chuckled, "Not real dragons, anyhow."

 

The customer's lackey approached them with a small flagon of tea, poured a stein for each of them, and departed.

 

"The dragon is immortal," Connery muttered, sipping his tea.

 

"That's impossible," Bjorn said through a long gasp, "We've killed dragons before - we "

 

"But of course you have," Connery smiled dismissively, drawing another casual sip.

 

Bjorn stared at him, unable to form another word.

 

"If the dragon can't be killed," asked the customer, "Then what?"

 

"In the nether worlds, beyond the mapped regions, you'll see a little notation: There Be Dragons," Connery said softly, "And whether there be dragons or not, it's uncharted territory. Places no man has ventured, but rest assured danger lurks. Unknown to the uninitiated."

 

"So you know what lies in the uncharted territories?" Bjorn sneered.

 

"It's why I'm a guide and you're a dragonslayer," Connery huffed, "Whether you know your way or not, dragon chow comes in many shapes and sizes," he put his hands up as if to size-up Bjorn, "Many shapes and sizes."


"Funny," Bjorn quipped, but it clearly wasn't funny, "All we have to do is get close enough."


"Reminds me of a time," Connery said wistfully, "Once I knew a man who you could skewer a hundred times and he'd still get right back up."

 

"Ahh, the Highlander," said Bjorn, "I've heard of him."

 

"Well, he never lost his head," Connery huffed, "Or that would've been the end of him."

 

"What are you saying?"

 

"The treacheries of the lands beyond are many. You have to keep your wits about you. Keep your head."

 

"Keep my head, got it," Bjorn said sarcastically, "Anything else?"

 

"You need to deal with the whole dragon," Connery advised, "Not just the part you wrap with that silly leash. It won't hold the dragon. Only a dungeon will."

 

"So we need an enchanter?" Bjorn smirked.

 

"In no uncertain terms," Connery said, laughing, "You have a go at that dragon on your own. Go in there with no more than an enchanter's bag of tricks, and he'll make an ash out of you!"

 

Bjorn gulped, "We'll see about that!"

 

One of the lackeys turned to the other and chortled, "He thinks he's James Bond!"

 

"What do you know about it?" Connery shot back with piercing eyes, "The dragon sends your consultants to the street and you send the dragon to the morgue. Is that how it's done in data warehousing?"


"Basically, yes," snickered a lackey.


Connery whirled, "No morgue will hold him." He turned to the customer and glared hotly, "What are you prepared to do?"


"Sign the contract," said the customer, quickly applying a signature. He stuffed the papers into Connery's hands and hastily departed, leaving the men to set sail and dispatch the dragon as soon as possible.


The boat ride to the dragon's coast was uneventful until the boat ran aground near the shore, screeching loudly against the rocks as its keel protested with a deep, guttural groan.

 

"That noise will stir the dragon," Bjorn bemoaned. He'd hoped for a more stealthy entrance.

 

"Hopefully only stirred," Connery quipped as he snatched up his bag, "Not shaken. Won't do to have him awake when we approach, right?"

 

"Coastline is enormous," Bjorn complained, "How will we ever pinpoint his location?"

 

"To find the dragon, you'll need to think his thoughts. Know your adversary. Know his heart."

 

"Yeah, Dragonheart," chuckled a lackey, "Seen the movie."

 

Connery ignored him and leapt from the boat onto the dry shore. "Welcome to the Rock," and then looked out over the vast, scorched wasteland, a product of the dragon's handiwork. He led the team up the rocky slope to the first rise, whipped out his spyglass and waved his hand to the others to belay their ascent.

 

"What's he doing?" asked one lackey to another.

 

"Lookin' around, I guess," snickered one, "Guess nobody told him that the dragon sleeps all day."

 

"What was that?" Connery whispered loudly enough for them to hear, "You think the dragon sleeps all day? Who are you kidding? Maybe you only struggle with him in his lair at night, but he breathes fire all day long. He never sleeps. He never dies."

 

"Where did we find this kook?" asked another, "He's as nutty as a fruitcake."

 

"He'll eat you alive," Connery sneered, trying to spot motion anywhere along the landscape before proceeding. In the distance, a dank mist arose from the ground near some caves. Connery zoomed in and spied dragon scales littering the ground. "Let's go."


The team made the tedious crossing without incident, until they stood before the open, reeking maw of the dragon's lair.

 

"Who wants to go first?" Connery chuckled.

 

"I will," said a lackey fearlessly, "I've taken down enough of these."

 

"But of course you have," Connery strode to the nearest large boulder while the others scattered for cover. After several tedious minutes, all of them could now feel the impact tremors shaking the ground, growing in intensity as the beast ascended from his lair to the cave's mouth.


Then the horns appeared, fifty feet from point-to-point as they slowly rose from the hole. Then the head, larger than a common city bus and almost twice as long. The dragon stared down the lackey for a long moment, then continued to ascend from the hole, growing larger and more hideous with each passing second until his entire upper body was revealed, from his head down to his midsection, standing over ten stories tall. He burst-extended his massive wing membranes with a loud, deafening snap, and then pointed his head straight up to gather a deep breath of air.

 

Connery reached down to pick up one of the many dragon scales scattered all over the ground. Five inches across and eight inches long, made of the most impervious stuff on earth. He flipped it over and shuddered to realize the dragon's age, betrayed in the scale's growth rings. Four thousand years, this animal had been eating and breathing fire.


The young lackey had forgotten to breathe. This dragon was orders-of-magnitude larger than any dragon he'd ever dealt with. In fact, the sheer scale of the dragon made him feel light-headed. Gathering his presence of mind, he took a defiant stance and shouted, "Begone, Dragon!"

 

Connery turned away, trying to hold back a snicker that could reveal his location to the dragon's attuned senses.

 

The dragon pointed his nose straight down, cocked his head to the side, opened his mouth and released his breath. The column of high-intensity chemical fire blasted downward on the lackey, instantly reducing him to ash and causing the rocks all around where he'd stood to glow and almost melt.

 

Connery glanced over to the rest of the team, cowering behind the rocks in hiding, not believing that the dragon was so huge and powerful, and feeling completely beyond their depth. They stared, partly in awe and partly in concern, as Connery stepped out from behind his hiding place and boldly strode up to the dragon's cave.

 

The dragon once again drew breath into his nostrils to recharge his furnace, when Connery simply placed his hands behind his back and stared deeply into the dragon's eyes.

 

The dragon stared back, unable to comprehend the feeling of drowsiness suddenly overtaking him. He slowly lowered his head, then his body, down to the ground to gently lay next to Connery, unable to break his eyes away from Connery's deep, mesmerizing gaze.

 

Once completely settled, Connery reached out to tap the dragon's front jawbone as it drifted off to sleep, "There now," Connery said soothingly, "That's a good lad."

 

"How is this possible?" Bjorn gasped, stunned at how easily Connery had mastered the beast.

 

"Your friend told the dragon to leave," Connery huffed, "But the dragon isn't going anywhere. He lives here and you people don't. In fact, he's been around so long, and you people come and go so often, that he sees you as decorations, not even permanent fixtures in his home."

 

"But he just laid his head down and went to sleep," Bjorn noted, "How did you do it?"


"The dragon serves me," Connery said slowly, "Not the other way around. If the dragon needs to breathe fire, it's because we've not done a good job harnessing the dragon, not just because the dragon is mean."

 

"So dragons aren't mean?"

 

"Oh, they're born mean," Connery chuckled, "And they bite. Whom they bite and when, is ours to control. That's why we have dungeons. Places where the dragon will survive but under our control. Think about putting that dragon's breath to work in boiling water, making steam to run a turbine. Now the dragon is working for us."

 

"Can't be a happy existence for him."

 

"Happy? Perhaps not. Necessary? Most definitely. You came here to kill him or banish him. He knows his place. He only responds to someone who knows it as well as he does."

 

"You're an enchanter, aren't you?" Bjorn said, realizing Connery's identity.

 

"Some call me, Tim."


Netezza users and partners that attend Enzee Universe 2010 have the option to take one of three post-conference training courses, regularly valued at $1950.

 

  • Netezza Data Warehouse Appliance Usage – for DBAs, system administrators, application developers and data warehouse architects that are new to Netezza
  • Advanced Analytics Inside Netezza – a brand new, limited availability course that will show analytic developers and data miners how to write and use compute-intensive analytics within Netezza
  • Mantra for Data Compliance – another newly available course to teach Netezza users how to leverage the Mantra product for data compliance

 

Space is limited, so register today to hold your spot.

Post-conference training will take place between 12 noon EST Wednesday, June 23 and 5pm EST Friday, June 25, 2010.


Many of those who integrate the mainstream BI tools into various underpinning data sources find subtle nuances. Not the least of which is how the database will respond to the queries presented. In Netezza data access especially, the power is not found in the query, but in the hardware. We can certainly degrade our experience with bad queries, but we would not tune queries in the same manner as with an SMP/RDBMS.

 

For example, I've watched RDBMS engineers work black-magic with a query by simply rearranging this-or-that in the monolithic query to provide boosts in the orders-of-magnitude. This is because the query is being used to guide the general-purpose physics. In Netezza, however, the purpose-driven physics snips the query apart. The physics then guides the query's mechanics. I've watched newbie Netezza folks nearly pull their hair out - and their eyelashes too! - when trying to "make the machine do what I want". Hmm, no, the machine does what it does. It's an appliance. We get what we want when we conform the data to the physics. The query is just along for the ride.

 

How does all this apply to multi-pass SQL in a BI Tool? Well, most BI tools come to the table with a pre-conceived notion that all databases are created equal. Unless they have specific VLDB hooks, and unless those hooks fully embrace VLDB principles, the BI tool will not experience the expected lift and we'll likely have to help it out. In fact, little about a BI tool is purpose-built in regards to its data source. It regards data sources as general purpose interfaces so it can be as vendor-neutral as possible.


Unlike a standard star-schema, many VLDB tables are fact-sized tables containing billions of rows, as are their dimensional counterparts. So a single one-shot query will sometimes provide the functional answer but with unacceptable performance. Many of us have seen multi-page (hey, 100+ page) queries that try to do everything in one shot. The average RDBMS leaves us few options. The VLDB and especially Netezza is not so constrained. We can make multiple passes on the data often with little penalty. The danger here is in the inefficiency of the passes, not whether multi-pass is okay. Multi-pass, or more appropriately multi-stage SQL,  is a necessary approach with large-scale tables. Netezza makes it simple and fast, using built-in concepts of its performance model.

 

Here is a spot case-study - a BI tool needed to access several tables that were each in the many billions of records. The end result was a summary of user-selected values. The temp-table creation here is done automatically by the BI-Tool, so we may have limited options in getting it to shape them as needed. In the examples below, I'll label the queries so we can reference them later.


A typical BI tool, upon realizing it needs a summary, will often divide the answer into multiple stages of work. Each stage will store its result in a temporary table using a CTAS, leveraged in one or more following passes. Unfortunately these passes are sometimes inefficient. Consider the case below (this is pseudo-SQL, so bear with me here):


(1a) create table t1 as select region_id, district_id, store_id, sum(transaction_amt) sumtran, sum(transaction_tax) sumtax from transactions where district_id = 4 group by region_id, district_id, store_id;  (1 million records)

-

(1b) create table t2 as select employee_id, employee_name, em.store_id from employee_master em, employee_lookup el where em.store_id = 6 and em.store_id = el.store_id;  (500 records)

-

(1c) select t1.store_id, employee_id, employee_name, sumtran, sumtax from t1, t2 where t1.store_id = t2.store_id and t1.region_id in (41,42) and t1.store_id = 6;  (450 records)

 

Note how in the above, the filter effects are largely applied last (1b and 1c) with the summaries applied first (1a). In this case, it is summarizing over a million values but it throws away over 90 percent of this result on the last operation, reducing 1 million records to 450. It is still accessing the larger table (transactions) only once. It just does it at the wrong time.

 

If we invert this chain and regard the filters first, we might see queries like this:

 

(2a) create table t1 as select region_id, district_id, store_id, transaction_amt, transaction_tax from transactions where district_id = 4 and region_id in (41,42) and store_id = 6;  (15,000 raw records)

-

(2b) create table t2 as select employee_id, employee_name, em.store_id from employee_master em, employee_lookup el where em.store_id = 6 and em.store_id = el.store_id;  (500 records)

-

(2c) select t1.store_id, employee_id, employee_name, sum(transaction_amt) sumtran, sum(transaction_tax) sumtax from t1, t2 where t1.store_id = t2.store_id group by t1.store_id, employee_id, employee_name;  (450 records)


In the above, the filters are pushed into the first part of the query chain (2a) to squeeze down the data sizes, but to also glean out the raw values for the final summary (transaction_amt, transaction_tax). The (2b) query is still a filter, but by the time we get to (2c) all we really need to do is summarize based on the intermediate table results. We don't have to "go back to the well" of the larger table. Everything we need for the final result is already in our hands, and a much smaller workload.
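To make the inversion concrete, here is a small, hedged sketch in Python, using the standard sqlite3 module as a stand-in database. The toy data, the store-to-district hierarchy (store_geo) and the row counts are all assumptions made for illustration, and SQLite knows nothing about zone maps or distribution. The only point is that both chains return the same answer, while the filter-first chain pulls far fewer rows of the large table into its work.

```python
import sqlite3

def store_geo(store_id):
    """Hypothetical hierarchy (an assumption): store -> (region_id, district_id)."""
    if store_id <= 3:
        return 40, 1
    if store_id <= 6:
        return 41, 4
    if store_id <= 8:
        return 42, 4
    return 43, 5

def build_db():
    """Load toy stand-ins for the transactions and employee tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE transactions (transaction_id INTEGER, region_id INTEGER,
            district_id INTEGER, store_id INTEGER,
            transaction_amt REAL, transaction_tax REAL);
        CREATE TABLE employee_master (employee_id INTEGER,
            employee_name TEXT, store_id INTEGER);
    """)
    for i in range(1, 2001):
        store = 1 + i % 10
        region, district = store_geo(store)
        con.execute("INSERT INTO transactions VALUES (?,?,?,?,?,?)",
                    (i, region, district, store, float(i % 7), 0.5))
    for e in range(1, 31):
        con.execute("INSERT INTO employee_master VALUES (?,?,?)",
                    (e, f"emp{e}", 1 + e % 10))
    con.commit()
    return con

# The employee filter is identical in both chains (1b and 2b).
EMPLOYEES_SQL = """
    CREATE TEMP TABLE t2 AS
        SELECT employee_id, employee_name, store_id
        FROM employee_master WHERE store_id = 6;
"""

def reset(con):
    """Drop the intermediate tables so each chain starts clean."""
    con.executescript("DROP TABLE IF EXISTS temp.t1; DROP TABLE IF EXISTS temp.t2;")

def summary_first(con):
    """Chain 1: summarize the whole district, then filter down to one store."""
    reset(con)
    con.executescript("""
        CREATE TEMP TABLE t1 AS
            SELECT region_id, district_id, store_id,
                   SUM(transaction_amt) AS sumtran, SUM(transaction_tax) AS sumtax
            FROM transactions WHERE district_id = 4
            GROUP BY region_id, district_id, store_id;
    """ + EMPLOYEES_SQL)
    # Rows of the large table that had to be summarized up front.
    touched = con.execute(
        "SELECT COUNT(*) FROM transactions WHERE district_id = 4").fetchone()[0]
    rows = con.execute("""
        SELECT t2.store_id, t2.employee_id, t2.employee_name, t1.sumtran, t1.sumtax
        FROM t1 JOIN t2 ON t1.store_id = t2.store_id
        WHERE t1.region_id IN (41, 42) AND t1.store_id = 6
        ORDER BY t2.employee_id""").fetchall()
    return rows, touched

def filter_first(con):
    """Chain 2: filter and capture raw columns first, summarize last."""
    reset(con)
    con.executescript("""
        CREATE TEMP TABLE t1 AS
            SELECT store_id, transaction_amt, transaction_tax
            FROM transactions
            WHERE district_id = 4 AND region_id IN (41, 42) AND store_id = 6;
    """ + EMPLOYEES_SQL)
    # Rows of the large table captured into the intermediate table.
    touched = con.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    rows = con.execute("""
        SELECT t2.store_id, t2.employee_id, t2.employee_name,
               SUM(t1.transaction_amt) AS sumtran, SUM(t1.transaction_tax) AS sumtax
        FROM t1 JOIN t2 ON t1.store_id = t2.store_id
        GROUP BY t2.store_id, t2.employee_id, t2.employee_name
        ORDER BY t2.employee_id""").fetchall()
    return rows, touched

if __name__ == "__main__":
    con = build_db()
    first_rows, first_touched = summary_first(con)
    second_rows, second_touched = filter_first(con)
    print("same answer:", first_rows == second_rows)
    print("large-table rows carried:", first_touched, "vs", second_touched)
```

On this toy data the summary-first chain summarizes every row in the district before discarding most of it, while the filter-first chain only ever carries the rows for the one store it needs.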

 

The simple inversion of the query order has significantly reduced the workload of the entire chain of events. This, of course, does not answer whether our BI tool will actually implement the query in this order or manner. Anecdotally, with the above tables the original "transactions" table was over 30 billion very wide rows. The first query chain (1a-1c) takes no less than a minute, but only because key1 is zone mapped. The second query chain (2a-2c) takes 6 seconds or less, and it better represents a flow of data from larger-to-smaller, like a common source-to-target flow. It is easier to visualize and manage, and is more efficient.

 

Note: Can our BI tool shape a query chain in this manner? Can it glean out the raw columns to an intermediate table, later summarizing on the intermediate? Or will it always require us to summarize at the outset and then squeeze out from there? Some BI tools are very close to this model already.

 

Yet another pernicious issue is not obvious from the above - temp table distribution. This last query chain, though 6 seconds in duration, is still a one-hit wonder. Once two or more users start hitting the machine, concurrency will reveal all. The machine is quickly saturated and all of the queries start to take more and more time. In one case of just five users on the machine, all of the queries took over a minute, and one took over five minutes. Concurrency tuning is a bread-and-butter issue, too, so what's going on here?

 

In both query chains, the CTAS is not being given explicit instructions on how to distribute its results. The outcome is unpredictable from the BI tool's perspective, but very predictable for us. When the CTAS result remains distributed on its original distribution, we get a co-located write. If the CTAS does not use the original distribution, it will have to redistribute the data, broadcasting it all over the SPUs. We need to avoid this because co-located writes are desirable and muy caliente.

 

The original distribution key for the transaction table is (transaction_id). This doesn't do us much good if we are later focusing on the store_id (2b, 2c) as the primary distribution. In order for the final activities to be as quick as possible, we need to bridge the transactions into the store_id. We could set up data structures to do this, but in the end with so few records coming off the transaction table in the (2a-2c) chain, an intermediate broadcast is already in the mix. We can do it deliberately under our control, or allow it to use CTAS defaults. In this case, the CTAS default is worse.

-

In the first chain of queries (1a-1c), we would expect to see the following CTAS defaults:

 

(1a) - distributed on (region, district, store) because this is the group-by clause. It cannot use transaction_id for a co-located write because it's not even in the result set. Those who understand distribution keys know that this is not an optimal state of affairs.
-

(1b) - distributed on (employee_id) because it happens to be the first column in the select-clause. This query uses two tables in the join, so CTAS will opt for using a column in the select clause.

 

So in this case, the CTAS will not preserve the original distribution or even a useful distribution. Don't get me wrong here. CTAS defaults are acceptable in over 90 percent of cases. This example is offered as a typical one-off of BI automated query construction. The first query (1a) will produce a million records (and honestly, in some cases it produced a couple of billion records), so we really need some optimization here.


If we were to take (2a) and (2b) above to deliberately enforce the distribution, we would use the "distribute on (store_id)", but we would have to include store_id in the result set. In each case, this would prepare both tables for the final query (2c) for a co-located join.
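A toy model may help show why the distribution key matters here. The sketch below is plain Python, not Netezza: the modulo "hash" in spu_for, the SPU count and the sample rows are all assumptions for illustration. It counts how many rows change their home SPU when an intermediate table is written on a different key, and checks whether a join can be resolved without moving data.

```python
N_SPUS = 8  # hypothetical SPU count, chosen only for the example

def spu_for(key, n_spus=N_SPUS):
    """Toy stand-in for the appliance's hash distribution (an assumption)."""
    return key % n_spus

def rows_moved(rows, old_key, new_key, n_spus=N_SPUS):
    """Count rows whose home SPU changes when the distribution key changes."""
    return sum(
        1 for r in rows
        if spu_for(r[old_key], n_spus) != spu_for(r[new_key], n_spus)
    )

def spus_by_join_key(rows, dist_key, join_key, n_spus=N_SPUS):
    """Map each join-key value to the set of SPUs holding rows with that value."""
    placement = {}
    for r in rows:
        placement.setdefault(r[join_key], set()).add(spu_for(r[dist_key], n_spus))
    return placement

def join_is_colocated(left, right, left_dist, right_dist, join_key):
    """True when every matching pair of rows already lives on a single SPU."""
    lp = spus_by_join_key(left, left_dist, join_key)
    rp = spus_by_join_key(right, right_dist, join_key)
    return all(len(lp[k] | rp[k]) == 1 for k in lp.keys() & rp.keys())

# Hypothetical intermediate tables, shaped like t1 and t2 in the text.
t1 = [{"transaction_id": i, "store_id": i % 10} for i in range(1000)]
t2 = [{"employee_id": e, "store_id": e % 10} for e in range(50)]

if __name__ == "__main__":
    # Writing the result on the key it already sits on moves nothing.
    print("kept on store_id:", rows_moved(t1, "store_id", "store_id"))
    # Re-keying from transaction_id to store_id forces rows across the fabric.
    print("re-keyed to store_id:", rows_moved(t1, "transaction_id", "store_id"))
    print("co-located on store_id:",
          join_is_colocated(t1, t2, "store_id", "store_id", "store_id"))
    print("co-located on defaults:",
          join_is_colocated(t1, t2, "transaction_id", "employee_id", "store_id"))
```

In this model, once both intermediate tables sit on store_id the final join needs no data movement at all, which is the state the explicit "distribute on (store_id)" is meant to set up.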

 

Note: This brings up another BI tool issue, in that we need to affect the order of the sequence, and also provide for columns that are administrative (like store_id) but not part of the final result. Some BI tools are picky this way. If the column is not required in the final reporting output, it trims or ignores the need for the column in the intermediate tables.

 

To continue, we have now pushed the workload into the physics, not the query itself. But as noted, concurrency is the test. This final chain of co-located queries then returned in less than 3 seconds, and did not grow beyond 4 seconds until 20 users were running the same query at the same time, and even then tended to hover between 3 and 5 seconds as even more users were added. Isn't this the kind of scalable performance we want?

 

Additional note: If we really want to push this harder, it would be best for us to manufacture a "store_transactions" table that is distributed on the store_id already (for the 2a query). This would be a report-facing table that essentially mirrored the transactions table, but only carrying the high-traffic reporting columns. In this way, the store_id becomes the universal distribution even for the very first query. Keep in mind that while this strategy may cost disk space, it will further eliminate concurrency issues. I am not a big fan of preserving disk space when performance issues are in play. We will still need to perform a "distribute on (store_id)" for each (2a,2b) but it will preserve the distribution with a co-located write.

 

But as we can see, the two protocols we will need in play from the BI tool are to use capture-filtration-summary, and then also to apply distribution keys deliberately to the first passes to preserve distribution. We often apply these very same protocols in ELT because they make sense. But we have complete, detailed control of query construction in ELT, not so in the BI Tool world.

 

Conclusion: Rather than use a BI tool's default of a summary-filter chain, what we need is a capture-filter-summary chain. This guarantees that we can leverage the VLDB physics, but also moves the data from larger-to-smaller in the most efficient manner.

 

Recap for Multi-Stage SQL:

  • especially for summary data, should perform the summary as the final operation, with capture-and-filtration in the first passes. This allows the final operation to be a simple summary, since all the filtration has already been applied. In other words, no more where-clause activity apart from the join criteria.
  • Organize the tables (including additional tables) on the distribution key in play. Bridging one distribution to another can give us the performance, but if broadcasting it can eventually create a concurrency problem
  • the chain should not address the same large table more than once. Get everything we will need and get out - don't keep coming back for something the first pass did not get.
  • the chain should capture raw information into an intermediate table, foregoing the summary until the final operation.
  • should provide a means to bridge one distribution key into another, for maximum efficiency, rather than using CTAS defaults.
  • should perform filtration at the outset, as a method toward attacking the larger table(s) with zone maps etc. Move from larger data sets to smaller ones.
  • should preserve distribution to leverage co-located write and read where possible. This maximizes overall performance but also optimizes concurrency.


What if the BI tool will not, as a general-purpose tool, perform these deliberate and purposeful query chains? At this point, we need to have a heart-to-heart with the BI Tool vendor stating our concerns. Assume the best, that the tool vendor may eventually fix the issue, just not in time to help us now. We then need to consider two purpose-built options, each of which has its own issues. These are offered in the spirit of temporary adaptation until the BI tool is smart enough to bypass them.

 

Summary tables: These are often constructed to prop up database performance issues. They are just as viable for functional reasons, such as providing data in a form that is only available and most efficient when summarized, or to intersect details with pre-summarized data. But if used as a performance prop or BI Tool helper, put some effort into making it an adaptation that could be deprecated when the BI Tool is smarter. This way, we're not committed to it forever.

 

Stored procedures: Used in an appliance as an adaptation mechanism (in this context). Effectively bridges the BI tool to the data with a temporary procedural construct (the procedure) rather than a more permanent structure (like a summary table). Stored procedures pull application features down to the database level and adapt the BI tool into the Netezza performance model.

 

When or whether to use either of the above is always a design decision, not necessarily dictated by the tools themselves. But keep in mind the idea of temporary adaptation. I am always of the mindset that the warehouse and BI environment must exist with the expectation of change, so in general, adaptability and adaptation concepts are always desirable. They allow us to be more responsive to future requirements.


Hello everyone, sorry for the huge delay but it couldn't be avoided. I'm at a training site that emulates a base in Afghanistan, and they take the limited bandwidth situation to the extreme. Both in terms of available downtime for me, as well as in terms of throughput to the InterWeb!

 

Since my last update I have visited Fort Gordon where I completed the Brigade S-6 Communications Officers course. An interesting course, it taught me the ins and outs of the role I will play while downrange in Afghanistan and what will be expected of me and my staff while we are there. It was challenging, as I learned more about Antenna and RF theory than I care to admit, and the boring minutiae of virtualized servers. Glad I did it of course, but happier still to graduate and move one step closer to starting my mission.

 

I arrived at an Army Camp in Indiana at the beginning of December. Here I joined the main element of my team and began training as a whole unit, rather than developing individual skill sets. The people have been great and I'm excited to be working with them. Safe to say that all walks of life are covered by this group, and they come from every corner of the US and its Territories worldwide.

 

In fact, I feel bad for some of the folks from the warmer climes, since we've had some shockingly cold and damp weather since getting here. It's almost a cliche about 'Army' training, but as it turns out, this base has the facilities and staff we need, but the terrain and weather couldn't be further from what we'll experience in-country!

 

So far I've spent time training on driving and maintaining various vehicles, crew served weapons, combat lifesaver medical techniques and plenty of language training. To be honest, the language stuff is taking me a bit longer than I had hoped. We won't even talk about learning to read and write Dari at this point! I'll get there, hopefully sooner than later...

 

Probably worth mentioning this one observation: I am so impressed by my Sailors, Soldiers and Airmen. They have such positive attitudes, endure the rigors of road marches in the bitter cold, miss no opportunity to laugh and in general make me see what a privilege it is to be an American. If someone doubts if the coming generations have any promise, I'm here to tell you with absolute certainty; They Do. Great people, and I'm glad to be rolling with them on this mission.

 

I've been in contact with a few of my Netezza friends, and I'll be sure to call them at least one more time before I board the plane. I hope to make more frequent updates to the blog, and of course, everyone should feel free to ping me at narmychief@gmail.com if they are so inclined.

 

Best of luck for a strong close to FY2010 and for a great start on FY2011! I'm looking forward to surfing the wave when I get back!


Enzee Universe 2010 is returning to its roots. This year, Netezza will host a single, multi-day user conference for customers, partners and prospects across the globe to get together in Boston from June 21-23, 2010. Here are some basic details that might get you revved up for the event:

WHO:

Netezza users, partners, and employees from around the globe, along with esteemed industry analysts and thought leaders in business intelligence and analytics

WHAT:

The premier data warehousing and analytics conference of 2010

WHEN:

June 21-23, 2010

WHERE:

Westin Boston Waterfront, 425 Summer St., Boston MA 02210

WHY:

A value-packed agenda, complimentary pre- and post-conference education, priceless networking opportunities, and a free t-shirt!

HOW MUCH:

$475

More details, including a shell agenda, are available at www.enzeeuniverse.com...


I've been noticeably quiet over the past weeks as I've switched horses, so to speak, and joined Brightlight Consulting. I had already been following Brightlight for a number of years, encountering their significantly talented people at various Netezza sites across the fruited plain.

 

The press release is here:

 

http://www.brightlightconsulting.com/news_2010_DavidBirmingham.htm

 

 

At the Netezza conferences this year, many of you saw the slow-motion videos of surfers mastering those monstrous waves. Also during this season, I happened to attend another conference where the speaker shared some famous words from Shakespeare's Julius Caesar in a similar context to what I was now experiencing right there on the conference room floor:

 

 

There is a tide in the affairs of men, Which, taken at the flood, leads on to fortune;
Omitted, all the voyage of their life Is bound in shallows...

 

 

I was standing on the top of the wave, so to speak, and had a choice before me. Ride the wave, or return to the shallows.

 

Now, I don't put a lot of stock in epiphany-styled revelations, but in this case a tingle went up my spine, realizing that the TwinFin had completely changed the game - it was time to seriously get on the wave and ride it, or commit to the shallows of the everyday. As many of you know, Brightlight has stood out for a number of years as being a go-to partner for all-things Netezza, and their VLDB consultants have solved large-scale problems where others feared to tread.

 

"You have some serious thrill issues, dude," Crush the Turtle, Finding Nemo

 

As I have been inundated with pings and kudos from many of you who already know the story, I thought it was worth sharing, especially for you Shakespeare and surfing aficionados, a rare breed indeed.

 

And to surfin' Enzees everywhere - here's to a "so totally awesome" 2010 and all the promises it offers. All the best.

 

See you on the waves!


I'll be on an extended leave of absence, as most folks already know. In a unique twist, this Navy Chief will deploy with an Army unit just about 500 miles from the nearest significant body of water. And I'll be working for NATO too! I'll be serving as a Communications Officer for a Provincial Reconstruction Team. The goal is to deliver infrastructure improvements (roads, schools, hospitals etc) that make everyday life for everyday citizens better. A happy citizenry is toxic to the bad guys!

 

In early October, I reported for active duty at the Naval Station in Newport RI. From there, I went to Gulfport MS (Home of the Seabees!) for a week of administrative 'stuff'. Had my records checked, got a bunch of nasty shots, issued Army uniforms etc. It's where I began my transformation from a Navy Chief, to a Narmy Chief. Next step is about a month at Fort Gordon for Radio, Antenna and Computer training. With that almost out of the way, I've got a few months of combat skills training here in the USA before I go 'downrange.' Should be good, wholesome, sweaty fun!

 

On a more serious note: I'd like to thank everyone who has shown me so much positive support. For all the beers I've waved off, the lunches I've rain-checked and the rounds of golf I've deferred...THANK YOU! It means a lot to me knowing that when I look left or I look right, I'm surrounded by such great people. I'll work HARD to be worthy of all those positive vibes, and shout from the rooftops just how special this company really is. I'm looking forward to my mission, but I cannot wait to return and get back to work!


“The best vision is insight.” -- Malcolm Forbes (1919-1990), publisher of Forbes magazine, New Jersey state senator and adventure hobbyist.

A couple of big announcements from our friends at SAS today. For the industry at large, SAS’ commitment to in-database analytic processing is a confirmation of trends that we have been discussing for over two years: more and more, the “data warehouse” is becoming the hub of all analytics processing for the enterprise. While that announcement covers multiple database vendors, today’s other announcement from Cary, NC on the availability of the “SAS Scoring Accelerator for Netezza” means that we and SAS are immediately putting this recommitted strategy into action.

Of primary importance to Netezza’s customers is the fact that with SAS’ intensification of In-Database functionality, SAS and Netezza will continue working together to deliver ever more advanced analytic capabilities inside the Netezza appliance. And the first step on that path is an excellent one: the availability of the SAS Scoring Accelerator for Netezza means that Netezza’s customers are able to execute SAS scoring models directly within the Netezza appliance and in-line with other SQL query processing on their data. The SAS Scoring Accelerator for Netezza will be Generally Available in early 2010, and Netezza and SAS are already working with a small number of early adopter customers such as Catalina Marketing, as they begin to benefit from this powerful functionality.

These scoring models are used in virtually every vertical market in which Netezza sells our products, for fraud detection, credit and risk analysis and market segmentation. By embedding them in the Netezza appliance, customers will get the same 10-100X market-leading performance on scoring their data as they do on query processing. By running in-database, customers can score all their data and not be reliant on only using samples or aggregates for expediency. And the in-database scoring also means that the inherent delays, or latency, in getting at the data to score it have been eliminated. The best way to deal with the large amounts of data being loaded in today’s data warehouse systems is not to move it unless necessary, so Netezza’s AMPP architecture and method of moving the data processing as close as possible to where data is stored delivers huge performance gains for in-database analytics.

 


The on-going partnering work with SAS, and specifically the Scoring Accelerator, is part of the conversation with customers, partners and the market in general that Netezza began back with our Enzee Universe world tour in September regarding our vision for the industry and for Netezza. It’s known as “Netezza Insight” and CEO Jim Baum used his keynote addresses in seven cities around the world to begin the dialogue of taking Netezza and the concept of data warehousing “deeper”, “higher”, “wider” in a “unified” enterprise-wide platform approach together with other partners in the community. In smaller settings with customers, partners and analysts since then, we’ve continued that dialogue and generated real excitement as they come to understand the full breadth of what Netezza is enabling in the market.


In coming days, we’ll be writing more about Netezza Insight and how it is manifest in product platforms, features and applications. But for today, let’s just say that SAS and Netezza customers are already able to do more, faster, with our combined products than ever before and that this is just a step toward even more powerful capabilities.

As Rick (Humphrey Bogart) said in the closing scenes of Casablanca, “I think this is the beginning of a beautiful friendship.”

 

 

 

[UPDATE: Rather than just reading what I have to say, you can watch SAS Executive Vice President and CTO Keith Collins describe his take on the value of in-database processing and the Scoring Accelerator for Netezza in the following video from the Enzee Universe 2009 show in Boston.]

 




One of our primary tables, we'll call it a fact table, contained a number of columns that had arrived through some pretty hairy ELT-based math algorithms. In all the crunching, we would see spontaneous overflow errors, so we converted some of them to float. More explosions occurred, and we converted more to float. After several more iterations, we converted them all to float. Then we discovered that the reporting layer also had to perform some hairy on-demand calculations, so it was a good thing we had float values to give them. Now everyone was safe.

 

However, as this table grew, and they always do, the floats became "bloats". Netezza does not compress a float data type. One day we looked up and the table was approaching 20 TB in size, with no end in sight. The theory was that if we could reduce these float values to numeric data types, we could save half the storage right away, and even more so with Netezza's compression, but it would put the reporting layer in danger of a spontaneous overflow explosion.

 

Once we performed the conversion of the table (as a test case) and saw it reduce in size to about 7 TB, we were hooked on the possibilities of compression but vexed as to the impact this would have on the consumers of the data.

 

We had experimented with surgically casting the data from numeric to float on-the-fly, but this would create a lot of headaches for the users if they always had to wrap every field with a casting notation. It did however, prove out one thing, that the time to cast the numeric-to-float is inconsequential when compared to the amount of I/O required to pull a float value from the SPU's disk "as is". In essence, we traded the time we saved in compression, and converted it to time used in casting.

 

So the next step would be, put a special view on top of the fact table, such that it would automatically cast every numeric column into a floating point value. Thus, whenever a reporting layer query required data, it would automatically and transparently leverage the view, pull less data from the disk, convert it to float in the CPU and then leverage it as float in memory. We effectively eliminated the cycles spent in I/O to rip the float value from the disk drive. We spent a little of it in the cast of the data to a float. We made the operation transparent to the reporting layer.

 

old way:

 

FLOAT ->>>>>>>>>>>>>>>>>>>>>>>CPU -> QUERY MATH

(16 bytes, no compression)

 

new way:

 

NUMERIC ->>>>>>>>>>>>>>>>>>>>FLOAT -> CPU -> QUERY MATH

(8 bytes or less, with compression)
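The casting view is easy to generate rather than hand-write. Below is a hedged sketch in Python that builds the view DDL from a list of column names and tries it against SQLite (via the standard sqlite3 module) as a stand-in. The table name sales_fact, the view name sales_fact_v, the column names and the helper build_float_view_ddl are all hypothetical, and the float type name would need to match whatever the real platform calls it. SQLite shows nothing about compression either; the example only demonstrates that the reporting layer sees floats while the base table stores something else.

```python
import sqlite3

def build_float_view_ddl(table, view, key_cols, numeric_cols, float_type="REAL"):
    """Generate a view that casts every listed numeric column to a float type.

    Key columns pass through untouched; each numeric column keeps its name,
    so queries against the view look the same as queries against the table.
    """
    select_list = list(key_cols) + [
        f"CAST({col} AS {float_type}) AS {col}" for col in numeric_cols
    ]
    return f"CREATE VIEW {view} AS SELECT {', '.join(select_list)} FROM {table}"

def demo():
    """Store a value in a numeric column, then read it back through the view."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_fact (id INTEGER, amt NUMERIC, qty NUMERIC)")
    con.execute("INSERT INTO sales_fact VALUES (1, 12, 3)")
    con.execute(
        build_float_view_ddl("sales_fact", "sales_fact_v", ["id"], ["amt", "qty"])
    )
    stored = con.execute("SELECT typeof(amt) FROM sales_fact").fetchone()[0]
    served = con.execute("SELECT typeof(amt), amt FROM sales_fact_v").fetchone()
    return stored, served

if __name__ == "__main__":
    print(build_float_view_ddl("sales_fact", "sales_fact_v", ["id"], ["amt", "qty"]))
    print(demo())
```

Generating the view from the column list also means that when a column is added to the fact table, the view can be rebuilt without anyone hand-editing a long list of casts.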

 

All of the CPU-level math then becomes inconsequential when we move to the TwinFin, since it has its own floating-point processor and can handily deal with the float type. But we can continue to mitigate the I/O hit for the data by storing it in a compressible numeric format, and converting this on-demand to a float at the CPU level.


Famous words, or some such like, uttered by Orson Welles as he launched into a scary parody of alien terror on national radio. Really scary for some. And proffered on Halloween night in 1938, so dare I say, 'tis the season (almost).

 

Ahh, not to fear, this purports to be a painless foray. But I do have a story to tell.

 

Several projects ago (I always start this way, so you won't think I'm talking about you!) - I worked with some really sharp data engineers on boiling out a solution for retail operational reporting. The data arrived every five minutes or more, or less, and sometimes in parallel loads, with 24x7 regularity. More and more Netezza implementations are going this way, and you too, should look into processing data at the speed of thought. In any case, the reporting users wanted to plumb the depths of this data store, to the tune of eighty billion records and growing. (Okay, small I know (for some of you) but humor me).

 

Well and good, except rather late in the game, the reporting users spontaneously expressed a desire to review the detail through a metadata-based "lens", that is, set up some drilling levels and other metadata-based entry points, such that the entire operational model would be seen through this reporting "lens" and it would provide all the context for the consumers.

 

Now, such a model as described, would require such enormous power from a standard SMP/RDBMS-styled system, that we might well cause structural damage on the raised floor for sheer physical weight of said system. That is, if we really expected a report to return within a day or two of the request. Ahem! as I facetiously clear my literary throat.

 

But the worst-case for any given query for the above was around 8 minutes, and over 99 percent of the thousands of queries submitted, returned in less than 30 seconds. Oh, yeah, it was smokin' hot. In most queries using zone maps and the like, we saw returns in mere multiple seconds. Pshaw! Says the tick-tock-man, chocolate and vanilla, don't waste my time.

 

However (and there's always a catch) many of the larger reports were actually conglomerations of these smaller queries, and their aggregate time would occasionally exceed ten minutes or more. And even though this was a far cry from the "days away" we would expect from an SMP/RDBMS system, it was still 'too slow' for the users. Now, this is true adrenalin-junkie stuff, sort of like the old Far-Side cartoon of a young man standing with a fork in front of a waffle iron, captioned "Wendell Zurkowitz, slave to the waffle light". I recall how one man noted that many years ago we would wait hour(s) for a traditional oven to finish cooking, and now get impatient when the microwave instructions are greater than five minutes.

 

Perspective.

 

And rather than punt to the users and say, "Hey guys, this is just unrealistic" and degenerate into "expectation management" - the challenge was to actually achieve faster turnaround times on the reports. And here, I'm talking about getting these ten-minute reports into the 30-second zone. Would we have to embrace some extreme engineering for this feat? Methinks not - but the form of the process to get there was quite instructive.

 

Now recall I noted that the above model had operational tables, which were to be the detailed source, and a retail reporting hierarchy that was largely metadata-based. This reporting hierarchy had some significant size as well, perhaps a fourth the size of the eighty-billion-record fact table it had to link into. Yet both of these were on separate distribution keys. Querying one meant broadcasting another.

 

And now, for broadcasting.

 

Whenever two tables are distributed on different keys, a join between them cannot be initially co-located. To support the co-location, Netezza will broadcast the salient information from one table's context to the other. This means the physical data has to move from its home SPU, out onto the inter-SPU network fabric, and find its way to the target SPU where it will be further examined. Broadcasting for small tables is inconsequential and barely a blink on the radar. For larger tables it can have strange effects. For example, we saw one query return consistently in ten seconds. Yet when running side-by-side with itself (multiple users) it could take several times longer.
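A deliberately crude back-of-the-envelope model shows why a broadcast that is harmless for one user can hurt under concurrency. The Python sketch below assumes that a broadcast gives every SPU a full copy of the table, that each concurrent query performs its own broadcast, and that a co-located join moves nothing. The function names and the sample numbers are hypothetical; real fabric behavior is more subtle than row counts.

```python
def broadcast_transfers(table_rows, n_spus):
    """Row transfers needed so that every SPU ends up with a full copy.

    Each row already lives on one SPU, so it must be sent to the other
    n_spus - 1 SPUs (simplifying assumption).
    """
    return table_rows * (n_spus - 1)

def fabric_load(table_rows, n_spus, concurrent_queries, colocated=False):
    """Total row transfers on the inter-SPU fabric for a set of identical queries."""
    if colocated:
        # Both tables share a distribution key, so the join stays on-SPU.
        return 0
    return concurrent_queries * broadcast_transfers(table_rows, n_spus)

if __name__ == "__main__":
    rows, spus = 1_000_000, 108  # hypothetical table size and SPU count
    for users in (1, 5, 20):
        print(users, "users:", fabric_load(rows, spus, users), "row transfers")
    print("co-located:", fabric_load(rows, spus, 20, colocated=True))
```

In this model the fabric traffic grows linearly with the number of users for the broadcast case and stays at zero for the co-located case, which is consistent with a query that runs fine alone but slows down when run side-by-side with itself.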


The reason is that both queries were competing for bandwidth on the inter-SPU fabric, among other things. The simplest solution, of course, is to get our metadata table distributed on the same key as the operational tables. The problem was simply in the complexity of this metadata table and how it mapped to the core information. "Blowing it out" into a materialized form of information would require significant planning and design, because a misstep could easily make the reports turn out wrong, and this was unthinkable. In all this, the maintainability had to be considered, because if our initial complexity is too high, the maintainability is in jeopardy - by design.

 

Of course, we would spend most of our time in testing this scenario. Coding and implementation in most BI shops is a nit compared to the testing we have to execute to validate the outcome. Netezza is no different, except we can close the testing loop sooner if we have more power. And of course, for something of this magnitude, to test the change from minutes to seconds, we would need a powerful machine to measure the difference. Whenever we ran the new solution on a smaller machine, the difference couldn't even be measured. No, the power of the machine makes the testable difference visible and measurable.

 

As I noted, the form of this exercise was the most instructive part. Rather than form a means to align these two tables for co-located joins, the first effort was in attempting to tune the queries. You know, "query engineering", which is the mainstay of performance engineering on an SMP/RDBMS platform, and old habits are hard to break. The data engineers were somehow in denial that they would receive extraordinary power from configuring the data. Rather they trusted their instincts and chose to attack the queries.

 

Now, in any platform, regardless of shape, size or vendor, power is always and forever the domain of hardware. Software cannot manufacture more CPUs or network speed. If the physical plant is not ready, the software can only use what it has at its disposal. The software itself is largely a cost center, because it can only drain the machine's energy through inefficiency. In an SMP/RDBMS machine, the only option we have is to engineer the queries, because the physical plant is configured to be general purpose.

 

In a purpose-built machine, however, the query is simply a controlling mechanism to Netezza's resources. The host will chop it apart into snippets and dispatch these to the component that they will serve. Extreme query engineering on the other hand, assumes that jockeying around with the query can actually affect our fate. (contrast; a poorly written query is different from directly engineering a well-written query). And besides, do we really want to spend our time carefully engineering the query to the point of functional brittleness? In an SMP/RDBMS machine we will see queries that extend for tens of pages in a very daunting complexity. Maintaining these is a full-time job for our consultants. They swarm on the machine, and carefully tune their handiwork to avoid breakage.

 

Yet, we purchased a Netezza machine to get away from this complexity. To reduce, clarify and simplify our administration and consumption of the data. So as I watched these engineers bat themselves against the problem, no differently than a fly batting against a window, I watched them pull out their hair in generous tufts when little they did offered the significant gains they expected. This outcome was entirely counter-intuitive to their training. They were accustomed to using and tuning software to make things work faster.


Sweeping the hair from the floor one evening, I mentioned (for the x-teenth time) that the broadcast effect was killing them. Once our engineers grasped the broadcasting problem, I thought we would make headway, but things actually got worse. They started trying to control the broadcast as the root cause rather than the symptom. In one test, I saw one of the largest tables leap into a broadcast and we just killed the query outright (it would probably still be running, even today). The engineers lamented: How do we make sure the larger table doesn't broadcast? How do we control the broadcasting to our benefit? Answers exist to all of these, but it's like talking to a drug addict, one who is addicted to the drug of SMP/RDBMS and claims he can 'quit anytime'.

 

And then the truth came out, "David, if we can make this 10100 machine process data like a 10400 machine, we'll look like heroes!" To which I ask "How?" to which the response is: "We can save them all that money they would have spent on the hardware..." Well, not really. You've just chosen something else to spend the money on, namely performance engineering, the cost of time-to-market, the cost of a marginal implementation and the cost of human labor (the most expensive asset you have, by the way). But since the only way to get a 10100 to perform like a 10400 is to actually be a 10400, well, you see the futility. 432 SPUs versus 108 SPUs? And they really, truly thought they could - I mean - seriously. Let's keep in mind that the opposite is true. If we can't make the 10100 process data like a 10400, perhaps our approach is flawed? Heroes or goats. Take your pick. In my estimation, there's only one hero in the room. The big black box.

 

So the broadcast is the symptom, not the root cause. How about, we quit broadcasting, cold turkey? Take the data model through a detox program and the engineers through a series of deprogramming seminars to - well - it's not that bad. Typically the average engineer only has to see it operate in an adverse manner to become a believer. But a believer they must be, or they will not take action to correct the problem, correctly.

 

So one of them finally decided to produce a map table, one that would map the metadata into the operational tables such that all core joins would become co-located, with a common distribution. And lo, the first test of this blew their minds. Even the complex reports were now coming back in single-digit times, and the reports that had been running ten minutes or longer were now under a minute, even with multiple users. In fact, they saw the performance and scalability practically handed to them - simply because they configured the data correctly. It had little to do with query engineering.
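
For readers who want to see the effect in miniature, here is a rough Python sketch of why a common distribution key matters. It is a toy model, not the machine's actual behavior: the slice count, the table shapes, the column names (`fact_id`, `map_key`) and the use of CRC32 as the hash are all assumptions made for illustration. It only counts how many rows would have to leave their slice before a join could run.

```python
import zlib
from collections import defaultdict

NUM_SPUS = 4  # hypothetical slice count, far smaller than a real machine


def spu_for(key):
    """Pick a slice by hashing the distribution key (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SPUS


def distribute(rows, key_column):
    """Place each row on the slice chosen by its distribution key."""
    slices = defaultdict(list)
    for row in rows:
        slices[spu_for(row[key_column])].append(row)
    return slices


def rows_off_slice(left, right, join_column):
    """Count left-hand rows whose join partner lives on a different slice.

    Zero means the join is co-located. Anything else has to be moved
    (redistributed or broadcast) before the join can happen.
    """
    home = {}
    for spu, rows in right.items():
        for row in rows:
            home[row[join_column]] = spu
    return sum(
        1
        for spu, rows in left.items()
        for row in rows
        if home.get(row[join_column]) != spu
    )


# Hypothetical data: an operational table and a small map table.
facts = [{"fact_id": i, "map_key": i % 10} for i in range(1000)]
meta = [{"map_key": k, "label": f"m{k}"} for k in range(10)]

meta_by_key = distribute(meta, "map_key")
mismatched = rows_off_slice(distribute(facts, "fact_id"), meta_by_key, "map_key")
colocated = rows_off_slice(distribute(facts, "map_key"), meta_by_key, "map_key")

print("rows needing movement, different distribution keys:", mismatched)
print("rows needing movement, common distribution key:", colocated)
```

With a common key, every row already sits next to its join partner, so nothing moves. That is the whole trick, and no amount of query rewriting changes where the rows live.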

 

Now one may ask the obvious question, and please do so now: Why don't you just build out some user-facing tables and forget leveraging the operational tables? After all, we don't build our non-Netezza reporting systems on top of operational data, do we? We build out dimensional models and other handy structures to positively affect the user experience and simplify the flow (and the maintenance). This functional decoupling is a mainstay of reporting environments. (Okay, the next entry will focus on this). But in this case, suffice it to say that the owner of the machine had placed down a hard mandate on disk utilization. At no time could we foray into replicated detail, or even summary of detail without a plan to access the operational detail on a drill-down and the like. Interestingly, the required reporting tables would have only cost mere fractions of the cost (on disk) of the time/labor and effort put into making the operational tables viable. This is why it deserves its own treatment in a separate rant - er - essay. Stay tuned, and don't touch that radio dial.


Back to the drama - A telltale symptom that we're doing something wrong is when we start down the engineering path. It's an appliance. We don't engineer toasters, blenders or laundry machines. But the difference here seems to be subtle. It's not. In this case, the culprit was the broadcast, something to be eliminated rather than managed. And no amount of creative query hoop-jumping would overcome this. Get the joins onto the SPUs. It seems obvious to those who have been around the machine for a bit. But for those who have not, the learning curve is upon them. Be patient with them for as long as it takes to get it right. Once we have a believer, we'll never have the conversation again. As long as we stay in a theoretical zone, however, expect them to stay in the spin cycle. This is like many things scientific. Seeing is believing.

 

Whenever I (and others like me) observe a ritual of performance engineering, each participant holding out the hope that "just one thing" will offer a stratospheric boost so they can all wipe their foreheads and go home - this is the surest sign of one of two things: Either the data is poorly configured and is causing the queries to be inefficient, or the data is properly configured and the machine does not have enough physics to achieve the goal. If the focus is on query engineering, they are wasting time. If the focus is on data engineering, at some point it will reach a "diminishing return". Either the machine has the power or it doesn't. Time to switch to Netezza, or if using Netezza, time to add some physics (a frame or two) to make it happen.


Moral of the story: Performance is found in the physics, not the carefully engineered queries. If we find ourselves "engineering" our queries for performance reasons - we should take a step back, take a deep breath - click our heels together and say softly: "There's no power like SPU power. There's no power like SPU power." Repeat as necessary.

 

And pay no attention to the man behind the curtain. I'll bet he and Orson Welles never even met.

0 Comments Permalink

The sky did not fall in for Netezza as Oracle predicted. Instead, we've gone from strength to strength. TwinFin extends our lead and at our current run rate, TwinFin customers will soon outnumber Exadata’s. My sense is that Oracle has fewer than 25 Exadata installations although I suspect many of these are not paid-for customers in the strict "hand over the money" sense of the word. For a company with the inside track on over 250,000 customers, 25 installations in a year is hardly stellar success -- it's what a statistician may call a "rounding error." This is also low when you consider that in the same time period over twice this number of Oracle customers churned to Netezza, presumably pausing to consider Exadata on their way.

Oracle's challenge is greater than people imagine: their portability advantage is now their Achilles’ heel. Portability across different hardware systems -- Oracle's first killer app -- is a hindrance in the appliance model whose advantage lies in very specific software designed for a specific hardware configuration. For Oracle, generalization means that all the baggage that comes with an Oracle database for performance tuning and data management -- such as results caching, buffers, indexing, partitioning, tablespaces (i.e. the stuff you don’t need in Netezza) -- is standard in Exadata. This increases costs, slows throughput and makes for the very same unwieldy solution people are trying to escape.

Oracle is responding to Netezza (and Teradata) in the same way that the incumbent database vendors responded to Oracle in the 1980s: they are bolting-on "me too" features to existing products and hoping their customers won't be tempted by alternative solutions. Cullinet's IDMS/R is a great example. It was their attempt to be seen to embrace the relational model by adding relational features to their CODASYL database. But history shows no one was convinced and it was too little too late. Another spanner jamming Oracle's works is that the value of thoroughbred appliances is so visible and easily realized, their "PL/SQL standardization" trump card is easily discredited: when the business suffers, there's not much to discuss!

So what does the coming year hold for Exadata? Well, a new hardware platform announcement sends Oracle back to the starting line. There's a switch in tone from 'the world’s fastest data warehouse that can do OLTP' (circa 2008) to 'the world’s fastest OLTP database that can do data warehousing.' But regardless of anything else, a sure-fire thing is that Oracle will claim victory and performance leadership. The full-page ads are probably already commissioned.

1 Comments Permalink

A number of months ago I wrote about how the World Tour Awaits, and all the buzz in the air about the new TwinFin. I was honored to moderate the best practices forums in North America and London, and many thanks to the rather effervescent participation by the panelists. Kudos goes out to David from Brightlight, David from Edge Associates, and Jeff from Quantisense, each of whom have those over-the-top kind of personalities that turn the session into an "experience" more than just a discussion.

 

But all in all, the sessions flew like lightning. If any of you have additional questions or insights, may I invite you to post them here on the Netezza community. The discussion never ends, you know.

 

It is interesting to note that many of the questions coming from Enzees in every venue, struck a common chord and followed a common thread. In that Enzees are unique and have a rarefied problem and solution domain. And are able to approach it with the confidence of Spartacus in the arena, or Jackie Chan on the streets of New York. Comments often began with, "I have a table with <seventy, eighty, ninety, your number here> billion records and I want to..."  I mean, seriously, those on the outside lookin' in will also look askance at such an opening statement, and marvel at the ensuing, rather casual discussion about it. Nothing is casual about these data sizes, on the outside world.

 

It goes like this: Bring it on, baby. Because the question of whether it can be done is behind me, now I just want to know how to do it well. The audacity!

 

Kudos also for the Enzee crowd members who injected their insights and wisdom into the discussion, freely sharing their technical and political battleground knowledge for the betterment of all. This was not the same as "iron sharpening iron", because at this scale of data processing, iron crumbles. No, this was a lot like titanium sharpening titanium, and was exciting to participate in, to say the least.

 

Many thanks also to Netezza for inviting me to the tour. It was a whirlwind to be sure, but well worth the ride. Tim, Olga, Courtney and Karina made it easy for me (actually all of us) to participate. Thanks to all for your hard work and a World Tour Well Done!

0 Comments Permalink


As the sunrise peeked over the horizon, it cast long shadows over the four cars awaiting the break of dawn. Stretching before them, the expanse of the salt flat beckoned, nay taunted them, to accelerate across its ancient surface. Not caring for the winner or loser, it merely provided a level playing field for them to test their wares and technology. But yawned at the futility of the race itself. The salt flat had always been, and always would be. Come one, come all, it invited daily, almost mockingly.

 

The leader for team-Exa sat in his racer's driver seat, eyes closed. When he felt the warmth of the morning touch his face, he raised an eyelid to examine the time. Now thirty minutes from flag-down, the sun would still be at his back when he won the race. And he would win the race.

 

The lead for team-Terra pushed back into her driver's chair to stretch her legs as her eyes fluttered open. She glanced toward her left to the Exa racer, gleaming in the morning sun, and then to her right at the NZ racer. Its plain black lines and nondescript exterior, she knew, hid the power under its frame, and it was nothing to be trifled with.

 

The fourth car on the end, entered in the eleventh hour, was a plain vanilla Volkswagen Beetle with a rocket engine attached to its backside. No frills, no nonsense and nothing hidden. Five men from Redmond had delivered it last evening. They hadn't even had time to take a test run on the flat.

 

Minutes later all four drivers and their lackeys met in front of the four cars, partly to wish each other luck and partly to offer last minute trash-talk. Dominic Toretto, the driver of the NZ machine, ran his hands over his bald scalp and rubbed it vigorously, as if massaging the sleep from his head, then yawned and said, "Okay gentlemen. We're fifteen minutes from flag-down. Anyone want to back out? I swear we won't hold it against you."

 

"Dude," laughed Excel, the driver for the Redmond machine, "In your dreams. I have investors watching."

 

"As do I," smiled Tara, the only female driver, who would command the blue-streamlined Terra racer, named for its ability to master the earth and its elements. "We're all in this for keeps." She batted her eyes and tilted her head flirtatiously, "You want to see under my hood?"

 

"Out here in the open?" Toretto laughed, drawing chuckles from the others, "Sure, let's see what you have."

 

She ignored the innuendo and pointed her keytag toward the Terra racer and pressed a button, causing both side doors to slide away and the hood to pop open. Toretto strolled over to examine the engine. He'd seen these before.

 

"Lot of power under that hood," he quipped.

 

"Yeah," she said, expecting a bit more enthusiasm for her machine. She wouldn't find it among any of these drivers, though. They lived and breathed adrenalin, and knew as much about her machine as she did. And weren't in denial about its weaknesses, either.

 

"Looks plain," said Jeff, driver for the Exa-car, "And as you can see, not enough control."

 

"So let's look at yours," Toretto said, a twinkle in his eye.

 

As they sauntered to the next car, Jeff's lackey whispered in Toretto's ear, "We've radar-mapped the entire flat between here and the finish line. Every bump is programmed into the machine. You think that's a competitive advantage?" He slapped Toretto on the back and laughed loudly.

 

"Bumps don't matter," Toretto muttered, with the strength and experience of someone who would know.

 

Jeff spun to face him, "What was that?" he laughed, "Bumps don't matter. Did you hear that?" he looked around him to the others, with his lackey already laughing, "He says bumps don't matter." He crossed his arms, "Would it matter to you if I said that ignoring bumps at these speeds is like a death wish?"

 

"No."

 

"No, what? No it won't matter what I say, or bumps still don't matter?"

 

"Either way," Toretto said with a wry grin, "Bumps don't matter."

 

Jeff threw up his hands in frustration as Toretto poked his head into the Exa-racer's driver side window. Jeff asked, "What do you think, huh?"

 

Toretto examined the interior, laid out like a Boeing 757 cockpit. Three LCD screens loaded with controls and meters, flashing lights all around the dashboard and dozens of knobs and gears. "Got a lot of moving parts," Toretto sighed, "Think you'll need all that?"

 

"No more, no less," Jeff said, "Our investors are very demanding. All the tires and wheels are measured for pressure and impact, the dual-redundant monitors compensate for any detected differences, and the pre-mapped radar anticipates every bump and turn."

 

"It's a salt flat," Toretto grinned, patting him on the side of his shoulder, "There are no turns. And bumps don't matter."

 

Jeff nearly bit his tongue, but instead smiled and shook his head while Toretto continued his examination.

 

"Looks to me like," Toretto finally said, "You decked out the car just for this ride."

 

"Yeah. So?"

 

"Well, it might work for a salt flat under controlled conditions, but it's not streetworthy."

 

"We're not testing on a street," Jeff fired back, "All that matters is who makes it to the other side."

 

"Really?" Toretto raised an eyebrow, "You think people will be knocking on your door to buy a few of these to come out here to run on salt flats?" He laughed, "Your investors will expect to see the performance you show here," he pointed toward the West, "Out there. Or they can't make any money. Optimizing your car, just for this test, doesn't mean anything."

 

"We'll see," Jeff snapped.

 

"I'd like an assessment of my car, if you don't mind," said Less, the driver for the Redmond car.

 

Toretto simply said, "Not much different from the Exa. Except you don't make any bones about the fact that you've strapped a jet engine to an underpowered car. You think those wheels and frame can handle the stress of the race? We'll see how you do on the flats. That's all I can say."

 

"Gentlemen," intoned a voice all around them, coming from well-placed speakers, "We're five minutes from flags-down so anything you need for warm-up, do it now."

 

Jeff punched a button on his keytag to remotely initiate his computers into a final pre-race system check. Toretto slowly strolled back to his car, opened the door and flopped into the driver's seat. His lackey Mark, younger than he but the sharpest of his crew, brushed back a long black lock of hair and positioned it over his ear, then silently joined Toretto in the passenger seat. After Toretto punched several buttons to initiate the engine, Mark could no longer hold it in.

 

"Don't you think we're about to get smoked here?" Mark said, glancing to the Exa car, "I mean, radar mapping, all those controls and - I mean - "

 

"I know what you mean," Toretto said casually, engaging the first gear, "Just trust the machine."

 

"I know what your philosophy is," Mark sighed, shaking his head, "Put it all under the hood, make it self contained, but what if you need to get creative in the middle of the race?"

 

"Would one of our customers have the option to get creative?" Toretto asked, allowing the car to roll ahead to the starting line. "Do we let them add stuff to the machine? Do we require them to know a lot about what's under the hood?"

 

"No, but -"

 

"But what?"

 

"I don't know what! It just seems like they have more, you know, more -"

 

"More what?"

 

"I don't know what! It just seems like more."

 

"More to break. More to maintain and watch - when the real mission is to go fast on the flats. And everywhere else."

 

"You think we'll win?"

 

"Trust the machine."

 

Presently a racing judge appeared with a flag in each hand, and took his place between the two middle cars. Watching the clock count down, he raised the flags high, then started counting down loudly.

 

"Hold on to your chair," Toretto mumbled, "It's a little rough out of the gate."

 

"I'm ready," Mark said, holding tightly to the chair, pushing against the floorboard to press his back into the chair's leather. He'd made the mistake of eating a meal just prior to the first test runs the week before, and had spent an hour cleaning his half-digested meal from the dashboard and interior windshield. This time, he'd fasted for twenty four hours. Nothing remained in his stomach, he was sure of it.

 

Over in the Exa-racer, Jeff had strapped himself into his seat, and his onboard systems had just finished their run-through only seconds before the flags would fall. The carefully tuned machine would master the flats today. The machine, and his name, would soon be synonymous with extreme speed and power. He would win this race. He was sure of it.

 

Each driver sat in breathless anticipation as the judge counted down to zero, and watched almost in slow motion as the flags went down. But that's when anything "slow motion" utterly ended. Each of the machines engaged their own forms of acceleration. The Redmond machine driver simply turned a valve and flooded the rocket engine with fuel. Its ignition was like an explosion of TNT and it blasted from the line like, well, like a rocket.

 

"They're getting ahead of us," Mark complained as the NZ car's acceleration pulled him deeper into the leather.

 

"It's just a side effect of packaging," Toretto said, his pulse rate not having changed one beat faster, "Just be patient."

 

Without warning, the Redmond machine sputtered and fishtailed its wheels as they passed it. Mark spun his head as the Redmond machine flew past them and they left it in a wall of salty dust. He then looked back at the Exa racer, and to Jeff's eyes riveted forward, set like flint against the Western sky.

 

"How did you -" Mark began.

 

"Know it would run out of power?" Toretto lifted one side of his mouth, "Get real."

 

"We're still ahead of the others," Mark noted pensively, glancing around toward Tara, who seemed oblivious to everything around her.

 

"It will stay that way," Toretto said simply.

 

"So that's it," said Mark, "We stay in these race positions until the end?"

 

"No, they will think the race is over soon, and make their move."

 

Suddenly Tara's car started gaining ground, like something pushing it from behind. Mark saw her pulling up behind them fast, and faster still, "She's coming. She's coming really fast."

 

"Naah, she's just changed her fuel mix. Thinks going from 55/50 to 25/50 will actually matter."

 

Mark spun toward the Exa racer, now closing the distance, "He's coming too. Are we slowing down, or are they -"

 

"Making their move," Toretto said quietly.

 

"Aren't you going to do something? They're gaining!"

 

"Let them burn out," Toretto chided as the two competitor machines passed them and gained their respective leads, "And besides, the race is won in the architecture, not the gadgets."

 

"What difference does it make if we're behind?"

 

Toretto watched as the odometer slowly ticked over, and over again. "We're almost there, are you strapped in?"

 

"Yes, I'm strapped in, but almost where? Where is there?"

 

"There," Toretto pointed to a tinted stain in the salt flat, and watched the odometer tick over to the prescribed reading. "Here we go. Hold on."

 

"What are you doing?"

 

Toretto ignored him and pressed a switch on the dashboard. They could hear a whining mechanical noise coming from the rear as two gleaming foils slowly rose from the tail of their accelerating vehicle.

 

"What are those?"

 

"What did the Exa driver say?" Toretto reminded, "That at these speeds, bumps count. Actually, at these speeds, what counts is stabilization."

 

"How will those make us more stable? It looks like they're slowing us down!"

 

"Brace yourself," Toretto said, and punched the second button. "Accelerators engaged."

 

In that instant, the air inside the car seemed to grow thin, and the air around them seemed to radically change, buffeting the racer with increasing intensity. Then Mark felt it, a pulling, g-force of acceleration as it pressed him deep into the leather of his chair, and caused the blood to run from his face and into the back of his head. With a whoosh-whoosh, they passed the other two cars as though they were standing still.

 

Jeff watched helplessly as the NZ racer flew past them. Upon glancing down and across the controls, all of their gauges were standing at the max, pinned almost into the red line. Even if he could make it go faster, they would incur irreversible structural stress, and possibly crack apart on the flats, spinning into a million pieces. Jeff furiously spun dials and adjusted controls, attempting to squeeze just a bit more power from the machine. If he couldn't come in first, second place would have to do. Jeff now cursed his own racer as it entered the NZ racer's dust trail. His investors would be livid.


Tara furiously slammed her palm into the steering wheel, repeatedly cursing as the NZ car disappeared into the distance. Switching her fuel mixture from 55/50 to 25/50 had made her car lighter and more agile, but had not offered the additional speed. At least, not that kind of speed.


Then something rushed toward both their cars. As the NZ racer crossed the sound barrier, a shockwave ripped up the surface of the salt flat and met them head-on. The Terra car was more stable, so the wave simply bounced its wheels. The Exa car was not so lucky. When the shockwave hit, the passengers heard the sonic boom before they felt it lift the racer's front end and flip it backwards, spinning it in a barrel-roll as it tried to find its footing again. Its back wheels landed first, then the front, causing the back wheels to lift off again, then the front, rocking violently back and forth like this at least five times before the right front tire blew out, sending the vehicle into a wild spin.

 

Jeff could hear and feel the car's structure releasing and popping from the stress. At this speed and rate of rotation, the Exa-racer's uncontrolled spin would rapidly develop enough centrifugal force to turn human brains to scrambled eggs. Jeff felt the red-out coming as an automatic release triggered and both their ejection seats activated, separately catapulting them hundreds of feet into the air. Their parachutes deployed when they reached apex, and Jeff witnessed his car disintegrate on the salt flat.

 

Jeff lifted his gaze into the West, watching the NZ car disappear like a speck in the wake of its own shockwave, churning up the ground behind it. It would likely reach the finish line before his parachute even touched him to the ground.

 

Toretto casually glanced to his rear-view mirror, watching the salt flat behind him, practically corrugating the ground in his wake. "Hmmm," he finally said, "Maybe bumps do count. Just not for us. And I don't mind giving them a bumpy ride." He settled into his seat, "No sir." And with that, fully understood the frustrated rage building in the minds of his competitors, and soon their investors.

 

And more fully understanding the difference between being fast, and being furious.

0 Comments Permalink

Hi

 

I am using the SQL below to load data from a flat file into Netezza:

 

INSERT INTO A
SELECT * FROM EXTERNAL 'c:\data\a.txt'
USING ( BOOLSTYLE '1_0' COMPRESS FALSE CRINSTRING FALSE CTRLCHARS FALSE DATEDELIM '-' DATESTYLE 'YMD' DELIMITER 'TAB' ENCODING 'latin9'
FILLRECORD FALSE IGNOREZERO FALSE LOGDIR 'c:\data'
MAXERRORS 1 MAXROWS 0 NULLVALUE 'NULL' QUOTEDVALUE 'NO' REMOTESOURCE 'jdbc' REQUIREQUOTES FALSE SKIPROWS 0
SOCKETBUFSIZE 8388608 TIMEDELIM ':' TIMEEXTRAZEROS FALSE TIMESTYLE '24HOUR' TRUNCSTRING FALSE Y2BASE 0)

 

I want to do the reverse: select the data from the table and write it to a flat file.

Could someone please help me?

 

thanks

Babu

3 Comments Permalink

Hi .

I am a .NET [C#] developer and am using Netezza as the database in my latest application.

I wrote a small application to get data from Netezza using the ODBC driver.

It returns rows, but partway through I get the message below.

 

ERROR [08S02] Unexpected protocol character/message

System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
at System.Data.Odbc.OdbcDataReader.Read()

 

Source is nsqlodbc.dll.

 

Could someone please help me?

 

thanks

Babu

0 Comments Permalink

Rick Deckard wiped the sweat from his brow as he holstered his high-powered weapon. Lifting the communicator from his belt, he muttered several codes and closed the transceiver.

 

"Skin jobs," he said to himself, surveying the replicant sprawled on the floor, and amazed at the technology's ability to mimic the most complex entities on earth. He softly kicked the replicant's front panel, observing the large hole his weapon had created in the technology's logo. The half-remaining "T" and the "ata" told him he'd scored big. Another wannabee down for the count.

 

His communicator buzzed for attention. He lifted it, beeped-in and said "Deckard" like he really didn't want to be bothered, but knew such sentiments were useless. Apparently more replicants were on the prowl, having stolen their way into enterprises with myopic POCs, NDAs and a variety of other three-letter-acronyms. He so longed to go Solo.

 

"We've spotted another one," said the dispatcher on the other end, "People are dying."

 

"Dying?" Deckard raised an eyebrow. "That's new."

 

"Dying to get their jobs back after a misfired deployment with a replicant," said the dispatcher, "Get with the program Deckard. You were called from retirement, but you can't be this rusty. Not with this much at stake."

 

"You wanna come out here and be my backup?" Deckard shot back, irritated, "It's easy to criticize from behind a desk."

 

"Keep on talkin'," laughed the dispatcher, "But the day's slippin' by - and so will your replicant if you don't get on the stick."

 

"Yeah, yeah, whatever," Deckard beeped out, sighed and replaced the communicator. The steam rising from the replicant's body reminded him of why his work was important. Stolen money. Stolen dreams.

 

Less than fifteen minutes later, Deckard found himself crouching behind a stack of crates, one eye on the replicant and one eye on his pistol as he wrested it from its holster. Time was, he could draw, shoot and replace it before a replicant could take one mechanical breath. Now, countless CPU clocks dishonored his rustiness, and he needed a new weapon if he ever intended to win.

 

Too late he realized that he'd spent too much time fiddling with the pistol, and upon looking up, found the replicant nowhere in sight. In that moment, he felt the replicant's mechanical breath on the back of his neck, and he whirled to confront it.

 

"Deckard!" shouted the replicant as he delivered a hard backfist, reeling Deckard over the crates to fall hard on the other side. "You should never have returned! You know I can't be beaten in toe-to-toe comparison!" He then split the crates apart and tossed them to each side.

 

Deckard had already reached for his pistol, but it had been just loose enough to fall from the holster when the replicant had ambushed him. Glancing around feverishly, the fear rose in his throat as the replicant took one step forward, grabbed him by the shirt and shook him once. He pulled his fist back and Deckard could hear it hitch, meaning that some special spring had latched in preparation for release, and if the replicant's fist now threw a punch, the impact would take his head clean off his shoulders.

 

"Sleep tight," said the replicant wickedly.

 

But the punch never came. Instead the replicant's eyes widened, his breath shortened and his strength seemed to instantly leave his body. He dropped Deckard like a sack of potatoes, and Deckard wasted no time in scrambling clear. The replicant fell to his knees with a bone-crunching impact, his eyes vacuous, and fell forward with a whump.

 

Deckard glanced around for his weapon, only to be met face to face with another, much younger Blade Runner, holding a smoking weapon, clearly more advanced than his own.

 

"I'm TwinFin," said the Blade Runner meekly, pointing to the twitching mass that was the replicant. "I see you've just run across a more advanced model than you're accustomed to."

 

"Stronger than before," Deckard rasped, wiping the sweat from his face with both hands, "It's been awhile."

 

"Yes," he said, "This one's name is A-Data. He is the most advanced of his kind. A front-loader and high-volume storage capability. Also fast response. Almost as fast as yours, even with age."

 

"Thanks," Deckard responded flatly, unamused, "A-Data, eh?" he smirked, tapping the replicant's leg with his foot, "Well, now he's just an ex A-Data."

 

"True," smiled TwinFin, "But you'll need more power if you want to stay ahead of them," he held out his weapon, a POC-killer if ever Deckard had seen one. On the weapon's barrel, in old-Gothic script, he read the weapon's name "The Closer."

 

"Nice," Deckard quipped.

 

TwinFin suddenly produced an auto-ject unit with the "enzee" logo emblazoned on it, snatched Deckard's hand, and before Deckard could object, injected the enzee accelerant into Deckard's bloodstream.

 

"What the?" Deckard now snatched his hand back, but suddenly felt the chemical's surge of power, "What's in that stuff?"

 

"Secret sauce," TwinFin smiled, "You'll be five-X or more faster response than they are. Your next replicant will go down for the count before the count even begins."

 

"Tight."

 

"You have no idea," he smiled, "And by the way, I'll be right behind you."

 

"I hear some of them are looking for their makers," Deckard posited.

 

"Wouldn't you?" TwinFin said, "I'd sure wonder why I was made that way. Changed from one purpose to another in the middle of my cycle."

 

"I wonder if anyone has noticed, that the replicants are always trying to be like us?"

 

"It's because we're the only standard they know, by which they are measured."


"I also wonder," mused Deckard, "If these replicants dream of electric customers."

0 Comments Permalink

Hi, good morning all,

 

We are syncing data from SQL Server to Netezza, and some special characters are not converting correctly.

 

For example:

 

MC21-‘90s [SQL Server]               MC21-?90s [Netezza]

Can someone please help me?

thanks
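
One possible explanation, offered as an assumption since the post does not say which column types or load tool are involved: the ‘ in the SQL Server value is a typographic quote (U+2018), and if the Netezza side stores the column in a single-byte character set that has no such character, the conversion substitutes a question mark. The short Python sketch below only reproduces that substitution; the function names are made up for illustration and nothing here calls SQL Server or Netezza.

```python
# Illustration only: hypothetical helper names, no database access.
SOURCE_VALUE = "MC21-\u201890s"  # the value as shown on the SQL Server side


def to_single_byte(text: str, charset: str = "iso-8859-15") -> str:
    """Round-trip text through a single-byte charset.

    Characters the charset cannot represent are replaced with '?',
    which is what Python's errors="replace" handler does on encode.
    """
    return text.encode(charset, errors="replace").decode(charset)


def unmappable(text: str, charset: str = "iso-8859-15") -> list:
    """Return the characters in text that the charset cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(to_single_byte(SOURCE_VALUE))          # MC21-?90s
print([hex(ord(c)) for c in unmappable(SOURCE_VALUE)])
```

If that turns out to be the cause, the usual remedies are to load the column into a Unicode-capable type on the target, or to replace typographic quotes with plain ASCII ones before the sync.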

4 Comments Permalink

"Blade?" Hannibal King touched the sleeping warrior gently on the shoulder, "Wake up, dude."

 

Blade raised one eyebrow, then slowly opened his left eye. Unafraid of the day or night, the warrior moved his hand ever so slightly to verify the presence of his sword. King could see the tautness of Blade's shoulder sinews as he slowly shifted his weight on the pallet.

"This had better be good," Blade rasped, "I was in the middle of a dream. Kickin' bloodsucker tail," he wiped his hand over his face as though it would wipe away the sleep from his eyes, or the fatigue in his body, but it did neither.

 

"We have some news," King said with a low voice, "The upgrades have arrived."

 

Blade's other eye slowly opened, "Oh?"

 

"Yeah," King laughed, "You're gonna like it."

 

"I'll be there in five," Blade said, half of him wanting to roll over and sleep, and half of him curious about the upgrades. Blade always had a half-and-half approach to life. The bloodsuckers hated him for it.

 

A number of minutes later, the warrior strolled slowly into the main atrium of his personal lair, only to find it strewn with boxes, styrofoam and bubble wrap, "What's all this mess?" he rasped.

 

King appeared from behind one of the largest boxes, a vertical package over eight feet tall, holding a swatch of bubble wrap, "Don't you just love this stuff?" he quipped, violently popping several dozen bubbles with vigorous manipulation.

 

"Stop that!" Blade commanded, ever-despising King's cheeky nature, "Tell me what all this is."

 

"All this," King pointed to a far wall where the apparatus had been installed, "is just for you. At your service."

 

"Blade servers, eh?" Blade took two short steps toward the machines, "What does it do?"

 

"Only slices, dices and makes Julie-Anne cry!" King cackled.

 

Blade was not amused.

 

"Okay, seriously," King began, "Recall some of our - er clients - had some run-ins with the bloodsuckers? Their problems were really that they were working with too little information. Or that it was inaccurate, or not arriving in time. The BI bloodsuckers swoop in to save the day."

 

"I hate bloodsuckers," Blade seethed.

 

"Oookay, so they fell prey to the wiles of the bloodsuckers, promising a better mousetrap and all that."

 

"They always promise."

 

"Moving right along, they promise but don't deliver. Here's where we come in, and help them get on the right track."

 

"How do these machines do that?"

 

"The Blade servers include a special sauce - "

 

"Special sauce. Is it red?"

 

"Uhh, no. But it's all painted in your favorite color. The better part is that you can use this machinery during the day to find opportunities, and still let it work at night, you know, when you're - uh - out."

 

"Hunting bloodsuckers."

"Uhh, yeah, so let's focus here. The new server has a special accelerator that basically lights up the night."

 

"Is it ultra-violet light?"

 

"No, but it's ultra-clear light. The kind of light we need to shine on business priorities, SLAs and how to leverage the machine at the enterprise level. You know, best practices."

 

"I don't need any practice. When the sun goes down - "

 

"Okay, look," King interrupted, "The accelerator sits on the blade and does all the analytic streaming work. The server then allows for cache RAM to sit between the disk drives and the processor, so we can keep stuff in memory longer."

 

"I have a long memory for bloodsuckers."

 

"And some clients," King rolled his eyes, "May need long memory for lookup tables, oft-used dimensions and the like."

 

"Are you starting all that other-dimension talk again? I thought I'd made a deal with Stan that we would never introduce - "

 

"No, not alternative dimensions in spacetime," King smirked, "But multidimensional analysis."

 

"I don't follow."

 

"Data analysis."

 

"To what purpose? What are we looking for?"

 

King thought about the question for a moment, realizing that the answer could capture Blade's attention or lose him forever. He finally said "Bloodsuckers."

 

Blade's eyes flashed, "If this will help us find the bloodsuckers, why do we only have one? Why not more?"

 

"Now, now, we should start small and grow tall - "

 

"Platitudes," Blade huffed, "Time is short. Will it find the bloodsuckers or not?"

 

King knew that when he said bloodsuckers, he'd meant the broken processes and data that drain the lifeblood from a company, "Yes, it can help us find them."

"Good," Blade finally said, slowly strolling toward the machines. He stared at them for a long moment and finally said, "You work for me, now."

 

"Uhh, Blade," King said, "They can't hear you, they're machines."

 

Blade didn't say anything.

 

"Oh, and I have this," King produced a small metal plate and held it out to Blade.

 

The warrior turned and stared at the object, curious as to its nature. "And this?"

 

"Is a Final Interrogation Node," King said, "For use when you are about to dispatch a bloodsucker."

 

"How does it work?"

"You wrap the wrist-strap here," he applied the strap to his own wrist, holding the plate in his hand, then flicked his wrist. The plate flew to the nearest stone column, remaining connected to King's wrist with a tether made of high-tensile filament. The plate sank into the stone with a dull rrrriiiiinggg. King then flicked his wrist again and the plate dismounted, the tension in the tether returning it immediately to his open palm.

 

"That was fun, but what does it do, really?"

 

"When you're done asking questions that anyone can get answers for, the FIN takes it to the next level. And if you have one in each hand - "

 

"Twin Fins, very funny."

 

"You'll still get the answers you're looking for."

 

"I'll always get the answer I want eventually."

"Uhh, well, isn't that what the bloodsuckers say? Anyone can give the right answer slow. But these," he held up the FINs, "get the right answers faster than anything."

 

"Even faster than me?"

"Faster than Blade alone," King smiled, "Yep, even faster than a blade and all its servers. You still need the FINs and special sauce. Bloodsuckers don't have those."

 

"Competitive advantage," Blade said in a low whisper, "I like it."

0 Comments Permalink

A loyal customer alerted us to an Oracle blog by Jean-Pierre Dijcks earlier today that showed the Oracle FUD machine is fully revved-up and ready to go. I'd like to offer a rebuttal; however, in the interest of not intruding on Jean-Pierre's entry with an overly-long comment, I've just put a short response on his blog post with a pointer to this one.


Misconceptions and Misunderstandings, or Errors and Plain-old FUD?

I’m writing to correct *just a few* of the misconceptions about what is really important in high-performance, scalable data warehouse systems, along with the errors and plain-old “competitive FUD” points, in Jean-Pierre's posting earlier today. We certainly have posted some information recently about the TwinFin product and Curt Monash’s postings late Thursday provided more info. If his readers are interested in learning more, or even signing up for a “Test Drive”, they should visit www.netezza.com.

First off, I think this is a “banner day” for Netezza. We believe that TwinFin (and the other products in the new product family) extend both our performance and price-performance advantage over our competitors. We stand by our marketing statements that we regularly demonstrate 10-100X performance advantages over our competitors, particularly competitive offerings of the major incumbent DW system vendors (“Just who are those incumbents?” Jean-Pierre's readers may ask. Well, let’s just say that we see Oracle as the incumbent system and/or a challenger system in over 50% of our deal flow.).

Regarding his claims about DBM being “faster than Netezza” (and I can only assume he meant at “real” data warehouse tasks) - we’re ready whenever Oracle feels up to actually taking one of their Database Machines onsite to a customer for a fair, open customer benchmark. So far, Oracle have been, shall we say, “a little reticent” to do on-site benchmark testing against Netezza.

Next, given the large number of incorrect points in the original posting, I think perhaps that just a few of them will be useful enough for readers to get the gist of just how far afield some of the ‘facts’ are:

  • “It all comes down to data scan rates per rack”: If all of data warehousing really boiled down to full-stream data scans (as if the entire world of analytics relied on “select count(*) from lineitem” types of queries), then we could all measure “goodness” by how many GB/sec of data could be burst-scanned in our systems. But that’s not the case. So we build Netezza’s data and analytic appliances to deliver the best possible overall performance at the best price and power requirements. As a consequence, and following from those same numbers as-posted, a single rack of TwinFin can process (not just scan) about 400 million rows of data per second. That’s process, as in: “scan, decompress, project, restrict, AND join, etc.”. Need more processing firepower? Netezza’s system performance scales linearly with the addition of more S-Blades: at the low-end, the TwinFin 3 can deliver as much as 100M rows/second of processing horsepower, while the TwinFin 120 can provide you with 4 billion rows/second. Does a system that still relies on using SMP-based servers running “plain old” Oracle 11g RAC scale similarly for data warehousing?


  • “Non-open Linux running on FPGAs”: I’m really not sure what (if anything) was meant by this, but saying that Netezza’s FPGAs “are apparently running non-open Linux” is oxymoronic on at least two different levels (FPGAs don’t typically “run” an OS and, “non-open Linux” - really?).


  • “User data & compression”: I also enjoyed the accounting of all that “user data” available to DBM users in the Oracle table and the various comments about compression. When Netezza quotes user data capacities in our systems, the numbers reflect real raw user data space, not space that will be further reduced because of required indexes in an attempt to boost performance. Furthermore, Netezza’s compression & decompression techniques allow us to extract “pure performance” from their use. Rather than relying on CPU cycles to decompress the data before we can process it any further, we let the FPGA engines decompress the data, on-the-fly, as fast as it streams off the disk drives. Can Oracle make either of those claims?


  • “Tolerating node failures without downtime”: In perhaps the most bald-faced inaccuracy, the Oracle blog claimed that Netezza “continues to lack the ability to tolerate node failures without downtime”. This I can only chalk up to pure competitive “FUD-ism”, as our capabilities in this area have been quite strong throughout the four generations of Netezza appliances and are further strengthened in TwinFin. Netezza is a fully-redundant system with no single point of failure, even in our smallest systems. Failover in the presence of failures of the disk drives, S-Blades, internal networking or host processors (in short, everything) is automatic and done in-service, with hot-swappable replacement throughout.


  • “Appliance simplicity”: One thing Jean-Pierre didn’t address, and on which it might have been humorous to see his take, is the notion of “appliance simplicity” - basically the ability to build, support and maintain large to very large-sized data warehouses, with heavy workloads, with no or minimal tuning, partitioning, indexing or other “performance duct tape” required. Routinely, this capability in the Netezza systems is what delights our customers most and we have customers managing systems with several hundreds of terabytes of user data (not indexes + data, mind you - real data) with fractions of an FTE (full-time employee) devoted to them.
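
As a rough check on the linear-scaling figures quoted in the first bullet, the arithmetic can be sketched in a few lines of Python. The sketch assumes that the number in each model name (3, 120) counts the processing units the post calls S-Blades; the post does not state this, so treat the per-unit rate and the implied single-rack size as illustrative only.

```python
# Throughput figures quoted in the post, in rows processed per second.
QUOTED_ROWS_PER_SEC = {"TwinFin 3": 100e6, "TwinFin 120": 4e9}
SINGLE_RACK_ROWS_PER_SEC = 400e6


def units_in(model: str) -> int:
    """ASSUMPTION: the trailing number in the model name is the unit count."""
    return int(model.split()[-1])


def rate_per_unit(model: str) -> float:
    """Rows per second per unit implied by the quoted figure for a model."""
    return QUOTED_ROWS_PER_SEC[model] / units_in(model)


def projected_rate(units: int, per_unit: float) -> float:
    """Linear scaling: total throughput is unit count times per-unit rate."""
    return units * per_unit


small = rate_per_unit("TwinFin 3")
large = rate_per_unit("TwinFin 120")
# Unit count a 400M rows/sec rack would need at the TwinFin 3 per-unit rate.
implied_rack_units = round(SINGLE_RACK_ROWS_PER_SEC / small)

print(f"per-unit rate (TwinFin 3):   {small / 1e6:.1f}M rows/sec")
print(f"per-unit rate (TwinFin 120): {large / 1e6:.1f}M rows/sec")
print(f"units implied by 400M rows/sec per rack: {implied_rack_units}")
```

Both quoted endpoints work out to the same per-unit rate, which is what linear scaling predicts; under the stated assumption, a 400M rows/second rack would correspond to 12 units.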


I hope that clears up some of the misconceptions. If any of Jean-Pierre's readers or Oracle customers would like to see or hear more about TwinFin for themselves, we definitely would invite them to stop by our booth (#207) at TDWI or come to one of our regional Enzee Universe events coming to a location near you.

0 Comments Permalink

Just when I was beginning to think Netezza's competitors were creeping in, those clever guys over at Netezza come out with a new data warehouse appliance named after a surfboard. Totally radical. I'm sure most people have asked at least one of these questions upon hearing the news:

  1. Is TwinFin faster than NPS?
  2. Is it cheaper?
  3. What is different architecturally?
  4. How much data can you cram into a TwinFin?
  5. Where the heck did the "TwinFin" name come from?
  6. Is this new appliance readily available to be shipped?
  7. Is anyone using it yet?
  8. When can I check it out for myself?


I decided to embark on a mission to answer these questions. And with the flurry of information that has gone out this week about the product, it actually wasn't very difficult to find the answers. So I figured I'd share...


  1. Yes, TwinFin is faster than NPS. According to Curt Monash's blog, TwinFin delivers an immediate increase in price-performance, based on a 3X cut in price/terabyte and a 3-5X improvement in mixed workload performance.
  2. Again, yes, TwinFin is cheaper. Several media outlets have reported that TwinFin offers a reduced price point of $20K per TB of user data, including:
  3. Rather than attempt to explain the architecture of TwinFin myself, I'd point you directly to the source: the blog of Phil Francisco (Netezza's VP of Product Management & Product Marketing).
  4. While it sounds like there will be several different variations of Netezza's new appliances, the TwinFin will scale into the petabytes according to this morning's Netezza press release.
  5. One of Phil Francisco's blogs hints at this, but I've also heard a rumor that Tim Young will soon divulge the full meaning of "TwinFin" in his next Enzee Frenzy article.
  6. According to the press release, the TwinFin appliance is available immediately, and Netezza already has systems in production with customers.
  7. Yep. See answer to #6. Plus, word on the street is that customers already using TwinFin will be at Enzee Universe to talk about their experiences.
  8. The TwinFin appliance will be on site at every Enzee Universe destination. I'm really excited to check it out - and you should be too! (Register today if you haven't already - registration is free for existing and prospective Netezza users.)


There are two other resources you should check out for more info on Netezza's latest and greatest product - especially if you're too lazy to read! The first is a "hip" new video that is presently posted to Netezza's home page. The second is a podcast where Phil Francisco shares his insight into the new product.


Enjoy - and I hope to see you all at Universe!


- Brian

1 Comments Permalink

"You stay classy, San Diego." -- Ron Burgundy (Will Ferrell) in "Anchorman" (2004)


This morning a few others from the Netezza Marketing and Product Management teams and I are ensconced by the Marina in sunny San Diego, CA for the TDWI World Conference and for a news announcement or two. And who better to bring us "Breaking News!" than the Number 1 newsman in all of San Diego, Ron Burgundy. [For those of you who might have been "hoping for more" from Ron in a quote about San Diego, you can check out the IMDB database for some great ones, including Ron's own historical (and hysterical) etymology for the city's name.]



 

Though it’s not exactly a state-secret at this point, today we’re launching the 4th generation of Netezza data warehouse and analytic appliances and the first of four initial product lines in it: TwinFin™.

 


Some of the core characteristics of the TwinFin and the overall platform are:

  • Resetting Netezza’s price-performance leadership position in the market and extending Netezza’s performance lead;
  • Disrupting the competitive data warehouse market among the incumbents, just as we did with our initial systems in 2003/’04;
  • Moving to a commercially-available, blade-based server and storage platform; and
  • Opening Netezza’s aperture on the broader market with a multi-product platform design to match customers’ data warehouse and analytics needs across their enterprise


After the market disruption Netezza caused with the introduction of the NPS® in 2003 and since, we have seen the entry of dozens of new startups in our wake and virtually every major incumbent data warehouse vendor has retooled its portfolio to include a “response” to the Data Warehouse Appliance (DWA) in a suddenly reenergized market. Several of them, to their credit, have advanced their value propositions and improved their competitive position.


Now it is Netezza’s time once again. With the introduction of TwinFin and the other members of the new family of products, Netezza is once again changing the game, widening the applicability of our systems to more types of customers, applications and partners in the market.

As stated in my response to Curt Monash last week, we think of this 4th generation of the Netezza appliance as using “the same architecture with a new physical implementation”. Starting with TwinFin, we moved to a commodity blade-server based system framework, but one that still uses Netezza’s “secret sauce” to deliver as much as a 5X increase in performance over the previous generation of Netezza systems, namely:

  • our balanced design and streaming architecture;
  • the use of Field Programmable Gate Array (FPGA) technology as a query processing “turbocharger”; and
  • our advanced MPP management and optimization software.

 

And there are more innovations and performance gains on the way! TwinFin, quite simply, will serve as a platform for expanding Netezza’s performance and price-performance advantage in the industry and as the basis for advancing the state-of-the-art for in-database, analytically intensive data processing; all without sacrificing any of the appliance simplicity with which our company is synonymous.

As a couple of us said last week, Netezza has served as “the benchmark” for high-performance DWA pricing in the industry and we are now leading “the market in pivoting to a new competitive price-performance level”. With these new systems, we have embraced a trend that has been happening around the industry – the movement of marginal cost of a bit of disk storage toward $0 – with system-sizing, pricing and even system numbering focused on the performance delivered by a given platform.

 

We think the net effect of the new, simplified pricing structure for TwinFin and the other members of the Netezza product family will create a major disruption in the market. With starting (US-based) prices that equate to under $20,000 per terabyte, TwinFin’s list price is a fraction of other competitors’ performance-system pricing (after they’re all done playing price-obfuscation games around mirror, swap and index storage).

 

TwinFin and the other new Netezza data and analytic appliance products give us the opportunity to continue to lead the market and provide our customers with the best value and performance possible for all of their data warehouse and analytic processing needs. Netezza TwinFin - because two fins are faster than one.

2 Comments Permalink

When I was a young lad, my Dad purchased a 1946 Willys Jeep. For any of you who are Jeep aficionados, you know that this is a direct, post-war Jeep complete with starter button (war Jeeps didn't have car keys) and four-wheel shift gears. Dad had this thing re-fitted with a power take-off (a rear-gear for attaching appliances) and had purchased a bush-hog to attach to it. Off my Dad went on our property, Jeep in full tilt and bush-hog in tow, slicing and dicing bushes and small trees from our property like a veteran landscape engineer.

 

One day the trailer hitch had an issue - the towing ball had somehow become bent and needed replacement. Yes, Dad worked these machines to their extreme. Now, if you feel a bit out of place with all these odd terms, imagine my hubris in thinking I knew everything about them just by watching my Dad work with them from the sidelines.

 

In any case, he took the Jeep in to a shop to get the thing fixed, and this mechanic started working on the trailer hitch to loosen it up. Strange thing, though, he was turning the bolt clockwise to get it undone. And everyone knows that in order to undo a bolt, you turn it counterclockwise, right? Of course, those in Australia and Brazil might not turn it this way, but that's an inside joke, too. So I quipped, "You're turning it the wrong way."

 

To which this mechanic simply replied, willing to engage an uppity kid while my Dad just offered me a hot stare, "Are you sure?". To which I responded, thinking that the mechanic actually thought I was a viable entity, "Yep, I'm sure." To which the mechanic said, "You want to bet ten dollars on it?" To which I immediately responded, thinking easy money - "You bet."

 

At this point my Dad simply leaned into me and said the words I would never forget, even to this day, as I share them with you.

 

"Never bet on the other man's game."

 

This initially had a hollow ring, considering that I was on the brink of winning ten dollars, but in that moment the mechanic wrested the object free from its mooring in spite of having turned it the wrong way all that time. And I learned something new, that some devices actually do unscrew in a clockwise direction. Lesson learned, and I did not lose ten dollars. The mechanic was merciful.

 

Licking my wounds and regarding my status of having dodged a bullet, I gained a new appreciation of knowledge, learned in a simple way, that the other man's game is something to approach with high trepidation and respect. If it really is the other man's game, he knows it better than I do, so what business do I have betting on it? It's a sucker bet at best.

 

So it is with the appliance wanna-bees who have attempted to bet on Netezza's game. That the appliance is the way to go, and they have invested many millions of dollars in attempting to topple Netezza, or at least steal the market share. But this is yet another case of betting on the other man's game, and nobody knows this game better than Netezza.

 

And now, Netezza has changed the game, leaving the competition in the dust to once again lick its wounds and wonder, why did they ever bet on the other man's game, and now, what game are they really in?

 

The new Netezza architecture has upped the ante on the existing game and moved it in another direction that, in no uncertain terms, changes the game and the stakes to play it.

 

Apart from browsing the white papers and gathering your own general specification insights to the environment, I can say as a veteran who has worked with this technology extensively that I had a short wish list of things that I thought would be really nice to have. I had a short list of what I thought were functional shortcomings that I had found simple workarounds for, and could painlessly ignore. But now, with the new architecture, those few shortcomings were washed away. The short wish list was fulfilled, and so much more. And in the end, I am a happy clam.

 

The short runway of things I am looking forward to includes the capacity to cache whole tables, Linux on the lower deck, the Intel-programmability of the parallel environment, and the additional capacity both in storage and in processing power. And these are just a few of my favorite things.

 

Once upon a time, I worked with real-time engines for embedded systems, and was enamored with one software vendor's ability to stay ahead of the pack by simply assimilating the innovations of other competitors. One has to imagine that once a vendor is out-in-front, they can maintain their position through this assimilation process. If they are not out in front, then assimilating other vendors' innovations doesn't have the same impact, because nobody is a frontrunner.

 

That Netezza can take the innovations of other (major) vendors such as IBM and leverage them through simple assimilation, is yet another testimony to Netezza's position as the well-in-front frontrunner. While other vendors attempt to duplicate or imitate, Netezza just moves on, changes the game and leaves them in the dust. Innovations from the vendor remain ensconced (and enhanced) in the new architecture, while other technologies are easily assimilated. That this has given the architecture a stratospheric boost is a testimony to the original architects and visionaries, as well as the existing ones.

 

All that's a lot of gushy sentiment, though, compared to the tailspins that the wanna-bee competitors have been in since they got their first news that the winds were changing. I could use a lot of sailor/sailing analogs here, but I'll spare you. The fact remains, the competitors are scrambling all-hands-on-deck to reset their goal for market share they never really achieved. Could this mean that they are sunk altogether and don't know it yet? Who has a crystal ball, except that we could now pump these quantities into the Netezza architecture and get an answer back faster than they could.

 

Right answer faster: Priceless.

0 Comments Permalink

Just trying to clarify. Curt Monash's informative blog on the coming Netezza system and family of products includes the following:

 

<snip>

 

Beyond the switcheroo in components, Netezza is making substantial changes to its hardware architecture. In current Netezza products, the FPGA plays the role of a disk controller on steroids — it receives data, does some SQL or other analytic operations on it, and then throws it over the wall to the CPU for the rest of the processing. The new Netezza product family, however, adds an actual disk controller. More important, it adds fast interconnects between the FPGAs, the disk controller, and RAM — specifically, as Phil Francisco put it in an email,

using multiple parallel channels of PCIe with much faster interconnection rates and lower contention between the blade server and the “DB accelerator card” with the FPGAs.

DMA (Direct Memory Access) technology also fits into the picture somehow.

 

<snip>

 

...which seems to beg further clarification.

 

While Curt suggests big changes are afoot in Netezza's “architecture” - I think a more appropriate viewpoint would be that it's “the same architecture with a new physical implementation”. That is, the concept of data streaming from disk through the system is just as important now as it ever was.

 

S-Blade Diagram.jpg

 

True, we did move the "disk controller" function to a pair of HBA (Host Bus Adapter) cards that interface with the disk enclosures using multiple, redundant SAS (Serial-Attached SCSI) links, providing more than ample bandwidth to stream all the drives per rack continuously to the blades. For those who click-thru on Curt's blog, this function is embedded in the device labeled “SAS Expander Module” (one on both the blade server and the "DB accelerator") in the 3rd chart of the PDF file (and also shown above) and allows data to stream from disk through to memory and then on to the FPGA without delay.

 

SP Data Flow.jpg

 

To move data between the blade server and the DB accelerator cards, we use IBM's expansion card (formerly known as "sidecar") technology to provide multiple parallel high-speed PCIe (peripheral component interconnect express) channels delivering the data streams from the disk drives to the memory on each blade server and providing very high-speed interconnect between the FPGA devices and that same memory, using DMA (direct memory access) to effect high-speed memory access without encumbering the CPU to get at it.

 

FPGA Engines.jpg

 

With all this high-speed interconnectivity, Netezza has been able to alter the data flow so that data streams to the memory first and then to the various FAST engines (see above diagram and/or refer to Issue 16: The Latest Addition to Netezza's FAST Engines Framework) in the FPGA. Those engines act as a "turbocharger" for query processing, implementing data decompression, restricting, projecting and applying the appropriate visibility rules in a pipelined process; typically filtering out well over 95% of the data scanned. From the FPGA, the resulting reduced data set is passed on to the CPU memory for additional processing to complete the process.

 

So, the logical streaming model of data from disk to FPGA to CPU is retained, with significantly higher throughput as a result. But there's an added benefit: the originally-scanned data can remain in memory, still in compressed & unfiltered form, to be used as a cache, avoiding disk scan activity where possible and helping boost system performance even more. In short, "Change, but no Change."
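
To make the flow described above a little more concrete, here is a toy Python sketch of the same shape: compressed blocks land in memory first, a pipelined "FPGA" stage decompresses, applies visibility, restricts and projects, and the "CPU" finishes the query on the reduced rows. Everything in it (the class and function names, the zlib/JSON block format, the `deleted` flag standing in for visibility rules) is an assumption chosen for illustration; it is not how the appliance is implemented.

```python
import json
import zlib


def make_disk_blocks(rows, block_size=4):
    """Pack rows into zlib-compressed blocks, standing in for data on disk."""
    return [
        zlib.compress(json.dumps(rows[i:i + block_size]).encode())
        for i in range(0, len(rows), block_size)
    ]


class BladeMemory:
    """Keeps scanned blocks compressed and unfiltered so repeat scans skip disk."""

    def __init__(self, disk_blocks):
        self.disk_blocks = disk_blocks
        self.cache = {}
        self.disk_reads = 0

    def stream(self):
        for block_id in range(len(self.disk_blocks)):
            if block_id not in self.cache:
                self.disk_reads += 1  # only a cache miss touches the disk
                self.cache[block_id] = self.disk_blocks[block_id]
            yield self.cache[block_id]


def fpga_stage(blocks, predicate, columns):
    """Decompress, apply visibility, restrict and project, block by block."""
    for block in blocks:
        for row in json.loads(zlib.decompress(block)):
            if row["deleted"]:   # visibility rule (hypothetical flag)
                continue
            if predicate(row):   # restrict
                yield {c: row[c] for c in columns}  # project


def cpu_stage(rows):
    """Finish the query on the reduced row set; here, a simple sum."""
    return sum(r["amount"] for r in rows)


rows = [
    {"id": i, "region": "east" if i % 20 == 0 else "west",
     "amount": i, "deleted": i == 40}
    for i in range(100)
]
memory = BladeMemory(make_disk_blocks(rows))


def run_query():
    reduced = fpga_stage(memory.stream(),
                         lambda r: r["region"] == "east",
                         ["id", "amount"])
    return cpu_stage(reduced)


first = run_query()
reads_after_first = memory.disk_reads
second = run_query()
print(first, second, reads_after_first, memory.disk_reads)
```

The second run returns the same answer without any additional disk reads, because the compressed blocks are still sitting in memory; the filtering stage passes only a handful of the 100 rows on to the final step.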

 

I hope that helps - with Curt's architecture viewpoint as well as with questions about our use of PCIe interconnects to raise performance.

2 Comments Permalink

 

"Don't be afraid to try the greatest sport around

(catch a wave, catch a wave)
Everybody tries it once
Those who don't just have to put it down
You paddle out turn around and raise
And baby that's all there is to the coastline craze
You gotta catch a wave and you're sittin' on top of the world"
– from "Catch a Wave" by The Beach Boys (1963)

Surf's up! Summer seems to finally have arrived in the Boston area and a number of vendors in the data warehousing and analytics space are hoping to catch a wave riding on a flurry of industry announcements. A few trends continue to build in the news:

 

  1. Data sizes continue to grow alongside the pressure to increase performance & shrink data latencies;
  2. Workload complexity and user counts continue to grow;
  3. More and more, customers are seeing the value of running advanced analytical processing directly in their primary data repository (see item #1 for reasons why); and
  4. Industry prices for data warehousing and analytics have begun another shift downward.


Today I'd like to address this last point. According to more than one industry analyst, over the last several years, Netezza has served as "the benchmark" for DWA pricing in the industry. Several of our competitors have sought to match and/or undercut Netezza pricing in the market. Some of the incumbent players have tried to, with very limited success, hinge their pricing off Netezza prices, match the performance of the Netezza Performance Server® system, or inoculate their pricey "flagship" products by adding less-expensive, feature-deficient products to their portfolio. But Netezza has continued to succeed in the marketplace, becoming a profitable, publicly-traded company with nearly 300 customers and 400 employees worldwide and one that is listed among the "Leaders" in the Gartner Magic Quadrant.

 

When we disrupted the data warehousing market with our first generation product in 2003 and 2004, Netezza was one of very few startups in an otherwise moribund industry. Now, with established "street cred" and hundreds of loyal customers, we intend to once again upset our competitors and lead the market in pivoting to a new competitive price-performance level. We're about to launch the fourth generation platform of our data warehouse and analytic appliances, which will advance Netezza's performance leadership and once again establish a new price-performance benchmark.

 

Admittedly, we won't be the first vendor offering high-performance data warehouse systems to move to a lower pricing plateau. That task is usually done by early-stage start-ups looking to find a way to differentiate themselves. True to form, Dataupia probably can claim establishing a lower price point first and recently another multiyear "start-up" has also started lower. But those are offerings from very modestly-sized startups with no established market "track record". Netezza will be the first company with proven product maturity, customer base and financial viability to do so.

 

Just how and what are we doing to cause this disruption? Well, let's just say things around the "briefing table" have been quite hectic, and that I and others will have more news about that to follow shortly.

 

[As you might imagine, it's been getting more and more difficult to keep things under wraps – in recent weeks we've even had to fight people off from getting early "sneak peeks". ]

 

Until then, hey, it's summertime! So here's what I'd recommend –

 

"So take a lesson from a top-notch surfer boy

(catch a wave, catch a wave)
Get yourself a big board
But don't you treat it like a toy
Just get away from the shady turf
And baby go catch some rays on the sunny surf
And when you catch a wave you'll be sittin' on top of the world


Catch a wave and you'll be sittin' on top of the world"

 

 

Twin Fin: A short board (usually 5'8" - 6'8") with a wide tail for maneuverability and a fin near each rail for stability in radical turns.

 

Purpose: A wider tail area provides more planing area and lift, which creates more speed by efficiently utilizing wave energy. Milking speed and energy from smart surf with extremely sensitive and responsive turning ability are this design's strong points.

0 Comments Permalink

I heard this sentiment from a senior manager at a large-scale data processing facility, so I thought I'd post it as a provocative talking point. In his mind, when something went really south in the scheme of things, he had to evaluate people as to whether they were incompetent or immoral. Or something in between. You never know what a manager is thinking, apparently.

 

You see, in his mind, he needed a means to label a person rather than an activity. On the other hand, I like the sentiment of another famous consultant, who when asked how things could get so bad, would simply quip "Because honest, hard-working people did the best they could with what they had." Hmm - there's no incompetence or immorality there, just the realization that things can and do go wrong. Case in point, last year we had a project where the workload grossly exceeded the headcount to make-it-happen. With a thousand spinning plates in the air, and not enough people to keep them spinning, invariably a plate would fall to the floor and crash. The manager above would call a meeting and review why the plate crashed, and find someone to blame for it. But in the end, the plate crashed because it's what plates do.

 

Debating the "why" is a waste of time.

 

We see this when the "critical mass" exists to switch horses, sometimes in midstream, from a powerhouse, legacy and mainstay kind of technology to a new, shining future in another, more promising technology. Ahh - you see where this leads. Someone now has turf to protect, and the review of a new technology - or even the hint of replacement - is viewed as an indictment of the existing technology. And, you guessed it, an indictment of the existing technologists. Because of the manager above, the people in the mix begin to wonder if they are being labeled as incompetent, for defending an inadequate technology, or immoral, for having another motive, like defending the technology because it will help keep their job, regardless of whether it is the best choice for their company or team.

 

And if they perceive this labeling, they too will fire off their own labels and soon we see the makings of a classic conflict. I spoke with a leader who had just weathered such a conflict, and he said that he couldn't believe how quickly his seemingly objective, science-minded technologists were reduced to feral animals practically overnight. He didn't really have to muscle through the process like some other extreme cases, but it is important to note that the conflict is real. The drama brings out interesting colors in people, and shows us what they are made of. And like the second consultant above, it's usually not bad stuff. Just human stuff.

 

People who make an investment in one technology find themselves with an emotional and professional attachment to it. Like hanging on to a stock ticker even when it's in free-fall, hope springs eternal. Our investment isn't really for nought - if we can just wait it out. My challenge to the average Joe out there is to do what you've been doing, stay the course and keep the high road. Bad-mouthing the existing technology, or the existing people running the technology, is not a profitable path. When we think about it, the "Enzee way" is to let the machine's power and architecture speak for itself. After all, if we have to resort to the same nefarious activities as a wannabe competitor, doesn't it speak volumes about what we really think of our favored son?

 

Some time ago, I was helping with a competitive POC and when we finally reported the metrics for loading, query and whatnot, we decided to show Netezza in its best possible light, and we agreed with everyone that this would be the case. We watched across-the-way as the competitors stayed late nights, carefully tuning their machine and its attendant parts, while we just tossed data into the Netezza box, did some basic distribution tuning and that was that. In fact, after getting some initial metrics, we took the worst times and reported on them, not the best times for the loads and queries.

 

When we reported our final numbers, our metrics blew away the competition by a factor of five or more, in some cases much more. When we told the decision-makers that we'd only spent a few hours on the POC, and even then only reported the worst-case numbers, they were stunned. Primarily because the competitive team had spent so much energy on tuning their technology, only to fall short at 20 percent or less of what the Netezza machine could do.

 

And doesn't this kind of story speak volumes - and show Netezza in the best possible light? After all, it's not really a good story to tell if we have to spend countless hours tuning the machine. The decision-makers know that for the sake of competition, we might spend a lot of time to "get that benchmark", but it will be the only time they ever see the benchmarked metric because they know we won't embrace that kind of intensity when we've actually deployed the technology.

 

I saw a "famous" benchmark on the internet, touted by vendors other than Netezza using technologies that were carefully tuned for the outcome. You know, like an Olympic athlete trains the daylights out of his body to get that one-shining-moment. But catch up with the same athlete years later and find them out-of-the-game, no longer the feared competitor for one primary reason - they can meet the bar once. But they can't sustain it. And this is true of the famous benchmark. They can tune the daylights out of those technologies, and take them to new, never-known heights, and break world records. But if you really want to deploy these at your site, you'll get the standard disclaimer.

 

Your actual mileage may vary.

 

This variability is not found in the Netezza experience. The machine delivers the kind of sheer power and turnaround we need just by breaking the plastic and plugging it in. We don't have to spend countless hours tuning the machine under a hot lamp and after gallons of Red Bull. Power - effortless power - at our fingertips - really is the best possible light for showcasing what the machine can do. One of the customer decision makers said just that - if it requires a swarm of people to get the competitive technology to remotely the same level as a Netezza machine can reach just by powering-on, what kind of story does this tell? That we are committing to high-intensity deployment and maintenance for the life of the technology?

 

Cost of ownership has a lot of different meanings, no?


On a more recent project, we did the common "light" organization of the data and then the report developers cut their BI tool loose on it. When the smoke cleared, the turnaround times on the reports were abysmal. Some of them executed in minutes, some of them tens of minutes and some of them never came back at all. Then the finger-pointing started (from the reporting team), and they could not lay enough blame at the foot of the Netezza machine. But soft, what light through yonder window breaks? It is the Marlboro Man, to carefully show the reporting gurus why the Netezza machine is not an SMP/RDBMS machine, and needs a few additional hints (e.g. zone map refs at the query level) to make the reports turn around at keyboard-speed. Honestly, if it were any other technology - like an SMP/RDBMS - and we encountered such abysmal turnaround time, the answer really would be to fix the database, in the data structures, the indexing, or even at the hardware level. How amazing is it that rather than "going back to formula" - we can just tweak a query or two, and lo, we have stratospheric performance?

 

As it should be.


There is a temptation, you see, to protect the turf one loves so well, by somehow telling a story that does not meet with reality. And in all this, it's no different than saying we cannot get our toaster or blender to work in our kitchen, even though we aren't using them as described in the owner's manual. Netezza is an appliance. It has measurable, deterministic behavior and simply does not deviate from its prescribed, self-contained nature. For someone to claim that a kitchen toaster doesn't work, one only has to ask a few simple questions to determine whether the toaster really doesn't work, or if it's just not being used correctly.

 

And in our case, the Netezza machine is a more complex horse. But the interface to the horse is still the same - a pair of reins and a pat on the neck, and the horse behaves just like we expect it to. Of course, getting four-hundred horses to behave the same way in lockstep is a matter of architecture. But imagine how much work you could get done if you could package four-hundred-horsepower for useful work? It's the difference between a 32-horsepower (SMP/RDBMS) oatmeal-mobile and a 400-horsepower street machine with e-brake for those drifting stunts.

 

Yeah-man, give me the street machine any day.

0 Comments Permalink

Now here's a luxury we don't see every day. After all, if we're a car manufacturer, perhaps an aircraft builder, we have to get it right and make it fast all as part of the original design.

 

Which is not to say that we should just choke up a bunch of data structures and expect Netezza to cover our backs - oh wait - perhaps we really can do this, but it's not always practical.

 

Early in my career I worked with embedded / real-time systems, and while some really believe that they work with "real time" - here's the purist definition: A robot balancing a broom on its open palm. The robot has to make infinitesimal adjustments, in real time at the microsecond level, to keep the broom from falling. In "business" real time, however, we have the luxury of whole seconds to make a decision!

 

In these embedded systems, we had to take things through to a complete functional shakeout. Only then could we see where events collided or didn't make sense. So-called "race" conditions and meta-stable conditions - yep - for those of you who know what these things mean, they can cause you to go gray early and stay that way.

 

Ahem.

 

So the maxim here was always "get it right, and then make it fast". I took this "RTI" (little three-letter-acronym (TLA) for Run-Time-Improvement), into a commercial venture where I developed an expert system engine for medical claims processing. Only when the system was behaving correctly could we then find ways to pinpoint the hot-spots. In one case, one percent of the claims took over 90 percent of the processing. We dug deep to find the issue, optimized for it and voila - the system screams like a banshee. In fact, these improvements boosted processing speed for the areas that were already fast - so it was a double-win.
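As a minimal sketch of what pinpointing a hot-spot like this can look like, assume the per-claim processing times have already been captured from a working system. The function name `hotspot_share` and the timings below are hypothetical, invented only to mirror the one-percent-takes-ninety-percent shape described above; they are not data from the actual claims engine.

```python
def hotspot_share(times, top_fraction=0.01):
    """Return the share of total processing time consumed by the
    slowest `top_fraction` of items (hypothetical helper)."""
    if not times:
        return 0.0
    ranked = sorted(times, reverse=True)          # slowest first
    top_n = max(1, int(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Synthetic timings in milliseconds (illustrative only):
# 990 ordinary claims at 1 ms each and 10 pathological claims at 891 ms each.
claim_times = [1.0] * 990 + [891.0] * 10

share = hotspot_share(claim_times)
print(f"slowest 1% of claims use {share:.0%} of total processing time")
```

A measurement like this only means something once the system behaves correctly, which is the point of the maxim: the ranking tells us where to dig, not what the fix is.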

 

People asked us at the time why we allowed the development process to "suffer" with "poor" performance up until the very end - when we knew good and well that attempting to optimize it while it was still in functional flux, didn't make any sense. We can do sweeping improvements only so much and only so deep. But what if we spent all of our time improving something that three weeks later the client says they don't want any more, or want to go in another direction, and all of our improvements are for nought?

 

Nay, wait a little longer - make it right, then make it fast.

 

Fortunately - and quite luxuriously - in our space the Netezza platform dovetails directly into this approach. It already has all the juice to help us succeed even if we do things badly or inefficiently. But when the smoke clears and we have a working prototype, it's time to roll up sleeves and pop-the-hood, so to speak, to do some RTI on a working system. This is where it gets fun.

 

The following is a short list and is by no means comprehensive:

 

(A) Line up the operations in their natural sequence. Find the ones that are longer-running and optimize them. This will provide some degree of relief, but it's still just low-hanging fruit.

 

(B) Locate in-line calculations in the where-clause or join-clause, and reduce or eliminate them. In one case, we needed to join on a date-time where the "drift" allowed the timestamps to differ by plus-or-minus three seconds. Rather than put the time+3 and time-3 calculations in the actual query, we precalculated two additional columns, one with the time minus three and one with the time plus three. We then used a simple "between" operator to get the answer. Time to market - 1000:1 difference in the two run times. In-line calculations in the where-clause are an invisible power drain. Get them outta there.

 

(C) Find the processing patterns and consolidate them. For example, we had a BI tool executing eight queries to achieve a report output - and each query took approximately 20 seconds. Not bad for having to plumb the equivalent of 53 terabytes of information. Yet eight queries like this delayed the report's display by 160 seconds - almost three minutes. Users don't like to wait this long. So to optimize, we focused on the pattern, that the same basic query existed at the core of each of the eight. By taking the "hard part" and performing an up-front query that did all the heavy-lifting at once, we were able to take the 20 second penalty only once, and the 8 downstream queries returned in 4 seconds or less. So - 160 seconds to 34 seconds with a simple logistical change.

 

(D) Pre-filter or precalculate when consolidating. This means taking a common downstream operation, especially a repetitive one, and moving it upstream to another, even unrelated operation. The above time-drift is one example. Calculate and filter as early as possible, and then all the downstream operations benefit from it. This can offer up more than just a "spot" boost, because if it shaves a few seconds off every downstream query, this can quickly add up to shaving tens of minutes and even hours off our overall processing time.

 

(E) Mind the gap - of what we understand about how an RDBMS works versus Netezza. For example, if we want to leverage filter power in Netezza, we could use a "where exists" clause rather than a regular join. If the regular join cannot leverage a distribution key, then the where-exists is a highly performant option. Likewise if we have a view in the join that does more than just serve up data, like doing a sub-join itself. This can be very costly, and is another hidden drain on the performance that we can pull into a where-exists. Another "gap" is in merging two data sets where they have no common distribution key. The where-exists and similar operations force the machine to obey our optimizations, because we actually know the data, where the machine simply exchanges it with us.

 

(F) Avoid squeezing blood from a stone - It is tempting for said adrenalin junkie to see a cool way to reduce 2 minutes to several seconds - after a rush like this, it becomes addictive. We should not let it go to our head. In one case of processing a nightly batch, one of the client's 14-hour processes had been reduced to less than fifteen minutes. Yet some in the room still groused for more - they came up with an outrageous plan to reduce the processing to less than five minutes, but only four people on the entire planet could ever maintain it, much less enhance or improve it. At some point we have to agree that enough is enough - especially when we are sacrificing valuable things (like extensibility, flexibility etc) for the sake of a few more minutes of adrenalin rush. We must resist!

 

(G) Focus on the target system, not the one you are leaving behind. I've never known a case where someone moves into a new home and refuses to use the appliances in the home just because they didn't exist in their old one. Nor have I ever seen someone buy a new home and then attempt to fill it with new furniture that would only fit in their former home. Who does stuff like this? No, we should use the former system as a functional baseline (describing what) but not focus on how we implemented the baseline - rather our new technology gives us the ability to spread our wings and fly in ways that we never could in the old environment. Example: One client had over 400 stored procedures to convert, and regarded these stored procedures as the baseline for the actual work, rather than the baseline for the functionality alone. When re-characterized, it reduced to four flows with a handful of core operations each - all with very simple implementations. Trust me, when it takes its final form in Netezza, it will look nothing like its former self.
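To make item (B) concrete, here is a rough sketch of the time-drift join in both forms. It runs against an in-memory SQLite database from Python purely so the example is self-contained; it is not Netezza, and it says nothing about Netezza run times. The tables `readings` and `events`, their columns, and the integer timestamps are all hypothetical stand-ins for the real data.

```python
import sqlite3

DRIFT_SECONDS = 3  # plus-or-minus tolerance taken from item (B)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (reading_id INTEGER, ts INTEGER);  -- hypothetical
    CREATE TABLE events   (event_id   INTEGER, ts INTEGER);  -- hypothetical
    INSERT INTO readings VALUES (1, 1000), (2, 2000), (3, 3000);
    INSERT INTO events   VALUES (10, 1002), (11, 1996), (12, 3003), (13, 5000);
""")

# Inline form: the +/- arithmetic is evaluated inside the join condition.
inline_rows = conn.execute(
    """
    SELECT r.reading_id, e.event_id
    FROM readings r
    JOIN events e ON e.ts BETWEEN r.ts - ? AND r.ts + ?
    ORDER BY 1, 2
    """,
    (DRIFT_SECONDS, DRIFT_SECONDS),
).fetchall()

# Precalculated form: store the lower and upper bounds once, up front,
# so the join condition is a plain BETWEEN on stored columns.
conn.execute("ALTER TABLE readings ADD COLUMN ts_lo INTEGER")
conn.execute("ALTER TABLE readings ADD COLUMN ts_hi INTEGER")
conn.execute(
    "UPDATE readings SET ts_lo = ts - ?, ts_hi = ts + ?",
    (DRIFT_SECONDS, DRIFT_SECONDS),
)

precalc_rows = conn.execute(
    """
    SELECT r.reading_id, e.event_id
    FROM readings r
    JOIN events e ON e.ts BETWEEN r.ts_lo AND r.ts_hi
    ORDER BY 1, 2
    """
).fetchall()

print(precalc_rows)  # same matches as the inline form
```

Both queries return the same pairs; the only thing that changes is where the arithmetic happens. The 1000:1 figure above is what we saw on the actual system and is not something this toy example can reproduce.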

 

Haven't yet experienced a run-time improvement cycle, or committed time to making it happen? It's worth it - even if only for a sanity-check on an existing implementation or a proof-of-concept on an upcoming one.
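For a sanity-check of that kind, the consolidation pattern from item (C) is an easy one to try. The sketch below again uses in-memory SQLite from Python only to stay self-contained. The `sales` table, the `report_core` name and the three panel queries are hypothetical; the real report had eight queries over far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'a', 2007, 10), ('east', 'b', 2007, 20),
        ('west', 'a', 2007, 30), ('west', 'b', 2006, 40),
        ('east', 'a', 2006, 50);
""")

# Up-front query: do the shared "hard part" exactly once and keep the result.
conn.execute("""
    CREATE TEMP TABLE report_core AS
    SELECT region, product, SUM(amount) AS total
    FROM sales
    WHERE year = 2007
    GROUP BY region, product
""")

# Downstream queries: each report panel reads only the small core result.
panel_queries = {
    "by_region": "SELECT region, SUM(total) FROM report_core "
                 "GROUP BY region ORDER BY region",
    "by_product": "SELECT product, SUM(total) FROM report_core "
                  "GROUP BY product ORDER BY product",
    "grand_total": "SELECT SUM(total) FROM report_core",
}
panels = {name: conn.execute(sql).fetchall()
          for name, sql in panel_queries.items()}

# Time budget using the figures quoted in item (C).
before_seconds = 8 * 20            # eight independent 20-second queries
worst_after_seconds = 20 + 8 * 4   # one heavy pass, then eight queries at 4 s
print(before_seconds, worst_after_seconds)
```

With every downstream query at the full 4 seconds the ceiling is 52 seconds; the 34 seconds reported in item (C) fits under it because those queries came back in "4 seconds or less".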

 

Either way, we can reach functional closure in a fraction of the time of any other system, and once the functionality is stable (not necessarily locked down, just stable) we can find surprising and dramatic boosts with little additional effort.
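One such low-effort change is the where-exists rewrite from item (E). The sketch below shows only the shape of the rewrite and how its semantics differ from a regular join, using in-memory SQLite from Python; whether it performs better depends on the platform and the distribution keys, which this example does not model. The `orders` and `flagged` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER, customer_id INTEGER);  -- hypothetical
    CREATE TABLE flagged (customer_id INTEGER);                    -- hypothetical
    INSERT INTO orders  VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO flagged VALUES (100), (100), (300);  -- note the duplicate 100
""")

# Regular join: used here only as a filter, but it also multiplies rows
# whenever the filtering table holds duplicate keys.
join_rows = conn.execute("""
    SELECT o.order_id
    FROM orders o
    JOIN flagged f ON f.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# Where-exists: states the intent directly (keep an order if any matching
# flagged row exists) and never duplicates the outer rows.
exists_rows = conn.execute("""
    SELECT o.order_id
    FROM orders o
    WHERE EXISTS (
        SELECT 1 FROM flagged f WHERE f.customer_id = o.customer_id
    )
    ORDER BY o.order_id
""").fetchall()

print(join_rows)
print(exists_rows)
```

The join returns order 1 twice because of the duplicate key, while the where-exists returns each qualifying order once. That is the "we actually know the data" part: we are telling the machine this is a filter, not a merge.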

 

 

Make it right (functionally speaking) - and then make it fast -

0 Comments Permalink