Many of those who integrate the mainstream BI tools into various underpinning data sources find subtle nuances. Not the least of which is how the database will respond to the queries presented. In Netezza data access especially, the power is not found in the query, but in the hardware. We can certainly degrade our experience with bad queries, but we would not tune queries in the same manner as with an SMP/RDBMS.
For example, I've watched RDBMS engineers work black-magic with a query by simply rearranging this-or-that in the monolithic query to provide boosts in the orders-of-magnitude. This is because the query is being used to guide the general-purpose physics. In Netezza, however, the purpose-driven physics snips the query apart. The physics then guides the query's mechanics. I've watched newbie Netezza folks nearly pull their hair out - and their eyelashes too! - when trying to "make the machine do what I want". Hmm, no, the machine does what it does. It's an appliance. We get what we want when we conform the data to the physics. The query is just along for the ride.
How does all this apply to multi-pass SQL in a BI Tool? Well, most BI tools come to the table with a pre-conceived notion that all databases are created equal. Unless they have specific VLDB hooks, and unless those hooks fully embrace VLDB principles, the BI tool will not experience the expected lift and we'll likely have to help it out. In fact, little about a BI tool is purpose-built in regards to its data source. It regards data sources as general purpose interfaces so it can be as vendor-neutral as possible.
Unlike a standard star-schema, many VLDB tables are fact-sized tables containing billions of rows, as are their dimensional counterparts. So a single one-shot query will sometimes provide the functional answer but with unacceptable performance. Many of us have seen multi-page (hey, 100+ page) queries that try to do everything in one shot. The average RDBMS leaves us few options. The VLDB and especially Netezza is not so constrained. We can make multiple passes on the data often with little penalty. The danger here is in the inefficiency of the passes, not whether multi-pass is okay. Multi-pass, or more appropriately multi-stage SQL, is a necessary approach with large-scale tables. Netezza makes it simple and fast, using built-in concepts of its performance model.
Here is a spot case-study - a BI tool needed to access several tables that were each in the many billions of records. The end result was a summary of user-selected values. The temp-table creation here is done automatically by the BI-Tool, so we may have limited options in getting it to shape them as needed. In the examples below, I'll label the queries so we can reference them later.
A typical BI tool, upon realizing it needs a summary, will often divide the answer into multiiple stages of work. Each stage will store its result in a temporary table usint a CTAS, leveraged in one or more following passes. Unfortunately these passes are sometimes inefficient. In the case below (this is pseudo-SQL, so bear with me here)
(1a) create t1 as select b, x, y sum(b) sumb, sum(c) sumc from table1 where key1=4 group by b, x, y (1 million records)
(1b) create t2 as Select x, y, a, t2.b from table2 t2, table3 t3 where key2=3 and t2.b=t3.b (500 records)
(1c) select b, sumb, sumc, x, y from t1, t2 where t1.b = t2.b and t2.a in (41,42) and t1.b = 6; (450 records)
Note how in the above, the filter effects are largely applied last (1b and 1c) with the summaries applied first (1a). In this case, it is summarizing over a million values but it throws away over 90 percent of this result on the last operation, reducing 1 million records to 450. If we invert this protocol and regard the filters first, we might see queries like this:
(2a) create t1 as select x, y, b from table2 t2, table3 t3 where t2.b=t3.b and t2.a in (41,42) and
key1=4 and key2=3 and and t2.b=6 and t3.b=6 ; 15,000 raw records
(2b) select b, sum(b) sumb, sum(c) sumc, x, y from table1, t1 where table1.a = t1.a and key1=4 group by b,x,y; 450 summary records
In the above, the filters are pushed into the first part of the query chain (2a) to squeeze down the data sizes, but to also prepare a "filter table" effect on the larger table in (2b). Not only will we forego all the throwaway work, we will work with smaller data sets overall. The key here is in balancing workload versus filter.
The simple inversion of the query order has radically reduced the workload of the entire chain of events. This of course, does not answer whether our BI tool will actually implement the query in this order or manner. I can tell you this, with the above tables the original "table1" was over 30 billion very wide rows. The first query chain (1a-1c) takes no less than a minute, but only because key1 is zone mapped. The second query chain (2a-2b) takes 6 seconds or less.
Yet another pernicious issue is not obvious from the above - temp table distribution. This last query chain, though 6 seconds in duration, is still a one-hit wonder. Once two or more users start hitting the machine, concurrency will reveal all. The machine is quickly saturated and all of the queries start to take more and more time. In one case of just five users on the machine, all of the queries took over a minute, and one took over five minutes. Concurrency tuning is a bread-and-butter issue, too, so what's going on here?
In both query chains, the CTAS is not being given explicit instructions on how to distribute its results. The outcome is unpredictable from the BI tool's perspective, but very predictable for us. When the CTAS result remains distributed on its original distribution, we get a co-located write. If the CTAS does not use the original distribution, it will have to redistribute the data, broadcasting it all over the SPUs. We need to avoid this because co-located writes are desireable and muey caliente.
The original distribution key for all these tables is the column (a). So - is column (a) being preserved as the distribution for maximum performance?
In the first chain of queries (1a-1c), we would expect to see the following CTAS defaults:
(1a) - distributed on (b,x,y) because this is the group-by clause. It cannot use column (a) because it's not even in the result set.
(1b) - distributed on (x) because it happens to be the first column in the select-clause. This query uses two tables in the join, so
CTAS will opt for using a column in the select clause.
So in this case, the CTAS will not preserve the original distribution. In the first query (1a), it has no option. Considering this query will produce a million records (and honestly, some cases it produced a couple of billion records) we really need some optimization here.
If we were to take (2a) and (2b) above to deliberately enforce the distribution, we only need look at (2a) and include the notation "distribute on (a)" at the end. This would maximize the table for the (2b) query for a co-located join. We have now pushed the workload into the physics, not the query itself. But as noted, concurrency is the test. This chain of queries then returned in less than 3 seconds, and did not grow beyond 4 seconds until 20 users were running the same query at the same time, and even then tended to hover between 3 and 5 seconds as even
more users were added. Isn't this the kind of scalable performance we want?
But we can see, the two protocols we will need in play from the BI tool is to use filtration first and summary last, and then also apply distribution keys deliberately to the first passes to preserive distribution. We often apply these very same protocols in ELT because they make sense. But we have complete, detailed control of query construction in ELT, not so in the BI Tool world.
Recap:
- Multi-stage SQL, especially for summary data, should perform the summary as the final operation, with filtration in the first passes.
- Multi-stage SQL should not address the same large table more than once. Get everything we will need and get out - don't keep coming
back for something the first pass did not get. - Multi-stage SQL should perform filtration at the outset, as a method toward attacking the larger table(s). Conquer smaller-to-larger.
- Multi-stage SQL should preserve distribution to leverage co-located write and read. This maximizes overall performance but also
opimizes concurrency.
What if the BI tool will not, as a general-purpose tool, perform these deliberate and purposeful query chains? At this point, we need to have a heart-to-heart with the BI Tool vendor stating our concerns. Assume the best, that the tool vendor may eventually fix the issue, just not in time to help us now. We then need to consider two purpose-built options, each of which has its own issues. These are offered in the spirit of temporary adaptation until the BI tool is smart enough to bypass them.
Summary tables: These are often constructed to prop up database performance issues. They are just as viable for functional reasons, such as providing data in a form that is only available and most efficient when summarized, or to intersect details with pre-summarized data. But if used as a performance prop or BI Tool helper, put some effort into making it an adaptation that could be deprecated when the BI Tool is smarter. This way, we're not committed to it forever.
Stored procedures: Used in an appliance as an adaptation mechanism (in this context). Effectively bridges the BI tool to the data with a temporary procedural construct (the procedure) rather than a more permanent structure (like a summary table). Stored procedures pull application features down to the database level and adapt the BI tool into the Netezza performance model.
When or whether to use either of the above is always a design decison, not necessarily dictated by the tools themselves. But keep in mind the idea of temporary adaptation. I am always of the mindset that the warehouse and BI environment must exist with the expectation of change, so in general, adaptability and adaptation concepts are always desireable. They allow us to be more responsive to future requirements

