Aug 27, 2010 12:49 PM
query with unions
-
Like (0)
hello -- I am trying to convert a large table with the following format
tab1: id | fld1 | fld2 | ... | fld500
into a table with the following format
tab2: rowid | colid | value
rowid corresponds to row_number in tab1
colid is a mapping of field name in tab1 to an integer. eg. fld1 -> 1, fld2 -> 2
I wrote a Python script which generates the following:
select iv.rn as rowid, 1 as colid, fld1 as value from (select *, row_number() over (order by id) as rn from tab1) iv
union all
select iv.rn as rowid, 2 as colid, fld2 as value from (select *, row_number() over (order by id) as rn from tab2) iv
union all
...
Obviously, this depends on having a field with distinct elements in tab1 on which I can specify an order by failing which row_number for each union would be different. The alternative approach is to create a temp table, tab1tmp with an additional column, the row number, and then running the above query.
Is there a different approach which does not require a field with distinct elements, and would still be able to use an inline view? thanks.
You'll definitly want to create a temp table as it will be much easier to read. If you don't have a unique identifier you can use the rowid pseudo column...
create table temp1 as
select row_number() over (order by rowid) as rn, t.*
from sourcetable t;
Instead of passing through the table multiple times it would be better to do this as a cartesian join using a case statement, so if you had 5 columns in the table TEMP1, you could do this:
create table columnids (colid integer);
insert into columnids value (1);
insert into columnids value (2);
insert into columnids value (3);
insert into columnids value (4);
insert into columnids value (5);
insert into target_table (id, colid, value)
select rn, colid, case when colid = 1 then column1
when colid = 2 then column2
when colid = 3 then column3
when colid = 4 then column4
when colid = 5 then column5 end
from source_table, columnids;
In Netezza 6.0 if you had 64 or less columns (a user defined table function can only accept 64 arguments), you could use a table function to do this in a single pass. For example, using nzLua:
insert into target_table (id,colid,value)
select tf.id, tf.colid, tf.value
from source_table s, table(nzlua('
function processRow(argc, c1, c2, c3, c4, c5)
id = getNextId()
rows = {}
rows[1] = { id, 1, c1 }
rows[2] = { id, 2, c2 }
rows[3] = { id, 3, c3 }
rows[4] = { id, 4, c4 }
rows[5] = { id, 5, c5 }
return rows
end
function getShape()
return {{"id", integer },
{"colid", integer },
{"value", numeric(18,4) }}
end',
s.c1, s.c2, s.c3, s.c4, s.c5 )) as tf;
The nzLua function "getNextId()" generates a unique id on each call. It will not necessarily generate a sequential id since the values are generated in parallel, but each value will always be unique. The general idea here is that a table function can act as a join, producing multiple output rows for each input row.
I'll be curious if we can seamlessly do this with R's melt operation after the iClass release. Our "R" guys like to brag (with good reason) how sweet some of the R transformations are.
-D

