Skip navigation
18972 Views 11 Replies Latest reply: Nov 10, 2010 1:16 PM by kmohan RSS
caro New Enzee 27 posts since
Aug 3, 2009
Currently Being Moderated

Feb 23, 2010 4:05 AM

CTAS with distribution key

hi all,

 

when I create a table as CTAS, the distribution column in the target table is the first column of the source table, even though the source table is distributed on random.

 

 

Could i change the distribution column while running the CTAS querry.

 

 

Thanks.

  • Superuser Rookie 90 posts since
    Sep 19, 2008
    Currently Being Moderated
    1. Feb 23, 2010 4:18 AM (in response to caro)
    Re: CTAS with distribution key

    ofcourse, try the below

     

    create table test_new as select * from test_old distribute on random;

  • Vijay New Enzee 9 posts since
    Jan 5, 2009
    Currently Being Moderated
    3. Feb 23, 2010 7:38 AM (in response to caro)
    Re: CTAS with distribution key

    DIstribute on random is not a preferred way for Distribution. It will cause performance degradations unless the table is a small one

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    5. Feb 23, 2010 8:20 AM (in response to caro)
    Re: CTAS with distribution key

    use the 'distribute on' clause as the last part of the CTAS - some reminders:

     

    if you use a key other than the source table, you will initiate a broadcast from the data's current SPU to the new one. It's transparent but for larger results is more costly than maintaining the distribution (you sacrifice the co-located write)

     

    If you are bridging from one distribution to another so that your next operation gets a co-located join/read, be sure to check the distribution of the result. If the result is highly skewed, you might want to consider doing the CTAS with some additional sorting logic to manufacture a zone map to mitigate this condition. The co-located join alone is not as efficient if one of the joining tables is highly skewed, but if using a zone map, skewing is mitigated.

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    7. Feb 23, 2010 9:59 AM (in response to caro)
    Re: CTAS with distribution key

    As your CTAS table grows larger, it will become an issue. This is especially true for CTAS tables created in a BI reporting product. The temporary nature of the table is not the issue. If the table is poorly distributed then any join-operation consuming it will run less effciently. For smaller tables this may not seem like a big deal, but smaller tables invariably grow. It is best to apply a practice at the outset so there is no worry later, and it doesn't really cost anything to apply it. When performance starts to degrade because it was not applied earlier then for goodness sake apply it now. It is easy to apply later, just not so easy to diagnose later.

     

    But again, the temporary nature of the CTAS result is not the issue. We want all ELT or BI reportng queries to run at top speed, so optimizing them with these simple tactics keeps them scalable with no mysteries.

  • doniuppa New Enzee 1 posts since
    Sep 15, 2009
    Currently Being Moderated
    9. Nov 10, 2010 9:22 AM (in response to Superuser)
    Re: CTAS with distribution key

    Is there a way to do a CTAS keeping the distribution columns of the source table?

  • Shawn Fox Enzee Exraordinaire 1,326 posts since
    Aug 15, 2006
    Currently Being Moderated
    10. Nov 10, 2010 9:57 AM (in response to doniuppa)
    Re: CTAS with distribution key

    The only time CTAS does not keep the same distribution is for tables which have been distributed on random.  In that case, CTAS uses the first column as the distribution key.  It isn't a very good default behavior, but that is the way it works presently (still shows the same behavior on 6.0).

     

    In general though, relying on default behaviors isn't a good idea as the default can always change in the future.  Specifying the distribution key is a much better approach even if the default distribution key is (currently) what you want.

  • kmohan New Enzee 69 posts since
    Oct 23, 2006
    Currently Being Moderated
    11. Nov 10, 2010 1:16 PM (in response to Shawn Fox)
    Re: CTAS with distribution key

    Use CTAS carefully. I personally prefer explicit definition.

    CTAS drops the "Default Clause" in a table. This caused a lot of grief to me.

     

    ---

    Ex:

    scdwtst(ndwdm)=> create table tab_xyz (name varchar(30), id integer default 100);
    CREATE TABLE
    Elapsed time: 0m4.971s
    scdwtst(ndwdm)=> \d tab_xyz
                           Table "tab_xyz"
    Attribute |         Type          | Modifier | Default Value
    -----------+-----------------------+----------+---------------
    name      | character varying(30) |          |
    id        | integer               |          | 100
    Distributed on random: (round-robin)
    scdwtst(ndwdm)=> create table ctas_tab_xyz as select * from tab_xyz;
    INSERT 0 0
    Elapsed time: 0m2.004s
    scdwtst(ndwdm)=> \d ctas_tab_xyz
                         Table "ctas_tab_xyz"
    Attribute |         Type          | Modifier | Default Value
    -----------+-----------------------+----------+---------------
    name      | character varying(30) |          |
    id        | integer               |          |
    Distributed on random: (round-robin)

    scdwtst(ndwdm)=> create table tab_xyz (name varchar(30), id integer default 100);

    CREATE TABLE

    Elapsed time: 0m4.971s

    scdwtst(ndwdm)=> \d tab_xyz

                           Table "tab_xyz"

    Attribute |         Type          | Modifier | Default Value

    -----------+-----------------------+----------+---------------

    name      | character varying(30) |          |

    id        | integer               |          | 100

    Distributed on random: (round-robin)

     

    scdwtst(ndwdm)=> create table ctas_tab_xyz as select * from tab_xyz;

    INSERT 0 0

    Elapsed time: 0m2.004s

    scdwtst(ndwdm)=> \d ctas_tab_xyz

                         Table "ctas_tab_xyz"

    Attribute |         Type          | Modifier | Default Value

    -----------+-----------------------+----------+---------------

    name      | character varying(30) |          |

    id        | integer               |          |

    Distributed on random: (round-robin)

    -----

    Regards
    Krish Mohan

More Like This

  • Retrieving data ...

Bookmarked By (0)

Legend

  • Correct Answers - 4 points
  • Helpful Answers - 2 points