Skip navigation
14333 Views 3 Replies Latest reply: Feb 23, 2010 5:24 PM by Shawn Fox RSS
Drew DeNardo New Enzee 1 posts since
Sep 15, 2009
Currently Being Moderated

Feb 23, 2010 3:42 PM

Better to store pre-computed values or compute on-the-fly?

I currently have a table with about 500 million rows.  Each row represents one transaction and includes 10 financial columns.  In order to support international business we need to add the ability to support optionally having different currencies for the buyer and the seller -- this means I, in effect, need to have either:

 

CREATE TABLE SAMPLE_WITH_INLINE_DATA

(

   ... some non-financial columns ...

   -- Note financials are stored in bigints because we store fractional pennies.

   financial_column_1   BIGINT NOT NULL,

   financial_column_2   BIGINT NOT NULL,

 

   financial_column_3   BIGINT NOT NULL,

 

   financial_column_4   BIGINT NOT NULL,

 

   financial_column_5   BIGINT NOT NULL,

 

 

   financial_column_6   BIGINT NOT NULL,

 

   financial_column_7   BIGINT NOT NULL,

 

   financial_column_8   BIGINT NOT NULL,

 

   financial_column_9   BIGINT NOT NULL,

 

   financial_column_10   BIGINT NOT NULL,

 

   buyer_financial_column_1   BIGINT,

 

   buyer_financial_column_2   BIGINT,

   buyer_financial_column_3   BIGINT,

   buyer_financial_column_4   BIGINT,

   buyer_financial_column_5   BIGINT,

   buyer_financial_column_6   BIGINT,

   buyer_financial_column_7   BIGINT,

   buyer_financial_column_8   BIGINT,

   buyer_financial_column_9   BIGINT,

   buyer_financial_column_10   BIGINT,

 

   seller_financial_column_1   BIGINT,

 

   seller_financial_column_2   BIGINT,

   seller_financial_column_3   BIGINT,

   seller_financial_column_4   BIGINT,

   seller_financial_column_5   BIGINT,

   seller_financial_column_6   BIGINT,

   seller_financial_column_7   BIGINT,

   seller_financial_column_8   BIGINT,

   seller_financial_column_9   BIGINT,

   seller_financial_column_10   BIGINT

) DISTRIBUTE ON {whatever};

 

-OR-

 

 

CREATE TABLE SAMPLE_WITH_CONVERSION_RATES

(

   ... some non-financial columns ...

   -- Note financials are stored in bigints because we store fractional pennies.

   financial_column_1   BIGINT NOT NULL,

   financial_column_2   BIGINT NOT NULL,

   financial_column_3   BIGINT NOT NULL,

   financial_column_4   BIGINT NOT NULL,

   financial_column_5   BIGINT NOT NULL,

   financial_column_6   BIGINT NOT NULL,

   financial_column_7   BIGINT NOT NULL,

   financial_column_8   BIGINT NOT NULL,

   financial_column_9   BIGINT NOT NULL,

   financial_column_10   BIGINT NOT NULL,

   buyer_currency_conversion_factor   BIGINT,

   seller_currency_conversion_factor   BIGINT

) DISTRIBUTE ON {whatever};

 

 



 



 

The "_INLINE" table will be larger on the disk, and therefore will incur more i/o to load and process.  The "_WITH_CONVERSION_RATES" table will be smaller on the disk (less i/o), but I'll have to do 20 multiplications for each row each time I use it.  My experience with other "traditional" DBMS systems leads me to prefer the "_INLINE" option for performance/scale reasons, but I've come to discover that Netezza doesn't really resemble "traditional" DBMS systems when it comes to issues of scale.
I've done some benchmarking on my Netezza box and, frankly, I cannot detect much difference in response time between the two approaches.  I thought I'd ask here to see what best practices have fallen out of everyone's experience with this sort of issue.


  • Shawn Fox Enzee Exraordinaire 1,326 posts since
    Aug 15, 2006

    In the case of simple multiplication you are much better off only storing the currency and each value separately.  You can create a view that multiplies all the columns out for you on the fly.  If the calculation was really complex you might want to store the calculated result, but in this case the multiplication should result in better performance and obviously less disk usage.

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007

    Rather than storing as bigints you might want to think about numerics - they are scaled integers (bigints) but have some representation precision that is handy for business logic - doesn't cost you anything more.

     

    As for computation - the typically basic math in the select-clause won't get you into trouble (except in some cases of multiplication you may get overflow) - but anything that is row-level calculation is usually okay as long as you are not doing the math in the where-clause. If you find yourself doing the math in the where-clause, it is better to precalculate the values and simplify the where-clause with common operators like "=" or "between". Also if you are doing a lot of case-logic in the select clause, this can also drag your query down so is sometimes better in a precalculation.

     

    I have also written a blog post (Tastier Float) that talks about using a view for row-level conversions of numeric to float to avoid math overflow. The CPU-level cast is far less expensive than the I/O hit of the double-precision when coming off the disk, and the numeric also benefits from compression where the float does not.

  • Shawn Fox Enzee Exraordinaire 1,326 posts since
    Aug 15, 2006

    I didn't look real close at the table.  I agree with David that you want to store the data using the NUMERIC data type instead of bigint.  NUMERIC data in Netezza is actually stored as an integer value by the database.  All math operations on NUMERICS are actually integer math as well.

     

    The value 1234567.890 is actually stored as the integer value 123456789000000 in a NUMERIC(18,8) column.  This also saves you the trouble of dealing with the decimal points yourself.

More Like This

  • Retrieving data ...

Bookmarked By (0)

Legend

  • Correct Answers - 4 points
  • Helpful Answers - 2 points