Feb 23, 2010 3:42 PM
Better to store pre-computed values or compute on-the-fly?
-
Like (0)
I currently have a table with about 500 million rows. Each row represents one transaction and includes 10 financial columns. In order to support international business we need to add the ability to support optionally having different currencies for the buyer and the seller -- this means I, in effect, need to have either:
CREATE TABLE SAMPLE_WITH_INLINE_DATA
(
... some non-financial columns ...
-- Note financials are stored in bigints because we store fractional pennies.
financial_column_1 BIGINT NOT NULL,
financial_column_2 BIGINT NOT NULL,
financial_column_3 BIGINT NOT NULL,
financial_column_4 BIGINT NOT NULL,
financial_column_5 BIGINT NOT NULL,
financial_column_6 BIGINT NOT NULL,
financial_column_7 BIGINT NOT NULL,
financial_column_8 BIGINT NOT NULL,
financial_column_9 BIGINT NOT NULL,
financial_column_10 BIGINT NOT NULL,
buyer_financial_column_1 BIGINT,
buyer_financial_column_2 BIGINT,
buyer_financial_column_3 BIGINT,
buyer_financial_column_4 BIGINT,
buyer_financial_column_5 BIGINT,
buyer_financial_column_6 BIGINT,
buyer_financial_column_7 BIGINT,
buyer_financial_column_8 BIGINT,
buyer_financial_column_9 BIGINT,
buyer_financial_column_10 BIGINT,
seller_financial_column_1 BIGINT,
seller_financial_column_2 BIGINT,
seller_financial_column_3 BIGINT,
seller_financial_column_4 BIGINT,
seller_financial_column_5 BIGINT,
seller_financial_column_6 BIGINT,
seller_financial_column_7 BIGINT,
seller_financial_column_8 BIGINT,
seller_financial_column_9 BIGINT,
seller_financial_column_10 BIGINT
) DISTRIBUTE ON {whatever};
-OR-
CREATE TABLE SAMPLE_WITH_CONVERSION_RATES
(
... some non-financial columns ...
-- Note financials are stored in bigints because we store fractional pennies.
financial_column_1 BIGINT NOT NULL,
financial_column_2 BIGINT NOT NULL,
financial_column_3 BIGINT NOT NULL,
financial_column_4 BIGINT NOT NULL,
financial_column_5 BIGINT NOT NULL,
financial_column_6 BIGINT NOT NULL,
financial_column_7 BIGINT NOT NULL,
financial_column_8 BIGINT NOT NULL,
financial_column_9 BIGINT NOT NULL,
financial_column_10 BIGINT NOT NULL,
buyer_currency_conversion_factor BIGINT,
seller_currency_conversion_factor BIGINT
) DISTRIBUTE ON {whatever};
In the case of simple multiplication you are much better off only storing the currency and each value separately. You can create a view that multiplies all the columns out for you on the fly. If the calculation was really complex you might want to store the calculated result, but in this case the multiplication should result in better performance and obviously less disk usage.
Rather than storing as bigints you might want to think about numerics - they are scaled integers (bigints) but have some representation precision that is handy for business logic - doesn't cost you anything more.
As for computation - the typically basic math in the select-clause won't get you into trouble (except in some cases of multiplication you may get overflow) - but anything that is row-level calculation is usually okay as long as you are not doing the math in the where-clause. If you find yourself doing the math in the where-clause, it is better to precalculate the values and simplify the where-clause with common operators like "=" or "between". Also if you are doing a lot of case-logic in the select clause, this can also drag your query down so is sometimes better in a precalculation.
I have also written a blog post (Tastier Float) that talks about using a view for row-level conversions of numeric to float to avoid math overflow. The CPU-level cast is far less expensive than the I/O hit of the double-precision when coming off the disk, and the numeric also benefits from compression where the float does not.
I didn't look real close at the table. I agree with David that you want to store the data using the NUMERIC data type instead of bigint. NUMERIC data in Netezza is actually stored as an integer value by the database. All math operations on NUMERICS are actually integer math as well.
The value 1234567.890 is actually stored as the integer value 123456789000000 in a NUMERIC(18,8) column. This also saves you the trouble of dealing with the decimal points yourself.

