Saturday, October 3, 2009

The Balance of processing

Should something be processes on the fly or precompiled when the star schema is loaded with data (the ETL process)? Its all a matter of timing ,stability and data storage . Obviously if you precompile something, it has to be stored somewhere and this can be restricted by how much storage capacity is available. Conversely if everything is calculated on the fly, this can dramatically deteriate the performance of a report. For this assignment there are too many unknowns. How fast are their servers ? How much data storage is available ? Anticipated volumes of data ? Who knows !

These decisions are something that I would make implicitly but I guess this project has really galvanized my train of though on the matter. Some are applicable to the assign, while

Rules of Thumb.

  • If something will only need to be processed once and it will not change , its best done during the ETL load. A good example is age category. Assuming there is a standardised category 18-24, 25-34, etc.why process this again and again on the fly?
  • If something is likely to be altered, changed, or there are dozens of ways to calculated it. Do it on the fly. Eg Percentage of late Sales. If this is done at the ETL level, this would need to be recalculated for every region or state.
  • If a report is standardized and isn't likely to change, create an aggregate table (select net_proceeds from sales_fact group by region). By creating aggregating a table, the load of processing data on the presentation layer can be significantly reduced. This only is applicable if data storage is not an issue and the volume of data being calculated is large.

No comments:

Post a Comment