Scroll

Optimizing Reports for Database Aggregation

Part of the Performance and Scaling series.

Exago BI can run aggregate (or summary) calculations on any arbitrary data set of your choice. However, causing an Exago execution engineweb application or remote execution schedulerto do so can be under-performant for large sets of data for two reasons:

  1. The application must pull every individual detail row from the database
  2. The calculations are done in the application, not the database

Thus both the network and the application server will be bottlenecks to higher performance. We can optimize away these inefficiencies with Database Aggregation (DB Agg).

In short, DB Agg pushes aggregate calculations to the database when possible, which allows for far fewer data rows to have to be sent across the network and aggregated on the web server.

Best Practices

While DB Agg is designed to work invisibly without requiring any additional user input, it cannot work in all scenarios. We recommend some best practices in order to make sure that a high percentage of your reports can take advantage of the speed boost. Each best practice is designed to work around one of the inherent limitations that will prevent DB Agg from working.

Separate visualization detail rows to a Linked Report

Limitation: Showing detail rows defeats the purpose of DB Agg.

Solution: For visualization reports based on group data, suppress or remove the Detail section. You can create a secondary report with the detail rows and link it to the chart as a Drilldown. This way the primary visualization runs quickly. Then when the database must be queried for detail rows, you will retrieve smaller, filtered subsets instead of the full data set.

Example

Instead of this:

before.png

Do this:

after.png

Construct tables in the database, not the application

Limitation: Certain types of joins—one-to-many and cross joins (Cartesian products)—and custom columns, which are created in the application rather than the data source, are incompatible with DB Agg.

Solution: Do join and formula logic in the database instead of in the application. As long as a data object generates SQL, then any constructed table will support DB Agg. Stored procedures, table-valued functions, user-defined functions, and custom SQL statements are all supported.

Use pre-aggregated tables

This one's cheating, because we're not actually going to use Exago's DB Agg feature at all. Instead, we can just aggregate in the database ourselves using custom SQL statements. Using the API's ability to supply custom SQL objects for new reports, we can create data objects on the fly where the detail rows themselves are simply database-aggregated values for other tables.

Example

The following SQL statement, supplied in a custom SQL object through the API, joins two tables and returns three aggregated columns before ever reaching the Exago application.

SELECT Order.Id AS Id, COUNT(OrderDetail.Id) AS IdCount, MAX(OrderDetail.OrderId) AS OrderMax, MAX(OrderDetail.ProductId) AS ProdMax
  FROM OrderDetail INNER JOIN Order ON (OrderDetail.OrderId = Order.Id)
  GROUP BY Order.Id
  ORDER BY Order.Id ASC

Conclusion

If you take away one lesson from this article, it should be the following: DB Agg will work brilliantly for improving performance whenever you don't have to pull the individual rows across the network. Almost every pitfall will occur when the application is forced to see the full data set to give you the result you need. Many such situations can be avoided with foresight and preparation.

Tip

There is no visual indication that DB Agg is occurring; however you can find out by inspecting the log file. With logging set to DEBUG level and DB Agg turned on, Exago BI will indicate whenever it was unable to use it, and the reason why. Just search the logfile for the following text:

SQL aggregate requirement failed due to:

And the text after the colon (:) will indicate the reason why DB Agg could not work:

SQL aggregate requirement failed due to: cell [=AggCount({OrderDetails.Id})] contains aggregates that can be duplicated due to one-to-many relationships

Tell us if you come up with any clever solutions to enable DB Agg in your environment. We love to hear what our clients can do with a little bit of inspiration and elbow grease.

Until next time, Happy Reporting!


Hidden Article Information

Article Author
Exago Development
created 2018-01-24 17:37:08 UTC
updated 2018-01-26 22:42:22 UTC

Labels
aggregate, scaling, summary, calculation, agg, performance, db,
Have more questions? Submit a request