Avoid BI Breakpoints With Cornerstone Solutions

BI System Builders, in their End-to-End BI philosophy, refer to the concept of BI breakpoints. Applying our Cornerstone Solutions® will help you avoid BI breakpoints, but what are they?

BI Breakpoints

Here’s an example of a small breakpoint in a BI system with relational tables that illustrates the escalating effort and associated cost. It starts with a poorly defined report specification. In this example no one identifies the specification as problematic, so the report is developed according to it. However, when the report reaches User Acceptance Testing (UAT) it fails. After consultation with the end user who carried out the UAT, it is understood that two important report columns are missing, so the report specification is changed. However, the new data columns are not supported by the dimensional model, so this too must be changed, requiring modelling and DBA work. This in turn means that the universe and the ETL packages must be changed, and so on. You can see here the interdependence and knock-on effect between the components, leading to escalating effort and cost.

However, if you know what to look for you can pre-empt where the BI breakpoints might occur and put quality gates in place. Here are some examples of where BI breakpoints might occur in the dimensional model design of a relational database:

  • The use of a data warehouse generated artificial primary key rather than a composite primary key on a fact table
  • Snowflaking in a relational model (not SAP OLAP)
  • A table whose nature is not made explicit through a clear naming convention
  • Table joins leading to chasms or fan traps
  • Fact to fact joins (there are exceptions)
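
The first bullet point can be illustrated with a small sketch. The example below uses Python's built-in sqlite3 purely as a stand-in for a warehouse database, and all table and column names are hypothetical. It shows why a composite primary key on a fact table acts as a quality safeguard: a duplicate load at the same grain is rejected, where a surrogate key would silently accept it.

```python
import sqlite3

# Illustrative schema: the fact table is keyed by a composite primary key
# of its dimension foreign keys, not a warehouse-generated surrogate key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

-- Composite primary key: the grain of the fact is explicit in the key itself.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    revenue      REAL    NOT NULL,
    PRIMARY KEY (date_key, product_key)
);
""")
conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 99.0)")
# A second row at the same grain violates the composite key, surfacing a
# duplicate-load problem that an artificial key would hide.
try:
    conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 99.0)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
print(duplicate_accepted)  # False
```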

Now a dimensional model design can be tricky to fully evaluate while it remains purely on paper or a computer screen, but it is possible at this point to put a quality gate in place to minimise the risk of future BI breakpoints. Firstly, the design can be checked against best practice principles to avoid the types of things listed in the bullet points above. Another quality gate can occur after the physical model has been generated: execute business questions against the tables using Structured Query Language (SQL) and validate that they can be answered. To choose the business questions to be executed via SQL, refer to the report specifications or consult the business users.
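
A minimal sketch of that second quality gate, again using sqlite3 as a stand-in database with hypothetical names and figures: load a little test data into the physical model, then express a business question ("revenue by product category") as SQL and confirm the model can answer it.

```python
import sqlite3

# Build a tiny instance of the physical model and load test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, revenue REAL,
                          PRIMARY KEY (date_key, product_key));
INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
INSERT INTO fact_sales  VALUES (20240101, 1, 500.0),
                               (20240101, 2, 40.0),
                               (20240102, 1, 300.0);
""")
# The business question expressed as SQL against the model under test.
rows = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Bikes', 800.0), ('Helmets', 40.0)]
```

If the SQL needed to answer the question turns out to be convoluted, that is the signal to revisit the design before the universe is built.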

It is better to write pure SQL statements using a SQL tool than to use SAP BusinessObjects to generate the business questions as queries. It’s a false economy to wait for Web Intelligence reports to be designed before testing the physical model. The reason is that the design should be validated as soon as possible, and before the universe is generated. To achieve this some test data must be loaded into the tables, but it is far more efficient to identify a design weakness at this stage than later on, when changes to the physical model will have a knock-on effect on the universe design and any dependent reports.

If it becomes apparent in the quality gate that overly complex SQL has to be generated to answer a business question, then the design should be revisited and optimised. The business questions should be focussed around any fact-to-fact joins, snowflaking or bridging tables, as these are potential BI breakpoints. Their existence can lead to very poor performance, miscalculations due to chasms and fan traps, and the prevention of drill across.
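
To make the fan trap concrete, here is a small illustrative example (hypothetical tables, sqlite3 standing in for the warehouse). One order fans out to two order lines and two shipments; joining all three tables in a single query multiplies the rows, so summing a measure double-counts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY);
CREATE TABLE order_lines (order_id INTEGER, amount REAL);
CREATE TABLE shipments   (order_id INTEGER, carrier TEXT);
INSERT INTO orders      VALUES (1);
INSERT INTO order_lines VALUES (1, 100.0), (1, 50.0);
INSERT INTO shipments   VALUES (1, 'DHL'), (1, 'UPS');
""")
# Naive join: each of the 2 lines pairs with each of the 2 shipments,
# producing 4 rows and inflating the total.
(naive,) = conn.execute("""
    SELECT SUM(l.amount)
    FROM orders o
    JOIN order_lines l ON l.order_id = o.order_id
    JOIN shipments  s ON s.order_id = o.order_id
""").fetchone()
# Correct approach: aggregate each many-side in its own query.
(correct,) = conn.execute(
    "SELECT SUM(amount) FROM order_lines WHERE order_id = 1").fetchone()
print(naive, correct)  # 300.0 150.0
```

A quality gate that runs the business question against real rows catches this kind of inflation immediately, whereas it is invisible on a model diagram.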

To minimise the risk of BI breakpoints occurring, the BI Architect should introduce best practice principles along with quality gates early on. Here’s an example of the best practice principle of designing aggregates. The use of aggregates upfront can address three breakpoint areas before they occur:

  • Slow query response times
  • Report rendering times
  • Overly complex reports

It is of course vital to select the correct aggregate type to support the end user requirement. There are several different types of aggregate, for example invisible aggregates (roll-ups), new facts (ETL pre-calculated fields), pre-joined aggregates and accumulating snapshots. Aggregates allow the processing effort to be pushed away from the report and into the database engine and ETL program, where it can be executed more efficiently. However, the use of specific aggregates should not be decided upon until the detailed report specifications are available. Choosing the most appropriate aggregates to support the reports requires skill and experience and is not a menial task.
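
As a sketch of the first type, an invisible aggregate (roll-up) is simply a table the ETL pre-sums at load time so that reports never touch the detail rows. The example below is illustrative (sqlite3, hypothetical names): a transaction fact is rolled up to a monthly grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales_txn (sale_date TEXT, product_key INTEGER, revenue REAL);
INSERT INTO fact_sales_txn VALUES
    ('2024-01-05', 1, 10.0), ('2024-01-20', 1, 15.0), ('2024-02-02', 1, 20.0);
CREATE TABLE agg_sales_month (year_month TEXT, product_key INTEGER,
                              revenue REAL,
                              PRIMARY KEY (year_month, product_key));
""")
# The roll-up itself: performed once in the load, not on every report refresh.
conn.execute("""
    INSERT INTO agg_sales_month
    SELECT substr(sale_date, 1, 7), product_key, SUM(revenue)
    FROM fact_sales_txn
    GROUP BY substr(sale_date, 1, 7), product_key
""")
rows = conn.execute(
    "SELECT year_month, revenue FROM agg_sales_month ORDER BY year_month"
).fetchall()
print(rows)  # [('2024-01', 25.0), ('2024-02', 20.0)]
```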

Aggregate tables can significantly reduce the size of the row set returned to the BI reporting tool. As a rule of thumb it is always better for BI reporting tools to work with smaller row sets. It is always best to force as much processing effort as possible back to the ‘gorilla’ of the BI system – the server itself. It can be even better when processing effort and calculations are forced back into the ETL program. However, care should be taken to educate end users when the effort is pushed back into the ETL program for calculated facts, so-called ‘new facts’, because the facts may be semi-additive or not additive at all.
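
A short worked example of why that education matters, with illustrative figures: a margin percentage pre-calculated in the ETL is not additive, so re-aggregating it naively in a report gives the wrong answer.

```python
# Hypothetical pre-aggregated rows: (region, revenue, cost, margin_pct),
# where margin_pct was pre-calculated in the ETL as a 'new fact'.
rows = [
    ("North", 1000.0, 800.0, 20.0),
    ("South",  100.0,  50.0, 50.0),
]
# Naive report-level treatment: averaging the pre-calculated ratio.
naive_margin = sum(r[3] for r in rows) / len(rows)        # 35.0
# Correct: recompute the ratio from its additive components.
total_rev  = sum(r[1] for r in rows)                      # 1100.0
total_cost = sum(r[2] for r in rows)                      # 850.0
true_margin = (total_rev - total_cost) / total_rev * 100  # ~22.7
print(round(naive_margin, 1), round(true_margin, 1))      # 35.0 22.7
```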

The downside of using aggregate tables is the increased maintenance, e.g. administering ten tables instead of five, and the ETL effort required to support them. However, their benefits outweigh their cost.

BI breakpoints can manifest in far more areas than just relational tables; they may be seemingly invisible, and costly when they occur because of the interdependencies of the BI system. However, applying governance to the BI system around breakpoint areas is not always the most popular notion at implementation. This is because it requires the effort to think cleverly and to apply hard work, rigour, and discipline upfront to pre-empt problems that are almost invisible. We may be tempted to take short cuts, but it’s better to apply the extra early effort. Entering into small skirmishes and battles in the early stages to iron out BI breakpoints is much better than allowing them to go unchecked and develop into big firefights later. Regardless of whether your BI system relies on SAP OLAP cubes and queries or on relational tables, BI breakpoints are a real threat.

Improving Web Intelligence Deep Drill Down Performance

Business Intelligence information consumers and analysts often have an important requirement to drill up and down through an organisation’s data hierarchies. This ability enables effective analysis, allowing the analyst to identify both underperforming and overperforming areas.

Native drilling capabilities exist within Web Intelligence under the name ‘Scope of Analysis’. To understand Scope of Analysis it is important to know that a Web Intelligence report always returns data from the database in the form of a microcube. Enabling the Scope of Analysis functionality then allows drill-down and drill-up to be performed on the microcube. Web Intelligence uses a microcube because it is built for flexible ad hoc reporting: building reports quickly on the fly with fast query execution. However, the limitations of the Scope of Analysis functionality must be understood. When a microcube is built with drilling capabilities beyond four levels (that is, four dimensions), query performance can suffer. Large multi-dimensional queries requiring drill down through numerous dimensions call for tools such as OLAP.

Many organisations have hierarchies that run beyond four levels. When large hierarchies are being utilised it is best to use the SAP OLAP client or a suppression technique within a Crystal Report. However, if Web Intelligence must be used then the requirement to drill down beyond four levels without a performance hit can be realised through vertical drilling hierarchy tables.

The basic technique is to map the source system data through staging tables to a horizontal dimension table with hierarchies and then transpose the table to a vertical hierarchy dimension table. The existing horizontal hierarchy dimension data can be transposed to a vertical hierarchy structure using SAP BusinessObjects Data Integrator as part of the ETL process. Before this can happen the vertical hierarchy dimension tables must be modelled in the data warehouse.
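
The transposition step can be sketched in a few lines. This is an illustrative outline of the logic the ETL job (e.g. in Data Integrator) would perform, with hypothetical hierarchy data: each row of the horizontal table (one column per level) becomes a set of explicit (child, parent, level) rows in the vertical table.

```python
# Horizontal hierarchy dimension: one column per level (names illustrative).
horizontal = [
    # (member_key, level1, level2, level3)
    (1, "Global", "Europe", "UK"),
    (2, "Global", "Europe", "France"),
    (3, "Global", "Americas", "USA"),
]

# Transpose to a vertical parent/child structure, de-duplicating shared
# ancestors such as ('Europe', 'Global', 2).
vertical = set()
for _, *levels in horizontal:
    parent = None
    for depth, node in enumerate(levels, start=1):
        vertical.add((node, parent, depth))  # (child, parent, level)
        parent = node

rows = sorted(vertical)
print(rows)
# Each parent/child link is now an explicit row, so a single filtered query
# can walk the hierarchy one level at a time, however deep it goes.
```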

The hierarchy tables are then configured in the SAP BusinessObjects universe, where they become available to Web Intelligence. Some complexity now arises in the development of the Web Intelligence report. Basically, a data provider with a query that includes the @prompt syntax in a filter is developed to run against the vertical tables. After that, a block is developed in the report with hyperlinks that use the opendocument.jsp function. The parent report hyperlinks are designed to call the report itself via opendocument.jsp (the parent and target reports are the same report), but the logic of the report is such that the prompt will be populated with a node of the hierarchy one level up or down from the current level. This forces the report data to refresh at a level higher or lower than the current one, and effectively allows the user to drill up or down through the hierarchy.
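
The drill logic the prompt drives can be sketched in plain Python. This is not the report itself, just an illustration of what each self-link resolves against the vertical table: the current node (which the @prompt value would supply) is looked up to find either its children (drill down) or its parent (drill up). Names are hypothetical.

```python
# Vertical hierarchy table as (child, parent) rows, names illustrative.
vertical = [
    ("Global", None), ("Europe", "Global"), ("Americas", "Global"),
    ("UK", "Europe"), ("France", "Europe"),
]

def drill_down(current):
    """One level down: the children of the current node."""
    return sorted(child for child, parent in vertical if parent == current)

def drill_up(current):
    """One level up: the parent of the current node."""
    return next(parent for child, parent in vertical if child == current)

print(drill_down("Europe"))  # ['France', 'UK']
print(drill_up("UK"))        # Europe
```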

There are some limitations to this approach. Firstly, there is the development overhead for the tables and ETL jobs. Secondly, it takes some advanced skill to develop the report. Thirdly, drill down is only one level per click, i.e. there is no multi-level tree walking: from level one in the hierarchy the user must go to level two; it is not possible to jump immediately to level three. This limitation does not apply if Crystal Reports is utilised.

The Aggregate Advantage

This article is about the advantage of using aggregate tables in your data warehouse when using Web Intelligence as your reporting tool. Whilst it is possible to configure a SAP BusinessObjects universe for use with Web Intelligence on low level detail database tables such as transaction tables, it is rarely a good idea to do so.

The primary purpose of Web Intelligence is to allow users to perform ad hoc analysis, speedily slicing and dicing data and drilling up and down through it. The speed of this on-the-fly analysis is possible due to the architecture of Web Intelligence. When a query is executed Web Intelligence creates a small cube of cached data referred to as a microcube. Note the word ‘microcube’. It is not a macrocube, it is not an OLAP cube, and Web Intelligence is not an OLAP tool, so we must not try to use it as one. In our experience it is best not to overload the microcube with enormous amounts of cached data.

You should go to the transaction tables with Web Intelligence only when you really need that level of detail. If your query returns 10,000 rows of data from a transaction table, are you really going to scroll through all those rows for analysis purposes? If your answer really is yes, you might want to consider using Crystal Reports instead, because it has a different underlying architecture that is more appropriate to that type of reporting.

If your answer is no, then let’s consider a reporting example. In this example your reporting requirement is to view your sales revenue figures by location, by product category, by year and by month. Now you could run your query against transaction tables and then use Web Intelligence to sum up your revenue. This approach has three weaknesses. The first is that the actual SQL code generated by your query has to execute against very big database tables and so may be slow. The second problem is that you may be returning thousands of rows to your microcube. The third issue is that you are then asking the Web Intelligence program to sum up or aggregate these thousands of rows of data and in doing so you are failing to exploit the power of your database server and instead stressing the Web Intelligence report engine.

There is a better way; your report is a candidate for aggregate tables as it is using summary data. So what are aggregate tables? Simplistically, they are tables that hold data that has been summed up or aggregated in some way. Aggregate tables are not necessarily a simple summing up; they can also consist of other aggregates such as pre-calculated ratios, but the concept is a simple one. You take a transaction table, maybe holding 100 million rows of data, then using the Data Integrator tool you extract the data, aggregate it, and then load it into another table holding the summary data. In our example the table would be sales revenue by location, by product category, by year, by month. Now let’s say that because of the aggregation the summary table only holds 1 million rows of consolidated data, what effect does this have?
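
The effect can be demonstrated in miniature. The sketch below uses sqlite3 as a stand-in database with hypothetical names and deliberately small numbers: an ETL-style aggregation step collapses the transaction rows into a summary table, so the report query returns a tiny row set instead of the full detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fact_sales_txn
                (location TEXT, category TEXT, year INTEGER, month INTEGER,
                 revenue REAL)""")
# Many detail rows, all at the same reporting grain.
conn.executemany("INSERT INTO fact_sales_txn VALUES (?,?,?,?,?)",
                 [("London", "Bikes", 2024, 1, 10.0)] * 1000)
# The ETL-style aggregation step: revenue by location, category, year, month.
conn.execute("""CREATE TABLE agg_sales AS
                SELECT location, category, year, month,
                       SUM(revenue) AS revenue
                FROM fact_sales_txn
                GROUP BY location, category, year, month""")
txn_rows = conn.execute("SELECT COUNT(*) FROM fact_sales_txn").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM agg_sales").fetchone()[0]
total    = conn.execute("SELECT revenue FROM agg_sales").fetchone()[0]
print(txn_rows, agg_rows, total)  # 1000 1 10000.0
```

The report query now reads one pre-summed row instead of a thousand detail rows, and no summing is left for the reporting tool to do.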

Firstly, the SQL query is faster because there is much less data in the table to query. Secondly, you are not asking Web Intelligence to do the summing up; that already happened behind the scenes in the data load. Thirdly, because the data is already summed up, significantly less data needs to be cached in the microcube, and consequently the Web Intelligence performance will be excellent.

This article is only a simplistic introduction to aggregation and clearly there is much more to consider than this. However, it has served its purpose, which is to point you in the right direction.