27 January 2012

The differences between a transactional database and an analytical one

Imagine the following scenraio:
A business decision has been made to create a BI project. After all the ETL process has been completed, and the decisions had been made about what's in and what out of the BI project; After all the data had been incorporated from several different systems (including the Excel spreadsheets in the "My Documents" folder); After the cube had been processed and reports had been created to please the business, somebody (else) from business comes with a question regarding one of the good old transactional systems, and expect the new flashy BI project to answer.  Now you need to come with a good reason why a "no can do" is actually the correct answer.
So here's an extract from an email I've sent regarding this issue. I've changed the names and some of the details to avoid identification, so the transactional database is called "DB" and the analytical database is called "BI". If you decide to cut & paste this post and send it to your business analyst, change the "DB" to the name of the particular database, and the same with the "BI".
More homework: check the DB versus the BI project. What are the questions that the DB is supposed to answer quickly and efficiently? What are the questions that the BI project is supposed to answer quickly and efficiently? Once done, put them down as "Examples" and spread them across you email.
Dear Business Analyst, 

Enclosed hereby please find a thorough information about DB, BI and the differences between them.

1.       Purpose:
a.       DB is a transactional database. It is meant to give you an answer for questions for the here & now. It is optimised for UPDATE, INSERT & DELETE operations.
b.      BI is an analytical database. It is optimised for GROUP BY (operations like SUM, AVG, COUNT etc.) and for reporting.
2.       Time:
a.       DB is a transactional database, and therefore it doesn’t retain historic information. While some data is being kept for auditing purposes, if you want to find out how many red bicycles were sold in the month of March 2011 you'd look in the AdventureWorks_DW, not in AdventureWorks.
b.     BI is an analytical database and therefore it maintains historic information across the time dimension. The question in (a) is considered pretty easy for BI querying.
c.       Granularity  – DB is changed constantly, BI is changed according to the frequency of the feed. While you can change the frequency of the feed, the changes are not being caught by the minute and we’ll always “miss” a change in order to take a snapshot of the bigger picture.
3.       Differences in values:
Theoretically, you’d expect that if you download the data from BI at a certain point in time and compare it to the DB then the values would be identical. Unfortunately, that is not true. It could because in the DB there are values that are incorrect; It could be because the DB keeps some fields for the purposes of answering questions that are relevant to its purpose, but are not being exported to the BI as they convey very little meaning in terms of the bigger picture. As the BI tries to incorporate data from more than one system that's quite likely.
In summary:
If you compare at a certain point in time the differences between DB and BI you’ll find them vast and you’ll probably get pretty frustrated at the differences. This comparison is useless.
What BI can give you (if we put the correct data inside) are changes and trends across time. BI isn’t meant for an exact snapshot at a certain point in time; that's the purpose of the DB. 

Sincerely Yours, 
Your BI developer

No comments:

Post a Comment