Business Intelligence in Sydney: 2011

14 December 2011

Avoid visible attribute hierarchies for attributes used as levels in user-defined hierarchies

Truth is, I never understood what this message above meant, so I just ignored it.
Yesterday I went to an SQL User Group about PowerPivot and PowerView, a very interesting lecture by
Peter Myers, and now I finally understand.

The blue squiggly line under Dim Date is due to that fact that in the Properties window,

the AttributeHierarchVisible is set to "True", while as you can clearly see there is a perfectly reasonable Hierarchy defined. Hiding those features (setting them to False in the properties window), removes the squiggly blue line and resolve this error

09 November 2011

No id, please

When I was 16 I got my first id card, that’s obviously a milestone in every person’s life. Not even using it for much (maybe for seeing M rated movies) it means that I have “id”. The second time I got an id was after I immigrated to Australia; as an immigrant to show Australian id is a real milestone.
In the SQL world - databases vs. data warehouses - we have two process. Learning databases you learn that normalization involves that each table has a unique key, and that every row in each table can be identified by a unique key. The first normal form requires every row to have a unique key. This unique key can be an “id” - identifier, or a composite key.
For many years I’ve been working with MS-Access. In Ms-Access, the best practice would be to have the unique key as an Identifier and not as a composite key; that’s due to the fact that if you want to link from one database to another it links the tables better. It’s usually best practice to split the access application to back end (data) and front end (user interface) in two different mdb files, so when you try to change network location (or migrate the date to sql server) and you need to change “in bulk” the location of the referring database, if you have a field that’s defined as an “id” it would be done quicker and more accurately than it would if you have a composite key.
In data warehouses’ design things are different. While designing a dimension every row must have a unique key which is a foreign key in the fact table. This unique key better be truly unique: not only it should be different than the “name” or the natural key of the external system, it should also be acting as a surrogate key, in case we’re going to change the “external system” that we’re working with. But apparently, a fact table doesn’t have to have a key! A unique composite key is “good enough”, and that's because we're dealing mainly with measures, and some people claim that even that isn't really necessary. Think "Sum" and then "group by"; no, we don't need Id as identity over there, do we? Moreover, I’ve found out that using a composite key gives me the ability to minimize the data and to warn me whenever I’m doing an error, like inserting the data with the wrong date. (I don’t disable the indices while inserting; I’d rather lose half a minute waiting for data to be inserted, than lose a day’s work thinking what went wrong with the job I’ve done. For my size of database it is “a good enough practice”).
So far I’ve found out that’s personally, this is the most difficult part of “denormalization”. Another thing which I needed to get my head around is that actually, when we’re discussing Hierarchies in SSAS, we don’t really need the “id” there, especially not for Slowly Changing Dimensions. Let’s take “employee” as an example: I’d group by the key of “employee id” and add a line for a change in the name, for example. As I group by the “employee id” I would like to see all that’s relating to a certain employee no matter if they got married/divorced in October and therefore changed their name... Keeping the “id” as an attribute, but outside of the “Hierarchy”, enables me to browse the Hierarchy correctly without seeing repeating lines of Employee Id where I shouldn’t.

03 November 2011

Adding parameters to my MDX query

In my last post I've discussed a recursive MDX query. In this post I'll describe how I've parametrized the date for this query.
As you remember, in that query there was a set defined like this:

SET [OrderedFUM] AS (

    FILTER(    

    ORDER (([Dim Portfolio Client].[Dim Client Group Id].children,

        [Dim Date].[Id].&[20110630]),

        [Measures].[FUM Value], BASC),

        ISEMPTY ([Measures].[FUM Value] )=0)

        

    )

In real life, of course, I don't want to make a specific date; I want to add the date as a parameter to my MDX query. The requirement is for "sum per client per month" as the data is stored on a monthly basis. After lots of mocking around I've done the following, and I recommend following the same procedure for anyone else who is trying to parametrize an MDX query in SSRS:

Create a new report in BIDS.
Define a data connection.
Copy the MDX query "as is" into the Query Designer (the DataSet part)
Define a parameter in the "parameters" tab

In order to achieve that I have to use SSRS: After I've created the Data Source and the Data Set and added the MDX "as is" into the Query Designer, I've created the report and formatted it to my liking. Then I've added the parameter in the "parameters" section. I called it "Month" and made it of data type text.
Now I need to add the "date" in the SET as a parameter, not a constant. Therefor, I had to change the defintion of the measure "OrderFum" set as follows:



SET [OrderedFUM] AS (

    FILTER(    

    ORDER (([Dim Portfolio Client].[Dim Client Group Id].children,

        STRTOMEMBER(@month)),        [Measures].[FUM Value], BASC),

        [Measures].[FUM Value]<>0)

    )

And in order to make the code more readable (=less references to parameters), I changed the other reference to "dim date" in the RANK function from

[Dim Date].[Id].&[20110630]

To

[Dim Date].[Id].currentmember.

And now in the "parameter" box I need to insert the following very clear code
[dim date].[id].&[this is where the day of the month actually gets in - the actual parameter]
For Example:
[dim date].[id].&[20110930]
You would expect that I would use the "@month" parameter differently, and chances are it could be true.While unfortunately there are changes in using StrToMember function between Sql Server 2008 R2 to Sql Server 2005, the StrToMember Functions are quite complicated, as Mosha clearly explains. In order to make this code a bit more usable, I just added [Dim Date].[id].&[20110930] as a default parameter; the beauty is, that if my business analyst wants to play around with this, he just needs to change the date, not the entire text in the text box. Of course it is prone to human error; I, for once, forgot that it is [Dim Date] and not [DimDate].... And yes, I've tried StrToValue but to no avail.
For conclusion: definitely not intuitive, get ready to bang your head in the wall several times during using parametrised MDX with SSRS.

26 October 2011

Recursive Documented MDX

The challange:

I was asked by my business analyst to group the end of month results like this:

FUM	Number of clients (CG ID)	Sum of FUM
Up to $100,000
$100,001 - $250,000
$250,001 - $500,000
$500,001 - $1,000,000
$1,000,001 plus

Summarise the value of the portfolio per client and group them according to their values. In a way it means to give a dimension value to the measures. That has to be done, of course, while not losing the granuality of the data.

The data, obviously, looks like this:
Client id, Security id, Security value, date id

(and we insert the data on a monthly basis).
In order to achieve this I've googled, and googled, and googled some more, and reached the following links:
First, there is a term called Recursive MDX, and thanks to Mosha's excellent work you can read about it. Another thanks are due to Richard Lee, who explains when not to use it (hint: there are plenty of date functions in MDX that exist right now, so please learn them if you need to summarise "by month up till now"). I also read his post regarding Group by Measure Range Dynamically and the code below is based on his. The only difference you'll find below is that I've added much needed documentation, as I wasn't even sure about what I was doing (a hint: lots of copy, paste, try & error...)So here's the code, and below please find the documentation:

WITH 

MEMBER Measures.FumRange  AS

CASE 

    WHEN [Measures].[FUM Value] <= 0 THEN 'Negative'

   WHEN [Measures].[FUM Value] > 0 AND  [Measures].[FUM Value]<=100000 THEN '0-100K'

   WHEN [Measures].[FUM Value] >100000 AND [Measures].[FUM Value] <=250000 THEN '100K-250K'

   WHEN [Measures].[FUM Value]> 250000 AND [Measures].[FUM Value] <= 500000 THEN '250K-500K'

   WHEN [Measures].[FUM Value] > 500000 AND [Measures].[FUM Value] <= 1000000 THEN '500K-1M'

   WHEN [Measures].[FUM Value] > 1000000 THEN 'Over 1M'

END    

SET [OrderedFUM] AS (

    FILTER(    

    ORDER (([Dim Portfolio Client].[Dim Client Group Id].children,

        [Dim Date].[Id].&[20110630]),

        [Measures].[FUM Value], BASC),

        ISEMPTY ([Measures].[FUM Value] )=0)

        

    )    



MEMBER Measures.FumRank AS RANK (([Dim Portfolio Client].[Dim Client Group Id].currentmember,

                                    [Dim Date].[Id].&[20110630]), 

                                    [OrderedFum])

                                    

MEMBER MEASURES.RangeTot AS

    IIF((OrderedFUM.item(Measures.FumRank-2), Measures.FumRange)= Measures.FumRange,

             Measures.[FUM Value] + ( Measures.RangeTot ,OrderedFUM.item(Measures.FumRank-2)),

             Measures.[FUM Value])

    

MEMBER MEASURES.RangeCountCG  AS

    IIF((OrderedFUM.item(Measures.FumRank-2), Measures.FumRange)= Measures.FumRange,

            1 + (Measures.RangeCountCG, OrderedFUM.item(Measures.FumRank-2)),

            1)

MEMBER MEASURES.RangeTotal AS 

    IIF((OrderedFUM.item(measures.FumRank),Measures.FumRange)=Measures.FumRange,

        NULL,

        Measures.RangeTot),

    FORMAT_STRING = "$#,##0.00;-$#,##0.00"                 

    

    

SELECT {[Measures].FumRange, Measures.RangeTotal, Measures.RangeCountCG} ON 0, 

NONEMPTY(OrderedFUM, Measures.RangeTotal) ON 1                                    

FROM [xplan DWH]

So, what's going on in here?
(The definitions are from Kevin S. Goff Article: The Baker's Dozen. I recommend reading it as he writes clearly and the information is invaluable)
First of all, we're defining our own measures, and that's why I had to begin with



WITH

keyword, tells that this is what comes below is the list of measures to use.
Second, the measures are:
A list of the values to group by, a "CASE" statement: the same in MDX as it would be in SQL.

  MEMBER Measures.FumRange  AS

CASE 

    WHEN [Measures].[FUM Value] <= 0 THEN 'Negative'

   WHEN [Measures].[FUM Value] > 0 AND  [Measures].[FUM Value]<=100000 THEN '0-100K'

   WHEN [Measures].[FUM Value] >100000 AND [Measures].[FUM Value] <=250000 THEN '100K-250K'

   WHEN [Measures].[FUM Value]> 250000 AND [Measures].[FUM Value] <= 500000 THEN '250K-500K'

   WHEN [Measures].[FUM Value] > 500000 AND [Measures].[FUM Value] <= 1000000 THEN '500K-1M'

   WHEN [Measures].[FUM Value] > 1000000 THEN 'Over 1M'

END

Then I need to define the SET. The Set statement

 SET [OrderedFUM] AS (

    FILTER(    

    ORDER (([Dim Portfolio Client].[Dim Client Group Id].children,

        [Dim Date].[Id].&[20110630]),

        [Measures].[FUM Value], BASC),

        ISEMPTY ([Measures].[FUM Value] )=0)

        

    )

enables me to group all the Sec Value (we use the term FUM for this measure for some reasons) by client id according to the month of June. Not all the clients have FUM value every month, and that's why I FILTER to check if there is value, using the ISEMPTY keyword. I order BASC by the measures and please note that
[Dim Portfolio Client].[Dim Client Group Id].children

is Actually equivalent to

[Dim Portfolio Client].[Dim Client Group Id]
I added it for readability only.

Now I needed to Rank every member of the set according to their value, using the RANK Keyword.

 MEMBER Measures.FumRank AS RANK (([Dim Portfolio Client].[Dim Client Group Id].currentmember,

                                    [Dim Date].[Id].&[20110630]), 

                                    [OrderedFum])

And now the real fun begin.
First of all, I need to summarize all the members of a certain set that falls in a certain category, as defined in the first measure. This is where the recursive MDX is defined:

MEMBER MEASURES.RangeTot AS

    IIF((OrderedFUM.item(Measures.FumRank-2), Measures.FumRange)= Measures.FumRange,

             Measures.[FUM Value] + ( Measures.RangeTot ,OrderedFUM.item(Measures.FumRank-2)),

             Measures.[FUM Value])

Or, in English:
The Measure "RangeTot" gives you for each member the previous [FUM VALUE] in the measures. The IIF statement says: If the the measure falls within the FUM range, then add it to the previous one (note the recursion defined), if it doesn't fall than this is our stopping point (the false condition).

But if you try to run it you wouldn't get much, you actually need to call this measure using

 MEMBER MEASURES.RangeTotal AS 

    IIF((OrderedFUM.item(measures.FumRank),Measures.FumRange)=Measures.FumRange,

        NULL,

        Measures.RangeTot),

    FORMAT_STRING = "$#,##0.00;-$#,##0.00"

And I've formatted the string to show "currency" as give it much required readability.
The same recursion is required in order to count the number of clients in the client group, and this is done like this:

MEMBER MEASURES.RangeCountCG  AS

    IIF((OrderedFUM.item(Measures.FumRank-2), Measures.FumRange)= Measures.FumRange,

            1 + (Measures.RangeCountCG, OrderedFUM.item(Measures.FumRank-2)),

            1)

Once again, the condition for stopping is for not being in the range, then you just add one, otherwise continue counting.

And now in order to display the data:

 SELECT {[Measures].FumRange, Measures.RangeTotal, Measures.RangeCountCG} ON 0, 

NONEMPTY(OrderedFUM, Measures.RangeTotal) ON 1                                    

FROM [xplan DWH]

The measures RangeTotal must appear both on Rows and on Columns (0 and 1 Axys), otherwise it isn't summed up. I could have used Measures.RangeTot instead of RangeTotal ON 0, but that wouldn't have been formatter. Of course, there's no need to "format" a recursive expression and yes, this code works and displays the data as it should!

Comments:

Write the same code in SQL for a certain month just in order to validate your data; I love the way you can give more power to the business analyst by well defined measures in MDX
I'll write in my next post about how to parameterise the month and how to display the data.

18 October 2011

Dimension Relationship in a Recursive Hierarchy

I'm now working on a data warehouse project with SSAS 2008 R2 (Sql Server Analysis Services). It is very refreshing to do something different after doing MS-Access for so long.

While designing the dimensions in Analysis Services I had the following problem: working in a Financial Company, every client has a client group. Sometimes it is the same client group as the client, and sometimes it isn't. Pretty classic material:
John & Jane Smith (account id: 1) and John Smith (account id 2) both belong to the same "client group" of John & Jane Smith. So far nothing special: the dimClient table has a self referencing attribute to dimClient with the id of dimClientGroupId. The problem arises when I need to group by properties of the client group, and not by the client itself. Sometimes the adviser for "dimClient" is different then the adviser of "dimClientGroup", and the business needs to view the data by the Client Group, and not by the Client.
So I have quariable attributes that are different in different parent-child levels and I need to present the data properly. I tried using the "client group adviser" as an attribute to the client but it didn't work. It also didn't group the data correctly.

So, how do I use the attributes of the parent dimension in a self-referencing dimension?(Please note: using "self referencing" is better than "parent-child", because then you get to all sort of parenting material; useful, but not what I'm looking for right now!)

Generally, it is advised against using too many parent-child hierarchies in Analysis Services, unless "there's a genuine business requirement" (which happens to be the case).

So, after designing all the dimensions I had to link the "adviser" dimension to the "client" dimension. That was easy (and maybe even done automatically) as every client has an adviser. Now I had to link the "adviser" dimension to the "client group" dimension and this is where I got stuck (after all, "client group" is a self-referencing dimension and is treated by SSAS differently!).
My solution has two levels. I might change between them in the future, as there are no guarantees that my first level will continue to work.

Using "referenced dimensions" I linked the "adviser" dimension to "client".
Now, I've linked the "client" dimension to "client" again and - voila! - I have the "client group" dimension and I can view in the cube the "client group.adviser" dimension and filter by it. A warning: trying to deploy the cube generates an error: "Errors in the metadata manager. The 'Dim Client Group Id' intermediate granularity attribute of the 'Dim Client' measure group dimension does not have an attribute hierarchy enabled.". Nevertheless the cube can still be processed and view from the SSMS - Analysis services, and the numbers are correct.
There is a different solutions to the same problem that is suggested in here (under the "parent-child Hierarchies" short section):
As the numbers of Hierarchies in this business model is fixed (every client has one and only one client group), Creating a bridge table of "portfolio client" which includes only the client and the client group (both are foreign keys to the "dim client" table) and again, using it as a reference dimension as follwos:
[fact]portfoio -> [dim]portfolio client ->[dim] Client Group ->[dim]Adviser sorts this problem in an even clearer manner and without ugly error messages. A good explanation about bridge table could be found in here (they bought me with the term "Graph Theory").

10 October 2011

Change a dimension to a fact

Life don't always follow the "best practicing" methodologies of Life. "Star Schema" is the best and most preferred way to design the data warehouse; unfortunately, it's not always possible. In the industry I'm at (financial planning) you would expect the measures to be some financial product valuable (i.e. money); so the major feed is actually clients and their financial products. The portfolio is the "fact" table (easy to decide by the size of it) while the clients and their attributes are the "dimension" table. Sounds OK, until they asked for the number of clients. Easy enough, you'd say, every client must have a representation in the "portfolio" fact table, so all we have to do is to use "distinct count" on the "client id" in the fact table, right? Wrong. Some clients have financial products, but not in the feed of the portfolio but in some other file which hasn't been created yet. Again, before you scream "bad design!" may I add that the original intention was to create a "source of truth" which lays in an outsourced system, and only later to create the date warehouse itself?
So here I am, trying to create count of clients per month per adviser, or in DW/BI terms: trying to use a dimension table of a slowly moving dimension (that has the "date" only as time attributes) as a fact table (which would have a link to the date as a dimension). Or, as my Business Analyst put it: Before, the "portfolio" was in the center of what we want. Now, the center of attention is the client.
The solution?

Using indexed view from dimClient, created a view called "factClient". The view takes only the necessary fields from the dimClient table (no "date inserted" and no "valid till" - which is defaulted to 31 of December, 9999) and joins them to the dimDate table.
The view is created like this:

SET ANSI_NULLS ON

GO



SET QUOTED_IDENTIFIER ON

GO



SET ANSI_PADDING ON

GO



SET CONCAT_NULL_YIELDS_NULL ON

GO



SET ANSI_WARNINGS ON 

GO









CREATE VIEW [dbo].[factClient]

WITH SCHEMABINDING

AS

SELECT d.[id] AS dimDateId

      ,1 AS ClientGroupCount

        ,C.[id]

      ,C.dimAdviserId

      -- Other dimension fields and other necessary fields

      --

      --

  FROM [dbo].[dimDate] d INNER JOIN

    [dbo].[dimClient] C ON c.DateInserted <= d.DateFormat AND c.ValidTill > d.DateFormat

  WHERE d.HasPast <> 0

As you can see I deleted (for the sake of this view) the columns that are particular to my database, but what I've kept are the followings:
First, I've added a field in the Date dimension table that relates weather this data has past or not. This fields get updated after the ETL process completes. The reason for that is that I didn't want to inner join with the (large) fact Portfolio table, and the other is that any other way of querying that client would result with an error of "Cannot create index on view 'dbo.factClient'. The function 'getdate' yields nondeterministic results. Use a deterministic system function, or modify the user-defined function to return deterministic results."
The second thing that happens is that it gives you the user id, and a dimension, like this:
Member Id   dimDateID      dimAdviser Id    Count
1                  1 Dec 2010     1                            1
1                  2 Dec 2010     2                            1
......
etc,
Which gives me the ability to query by adviser on a different date dimension. Giving every line the value of "1" enables me to "sum" without using a "distinct count" (or any count, for that matter) as distinct count requires a unique measure group.
Please note that I'm using SQL Server 2008 R2 which enables me to use the "date" data type instead of "datetime" datatype.
Please note that the "dateFormat" field in the "dimDate" table is in order to give the "date" version of the date id: so when the key is 20101130 (data type: int) the "date format" is '20101130' (data type: date).
(More about "Indexed Views" in here)

Now there's a new fact table the following stages should be made in the Analysis Services project:
Add the new fact table to the DSV (data source view) file
Create the necessary relationships between the fact table and the dimensions surrounding it
Create a new measure group in the cube. Make sure that the measures are defined correctly (the [Member id] field is an INT, but I don't want it to be summed!)
Hand it over to your Business Analyst and hope that s/he approves...

18 September 2011

MdxScript(DB) (8, 5) Parser: The syntax for ',' is incorrect

This problem appeared all of a sudden out of the blue in my Cube, while trying to calculate a new member. I usually create a calculated new member using the "Calculate" sub-tab in the toolbar of the "Cube" Tab in SSAS (2008 R2).
The Solution is to go to "Script View" in the toolbar and this is what I saw there:

CREATE MEMBER CURRENTCUBE.[Measures].[Calculated Member]

 AS , 

VISIBLE = 1  ;

With a red squiggly line under the "AS ,", as it did need a name.
Deleting those messy rows helped, as well as re creating the Calculated Members that I've needed.
Apparently, I was trying to create a new "calculated member" more than once and one of them wasn't edited. As I had quite a few (more than 10) I didn't even notice.
So while using the SSAS MDX features knowing actually how to write MDX is a useful feature even for beginners such as myself.

15 August 2011

How to find out how many decimals are in a column

This is the script to finding out the maximum number for a decimal in a column (how many decimals are there after the decimal point)



  SELECT

      MAX(LEN(CAST([aFloatColumn]AS VARCHAR(255))) - CHARINDEX('.',CAST([aFloatColumn]AS VARCHAR(255))))

     

  FROM[dbo].[table_1]

27 May 2011

COM Error on PublishForm for Outlook (and you wouldn't believe the solution!)

The problem: an oft file for deployment.
Here is some code in C# which I've copied & pasted into my C# code: (source is in here)

string filename; string location;
_Application olApp = new Microsoft.Office.Interop.Outlook.ApplicationClass();
               _NameSpace olNs = olApp.GetNamespace("MAPI");
               MAPIFolder oInbox = olNs.GetDefaultFolder(Microsoft.Office.Interop.Outlook.OlDefaultFolders.olFolderInbox);
               _MailItem oMailItem = (Microsoft.Office.Interop.Outlook._MailItem)olApp.CreateItemFromTemplate(location, oInbox);
               Microsoft.Office.Interop.Outlook.FormDescription oFormDesc = (Microsoft.Office.Interop.Outlook.FormDescription)oMailItem.FormDescription;


               oFormDesc.Name = filename;
               oFormDesc.PublishForm(OlFormRegistry.olFolderRegistry, oInbox);
               oMailItem.Close(OlInspectorClose.olDiscard);

And this code would fail on the line


               oFormDesc.PublishForm(OlFormRegistry.olFolderRegistry, oInbox);

With an error "COMException was Unhandled"

The reason:
The filename (the parameter given) included the string " - ".
So for filename "Broker forms" - pass,
"Broker - forms" -fail.

06 January 2011

Update on a linked table failed

The problem:
"Update or insert of view or function 'function_name' failed because it contains a derived or constant field (#4406)"
in my access database which is linked to SQL server

The reason:
The underlying view is of type "union"

The solution:
change to a single source (not a union select)

Business Intelligence in Sydney