Power BI – Art.3 : Build the analytical model

Oct 25, 2015 - by Vivian Lejeune

Building an analytical model means preparing a model that can easily be understandable and used by an end user. Typical tasks are:

  • Naming entities and their attributes meaningfully.
  • Create hierarchies to prepare analysis paths when the tool make this possible.
  • Hide objects that are not usefully for consumption (e.g.: technical elements like IDs, …).
  • Create calculations.
  • Create relationship between entities to enable data analysis from different point of view

Introduction

Our analytical model can simply be deduced from the different entities we’ve created in the previous article. The logical model is this one:

This model is quite simple but we can see that it contains an N to N relationship between the “Track” and “Speaker” entities. In order to use this kind of relationship in the modeling tool of “Power BI” (aka “Power Pivot”), we need to normalize this relationship by creating an additional entity that we will name “FactlessSpeaker”.

Another thing is missing from this model: a “Time” entity that will enable us to analyze the data by year, quarter, etc. Analytical models generally need this kind of entity.

The final model will then be something like this one:

Before starting using the modeling tool, lets create the missing entities.

 

FactlessSpeaker Query

The “FactlessSpeaker” entity can be created by doing the following steps:

  1. Go back to the query editor by clicking on the “Edit Queries” button of the toolbar.
  2. Create a new query by referencing the “Source_Data” query and name it “FactlessSpeaker”.
  3. Filter the lines that have a value beginning with “Speaker” in the “Attribute” column.
  4. Remove all the columns except “Id”, “Value” and “Attribute”.
  5. Change the type of “Id” from “Number” to “Text”.
  6. Replace the “Speaker “ value of the “Attribute” column by nothing.
  7. Add a “TrackId” custom column with the “[Id] & "-" & [Attribute]” formula.
  8. Remove the “Attribute” and “Id” columns.
  9. Split the “Value” column by “,”.
  10. Unpivot the “Value.1”, “Value.2” and “Value.3” columns.
  11. Remove the “Attribute” column.
  12. Replace the “Mr” and “.” values of the “Value” column.
  13. Trim the content of the “Value” column.
  14. Rename the “Value” column in “Speaker” and your final query should look like this:

 

Time Query

This query is a little bit particular because, instead of using an existing data source to create the “Time” entity, we will create it entirely by using “M” code:

  1. Create a new query by clicking on the “New Source” button from the toolbar and selecting “Blank Query”.
  2. Name it “Time”.
  3. Click on the “Advanced Editor” button from the toolbar. This editor gives you access to the “M” code of every query.
  4. Copy and paste the following code in the editor (copied from the Matt Masson’s blog):
    let CreateDateTable = (StartDate as date, EndDate as date, optional Culture as nullable text) as table =>
      let
        DayCount = Duration.Days(Duration.From(EndDate - StartDate)),
        Source = List.Dates(StartDate,DayCount,#duration(1,0,0,0)),
        TableFromList = Table.FromList(Source, Splitter.SplitByNothing()),    
        ChangedType = Table.TransformColumnTypes(TableFromList,{{"Column1", type date}}),
        RenamedColumns = Table.RenameColumns(ChangedType,{{"Column1", "Date"}}),
        InsertYear = Table.AddColumn(RenamedColumns, "Year", each Date.Year([Date])),
        InsertQuarter = Table.AddColumn(InsertYear, "QuarterOfYear", each Date.QuarterOfYear([Date])),
        InsertMonth = Table.AddColumn(InsertQuarter, "MonthOfYear", each Date.Month([Date])),
        InsertDay = Table.AddColumn(InsertMonth, "DayOfMonth", each Date.Day([Date])),
        InsertDayInt = Table.AddColumn(InsertDay, "DateInt", each [Year] * 10000 + [MonthOfYear] * 100 + [DayOfMonth]),
        InsertMonthName = Table.AddColumn(InsertDayInt, "MonthName", each Date.ToText([Date], "MMMM", Culture), type text),
        InsertCalendarMonth = Table.AddColumn(InsertMonthName, "MonthInCalendar", each (try(Text.Range([MonthName],0,3)) otherwise [MonthName]) & " " & Number.ToText([Year])),
        InsertCalendarQtr = Table.AddColumn(InsertCalendarMonth, "QuarterInCalendar", each "Q" & Number.ToText([QuarterOfYear]) & " " & Number.ToText([Year])),
        InsertDayWeek = Table.AddColumn(InsertCalendarQtr, "DayInWeek", each Date.DayOfWeek([Date])),
        InsertDayName = Table.AddColumn(InsertDayWeek, "DayOfWeekName", each Date.ToText([Date], "dddd", Culture), type text),
        InsertWeekEnding = Table.AddColumn(InsertDayName, "WeekEnding", each Date.EndOfWeek([Date]), type date)    
      in
        InsertWeekEnding,
        #"Invoked FunctionCreateDateTable" = CreateDateTable(#date(2010, 1, 1), #date(2015, 12, 31), null)
    in
      #"Invoked FunctionCreateDateTable"
    

 

Data modeling

Now that we have all the entities loaded for the model, we will start the data modeling process. The following steps are usually realized:

  • Create relationships between the model entities
  • Add functionalities to the model (calculations)
  • Clean the data model to make it user friendly

 

Create relationships

The creation of relationships cannot be done graphically in the current version of “Power BI Desktop” (2.26.4128.403) even if the diagram view shows them. This is a difference with the “Power Pivot” for Excel that fully manage the creation of relationships from this view.

In order to create a relationship, we have to click on the “Manage Relationships” from the toolbar.

The “Manage Relationships” dialog pops up.

The “Autodetect…” button can analyze the model and create what it found as valid relationships. However, we prefer to create each relationship manually in order to complete control their configuration.

 

Event – Time Relationship

This first relationship will enable us to explore how to create and configure a relationship between two entities:

Click on the “New…” button to pop up the “Create Relationship” dialog.

In this dialog, you can specify the “source” entity (“Event” in this case) which is related to a “lookup” entity (“Time” in this case) and then the corresponding columns in each of the table. A really interesting thing on this dialog is after selecting both entities, we will see a preview of the data helping us to correctly select the columns involved in the relationship.

Some import things to note:

  • The columns involved in the relationship must share the same data type (date and time in this case).
  • Each record of the “source” entity must have one and only one corresponding record in the “lookup” entity. If it is not the case, the data modeling tool will throw an error. This also means that your data model has some weaknesses that have to be fixed.
  • A relationship can only be defined on a single column in the “source” and “lookup” entities. If the data in your entities requires to create a relationship based on the combination of multiple columns, you have to create a single column that contains unique combination of all the columns (by concatenation for example) and define the relationship on this new column.

In the “Advanced options”, we can configure some elements to better control the behavior of the relationship but we will see this later.

Once the relationship has been created, it can be visualize in the “model viewer”. Click on the third button of the left panel.

The diagram view shows the different entities and their relationships. The view displays different useful information like the cardinality, the direction of the relationship and, when selecting the relationship, the columns involved in it.

 

Others Relationship

To complete the model, we have to create the following relationships:

  • A relationship between the “Track” and “Event” entities through the “EventId” and “Id” columns respectively.
  • A relationship between the “FactlessSpeaker” and “Track” entities through the “TrackId” columns respectively.
  • A relationship between the “FactlessSpeaker” and “Speaker” entities through the “Speaker” columns respectively.
  • A relationship between the “Attendee” and “Event” entities through the “EventDate” and “Date” columns respectively.

The final model should look like this:

 

Test the model

Once all the relationships has been created, we can test that theiy worked as expected. For this, we can go to the “presentation layer” and add some fields to the design surface to check how the model behaves.

  1. From the field’s panel, drag and drop the “Year” field from the “Time” entity to the design surface.
  2. Drag and drop the “Event” field from the “Event” entity to the “Values” zone of the “Filters” panel.

If everything has been defined correctly, we should see the list of events and their related year.

Add the “Track” field from the “Track” entity and you should see the tracks by event and date.

Add the “Speaker” field of the “Speaker” entity and then the “Attendee” field of the “Attendee” entity and you should see that all the information are correctly related to each other.

 

Add calculation to the model

Now that the model is functional, we could want to add some calculations to use the model efficiently.

We don’t have any quantitative information in the model so the only thing we can measure are “counts” (counts of event, attendees and tracks) and information related to them.

It is not mandatory to create measure in the model to support “counts” because it is possible to change the aggregation method of a numeric fields when it placed in the “Values” zone of the presentation layer. It is even automatic when we add a text field to the same zone. However, it is easier for an end user to understand your model and what he can do with it when all the necessary elements are already present.

 

Event Count Measure

In order to create a custom measure in the data model, it is necessary to go to the data modeling designer.

This view gives us a preview of the data available in the selected entity.

This designer is split in different areas:

  1. Tool data modeling toolbar that provides access to the different functionalities.
  2. The formula bar that gives you access to the DAX formula of calculated columns and measures. DAX, for Data Analysis Expression, is the “Power Pivot” language. This is a very powerful language which, when mastered, can do impressive things. More information can be found here.
  3. The list of fields and entities.
  4. The preview pane.

In order to create the “Event Count” measure:

  1. Select the “Event” entity in the fields list.
  2. Click on the “New Measure” button from the toolbar to add a new, empty measure in the “Event” entity.
  3. Introduce the following DAX expression in the formula zone:
    Event Count = COUNTROWS(Event)
  4. Test the “Event Count” measure by adding it to a display object containing the “Year” field from the “Time” entity. If everything is working as expected, you should see the following result: 

 

Track Count, Attendee Count Measures

By executing similar actions, create the following measures:

  • Track Count in the “Track” entity with the following formula:
    Track Count = COUNTROWS(Track)
  • Attendee Count in the “Attendee” entity with the following formula:
    Attendee Count = COUNTROWS(Attendee)

Add the “Track Count” to the display object. You should see something like that:

Then add the “Attendee Count” measure to the display object. You should see something like that:

As you can see, all the measure are correctly filtered by the “Event” and “Time” entities. Now add the “Track” field from the “Track” entity to the display object. As you can see, the “Attendee Count” measure is repeated for every existing “Track”.

This is due to the fact that the “Attendee” entity is not directly related to the “Track” entity. As we can see on the diagram view, the relationship between the “Event” entities, which is the intermediate entity between “Attendee” and “Track”, and “Track” is configure to work in a single direction.

In order to make fix this, we have to edit the advanced properties of this relationship and change it from “Single” to “Both”.

After changing the “cross filter direction”, go back to presentation layer. You should see that the issue is now fixed.

If you add the « Speaker » field from the “Speaker” entity to the display object, you should see the same problematic behavior. After changing the “cross filtering direction” of the relationship between the “Track” and “FactlessSpeaker” and between the “FactlessSpeaker” and “Speaker” entities to “Both”, the issue should be corrected.

 

Share Of Participation Measure

Another interesting measure that could be derived from our information is the “share of participation”. With this measure, we can analyze the participation of speaker to events but also the partition of attendees to event.

In order to create this measure, the following steps have to be done:

  1. Go back to the data modeling designer
  2. Select the “Event” entity and create a new measure with the following DAX expression:

    Share Of Participation = COUNTROWS(Event) / COUNTROWS(ALL(Event))

    This measure computes the count of rows of the “Event” entity (which is automatically filtered by the “context” in place) divided by the total number of rows of the Event entity.

    The “context” in place can be defined by several things:

    • The context of the display object used
    • The filters applied to the different related entities
  3. Change the format of the measure to “Percentage”

    Test the measure by adding the following elements to a new display object:

    • The “Speaker” field from the “Speaker” entity
    • The “Share Of Participation” measure of the “Event” entity
    • The “Event Count” measure of the “Event” entity

    You should see something like that: 

    As you can see, the “Share Of Participation” measure is correctly calculated. In this case, the context applied to the measure is the “Speaker” field.

    If you replace the “Speader” field by the “Attendee” field, we expect to see the same applied to the list of attendees. However, this is not the case because of the direction of the relationship. Change it to “both” to fix this issue.

 

Clean and prepare the data model

The last step of the data modeling process is preparing the data model for the end user usage. For this, we can:

  • Hide technical elements like ids, intermediate calculation defined to ease the creation of complex calculations
  • Hide unused objects like technical entities, fields …
  • Rename objects

In our model, there is a lot of objects to hide. We will start with the “Source_Data” entity:

  1. Go back in the data modeling designer.
  2. Right click on the “Source_Data” entity and select “Hide in Report Viewer” to hide the entity.

Realize the same operation for:

  • FactlessSpeaker
  • The fields
    • “Id” and “Date” from the “Event” entity
    • “EventId” and “TrackId” from the “Track” entity
    • “Event Date” and “Order Date” from the “Attendee” entity

To check the result, go back to the presentation layer of “Power BI Desktop”. You should see your model like this:

 

Conclusion

In this article, we use the data modeling functionalities of “Power BI Desktop” to build and prepare a model usable by an end user. In the next article, we will see how we can see this model to have some insight about the data.

0 commentaires

Laisser un commentaire