What is Data Analysis Expressions (DAX) Tutorial

introduction to powerpivot data analysis expressions data analysis expressions dax in PowerPoint for excel and data analysis expressions (dax) reference
Dr.JeanHiggs Profile Pic
Dr.JeanHiggs,United States,Professional
Published Date:10-08-2017
Your Website URL(Optional)
Comment
Chapter 9. DAX Syntax and Calculations In this chapter, we will cover the following recipes: Understanding DAX formulas Using the AutoSum measure in Visual Studio Creating calculated measures Creating calculated columns Using the IF function Using the AND function Using the SWITCH function Using the CONCATENATE function Using the LEFT function Using the RELATED function Using the RELATEDTABLE function Using EVALUATE in DAX queries Filtering based on a value Filtering a related table Using ALL to remove filters Using ALL to calculate a percentage Using the SUMMARIZE function Adding columns to the SUMMARIZE function Using ROLLUP with the SUMMARIZE function Introduction This chapter will explore how to leverage Data Analysis Expressions (DAX) in Power Pivot, tabular models, and SQL Server Management Studio. DAX is a formula-based language similar to functions in Excel that allows you to create calculations and queries. When designing models, you will leverage these formulas to enhance the model to make it easier for users to leverage. There are two ways to add DAX into your model, either as a calculated column or a calculate measure. When you create a calculated column you apply a function that evaluates each row independently and returns the result. Calculated measures are applied to the table and column by using functions to determine the result based on the context. In addition, you can use DAX to query your model much like using T- SQL to query a relational database. Note There are several categories of DAX functions designed to perform a variety of calculations. These include logical, aggregation, text, mathematical, statistical, date and time, and time intelligence functions. When a DAX formula is calculated, it is evaluated in its context. There are two types of context that apply in DAX: row and filter context. Row context applies to cells in a row, such as creating a calculated column. The DAX expression is calculated on each row separately. The concept of filter context refers to any filtering that has been applied that affects the results returned from the model. In previous recipes, you have created basic DAX calculations to count the number of crashes in the Iowa crash data using the COUNT function. For example, the Count_of_Crashes measure when originally applied gives the total of 559,227. As you apply other filters, the DAX expression is recalculated based on the new filter context. For instance, when you query the data through a tool such as Power View and use different columns, the formula is recalculated based on the new rows. In this example, the formula is filtered by the LIGHT_T table using the same calculation. The evaluation context applies the expression to each row in the LIGHT_T table to calculate the count of crashes by LIGHT_CONDITION. In this chapter, you will explore more capabilities of the DAX language to continue to enhance the Iowa crash data model developed in Chapter 8, Combining Tabular Models with Excel. Understanding DAX formulas There are two basic types of DAX formulas. The most common one that you will use performs a function on the data to return a value. The other returns data as a table most commonly used to create a new dataset or is used as input for another function. To create any DAX formula, you need to understand the basic syntax. This recipe explains how the Count_of_Crashes formula works and creates the formula using Power Pivot in Excel. Getting ready This recipe will use the Chapter 9 Power Pivot.xlsx workbook available from the Packt Publishing website. How to do it... 1. Open the Chapter 9 Power Pivot.xlsx workbook and click on the Power Pivot menu. 2. Then select the Measures tab and New Measure. 3. The Measure window will open. 4. Enter Count_of_Crashes in the Measure Name, and in the Formula area enter =COUNT(CASENUMBER) 5. Finally, change the Category to Number, Format to Whole Number, and check the Use 1000 seperator (,),and hit OK to close the window. How it works... In this recipe, you opened the Excel workbook and switched to Power Pivot. Then you opened the Measures menu, created a new measure, and then updated the formatting. The formatting was changed to always display this measure in whole numbers using the 1000 separator. Since this is done in the model, whenever someone uses this field in Excel, it will be automatically shown using the applied formatting. The basic syntax of DAX is shown in this formula =COUNT(column). Every function begins with an equal sign and the function name followed by the argument to pass to the function. In this recipe, you passed the COUNT function the column of CASENUMBER as the argument. The function then counts the number of rows that have a CASENUMBER and returns the result. There's more... To edit an existing calculation in Power Pivot, go back to the Power Pivot tab and then select the Measures menu. Select Manage Measures and a list of existing measures is shown. Select the measure you want to edit and then select Edit on the top window. To delete a measure, select the measure and then select Delete. Using the AutoSum measure in Visual Studio It is common when building a new model or enhancing an existing model that you will need to apply formatting onto various columns. Updating the formats in the model prevents users from needing to modify the format each time to access the model. For example, to determine the number of records in a table using a record ID column that is numerical would need to be aggregated with a COUNT function and not a SUM function. This behavior when set at the model level affects how everyone using the model sees the data. When designing your model in Visual Studio, there is an option to quickly apply one of six predefined common functions to numerical columns: Sum Average Count DistinctCount Max Min This option is very helpful when you need to add calculations on several columns quickly that are numerical data types. How to do it... 1. Open the Visual Studio solution for your crash data tabular model. 2. Select the CRASH_DATA_T table in the Grid View and then scroll to the right to find the FATALITIES, INJURIES, MAJINJURY, MININJURY, POSSINJURY, and UNKINJURY columns. 3. Hold down Shift and then select the five columns. 4. Each of these columns are whole numbers and are currently being summarized by the default function on the model. To change them to be summarized by the SUM function, select the AutoSum icon and then Sum. 5. Five new measures will be added to the model, one beneath each column chosen in step 3. Each column will be named Sum of with the column name. 6. To update the format to show thousand separators and display as a whole number, while all of the columns are selected, update the Properties window and change the Data Format to Whole Number and then the Show Thousand Separator to True. The numbers are now set to use this format in all client tools. 7. To change the name of the new measures, change the text on the left side of := in the formula bar. For example, change Sum of FATALITIES:=SUM(FATALITIES) to Total Fatalities:=SUM(FATALITIES). The measure will now show the new name and totals. How it works... In this recipe, you created five new measures in your model. You selected the CRASH_DATA_T table and then five columns that are numerical. Next, you used the AutoSum feature to quickly create five new summary measures. Finally, you renamed a column to make the name more meaningful for your users. Creating calculated measures Calculated measures are formulas that do more than simple aggregations of values. These formulas add additional information to the tabular model by creating business calculations. In addition, calculated measures are calculated based on the filter context applied to the data. For example, using a row or column filter in Excel PivotTables or Power View. Depending upon the filters selected, the DAX expression is calculated using the information of the filter in real time. In this recipe, you will use a measure that has the total number of fatal crashes. You will then create a new measure that calculates the total number of crashes minus the number of fatalities. Getting ready Follow the steps in the Using Autosum measures in Visual Studio recipe to create the Total Fatalities measure. How to do it... 1. On the CRASH_DATA_T table in the Grid view, select an empty cell under the Count_of_Crashes measure. 2. Enter the calculation in the formula bar: Total_NonFatal_Crashes:=(Count_of_Crashes - Total Fatalities) 3. Once you have entered the calculation, hit Enter. 4. To correct the formatting to make it better for users, select the Total_NonFatal_Crashes in the Measures window and change the following fields in the Properties window. Change the Format to Whole Number and the Show Thousand Separator to True. How it works... In this recipe, you created a calculated measure that uses two other measures. To determine the number of non-fatal crashes in the data, you subtracted the number of fatalities from the total number of records. Your users now have the option of looking at the total number of crashes, total fatalities, and total crashes less fatalities. In the Properties window, you updated the formatting to only show whole numbers and include the thousand separator, making it easier for your client tools and users to better use the data. Creating calculated columns When creating DAX formulas, there are two ways to apply them to the model. The first is to create a calculated column. When you add a calculated column to the model, it applies the function on a row-by-row basis. For example, if you want to parse the datetime format of a table to only show the current year, adding a new calculated column would evaluate the formula on the date column and add it to a new column on the table evaluated once for each row in the table. When the data is refreshed, the formula is evaluated on the table and no user interaction is required for the formula to be applied to its context. How to do it... 1. Open Visual Studio and the tabular model project. 2. On the CRASH_DATA_T table, review the CRASH_DATE column. It is a Date column that includes a timestamp. 3. Scroll to the end of the CRASH_DATA_T table and enter the DAX expression to parse the year from the Crash_date column: =YEAR(CRASH_DATA_TCRASH_DATE) 4. The tabular engine will now immediately parse the expression and add it to the column. Change the name by updating the Column Name in the Properties window to Crash_Year. How it works... In this recipe, you added a new calculated column. The DAX expression uses the YEAR function and applies it to the Crash_Date column, the required argument. This parses the column to return only the four-digit year to the column. You then updated the column name to Crash_Year to make it easier for reports to leverage. There's more... As an additional step, you can concatenate text with the YEAR function to add context to the values in the table. You can leverage the & symbol to join the data together. To add a label before the Crash_Year, you modify the formula to include the following: ="Year of Crash: " & YEAR(CRASH_DATA_TCRASH_DATE). Once completed, your column will be updated to include the text with the date. The addition of more detailed information to the columns can make the reports easier to leverage. The more descriptive a field will help enable a self-service BI environment. When building a report in Power View, the data is clearly displayed in the Crash_Year column with the content. Using the IF function DAX includes several functions that are classified as logical functions. These functions let you apply conditions to your calculations and measures when required. Some of the more common functions that you will use include IF, AND, and SWITCH. Recipes for each of these functions will be provided in this section. The IF function performs a logical test to return either true or false when the condition is met. In this recipe, you will add a formula that creates a label on each row. This label will let your users know which rows had fatalities or were non-fatal. The IF function has a required syntax of IF(logical_test,value_if_true,value_if_false). Getting ready All of these recipes will use the Chapter_9_DAX tabular model to add calculations. The sample model is available to download. How to do it... 1. Open the Chapter_9_DAX solution, and select the CRASH_DATA_T table and make sure you are in the data Grid view. 2. Scroll to the right until you find Add Column. Then in the expression box, add the formula to determine fatality type and press Enter to create the calculation. You will then see a label added to each row: =IF( FATALITIES=1, "Was Fatal", "Non Fatal") 3. On the Properties window, change the Column Name from Add Column to Fatality Flag. How it works... In this recipe, the IF function is checking for the condition on each row of the number of fatalities being greater than or equal to 1. On each row that matches this condition, the label of Was Fatal is added to the row. On all other rows, the label of Non Fatal is added. Using the AND function The AND function is similar to the IF function. When you use this function, it is checking two arguments at the same time to determine if the condition is true or false. When both arguments are true, then the function returns true. In this recipe, you will add a function on a new column to determine if the record is a single or multiple vehicle fatality. The AND function has a required syntax of AND(logical1, logical2). How to do it... 1. Open the Chapter_9_DAX solution, select the CRASH_DATA_T table, and make sure you are in the data Grid view. 2. Scroll to the right until you find the Add Column. Then in the expression box, add the formula to determine the number of vehicles and number of fatalities involved and press Enter to create the calculation. You will then see a label added to each row: =IF( And ( FATALITIES=1, VEHICLES=1), "Single Vehicle Fatality", "Multiple Vehicle Fatality" ) 3. On the Properties window, change the Column Name from Add Column to Fatality Group. Using the SWITCH function The SWITCH function is very useful when you need to evaluate an expression and return a result from a list of possible values. In this recipe, you will create a column that can be used to determine if the road was paved or unpaved. This will allow your users to filter the results easily by choosing a label versus a value. The SWITCH function has a required syntax of SWITCH(expression,value,result, value,result). How to do it... 1. Open the Chapter_9_DAX solution, select the CRASH_DATA_T table, and make sure you are in the data Grid view. 2. Scroll to the right until you find the Add Column. Then in the expression box, add the formula to evaluate each value and return the corresponding label. Then press Enter to create the calculation. You will then see a label added to each row: =SWITCH(PAVED, 1, "Paved", 2, "Unpaved", 99, "Unknown") 3. On the Properties window, change the Column Name from Add Column to Fatality Group. How it works... In this recipe, you added a formula that creates a label on each row. The SWITCH function you entered has three available values to evaluate in the formula. It works by looking for one of the values (1, 2, 99) and then returning the label that corresponds to each value (Paved, Unpaved, Unknown). The label is then added to the column and can be used in other measures or as a filter. There's more... By using the SWITCH function in this recipe, you created a column with unique labels. This could also be accomplished by creating a table that has the same values, and then adding this table to the model as a lookup table with a defined relationship. However, in cases where there are not a lot of values, using the SWITCH function lets you quickly add a value to the model without needing to link additional tables. Using the CONCATENATE function DAX includes several functions that are classified as text functions. These functions let you apply and manipulate strings in a variety of ways. Some of the more common functions that you will use include CONCATENATE and LEFT. Recipes for these functions will be provided in this section. The CONCATENATE function is very useful when you need to join two strings together into a single string. You can join either two columns together or you can join columns to text strings. When using a text string, the value must be enclosed in quotes. In this recipe, you will create a column to join two columns together. This will allow your users to filter the results easily by choosing a label versus a value. The CONCATENATE function has a required syntax of CONCATENATE(text1,text2). How to do it... 1. Open the Chapter_9_DAX solution, select the CRASH_DATA_T table, and make sure you are in the data Grid view. 2. Scroll to the right until you find the Add Column. Then in the expression box, add the formula to join the text Total Property Damage to the value in the PROPDMG column. Then press Enter to create the calculation. You will then see a label added to each row: =CONCATENATE("Total Property Damage " ,PROPDMG) 3. On the Properties window, change the Column Name from Add Column to Property Damage. How it works... This recipe uses the CONCATENATE function to create a new column. This column is created by passing in two arguments, the text "Total Property Damage " with the value stored in the PROPDMG. There's more... You can also use CONCATENATE in a calculated measure. This can be helpful if you want to add more information to the model or have variations on measures that you want users to understand. To create a calculated measure that shows all fatalities along with a label, in the measures area, add the following formula: Fatalities_Label:=CONCATENATE("Total Fatalities= ", CRASH_DATA_TNof_Fatalities) Using the LEFT Function The LEFT function is very useful when you need to parse a string to get a subset of the data. This is often used to make the data more meaningful for your users. In this recipe, you will create a column on the Manner_of_Crash table to return the first nine letters of each description. Then you will make the LEFT function use a dynamic argument to determine the number of characters to find a comma. On this table, you can now create a hierarchy that would group the two sideswipe rows into a single group. The LEFT function has a required syntax of LEFT(text,num_chars). How to do it... 1. Open the Chapter_9_DAX solution, select the CRASH_DATA_T table, and make sure you are in the data Grid view. 2. Scroll to the right until you find the Add Column. Then in the expression box, add the formula to return the first nine characters of the field. Then press Enter to create the calculation. You will then see a label added to each row: =LEFT(Manner_of_Crash). 3. On the Properties window, change the Column Name from Add Column to Manner_Group. 4. Now you only see the first nine characters. The term sideswipe is now consistent on rows 6 and 7; however, rows 1 and 4 are now only showing partial data. To fix the strings to show the full text or the word before the comma, you need to add the FIND function to locate the comma: =LEFT( Manner_of_Crash, IFERROR(FIND(",",Manner_of_Crash,1,20)-1,0) ) 5. Once you have done this, press Enter. How it works... This recipe first creates a new column that parses the string and returns the first nine characters of the Manner_of_Crash. Once completed, you were able to see that the new column is not parsing all fields to return the full description in cases that have a comma. To correct this, you leveraged the FIND function to locate the position of the comma. The code in part 4 works by making the length of each row dynamic by determining the number of characters to the comma and then subtracting one place. If it does not find a comma, it uses the default length of 20 to pass as the parameter to the LEFT function. There's more... If you need to get the data from the end of the column, there is also a RIGHT function that returns the number of characters from the end of a column. The RIGHT function has a required syntax of RIGHT(text,num_chars).