Writing for BIDN

Looking to give back to the community or learn through teaching others? Anyone can post blogs by clicking Add Blog Post after contacting us for permission.


Power BI and Big Data

If you’ve worked in the wide and diverse field of information technology for almost any amount of time, it probably hasn’t taken you long to discover that the one constant in IT is that the technologies and strategies involved change faster than you can learn them. And if you work in business intelligence like I do, you don’t have to look very far at all to see change. The Microsoft Power BI team rolls out a software update every month! If I want to stay current on the technology, I have to really be on top of things.

About ten years ago, when Hadoop was first being developed at Yahoo, I don’t think anyone could have anticipated the size of the ripples (more like cannonball-sized splashes) that access to Big Data could and would have on the IT industry. Hadoop (and other advances in hardware and software technologies) gave us something we never had before: the ability to access and report on data in real time on a scale never previously imagined. That ability lets an organization identify and understand trends and patterns in its data and gain previously unknown insights. The organizations that are able to leverage big data will be the organizations that leave their competition in the dust.

Set Up and Configure the Hortonworks Sandbox in Azure

Not only does Power BI Desktop give us the ability to connect to the Hadoop Distributed File System (HDFS) for reporting, we can also mash that data up with other, more traditional and structured data sources with minimal effort. But that’s not what this blog post is all about. This post is about setting up a virtual machine in Azure running Hadoop and connecting to our Hortonworks Sandbox with Power BI Desktop :).

The first thing to do if you don’t have access to a Hadoop cluster is to set up the Hortonworks Sandbox on Azure. The good news is it’s free (for the duration of the trial) and it’s super easy. Just follow the instructions at this link to set up the Hortonworks Sandbox.

Hadoop in Azure

Once that’s set up, you’ll need to add mapping for the IP address and host name to your hosts file. Devin Knight has a blog on this that you’ll find helpful.
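As a rough illustration of what that hosts file mapping looks like, here is a minimal Python sketch that builds the entry and appends it to the hosts file text only if the host name isn't already mapped. The IP address (taken from a documentation-only range) and the host name are placeholders I made up for the example, and the helper names are hypothetical. Use the public IP of your own Azure VM and the host name your sandbox actually expects.

```python
import ipaddress


def hosts_entry(ip: str, hostname: str) -> str:
    """Return one hosts-file line mapping `hostname` to `ip`."""
    ipaddress.ip_address(ip)  # raises ValueError if `ip` is not a valid address
    return f"{ip}\t{hostname}"


def add_mapping(hosts_text: str, ip: str, hostname: str) -> str:
    """Return `hosts_text` with the mapping appended, unless `hostname` is already mapped."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if hostname in fields[1:]:
            return hosts_text  # already present, leave the text untouched
    sep = "" if hosts_text.endswith("\n") or not hosts_text else "\n"
    return hosts_text + sep + hosts_entry(ip, hostname) + "\n"


# Placeholder values (assumptions): substitute your VM's IP and sandbox host name.
current = "127.0.0.1 localhost\n"
updated = add_mapping(current, "203.0.113.10", "sandbox.hortonworks.com")
print(updated)
```

The sketch only works on text in memory. On Windows the hosts file normally lives under C:\Windows\System32\drivers\etc and needs administrator rights to edit, so you would paste the resulting line in by hand or write the file from an elevated prompt.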

Connecting to Hadoop with Power BI Desktop

Once your Hortonworks Sandbox is set up, you’re ready to set up your connection to Hadoop with Power BI Query. Start Power BI Desktop and click Get Data. Scroll down, select Hadoop File (HDFS) and click Connect.

Get Data with Power BI

From there you can follow the rest of the wizard to load the data into the semantic model.

Load Data with Power BI

Once the data is loaded, you’ll need to modify the query to navigate to the data you wish to use in your model.

In Power BI Desktop, go to the Home ribbon and click Edit Queries.


Three Best Practices for Power BI

Since the release of Power BI Desktop this past week, I’ve been really spending my extra time digging into the application focusing on learning and experimenting as much as I can. When my wife has been watching Law and Order: SVU reruns at night after the rug rats are in bed, I’ve been right there next to her designing Power BI dashboards like the total data nerd that I am. When my kids have been taking their naps during the weekend, I’ve been writing calculations in the model for my test dashboards. Or when I’ve been riding in the car back and forth to work I’ve been thinking of new things to do with Power BI Desktop.

Since I’ve been spending a decent amount of time with Power BI Desktop, I thought I’d take a moment to share three things to know and remember when designing your Power BI models and dashboards that I think will help you make the most of this tool and be effective at providing the data your business needs to succeed.

1. Optimize your Power BI Semantic Model

It probably hasn’t taken you long to figure this one out if you’ve built Power Pivot/Tabular models, or at least it won’t when you do start developing Power BI dashboards. The visualizations in Power BI and Power View are heavily metadata driven, which means that column names, table or query names, formatting and more are surfaced to the user in the dashboard. So if you’re using a really wacky naming convention in your data warehouse for your tables like “dim_Product_scd2_v2” and the column names aren’t much better, these names are going to be shown to the users in the report visualizations and field list.

For example, take a look at the following report.

Power BI Dashboard without formatting

Notice anything wonky about it? Check the field names, report titles and number formatting. Not very pretty, is it? Now take a look at this report.

Power BI Dashboard with formatting

See the difference a little cleaned-up metadata makes? All I did was spend a few minutes giving the fields user-friendly names and formatting the data types. This obviously makes a huge difference in the way the dashboard appears to the users. By the way, I should get into the movie production business. ;)

My point is that the names of columns, formatting, data types, data categories and relationships are all super important to creating clean, meaningful and user-friendly dashboards. The importance of a well-defined semantic model cannot be overstated in my opinion. A good rule of thumb is to spend 80% to 90% of your time on the data model (besides, designing the reports is the easy part).

I’d also like to mention the importance of the relationships between the objects in the semantic model. Chances are you will have a small group of power users that will want to design their own dashboards to meet their job’s requirements, and that’s one of the beauties of Power BI. But when users begin developing reports, they may query your model in unexpected ways that will generate unexpected behaviors and results. I only want to mention this because the relationships between the objects in the model will impact the results your users will see in their reports. Double check your relationships and ensure that they are correct, especially after you add new objects to the model since the


Power BI Fantasy Football Player Stats Dashboards for Download

Every year at Pragmatic Works some coworkers, including consultants, marketing staff, support team members, software development staff and project management, partake in a company fantasy football league. And with the recent release of the new Power BI Desktop, I thought what better way is there to prepare to completely annihilate my coworkers and friends in an imaginary nonsensical game than by creating some nifty Power BI dashboards based on last year’s player stats as recorded by Yahoo! Sports? So I thought I’d walk you through some of the steps I followed to leverage the Yahoo! Sports NFL player stats page as a data source and some of the query transformations I applied to prepare the data for reporting.

Power BI dashboard with Power BI Desktop

Click here to download my Fantasy Football Dashboards Power BI .pbix file.

If you’re completely new to Power BI Desktop, I highly suggest you watch my video walkthrough of Power BI Desktop or read my blog post which walks you through each step of creating your first Power BI dashboards with Power BI Desktop. Last Friday, I also blogged about my three best practices for designing a killer Power BI solution, so take a look at that.

To create these dashboards, I simply navigated to the Yahoo! Sports NFL stats page and found the page for each position I’m interested in for this fantasy football season. I copied the URL to my clipboard. In Power BI Desktop, click Get Data and then use the Web data source option. Then all you have to do is copy and paste the URL into the text box and click OK.

Get data from web with Power BI Desktop

Then select the HTML table that contains your data and click Edit. We need to edit our query because there are some issues with the data. By clicking Edit, we can apply transformations to our query which will allow us to do things like rename columns, remove unwanted columns, modify data types, create custom columns and much more.

Get data from web with Power BI Desktop

One thing you’ll notice in the above screen grab is that the column names are in the first row, so we need to fix that.
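If it helps to see the idea outside of the Query Editor, promoting the first row to headers boils down to something like the following Python sketch. This is not what Power BI runs internally. The function name is hypothetical and the player rows are made-up placeholders, not real Yahoo! Sports stats.

```python
def promote_headers(rows):
    """Treat the first row as column names and return the rest as a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    names = [str(h).strip() for h in header]
    return [dict(zip(names, row)) for row in data]


# Placeholder rows (assumption): a scraped table whose header arrived as data.
raw = [
    ["Name", "Team", "Yds"],
    ["Player A", "AAA", "1200"],
    ["Player B", "BBB", "950"],
]

table = promote_headers(raw)

# Scraped values come in as text, so fix the data type of the numeric column.
for record in table:
    record["Yds"] = int(record["Yds"])

print(table[0])
```

Changing the column's data type after promoting the headers mirrors the kind of transformation you would apply next in the Query Editor.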

On the Home ribbon of the Query Editor, just click the Use First Row As Headers button. Pre


Data Mining Add-ins - Analyze Key Influencers Tool

  • 2 May 2012
  • Author: Mike Milligan

The Analyze Key Influencers tool is used to show how column values in a data set might determine the values of a specified target column.  The process creates a temporary mining model in Microsoft SQL Server Analysis Services using the Naïve Bayes algorithm.  It then produces a Main Influencers report which shows the key influencers for each distinct value of the target column.  You have the option of creating one or many additional Discrimination Reports that compare the influencers for any two distinct values of the target column.  The Discrimination Reports are only useful if your target column contains more than two distinct states.

 

The Naïve Bayes algorithm is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions.  The naïve part of the name comes from the fact that it assumes that all attributes are unrelated to each other and that each attribute independently contributes to the probabilities that it predicts.  For example, a fruit may be considered an orange if it is round, has the color orange, has seeds, grows on a tree, etc.  Even if any of these features depend on the existence of other features, a Naïve Bayes classifier considers these properties to independently contribute to the probability that the fruit is an orange.  One advantage of this algorithm is that it only requires a small set of data to estimate the means and variances of the variables required for classification.
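To make the independence assumption a little more concrete, here is a minimal Python sketch of a Naïve Bayes classifier over discrete attributes. It is not the Analysis Services implementation. The fruit rows, the function names and the add-one (Laplace) smoothing are all my own assumptions for illustration. The thing to notice is that each attribute simply multiplies its own conditional probability into the score, with no regard for the other attributes.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> distinct values observed
    for r in rows:
        for attr, val in r.items():
            if attr == target:
                continue
            value_counts[(attr, r[target])][val] += 1
            values_seen[attr].add(val)
    return class_counts, value_counts, values_seen


def predict(model, features):
    """Return normalized class probabilities for one set of attribute values."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        p = n / total  # prior P(class)
        for attr, val in features.items():
            k = len(values_seen[attr])
            # Smoothed P(value | class); each attribute contributes independently.
            p *= (value_counts[(attr, cls)][val] + 1) / (n + k)
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Toy training data (assumption): six made-up fruits with two attributes.
rows = [
    {"shape": "round", "color": "orange", "fruit": "orange"},
    {"shape": "round", "color": "orange", "fruit": "orange"},
    {"shape": "round", "color": "green", "fruit": "orange"},
    {"shape": "long", "color": "yellow", "fruit": "banana"},
    {"shape": "long", "color": "green", "fruit": "banana"},
    {"shape": "round", "color": "yellow", "fruit": "banana"},
]

model = train(rows, "fruit")
result = predict(model, {"shape": "round", "color": "orange"})
print(result)
```

The sketch handles only discrete values, which I believe is also how the add-in treats numeric columns such as age or income once it has bucketed them into ranges. The means and variances mentioned above apply to the continuous (Gaussian) flavor of the algorithm.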

 

This blog post will work through two examples using the sample data provided with the Microsoft SQL Server 2012 Data Mining Add-ins and another example using data from the Contoso sample database.

 

Example One

Which properties of a customer in the sample data help to predict a customer's level of education?

  • Open the DMAddins_SampleData.xlsx file. 
  • Select the Table Analysis Tools sample sheet, highlight a cell within the table so the ribbon at the top displays the Table Tools, Analyze ribbon, and click the Analyze Key Influencers button. 
  • Select the column Education to analyze for key factors and click the link that says 'Choose columns to be used for analysis.' 
  • Uncheck the ID column.  This is just a sequential number that has nothing to do with anything other than the order the row was inserted into the table.  We also want to uncheck any other columns that have nothing to do with the customer's education level to streamline our analysis and improve our accuracy.  Let's also uncheck the Purchased Bike column.  Click OK, and then Run.
  • Once it finishes thinking, move the Discrimination based on key influencers dialog out of the way for a moment.

image

The Key Influencers Report for Education shows which columns, and which values of those columns, have a significant impact on the value of the Education column.  According to this report, people between the ages of 37 and 46 who work in Management are very likely to have a bachelor's degree.  People with only one car who work in a clerical profession are very likely to have attended only some college.  People with two cars who work in a manual occupation and earn less than about 39K per year are likely to have attended only high school.  Similar characteristics apply to those who received only a partial high school education.  People who do not own an automobile are very likely to have completed a graduate degree.
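One rough way to think about what a report like this ranks is lift: how much more likely a target value becomes once you know a particular column value. The Python sketch below ranks column/value pairs that way. To be clear, this is my own simplification and not how the add-in computes its Relative Impact bars. The function name is hypothetical and the rows are made-up placeholders, not the sample workbook data.

```python
from collections import Counter


def influencers(rows, target, state):
    """Rank (column, value) pairs by lift toward target == state."""
    base = sum(r[target] == state for r in rows) / len(rows)  # overall P(state)
    if base == 0:
        return []  # the state never occurs, so nothing can favor it
    seen = Counter()  # rows containing each (column, value)
    hits = Counter()  # of those, rows where target == state
    for r in rows:
        for col, val in r.items():
            if col == target:
                continue
            seen[(col, val)] += 1
            if r[target] == state:
                hits[(col, val)] += 1
    # Lift = P(state | column == value) / P(state)
    ranked = [(col, val, hits[(col, val)] / n / base) for (col, val), n in seen.items()]
    return sorted(ranked, key=lambda t: t[2], reverse=True)


# Placeholder rows (assumption): invented customers, not the add-in sample data.
rows = [
    {"Cars": 0, "Occupation": "Professional", "Education": "Graduate Degree"},
    {"Cars": 0, "Occupation": "Management", "Education": "Graduate Degree"},
    {"Cars": 1, "Occupation": "Clerical", "Education": "Partial College"},
    {"Cars": 2, "Occupation": "Manual", "Education": "High School"},
    {"Cars": 1, "Occupation": "Management", "Education": "Bachelors"},
    {"Cars": 2, "Occupation": "Manual", "Education": "High School"},
]

for col, val, lift in influencers(rows, "Education", "Graduate Degree")[:3]:
    print(col, val, round(lift, 2))
```

A lift above 1 means the value favors the target state and a lift below 1 means it works against it. With only a handful of rows the numbers are noisy, which is one reason the tool is more convincing on larger tables.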

 

Now, back to the Discrimination report dialog that we moved out of the way.  Let's run a discrimination report that compares those with graduate degrees with those who attended only part of high school.

 

image

 

 

We can add as many discrimination reports as we want. 

 

image

 

The Table Analysis Tools Sample worksheet only contains 1000 rows.  When we go through the exact same steps on the Source Data sheet which has 10,000 rows, we get remarkably similar results.

 

image

 Example Two

Next, I'll run the tool to see what factors most strongly influence whether or not the customer is likely to purchase a bike.

 

  • Give the Source Data worksheet focus.  Click the Analyze Key Influencers button.
  • Select BikeBuyer as the column to analyze.  Uncheck ID from the columns to analyze and run the analysis.
  • Go ahead and run a Discrimination report against the Yes/No values.  This will demonstrate that this report is useless for target columns with only two values.

image

 

The Key Influencers Report for BikeBuyer shows us that the strongest predictors that a customer will purchase a bike are owning no cars and being between the ages of 36 and 46.  The strongest predictors that a customer will not buy a bike are owning two cars and being 64 or older.

 

The discrimination report shows us essentially the same thing.

 

image

 

Example Three

For the next example, I have imported the V_Customer view from the Contoso Retail demo database which you can download from Microsoft. 

 

If you import the data using the From Other data sources button on the Data ribbon, Excel will automatically format it as a table, which is required.  If you import your data from a CSV or copy and paste it into a spreadsheet, it may not be formatted as a table.

 

  • Once the data is in Excel and formatted as a table, click the Analyze Key Influencers button and select HouseOwnerFlag as the column to analyze.
  • Click the Choose columns to be used for analysis link, uncheck CustomerKey and Consumption, and then run the analysis.

 

image

 

Here we see that MaritalStatus has the most influence on the value of HouseOwnerFlag.  We also see that not having any children is a strong indicator of not owning a home.

 

I hope this explains how to use the Analyze Key Influencers tool sufficiently.  If you have any questions, please use the comments section below. 

 

Here are some additional links:

Analyze Key Influencers Video Tutorial

Microsoft BI - Data Mining - Analyze Key Influencers
