Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

«August 2015»

Power BI and Big Data

If you’re worked in the wide and diverse field of information technology for almost any amount of time, it probably hasn’t taken you long to discover that the one thing constant about IT is that the technologies and strategies involved change faster than you can learn them. And if you work in business intelligence like I do, you don’t have to look very far at all to see change. The Microsoft Power BI team rolls out a software update every month! If I want to stay learned up on the technology, I have to really be on top of things.

About ten years ago when Hadoop was first being developed at Yahoo, I don’t think anyone could have anticipated the size of the ripples (more likes cannonball sized splashes) being able to access Big Data could and would have on the IT industry. Hadoop (and other advances in hardware and software technologies) gave us something we never had before: The ability to access and report on data in real time on a scale never previously imagined gives an organization to identify and understand trends and patterns in the data and gain previously unknown insights. The organizations that are able to leverage big data will be the organizations that leave their competition in the dust.

Set Up and Configure the Hortonworks Sandbox in Azure

Not only does Power BI Desktop give us the ability to connect to Hadoop Distributed File System (HDFS) for reporting we can also mash it up with other more traditional and structured data sources with minimal effort required. But that’s not what this blog post is all about. This post is about setting up a virtual machine in Azure running Hadoop and connecting to our Hortonworks Sandbox with Power BI Desktop :).

The first thing you do if you don’t have access to a Hadoop cluster is to set up the Hortonworks Sandbox on Azure. The good news is its free (for the duration of the trial) and its super easy. Just follow the instructions at this link to set up the Hortonworks Sandbox.

Hadoop in Azure

Once that’s set up, you’ll need to add mapping for the IP address and host name to your hosts file. Devin Knight has a blog on this that you’ll find helpful.

Connecting to Hadoop with Power BI Desktop

Once your Hortonworks Sandbox is set up, you’re ready to set up your connection to Hadoop with Power BI Query. Start up the Power BI Desktop and click Get Data. Scroll down and select Hadoop File (HDFS) and click Connect.

Get Data with Power BI

From there you can follow the rest of the wizard to load the data into the semantic model.

Load Data with Power BI

Once the data is loaded, you’ll need to modify the query to navigate to the data you wish to use in your model.

In Power BI Desktop, go to the Home ribbon and click Edit Queries.

Read more


Three Best Practices for Power BI

Since the release of Power BI Desktop this past week, I’ve been really spending my extra time digging into the application focusing on learning and experimenting as much as I can. When my wife has been watching Law and Order: SVU reruns at night after the rug rats are in bed, I’ve been right there next to her designing Power BI dashboards like the total data nerd that I am. When my kids have been taking their naps during the weekend, I’ve been writing calculations in the model for my test dashboards. Or when I’ve been riding in the car back and forth to work I’ve been thinking of new things to do with Power BI Desktop.

Since I’ve been spending a decent amount of time with Power BI Desktop, I thought I’d take a moment to share three things to know and remember when designing your Power BI models and dashboards that I think will help you make the most of this tool and be effective at providing the data your business needs to succeed.

1. Optimize your Power BI Semantic Model

It probably hasn’t taken you long to figure this one out if you’ve built Power Pivot/Tabular models or at least it won’t when you do start developing Power BI dashboards. The visualizations in Power BI and Power View are heavily meta-data driven which means that column names, table or query names, formatting and more are surfaced to the user in the dashboard. So if you using a really whacky naming convention in your data warehouse for your tables like “dim_Product_scd2_v2” and the column names aren’t much better, these naming conventions are going to be shown to the users in the report visualizations and field list.

For example, take a look at the following report.

Power BI Dashboard without formatting

Notice anything wonky about it? Check the field names, report titles and number formatting. Not very pretty, is it? Now take a look at this report.

Power BI Dashboard with formatting

See the difference a little cleaned up metadata makes? All I did was spend a few minutes giving the fields user-friendly name and formatting the data types. This obviously makes a huge difference in the way the dashboard appears to the users. By the way, I should get into the movie production business. ;)

My point is that the names of columns, formatting, data types, data categories and relationships are all super important to creating clean, meaningful and user friendly dashboards. The importance of a well-defined semantic model cannot be understated in my opinion. A good rule of thumb is to spend 80% to 90% of your time on the data model (besides, designing the reports is the easy part).

I’d also like the mention the importance of the relationships between the objects in the semantic model. Chance are you will have a small group of power users that will want to design their own dashboards to meet their job’s requirements and that’s one of the beauties of Power BI. But when users began developing reports, they may query your model in unexpected ways that will generate unexpected behaviors and results. I only want to mention this because the relationships between the objects in the model will impact the results your users will see in their reports. Double check your relationships and ensure that they are correct, especially after you add new objects to the model since the

Read more

Power BI Fantasy Football Player Stats Dashboards for Download

Every year at Pragmatic Works some coworkers, including consultants, marketing staff, support team members, software development staff and project management, partake in a company fantasy football league. And with the recent release of the new Power BI Desktop, I thought what better way is there to prepare to completely annihilate my coworkers and friends in an imaginary nonsensical game than by creating some nifty Power BI dashboards based on last years player stats as recorded by Yahoo! Sports. So I thought I’d walk you through some of the steps I followed to leverage the Yahoo! Sports NFL player stats page as a data source and some of the query transformations I applied to prepare the data for reporting.

Power BI dashboard with Power BI Desktop

Click here to download my Fantasy Football Dashboards Power BI .pbix file.

If you’re completed new to Power BI Desktop I highly suggest you watch my video walkthrough of Power BI Desktop or read my blog post which walks you through each step of creating your first Power BI dashboards with Power BI Desktop. Last Friday, I also blogged about my three best practices for designing a killer Power BI solution, so take a look at that.

To create these dashboards, I simply navigated to the Yahoo! Sports NFL stats page and found the page for each position I’m interested in for this fantasy football season. I copied the URL to my clipboard. In Power BI Desktop, click Get Data and then use the Web data source option. Then all you have to do is copy and paste the URL into the text box and click OK.

Get data from web with Power BI Desktop

Then select the HTML table that contains your data and click Edit. We need to edit our query because there are some issues with the data. By clicking Edit, we can apply transformations to our query which will allow us to do things like rename columns, remove unwanted columns, modify data types, create custom columns and much more.

Get data from web with Power BI Desktop

One thing you’ll notice in the above screen grab is that the column names are in the first row, so we need to fix that.

On the Home ribbon of the Query Editor, just click the Use First Row As Headers button. Pre

Read more

Power BI Tip: Use the Treemap Chart as a Colorful Slicer

Power BI Desktop has been out for GA for over a week now and some of the pro’s out there have come up with some pretty cool tricks. For instance:

But if you’re looking for a way to spice up you report filtering with a little color, try using the Treemap chart as a Slicer for those fields that only contain a few unique values. At this point with Power BI, you don’t have any customization options for the Slicer visualization (although I’m sure that is coming down the pipe in a future release). This option won’t work terribly well if the field you would like to use as a slicer has more than a dozen or so unique members, but you can experiment with it and see what you can come up with. Here’s my Treemap Slicer in action.

tree map slice in action

To multi-select tiles in the Treemap slicer, just hold Cntrl as you click. To reselect

This little trick relies on the natural cross filtering between data regions in the Power BI dashboards. First I created a measure that calculates the distinct count of the field that I wish to use as my slicer. In this case the field is Genre.

Power BI Distinct Count DAX calculation

Then I added a Treemap chart to the report using the field Genre as the Group value and the measure Distinct Count Genre as the Values.


Then just resize the Treemap visualization so that the squares are about evenly sized. There’s a few ways you can arrange it, but just play around with it and see what you can come up with.

Power BI Dashboard with Treemap Slicer Power BI Dashboard with Treemap Slicer


What do you think? Leave me a comment below and let me know. Or if you’ve got a neat Power BI trick you’d like to share, let me know, as well. I love to hear new ideas! Thanks for reading!

Read more

Importing Excel Power View Dashboards into Power BI

If your organization is now a Power BI customer, congratulations. You’re now ready to create some very cool dashboards, integrate disparate and disconnected data sources and take advantage of Power BI’s ability to modify and transform your data, build interactive and dynamic dashboards and then share them with your team and organization. But until you create your dashboards to take advantage of the new visualization types and other improvements, you can easily import any existing Power View sheets in Excel into your Power BI site.

Power View dashboard in Excel ready for Power BI goodness

Above you’ll see an example of a Power View dashboard that I will import into my Power BI site.

Importing Power View into Power BI

To import Power View sheets into Power BI, navigate to your team’s Power BI site and click the Get Data button at the bottom left.


Then select where your data exists. In my case, I have my Power View reports in Excel saved in my One Drive folder so I’ll select Files.



You’ll have to log into your One Drive account. Once you’ve done that, just navigate to where your Excel file is and then click Connect.

After the import is complete, you’ll see your new Report and Dataset on the left in your My Workspace explorer. Open the new Report based on your Power View sheet. Now I can browse and interact with my Power View dashboard within my Power BI site.


I can also pin visualization from my Power View report to a Power BI dashboard with other visualizations.


And now that we have our Power View report imported and pinned to a dashboard can use Q&A to quickly generate visualizations using the imported data set.


Pretty cool stuff! It’s nice to know that I can easily jump from my legacy Power View dashboards to the new Power BI.

It’s important to not that in order to import Excel workbooks into your Pow

Read more

#PowerBI and #SSAS Tabular: A Natural Fit with the Power BI SSAS Connector

SSAS Tabular and Power BI In late June last month, the Microsoft Power BI team released the Microsoft Power BI Analysis Services Connector. The Power BI SSAS Connector allows your deployed Power BI reports to utilize your on-prem SSAS data sources. It’s super easy to set up and can be downloaded for free! And I who doesn’t love “free”?

Download the MS Power BI SSAS Connector here

Why Use the Power BI SSAS Connector?

Power BI and SSASOne of the advantages to using Power BI is that the tool has the ability to connect to an incredibly wide variety of data sources including SQL Server Analysis Services instances. With that in mind, what’s the purpose of using the Power BI SSAS Connector? Why not just load our SSAS data into our Power BI semantic model like we do with our Access, Excel, CSV and web data and then schedule the Power BI semantic model to refresh? And that’s a good question.

Power BI: Live & Prime Time with SSAS Tabular

First of all by utilizing the Power BI SSAS connector, we are granted a live connection to our SSAS instance. What this means is that every time a user interacts with a filter, slicer, chart or other data visualization, Power BI quickly generates a DAX query behind the scenes which is sent to your on-prem SSAS Tabular model. Now currently Power BI users are restricted to how often data sources are refreshed. If you’re a free Power BI user you’re limited to one data refresh per day and if you’re a Power BI Pro user you’re limited to Hourly data refreshes. By leveraging the live to connection to your SSAS Tabular instance, you can update the data in your Tabular model as often as you are able.

Because the Power BI SSAS Connector allows you to have a live connection to your SSAS Tabular model, this also means that your users experience less latency between updates to their data. Without the live connection to SSAS, each day the users would have to wait for the SSAS Tabular model to be processed and then for the Power BI semantic model to be refreshed. With the live connection, as soon as the Tabular model has finished processing, the Power BI users have access to the most current data instantly. Data is available to your users with potentially much less time between data refreshes.

Currently, the Power BI SSAS Connector only supports live connections to SSAS Tabular instances, although I would expect a future update to support live connections to SSAS multidimensional cubes.

Enterprise Data in Power BI

SSAS Tabular model partitions Currently Power BI semantic models are restricted by a data capacity limit. If you’re a free Power BI user you’re limited to 1 GB per/user and if you’re a Power BI Pro user you’re limited 10 GB/user. This can be a

Read more

Twitter Analysis with #PowerBI & Plus One

Earlier this week Christopher Finlan put together this awesome Datazen dashboard using Plus One. Christopher has been doing a lot of cool things with Datazen so I recommend that you do like I did and subscribe to his blog. But Christopher’s cool work with Plus One inspired me to create my own Social Media dashboard using Plus One, as well.

powerbi search completePlus One has created this nifty little desktop application that you can download and install on your computer. Once you’ve set the app up, all you need to do is enter a search query. In my case, I wanted to see what people were doing and saying with Power BI on Twitter. Plus One can only recover the previous seven days of data, so you’ll need to periodically refresh your search or schedule the search, which you can do easily with the Plus One application.

access db datasourceAfter running the search, the results of your query are saved in an Access database on your machine in the folder C:\Users\username\Documents\Plus One Social. And then all you have to do is use Power BI Desktop to suck the data into your Power BI semantic model and then start building some awesome dashboards.

Here’s the report and dashboard I created in Power BI. I haven’t scheduled the Plus One application to refresh the data so I don’t have any trend reports yet, but I did create some snapshot visualizations to gain insights into who is talking about Power BI and in what context by analyzing the accompanying hash tags. Pretty cool stuff!





Here’s the resources:


Leave your feedback down below! I’d love to see what kind of dashboards you can come up with using Plus One so feel free to leave a comment and a link to your blog, as well!

Read more

Cleaning Your #PowerBI Power Query Code

image Over the weekend I found this nifty tool called Power Query Management Studio. Someone shared it on Twitter and you’ve probably seen the link to download the tool on technet. Basically this tool is a fancy Excel workbook that allows you to easily clean up your Power Query code and insert it back into your Excel workbook or Power BI semantic model. It’s pretty nifty and easy to use so I figured I’d give you a quick run down on using it to clean up my Power Query code in my Fantasy Football & NFL stats Power BI model, which you can download here.

To begin using the Power Query Management Studio, download it here.

I want to use this tool to clean up my Power Query code in my Power BI model, so the first thing I’ll do is open my Power BI model in Power BI Desktop. Next, we need to capture my Power Query queries so to do this I’ll click the smiley face icon at the very top of Power BI Desktop and click Send Frown.


A little Send Feedback dialogue box will pop up. Uncheck the Include Screenshot (since we really don’t care about that) and leave the Include Formulas box checked. This will allow us to see the Power Query queries. Click OK.


When you click OK, this will open up an email for you that will include your Power Query queries in the body of the email. The code is broken up by queries seperated by semicolons so you can easily see each query.


I copied everything below the line “section Section1;”. Once you’ve copied that code to your clipboard, open the Power Query Management Studio Excel workbook. Clear the sheet called CodePaste (but don’t delete the table) and paste your Power Query queries into the table like so. Then click the Refresh All button up top in the Data ribbon of Excel.


After a few moments, the Excel workbook will have completed its magic. There’s a few sheets in the workbook I’ll point out that you’ll find useful.


CommentTransfer: T

Read more

Here’s the New #Excel 2016 Chart Types!

The Office 2016 Public Preview is now available for download! Included in the preview of Excel 2016 are a handful of new chart types and since I’m a huge fan of awesome data visualizations, I thought I’d take a few moments to play around with them and share my experience with you so you can have a better idea of what to expect in the next version of Excel. But to be honest, if you’re a data & visualizations nerd like me, you’re probably pretty excited!

imageNow one thing to be aware of with these new chart types is that if you attempt to create this chart on top of some data in a pivot table, you’ll get a warning like the one seen here. In order to use these charts, you’ll need to create them on top of data that is not in a pivot table (at least for the time being).

Box and Whisker Chart

The Box & Whisker chart is a really nice visualization for getting a quick look at the distribution of data including outliers, mean, range and quartiles, for example. In the below chart, I pulled in some data from with Power Query and performed some analysis on the yards per game for the top four running backs from last season.


You also have control over the chart formatting through some options specific to the Box & Whisker chart type.


Waterfall Chart

The Waterfall Chart was just added to Power BI so you’ve probably already had a look at that visualization. The neat thing about Waterfall chart is that it allows us to see how the small pieces of a whole contribute to the total. For instance, below I have a Waterfall chart that shows the play stats from the New England Patriots first drive in their conference championship game against the Indianapolis Colts which resulted in a touchdown. Using this chart I can see how each play in the drive led to the total yards gained on the play. Pretty cool!


A Waterfall chart could be really useful for monitoring changes in inventory or for viewing balance sheet data.

Sunburst Chart

The Sunburst chart is good for viewing hierarchical data. So if you wanted to view how individual accounts contribute to their parent accounts in a balance sheet, the Sunburst chart could be a really interesting way to visualize that type of data.

Here I’m using the Sunburst chart to analyze the receivers of a few different teams by player and position.

Read more


Refreshing Excel Power Query & Pivot Tables with SSIS and Task Factory

image With SSIS 2014 and earlier there is currently not native way to refresh an Excel workbook which include Power Query queries. Now that functionality is rumored to be included with SQL Server 2016 but if you’re currently running SQL Server 2014 or 2012 you are out of luck. But that’s why Pragmatic Works put together the Excel Power Refresh component for SSIS.

Configure the Excel Power Refresh Task in SSIS

Configuring the Excel Power Refresh Task is pretty straightforward. There’s not a lot of complexity to this component, which is a good thing.

First create a Connection Manager to your Excel 2013 file that includes your Power Query queries. In my case I have an Excel workbook that has some Power Query queries that query for some data that I used for a blog post on the new Excel 2016 chart types.

Once you create your connection to the Excel file, use the Data Connections and Pivot Table Sheets to select the queries and pivot table sheets that you wish to refresh.


And now I can schedule the refresh of any Power Query data connections or Pivot Tables with SSIS.


Very cool!


Download the free trial of Task Factory here.


If you have any questions or would like some more information on Task Factory, feel free to send me an email or leave a comment below. Thanks!

Read more

Power BI Tip: Use a Scatter Chart to Create a Calendar Report

Power BI Desktop Scatter Chart

The Scatter Chart in Power BI and Excel is very useful chart for visualizing three different metrics in tandem. But with a little bit of work you can use a Scatter Chart to create a Calendar chart for visualizing your metrics across the days of an individual month.

New to Power BI Desktop? Start here!

To configure a Scatter Chart too mimic a Calendar type report, you need the follow:

1) An attribute for the day number of the week (1,2,3,4,5,6,7).
2) An attribute for the day number of the month (1,2,3…29,30,31).
2) An attribute for the week number of the month (1,2,3,4,5,6).
3) An attribute for sorting the week numbers in reverse order.
4) A business metric you wish to represent in the report.

Most of these items you can get from a traditional date dimension. In this example, I’m utilizing the Adventure Works DW database which has a date dimension table.

To set up the Scatter chart correctly, configure the visualization as seen in this screen shot from Power BI Desktop.


I want to point out a couple things here. First, you can optionally add a field to the Legend to differentiate between the weekend and weekdays or to identify holidays, as seen below.

Power BI Desktop Scatter Chart

Secondly, in order to display the Calendar in the correct order, we actually need to reverse the order to the Week numbers so that the first week of the month is numerically higher than the last week of the month. To do this I used a TSQL Case statement to populate the new column in the Adventure Works Date dimension table:

[WeekNumberOfMonth]  AS 
[WeekNumberOfMonthReverse]  AS 
    (case (datepart(week,[FullDateAlternateKey])-datepart(week,dateadd(month,datediff(month,(0),[FullDateAlternateKey]),(0))))+(1) 
        when (1) then (6) 
        when (2) then (5) 
        when (3) the
Read more

Setting Up an HDInsight Cluster (No Scripts Required)

Let me start by saying, I am not a fan of scripting. It definitely has its place and a lot of my peers really like it. It is the easiest way to get functionality out from software vendors such as Microsoft. PowerShell is an incredibly powerful tool which can do just about anything. However, therein lies the problem for me. Scripting solves a lot of problems, however, I just wanted to set up and use a basic HDInsight cluster to create some Power BI demos (posts coming soon). So I started the journey to find the scripts and try to understand the syntax and so on. Then I went to the Azure Portal, here is what I did to set up my cluster and load data with no scripting required. My goal was to go to get a working demo platform up. Would I necessarily recommend this path for production work, not sure yet. But now I can work with HDInsight with considerably less work required to set up the environment.

HDInsight Cluster No Script Setup Requirements

You need an Azure account. You can go to to sign up for a free account if you like. If you have an MSDN subscription you should have some time available as well.

HDInsight Cluster No Script Setup

Once you have your account created, you should go to We will be doing our setup from here. During the process we will be creating a storage account (if this is your first run in azure, you may choose to set up a Resource Group as well) and the HDInsight cluster. Be aware that the cluster has compute costs and the storage has storage costs. At the end we will remove the cluster to save your compute time.

Create the Storage Account

This step can be done during the HDInsight cluster creation, but this limits your ability to share data across clusters. If you are just trying it for fun, you can do this during the cluster set up.

Click the + symbol on the portal, then Data + Storage, then Storage Account. This will open a blade with the set up instructions for a storage account.


When you create your account you will have some options to fill in:

  • Name: this name will need to be a unique name, e.g., joescoolhdinsight
  • Pricing tier: The pricing tier is really important if you are using a limited plan or if you plan to keep the data for a long time. If you are planning to use this as a demo, I would select Locally Redundant as that is the lower cost plan.
  • Resource Group: The resource group lets organize your Azure assets. This is for your benefit, so if you want to keep all of the HDInsight components together, you could create a group for that or stick with the default.
  • Subscription: This lets you choose the subscription you want to use.
  • Location: Be sure to select a location close to you that supports HDInsight. Check to see what Azure services are supported in each region.
  • Diagnostics: This is optional. If you are looking into the diagnostics or need to prep for production, you will find this useful. In most cases, we would not turn this on for demos.

Click Create and it will create your storage account. This may take a few minutes. The notifications section on the portal will alert you when this has been completed. Once that is complete, we will continue with setting up the cluster.

Create a SQL Database for a Metastore

Read more

Uploading Files to an HDInsight Cluster (No Scripting Required)

As I noted in my first post, I am not a fan of scripting. In that post we set up a cluster without using scripts to do so. Now we are going to look at how to upload files without scripts. While this will work for our demo and learning purposes, I would encourage you to use scripting to handle production level loads or even if you want to upload a lot of files. While I am not a fan, it does not mean the scripting may not be a better overall tool. However, when I am trying to learn the functionality or work with system using other tools (in this case Power BI), I find that methods such as these help me be more productive sooner.

Prepping to Load Data Into Your New HDInsight Cluster

A key difference between standard Hadoop and HDInsight is file management. With HDInsight, you can load files into Azure Storage and they can be consumed by the HDInsight cluster. Keeping with the No Scripting Required mantra, we will be using a graphical interface to load files into Azure storage. There are a number of options out there, you need one of them installed. For our example, we will be using the freeware version of CloudBerry Explorer for Azure Blob Storage. Once you have your tool of choice installed you are ready to get some files.

At this point, you need some files to load. I am using some data I created for another demo. My data is in 7 files of daily receipts for my restaurant for a week in March. Once you have the data, we can load that into the cluster.

Loading Data Into Your New HDInsight Cluster

A noted above, the next steps for use will use CloudBerry Explorer to load our data. In this case, I just copied the folder with my files over to the Azure Storage once I connected the tool to Azure.


Once that is done, we will look at working with the data in Hadoop and with Hive.

Creating an External Hive Table and Querying It

You can create two types of tables using Hive – internal and external. An internal table loads the data into a Hive database. An external table applies a schema to the data without moving it. I will be creating an external table. I like this concept because it applies schema to the files that have been uploaded and allows other tools to interact with that data using HiveQL. When you drop an external table, the data remains because the table represents structure only.

In order to help everyone through this (in particular me), the next sections walk through the steps I took to create my table and select data from it. (This is not a detailed look at Hive, but rather a focus on the process of making HDInsight data available using HiveQL.)

Understanding the Files

The first step was to document the structure of the data in the files. Here is the data that I had in each of the files in column order:

  • Ticket Number – int
  • Ticket Date – date
  • Hour of the Day – int
  • Seat Number – int
  • App Amount – int
  • Entrée Amount – int
  • Non Alcoholic Amount – int
  • Alcoholic Amount – int

My structure was fairly simplistic. Each file represented a day.

Creating the Table

Now that I had the structure, I needed to work out the table DDL. (Reference: htt

Read more

Using Power BI with HDInsight Part 1: Power Query and Files

With the rise of HDInsight and other Hadoop based tools, it is valuable to understand how Power BI can help you take advantage of those big data investments. If you need to set up a cluster to work with, check out my previous posts on Setting Up an HDInsight Cluster and Loading Data Into Your New HDInsight Cluster. These posts show how to do this with no scripting required. If you prefer to script, there are a number of resources with sample scripts on doing the same work.

In this article, I will focus on using Power Query to get data from the Hadoop file structure in HDInsight. I will be using Excel 2013 with the Power Query Add-In. I will also be using the restaurant data I loaded as noted in the three previous posts. If you need to create a cluster and load data I encourage you to check the following blog posts:

These posts walk through the process of creating a cluster and loading up data.

Connecting to HDInsight

First, open a new Excel workbook and click the Power Query tab. Once there, you can find the Azure HDInsight source in the From Other Sources dropdown. Select that option to open the following dialog:


You will need your storage account in order to continue. Then you will need the storage account key. Once you have added the key you will see that the Navigator opened in Excel on the right.


It should show the name of your cluster and the default container name. Double click the container name and it will open the Power Query window. It will show all the files available in the container. Even though we have it organized in folders, the view shows all the files. If you have a large amount of files and you don’t want to scroll to find them, you can click the down arrow on the Folder Path column and use the text filter to find the folder you are looking for.


Now I have the files I want to use in Power Query. If you click the binary link it will open a copy of the file. However, this is not how we want to work with the data as we have multiple files. (If you did this, remove steps up to the Filtered Rows step in the Applied Steps section.) I now have the files I uploaded showing.


In order to work with them all together we need to Combine Binaries.

Read more


Whats New BI-wise

  • 10 July 2013
  • Author: Mike Milligan
  • Number of views: 38528
Whats New BI-wise
My Gift to My Readers
This is going to be the page I update when I have quick links and blurbs related to business intelligence that I'd like to share.  Put your email in the widget at the very bottom if you'd like to be notified when the content changes. 

BI from g8rpal on 8tracks.




How to Test SSRS MDX Queries in SQL Server Management Studio



SSAS Gotcha!  Today I got bit by this gotcha.  I created a default member for my date dimension for yesterday's date.  Suddenly, when I use the date dimension in the Report Filter quadrant of an Excel PivotTable it doesn't work.  It filters the data on the default member (yesterday) no matter what I select in the report filter.

Default Member Trap (be careful of what you filter)



Comparing Data Warehouse Design Methodologies for Microsoft SQL Server

I like this article for it's textbook definition of an ODS.




1. Exceeding the boundaries of a single screen

2. Supplying inadequate context for the data

3. Displaying excessive detail or precision

4. Expressing measures indirectly

5. Choosing inappropriate media of display

6. Introducing meaningless variety

7. Using poorly designed display media

8. Encoding quantitative data inaccurately

9. Arranging the data poorly

10. Ineffectively highlighting what’s important

11. Cluttering the screen with useless decoration

12. Misusing or overusing color

13. Designing an unappealing visual display

From:  Common Pitfalls in Dashboard Design - Stephen Few

More of the same:

Clutter, data overload put dashboard designs on path to failure

Data Visualization and Dashboard Design



"Yet standardizing the T-SQL code across an organization is essential to facilitating effective and efficient code reviews, troubleshooting T-SQL script, supporting joint development efforts, and handing off projects from one group to another. DB engineers might disagree about what the code should look like, but few will question the wisdom of implementing such standards."


"And there's another thought here as well. Within your code, it's best if you specify the schema name (even if it is the default schema or some other mechanism) along with the object when you select or activate it. It saves the Query Processor a step from having to resolve the schema name out by itself."

You can find more here:



Features Not Supported in a Future Version of SQL Server

The following SQL Server Database Engine features are supported in the next version of SQL Server, but will be removed in a later version. The specific version of SQL Server has not been determined.


Microsoft's new 3D data visualization BI product - GeoFlow



"The survey findings would seem to suggest that the CFO prioritizes business applications higher than the CIO does," said Bill Sinnett, senior director, research at FERF. "If the CIO does not understand this, then there's a chance the CFO will sponsor his or her own initiatives, and not coordinate them with the IT organization. This demonstrates the trend that BI is becoming less of a CIO responsibility and more of a CFO and line-of-business responsibility."

Gartner Says Business Intelligence/Analytics Is Top Area for CFO Technology Investment Through 2014


Analysis Services and the Case of the Standby Cache

Finding and killing SSAS Sessions

Dealing with Long Running SSAS Queries using Powershell


Power View for Cubes

Coming Soon: A New Edition of Information Dashboard Design



EffectiveUserName research

Using EffectiveUserName To Impersonate in SSAS

 This link says you do not need to configure Kerberos on the
SSAS server:

 How to configure dynamic security in Analysis Services and
Performance Point 2010 without Kerberos

 Use Analysis Services EffectiveUserName in SharePoint Server

 Use EffectiveUserName in PerformancePoint Services
(SharePoint Server 2013)

 Unit Testing Role Security in Analysis Services




I am speaking at SQL Saturday Jacksonville

Registration is Open!

My session starts at 9AM, Data Mining with DMX.


Get yesterday's date and more using MDX


                  + VBAMDX.Format(VBAMDX.Now(), "yyyyMMdd")+ "]")) ON COLUMNs
FROM [Adventure Works];


-- The First Calculated member is the value of NOW()
WITH  MEMBER [Measures].[Full Date] as 'NOW()'
-- The Second Calculated Member is the Day part of the first calculated member.
MEMBER [Measures].[What Day] as 'DAY([Full Date])'
-- The Third Calculated Member is the Month part of the first calculated member.
MEMBER [Measures].[What Month] as 'MONTH([Full Date])'
-- The Fourth Calculated Member is the Year part of the first calculated member.
Member [Measures].[What Year] as 'YEAR([Full Date])'
   {[Full Date],[What Day],[What Month],[What Year]} ON COLUMNS
FROM [Adventure Works];



SSIS VB.Net Script to download an FTP file from an AS/400 DB2 server.  (Credit goes to Michael Simon for piecing this together from various sources on the internet.)


    Public Sub Main()
Try 'Create the connection to the ftp server Dim cm As ConnectionManager = Dts.Connections.Add("FTP") 'Set the properties like username & password cm.Properties("ServerName").SetValue(cm, "") cm.Properties("ServerUserName").SetValue(cm, "Hugh") cm.Properties("ServerPassword").SetValue(cm, "Janus") cm.Properties("ServerPort").SetValue(cm, "21") cm.Properties("Timeout").SetValue(cm, "0") 'The 0 setting will make it not timeout cm.Properties("ChunkSize").SetValue(cm, "1000") '1000 kb cm.Properties("Retries").SetValue(cm, "1") 'create the FTP object that sends the files and pass it the connection created above. Dim ftp As FtpClientConnection = New FtpClientConnection(cm.AcquireConnection(Nothing)) 'Connects to the ftp server ftp.Connect() 'Build a array of all the file names that is going to be FTP'ed (in this case only one file) Dim FileName(0) As String FileName(0) = Dts.Variables("ftpFileAndMember").Value.ToString Dim FileDate As Date FileDate = System.DateTime.Today Dim currentDirectory As String currentDirectory = Dts.Variables("ftpDirectory").Value.ToString ftp.SetWorkingDirectory(currentDirectory) 'ftp the file Dim LocalPath As String LocalPath = Dts.Variables("localFilePath").Value.ToString ftp.ReceiveFiles(FileName, LocalPath, False, True) ' the True makes it overwrite existing file and False is saying that it is not transferring ASCII ftp.Close() Catch ex As Exception Dts.TaskResult = ScriptResults.Failure End Try Dts.TaskResult = ScriptResults.Success End Sub


Exciting PerformancePoint links!

PerformancePoint tutorial

Formatting your PerformancePoint Analytic Grid!

Using SSAS MDX Calculation Color Expressions



Extra Exciting Links!

Comparison of Queries Written in T-SQL and SQL Server MDX

SQL Server Analysis Services Period over Period Variance Analysis



On Saturday, April 27th at 9:00 AM I will be presenting a session at Jacksonville's SQL Saturday event on the DMX data mining query language.  Hope to see you there!

SQL Down Under - Be sure to check out Erin Walker's session on dimensional modeling!  Our favorite Jacksonville consulting company Pragmatic Works is mentioned.

Bunch of posts about PerformancePoint and using Excel w/ KPIs from SSAS

Unbelievable.  M$ and their licensing.  Looks like they are effectively killing PowerPivot and PowerView.

Furthermore, with the more expensive SQL Server 2012 licensing (per core) I have to wonder how many people are staying with SQL 2008 R2.  Our server has 80 cores!  Why would we switch?  Theoretically you could keep the database engine on 2008 R2 and upgrade your SSAS, SSIS, and SSRS servers to 2012.

Check out this post from Brent Ozar and in particular his recommendation that you NOT use order by and instead let your application do the sorting.  This is contrary to what used to be the best practice of letting the SQL engine do the work.  Those of you that work with SSRS should know that letting the SSRS engine sort the data takes longer than the SQL engine more often than not.  The reason?  Cost per core.

"Avoid ORDER BY; sort in the app instead.  To sort your query results, SQL Server burns CPU time. SQL Server Enterprise Edition goes for about $7,000 per CPU core – not per processor, but per core. A two-socket, 6-core-each server rings up at around $84k – and that’s just the licensing costs, not the hardware costs. You can buy a heck of a lot of application servers (even ones with 256GB or more of memory) for $84k."



PerformancePoint Gotcha!  - "PerformancePoint Services could not connect to the specified data source."  When creating a new dashboard, you MUST do a save all right after you create your data sources otherwise you won't be able to create anything that uses that data source.


Dinky Linkys

Starting a Bike Shop (like AdventureWorks!)

Research Quality Data Sets 

How to Create and Use Temp Tables in SSIS

Programmer Bob - Latter-day Tom Sawyer or Massive Security Risk




SharePoint Gotcha! - Creating SharePoint 2010 list from Exel 2010 spreadsheet

This needs to be a future blogpost.  Here's the solution:

Here is some of the crud I went through to get there:

Dadgum it Micro$oft!  This product is 3 years old!  Here's a bugfix: get rid of the button if you can't make it work!  I hope it's fixed in SharePoint 2013.  All kidding aside, what am I going to do?  Switch to Cognos.  LOL.


SSAS Gotcha! - Duplicate Attribute Key

Remove all hierarchies.  Add them back one by one.  Select the errors through the relational engine w/ each column in the hierarchy ordering from top down.  When you find inconsistencies such as: month belonging to more than one quarter (ha ha!), employee having more than one manager (master), then you either need to make a compound key and force the hierarchy or the hierarchy doesn't make sense to begin with.

This needs it's own post, time permitting.


Other interesting tidbits:

Do not Disurb!

Cost of Interruptions:

(strangely similar)


Game theory and Batman, RIP:




How crApple sets its prices:

Home Depot is looking at Pricing!






New Years Resolutions - Professional Goals

1.  Reboot

2.  Avoid Snowflakes!

3.  Learn more data mining and the DMX query language (and blog about it.)

4.  Learn more about Master Data Services (and blog about it.)

5.  Become a master at MDX

6.  Architect an enterprise solution in which Contoso buys AdventureWorks and Northwind (and blog about it.)

7.  One presentation per quarter and at least one really good blog entry per month

8.  Mentor someone outside of the company I work

9.  Keep a personal log in which I will plan each day and reflect on the previous day every morning. 

10.  Publish Segment Analyzer on codeplex (and blog about it.) 


Segment Analyzer is an idea I have to help analyze various customizable groups of customers, products, or any type of dimension in an analysis services cube.  The main force behind the solution is a .Net application that can run against any data warehouse that uses a standard table and column naming scheme.  It won't matter what the naming scheme is as long as it is consistent, that criteria would be configurable. 

An analyst uses a GUI interface to create criteria for segments to be analyzed.  The criteria is based on dimensional attributes, and aggregated and non-aggregated facts and supports complex parenthetical and/or logic at many levels.  Dropdowns are populated based on the distinct values in the dimensional attributes and mathematical operations for the numeric information.  Which attributes and which metrics are featured in the interface is handled through a one-time configuration process. Segments can have parent segments in which the criteria of the parent is inherited to infinite levels of children.  This will be accomplished by parsing the segment criteria into dynamic SQL that populates bridge tables to support many to many dimensions in Analysis Services.  The segment dimension itself uses the parent child hierarchy feature in SSAS.  Two known query performance killers, I know; but, the features would only be used on specialized cubes or databases just for this purpose.  

Examples of segments of customers from adventure works:

Customers that visited the store within the last six months who are female, are between 16 and 24 years old, live within 25 miles of the store and have spent at least $100 in the past six months and redeemed two promotional coupons.

Customers that visited the store within the last year, are male, live in the 12345 zip code, own one car, spent at least $200 in the past year, and redeemed 0 promotions.




Tip of the Day

How to find the SSIS Package that Loads a Table

Browse to the packages folder using Windows Explorer and type “*.dtsx” into the search box.

Open Editpad Lite or a similar multi-tabbed editor.

Press Control-A to select all the packages and drag them to your editor.

Press Control F and enter the table name into the search dialog.  Be sure to check the “All files” and “Loop automatically” checkboxes.

This also works for SSRS reports, stored procedures, views, etc.



Presenting on the SQL Server 2012 Data Mining Add-ins for Office 2010 at Jacksonville Code Camp on October 6th.

Free Food!



Reasons to use Money data type over


  1. 13% improvement in Analysis Services processing speed
  2. 20% improvement in SSIS processing
  3. In the context of SQL Server data compression, the money
    and small money data types tend to compress well when the
    absolute value is low




Presenting at SQL Saturday #130 on Saturday 4/28 at

University of North Florida

1 UNF Drive, Building 50, Jacksonville, FL, 32224


SQL Server 2012 Data Mining Add-ins for Office


Free SQL Server 2012 eBook:  Kindle and others.


Great Podcast!  Good discussion about the roles data scientists are playing.

Big Data Monetization


Some links I recently found useful...

Using Named Sets in Analysis Services to Deliver Special Date Ranges to End Users

Dynamic named sets

Filter the Source Data for a Pivot Table

New demo of Power View is Live!

SQL Server 2012 Demo Showcase


I received this question from a past bootcamper I thought I'd share...


"Would have to happen to have a split string function that can accept values with commas?

I have values like this that I will need to split (‘New York, Brooklyn’,’New Jersey,Hoboken’)

The functions that I currently have cannot handle data with commas."

 --First we need this function

 create function Split (@StringToSplit varchar(2048),@Separator varchar(128)) returns table as return with indices as

(select 0 S, 1 E union all select E, charindex(@Separator, @StringToSplit, E) + len(@Separator) from indices where E > S )

select substring(@StringToSplit,S,case when E > len(@Separator) then e-s-len(@Separator) else len(@StringToSplit) - s + 1 end) String

,S StartIndex       from indices where S >0


 --Now we use the function

 declare @mystr as varchar(255)

 set quoted_identifier off;

set @mystr = "'New York, Brooklyn','New Jersey,Hoboken'"

set quoted_identifier on

select @mystr

select REPLACE([String],char(39),'') from dbo.split(@mystr,''',''')



MS released the Excel Data Mining plug-in for Excel!  Woo Hoo!  I can't wait to try this out.  Once I do, you can be sure I'll be blogging about it.  I just can't wait to share this information.  Thanks Jorge Segarra for sharing this with me!

Also, Adam Jorgensen sent me this awesome link from Brent Ozar about table partitioning in SQL Server.  Thanks Adam!

How to Decide If You Should Use Table Partitioning

Brent Ozar's post from a long time ago, Rock Stars, Normal People and You, is my favorite blog post of all time.  It helped me pull the trigger on a decision I made about a year ago to leave a company before my first year was up and look for something more fulfilling.  Leaving a company before giving them a full year trial was something I normally would never have considered.  Now I am at Pragmatic Works and couldn't be more challenged.  I've never met Mr. Ozar; but, I have him to thank for that.  I hope I get the opportunity to thank him in person one day.




New blog post on PerformancePoint 2010 Time Intelligence!



Hey!  What happened to January!?!?  What a year already.  No time for a proper post so this will have to do for now. -------------------------------------------------------------------------------

Tip of the Day!  Use SQL Server Native Client for your SSAS data provider.  (per Adam J!  :-) )
"SQL Server native client is the combination of ODBC and OLEDB into to one driver. So you are really using OLEDB with the Native Client. So you should see no difference."  Some guy on the web
ZoomIt - Screen zoom and annotation tool for technical presentations
Great blog entry on the various types of testing.  Unit Test or Integration Test and Why You Should Care
Jorge Segarra ( shared this link with me recently:
Digital SCRUM board: Create a digital version of cards/sticky notes we use for agile-type projects. Cool part is you can share your boards so multiple people can work on projects. Recommend watching the short intro video to see the different stuff you can do.
SSRS - Action Item Link changes from Dev to QA to Prod
Our solution has a SSRS report that includes an action item link that launches a dashboard page.  Our issue is that we do not want to have to manually hardcode the link each time we move from dev to qa and from qa to prod.
I found a possible solution here:
However, our link pointed to a SharePoint dashboard page, not a SSRS report.  Solution was to use a switch statement like so:
 Globals!ReportServerUrl = ""
 ,Globals!ReportServerUrl = ""
 ,Globals!ReportServerUrl = ""
chart pac man
Using the MERGE statement in SSIS via Stored Procedure  - Sometimes your package will run faster if you leave the heavy lifting up the SQL Engine.  It really just depends on your environment so if you are dealing with a lot of data, my recommendation is to test it both ways.
TheRestartPage.Com - How many have you sat through?  One of my New Year's resolutions this year is that every time someone asks me a question about something not working, my first response will be to ask them to reboot their computer.  My wife looked at me a little funny though when the dishwasher would not turn on.

Happy New Year! It's been a long time since I've posted. Two terrible colds, Christmas, New Years, professional goals, and challenging projects are the culprits. One of my resolutions is to become a consistent blogger so you can expect more from me this year. 

My first real blog post of the New Year has just been posted:  Near Real-Time OLAP using SSAS Proactive Caching.

I need to clear my plate so here are some of the gotcha’s, tips, and links I’ve saved up over the past couple of months.

I came across the following link today browsing through Google Reader:
While I like the idea, you still have to drag the table on to the query analyzer editor.
A tool I like to use during the data analysis stage of a BI project is called database browser.
There is even a portable version you can stick on a USB stick or SD card.
Using it, all I have to do is single left click on the table to see the results.  Awesome!
Microsoft SQL Server Analysis Services has some query performance issues when dealing with many-to-many relationships; especially if the intermediate dimension is especially large.
Here are some links to help you with this issue.


Recently I was working on an Analysis Services solutions that used a snapshot based fact table for its main measure group.  We set the aggregation function to Last Child; but, nothing was working.  This cube had multiple date dimensions and I discovered that we needed to have the date dimension we needed to slice by listed as the first in the list in the dimension usage tab for the last child aggregation to work properly.
Squeeze one more remote desktop connection out of RDP!  Did you know that if you add “/admin” after your shortcut to a RDP connection you can have three simultaneous RDP sessions instead of two?  The user account you use must have administrative rights on the box for this to work.
How to add a link to a SharePoint page?  Use the content editor web part.
Try this.  Open SSMS and click the new query button.  Paste some SQL code in there and hold down your ALT button while left clicking your mouse button and dragging it to select text horizontally.  You can also insert pasted clipboard contents in the same fashion by holding down the ALT key.
SharePoint master pages:
 Want to use Live Writer with
Connect with the following url:
Lost your last post?
Try finding the .wpost files you want in My Documents\My Weblog Posts\Recent Posts, and move them to the
My Weblog Posts\Drafts folder. I believe that should stop Writer from polling the server when you open the draft.
Gotcha!  PerformancePont Time Intelligence and SSRS
I was trying to use a PerformancePoint time intelligence filter to pass values to a SSRS report.  I discovered that the MonthToDate, YearToDate, QuarterToDate syntax does not work with SSRS.  Instead use Month.FirstDay:Day, Year:FirstDay:Day, Quarter.FirstDay:Day
SSRS – weird error encountered when multiple charts stacked on each other w/ visibility toggled by an object.  Width of report was all messed up until I added a horizontal line (which I made invisible by making it’s color blend in w/ background.)
SSRS tip: 
To open out into a new tab/window you would use ="javascript:void('http://www..... '))" in the action event
To open in the same window use:
="javascript:void('http://www..... + "','_self'))"
Oh how I love a good checklist of best practices!
Excel - How to make those neat gradiant filled data bars!
On the Home ribbon tab, inside the Styles group, click Conditional Formatting, point to Data Bars, and then, in the Gradient Fill group, select the blue (first) design.
Gotcha!  SSIS - XML
This came from a friend.  "I found out that it is a limitation in SSIS that it will not read the top most node in an XML document and pull the data.
 My solution to this was to create a XML Document called DummyNode that contained only this:
 I then merged the DummyNode.XML with the Source XML file into a merged file with the XML Task in SSIS.
 This made the top most node DummyNode and then it read the node underneath it that contained all of the Season data.
 Here is the link that helped me identify the limitation or known issue in SSIS:
Gotcha!  SSRS  - 
"Anyone come across an issue assigning view only permissions to a report library that holds SSRS reports, in which reports are not displayed?  It seems that they are only displayed if the user is given edit item permissions, which is not what we want."
The report and data source needs to published as a MAJOR VERSION in order for users with "view" only permissions.
SSRS - Geo Links!
Turn your addresses into Latitude and Longitude coordinates!
Pet peeves
  1. Unsorted tables.  I like all *Key (sk) columns at top, then all *ID (alternate keys, biz keys), then attributes, metrics, then maint (last modified, etc.)
  2. Cubes built using views built on views.  (push this to the ETL!!)
  3. Cubes built upon views that do a bunch of joins or data cleansing – push to etl!
  4. No Schemas defined in dw.    Dbo.
  5. Dimensions w/ no hierarchies.
  6. Ssis packages that are ugly.   At least use the auto format diagram option!
  7. Ssis packages that use ole db source for a component name.  or "data flow task".  Or "ole db destination."
  8. Sql code that is not formatted.
  9. Ssas – named queries  AND views
Live PerformancePoint demo site! Super cool! To get the right click functionality you have to click the compatibility icon in Internet Exploder. Go here and click on demonstrations:
Live PowerPivot demo site!
Password: P0werP!vot
What is Pragmatic Works Foundation?
The Pragmatic Works Foundation is a non-profit organization created to find passionate people who are interested in joining the technology field but cannot make the financial investments needed for training and hardware. The goal is to provide candidates the opportunity to learn new technologies from industry experts. In addition to offering free training, we work with our onsite recruiter to help place candidates who complete training with a new career in technology.
The deadline for applications is today!

Many to Many Relationships

I finally saw the movie Moneyball.  It was pretty good and I'd recommend it.  I do have two complaints. 
1.  There was only one computer in the whole movie
2.  Brad Pitt didn't play the data analyst
Bill James's Sabremetrics takes a back seat to a story about a manager trying to change the way things have always been done by replacing subjectivity with science.  The movie is based on the true story of Oakland A's baseball team manager, Billy Beane played by Brad Pitt.  It turns out that Peter Brandt, the analyst, also starred in another movie called Superbad.  (I wonder if it's any good?)
This week I've been ramping up on my PowerPivot skills.  I've read two books cover to cover on it; but, the only way to truly know it is to try and solve problems with it.  I had a presentation started using data from that just needed a little more work (or so I thought.)  I did a little research on APBRmetrics, the Sabremetrics of basketball, and got to work adding value to the data in PowerPivot.  DAX is truly a whole new way of thinking and wrapping my head around how to do things in DAX instead of SQL or MDX is a real challenge.  I'd like to do a full blog entry in the future of my experience and post the spreadsheet here when I am all finished.
I added a new blog entry that compares the Task Factory Address Parse Transform Component with the SSIS+ equivalent.  Please check it out, it's a pretty good read especially in light of the recent USPS news.  We can all help them cut costs by making sure our addresses are good so they're not out wandering around in the rain trying to deliver mail to bad addresses.
mmmm looks good
Link of the day!
I have got to go see this movie today! I taught a business intelligence class last week and one of the students worked for a professional basketball team and brought their data. We used the Excel data mining add-in to explore how we could use it for draft pick analysis. An excellent webinar for doing the same for fantasy baseball can be found here:
If you are interested in this sort of technology, be sure to read SQL Server 2008 Data Mining. Chapter two especially because it shows you how to do everything using the Excel data mining add-in.
Some neat utilities I came across this morning:
RichCopy - GUI frontend for RoboCopy.  Useful for copying or transfering large files.
NoReplyAll plug-in for Outlook 2007 or 2010 - Primary function is to prevent people from doing a reply-all to your message, or forwarding it.  Additionally, it includes a check for email goofs such as omitting attachments or subject lines.
SpeedLaunch - With Speed Launch, you can simply drag any file, document, or website URL into the Speed Launch bull’s eye on your desktop and tag it with any name you want. Then, any time you need it again, you can just go through the bull’s eye, type or click on the name, and there it is.
Screen Recorder - Very easy-to-use screen-to-video capture program, developed on top of Windows Media Encoder, that lets you easily capture what is going on to a small video file, which you can then send via e-mail to the appropriate person.
I found all of these at the link below so there may be some others you might find interesting. 
One tool that I absolutely find indenspensible is ResophNotes.  I use it for code snippets and plain text notes.  It syncs with and there is an Android app called mNote and I'm pretty sure there is an Iphone app.  I like the way I can choose how and where to store the notes making it a great portable usb app by the way.  I can store them in a database file or individual text files in a directory I choose.  I'd love it if they came up w/ a way to have a group share of code snippets.  I guess you could use a shared folder on the network or a shared account w/ simplenote but if there was a way to enfoce moderator approval of edits, deletions, and additions that would be awesome.
Pragmatic Works Informercial for BIxPress!  This is too funny!
My first webinar!
Zero to Dashboard- Intro to PerformancePoint
Come check out the PerformancePoint hotness! Mike will demonstrate the functionality in PerformancePoint services in SharePoint 2010 and show you how to quickly build some dynamic dashboards and reporting functionality your end users will crave.
It's been a long time...
...Since I've updated this blog.  I'm going to blame it on the migration to the new software and my fierce dedication to the tasks at hand. 
What's New BI page format change... new posts at the top.  Make sense doesn't it?
First things first...
Last post I described in great detail a technique I encountered for performing parallel loading of massive amounts of data into a single source using SSIS.  It was my intention to provide an example the very next day; but, the procrastination devil on my shoulder finally gave me a good reason not to do so.
 Balanced Data Distributor:
I intend to do a whole blog posting on making SSIS Speedy in preparation for my upcoming webinar so please, stay tuned!
BI Bootcamp -
I taught my first BI Bootcamp in Jacksonville last week and I think it was a pretty good success.  A few minor hiccups and some technical difficulties; but, overall I think everyone was very happy.  I really enjoyed sharing my knowledge and the students were very enthusiastic about learning all of the tricks of the trade.  Talking all day made me very tired!
A duplicate attribute key has been found while processing...
Aaargh!  Gotcha moment today.  As soon as Mike Davis told me the answer I knew it sounded familiar...  SSAS doesn't differentiate between a null value and an empty string.  Boy, I felt silly. 
One quick fix is to change the underlying view the dimension is processing against (you are using views aren't you?) with a simple case statement:
SomeColumn =
       when someColumn IsNull then 'UNKNOWN'
       when someColumn = '' then 'UNKNOWN'
       else SomeColumn
Of course, you would eventually push this view logic to the ETL which brings me to why I seldom encounter this error when I am designing a cube... I am usually the one that designed the ETL and I do this sort of thing from the get go.  Please don't misinterpret that as arrogance.  Make no mistake, I learned the hard way.
I was curious what a simple google search would produce so I Googled: "ssas duplicate attribute key has been found" and checked the top three results.
The first link had a little bit of a different answer buried a little bit:
If you have NULL values, you may need to change the null processing for the attribute.  In the dimension editor, select your problem attribute.  In the properties window, under the key column property is "Null Processing".  By default it's set to "Automatic".  Change it to "Unknown Member" and you should be all set.
(Kudos to : Christine C. M_ )
The second link didn't help at all for this particular cause of the error. 
The third link was the best. 
Baby's crying... gotta run.
Wow!  The past two weeks have been a whirlwind.  I've been heads down working with a team of very bright individuals on an extremely interesting project.
Whoever it was that said consulting is like a rocket ship to your brain sure hit the nail on the head.  I am thoroughly enjoying the challenges presented to me and totally love the fact that I am not just delivering a solution; but, passing on valuable knowledge and experience.  The hours are long, being away from the family is hard; but, at the end of the day it is comforting to work with people that really want to learn, participate, and do the best they can for themselves and the people they work with every day.

I’d love to share more; but, it’s a top secret project: need to know basis. 

I’ve been doing a lot of thinking about parallel processes in SSIS.  I recently modified the meta data driven version of the ETL framework on codeplex to support additional parallel processes.  No big deal there; but, it reminded me of a package I’d recently seen from another very bright individual.  He told me he found the idea and some of the code from a Google search.  I’d love to know the link. 
Basically that person developed a package to allow for configurable parallel processes at the individual package level. 
Nutshell is that he used the:

T=SQL to partition the incoming records.  For our example let’s say 100K rows need to be loaded.  He had a SSIS variable that held the number of parallel processes to launch (let’s say 4) and another one to store the current iteration of those four.   

So starting with iteration 1:
control flow: grab the iteration variable and set it to 2.  100K divided by 4 is 25K, Get rows 1 – 25K.
launch 2 parallel processes:
first: OLE DB Source SQL gets the first 25K using the row_number() over syntax.

Second: launch another version of the package ITSELF
This second package which is a clone of itself, reaches out to the table that holds the current iteration and gets 2 then sets it to 3.  It follows the same logic but loads 25k – 50k.  Simultaneously, launching another version of itself that repeats but loads 50K – 75K, and of course, it launches another clone that loads 75K – 100K.  Basically what you get is 4 packages loading 4 equal parts at the same time.

Very cool stuff!  And looking at the log tables I could see that 4 instances of the same package took different variables and loaded the data into the destination at the same time.

Up until then when I thought of parallel processes using SSIS I thought of it at the master package level launching different packages that loaded different data into different tables at the same time.  The difference here is that you are launching multiple processes to load one data set into one destination in multiple pieces at the same time.

I was unsure at first and I still wonder about the contention between multiple packages inserting data into the same table.  Apparently it wasn’t a problem in this case; because the load times decreased.  I imagine there would be a sweet spot to find for each environment where you would want to stop launching additional parallel processing tasks when overall load times started to increase.

I’d like to, time permitting of course… to take this a few steps further.  Create a master package ETL framework that not only can determine how many packages to launch at the same time; but, determine the largest data loads and break them out into multiple processes at the package level. 

So…. Wouldn’t this blog post be great if I uploaded actual examples instead of theories?   Maybe in the near future… 

Here's a totally unrelated cool link I liked:
What a week!  Great to be home!  The word of the week is PowerPivot!
Got a new tip to share too.  My esteemed colleague Jorge shared this with me:
"Trace Flag 1117 makes it so that if a file needs to auto grow and there are multiple files, they will all grow together at the same time. How to set SQL Server startup options (this is where you enable trace flags):

Paul Randal post regarding TempDB and how 1117 helps:

Another post regarding how it helps (mentions SAP but applies to SQL Server databases in general, application agnostic):
"I would recommend trace flag 1117 and 1118 for uniform extents to reduce SGAM contention and growth of all files in a filegroup at the same time, also want instant file initialization turned on."
Link of the week:

Computer learns language by playing games

By basing its strategies on the text of a manual, a computer infers the meanings of words without human supervision.
PerformancePoint Tip - Right clicking an analytic grid in the leftmost columns allows the user to select measures.  Some people may not want that, so how do you disable that ability?  PerformancePoint Services Application Settings - Select Measures Control - set maximum value to 0.  (default is 1000).
SSAS Tip - Implement measure security without resorting to cell level security (which can be a huge performance hit.)
I will be teaching the BI Boot camp from 09/12/11 - 09/16/11.  Sign up now!
SSRS - Oh the joy I had working with SSRS again this past week.  Dynamic Grouping.  Fun topic I should blog about it in further detail in the future.  Here are some links if you can't wait and want to guess what I am going to say.
I will say this now. 
SSRS tip #1
SAVE OFTEN.  Furthermore, save more often than you think you have to.  Save w/ descriptive names.  By descriptive names I mean:
SomeReportForSomePurpose = SRFSP
Saving is so frustrating in SSRS because there are at least 5 steps to even do it enough to truly protect yourself.   As soon as you File Save as it insists on removing the old one from your solution.  So if you really want to keep all versions in your solution you then have to add the old one back.  I think next time I'll have to try it another way, perhaps using zip files.  UPDATE: Zip files work great!
SSRS tip #2
If an MDX report is taking too long and has calculated measures, consider converting those MDX calculated measures to be straight facts loaded through the ETL process.  Only use MDX calculated measures when you absolutely need the OLAP type functionality and pivot ability  If you can get the metric into the ETL process and pull it straight into the cube as a regular measure than it will improve query performance immensely.
Great Links:
10 General Programming Tips for Programmers
10. Do not Fear.
9. Divide and Conquer
8. Assume that coding is like writing a Novel and write it with lot of interest so the developers maintaining it would love to read it.
7. “Code as if every mistake you make will be discovered and you will be held accountable.” – Anonymous.
6. Be a Good listener first .
5. Be Open, learn to say no if it is not possible.
4. Diving into code is the best way to figure out things.
3. Be Relaxed even when solving one of the toughest problem .
2. Never Give up , you will get it .
1. Keep it Simple
This looks like a neat tool for a freebie!  I think I'll play around w/ it some more and create a blog post about it in the future.  Maybe compare it with Pragmatic Works BI xPress which I think has some similar functionality (and then some!)
Today I learned something new about SSIS, the OLE DB Destination component and the fast load option.  Someone told me the reason they didn't have the fast load option set for some of their OLE DB Destinations was because they wanted the package to fail if column truncation was going to occur.  That sounded a little funny to me so I did some research and the best I could find was something about how fast load wouldn't give you as much information about errors.  So I made a quick and dirty test and lo and behold he was right!  Using the OLE DB Destination component without the fast load would cause package failure if you insert a 40 character long string into a varchar 10 destination.  Using the fast load component, it just truncated the column and did not fail the package.  The SQL Server destination component behaved properly and also failed.
Categories: Blogs
Rate this article:
No rating

Mike MilliganMike Milligan

Other posts by Mike Milligan

Please login or register to post comments.