Communifire Blogs

cprice1979 : Most Recent postings

Making the Case for Statistical Semantic Search

2 days ago by cprice1979  -  Comments: 0  -  Views: [93]

As we sit on the cusp of SQL Server 2014, it seems a little odd to be writing a blog post whose objective is to introduce FileTable and Semantic Search. While both of these features were new in SQL Server 2012 and both received quite a bit of attention, I have found recently that they are still either misunderstood or simply overlooked. Either case is unfortunate because both features are very powerful and open up a number of possibilities in handling what is commonly referred to as unstru...

#Mahout Recommendation Engines: Part 2 - Ride the Elephant

5 days ago by cprice1979  -  Comments: 0  -  Views: [220]

In Part 1 of this blog series we built a foundation by introducing the various techniques that can be used to generate recommendations for products or items to your users. In this post, we begin looking at Mahout as a platform for building a recommender, including setting up a data model, common methods for calculating similarity and, finally, the algorithms used to generate recommendations. Understanding Recommendations in Mahout: Mahout is a machine learning library of algorithms that grew out ...

#Mahout Recommendation Engine: Part 1 - Types of Recommenders

5 days ago by cprice1979  -  Comments: 0  -  Views: [393]

Recommendation engines have become a pervasive, daily part of our digitally connected lives. Whether you're shopping on Amazon or reading news articles on your Yahoo! home page, the products and news you are offered are the result of some implicit or explicit behavior that drives a computational engine, which uses patterns to predict (hopefully successfully) your likes and dislikes in order to serve up recommendations. While this technology is nothing new, advancements in toolsets have made th...

Hello My Name is Sqoop

20 days ago by cprice1979  -  Comments: 0  -  Views: [548]

In my previous posts we looked at different means and methods for loading and subsequently working with data in a Hadoop environment. Largely missing from the discussion to date, however, is how SQL Server and other relational databases play in this sandbox. While there are multiple points of integration, the focus of this post will be on the SQL-to-Hadoop tool better known as Sqoop. Have a Double Sqoop: Sqoop is a relatively new command-line tool whose primary purpose is efficiently moving data betw...

#SQLPASS Abstract Review - My Perspective

26 days ago by cprice1979  -  Comments: 0  -  Views: [457]

I have been fortunate enough to participate as a team lead for the past two years on the abstract review committee for PASS Summit, and I wanted to take a moment to provide some feedback based on my own personal experiences. First, this year was by far the toughest. The quality of abstracts was phenomenal, which made the job of abstract review and session selection very tough (this is a good thing, btw). Much of this is not new. I am hoping that it will help you make more sense of the abstract review...

PASS Summit 2013

28 days ago by cprice1979  -  Comments: 0  -  Views: [394]

It's official!! I will be presenting a session on HDInsight and Predictive Analytics at PASS Summit 2013 in Charlotte, North Carolina. This is the first time the event is being held in Charlotte instead of Seattle, and while I have attended previous Summits for many years in various capacities, this year is special as it will be my first time presenting. I hope you will consider joining me this year at PASS Summit! For more information, check out the official website at: http://www.sqlpass.org...

MapReduce Ninja Moves: Combiners, Shuffle & Doing A Sort

5/20/2013 by cprice1979  -  Comments: 0  -  Views: [547]

Who's driving this car? At first glance it appears that, as a developer, you have very little, if any, control over how MapReduce behaves. In some regards this is an accurate assessment. You have no control over when or where a MapReduce job runs, what data a specific map job will process or which reducer will handle the map's intermediate output. Feeling helpless yet? Don't worry: the truth is that despite all that, there are a number of ninja techniques you can use to take control of how data moves...

Tuning Multi-Dimensional Cube Processing

5/11/2013 by cprice1979  -  Comments: 1  -  Views: [753]

In my last post ( HERE ) we talked about troubleshooting and resolving issues with problematic MDX queries. In this post we will look at techniques to tune and troubleshoot the processing side of your Analysis Services cube. Understanding Cube Processing: Some of the common questions I hear as a consultant are "Why does my cube take 4 hours to process?" or "How can I reduce the time it takes to process my cube?". The answer to both of these questions starts with identifying the processing bottlen...

Troubleshooting MDX Queries

5/11/2013 by cprice1979  -  Comments: 1  -  Views: [903]

In this post I am going to deviate from Hadoop and HDInsight to focus on SQL Server Analysis Services Multidimensional and, more specifically, MDX queries. As a consultant, one of the common issues I encounter, more so than design, is performance. Typically, the performance issues SSAS users encounter occur in one of two realms: cube processing and query execution. While this post will focus on the latter, we start by establishing a higher level of understanding of what happens when an XMLA c...

MapReduce - First Glance

5/4/2013 by cprice1979  -  Comments: 0  -  Views: [362]

In my last post, we took a helicopter tour of the MapReduce framework and its many facets. I believe it's important to have a functional understanding of MapReduce even if you never intend to work directly with it, since the more user-friendly abstractions of both Pig and Hive depend on it. In this post we will again turn to Java as we let our fingers do the walking to build our first MapReduce program. For this demo we will start slowly, implementing first the map and reduce functions...

Map/Reduce - A Brief Introduction

4/24/2013 by cprice1979  -  Comments: 0  -  Views: [672]

Somewhere between teaching a BI Bootcamp class and wrestling my troop of kids, I promised myself I would get a blog post in this week. Luckily, I've had a few code-heavy posts, so we will dial it back slightly as I briefly introduce MapReduce for Hadoop/HDInsight. Most of the MapReduce posts I've seen to date talk very specifically about how to implement a C# MapReduce job on HDInsight. Before we go there, I think it's a topic that deserves a somewhat more abstract/academic discussion so that w...

MMM More Bacon - Pig User-Defined Functions (UDFs)

4/20/2013 by cprice1979  -  Comments: 0  -  Views: [571]

Okay...okay...I know...the pig jokes are lame and getting old by now...maybe a picture of a kitten dressed like a pig will cheer you up. Luckily this is the last of my introductory Pig posts before moving on to MapReduce. In this post we are going to spend some time creating and playing around with Pig User-Defined Functions (UDFs). We will look at what they are, how they are developed and ultimately leveraged as operators within your Pig Latin scripts. So without further ado... What is a Pig U...

Moving Day!

4/17/2013 by cprice1979  -  Comments: 0  -  Views: [404]

Whew! What a year it's been... It's been a crazy year, from writing books to volunteering and speaking at events throughout the country (all while still managing to do my regular day job). Now that I've got a handle on things it's time to do a little housekeeping. That being said, I am in the process of moving my blog over to WordPress. I will continue to "simul-post" my work here but will no longer spend the usual 30+ minutes per post that it takes to tweak the layout. Feel free to check out my...

Shakin' Bacon: Using Pig To Process Data

4/17/2013 by cprice1979  -  Comments: 0  -  Views: [1139]

In my last post (see HERE ), I introduced the Apache Pig project and showed you the equivalent of the "Hello World" demo in Pig. In this post, we are going to use the GSOD (Global Summary of the Day) station weather reports to calculate the average maximum daily temperature for each station. If you have not loaded the data, please see my previous post on Preparing and Loading data. Notes & Considerations: You will need to set up and use the PiggyBank UDFs (User Defined Functions) library. For ...

When Pigs Fly: An Apache Pig Introduction

4/15/2013 by cprice1979  -  Comments: 0  -  Views: [572]

In previous posts, we have looked at what it takes to get started with Hadoop on Windows using HDInsight. We also looked at Hive, which is the data warehousing framework built on top of Hadoop. In this post, we will dig a little deeper into the Hadoop ecosystem, focusing on the parallel language and runtime known as Pig. Pig, More than just bacon: Pig got its start at Yahoo in 2006, originally created as a research tool intended to allow for ad-hoc queries and exploration of large semi-str...

Preparing Data for Hadoop

4/12/2013 by cprice1979  -  Comments: 0  -  Views: [516]

In my next couple of blog entries, I will be focusing on Pig and then MapReduce. Before that, however, I need to prepare a dataset and get it loaded into HDFS. The data that I will be working with is weather data, specifically the NOAA Global Summary of the Day (GSOD) data available for over 9,000 weather stations. GSOD data can be downloaded from the NOAA FTP site using the following address: ftp://ftp.ncdc.noaa.gov/pub/data/gsod . For this demo, I am only going to focus on a single full year's wo...

Being Productive with HDInsight

4/9/2013 by cprice1979  -  Comments: 0  -  Views: [513]

This post will be the holding place where I put misc. tools and tips for HDInsight.

Build Tools
1. Apache ANT ( http://ant.apache.org/manual/install.html ). Extract the archive to c:\ant\ then add Ant to the PATH:
   set ANT_HOME=c:\ant
   set PATH=%PATH%;%ANT_HOME%\bin
2. Apache IVY ( http://ant.apache.org/ivy/history/latest-milestone/install.html ). Copy Ivy.JAR to the Ant lib folder.
3. Git Client ( http://git-scm.com/downloads )

Data Preparation/Research Tools
1. CURL ( http://curl.haxx.se/dow...

HIVE on HDInsight: First Glance

4/7/2013 by cprice1979  -  Comments: 0  -  Views: [735]

Hive Introduction: Within the Hadoop ecosystem, you can use HDFS to load and store data and MapReduce to do both simple and hardcore processing. One of the missing pieces of the puzzle that is familiar to data warehousing professionals is the ability to interact with the data. Enter Hive. Hive got its start at Facebook as they struggled to deal with the massive quantity of data accumulating daily within their Hadoop cluster. While it was easy for developers to write MapReduce jobs in a variety of ...

Installing Mahout for HDInsight on Windows Server

4/4/2013 by cprice1979  -  Comments: 0  -  Views: [791]

I am passionate when it comes to analytics, data mining and machine learning, and I think most organizations do too little in this arena. That's why one of my favorite parts of the Hadoop ecosystem is Mahout. Mahout is a scalable machine learning library that includes multiple out-of-the-box machine learning and data mining algorithms, including clustering, classification, collaborative filtering and frequent pattern mining. If you are using HDInsight in the cloud, Mahout comes pre-in...

Installing HDInsight

4/4/2013 by cprice1979  -  Comments: 0  -  Views: [767]

It's been a while since I've had the opportunity to blog, so when I decided to install HDInsight on a VM, I figured what better opportunity to get back in the swing of it. The Jumping Off Point: To get things started, I am using VirtualBox as my VM host and I am running a fully patched (all 150+ of them) version of Windows Server 2008 R2. Not that it's relevant, but to be thorough, I've also installed SQL Server 2012 SP1 as it will be used in subsequent blogs. A Tale of Two Installers: Before we div...

MDS 2012: Part 6–Business Rules

7/17/2012 by cprice1979  -  Comments: 2  -  Views: [3703]

In Part 6 of this Master Data Services blog series we will look at how we enforce quality standards and ensure accuracy in our master data by implementing business rules. In the prior parts to this series, we have spent time reviewing important Master Data concepts and MDM architectures. We also looked at configuring MDS before learning about the model, entities, attributes and members. In our last post we looked at derived and explicit hierarchies before being introduced to collections. Series ...

MDS 2012: Part 5–Hierarchies and Collections

7/11/2012 by cprice1979  -  Comments: 0  -  Views: [3509]

In continuing this blog post series on Master Data Services 2012, we will dial in on MDS Hierarchies and Collections. So far in this series we have spent time reviewing important Master Data concepts and MDM architectures. We also looked at configuring MDS. Finally, in the last post we started to dig deeper into MDS by looking at Models, Entities, Attributes and Members. Series Index: Part 1 - Understanding Master Data; Part 2 - Master Data Management Architectures; Part 3 - Installing Master Data ...

Build Configurations in SSIS 2012

7/8/2012 by cprice1979  -  Comments: 1  -  Views: [10700]

Although not new in SSIS 2012, Build Configurations have become exponentially more useful with the introduction of parameters and the new project deployment model. Before we dive in to see how useful this feature is, let's take a moment to review parameters and the project deployment model. Parameters: Parameters are a new feature intended to replace and simplify configuration of SSIS packages when running under the new project deployment model. They are treated like read-only variables and have ...

MDS 2012: Part 4–Models, Entities, Attributes & Members

6/29/2012 by cprice1979  -  Comments: 0  -  Views: [4207]

It's been more than a month since my last post; unfortunately, this crazy thing called work kept me away for far too long. We will build some momentum and get back on track in this post. In the first three posts of this series we spent time building a foundation for understanding important master data concepts and architectures. We also spent time setting up and configuring Master Data Services. In this post, we will get to the meat of MDS 2012. Specifically, we will dial in on the model, entitie...

MDS 2012: Part 3–Installing Master Data Services

4/23/2012 by cprice1979  -  Comments: 0  -  Views: [8338]

In the first two parts of this blog series we spent time talking about master data, master data management (MDM) and the architectural patterns that are prevalent in MDM solutions. In this post we will start narrowing our focus to Master Data Services (MDS) in SQL Server 2012 by starting with the installation and set-up process. Pre-Requisites: Before we get started there are a couple of pre-requisites to be aware of. The first and one which will cause you problems in numerous places but will not...
