Five SSIS Best Practices you can Take to the Bank

Who is online?  0 guests and 0 members
Home  »  Articles  »  Five SSIS Best Practices you can Take to the Bank

Five SSIS Best Practices you can Take to the Bank

change text size: A A A
Published: 12/16/2009 by  DevinKnight  - Views:  [2891]  

In any Business Intelligence project you intend on taking on it is a good idea to build a set of best practices either for yourself or for your company to follow to ensure a successful project.

Having a list of best practices does several things for you

·         Makes you a better developer by producing a better end product

·         Allows you to standardize development therefore saving time

·         Easily explains why you did what you did, especially useful for consultant handing off a project

Why do I need a best practices list?

The benefits of having a list seem obvious so why do people still not do it?  Lack of time is usually the reason most people give for not having a list or maybe they do have a list just not in writing.  Instead of thinking there is not enough time I want you to think about how much time this will be saving you in the long run.  Spending even a half day a day of work on this list could save you hundreds of hours in your career.  Hopefully after reading this I will have convinced you it is essential to not only have a list but to put it in writing so you have a quick reference guide. 

Best Practices

You may have SSIS best practices that I haven’t included in my list and I would love to hear from you if you do.

1.       Have a good naming conventions

Tasks, Transforms, Packages, Connection Managers and Variables should all be given names different from the default.  You may already do this but be consistent with it. 

Tasks and Transforms

Jamie Thomson has a great best practices blog that includes a list of acronyms for all tasks and transforms.  With his method for each of the tasks and transform you would use the task acronym and then a small description of what the task does.  For example, I have a Data Flow Task that loads my employee table so I would name the task DFT – Load Employees.

The main goal is to make it easier on yourself long term.  If the package you develop fails and you have not changed the tasks names you could get an error that says Script Task 3 failed.  This can be incredibly painful to debug especially if you are taking over the work of another developer. 

Packages

You can go many different ways with this one.  My preferred method is:

 {Stage | Process | Load}_{ Dim | Fact}_{StageTableName | DimensionTableName | FactTableName | ObjectName}

Ex. Load_Dim_Product.dtsx or Process_SalesCube

Again consistently is everything with this one.  I’ve seen others use a namespace approach (Ex. CompanyName.DataWarehouse.LoadDimProduct.dtsx).  However you decide to name your packages make it the same across your entire solution. 

Connection Managers

If you use the same connection in all of your packages then make sure to consistently name it across all those packages.  Here’s my Connection Manager naming convention method:

{ConnectionType}_{DatabaseName | FileName}_[FileType]

Ex. OLEDB_FinanceDW  or FLATFILE_EmployeeExtract_TXT

Variables

Variables are one of the most important objects to have good naming conventions with.  I prefer to use Hungarian notation.  This method identifies the variable data type as part of the name.  The reason it is so important to have this kind of naming convention with variables is because knowing variable data types is essential when working throughout an SSIS package.  For example, if you are building an expression using variables you will likely need to know the data type for the expression to evaluate correctly.  Hungarian notation is formatted like this:

[DataType][VariableName]

Ex. strFullName

2.       Create an auditing framework for your packages

Auditing your SSIS packages is a great way to keep track of information about your package executions.  SSIS has a method of natively logging but unfortunately the native logging is limited to what it provides and the possible storage methods are not optimal for reporting.  The ultimate goal in the end is to be able to report on your package executions.  That’s why I suggest creating your own or buying a tool that provides does it for you.  Here are some of the most important things you should track in your auditing framework. 

Minimum of Things to Track

Whether the package completes with errors, warnings or neither (success)

How long the package took to run

How many rows were loaded

When packages complete with errors list the errors

Luckily this information is easy to find in SSIS.  System variables and row counts can handle most of these.  For example, error messages are stored in the system variable ErrorDescription, which is only available in the EventHandlers. 

Each package run should be tracked separately so you can compare the performance between runs.  I suggest storing the auditing data in a database by itself so your SSIS package is not writing to another database that is essential to your company. 

The last step is to develop a set of reports to display the information you’ve just collected.  Here you should make it easy to get a understanding on how your environment is performing and if you have any errors to address.

I make it sound easy describing it in a couple paragraphs but this can actually involve some very complex logic.  Even a fairly simple Auditing Framework could take up to a week to perfect but once it is complete you have that framework you can use on any project.

3.       Use Annotations

Annotations are probably one of the simplest best practices to follow (Right there with naming conventions).  It is a simple method of documenting what is going on in the package you have developed.  For some reason it is also one of the most frequently ignored. 

Imagine you developed a package 6 months ago and haven’t touched it since.  You never bothered to add annotations to your package.  Now the package fails and you have no idea what the package does anymore.  Spending just a few moments to add this simple documentation could have been a lifesaver.  Even if you know your time is limited at a company it is only good karma to do this not only for yourself but for others.

4.       Utilize Configuration Files on Connections

Configuration files allow your package to dynamically change package properties from outside the package at runtime.  The property values are stored in an XML file and can easily be edited to manipulate how the package runs each time.  They can be used on just about any object in a package but my suggestion is at the very least on each data source. 

The easiest way to show a benefit to using configuration files is how it helps in moving through the development cycle.   For example, you have 50 SSIS packages and you’re ready to move from your development environment to a testing phase.  Instead of opening all 50 of your packages to change data sources to point to your new testing databases you can make the change just once in the configuration file and your done.  When the package runs it will refer to the settings in the file instead of what is in the package. 

Many developers like to use one configuration file for all packages instead of one for each data source.  I discourage doing this because that if you place all your data sources in a single configuration file then regardless of whether a package needs the connection or not it the file now mandates that it is present in the package.  If you do not include them all then you will receive the following error:

For more details on using configuration files download a whitepaper I coauthored on Managing and Configuring SSIS

5.       Use Checkpoints to restart packages

Checkpoints are a great tool in SSIS that many developers go years without even testing.  With Checkpoints enabled on a package it will save the state of the package as it moves through each step or task and place it in a XML file upon failure of the package. If your package does fail you can correct the problem in your package and rerun from the point of the tasks that did not successfully run the first time. Once the package completes successfully the file is no longer needed and automatically discarded.

Just imagine your package is loading a table with 10 million records. Your package passes the Data Flow that performs this huge load without any problem (Other than the fact that it took two hours to load). The next task in your package is a Send Mail Task and for some reason fails.

You correct the problem in the Send Mail Task, but without using Checkpoints your package would still have to run that Data Flow that loads the 10 million records again (taking another two hours) even though you’ve already done it once. If you had enable Checkpoints on this package you could simply correct the problem in the Send Mail Task and then run the package again starting at the Send Mail Task.  Sounds great right? 

For step by step directions on enabling checkpoints read my blog on Enabling Checkpoints in SSIS.

 
0
/5
Avg: 0/5: (0 votes)

Comments (6)

steve_bauman
steve_bauman said:
Thanks Devin for providing naming convention tips and cautions. This is definately an area we need to plan out before we convert some of our data warehouse processes over to SSIS.
12/3/2009
 · 
 
by
bretupdegraff
bretupdegraff said:
Great article Devin - I will be sure to share this with our team -hopefully they are already doing these ;-)
12/16/2009
 · 
 
by
jtarnott
jtarnott said:
We get around the connection string validation problem with a common configuration file by using expressions. One may override parts of the connection string in the "Expressions" section of a connection manager properties page. As I recall, with this technique there may still be a warning about a configuration element not being found in a package if that variable is not defined, but it doesn't stop it from running. If several packages have to be deployed to many separate production environments, each with its own site-specific server name and database name, we can have the packages use a configuration file customized to each site with that data, and concatenate into the run time an additional package-level configuration file for variables specific to the package, a file that may then be identical in all sites.
12/21/2009
 · 
 
by
rupeshjpatel
rupeshjpatel said:
Very useful article., Thanks.
12/21/2009
 · 
 
by
ashishnaik1
ashishnaik1 said:
Great article Devin - We are formalizing standards for SSIS development. This is like just in time!
12/24/2009
 · 
 
by
vinay
vinay said:
Thank you very much Devin..
9/20/2011
 · 
 
by

Most Recent Articles