posted 11/12/2009 by MikeDavis - Views: [3161]
When creating a data flow in SQL Integration Services, it is debatable to place a set of derived columns in a single series path or to use a conditional split to break them up to perform the derived columns only on the rows that need it. So I set up a little test to determine the difference. I created a flat file with over 200,000 rows and i wanted to use three derived columns to change the color column to the hex value. I could have placed all of the code in one derived column, but the point is to compare parallel to series so I used three derived columns instead. This way the two data flows are doing the exact same thing.
Here is my parallel dataflow.
Here is my series dataflow.
I ran the parallel dataflow and got a time of 4.212 seconds.
I ran the series dataflow and got a time of 3.323 seconds.
I didn't realize this when i used many times and tested now... You are right Mike, Series dataflows are better in performance. Thanks
It greatly depends on the transformations used that will determine the performance. In this case, it seems to make sense as Mike stated since the Union All transformation is semi-blocking. However, if in the control flow, you will more than likely notice gains when running in parallel rather than series. Of course this depends on many other factors, but generally speaking it will be the case.