Business Intelligence Blogs

SSIS–Um…. How To Preserve Umlauts from a UTF-8 Flat File in Düsseldorf? (or anywhere else for that matter…)

  • 3 May 2012
  • Author: Mike Milligan

Recently I had the pleasure of loading a UTF-8 encoded flat file into a SQL Server 2008 table using SSIS.  I'd like to share my experience in the hope that it will help someone else with a similar issue.  It might also help me if I come across the situation again and forget how I got past it.

 

The first indication of a problem was the following error message: 

The data conversion for column "Column 37" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page."

The SSIS output also told me what row of data was causing the problem:

" on data row 1415."   (Search the output for "on data row" if you need to find out which row your process failed on.)

I verified that the column was wide enough, so I opened up EditPadLite and loaded the offending file.  I pressed Ctrl+G to go to a specific row and saw that the row had a column with the text: Düsseldorf.  Without going into detail about all the things I tried that didn't work, I will explain what finally allowed me to bring Düsseldorf into the table with the umlaut preserved.
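
If you don't have an editor handy, a few lines of Python can hunt down the offending rows for you. This is only a rough sketch: the function name and the sample rows are made up for illustration, and the row numbers count every line (including any header), so they may be offset from the "data row" number SSIS reports.

```python
def find_non_ascii_rows(lines):
    """Yield (row_number, text) for each row containing a non-ASCII character."""
    for row_number, text in enumerate(lines, start=1):
        if any(ord(ch) > 127 for ch in text):
            yield row_number, text.rstrip("\r\n")


# Hypothetical sample standing in for the real file contents.
sample = ["id,city\n", "1,Berlin\n", "2,Düsseldorf\n"]
hits = list(find_non_ascii_rows(sample))
print(hits)
```

For a real file you would pass an open file object instead of the list, for example open(path, encoding="utf-8"), where path is whatever your file is called.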

These are the types of errors I was getting as I struggled to EDIT the components:

The column "Column 2" cannot be processed because more than one code page (65001 and 1252) are specified for it.

Codepage in output column 0 is 1252, but has to be 65001.
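
To see why mixing code pages 65001 and 1252 matters, here is a small Python illustration. It is not what SSIS does internally, just a sketch of what happens to the same bytes when they are read with the wrong code page.

```python
city = "Düsseldorf"

# In UTF-8 (code page 65001) the ü takes two bytes, so the byte count
# is larger than the character count.
utf8_bytes = city.encode("utf-8")
print(len(city), len(utf8_bytes))

# Reading those bytes as Windows-1252 turns the two-byte ü into two
# separate characters.
as_1252 = utf8_bytes.decode("cp1252")
print(as_1252)

# Reading them as UTF-8 gives the original text back.
as_utf8 = utf8_bytes.decode("utf-8")
print(as_utf8)
```

The extra byte is also a reminder that a column sized in bytes can be tighter than it looks once non-ASCII characters show up.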

To check the file's encoding in EditPadLite, I clicked Convert, then Text Encoding, from the menu.

This told me the file's encoding was Unicode UTF-8.
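
The same check can be roughed out in Python if you want a second opinion. This is a minimal sketch with a made-up function name. Keep in mind that a pure ASCII file is also valid UTF-8, so a "valid" answer only tells you the bytes can be read as UTF-8, not that the file was intentionally saved that way.

```python
import codecs


def describe_encoding(raw: bytes) -> str:
    """Give a rough description of whether raw bytes look like UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf-8"
    return "valid utf-8 (no BOM)"


# The same text saved two different ways.
print(describe_encoding("Düsseldorf".encode("utf-8")))
print(describe_encoding("Düsseldorf".encode("cp1252")))
```

For a real file, read it in binary mode (open(path, "rb").read(), with path being your own file name) and pass the bytes in.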

The next steps have to be followed exactly and in the correct order.  Editing the existing components will not work.  If you already have ANY of these three objects, they MUST be deleted and re-created from scratch:  the Flat File connection manager, the Flat File Source, and the OLE DB Destination.  Seriously, no joke.

Note:  Before you delete your destination, you may want to sort the mappings by destination column and take screenshots if your incoming columns are listed as Column 1, Column 2, etc., because you will have to remap them.

 1.  Create your Flat File connection manager using code page 65001 (UTF-8).  On the Advanced tab, change the DataType property of any column that contains umlauts to Unicode string (DT_WSTR) or Unicode text stream (DT_NTEXT).

 2.  Create your Flat File Source component.

 3.  Create your OLE DB Destination component.  Change the DefaultCodePage property to 65001 and AlwaysUseDefaultCodePage to True.  Hook your source up to the destination and do your mappings.  (Your target column needs to be nvarchar or another Unicode-capable data type.)

 Sounds simple, but believe me: if one step is out of order, you simply can't fix it by editing, and you will go out of your mind trying to.  You HAVE to do it in the exact order I have described.

More info

Best Fit in WideCharToMultiByte and System.Text.Encoding Should be Avoided.

BIDS 2008 R2 inserts phantom codepage(s) into SSIS components irrelevant to format/locales/codepages of data and software used (blocking the SSIS tasks

Flatfile Import: Persistent data conversion errors
