Recently I had the pleasure of pulling a UTF-8 encoded flat file into a SQL Server 2008 table using SSIS. I'd like to share my experience in the hope that it will help someone else with a similar issue. It might also help me if I come across the situation again and forget how I got past it.
The first indication of a problem was the following error message:
The data conversion for column "Column 37" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page."
The SSIS output also told me which row of data was causing the problem:
" on data row 1415." (Search the output for "on data row" if you need to find out which row your process failed on.)
I verified that the column was wide enough, so I opened EditPadLite and loaded the offending file. I pressed Ctrl+G to go to the specific row and saw that the row had a column with the text: Düssledorf. Without going into detail about all the things I tried that didn't work, I will explain what finally allowed me to bring Düssledorf into the table with the umlaut preserved.
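If you don't have an editor with a go-to-line feature handy, a few lines of Python can do the same lookup. This is only a rough sketch: the function names and the sample data are made up for illustration, and I'm assuming the row number SSIS reports lines up with the line number in the file (a header row or skipped rows would shift it).

```python
import io

def non_ascii_chars(line):
    """Return (position, character) pairs for every non-ASCII character in a line."""
    return [(i, ch) for i, ch in enumerate(line) if ord(ch) > 127]

def inspect_row(lines, row_number):
    """Return the text of the 1-based row and the non-ASCII characters found in it."""
    for n, line in enumerate(lines, start=1):
        if n == row_number:
            text = line.rstrip("\r\n")
            return text, non_ascii_chars(text)
    raise ValueError(f"input has fewer than {row_number} rows")

# In-memory sample data (hypothetical). For a real file, pass
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
sample = io.StringIO("1,Berlin\n2,Düssledorf\n3,Hamburg\n")
text, hits = inspect_row(sample, 2)
print(text, hits)
```

Any character it reports is one that cannot survive a trip through a plain ASCII column, which is a good hint about where the conversion error is coming from.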
These are the types of errors I was getting as I struggled to EDIT the components:
The column "Column 2" cannot be processed because more than one code page (65001 and 1252) are specified for it.
Codepage in output column 0 is 1252, but has to be 65001.
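To see why mixing the two code pages matters, here is a small Python illustration of what happens to the same text under code page 65001 (UTF-8) and code page 1252. It only demonstrates the byte-level difference; I'm not claiming this is what SSIS does internally.

```python
city = "Düssledorf"

# Code page 65001 (UTF-8) stores 'ü' as two bytes: 0xC3 0xBC.
utf8_bytes = city.encode("utf-8")

# Reading those UTF-8 bytes as Windows-1252 turns the one 'ü' into two characters.
misread = utf8_bytes.decode("cp1252")

# The same text needs a different number of bytes in each code page.
len_utf8 = len(utf8_bytes)
len_1252 = len(city.encode("cp1252"))

print(misread, len_utf8, len_1252)
```

The extra byte is also worth keeping in mind when a column looks "wide enough" but was sized by counting characters rather than bytes.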
To confirm the file's encoding, I clicked Convert, Text encoding from the menu in EditPadLite.
This told me the file's encoding was Unicode UTF-8.
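The same check can be approximated without an editor. The sketch below is a simple heuristic, not a full encoding detector: it reports whether the raw bytes start with a UTF-8 byte order mark and whether they decode cleanly as UTF-8. The function name is my own.

```python
import codecs

def describe_encoding(raw):
    """Report whether raw bytes start with a UTF-8 BOM and decode as valid UTF-8."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"utf8_bom": has_bom, "valid_utf8": valid}

# For a real file, read it in binary mode: open(path, "rb").read()
print(describe_encoding("Düssledorf".encode("utf-8")))
print(describe_encoding("Düssledorf".encode("cp1252")))
```

A file saved in code page 1252 that contains an umlaut will normally fail the UTF-8 check, because a lone 0xFC byte is not valid UTF-8.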
The next steps have to be followed exactly and in the correct order. Editing existing components will not work. If you already have ANY of these three objects they MUST be deleted and re-created from scratch: Flat File connection manager, Flat File Source, OLE DB Destination. Seriously, no joke.
Note: Before you delete your Destination you may want to sort the columns by the destination column and take screenshots if your incoming columns are listed as column 1, column 2, etc. because you will have to remap them.
1. Create your Flat File connection manager using code page 65001 (UTF-8), and on the Advanced tab change the DataType property of any columns that contain umlauts to Unicode string (DT_WSTR) or Unicode text stream (DT_NTEXT).
2. Create your Flat File Source component.
3. Create your OLE DB Destination component. Change the DefaultCodePage property to 65001 and AlwaysUseDefaultCodePage to True. Hook your source to the destination and do your mappings. (Your target column needs to be nvarchar or another Unicode-capable data type.)
Sounds simple, but believe me, if one step is out of order you simply can't edit it afterwards, and you will go out of your mind trying to. You HAVE to do it in the exact order I have described.
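Step 1 asks you to change the data type of every column that contains umlauts, which is tedious to work out by eye in a wide file. Here is a rough Python sketch that lists the columns containing any non-ASCII character. The function name, the comma delimiter, and the sample data are my own assumptions, and the indexes are 0-based, so check how they line up with the column names in your connection manager.

```python
import csv
import io

def columns_with_non_ascii(lines, delimiter=","):
    """Return sorted 0-based indexes of columns containing any non-ASCII character."""
    found = set()
    for row in csv.reader(lines, delimiter=delimiter):
        for index, value in enumerate(row):
            if not value.isascii():  # str.isascii() needs Python 3.7 or later
                found.add(index)
    return sorted(found)

# In-memory sample data (hypothetical). For a real file, pass
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
sample = io.StringIO("1,Berlin,DE\n2,Düssledorf,DE\n3,Zürich,CH\n")
unicode_columns = columns_with_non_ascii(sample)
print(unicode_columns)
```

The columns it reports are the candidates for DT_WSTR or DT_NTEXT in the connection manager and for nvarchar in the target table.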