Character sets revisited

When I wrote the article on “Enhanced character set support in DCMTK”, I mentioned in the last sentence that this was only a first step. Now, a year and a half later, it’s time to describe what has been done since then.

First of all, we added support for the recently approved DICOM character sets GB2312 and GBK for Chinese text encoding, which were introduced with CP-1234. Second, the API of the “dcmdata” classes has been extended: a DICOM file or dataset can now be converted not only to UTF-8 (Unicode) but to any supported character set, e.g. ISO 8859-1 (Latin-1) or even ASCII. This new feature is also available in the command line tool “dcmconv”.
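
To illustrate what a conversion to a narrower target character set involves, and why it can fail, here is a minimal, self-contained sketch that converts UTF-8 to ISO 8859-1. The function name `utf8ToLatin1` is hypothetical and not part of the DCMTK API; DCMTK performs the real conversion through its own classes, so this is only meant to show the principle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not a DCMTK function).
// Converts a UTF-8 byte string to ISO 8859-1 (Latin-1).
// Returns false if the input is malformed or contains a character
// that Latin-1 cannot represent (any code point above U+00FF).
bool utf8ToLatin1(const std::string &utf8, std::string &latin1)
{
    latin1.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80)
        {
            // 7-bit ASCII is identical in both encodings
            latin1 += static_cast<char>(c);
        }
        else if ((c == 0xC2 || c == 0xC3) && (i + 1 < utf8.size()))
        {
            // U+0080..U+00FF are encoded as two bytes: C2/C3 + continuation
            const unsigned char next = static_cast<unsigned char>(utf8[++i]);
            if ((next & 0xC0) != 0x80)
                return false; // not a valid continuation byte
            latin1 += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
        }
        else
        {
            // longer sequences encode characters outside Latin-1
            return false;
        }
    }
    // each output byte consumes one or two input bytes
    assert(latin1.size() <= utf8.size());
    return true;
}
```

With this sketch, the UTF-8 bytes for “Müller” (7 bytes) convert to 6 Latin-1 bytes, whereas a Chinese character, which needs three UTF-8 bytes, is rejected. A conversion to ASCII would be even more restrictive, since only the first branch could succeed.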

In addition, the OFCharacterEncoding class has been extended with helper functions that convert a character string between the Windows-specific wide character encoding (UTF-16) and a given code page, in both directions. These functions are only available on Windows systems, but they do not require a separate character encoding library like “libiconv”, which might not be an option in some cases because of its license.

The classes that handle command line parsing have also been extended in this regard. On a Windows system, command line parameters and options can now be specified in wide character encoding (if the tool is compiled accordingly). On all other operating systems (Linux in particular), there is no need for this because UTF-8 fits perfectly into a conventional array of 8-bit characters.
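
The point about UTF-8 can be made concrete with a short sketch: a command line argument arrives as an ordinary `char` string, and non-ASCII characters simply occupy more than one byte. The function `countUtf8CodePoints` below is a hypothetical helper (not part of DCMTK) and assumes that the argument is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only (not a DCMTK function).
// Counts the Unicode code points in a null-terminated UTF-8 string,
// such as an element of argv on a Linux system.
// Assumes valid UTF-8: every byte that is not a continuation byte
// (bit pattern 10xxxxxx) starts a new code point.
std::size_t countUtf8CodePoints(const char *arg)
{
    assert(arg != nullptr);
    std::size_t count = 0;
    for (const char *p = arg; *p != '\0'; ++p)
    {
        const unsigned char c = static_cast<unsigned char>(*p);
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

No special string type is needed here: the name “Müller” is stored in 7 bytes but counted as 6 code points, and the terminating null byte still works as usual because UTF-8 never uses a zero byte inside a multi-byte sequence.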

Of course, all this is only available when using the current snapshot or the latest development version of the DCMTK.

Update (2013-07-26)

The current status of character set support is now documented in the DCMTK support Wiki.
