
MARC-8 woes (and a C# UTF-8 -> MARC-8 converter)
Thursday, January 29th, 2009
I might as well not post this because the target audience has to be under 100 people worldwide (of whom how many will read this post?) but just in case, right?
If you don’t know what MARC-8 is, stop reading here.
If you never wanted to convert from UTF-8 *to* MARC-8 (why the hell would you want to anyway?), stop reading here.
If you are happy with MarcEdit’s broken MARC-8 output, stop reading here.
If you are still interested, here is a .NET port inspired by the Perl MARC::Charset package that does one-way conversion (UTF-8 to MARC-8 only). It is optimised for performance, with multi-thread support for multiple input files, and benchmarks at around 80 MB/s of text with 4 threads on a Q6600-equivalent CPU. The code (that you need to call) should be easy enough to understand. No binaries provided, just build from source.
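For anyone wondering what the conversion actually involves, here is a minimal sketch of the core idea in Python. It is not the code from the .NET port or from MARC::Charset. It assumes the usual approach: decompose the text to NFD, then emit each MARC-8 (ANSEL) combining diacritic byte before its base letter rather than after it. The function name `utf8_to_marc8` and the small lookup table are hypothetical and cover only a handful of Latin diacritics; a real converter also needs the full code tables and the escape sequences for the other character sets.

```python
import unicodedata

# Partial, illustrative table: Unicode combining mark -> ANSEL diacritic byte.
# A real converter needs the complete MARC-8 code tables.
COMBINING_TO_ANSEL = {
    "\u0300": 0xE1,  # grave
    "\u0301": 0xE2,  # acute
    "\u0302": 0xE3,  # circumflex
    "\u0303": 0xE4,  # tilde
    "\u0308": 0xE8,  # umlaut / diaeresis
    "\u0327": 0xF0,  # cedilla
}


def utf8_to_marc8(data: bytes) -> bytes:
    """Convert UTF-8 bytes to MARC-8 for ASCII letters plus the marks above."""
    # NFD splits precomposed letters into base letter + combining marks.
    text = unicodedata.normalize("NFD", data.decode("utf-8"))
    out = bytearray()
    i = 0
    while i < len(text):
        base = text[i]
        i += 1
        # Collect the combining marks that follow this base character.
        marks = bytearray()
        while i < len(text) and text[i] in COMBINING_TO_ANSEL:
            marks.append(COMBINING_TO_ANSEL[text[i]])
            i += 1
        if ord(base) > 0x7F:
            # Outside this sketch's scope (other scripts, unmapped marks).
            raise ValueError(f"unsupported character U+{ord(base):04X}")
        # MARC-8 puts the diacritics BEFORE the base character.
        out += marks
        out.append(ord(base))
    return bytes(out)
```

Calling `utf8_to_marc8("Müller".encode("utf-8"))` gives the bytes `M`, `0xE8`, `u`, `l`, `l`, `e`, `r`: the umlaut byte lands in front of the `u`. Anything the table does not know raises an error instead of silently producing broken output.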