MARC-8 woes (and a C# UTF-8 to MARC-8 converter)

Thursday, January 29th, 2009

I might as well not post this, because the target audience has to be under 100 people worldwide (and how many of them will read this post?), but just in case, right?

If you don’t know what MARC-8 is, stop reading here.

If you never wanted to convert from UTF-8 *to* MARC-8 (why the hell would you want to anyway?), stop reading here.

If you are happy with MarcEdit’s broken MARC-8 output, stop reading here.

If you are still interested, here is a .NET port, inspired by the Perl MARC::Charset package, that does one-way conversion (UTF-8 to MARC-8 only). It is optimised for performance, with multi-threading support for processing multiple input files at once. It benchmarks at around 80 MB/s of text with 4 threads on a Q6600-equivalent CPU. The code (the part you need to call) should be easy enough to understand. No binaries provided; just build from source.
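
For anyone wondering what the conversion roughly involves, here is a minimal sketch of the core idea. It is not the code from the converter itself. In MARC-8, combining diacritics come before the base letter, while in decomposed Unicode they come after it, so the marks have to be moved. The class and method names (Marc8Sketch, Encode) are made up for illustration, the table covers only five ANSEL combining marks, and emitting several marks in their Unicode order is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative only: hypothetical names, tiny mapping table, ASCII base letters only.
static class Marc8Sketch
{
    // Unicode combining mark -> ANSEL (MARC-8 extended Latin) combining byte.
    static readonly Dictionary<char, byte> Combining = new Dictionary<char, byte>
    {
        { '\u0300', 0xE1 }, // grave
        { '\u0301', 0xE2 }, // acute
        { '\u0302', 0xE3 }, // circumflex
        { '\u0303', 0xE4 }, // tilde
        { '\u0308', 0xE8 }, // umlaut / diaeresis
    };

    public static byte[] Encode(string text)
    {
        // Decompose first, so a precomposed U+00E9 becomes 'e' + U+0301.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var output = new List<byte>(decomposed.Length);

        int i = 0;
        while (i < decomposed.Length)
        {
            char baseChar = decomposed[i++];
            if (baseChar > 0x7F)
                throw new NotSupportedException(
                    "Sketch only handles ASCII base characters: U+" + ((int)baseChar).ToString("X4"));

            // MARC-8 wants the diacritics BEFORE the base character,
            // so write out any marks that follow, then the base itself.
            byte mark;
            while (i < decomposed.Length && Combining.TryGetValue(decomposed[i], out mark))
            {
                output.Add(mark);
                i++;
            }
            output.Add((byte)baseChar);
        }
        return output.ToArray();
    }
}
```

Calling Marc8Sketch.Encode("\u00E9") should give the acute byte followed by a plain "e". Anything outside the little table (a cedilla, say) throws rather than silently producing garbage, which is the opposite of what annoyed me about other tools. A real converter also needs the full character tables and the escape sequences for switching to non-Latin character sets, none of which is shown here.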