MARC-8 woes (and a c# utf8->marc8 converter)

January 29th, 2009

I might as well not post this, because the target audience has to be under 100 people worldwide (and how many of those will read this post?), but just in case, right?

If you don’t know what MARC-8 is, stop reading here.

If you never wanted to convert from UTF-8 *to* MARC-8 (why the hell would you want to anyway?), stop reading here.

If you are happy with MarcEdit’s broken MARC-8 output, stop reading here.

If you are still interested, here is a .NET port inspired by the Perl MARC::Charset package that does one-way conversion (UTF-8 to MARC-8). It is optimised for performance, with multi-threading support for processing multiple input files. In my benchmarks it processes around 80 MB/s of text with 4 threads on a Q6600-equivalent CPU. The code (the part you need to call) should be easy enough to understand. No binaries are provided; just build from source.
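For anyone wondering why this is more than a lookup table, here is a rough sketch of the core idea for Latin text. It is not the API of the converter above: the class name `Marc8Sketch`, the method `Encode`, and the five-entry table are made up for illustration, and it assumes the default MARC-8 character sets (ASCII plus ANSEL) so no escape sequences are needed. The key quirk is that MARC-8 puts combining diacritics before the base letter, whereas Unicode puts them after.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; not the converter's real API.
static class Marc8Sketch
{
    // Tiny subset of ANSEL combining diacritics:
    // Unicode combining mark -> MARC-8 byte.
    static readonly Dictionary<char, byte> Combining = new Dictionary<char, byte>
    {
        { '\u0300', 0xE1 }, // grave
        { '\u0301', 0xE2 }, // acute
        { '\u0302', 0xE3 }, // circumflex
        { '\u0303', 0xE4 }, // tilde
        { '\u0308', 0xE8 }, // umlaut / diaeresis
    };

    public static byte[] Encode(string text)
    {
        // Decompose so that e.g. U+00E9 becomes 'e' followed by U+0301.
        string s = text.Normalize(NormalizationForm.FormD);
        var output = new List<byte>();

        for (int i = 0; i < s.Length; i++)
        {
            char baseChar = s[i];
            if (baseChar >= 0x80)
                throw new NotSupportedException(
                    "Character U+" + ((int)baseChar).ToString("X4") + " is outside this sketch's table.");

            // Emit any combining marks that follow the base character first,
            // in their decomposed order (stacking rules are not handled here).
            int j = i + 1;
            while (j < s.Length && Combining.ContainsKey(s[j]))
            {
                output.Add(Combining[s[j]]);
                j++;
            }

            output.Add((byte)baseChar); // ASCII passes through unchanged
            i = j - 1;
        }
        return output.ToArray();
    }

    static void Main()
    {
        // "café": the acute accent byte (E2) lands before the 'e' (65).
        Console.WriteLine(BitConverter.ToString(Encode("caf\u00E9")));
        // prints 63-61-66-E2-65
    }
}
```

A real converter also has to deal with non-combining special characters, other scripts (which need escape sequences to switch character sets) and characters with no MARC-8 equivalent at all, which is where most of the pain lives.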

3 comments to “MARC-8 woes (and a c# utf8->marc8 converter)”

  1. Any desire to convert this to an on-demand marc8 to utf8 converter? I have a byte[] (presumably in marc8) that I cannot find an appropriate System.Text.Encoding conversion for.

    P.S. Your math test doesn’t work in Firefox.


  2. I looked at your class and rejected it. Your math test still sucks. And PERL::CHARSET RULEZ


  3. Spam test isn’t actually mine, but I believe there was a problem in the code and have updated to a newer version (thanks for pointing it out).

    As for the conversion code I posted above, I have since made a few bug fixes to it that may affect its ability to read some encodings.

    @Tobin: It would be great if you could provide a reason for “rejecting” it.

    That said, I am no longer working on an area that requires dealing with MARC8 (thank god) so apart from providing the bug fixes I mentioned on request (drop me a mail at pe@ppy.sh) this should be considered unsupported code.

