
MARC-8 woes (and a C# UTF-8 -> MARC-8 converter)
January 29th, 2009
I might as well not post this, because the target audience has to be under 100 people worldwide (and how many of those will read this post?), but just in case, right?
If you don’t know what MARC-8 is, stop reading here.
If you never wanted to convert from UTF-8 *to* MARC-8 (why the hell would you want to anyway?), stop reading here.
If you are happy with MarcEdit’s broken MARC-8 output, stop reading here.
If you are still interested, here is a .NET port inspired by the Perl MARC::Charset package that does one-way (UTF-8 to MARC-8) conversion. It is optimised for performance, with multi-threaded support for processing multiple input files, and benchmarks at around 80 MB/s of text with 4 threads on a Q6600-equivalent CPU. The code (the part you need to call) should be easy enough to understand. No binaries are provided; just build from source.
Any desire to convert this to an on-demand MARC-8 to UTF-8 converter? I have a byte[] (presumably in MARC-8) that I cannot find an appropriate System.Text.Encoding conversion for.
P.S. Your math test doesn’t work in Firefox.
I looked at your class and rejected it. Your math test still sucks. And PERL::CHARSET RULEZ
The spam test isn’t actually mine, but I believe there was a problem in its code and have updated to a newer version (thanks for pointing it out).
As for the conversion code I posted above, I have since made a few bug fixes to it that may affect its ability to read some encodings.
@Tobin: It would be great if you could provide a reason for “rejecting” it.
That said, I am no longer working in an area that requires dealing with MARC-8 (thank god), so apart from providing the bug fixes I mentioned on request (drop me a mail at pe@ppy.sh), this should be considered unsupported code.