Supported Character Encodings
The Perl extension Lingua::Lid implements an interface to lid, a C/C++ library that currently supports 35 distinct character encodings. These include the modern, commonly used encodings for each supported language as well as a set of legacy encodings.
lid supports all common Unicode Transformation Formats, namely "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-32BE" and "UTF-32LE", for every supported language and transliteration. The table below lists the languages that lid can identify in each encoding.
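To make the difference between the five Unicode Transformation Formats concrete, the following Python sketch encodes one sample string in each of them and prints the resulting bytes. It is independent of lid and Lingua::Lid and only uses Python's built-in codecs. The names `UTF_CODECS` and `encode_all` are hypothetical helpers introduced for this illustration.

```python
# Illustration only: not part of lid or Lingua::Lid.
# Maps the encoding names used in this document to Python's codec names.
UTF_CODECS = {
    "UTF-8": "utf-8",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
}


def encode_all(text):
    """Return {encoding name: encoded bytes} for each of the five UTFs.

    The explicit-endian codecs used here do not prepend a byte order mark.
    """
    return {name: text.encode(codec) for name, codec in UTF_CODECS.items()}


if __name__ == "__main__":
    sample = "Grüße"  # short German sample containing non-ASCII characters
    for name, data in encode_all(sample).items():
        print(f"{name:9} {len(data):2d} bytes  {data.hex()}")
```

The same text yields a different byte sequence in every format, which is why a language identifier has to treat each of them as a separate encoding.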
| Character Encoding | Languages |
|---|---|
| ASCII | Bulgarian (DIN 1460 transliteration), Bulgarian (ISO 9 transliteration), Bulgarian (Streamlined System transliteration), Czech (Common transliteration), Danish, Dutch, English, Estonian, Finnish, French, German, German (Common transliteration), Greek (DIN 31634 transliteration), Greek (Greeklish transliteration), Greek (ISO 843 transliteration), Irish (Gaelic), Italian, Latvian, Lithuanian, Polish (Common transliteration), Portuguese, Romanian (Common transliteration), Slovak (Common transliteration), Slovenian, Slovenian (Common transliteration), Spanish, Swedish |
| Big5 | Mandarin (Chinese) |
| CP 737 | Greek |
| CP 775 | Estonian, Latvian, Lithuanian |
| CP 850 | Danish, Dutch, English, Finnish, French, German, Irish (Gaelic), Italian, Portuguese, Spanish, Swedish |
| CP 852 | Czech, Hungarian, Polish, Romanian, Slovak, Slovenian |
| CP 855 | Bulgarian, Russian |
| CP 866 | Bulgarian, Russian |
| GB2312 | Mandarin (Chinese) |
| ISO-8859-1 | Czech (Common transliteration), Danish, Dutch, English, Finnish, French, German, German (Common transliteration), Greek (Greeklish transliteration), Irish (Gaelic), Italian, Polish (Common transliteration), Portuguese, Romanian (Common transliteration), Slovak (Common transliteration), Slovenian (Common transliteration), Spanish, Swedish |
| ISO-8859-15 | Dutch, Finnish, French, German, Portuguese, Spanish |
| ISO-8859-16 | Hungarian, Italian, Polish, Slovenian |
| ISO-8859-2 | Czech, Hungarian, Polish, Romanian, Slovak, Slovenian |
| ISO-8859-3 | Maltese |
| ISO-8859-4 | Estonian, Latvian, Lithuanian |
| ISO-8859-5 | Bulgarian, Russian |
| ISO-8859-7 | Greek |
| KOI8-R | Bulgarian, Russian |
| KOI8-U | Ukrainian |
| MacCentralEuropean | Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Slovak, Slovenian |
| MacCyrillic | Bulgarian, Russian |
| MacGreek | Greek |
| MacRoman | Danish, Dutch, English, Finnish, French, German, Irish (Gaelic), Italian, Portuguese, Spanish, Swedish |
| MacRomanian | Romanian |
| MacUkrainian | Ukrainian |
| UTF-16BE | Bulgarian, Bulgarian (DIN 1460 transliteration), Bulgarian (ISO 9 transliteration), Bulgarian (Streamlined System transliteration), Czech, Czech (Common transliteration), Danish, Dutch, English, Estonian, Finnish, French, German, German (Common transliteration), Greek, Greek (DIN 31634 transliteration), Greek (Greeklish transliteration), Greek (ISO 843 transliteration), Hungarian, Irish (Gaelic), Italian, Latvian, Lithuanian, Maltese, Mandarin (Chinese), Polish, Polish (Common transliteration), Portuguese, Romanian, Romanian (Common transliteration), Russian, Russian (DIN 1460 transliteration), Russian (ISO 9 transliteration), Slovak, Slovak (Common transliteration), Slovenian, Slovenian (Common transliteration), Spanish, Swedish, Ukrainian, Ukrainian (DIN 1460 transliteration), Ukrainian (ISO 9 transliteration) |
| UTF-16LE | Bulgarian, Bulgarian (DIN 1460 transliteration), Bulgarian (ISO 9 transliteration), Bulgarian (Streamlined System transliteration), Czech, Czech (Common transliteration), Danish, Dutch, English, Estonian, Finnish, French, German, German (Common transliteration), Greek, Greek (DIN 31634 transliteration), Greek (Greeklish transliteration), Greek (ISO 843 transliteration), Hungarian, Irish (Gaelic), Italian, Latvian, Lithuanian, Maltese, Mandarin (Chinese), Polish, Polish (Common transliteration), Portuguese, Romanian, Romanian (Common transliteration), Russian, Russian (DIN 1460 transliteration), Russian (ISO 9 transliteration), Slovak, Slovak (Common transliteration), Slovenian, Slovenian (Common transliteration), Spanish, Swedish, Ukrainian, Ukrainian (DIN 1460 transliteration), Ukrainian (ISO 9 transliteration) |
| UTF-32BE | Bulgarian, Bulgarian (DIN 1460 transliteration), Bulgarian (ISO 9 transliteration), Bulgarian (Streamlined System transliteration), Czech, Czech (Common transliteration), Danish, Dutch, English, Estonian, Finnish, French, German, German (Common transliteration), Greek, Greek (DIN 31634 transliteration), Greek (Greeklish transliteration), Greek (ISO 843 transliteration), Hungarian, Irish (Gaelic), Italian, Latvian, Lithuanian, Maltese, Mandarin (Chinese), Polish, Polish (Common transliteration), Portuguese, Romanian, Romanian (Common transliteration), Russian, Russian (DIN 1460 transliteration), Russian (ISO 9 transliteration), Slovak, Slovak (Common transliteration), Slovenian, Slovenian (Common transliteration), Spanish, Swedish, Ukrainian, Ukrainian (DIN 1460 transliteration), Ukrainian (ISO 9 transliteration) |
| UTF-32LE | Bulgarian, Bulgarian (DIN 1460 transliteration), Bulgarian (ISO 9 transliteration), Bulgarian (Streamlined System transliteration), Czech, Czech (Common transliteration), Danish, Dutch, English, Estonian, Finnish, French, German, German (Common transliteration), Greek, Greek (DIN 31634 transliteration), Greek (Greeklish transliteration), Greek (ISO 843 transliteration), Hungarian, Irish (Gaelic), Italian, Latvian, Lithuanian, Maltese, Mandarin (Chinese), Polish, Polish (Common transliteration), Portuguese, Romanian, Romanian (Common transliteration), Russian, Russian (DIN 1460 transliteration), Russian (ISO 9 transliteration), Slovak, Slovak (Common transliteration), Slovenian, Slovenian (Common transliteration), Spanish, Swedish, Ukrainian, Ukrainian (DIN 1460 transliteration), Ukrainian (ISO 9 transliteration) |
| UTF-8 | Bulgarian, Bulgarian (DIN 1460 transliteration), Bulgarian (ISO 9 transliteration), Bulgarian (Streamlined System transliteration), Czech, Czech (Common transliteration), Danish, Dutch, English, Estonian, Finnish, French, German, German (Common transliteration), Greek, Greek (DIN 31634 transliteration), Greek (Greeklish transliteration), Greek (ISO 843 transliteration), Hungarian, Irish (Gaelic), Italian, Latvian, Lithuanian, Maltese, Mandarin (Chinese), Polish, Polish (Common transliteration), Portuguese, Romanian, Romanian (Common transliteration), Russian, Russian (DIN 1460 transliteration), Russian (ISO 9 transliteration), Slovak, Slovak (Common transliteration), Slovenian, Slovenian (Common transliteration), Spanish, Swedish, Ukrainian, Ukrainian (DIN 1460 transliteration), Ukrainian (ISO 9 transliteration) |
| Windows-1250 | Bulgarian (DIN 1460 transliteration), Bulgarian (Streamlined System transliteration), Czech, Hungarian, Polish, Romanian, Slovak, Slovenian |
| Windows-1251 | Bulgarian, Russian, Ukrainian |
| Windows-1252 | Danish, Dutch, English, Finnish, French, German, Irish (Gaelic), Italian, Portuguese, Spanish, Swedish |
| Windows-1253 | Greek |
| Windows-1257 | Estonian, Latvian, Lithuanian |
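The table is keyed by encoding, but the reverse question ("which encodings are listed for a given language?") comes up just as often. The Python sketch below shows one way to invert the table. It is an illustration under stated assumptions, not part of lid or Lingua::Lid: `TABLE_EXCERPT`, `parse_rows` and `encodings_by_language` are hypothetical names, and the excerpt contains only a few legacy rows copied from the table above. The five UTF rows, which list every language, are left out to keep the example short.

```python
from collections import defaultdict

# Excerpt of the table above (a few legacy encodings only).
TABLE_EXCERPT = """\
| CP 737 | Greek |
| ISO-8859-3 | Maltese |
| ISO-8859-7 | Greek |
| KOI8-U | Ukrainian |
| MacGreek | Greek |
| MacUkrainian | Ukrainian |
| Windows-1251 | Bulgarian, Russian, Ukrainian |
| Windows-1253 | Greek |
"""


def parse_rows(table_text):
    """Parse '| encoding | lang, lang |' rows into {encoding: [languages]}."""
    rows = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines, rows without exactly two cells, and '---' rows.
        if len(cells) != 2 or not cells[0] or set(cells[0]) <= set("-"):
            continue
        rows[cells[0]] = [lang.strip() for lang in cells[1].split(",")]
    return rows


def encodings_by_language(rows):
    """Invert {encoding: [languages]} into {language: [encodings]}."""
    index = defaultdict(list)
    for encoding, languages in rows.items():
        for language in languages:
            index[language].append(encoding)
    return dict(index)


if __name__ == "__main__":
    index = encodings_by_language(parse_rows(TABLE_EXCERPT))
    for language in sorted(index):
        print(f"{language}: {', '.join(index[language])}")
```

Feeding the data rows of the full table to `parse_rows` instead of the excerpt would give the complete per-language listing, including the UTF encodings.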

