Supported Input Formats
lidc detects the language and character encoding of textual data and handles the different input formats in which a text may be given: plain text, XML, HTML, and email. The table below provides an overview of all supported input formats.
| Format | MIME-Type | Description |
|---|---|---|
| Text | text/plain | plain text |
| XML | application/xml | any valid XML document |
| HTML | text/html | XHTML, HTML (4.0, ...) |
| Email | text/plain | according to RFC 2045-2049 |
| Email | text/html | according to RFC 2045-2049 |
| Email | multipart/mixed | according to RFC 2045-2049 |
| Email | multipart/alternative | according to RFC 2045-2049 |
| Email | multipart/digest | according to RFC 2045-2049 |
| Email | message/rfc822 | according to RFC 2045-2049 |
| Email | multipart/parallel | according to RFC 2045-2049 |
| Email | multipart/related | according to RFC 2387 |
| Email | multipart/report | according to RFC 3462 |
| Email | multipart/signed | according to RFC 1847 |
| Email | - | according to RFC 822 |
For further information, have a look at lidc's software specification.
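To make the email rows of the table more concrete, the following sketch lists the content type of every part of a MIME message and checks it against the email content types listed above. It uses only Python's standard `email` package and does not call lidc. The names `SUPPORTED_EMAIL_TYPES`, `list_part_types`, and `SAMPLE` are hypothetical and chosen for illustration; how lidc itself parses email is not shown here.

```python
from email import message_from_string
from typing import List, Tuple

# Email content types copied from the table above. Collecting them in a set
# is an assumption made for this illustration, not part of lidc.
SUPPORTED_EMAIL_TYPES = {
    "text/plain", "text/html", "message/rfc822",
    "multipart/mixed", "multipart/alternative", "multipart/digest",
    "multipart/parallel", "multipart/related", "multipart/report",
    "multipart/signed",
}


def list_part_types(raw_message: str) -> List[Tuple[str, bool]]:
    """Return (content type, listed in table?) for each part of a message."""
    message = message_from_string(raw_message)
    result = []
    # walk() yields the message itself first, then every nested part.
    for part in message.walk():
        content_type = part.get_content_type()
        result.append((content_type, content_type in SUPPORTED_EMAIL_TYPES))
    return result


# A small hand-written multipart/alternative message with two bodies.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hello\n"
    "--b1\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>Hello</p>\n"
    "--b1--\n"
)

if __name__ == "__main__":
    for content_type, listed in list_part_types(SAMPLE):
        print(content_type, "listed" if listed else "not listed")
```

For the sample message this reports the `multipart/alternative` container followed by its `text/plain` and `text/html` parts, all of which appear in the table. A part with a content type that is not in the table, such as `application/pdf`, would be reported as not listed.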
Support for additional formats may be added as the development of lidc proceeds. If you need support for a specific format, feel free to contact us so we can add it.


