Supported Input Formats

lidc detects the language and character encoding of textual data. The text may be supplied in one of several input formats: plain text, XML, HTML or email. The table below provides an overview of all supported input formats.

Format  MIME type              Description
Text    text/plain             plain text
XML     application/xml        any valid XML document
HTML    text/html              XHTML, HTML (4.0, ...)
Email   text/plain             according to RFC 2045-2049
        text/html              according to RFC 2045-2049
        multipart/mixed        according to RFC 2045-2049
        multipart/alternative  according to RFC 2045-2049
        multipart/digest       according to RFC 2045-2049
        message/rfc822         according to RFC 2045-2049
        multipart/parallel     according to RFC 2045-2049
        multipart/related      according to RFC 2387
        multipart/report       according to RFC 3462
        multipart/signed       according to RFC 1847
        (none)                 according to RFC 822
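
lidc's own programming interface is not described here, so the following is only a hypothetical sketch of the kind of preprocessing the table implies: reducing each supported format to the plain text on which language detection would run. It uses only the Python standard library. The function names (extract_text, html_to_text, xml_to_text, email_to_text) and the format keys "text", "xml", "html" and "email" are assumptions, not lidc's API. Unlike lidc, the sketch does not detect character encodings: it assumes UTF-8 for plain text and HTML, lets ElementTree read the XML declaration, and relies on the declared charset of each email part.

```python
import email
import xml.etree.ElementTree as ET
from email import policy
from html.parser import HTMLParser

# Hypothetical preprocessing sketch, NOT lidc's API: reduce each input
# format from the table to plain text for a language detector.


class _TextCollector(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def _normalize(text: str) -> str:
    # Collapse runs of whitespace left over from removed markup.
    return " ".join(text.split())


def html_to_text(markup: str) -> str:
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return _normalize(" ".join(parser.chunks))


def xml_to_text(data: bytes) -> str:
    # Passing bytes lets ElementTree honour an encoding="..." declaration.
    root = ET.fromstring(data)
    return _normalize(" ".join(root.itertext()))


def email_to_text(raw: bytes) -> list[str]:
    msg = email.message_from_bytes(raw, policy=policy.default)
    texts = []
    # walk() also descends into multipart/* and message/rfc822 parts;
    # a plain RFC 822 message without MIME headers counts as text/plain.
    for part in msg.walk():
        ctype = part.get_content_type()
        if part.is_multipart() or ctype not in ("text/plain", "text/html"):
            continue  # skip containers, signatures, attachments, etc.
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        try:
            body = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            body = payload.decode("latin-1")
        text = html_to_text(body) if ctype == "text/html" else _normalize(body)
        if text:
            texts.append(text)
    return texts


def extract_text(data: bytes, fmt: str) -> list[str]:
    """Dispatch on a hypothetical format key: text, xml, html or email."""
    if fmt == "text":
        return [data.decode("utf-8", errors="replace")]  # assumes UTF-8
    if fmt == "xml":
        return [xml_to_text(data)]
    if fmt == "html":
        return [html_to_text(data.decode("utf-8", errors="replace"))]
    if fmt == "email":
        return email_to_text(data)
    raise ValueError(f"unsupported format: {fmt!r}")
```

For a multipart/alternative message the sketch returns the text of both the plain and the HTML alternative; a real pipeline might keep only one of them to avoid analysing the same content twice.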

For further information, have a look at lidc's software specification.

Support for additional formats may be added as development of lidc proceeds. If you need support for a specific format, feel free to contact us.