Product Information
lidc detects the language and character encoding of text quickly and reliably.
The application supports several common input formats: email, HTML, XML and plain text. This makes lidc useful in many areas, and its flexible output handling widens the range of uses even further: with a conventional format string, you can tailor the results exactly to your needs. Because lidc is a command-line application, recurring processes can be automated, which makes them more efficient.
Features
lidc is a flexible language-detection tool that supports you in your work. lidc
- is easy to use, with a clear and intuitive set of options.
- can be used efficiently within a pipe, even for complex processes.
- supports a variety of input formats.
- offers maximum flexibility in customizing the results.
- allows automated processing (e.g. within shell scripts).
- detects language and character encoding with high accuracy; even a short input of about five words is usually enough to identify the language correctly.
- is able to detect the language of transliterated texts; many common transliterations are supported.
- identifies a wide variety of character encodings and supports all common Unicode encodings (except for email).
Get a first impression of the language detection in the online demonstration of lid, the underlying library, and see in the usage example how easily lidc can be used.
Your Benefits
lidc is suitable for a wide range of applications, and its benefits depend on the purpose it is used for. The following three examples give an impression of possible uses and their benefits.
Example 1: Databases
Store your textual data in the database together with information on its character encoding. This additional information ensures that the entries can always be retrieved and displayed correctly.
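The idea behind Example 1 can be sketched in Python with the standard sqlite3 module: the raw bytes are stored next to the name of their character encoding, and the stored name is used to decode them on retrieval. The table layout and the functions `store` and `load` are hypothetical, and the encoding label is hard-coded here; in practice it is assumed to come from lidc's output, and lidc's encoding names may need to be mapped to Python codec names first.

```python
import sqlite3

def store(conn: sqlite3.Connection, raw: bytes, encoding: str) -> int:
    """Save the raw bytes together with their encoding label (hypothetical schema)."""
    cur = conn.execute(
        "INSERT INTO documents (raw, encoding) VALUES (?, ?)", (raw, encoding)
    )
    conn.commit()
    return cur.lastrowid

def load(conn: sqlite3.Connection, doc_id: int) -> str:
    """Fetch a document and decode it with the encoding stored alongside it."""
    raw, encoding = conn.execute(
        "SELECT raw, encoding FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return raw.decode(encoding)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, raw BLOB NOT NULL, encoding TEXT NOT NULL)"
)

# Assumption: "iso-8859-1" stands in for the encoding lidc reported for this text.
doc_id = store(conn, "Grüße aus Köln".encode("iso-8859-1"), "iso-8859-1")
print(load(conn, doc_id))  # prints: Grüße aus Köln
```

Keeping the original bytes as a BLOB, rather than converting everything up front, preserves the data exactly as it was received; the stored label is all that is needed to display it correctly later.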
Example 2: Email Tagging and Routing
Integrate lidc into your email processing and add the detected language to your emails (e.g. as an "X-Language" header). With this information, emails can be routed directly to the appropriate person or department. It can also be used to enhance existing spam filtering solutions.
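A minimal Python sketch of the tagging and routing step in Example 2, using the standard email package: it sets an "X-Language" header and chooses a destination address from it. The language code "de", the routing table and the helper names `tag_language` and `route` are assumptions for illustration; how the language value is obtained from lidc, and in which notation lidc reports it, is not shown here.

```python
from email import message_from_string
from email.message import Message

def tag_language(msg: Message, language: str) -> Message:
    """Set the X-Language header, replacing any value already present."""
    del msg["X-Language"]  # removes all existing occurrences; no error if absent
    msg["X-Language"] = language
    return msg

def route(msg: Message, routes: dict, default: str) -> str:
    """Pick a destination address based on the X-Language header."""
    return routes.get(msg.get("X-Language", ""), default)

raw = (
    "From: customer@example.com\n"
    "To: support@example.com\n"
    "Subject: Rechnung\n"
    "\n"
    "Ich habe eine Frage zu meiner Rechnung.\n"
)
# Hypothetical routing table: language code -> responsible address.
routes = {"de": "support-de@example.com", "en": "support-en@example.com"}

msg = message_from_string(raw)
# Assumption: "de" stands in for the language lidc reported for this message.
tag_language(msg, "de")
print(msg["X-Language"])                          # prints: de
print(route(msg, routes, "support@example.com"))  # prints: support-de@example.com
```

Deleting the header before setting it ensures that a message passing through the filter twice still carries exactly one language tag.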
Example 3: Collecting Corpora
If you need large amounts of textual data to develop your software, lidc can help automate the collection of corpora. Detect the language and character encoding of the collected data and use this information to tag and sort your resources. This makes collecting huge amounts of data much easier.
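The sorting step of Example 3 can be sketched in Python as grouping files by their (language, encoding) pair and deriving a target directory for each group. The file names, tag values, directory layout and the helpers `sort_corpus` and `target_dir` are hypothetical; the tags are assumed to have been produced by lidc beforehand.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def sort_corpus(tagged):
    """Group file names by their (language, encoding) tag pair."""
    buckets = defaultdict(list)
    for path, language, encoding in tagged:
        buckets[(language, encoding)].append(path)
    return dict(buckets)

def target_dir(language, encoding):
    """Hypothetical layout: corpus/<language>/<encoding>."""
    return PurePosixPath("corpus") / language / encoding

# Assumed detection results, e.g. collected from lidc's output beforehand.
tagged = [
    ("news-001.txt", "de", "UTF-8"),
    ("news-002.txt", "en", "ISO-8859-1"),
    ("forum-17.txt", "de", "UTF-8"),
]

for (language, encoding), files in sorted(sort_corpus(tagged).items()):
    print(target_dir(language, encoding), files)
# prints:
# corpus/de/UTF-8 ['news-001.txt', 'forum-17.txt']
# corpus/en/ISO-8859-1 ['news-002.txt']
```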
Supported Platforms
lidc is provided for several Unix-like operating systems in their native package formats.
| Operating System | Distribution/Version | Architecture |
|---|---|---|
| Linux | Debian Etch (4.0) | x86/IA-32 |
| Linux | Debian Lenny (5.0) | x86/IA-32 |
| Linux | Ubuntu LTS (10.04) | x86/IA-32 |
| Solaris | 10 | SPARC |
| FreeBSD | 6 | x86/IA-32 |
| FreeBSD | 7 | x86/IA-32 |
| FreeBSD | 8 | x86/IA-32 |
If you need the software for another operating system or distribution, do not hesitate to contact us.
Requirements
lidc has very few requirements: it needs few resources and depends only on the native C library of the respective operating system.
- C library
- 250 KiB RAM
- 1.5 MB disk space
Memory usage depends mainly on the size of the input: the larger the input passed to lidc, the more memory should be available.
All technical details are summarized in the software specification.


