Language Identifier / Encoding Identifier
Automatically identifying the language of a text keeps gaining importance, and not only because of increasing globalization. In the early days of digitalization, the variety of languages also led to a vast number of character encodings, which are often incompatible with each other even when they encode the alphabet of the same language.
The products of the lid family identify not only the language in which a text is written, but also the document's character encoding (charset). By making this essential information available to your software products or internal applications, they help you build software that handles textual data robustly.
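The incompatibility mentioned above can be illustrated with a short Python sketch. It is independent of the lid products and uses only Python's standard codecs; the sample word, the choice of three Cyrillic encodings, and the helper names `encode_all` and `decode_as` are assumptions made for this illustration.

```python
# Illustration only: three legacy encodings for the same (Cyrillic) alphabet
# produce different bytes, so decoding with the wrong one garbles the text.

SAMPLE = "Привет"  # Russian for "hello" (sample text chosen for this sketch)
CODECS = ["koi8_r", "cp1251", "iso8859_5"]  # standard Python codec names


def encode_all(text, codecs):
    """Return a dict mapping each codec name to the encoded bytes of text."""
    return {name: text.encode(name) for name in codecs}


def decode_as(data, codec):
    """Decode bytes with the given codec, replacing undecodable bytes."""
    return data.decode(codec, errors="replace")


encoded = encode_all(SAMPLE, CODECS)

if __name__ == "__main__":
    for source, data in encoded.items():
        for target in CODECS:
            result = decode_as(data, target)
            status = "ok" if result == SAMPLE else "garbled"
            print(f"{source} bytes read as {target}: {result!r} ({status})")
```

Only the combinations in which the decoding codec matches the encoding codec recover the original word. A program that receives the bytes without knowing the charset therefore needs some way to identify it first.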
Lingua-Systems provides a set of products to make robust language and character encoding identification available to your business and/or products:
- lid
  C/C++ language and encoding identification library
  lid reliably identifies the language and character encoding of textual input; in most cases, even short passages are identified accurately. It combines minimal hardware requirements with high performance.
  Thanks to an intuitive interface and no software dependencies, you can easily integrate lid into your software projects or infrastructure. Benefit from our knowledge!
- lidc
  Language and encoding identification application
  The command line application lidc is based on the lid library. It determines the language and character encoding of textual input quickly and accurately. lidc supports a variety of input formats: email, HTML, XML and plain text.
- Lingua::Lid (Open Source)
  Perl interface to the lid language and encoding identification library
  This Perl extension provides an intuitive interface to the C/C++ library lid, so that your Perl projects can benefit from all of lid's capabilities as well.