Man page of lid_ffile(3), lid_fstr(3), lid_fnstr(3) and lid_fwstr(3)
NAME
lid_ffile, lid_fstr, lid_fwstr, lid_fnstr - identify language and encoding of textual input
SYNOPSIS
C/C++
    #include <lid.h>

    lid_t * lid_ffile(const char *file);
    lid_t * lid_fstr(const char *str);
    lid_t * lid_fwstr(const wchar_t *wstr);
    lid_t * lid_fnstr(const char *bstr, size_t len);
DESCRIPTION
The functions lid_ffile(), lid_fstr(), lid_fwstr() and lid_fnstr() identify the language and encoding of their input. A list of supported languages and encodings is provided in the user manual and the software specification.
lid_ffile() reads its input from the file specified by file.
lid_fstr() uses the character string pointed to by str as an input, while lid_fwstr() handles a wide character string pointed to by wstr.
lid_fnstr() processes the byte string pointed to by bstr, assuming a length of len bytes. The caller must ensure that len does not exceed the size of the buffer pointed to by bstr, because lid_fnstr() has no way to check this. Unlike lid_fstr(), this function handles embedded NUL bytes and is therefore able to process UTF-16 and UTF-32 encoded strings.
RETURN VALUE
lid_ffile(), lid_fstr(), lid_fwstr() and lid_fnstr() return a pointer to a lid_t structure, which is defined as follows:
C/C++
    typedef struct lid {
        char *language;
        char *encoding;
        char *isocode;
    } lid_t;
This data structure holds the results determined from the input and consists of:
- language
    The identified language's name in English, e.g. "German".
- encoding
    The identified encoding, e.g. "UTF-8".
- isocode
    The identified language's ISO 639-3 code, e.g. "deu".
The returned lid_t structure should be freed using lid_free(3) once it is no longer needed.
ERRORS
If an error occurs, the functions return a NULL pointer and set the pseudo-variable lid_errno (see lid_errno(3)) to an appropriate value.
If a human-readable message describing the error is also needed, use lid_strerror(3).
For convenience, named constants (of type lid_errno_t) may be used to handle specific errors individually. See lid_errno(3) for details.
EXAMPLES
The following example application, lid-example.c, which is included in the distribution, takes a set of file names as command-line arguments and uses lid_ffile() to identify their language and encoding. It checks for errors, prints the results, and frees each result structure using lid_free(3).
C/C++
    #include <stdio.h>
    #include <lid.h>

    int main (int argc, char *argv[])
    {
        lid_t *res = NULL;
        int i = 0;

        /* Identify each file named on the command line. */
        for (i = 1; i < argc; i++) {
            res = lid_ffile(argv[i]);

            /* On failure, report the error message and stop. */
            if (res == NULL) {
                fprintf(stderr, "%s: %s\n", argv[i], lid_strerror(lid_errno));
                return 1;
            }

            printf("%s: lang=%s, enc=%s, iso=%s\n", argv[i],
                   res->language, res->encoding, res->isocode);

            /* Release the result structure before the next iteration. */
            lid_free(res);
        }

        return 0;
    }
Here is the output of an example execution of the application:
    $ ./lid_example /tmp/english.txt /tmp/german.txt /dev/null
    /tmp/english.txt: lang=English, enc=ASCII, iso=eng
    /tmp/german.txt: lang=German, enc=UTF-8, iso=deu
    /dev/null: Insufficient input length.
CAVEATS
- length
    The input should consist of at least 25 characters.
- encoding
    lid_fstr() cannot handle character strings that contain embedded NUL bytes (e.g. UTF-16 or UTF-32 encoded text), because it cannot determine their length accurately. Use lid_fnstr() instead.
- format
    Only plain-text input can be processed.
NOTES
- All functions provided by lid are thread-safe and may be called from multiple threads concurrently.
- Every call to lid_ffile(), lid_fstr(), lid_fwstr() or lid_fnstr() resets the calling thread's lid_errno to LID_NOERR ("No error").
- At compile time, the library's version can be determined using the macro LID_VERSION_STRING, which expands to the quoted version string, e.g. "3.0.0". To determine the library's version at runtime, use either lid_version(3) or lid_version_string(3).
SEE ALSO
lid_free(3), lid_errno(3), lid_strerror(3), lid_version(3), lid_version_string(3)
lid User Manual, lid Software Specification
http://www.lingua-systems.com/language-identifier/lid-library/
COPYRIGHT
Copyright (c) 2008-2010 Lingua-Systems Software GmbH


