Man page of lid_ffile(3), lid_fstr(3), lid_fnstr(3) and lid_fwstr(3)
NAME
lid_ffile, lid_fstr, lid_fwstr, lid_fnstr - determine language and encoding of textual input from a variety of sources
SYNOPSIS
C/C++
    #include <lid.h>

    lid_t * lid_ffile(const char *file);
    lid_t * lid_fstr(const char *str);
    lid_t * lid_fwstr(const wchar_t *wstr);
    lid_t * lid_fnstr(const char *bstr, size_t len);
DESCRIPTION
The functions lid_ffile(), lid_fstr(), lid_fwstr() and lid_fnstr() determine the language and encoding of their input. A list of supported languages and encodings is provided in the user manual and the software specification.
lid_ffile() reads its input from the file specified by file.
lid_fstr() uses the character string pointed to by str as an input, while lid_fwstr() handles a wide character string pointed to by wstr.
lid_fnstr() processes len bytes of the byte string pointed to by bstr. The caller must ensure that len does not exceed the size of the buffer pointed to by bstr, because lid_fnstr() has no way to check this. Unlike lid_fstr(), this function handles embedded NUL bytes and can therefore process UTF-16 and UTF-32 encoded strings.
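For UTF-16 input, the caller has to compute the byte count passed as len, since strlen() stops at the first zero byte. The sketch below assumes the text is held as 16-bit code units terminated by a 0x0000 unit; the helper utf16_byte_len() is hypothetical and not part of liblid.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of liblid): return the length in bytes
 * of a UTF-16 string terminated by a 0x0000 code unit, excluding the
 * terminator. The result is suitable as the len argument of lid_fnstr().
 */
size_t utf16_byte_len(const uint16_t *s)
{
    size_t units = 0;

    /* Stop at the 16-bit terminator, not at the first zero byte. */
    while (s[units] != 0)
        units++;

    return units * sizeof s[0];
}
```

A caller might then pass (const char *)text and utf16_byte_len(text) to lid_fnstr(). For "Hi" (0x0048 0x0069) the helper returns 4, whereas strlen() on the same bytes returns at most 1.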
RETURN VALUE
lid_ffile(), lid_fstr(), lid_fwstr() and lid_fnstr() return a pointer to a lid_t structure, which is defined as follows:
C/C++
    typedef struct lid
    {
        char *language;
        char *encoding;
        char *isocode;
    } lid_t;
This data structure holds the results determined from the input and consists of:
- language: The determined language's name in English, e.g. "German".
- encoding: The determined encoding, e.g. "UTF-8".
- isocode: The determined language's ISO 639-3 code, e.g. "deu".
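Because the members are plain C strings, a result can be rendered with snprintf(). The sketch below repeats the lid_t definition given above only so it compiles without lid.h; a real program would include <lid.h> instead. The function format_lid_result() is hypothetical and produces the same format as the lid_example program in EXAMPLES.

```c
#include <stdio.h>

/* Repeated from the definition above so this sketch compiles on its own;
 * a real program would include <lid.h> instead. */
typedef struct lid
{
    char *language;
    char *encoding;
    char *isocode;
} lid_t;

/*
 * Hypothetical helper: write "lang=..., enc=..., iso=..." into buf.
 * Returns the value of snprintf(): the length the full string would
 * have, or a negative value on an encoding error.
 */
int format_lid_result(const lid_t *res, char *buf, size_t size)
{
    return snprintf(buf, size, "lang=%s, enc=%s, iso=%s",
                    res->language, res->encoding, res->isocode);
}
```

For a German UTF-8 result, buf receives "lang=German, enc=UTF-8, iso=deu".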
The returned lid_t structure should be freed with lid_free(3) once it is no longer needed.
ERRORS
If an error occurs, the functions return a NULL pointer and set the global error indicator lid_errno(3) to an appropriate value.
If a human-readable message describing the error is also needed, use lid_strerror(3).
For convenience, the following macros can be used instead of the numeric error values:
- LID_ENOERR: No error/clear state
- LID_ENOMEM: Memory allocation failed
- LID_EFOPEN: Error opening an input file
- LID_EFCLOSE: Error closing an input file
- LID_EFIO: File IO error
- LID_EMATH: Math error
- LID_ESHORT: Input too short
- LID_EUDEC: UTF decoding failed
- LID_EUENC: UTF encoding failed
- LID_EUINV: Invalid UTF sequence
- LID_EWCCONV: Wide character conversion error
- LID_EBINARY: Binary data input
- LID_EARG: Invalid argument
- LID_EUNDEF: Undefined error
EXAMPLES
The following example application, lid_example, which is included in the distribution, takes a list of file names as command-line arguments and uses lid_ffile() to determine the language and encoding of each file. It checks for errors, prints the results, and frees each result structure with lid_free(3).
C/C++
    #include <stdio.h>
    #include <lid.h>

    int main (int argc, char *argv[])
    {
        lid_t *res = NULL;
        int i = 0;

        /* Process each file name given on the command line. */
        for (i = 1; i < argc; i++)
        {
            res = lid_ffile(argv[i]);

            /* On failure, report the reason and stop. */
            if (res == NULL)
            {
                fprintf(stderr, "%s: %s\n", argv[i], lid_strerror(lid_errno));
                return 1;
            }

            printf("%s: lang=%s, enc=%s, iso=%s\n", argv[i],
                   res->language, res->encoding, res->isocode);

            /* Release the result once it has been printed. */
            lid_free(res);
        }

        return 0;
    }
A sample run of the application produces the following output:
    $ ./lid_example /tmp/english.txt /tmp/german.txt /dev/null
    /tmp/english.txt: lang=English, enc=ASCII, iso=eng
    /tmp/german.txt: lang=German, enc=UTF-8, iso=deu
    /dev/null: Insufficient input length.
CAVEATS
- length: The input must reach a minimum length of about 25 characters.
- encoding: lid_fstr() cannot handle strings that contain NUL bytes, such as UTF-16 or UTF-32 encoded text, because it treats the first NUL byte as the end of the input and therefore cannot determine the length correctly. Use lid_fnstr() instead.
- format: Only plain-text input can be processed.
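A caller handling many short strings may want to skip inputs that fall below the minimum length before calling lid_fstr() (the sample run above reports "Insufficient input length." for such input). The sketch below assumes NUL-terminated UTF-8 input and counts code points rather than bytes; the helper names and the MIN_INPUT_CHARS constant are hypothetical, and 25 is only the approximate figure stated above, not an exact library limit.

```c
#include <stddef.h>

/* Approximate minimum from the CAVEATS section; a heuristic only. */
#define MIN_INPUT_CHARS 25

/*
 * Hypothetical helper: count the characters (code points) in a
 * NUL-terminated UTF-8 string by skipping continuation bytes
 * (10xxxxxx). It does not validate the encoding.
 */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;

    return count;
}

/* Nonzero if s likely meets the minimum length for lid_fstr(). */
int long_enough_for_lid(const char *s)
{
    return utf8_char_count(s) >= MIN_INPUT_CHARS;
}
```

Counting code points matters for non-ASCII text: "Grüße" is 7 bytes in UTF-8 but only 5 characters.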
NOTES
The library's version is defined as the macro LID_VERSION, which expands to the quoted version string, e.g. "2.0.2".
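Because LID_VERSION is a string rather than a number, comparing versions requires splitting it into its components. A minimal sketch, assuming the "major.minor.patch" layout of the example value; the function parse_lid_version() is hypothetical. A program would call it as parse_lid_version(LID_VERSION, &major, &minor, &patch).

```c
#include <stdio.h>

/*
 * Hypothetical helper: split a version string such as "2.0.2" into its
 * numeric parts. Returns 1 on success, 0 if the string does not start
 * with three dot-separated integers.
 */
int parse_lid_version(const char *version, int *major, int *minor, int *patch)
{
    return sscanf(version, "%d.%d.%d", major, minor, patch) == 3;
}
```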
SEE ALSO
lid_free(3), lid_strerror(3)
liblid User Manual, liblid Software Specification
"The CERT C Secure Coding Standard", "ERR05-C", p. 549ff.
COPYRIGHT AND LICENSE
Copyright (c) 2008-2009 Lingua-Systems Software GmbH