Man page of auc_nconv(3) and auc_conv(3)

NAME

auc_conv, auc_nconv - automatically convert text to Unicode

SYNOPSIS

 #include <auc.h>

 auc_bytes_t * auc_conv(const char *str,
                        auc_utf_t utf,
                        auc_flag_t flags);
 auc_bytes_t * auc_nconv(const char *bstr,
                         size_t blen,
                         auc_utf_t utf,
                         auc_flag_t flags);

DESCRIPTION

AutoUniConv provides functions that automatically detect and convert text from a variety of charsets to one of the common Unicode Transformation Formats.

auc_conv() automatically converts plain C strings that may contain text encoded in any supported 8-bit charset. The function takes a pointer to a plain C string str, the desired Unicode Transformation Format utf, and a set of flags as arguments.

auc_nconv() automatically converts byte strings that may contain text encoded in any supported charset. Prefer it whenever a string could contain UTF-16 or UTF-32 encoded text, or whenever the length of the byte string is already known. The function takes a pointer to a byte string bstr, its length blen (excluding NUL termination), the desired Unicode Transformation Format utf, and a set of flags as arguments.
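The preference for auc_nconv() follows from how C strings work: UTF-16 and UTF-32 text usually contains 0x00 bytes, and code that relies on NUL termination treats the first of them as the end of the string. The sketch below uses only the standard C library. The array utf16le_hi and both helper functions are illustrative names and not part of AutoUniConv.

```c
#include <stddef.h>
#include <string.h>

/* "Hi" encoded as UTF-16LE: the high byte of each code unit is 0x00. */
static const char utf16le_hi[] = { 0x48, 0x00, 0x69, 0x00 };

/* Number of bytes visible to code that relies on NUL termination. */
static size_t bytes_seen_as_c_string(const char *s)
{
    return strlen(s);
}

/* Number of bytes the buffer really holds; this is the value blen
 * should carry. */
static size_t bytes_in_buffer(void)
{
    return sizeof utf16le_hi;
}
```

Here bytes_seen_as_c_string(utf16le_hi) yields 1 while bytes_in_buffer() yields 4, so a call such as auc_nconv(utf16le_hi, sizeof utf16le_hi, AUC_UTF8, AUC_DEFAULT) hands over the complete text, whereas a function that receives no length has no way to see past the first 0x00 byte.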

Each time either function is invoked, auc_errno(3) is reset to AUC_NOERR.

UNICODE TRANSFORMATION FORMATS (auc_utf_t)

auc_utf_t provides named constants for all Unicode Transformation Formats supported by AutoUniConv. The set comprises the following constants:

AUC_UTF8

UTF-8

AUC_UTF16LE

UTF-16LE

AUC_UTF16BE

UTF-16BE

AUC_UTF32LE

UTF-32LE

AUC_UTF32BE

UTF-32BE
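The LE and BE variants differ only in the order in which the bytes of each code unit are stored. As a small illustration that does not use AutoUniConv itself, the hypothetical helpers put_u16() and put_u32() below store a single code unit in either byte order, using standard C only.

```c
#include <stdint.h>

/* Store one 16-bit code unit in little-endian (big_endian == 0)
 * or big-endian (big_endian != 0) byte order. */
static void put_u16(uint16_t unit, int big_endian, unsigned char out[2])
{
    unsigned char lo = (unsigned char)(unit & 0xFF);
    unsigned char hi = (unsigned char)(unit >> 8);

    out[0] = big_endian ? hi : lo;
    out[1] = big_endian ? lo : hi;
}

/* Store one 32-bit code unit in the requested byte order. */
static void put_u32(uint32_t unit, int big_endian, unsigned char out[4])
{
    int i;

    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (unsigned char)((unit >> shift) & 0xFF);
    }
}
```

For U+20AC (the euro sign), which fits into a single code unit in both UTF-16 and UTF-32, the helpers produce AC 20 (UTF-16LE), 20 AC (UTF-16BE), AC 20 00 00 (UTF-32LE) and 00 00 20 AC (UTF-32BE).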

FLAGS TO ALTER MODE OF OPERATION (auc_flag_t)

AutoUniConv provides a set of named constant flags that are evaluated by auc_conv() and auc_nconv(). These flags alter the functions' mode of operation and may be combined with each other to best suit the user's requirements.

The set comprises the following constants:

AUC_DEFAULT

Default mode of operation. In this mode, the functions attempt to replace characters that could not be decoded with a predefined placeholder, the tilde character ("~"). No error is generated in this case, and no warnings are printed to stderr.

AUC_STRICT

Require the functions to terminate on the first decoding error.

AUC_WARN

Print a warning to stderr whenever a decoding error occurs.

Flags may be combined by simply adding them together (e.g. "AUC_STRICT + AUC_WARN").
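To make the three modes more concrete, the following toy model mimics the behavior described above. It is not AutoUniConv's implementation: the TOY_* constants and their values are hypothetical, the sketch assumes that each flag occupies its own bit, and toy_decode() simply treats every byte above 0x7F as undecodable.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag values, NOT taken from auc.h. */
enum { TOY_DEFAULT = 0, TOY_STRICT = 1, TOY_WARN = 2 };

/*
 * Toy decoder: copies 7-bit ASCII bytes to out and treats any byte
 * above 0x7F as a decoding error. out must have room for n bytes.
 * Returns the number of bytes written, or -1 if TOY_STRICT is set
 * and a decoding error occurs.
 */
static long toy_decode(const unsigned char *in, size_t n,
                       unsigned flags, char *out)
{
    size_t i, written = 0;

    for (i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[written++] = (char)in[i];
            continue;
        }
        if (flags & TOY_WARN)
            fprintf(stderr, "warning: undecodable byte 0x%02X at offset %lu\n",
                    (unsigned)in[i], (unsigned long)i);
        if (flags & TOY_STRICT)
            return -1;              /* stop at the first error */
        out[written++] = '~';       /* default: insert placeholder */
    }
    return (long)written;
}
```

With the input bytes 'a', 0xFF, 'b', TOY_DEFAULT yields "a~b", while TOY_STRICT + TOY_WARN prints one warning and returns -1. Under the single-bit assumption, adding two distinct flags gives the same value as a bitwise OR.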

RETURN VALUE (auc_bytes_t)

Both auc_conv() and auc_nconv() return a pointer to an auc_bytes_t data structure, which comprises the byte string bytes encoded in the requested Unicode Transformation Format, the byte string's length len, and the format used, utf.

The data structure is defined as follows:

 typedef struct
 {
     char     *bytes;      /* byte string */
     size_t    len;        /* its length  */
     auc_utf_t utf;        /* used UTF    */
 } auc_bytes_t;

The memory allocated for an auc_bytes_t structure should be freed using auc_free_bytes_t(3).

If an error occurs during processing, the functions return NULL and set auc_errno(3) to an appropriate value.

For in-depth information on AutoUniConv's error handling facilities, see auc_errno(3).
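Because UTF-16 and UTF-32 results can contain 0x00 bytes, the bytes member is best processed using len rather than strlen(). The sketch below shows one way to do that. So that it compiles without the library, it uses stand-in types (ex_utf_t, ex_bytes_t) that mirror the documented layout. These names and the three helpers are assumptions made for illustration, and real code would use the types from auc.h instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-ins mirroring the documented layout; NOT the real auc.h types. */
typedef enum {
    EX_UTF8, EX_UTF16LE, EX_UTF16BE, EX_UTF32LE, EX_UTF32BE
} ex_utf_t;

typedef struct {
    char    *bytes;   /* byte string */
    size_t   len;     /* its length  */
    ex_utf_t utf;     /* used UTF    */
} ex_bytes_t;

/* Size of one code unit in bytes for the given format. */
static size_t ex_unit_size(ex_utf_t utf)
{
    switch (utf) {
    case EX_UTF16LE: case EX_UTF16BE: return 2;
    case EX_UTF32LE: case EX_UTF32BE: return 4;
    default:                          return 1;   /* UTF-8 */
    }
}

/* Number of code units, derived from len instead of strlen(). */
static size_t ex_unit_count(const ex_bytes_t *b)
{
    return b ? b->len / ex_unit_size(b->utf) : 0;
}

/* Write exactly len bytes to fp; returns 0 on success and -1 on a
 * NULL result or a short write. */
static int ex_write(const ex_bytes_t *b, FILE *fp)
{
    if (b == NULL || b->bytes == NULL)
        return -1;
    return fwrite(b->bytes, 1, b->len, fp) == b->len ? 0 : -1;
}
```

Checking for NULL first mirrors the documented error convention: a failed conversion returns NULL, so nothing should be dereferenced before that check.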

SUPPORTED CHARSETS

For a list of all supported charsets, see the User Manual.

NOTES

auc_conv() and auc_nconv() are thread-safe.

SEE ALSO

auc_free_bytes_t(3), auc_errno(3), auc_strerror(3), auc_version(3), auc_version_string(3), auc_utf_t_to_name(3)

AutoUniConv User Manual, AutoUniConv Software Specification

http://www.lingua-systems.com/unicode-converter/autouniconv-library/