Man page of Lingua::Lid(3)
NAME
Lingua::Lid - Interface to the language and encoding identifier "lid"
SYNOPSIS
    use Lingua::Lid qw/:all/;

    # Identify the language and character encoding of...

    # ...a string
    $result = lid_fstr("This is a short English sentence.");

    # ...a plain text file
    $result = lid_ffile("/path/to/a/file.txt");

    print "Lingua::Lid v$Lingua::Lid::VERSION, using lid v", lid_version(), "\n";
DESCRIPTION
The Perl extension Lingua::Lid provides a Perl interface to Lingua-Systems' language and character encoding identification library lid, which is required to build and use this extension.
The interface is implemented in XS and makes the functionality of the lid C library available to Perl applications and modules in an easy-to-use way.
This man page covers the usage of the Lingua::Lid Perl extension only. For more information on lid and a list of supported languages and character encodings, see its manual, which is included in its distribution and freely available at http://www.lingua-systems.com/language-identifier/lid-library/.
Lingua::Lid aims to stay as close to the C interface as is reasonable, while respecting common Perl conventions. See "COMPARISON TO THE C INTERFACE" for details.
EXPORTS
No symbols are exported by default.
Each function you need must either be requested for import explicitly, or the
export tag :all may be used to import all provided functions:
    use Lingua::Lid qw/lid_ffile lid_fstr/;

    # or

    use Lingua::Lid qw/:all/;
FUNCTIONS
lid_fstr( $string )
Mnemonic: "Language and encoding identification... from string"
This function takes a $string as an argument and identifies its language
and encoding.
It returns a hash reference containing the results. See
IDENTIFICATION RESULTS DATA STRUCTURE for details.
If an error occurs, the function returns undef and sets
$Lingua::Lid::errstr to an appropriate message describing the error.
lid_ffile( $file )
Mnemonic: "Language and encoding identification... from file"
This function takes the path of a plain text $file as an argument and
identifies its language and encoding. It returns a hash reference containing
the results. See IDENTIFICATION RESULTS DATA STRUCTURE for details.
If an error occurs, the function returns undef and sets
$Lingua::Lid::errstr to an appropriate message describing the error.
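As a sketch of how lid_ffile() might be applied to several files, the hypothetical helper group_files_by_language() below (not part of Lingua::Lid) groups file paths by the ISO 639-3 code reported for each one and records failures together with $Lingua::Lid::errstr. It assumes the identification function is passed in as a code reference (in real use, \&lid_ffile), which keeps the sketch independent of the installed library.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::Lid: group file paths by the
# ISO 639-3 code that an identification function reports for them.
# $identify is expected to behave like lid_ffile(): return a hash reference
# on success, or undef after setting $Lingua::Lid::errstr on failure.
sub group_files_by_language {
    my ($identify, @paths) = @_;
    my (%by_isocode, %errors);

    for my $path (@paths) {
        my $result = $identify->($path);
        if (defined $result) {
            push @{ $by_isocode{ $result->{isocode} } }, $path;
        }
        else {
            # Remember why this file could not be identified.
            $errors{$path} = $Lingua::Lid::errstr;
        }
    }

    return (\%by_isocode, \%errors);
}
```

With Lingua::Lid loaded, it would be called as group_files_by_language(\&lid_ffile, @paths).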
lid_version( )
This function returns the version of the underlying lid C library.
IDENTIFICATION RESULTS DATA STRUCTURE
The functions lid_fstr() and lid_ffile() return a hash reference containing the results of the language and encoding identification.
The hash reference contains the following keys:
- language
  The language's name (in English), e.g. "German", "French", "English".
- isocode
  The language's ISO 639-3 code, e.g. "deu", "fra", "eng".
- encoding
  The character encoding, e.g. "UTF-8", "ISO-8859-1", "UTF-32BE".
For example, identifying the sentence "This is a short English sentence." yields:

    $result = {
        'language' => 'English',
        'isocode'  => 'eng',
        'encoding' => 'ASCII'
    };
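Because the result is an ordinary hash reference, it can be handled with plain Perl. The hypothetical helper format_result() below (not part of Lingua::Lid) renders a result in the "language - isocode - encoding" form printed by the program in EXAMPLES, and rejects values that are not hash references or lack one of the three documented keys.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper, not part of Lingua::Lid: turn a result hash reference
# as returned by lid_fstr() or lid_ffile() into "language - isocode - encoding".
sub format_result {
    my ($result) = @_;

    croak "Not a hash reference" unless ref $result eq 'HASH';

    # The three keys documented for the results data structure.
    my @keys = qw(language isocode encoding);
    for my $key (@keys) {
        croak "Missing key '$key' in result" unless exists $result->{$key};
    }

    # Hash slice: the values in the order of @keys.
    return join(' - ', @{$result}{@keys});
}
```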
ERROR HANDLING
The functions lid_fstr() and lid_ffile() return undef if an error occurs
and set Lingua::Lid's package variable $errstr ($Lingua::Lid::errstr)
to an appropriate message describing the error.
Have a look at lid's manual for a list of all error messages.
- NOTE:
  The $Lingua::Lid::errstr variable is reset to undef whenever lid_fstr() or lid_ffile() is called.
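Callers who prefer exceptions to checking for undef can wrap the functions. The hypothetical helper lid_or_croak() below (not part of Lingua::Lid) calls a function such as \&lid_fstr or \&lid_ffile and, if it returns undef, dies with the message found in $Lingua::Lid::errstr. Since that variable is reset on every call, the helper reads it immediately after the failing call.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper, not part of Lingua::Lid: call an identification
# function such as \&lid_fstr or \&lid_ffile and turn its
# "undef plus $Lingua::Lid::errstr" convention into an exception.
sub lid_or_croak {
    my ($identify, @args) = @_;

    my $result = $identify->(@args);
    return $result if defined $result;

    # Read the message right away: the next lid_fstr()/lid_ffile() call resets it.
    my $err = defined $Lingua::Lid::errstr ? $Lingua::Lid::errstr : 'unknown error';
    croak "Language identification failed: $err";
}
```

A typical call would be my $r = eval { lid_or_croak(\&lid_fstr, $text) } or warn $@;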
COMPARISON TO THE C INTERFACE
Lingua::Lid's functions lid_fstr() and lid_ffile() behave like their lid counterparts in C, except for how results and errors are returned (see below).
The C functions lid_fnstr() and lid_fwstr() are not needed; use the Lingua::Lid function lid_fstr() in any Perl code instead.
The C function lid_strerror() and the global C variable lid_errno are not
needed. Rather than returning a NULL pointer, Lingua::Lid's
lid_fstr() and lid_ffile() return undef on errors and set
$Lingua::Lid::errstr to an appropriate message describing the error.
The C macro LID_VERSION is not available in Lingua::Lid; use
lid_version() instead.
Lingua::Lid's results data structure follows the C lid_t structure
as closely as possible. See "IDENTIFICATION RESULTS DATA STRUCTURE" above.
EXAMPLES
    use strict;
    use Lingua::Lid qw/lid_fstr lid_version/;

    print "Lingua::Lid v$Lingua::Lid::VERSION, using lid v", lid_version(), "\n";

    my @strings = (
        "This is a short English sentence.",
        "Dies ist ein kurzer deutscher Satz.",
        "Too short."
    );

    foreach my $string (@strings) {
        if (my $r = lid_fstr($string)) {
            print join(" - ", $r->{language}, $r->{isocode}, $r->{encoding}), "\n";
        }
        else {
            print "lid_fstr() failed: $Lingua::Lid::errstr\n";
        }
    }
The program above produces the following output:
    Lingua::Lid v0.01, using lid v2.0.2
    English - eng - ASCII
    German - deu - ASCII
    lid_fstr() failed: Insufficient input length
BUGS
None known.
Please report bugs either via CPAN's bug tracker or by e-mail to <perl@lingua-systems.com>.
SEE ALSO
- Lingua::Lid's website: http://www.lingua-systems.com/language-identifier/Lingua-Lid-Perl-extension/
- lid's website: http://www.lingua-systems.com/language-identifier/lid-library/
- lid's manual (available in English and German)
AUTHOR
Alex Linke, <alinke@lingua-systems.com>
COPYRIGHT AND LICENSE
Copyright (C) 2009 Lingua-Systems Software GmbH
This extension is free software. It may be used, redistributed and/or modified under the terms of the zlib license. For details, see the full text of the license in the file LICENSE.