crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
Multilingual POS (part of speech) tagger using Wapiti by Thomas Lavergne.
#include <Tagger.hpp>
Classes
class Exception | POS (part of speech)-tagging exception.

Construction and Destruction
Tagger() = default | Default constructor.
virtual ~Tagger() | Destructor freeing the POS-tagging model, if one has been loaded.

Getter
static constexpr std::string_view getVersion() | Gets the underlying version of Wapiti.

Setters
void setPureMaxEntMode(bool isPureMaxEntMode) | Sets whether the pure maxent mode of Wapiti is enabled.
void setPosteriorDecoding(bool isPosteriorDecoding) | Sets whether posterior decoding is used instead of the classical Viterbi decoding.
void setPartlyLabeledInput(bool isPartlyLabeledInput) | Sets whether the input is already partly labelled.

Model and Tagging
void loadModel(const std::string& modelFile) | Loads a POS-tagging model trained with Wapiti.
void label(std::vector<std::string>::iterator sentenceBegin, std::vector<std::string>::iterator sentenceEnd) | POS (part of speech)-tags a sentence.

Copy and Move
Tagger(Tagger&) = delete | Deleted copy constructor.
Tagger& operator=(Tagger&) = delete | Deleted copy assignment operator.
Tagger(Tagger&&) = default | Default move constructor.
Tagger& operator=(Tagger&&) = default | Default move assignment operator.

Multilingual POS (part of speech) tagger using Wapiti by Thomas Lavergne.
Based on a minimized version of Wapiti.
Source: https://github.com/Jekub/Wapiti
Paper: Lavergne, Thomas / Cappé, Olivier / Yvon, François: Practical Very Large Scale CRFs, in: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, 11–16 July 2010, pp. 504–513.
Use the original Wapiti program to train models for the tagger.
See its homepage for more information.

Tagger() [default]
Default constructor.

~Tagger() [inline, virtual]
Destructor freeing the POS-tagging model, if one has been loaded.

Tagger(Tagger&) [delete]
Deleted copy constructor.

Tagger(Tagger&&) [default]
Default move constructor.
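The deleted copy operations, the defaulted move operations and the model-freeing destructor together make the tagger a move-only owner of its model. The following minimal sketch shows that ownership pattern in isolation. `ModelHandle`, `ModelDeleter`, `freedModels` and `MoveOnlyTagger` are hypothetical names, not part of crawlserv++ or Wapiti, and the sketch assumes ownership through `std::unique_ptr`; the actual class may manage its model differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a loaded POS-tagging model (not a Wapiti type).
struct ModelHandle {
    std::string name;
};

// Hypothetical counter, so that the example can observe when a model is freed.
inline int freedModels = 0;

// Deleter playing the part of the destructor's clean-up code.
struct ModelDeleter {
    void operator()(ModelHandle* model) const {
        ++freedModels;
        delete model;
    }
};

class MoveOnlyTagger {
public:
    MoveOnlyTagger() = default;

    // Frees the model, if one has been loaded (via the unique_ptr's deleter).
    virtual ~MoveOnlyTagger() = default;

    // Copying is forbidden: two objects must never own (and free) the same model.
    MoveOnlyTagger(const MoveOnlyTagger&) = delete;
    MoveOnlyTagger& operator=(const MoveOnlyTagger&) = delete;

    // Moving transfers ownership; the moved-from object is left without a model.
    MoveOnlyTagger(MoveOnlyTagger&&) = default;
    MoveOnlyTagger& operator=(MoveOnlyTagger&&) = default;

    // Replaces the current model, freeing the previous one, if any.
    void loadModel(const std::string& name) {
        assert(!name.empty());
        model.reset(new ModelHandle{name});
    }

    bool hasModel() const { return model != nullptr; }

private:
    std::unique_ptr<ModelHandle, ModelDeleter> model;
};

// The copy/move properties are checked at compile time.
static_assert(!std::is_copy_constructible_v<MoveOnlyTagger>);
static_assert(std::is_move_constructible_v<MoveOnlyTagger>);
```

Copying such an object would make two objects free the same model; deleting the copy operations turns that mistake into a compile-time error, while moving still allows returning the object from a function or storing it in a container.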

static constexpr std::string_view getVersion() [inline, static]
Gets the underlying version of Wapiti.

void label(std::vector<std::string>::iterator sentenceBegin, std::vector<std::string>::iterator sentenceEnd) [inline]
POS (part of speech)-tags a sentence.
The tags are appended to each token of the specified sentence, separated from it by a space.
See the manual of Wapiti for more information.
Parameters
sentenceBegin | Iterator pointing to the beginning of the sentence to be tagged.
sentenceEnd | Iterator pointing to the end of the sentence to be tagged.
Exceptions
Tagger::Exception | if an error occurs while POS-tagging the sentence.
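label() modifies the tokens in place, appending each token's tag after a space. The sketch below shows only that in-place output format, with the tags passed in by the caller instead of being predicted by a model. `appendTags` and `splitTaggedToken` are hypothetical helpers, and the tags in the example (`DT`, `NN`, ...) are placeholders; the actual tag set depends on the trained model.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Appends one tag to each token in [sentenceBegin, sentenceEnd), separated by a
// space. Here the caller supplies the tags; a real tagger would predict them.
void appendTags(std::vector<std::string>::iterator sentenceBegin,
                std::vector<std::string>::iterator sentenceEnd,
                const std::vector<std::string>& tags) {
    assert(static_cast<std::size_t>(std::distance(sentenceBegin, sentenceEnd)) == tags.size());

    auto tag = tags.cbegin();

    for (auto token = sentenceBegin; token != sentenceEnd; ++token, ++tag) {
        token->push_back(' ');
        token->append(*tag);
    }
}

// Splits a tagged token such as "house NN" back into token and tag.
// Splitting at the last space assumes that the tag itself contains no spaces.
std::pair<std::string, std::string> splitTaggedToken(const std::string& tagged) {
    const auto pos = tagged.rfind(' ');

    if (pos == std::string::npos) {
        return {tagged, ""}; // no tag present
    }

    return {tagged.substr(0, pos), tagged.substr(pos + 1)};
}
```

For example, tagging `{"The", "house"}` with `{"DT", "NN"}` turns the tokens into `"The DT"` and `"house NN"`, and `splitTaggedToken("house NN")` yields the pair `{"house", "NN"}`.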

void loadModel(const std::string& modelFile) [inline]
Loads a POS-tagging model trained with Wapiti.
See the manual of Wapiti for more information.
Parameters
modelFile | Name (including path) of the model file to be used.
Exceptions
Tagger::Exception | if the model file cannot be opened, or if the model cannot be loaded.
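The error contract of loadModel() can be illustrated with a small stand-alone loader. In this sketch, `ModelLoader` is hypothetical, its nested `Exception` is assumed to derive from `std::runtime_error` (the documentation above does not state the base class of Tagger::Exception), and an empty file merely stands in for a model that cannot be loaded; no actual Wapiti model parsing is performed.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical loader mirroring the documented error contract of loadModel().
class ModelLoader {
public:
    // Assumption: a class-specific exception derived from std::runtime_error.
    class Exception : public std::runtime_error {
    public:
        explicit Exception(const std::string& description)
            : std::runtime_error(description) {}
    };

    // Reads the model file into memory. Throws if the file cannot be opened,
    // or if it is empty (standing in for a model that cannot be loaded).
    void loadModel(const std::string& modelFile) {
        std::ifstream in(modelFile, std::ios::binary);

        if (!in.is_open()) {
            throw Exception("Could not open model file \"" + modelFile + "\"");
        }

        std::string contents;

        contents.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());

        if (contents.empty()) {
            throw Exception("Could not load model from \"" + modelFile + "\"");
        }

        model = std::move(contents);

        assert(isLoaded());
    }

    bool isLoaded() const { return !model.empty(); }

private:
    std::string model; // raw file contents; parsing a real model is out of scope
};
```

A caller can wrap the call in a try block and catch the class-specific exception to report a missing or corrupt model file without terminating the program.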

Tagger& operator=(Tagger&&) [default]
Default move assignment operator.

void setPartlyLabeledInput(bool isPartlyLabeledInput) [inline]
Sets whether the input is already partly labelled.
Already existing labels will be kept and used to improve the POS tagging of the remaining tokens.
The labels need to be separated from the tokens by either a space or a tab.
See the manual of Wapiti for more information.
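With partly labelled input enabled, a token may or may not already carry a label. The following sketch parses and builds such tokens, assuming one token per string (as with label()) and a label that follows the token after the first space or tab. `InputToken`, `parseInputToken` and `makeInputLine` are hypothetical names, not part of the Tagger API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// A token that may already carry a label ("house NN") or still needs one ("house").
struct InputToken {
    std::string token;
    std::optional<std::string> label; // empty if the token still needs tagging
};

// Splits at the first space or tab, the two separators the documentation allows.
InputToken parseInputToken(const std::string& line) {
    const auto pos = line.find_first_of(" \t");

    if (pos == std::string::npos) {
        return {line, std::nullopt};
    }

    return {line.substr(0, pos), line.substr(pos + 1)};
}

// Builds a (possibly labelled) token; the token must not contain a separator itself.
std::string makeInputLine(const std::string& token, const std::optional<std::string>& label) {
    assert(token.find_first_of(" \t") == std::string::npos);

    return label ? token + ' ' + *label : token;
}
```

For instance, both `"house\tNN"` and `"house NN"` parse to the token `house` with the label `NN`, whereas `"is"` is returned without a label and would be left to the tagger.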

void setPosteriorDecoding(bool isPosteriorDecoding) [inline]
Sets whether posterior decoding is used instead of the classical Viterbi decoding.
See the manual of Wapiti for more information.
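The difference between the two decoding strategies can be shown on a tiny linear-chain model with non-negative scores for each label at each position (`emit`) and for each pair of consecutive labels (`trans`). Viterbi decoding returns the single highest-scoring label sequence, while posterior decoding uses the forward-backward algorithm to choose, at every position separately, the label with the highest marginal probability. This is a textbook sketch with hypothetical names (`viterbi`, `posterior`, `Matrix`, `Labels`), not Wapiti's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// emit[t][y]:  non-negative score of label y at position t.
// trans[a][b]: non-negative score of label a being followed by label b.
// Plain products keep the sketch short; real implementations use log space
// or rescaling to avoid numerical underflow on long sentences.
using Matrix = std::vector<std::vector<double>>;
using Labels = std::vector<std::size_t>;

// Viterbi decoding: the single label sequence with the highest total score.
Labels viterbi(const Matrix& emit, const Matrix& trans) {
    const std::size_t n = emit.size();
    const std::size_t k = trans.size();
    assert(n > 0 && k > 0);

    Matrix best(n, std::vector<double>(k, 0.0));
    std::vector<Labels> from(n, Labels(k, 0));
    best[0] = emit[0];

    for (std::size_t t = 1; t < n; ++t) {
        for (std::size_t y = 0; y < k; ++y) {
            for (std::size_t prev = 0; prev < k; ++prev) {
                const double score = best[t - 1][prev] * trans[prev][y] * emit[t][y];

                if (score > best[t][y]) {
                    best[t][y] = score;
                    from[t][y] = prev; // remember the best predecessor
                }
            }
        }
    }

    // Pick the best final label, then follow the back-pointers.
    Labels path(n, 0);

    for (std::size_t y = 1; y < k; ++y) {
        if (best[n - 1][y] > best[n - 1][path[n - 1]]) {
            path[n - 1] = y;
        }
    }

    for (std::size_t t = n - 1; t > 0; --t) {
        path[t - 1] = from[t][path[t]];
    }

    return path;
}

// Posterior decoding: per position, the label with the highest marginal
// probability, computed with the forward-backward algorithm.
Labels posterior(const Matrix& emit, const Matrix& trans) {
    const std::size_t n = emit.size();
    const std::size_t k = trans.size();
    assert(n > 0 && k > 0);

    Matrix alpha(n, std::vector<double>(k, 0.0)); // forward scores
    Matrix beta(n, std::vector<double>(k, 0.0));  // backward scores
    alpha[0] = emit[0];
    beta[n - 1].assign(k, 1.0);

    for (std::size_t t = 1; t < n; ++t) {
        for (std::size_t y = 0; y < k; ++y) {
            for (std::size_t prev = 0; prev < k; ++prev) {
                alpha[t][y] += alpha[t - 1][prev] * trans[prev][y] * emit[t][y];
            }
        }
    }

    for (std::size_t t = n - 1; t > 0; --t) {
        for (std::size_t y = 0; y < k; ++y) {
            for (std::size_t next = 0; next < k; ++next) {
                beta[t - 1][y] += trans[y][next] * emit[t][next] * beta[t][next];
            }
        }
    }

    // alpha * beta is proportional to the marginal; the normalizer is shared.
    Labels labels(n, 0);

    for (std::size_t t = 0; t < n; ++t) {
        for (std::size_t y = 1; y < k; ++y) {
            if (alpha[t][y] * beta[t][y] > alpha[t][labels[t]] * beta[t][labels[t]]) {
                labels[t] = y;
            }
        }
    }

    return labels;
}
```

With `emit = {{1, 1, 1}, {1, 1, 1}}` and `trans = {{0.4, 0, 0}, {0, 0.3, 0.3}, {0, 0, 0}}`, Viterbi returns `{0, 0}` (score 0.4), whereas posterior decoding returns `{1, 0}`: label 1 has a marginal mass of 0.6 at the first position, and label 0 has 0.4 at the second. The posterior result even contains the transition from label 1 to label 0, whose score is zero, which illustrates that per-token marginals need not combine into the best, or even a possible, sequence.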

void setPureMaxEntMode(bool isPureMaxEntMode) [inline]
Sets whether the pure maxent mode of Wapiti is enabled.
See the manual of Wapiti for more information.
Parameters
isPureMaxEntMode | Set to true to enable the pure maxent mode of Wapiti.
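The documentation does not spell out what the pure maxent mode changes. Assuming, as the name suggests, that the model then acts as a plain maximum-entropy (multinomial logistic regression) classifier that labels each token independently, without scores for consecutive label pairs, the decision rule reduces to a per-token softmax followed by an argmax. The following sketch illustrates only that assumption; `softmax` and `labelIndependently` are hypothetical names, not Wapiti functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Per-token distribution of a maximum-entropy classifier:
// p(y | x) = exp(score_y) / sum over y' of exp(score_y').
std::vector<double> softmax(const std::vector<double>& scores) {
    assert(!scores.empty());

    const double maxScore = *std::max_element(scores.begin(), scores.end());
    std::vector<double> probabilities(scores.size());
    double sum = 0.0;

    for (std::size_t i = 0; i < scores.size(); ++i) {
        probabilities[i] = std::exp(scores[i] - maxScore); // shift for numerical stability
        sum += probabilities[i];
    }

    for (double& p : probabilities) {
        p /= sum;
    }

    return probabilities;
}

// Labels every token on its own: unlike in a linear-chain CRF, no scores for
// neighbouring label pairs are involved. (The argmax of the probabilities equals
// the argmax of the raw scores; the softmax only makes the probabilities explicit.)
std::vector<std::size_t> labelIndependently(const std::vector<std::vector<double>>& tokenScores) {
    std::vector<std::size_t> labels;
    labels.reserve(tokenScores.size());

    for (const auto& scores : tokenScores) {
        const auto probabilities = softmax(scores);
        const auto best = std::max_element(probabilities.begin(), probabilities.end());

        labels.push_back(static_cast<std::size_t>(std::distance(probabilities.begin(), best)));
    }

    return labels;
}
```

For example, the token scores `{2.0, 1.0, 0.5}` and `{0.1, 3.0, 0.2}` yield the labels `{0, 1}`, regardless of which labels end up adjacent to each other.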