crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Data::Tagger Class Reference

Multilingual POS (part of speech) tagger using Wapiti by Thomas Lavergne.

#include <Tagger.hpp>

Classes

class  Exception
 POS (part of speech)-tagging exception.
 

Construction and Destruction

 Tagger ()=default
 Default constructor.
 
virtual ~Tagger ()
 Destructor freeing the POS-tagging model, if one has been loaded.
 

Getter

static constexpr std::string_view getVersion ()
 Gets the underlying version of Wapiti.
 

Setters

void setPureMaxEntMode (bool isPureMaxEntMode)
 Sets whether the pure maxent mode of Wapiti is enabled.
 
void setPosteriorDecoding (bool isPosteriorDecoding)
 Sets whether posterior decoding is used instead of the classical Viterbi decoding.
 
void setPartlyLabeledInput (bool isPartlyLabeledInput)
 Sets whether the input is already partly labelled.
 

Model and Tagging

void loadModel (const std::string &modelFile)
 Loads a POS-tagging model trained by using Wapiti.
 
void label (std::vector< std::string >::iterator sentenceBegin, std::vector< std::string >::iterator sentenceEnd)
 POS (part of speech)-tags a sentence.
 

Copy and Move

The class is not copyable, only (default) moveable.

 Tagger (Tagger &)=delete
 Deleted copy constructor.
 
Tagger & operator= (Tagger &)=delete
 Deleted copy assignment operator.
 
 Tagger (Tagger &&)=default
 Default move constructor.
 
Tagger & operator= (Tagger &&)=default
 Default move assignment operator.
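
The combination of deleted copy operations and defaulted move operations can be checked at compile time. The following is a minimal sketch using a hypothetical stand-in class, `TaggerLike`, which only mirrors the special member functions listed above. It is not the real `crawlservpp::Data::Tagger`, whose destructor additionally frees the model.

```cpp
#include <type_traits>

// Hypothetical stand-in that only mirrors the special member functions
// listed above; it is NOT the real crawlservpp::Data::Tagger.
class TaggerLike {
public:
    TaggerLike() = default;
    virtual ~TaggerLike() = default;

    TaggerLike(TaggerLike&) = delete;              // deleted copy constructor
    TaggerLike& operator=(TaggerLike&) = delete;   // deleted copy assignment
    TaggerLike(TaggerLike&&) = default;            // default move constructor
    TaggerLike& operator=(TaggerLike&&) = default; // default move assignment
};

// Copying is rejected, moving is accepted.
static_assert(!std::is_copy_constructible_v<TaggerLike>, "must not be copyable");
static_assert(!std::is_copy_assignable_v<TaggerLike>, "must not be copy-assignable");
static_assert(std::is_move_constructible_v<TaggerLike>, "must be move-constructible");
static_assert(std::is_move_assignable_v<TaggerLike>, "must be move-assignable");
```

In practice this means a tagger can be stored in containers or returned from functions by moving it, but it cannot be duplicated.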
 

Detailed Description

Multilingual POS (part of speech) tagger using Wapiti by Thomas Lavergne.

Based on a minimized version of Wapiti.

Source: https://github.com/Jekub/Wapiti

Paper: Lavergne, Thomas / Cappé, Olivier / Yvon, François: Practical Very Large Scale CRFs, in: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, 11–16 July 2010, pp. 504–513.

Use the original wapiti program to train models for the tagger.

See its homepage for more information.
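
A typical workflow might look like the sketch below: configure the tagger, load a trained model once, then label each sentence in place. The function is written as a template so that it compiles without `Tagger.hpp`; it only calls member functions documented on this page. `tagSentences` and `FakeTagger` are hypothetical names introduced for illustration, and setting options before loading the model is an assumption about the safest call order, not something this page states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Loads a model once and tags every sentence in place.
// TaggerT is expected to offer the member functions documented on this
// page (e.g. crawlservpp::Data::Tagger); this helper itself is hypothetical.
template<class TaggerT>
void tagSentences(
        TaggerT& tagger,
        const std::string& modelFile,
        std::vector<std::vector<std::string>>& sentences
) {
    // Assumption: options are set before the model is loaded.
    tagger.setPosteriorDecoding(false); // keep classical Viterbi decoding
    tagger.loadModel(modelFile);        // the real class may throw Tagger::Exception

    for(auto& sentence : sentences) {
        tagger.label(sentence.begin(), sentence.end());
    }
}

// Hypothetical stand-in so that the sketch runs without crawlserv++ or a
// trained model: it appends the dummy tag "X" to every token.
struct FakeTagger {
    bool posterior{false};
    bool loaded{false};

    void setPosteriorDecoding(bool isPosteriorDecoding) { posterior = isPosteriorDecoding; }
    void loadModel(const std::string& /*modelFile*/) { loaded = true; }

    void label(
            std::vector<std::string>::iterator sentenceBegin,
            std::vector<std::string>::iterator sentenceEnd
    ) {
        assert(loaded); // the stand-in requires a "loaded" model
        for(auto it = sentenceBegin; it != sentenceEnd; ++it) {
            *it += " X";
        }
    }
};
```

With the real class, `FakeTagger` would be replaced by `crawlservpp::Data::Tagger` and the model file would be one trained with the original wapiti program.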

Constructor & Destructor Documentation

◆ Tagger() [1/3]

crawlservpp::Data::Tagger::Tagger ( )
default

Default constructor.

◆ ~Tagger()

crawlservpp::Data::Tagger::~Tagger ( )
inline virtual

Destructor freeing the POS-tagging model, if one has been loaded.

◆ Tagger() [2/3]

crawlservpp::Data::Tagger::Tagger ( Tagger &  )
delete

Deleted copy constructor.

◆ Tagger() [3/3]

crawlservpp::Data::Tagger::Tagger ( Tagger &&  )
default

Default move constructor.

Member Function Documentation

◆ getVersion()

constexpr std::string_view crawlservpp::Data::Tagger::getVersion ( )
inline static

Gets the underlying version of Wapiti.

Returns
The version of Wapiti on which the POS (part of speech) tagger is built.

◆ label()

void crawlservpp::Data::Tagger::label ( std::vector< std::string >::iterator  sentenceBegin,
std::vector< std::string >::iterator  sentenceEnd 
)
inline

POS (part of speech)-tags a sentence.

The tags will be added to each token of the specified sentence, separated by a space.

See the manual of Wapiti for more information.

Parameters
sentenceBegin  Iterator pointing to the beginning of the sentence to be tagged.
sentenceEnd  Iterator pointing to the end of the sentence to be tagged.
Exceptions
Tagger::Exception  if an error occurs while POS-tagging the sentence.
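
Because the tag is appended to each token after a space, callers usually need to separate the two again. The following is a minimal sketch of such a helper. `splitTaggedToken` is a hypothetical name, and splitting at the last space is an assumption that keeps any earlier columns of the token together with the token itself.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits a labelled token such as "house NN" into {"house", "NN"}.
// Assumption: the tag is everything after the LAST space in the string.
// If there is no space, the tag part is returned empty.
inline std::pair<std::string, std::string> splitTaggedToken(const std::string& tagged) {
    const auto pos = tagged.rfind(' ');

    if(pos == std::string::npos) {
        return {tagged, std::string{}};
    }

    return {tagged.substr(0, pos), tagged.substr(pos + 1)};
}

// Small self-check showing the intended use; the tag name is a placeholder.
inline void demoSplitTaggedToken() {
    const auto [token, tag] = splitTaggedToken("house NN");

    assert(token == "house");
    assert(tag == "NN");
}
```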

◆ loadModel()

void crawlservpp::Data::Tagger::loadModel ( const std::string &  modelFile)
inline

Loads a POS-tagging model trained by using Wapiti.

See the manual of Wapiti for more information.

Parameters
modelFile  Name (including path) of the model file to be used.
Exceptions
Tagger::Exception  if the model file cannot be opened, or if the model cannot be loaded.

◆ operator=() [1/2]

Tagger& crawlservpp::Data::Tagger::operator= ( Tagger &  )
delete

Deleted copy assignment operator.

◆ operator=() [2/2]

Tagger& crawlservpp::Data::Tagger::operator= ( Tagger &&  )
default

Default move assignment operator.

◆ setPartlyLabeledInput()

void crawlservpp::Data::Tagger::setPartlyLabeledInput ( bool  isPartlyLabeledInput)
inline

Sets whether the input is already partly labelled.

Already existing labels will be kept and used to improve the POS tagging of the remaining tokens.

The labels need to be separated from the tokens by either a space or a tab character.

See the manual of Wapiti for more information.
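
Partly labelled input can be prepared by appending each known label to its token. The following is a minimal sketch under stated assumptions: `makePartlyLabelled` is a hypothetical helper, a space is used as the separator, and tokens without a known label are passed through unchanged (how Wapiti expects unlabelled tokens to be marked is not stated on this page).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Builds a sentence for partly labelled input: tokens with a known label
// become "token LABEL", all other tokens are copied as they are.
// Assumption: leaving a token without a label marks it as "to be tagged".
inline std::vector<std::string> makePartlyLabelled(
        const std::vector<std::string>& tokens,
        const std::vector<std::optional<std::string>>& knownLabels
) {
    assert(tokens.size() == knownLabels.size()); // one (optional) label per token

    std::vector<std::string> result;
    result.reserve(tokens.size());

    for(std::size_t i = 0; i < tokens.size(); ++i) {
        std::string entry{tokens[i]};

        if(knownLabels[i]) {
            entry += ' '; // a tab character would be accepted as well
            entry += *knownLabels[i];
        }

        result.push_back(std::move(entry));
    }

    return result;
}
```

The resulting vector would then be passed to `label()` after calling `setPartlyLabeledInput(true)`.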

◆ setPosteriorDecoding()

void crawlservpp::Data::Tagger::setPosteriorDecoding ( bool  isPosteriorDecoding)
inline

Sets whether posterior decoding is used instead of the classical Viterbi decoding.

See the manual of Wapiti for more information.

Note
Posterior decoding is generally slower, but tends to be more accurate.

◆ setPureMaxEntMode()

void crawlservpp::Data::Tagger::setPureMaxEntMode ( bool  isPureMaxEntMode)
inline

Sets whether the pure maxent mode of Wapiti is enabled.

See the manual of Wapiti for more information.

Parameters
isPureMaxEntMode  Set to true to enable the pure maxent mode of Wapiti.

The documentation for this class was generated from the following file: