crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Module::Analyzer Namespace Reference

Namespace for analyzer classes.

Namespaces

 Algo
 Namespace for algorithm classes.
 

Classes

class  Config
 Abstract configuration for analyzers, to be implemented by algorithm classes.

class  Database
 Class providing database functionality for analyzer threads by implementing Wrapper::Database.

class  Thread
 Abstract class providing thread functionality to algorithm (child) classes.
 

Constants

constexpr std::uint8_t generalInputSourcesParsing {0}
 An analyzer uses a parsing table as data source.

constexpr std::uint8_t generalInputSourcesExtracting {1}
 An analyzer uses an extracting table as data source.

constexpr std::uint8_t generalInputSourcesAnalyzing {2}
 An analyzer uses an analyzing table as data source.

constexpr std::uint8_t generalInputSourcesCrawling {3}
 An analyzer uses a crawling table as data source.

constexpr std::uint8_t generalLoggingSilent {0}
 Logging is disabled.

constexpr std::uint8_t generalLoggingDefault {1}
 Default logging is enabled.

constexpr std::uint8_t generalLoggingExtended {2}
 Extended logging is enabled.

constexpr std::uint8_t generalLoggingVerbose {3}
 Verbose logging is enabled.

constexpr std::int32_t defaultRestartAfter {-1}
 Default time (in s) after which to restart analysis once it has been completed (-1=deactivated).
 
constexpr std::uint64_t defaultSleepMySqlS {60}
 Default time (in s) to wait before the last attempt to reconnect to the MySQL server.
 
constexpr std::uint64_t defaultSleepWhenFinishedMs {5000}
 Default time (in ms) to wait each tick when finished.

constexpr auto minPercentageCorpusSlices {1}
 Minimum percentage of the maximum length for corpus slices.

constexpr auto maxPercentageCorpusSlices {99}
 Maximum percentage of the maximum length for corpus slices.

constexpr auto defaultPercentageCorpusSlices {30}
 Default percentage of the maximum length for corpus slices.

constexpr auto defaultFreeMemoryEvery {100000000}
 Default number of processed bytes in a continuous corpus after which memory will be freed.
 
constexpr auto defaultCorpusSlicing {30}
 The default percentage of the maximum packet size allowed by the MySQL server to be used for the maximum size of the corpus.
 
constexpr auto corpusSlicingFactor {1.F / 100}
 The factor used for corpus slicing percentage points (1/100).

constexpr auto maxNumCorpusColumns {3}
 The maximum number of columns used when creating a text corpus.

constexpr auto progressDeletedCorpus {0.05F}
 The progress with creating a corpus after the old corpus has been deleted.

constexpr auto progressReceivedSources {0.35F}
 The progress with creating a corpus after the source texts have been received.

constexpr auto progressMovedData {0.4F}
 The progress with creating a corpus after the data has been moved.

constexpr auto progressCreatedCorpus {0.6F}
 The progress with creating a corpus after the server created the corpus.

constexpr auto progressSlicedCorpus {0.65F}
 The progress with creating a corpus after the corpus has been sliced.

constexpr auto progressAddingCorpus
 The remaining progress, attributed to adding the corpus to the database.

constexpr auto progressReceivedCorpus {0.8F}
 The progress with getting an existing corpus after its contents have been received from the database.

constexpr auto progressGeneratedSavePoint {0.1F}
 The progress of saving a savepoint after generating it.

constexpr auto progressSavingSavePoint
 The remaining progress, attributed to saving a savepoint to the database.

constexpr auto combineUpdateStatusEvery {100000}
 The number of tokens after which the status will be updated when combining corpora.
 

Constants for SQL Queries

constexpr auto sqlArg1 {1}
 First argument in a SQL query.

constexpr auto sqlArg2 {2}
 Second argument in a SQL query.

constexpr auto sqlArg3 {3}
 Third argument in a SQL query.

constexpr auto sqlArg4 {4}
 Fourth argument in a SQL query.

constexpr auto sqlArg5 {5}
 Fifth argument in a SQL query.

constexpr auto sqlArg6 {6}
 Sixth argument in a SQL query.

constexpr auto sqlArg7 {7}
 Seventh argument in a SQL query.

constexpr auto sqlArg8 {8}
 Eighth argument in a SQL query.

constexpr auto sqlArg9 {9}
 Ninth argument in a SQL query.

constexpr auto sqlArg10 {10}
 Tenth argument in a SQL query.

constexpr auto sqlArg11 {11}
 Eleventh argument in a SQL query.

constexpr auto sqlArg12 {12}
 Twelfth argument in a SQL query.
 

Constants for Table Columns

constexpr auto column1 {0}
 First column in a table.

constexpr auto column2 {1}
 Second column in a table.

constexpr auto column3 {2}
 Third column in a table.

constexpr auto numColumns1 {1}
 One table column.

constexpr auto numColumns2 {2}
 Two table columns.
 

Detailed Description

Namespace for analyzer classes.

Variable Documentation

◆ column1

constexpr auto crawlservpp::Module::Analyzer::column1 {0}
inline

First column in a table.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ column2

constexpr auto crawlservpp::Module::Analyzer::column2 {1}
inline

Second column in a table.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ column3

constexpr auto crawlservpp::Module::Analyzer::column3 {2}
inline

Third column in a table.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ combineUpdateStatusEvery

constexpr auto crawlservpp::Module::Analyzer::combineUpdateStatusEvery {100000}

The number of tokens after which the status will be updated when combining corpora.
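
As an illustration of how such an update interval is typically used, the following minimal sketch counts processed tokens and reports progress once per interval. The function `combineTokens` and its callback parameter are hypothetical and not part of crawlserv++; the constant is copied from this page so that the sketch is self-contained.

```cpp
#include <cstddef>
#include <functional>

// Value copied from this page.
constexpr auto combineUpdateStatusEvery{100000};

// Hypothetical helper (not part of crawlserv++): walks over numTokens tokens
// and calls onProgress after every combineUpdateStatusEvery tokens.
// Returns the number of status updates that were issued.
std::size_t combineTokens(
        std::size_t numTokens,
        const std::function<void(float)>& onProgress
) {
    constexpr std::size_t every{combineUpdateStatusEvery};
    std::size_t updates{0};

    for(std::size_t done{1}; done <= numTokens; ++done) {
        // (the actual combining of one token would happen here)

        if(done % every == 0) {
            onProgress(static_cast<float>(done) / static_cast<float>(numTokens));

            ++updates;
        }
    }

    return updates;
}
```

Updating the status only once per interval keeps the reporting overhead small compared to the per-token work.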

◆ corpusSlicingFactor

constexpr auto crawlservpp::Module::Analyzer::corpusSlicingFactor {1.F / 100}
inline

The factor used for corpus slicing percentage points (1/100).

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
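
The factor converts whole percentage points into a fraction. A minimal sketch of that conversion follows, assuming the percentage is applied to a maximum packet size in bytes; the helper `sliceLimit` and its parameters are hypothetical and not part of crawlserv++.

```cpp
#include <cmath>
#include <cstdint>

// Values copied from this page so that the sketch is self-contained.
constexpr auto defaultCorpusSlicing{30};
constexpr auto corpusSlicingFactor{1.F / 100};

// Hypothetical helper (not part of crawlserv++): converts a percentage of the
// maximum packet size into an absolute size limit in bytes. The result is
// rounded, because 1.F / 100 is not exactly representable as a float.
std::uint64_t sliceLimit(std::uint64_t maxPacketSize, int percentage) {
    const double limit{
        static_cast<double>(maxPacketSize) * percentage * corpusSlicingFactor
    };

    return static_cast<std::uint64_t>(std::llround(limit));
}
```

For example, `sliceLimit(1000, defaultCorpusSlicing)` yields a limit of 300 bytes.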

◆ defaultCorpusSlicing

constexpr auto crawlservpp::Module::Analyzer::defaultCorpusSlicing {30}
inline

The default percentage of the maximum packet size allowed by the MySQL server to be used for the maximum size of the corpus.

◆ defaultFreeMemoryEvery

constexpr auto crawlservpp::Module::Analyzer::defaultFreeMemoryEvery {100000000}
inline

Default number of processed bytes in a continuous corpus after which memory will be freed.

◆ defaultPercentageCorpusSlices

constexpr auto crawlservpp::Module::Analyzer::defaultPercentageCorpusSlices {30}
inline

Default percentage of the maximum length for corpus slices.

Referenced by crawlservpp::Module::Analyzer::Config::checkOptions().

◆ defaultRestartAfter

constexpr std::int32_t crawlservpp::Module::Analyzer::defaultRestartAfter {-1}
inline

Default time (in s) after which to restart analysis once it has been completed (-1=deactivated).
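
The sentinel value can be checked as in the following minimal sketch. It assumes that any negative value disables the restart (the page only documents -1); the function `shouldRestart` is hypothetical and not part of crawlserv++.

```cpp
#include <cstdint>

// Value copied from this page.
constexpr std::int32_t defaultRestartAfter{-1};

// Hypothetical helper (not part of crawlserv++): decides whether a finished
// analysis should be restarted. Assumption: negative values deactivate the
// restart, non-negative values are a waiting time in seconds.
constexpr bool shouldRestart(
        std::int32_t restartAfter,
        std::uint64_t secondsSinceFinished
) {
    return restartAfter >= 0
            && secondsSinceFinished >= static_cast<std::uint64_t>(restartAfter);
}
```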

◆ defaultSleepMySqlS

constexpr std::uint64_t crawlservpp::Module::Analyzer::defaultSleepMySqlS {60}
inline

Default time (in s) to wait before the last attempt to reconnect to the MySQL server.

◆ defaultSleepWhenFinishedMs

constexpr std::uint64_t crawlservpp::Module::Analyzer::defaultSleepWhenFinishedMs {5000}
inline

Default time (in ms) to wait each tick when finished.

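◆ generalInputSourcesAnalyzing

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesAnalyzing {2}
inline

An analyzer uses an analyzing table as data source.

◆ generalInputSourcesCrawling

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesCrawling {3}
inline

An analyzer uses a crawling table as data source.

◆ generalInputSourcesExtracting

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesExtracting {1}
inline

An analyzer uses an extracting table as data source.

◆ generalInputSourcesParsing

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesParsing {0}
inline

An analyzer uses a parsing table as data source.

◆ generalLoggingDefault

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingDefault {1}
inline

Default logging is enabled.

◆ generalLoggingExtended

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingExtended {2}

Extended logging is enabled.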

◆ generalLoggingSilent

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingSilent {0}
inline

Logging is disabled.
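
The four logging constants form an ascending scale from 0 (silent) to 3 (verbose). The following minimal sketch shows one way to test a configured level against the level a message requires. It assumes that the levels are cumulative, which this page does not state; the function `isLogged` is hypothetical and not part of crawlserv++.

```cpp
#include <cstdint>

// Values copied from this page so that the sketch is self-contained.
constexpr std::uint8_t generalLoggingSilent{0};
constexpr std::uint8_t generalLoggingDefault{1};
constexpr std::uint8_t generalLoggingExtended{2};
constexpr std::uint8_t generalLoggingVerbose{3};

// Hypothetical helper (not part of crawlserv++): returns whether a message
// that requires the given level is written under the configured level.
// Assumption: a higher level includes all messages of the lower levels.
constexpr bool isLogged(std::uint8_t configured, std::uint8_t required) {
    return configured != generalLoggingSilent && configured >= required;
}
```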

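◆ generalLoggingVerbose

constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingVerbose {3}

Verbose logging is enabled.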

◆ maxNumCorpusColumns

constexpr auto crawlservpp::Module::Analyzer::maxNumCorpusColumns {3}
inline

The maximum number of columns used when creating a text corpus.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ maxPercentageCorpusSlices

constexpr auto crawlservpp::Module::Analyzer::maxPercentageCorpusSlices {99}
inline

Maximum percentage of the maximum length for corpus slices.

Referenced by crawlservpp::Module::Analyzer::Config::checkOptions().

◆ minPercentageCorpusSlices

constexpr auto crawlservpp::Module::Analyzer::minPercentageCorpusSlices {1}
inline

Minimum percentage of the maximum length for corpus slices.

Referenced by crawlservpp::Module::Analyzer::Config::checkOptions().
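
How Config::checkOptions() applies these bounds is not shown on this page. The following minimal sketch assumes that a value outside the permitted range falls back to the default; the function `checkedPercentage` is hypothetical and not part of crawlserv++.

```cpp
// Values copied from this page so that the sketch is self-contained.
constexpr auto minPercentageCorpusSlices{1};
constexpr auto maxPercentageCorpusSlices{99};
constexpr auto defaultPercentageCorpusSlices{30};

// Hypothetical helper (not part of crawlserv++): validates a configured
// percentage. Assumption: invalid values are replaced by the default.
constexpr int checkedPercentage(int value) {
    if(value < minPercentageCorpusSlices || value > maxPercentageCorpusSlices) {
        return defaultPercentageCorpusSlices;
    }

    return value;
}
```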

◆ numColumns1

constexpr auto crawlservpp::Module::Analyzer::numColumns1 {1}
inline

One table column.

◆ numColumns2

constexpr auto crawlservpp::Module::Analyzer::numColumns2 {2}
inline

Two table columns.

◆ progressAddingCorpus

constexpr auto crawlservpp::Module::Analyzer::progressAddingCorpus
inline
Initial value:
{
}
References progressSlicedCorpus (Definition: Database.hpp:104).

The remaining progress, attributed to adding the corpus to the database.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
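
The initial value itself is not preserved on this page. Going by the description ("the remaining progress") and the reference to progressSlicedCorpus, one plausible reading is the complement of progressSlicedCorpus. The following minimal sketch works under that assumption; the names `remainingAfterSlicing` and `progressWhileAdding` are hypothetical and not part of crawlserv++.

```cpp
#include <cstddef>

// Value copied from this page.
constexpr auto progressSlicedCorpus{0.65F};

// Assumption: the remaining progress is whatever is left after slicing.
constexpr auto remainingAfterSlicing{1.F - progressSlicedCorpus};

// Hypothetical helper (not part of crawlserv++): overall progress while the
// corpus is added to the database, after `added` of `total` parts are stored.
constexpr float progressWhileAdding(std::size_t added, std::size_t total) {
    return total == 0
            ? 1.F
            : progressSlicedCorpus
                    + remainingAfterSlicing
                    * (static_cast<float>(added) / static_cast<float>(total));
}
```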

◆ progressCreatedCorpus

constexpr auto crawlservpp::Module::Analyzer::progressCreatedCorpus {0.6F}
inline

The progress with creating a corpus after the server created the corpus.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ progressDeletedCorpus

constexpr auto crawlservpp::Module::Analyzer::progressDeletedCorpus {0.05F}
inline

The progress with creating a corpus after the old corpus has been deleted.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ progressGeneratedSavePoint

constexpr auto crawlservpp::Module::Analyzer::progressGeneratedSavePoint {0.1F}
inline

The progress of saving a savepoint after generating it.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ progressMovedData

constexpr auto crawlservpp::Module::Analyzer::progressMovedData {0.4F}
inline

The progress with creating a corpus after the data has been moved.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ progressReceivedCorpus

constexpr auto crawlservpp::Module::Analyzer::progressReceivedCorpus {0.8F}
inline

The progress with getting an existing corpus after its contents have been received from the database.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ progressReceivedSources

constexpr auto crawlservpp::Module::Analyzer::progressReceivedSources {0.35F}
inline

The progress with creating a corpus after the source texts have been received.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ progressSavingSavePoint

constexpr auto crawlservpp::Module::Analyzer::progressSavingSavePoint
inline
Initial value:
{
}
References progressGeneratedSavePoint (Definition: Database.hpp:115).

The remaining progress, attributed to saving a savepoint to the database.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ progressSlicedCorpus

constexpr auto crawlservpp::Module::Analyzer::progressSlicedCorpus {0.65F}
inline

The progress with creating a corpus after the corpus has been sliced.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg1

constexpr auto crawlservpp::Module::Analyzer::sqlArg1 {1}
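inline

First argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().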

◆ sqlArg10

constexpr auto crawlservpp::Module::Analyzer::sqlArg10 {10}
inline

Tenth argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg11

constexpr auto crawlservpp::Module::Analyzer::sqlArg11 {11}
inline

Eleventh argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg12

constexpr auto crawlservpp::Module::Analyzer::sqlArg12 {12}
inline

Twelfth argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg2

constexpr auto crawlservpp::Module::Analyzer::sqlArg2 {2}
inline

Second argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg3

constexpr auto crawlservpp::Module::Analyzer::sqlArg3 {3}
inline

Third argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg4

constexpr auto crawlservpp::Module::Analyzer::sqlArg4 {4}
inline

Fourth argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg5

constexpr auto crawlservpp::Module::Analyzer::sqlArg5 {5}
inline

Fifth argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg6

constexpr auto crawlservpp::Module::Analyzer::sqlArg6 {6}
inline

Sixth argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg7

constexpr auto crawlservpp::Module::Analyzer::sqlArg7 {7}
inline

Seventh argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg8

constexpr auto crawlservpp::Module::Analyzer::sqlArg8 {8}
inline

Eighth argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().

◆ sqlArg9

constexpr auto crawlservpp::Module::Analyzer::sqlArg9 {9}
inline

Ninth argument in a SQL query.

Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
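
The sqlArg constants give names to the 1-based argument positions of a SQL query. The following minimal sketch shows the idea with a stand-in statement type. `MockStatement`, `bindSourceQuery`, and the bound values are hypothetical; the real database interface used by crawlserv++ is not shown on this page.

```cpp
#include <map>
#include <string>

// Values copied from this page so that the sketch is self-contained.
constexpr auto sqlArg1{1};
constexpr auto sqlArg2{2};

// Hypothetical stand-in for a prepared statement: stores each bound value
// under its argument position.
struct MockStatement {
    std::map<int, std::string> args;

    void setString(int index, const std::string& value) {
        args[index] = value;
    }
};

// Hypothetical helper: binds two values using the named positions instead
// of bare numbers.
MockStatement bindSourceQuery(
        const std::string& sourceTable,
        const std::string& sourceColumn
) {
    MockStatement statement;

    statement.setString(sqlArg1, sourceTable);
    statement.setString(sqlArg2, sourceColumn);

    return statement;
}
```

Named positions avoid unexplained numeric literals at the call sites.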