crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
Namespace for analyzer classes.
Namespaces
Algo
Namespace for algorithm classes.
Classes
class Config
Abstract configuration for analyzers, to be implemented by algorithm classes.
class Database
Class providing database functionality for analyzer threads by implementing Wrapper::Database.
class Thread
Abstract class providing thread functionality to algorithm (child) classes.
Constants
constexpr std::uint8_t generalInputSourcesParsing {0}
An analyzer uses a parsing table as its data source.
constexpr std::uint8_t generalInputSourcesExtracting {1}
An analyzer uses an extracting table as its data source.
constexpr std::uint8_t generalInputSourcesAnalyzing {2}
An analyzer uses an analyzing table as its data source.
constexpr std::uint8_t generalInputSourcesCrawling {3}
An analyzer uses a crawling table as its data source.
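As an illustration of how the four input-source constants might be dispatched on, here is a minimal C++17 sketch. It redeclares the constants with the values documented above in a namespace named `sketch` (rather than including a crawlserv++ header), and `inputSourceKind()` is a hypothetical helper that is not part of crawlserv++. The actual mapping to table and column names is done by Database::getSourceTableName() and Database::getSourceColumnName(), whose implementation is not shown on this page.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string_view>

namespace sketch {
    // Redeclared with the values documented above (not the crawlserv++ header).
    inline constexpr std::uint8_t generalInputSourcesParsing{0};
    inline constexpr std::uint8_t generalInputSourcesExtracting{1};
    inline constexpr std::uint8_t generalInputSourcesAnalyzing{2};
    inline constexpr std::uint8_t generalInputSourcesCrawling{3};

    // Hypothetical helper: names the kind of table an input source refers to.
    // Throws std::invalid_argument for values outside the documented range.
    inline std::string_view inputSourceKind(std::uint8_t source) {
        switch(source) {
        case generalInputSourcesParsing:
            return "parsing";
        case generalInputSourcesExtracting:
            return "extracting";
        case generalInputSourcesAnalyzing:
            return "analyzing";
        case generalInputSourcesCrawling:
            return "crawling";
        default:
            throw std::invalid_argument("unknown input source");
        }
    }
}
```

For example, `sketch::inputSourceKind(sketch::generalInputSourcesCrawling)` yields "crawling", while an undocumented value such as 4 throws.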
constexpr std::uint8_t generalLoggingSilent {0}
Logging is disabled.
constexpr std::uint8_t generalLoggingDefault {1}
Default logging is enabled.
constexpr std::uint8_t generalLoggingExtended {2}
Extended logging is enabled.
constexpr std::uint8_t generalLoggingVerbose {3}
Verbose logging is enabled.
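The four logging constants read like ordered verbosity levels. The sketch below assumes they are cumulative, i.e. a configured level also enables every lower level; that ordering is an assumption drawn from the values 0 to 3, and the `Logger` class in namespace `sketch` is hypothetical, not the logging used by Thread or the algorithm classes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {
    // Values as documented above.
    inline constexpr std::uint8_t generalLoggingSilent{0};
    inline constexpr std::uint8_t generalLoggingDefault{1};
    inline constexpr std::uint8_t generalLoggingExtended{2};
    inline constexpr std::uint8_t generalLoggingVerbose{3};

    // Hypothetical logger: keeps a message only if the configured level is
    // at least the level the message requires (assumes cumulative levels).
    class Logger {
    public:
        explicit Logger(std::uint8_t level) : level_(level) {}

        void log(std::uint8_t required, const std::string& message) {
            // A message never requires "silent"; ignoring such calls keeps
            // a silent logger silent.
            if(required != generalLoggingSilent && level_ >= required) {
                messages_.push_back(message);
            }
        }

        const std::vector<std::string>& messages() const { return messages_; }

    private:
        std::uint8_t level_;
        std::vector<std::string> messages_;
    };
}
```

With extended logging configured, default and extended messages are kept while verbose ones are dropped; with silent logging, nothing is kept.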
constexpr std::int32_t defaultRestartAfter {-1}
Default time (in s) after which to restart an analysis once it has been completed (-1 = deactivated).
constexpr std::uint64_t defaultSleepMySqlS {60}
Default time (in s) to wait before the last attempt to reconnect to the MySQL server.
constexpr std::uint64_t defaultSleepWhenFinishedMs {5000}
Default time (in ms) to wait on each tick once the analysis has finished.
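The timing defaults mix units (seconds and milliseconds), and -1 acts as a sentinel. A minimal sketch that makes both explicit with std::chrono; the constants are redeclared in namespace `sketch`, `shouldRestart()` is a hypothetical helper, and treating every negative value like -1 is an assumption.

```cpp
#include <chrono>
#include <cstdint>

namespace sketch {
    // Values as documented above.
    inline constexpr std::int32_t defaultRestartAfter{-1};
    inline constexpr std::uint64_t defaultSleepMySqlS{60};
    inline constexpr std::uint64_t defaultSleepWhenFinishedMs{5000};

    // The documented units (seconds and milliseconds) made explicit.
    inline constexpr std::chrono::seconds sleepMySql{defaultSleepMySqlS};
    inline constexpr std::chrono::milliseconds sleepWhenFinished{defaultSleepWhenFinishedMs};

    // Hypothetical restart check: -1 deactivates restarting (any other
    // negative value is treated the same way here, as an assumption).
    constexpr bool shouldRestart(std::int32_t restartAfterS, std::chrono::seconds sinceFinished) {
        return restartAfterS >= 0 && sinceFinished >= std::chrono::seconds{restartAfterS};
    }
}
```

Keeping the raw integers and converting them once into typed durations means a sleep call cannot accidentally interpret 5000 ms as 5000 s.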
constexpr auto minPercentageCorpusSlices {1}
Minimum percentage of the maximum length for corpus slices.
constexpr auto maxPercentageCorpusSlices {99}
Maximum percentage of the maximum length for corpus slices.
constexpr auto defaultPercentageCorpusSlices {30}
Default percentage of the maximum length for corpus slices.
constexpr auto defaultFreeMemoryEvery {100000000}
Default number of processed bytes in a continuous corpus after which memory will be freed.
constexpr auto defaultCorpusSlicing {30}
The default percentage of the maximum packet size allowed by the MySQL server to be used as the maximum size of the corpus.
constexpr auto corpusSlicingFactor {1.F / 100}
The factor used for corpus slicing percentage points (1/100).
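Because the slice limits are expressed as percentages, the arithmetic is easy to get subtly wrong. The sketch below assumes that the "maximum length" these constants refer to is the MySQL server's max_allowed_packet value and that an out-of-range percentage falls back to the default (30, the same value as defaultCorpusSlicing). Both `checkedPercentage()` and `maxSliceSize()` are hypothetical and do not reproduce Config::checkOptions().

```cpp
#include <cstdint>

namespace sketch {
    // Values as documented above, typed as std::uint64_t for byte arithmetic.
    inline constexpr std::uint64_t minPercentageCorpusSlices{1};
    inline constexpr std::uint64_t maxPercentageCorpusSlices{99};
    inline constexpr std::uint64_t defaultPercentageCorpusSlices{30};

    // Hypothetical validation: out-of-range percentages fall back to the default.
    constexpr std::uint64_t checkedPercentage(std::uint64_t percentage) {
        if(percentage < minPercentageCorpusSlices || percentage > maxPercentageCorpusSlices) {
            return defaultPercentageCorpusSlices;
        }
        return percentage;
    }

    // Hypothetical: maximum slice size in bytes as a percentage of the
    // server's max_allowed_packet. Multiplying first and dividing by 100
    // avoids floating-point rounding (0.01 has no exact binary
    // representation); the integer division simply truncates.
    constexpr std::uint64_t maxSliceSize(std::uint64_t maxAllowedPacket, std::uint64_t percentage) {
        return maxAllowedPacket * checkedPercentage(percentage) / 100;
    }
}
```

For a max_allowed_packet of 67108864 bytes and the default of 30, this yields slices of at most 20132659 bytes.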
constexpr auto maxNumCorpusColumns {3}
The maximum number of columns used when creating a text corpus.
constexpr auto progressDeletedCorpus {0.05F}
The progress of creating a corpus after the old corpus has been deleted.
constexpr auto progressReceivedSources {0.35F}
The progress of creating a corpus after the source texts have been received.
constexpr auto progressMovedData {0.4F}
The progress of creating a corpus after the data has been moved.
constexpr auto progressCreatedCorpus {0.6F}
The progress of creating a corpus after the server has created the corpus.
constexpr auto progressSlicedCorpus {0.65F}
The progress of creating a corpus after the corpus has been sliced.
constexpr auto progressAddingCorpus
The remaining progress, attributed to adding the corpus to the database.
constexpr auto progressReceivedCorpus {0.8F}
The progress of retrieving an existing corpus after its contents have been received from the database.
constexpr auto progressGeneratedSavePoint {0.1F}
The progress of saving a savepoint after generating it.
constexpr auto progressSavingSavePoint
The remaining progress, attributed to saving a savepoint to the database.
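The corpus creation milestones increase from 0.05 to 0.65, and progressAddingCorpus covers the rest. The sketch below assumes that "the remaining progress" means 1 - progressSlicedCorpus (the actual initializer is not shown on this page) and that adding slices advances progress linearly; `progressWhileAdding()` and namespace `sketch` are hypothetical. The savepoint constants could be combined in the same way.

```cpp
#include <cstddef>

namespace sketch {
    // Milestones as documented above.
    inline constexpr float progressDeletedCorpus{0.05F};
    inline constexpr float progressReceivedSources{0.35F};
    inline constexpr float progressMovedData{0.4F};
    inline constexpr float progressCreatedCorpus{0.6F};
    inline constexpr float progressSlicedCorpus{0.65F};

    // Assumption: "the remaining progress" is everything left up to 1.0.
    inline constexpr float progressAddingCorpus{1.F - progressSlicedCorpus};

    // Hypothetical helper: overall progress after `added` of `total`
    // corpus slices have been added to the database.
    constexpr float progressWhileAdding(std::size_t added, std::size_t total) {
        if(total == 0) {
            return 1.F; // nothing to add: treat as complete
        }
        return progressSlicedCorpus
            + progressAddingCorpus * static_cast<float>(added) / static_cast<float>(total);
    }
}
```

Under these assumptions, progress starts at 0.65 when the first slice is about to be added and reaches 1.0 after the last one.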
constexpr auto combineUpdateStatusEvery {100000}
The number of tokens after which the status will be updated when combining corpora.
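A minimal sketch of a status update every combineUpdateStatusEvery tokens while corpora are concatenated. The `combine()` function, its callback, and namespace `sketch` are hypothetical; the interval is exposed as a parameter (defaulting to the documented value) so that it can be exercised with small inputs.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

namespace sketch {
    // Value as documented above.
    inline constexpr std::size_t combineUpdateStatusEvery{100000};

    // Hypothetical: concatenates the tokens of several corpora and calls
    // onStatus(processedSoFar) after every `updateEvery` tokens.
    inline std::vector<std::string> combine(
            const std::vector<std::vector<std::string>>& corpora,
            const std::function<void(std::size_t)>& onStatus,
            std::size_t updateEvery = combineUpdateStatusEvery) {
        std::vector<std::string> combined;
        std::size_t processed{0};

        for(const auto& corpus : corpora) {
            for(const auto& token : corpus) {
                combined.push_back(token);
                ++processed;

                // Report only at multiples of the interval to keep updates cheap.
                if(updateEvery > 0 && processed % updateEvery == 0) {
                    onStatus(processed);
                }
            }
        }

        return combined;
    }
}
```

Updating only every N tokens keeps the cost of status reporting (e.g. a database write) independent of the size of individual tokens.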
Constants for SQL Queries
constexpr auto sqlArg1 {1}
First argument in a SQL query.
constexpr auto sqlArg2 {2}
Second argument in a SQL query.
constexpr auto sqlArg3 {3}
Third argument in a SQL query.
constexpr auto sqlArg4 {4}
Fourth argument in a SQL query.
constexpr auto sqlArg5 {5}
Fifth argument in a SQL query.
constexpr auto sqlArg6 {6}
Sixth argument in a SQL query.
constexpr auto sqlArg7 {7}
Seventh argument in a SQL query.
constexpr auto sqlArg8 {8}
Eighth argument in a SQL query.
constexpr auto sqlArg9 {9}
Ninth argument in a SQL query.
constexpr auto sqlArg10 {10}
Tenth argument in a SQL query.
constexpr auto sqlArg11 {11}
Eleventh argument in a SQL query.
constexpr auto sqlArg12 {12}
Twelfth argument in a SQL query.
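The SQL argument constants start at 1, which matches prepared-statement APIs that number their `?` placeholders from 1 (JDBC-style APIs such as MySQL Connector/C++ do). To stay self-contained, the sketch uses a hypothetical `MockStatement` in namespace `sketch` instead of a real database driver.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {
    // A few of the argument constants, with the documented values.
    inline constexpr auto sqlArg1{1};
    inline constexpr auto sqlArg2{2};
    inline constexpr auto sqlArg3{3};

    // Hypothetical stand-in for a prepared statement whose placeholders
    // ("?") are addressed with 1-based indices.
    class MockStatement {
    public:
        explicit MockStatement(std::size_t numPlaceholders) : values_(numPlaceholders) {}

        void setString(unsigned int index, const std::string& value) {
            if(index < 1 || index > values_.size()) {
                throw std::out_of_range("no placeholder with this index");
            }
            values_[index - 1] = value; // convert 1-based index to 0-based storage
        }

        const std::vector<std::string>& values() const { return values_; }

    private:
        std::vector<std::string> values_;
    };
}
```

Binding with `setString(sketch::sqlArg1, ...)` instead of a bare `1` documents which placeholder is meant and avoids off-by-one mistakes against 0-based containers.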
Constants for Table Columns
constexpr auto column1 {0}
First column in a table.
constexpr auto column2 {1}
Second column in a table.
constexpr auto column3 {2}
Third column in a table.
constexpr auto numColumns1 {1}
One table column.
constexpr auto numColumns2 {2}
Two table columns.
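In contrast to the SQL arguments, the column constants are 0-based (column1 is 0), which suits indexing into a C++ container. The sketch below indexes a row with them and checks a list of source columns against maxNumCorpusColumns. The `validColumnCount()` helper and namespace `sketch` are hypothetical; this is not the implementation of Database::checkSources().

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {
    // Column indices are 0-based, unlike the 1-based SQL argument constants.
    inline constexpr std::size_t column1{0};
    inline constexpr std::size_t column2{1};
    inline constexpr std::size_t column3{2};
    inline constexpr std::size_t maxNumCorpusColumns{3};

    // Hypothetical check: a corpus is built from at least one and at most
    // maxNumCorpusColumns source columns.
    inline bool validColumnCount(const std::vector<std::string>& sourceColumns) {
        return !sourceColumns.empty() && sourceColumns.size() <= maxNumCorpusColumns;
    }
}
```

A row fetched as `{"id", "date", "text"}` can then be read as `row[sketch::column3]` for the text, and a fourth source column would be rejected.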
Namespace for analyzer classes.
inline constexpr auto crawlservpp::Module::Analyzer::column1 {0}
First column in a table.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::column2 {1}
Second column in a table.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::column3 {2}
Third column in a table.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
constexpr auto crawlservpp::Module::Analyzer::combineUpdateStatusEvery {100000}
The number of tokens after which the status will be updated when combining corpora.
inline constexpr auto crawlservpp::Module::Analyzer::corpusSlicingFactor {1.F / 100}
The factor used for corpus slicing percentage points (1/100).
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::defaultCorpusSlicing {30}
The default percentage of the maximum packet size allowed by the MySQL server to be used as the maximum size of the corpus.
inline constexpr auto crawlservpp::Module::Analyzer::defaultFreeMemoryEvery {100000000}
Default number of processed bytes in a continuous corpus after which memory will be freed.
inline constexpr auto crawlservpp::Module::Analyzer::defaultPercentageCorpusSlices {30}
Default percentage of the maximum length for corpus slices.
Referenced by crawlservpp::Module::Analyzer::Config::checkOptions().
inline constexpr std::int32_t crawlservpp::Module::Analyzer::defaultRestartAfter {-1}
Default time (in s) after which to restart an analysis once it has been completed (-1 = deactivated).
inline constexpr std::uint64_t crawlservpp::Module::Analyzer::defaultSleepMySqlS {60}
Default time (in s) to wait before the last attempt to reconnect to the MySQL server.
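inline constexpr std::uint64_t crawlservpp::Module::Analyzer::defaultSleepWhenFinishedMs {5000}
Default time (in ms) to wait on each tick once the analysis has finished.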
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesAnalyzing {2}
An analyzer uses an analyzing table as its data source.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources(), crawlservpp::Module::Analyzer::Database::getSourceColumnName(), crawlservpp::Module::Analyzer::Database::getSourceTableName(), and crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit().
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesCrawling {3}
An analyzer uses a crawling table as its data source.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources(), crawlservpp::Module::Analyzer::Database::getSourceColumnName(), crawlservpp::Module::Analyzer::Database::getSourceTableName(), and crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit().
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesExtracting {1}
An analyzer uses an extracting table as its data source.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources(), crawlservpp::Module::Analyzer::Database::getSourceColumnName(), crawlservpp::Module::Analyzer::Database::getSourceTableName(), and crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit().
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalInputSourcesParsing {0}
An analyzer uses a parsing table as its data source.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources(), crawlservpp::Module::Analyzer::Database::getSourceColumnName(), crawlservpp::Module::Analyzer::Database::getSourceTableName(), and crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit().
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingDefault {1}
Default logging is enabled.
Referenced by crawlservpp::Module::Analyzer::Thread::addCorpora(), crawlservpp::Module::Analyzer::Algo::TopicModelling::checkAlgoOptions(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoTick(), crawlservpp::Module::Analyzer::Thread::onReset(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingExtended {2}
Extended logging is enabled.
Referenced by crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo().
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingSilent {0}
Logging is disabled.
inline constexpr std::uint8_t crawlservpp::Module::Analyzer::generalLoggingVerbose {3}
Verbose logging is enabled.
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
inline constexpr auto crawlservpp::Module::Analyzer::maxNumCorpusColumns {3}
The maximum number of columns used when creating a text corpus.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::maxPercentageCorpusSlices {99}
Maximum percentage of the maximum length for corpus slices.
Referenced by crawlservpp::Module::Analyzer::Config::checkOptions().
inline constexpr auto crawlservpp::Module::Analyzer::minPercentageCorpusSlices {1}
Minimum percentage of the maximum length for corpus slices.
Referenced by crawlservpp::Module::Analyzer::Config::checkOptions().
inline constexpr auto crawlservpp::Module::Analyzer::numColumns1 {1}
One table column.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::numColumns2 {2}
Two table columns.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressAddingCorpus
The remaining progress, attributed to adding the corpus to the database.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressCreatedCorpus {0.6F}
The progress of creating a corpus after the server has created the corpus.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressDeletedCorpus {0.05F}
The progress of creating a corpus after the old corpus has been deleted.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressGeneratedSavePoint {0.1F}
The progress of saving a savepoint after generating it.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressMovedData {0.4F}
The progress of creating a corpus after the data has been moved.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressReceivedCorpus {0.8F}
The progress of retrieving an existing corpus after its contents have been received from the database.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressReceivedSources {0.35F}
The progress of creating a corpus after the source texts have been received.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressSavingSavePoint
The remaining progress, attributed to saving a savepoint to the database.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::progressSlicedCorpus {0.65F}
The progress of creating a corpus after the corpus has been sliced.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg1 {1}
First argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources(), and crawlservpp::Module::Analyzer::Database::updateAdditionalTable().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg10 {10}
Tenth argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg11 {11}
Eleventh argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg12 {12}
Twelfth argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg2 {2}
Second argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg3 {3}
Third argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg4 {4}
Fourth argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg5 {5}
Fifth argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg6 {6}
Sixth argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg7 {7}
Seventh argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg8 {8}
Eighth argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().
inline constexpr auto crawlservpp::Module::Analyzer::sqlArg9 {9}
Ninth argument in a SQL query.
Referenced by crawlservpp::Module::Analyzer::Database::checkSources().