crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
Namespace for parser classes.
Classes
| class | Config | Configuration for parsers. |
| class | Database | Class providing database functionality for parser threads by implementing Wrapper::Database. |
| class | Thread | Parser thread. |
Constants
| constexpr std::uint8_t | crawlerLoggingVerbose {0} | Logging is disabled. |
| constexpr std::uint8_t | generalLoggingDefault {1} | Default logging is enabled. |
| constexpr std::uint8_t | generalLoggingExtended {2} | Extended logging is enabled. |
| constexpr std::uint8_t | generalLoggingVerbose {3} | Verbose logging is enabled. |
| constexpr std::uint8_t | parsingSourceUrl {0} | Parse data from the URL of a crawled web page. |
| constexpr std::uint8_t | parsingSourceContent {1} | Parse data from the content of a crawled web page. |
| constexpr std::uint64_t | defaultCacheSize {2500} | Default cache size. |
| constexpr std::uint32_t | defaultLockS {300} | Default URL locking time, in seconds. |
| constexpr std::uint16_t | defaultMaxBatchSize {500} | Default maximum number of URLs to be processed in one MySQL query. |
| constexpr std::uint64_t | defaultSleepIdleMs {5000} | Default time to wait before checking for new URLs when all URLs have been parsed, in milliseconds. |
| constexpr std::uint64_t | defaultSleepMySqlS {60} | Default time to wait before the last attempt to reconnect to the MySQL server, in seconds. |
| constexpr auto | maxContentSize {1073741824} | Maximum size of database content (= 1 GiB). |
| constexpr auto | maxContentSizeString {"1 GiB"sv} | Maximum size of database content as string. |
| constexpr std::uint8_t | updateContentCounterEvery {25} | The number of processed contents after which the thread status will be updated. |
Constants for MySQL Queries
| constexpr auto | oneAtOnce {1} | Process one value at once. |
| constexpr auto | nAtOnce10 {10} | Process ten values at once. |
| constexpr auto | nAtOnce100 {100} | Process one hundred values at once. |
| constexpr auto | sqlArg1 {1} | First argument in a SQL query. |
| constexpr auto | sqlArg2 {2} | Second argument in a SQL query. |
| constexpr auto | sqlArg3 {3} | Third argument in a SQL query. |
| constexpr auto | sqlArg4 {4} | Fourth argument in a SQL query. |
| constexpr auto | sqlArg5 {5} | Fifth argument in a SQL query. |
| constexpr auto | sqlArg6 {6} | Sixth argument in a SQL query. |
| constexpr auto | parsingTableAlias {"a"sv} | Alias, used in SQL queries, for the parsing table. |
| constexpr auto | targetTableAlias {"b"sv} | Alias, used in SQL queries, for the target table. |
| constexpr auto | minTargetColumns {4} | Minimum number of columns in the target table. |
| constexpr auto | numArgsLockUrl {3} | Number of arguments for locking one URL. |
| constexpr auto | minArsgAddUpdateData {5} | Minimum number of arguments to add or update a data entry. |
| constexpr auto | numArgsFinishUrl {2} | Number of arguments for setting one URL to finished. |
| constexpr auto | maxDateTimeValue {"9999-12-31 23:59:59"sv} | The maximum value of a DATETIME in the database. |
constexpr std::uint8_t crawlservpp::Module::Parser::crawlerLoggingVerbose {0} |
inline |
Logging is disabled.
constexpr std::uint64_t crawlservpp::Module::Parser::defaultCacheSize {2500} |
inline |
Default cache size.
constexpr std::uint32_t crawlservpp::Module::Parser::defaultLockS {300} |
inline |
Default URL locking time, in seconds.
constexpr std::uint16_t crawlservpp::Module::Parser::defaultMaxBatchSize {500} |
inline |
Default maximum number of URLs to be processed in one MySQL query.
constexpr std::uint64_t crawlservpp::Module::Parser::defaultSleepIdleMs {5000} |
inline |
Default time to wait before checking for new URLs when all URLs have been parsed, in milliseconds.
constexpr std::uint64_t crawlservpp::Module::Parser::defaultSleepMySqlS {60} |
inline |
Default time to wait before the last attempt to reconnect to the MySQL server, in seconds.
constexpr std::uint8_t crawlservpp::Module::Parser::generalLoggingDefault {1} |
inline |
Default logging is enabled.
Referenced by crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Parser::Thread::onReset(), and crawlservpp::Module::Parser::Thread::onTick().
constexpr std::uint8_t crawlservpp::Module::Parser::generalLoggingExtended {2} |
inline |
Extended logging is enabled.
Referenced by crawlservpp::Module::Parser::Thread::onReset() and crawlservpp::Module::Parser::Thread::onTick().
constexpr std::uint8_t crawlservpp::Module::Parser::generalLoggingVerbose {3} |
inline |
Verbose logging is enabled.
Referenced by crawlservpp::Module::Parser::Thread::onReset().
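The three logging constants above form an ordered scale (1 < 2 < 3). The following is a minimal sketch of how such a scale can be used as a threshold, assuming that a higher configured level includes all messages of the lower levels. The helper `shouldLog` is hypothetical and not part of crawlserv++; the constants are redeclared locally with the values documented here.

```cpp
#include <cstdint>

// Values copied from the constants documented on this page.
inline constexpr std::uint8_t generalLoggingDefault{1};
inline constexpr std::uint8_t generalLoggingExtended{2};
inline constexpr std::uint8_t generalLoggingVerbose{3};

// Hypothetical helper (an assumption, not the project's API):
// returns true if a message requiring the level `required` should be
// written when the configured logging level is `configured`.
constexpr bool shouldLog(std::uint8_t configured, std::uint8_t required) noexcept {
    return configured >= required;
}
```

With this sketch, `shouldLog(generalLoggingVerbose, generalLoggingExtended)` is true, while a configured level of 0 suppresses every message.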
constexpr auto crawlservpp::Module::Parser::maxContentSize {1073741824} |
inline |
Maximum size of database content (= 1 GiB).
Referenced by crawlservpp::Module::Parser::Database::updateTargetTable().
constexpr auto crawlservpp::Module::Parser::maxContentSizeString {"1 GiB"sv} |
inline |
Maximum size of database content as string.
Referenced by crawlservpp::Module::Parser::Database::updateTargetTable().
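The numeric limit and its string form are meant to be used together: one for the comparison, the other for a readable message. The sketch below shows one possible pairing. The function `checkContentSize` and its message text are hypothetical; only the two constant values come from this page.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Values copied from the constants documented on this page.
inline constexpr auto maxContentSize{1073741824};
inline constexpr auto maxContentSizeString{"1 GiB"sv};

// 1 GiB is 2^30 bytes, which matches the documented value.
static_assert(maxContentSize == 1024 * 1024 * 1024, "1 GiB = 2^30 bytes");

// Hypothetical helper (an assumption, not the project's API):
// returns an empty string if `size` is within the limit,
// otherwise an error message that names the limit.
inline std::string checkContentSize(std::size_t size) {
    if(size <= static_cast<std::size_t>(maxContentSize)) {
        return {};
    }

    std::string message{"content exceeds the maximum size of "};

    message += maxContentSizeString;

    return message;
}
```

Keeping the string next to the number avoids formatting the byte count each time an error is reported.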
constexpr auto crawlservpp::Module::Parser::maxDateTimeValue {"9999-12-31 23:59:59"sv} |
inline |
The maximum value of a DATETIME in the database.
Referenced by crawlservpp::Module::Parser::Database::getLatestContent().
constexpr auto crawlservpp::Module::Parser::minArsgAddUpdateData {5} |
inline |
Minimum number of arguments to add or update a data entry.
Referenced by crawlservpp::Module::Parser::Database::updateOrAddEntries().
constexpr auto crawlservpp::Module::Parser::minTargetColumns {4} |
inline |
Minimum number of columns in the target table.
Referenced by crawlservpp::Module::Parser::Database::initTargetTable().
constexpr auto crawlservpp::Module::Parser::nAtOnce10 {10} |
inline |
Process ten values at once.
constexpr auto crawlservpp::Module::Parser::nAtOnce100 {100} |
inline |
Process one hundred values at once.
Referenced by crawlservpp::Module::Parser::Database::fetchUrls(), crawlservpp::Module::Parser::Database::prepare(), crawlservpp::Module::Parser::Database::setUrlsFinishedIfLockOk(), and crawlservpp::Module::Parser::Database::updateOrAddEntries().
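The constants `oneAtOnce`, `nAtOnce10` and `nAtOnce100` describe three batch sizes. One plausible way to combine them, shown here purely as an assumption about their use, is a greedy split of a number of values into batches of 100, then 10, then 1. The type `BatchPlan` and the function `planBatches` are hypothetical names introduced for this sketch.

```cpp
#include <cstddef>

// Values copied from the constants documented on this page.
inline constexpr auto oneAtOnce{1};
inline constexpr auto nAtOnce10{10};
inline constexpr auto nAtOnce100{100};

// Hypothetical result type: how many batches of each size are needed.
struct BatchPlan {
    std::size_t hundreds;
    std::size_t tens;
    std::size_t ones;
};

// Hypothetical helper (an assumption, not the project's API):
// greedily splits `n` values into batches of 100, 10 and 1.
constexpr BatchPlan planBatches(std::size_t n) noexcept {
    BatchPlan plan{};

    plan.hundreds = n / nAtOnce100;
    n %= nAtOnce100;

    plan.tens = n / nAtOnce10;
    plan.ones = (n % nAtOnce10) / oneAtOnce;

    return plan;
}
```

For 257 values, this split yields two batches of 100, five batches of 10 and seven single values, so only three distinct statement shapes are ever needed.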
constexpr auto crawlservpp::Module::Parser::numArgsFinishUrl {2} |
inline |
Number of arguments for setting one URL to finished.
Referenced by crawlservpp::Module::Parser::Database::setUrlsFinishedIfLockOk().
constexpr auto crawlservpp::Module::Parser::numArgsLockUrl {3} |
inline |
Number of arguments for locking one URL.
Referenced by crawlservpp::Module::Parser::Database::fetchUrls().
constexpr auto crawlservpp::Module::Parser::oneAtOnce {1} |
inline |
Process one value at once.
Referenced by crawlservpp::Module::Parser::Database::prepare().
constexpr std::uint8_t crawlservpp::Module::Parser::parsingSourceContent {1} |
inline |
Parse data from the content of a crawled web page.
Referenced by crawlservpp::Module::Parser::Thread::onReset().
constexpr std::uint8_t crawlservpp::Module::Parser::parsingSourceUrl {0} |
inline |
Parse data from the URL of a crawled web page.
Referenced by crawlservpp::Module::Parser::Thread::onReset().
constexpr auto crawlservpp::Module::Parser::parsingTableAlias {"a"sv} |
inline |
Alias, used in SQL queries, for the parsing table.
Referenced by crawlservpp::Module::Parser::Database::updateTargetTable().
constexpr auto crawlservpp::Module::Parser::sqlArg1 {1} |
inline |
First argument in a SQL query.
Referenced by crawlservpp::Module::Parser::Database::fetchUrls(), crawlservpp::Module::Parser::Database::getAllContents(), crawlservpp::Module::Parser::Database::getContentIdFromParsedId(), crawlservpp::Module::Parser::Database::getLatestContent(), crawlservpp::Module::Parser::Database::getLockTime(), crawlservpp::Module::Parser::Database::getNumberOfContents(), crawlservpp::Module::Parser::Database::getUrlLockTime(), crawlservpp::Module::Parser::Database::getUrlPosition(), crawlservpp::Module::Parser::Database::renewUrlLockIfOk(), crawlservpp::Module::Parser::Database::setUrlsFinishedIfLockOk(), crawlservpp::Module::Parser::Database::unLockUrlIfOk(), crawlservpp::Module::Parser::Database::unLockUrlsIfOk(), and crawlservpp::Module::Parser::Database::updateOrAddEntries().
constexpr auto crawlservpp::Module::Parser::sqlArg2 {2} |
inline |
Second argument in a SQL query.
Referenced by crawlservpp::Module::Parser::Database::fetchUrls(), crawlservpp::Module::Parser::Database::getContentIdFromParsedId(), crawlservpp::Module::Parser::Database::getLatestContent(), crawlservpp::Module::Parser::Database::renewUrlLockIfOk(), crawlservpp::Module::Parser::Database::setUrlsFinishedIfLockOk(), crawlservpp::Module::Parser::Database::unLockUrlIfOk(), and crawlservpp::Module::Parser::Database::updateOrAddEntries().
constexpr auto crawlservpp::Module::Parser::sqlArg3 {3} |
inline |
Third argument in a SQL query.
Referenced by crawlservpp::Module::Parser::Database::fetchUrls(), crawlservpp::Module::Parser::Database::renewUrlLockIfOk(), and crawlservpp::Module::Parser::Database::updateOrAddEntries().
constexpr auto crawlservpp::Module::Parser::sqlArg4 {4} |
inline |
Fourth argument in a SQL query.
Referenced by crawlservpp::Module::Parser::Database::renewUrlLockIfOk() and crawlservpp::Module::Parser::Database::updateOrAddEntries().
constexpr auto crawlservpp::Module::Parser::sqlArg5 {5} |
inline |
Fifth argument in a SQL query.
Referenced by crawlservpp::Module::Parser::Database::updateOrAddEntries().
constexpr auto crawlservpp::Module::Parser::sqlArg6 {6} |
inline |
Sixth argument in a SQL query.
Referenced by crawlservpp::Module::Parser::Database::updateOrAddEntries().
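The `sqlArg` constants start at 1, which fits the 1-based placeholder numbering common to JDBC-style prepared statement interfaces. When one statement handles several rows at once, the position of an argument can be derived from the row number and the number of arguments per row. The sketch below illustrates that arithmetic under the assumption that the placeholders of consecutive rows are laid out back to back; `placeholderIndex` is a hypothetical helper, not a crawlserv++ function.

```cpp
#include <cstddef>

// Values copied from the constants documented on this page.
inline constexpr auto sqlArg1{1};
inline constexpr auto sqlArg2{2};
inline constexpr auto sqlArg3{3};
inline constexpr auto numArgsLockUrl{3};

// Hypothetical helper (an assumption, not the project's API):
// returns the 1-based placeholder index of argument `arg` (1-based)
// for row `row` (0-based), given `argsPerRow` placeholders per row.
constexpr std::size_t placeholderIndex(
        std::size_t row,
        std::size_t argsPerRow,
        std::size_t arg
) noexcept {
    return row * argsPerRow + arg;
}
```

For example, with `numArgsLockUrl` placeholders per URL, the third argument of the third URL (row 2) lands at placeholder 9.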
constexpr auto crawlservpp::Module::Parser::targetTableAlias {"b"sv} |
inline |
Alias, used in SQL queries, for the target table.
Referenced by crawlservpp::Module::Parser::Database::updateTargetTable().
constexpr std::uint8_t crawlservpp::Module::Parser::updateContentCounterEvery {25} |
inline |
The number of processed contents after which the thread status will be updated.
Referenced by crawlservpp::Module::Parser::Thread::onReset().
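Updating the thread status only after every 25th processed content keeps the cost of status updates low. A minimal sketch of such a check follows, assuming a running counter of processed contents; the function `isStatusUpdateDue` is hypothetical and only the constant's value is taken from this page.

```cpp
#include <cstdint>

// Value copied from the constant documented on this page.
inline constexpr std::uint8_t updateContentCounterEvery{25};

// Hypothetical helper (an assumption, not the project's API):
// returns true whenever the number of processed contents
// has reached a positive multiple of updateContentCounterEvery.
constexpr bool isStatusUpdateDue(std::uint64_t processed) noexcept {
    return processed > 0 && processed % updateContentCounterEvery == 0;
}
```

In this sketch the status would be refreshed after 25, 50, 75, ... contents, and never before the first content has been processed.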