mlpack
preprocess_describe_main.cpp File Reference
#include <mlpack/prereqs.hpp>
#include <mlpack/core/util/io.hpp>
#include <mlpack/core/util/mlpack_main.hpp>
#include <boost/format.hpp>
#include <boost/lexical_cast.hpp>

Functions

 BINDING_NAME ("Descriptive Statistics")
 
 BINDING_SHORT_DESC ("A utility for printing descriptive statistics about a dataset. This " "prints a number of details about a dataset in a tabular format.")
 
 BINDING_LONG_DESC ("This utility takes a dataset and prints out the descriptive statistics " "of the data. Descriptive statistics is the discipline of quantitatively " "describing the main features of a collection of information, or the " "quantitative description itself. The program does not modify the original " "file, but instead prints out the statistics to the console. The printed " "result will look like a table." "\n\n" "Optionally, width and precision of the output can be adjusted by a user " "using the "+PRINT_PARAM_STRING("width")+" and "+PRINT_PARAM_STRING("precision")+" parameters. A user can also select a " "specific dimension to analyze if there are too many dimensions. The "+PRINT_PARAM_STRING("population")+" parameter can be specified when the " "dataset should be considered as a population. Otherwise, the dataset " "will be considered as a sample.")
 
 BINDING_EXAMPLE ("So, a simple example where we want to print out statistical facts about " "the dataset "+PRINT_DATASET("X")+" using the default settings, we " "could run " "\n\n"+PRINT_CALL("preprocess_describe", "input", "X", "verbose", true)+"\n\n" "If we want to customize the width to 10 and precision to 5 and consider " "the dataset as a population, we could run" "\n\n"+PRINT_CALL("preprocess_describe", "input", "X", "width", 10, "precision", 5, "verbose", true))
 
 BINDING_SEE_ALSO ("@preprocess_binarize", "#preprocess_binarize")
 
 BINDING_SEE_ALSO ("@preprocess_imputer", "#preprocess_imputer")
 
 BINDING_SEE_ALSO ("@preprocess_split", "#preprocess_split")
 
 PARAM_MATRIX_IN_REQ ("input", "Matrix containing data,", "i")
 
 PARAM_INT_IN ("dimension", "Dimension of the data. Use this to specify a " "dimension", "d", 0)
 
 PARAM_INT_IN ("precision", "Precision of the output statistics.", "p", 4)
 
 PARAM_INT_IN ("width", "Width of the output table.", "w", 8)
 
 PARAM_FLAG ("population", "If specified, the program will calculate statistics " "assuming the dataset is the population. By default, the program will " "assume the dataset as a sample.", "P")
 
 PARAM_FLAG ("row_major", "If specified, the program will calculate statistics " "across rows, not across columns. (Remember that in mlpack, a column " "represents a point, so this option is generally not necessary.)", "r")
 
double SumNthPowerDeviations (const arma::rowvec &input, const double &fMean, size_t n)
 Calculates the sum of the deviations from the mean, each raised to the nth power.
 
double Skewness (const arma::rowvec &input, const double &fStd, const double &fMean, const bool population)
 Calculates the skewness of the given vector.
 
double Kurtosis (const arma::rowvec &input, const double &fStd, const double &fMean, const bool population)
 Calculates the excess kurtosis of the given vector.
 
double StandardError (const size_t size, const double &fStd)
 Calculates standard error of standard deviation.
 

Detailed Description

Author
Keon Kim

Descriptive Statistics Class and binding.

mlpack is free software; you may redistribute it and/or modify it under the terms of the 3-clause BSD license. You should have received a copy of the 3-clause BSD license along with mlpack. If not, see http://www.opensource.org/licenses/BSD-3-Clause for more information.

Function Documentation

◆ Kurtosis()

double Kurtosis ( const arma::rowvec &  input,
const double &  fStd,
const double &  fMean,
const bool  population 
)

Calculates excess kurtosis of the given vector.

Parameters
input       Vector that captures a dimension of a dataset.
fStd        Standard deviation of the given vector.
fMean       Mean of the given vector.
population  If true, treat the vector as a population; otherwise treat it as a sample.
Returns
Excess kurtosis of the given vector.
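
As a rough illustration of how an excess-kurtosis calculation with a population flag can be written, the sketch below uses the standard textbook estimators. That these match the binding's own implementation is an assumption. `KurtosisSketch`, `SumPowDev`, and the use of `std::vector<double>` in place of `arma::rowvec` are hypothetical, chosen so the example compiles without Armadillo.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of (x - mean)^n over the vector.
double SumPowDev(const std::vector<double>& input, const double mean, const int n)
{
  double sum = 0.0;
  for (const double x : input)
    sum += std::pow(x - mean, n);
  return sum;
}

// Textbook excess kurtosis; not taken from the mlpack source.
// fStd is only used on the sample branch and is assumed to be the
// sample standard deviation (divisor n - 1).
double KurtosisSketch(const std::vector<double>& input,
                      const double fStd,
                      const double fMean,
                      const bool population)
{
  const double n = static_cast<double>(input.size());
  const double m4 = SumPowDev(input, fMean, 4);
  if (population)
  {
    // n * M4 / M2^2 - 3, where Mk is the sum of k-th power deviations.
    const double m2 = SumPowDev(input, fMean, 2);
    assert(m2 > 0.0);
    return n * m4 / (m2 * m2) - 3.0;
  }
  // Bias-corrected sample estimator; it needs at least four points.
  assert(input.size() >= 4 && fStd > 0.0);
  const double scale = (n * (n + 1.0)) / ((n - 1.0) * (n - 2.0) * (n - 3.0));
  const double shift = (3.0 * (n - 1.0) * (n - 1.0)) / ((n - 2.0) * (n - 3.0));
  return scale * (m4 / std::pow(fStd, 4)) - shift;
}
```

For the values 1, 2, 3, 4, 5 (mean 3), this sketch gives -1.3 when `population` is true and -1.2 when it is false and the sample standard deviation sqrt(2.5) is passed in.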

◆ Skewness()

double Skewness ( const arma::rowvec &  input,
const double &  fStd,
const double &  fMean,
const bool  population 
)

Calculates the skewness of the given vector.

Parameters
input       Vector that captures a dimension of a dataset.
fStd        Standard deviation of the given vector.
fMean       Mean of the given vector.
population  If true, treat the vector as a population; otherwise treat it as a sample.
Returns
Skewness of the given vector.
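
The following sketch shows one conventional way to compute skewness from a precomputed mean and standard deviation, with separate population and sample branches. The formulas are the usual textbook ones and are assumed, not confirmed, to correspond to the binding. `SkewnessSketch` and `std::vector<double>` (standing in for `arma::rowvec`) are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook skewness; not taken from the mlpack source.
// fStd is assumed to match the flag: population standard deviation
// (divisor n) when population is true, sample standard deviation
// (divisor n - 1) otherwise.
double SkewnessSketch(const std::vector<double>& input,
                      const double fStd,
                      const double fMean,
                      const bool population)
{
  assert(fStd > 0.0);
  const double n = static_cast<double>(input.size());

  // Sum of cubed deviations from the mean.
  double m3 = 0.0;
  for (const double x : input)
    m3 += std::pow(x - fMean, 3);

  const double s3 = std::pow(fStd, 3);
  if (population)
    return m3 / (n * s3);

  // Adjusted sample estimator; it needs at least three points.
  assert(input.size() >= 3);
  return n * m3 / ((n - 1.0) * (n - 2.0) * s3);
}
```

For the values 1, 2, 3, 4, 10 (mean 4), this sketch returns about 1.138 in the population case (with standard deviation sqrt(10)) and about 1.697 in the sample case (with standard deviation sqrt(12.5)). Symmetric data such as 1, 2, 3, 4, 5 gives 0.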

◆ StandardError()

double StandardError ( const size_t  size,
const double &  fStd 
)

Calculates standard error of standard deviation.

Parameters
size  Number of elements in the given vector.
fStd  Standard deviation of the given vector.
Returns
Standard error of the standard deviation of the given vector.
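
A standard error derived from only a size and a standard deviation is commonly computed as the ratio s / sqrt(n), which most texts call the standard error of the mean. The sketch below assumes that formula; whether the binding computes exactly this is not confirmed by this page, and `StandardErrorSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumed formula: standard deviation divided by the square root of
// the number of elements. Not taken from the mlpack source.
double StandardErrorSketch(const std::size_t size, const double fStd)
{
  assert(size > 0);
  return fStd / std::sqrt(static_cast<double>(size));
}
```

With this formula, a vector of 4 elements with standard deviation 2.0 has a standard error of 1.0.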

◆ SumNthPowerDeviations()

double SumNthPowerDeviations ( const arma::rowvec &  input,
const double &  fMean,
size_t  n 
)

Calculates the sum of deviations to the Nth Power.

Parameters
input  Vector that captures a dimension of a dataset.
fMean  Mean of the given vector.
n      Degree of power.
Returns
Sum of the nth power deviations.
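
To make the quantity concrete, here is a minimal sketch of a sum of nth-power deviations, that is, the sum over all elements of (x - mean)^n. `SumNthPowerDeviationsSketch` is a hypothetical stand-in that takes a `std::vector<double>` instead of `arma::rowvec`, so that it builds without Armadillo.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SumNthPowerDeviations(): adds up
// (x - mean)^n for every element x of the input.
double SumNthPowerDeviationsSketch(const std::vector<double>& input,
                                   const double fMean,
                                   const std::size_t n)
{
  assert(std::isfinite(fMean));
  double sum = 0.0;
  for (const double x : input)
    sum += std::pow(x - fMean, static_cast<double>(n));
  return sum;
}
```

For the values 1, 2, 3, 4, 5 with mean 3, the sketch returns 10 for n = 2, 0 for n = 3, and 34 for n = 4. These are the building blocks that variance, skewness, and kurtosis estimators are typically assembled from.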