mlpack

Writing an mlpack binding

This tutorial gives some simple examples of how to write an mlpack binding that can be compiled for multiple languages. These bindings make up the core of how most users will interact with mlpack.

mlpack provides the following:

Each of those classes are well-documented, and that documentation in the source code should be consulted for further reference.

First, we’ll discuss the logging infrastructure, which is useful for giving output that users can see.

🔗 Simple logging example

mlpack has four logging levels:

Output to Log::Debug does not show (and has no performance penalty) when mlpack is compiled without debugging symbols. Output to Log::Info is only shown when the program is run with the verbose option (for a command-line binding, this is --verbose or -v). Log::Warn is always shown, and Log::Fatal will throw a std::runtime_error exception, after a newline is sent to it. If mlpack was compiled with debugging symbols, Log::Fatal will also print a backtrace, if the necessary libraries are available.

Here is a simple example binding, and its output. Note that instead of int main(), we use void BINDING_FUNCTION(). This is because the automatic binding generator will set up the environment and once that is done, it will call BINDING_FUNCTION().

#include <mlpack/core.hpp>
#include <mlpack/core/util/io.hpp>
// This definition below means we will only compile for the command line.
#define BINDING_TYPE BINDING_TYPE_CLI
#include <mlpack/core/util/mlpack_main.hpp>

using namespace mlpack;

void BINDING_FUNCTION(util::Params& params, util::Timers& timers)
{
  Log::Debug << "Compiled with debugging symbols." << std::endl;

  Log::Info << "Some test informational output." << std::endl;

  Log::Warn << "A warning!" << std::endl;

  Log::Fatal << "Program has crashed." << std::endl;

  Log::Warn << "Made it!" << std::endl;
}

Assuming mlpack is installed on the system and the code above is saved in test.cpp, this program can be compiled with the following command:

$ g++ -o test test.cpp -DDEBUG -g -rdynamic -lmlpack

Since we compiled with -DDEBUG, if we run the program as below, the following output is shown:

$ ./test --verbose
[DEBUG] Compiled with debugging symbols.
[INFO ] Some test informational output.
[WARN ] A warning!
[FATAL] [bt]: (1) /absolute/path/to/file/example.cpp:6: function()
[FATAL] Program has crashed.
terminate called after throwing an instance of 'std::runtime_error'
  what():  fatal error; see Log::Fatal output
Aborted

The flags -g and -rdynamic are only necessary for providing a backtrace. If those flags are not given during compilation, the following output would be shown:

$ ./test --verbose
[DEBUG] Compiled with debugging symbols.
[INFO ] Some test informational output.
[WARN ] A warning!
[FATAL] Cannot give backtrace because program was compiled without: -g -rdynamic
[FATAL] For a backtrace, recompile with: -g -rdynamic.
[FATAL] Program has crashed.
terminate called after throwing an instance of 'std::runtime_error'
  what():  fatal error; see Log::Fatal output
Aborted

The last warning is not reached, because Log::Fatal terminates the program.

Without debugging symbols (i.e. without -g and -DDEBUG) and without --verbose, the following is shown:

$ ./test
[WARN ] A warning!
[FATAL] Program has crashed.
terminate called after throwing an instance of 'std::runtime_error'
  what():  fatal error; see Log::Fatal output
Aborted

These four outputs can be very useful for both providing informational output and debugging output for your mlpack program.

🔗 Simple parameter example

Through the mlpack::util::Params object, parameters can be easily added to a binding with the BINDING_NAME, BINDING_SHORT_DESC, BINDING_LONG_DESC, BINDING_EXAMPLE, BINDING_SEE_ALSO, PARAM_INT, PARAM_DOUBLE, PARAM_STRING, and PARAM_FLAG macros.

Here is a sample use of those macros, extracted from methods/pca/pca_main.cpp. (Some details have been omitted from the snippet below.)

#include <mlpack/core.hpp>
#include <mlpack/core/util/io.hpp>
#include <mlpack/core/util/mlpack_main.hpp>

// Program Name.
BINDING_NAME("Principal Components Analysis");

// Short description.
BINDING_SHORT_DESC(
    "An implementation of several strategies for principal components analysis "
    "(PCA), a common preprocessing step.  Given a dataset and a desired new "
    "dimensionality, this can reduce the dimensionality of the data using the "
    "linear transformation determined by PCA.");

// Long description.
BINDING_LONG_DESC(
    "This program performs principal components analysis on the given dataset "
    "using the exact, randomized, randomized block Krylov, or QUIC SVD method. "
    "It will transform the data onto its principal components, optionally "
    "performing dimensionality reduction by ignoring the principal components "
    "with the smallest eigenvalues.");

// See also...
BINDING_SEE_ALSO("Principal component analysis on Wikipedia",
    "https://en.wikipedia.org/wiki/Principal_component_analysis");
BINDING_SEE_ALSO("PCA C++ class documentation",
    "@src/mlpack/methods/pca/pca.hpp");

// Parameters for program.
PARAM_MATRIX_IN_REQ("input", "Input dataset to perform PCA on.", "i");
PARAM_MATRIX_OUT("output", "Matrix to save modified dataset to.", "o");
PARAM_INT_IN("new_dimensionality", "Desired dimensionality of output dataset.",
    "d", 0);

using namespace mlpack;

void BINDING_FUNCTION(util::Params& params, util::Timers& timers)
{
  // Load input dataset.
  arma::mat& dataset = params.Get<arma::mat>("input");

  size_t newDimension = params.Get<int>("new_dimensionality");

  ...

  // Now save the results.
  if (params.Has("output"))
    params.Get<arma::mat>("output") = std::move(dataset);
}

Documentation is automatically generated using those macros, and if compiled to a command-line program, when that program is run with --help the following is displayed:

$ mlpack_pca --help
Principal Components Analysis

  This program performs principal components analysis on the given dataset.  It
  will transform the data onto its principal components, optionally performing
  dimensionality reduction by ignoring the principal components with the
  smallest eigenvalues.

Required options:

  --input_file [string]         Input dataset to perform PCA on.
  --output_file [string]        Matrix to save modified dataset to.

Options:

  --help (-h)                   Default help info.
  --info [string]               Get help on a specific module or option.
                                Default value ''.
  --new_dimensionality [int]    Desired dimensionality of output dataset.
                                Default value 0.
  --verbose (-v)                Display informational messages and the full list
                                of parameters and timers at the end of
                                execution.

The mlpack::IO source code can be consulted for further and complete documentation. Also useful is to look at other example bindings, found in src/mlpack/methods/.