Bandicoot: API Documentation

API Documentation for Bandicoot 1.0

Preamble

Bandicoot is a GPU-based linear algebra library that is meant to be API-compatible with Armadillo

Bandicoot supports the use of any OpenCL or CUDA device as a backend; see the backend configuration details

For converting Armadillo programs to Bandicoot, see the Armadillo/Bandicoot conversion guide

For converting Matlab/Octave programs, see the syntax conversion table

First-time users: please see the short example program

If you discover any bugs or regressions, please report them

History of API additions

Please cite the following papers if you use Bandicoot in your research and/or software.
Citations are useful for the continued development and maintenance of the library.

TODO: a technical report!

Overview

matrix and vector classes
member functions & variables

generated vectors / matrices
functions of vectors / matrices

decompositions, factorisations, and inverses

signal & image processing
statistics and clustering
miscellaneous (constants, configuration)

Matrix and Vector Classes

Mat<type>, fmat, mat		dense matrix class
Col<type>, fcolvec, fvec, colvec, vec		dense column vector class
Row<type>, frowvec, rowvec		dense row vector class

operators		`+ - * % / == != <= >= < > && ||`

Member Functions & Variables

attributes		.n_rows, .n_cols, .n_elem
element access		element/object access via (), [] and .at()

.zeros		set all elements to zero
.ones		set all elements to one
.eye		set elements along main diagonal to one and off-diagonal elements to zero
.randu / .randn		set all elements to random values

.fill		set all elements to specified value

.clamp		clamp values to lower and upper limits

.set_size		change size without keeping elements (fast)
.reshape		change size while keeping elements
.resize		change size while keeping elements and preserving layout
.reset		change size to empty

submatrix views		read/write access to contiguous and non-contiguous submatrices

.get_dev_mem()		get underlying raw GPU memory pointer

.diag		read/write access to matrix diagonals

.t / .st		return matrix transpose
.eval		force evaluation of delayed expression

.is_empty		check whether object is empty
.is_vec		check whether matrix is a vector

.is_square		check whether matrix is square sized

.print		print object to std::cout or user specified stream
.raw_print		print object without formatting

Generated Vectors / Matrices

linspace		generate vector with linearly spaced elements
eye		generate identity matrix
ones		generate object filled with ones
zeros		generate object filled with zeros
randu		generate object with random values (uniform distribution)
randn		generate object with random values (normal distribution)
randi		generate object with random integer values in specified interval

Functions of Vectors / Matrices

abs		obtain magnitude of each element
accu		accumulate (sum) all elements
all		check whether all elements are non-zero, or satisfy a relational condition
any		check whether any element is non-zero, or satisfies a relational condition
as_scalar		convert 1x1 matrix to pure scalar
clamp		obtain clamped elements according to given limits
conv_to		convert/cast between matrix types
cross		cross product
det		determinant
diagmat		generate diagonal matrix from given matrix or vector
diagvec		extract specified diagonal
dot		dot product
find		find indices of non-zero elements, or elements satisfying a relational condition
find_finite		find indices of finite elements
find_nonfinite		find indices of non-finite elements
find_nan		find indices of NaN elements
join_rows / join_cols		concatenation of matrices
min / max		return extremum values
norm		various norms of vectors and matrices
normalise		normalise vectors to unit p-norm
pow		element-wise power
repmat		replicate matrix in block-like fashion
reshape		change size while keeping elements
resize		change size while keeping elements and preserving layout
size		obtain dimensions of given object
sort		sort elements
sort_index		vector describing sorted order of elements
sum		sum of elements
symmatu / symmatl		generate symmetric matrix from given matrix
trace		sum of diagonal elements
trans		transpose of matrix
vectorise		flatten matrix into vector
misc functions		miscellaneous element-wise functions: exp, log, sqrt, round, sign, ...
trig functions		trigonometric element-wise functions: cos, sin, tan, ...

Decompositions, Factorisations, and Inverses

chol		Cholesky decomposition
eig_sym		eigen decomposition of dense symmetric/hermitian matrix
lu		lower-upper decomposition
pinv		pseudo-inverse / generalised inverse
svd		singular value decomposition

Signal & Image Processing

conv		1D convolution
conv2		2D convolution

Statistics

stats functions		mean, median, standard deviation, variance
cov		covariance
cor		correlation

Miscellaneous

backend configuration		configuring the use of OpenCL or CUDA backends for Bandicoot
output streams		streams for printing warnings and errors
uword / sword		shorthand for unsigned and signed integers
Matlab/Bandicoot syntax differences		examples of Matlab syntax and conceptually corresponding Bandicoot syntax
Armadillo/Bandicoot differences		conceptual differences between Bandicoot and Armadillo
example program		short example program
config.hpp		configuration options
direct linking		guide to linking without using the wrapper library
kernel cache		infrastructure for caching compiled GPU kernel functions
API additions		API stability and list of API additions

Matrix and Vector Classes

Mat<type>
fmat
mat

Classes for dense matrices, with elements stored in column-major ordering (ie. column by column) on the GPU

The root matrix class is Mat<type>, where type is one of:
- float, double, short, int, long, and unsigned versions of short, int, long
- Bandicoot provides convenient u32, u64, s32, and s64 types that can also be used
- Important: not all types are supported on all devices; runtime exceptions will be thrown if a type is not supported

For convenience the following typedefs have been defined:

`fmat`	=	`Mat<float>`
`mat`	=	`Mat<double>`	note: not supported on all devices
`dmat`	=	`Mat<double>`	note: not supported on all devices
`umat`	=	`Mat<uword>`
`imat`	=	`Mat<sword>`
`u32_mat`	=	`Mat<u32>`
`s32_mat`	=	`Mat<s32>`
`u64_mat`	=	`Mat<u64>`
`s64_mat`	=	`Mat<s64>`

In this documentation the fmat type is used for convenience, speed, and portability; it is possible to use other types instead, eg. mat

Note that standard consumer GPUs may not have support for 64-bit floats (double), and if they do, they may not show speedup over CPU-based Armadillo matrices unless they are high-end GPUs

Functions which use more complex functionality (generally matrix decompositions) are only valid for the following types: fmat, dmat, mat

Constructors:

Caveat:
- unlike Armadillo 10.5 and newer, the elements are not initialised; they may contain garbage values, including NaN
- consider using zeros() or other functions to fill memory
- setting the elements one by one is generally very inefficient and should be avoided if possible

Each instance of fmat automatically allocates and releases internal memory on the GPU. All internally allocated memory used by an instance of fmat is automatically released as soon as the instance goes out of scope. For example, if an instance of fmat is declared inside a function, it will be automatically destroyed at the end of the function. To forcefully release memory at any point, use .reset(); note that in normal use this is not required.

Advanced constructors:

Examples:

fmat A(5, 5);
A.randu();
float x = A(1, 2); // note: try to avoid repeated individual element accesses!

fmat B = A + A;
fmat C = A * B;
fmat D = A % B;

B.zeros();
B.set_size(10, 10);
B.ones(5, 6);

B.print("B:");

// convert from Armadillo
arma::fmat E(10, 10, arma::fill::randu);
fmat F(E);

// advanced constructors

// when using the OpenCL backend
cl_mem m_cl = clCreateBuffer(get_rt().cl_rt.get_context(), CL_MEM_READ_WRITE, sizeof(float) * 24, NULL, NULL);
fmat H(wrap_mem_cl(m_cl), 4, 6);  // use auxiliary memory

// when using the CUDA backend
float* m_cuda;
cudaMalloc(&m_cuda, sizeof(float) * 24);
fmat J(wrap_mem_cuda(m_cuda), 4, 6);  // use auxiliary memory

// make an alias of another matrix
fmat K(D.get_dev_mem(), D.n_rows, D.n_cols);

See also:
- matrix attributes
- accessing elements
- initialising elements
- math & relational operators
- submatrix views
- printing matrices
- .get_dev_mem()
- .eval()
- conv_to() (convert between matrix types)
- explanation of typedef (cplusplus.com)
- Col class
- Row class
- config.hpp

Col<type>
fvec
vec

Classes for column vectors (dense matrices with one column)

The Col<type> class is derived from the Mat<type> class and inherits most of the member functions

For convenience the following typedefs have been defined:

`fvec`	=	`fcolvec`	=	`Col<float>`
`vec`	=	`colvec`	=	`Col<double>`	note: not supported on all devices
`dvec`	=	`dcolvec`	=	`Col<double>`	note: not supported on all devices
`uvec`	=	`ucolvec`	=	`Col<uword>`
`ivec`	=	`icolvec`	=	`Col<sword>`
`u32_vec`	=	`u32_colvec`	=	`Col<u32>`
`s32_vec`	=	`s32_colvec`	=	`Col<s32>`
`u64_vec`	=	`u64_colvec`	=	`Col<u64>`
`s64_vec`	=	`s64_colvec`	=	`Col<s64>`

In this documentation, the vec and colvec types have the same meaning and are used interchangeably

In this documentation, the types fvec or fcolvec are used for convenience, speed, and portability; it is possible to use other types instead, eg. vec, colvec

Note that standard consumer GPUs may not have support for 64-bit floats (double), and if they do, they may not show speedup over CPU-based Armadillo matrices unless they are high-end GPUs

Functions which take Mat as input can generally also take Col as input; main exceptions are functions which require square matrices

Constructors:

`fvec()`
`fvec(n_elem)`
`fvec(size(X))`
`fvec(fvec)`
`fvec(arma::fvec)`		(convert from CPU-based Armadillo vector)
`fvec(fmat)`		(std::logic_error exception is thrown if the given matrix has more than one column)

Caveat:
- unlike Armadillo 10.5 and newer, the elements are not initialised; they may contain garbage values, including NaN
- consider using zeros() or other functions to fill memory
- setting the elements one by one is generally very inefficient and should be avoided if possible

Advanced constructors:

Examples:

fvec x(10);
fvec y(10, fill::ones);

fmat A(10, 10, fill::randu);
fvec z = A.col(5); // extract a column vector

// convert from Armadillo
arma::fvec d(100, arma::fill::randu);
fvec e(d);

See also:

Row<type>
frowvec
rowvec

Classes for row vectors (dense matrices with one row)

The template Row<type> class is derived from the Mat<type> class and inherits most of the member functions

For convenience the following typedefs have been defined:

`frowvec`	=	`Row<float>`
`rowvec`	=	`Row<double>`	note: not supported on all devices
`drowvec`	=	`Row<double>`	note: not supported on all devices
`urowvec`	=	`Row<uword>`
`irowvec`	=	`Row<sword>`
`u32_rowvec`	=	`Row<u32>`
`s32_rowvec`	=	`Row<s32>`
`u64_rowvec`	=	`Row<u64>`
`s64_rowvec`	=	`Row<s64>`

In this documentation, the frowvec type is used for convenience, speed, and portability; it is possible to use other types instead, eg. rowvec

Note that standard consumer GPUs may not have support for 64-bit floats (double), and if they do, they may not show speedup over CPU-based Armadillo matrices unless they are high-end GPUs

Functions which take Mat as input can generally also take Row as input. Main exceptions are functions which require square matrices

Constructors:

`frowvec()`
`frowvec(n_elem)`
`frowvec(size(X))`
`frowvec(frowvec)`
`frowvec(arma::frowvec)`		(convert from CPU-based Armadillo row vector)
`frowvec(fmat)`		(std::logic_error exception is thrown if the given matrix has more than one row)

Caveat:
- unlike Armadillo 10.5 and newer, the elements are not initialised; they may contain garbage values, including NaN
- consider using zeros() or other functions to fill memory
- setting the elements one by one is generally very inefficient and should be avoided if possible

Advanced constructors:

Examples:

frowvec x(10);
frowvec y(10, fill::ones);

fmat    A(10, 10, fill::randu);
frowvec z = A.row(5); // extract a row vector

// convert from Armadillo
arma::frowvec d(100, arma::fill::randu);
frowvec e(d);

See also:

operators: + - * % / == != <= >= < > && ||

Overloaded operators for Mat, Col, and Row classes

Operations:

`+`		addition of two objects
`-`		subtraction of one object from another or negation of an object

`*`		matrix multiplication of two objects

`%`		element-wise multiplication of two objects (Schur product)
`/`		element-wise division of an object by another object or a scalar

`==`		element-wise equality evaluation of two objects; generates a matrix of type umat
`!=`		element-wise non-equality evaluation of two objects; generates a matrix of type umat

`>=`		element-wise "greater than or equal to" evaluation of two objects; generates a matrix of type umat
`<=`		element-wise "less than or equal to" evaluation of two objects; generates a matrix of type umat

`>`		element-wise "greater than" evaluation of two objects; generates a matrix of type umat
`<`		element-wise "less than" evaluation of two objects; generates a matrix of type umat

`&&`		element-wise logical AND evaluation of two objects; generates a matrix of type umat
`||`		element-wise logical OR evaluation of two objects; generates a matrix of type umat

For element-wise relational and logical operations (ie. ==, !=, >=, <=, >, <, &&, ||) each element in the generated object is either 0 or 1, depending on the result of the operation

Caveat: operators involving equality comparison (ie. ==, !=, >=, <=) are not recommended for matrices of type mat or fmat, due to the necessarily limited precision of floating-point element types

If incompatible object sizes are used, a std::logic_error exception is thrown

Examples:

fmat A = randu<fmat>(5, 10);
fmat B = randu<fmat>(5, 10);
fmat C = randu<fmat>(10, 5);

fmat P = A + B;
fmat Q = A - B;
fmat R = -B;
fmat S = A / 123.0;
fmat T = A % B;
fmat U = A * C;

fmat V = A + B + A + B;

imat AA = linspace<imat>(1, 9, 9);
imat BB = linspace<imat>(9, 1, 9);

// compare elements
umat ZZ = (AA >= BB);
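
Given the caveat about equality comparison on floating-point elements, a comparison within a tolerance is often more robust than ==. The following is a minimal CPU-side sketch in plain C++, with std::vector standing in for a matrix; approx_equal_elem is a hypothetical helper written for illustration and is not part of the Bandicoot API. As with the relational operators, each output element is either 0 or 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): element-wise "approximately equal".
// Produces 1 where |a[i] - b[i]| <= tol, and 0 elsewhere.
std::vector<unsigned> approx_equal_elem(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        float tol)
{
  assert(a.size() == b.size());  // object sizes must be compatible
  std::vector<unsigned> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = (std::fabs(a[i] - b[i]) <= tol) ? 1u : 0u;
  return out;
}
```

The tolerance is problem-dependent; the value to use should be chosen according to the magnitude of the elements being compared.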

See also:
- pow()
- any()
- all()
- accu()
- as_scalar()
- find()
- miscellaneous element-wise functions (exp, log, sqrt, square, round, ...)
- floating point arithmetic in Wikipedia
- floating point representation in MathWorld

Member Functions & Variables

attributes

`.n_rows`		number of rows; present in Mat, Col, and Row
`.n_cols`		number of columns; present in Mat, Col, and Row
`.n_elem`		total number of elements; present in Mat, Col, and Row

The variables are of type uword

The variables are read-only; to change the size, use .set_size(), .zeros(), .ones(), or .reset()

For the Col and Row classes, n_elem also indicates vector length

Examples:

fmat X(4,5);
cout << "X has " << X.n_cols << " columns" << endl;

See also:
- .set_size()
- .zeros()
- .ones()
- .reset()
- size()

element access via (), [] and .at()

Provide access to individual elements in a Mat, Col, or Row

`(i)`		For fvec and frowvec, access the element stored at index i. For fmat, access the element/object stored at index i under the assumption of a flat layout, with column-major ordering of data (i.e. column by column). An exception is thrown if the requested element is out of bounds.

`.at(i)` or `[i]`		As for `(i)`, but without a bounds check; not recommended; see the caveats below

`(r,c)`		For fmat, access the element/object stored at row r and column c. An exception is thrown if the requested element is out of bounds.

`.at(r,c)`		As for `(r,c)`, but without a bounds check; not recommended; see the caveats below

Important: every element access involves a transfer from GPU memory to CPU memory; therefore, for efficiency, avoid repeated element access when possible; see the Armadillo conversion guide for more details and suggestions.

The indices of elements are specified via the uword type, which is a typedef for an unsigned integer type.

Caveats:
- accessing elements without bounds checks is slightly faster, but is not recommended until your code has been thoroughly debugged first
- indexing in C++ starts at 0
- accessing elements via [r,c] does not work correctly in C++; instead use (r,c)

Examples:

// remember that individual element accesses are slow and should be avoided;
// when possible, operate in batch instead of on individual elements!
fmat M(10, 10);
M.randu();
M(9, 9) = 123.0;
float x = M(1, 2);

fvec v(10);
v.randu();
v(9) = 123.0;
float y = v(0);
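
The relationship between `(i)` and `(r,c)` under column-major ordering can be sketched in plain C++. flat_index is a hypothetical helper written for illustration and is not part of the Bandicoot API; it only assumes the column-by-column layout described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not Bandicoot code): maps (r, c) in a matrix with
// n_rows rows and n_cols columns to the flat index used by column-major storage.
std::size_t flat_index(std::size_t r, std::size_t c,
                       std::size_t n_rows, std::size_t n_cols)
{
  assert(r < n_rows && c < n_cols);  // bounds check, in the spirit of (r,c)
  return r + c * n_rows;             // whole columns are stored back to back
}
```

For the 10x10 matrix M above, flat_index(1, 2, 10, 10) gives 21, so under this layout M(21) and M(1, 2) refer to the same element.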

See also:

.zeros()			(member function of Mat, Col, and Row)
.zeros( n_elem )			(member function of Col and Row)
.zeros( n_rows, n_cols )			(member function of Mat)

Set the elements of an object to zero, optionally first changing the size to specified dimensions

Examples:

fmat A;
A.zeros(5, 10);

fvec B;
B.zeros(100);

fmat C(5, 10);
C.zeros();

See also:
- zeros() (standalone function)
- .ones()
- .randu()
- .fill()
- .reset()
- .set_size()
- size()

.ones()			(member function of Mat, Col, and Row)
.ones( n_elem )			(member function of Col and Row)
.ones( n_rows, n_cols )			(member function of Mat)

Set all the elements of an object to one, optionally first changing the size to specified dimensions

Examples:

fmat A;
A.ones(5, 10);

fvec B;
B.ones(100);

fmat C(5, 10);
C.ones();

See also:
- ones() (standalone function)
- .eye()
- .zeros()
- .fill()
- .randu()
- size()

.eye()
.eye( n_rows, n_cols )

Member functions of Mat

Set the elements along the main diagonal to one and off-diagonal elements to zero, optionally first changing the size to specified dimensions

An identity matrix is generated when n_rows = n_cols

Examples:

fmat A;
A.eye(5, 5);

fmat B(5, 5);
B.eye();

See also:
- .ones()
- .diag()
- diagmat()
- diagvec()
- eye() (standalone function)
- size()

.randu()			(member function of Mat, Col, and Row)
.randu( n_elem )			(member function of Col and Row)
.randu( n_rows, n_cols )			(member function of Mat)

.randn()			(member function of Mat, Col, and Row)
.randn( n_elem )			(member function of Col and Row)
.randn( n_rows, n_cols )			(member function of Mat)

Set all the elements to random values, optionally first changing the size to specified dimensions

.randu() uses a uniform distribution in the [0,1] interval

.randn() uses a normal/Gaussian distribution with zero mean and unit variance

To change the RNG seed, use coot_rng::set_seed(value) or coot_rng::set_seed_random() functions

Examples:

fmat A;
A.randu(5, 10);

fvec B;
B.randu(100);

fmat C(5, 10);
C.randu();

coot_rng::set_seed_random(); // set the seed to a random value
coot_rng::set_seed(42);      // set the seed to a specific value

See also:
- randu() (standalone function with extended functionality)
- randn() (standalone function with extended functionality)
- .fill()
- .ones()
- .zeros()
- size()
- uniform distribution in Wikipedia
- normal distribution in Wikipedia

.fill( value )

Member function of Mat, Col, and Row

Sets the elements to a specified value

The type of value must match the type of elements used by the container object (e.g. for fmat the type is float)

Examples:

See also:

.clamp( min_value, max_value )

Member function of Mat, Col, and Row

Clamp each element to the [min_value, max_value] interval; any value lower than min_value will be set to min_value, and any value higher than max_value will be set to max_value

Examples:

fmat A(5, 6);
A.randu();

A.clamp(0.2, 0.8);
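
The clamping rule can be modelled on the CPU in a few lines of plain C++. This is only a sketch of the rule stated above, with std::vector standing in for a matrix; clamp_in_place and its parameter names lo and hi are hypothetical and not part of the Bandicoot API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CPU-side model of the clamping rule (not Bandicoot code):
// values below lo become lo, values above hi become hi, others are unchanged.
void clamp_in_place(std::vector<float>& x, float lo, float hi)
{
  assert(lo <= hi);  // the interval must be well formed
  for (float& v : x)
    v = std::min(std::max(v, lo), hi);
}
```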

See also:
- clamp() (standalone function)
- relational operators

.set_size( n_elem )			(member function of Col and Row)
.set_size( n_rows, n_cols )			(member function of Mat)
.set_size( size(X) )			(member function of Mat, Col, and Row)

Change the size of an object, without explicitly preserving data and without initialising the elements (i.e. elements may contain garbage values, including NaN)

To initialise the elements to zero while changing the size, use .zeros() instead

To explicitly preserve data while changing the size, use .reshape() or .resize() instead;
NOTE: .reshape() and .resize() are considerably slower than .set_size()

Examples:

fmat A;
A.set_size(5, 10);      // or:  fmat A(5, 10);

fmat B;
B.set_size( size(A) );  // or:  fmat B(size(A));

fvec v;
v.set_size(100);        // or:  fvec v(100);

See also:
- .reset()
- .reshape()
- .resize()
- .zeros()
- size()

.reshape( n_rows, n_cols )			(member function of Mat)
.reshape( size(X) )			(member function of Mat)

Recreate the object according to given size specifications, with the elements taken from the previous version of the object in a column-wise manner; the elements in the generated object are placed column-wise (i.e. the first column is filled up before filling the second column)

The layout of the elements in the recreated object will be different to the layout in the previous version of the object

If the total number of elements in the previous version of the object is less than the specified size, the extra elements in the recreated object are set to zero

If the total number of elements in the previous version of the object is greater than the specified size, only a subset of the elements is taken

Caveats:
- to change the size without preserving data, use .set_size() instead, which is much faster
- to grow/shrink the object while preserving the elements as well as the layout of the elements, use .resize() instead
- to flatten a matrix into a vector, use vectorise() or .as_col() / .as_row() instead

Examples:

fmat A(4, 5);
A.randu();

A.reshape(5, 4);
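
The element placement rules above (column-wise copy, zero-filled extras, dropped surplus) can be modelled on the CPU in plain C++. This is a sketch of the described behaviour only, not Bandicoot's implementation; reshape_model is a hypothetical name, and the input vector is assumed to hold the old matrix in column-major order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of .reshape() (not Bandicoot code).
// `data` holds the old matrix in column-major order; the result holds
// new_rows * new_cols elements, also in column-major order.
std::vector<float> reshape_model(const std::vector<float>& data,
                                 std::size_t new_rows, std::size_t new_cols)
{
  std::vector<float> out(new_rows * new_cols, 0.0f);       // extras stay zero
  const std::size_t n = std::min(data.size(), out.size()); // surplus is dropped
  assert(n <= data.size() && n <= out.size());
  std::copy_n(data.begin(), n, out.begin());               // same flat order
  return out;
}
```

Because the flat order is kept while the number of rows changes, an element generally ends up at a different (row, column) position, which is why the layout differs after reshaping.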

See also:
- .resize()
- .set_size()
- .zeros()
- .reset()
- reshape() (standalone function)
- vectorise()
- size()

.resize( n_elem )			(member function of Col and Row)
.resize( n_rows, n_cols )			(member function of Mat)
.resize( size(X) )			(member function of Mat, Col, and Row)

Recreate the object according to given size specifications, while preserving the elements as well as the layout of the elements

Can be used for growing or shrinking an object (i.e. adding/removing rows and/or columns)

Caveat: to change the size without preserving data, use .set_size() instead, which is much faster

Examples:

fmat A(4, 5);
A.randu();

A.resize(7, 6);
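
In contrast to .reshape(), preserving the layout means that an element at (r,c) stays at (r,c) whenever that position still exists. The plain C++ sketch below models this on the CPU; resize_model is a hypothetical name and not part of the Bandicoot API. The sketch sets newly added elements to zero, which is an assumption: the text above does not state what value they receive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of .resize() (not Bandicoot code).
// Both `data` and the result are stored in column-major order.
// Assumption: newly added elements are set to zero.
std::vector<float> resize_model(const std::vector<float>& data,
                                std::size_t old_rows, std::size_t old_cols,
                                std::size_t new_rows, std::size_t new_cols)
{
  assert(data.size() == old_rows * old_cols);
  std::vector<float> out(new_rows * new_cols, 0.0f);
  const std::size_t rows = std::min(old_rows, new_rows);
  const std::size_t cols = std::min(old_cols, new_cols);
  for (std::size_t c = 0; c < cols; ++c)
    for (std::size_t r = 0; r < rows; ++r)
      out[r + c * new_rows] = data[r + c * old_rows];  // (r,c) stays at (r,c)
  return out;
}
```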

See also:
- .reshape()
- .set_size()
- .zeros()
- .reset()
- resize() (standalone function)
- vectorise()
- size()

.reset()

Reset the size to zero (the object will have no elements)

Examples:

See also:

submatrix views

A collection of member functions of Mat, Col and Row classes that provide read/write access to submatrix views
contiguous views for matrix X:

contiguous views for vector V:

related matrix views (documented separately)

Instances of span(start,end) can be replaced by span::all to indicate the entire range

Examples:

fmat A(5, 10);
A.zeros();

A.submat( 0,1, 2,3 )      = randu<fmat>(3, 3);
A( span(0,2), span(1,3) ) = randu<fmat>(3, 3);
A( 0,1, size(3,3) )       = randu<fmat>(3, 3);

fmat B = A.submat( 0,1, 2,3 );
fmat C = A( span(0,2), span(1,3) );
fmat D = A( 0,1, size(3,3) );

A.col(1)        = randu<fmat>(5,1);
A(span::all, 1) = randu<fmat>(5,1);

// add 123 to the last 5 elements of vector a
vec a(10);
a.randu();
a.subvec(a.n_elem - 5, a.n_elem - 1) += 123.0;

// add 123 to the first 3 elements of column 2 of A
A.col(2).subvec(0, 2) += 123;

See also:

.get_dev_mem()
.get_dev_mem( synchronise )

Member function of Mat, Col, and Row

Obtain dev_mem_t object that holds raw GPU memory pointers

By default, all asynchronous GPU operations are forced to complete, unless synchronise is passed as false

Depending on backend configuration, underlying GPU memory may be accessed as
- .get_dev_mem().cl_mem_ptr for the OpenCL backend; this has type cl_mem
- .get_dev_mem().cuda_mem_ptr for the CUDA backend; for a matrix type Mat<eT>, this has type eT* (e.g. for fmat the type will be float*)

Examples:

// when using the OpenCL backend
fmat A = randu<fmat>(3, 4);
cl_mem A_mem = A.get_dev_mem().cl_mem_ptr;

// when using the CUDA backend
fmat B = randu<fmat>(3, 4);
float* B_mem = B.get_dev_mem().cuda_mem_ptr;

See also:

.diag()
.diag( k )

Member function of Mat

Read/write access to a diagonal in a matrix

The argument k is optional; by default the main diagonal is accessed (k = 0)

For k > 0, the k-th super-diagonal is accessed (top-right corner)

For k < 0, the |k|-th sub-diagonal is accessed (bottom-left corner)

The diagonal is interpreted as a column vector within expressions

Note: to calculate only the diagonal elements of a compound expression, use diagvec() or diagmat()

Examples:

fmat X(5, 5);
X.randu();

fvec a = X.diag();
fvec b = X.diag(1);
fvec c = X.diag(-2);

X.diag() = randu<fvec>(5);
X.diag() += 6;
X.diag().ones();
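
Which elements belong to diagonal k can be sketched in plain C++ by listing their flat column-major indices. diag_indices is a hypothetical helper written for illustration and is not part of the Bandicoot API; it assumes the column-major layout used by the Mat class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): flat column-major indices of the
// elements on diagonal k of an n_rows x n_cols matrix.
// k = 0: main diagonal; k > 0: super-diagonal; k < 0: sub-diagonal.
std::vector<std::size_t> diag_indices(std::size_t n_rows, std::size_t n_cols, long k)
{
  std::vector<std::size_t> idx;
  std::size_t r = (k < 0) ? static_cast<std::size_t>(-k) : 0;  // starting row
  std::size_t c = (k > 0) ? static_cast<std::size_t>(k)  : 0;  // starting column
  for (; r < n_rows && c < n_cols; ++r, ++c)
  {
    assert(r + c * n_rows < n_rows * n_cols);
    idx.push_back(r + c * n_rows);
  }
  return idx;
}
```

For a 5x5 matrix this gives 5 elements for k = 0, 4 elements for k = 1, and 3 elements for k = -2.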

See also:

.t()
.st()

Member functions of any matrix or vector expression

.t() and .st() provide transposed copies of the matrix

Examples:

fmat A(4, 5);
A.randu();

fmat B = A.t();

See also:

.eval()

Member function of any matrix or vector expression

Explicitly forces the evaluation of a delayed expression and outputs a matrix

This function should be used sparingly and only in cases where it is absolutely necessary; indiscriminate use can degrade performance

Examples:

fmat A = randu<fmat>(4,4);

A.t().eval().print("A.t()");

See also:
- as_scalar()
- Mat class

.is_empty()

Returns true if the object has no elements

Returns false if the object has one or more elements

Examples:

fmat A(5, 5, fill::randu);
cout << A.is_empty() << endl;

A.reset();
cout << A.is_empty() << endl;

See also:

.is_vec()
.is_colvec()
.is_rowvec()

Member functions of Mat

.is_vec():
- returns true if the matrix can be interpreted as a vector (either column or row vector)
- returns false if the matrix does not have exactly one column or one row

.is_colvec():
- returns true if the matrix can be interpreted as a column vector
- returns false if the matrix does not have exactly one column

.is_rowvec():
- returns true if the matrix can be interpreted as a row vector
- returns false if the matrix does not have exactly one row

Caveat: do not assume that the vector has elements if these functions return true; it is possible to have an empty vector (eg. 0x1)

Examples:

fmat A = randu<fmat>(1, 5);
fmat B = randu<fmat>(5, 1);
fmat C = randu<fmat>(5, 5);

cout << A.is_vec() << endl;
cout << B.is_vec() << endl;
cout << C.is_vec() << endl;
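
The three checks depend only on the dimensions of the matrix. The plain C++ sketch below restates the rules above as predicates on (n_rows, n_cols); the function names are hypothetical and not part of the Bandicoot API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side restatement of the rules above (not Bandicoot code).
bool is_colvec_model(std::size_t n_rows, std::size_t n_cols)
{
  (void)n_rows;         // the row count does not matter here
  return n_cols == 1;   // exactly one column
}

bool is_rowvec_model(std::size_t n_rows, std::size_t n_cols)
{
  (void)n_cols;         // the column count does not matter here
  return n_rows == 1;   // exactly one row
}

bool is_vec_model(std::size_t n_rows, std::size_t n_cols)
{
  return is_colvec_model(n_rows, n_cols) || is_rowvec_model(n_rows, n_cols);
}
```

Under these rules a 0x1 object counts as a vector even though it has no elements, which is the situation the caveat warns about.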

See also:
- .is_empty()
- .is_square()

.is_square()

Member function of Mat

Returns true if the matrix is square, ie. number of rows is equal to the number of columns

Returns false if the matrix is not square

Examples:

fmat A = randu<fmat>(5, 5);
fmat B = randu<fmat>(6, 7);

cout << A.is_square() << endl;
cout << B.is_square() << endl;

See also:

.print()
.print( header )

.print( stream )
.print( stream, header )

Member functions of Mat, Col, and Row

Print the contents of an object to the std::cout stream (default), or a user specified stream, with an optional header string

Objects can also be printed using the << stream operator

Examples:

fmat A = randu<fmat>(5, 5);
fmat B = randu<fmat>(6, 6);

A.print();

// print a transposed version of A
A.t().print();

// "B:" is the optional header line
B.print("B:");

cout << A << endl;

cout << "B:" << endl;
cout << B << endl;

See also:
- .raw_print()
- output streams

.raw_print()
.raw_print( header )

.raw_print( stream )
.raw_print( stream, header )

Member functions of Mat, Col, and Row

Similar to the .print() member function, with the difference that no formatting of the output is done; the stream's parameters such as precision, cell width, etc. can be set manually

If the cell width is set to zero, a space is printed between the elements

Examples:

fmat A = randu<fmat>(5, 5);

cout.precision(11);
cout.setf(ios::fixed);

A.raw_print(cout, "A:");

See also:
- .print()
- std::ios_base::fmtflags (cppreference.com)
- std::ios_base::fmtflags (cplusplus.com)

Generated Vectors / Matrices

linspace( start, end )
linspace( start, end, N )

Generate a vector with N elements; the values of the elements are linearly spaced from start to (and including) end

The argument N is optional; by default N = 100

Usage:
- fvec v = linspace(start, end, N)
- vector_type v = linspace<vector_type>(start, end, N)

Caveat: for N = 1, the generated vector will have a single element equal to end

Examples:

   fvec a = linspace(0, 5, 6);

frowvec b = linspace<frowvec>(5, 0, 6);
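
The spacing rule, including the N = 1 caveat, can be modelled on the CPU in plain C++. This is a sketch of the described behaviour only; linspace_model is a hypothetical name and not part of the Bandicoot API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of linspace() (not Bandicoot code):
// N values from start to end, both included; N defaults to 100.
std::vector<double> linspace_model(double start, double end, std::size_t N = 100)
{
  std::vector<double> v(N);
  if (N == 0) { return v; }
  if (N == 1) { v[0] = end; return v; }  // caveat: single element equals end
  const double step = (end - start) / static_cast<double>(N - 1);
  for (std::size_t i = 0; i < N; ++i)
    v[i] = start + step * static_cast<double>(i);
  v[N - 1] = end;                        // make the last element exactly end
  assert(v.front() == start);
  return v;
}
```

With start = 0, end = 5 and N = 6 the step is 1, so the result is 0, 1, 2, 3, 4, 5; swapping start and end gives the same values in descending order.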

See also:
- ones()

eye( n_rows, n_cols )
eye( size(X) )

Generate a matrix with the elements along the main diagonal set to one and off-diagonal elements set to zero

An identity matrix is generated when n_rows = n_cols

Usage:
- mat X = eye( n_rows, n_cols )
- matrix_type X = eye<matrix_type>( n_rows, n_cols )
- matrix_type Y = eye<matrix_type>( size(X) )

Examples:

  fmat A = eye(5,5);

  fmat B = 123.0 * eye<fmat>(5,5);

  imat C = eye<imat>( size(B) );
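
When n_rows and n_cols differ, the main diagonal has min(n_rows, n_cols) elements and the result is not an identity matrix. The plain C++ sketch below models this on the CPU with column-major storage; eye_model is a hypothetical name and not part of the Bandicoot API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of eye() (not Bandicoot code): ones on the main
// diagonal, zeros elsewhere, stored in column-major order.
std::vector<float> eye_model(std::size_t n_rows, std::size_t n_cols)
{
  std::vector<float> out(n_rows * n_cols, 0.0f);
  const std::size_t n = std::min(n_rows, n_cols);  // length of the main diagonal
  for (std::size_t i = 0; i < n; ++i)
  {
    assert(i + i * n_rows < out.size());
    out[i + i * n_rows] = 1.0f;                    // element (i, i)
  }
  return out;
}
```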

See also:
- .eye() (member function of Mat)
- .diag()
- ones()
- diagmat()
- diagvec()
- size()

ones( n_elem )
ones( n_rows, n_cols )
ones( size(X) )

Generate a vector or matrix with all elements set to one

Usage:
- vector_type v = ones<vector_type>( n_elem )
- matrix_type X = ones<matrix_type>( n_rows, n_cols )
- matrix_type Y = ones<matrix_type>( size(X) )

Examples:

   fvec v = ones(10);
   uvec u = ones<uvec>(10);
frowvec r = ones<frowvec>(10);

fmat A = ones(5,6);
imat B = ones<imat>(5,6);
umat C = ones<umat>(5,6);

See also:
- .ones() (member function of Mat, Col, and Row)
- .fill()
- eye()
- linspace()
- zeros()
- randu()

zeros( n_elem )
zeros( n_rows, n_cols )
zeros( size(X) )

Generate a vector or matrix with the elements set to zero

Usage:
- vector_type v = zeros<vector_type>( n_elem )
- matrix_type X = zeros<matrix_type>( n_rows, n_cols )
- matrix_type Y = zeros<matrix_type>( size(X) )

Examples:

   fvec v = zeros(10);
   uvec u = zeros<uvec>(10);
frowvec r = zeros<frowvec>(10);

fmat A = zeros(5,6);
imat B = zeros<imat>(5,6);
umat C = zeros<umat>(5,6);

See also:
- .zeros() (member function of Mat, Col, and Row)
- .fill()
- ones()
- randu()
- size()

randu( n_elem )
randu( n_rows, n_cols )
randu( size(X) )

Generate a vector or matrix with the elements set to random floating point values uniformly distributed in the [0,1] interval

Usage:
- vector_type v = randu<vector_type>( n_elem )
- matrix_type X = randu<matrix_type>( n_rows, n_cols )

To change the RNG seed, use coot_rng::set_seed(value) or coot_rng::set_seed_random() functions

Caveat: to generate a matrix with random integer values instead of floating point values, use randi() instead

Examples:

fvec v1 = randu(5);

frowvec r1 = randu<frowvec>(5);

fmat A1 = randu(5, 6);

mat B1 = randu<mat>(5, 6);
mat B2 = randu<mat>(5, 6, distr_param(10,20));

coot_rng::set_seed_random(); // set the seed to a random value
coot_rng::set_seed(42);      // set the seed to a specific value

See also:
- .randu() (member function)
- randn()
- randi()
- ones()
- zeros()
- size()
- uniform distribution in Wikipedia

randn( n_elem )
randn( n_elem, distr_param(mu,sd) )

randn( n_rows, n_cols )
randn( n_rows, n_cols, distr_param(mu,sd) )

randn( size(X) )
randn( size(X), distr_param(mu,sd) )

Generate a vector or matrix with the elements set to random values with normal / Gaussian distribution, parameterised by mean mu and standard deviation sd

The default distribution parameters are mu = 0 and sd = 1

Usage:
- vector_type v = randn<vector_type>( n_elem )
- vector_type v = randn<vector_type>( n_elem, distr_param(mu,sd) )
- matrix_type X = randn<matrix_type>( n_rows, n_cols )
- matrix_type X = randn<matrix_type>( n_rows, n_cols, distr_param(mu,sd) )

To change the RNG seed, use coot_rng::set_seed(value) or coot_rng::set_seed_random() functions

Examples:

fvec v1 = randn(5);
fvec v2 = randn(5, distr_param(10,5));

frowvec r1 = randn<frowvec>(5);
frowvec r2 = randn<frowvec>(5, distr_param(10,5));

fmat A1 = randn(5, 6);
fmat A2 = randn(5, 6, distr_param(10,5));

mat B1 = randn<mat>(5, 6);
mat B2 = randn<mat>(5, 6, distr_param(10,5));

coot_rng::set_seed_random(); // set the seed to a random value
coot_rng::set_seed(42);      // set the seed to a specific value

See also:
- .randn() (member function)
- randu()
- randi()
- size()
- normal distribution in Wikipedia

randi( n_elem )
randi( n_elem, distr_param(a,b) )

randi( n_rows, n_cols )
randi( n_rows, n_cols, distr_param(a,b) )

randi( size(X) )
randi( size(X), distr_param(a,b) )

Generate a vector or matrix with the elements set to random integer values uniformly distributed in the [a,b] interval

The default distribution parameters are a = 0 and b = maximum_int

Usage:
- vector_type v = randi<vector_type>( n_elem )
- vector_type v = randi<vector_type>( n_elem, distr_param(a,b) )
- matrix_type X = randi<matrix_type>( n_rows, n_cols )
- matrix_type X = randi<matrix_type>( n_rows, n_cols, distr_param(a,b) )

To change the RNG seed, use the coot_rng::set_seed(value) or coot_rng::set_seed_random() function

Caveat: to generate a matrix with random floating point values (i.e. float or double) rather than integers, use randu()

Examples:

imat A1 = randi(5, 6);
imat A2 = randi(5, 6, distr_param(-10, +20));

fmat B1 = randi<fmat>(5, 6);
fmat B2 = randi<fmat>(5, 6, distr_param(-10, +20));

coot_rng::set_seed_random(); // set the seed to a random value
coot_rng::set_seed(42);      // set the seed to a specific value

See also:
- randu()
- ones()
- zeros()
- size()

Functions of Vectors / Matrices

abs( X )

Obtain the magnitude of each element

For Y = abs(X), X and Y must have the same matrix or vector type, such as fmat or ivec

Examples:

fmat A = randu<fmat>(5, 5);
fmat B = abs(A);

fvec X = linspace<fvec>(-5, 5, 11);
fvec Y = abs(X);

See also:
- pow()
- miscellaneous element-wise functions

accu( X )

Accumulate (sum) all elements of a vector or matrix

Examples:

fmat A = randu<fmat>(5, 6);
fmat B = randu<fmat>(5, 6);

float x = accu(A);

float y = accu(A % B);

See also:
- sum()
- trace()
- mean()
- dot()
- as_scalar()

all( V )
all( X )
all( X, dim )

For vector V, return true if all elements of the vector are non-zero or satisfy a relational condition

For matrix X and
- dim = 0, return a row vector (of type urowvec or umat), with each element (0 or 1) indicating whether the corresponding column of X has all non-zero elements
- dim = 1, return a column vector (of type ucolvec or umat), with each element (0 or 1) indicating whether the corresponding row of X has all non-zero elements

The dim argument is optional; by default dim = 0 is used

Relational operators can be used instead of V or X, eg. A > 0.5

Examples:

fvec V = randu<fvec>(10);
fmat X = randu<fmat>(5, 5);

// status1 will be set to true if vector V has all non-zero elements
bool status1 = all(V);

// status2 will be set to true if vector V has all elements greater than 0.5
bool status2 = all(V > 0.5);

// status3 will be set to true if matrix X has all elements greater than 0.6;
// note the use of vectorise()
bool status3 = all(vectorise(X) > 0.6);

// generate a row vector indicating which columns of X have all elements greater than 0.7
umat A = all(X > 0.7);

See also:
- any()
- find()
- conv_to() (convert between matrix/vector types)
- vectorise()

any( V )
any( X )
any( X, dim )

For vector V, return true if any element of the vector is non-zero or satisfies a relational condition

For matrix X and
- dim = 0, return a row vector (of type urowvec or umat), with each element (0 or 1) indicating whether the corresponding column of X has any non-zero elements
- dim = 1, return a column vector (of type ucolvec or umat), with each element (0 or 1) indicating whether the corresponding row of X has any non-zero elements

The dim argument is optional; by default dim = 0 is used

Relational operators can be used instead of V or X, eg. A > 0.9

Examples:

fvec V = randu<fvec>(10);
fmat X = randu<fmat>(5, 5);

// status1 will be set to true if vector V has any non-zero elements
bool status1 = any(V);

// status2 will be set to true if vector V has any elements greater than 0.5
bool status2 = any(V > 0.5);

// status3 will be set to true if matrix X has any elements greater than 0.6;
// note the use of vectorise()
bool status3 = any(vectorise(X) > 0.6);

// generate a row vector indicating which columns of X have elements greater than 0.7
umat A = any(X > 0.7);

See also:
- all()
- find()
- conv_to() (convert between matrix/vector types)
- vectorise()

as_scalar( expression )

Evaluate an expression that results in a 1x1 matrix, followed by converting the 1x1 matrix to a pure scalar

Optimised expression evaluations are automatically used when a binary or trinary expression is given (ie. 2 or 3 terms)

Examples:

frowvec r = randu<frowvec>(5);
fcolvec q = randu<fcolvec>(5);

fmat X(5, 5, fill::randu);

// examples of expressions which have optimised implementations

float a = as_scalar(r*q);
float b = as_scalar(r*X*q);
float c = as_scalar(r*diagmat(X)*q);
float d = as_scalar(r*inv(diagmat(X))*q);

See also:
- vectorise()
- accu()
- trace()
- dot()
- norm()
- conv_to()

clamp( X, min_val, max_val )

Create a copy of X with each element clamped to the [min_val, max_val] interval;
any value lower than min_val will be set to min_val, and any value higher than max_val will be set to max_val

Examples:

fmat A = randu<fmat>(5, 5);

fmat B = clamp(A, 0.2,         0.8);

fmat C = clamp(A, min(min(A)), 0.8);

fmat D = clamp(A, 0.2, max(max(A)));
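
The clamping rule described above can be checked on the CPU with a few lines of plain C++. This is only a sketch of the semantics, not Bandicoot's GPU implementation; the helper name clamp_reference and the use of std::vector are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not Bandicoot code): clamped copy of x
std::vector<double> clamp_reference(const std::vector<double>& x, double min_val, double max_val)
{
  assert(min_val <= max_val);  // the interval must be well-formed

  std::vector<double> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    // values below min_val become min_val; values above max_val become max_val
    out[i] = std::min(std::max(x[i], min_val), max_val);
  }
  return out;
}
```

For instance, clamping the values 0.1, 0.5, 0.9 to the [0.2, 0.8] interval leaves 0.5 untouched and moves the other two values to the interval limits.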

See also:
- .clamp() (member function)
- find()

conv_to< type >::from( X )

Convert (cast) from one matrix type to another (eg. fmat to imat)

Conversion between Armadillo and Bandicoot matrices/vectors is also possible

Conversion of an fmat object into fcolvec or frowvec is possible if the object can be interpreted as a vector

When conv_to is applied to an expression, the conversion operation will be fused with the expression computation when possible

Examples:

fmat A = randu<fmat>(5, 5);
 mat B = conv_to<mat>::from(A);

fmat C = randu<fmat>(10, 1);
fcolvec x = conv_to< fcolvec >::from(C);

// convert to Armadillo object
arma::fmat D = conv_to<arma::fmat>::from(A);

cross( A, B )

Calculate the cross product between A and B, under the assumption that A and B are 3-dimensional vectors

Examples:

fvec a = randu<fvec>(3);
fvec b = randu<fvec>(3);

fvec c = cross(a, b);

val = det( A )		(form 1)
det( val, A )		(form 2)

Calculate the determinant of square matrix A, based on LU decomposition

form 1: return the determinant

form 2: store the calculated determinant in val and return a bool indicating success

If A is not square sized, a std::logic_error exception is thrown

If the calculation fails:
- val = det(A) throws a std::runtime_error exception
- det(val,A) returns a bool set to false (exception is not thrown)

Examples:

fmat A = randu<fmat>(5, 5);

float val1 = det(A);         // form 1

float val2;
bool success = det(val2, A); // form 2

See also:
- determinant in MathWorld
- determinant in Wikipedia

diagmat( V )
diagmat( V, k )

diagmat( X )
diagmat( X, k )

Generate a diagonal matrix from vector V or matrix X

Given vector V, generate a square matrix with the k-th diagonal containing a copy of the vector; all other elements are set to zero

Given matrix X, generate a matrix with the k-th diagonal containing a copy of the k-th diagonal of X; all other elements are set to zero

If X is an expression, the evaluation of the expression aims to calculate only the diagonal elements

The argument k is optional; by default the main diagonal is used (k = 0)

For k > 0, the k-th super-diagonal is used (above main diagonal, towards top-right corner)

For k < 0, the k-th sub-diagonal is used (below main diagonal, towards bottom-left corner)

Examples:

fmat A = randu<fmat>(5, 5);
fmat B = diagmat(A);
fmat C = diagmat(A,1);

fvec v = randu<fvec>(5);
fmat D = diagmat(v);
fmat E = diagmat(v,1);
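
To make the role of k concrete, the following plain C++ sketch places a vector on the k-th diagonal of a zero matrix. It is a CPU reference only, not Bandicoot code: the DenseMatrix struct and diagmat_reference are hypothetical names, and the output size of (n + |k|) × (n + |k|) for a vector with n elements is an assumption that the text above does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix used only for this sketch
struct DenseMatrix
{
  std::size_t n_rows;
  std::size_t n_cols;
  std::vector<double> mem;  // element (r, c) is stored at mem[r + c * n_rows]

  double at(std::size_t r, std::size_t c) const
  {
    assert(r < n_rows && c < n_cols);  // subscripts must be within bounds
    return mem[r + c * n_rows];
  }
};

// CPU reference for diagmat(V, k): copy v onto the k-th diagonal of a zero matrix
DenseMatrix diagmat_reference(const std::vector<double>& v, long k)
{
  const std::size_t shift = static_cast<std::size_t>(k < 0 ? -k : k);
  const std::size_t n = v.size() + shift;  // assumed output size: n x n

  DenseMatrix out{n, n, std::vector<double>(n * n, 0.0)};

  const std::size_t row_shift = (k < 0) ? shift : 0;  // sub-diagonal: shift downwards
  const std::size_t col_shift = (k > 0) ? shift : 0;  // super-diagonal: shift to the right

  for (std::size_t i = 0; i < v.size(); ++i)
  {
    out.mem[(i + row_shift) + (i + col_shift) * n] = v[i];
  }
  return out;
}
```

With k = 1 the first element of the vector lands at position (0, 1), and with k = -1 it lands at position (1, 0).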

diagvec( X )
diagvec( X, k )

Extract the k-th diagonal from matrix X

If X is an expression, the evaluation of the expression aims to calculate only the diagonal elements

The argument k is optional; by default the main diagonal is extracted (k = 0)

For k > 0, the k-th super-diagonal is extracted (above main diagonal, towards top-right corner)

For k < 0, the k-th sub-diagonal is extracted (below main diagonal, towards bottom-left corner)

The extracted diagonal is interpreted as a column vector

Examples:

fmat A = randu<fmat>(5, 5);

fvec d = diagvec(A);

See also:
- .diag()
- diagmat()
- trace()
- vectorise()

dot( A, B )

Dot product of A and B, treating A and B as vectors

Caveat: to calculate the norm of a vector, use norm() rather than a dot() based expression; norm() is more robust, as it handles underflows and overflows

Examples:

fvec a = randu<fvec>(10);
fvec b = randu<fvec>(10);

float x = dot(a,b);

find( X )
find( X, k )
find( X, k, s )

Return a column vector containing the indices of elements of X that are non-zero or satisfy a relational condition

The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

X is interpreted as a vector, with column-by-column ordering of the elements of X

Relational operators can be used instead of X, eg. A > 0.5

If k = 0 (default), return the indices of all non-zero elements, otherwise return at most k of their indices

If s = "first" (default), return at most the first k indices of the non-zero elements

If s = "last", return at most the last k indices of the non-zero elements

Caveats:
- to clamp values to an interval, clamp() is more efficient

Examples:

fmat A = randu<fmat>(5, 5);
fmat B = randu<fmat>(5, 5);

uvec q1 = find(A > B);
uvec q2 = find(A > 0.5);
uvec q3 = find(A > 0.5, 3, "last");

// change elements of A greater than 0.5 to 1
A.elem( find(A > 0.5) ).ones();

find_finite( X )

Return a column vector containing the indices of elements of X that are finite (i.e. not ±Inf and not NaN)

The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

X is interpreted as a vector, with column-by-column ordering of the elements of X

Examples:

fmat A = randu<fmat>(5, 5);

A(1, 1) = datum::inf;

// find only finite elements
uvec f = find_finite(A);

find_nonfinite( X )

Return a column vector containing the indices of elements of X that are non-finite (i.e. ±Inf or NaN)

The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

X is interpreted as a vector, with column-by-column ordering of the elements of X

Examples:

fmat A = randu<fmat>(5, 5);

A(1, 1) = datum::inf;
A(2, 2) = datum::nan;

// return indices of two non-finite elements
uvec f = find_nonfinite(A);

find_nan( X )

Return a column vector containing the indices of elements of X that are NaN (not-a-number)

The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

X is interpreted as a vector, with column-by-column ordering of the elements of X

Examples:

fmat A = randu<fmat>(5, 5);

A(2, 3) = datum::nan;

// indices will be { 17 }
uvec indices = find_nan(A);
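
The index in the example above follows from the column-by-column ordering: element (row, col) of a matrix with n_rows rows sits at position row + col × n_rows. A minimal plain C++ sketch of that arithmetic, using a hypothetical helper named linear_index that is not part of Bandicoot:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Bandicoot function): position of element (row, col)
// when a matrix with n_rows rows is read column by column
std::size_t linear_index(std::size_t row, std::size_t col, std::size_t n_rows)
{
  assert(row < n_rows);  // the row subscript must lie inside the matrix
  return row + col * n_rows;
}
```

For the 5x5 matrix above, linear_index(2, 3, 5) evaluates to 2 + 3 × 5 = 17.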

See also:
- find()
- find_finite()
- find_nonfinite()
- constants (pi, nan, inf, ...)
- NaN in Wikipedia

join_rows( A, B )
join_rows( A, B, C )
join_rows( A, B, C, D )

join_cols( A, B )
join_cols( A, B, C )
join_cols( A, B, C, D )

join_horiz( A, B )
join_horiz( A, B, C )
join_horiz( A, B, C, D )

join_vert( A, B )
join_vert( A, B, C )
join_vert( A, B, C, D )

join_rows() and join_horiz(): horizontal concatenation; join the corresponding rows of the given matrices; the given matrices must have the same number of rows

join_cols() and join_vert(): vertical concatenation; join the corresponding columns of the given matrices; the given matrices must have the same number of columns

Examples:

fmat A = randu<fmat>(4, 5);
fmat B = randu<fmat>(4, 6);
fmat C = randu<fmat>(6, 5);

fmat AB = join_rows(A, B);
fmat AC = join_cols(A, C);

See also:
- submatrix views

min( V )
min( M )
min( M, dim )
min( Q )
min( Q, dim )
min( A, B )

max( V )
max( M )
max( M, dim )
max( Q )
max( Q, dim )
max( A, B )

For vector V, return the extremum value

For matrix M, return the extremum value for each column (dim = 0), or each row (dim = 1)

The dim argument is optional; by default dim = 0 is used

For two matrices A and B, return a matrix containing element-wise extremum values

Examples:

fcolvec v = randu<fcolvec>(10);
float x = max(v);

fmat M = randu<fmat>(10, 10);

frowvec a = max(M);
frowvec b = max(M, 0);
fcolvec c = max(M, 1);

// element-wise maximum
fmat X = randu<fmat>(5, 6);
fmat Y = randu<fmat>(5, 6);
fmat Z = coot::max(X, Y); // use coot:: prefix to distinguish from std::max()

See also:
- clamp()
- statistics functions

norm( X )
norm( X, p )

Compute the p-norm of X, where X is a vector or matrix

For vectors, p is an integer ≥ 1, or one of: "-inf", "inf", "fro"

For matrices, p is one of: 1, 2, "inf", "fro"

"-inf" is the minimum quasi-norm, "inf" is the maximum norm, "fro" is the Frobenius norm

The argument p is optional; by default p = 2 is used

For vector norm with p = 2 and matrix norm with p = "fro", a robust algorithm is used to reduce the likelihood of underflows and overflows

Caveats:
- to obtain the zero/Hamming pseudo-norm (the number of non-zero elements), use this expression: accu(X != 0)
- matrix 2-norm (spectral norm) is based on SVD, which is computationally intensive for large matrices

Examples:

fvec q = randu<fvec>(5);

float x = norm(q, 2);
float y = norm(q, "inf");

normalise( V )
normalise( V, p )

normalise( X )
normalise( X, p )
normalise( X, p, dim )

For vector V, return its normalised version (ie. having unit p-norm)

For matrix X, return its normalised version, where each column (dim = 0) or row (dim = 1) has been normalised to have unit p-norm

The p argument is optional; by default p = 2 is used

The dim argument is optional; by default dim = 0 is used

Examples:

fvec A = randu<fvec>(10);
fvec B = normalise(A);
fvec C = normalise(A, 1);

fmat X = randu<fmat>(5, 6);
fmat Y = normalise(X);
fmat Z = normalise(X, 2, 1);
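
As an illustration of "unit p-norm", the sketch below normalises a vector on the CPU in plain C++. It is not Bandicoot code: normalise_reference is a hypothetical name, p is restricted to integers for simplicity, and leaving an all-zero vector unchanged is an assumption that the text above does not cover.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not Bandicoot code): scale v so that its p-norm equals 1
std::vector<double> normalise_reference(const std::vector<double>& v, int p = 2)
{
  assert(p >= 1);  // only integer p >= 1 is handled in this sketch

  // p-norm: (sum of |x|^p) raised to the power 1/p
  double acc = 0.0;
  for (double x : v) { acc += std::pow(std::abs(x), p); }
  const double norm_val = std::pow(acc, 1.0 / p);

  std::vector<double> out(v);
  if (norm_val > 0.0)  // assumption: an all-zero vector is returned unchanged
  {
    for (double& x : out) { x /= norm_val; }
  }
  return out;
}
```

For the vector (3, 4), the 2-norm is 5, so the normalised result is (0.6, 0.8); with p = 1 the result is (3/7, 4/7).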

pow( A, scalar )

Element-wise power operation: raise all elements in A to the power denoted by the given scalar

Caveat:
- to raise all elements to the power 2, use square() instead

Examples:

fmat A = randu<fmat>(5, 6);
fmat B = pow(A, 3.45);

frowvec R = randu<frowvec>(6);
frowvec S = pow(R, -1.0);

See also:
- abs()
- miscellaneous element-wise functions

repmat( A, num_copies_per_row, num_copies_per_col )

Generate a matrix by replicating matrix A in a block-like fashion

The generated matrix has the following size:
- n_rows = num_copies_per_row × A.n_rows
- n_cols = num_copies_per_col × A.n_cols

Examples:

fmat A = randu<fmat>(2, 3);

fmat B = repmat(A, 4, 5);

See also:
- reshape()
- resize()

reshape( X, n_rows, n_cols )
reshape( X, size(Y) )

Generate a vector/matrix with given size specifications, whose elements are taken from the given object in a column-wise manner; the elements in the generated object are placed column-wise (i.e. the first column is filled up before filling the second column)

The layout of the elements in the generated object will be different to the layout in the given object

If the total number of elements in the given object is less than the specified size, the remaining elements in the generated object are set to zero

If the total number of elements in the given object is greater than the specified size, only a subset of elements is taken from the given object

Caveats:
- to change the size without preserving data, use .set_size() instead, which is much faster
- to grow/shrink a matrix while preserving the elements as well as the layout of the elements, use resize() instead
- to flatten a matrix into a vector, use vectorise() instead

Examples:

fmat A = randu<fmat>(10, 5);

fmat B = reshape(A, 5, 10);
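
Because both the given and the generated object are filled column by column, reshaping amounts to copying the elements in their stored order, padding with zeros or truncating as needed. A plain C++ sketch of this behaviour follows; it is a CPU reference under the assumption of column-major storage, and reshape_reference is a hypothetical name rather than a Bandicoot function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// x_mem holds an x_n_rows x x_n_cols matrix column by column;
// the result holds a new_n_rows x new_n_cols matrix in the same way
std::vector<double> reshape_reference(const std::vector<double>& x_mem,
                                      std::size_t x_n_rows, std::size_t x_n_cols,
                                      std::size_t new_n_rows, std::size_t new_n_cols)
{
  assert(x_mem.size() == x_n_rows * x_n_cols);  // storage must match the stated size

  // elements beyond those available in x_mem stay at zero
  std::vector<double> out(new_n_rows * new_n_cols, 0.0);

  // copy as many elements as both objects can hold, in column-by-column order
  const std::size_t n_copy = std::min(x_mem.size(), out.size());
  std::copy_n(x_mem.begin(), n_copy, out.begin());

  return out;
}
```

Reshaping a 2x3 matrix into 3x2 keeps all six stored values in the same order, while reshaping it into 2x4 appends two zeros and reshaping it into 2x2 keeps only the first four values.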

See also:
- .reshape() (member function)
- .set_size()
- resize()
- vectorise()
- as_scalar()
- conv_to()
- diagmat()
- repmat()
- size()

resize( X, n_rows, n_cols )
resize( X, size(Y) )

Generate a vector/matrix with given size specifications, whose elements as well as the layout of the elements are taken from the given object

Caveat: to change the size without preserving data, use .set_size() instead, which is much faster

Examples:

fmat A = randu<fmat>(4, 5);

fmat B = resize(A, 7, 6);
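
In contrast to reshape(), preserving the layout means that the element at position (r, c) of the given object is found at position (r, c) of the generated object. The plain C++ sketch below illustrates this on the CPU; resize_reference is a hypothetical name, column-major storage is assumed, and zero-filling of newly created elements is an assumption that the text above does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// x_mem holds an x_n_rows x x_n_cols matrix column by column
std::vector<double> resize_reference(const std::vector<double>& x_mem,
                                     std::size_t x_n_rows, std::size_t x_n_cols,
                                     std::size_t new_n_rows, std::size_t new_n_cols)
{
  assert(x_mem.size() == x_n_rows * x_n_cols);  // storage must match the stated size

  // assumption: elements that have no counterpart in the given object are set to zero
  std::vector<double> out(new_n_rows * new_n_cols, 0.0);

  // only the overlapping top-left block is copied
  const std::size_t rows = std::min(x_n_rows, new_n_rows);
  const std::size_t cols = std::min(x_n_cols, new_n_cols);

  for (std::size_t c = 0; c < cols; ++c)
  {
    for (std::size_t r = 0; r < rows; ++r)
    {
      out[r + c * new_n_rows] = x_mem[r + c * x_n_rows];  // (r, c) stays at (r, c)
    }
  }
  return out;
}
```

Growing a 2x2 matrix to 3x3 keeps the original values in the top-left 2x2 block; shrinking it to 1x2 keeps only the first row.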

See also:
- .resize() (member function of Mat)
- .set_size() (member function of Mat)
- reshape()
- vectorise()
- as_scalar()
- conv_to()
- repmat()
- size()

size( X )
size( n_rows, n_cols )

Obtain the dimensions of object X, or explicitly specify the dimensions

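The dimensions can be used in conjunction with object constructors, generator functions (eg. zeros(), ones()), member functions that set the size (eg. .randu()), and submatrix views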

The dimensions support simple arithmetic operations; they can also be printed and compared for equality/inequality

Caveat: to prevent interference from std::size() in C++17, preface Bandicoot's size() with the coot namespace qualification, eg. coot::size(X)

Examples:

fmat A(5,6);

fmat B = zeros<fmat>(size(A));

fmat C;
C.randu(size(A));

fmat D = ones<fmat>(size(A));

fmat E = ones<fmat>(10, 20);
E(3, 4, size(C)) = C;    // access submatrix of E

fmat F( size(A) + size(E) );

fmat G( size(A) * 2 );

cout << "size of A: " << size(A) << endl;

bool is_same_size = (size(A) == size(E));

See also:
- attributes

sort( V )
sort( V, sort_direction )

sort( X )
sort( X, sort_direction )
sort( X, sort_direction, dim )

For vector V, return a vector which is a sorted version of the input vector

For matrix X, return a matrix with the elements of the input matrix sorted in each column (dim = 0), or each row (dim = 1)

The dim argument is optional; by default dim = 0 is used

The sort_direction argument is optional; sort_direction is either "ascend" or "descend"; by default "ascend" is used

The sorting algorithm used is radix sort

Examples:

fmat A = randu<fmat>(10, 10);
fmat B = sort(A);

See also:
- sort_index()
- randi()

sort_index( X )
sort_index( X, sort_direction )

stable_sort_index( X )
stable_sort_index( X, sort_direction )

Return a vector which describes the sorted order of the elements of X (i.e. it contains the indices of the elements of X)

The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

X is interpreted as a vector, with column-by-column ordering of the elements of X

The sort_direction argument is optional; sort_direction is either "ascend" or "descend"; by default "ascend" is used

The stable_sort_index() variant preserves the relative order of elements with equivalent values

The sorting algorithm used is radix sort

Examples:

fvec q = randu<fvec>(10);

uvec indices = sort_index(q);
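
The meaning of the returned indices, and of stability, can be shown with a CPU sketch in plain C++. Note that this sketch uses std::stable_sort rather than the radix sort mentioned above, so it only mirrors the result, not the algorithm; stable_sort_index_reference is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical CPU reference (not Bandicoot code): indices that put x into sorted order
std::vector<std::size_t> stable_sort_index_reference(const std::vector<double>& x,
                                                     const std::string& sort_direction = "ascend")
{
  assert(sort_direction == "ascend" || sort_direction == "descend");

  // start with the identity ordering 0, 1, 2, ...
  std::vector<std::size_t> indices(x.size());
  std::iota(indices.begin(), indices.end(), std::size_t(0));

  // sort the indices by the values they refer to; equal values keep their relative order
  if (sort_direction == "ascend")
  {
    std::stable_sort(indices.begin(), indices.end(),
                     [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });
  }
  else
  {
    std::stable_sort(indices.begin(), indices.end(),
                     [&x](std::size_t a, std::size_t b) { return x[a] > x[b]; });
  }
  return indices;
}
```

For the values (3, 1, 2, 1), the ascending result is (1, 3, 2, 0): the two elements equal to 1 appear in their original order, which is what the stable variant guarantees.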

See also:
- sort()
- find()

sum( V )
sum( M )
sum( M, dim )

For vector V, return the sum of all elements

For matrix M, return the sum of elements in each column (dim = 0), or each row (dim = 1)

The dim argument is optional; by default dim = 0 is used

Caveat: to get a sum of all the elements regardless of the object type (i.e. vector or matrix), use accu() instead

Examples:

fcolvec v = randu<fcolvec>(10);
float x = sum(v);

fmat M = randu<fmat>(10, 10);

frowvec a = sum(M);
frowvec b = sum(M, 0);
fcolvec c = sum(M, 1);

float y = accu(M);   // find the overall sum regardless of object type

See also:
- accu()
- trace()
- mean()
- as_scalar()

symmatu( A )
symmatl( A )

symmatu(A): generate symmetric matrix from square matrix A, by reflecting the upper triangle to the lower triangle

symmatl(A): generate symmetric matrix from square matrix A, by reflecting the lower triangle to the upper triangle

If A is non-square, a std::logic_error exception is thrown

Examples:

fmat A = randu<fmat>(5, 5);

fmat B = symmatu(A);
fmat C = symmatl(A);
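
The two reflections can be written out explicitly on the CPU. The plain C++ sketch below assumes column-major storage of an n × n matrix and uses the hypothetical names symmatu_reference and symmatl_reference; it illustrates the semantics only and is not Bandicoot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// a_mem holds an n x n matrix column by column: element (r, c) is a_mem[r + c * n]

// reflect the upper triangle to the lower triangle
std::vector<double> symmatu_reference(const std::vector<double>& a_mem, std::size_t n)
{
  assert(a_mem.size() == n * n);  // A must be square

  std::vector<double> out(a_mem);
  for (std::size_t c = 0; c < n; ++c)
  {
    for (std::size_t r = c + 1; r < n; ++r)
    {
      out[r + c * n] = a_mem[c + r * n];  // lower element (r, c) takes upper element (c, r)
    }
  }
  return out;
}

// reflect the lower triangle to the upper triangle
std::vector<double> symmatl_reference(const std::vector<double>& a_mem, std::size_t n)
{
  assert(a_mem.size() == n * n);  // A must be square

  std::vector<double> out(a_mem);
  for (std::size_t c = 0; c < n; ++c)
  {
    for (std::size_t r = c + 1; r < n; ++r)
    {
      out[c + r * n] = a_mem[r + c * n];  // upper element (c, r) takes lower element (r, c)
    }
  }
  return out;
}
```

For the 2x2 matrix with rows (1, 2) and (3, 4), the first function yields rows (1, 2) and (2, 4), while the second yields rows (1, 3) and (3, 4).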

See also:
- diagmat()
- Symmetric matrix in Wikipedia

trace( X )

Sum of the elements on the main diagonal of matrix X

If X is an expression, the evaluation of the expression aims to calculate only the diagonal elements

Examples:

fmat A = randu<fmat>(5, 5);

float x = trace(A);

See also:
- accu()
- as_scalar()
- .diag()
- diagvec()
- diagmat()
- sum()

trans( A )
strans( A )

Compute a transposed copy of the matrix

Examples:

fmat A = randu<fmat>(5, 10);

fmat B = trans(A);
fmat C = A.t();    // equivalent to trans(A), but more compact

vectorise( X )
vectorise( X, dim )

Generate a flattened version of matrix X

The argument dim is optional; by default dim = 0 is used

For dim = 0, the elements are copied from X column-wise, resulting in a column vector; equivalent to concatenating all the columns of X

For dim = 1, the elements are copied from X row-wise, resulting in a row vector; equivalent to concatenating all the rows of X

Caveat: column-wise vectorisation is faster than row-wise vectorisation

Examples:

fmat X = randu<fmat>(4, 5);

fvec v = vectorise(X);

miscellaneous element-wise functions:

exp	log	square	floor	erf	sign
exp2	log2	sqrt	ceil	erfc	lgamma
exp10	log10		round
trunc_exp	trunc_log		trunc

Apply a function to each element

Usage:

B = fn(A), where fn is one of the functions below
A and B must have the same matrix or vector type, such as fmat or ivec

exp(A)		base-e exponential: e^x
exp2(A)		base-2 exponential: 2^x
exp10(A)		base-10 exponential: 10^x
trunc_exp(A)		base-e exponential, truncated to avoid infinity (only for float and double elements)
log(A)		natural log: logₑ x
log2(A)		base-2 log: log₂ x
log10(A)		base-10 log: log₁₀ x
trunc_log(A)		natural log, truncated to avoid ±infinity (only for float and double elements)
square(A)		square: x²
sqrt(A)		square root: √x
floor(A)		largest integral value that is not greater than the input value
ceil(A)		smallest integral value that is not less than the input value
round(A)		round to nearest integer, with halfway cases rounded away from zero
trunc(A)		round to nearest integer, towards zero
erf(A)		error function (only for float and double elements)
erfc(A)		complementary error function (only for float and double elements)
lgamma(A)		natural log of the absolute value of gamma function (only for float and double elements)

sign(A)

signum function; for each element a in A, the corresponding element b in B is:

	⎧	−1	if a < 0
b =	⎨	0	if a = 0
	⎩	+1	if a > 0

if a is complex and non-zero, then b = a / abs(a)

Caveat: all of the above functions are applied element-wise, where each element is treated independently

Examples:

fmat A = randu<fmat>(5, 5);
fmat B = exp(A);

trigonometric element-wise functions (cos, sin, tan, ...)

For single argument functions, B = trig_fn(A), where trig_fn is applied to each element in A, with trig_fn as one of:
- cos, acos, cosh, acosh
- sin, asin, sinh, asinh
- tan, atan, tanh, atanh
- sinc, defined as sinc(x) = sin(πx) / (πx) for x ≠ 0, and sinc(x) = 1 for x = 0

For dual argument functions, apply the function to each tuple of two corresponding elements in X and Y:
- Z = atan2(Y, X)
- Z = hypot(X, Y)

Examples:

fmat X = randu<fmat>(5, 5);
fmat Y = cos(X);
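
The sinc definition given above has a special case at x = 0, which a scalar sketch makes explicit. The function below is plain C++ for a single value, with the hypothetical name sinc_reference; it shows the formula only and does not reflect how Bandicoot evaluates sinc on the GPU.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar reference (not Bandicoot code) for the sinc definition above
double sinc_reference(double x)
{
  assert(std::isfinite(x));  // this sketch handles finite inputs only

  if (x == 0.0) { return 1.0; }  // special case: sinc(0) = 1

  const double pi = std::acos(-1.0);
  return std::sin(pi * x) / (pi * x);
}
```

Under this definition the function is zero (up to rounding) at every non-zero integer, and equals 2/π at x = 0.5.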

Decompositions, Factorisations, and Inverses

R = chol( X )		(form 1)
chol( R, X )		(form 2)

Cholesky decomposition of symmetric/hermitian matrix X into triangular matrix R

By default, R is upper triangular

X is required to be positive definite

The decomposition has the form X = R.t() * R

If the decomposition fails:
- the form R = chol(X) resets R and throws a std::runtime_error exception
- the form chol(R, X) resets R and returns a bool set to false (exception is not thrown)

Caveat: there is no explicit check that X is symmetric or positive definite

Examples:

fmat A = randu<fmat>(5, 5);
fmat X = A.t() * A;

fmat R1 = chol(X);
fmat R2;
bool ok = chol(R2, X);

vec eigval = eig_sym( X )

eig_sym( eigval, X )

eig_sym( eigval, eigvec, X )

Eigen decomposition of symmetric/hermitian matrix X

The eigenvalues and corresponding eigenvectors are stored in eigval and eigvec, respectively

The eigenvalues are in ascending order

The eigenvectors are stored as column vectors

If X is not square sized, a std::logic_error exception is thrown

If the decomposition fails:
- eigval = eig_sym(X) resets eigval and throws a std::runtime_error exception
- eig_sym(eigval,X) resets eigval and returns a bool set to false (exception is not thrown)
- eig_sym(eigval,eigvec,X) resets eigval & eigvec and returns a bool set to false (exception is not thrown)

Caveats:
- there is no explicit check whether X is symmetric/hermitian
- if eigenvectors are not necessary, it is more efficient to use a form that does not compute them (i.e. eig_sym(eigval, X))

Examples:

// for matrices with real elements

fmat A = randu<fmat>(50, 50);
fmat B = A.t()*A;  // generate a symmetric matrix

fvec eigval;
fmat eigvec;

eig_sym(eigval, eigvec, B);

lu( L, U, P, X )
lu( L, U, X )

Lower-upper decomposition (with partial pivoting) of matrix X

The first form provides a lower-triangular matrix L, an upper-triangular matrix U, and a permutation matrix P, such that P.t()*L*U = X

The second form provides permuted L and U, such that L*U = X; note that in this case L is generally not lower-triangular

If the decomposition fails:
- lu(L,U,P,X) resets L, U, P and returns a bool set to false (exception is not thrown)
- lu(L,U,X) resets L, U and returns a bool set to false (exception is not thrown)

Examples:

fmat A = randu<fmat>(5, 5);

fmat L, U, P;

lu(L, U, P, A);

fmat B = P.t() * L * U;

B = pinv( A )
B = pinv( A, tolerance )

pinv( B, A )
pinv( B, A, tolerance )

Moore-Penrose pseudo-inverse (generalised inverse) of matrix A

The computation is based on singular value decomposition

The tolerance argument is optional

The default tolerance is set to max_rc · max_sv · epsilon, where:
- max_rc = max(A.n_rows, A.n_cols)
- max_sv = maximum singular value of A
- epsilon = difference between 1 and the least value greater than 1 that is representable

Any singular values less than tolerance are treated as zero

If the decomposition fails:
- B = pinv(A) resets B and throws a std::runtime_error exception
- pinv(B,A) resets B and returns a bool set to false (exception is not thrown)

Examples:

fmat A = randu<fmat>(4, 5);

fmat B = pinv(A);        // use default tolerance

fmat C = pinv(A, 0.01);  // set tolerance to 0.01
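
The default tolerance formula above can be evaluated by hand once the singular values are known. The plain C++ sketch below does this for float elements; default_pinv_tolerance and count_retained are hypothetical helper names, and the singular values are assumed to be supplied by the caller (eg. obtained separately via svd()).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a Bandicoot function): max_rc * max_sv * epsilon
float default_pinv_tolerance(std::size_t n_rows, std::size_t n_cols,
                             const std::vector<float>& singular_values)
{
  assert(!singular_values.empty());

  const float max_rc  = static_cast<float>(std::max(n_rows, n_cols));
  const float max_sv  = *std::max_element(singular_values.begin(), singular_values.end());
  const float epsilon = std::numeric_limits<float>::epsilon();

  return max_rc * max_sv * epsilon;
}

// Number of singular values that are not treated as zero for the given tolerance
std::size_t count_retained(const std::vector<float>& singular_values, float tolerance)
{
  return static_cast<std::size_t>(
    std::count_if(singular_values.begin(), singular_values.end(),
                  [tolerance](float sv) { return sv >= tolerance; }));
}
```

For a 4x5 matrix whose largest singular value is 2, the default tolerance is 5 × 2 × epsilon; a singular value of 1e-9 falls below it and is therefore treated as zero.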

s = svd( X )

svd( s, X )

svd( U, s, V, X )

Singular value decomposition of matrix X into vector of singular values s and matrices of left/right singular vectors U, V

If X is square, it can be reconstructed using X = U*diagmat(s)*V.t()

The singular values are in descending order

If the decomposition fails:
- s = svd(X) resets s and throws a std::runtime_error exception
- svd(s,X) resets s and returns a bool set to false (exception is not thrown)
- svd(U,s,V,X) resets U, s, V and returns a bool set to false (exception is not thrown)

Examples:

fmat X = randu<fmat>(5, 5);

fmat U;
fvec s;
fmat V;

svd(U, s, V, X);

Signal & Image Processing

conv( A, B )
conv( A, B, shape )

1D convolution of vectors A and B

The orientation of the result vector is the same as the orientation of A (ie. either column or row vector)

The shape argument is optional; it is one of:

`"full"`	=	return the full convolution (default setting), with the size equal to A.n_elem + B.n_elem - 1
`"same"`	=	return the central part of the convolution, with the same size as vector A

The convolution operation is also equivalent to FIR filtering

Examples:

fvec A = randu<fvec>(256);

fvec B = randu<fvec>(16);

fvec C = conv(A, B);

fvec D = conv(A, B, "same");
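
The relation between the two shape settings can be illustrated with a direct CPU convolution in plain C++. This is a sketch of the semantics rather than Bandicoot's implementation; conv_full and conv_same are hypothetical names, and the starting offset of floor(B.n_elem / 2) used for the central part is an assumption (it matches the Matlab-style convention), as the text above does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full 1D convolution: the result has a.size() + b.size() - 1 elements
std::vector<double> conv_full(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(!a.empty() && !b.empty());

  std::vector<double> out(a.size() + b.size() - 1, 0.0);
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    for (std::size_t j = 0; j < b.size(); ++j)
    {
      out[i + j] += a[i] * b[j];  // each pair of elements contributes to output i + j
    }
  }
  return out;
}

// Central part of the full convolution, with the same number of elements as a
std::vector<double> conv_same(const std::vector<double>& a, const std::vector<double>& b)
{
  const std::vector<double> full = conv_full(a, b);

  // assumed offset of the central part: floor(b.size() / 2)
  const std::ptrdiff_t start = static_cast<std::ptrdiff_t>(b.size() / 2);
  const std::ptrdiff_t len   = static_cast<std::ptrdiff_t>(a.size());

  return std::vector<double>(full.begin() + start, full.begin() + start + len);
}
```

Convolving (1, 2, 3) with (1, 1) gives the full result (1, 3, 5, 3); under the assumed offset, the central part is (3, 5, 3).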

conv2( A, B )
conv2( A, B, shape )

2D convolution of matrices A and B

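The shape argument is optional; it is one of:

`"full"`	=	return the full convolution (default setting), with the size equal to size(A) + size(B) - 1
`"same"`	=	return the central part of the convolution, with the same size as matrix A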

Examples:

fmat A = randu<fmat>(256, 256);

fmat B = randu<fmat>(16, 16);

fmat C = conv2(A, B);

fmat D = conv2(A, B, "same");

Statistics

mean, median, stddev, var, range

mean( V ), mean( M ), mean( M, dim )		mean (average value)
median( V ), median( M ), median( M, dim )		median
stddev( V ), stddev( V, norm_type ), stddev( M ), stddev( M, norm_type ), stddev( M, norm_type, dim )		standard deviation
var( V ), var( V, norm_type ), var( M ), var( M, norm_type ), var( M, norm_type, dim )		variance
range( V ), range( M ), range( M, dim )		range (difference between max and min)

For vector V, return the statistic calculated using all the elements of the vector

For matrix M, find the statistic for each column (dim = 0), or each row (dim = 1)

The dim argument is optional; by default dim = 0 is used

The norm_type argument is optional; by default norm_type = 0 is used

For the var() and stddev() functions:
- the default norm_type = 0 performs normalisation using N-1 (where N is the number of samples), providing the best unbiased estimator
- using norm_type = 1 performs normalisation using N, which provides the second moment around the mean

Caveat: to obtain statistics for integer matrices/vectors (eg. umat, imat, uvec, ivec), convert to a matrix/vector with floating point values (eg. mat, vec) using the conv_to() function

Examples:

fmat A = randu<fmat>(5, 5);

fmat B  = mean(A);
fmat C  = var(A);
float m = mean(mean(A));

fvec v = randu<fvec>(5);
float x = var(v);
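
The effect of norm_type on var() and stddev() comes down to the divisor applied to the sum of squared deviations. The plain C++ sketch below shows both settings on the CPU for a single vector; var_reference and stddev_reference are hypothetical names, and a straightforward two-pass formula is used, which may differ from the algorithm Bandicoot uses internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not Bandicoot code): variance with either normalisation
double var_reference(const std::vector<double>& v, int norm_type = 0)
{
  assert(v.size() >= 2);                        // at least two samples are needed for N-1
  assert(norm_type == 0 || norm_type == 1);

  const double n = static_cast<double>(v.size());

  double mean = 0.0;
  for (double x : v) { mean += x; }
  mean /= n;

  double sum_sq_dev = 0.0;
  for (double x : v) { sum_sq_dev += (x - mean) * (x - mean); }

  // norm_type = 0 divides by N-1; norm_type = 1 divides by N
  return sum_sq_dev / ((norm_type == 0) ? (n - 1.0) : n);
}

// Standard deviation is the square root of the variance
double stddev_reference(const std::vector<double>& v, int norm_type = 0)
{
  return std::sqrt(var_reference(v, norm_type));
}
```

For the samples (1, 2, 3, 4) the sum of squared deviations is 5, giving a variance of 5/3 with norm_type = 0 and 1.25 with norm_type = 1.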

cov( X, Y )
cov( X, Y, norm_type )

cov( X )
cov( X, norm_type )

For two matrix arguments X and Y, if each row of X and Y is an observation and each column is a variable, the (i,j)-th entry of cov(X,Y) is the covariance between the i-th variable in X and the j-th variable in Y

For vector arguments, the type of vector is ignored and each element in the vector is treated as an observation

For matrices, X and Y must have the same dimensions

For vectors, X and Y must have the same number of elements

cov(X) is equivalent to cov(X, X)

The norm_type argument is optional; by default norm_type = 0 is used

The norm_type argument controls the type of normalisation used, with N denoting the number of observations:
- for norm_type = 0, normalisation is done using N-1, providing the best unbiased estimation of the covariance matrix (if the observations are from a normal distribution)
- for norm_type = 1, normalisation is done using N, which provides the second moment matrix of the observations about their mean

Examples:

fmat X = randu<fmat>(4, 5);
fmat Y = randu<fmat>(4, 5);

fmat C = cov(X, Y);
fmat D = cov(X, Y, 1);

cor( X, Y )
cor( X, Y, norm_type )

cor( X )
cor( X, norm_type )

For two matrix arguments X and Y, if each row of X and Y is an observation and each column is a variable, the (i,j)-th entry of cor(X,Y) is the correlation coefficient between the i-th variable in X and the j-th variable in Y

For vector arguments, the type of vector is ignored and each element in the vector is treated as an observation

For matrices, X and Y must have the same dimensions

For vectors, X and Y must have the same number of elements

cor(X) is equivalent to cor(X, X)

The norm_type argument is optional; by default norm_type = 0 is used

The norm_type argument controls the type of normalisation used, with N denoting the number of observations:
- for norm_type = 0, normalisation is done using N-1
- for norm_type = 1, normalisation is done using N

Examples:

fmat X = randu<fmat>(4, 5);
fmat Y = randu<fmat>(4, 5);

fmat R = cor(X, Y);
fmat S = cor(X, Y, 1);

Miscellaneous

backend configuration

Bandicoot can use either CUDA or OpenCL as a hardware backend

To enable CUDA or OpenCL, set the COOT_USE_CUDA or COOT_USE_OPENCL macros in the Bandicoot configuration

If both backends are enabled, select the default backend by setting the COOT_DEFAULT_BACKEND macro to the desired backend (e.g. #define COOT_DEFAULT_BACKEND CL_BACKEND)

By default, at the time of first usage, Bandicoot will automatically initialise to use the first available device with the default backend

Bandicoot can also be manually initialised using the coot_init() function:

coot_init( )		default initialisation
coot_init( print_info )		initialise to default backend, optionally printing information about the chosen GPU device
coot_init( "opencl", print_info )		initialise to OpenCL backend; `COOT_USE_OPENCL` must be enabled
coot_init( "opencl", print_info, platform_id, device_id )		use a specific OpenCL platform ID and device ID
coot_init( "cuda", print_info )		initialise to CUDA backend; `COOT_USE_CUDA` must be enabled
coot_init( "cuda", print_info, device_id )		use a specific CUDA device ID

coot_init() returns a boolean indicating whether or not initialisation was successful

if print_info is set to true, information about the selected GPU device will be printed

for the "opencl" initialisations, platform_id and device_id specify the desired OpenCL platform and device IDs; available platforms and devices can be listed using the clinfo command-line utility, available in most package managers: clinfo -l

for the "cuda" initialisations, device_id specifies the desired CUDA device; available device IDs can be listed with the nvidia-smi command-line utility

Caveats:
- calling coot_init() manually must be done before any other Bandicoot operations
- coot_init() can only be called once
- if either caveat above is violated when calling coot_init(), a std::runtime_error exception will be thrown

At any time, all asynchronous operations can be forced to complete by calling coot_synchronise()
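
Examples:

The following sketch shows one way to initialise manually and handle both failure modes described above (a false return value and a std::runtime_error). It assumes the OpenCL backend is enabled and that platform 0 / device 0 exist on the system; those IDs are placeholders and should be checked with clinfo -l.

```cpp
#include <iostream>
#include <stdexcept>
#include <bandicoot>

using namespace std;
using namespace coot;

int main()
  {
  try
    {
    // must happen before any other Bandicoot operation, and only once;
    // platform 0 / device 0 are assumed IDs for illustration
    const bool ok = coot_init("opencl", true, 0, 0);

    if (!ok)
      {
      cerr << "GPU initialisation failed" << endl;
      return 1;
      }
    }
  catch (const std::runtime_error& e)
    {
    // thrown if one of the caveats above is violated
    cerr << "coot_init(): " << e.what() << endl;
    return 1;
    }

  fmat A = randu<fmat>(4, 5);
  fmat B = A * A.t();

  // force any pending asynchronous GPU work to complete
  coot_synchronise();

  cout << B << endl;

  return 0;
  }
```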

See also:
- CUDA in Wikipedia
- OpenCL in Wikipedia

constants (pi, inf, eps, ...)

`datum::pi`		π, the ratio of any circle's circumference to its diameter
`datum::tau`		τ, the ratio of any circle's circumference to its radius (equivalent to 2π)
`datum::inf`		∞, infinity
`datum::nan`		“not a number” (NaN); caveat: NaN is not equal to anything, even itself

`datum::eps`		machine epsilon; approximately 2.2204e-16; difference between 1 and the next representable value
`datum::e`		base of the natural logarithm
`datum::sqrt2`		square root of 2

`datum::log_min`		log of minimum non-zero value (type and machine dependent)
`datum::log_max`		log of maximum value (type and machine dependent)
`datum::euler`		Euler's constant, aka Euler-Mascheroni constant

`datum::gratio`		golden ratio
`datum::m_u`		atomic mass constant (in kg)
`datum::N_A`		Avogadro constant

`datum::k`		Boltzmann constant (in joules per kelvin)
`datum::k_evk`		Boltzmann constant (in eV/K)
`datum::a_0`		Bohr radius (in meters)

`datum::mu_B`		Bohr magneton
`datum::Z_0`		characteristic impedance of vacuum (in ohms)
`datum::G_0`		conductance quantum (in siemens)

`datum::k_e`		Coulomb's constant (in meters per farad)
`datum::eps_0`		electric constant (in farads per meter)
`datum::m_e`		electron mass (in kg)

`datum::eV`		electron volt (in joules)
`datum::ec`		elementary charge (in coulombs)
`datum::F`		Faraday constant (in coulombs per mole)

`datum::alpha`		fine-structure constant
`datum::alpha_inv`		inverse fine-structure constant
`datum::K_J`		Josephson constant

`datum::mu_0`		magnetic constant (in henries per meter)
`datum::phi_0`		magnetic flux quantum (in webers)
`datum::R`		molar gas constant (in joules per mole kelvin)

`datum::G`		Newtonian constant of gravitation (in newton square meters per kilogram squared)
`datum::h`		Planck constant (in joule seconds)
`datum::h_bar`		Planck constant over 2 pi, aka reduced Planck constant (in joule seconds)

`datum::m_p`		proton mass (in kg)
`datum::R_inf`		Rydberg constant (in reciprocal meters)
`datum::c_0`		speed of light in vacuum (in meters per second)

`datum::sigma`		Stefan-Boltzmann constant
`datum::R_k`		von Klitzing constant (in ohms)
`datum::b`		Wien wavelength displacement law constant

The constants are stored in the Datum<type> class, where type is either float or double;
for convenience, Datum<double> is typedefed as datum, and Datum<float> is typedefed as fdatum

Caveat: datum::nan is not equal to anything, even itself; to check whether a scalar x is finite, use std::isfinite(x)
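
The NaN caveat and the definition of machine epsilon can be checked with the C++ standard library alone. The sketch below uses std::numeric_limits<double> as a stand-in for the corresponding datum values (an assumption for illustration); the helper names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// True only for NaN: NaN is the one value that compares unequal to itself.
// In real code prefer std::isnan(x) or std::isfinite(x).
inline bool differs_from_itself(const double x)
  {
  return x != x;
  }

// Gap between 1.0 and the next representable double,
// which is how machine epsilon is defined.
inline double gap_after_one()
  {
  return std::nextafter(1.0, 2.0) - 1.0;
  }
```

Because of this behaviour, a test such as x == datum::nan is always false; std::isfinite(x) rejects both NaN and infinity in one call.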

The physical constants were mainly taken from NIST 2018 CODATA values, and some from WolframAlpha (as of 2009-06-23)

Examples:

cout << "speed of light = " << datum::c_0 << endl;

cout << "log_max for floats = ";
cout << fdatum::log_max << endl;

cout << "log_max for doubles = ";
cout << datum::log_max << endl;

See also:
- .fill()
- NaN in Wikipedia
- physical constant in Wikipedia
- replacement of 2π with τ in Wikipedia
- The Tau Manifesto by Michael Hartl
- std::numeric_limits in cplusplus.com
- std::numeric_limits in cppreference.com

wall_clock

Simple timer class for measuring the number of elapsed seconds

An instance of the class has two member functions:

`.tic()`		start the timer
`.toc()`		return the number of seconds since the last call to `.tic()`

Examples:

wall_clock timer;

timer.tic();

// ... do something ...

double n = timer.toc();

cout << "number of seconds: " << n << endl;

See also:
- elapsed real time in Wikipedia

output streams

The default stream for printing matrices and cubes is std::cout
the stream can be changed via the COOT_COUT_STREAM define; see config.hpp

The default stream for printing warnings and errors is std::cerr
the stream can be changed via the COOT_CERR_STREAM define; see config.hpp

Whether warnings are printed is controlled by the COOT_PRINT_ERRORS and COOT_DONT_PRINT_ERRORS defines; see config.hpp

The COOT_DONT_PRINT_ERRORS define takes precedence over the COOT_PRINT_ERRORS define

uword, sword

uword is a typedef for an unsigned integer type; it is used for matrix indices as well as all internal counters and loops

sword is a typedef for a signed integer type

The minimum width of both uword and sword is either 32 or 64 bits:
- the default width is 32 bits on 32-bit platforms
- the default width is 64 bits on 64-bit platforms
- on most systems, uword is a typedef for size_t

Caveat: the Bandicoot uword and sword types are not guaranteed to be the same as the Armadillo arma::uword and arma::sword types
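
Since the two libraries' index types may differ in width, values passed between them may need a range check. The following is a sketch of one way to do this in standard C++; checked_index_cast is a hypothetical helper, not part of Bandicoot or Armadillo, and fixed-width integer types stand in for the two uword types.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Convert between unsigned index types, refusing values that do not fit
// (hypothetical helper for illustration).
template<typename To, typename From>
To checked_index_cast(const From value)
  {
  static_assert(std::is_unsigned<To>::value && std::is_unsigned<From>::value,
                "checked_index_cast(): unsigned index types expected");

  if (value > std::numeric_limits<To>::max())
    {
    throw std::overflow_error("checked_index_cast(): index does not fit in target type");
    }

  return static_cast<To>(value);
  }
```

For example, checked_index_cast<std::uint32_t>(std::uint64_t(42)) returns 42, while a value of 2^40 throws std::overflow_error instead of being silently truncated.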

See also:
- C++ variable types
- explanation of typedef
- imat & umat matrix types
- ivec & uvec vector types

Examples of Matlab/Octave syntax and conceptually corresponding Bandicoot syntax

Matlab/Octave	Bandicoot	Notes

`A(1, 1)`	`A(0, 0)`	indexing in Bandicoot starts at 0
`A(k, k)`	`A(k-1, k-1)`

`size(A,1)`	`A.n_rows`	read only
`size(A,2)`	`A.n_cols`
`numel(A)`	`A.n_elem`

`A(:, k)`	`A.col(k)`	this is a conceptual example only; exact conversion from Matlab/Octave to Bandicoot syntax will require taking into account that indexing starts at 0
`A(k, :)`	`A.row(k)`
`A(:, p:q)`	`A.cols(p, q)`
`A(p:q, :)`	`A.rows(p, q)`
`A(p:q, r:s)`	`A( span(p,q), span(r,s) )`	A( span(first_row, last_row), span(first_col, last_col) )

`A'`	`A.t()` or `trans(A)`	matrix transpose / Hermitian transpose

`A = zeros(size(A))`	`A.zeros()`
`A = ones(size(A))`	`A.ones()`
`A = zeros(k)`	`A = zeros<fmat>(k,k)`
`A = ones(k)`	`A = ones<fmat>(k,k)`

`A .* B`	`A % B`	element-wise multiplication
`A ./ B`	`A / B`	element-wise division
`A = A + 1;`	`A++`
`A = A - 1;`	`A--`

`X = A(:)`	`X = vectorise(A)`
`X = [ A B ]`	`X = join_horiz(A,B)`
`X = [ A; B ]`	`X = join_vert(A,B)`

`A`	`cout << A << endl;` or `A.print("A =");`
`A = randn(2,3); B = randn(4,5);`	`fmat A = randn(2,3); fmat B = randn(4,5);`
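
The range rows in the table are conceptual: a Matlab/Octave range p:q is 1-based and inclusive, so both endpoints shift down by one before being passed to .rows(), .cols() or span(). A small sketch of that arithmetic in standard C++ follows; the names zero_based_range and from_matlab_range are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>

// Inclusive, 0-based index range (hypothetical type for illustration).
struct zero_based_range
  {
  std::size_t first;
  std::size_t last;
  };

// Convert a 1-based inclusive Matlab/Octave range p:q to 0-based endpoints.
// Empty ranges (q < p), which Matlab/Octave allows, are rejected here for simplicity.
inline zero_based_range from_matlab_range(const std::size_t p, const std::size_t q)
  {
  if (p == 0 || q < p)
    {
    throw std::invalid_argument("from_matlab_range(): expected 1 <= p <= q");
    }

  return { p - 1, q - 1 };
  }
```

With this conversion, the Matlab/Octave expression A(2:4, :) corresponds to A.rows(1, 3).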

Armadillo/Bandicoot conversion guide

Bandicoot is meant to be a GPU-accelerated linear algebra library that is API-compatible with Armadillo and thus can function as a drop-in replacement; however, due to the different architecture of the GPU and other constraints, it is not always a benefit to use Bandicoot instead of Armadillo

The first run of any Bandicoot program requires compiling all Bandicoot kernel functions for the given device, which can be a time-consuming process; kernels are cached and subsequent runs will use the cache

Upgrading Bandicoot versions may incur recompilation of kernels
Using a new backend for the first time may incur recompilation of kernels
Using a new device may incur recompilation of kernels
For more information see the kernel cache documentation

Where possible, use batch operations with Bandicoot; e.g., use A += 1 instead of for (uword i = 0; i < A.n_elem; ++i) { A[i] += 1; }

GPUs are best suited for operations on large matrices, so small matrices (e.g. fewer than 100 elements) may not show significant speedup

Individual element access (such as A.at(i, j)) requires a transfer between the GPU and CPU; when adapting Armadillo code to Bandicoot, these should be avoided wherever possible
- If such operations cannot be avoided, consider temporarily transferring the entire Bandicoot matrix back to CPU memory by creating an Armadillo matrix with conv_to<arma::fmat>() or similar
- For this reason, unlike Armadillo, Bandicoot does not provide iterators: they are guaranteed to be inefficient
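
The advice on batch operations versus element access can be put together in a short program. This is a sketch only; the matrix size is an arbitrary choice for illustration.

```cpp
#include <iostream>
#include <bandicoot>

using namespace std;
using namespace coot;

int main()
  {
  fmat A = randu<fmat>(1000, 1000);

  // preferred: one batch operation, evaluated on the GPU
  A += 1;

  // avoid: per-element access, where each access requires a GPU/CPU transfer
  // for (uword i = 0; i < A.n_elem; ++i) { A[i] += 1; }

  cout << "processed " << A.n_rows << " x " << A.n_cols << " elements" << endl;

  return 0;
  }
```
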

Consumer-level GPUs are not designed for intensive linear algebra operations and thus may not show significant speedup; the best results will be obtained with high-end hardware

Most GPUs show better performance with 32-bit floating point elements (e.g. float instead of double), so using fmat instead of mat is recommended wherever possible

If the support you need for a conversion is not available, please file a bug report so that the support can be prioritised

example program

#include <iostream>
#include <bandicoot>

using namespace std;
using namespace coot;

int main()
  {
  fmat A = randu<fmat>(4, 5);
  fmat B = randu<fmat>(4, 5);

  cout << A * B.t() << endl;

  return 0;
  }

If the above program is stored as example.cpp, under Linux and macOS it can be compiled using:

Bandicoot extensively uses template meta-programming, so it's recommended to enable optimisation when compiling programs (eg. use the -O2 or -O3 options for GCC or clang)

See the Questions page for more info on compiling and linking

If coming from Armadillo, be sure to check the Armadillo/Bandicoot differences for advice on writing efficient code

See also the example program that comes with the Bandicoot archive

config.hpp

Bandicoot can be configured via editing the file include/bandicoot_bits/config.hpp

Specific functionality can be enabled or disabled by uncommenting or commenting out a particular #define, listed below.

Some options can also be specified by explicitly defining them before including the bandicoot header.

`COOT_DONT_USE_WRAPPER`		Disable going through the run-time Bandicoot wrapper library (libbandicoot.so) when calling GPU-specific functions. Overrides `COOT_USE_WRAPPER`. You will need to directly link with GPU libraries (e.g. `-lOpenCL -lclBLAS` or similar depending on backend configuration)

`COOT_USE_WRAPPER`		Enable use of Bandicoot wrapper library, which allows linking against all enabled backends with `-lbandicoot` only.

`COOT_USE_OPENCL`		Enable use of OpenCL as a GPU backend. Note that either `COOT_USE_OPENCL` or `COOT_USE_CUDA` must be enabled. OpenCL headers and clBLAS headers must be available on the system.

`COOT_USE_CUDA`		Enable use of CUDA as a GPU backend. Note that either `COOT_USE_OPENCL` or `COOT_USE_CUDA` must be enabled. The CUDA toolkit must be available on the system.

`COOT_DEFAULT_BACKEND`		Set the backend that Bandicoot will use. This is only necessary if multiple backends are enabled; that is, when both `COOT_USE_OPENCL` and `COOT_USE_CUDA` are enabled. This should be set to either `CUDA_BACKEND` or `CL_BACKEND` (e.g. `#define COOT_DEFAULT_BACKEND CUDA_BACKEND`). See also the backend configuration documentation.

`COOT_USE_OPENMP`		Use OpenMP for parallelisation of some CPU-based parts of Bandicoot functionalities. Automatically enabled when using a compiler which has OpenMP 3.1+ active (eg. the `-fopenmp` option for gcc and clang). Note: this may not have a noticeable effect on performance since most Bandicoot implementations do not use the CPU heavily or at all.

`COOT_DONT_USE_OPENMP`		Disable use of OpenMP for parallelisation; overrides `COOT_USE_OPENMP`.

`COOT_KERNEL_CACHE_DIR`		If defined, specifies a custom directory to use for the kernel cache. Distribution packagers may choose to specify `COOT_SYSTEM_KERNEL_CACHE_DIR`, though it is overridden by `COOT_KERNEL_CACHE_DIR` if specified.

`COOT_BLAS_CAPITALS`		Use capitalised (uppercase) BLAS and LAPACK function names (eg. DGEMM vs dgemm)

`COOT_BLAS_UNDERSCORE`		Append an underscore to BLAS and LAPACK function names (eg. dgemm_ vs dgemm). Enabled by default.

`COOT_BLAS_LONG`		Use "long" instead of "int" when calling BLAS and LAPACK functions

`COOT_BLAS_LONG_LONG`		Use "long long" instead of "int" when calling BLAS and LAPACK functions

`COOT_NO_DEBUG`		Disable all run-time checks, including size conformance and bounds checks. NOT RECOMMENDED. DO NOT USE UNLESS YOU KNOW WHAT YOU ARE DOING AND ARE WILLING TO RISK THE DOWNSIDES. Keeping run-time checks enabled during development and deployment greatly aids in finding mistakes in your code.

`COOT_EXTRA_DEBUG`		Print out the trace of internal functions used for evaluating expressions. Not recommended for normal use. This is mainly useful for debugging the library.

`COOT_COUT_STREAM`		The default stream used for printing matrices and cubes by .print(). Must always be enabled. By default defined to std::cout

`COOT_CERR_STREAM`		The default stream used for printing warnings and errors. Must always be enabled. By default defined to std::cerr
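
As an alternative to editing config.hpp, options may be defined before the header is included. The fragment below is a sketch under an assumption: the text above does not state which options can be set this way, so treating the backend options as such is a guess that should be verified against config.hpp.

```cpp
// Assumption: these options are among those that may be defined before the include.
#define COOT_USE_OPENCL
#define COOT_USE_CUDA

// only needed because both backends are enabled
#define COOT_DEFAULT_BACKEND CUDA_BACKEND

#include <bandicoot>
```

The defines must appear before the first inclusion of the bandicoot header in every translation unit, so placing them in a shared project header or passing them on the compiler command line is usually less error-prone.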

direct linking

If COOT_USE_WRAPPER is not defined (or COOT_DONT_USE_WRAPPER is defined), then Bandicoot will need to be linked against all dependencies of its backends

This can amount to many dependencies, depending on the configuration options; for this reason COOT_USE_WRAPPER is enabled by default and is recommended

Regardless of backend configuration, these libraries must always be linked against:
- -larmadillo (assuming ARMA_USE_WRAPPER is enabled; see the Armadillo linking FAQ)

If COOT_USE_OPENCL is set (i.e. the OpenCL backend is enabled), these libraries must be linked against:
- -lOpenCL (core OpenCL support)
- -lclBLAS (clBLAS for BLAS operations)

If COOT_USE_CUDA is set (i.e. the CUDA backend is enabled), these libraries must be linked against:
- -lcuda (core CUDA support)
- -lcudart (CUDA runtime library)
- -lnvrtc (runtime compilation of CUDA kernels)
- -lcublas (cuBLAS for BLAS operations)
- -lcusolver (cuSolverDn for decompositions and factorisations)
- -lcurand (cuRand for random number generation)

kernel cache

In order to perform GPU-based linear algebra, Bandicoot must first compile GPU kernel functions to a particular device

The first time Bandicoot is run on a system, all GPU kernel functions will be compiled; this can take some time (usually no more than 3 to 5 minutes)

Compiled kernels are stored on disk in the kernel cache for later reuse

Compiled kernels are specific to Bandicoot version, backend, and device; thus, if any of those three factors change, recompilation will be triggered; see the backend configuration documentation for more details

The default location to store the kernel cache is
- ~/.bandicoot/cache/ on Linux, macOS and other UNIX-like systems
- %APPDATA%\bandicoot\cache on Windows (e.g. C:\Users\Username\AppData\bandicoot\cache)

Custom locations can be specified with the COOT_KERNEL_CACHE_DIR configuration variable

History of API Additions, Changes and Deprecations

API Stability and Version Policy:
- Each release of Bandicoot has its public API (functions, classes, constants) described in the accompanying API documentation specific to that release.
- Each release of Bandicoot has its full version specified as A.B.C, where A is a major version number, B is a minor version number, and C is a patch level (indicating bug fixes). The version specification has explicit meaning, similar to Semantic Versioning, as follows:
- Caveat: the above policy applies only to the public API described in the documentation. Any functionality within Bandicoot which is not explicitly described in the public API documentation is considered as internal implementation details, and may be changed or removed without notice.

List of additions and changes for each version:
- Version 1.0:
  - first stable release!