API Documentation for Bandicoot 1.0
Preamble
- Please cite the following papers if you use Bandicoot in your research and/or software.
  Citations are useful for the continued development and maintenance of the library.
- TODO: a technical report!
Overview
Matrix and Vector Classes
Member Functions & Variables
attributes | .n_rows, .n_cols, .n_elem
element access | element access via (), [] and .at()
.zeros | set all elements to zero
.ones | set all elements to one
.eye | set elements along main diagonal to one and off-diagonal elements to zero
.randu / .randn | set all elements to random values
.fill | set all elements to specified value
.clamp | clamp values to lower and upper limits
.set_size | change size without keeping elements (fast)
.reshape | change size while keeping elements
.resize | change size while keeping elements and preserving layout
.reset | change size to empty
submatrix views | read/write access to contiguous and non-contiguous submatrices
.get_dev_mem() | get underlying raw GPU memory pointer
.diag | read/write access to matrix diagonals
.t / .st | return matrix transpose
.eval | force evaluation of delayed expression
.is_empty | check whether object is empty
.is_vec | check whether matrix is a vector
.is_square | check whether matrix is square sized
.print | print object to std::cout or user specified stream
.raw_print | print object without formatting
Generated Vectors / Matrices
linspace | generate vector with linearly spaced elements
eye | generate identity matrix
ones | generate object filled with ones
zeros | generate object filled with zeros
randu | generate object with random values (uniform distribution)
randn | generate object with random values (normal distribution)
randi | generate object with random integer values in specified interval
Functions of Vectors / Matrices
abs | obtain magnitude of each element
accu | accumulate (sum) all elements
all | check whether all elements are non-zero, or satisfy a relational condition
any | check whether any element is non-zero, or satisfies a relational condition
as_scalar | convert 1x1 matrix to pure scalar
clamp | obtain clamped elements according to given limits
conv_to | convert/cast between matrix types
cross | cross product
det | determinant
diagmat | generate diagonal matrix from given matrix or vector
diagvec | extract specified diagonal
dot | dot product
find | find indices of non-zero elements, or elements satisfying a relational condition
find_finite | find indices of finite elements
find_nonfinite | find indices of non-finite elements
find_nan | find indices of NaN elements
join_rows / join_cols | concatenation of matrices
min / max | return extremum values
norm | various norms of vectors and matrices
normalise | normalise vectors to unit p-norm
pow | element-wise power
repmat | replicate matrix in block-like fashion
reshape | change size while keeping elements
resize | change size while keeping elements and preserving layout
size | obtain dimensions of given object
sort | sort elements
sort_index | vector describing sorted order of elements
sum | sum of elements
symmatu / symmatl | generate symmetric matrix from given matrix
trace | sum of diagonal elements
trans | transpose of matrix
vectorise | flatten matrix into vector
misc functions | miscellaneous element-wise functions: exp, log, sqrt, round, sign, ...
trig functions | trigonometric element-wise functions: cos, sin, tan, ...
Decompositions, Factorisations, and Inverses
chol | Cholesky decomposition
eig_sym | eigen decomposition of dense symmetric/hermitian matrix
lu | lower-upper decomposition
pinv | pseudo-inverse / generalised inverse
svd | singular value decomposition
Signal & Image Processing
Statistics
Miscellaneous
Matrix and Vector Classes
Mat<type>
fmat
mat
- Classes for dense matrices, with elements stored in column-major ordering (ie. column by column) on the GPU
- The root matrix class is Mat<type>, where type is one of:
  float, double, short, int, long, and unsigned versions of short, int, long
- Bandicoot also provides the convenience types u32, u64, s32, and s64, which can be used as the element type
- Important: not all types are supported on all devices; a runtime exception is thrown if a type is not supported
- For convenience the following typedefs have been defined:
  fmat = Mat<float>
  mat = Mat<double> (note: not supported on all devices)
  dmat = Mat<double> (note: not supported on all devices)
  umat = Mat<uword>
  imat = Mat<sword>
  u32_mat = Mat<u32>
  s32_mat = Mat<s32>
  u64_mat = Mat<u64>
  s64_mat = Mat<s64>
- In this documentation the fmat type is used for convenience, speed, and portability; it is possible to use other types instead, eg. mat
- Note that standard consumer GPUs may not support 64-bit floats (double), and even if they do, they may not show a speedup over CPU-based Armadillo matrices unless they are high-end GPUs
- Functions that use more complex functionality (generally matrix decompositions) are only valid for the following types: fmat, dmat, mat
- Constructors:
  fmat()
  fmat(n_rows, n_cols)
  fmat(size(X))
  fmat(fmat)
  fmat(arma::fmat) | (convert from CPU-based Armadillo matrix)
  fmat(fvec)
  fmat(frowvec)
- Caveat:
  Each instance of fmat automatically allocates and releases internal memory on the GPU.
  All internally allocated memory used by an instance of fmat is automatically released as soon as the instance goes out of scope.
  For example, if an instance of fmat is declared inside a function, it will be automatically destroyed at the end of the function.
  To forcefully release memory at any point, use .reset(); note that in normal use this is not required.
- Advanced constructors:
fmat(ptr_aux_mem, n_rows, n_cols)
Create a matrix using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory.
This matrix will use the auxiliary memory directly (i.e. no copying); this can be dangerous unless you know what you are doing!
The ptr_aux_mem argument should be one of:
  - a memory pointer from another Bandicoot matrix obtained with .get_dev_mem()
  - a cl_mem object, if using the OpenCL backend
  - a CUDA memory pointer (e.g. a float* for an fmat), if using the CUDA backend
- Examples:
fmat A(5, 5);
A.randu();
float x = A(1, 2);
fmat B = A + A;
fmat C = A * B;
fmat D = A % B;
B.zeros();
B.set_size(10, 10);
B.ones(5, 6);
B.print("B:");
// convert from a CPU-based Armadillo matrix
arma::fmat E(10, 10, arma::fill::randu);
fmat F(E);
// use auxiliary memory (OpenCL backend only)
cl_mem m_cl = clCreateBuffer(get_rt().cl_rt.get_context(), CL_MEM_READ_WRITE, sizeof(float) * 24, NULL, NULL);
fmat H(wrap_mem_cl(m_cl), 4, 6);
// use auxiliary memory (CUDA backend only)
float* m_cuda;
cudaMalloc(&m_cuda, sizeof(float) * 24);
fmat J(wrap_mem_cuda(m_cuda), 4, 6);
// use the GPU memory of F directly (no copy is made)
fmat K(F.get_dev_mem(), F.n_rows, F.n_cols);
Col<type>
fvec
vec
- Classes for column vectors (dense matrices with one column)
- The Col<type> class is derived from the Mat<type> class and inherits most of the member functions
- For convenience the following typedefs have been defined:
  fvec = fcolvec = Col<float>
  vec = colvec = Col<double> (note: not supported on all devices)
  dvec = dcolvec = Col<double> (note: not supported on all devices)
  uvec = ucolvec = Col<uword>
  ivec = icolvec = Col<sword>
  u32_vec = u32_colvec = Col<u32>
  s32_vec = s32_colvec = Col<s32>
  u64_vec = u64_colvec = Col<u64>
  s64_vec = s64_colvec = Col<s64>
- In this documentation, the vec and colvec types have the same meaning and are used interchangeably
- In this documentation, the types fvec or fcolvec are used for convenience, speed, and portability; it is possible to use other types instead, eg. vec, colvec
- Note that standard consumer GPUs may not support 64-bit floats (double), and even if they do, they may not show a speedup over CPU-based Armadillo matrices unless they are high-end GPUs
- Functions which take Mat as input can generally also take Col as input; the main exceptions are functions which require square matrices
- Constructors:
  fvec()
  fvec(n_elem)
  fvec(size(X))
  fvec(fvec)
  fvec(arma::fvec) | (convert from CPU-based Armadillo vector)
  fvec(fmat) | (std::logic_error exception is thrown if the given matrix has more than one column)
- Advanced constructors:
fvec(ptr_aux_mem, number_of_elements)
Create a column vector using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory.
This vector will directly use the auxiliary memory (ie. no copying); this can be dangerous unless you know what you are doing!
The ptr_aux_mem argument should be one of:
  - a memory pointer from another Bandicoot matrix obtained with .get_dev_mem()
  - a cl_mem object, if using the OpenCL backend
  - a CUDA memory pointer (e.g. a float* for an fvec), if using the CUDA backend
- Examples:
fvec x(10);
fvec y(10, fill::ones);
fmat A(10, 10, fill::randu);
fvec z = A.col(5);
arma::fvec d(100, arma::fill::randu);
fvec e(d);
Row<type>
frowvec
rowvec
- Classes for row vectors (dense matrices with one row)
- The Row<type> class is derived from the Mat<type> class and inherits most of the member functions
- For convenience the following typedefs have been defined:
  frowvec = Row<float>
  rowvec = Row<double> (note: not supported on all devices)
  drowvec = Row<double> (note: not supported on all devices)
  urowvec = Row<uword>
  irowvec = Row<sword>
  u32_rowvec = Row<u32>
  s32_rowvec = Row<s32>
  u64_rowvec = Row<u64>
  s64_rowvec = Row<s64>
- In this documentation, the frowvec type is used for convenience, speed, and portability; it is possible to use other types instead, eg. rowvec
- Note that standard consumer GPUs may not support 64-bit floats (double), and even if they do, they may not show a speedup over CPU-based Armadillo matrices unless they are high-end GPUs
- Functions which take Mat as input can generally also take Row as input; the main exceptions are functions which require square matrices
- Constructors:
  frowvec()
  frowvec(n_elem)
  frowvec(size(X))
  frowvec(frowvec)
  frowvec(arma::frowvec) | (convert from CPU-based Armadillo row vector)
  frowvec(fmat) | (std::logic_error exception is thrown if the given matrix has more than one row)
- Advanced constructors:
frowvec(ptr_aux_mem, number_of_elements)
Create a row vector using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory.
This vector will directly use the auxiliary memory (ie. no copying); this can be dangerous unless you know what you are doing!
The ptr_aux_mem argument should be one of:
  - a memory pointer from another Bandicoot matrix obtained with .get_dev_mem()
  - a cl_mem object, if using the OpenCL backend
  - a CUDA memory pointer (e.g. a float* for an frowvec), if using the CUDA backend
- Examples:
frowvec x(10);
frowvec y(10, fill::ones);
fmat A(10, 10, fill::randu);
frowvec z = A.row(5);
arma::frowvec d(100, arma::fill::randu);
frowvec e(d);
operators: + - * % / == != <= >= < > && ||
- Overloaded operators for Mat, Col, and Row classes
- Operations:
  + | addition of two objects
  - | subtraction of one object from another or negation of an object
  * | matrix multiplication of two objects
  % | element-wise multiplication of two objects (Schur product)
  / | element-wise division of an object by another object or a scalar
  == | element-wise equality evaluation of two objects; generates a matrix of type umat
  != | element-wise non-equality evaluation of two objects; generates a matrix of type umat
  >= | element-wise "greater than or equal to" evaluation of two objects; generates a matrix of type umat
  <= | element-wise "less than or equal to" evaluation of two objects; generates a matrix of type umat
  > | element-wise "greater than" evaluation of two objects; generates a matrix of type umat
  < | element-wise "less than" evaluation of two objects; generates a matrix of type umat
  && | element-wise logical AND evaluation of two objects; generates a matrix of type umat
  || | element-wise logical OR evaluation of two objects; generates a matrix of type umat
- For element-wise relational and logical operations (ie. ==, !=, >=, <=, >, <, &&, ||), each element in the generated object is either 0 or 1, depending on the result of the operation
- Caveat: operators involving equality comparison (ie. ==, !=, >=, <=) are not recommended for matrices of type mat or fmat, due to the necessarily limited precision of floating-point element types
- If incompatible object sizes are used, a std::logic_error exception is thrown
- Examples:
fmat A = randu<fmat>(5, 10);
fmat B = randu<fmat>(5, 10);
fmat C = randu<fmat>(10, 5);
fmat P = A + B;
fmat Q = A - B;
fmat R = -B;
fmat S = A / 123.0;
fmat T = A % B;
fmat U = A * C;
fmat V = A + B + A + B;
imat AA = linspace<imat>(1, 9, 9);
imat BB = linspace<imat>(9, 1, 9);
umat ZZ = (AA >= BB);
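The 0/1 result of the relational operators can be checked against a plain C++ reference on the CPU. The sketch below is not Bandicoot code: elementwise_ge is a hypothetical helper that mirrors what (AA >= BB) is documented to produce for the two vectors in the example above, with std::vector standing in for GPU matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for an element-wise ">=" between two equally
// sized objects: each output element is 1 if a[i] >= b[i], otherwise 0.
std::vector<unsigned> elementwise_ge(const std::vector<int>& a,
                                     const std::vector<int>& b)
{
  assert(a.size() == b.size());  // Bandicoot throws std::logic_error instead
  std::vector<unsigned> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = (a[i] >= b[i]) ? 1u : 0u;
  return out;
}

// Same contents as linspace<imat>(1, 9, 9) and linspace<imat>(9, 1, 9)
const std::vector<int> AA = {1, 2, 3, 4, 5, 6, 7, 8, 9};
const std::vector<int> BB = {9, 8, 7, 6, 5, 4, 3, 2, 1};
```

Only the last five positions satisfy AA >= BB, so the result starts with four zeros followed by five ones.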
Member Functions & Variables
attributes
.n_rows | number of rows; present in Mat, Col, and Row
.n_cols | number of columns; present in Mat, Col, and Row
.n_elem | total number of elements; present in Mat, Col, and Row
element access via (), [] and .at()
- Provide access to individual elements in a Mat, Col, or Row
(i) | For fvec and frowvec, access the element stored at index i.
  For fmat, access the element stored at index i under the assumption of a flat layout, with column-major ordering of data (i.e. column by column).
  An exception is thrown if the requested element is out of bounds.
.at(i) or [i] | As for (i), but without a bounds check; not recommended; see the caveats below
(r,c) | For fmat, access the element stored at row r and column c.
  An exception is thrown if the requested element is out of bounds.
.at(r,c) | As for (r,c), but without a bounds check; not recommended; see the caveats below
- Important: every element access involves a transfer between GPU memory and CPU memory; therefore, for efficiency, avoid repeated element access when possible; see the Armadillo conversion guide for more details and suggestions.
- The indices of elements are specified via the uword type, which is a typedef for an unsigned integer type.
- Caveats:
  - accessing elements without bounds checks is slightly faster, but is not recommended until your code has been thoroughly debugged
  - indexing in C++ starts at 0
  - accessing elements via [r,c] does not work correctly in C++; instead use (r,c)
- Examples:
fmat M(10, 10);
M.randu();
M(9, 9) = 123.0;
float x = M(1, 2);
fvec v(10);
v.randu();
v(9) = 123.0;
float y = v(0);
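For fmat, (i) uses the flat column-major layout, so element (r,c) of a matrix with n_rows rows sits at flat index r + c*n_rows. The helpers below (linear_index and row_col_from_index are hypothetical names; no Bandicoot calls are involved) sketch that arithmetic in plain C++; for the 10x10 matrix M above, M(1, 2) and M(21) would refer to the same element.

```cpp
#include <cassert>
#include <cstddef>

// Flat index of element (r, c) in a column-major matrix with n_rows rows:
// columns are stored one after another, so column c starts at c * n_rows.
std::size_t linear_index(std::size_t r, std::size_t c, std::size_t n_rows)
{
  return r + c * n_rows;
}

// Inverse mapping: recover the (row, column) pair from a flat index i.
void row_col_from_index(std::size_t i, std::size_t n_rows,
                        std::size_t& r, std::size_t& c)
{
  r = i % n_rows;
  c = i / n_rows;
}
```

Computing indices like this happens entirely on the CPU; only the subsequent element access transfers data to or from the GPU.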
.zeros() | (member function of Mat, Col, and Row)
.zeros( n_elem ) | (member function of Col and Row)
.zeros( n_rows, n_cols ) | (member function of Mat)
.ones() | (member function of Mat, Col, and Row)
.ones( n_elem ) | (member function of Col and Row)
.ones( n_rows, n_cols ) | (member function of Mat)
.eye()
.eye( n_rows, n_cols )
.randu() | (member function of Mat, Col, and Row)
.randu( n_elem ) | (member function of Col and Row)
.randu( n_rows, n_cols ) | (member function of Mat)
.randn() | (member function of Mat, Col, and Row)
.randn( n_elem ) | (member function of Col and Row)
.randn( n_rows, n_cols ) | (member function of Mat)
.fill( value )
.clamp( min_value, max_value )
.set_size( n_elem ) | (member function of Col and Row)
.set_size( n_rows, n_cols ) | (member function of Mat)
.set_size( size(X) ) | (member function of Mat, Col, and Row)
.reshape( n_rows, n_cols ) | (member function of Mat)
.reshape( size(X) ) | (member function of Mat)
.resize( n_elem ) | (member function of Col and Row)
.resize( n_rows, n_cols ) | (member function of Mat)
.resize( size(X) ) | (member function of Mat, Col, and Row)
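The difference between .reshape() and .resize() is easiest to see on a small matrix. The CPU-only sketch below assumes Armadillo-style semantics, which this section does not spell out: reshape reuses the elements in their column-major order (padding with zeros), while resize keeps every element at its (row, column) position and zero-fills new positions. ColMajor, reshape_ref and resize_ref are hypothetical names, not part of Bandicoot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major matrix held in CPU memory, for illustration only.
struct ColMajor
{
  std::size_t n_rows, n_cols;
  std::vector<float> mem;  // n_rows * n_cols elements
  float at(std::size_t r, std::size_t c) const { return mem[r + c * n_rows]; }
};

// reshape-style (assumed): keep the flat column-major order of the elements;
// positions beyond the original element count are set to zero.
ColMajor reshape_ref(const ColMajor& X, std::size_t n_rows, std::size_t n_cols)
{
  ColMajor out{n_rows, n_cols, std::vector<float>(n_rows * n_cols, 0.0f)};
  const std::size_t n = std::min(X.mem.size(), out.mem.size());
  for (std::size_t i = 0; i < n; ++i) out.mem[i] = X.mem[i];
  return out;
}

// resize-style (assumed): keep each element at the same (row, column);
// new rows/columns are zero, rows/columns that no longer fit are dropped.
ColMajor resize_ref(const ColMajor& X, std::size_t n_rows, std::size_t n_cols)
{
  ColMajor out{n_rows, n_cols, std::vector<float>(n_rows * n_cols, 0.0f)};
  for (std::size_t c = 0; c < std::min(X.n_cols, n_cols); ++c)
    for (std::size_t r = 0; r < std::min(X.n_rows, n_rows); ++r)
      out.mem[r + c * n_rows] = X.at(r, c);
  return out;
}
```

Growing the 2x2 matrix [1 3; 2 4] to 3x3 shows the contrast: after reshape, position (0,1) holds the fourth stored element (4), while after resize it still holds 3.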
.reset()
submatrix views
- A collection of member functions of Mat, Col and Row classes that provide read/write access to submatrix views
- contiguous views for matrix X:
X.col( col_number )
X.row( row_number )
X.cols( first_col, last_col )
X.rows( first_row, last_row )
X.submat( first_row, first_col, last_row, last_col )
X( span(first_row, last_row), span(first_col, last_col) )
X( first_row, first_col, size(n_rows, n_cols) )
X( first_row, first_col, size(Y) ) [ Y is a matrix ]
X( span(first_row, last_row), col_number )
X( row_number, span(first_col, last_col) )
X.head_cols( number_of_cols )
X.head_rows( number_of_rows )
X.tail_cols( number_of_cols )
X.tail_rows( number_of_rows )
- contiguous views for vector V:
V.subvec( first_index, last_index )
V.subvec( first_index, size(W) ) [ W is a vector ]
- related matrix views (documented separately)
- Instances of span(start,end) can be replaced by span::all to indicate the entire range
- Examples:
fmat A(5, 10);
A.zeros();
A.submat( 0,1, 2,3 ) = randu<fmat>(3, 3);
A( span(0,2), span(1,3) ) = randu<fmat>(3, 3);
A( 0,1, size(3,3) ) = randu<fmat>(3, 3);
fmat B = A.submat( 0,1, 2,3 );
fmat C = A( span(0,2), span(1,3) );
fmat D = A( 0,1, size(3,3) );
A.col(1) = randu<fmat>(5,1);
A(span::all, 1) = randu<fmat>(5,1);
fvec a(10);
a.randu();
a.subvec(a.n_elem - 5, a.n_elem - 1) += 123.0;
A.col(2).subvec(0, 2) += 123;
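In the examples above, A.submat( 0,1, 2,3 ) and A( 0,1, size(3,3) ) select the same 3x3 block: with inclusive bounds, the block has last_row - first_row + 1 rows and last_col - first_col + 1 columns. The plain C++ sketch below (submat_ref and submat_sized_ref are hypothetical helpers working on CPU memory, with no Bandicoot calls) makes that correspondence explicit; unlike a Bandicoot view, it copies the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the block with inclusive bounds [first_row, last_row] x [first_col, last_col]
// out of a column-major buffer that has n_rows rows. The result is column-major too.
std::vector<float> submat_ref(const std::vector<float>& mem, std::size_t n_rows,
                              std::size_t first_row, std::size_t first_col,
                              std::size_t last_row, std::size_t last_col)
{
  std::vector<float> out;
  for (std::size_t c = first_col; c <= last_col; ++c)
    for (std::size_t r = first_row; r <= last_row; ++r)
      out.push_back(mem[r + c * n_rows]);
  return out;
}

// Size-based form (first_row, first_col, size(sub_rows, sub_cols));
// assumes sub_rows >= 1 and sub_cols >= 1.
std::vector<float> submat_sized_ref(const std::vector<float>& mem, std::size_t n_rows,
                                    std::size_t first_row, std::size_t first_col,
                                    std::size_t sub_rows, std::size_t sub_cols)
{
  return submat_ref(mem, n_rows, first_row, first_col,
                    first_row + sub_rows - 1, first_col + sub_cols - 1);
}
```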
.get_dev_mem()
.get_dev_mem( synchronise )
.diag()
.diag( k )
.t()
.st()
.eval()
.is_empty()
.is_vec()
.is_colvec()
.is_rowvec()
- Member functions of Mat
- .is_vec():
  - returns true if the matrix can be interpreted as a vector (either column or row vector)
  - returns false if the matrix does not have exactly one column or one row
- .is_colvec():
  - returns true if the matrix can be interpreted as a column vector
  - returns false if the matrix does not have exactly one column
- .is_rowvec():
  - returns true if the matrix can be interpreted as a row vector
  - returns false if the matrix does not have exactly one row
- Caveat: do not assume that the vector has elements if these functions return true; it is possible to have an empty vector (eg. 0x1)
- Examples:
fmat A = randu<fmat>(1, 5);
fmat B = randu<fmat>(5, 1);
fmat C = randu<fmat>(5, 5);
cout << A.is_vec() << endl;
cout << B.is_vec() << endl;
cout << C.is_vec() << endl;
.is_square()
.print()
.print( header )
.print( stream )
.print( stream, header )
- Member functions of Mat, Col, and Row
- Print the contents of an object to the std::cout stream (default), or a user specified stream, with an optional header string
- Objects can also be printed using the << stream operator
- Examples:
fmat A = randu<fmat>(5, 5);
fmat B = randu<fmat>(6, 6);
A.print();
// print a transposed version of A
A.t().print();
// "B:" is the optional header line
B.print("B:");
cout << A << endl;
cout << "B:" << endl;
cout << B << endl;
.raw_print()
.raw_print( header )
.raw_print( stream )
.raw_print( stream, header )
Generated Vectors / Matrices
linspace( start, end )
linspace( start, end, N )
- Generate a vector with N elements; the values of the elements are linearly spaced from start to (and including) end
- The argument N is optional; by default N = 100
- Usage:
  - fvec v = linspace(start, end, N)
  - vector_type v = linspace<vector_type>(start, end, N)
- Caveat: for N = 1, the generated vector will have a single element equal to end
- Examples:
fvec a = linspace(0, 5, 6);
frowvec b = linspace<frowvec>(5, 0, 6);
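The linspace rule can be written as v[i] = start + i*(end - start)/(N - 1) for i = 0, ..., N-1. The sketch below is a CPU reference in plain C++ (linspace_ref is a hypothetical name, not a Bandicoot function) that follows the documented default N = 100 and the N = 1 caveat; the exact rounding Bandicoot produces on the GPU may differ slightly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for the documented linspace semantics: N equally spaced values
// from start to end inclusive; for N == 1 the single element equals end.
std::vector<double> linspace_ref(double start, double end, std::size_t N = 100)
{
  std::vector<double> v(N);
  if (N == 0) return v;
  if (N == 1) { v[0] = end; return v; }
  const double step = (end - start) / static_cast<double>(N - 1);
  for (std::size_t i = 0; i < N; ++i)
    v[i] = start + step * static_cast<double>(i);
  v[N - 1] = end;  // pin the last element exactly to end
  return v;
}
```

With these definitions, linspace_ref(0, 5, 6) gives 0, 1, 2, 3, 4, 5 and linspace_ref(5, 0, 6) gives the same values in reverse, matching the two examples above.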
eye( n_rows, n_cols )
eye( size(X) )
ones( n_elem )
ones( n_rows, n_cols )
ones( size(X) )
- Generate a vector or matrix with all elements set to one
- Usage:
  - vector_type v = ones<vector_type>( n_elem )
  - matrix_type X = ones<matrix_type>( n_rows, n_cols )
  - matrix_type Y = ones<matrix_type>( size(X) )
- Examples:
fvec v = ones(10);
uvec u = ones<uvec>(10);
frowvec r = ones<frowvec>(10);
fmat A = ones(5,6);
imat B = ones<imat>(5,6);
umat C = ones<umat>(5,6);
zeros( n_elem )
zeros( n_rows, n_cols )
zeros( size(X) )
- Generate a vector or matrix with all elements set to zero
- Usage:
  - vector_type v = zeros<vector_type>( n_elem )
  - matrix_type X = zeros<matrix_type>( n_rows, n_cols )
  - matrix_type Y = zeros<matrix_type>( size(X) )
- Examples:
fvec v = zeros(10);
uvec u = zeros<uvec>(10);
frowvec r = zeros<frowvec>(10);
fmat A = zeros(5,6);
imat B = zeros<imat>(5,6);
umat C = zeros<umat>(5,6);
randu( n_elem )
randu( n_rows, n_cols )
randu( size(X) )
- Generate a vector or matrix with the elements set to random floating point values uniformly distributed in the [0,1] interval
- Usage:
  - vector_type v = randu<vector_type>( n_elem )
  - matrix_type X = randu<matrix_type>( n_rows, n_cols )
- To change the RNG seed, use the coot_rng::set_seed(value) or coot_rng::set_seed_random() functions
- Caveat: to generate a matrix with random integer values instead of floating point values, use randi() instead
- Examples:
fvec v1 = randu(5);
frowvec r1 = randu<frowvec>(5);
fmat A1 = randu(5, 6);
mat B1 = randu<mat>(5, 6);
mat B2 = randu<mat>(5, 6, distr_param(10,20));
coot_rng::set_seed_random();
coot_rng::set_seed(42);
randn( n_elem )
randn( n_elem, distr_param(mu,sd) )
randn( n_rows, n_cols )
randn( n_rows, n_cols, distr_param(mu,sd) )
randn( size(X) )
randn( size(X), distr_param(mu,sd) )
- Generate a vector or matrix with the elements set to random values with normal / Gaussian distribution, parameterised by mean mu and standard deviation sd
- The default distribution parameters are mu = 0 and sd = 1
- Usage:
  - vector_type v = randn<vector_type>( n_elem )
  - vector_type v = randn<vector_type>( n_elem, distr_param(mu,sd) )
  - matrix_type X = randn<matrix_type>( n_rows, n_cols )
  - matrix_type X = randn<matrix_type>( n_rows, n_cols, distr_param(mu,sd) )
- To change the RNG seed, use the coot_rng::set_seed(value) or coot_rng::set_seed_random() functions
- Examples:
fvec v1 = randn(5);
fvec v2 = randn(5, distr_param(10,5));
frowvec r1 = randn<frowvec>(5);
frowvec r2 = randn<frowvec>(5, distr_param(10,5));
fmat A1 = randn(5, 6);
fmat A2 = randn(5, 6, distr_param(10,5));
mat B1 = randn<mat>(5, 6);
mat B2 = randn<mat>(5, 6, distr_param(10,5));
coot_rng::set_seed_random();
coot_rng::set_seed(42);
randi( n_elem )
randi( n_elem, distr_param(a,b) )
randi( n_rows, n_cols )
randi( n_rows, n_cols, distr_param(a,b) )
randi( size(X) )
randi( size(X), distr_param(a,b) )
- Generate a vector or matrix with the elements set to random integer values uniformly distributed in the [a,b] interval
- The default distribution parameters are a = 0 and b = maximum_int
- Usage:
  - vector_type v = randi<vector_type>( n_elem )
  - vector_type v = randi<vector_type>( n_elem, distr_param(a,b) )
  - matrix_type X = randi<matrix_type>( n_rows, n_cols )
  - matrix_type X = randi<matrix_type>( n_rows, n_cols, distr_param(a,b) )
- To change the RNG seed, use the coot_rng::set_seed(value) or coot_rng::set_seed_random() functions
- Caveat: to generate a matrix with random floating point values (ie. float or double) instead of integers, use randu() instead
- Examples:
imat A1 = randi(5, 6);
imat A2 = randi(5, 6, distr_param(-10, +20));
fmat B1 = randi<fmat>(5, 6);
fmat B2 = randi<fmat>(5, 6, distr_param(-10, +20));
coot_rng::set_seed_random();
coot_rng::set_seed(42);
Functions of Vectors / Matrices
abs( X )
accu( X )
all( V )
all( X )
all( X, dim )
any( V )
any( X )
any( X, dim )
as_scalar( expression )
- Evaluate an expression that results in a 1x1 matrix, followed by converting the 1x1 matrix to a pure scalar
- Optimised expression evaluations are automatically used when a binary or trinary expression is given (ie. 2 or 3 terms)
- Examples:
frowvec r = randu<frowvec>(5);
fcolvec q = randu<fcolvec>(5);
fmat X(5, 5, fill::randu);
float a = as_scalar(r*q);
float b = as_scalar(r*X*q);
float c = as_scalar(r*diagmat(X)*q);
float d = as_scalar(r*inv(diagmat(X))*q);
clamp( X, min_val, max_val )
- Create a copy of X with each element clamped to the [min_val, max_val] interval; any value lower than min_val will be set to min_val, and any value higher than max_val will be set to max_val
- Examples:
fmat A = randu<fmat>(5, 5);
fmat B = clamp(A, 0.2, 0.8);
fmat C = clamp(A, min(min(A)), 0.8);
fmat D = clamp(A, 0.2, max(max(A)));
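Element by element, clamp() computes min(max(x, min_val), max_val). A minimal CPU sketch in plain C++ (clamp_ref is a hypothetical helper, not part of Bandicoot) that follows the description above and, like clamp(), returns a modified copy:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU reference for clamp(X, min_val, max_val): values below min_val become
// min_val, values above max_val become max_val; X is taken by value, so the
// caller's data is left unchanged. Assumes min_val <= max_val.
std::vector<float> clamp_ref(std::vector<float> X, float min_val, float max_val)
{
  for (float& x : X)
    x = std::min(std::max(x, min_val), max_val);
  return X;
}
```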
conv_to< type >::from( X )
cross( A, B )
val = det( A ) | (form 1)
det( val, A ) | (form 2)
diagmat( V )
diagmat( V, k )
diagmat( X )
diagmat( X, k )
- Generate a diagonal matrix from vector V or matrix X
- Given vector V, generate a square matrix with the k-th diagonal containing a copy of the vector; all other elements are set to zero
- Given matrix X, generate a matrix with the k-th diagonal containing a copy of the k-th diagonal of X; all other elements are set to zero
- If X is an expression, the evaluation of the expression aims to calculate only the diagonal elements
- The argument k is optional; by default the main diagonal is used (k = 0)
- For k > 0, the k-th super-diagonal is used (above main diagonal, towards top-right corner)
- For k < 0, the k-th sub-diagonal is used (below main diagonal, towards bottom-left corner)
- Examples:
fmat A = randu<fmat>(5, 5);
fmat B = diagmat(A);
fmat C = diagmat(A,1);
fvec v = randu<fvec>(5);
fmat D = diagmat(v);
fmat E = diagmat(v,1);
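An element (r, c) lies on the k-th diagonal exactly when c - r = k. The CPU sketch below (diagmat_ref is a hypothetical helper working on a column-major std::vector) applies that rule for the matrix form diagmat(X, k); it does not cover the vector form, whose output size this section does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for diagmat(X, k) on an n_rows x n_cols column-major matrix:
// element (r, c) is kept when c - r == k (k > 0: super-diagonal,
// k < 0: sub-diagonal); every other element is set to zero.
std::vector<float> diagmat_ref(const std::vector<float>& X, std::size_t n_rows,
                               std::size_t n_cols, long k = 0)
{
  std::vector<float> out(n_rows * n_cols, 0.0f);
  for (std::size_t c = 0; c < n_cols; ++c)
    for (std::size_t r = 0; r < n_rows; ++r)
      if (static_cast<long>(c) - static_cast<long>(r) == k)
        out[r + c * n_rows] = X[r + c * n_rows];
  return out;
}
```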
diagvec( X )
diagvec( X, k )
dot( A, B )
find( X )
find( X, k )
find( X, k, s )
- Return a column vector containing the indices of elements of X that are non-zero or satisfy a relational condition
- The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)
- X is interpreted as a vector, with column-by-column ordering of the elements of X
- Relational operators can be used instead of X, eg. A > 0.5
- If k = 0 (default), return the indices of all non-zero elements, otherwise return at most k of their indices
- If s = "first" (default), return at most the first k indices of the non-zero elements
- If s = "last", return at most the last k indices of the non-zero elements
- Caveat: to clamp values to an interval, clamp() is more efficient
- Examples:
fmat A = randu<fmat>(5, 5);
fmat B = randu<fmat>(5, 5);
uvec q1 = find(A > B);
uvec q2 = find(A > 0.5);
uvec q3 = find(A > 0.5, 3, "last");
A.elem( find(A > 0.5) ).ones();
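The k and s arguments act as a filter applied after collecting all matching indices. Below is a CPU reference in plain C++ (find_ref is a hypothetical helper; a relational expression such as A > 0.5 would first become a 0/1 vector) that returns the indices in ascending order, an assumption that mirrors Armadillo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CPU reference for find(X, k, s). X is read as a flat vector in column-major order.
// Returns the indices of non-zero elements in ascending order: all of them if k == 0,
// otherwise at most the first k (s == "first") or the last k (s == "last").
std::vector<std::size_t> find_ref(const std::vector<float>& X, std::size_t k = 0,
                                  const std::string& s = "first")
{
  std::vector<std::size_t> idx;
  for (std::size_t i = 0; i < X.size(); ++i)
    if (X[i] != 0.0f) idx.push_back(i);

  if (k == 0 || k >= idx.size()) return idx;

  const auto count = static_cast<std::ptrdiff_t>(k);
  if (s == "last") return std::vector<std::size_t>(idx.end() - count, idx.end());
  return std::vector<std::size_t>(idx.begin(), idx.begin() + count);
}
```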
find_finite( X )
find_nonfinite( X )
find_nan( X )
join_rows( A, B )
join_rows( A, B, C )
join_rows( A, B, C, D )
join_cols( A, B )
join_cols( A, B, C )
join_cols( A, B, C, D )
join_horiz( A, B )
join_horiz( A, B, C )
join_horiz( A, B, C, D )
join_vert( A, B )
join_vert( A, B, C )
join_vert( A, B, C, D )
- join_rows() and join_horiz(): horizontal concatenation; join the corresponding rows of the given matrices; the given matrices must have the same number of rows
- join_cols() and join_vert(): vertical concatenation; join the corresponding columns of the given matrices; the given matrices must have the same number of columns
- Examples:
fmat A = randu<fmat>(4, 5);
fmat B = randu<fmat>(4, 6);
fmat C = randu<fmat>(6, 5);
fmat AB = join_rows(A, B);
fmat AC = join_cols(A, C);
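The size rules for concatenation reduce to dimension arithmetic: join_rows(A, B) has A.n_rows rows and A.n_cols + B.n_cols columns, while join_cols(A, B) has A.n_rows + B.n_rows rows and A.n_cols columns. The sketch below checks this in plain C++ for the matrices in the example above; the helper names are hypothetical, and throwing std::logic_error on a mismatch is an assumption based on the behaviour documented for the overloaded operators.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>

using Dims = std::pair<std::size_t, std::size_t>;  // (n_rows, n_cols)

// Result size of join_rows / join_horiz: row counts must match, column counts add up.
Dims join_rows_size(Dims A, Dims B)
{
  if (A.first != B.first) throw std::logic_error("join_rows: row counts differ");
  return {A.first, A.second + B.second};
}

// Result size of join_cols / join_vert: column counts must match, row counts add up.
Dims join_cols_size(Dims A, Dims B)
{
  if (A.second != B.second) throw std::logic_error("join_cols: column counts differ");
  return {A.first + B.first, A.second};
}
```

For A (4x5), B (4x6) and C (6x5) this gives AB as 4x11 and AC as 10x5.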
min( V )
min( M )
min( M, dim )
min( Q )
min( Q, dim )
min( A, B )
max( V )
max( M )
max( M, dim )
max( Q )
max( Q, dim )
max( A, B )
- For vector V, return the extremum value
- For matrix M, return the extremum value for each column (dim = 0), or each row (dim = 1)
- The dim argument is optional; by default dim = 0 is used
- For two matrices A and B, return a matrix containing element-wise extremum values
- Examples:
fcolvec v = randu<fcolvec>(10);
float x = max(v);
fmat M = randu<fmat>(10, 10);
frowvec a = max(M);
frowvec b = max(M, 0);
fcolvec c = max(M, 1);
fmat X = randu<fmat>(5, 6);
fmat Y = randu<fmat>(5, 6);
fmat Z = coot::max(X, Y);
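To make the dim convention concrete, here is a CPU reference in plain C++ for max() on a column-major buffer (max_ref is a hypothetical name; min() would be symmetric). The reduction form assumes a non-empty matrix; the two-argument overload covers the element-wise max(A, B).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for max(M, dim) on a non-empty n_rows x n_cols column-major matrix.
//   dim == 0: maximum of each column (n_cols values; a row vector in Bandicoot)
//   dim == 1: maximum of each row    (n_rows values; a column vector in Bandicoot)
std::vector<float> max_ref(const std::vector<float>& M, std::size_t n_rows,
                           std::size_t n_cols, int dim = 0)
{
  std::vector<float> out;
  if (dim == 0)
  {
    for (std::size_t c = 0; c < n_cols; ++c)
    {
      float m = M[c * n_rows];  // first element of column c
      for (std::size_t r = 1; r < n_rows; ++r) m = std::max(m, M[r + c * n_rows]);
      out.push_back(m);
    }
  }
  else
  {
    for (std::size_t r = 0; r < n_rows; ++r)
    {
      float m = M[r];  // first element of row r
      for (std::size_t c = 1; c < n_cols; ++c) m = std::max(m, M[r + c * n_rows]);
      out.push_back(m);
    }
  }
  return out;
}

// CPU reference for the element-wise max(A, B) of two equally sized objects.
std::vector<float> max_ref(const std::vector<float>& A, const std::vector<float>& B)
{
  std::vector<float> out(A.size());
  for (std::size_t i = 0; i < A.size(); ++i) out[i] = std::max(A[i], B[i]);
  return out;
}
```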
norm( X )
norm( X, p )
normalise( V )
normalise( V, p )
normalise( X )
normalise( X, p )
normalise( X, p, dim )
- For vector V, return its normalised version (ie. having unit p-norm)
- For matrix X, return its normalised version, where each column (dim = 0) or row (dim = 1) has been normalised to have unit p-norm
- The p argument is optional; by default p = 2 is used
- The dim argument is optional; by default dim = 0 is used
- Examples:
fvec A = randu<fvec>(10);
fvec B = normalise(A);
fvec C = normalise(A, 1);
fmat X = randu<fmat>(5, 6);
fmat Y = normalise(X);
fmat Z = normalise(X, 2, 1);
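Normalising to unit p-norm means dividing every element by ||V||_p = (sum_i |v_i|^p)^(1/p); for a matrix, the same operation is applied to each column (dim = 0) or each row (dim = 1). A CPU sketch in plain C++ for the vector case follows; normalise_ref is hypothetical, and leaving an all-zero vector unchanged is a choice made for this sketch, not documented behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference for normalise(V, p): divide each element by the p-norm of V,
// so that the result has unit p-norm.
std::vector<double> normalise_ref(std::vector<double> V, double p = 2.0)
{
  double acc = 0.0;
  for (double v : V) acc += std::pow(std::abs(v), p);
  const double norm = std::pow(acc, 1.0 / p);
  if (norm > 0.0)  // an all-zero vector is returned unchanged in this sketch
    for (double& v : V) v /= norm;
  return V;
}
```

For V = (3, 4), the 2-norm is 5, giving (0.6, 0.8); the 1-norm is 7, giving (3/7, 4/7).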
pow( A, scalar )
- Element-wise power operation: raise all elements in A to the power denoted by the given scalar
- Caveat: to raise all elements to the power 2, use square() instead
- Examples:
fmat A = randu<fmat>(5, 6);
fmat B = pow(A, 3.45);
frowvec R = randu<frowvec>(6);
frowvec S = pow(R, -1.0);
repmat( A, num_copies_per_row, num_copies_per_col )
reshape( X, n_rows, n_cols )
reshape( X, size(Y) )
resize( X, n_rows, n_cols )
resize( X, size(Y) )
size( X )
size( n_rows, n_cols )
- Obtain the dimensions of object X, or explicitly specify the dimensions
- The dimensions can be used in conjunction with object constructors, generated matrices (eg. zeros(), ones()), member functions such as .randu(), and submatrix views
- The dimensions support simple arithmetic operations; they can also be printed and compared for equality/inequality
- Caveat: to prevent interference from std::size() in C++17, preface Bandicoot's size() with the coot namespace qualification, eg. coot::size(X)
- Examples:
fmat A(5,6);
fmat B = zeros<fmat>(size(A));
fmat C;
C.randu(size(A));
fmat D = ones<fmat>(size(A));
fmat E = ones<fmat>(10, 20);
E(3, 4, size(C)) = C;
fmat F( size(A) + size(E) );
fmat G( size(A) * 2 );
cout << "size of A: " << size(A) << endl;
bool is_same_size = (size(A) == size(E));
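The examples above add two sizes and multiply a size by a scalar. Assuming, as in Armadillo, that these operations act on each dimension separately, the struct below (Dims2 is a hypothetical stand-in for the object returned by size(), written in plain C++) sketches that behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the object returned by size().
struct Dims2
{
  std::size_t n_rows;
  std::size_t n_cols;
};

// Assumed semantics: arithmetic is applied to each dimension separately.
Dims2 operator+(Dims2 a, Dims2 b) { return {a.n_rows + b.n_rows, a.n_cols + b.n_cols}; }
Dims2 operator*(Dims2 a, std::size_t k) { return {a.n_rows * k, a.n_cols * k}; }

bool operator==(Dims2 a, Dims2 b) { return a.n_rows == b.n_rows && a.n_cols == b.n_cols; }
bool operator!=(Dims2 a, Dims2 b) { return !(a == b); }
```

Under that assumption, F in the example above would be 15x26 and G would be 10x12.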
sort( V )
sort( V, sort_direction )
sort( X )
sort( X, sort_direction )
sort( X, sort_direction, dim )
sort_index( X )
sort_index( X, sort_direction )
stable_sort_index( X )
stable_sort_index( X, sort_direction )
sum( V )
sum( M )
sum( M, dim )
- For vector V, return the sum of all elements
- For matrix M, return the sum of elements in each column (dim = 0), or each row (dim = 1)
- The dim argument is optional; by default dim = 0 is used
- Caveat: to get a sum of all the elements regardless of the object type (i.e. vector or matrix), use accu() instead
- Examples:
fcolvec v = randu<fcolvec>(10);
float x = sum(v);
fmat M = randu<fmat>(10, 10);
frowvec a = sum(M);
frowvec b = sum(M, 0);
fcolvec c = sum(M, 1);
float y = accu(M);
symmatu( A )
symmatl( A )
trace( X )
trans( A )
strans( A )
vectorise( X )
vectorise( X, dim )
miscellaneous element-wise functions:
exp, exp2, exp10, trunc_exp
log, log2, log10, trunc_log
square, sqrt
floor, ceil, round, trunc
erf, erfc
sign, lgamma
trigonometric element-wise functions (cos, sin, tan, ...)
Decompositions, Factorisations, and Inverses
R = chol( X ) | (form 1)
chol( R, X ) | (form 2)
vec eigval = eig_sym( X )
eig_sym( eigval, X )
eig_sym( eigval, eigvec, X )
- Eigen decomposition of symmetric/hermitian matrix X
- The eigenvalues and corresponding eigenvectors are stored in eigval and eigvec, respectively
- The eigenvalues are in ascending order
- The eigenvectors are stored as column vectors
- If X is not square sized, a std::logic_error exception is thrown
- If the decomposition fails:
  - eigval = eig_sym(X) resets eigval and throws a std::runtime_error exception
  - eig_sym(eigval,X) resets eigval and returns a bool set to false (exception is not thrown)
  - eig_sym(eigval,eigvec,X) resets eigval & eigvec and returns a bool set to false (exception is not thrown)
- Caveats:
  - there is no explicit check whether X is symmetric/hermitian
  - if eigenvectors are not needed, it is more efficient to use a form that does not compute them (i.e. eig_sym(eigval, X))
- Examples:
fmat A = randu<fmat>(50, 50);
fmat B = A.t()*A;
fvec eigval;
fmat eigvec;
eig_sym(eigval, eigvec, B);
See also:
lu( L, U, P, X )
lu( L, U, X )
-
Lower-upper decomposition (with partial pivoting) of matrix X
-
The first form provides
a lower-triangular matrix L,
an upper-triangular matrix U,
and a permutation matrix P,
such that P.t()*L*U = X
-
The second form provides permuted L and U, such that L*U = X;
note that in this case L is generally not lower-triangular
-
If the decomposition fails:
- lu(L,U,P,X) resets L, U, P and returns a bool set to false (exception is not thrown)
- lu(L,U,X) resets L, U and returns a bool set to false (exception is not thrown)
-
Examples:
fmat A = randu<fmat>(5, 5);
fmat L, U, P;
lu(L, U, P, A);
fmat B = P.t() * L * U;
See also:
B = pinv( A )
B = pinv( A, tolerance )
pinv( B, A )
pinv( B, A, tolerance )
See also:
s = svd( X )
svd( s, X )
svd( U, s, V, X )
-
Singular value decomposition of matrix X into vector of singular values s and matrices of left/right singular vectors U, V
- If X is square, it can be reconstructed using X = U*diagmat(s)*V.t()
-
The singular values are in descending order
-
If the decomposition fails, the output objects are reset and:
- s = svd(X) resets s and throws a std::runtime_error exception
- svd(s,X) resets s and returns a bool set to false (exception is not thrown)
- svd(U,s,V,X) resets U, s, V and returns a bool set to false (exception is not thrown)
-
Examples:
fmat X = randu<fmat>(5, 5);
fmat U;
fvec s;
fmat V;
svd(U, s, V, X);
See also:
Signal & Image Processing
conv( A, B )
conv( A, B, shape )
-
1D convolution of vectors A and B
-
The orientation of the result vector is the same as the orientation of A (ie. either column or row vector)
-
The shape argument is optional; it is one of:
"full" | = | return the full convolution (default setting), with the size equal to A.n_elem + B.n_elem - 1 |
"same" | = | return the central part of the convolution, with the same size as vector A |
-
The convolution operation is also equivalent to FIR filtering
-
Examples:
fvec A = randu<fvec>(256);
fvec B = randu<fvec>(16);
fvec C = conv(A, B);
fvec D = conv(A, B, "same");
See also:
conv2( A, B )
conv2( A, B, shape )
-
2D convolution of matrices A and B
-
The shape argument is optional; it is one of:
"full" | = | return the full convolution (default setting), with the size equal to size(A) + size(B) - 1 |
"same" | = | return the central part of the convolution, with the same size as matrix A |
-
Examples:
fmat A = randu<fmat>(256, 256);
fmat B = randu<fmat>(16, 16);
fmat C = conv2(A, B);
fmat D = conv2(A, B, "same");
See also:
Statistics
mean, median, stddev, var, range
See also:
cov( X, Y )
cov( X, Y, norm_type )
cov( X )
cov( X, norm_type )
-
For two matrix arguments X and Y,
if each row of X and Y is an observation and each column is a variable,
the (i,j)-th entry of cov(X,Y) is the covariance between the i-th variable in X and the j-th variable in Y
-
For vector arguments, the type of vector is ignored and each element in the vector is treated as an observation
-
For matrices, X and Y must have the same dimensions
-
For vectors, X and Y must have the same number of elements
-
cov(X) is equivalent to cov(X, X)
-
The norm_type argument is optional; by default norm_type = 0 is used
-
The norm_type argument controls the type of normalisation used, with N denoting the number of observations:
-
for norm_type = 0, normalisation is done using N-1,
providing the best unbiased estimation of the covariance matrix (if the observations are from a normal distribution)
-
for norm_type = 1, normalisation is done using N,
which provides the second moment matrix of the observations about their mean
-
Examples:
fmat X = randu<fmat>(4, 5);
fmat Y = randu<fmat>(4, 5);
fmat C = cov(X, Y);
fmat D = cov(X, Y, 1);
See also:
cor( X, Y )
cor( X, Y, norm_type )
cor( X )
cor( X, norm_type )
-
For two matrix arguments X and Y,
if each row of X and Y is an observation and each column is a variable,
the (i,j)-th entry of cor(X,Y) is the correlation coefficient between the i-th variable in X and the j-th variable in Y
-
For vector arguments, the type of vector is ignored and each element in the vector is treated as an observation
-
For matrices, X and Y must have the same dimensions
-
For vectors, X and Y must have the same number of elements
-
cor(X) is equivalent to cor(X, X)
-
The norm_type argument is optional; by default norm_type = 0 is used
-
The norm_type argument controls the type of normalisation used, with N denoting the number of observations:
- for norm_type = 0, normalisation is done using N-1
- for norm_type = 1, normalisation is done using N
-
Examples:
fmat X = randu<fmat>(4, 5);
fmat Y = randu<fmat>(4, 5);
fmat R = cor(X, Y);
fmat S = cor(X, Y, 1);
See also:
Miscellaneous
backend configuration
- Bandicoot can use either CUDA or OpenCL as a hardware backend
- To enable CUDA or OpenCL, set the
COOT_USE_CUDA or COOT_USE_OPENCL macros in the Bandicoot configuration
- If both backends are enabled, select the default backend by setting the
COOT_DEFAULT_BACKEND macro to the desired backend (e.g. #define COOT_DEFAULT_BACKEND CL_BACKEND )
- By default, at the time of first usage, Bandicoot will automatically initialise to use the first available device with the default backend
- Bandicoot can also be manually initialised using the
coot_init() function:
coot_init( ) | | default initialisation |
coot_init( print_info ) | | initialise to default backend, optionally printing information about the chosen GPU device |
coot_init( "opencl", print_info ) | | initialise to OpenCL backend; COOT_USE_OPENCL must be enabled |
coot_init( "opencl", print_info, platform_id, device_id ) | | use a specific OpenCL platform ID and device ID |
coot_init( "cuda", print_info ) | | initialise to CUDA backend; COOT_USE_CUDA must be enabled |
coot_init( "cuda", print_info, device_id ) | | use a specific CUDA device ID |
- coot_init() returns a boolean indicating whether or not initialisation was successful
- if print_info is set to true, information about the selected GPU device will be printed
- for the "opencl" initialisations, platform_id and device_id specify the desired OpenCL platform and device IDs; available platforms and devices can be listed using the clinfo command-line utility, available in most package managers: clinfo -l
- for the "cuda" initialisations, device_id specifies the desired CUDA device; available device IDs can be listed with the nvidia-smi command-line utility
-
Caveats:
- calling coot_init() manually must be done before any other Bandicoot operations
- coot_init() can only be called once
- if either caveat above is violated when calling coot_init(), a std::runtime_error exception will be thrown
-
At any time, all asynchronous operations can be forced to complete by calling coot_synchronise()
-
See also:
constants (pi, inf, eps, ...)
See also:
wall_clock
See also:
output streams
-
The default stream for printing matrices and cubes is std::cout; the stream can be changed via the COOT_COUT_STREAM define (see config.hpp)
-
The default stream for printing warnings and errors is std::cerr; the stream can be changed via the COOT_CERR_STREAM define (see config.hpp)
-
Whether warnings are printed is controlled by the
COOT_PRINT_ERRORS and COOT_DONT_PRINT_ERRORS defines; see config.hpp
-
The
COOT_DONT_PRINT_ERRORS define takes precedence over the COOT_PRINT_ERRORS define
-
See also:
uword, sword
- uword is a typedef for an unsigned integer type; it is used for matrix indices as well as all internal counters and loops
- sword is a typedef for a signed integer type
- The minimum width of both uword and sword is either 32 or 64 bits:
- the default width is 32 bits on 32-bit platforms
- the default width is 64 bits on 64-bit platforms
- on most systems, uword is a typedef for size_t
-
Caveat: the Bandicoot uword and sword types are not guaranteed to be the same as the Armadillo arma::uword and arma::sword types
- See also:
Examples of Matlab/Octave syntax and conceptually corresponding Bandicoot syntax
Matlab/Octave | | Bandicoot | | Notes |
| | | | |
A(1, 1) | | A(0, 0) | | indexing in Bandicoot starts at 0 |
A(k, k) | | A(k-1, k-1) | | |
| | | | |
size(A,1) | | A.n_rows | | read only |
size(A,2) | | A.n_cols | | |
numel(A) | | A.n_elem | | |
| | | | |
A(:, k) | | A.col(k) | | this is a conceptual example only; exact conversion from Matlab/Octave to Bandicoot syntax will require taking into account that indexing starts at 0 |
A(k, :) | | A.row(k) | | |
A(:, p:q) | | A.cols(p, q) | | |
A(p:q, :) | | A.rows(p, q) | | |
A(p:q, r:s) | | A( span(p,q), span(r,s) ) | | A( span(first_row, last_row), span(first_col, last_col) ) |
| | | | |
A' | | A.t() or trans(A) | | matrix transpose / Hermitian transpose |
| | | | |
A = zeros(size(A)) | | A.zeros() | | |
A = ones(size(A)) | | A.ones() | | |
A = zeros(k) | | A = zeros<fmat>(k,k) | | |
A = ones(k) | | A = ones<fmat>(k,k) | | |
| | | | |
A .* B | | A % B | | element-wise multiplication |
A ./ B | | A / B | | element-wise division |
A = A + 1; | | A++ | | |
A = A - 1; | | A-- | | |
| | | | |
X = A(:) | | X = vectorise(A) | | |
X = [ A B ] | | X = join_horiz(A,B) | | |
X = [ A; B ] | | X = join_vert(A,B) | | |
| | | | |
A | | cout << A << endl; or A.print("A ="); | | |
A = randn(2,3); B = randn(4,5); | | fmat A = randn<fmat>(2,3); fmat B = randn<fmat>(4,5); | | |
Armadillo/Bandicoot conversion guide
- Bandicoot is meant to be a GPU-accelerated linear algebra library that is API-compatible with Armadillo and thus can function as a drop-in replacement; however, due to the different architecture of the GPU and other constraints, it is not always a benefit to use Bandicoot instead of Armadillo
- The first run of any Bandicoot program requires compiling all Bandicoot kernel functions for the given device, which can be a time-consuming process; kernels are cached and subsequent runs will use the cache
- Upgrading Bandicoot versions may incur recompilation of kernels
- Using a new backend for the first time may incur recompilation of kernels
- Using a new device may incur recompilation of kernels
- For more information see the kernel cache documentation
- Where possible, use batch operations with Bandicoot; e.g., use
A += 1 instead of for (uword i = 0; i < A.n_elem; ++i) { A[i] += 1; }
- GPUs are best suited for operations on large matrices, so small matrices (e.g. fewer than 100 elements) may not show significant speedup
- Individual element access (such as
A.at(i, j) ) requires a transfer between the GPU and CPU; when adapting Armadillo code to Bandicoot, these should be avoided wherever possible
- If such operations cannot be avoided, consider temporarily transferring the entire Bandicoot matrix to CPU memory by creating an Armadillo matrix with conv_to<arma::fmat>::from() or similar
- For this reason, unlike Armadillo, Bandicoot does not provide iterators, as they would be inherently inefficient
- Consumer-level GPUs are not designed for intensive linear algebra operations and thus may not show significant speedup; the best results will be obtained with high-end hardware
- Most GPUs show better performance with 32-bit floating point elements (e.g. float instead of double), so using fmat instead of mat is recommended wherever possible
- If support needed for a conversion is not available, please file a bug report so that the support can be prioritised
example program
#include <iostream>
#include <bandicoot>
using namespace std;
using namespace coot;
int main()
{
fmat A = randu<fmat>(4, 5);
fmat B = randu<fmat>(4, 5);
cout << A * B.t() << endl;
return 0;
}
If the above program is stored as example.cpp,
under Linux and macOS it can be compiled using:
g++ example.cpp -o example -std=c++11 -O2 -lbandicoot
Bandicoot extensively uses template meta-programming,
so it's recommended to enable optimisation when compiling programs (eg. use the -O2 or -O3 options for GCC or clang)
See the Questions page for more info on compiling and linking
If coming from Armadillo, be sure to check the Armadillo/Bandicoot differences for advice on writing efficient code
See also the example program that comes with the Bandicoot archive
config.hpp
-
Bandicoot can be configured via editing the file include/bandicoot_bits/config.hpp
-
Specific functionality can be enabled or disabled by uncommenting or commenting out a particular #define, listed below.
-
Some options can also be specified by explicitly defining them before including the bandicoot header.
COOT_DONT_USE_WRAPPER | | Disable going through the run-time Bandicoot wrapper library (libbandicoot.so) when calling GPU-specific functions. Overrides COOT_USE_WRAPPER. You will need to directly link with GPU libraries (e.g. -lOpenCL -lclBLAS or similar, depending on backend configuration) |
COOT_USE_WRAPPER | | Enable use of the Bandicoot wrapper library, which allows linking against all enabled backends with -lbandicoot only. |
COOT_USE_OPENCL | | Enable use of OpenCL as a GPU backend. Note that at least one of COOT_USE_OPENCL or COOT_USE_CUDA must be enabled. OpenCL headers and clBLAS headers must be available on the system. |
COOT_USE_CUDA | | Enable use of CUDA as a GPU backend. Note that at least one of COOT_USE_OPENCL or COOT_USE_CUDA must be enabled. The CUDA toolkit must be available on the system. |
COOT_DEFAULT_BACKEND | | Set the default backend that Bandicoot will use. This is only necessary if multiple backends are enabled; that is, when both COOT_USE_OPENCL and COOT_USE_CUDA are enabled. This should be set to either CUDA_BACKEND or CL_BACKEND (e.g. #define COOT_DEFAULT_BACKEND CUDA_BACKEND ). See also the backend configuration documentation. |
COOT_USE_OPENMP | | Use OpenMP for parallelisation of some CPU-based parts of Bandicoot functionality. Automatically enabled when using a compiler which has OpenMP 3.1+ active (eg. the -fopenmp option for gcc and clang). Note: this may not have a noticeable effect on performance, since most Bandicoot operations use the CPU lightly or not at all. |
COOT_DONT_USE_OPENMP | | Disable use of OpenMP for parallelisation; overrides COOT_USE_OPENMP. |
COOT_KERNEL_CACHE_DIR | | If defined, specifies a custom directory to use for the kernel cache. Distribution packagers may choose to specify COOT_SYSTEM_KERNEL_CACHE_DIR, though it is overridden by COOT_KERNEL_CACHE_DIR if specified. |
COOT_BLAS_CAPITALS | | Use capitalised (uppercase) BLAS and LAPACK function names (eg. DGEMM vs dgemm) |
COOT_BLAS_UNDERSCORE | | Append an underscore to BLAS and LAPACK function names (eg. dgemm_ vs dgemm). Enabled by default. |
COOT_BLAS_LONG | | Use "long" instead of "int" when calling BLAS and LAPACK functions |
COOT_BLAS_LONG_LONG | | Use "long long" instead of "int" when calling BLAS and LAPACK functions |
COOT_NO_DEBUG | | Disable all run-time checks, including size conformance and bounds checks. NOT RECOMMENDED. DO NOT USE UNLESS YOU KNOW WHAT YOU ARE DOING AND ARE WILLING TO RISK THE DOWNSIDES. Keeping run-time checks enabled during development and deployment greatly aids in finding mistakes in your code. |
COOT_EXTRA_DEBUG | | Print out the trace of internal functions used for evaluating expressions. Not recommended for normal use. This is mainly useful for debugging the library. |
COOT_COUT_STREAM | | The default stream used for printing matrices and cubes by .print(). Must always be enabled. Defined to std::cout by default. |
COOT_CERR_STREAM | | The default stream used for printing warnings and errors. Must always be enabled. Defined to std::cerr by default. |
-
See also:
direct linking
- If
COOT_USE_WRAPPER is not defined (or COOT_DONT_USE_WRAPPER is defined), then Bandicoot will need to be linked against all dependencies of its backends
- Depending on configuration options, this can be a long list of dependencies; for this reason, enabling COOT_USE_WRAPPER is the default and is recommended
- Regardless of backend configuration, these libraries must always be linked against:
- If
COOT_USE_OPENCL is set (i.e. the OpenCL backend is enabled), these libraries must be linked against:
-lOpenCL (core OpenCL support)
-lclBLAS (clBLAS for BLAS operations)
- If
COOT_USE_CUDA is set (i.e. the CUDA backend is enabled), these libraries must be linked against:
-lcuda (core CUDA support)
-lcudart (CUDA runtime library)
-lnvrtc (runtime compilation of CUDA kernels)
-lcublas (cuBLAS for BLAS operations)
-lcusolver (cuSolverDn for decompositions and factorisations)
-lcurand (cuRand for random number generation)
kernel cache
- In order to perform GPU-based linear algebra, Bandicoot must first compile GPU kernel functions for a particular device
- The first time Bandicoot is run on a system, all GPU kernel functions will be compiled; this can take a long time (usually no more than 3-5 minutes)
- Compiled kernels are stored on disk in the kernel cache for later reuse
- Compiled kernels are specific to Bandicoot version, backend, and device; thus, if any of those three factors change, recompilation will be triggered; see the backend configuration documentation for more details
- The default location to store the kernel cache is
~/.bandicoot/cache/ on Linux, macOS, and other UNIX-like systems
%APPDATA%\bandicoot\cache on Windows (e.g. C:\Users\Username\AppData\Roaming\bandicoot\cache )
- Custom locations can be specified with the
COOT_KERNEL_CACHE_DIR configuration variable
History of API Additions, Changes and Deprecations
- API Stability and Version Policy:
-
Each release of Bandicoot has its public API (functions, classes, constants) described in the accompanying API documentation specific to that release.
-
Each release of Bandicoot has its full version specified as A.B.C, where A is a major version number, B is a minor version number, and C is a patch level (indicating bug fixes).
The version specification has explicit meaning, similar to Semantic Versioning, as follows:
-
Within a major version (eg. 1), each minor version (eg. 1.1) has a public API that strongly strives to be backwards compatible (at the source level) with the public API of preceding minor versions.
For example, user code written for version 1.0 should work with version 1.1, 1.2, etc.
However, later minor versions may have more features (API additions and extensions) than preceding minor versions.
As such, user code specifically written for version 1.2 may not work with 1.1.
-
An increase in the patch level, while the major and minor versions are retained, indicates modifications to the code and/or documentation which aim to fix bugs without altering the public API.
-
We don't like changes to existing public API and strongly prefer not to break any user software.
However, to allow evolution, the public API may be altered in future major versions while remaining backwards compatible in as many cases as possible
(eg. major version 2 may have slightly different public API than major version 1).
-
Caveat:
the above policy applies only to the public API described in the documentation.
Any functionality within Bandicoot which is not explicitly described in the public API documentation is considered an internal implementation detail,
and may be changed or removed without notice.
-
List of additions and changes for each version:
|