API Documentation for Bandicoot 1.0


Preamble

 
  • Please cite the following papers if you use Bandicoot in your research and/or software.
    Citations are useful for the continued development and maintenance of the library.


    TODO: a technical report!


Overview
Matrix and Vector Classes
Member Functions & Variables
    attributes .n_rows, .n_cols, .n_elem, .n_slices, ...
    element access element/object access via (), [] and .at()
       
    .zeros set all elements to zero
    .ones set all elements to one
    .eye set elements along main diagonal to one and off-diagonal elements to zero
    .randu / .randn set all elements to random values
       
    .fill set all elements to specified value
       
    .clamp clamp values to lower and upper limits
       
    .set_size change size without keeping elements (fast)
    .reshape change size while keeping elements
    .resize change size while keeping elements and preserving layout
    .reset change size to empty
       
    submatrix views read/write access to contiguous and non-contiguous submatrices
       
    .get_dev_mem() get underlying raw GPU memory pointer
       
    .diag read/write access to matrix diagonals
       
    .t / .st  return matrix transpose
    .eval force evaluation of delayed expression
       
    .is_empty check whether object is empty
    .is_vec check whether matrix is a vector
       
    .is_square check whether matrix is square sized
       
    .print print object to std::cout or user specified stream
    .raw_print print object without formatting

Generated Vectors / Matrices
    linspace generate vector with linearly spaced elements
    eye generate identity matrix
    ones generate object filled with ones
    zeros generate object filled with zeros
    randu generate object with random values (uniform distribution)
    randn generate object with random values (normal distribution)
    randi generate object with random integer values in specified interval

Functions of Vectors / Matrices
    abs obtain magnitude of each element
    accu accumulate (sum) all elements
    all check whether all elements are non-zero, or satisfy a relational condition
    any check whether any element is non-zero, or satisfies a relational condition
    as_scalar convert 1x1 matrix to pure scalar
    clamp obtain clamped elements according to given limits
    conv_to convert/cast between matrix types
    cross cross product
    det determinant
    diagmat generate diagonal matrix from given matrix or vector
    diagvec extract specified diagonal
    dot dot product
    find find indices of non-zero elements, or elements satisfying a relational condition
    find_finite find indices of finite elements
    find_nonfinite find indices of non-finite elements
    find_nan find indices of NaN elements
    join_rows / join_cols concatenation of matrices
    min / max return extremum values
    norm various norms of vectors and matrices
    normalise normalise vectors to unit p-norm
    pow element-wise power
    repmat replicate matrix in block-like fashion
    reshape change size while keeping elements
    resize change size while keeping elements and preserving layout
    size obtain dimensions of given object
    sort sort elements
    sort_index vector describing sorted order of elements
    sum sum of elements
    symmatu / symmatl generate symmetric matrix from given matrix
    trace sum of diagonal elements
    trans transpose of matrix
    vectorise flatten matrix into vector
    misc functions miscellaneous element-wise functions: exp, log, sqrt, round, sign, ...
    trig functions trigonometric element-wise functions: cos, sin, tan, ...

Decompositions, Factorisations, and Inverses
    chol Cholesky decomposition
    eig_sym eigen decomposition of dense symmetric/hermitian matrix
    lu   lower-upper decomposition
    pinv pseudo-inverse / generalised inverse
    svd singular value decomposition

Signal & Image Processing
Statistics
Miscellaneous




Matrix and Vector Classes



Mat<type>
fmat
mat
  • Classes for dense matrices, with elements stored in column-major ordering (ie. column by column) on the GPU

  • The root matrix class is Mat<type>, where type is one of:
    • float, double, short, int, long, and unsigned versions of short, int, long
    • Bandicoot provides convenient u32, u64, s32, and s64 types that can also be used
    • Important: not all types are supported on all devices; runtime exceptions will be thrown if a type is not supported

  • For convenience the following typedefs have been defined:
      fmat  =  Mat<float>
      mat  =  Mat<double>     note: not supported on all devices
      dmat  =  Mat<double>     note: not supported on all devices
      umat  =  Mat<uword>
      imat  =  Mat<sword>
      u32_mat  =  Mat<u32>
      s32_mat  =  Mat<s32>
      u64_mat  =  Mat<u64>
      s64_mat  =  Mat<s64>

  • In this documentation the fmat type is used for convenience, speed, and portability; it is possible to use other types instead, eg. mat

  • Note that standard consumer GPUs may not have support for 64-bit floats (double), and if they do, they may not show speedup over CPU-based Armadillo matrices unless they are high-end GPUs

  • Functions which use more complex functionality (generally matrix decompositions) are only valid for the following types: fmat, dmat, mat

  • Constructors:
      fmat()  
      fmat(n_rows, n_cols)  
      fmat(size(X))  
      fmat(fmat)  
      fmat(arma::fmat)  (convert from CPU-based Armadillo matrix)
      fmat(fvec)  
      fmat(frowvec)  

  • Caveat:

  • Each instance of fmat automatically allocates and releases internal memory on the GPU. All internally allocated memory used by an instance of fmat is automatically released as soon as the instance goes out of scope. For example, if an instance of fmat is declared inside a function, it will be automatically destroyed at the end of the function. To forcefully release memory at any point, use .reset(); note that in normal use this is not required.

  • Advanced constructors:

      fmat(ptr_aux_mem, n_rows, n_cols)

        Create a matrix using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory. This matrix will use the auxiliary memory directly (i.e., no copying); this can be dangerous unless you know what you are doing!

        The ptr_aux_mem argument should be one of:
        • a memory pointer from another Bandicoot matrix obtained with .get_dev_mem()
        • a cl_mem object, if using the OpenCL backend
        • a CUDA memory pointer (e.g. a float* for an fmat), if using the CUDA backend


  • Examples:
      fmat A(5, 5);
      A.randu();
      float x = A(1, 2); // note: try to avoid repeated individual element accesses!
      
      fmat B = A + A;
      fmat C = A * B;
      fmat D = A % B;
      
      B.zeros();
      B.set_size(10, 10);
      B.ones(5, 6);
      
      B.print("B:");
      
      // convert from Armadillo
      arma::fmat E(10, 10, arma::fill::randu);
      fmat F(E);
      
      // advanced constructors
      
      // when using the OpenCL backend
      cl_mem m_cl = clCreateBuffer(get_rt().cl_rt.get_context(), CL_MEM_READ_WRITE, sizeof(float) * 24, NULL, NULL);
      fmat H(wrap_mem_cl(m_cl), 4, 6);  // use auxiliary memory
      
      // when using the CUDA backend
      float* m_cuda;
      cudaMalloc(&m_cuda, sizeof(float) * 24);
      fmat J(wrap_mem_cuda(m_cuda), 4, 6);  // use auxiliary memory
      
      // make an alias of another matrix
      fmat K(D.get_dev_mem(), D.n_rows, D.n_cols);
      

  • See also:



Col<type>
fvec
vec
  • Classes for column vectors (dense matrices with one column)

  • The Col<type> class is derived from the Mat<type> class and inherits most of the member functions

  • For convenience the following typedefs have been defined:
      fvec  =  fcolvec  =  Col<float>
      vec  =  colvec  =  Col<double>     note: not supported on all devices
      dvec  =  dcolvec  =  Col<double>     note: not supported on all devices
      uvec  =  ucolvec  =  Col<uword>
      ivec  =  icolvec  =  Col<sword>
      u32_vec  =  u32_colvec  =  Col<u32>
      s32_vec  =  s32_colvec  =  Col<s32>
      u64_vec  =  u64_colvec  =  Col<u64>
      s64_vec  =  s64_colvec  =  Col<s64>

  • In this documentation, the vec and colvec types have the same meaning and are used interchangeably

  • In this documentation, the types fvec or fcolvec are used for convenience, speed, and portability; it is possible to use other types instead, eg. vec, colvec

  • Note that standard consumer GPUs may not have support for 64-bit floats (double), and if they do, they may not show speedup over CPU-based Armadillo matrices unless they are high-end GPUs

  • Functions which take Mat as input can generally also take Col as input; main exceptions are functions which require square matrices

  • Constructors:
      fvec()  
      fvec(n_elem)  
      fvec(size(X))  
      fvec(fvec)  
      fvec(arma::fvec) (convert from CPU-based Armadillo vector)
      fvec(fmat) (std::logic_error exception is thrown if the given matrix has more than one column)

  • Caveat:

  • Advanced constructors:

      fvec(ptr_aux_mem, number_of_elements)

        Create a column vector using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory. This vector will directly use the auxiliary memory (ie. no copying); this can be dangerous unless you know what you are doing!

        The ptr_aux_mem argument should be one of:
        • a memory pointer from another Bandicoot matrix obtained with .get_dev_mem()
        • a cl_mem object, if using the OpenCL backend
        • a CUDA memory pointer (e.g. a float* for an fmat), if using the CUDA backend


  • Examples:
      fvec x(10);
      fvec y(10, fill::ones);
      
      fmat A(10, 10, fill::randu);
      fvec z = A.col(5); // extract a column vector
      
      // convert from Armadillo
      arma::fvec d(100, arma::fill::randu);
      fvec e(d);
      

  • See also:



Row<type>
frowvec
rowvec
  • Classes for row vectors (dense matrices with one row)

  • The template Row<type> class is derived from the Mat<type> class and inherits most of the member functions

  • For convenience the following typedefs have been defined:
      frowvec  =  Row<float>
      rowvec  =  Row<double>     note: not supported on all devices
      drowvec  =  Row<double>     note: not supported on all devices
      urowvec  =  Row<uword>
      irowvec  =  Row<sword>
      u32_rowvec  =  Row<u32>
      s32_rowvec  =  Row<s32>
      u64_rowvec  =  Row<u64>
      s64_rowvec  =  Row<s64>

  • In this documentation, the frowvec type is used for convenience, speed, and portability; it is possible to use other types instead, eg. rowvec

  • Note that standard consumer GPUs may not have support for 64-bit floats (double), and if they do, they may not show speedup over CPU-based Armadillo matrices unless they are high-end GPUs

  • Functions which take Mat as input can generally also take Row as input. Main exceptions are functions which require square matrices

  • Constructors:
      frowvec()  
      frowvec(n_elem)  
      frowvec(size(X))  
      frowvec(frowvec)  
      frowvec(arma::frowvec) (convert from CPU-based Armadillo row vector)
      frowvec(fmat) (std::logic_error exception is thrown if the given matrix has more than one row)

  • Caveat:

  • Advanced constructors:

      frowvec(ptr_aux_mem, number_of_elements)

        Create a row vector using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory. This vector will directly use the auxiliary memory (ie. no copying); this can be dangerous unless you know what you are doing!

        The ptr_aux_mem argument should be one of:
        • a memory pointer from another Bandicoot matrix obtained with .get_dev_mem()
        • a cl_mem object, if using the OpenCL backend
        • a CUDA memory pointer (e.g. a float* for an fmat), if using the CUDA backend


  • Examples:
      frowvec x(10);
      frowvec y(10, fill::ones);
      
      fmat    A(10, 10, fill::randu);
      frowvec z = A.row(5); // extract a row vector
      
      // convert from Armadillo
      arma::frowvec d(100, arma::fill::randu);
      frowvec e(d);
      

  • See also:



operators:  +  -  *  %  /  ==  !=  <=  >=  <  >  &&  ||
  • Overloaded operators for Mat, Col, and Row classes

  • Operations:

      +
      addition of two objects
      -
      subtraction of one object from another or negation of an object
           
      *
      matrix multiplication of two objects
           
      %
      element-wise multiplication of two objects (Schur product)
      /
      element-wise division of an object by another object or a scalar
           
      ==
      element-wise equality evaluation of two objects; generates a matrix of type umat
      !=
      element-wise non-equality evaluation of two objects; generates a matrix of type umat
           
      >=
      element-wise "greater than or equal to" evaluation of two objects; generates a matrix of type umat
      <=
      element-wise "less than or equal to" evaluation of two objects; generates a matrix of type umat
           
      >
      element-wise "greater than" evaluation of two objects; generates a matrix of type umat
      <
      element-wise "less than" evaluation of two objects; generates a matrix of type umat
           
      &&
      element-wise logical AND evaluation of two objects; generates a matrix of type umat
      ||
      element-wise logical OR evaluation of two objects; generates a matrix of type umat

  • For element-wise relational and logical operations (ie. ==, !=, >=, <=, >, <, &&, ||) each element in the generated object is either 0 or 1, depending on the result of the operation

  • Caveat: operators involving equality comparison (ie. ==, !=, >=, <=) are not recommended for matrices of type mat or fmat, due to the necessarily limited precision of floating-point element types

  • If incompatible object sizes are used, a std::logic_error exception is thrown

  • Examples:
      fmat A = randu<fmat>(5, 10);
      fmat B = randu<fmat>(5, 10);
      fmat C = randu<fmat>(10, 5);
      
      fmat P = A + B;
      fmat Q = A - B;
      fmat R = -B;
      fmat S = A / 123.0;
      fmat T = A % B;
      fmat U = A * C;
      
      fmat V = A + B + A + B;
      
      imat AA = linspace<imat>(1, 9, 9);
      imat BB = linspace<imat>(9, 1, 9);
      
      // compare elements
      umat ZZ = (AA >= BB);
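
  • As a CPU-only illustration of the 0/1 results described above, the following plain C++ sketch models an element-wise >= comparison on two integer sequences. It does not use Bandicoot; the function name elementwise_ge is hypothetical, and the assert merely stands in for the std::logic_error thrown on a size mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model (not a Bandicoot function):
// out[i] is 1 if a[i] >= b[i], and 0 otherwise.
std::vector<unsigned> elementwise_ge(const std::vector<int>& a,
                                     const std::vector<int>& b)
{
    assert(a.size() == b.size());  // sizes must be compatible

    std::vector<unsigned> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        out[i] = (a[i] >= b[i]) ? 1u : 0u;
    }
    return out;
}
```

    With a = 1, 2, ..., 9 and b = 9, 8, ..., 1 (the values of AA and BB in the example), the first four results are 0 and the remaining five are 1.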
      

  • See also:





Member Functions & Variables



attributes
    .n_rows     number of rows; present in Mat, Col, and Row
    .n_cols     number of columns; present in Mat, Col, and Row
    .n_elem     total number of elements; present in Mat, Col, and Row


element access via (), [] and .at()
  • Provide access to individual elements in a Mat, Col, or Row

      (i)  
      For fvec and frowvec, access the element stored at index i. For fmat, access the element/object stored at index i under the assumption of a flat layout, with column-major ordering of data (i.e. column by column). An exception is thrown if the requested element is out of bounds.
           
      .at(i)  or  [i] 
      As for (i), but without a bounds check; not recommended; see the caveats below
           
      (r,c)
      For fmat, access the element/object stored at row r and column c. An exception is thrown if the requested element is out of bounds.
           
      .at(r,c)
      As for (r,c), but without a bounds check; not recommended; see the caveats below

  • Important: every element access involves a transfer from GPU memory to CPU memory; therefore, for efficiency, avoid repeated element access when possible; see the Armadillo conversion guide for more details and suggestions.

  • The indices of elements are specified via the uword type, which is a typedef for an unsigned integer type.

  • Caveats:
    • accessing elements without bounds checks is slightly faster, but is not recommended until your code has been thoroughly debugged first
    • indexing in C++ starts at 0
    • accessing elements via [r,c] does not work correctly in C++; instead use (r,c)

  • Examples:
      // remember that individual element accesses are slow and should be avoided;
      // when possible, operate in batch instead of on individual elements!
      fmat M(10, 10);
      M.randu();
      M(9, 9) = 123.0;
      float x = M(1, 2);
      
      fvec v(10);
      v.randu();
      v(9) = 123.0;
      float y = v(0);
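
  • The flat index used by (i) on an fmat follows from column-major ordering. The helper below is a CPU-only sketch in plain C++ showing how a (row, column) pair maps to a flat index; flat_index is a hypothetical name and is not part of Bandicoot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Bandicoot function): map (r, c) to the flat
// index of a column-major matrix with n_rows rows. Columns are stored one
// after another, so column c starts at offset c * n_rows.
std::size_t flat_index(std::size_t r, std::size_t c, std::size_t n_rows)
{
    assert(r < n_rows);  // the row index must be in range
    return r + c * n_rows;
}
```

    For a 10x10 matrix, flat_index(1, 2, 10) is 21, so under this layout M(21) and M(1, 2) refer to the same element.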
      

  • See also:



.zeros()
  (member function of Mat, Col, and Row)
.zeros( n_elem )
  (member function of Col and Row)
.zeros( n_rows, n_cols )
  (member function of Mat)
  • Set the elements of an object to zero, optionally first changing the size to specified dimensions

  • Examples:
      fmat A;
      A.zeros(5, 10);
      
      fvec B;
      B.zeros(100);
      
      fmat C(5, 10);
      C.zeros();
      

  • See also:



.ones()
  (member function of Mat, Col, and Row)
.ones( n_elem )
  (member function of Col and Row)
.ones( n_rows, n_cols )
  (member function of Mat)
  • Set all the elements of an object to one, optionally first changing the size to specified dimensions

  • Examples:
      fmat A;
      A.ones(5, 10);
      
      fvec B;
      B.ones(100);
      
      fmat C(5, 10);
      C.ones();
      

  • See also:



.eye()
.eye( n_rows, n_cols )
  • Member functions of Mat

  • Set the elements along the main diagonal to one and off-diagonal elements to zero, optionally first changing the size to specified dimensions

  • An identity matrix is generated when n_rows = n_cols

  • Examples:
      fmat A;
      A.eye(5, 5);
      
      fmat B(5, 5);
      B.eye();
      

  • See also:



.randu()
  (member function of Mat, Col, and Row)
.randu( n_elem )
  (member function of Col and Row)
.randu( n_rows, n_cols )
  (member function of Mat)

.randn()
  (member function of Mat, Col, and Row)
.randn( n_elem )
  (member function of Col and Row)
.randn( n_rows, n_cols )
  (member function of Mat)
  • Set all the elements to random values, optionally first changing the size to specified dimensions

  • .randu() uses a uniform distribution in the [0,1] interval

  • .randn() uses a normal/Gaussian distribution with zero mean and unit variance

  • To change the RNG seed, use coot_rng::set_seed(value) or coot_rng::set_seed_random() functions

  • Examples:
      fmat A;
      A.randu(5, 10);
      
      fvec B;
      B.randu(100);
      
      fmat C(5, 10);
      C.randu();
      
      coot_rng::set_seed_random(); // set the seed to a random value
      coot_rng::set_seed(42);      // set the seed to a specific value 
      

  • See also:



.fill( value )


.clamp( min_value, max_value )
  • Member function of Mat, Col, and Row

  • Clamp each element to the [min_value, max_value] interval; any value lower than min_value is set to min_value, and any value higher than max_value is set to max_value

  • Examples:
      fmat A(5, 6);
      A.randu();
      
      A.clamp(0.2, 0.8);
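
  • The per-element rule can be modelled on the CPU as follows. This is a plain C++ sketch, not Bandicoot code; clamp_all is a hypothetical name, and the requirement that the lower limit does not exceed the upper limit is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CPU-side model of clamping every element to [min_value, max_value].
void clamp_all(std::vector<float>& v, float min_value, float max_value)
{
    assert(min_value <= max_value);  // assumption made by this sketch

    for (float& x : v)
    {
        // raise values below min_value, then lower values above max_value
        x = std::min(std::max(x, min_value), max_value);
    }
}
```

    Applied to the values 0.1, 0.5, 0.9 with limits 0.2 and 0.8, the result is 0.2, 0.5, 0.8.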
      

  • See also:



.set_size( n_elem )
  (member function of Col and Row)
.set_size( n_rows, n_cols )
  (member function of Mat)
.set_size( size(X) )
  (member function of Mat, Col, and Row)
  • Change the size of an object, without explicitly preserving data and without initialising the elements (i.e. elements may contain garbage values, including NaN)

  • To initialise the elements to zero while changing the size, use .zeros() instead

  • To explicitly preserve data while changing the size, use .reshape() or .resize() instead;
    NOTE: .reshape() and .resize() are considerably slower than .set_size()

  • Examples:
      fmat A;
      A.set_size(5, 10);      // or:  fmat A(5, 10);
      
      fmat B;
      B.set_size( size(A) );  // or:  fmat B(size(A));
      
      fvec v;
      v.set_size(100);        // or:  fvec v(100);
      

  • See also:



.reshape( n_rows, n_cols )
  (member function of Mat)
.reshape( size(X) )
  (member function of Mat)
  • Recreate the object according to given size specifications, with the elements taken from the previous version of the object in a column-wise manner; the elements in the generated object are placed column-wise (i.e. the first column is filled up before filling the second column)

  • The layout of the elements in the recreated object will be different to the layout in the previous version of the object

  • If the total number of elements in the previous version of the object is less than the specified size, the extra elements in the recreated object are set to zero

  • If the total number of elements in the previous version of the object is greater than the specified size, only a subset of the elements is taken

  • Caveats:
    • to change the size without preserving data, use .set_size() instead, which is much faster
    • to grow/shrink the object while preserving the elements as well as the layout of the elements, use .resize() instead
    • to flatten a matrix into a vector, use vectorise() or .as_col() / .as_row() instead

  • Examples:
      fmat A(4, 5);
      A.randu();
      
      A.reshape(5, 4);
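
  • Because storage is column-major, the reshape rules above amount to keeping the flat element order, truncating or zero-padding as needed. The following CPU-only sketch in plain C++ models that behaviour on a flat buffer; reshape_model is a hypothetical name and is not part of Bandicoot.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of reshape on column-major storage.
// old_mem holds the elements of the previous object, column by column.
std::vector<float> reshape_model(const std::vector<float>& old_mem,
                                 std::size_t new_rows, std::size_t new_cols)
{
    // extra elements (if the new size is larger) remain zero
    std::vector<float> out(new_rows * new_cols, 0.0f);

    // if the new size is smaller, only the leading elements are taken
    const std::size_t n = (old_mem.size() < out.size()) ? old_mem.size() : out.size();
    for (std::size_t i = 0; i < n; ++i)
    {
        out[i] = old_mem[i];
    }
    return out;
}
```

    For a 2x3 object holding 1..6, reshaping to 3x2 keeps all six values in the same flat order, reshaping to 2x2 keeps 1..4, and reshaping to 2x4 appends two zeros.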
      

  • See also:



.resize( n_elem )
  (member function of Col and Row)
.resize( n_rows, n_cols )
  (member function of Mat)
.resize( size(X) )
  (member function of Mat, Col, and Row)
  • Recreate the object according to given size specifications, while preserving the elements as well as the layout of the elements

  • Can be used for growing or shrinking an object (i.e. adding/removing rows and/or columns)

  • Caveat: to change the size without preserving data, use .set_size() instead, which is much faster

  • Examples:
      fmat A(4, 5);
      A.randu();
      
      A.resize(7, 6);
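
  • In contrast to .reshape(), preserving the layout means that the element at (r, c) stays at (r, c) whenever that position exists in both sizes. The CPU-only sketch below models this in plain C++; resize_model is a hypothetical name, and zero-filling the newly added elements is an assumption of the sketch (the text above does not state their value).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of resize on column-major storage.
std::vector<float> resize_model(const std::vector<float>& old_mem,
                                std::size_t old_rows, std::size_t old_cols,
                                std::size_t new_rows, std::size_t new_cols)
{
    assert(old_mem.size() == old_rows * old_cols);

    // assumption: newly added elements are zero
    std::vector<float> out(new_rows * new_cols, 0.0f);

    const std::size_t rows = std::min(old_rows, new_rows);
    const std::size_t cols = std::min(old_cols, new_cols);

    // copy the overlapping top-left block, keeping each (r, c) position
    for (std::size_t c = 0; c < cols; ++c)
    {
        for (std::size_t r = 0; r < rows; ++r)
        {
            out[r + c * new_rows] = old_mem[r + c * old_rows];
        }
    }
    return out;
}
```

    Growing the 2x2 matrix with columns (1, 2) and (3, 4) to 3x3 gives columns (1, 2, 0), (3, 4, 0), (0, 0, 0) in this model.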
      

  • See also:



.reset()


submatrix views
  • A collection of member functions of Mat, Col and Row classes that provide read/write access to submatrix views

  • contiguous views for matrix X:

      X.col( col_number )
      X.row( row_number )

      X.cols( first_col, last_col )
      X.rows( first_row, last_row )

      X.submat( first_row, first_col, last_row, last_col )

      X( span(first_row, last_row), span(first_col, last_col) )

      X( first_row, first_col, size(n_rows, n_cols) )
      X( first_row, first_col, size(Y) )    [ Y is a matrix ]

      X( span(first_row, last_row), col_number )
      X( row_number, span(first_col, last_col) )

      X.head_cols( number_of_cols )
      X.head_rows( number_of_rows )

      X.tail_cols( number_of_cols )
      X.tail_rows( number_of_rows )

  • contiguous views for vector V:

      V.subvec( first_index, last_index )

      V.subvec( first_index, size(W) )    [ W is a vector ]

  • related matrix views (documented separately)

  • Instances of span(start,end) can be replaced by span::all to indicate the entire range

  • Examples:
      fmat A(5, 10);
      A.zeros();
      
      A.submat( 0,1, 2,3 )      = randu<fmat>(3, 3);
      A( span(0,2), span(1,3) ) = randu<fmat>(3, 3);
      A( 0,1, size(3,3) )       = randu<fmat>(3, 3);
      
      fmat B = A.submat( 0,1, 2,3 );
      fmat C = A( span(0,2), span(1,3) );
      fmat D = A( 0,1, size(3,3) );
      
      A.col(1)        = randu<fmat>(5,1);
      A(span::all, 1) = randu<fmat>(5,1);
      
      // add 123 to the last 5 elements of vector a
      fvec a(10);
      a.randu();
      a.subvec(a.n_elem - 5, a.n_elem - 1) += 123.0;
      
      // add 123 to the first 3 elements of column 2 of A
      A.col(2).subvec(0, 2) += 123;
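
  • Note that last_row and last_col are inclusive, so X.submat(0,1, 2,3) covers 3 rows and 3 columns. The CPU-only sketch below, in plain C++, models which elements such a view selects; submat_model is a hypothetical name, and unlike a Bandicoot view it returns a copy rather than giving read/write access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model: copy the block [first_row..last_row] x
// [first_col..last_col] (both bounds inclusive) out of column-major storage.
std::vector<float> submat_model(const std::vector<float>& mem, std::size_t n_rows,
                                std::size_t first_row, std::size_t first_col,
                                std::size_t last_row, std::size_t last_col)
{
    assert(n_rows > 0 && mem.size() % n_rows == 0);
    assert(first_row <= last_row && last_row < n_rows);
    assert(first_col <= last_col && last_col < mem.size() / n_rows);

    std::vector<float> out;
    out.reserve((last_row - first_row + 1) * (last_col - first_col + 1));

    for (std::size_t c = first_col; c <= last_col; ++c)
    {
        for (std::size_t r = first_row; r <= last_row; ++r)
        {
            out.push_back(mem[r + c * n_rows]);
        }
    }
    return out;
}
```

    For a 3x3 matrix holding 0..8 in column-major order, rows 0 to 1 and columns 1 to 2 select the elements 3, 4, 6, 7.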
      

  • See also:



.get_dev_mem()
.get_dev_mem( synchronise )
  • Member function of Mat, Col, and Row

  • Obtain a dev_mem_t object that holds the raw GPU memory pointers

  • By default, all asynchronous GPU operations are forced to complete, unless synchronise is passed as false

  • Depending on backend configuration, underlying GPU memory may be accessed as
    • .get_dev_mem().cl_mem_ptr for the OpenCL backend; this has type cl_mem
    • .get_dev_mem().cuda_mem_ptr for the CUDA backend; for a matrix type Mat<eT>, this has type eT* (e.g. for fmat the type will be float*)

  • Examples:
      // when using the OpenCL backend
      fmat A = randu<fmat>(3, 4);
      cl_mem A_mem = A.get_dev_mem().cl_mem_ptr;
      
      // when using the CUDA backend
      fmat B = randu<fmat>(3, 4);
      float* B_mem = B.get_dev_mem().cuda_mem_ptr;
      

  • See also:


.diag()
.diag( k )
  • Member function of Mat

  • Read/write access to a diagonal in a matrix

  • The argument k is optional; by default the main diagonal is accessed (k = 0)

  • For k > 0, the k-th super-diagonal is accessed (top-right corner)

  • For k < 0, the k-th sub-diagonal is accessed (bottom-left corner)

  • The diagonal is interpreted as a column vector within expressions

  • Note: to calculate only the diagonal elements of a compound expression, use diagvec() or diagmat()

  • Examples:
      fmat X(5, 5);
      X.randu();
      
      fvec a = X.diag();
      fvec b = X.diag(1);
      fvec c = X.diag(-2);
      
      X.diag() = randu<fvec>(5);
      X.diag() += 6;
      X.diag().ones();
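
  • The meaning of k can be made concrete with a CPU-only sketch in plain C++ that gathers the k-th diagonal from column-major storage. diag_model is a hypothetical name and is not part of Bandicoot; it returns a copy, whereas .diag() provides read/write access.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of selecting the k-th diagonal.
// k > 0 starts at (0, k); k < 0 starts at (-k, 0); k = 0 is the main diagonal.
std::vector<float> diag_model(const std::vector<float>& mem,
                              std::size_t n_rows, std::size_t n_cols, long k)
{
    assert(mem.size() == n_rows * n_cols);

    const std::size_t row0 = (k < 0) ? static_cast<std::size_t>(-k) : 0;
    const std::size_t col0 = (k > 0) ? static_cast<std::size_t>(k) : 0;
    assert(row0 < n_rows && col0 < n_cols);  // the diagonal must exist

    const std::size_t len = std::min(n_rows - row0, n_cols - col0);

    std::vector<float> out(len);
    for (std::size_t i = 0; i < len; ++i)
    {
        out[i] = mem[(row0 + i) + (col0 + i) * n_rows];
    }
    return out;
}
```

    For a 3x3 matrix holding 0..8 in column-major order, k = 0 yields 0, 4, 8; k = 1 yields 3, 7; and k = -2 yields the single element 2.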
      

  • See also:



.t()
.st()


.eval()
  • Member function of any matrix or vector expression

  • Explicitly forces the evaluation of a delayed expression and outputs a matrix

  • This function should be used sparingly and only in cases where it is absolutely necessary; indiscriminate use can degrade performance

  • Examples:
      fmat A = randu<fmat>(4,4);
      
      A.t().eval().print("A.t()");
      

  • See also:



.is_empty()
  • Returns true if the object has no elements

  • Returns false if the object has one or more elements

  • Examples:
      fmat A(5, 5, fill::randu);
      cout << A.is_empty() << endl;
      
      A.reset();
      cout << A.is_empty() << endl;
      

  • See also:



.is_vec()
.is_colvec()
.is_rowvec()
  • Member functions of Mat

  • .is_vec():
    • returns true if the matrix can be interpreted as a vector (either column or row vector)
    • returns false if the matrix does not have exactly one column or one row

  • .is_colvec():
    • returns true if the matrix can be interpreted as a column vector
    • returns false if the matrix does not have exactly one column

  • .is_rowvec():
    • returns true if the matrix can be interpreted as a row vector
    • returns false if the matrix does not have exactly one row

  • Caveat: do not assume that the vector has elements if these functions return true; it is possible to have an empty vector (eg. 0x1)

  • Examples:
      fmat A = randu<fmat>(1, 5);
      fmat B = randu<fmat>(5, 1);
      fmat C = randu<fmat>(5, 5);
      
      cout << A.is_vec() << endl;
      cout << B.is_vec() << endl;
      cout << C.is_vec() << endl;
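
  • The caveat about empty vectors follows directly from the shape tests. The plain C++ sketch below models the predicates on dimensions only; is_vec_model and is_empty_model are hypothetical names and are not part of Bandicoot.

```cpp
#include <cstddef>

// Hypothetical CPU-side models of the shape predicates.
// A matrix counts as a vector if it has exactly one row or exactly one column.
bool is_vec_model(std::size_t n_rows, std::size_t n_cols)
{
    return (n_rows == 1) || (n_cols == 1);
}

// An object is empty if it has no elements.
bool is_empty_model(std::size_t n_rows, std::size_t n_cols)
{
    return (n_rows * n_cols) == 0;
}
```

    A 0x1 object satisfies both predicates at once, which is why a true result from .is_vec() should be combined with a check of .is_empty() or .n_elem before elements are accessed.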
      

  • See also:



.is_square()
  • Member function of Mat

  • Returns true if the matrix is square, ie. number of rows is equal to the number of columns

  • Returns false if the matrix is not square

  • Examples:
      fmat A = randu<fmat>(5, 5);
      fmat B = randu<fmat>(6, 7);
      
      cout << A.is_square() << endl;
      cout << B.is_square() << endl;
      

  • See also:



.print()
.print( header )

.print( stream )
.print( stream, header )
  • Member functions of Mat, Col, and Row

  • Print the contents of an object to the std::cout stream (default), or a user specified stream, with an optional header string

  • Objects can also be printed using the << stream operator

  • Examples:
      fmat A = randu<fmat>(5, 5);
      fmat B = randu<fmat>(6, 6);
      
      A.print();
      
      // print a transposed version of A
      A.t().print();
      
      // "B:" is the optional header line
      B.print("B:");
      
      cout << A << endl;
      
      cout << "B:" << endl;
      cout << B << endl;
      

  • See also:



.raw_print()
.raw_print( header )

.raw_print( stream )
.raw_print( stream, header )
  • Member functions of Mat, Col, and Row

  • Similar to the .print() member function, with the difference that no formatting of the output is done; the stream's parameters such as precision, cell width, etc. can be set manually

  • If the cell width is set to zero, a space is printed between the elements

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      cout.precision(11);
      cout.setf(ios::fixed);
      
      A.raw_print(cout, "A:");
      

  • See also:





Generated Vectors / Matrices



linspace( start, end )
linspace( start, end, N )
  • Generate a vector with N elements; the values of the elements are linearly spaced from start to (and including) end

  • The argument N is optional; by default N = 100

  • Usage:
    • fvec v = linspace(start, end, N)
    • vector_type v = linspace<vector_type>(start, end, N)

  • Caveat: for N = 1, the generated vector will have a single element equal to end

  • Examples:
         fvec a = linspace(0, 5, 6);
      
      frowvec b = linspace<frowvec>(5, 0, 6);
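
  • The spacing rule, including the N = 1 caveat, can be modelled on the CPU as follows. This is a plain C++ sketch rather than Bandicoot's implementation; linspace_model is a hypothetical name, and returning an empty result for N = 0 is an assumption of the sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of linspace(start, end, N).
std::vector<float> linspace_model(float start, float end, std::size_t N = 100)
{
    std::vector<float> out(N);

    if (N == 0) { return out; }                  // assumption: empty result
    if (N == 1) { out[0] = end; return out; }    // single element equals end

    const float step = (end - start) / static_cast<float>(N - 1);
    for (std::size_t i = 0; i < N; ++i)
    {
        out[i] = start + step * static_cast<float>(i);
    }
    out[N - 1] = end;  // make the final element exactly end
    return out;
}
```

    With start = 0, end = 5 and N = 6 the step is 1, giving 0, 1, 2, 3, 4, 5; reversing start and end gives the same values in descending order.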
      

  • See also:



eye( n_rows, n_cols )
eye( size(X) )
  • Generate a matrix with the elements along the main diagonal set to one and off-diagonal elements set to zero

  • An identity matrix is generated when n_rows = n_cols

  • Usage:
    • mat X = eye( n_rows, n_cols )
    • matrix_type X = eye<matrix_type>( n_rows, n_cols )
    • matrix_type Y = eye<matrix_type>( size(X) )

  • Examples:
        fmat A = eye(5,5);
      
        fmat B = 123.0 * eye<fmat>(5,5);
      
        imat C = eye<imat>( size(B) );
      

  • See also:



ones( n_elem )
ones( n_rows, n_cols )
ones( size(X) )
  • Generate a vector or matrix with all elements set to one

  • Usage:
    • vector_type v = ones<vector_type>( n_elem )
    • matrix_type X = ones<matrix_type>( n_rows, n_cols )
    • matrix_type Y = ones<matrix_type>( size(X) )

  • Examples:
         fvec v = ones(10);
         uvec u = ones<uvec>(10);
      frowvec r = ones<frowvec>(10);
      
      fmat A = ones(5,6);
      imat B = ones<imat>(5,6);
      umat C = ones<umat>(5,6);
      

  • See also:



zeros( n_elem )
zeros( n_rows, n_cols )
zeros( size(X) )
  • Generate a vector or matrix with the elements set to zero

  • Usage:
    • vector_type v = zeros<vector_type>( n_elem )
    • matrix_type X = zeros<matrix_type>( n_rows, n_cols )
    • matrix_type Y = zeros<matrix_type>( size(X) )

  • Examples:
         fvec v = zeros(10);
         uvec u = zeros<uvec>(10);
      frowvec r = zeros<frowvec>(10);
      
      fmat A = zeros(5,6);
      imat B = zeros<imat>(5,6);
      umat C = zeros<umat>(5,6);
      

  • See also:



randu( n_elem )
randu( n_rows, n_cols )
randu( size(X) )
  • Generate a vector or matrix with the elements set to random floating point values uniformly distributed in the [0,1] interval

  • Usage:
    • vector_type v = randu<vector_type>( n_elem )

    • matrix_type X = randu<matrix_type>( n_rows, n_cols )

  • To change the RNG seed, use coot_rng::set_seed(value) or coot_rng::set_seed_random() functions

  • Caveat: to generate a matrix with random integer values instead of floating point values, use randi() instead

  • Examples:
      fvec v1 = randu(5);
      
      frowvec r1 = randu<frowvec>(5);
      
      fmat A1 = randu(5, 6);
      
      mat B1 = randu<mat>(5, 6);
      mat B2 = randu<mat>(5, 6, distr_param(10,20));
      
      coot_rng::set_seed_random(); // set the seed to a random value
      coot_rng::set_seed(42);      // set the seed to a specific value
      
  • See also:



randn( n_elem )
randn( n_elem, distr_param(mu,sd) )

randn( n_rows, n_cols )
randn( n_rows, n_cols, distr_param(mu,sd) )

randn( size(X) )
randn( size(X), distr_param(mu,sd) )
  • Generate a vector or matrix with the elements set to random values with normal / Gaussian distribution, parameterised by mean mu and standard deviation sd

  • The default distribution parameters are mu = 0 and sd = 1

  • Usage:
    • vector_type v = randn<vector_type>( n_elem )
    • vector_type v = randn<vector_type>( n_elem, distr_param(mu,sd) )

    • matrix_type X = randn<matrix_type>( n_rows, n_cols )
    • matrix_type X = randn<matrix_type>( n_rows, n_cols, distr_param(mu,sd) )

  • To change the RNG seed, use coot_rng::set_seed(value) or coot_rng::set_seed_random() functions

  • Examples:
      fvec v1 = randn(5);
      fvec v2 = randn(5, distr_param(10,5));
      
      frowvec r1 = randn<frowvec>(5);
      frowvec r2 = randn<frowvec>(5, distr_param(10,5));
      
      fmat A1 = randn(5, 6);
      fmat A2 = randn(5, 6, distr_param(10,5));
      
      mat B1 = randn<mat>(5, 6);
      mat B2 = randn<mat>(5, 6, distr_param(10,5));
      
      coot_rng::set_seed_random(); // set the seed to a random value
      coot_rng::set_seed(42);      // set the seed to a specific value
      
  • See also:



randi( n_elem )
randi( n_elem, distr_param(a,b) )

randi( n_rows, n_cols )
randi( n_rows, n_cols, distr_param(a,b) )

randi( size(X) )
randi( size(X), distr_param(a,b) )
  • Generate a vector or matrix with the elements set to random integer values uniformly distributed in the [a,b] interval

  • The default distribution parameters are a = 0 and b = maximum_int

  • Usage:
    • vector_type v = randi<vector_type>( n_elem )
    • vector_type v = randi<vector_type>( n_elem, distr_param(a,b) )

    • matrix_type X = randi<matrix_type>( n_rows, n_cols )
    • matrix_type X = randi<matrix_type>( n_rows, n_cols, distr_param(a,b) )

  • To change the RNG seed, use coot_rng::set_seed(value) or coot_rng::set_seed_random() functions

  • Caveat: to generate a matrix with random floating point values (ie. float or double) rather than integers, use randu() instead

  • Examples:
      imat A1 = randi(5, 6);
      imat A2 = randi(5, 6, distr_param(-10, +20));
      
      fmat B1 = randi<fmat>(5, 6);
      fmat B2 = randi<fmat>(5, 6, distr_param(-10, +20));
      
      coot_rng::set_seed_random(); // set the seed to a random value
      coot_rng::set_seed(42);      // set the seed to a specific value
      
  • See also:





Functions of Vectors / Matrices



abs( X )
  • Obtain the magnitude of each element

  • Usage: Y = abs(X), where X and Y must have the same matrix type, such as fmat or ivec

  • Examples:
      fmat A = randu<fmat>(5, 5);
      fmat B = abs(A);
      
      fvec X = linspace<fvec>(-5, 5, 11);
      fvec Y = abs(X);
      
  • See also:



accu( X )
  • Accumulate (sum) all elements of a vector or matrix

  • Examples:
      fmat A = randu<fmat>(5, 6);
      fmat B = randu<fmat>(5, 6);
      
      float x = accu(A);
      
      float y = accu(A % B);
      

  • See also:



all( V )
all( X )
all( X, dim )
  • For vector V, return true if all elements of the vector are non-zero or satisfy a relational condition

  • For matrix X and
    • dim = 0, return a row vector (of type urowvec or umat), with each element (0 or 1) indicating whether the corresponding column of X has all non-zero elements
    • dim = 1, return a column vector (of type ucolvec or umat), with each element (0 or 1) indicating whether the corresponding row of X has all non-zero elements

  • The dim argument is optional; by default dim = 0 is used

  • Relational operators can be used instead of V or X, eg. A > 0.5

  • Examples:
      fvec V = randu<fvec>(10);
      fmat X = randu<fmat>(5, 5);
      
      // status1 will be set to true if vector V has all non-zero elements
      bool status1 = all(V);
      
      // status2 will be set to true if vector V has all elements greater than 0.5
      bool status2 = all(V > 0.5);
      
      // status3 will be set to true if matrix X has all elements greater than 0.6;
      // note the use of vectorise()
      bool status3 = all(vectorise(X) > 0.6);
      
      // generate a row vector indicating which columns of X have all elements greater than 0.7
      umat A = all(X > 0.7);
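
  • Illustration of the dim argument: the following is a plain C++ sketch, not Bandicoot code. The helper name all_greater() is hypothetical, and the matrix is assumed to be stored column by column in a std::vector. It mimics the result described above for all(X > threshold, dim).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Bandicoot): emulates all(X > threshold, dim)
// for an n_rows x n_cols matrix stored column by column in a std::vector.
std::vector<unsigned> all_greater(const std::vector<float>& x,
                                  std::size_t n_rows, std::size_t n_cols,
                                  float threshold, int dim)
{
  assert(x.size() == n_rows * n_cols);
  assert(dim == 0 || dim == 1);

  // dim = 0: one flag per column; dim = 1: one flag per row
  std::vector<unsigned> out((dim == 0) ? n_cols : n_rows, 1u);

  for (std::size_t c = 0; c < n_cols; ++c)
  {
    for (std::size_t r = 0; r < n_rows; ++r)
    {
      // a single failing element clears the flag of its column (or row)
      if (!(x[r + c * n_rows] > threshold))
      {
        out[(dim == 0) ? c : r] = 0u;
      }
    }
  }

  return out;
}
```

    For a 2x3 matrix with columns (0.8, 0.9), (0.8, 0.1) and (0.9, 0.95) and a threshold of 0.7, dim = 0 gives the flags 1, 0, 1 (one per column) and dim = 1 gives the flags 1, 0 (one per row).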
      
      

  • See also:



any( V )
any( X )
any( X, dim )
  • For vector V, return true if any element of the vector is non-zero or satisfies a relational condition

  • For matrix X and
    • dim = 0, return a row vector (of type urowvec or umat), with each element (0 or 1) indicating whether the corresponding column of X has any non-zero elements
    • dim = 1, return a column vector (of type ucolvec or umat), with each element (0 or 1) indicating whether the corresponding row of X has any non-zero elements

  • The dim argument is optional; by default dim = 0 is used

  • Relational operators can be used instead of V or X, eg. A > 0.9

  • Examples:
      fvec V = randu<fvec>(10);
      fmat X = randu<fmat>(5, 5);
      
      // status1 will be set to true if vector V has any non-zero elements
      bool status1 = any(V);
      
      // status2 will be set to true if vector V has any elements greater than 0.5
      bool status2 = any(V > 0.5);
      
      // status3 will be set to true if matrix X has any elements greater than 0.6;
      // note the use of vectorise()
      bool status3 = any(vectorise(X) > 0.6);
      
      // generate a row vector indicating which columns of X have elements greater than 0.7
      umat A = any(X > 0.7);
      
      

  • See also:



as_scalar( expression )
  • Evaluate an expression that results in a 1x1 matrix, followed by converting the 1x1 matrix to a pure scalar

  • Optimised expression evaluations are automatically used when a binary or ternary expression is given (ie. 2 or 3 terms)

  • Examples:
      frowvec r = randu<frowvec>(5);
      fcolvec q = randu<fcolvec>(5);
      
      fmat X = randu<fmat>(5, 5);
      
      // examples of expressions which have optimised implementations
      
      float a = as_scalar(r*q);
      float b = as_scalar(r*X*q);
      float c = as_scalar(r*diagmat(X)*q);
      float d = as_scalar(r*inv(diagmat(X))*q);
      

  • See also:



clamp( X, min_val, max_val )
  • Create a copy of X with each element clamped to the [min_val, max_val] interval;
    any value lower than min_val will be set to min_val, and any value higher than max_val will be set to max_val

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      fmat B = clamp(A, 0.2,         0.8);
      
      fmat C = clamp(A, min(min(A)), 0.8);
      
      fmat D = clamp(A, 0.2, max(max(A)));
      
  • See also:



conv_to< type >::from( X )
  • Convert (cast) from one matrix type to another (eg. fmat to imat)

  • Conversion between Armadillo and Bandicoot matrices/vectors is also possible

  • Conversion of a fmat object into fcolvec or frowvec is possible if the object can be interpreted as a vector

  • When conv_to is applied to an expression, the conversion operation will be fused with the expression computation when possible
  • Examples:
      fmat A = randu<fmat>(5, 5);
       mat B = conv_to<mat>::from(A);
      
      fmat C = randu<fmat>(10, 1);
      fcolvec x = conv_to< fcolvec >::from(C);
      
      // convert from Armadillo object
      arma::fmat D = conv_to<arma::fmat>::from(A);
      

  • See also:



cross( A, B )


val = det( A )   (form 1)
det( val, A )   (form 2)
  • Calculate the determinant of square matrix A, based on LU decomposition

  • form 1: return the determinant

  • form 2: store the calculated determinant in val and return a bool indicating success

  • If A is not square sized, a std::logic_error exception is thrown

  • If the calculation fails:
    • val = det(A) throws a std::runtime_error exception
    • det(val,A) returns a bool set to false (exception is not thrown)

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      float val1 = det(A);         // form 1
      
      float val2;
      bool success = det(val2, A); // form 2
      

  • See also:



diagmat( V )
diagmat( V, k )

diagmat( X )
diagmat( X, k )
  • Generate a diagonal matrix from vector V or matrix X

  • Given vector V, generate a square matrix with the k-th diagonal containing a copy of the vector; all other elements are set to zero

  • Given matrix X, generate a matrix with the k-th diagonal containing a copy of the k-th diagonal of X; all other elements are set to zero

  • If X is an expression, the evaluation of the expression aims to calculate only the diagonal elements

  • The argument k is optional; by default the main diagonal is used (k = 0)

  • For k > 0, the k-th super-diagonal is used (above main diagonal, towards top-right corner)

  • For k < 0, the k-th sub-diagonal is used (below main diagonal, towards bottom-left corner)

  • Examples:
      fmat A = randu<fmat>(5, 5);
      fmat B = diagmat(A);
      fmat C = diagmat(A,1);
      
      fvec v = randu<fvec>(5);
      fmat D = diagmat(v);
      fmat E = diagmat(v,1);
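
  • Illustration of diagmat(V, k): a plain C++ sketch, not Bandicoot code. The helper name diagmat_from_vector() is hypothetical. It assumes column-by-column storage and assumes that the generated matrix has size (n + |k|) x (n + |k|), where n is the number of elements in V, so that the k-th diagonal can hold all of V.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not Bandicoot code): build the square matrix that holds
// v on its k-th diagonal; the result is returned in column-by-column order.
// Assumption: the result has size (n + |k|) x (n + |k|), where n = v.size().
std::vector<float> diagmat_from_vector(const std::vector<float>& v, int k)
{
  assert(k != INT_MIN);  // std::abs(INT_MIN) is undefined

  const std::size_t n    = v.size();
  const std::size_t off  = static_cast<std::size_t>(std::abs(k));
  const std::size_t size = n + off;

  std::vector<float> out(size * size, 0.0f);

  for (std::size_t i = 0; i < n; ++i)
  {
    const std::size_t row = (k < 0) ? i + off : i;  // sub-diagonal: shift down
    const std::size_t col = (k > 0) ? i + off : i;  // super-diagonal: shift right
    out[row + col * size] = v[i];
  }

  return out;
}
```

    With a 3-element vector and k = 1, the sketch produces a 4x4 matrix whose elements (0,1), (1,2) and (2,3) hold the vector.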
      

  • See also:



diagvec( X )
diagvec( X, k )
  • Extract the k-th diagonal from matrix X

  • If X is an expression, the evaluation of the expression aims to calculate only the diagonal elements

  • The argument k is optional; by default the main diagonal is extracted (k = 0)

  • For k > 0, the k-th super-diagonal is extracted (top-right corner)

  • For k < 0, the k-th sub-diagonal is extracted (bottom-left corner)

  • The extracted diagonal is interpreted as a column vector

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      fvec d = diagvec(A);
      

  • See also:



dot( A, B )
  • Dot product of A and B, treating A and B as vectors

  • Caveat: to calculate the norm of a vector, use norm() rather than sqrt(dot(A,A)); norm() is more robust, as it handles underflows and overflows

  • Examples:
      fvec a = randu<fvec>(10);
      fvec b = randu<fvec>(10);
      
      float x = dot(a,b);
      

  • See also:



find( X )
find( X, k )
find( X, k, s )
  • Return a column vector containing the indices of elements of X that are non-zero or satisfy a relational condition

  • The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

  • X is interpreted as a vector, with column-by-column ordering of the elements of X

  • Relational operators can be used instead of X, eg. A > 0.5

  • If k = 0 (default), return the indices of all non-zero elements, otherwise return at most k of their indices

  • If s = "first" (default), return at most the first k indices of the non-zero elements

  • If s = "last", return at most the last k indices of the non-zero elements

  • Caveats:
    • to clamp values to an interval, clamp() is more efficient

  • Examples:
      fmat A = randu<fmat>(5, 5);
      fmat B = randu<fmat>(5, 5);
      
      uvec q1 = find(A > B);
      uvec q2 = find(A > 0.5);
      uvec q3 = find(A > 0.5, 3, "last");
      
      // change elements of A greater than 0.5 to 1
      A.elem( find(A > 0.5) ).ones();
      

  • See also:



find_finite( X )
  • Return a column vector containing the indices of elements of X that are finite (i.e. not ±Inf and not NaN)

  • The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

  • X is interpreted as a vector, with column-by-column ordering of the elements of X

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      A(1, 1) = datum::inf;
      
      // find only finite elements
      uvec f = find_finite(A);
      

  • See also:



find_nonfinite( X )
  • Return a column vector containing the indices of elements of X that are non-finite (i.e. ±Inf or NaN)

  • The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

  • X is interpreted as a vector, with column-by-column ordering of the elements of X

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      A(1, 1) = datum::inf;
      A(2, 2) = datum::nan;
      
      // return indices of two non-finite elements
      uvec f = find_nonfinite(A);
      

  • See also:



find_nan( X )
  • Return a column vector containing the indices of elements of X that are NaN (not-a-number)

  • The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

  • X is interpreted as a vector, with column-by-column ordering of the elements of X

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      A(2, 3) = datum::nan;
      
      // indices will be { 17 }
      uvec indices = find_nan(A);
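
  • Where the index 17 comes from: with column-by-column ordering, element (row, col) of a matrix with n_rows rows has the linear index row + col * n_rows. The following plain C++ sketch (hypothetical helper names, not Bandicoot code) shows the mapping in both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: linear index of element (row, col) under
// column-by-column ordering of a matrix with n_rows rows.
std::size_t linear_index(std::size_t row, std::size_t col, std::size_t n_rows)
{
  assert(row < n_rows);
  return row + col * n_rows;
}

// Hypothetical helper: recover (row, col) from a linear index.
std::pair<std::size_t, std::size_t> row_col(std::size_t index, std::size_t n_rows)
{
  assert(n_rows > 0);
  return std::make_pair(index % n_rows, index / n_rows);
}
```

    For the 5x5 matrix above, element (2, 3) maps to 2 + 3 * 5 = 17.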
      

  • See also:



join_rows( A, B )
join_rows( A, B, C )
join_rows( A, B, C, D )

join_cols( A, B )
join_cols( A, B, C )
join_cols( A, B, C, D )

join_horiz( A, B )
join_horiz( A, B, C )
join_horiz( A, B, C, D )
 
join_vert( A, B )
join_vert( A, B, C )
join_vert( A, B, C, D )
  • join_rows() and join_horiz(): horizontal concatenation; join the corresponding rows of the given matrices; the given matrices must have the same number of rows

  • join_cols() and join_vert(): vertical concatenation; join the corresponding columns of the given matrices; the given matrices must have the same number of columns

  • Examples:
      fmat A = randu<fmat>(4, 5);
      fmat B = randu<fmat>(4, 6);
      fmat C = randu<fmat>(6, 5);
      
      fmat AB = join_rows(A, B);
      fmat AC = join_cols(A, C);
      

  • See also:



min( V )
min( M )
min( M, dim )
min( Q )
min( Q, dim )
min( A, B )

max( V )
max( M )
max( M, dim )
max( Q )
max( Q, dim )
max( A, B )
  • For vector V, return the extremum value

  • For matrix M, return the extremum value for each column (dim = 0), or each row (dim = 1)

  • The dim argument is optional; by default dim = 0 is used

  • For two matrices A and B, return a matrix containing element-wise extremum values

  • Examples:
      fcolvec v = randu<fcolvec>(10);
      float x = max(v);
      
      fmat M = randu<fmat>(10, 10);
      
      frowvec a = max(M);
      frowvec b = max(M, 0);
      fcolvec c = max(M, 1);
      
      // element-wise maximum
      fmat X = randu<fmat>(5, 6);
      fmat Y = randu<fmat>(5, 6);
      fmat Z = coot::max(X, Y); // use coot:: prefix to distinguish from std::max()
      

  • See also:



norm( X )
norm( X, p )
  • Compute the p-norm of X, where X is a vector or matrix

  • For vectors, p is an integer ≥ 1, or one of: "-inf", "inf", "fro"

  • For matrices, p is one of: 1, 2, "inf", "fro"

  • "-inf" is the minimum quasi-norm, "inf" is the maximum norm, "fro" is the Frobenius norm

  • The argument p is optional; by default p = 2 is used

  • For vector norm with p = 2 and matrix norm with p = "fro", a robust algorithm is used to reduce the likelihood of underflows and overflows

  • Caveats:
    • to obtain the zero/Hamming pseudo-norm (the number of non-zero elements), use this expression: accu(X != 0)
    • matrix 2-norm (spectral norm) is based on SVD, which is computationally intensive for large matrices

  • Examples:
      fvec q = randu<fvec>(5);
      
      float x = norm(q, 2);
      float y = norm(q, "inf");
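
  • Illustration of a robust 2-norm: this document does not state which algorithm Bandicoot uses. The plain C++ sketch below shows one common approach, scaling by the largest magnitude before squaring. The helper name robust_norm2() is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not Bandicoot code): 2-norm computed with scaling.
// Squaring a value such as 3e30f directly would exceed the range of float;
// dividing by the largest magnitude first keeps every square within [0, 1].
float robust_norm2(const std::vector<float>& v)
{
  float max_abs = 0.0f;
  for (float x : v)
  {
    max_abs = std::max(max_abs, std::fabs(x));
  }

  if (max_abs == 0.0f)
  {
    return 0.0f;  // empty or all-zero input; also avoids division by zero
  }

  float acc = 0.0f;
  for (float x : v)
  {
    const float s = x / max_abs;  // |s| <= 1
    acc += s * s;
  }

  return max_abs * std::sqrt(acc);
}
```

    The same idea protects against underflow, since very small elements are scaled up before being squared.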
      

  • See also:



normalise( V )
normalise( V, p )

normalise( X )
normalise( X, p )
normalise( X, p, dim )
  • For vector V, return its normalised version (ie. having unit p-norm)

  • For matrix X, return its normalised version, where each column (dim = 0) or row (dim = 1) has been normalised to have unit p-norm

  • The p argument is optional; by default p = 2 is used

  • The dim argument is optional; by default dim = 0 is used

  • Examples:
      fvec A = randu<fvec>(10);
      fvec B = normalise(A);
      fvec C = normalise(A, 1);
      
      fmat X = randu<fmat>(5, 6);
      fmat Y = normalise(X);
      fmat Z = normalise(X, 2, 1);
      

  • See also:



pow( A, scalar )   (form 1)
  • Element-wise power operation: raise all elements in A to the power denoted by the given scalar

  • Caveat:
    • to raise all elements to the power 2, use square() instead

  • Examples:
      fmat A = randu<fmat>(5, 6);
      fmat B = pow(A, 3.45);
      
      frowvec R = randu<frowvec>(6);
      frowvec S = pow(R, -1.0);
      

  • See also:



repmat( A, num_copies_per_row, num_copies_per_col )
  • Generate a matrix by replicating matrix A in a block-like fashion

  • The generated matrix has the following size:
      n_rows = num_copies_per_row × A.n_rows
      n_cols = num_copies_per_col × A.n_cols

  • Examples:
      fmat A = randu<fmat>(2, 3);
      
      fmat B = repmat(A, 4, 5);
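
  • Illustration of the block replication: a plain C++ sketch, not Bandicoot code. The helper name repmat_sketch() is hypothetical and the matrices are assumed to be stored column by column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): tile matrix a, which has
// a_rows x a_cols elements stored column by column.
std::vector<float> repmat_sketch(const std::vector<float>& a,
                                 std::size_t a_rows, std::size_t a_cols,
                                 std::size_t copies_per_row,
                                 std::size_t copies_per_col)
{
  assert(a.size() == a_rows * a_cols);

  const std::size_t out_rows = copies_per_row * a_rows;
  const std::size_t out_cols = copies_per_col * a_cols;

  std::vector<float> out(out_rows * out_cols);

  for (std::size_t c = 0; c < out_cols; ++c)
  {
    for (std::size_t r = 0; r < out_rows; ++r)
    {
      // element (r, c) of the output is a copy of element (r mod a_rows, c mod a_cols) of a
      out[r + c * out_rows] = a[(r % a_rows) + (c % a_cols) * a_rows];
    }
  }

  return out;
}
```

    For the example above, a 2x3 matrix replicated with 4 and 5 copies gives an 8x15 result.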
      
  • See also:



reshape( X, n_rows, n_cols )
reshape( X, size(Y) )
  • Generate a vector/matrix with given size specifications, whose elements are taken from the given object in a column-wise manner; the elements in the generated object are placed column-wise (i.e. the first column is filled up before filling the second column)

  • The layout of the elements in the generated object will be different to the layout in the given object

  • If the total number of elements in the given object is less than the specified size, the remaining elements in the generated object are set to zero

  • If the total number of elements in the given object is greater than the specified size, only a subset of elements is taken from the given object

  • Caveats:
    • to change the size without preserving data, use .set_size() instead, which is much faster
    • to grow/shrink a matrix while preserving the elements as well as the layout of the elements, use resize() instead
    • to flatten a matrix into a vector, use vectorise() instead

  • Examples:
      fmat A = randu<fmat>(10, 5);
      
      fmat B = reshape(A, 5, 10);
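
  • Illustration of the column-wise behaviour: a plain C++ sketch, not Bandicoot code. The helper name reshape_sketch() is hypothetical. Under the assumption of column-by-column storage, reshaping amounts to reusing the same linear sequence of elements under new dimensions, padding with zeros or dropping trailing elements as described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): x holds the source matrix in
// column-by-column order; the result holds new_rows x new_cols elements
// in the same order.
std::vector<float> reshape_sketch(const std::vector<float>& x,
                                  std::size_t new_rows, std::size_t new_cols)
{
  // elements without a source value stay at zero
  std::vector<float> out(new_rows * new_cols, 0.0f);

  // if the source has more elements than needed, only the leading ones are taken
  const std::size_t n = std::min(x.size(), out.size());
  std::copy_n(x.begin(), n, out.begin());

  return out;
}
```

    Reshaping a 2x3 matrix holding 1..6 into 3x2 keeps the linear sequence 1..6, but element (0,1) changes from 3 to 4, which is why the layout differs from that of the given object.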
      

  • See also:



resize( X, n_rows, n_cols )
resize( X, size(Y) )
  • Generate a vector/matrix with given size specifications, whose elements as well as the layout of the elements are taken from the given object

  • Caveat: to change the size without preserving data, use .set_size() instead, which is much faster

  • Examples:
      fmat A = randu<fmat>(4, 5);
      
      fmat B = resize(A, 7, 6);
      

  • See also:



size( X )
size( n_rows, n_cols )
  • Obtain the dimensions of object X, or explicitly specify the dimensions

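  • The dimensions can be used in conjunction with object constructors, generator functions (eg. zeros() and ones()), member functions such as .randu(), and submatrix views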

  • The dimensions support simple arithmetic operations; they can also be printed and compared for equality/inequality

  • Caveat: to prevent interference from std::size() in C++17, preface Bandicoot's size() with the coot namespace qualification, eg. coot::size(X)

  • Examples:
      fmat A(5,6);
      
      fmat B = zeros<fmat>(size(A));
      
      fmat C;
      C.randu(size(A));
      
      fmat D = ones<fmat>(size(A));
      
      fmat E = ones<fmat>(10, 20);
      E(3, 4, size(C)) = C;    // access submatrix of E
      
      fmat F( size(A) + size(E) );
      
      fmat G( size(A) * 2 );
      
      cout << "size of A: " << size(A) << endl;
      
      bool is_same_size = (size(A) == size(E));
      

  • See also:



sort( V )
sort( V, sort_direction )

sort( X )
sort( X, sort_direction )
sort( X, sort_direction, dim )
  • For vector V, return a vector which is a sorted version of the input vector

  • For matrix X, return a matrix with the elements of the input matrix sorted in each column (dim = 0), or each row (dim = 1)

  • The dim argument is optional; by default dim = 0 is used

  • The sort_direction argument is optional; sort_direction is either "ascend" or "descend"; by default "ascend" is used

  • The sorting algorithm used is radix sort

  • Examples:
      fmat A = randu<fmat>(10, 10);
      fmat B = sort(A);
      
  • See also:



sort_index( X )
sort_index( X, sort_direction )

stable_sort_index( X )
stable_sort_index( X, sort_direction )
  • Return a vector which describes the sorted order of the elements of X (i.e. it contains the indices of the elements of X)

  • The output vector must have the type uvec (i.e. the indices are stored as unsigned integers of type uword)

  • X is interpreted as a vector, with column-by-column ordering of the elements of X

  • The sort_direction argument is optional; sort_direction is either "ascend" or "descend"; by default "ascend" is used

  • The stable_sort_index() variant preserves the relative order of elements with equivalent values

  • The sorting algorithm used is radix sort

  • Examples:
      fvec q = randu<fvec>(10);
      
      uvec indices = sort_index(q);
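
  • Illustration of what the index vector means: a plain C++ sketch, not Bandicoot code. The helper name stable_sort_index_sketch() is hypothetical, and it uses std::stable_sort from the standard library instead of the radix sort mentioned above; only the resulting indices are meant to be comparable.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not Bandicoot code): return the indices that put v in
// ascending order; elements with equal values keep their original relative order.
std::vector<std::size_t> stable_sort_index_sketch(const std::vector<float>& v)
{
  std::vector<std::size_t> idx(v.size());
  std::iota(idx.begin(), idx.end(), std::size_t(0));  // 0, 1, 2, ...

  // sort the indices by comparing the values they refer to
  std::stable_sort(idx.begin(), idx.end(),
                   [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });

  return idx;
}
```

    For the values 3, 1, 2, 1 the sketch returns the indices 1, 3, 2, 0: the two elements equal to 1 appear in their original order, which is the guarantee that distinguishes stable_sort_index() from sort_index().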
      

  • See also:



sum( V )
sum( M )
sum( M, dim )
  • For vector V, return the sum of all elements

  • For matrix M, return the sum of elements in each column (dim = 0), or each row (dim = 1)

  • The dim argument is optional; by default dim = 0 is used

  • Caveat: to get a sum of all the elements regardless of the object type (i.e. vector or matrix), use accu() instead

  • Examples:
      fcolvec v = randu<fcolvec>(10);
      float x = sum(v);
      
      fmat M = randu<fmat>(10, 10);
      
      frowvec a = sum(M);
      frowvec b = sum(M, 0);
      fcolvec c = sum(M, 1);
      
      float y = accu(M);   // find the overall sum regardless of object type
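
  • Illustration of sum() with dim versus accu(): a plain C++ sketch, not Bandicoot code. The helper names sum_along() and accu_sketch() are hypothetical and the matrix is assumed to be stored column by column.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not Bandicoot code): sum of each column (dim = 0)
// or each row (dim = 1) of an n_rows x n_cols matrix.
std::vector<float> sum_along(const std::vector<float>& m,
                             std::size_t n_rows, std::size_t n_cols, int dim)
{
  assert(m.size() == n_rows * n_cols);
  assert(dim == 0 || dim == 1);

  std::vector<float> out((dim == 0) ? n_cols : n_rows, 0.0f);

  for (std::size_t c = 0; c < n_cols; ++c)
  {
    for (std::size_t r = 0; r < n_rows; ++r)
    {
      out[(dim == 0) ? c : r] += m[r + c * n_rows];
    }
  }

  return out;
}

// Hypothetical helper: sum of all elements, regardless of shape.
float accu_sketch(const std::vector<float>& m)
{
  return std::accumulate(m.begin(), m.end(), 0.0f);
}
```

    For a 2x3 matrix with columns (1, 2), (3, 4) and (5, 6), dim = 0 gives 3, 7, 11, dim = 1 gives 9, 12, and the overall sum is 21.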
      

  • See also:



symmatu( A )
symmatl( A )
  • symmatu(A): generate symmetric matrix from square matrix A, by reflecting the upper triangle to the lower triangle

  • symmatl(A): generate symmetric matrix from square matrix A, by reflecting the lower triangle to the upper triangle

  • If A is non-square, a std::logic_error exception is thrown

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      fmat B = symmatu(A);
      fmat C = symmatl(A);
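
  • Illustration of the reflection: a plain C++ sketch, not Bandicoot code. The helper name symmat_sketch() is hypothetical and the matrix is assumed to be stored column by column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): make an n x n matrix symmetric.
// from_upper = true  mimics symmatu (upper triangle copied to lower triangle)
// from_upper = false mimics symmatl (lower triangle copied to upper triangle)
std::vector<float> symmat_sketch(const std::vector<float>& a, std::size_t n,
                                 bool from_upper)
{
  assert(a.size() == n * n);  // the input must be square

  std::vector<float> out(a);

  for (std::size_t c = 0; c < n; ++c)
  {
    for (std::size_t r = c + 1; r < n; ++r)
    {
      const std::size_t lower = r + c * n;  // element (r, c), below the diagonal
      const std::size_t upper = c + r * n;  // element (c, r), above the diagonal

      if (from_upper) { out[lower] = a[upper]; }
      else            { out[upper] = a[lower]; }
    }
  }

  return out;
}
```

    For the 2x2 matrix with rows (1, 3) and (2, 4), reflecting the upper triangle gives rows (1, 3) and (3, 4), while reflecting the lower triangle gives rows (1, 2) and (2, 4).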
      

  • See also:



trace( X )
  • Sum of the elements on the main diagonal of matrix X

  • If X is an expression, the evaluation of the expression aims to calculate only the diagonal elements

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      float x = trace(A);
      

  • See also:



trans( A )
strans( A )


vectorise( X )
vectorise( X, dim )
  • Generate a flattened version of matrix X

  • The argument dim is optional; by default dim = 0 is used

  • For dim = 0, the elements are copied from X column-wise, resulting in a column vector; equivalent to concatenating all the columns of X

  • For dim = 1, the elements are copied from X row-wise, resulting in a row vector; equivalent to concatenating all the rows of X

  • Caveat: column-wise vectorisation is faster than row-wise vectorisation

  • Examples:
      fmat X = randu<fmat>(4, 5);
      
      fvec v = vectorise(X);
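
  • Illustration of the two orderings: a plain C++ sketch, not Bandicoot code. The helper name vectorise_sketch() is hypothetical and the matrix is assumed to be stored column by column. Under that assumption, dim = 0 is a straight copy of the storage, while dim = 1 has to gather elements with a stride, which is one plausible reason for the speed difference noted in the caveat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): flatten an n_rows x n_cols matrix.
std::vector<float> vectorise_sketch(const std::vector<float>& x,
                                    std::size_t n_rows, std::size_t n_cols,
                                    int dim)
{
  assert(x.size() == n_rows * n_cols);
  assert(dim == 0 || dim == 1);

  if (dim == 0)
  {
    return x;  // column-wise: identical to the storage order
  }

  // row-wise: concatenate the rows, reading the storage with stride n_rows
  std::vector<float> out(x.size());
  for (std::size_t r = 0; r < n_rows; ++r)
  {
    for (std::size_t c = 0; c < n_cols; ++c)
    {
      out[r * n_cols + c] = x[r + c * n_rows];
    }
  }

  return out;
}
```

    For a 2x3 matrix with columns (1, 2), (3, 4) and (5, 6), dim = 0 gives 1, 2, 3, 4, 5, 6 and dim = 1 gives 1, 3, 5, 2, 4, 6.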
      

  • See also:



miscellaneous element-wise functions:
    exp          log          square    floor    erf     sign
    exp2         log2         sqrt      ceil     erfc    lgamma
    exp10        log10                  round
    trunc_exp    trunc_log              trunc
  • Apply a function to each element

  • Usage:
    • B = fn(A), where fn(A) is one of the functions below
    • A and B must have the same matrix or vector type, such as fmat or ivec

    exp(A)         base-e exponential: e^x
    exp2(A)        base-2 exponential: 2^x
    exp10(A)       base-10 exponential: 10^x
    trunc_exp(A)   base-e exponential, truncated to avoid infinity   (only for float and double elements)
    log(A)         natural log: log_e(x)
    log2(A)        base-2 log: log_2(x)
    log10(A)       base-10 log: log_10(x)
    trunc_log(A)   natural log, truncated to avoid ±infinity   (only for float and double elements)
    square(A)      square: x^2
    sqrt(A)        square root: √x
    floor(A)       largest integral value that is not greater than the input value
    ceil(A)        smallest integral value that is not less than the input value
    round(A)       round to nearest integer, with halfway cases rounded away from zero
    trunc(A)       round to nearest integer, towards zero
    erf(A)         error function   (only for float and double elements)
    erfc(A)        complementary error function   (only for float and double elements)
    lgamma(A)      natural log of the absolute value of gamma function   (only for float and double elements)
    sign(A)        signum function; for each element a in A, the corresponding element b in B is:
                     b = −1 if a < 0
                     b =  0 if a = 0
                     b = +1 if a > 0
                   if a is complex and non-zero, then b = a / abs(a)

  • Caveat: all of the above functions are applied element-wise, where each element is treated independently

  • Examples:
      fmat A = randu<fmat>(5, 5);
      fmat B = exp(A);
      

  • See also:



trigonometric element-wise functions (cos, sin, tan, ...)




Decompositions, Factorisations, and Inverses



R = chol( X )   (form 1)
chol( R, X )   (form 2)
  • Cholesky decomposition of symmetric/hermitian matrix X into triangular matrix R

  • By default, R is upper triangular

  • X is required to be positive definite

  • The decomposition has the form X = R.t() * R

  • If the decomposition fails:
    • the form R = chol(X) resets R and throws a std::runtime_error exception
    • the form chol(R, X) resets R and returns a bool set to false (exception is not thrown)

  • Caveat: there is no explicit check that X is symmetric or positive definite

  • Examples:
      fmat A = randu<fmat>(5, 5);
      fmat X = A.t() * A;
      
      fmat R1 = chol(X);
      fmat R2;
      bool ok = chol(R2, X);
      

  • See also:



vec eigval = eig_sym( X )

eig_sym( eigval, X )

eig_sym( eigval, eigvec, X )
  • Eigendecomposition of symmetric/hermitian matrix X

  • The eigenvalues and corresponding eigenvectors are stored in eigval and eigvec, respectively

  • The eigenvalues are in ascending order

  • The eigenvectors are stored as column vectors

  • If X is not square sized, a std::logic_error exception is thrown

  • If the decomposition fails:
    • eigval = eig_sym(X) resets eigval and throws a std::runtime_error exception
    • eig_sym(eigval,X) resets eigval and returns a bool set to false (exception is not thrown)
    • eig_sym(eigval,eigvec,X) resets eigval & eigvec and returns a bool set to false (exception is not thrown)

  • Caveats:
    • there is no explicit check whether X is symmetric/hermitian
    • if eigenvectors are not necessary, it is more efficient to use a form that does not compute them (i.e. eig_sym(eigval, X))

  • Examples:
      // for matrices with real elements
      
      fmat A = randu<fmat>(50, 50);
      fmat B = A.t()*A;  // generate a symmetric matrix
      
      fvec eigval;
      fmat eigvec;
      
      eig_sym(eigval, eigvec, B);
      

  • See also:



lu( L, U, P, X )
lu( L, U, X )
  • Lower-upper decomposition (with partial pivoting) of matrix X

  • The first form provides a lower-triangular matrix L, an upper-triangular matrix U, and a permutation matrix P, such that P.t()*L*U = X

  • The second form provides permuted L and U, such that L*U = X; note that in this case L is generally not lower-triangular

  • If the decomposition fails:
    • lu(L,U,P,X) resets L, U, P and returns a bool set to false (exception is not thrown)
    • lu(L,U,X) resets L, U and returns a bool set to false (exception is not thrown)

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      fmat L, U, P;
      
      lu(L, U, P, A);
      
      fmat B = P.t() * L * U;
      

  • See also:



B = pinv( A )
B = pinv( A, tolerance )

pinv( B, A )
pinv( B, A, tolerance )
  • Moore-Penrose pseudo-inverse (generalised inverse) of matrix A

  • The computation is based on singular value decomposition

  • The tolerance argument is optional

  • The default tolerance is set to max_rc · max_sv · epsilon, where:
    • max_rc = max(A.n_rows, A.n_cols)
    • max_sv = maximum singular value of A
    • epsilon = difference between 1 and the least value greater than 1 that is representable

  • Any singular values less than tolerance are treated as zero

  • If the decomposition fails:
    • B = pinv(A) resets B and throws a std::runtime_error exception
    • pinv(B,A) resets B and returns a bool set to false (exception is not thrown)

  • Examples:
      fmat A = randu<fmat>(4, 5);
      
      fmat B = pinv(A);        // use default tolerance
      
      fmat C = pinv(A, 0.01);  // set tolerance to 0.01
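
  • Illustration of the default tolerance: a plain C++ sketch, not Bandicoot code. The helper name default_pinv_tolerance() is hypothetical; it evaluates the formula max_rc · max_sv · epsilon given above for float elements, taking the singular values as an input rather than computing them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not Bandicoot code): default tolerance for a matrix
// with n_rows x n_cols float elements and the given singular values.
float default_pinv_tolerance(std::size_t n_rows, std::size_t n_cols,
                             const std::vector<float>& singular_values)
{
  assert(!singular_values.empty());

  const float max_rc  = static_cast<float>(std::max(n_rows, n_cols));
  const float max_sv  = *std::max_element(singular_values.begin(),
                                          singular_values.end());
  const float epsilon = std::numeric_limits<float>::epsilon();

  return max_rc * max_sv * epsilon;
}
```

    For a 4x5 matrix whose largest singular value is 2, the tolerance is 10 times the float epsilon; singular values below that threshold would be treated as zero.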
      

  • See also:



s = svd( X )

svd( s, X )

svd( U, s, V, X )
  • Singular value decomposition of matrix X into vector of singular values s and matrices of left/right singular vectors U, V

  • If X is square, it can be reconstructed using X = U*diagmat(s)*V.t()

  • The singular values are in descending order

  • If the decomposition fails:
    • s = svd(X) resets s and throws a std::runtime_error exception
    • svd(s,X) resets s and returns a bool set to false (exception is not thrown)
    • svd(U,s,V,X) resets U, s, V and returns a bool set to false (exception is not thrown)

  • Examples:
      fmat X = randu<fmat>(5, 5);
      
      fmat U;
      fvec s;
      fmat V;
      
      svd(U, s, V, X);
      

  • See also:





Signal & Image Processing



conv( A, B )
conv( A, B, shape )
  • 1D convolution of vectors A and B

  • The orientation of the result vector is the same as the orientation of A (ie. either column or row vector)

  • The shape argument is optional; it is one of:
        "full" = return the full convolution (default setting), with the size equal to A.n_elem + B.n_elem - 1
        "same" = return the central part of the convolution, with the same size as vector A

  • The convolution operation is also equivalent to FIR filtering

  • Examples:
      fvec A = randu<fvec>(256);
      
      fvec B = randu<fvec>(16);
      
      fvec C = conv(A, B);
      
      fvec D = conv(A, B, "same");
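
  • Illustration of the "full" and "same" shapes: a plain C++ sketch, not Bandicoot code. The helper names conv_full() and conv_same() are hypothetical. The offset used to cut out the central part (half the length of B, rounded down) is an assumption; this document only states that the central part is returned.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): full 1D convolution,
// with a.size() + b.size() - 1 output elements.
std::vector<float> conv_full(const std::vector<float>& a,
                             const std::vector<float>& b)
{
  if (a.empty() || b.empty()) { return std::vector<float>(); }

  std::vector<float> out(a.size() + b.size() - 1, 0.0f);

  for (std::size_t i = 0; i < a.size(); ++i)
  {
    for (std::size_t j = 0; j < b.size(); ++j)
    {
      out[i + j] += a[i] * b[j];
    }
  }

  return out;
}

// Hypothetical helper: central part of the full convolution, same size as a.
// Assumption: the central part starts at offset b.size() / 2.
std::vector<float> conv_same(const std::vector<float>& a,
                             const std::vector<float>& b)
{
  const std::vector<float> full = conv_full(a, b);
  if (full.empty()) { return full; }

  const std::ptrdiff_t start = static_cast<std::ptrdiff_t>(b.size() / 2);
  const std::ptrdiff_t len   = static_cast<std::ptrdiff_t>(a.size());

  return std::vector<float>(full.begin() + start, full.begin() + start + len);
}
```

    For A = (1, 2, 3) and B = (1, 1), the full result is (1, 3, 5, 3) and, under the stated assumption, the "same" result is (3, 5, 3).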
      

  • See also:



conv2( A, B )
conv2( A, B, shape )
  • 2D convolution of matrices A and B

  • The shape argument is optional; it is one of:
        "full" = return the full convolution (default setting), with the size equal to size(A) + size(B) - 1
        "same" = return the central part of the convolution, with the same size as matrix A

  • Examples:
      fmat A = randu<fmat>(256, 256);
      
      fmat B = randu<fmat>(16, 16);
      
      fmat C = conv2(A, B);
      
      fmat D = conv2(A, B, "same");
      

  • See also:





Statistics



mean, median, stddev, var, range

    mean (average value):
        mean( V )
        mean( M )
        mean( M, dim )

    median:
        median( V )
        median( M )
        median( M, dim )

    standard deviation:
        stddev( V )
        stddev( V, norm_type )
        stddev( M )
        stddev( M, norm_type )
        stddev( M, norm_type, dim )

    variance:
        var( V )
        var( V, norm_type )
        var( M )
        var( M, norm_type )
        var( M, norm_type, dim )

    range (difference between max and min):
        range( V )
        range( M )
        range( M, dim )

  • For vector V, return the statistic calculated using all the elements of the vector

  • For matrix M, find the statistic for each column (dim = 0), or each row (dim = 1)

  • The dim argument is optional; by default dim = 0 is used

  • The norm_type argument is optional; by default norm_type = 0 is used

  • For the var() and stddev() functions:
    • the default norm_type = 0 performs normalisation using N-1 (where N is the number of samples), providing the best unbiased estimator
    • using norm_type = 1 performs normalisation using N, which provides the second moment around the mean

  • Caveat: to obtain statistics for integer matrices/vectors (eg. umat, imat, uvec, ivec), convert to a matrix/vector with floating point values (eg. mat, vec) using the conv_to() function

  • Examples:
      fmat A = randu<fmat>(5, 5);
      
      fmat B  = mean(A);
      fmat C  = var(A);
      float m = mean(mean(A));
      
      fvec v = randu<fvec>(5);
      float x = var(v);
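
  • Illustration of norm_type: a plain C++ sketch, not Bandicoot code. The helper names var_sketch() and stddev_sketch() are hypothetical. They show the difference between normalising by N-1 (norm_type = 0) and by N (norm_type = 1) for a vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Bandicoot code): variance of the elements of v.
// norm_type = 0 divides by N-1; norm_type = 1 divides by N.
float var_sketch(const std::vector<float>& v, int norm_type)
{
  assert(norm_type == 0 || norm_type == 1);
  assert(v.size() >= 2);  // keeps the N-1 divisor non-zero

  const float n = static_cast<float>(v.size());

  float mean = 0.0f;
  for (float x : v) { mean += x; }
  mean /= n;

  float sum_sq_dev = 0.0f;
  for (float x : v) { sum_sq_dev += (x - mean) * (x - mean); }

  return sum_sq_dev / ((norm_type == 0) ? (n - 1.0f) : n);
}

// Hypothetical helper: standard deviation is the square root of the variance.
float stddev_sketch(const std::vector<float>& v, int norm_type)
{
  return std::sqrt(var_sketch(v, norm_type));
}
```

    For the values 1, 2, 3, 4 the squared deviations from the mean (2.5) add up to 5, so the variance is 5/3 with norm_type = 0 and 1.25 with norm_type = 1.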
      

  • See also:



cov( X, Y )
cov( X, Y, norm_type )

cov( X )
cov( X, norm_type )
  • For two matrix arguments X and Y, if each row of X and Y is an observation and each column is a variable, the (i,j)-th entry of cov(X,Y) is the covariance between the i-th variable in X and the j-th variable in Y

  • For vector arguments, the type of vector is ignored and each element in the vector is treated as an observation

  • For matrices, X and Y must have the same dimensions

  • For vectors, X and Y must have the same number of elements

  • cov(X) is equivalent to cov(X, X)

  • The norm_type argument is optional; by default norm_type = 0 is used

  • The norm_type argument controls the type of normalisation used, with N denoting the number of observations:
    • for norm_type = 0, normalisation is done using N-1, providing the best unbiased estimation of the covariance matrix (if the observations are from a normal distribution)
    • for norm_type = 1, normalisation is done using N, which provides the second moment matrix of the observations about their mean

  • Examples:
      fmat X = randu<fmat>(4, 5);
      fmat Y = randu<fmat>(4, 5);
      
      fmat C = cov(X, Y);
      fmat D = cov(X, Y, 1);
      




cor( X, Y )
cor( X, Y, norm_type )

cor( X )
cor( X, norm_type )
  • For two matrix arguments X and Y, if each row of X and Y is an observation and each column is a variable, the (i,j)-th entry of cor(X,Y) is the correlation coefficient between the i-th variable in X and the j-th variable in Y

  • For vector arguments, the type of vector is ignored and each element in the vector is treated as an observation

  • For matrices, X and Y must have the same dimensions

  • For vectors, X and Y must have the same number of elements

  • cor(X) is equivalent to cor(X, X)

  • The norm_type argument is optional; by default norm_type = 0 is used

  • The norm_type argument controls the type of normalisation used, with N denoting the number of observations:
    • for norm_type = 0, normalisation is done using N-1
    • for norm_type = 1, normalisation is done using N

  • Examples:
      fmat X = randu<fmat>(4, 5);
      fmat Y = randu<fmat>(4, 5);
      
      fmat R = cor(X, Y);
      fmat S = cor(X, Y, 1);
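
  • A correlation coefficient is a covariance divided by the product of the two standard deviations. The following is a minimal sketch in plain C++ for two vectors of observations (no Bandicoot calls); the name cor_sketch is hypothetical and is not part of the Bandicoot API. When the same normalisation is applied to the covariance and to both variances, the factor cancels in exact arithmetic, so this sketch returns the same value (up to rounding) for either norm_type; whether Bandicoot's own result behaves identically is not stated here and is not assumed:

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Correlation coefficient between the observations in x and y.
// norm_type = 0 normalises by N-1; norm_type = 1 normalises by N.
double cor_sketch(const std::vector<double>& x, const std::vector<double>& y, const int norm_type = 0)
  {
  const std::size_t N = x.size();
  if (N < 2 || y.size() != N) { throw std::invalid_argument("cor_sketch(): need two vectors of equal length >= 2"); }

  double mx = 0.0;
  double my = 0.0;
  for (std::size_t i = 0; i < N; ++i) { mx += x[i]; my += y[i]; }
  mx /= double(N);
  my /= double(N);

  // sums of squared deviations and of cross products
  double sxx = 0.0;
  double syy = 0.0;
  double sxy = 0.0;
  for (std::size_t i = 0; i < N; ++i)
    {
    sxx += (x[i] - mx) * (x[i] - mx);
    syy += (y[i] - my) * (y[i] - my);
    sxy += (x[i] - mx) * (y[i] - my);
    }

  const double denom = (norm_type == 0) ? double(N - 1) : double(N);
  const double var_x  = sxx / denom;
  const double var_y  = syy / denom;
  const double cov_xy = sxy / denom;

  if (var_x == 0.0 || var_y == 0.0) { throw std::invalid_argument("cor_sketch(): constant input has no defined correlation"); }

  return cov_xy / std::sqrt(var_x * var_y);
  }
```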
      






Miscellaneous



backend configuration
  • Bandicoot can use either CUDA or OpenCL as a hardware backend

  • To enable CUDA or OpenCL, set the COOT_USE_CUDA or COOT_USE_OPENCL macros in the Bandicoot configuration

  • If both backends are enabled, select the default backend by setting the COOT_DEFAULT_BACKEND macro to the desired backend (e.g. #define COOT_DEFAULT_BACKEND CL_BACKEND)

  • By default, at the time of first usage, Bandicoot will automatically initialise to use the first available device with the default backend

  • Bandicoot can also be manually initialised using the coot_init() function:

    coot_init( )     default initialisation
    coot_init( print_info )     initialise to default backend, optionally printing information about the chosen GPU device
    coot_init( "opencl", print_info )     initialise to OpenCL backend; COOT_USE_OPENCL must be enabled
    coot_init( "opencl", print_info, platform_id, device_id )     use a specific OpenCL platform ID and device ID
    coot_init( "cuda", print_info )     initialise to CUDA backend; COOT_USE_CUDA must be enabled
    coot_init( "cuda", print_info, device_id )     use a specific CUDA device ID

    • coot_init() returns a boolean indicating whether or not initialisation was successful

    • if print_info is set to true, information about the selected GPU device will be printed

    • for the "opencl" initialisations, platform_id and device_id specify the desired OpenCL platform and device IDs; available platforms and devices can be listed using the clinfo command-line utility, available in most package managers: clinfo -l

    • for the "cuda" initialisations, device_id specifies the desired CUDA device; available device IDs can be listed with the nvidia-smi command-line utility

  • Caveats:
    • calling coot_init() manually must be done before any other Bandicoot operations
    • coot_init() can only be called once
    • if either caveat above is violated when calling coot_init(), a std::runtime_error exception will be thrown

  • At any time, all asynchronous operations can be forced to complete by calling coot_synchronise()
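
  • The manual initialisation described above might be used as in the following minimal sketch. It uses only the coot_init() forms and coot_synchronise() listed in this section; it assumes that both functions live in the coot namespace, that the OpenCL backend is enabled, and that an OpenCL device is present. It has not been run against a specific Bandicoot build:

```cpp
#include <iostream>
#include <stdexcept>
#include <bandicoot>

int main()
  {
  try
    {
    // must happen once, before any other Bandicoot operation;
    // the second argument asks for information about the chosen device
    const bool ok = coot::coot_init("opencl", true);

    if (!ok)
      {
      std::cerr << "could not initialise the OpenCL backend" << std::endl;
      return 1;
      }
    }
  catch (const std::runtime_error& e)
    {
    // documented outcome when coot_init() is called too late or more than once
    std::cerr << "coot_init() failed: " << e.what() << std::endl;
    return 1;
    }

  coot::fmat A = coot::randu<coot::fmat>(4, 5);
  coot::fmat B = A * A.t();

  // wait for all pending asynchronous operations to complete
  coot::coot_synchronise();

  B.print("B =");

  return 0;
  }
```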




constants (pi, inf, eps, ...)
    datum::pi   π, the ratio of any circle's circumference to its diameter
    datum::tau   τ, the ratio of any circle's circumference to its radius (equivalent to 2π)
    datum::inf   ∞, infinity
    datum::nan   “not a number” (NaN); caveat: NaN is not equal to anything, even itself
         
    datum::eps   machine epsilon; approximately 2.2204e-16 for double; difference between 1 and the next representable value
    datum::e   base of the natural logarithm
    datum::sqrt2   square root of 2
         
    datum::log_min   log of minimum non-zero value (type and machine dependent)
    datum::log_max   log of maximum value (type and machine dependent)
    datum::euler   Euler's constant, aka Euler-Mascheroni constant
         
    datum::gratio   golden ratio
    datum::m_u   atomic mass constant (in kg)
    datum::N_A   Avogadro constant
         
    datum::k   Boltzmann constant (in joules per kelvin)
    datum::k_evk   Boltzmann constant (in eV/K)
    datum::a_0   Bohr radius (in meters)
         
    datum::mu_B   Bohr magneton
    datum::Z_0   characteristic impedance of vacuum (in ohms)
    datum::G_0   conductance quantum (in siemens)
         
    datum::k_e   Coulomb's constant (in meters per farad)
    datum::eps_0   electric constant (in farads per meter)
    datum::m_e   electron mass (in kg)
         
    datum::eV   electron volt (in joules)
    datum::ec   elementary charge (in coulombs)
    datum::F   Faraday constant (in coulombs per mole)
         
    datum::alpha   fine-structure constant
    datum::alpha_inv   inverse fine-structure constant
    datum::K_J   Josephson constant
         
    datum::mu_0   magnetic constant (in henries per meter)
    datum::phi_0   magnetic flux quantum (in webers)
    datum::R   molar gas constant (in joules per mole kelvin)
         
    datum::G   Newtonian constant of gravitation (in newton square meters per kilogram squared)
    datum::h   Planck constant (in joule seconds)
    datum::h_bar   Planck constant over 2 pi, aka reduced Planck constant (in joule seconds)
         
    datum::m_p   proton mass (in kg)
    datum::R_inf   Rydberg constant (in reciprocal meters)
    datum::c_0   speed of light in vacuum (in meters per second)
         
    datum::sigma   Stefan-Boltzmann constant
    datum::R_k   von Klitzing constant (in ohms)
    datum::b   Wien wavelength displacement law constant

  • The constants are stored in the Datum<type> class, where type is either float or double;
    for convenience, Datum<double> is typedefed as datum, and Datum<float> is typedefed as fdatum

  • Caveat: datum::nan is not equal to anything, even itself; to check whether a scalar x is finite, use std::isfinite(x)

  • The physical constants were mainly taken from NIST 2018 CODATA values, and some from WolframAlpha (as of 2009-06-23)

  • Examples:
      cout << "speed of light = " << datum::c_0 << endl;
      
      cout << "log_max for floats = ";
      cout << fdatum::log_max << endl;
      
      cout << "log_max for doubles = ";
      cout << datum::log_max << endl;
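
  • The NaN caveat and the type dependence of machine epsilon can be illustrated with the C++ standard library alone. In the following minimal sketch, std::numeric_limits is used as a stand-in for the corresponding datum and fdatum members; that correspondence is an assumption, and the helper names equals_itself and machine_eps are hypothetical:

```cpp
#include <cmath>
#include <limits>

// Returns false only for NaN, since NaN compares unequal to everything, itself included.
template<typename T>
bool equals_itself(const T x)
  {
  return x == x;
  }

// Difference between 1 and the next representable value of type T.
template<typename T>
T machine_eps()
  {
  return std::numeric_limits<T>::epsilon();
  }

// Preferred check before using a scalar in further arithmetic:
// false for NaN as well as for +inf and -inf.
bool is_usable(const double x)
  {
  return std::isfinite(x);
  }
```

    Comparing a value against NaN with == never succeeds, so a test such as x == nan cannot detect NaN; std::isnan(x) or std::isfinite(x) should be used instead.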
      



wall_clock
  • Simple timer class for measuring the number of elapsed seconds

  • An instance of the class has two member functions:

    .tic()   start the timer
    .toc()   return the number of seconds since the last call to .tic()

  • Examples:
      wall_clock timer;
      
      timer.tic();
      
      // ... do something ...
      
      double n = timer.toc();
      
      cout << "number of seconds: " << n << endl;
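
  • For readers curious how a tic/toc timer of this kind can work, the following is a minimal sketch built on std::chrono. The class name timer_sketch is hypothetical; this is not Bandicoot's implementation of wall_clock, only an illustration of the same interface:

```cpp
#include <chrono>

// Minimal tic/toc timer (hypothetical; illustrates the interface only).
class timer_sketch
  {
  public:

  // remember the current time as the start of the measured interval
  void tic()
    {
    start = std::chrono::steady_clock::now();
    }

  // seconds elapsed since the last call to tic(), as a floating point value
  double toc() const
    {
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
    }

  private:

  // initialised at construction so that toc() is well defined even without tic()
  std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
  };
```

    std::chrono::steady_clock is used because it never goes backwards, which keeps the measured interval non-negative even if the system time is adjusted.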
      



output streams
  • The default stream for printing matrices and cubes is std::cout;
    the stream can be changed via the COOT_COUT_STREAM define; see config.hpp

  • The default stream for printing warnings and errors is std::cerr;
    the stream can be changed via the COOT_CERR_STREAM define; see config.hpp

  • Whether warnings are printed is controlled by the COOT_PRINT_ERRORS and COOT_DONT_PRINT_ERRORS defines; see config.hpp

  • The COOT_DONT_PRINT_ERRORS define takes precedence over the COOT_PRINT_ERRORS define




uword, sword
  • uword is a typedef for an unsigned integer type; it is used for matrix indices as well as all internal counters and loops

  • sword is a typedef for a signed integer type

  • The minimum width of both uword and sword is either 32 or 64 bits:
    • the default width is 32 bits on 32-bit platforms
    • the default width is 64 bits on 64-bit platforms
    • on most systems, uword is a typedef for size_t

  • Caveat: the Bandicoot uword and sword types are not guaranteed to be the same as the Armadillo arma::uword and arma::sword types
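
  • Because uword is unsigned, index arithmetic wraps around instead of going negative. The following minimal sketch in plain C++ shows a safe way to count downwards; std::size_t is used as a stand-in for uword (an assumption that holds on most systems according to the notes above), and the names index_t, sum_in_reverse and zero_minus_one are hypothetical:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using index_t = std::size_t;  // stand-in for uword in this sketch

// Sum the elements from last to first.
// A loop condition of "i >= 0" is always true for an unsigned index,
// so the decrement is folded into the test instead.
double sum_in_reverse(const std::vector<double>& v)
  {
  double acc = 0.0;
  for (index_t i = v.size(); i-- > 0; ) { acc += v[i]; }
  return acc;
  }

// Subtracting 1 from an unsigned zero wraps to the largest representable value.
index_t zero_minus_one()
  {
  return index_t(0) - index_t(1);
  }
```

    Use sword (or another signed type) when a quantity can legitimately be negative, for example an offset between two indices.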




Examples of Matlab/Octave syntax and conceptually corresponding Bandicoot syntax

    Matlab/Octave   Bandicoot   Notes
             
    A(1, 1)   A(0, 0)   indexing in Bandicoot starts at 0
    A(k, k)   A(k-1, k-1)    
             
    size(A,1)   A.n_rows   read only
    size(A,2)   A.n_cols    
    numel(A)   A.n_elem    
             
    A(:, k)   A.col(k)   this is a conceptual example only; exact conversion from Matlab/Octave to Bandicoot syntax will require taking into account that indexing starts at 0
    A(k, :)   A.row(k)    
    A(:, p:q)   A.cols(p, q)    
    A(p:q, :)   A.rows(p, q)    
    A(p:q, r:s)   A( span(p,q), span(r,s) )   A( span(first_row, last_row), span(first_col, last_col) )
             
    A'   A.t() or trans(A)   matrix transpose / Hermitian transpose
             
    A = zeros(size(A))   A.zeros()    
    A = ones(size(A))   A.ones()    
    A = zeros(k)   A = zeros<fmat>(k,k)    
    A = ones(k)   A = ones<fmat>(k,k)    
             
    A .* B   A % B   element-wise multiplication
    A ./ B   A / B   element-wise division
    A = A + 1;   A++    
    A = A - 1;   A--    
             
    X = A(:)   X = vectorise(A)    
    X = [ A  B ]   X = join_horiz(A,B)    
    X = [ A; B ]   X = join_vert(A,B)    
             
    A   cout << A << endl; or A.print("A =");    
             
    A = randn(2,3); B = randn(4,5);   fmat A = randn<fmat>(2,3); fmat B = randn<fmat>(4,5);    
             



Armadillo/Bandicoot conversion guide
  • Bandicoot is meant to be a GPU-accelerated linear algebra library that is API-compatible with Armadillo and thus can function as a drop-in replacement; however, due to the different architecture of the GPU and other constraints, it is not always a benefit to use Bandicoot instead of Armadillo

  • The first run of any Bandicoot program requires compiling all Bandicoot kernel functions for the given device, which can be a time-consuming process; kernels are cached and subsequent runs will use the cache
    • Upgrading Bandicoot versions may incur recompilation of kernels
    • Using a new backend for the first time may incur recompilation of kernels
    • Using a new device may incur recompilation of kernels
    • For more information see the kernel cache documentation

  • Where possible, use batch operations with Bandicoot; e.g., use A += 1 instead of for (uword i = 0; i < A.n_elem; ++i) { A[i] += 1; }

  • GPUs are best suited for operations on large matrices, so small matrices (e.g. fewer than 100 elements) may not show significant speedup

  • Individual element access (such as A.at(i, j)) requires a transfer between the GPU and CPU; when adapting Armadillo code to Bandicoot, these should be avoided wherever possible
    • If such operations cannot be avoided, consider temporarily transferring the entire Bandicoot matrix back to CPU memory by creating an Armadillo matrix with conv_to<arma::fmat>() or similar
    • For this reason, unlike Armadillo, Bandicoot does not provide iterators: they are guaranteed to be inefficient

  • Consumer-level GPUs are not designed for intensive linear algebra operations and thus may not show significant speedup; the best results will be obtained with high-end hardware

  • Most GPUs show better performance with 32-bit floating point elements (e.g. float instead of double), so using fmat instead of mat is recommended wherever possible

  • If the support you need for a conversion is not available, please file a bug report so that the support can be prioritised



example program
    #include <iostream>
    #include <bandicoot>
    
    using namespace std;
    using namespace coot;
    
    int main()
      {
      fmat A = randu<fmat>(4, 5);
      fmat B = randu<fmat>(4, 5);
    
      cout << A * B.t() << endl;
    
      return 0;
      }
    
  • If the above program is stored as example.cpp, under Linux and macOS it can be compiled using:
      g++ example.cpp -o example -std=c++11 -O2 -lbandicoot

  • Bandicoot extensively uses template meta-programming, so it's recommended to enable optimisation when compiling programs (eg. use the -O2 or -O3 options for GCC or clang)

  • See the Questions page for more info on compiling and linking

  • If coming from Armadillo, be sure to check the Armadillo/Bandicoot conversion guide for advice on writing efficient code

  • See also the example program that comes with the Bandicoot archive



config.hpp
  • Bandicoot can be configured via editing the file include/bandicoot_bits/config.hpp

  • Specific functionality can be enabled or disabled by uncommenting or commenting out a particular #define, listed below.

  • Some options can also be specified by explicitly defining them before including the bandicoot header.

  • COOT_DONT_USE_WRAPPER   Disable going through the run-time Bandicoot wrapper library (libbandicoot.so) when calling GPU-specific functions. Overrides COOT_USE_WRAPPER. You will need to directly link with GPU libraries (e.g. -lOpenCL -lclBLAS or similar depending on backend configuration)
         
    COOT_USE_WRAPPER   Enable use of Bandicoot wrapper library, which allows linking against all enabled backends with -lbandicoot only.
         
    COOT_USE_OPENCL   Enable use of OpenCL as a GPU backend. Note that either COOT_USE_OPENCL or COOT_USE_CUDA must be enabled. OpenCL headers and clBLAS headers must be available on the system.
         
    COOT_USE_CUDA   Enable use of CUDA as a GPU backend. Note that either COOT_USE_OPENCL or COOT_USE_CUDA must be enabled. The CUDA toolkit must be available on the system.
         
    COOT_DEFAULT_BACKEND   Set the backend that Bandicoot will use. This is only necessary if multiple backends are enabled; that is, when both COOT_USE_OPENCL and COOT_USE_CUDA are enabled. This should be set to either CUDA_BACKEND or CL_BACKEND (e.g. #define COOT_DEFAULT_BACKEND CUDA_BACKEND). See also the backend configuration documentation.
         
    COOT_USE_OPENMP   Use OpenMP for parallelisation of some CPU-based parts of Bandicoot functionalities. Automatically enabled when using a compiler which has OpenMP 3.1+ active (eg. the -fopenmp option for gcc and clang). Note: this may not have a noticeable effect on performance since most Bandicoot implementations do not use the CPU heavily or at all.
         
    COOT_DONT_USE_OPENMP   Disable use of OpenMP for parallelisation; overrides COOT_USE_OPENMP.
         
    COOT_KERNEL_CACHE_DIR   If defined, specifies a custom directory to use for the kernel cache. Distribution packagers may choose to specify COOT_SYSTEM_KERNEL_CACHE_DIR, though it is overridden by COOT_KERNEL_CACHE_DIR if specified.
         
    COOT_BLAS_CAPITALS   Use capitalised (uppercase) BLAS and LAPACK function names (eg. DGEMM vs dgemm)
         
    COOT_BLAS_UNDERSCORE   Append an underscore to BLAS and LAPACK function names (eg. dgemm_ vs dgemm). Enabled by default.
         
    COOT_BLAS_LONG   Use "long" instead of "int" when calling BLAS and LAPACK functions
         
    COOT_BLAS_LONG_LONG   Use "long long" instead of "int" when calling BLAS and LAPACK functions
         
    COOT_NO_DEBUG   Disable all run-time checks, including size conformance and bounds checks. NOT RECOMMENDED. DO NOT USE UNLESS YOU KNOW WHAT YOU ARE DOING AND ARE WILLING TO RISK THE DOWNSIDES. Keeping run-time checks enabled during development and deployment greatly aids in finding mistakes in your code.
         
    COOT_EXTRA_DEBUG   Print out the trace of internal functions used for evaluating expressions. Not recommended for normal use. This is mainly useful for debugging the library.
         
    COOT_COUT_STREAM   The default stream used for printing matrices and cubes by .print(). Must always be enabled. By default defined to std::cout
         
    COOT_CERR_STREAM   The default stream used for printing warnings and errors. Must always be enabled. By default defined to std::cerr




direct linking
  • If COOT_USE_WRAPPER is not defined (or COOT_DONT_USE_WRAPPER is defined), then Bandicoot will need to be linked against all dependencies of its backends

  • Depending on configuration options, this can amount to a large number of dependencies; for this reason COOT_USE_WRAPPER is enabled by default and is recommended

  • Regardless of backend configuration, these libraries must always be linked against:

  • If COOT_USE_OPENCL is set (i.e. the OpenCL backend is enabled), these libraries must be linked against:
    • -lOpenCL (core OpenCL support)
    • -lclBLAS (clBLAS for BLAS operations)

  • If COOT_USE_CUDA is set (i.e. the CUDA backend is enabled), these libraries must be linked against:
    • -lcuda (core CUDA support)
    • -lcudart (CUDA runtime library)
    • -lnvrtc (runtime compilation of CUDA kernels)
    • -lcublas (cuBLAS for BLAS operations)
    • -lcusolver (cuSolverDn for decompositions and factorisations)
    • -lcurand (cuRand for random number generation)



kernel cache
  • In order to perform GPU-based linear algebra, Bandicoot must first compile GPU kernel functions to a particular device

  • The first time Bandicoot is run on a system, all GPU kernel functions will be compiled; this can take a while (usually no more than 3-5 minutes)

  • Compiled kernels are stored on disk in the kernel cache for later reuse

  • Compiled kernels are specific to Bandicoot version, backend, and device; thus, if any of those three factors change, recompilation will be triggered; see the backend configuration documentation for more details

  • The default location to store the kernel cache is
    • ~/.bandicoot/cache/ on Linux, macOS and other UNIX-like systems
    • %APPDATA%\bandicoot\cache on Windows (e.g. C:\Users\Username\AppData\Roaming\bandicoot\cache)

  • Custom locations can be specified with the COOT_KERNEL_CACHE_DIR configuration variable



History of API Additions, Changes and Deprecations
  • API Stability and Version Policy:

    • Each release of Bandicoot has its public API (functions, classes, constants) described in the accompanying API documentation specific to that release.

    • Each release of Bandicoot has its full version specified as A.B.C, where A is a major version number, B is a minor version number, and C is a patch level (indicating bug fixes). The version specification has explicit meaning, similar to Semantic Versioning, as follows:

      • Within a major version (eg. 1), each minor version (eg. 1.1) has a public API that strongly strives to be backwards compatible (at the source level) with the public API of preceding minor versions. For example, user code written for version 1.0 should work with version 1.1, 1.2, etc. However, later minor versions may have more features (API additions and extensions) than preceding minor versions. As such, user code specifically written for version 1.2 may not work with 1.1.

      • An increase in the patch level, while the major and minor versions are retained, indicates modifications to the code and/or documentation which aim to fix bugs without altering the public API.

      • We don't like changes to existing public API and strongly prefer not to break any user software. However, to allow evolution, the public API may be altered in future major versions while remaining backwards compatible in as many cases as possible (eg. major version 2 may have slightly different public API than major version 1).

    • Caveat: the above policy applies only to the public API described in the documentation. Any functionality within Bandicoot which is not explicitly described in the public API documentation is considered as internal implementation details, and may be changed or removed without notice.
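
    • The compatibility rules above can be summarised as a small predicate. The following is a minimal sketch in plain C++; the struct version_sketch and the function is_expected_compatible are hypothetical and are not part of the Bandicoot API. The function answers only whether the policy promises source-level compatibility, not whether a particular program will in fact compile:

```cpp
// A.B.C version triple (hypothetical helper type).
struct version_sketch
  {
  int major_num;
  int minor_num;
  int patch_level;
  };

// True when code written for `written_for` is expected, under the stated policy,
// to work with `installed`: same major version and an equal or later minor version.
// The patch level is ignored because patch releases do not alter the public API.
bool is_expected_compatible(const version_sketch& written_for, const version_sketch& installed)
  {
  return (installed.major_num == written_for.major_num)
      && (installed.minor_num >= written_for.minor_num);
  }
```

      For example, code written for 1.0.0 is expected to work with 1.2.0, whereas code written for 1.2.0 carries no such expectation for 1.1.3 or for 2.0.0.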


  • List of additions and changes for each version:

    • Version 1.0:
      • first stable release!