# What are tensors and tensor decompositions?¶

Tensors are higher-order generalisations of a matrix.
A matrix is a square of numbers and is, therefore, a second-order tensor.
However, we can also have a cube of numbers, which would be a third-order tensor.
So, a third-order tensor is a three-dimensional cube of numbers.
However, since *dimensions* often denote the number of entries in a vector, we do not use the word
dimension for tensors. Instead, we use the word *mode* [KB09].

Tensors in Python are often represented by NumPy arrays, and NumPy uses the word *axis* for the different modes.
Therefore, in TLViz, we alias `mode`

and `axis`

in all the function calls.
However, the word `mode`

is generally used in the documentation.

## Notation¶

Above, we see an illustration of the notation used in TLViz.
We represent dense tensors either as NumPy arrays,
Pandas DataFrames or xarray DataArrays with the name `dataset`

.
PARAFAC (or CP or CPD) models are represented by a tuple, where the first element is a vector of weights,
one for each component, and the second element is a list of factor matrices, one for each mode.
This form is compatible with TensorLy, and we have also taken care to give variables
on this form the name `cp_tensor`

to match TensorLy.

In TLViz, we also use the terms *labelled* and *unlabelled* dataset and decompositions.
A labelled dataset is either a Pandas DataFrame or an xarray DataArray.
By using Pandas and xarray objects, we keep the metadata together with the dataset,
making it easier to produce rich visualisations later.
Similarly, a labelled decomposition is a decomposition where the factor matrices are stored as Pandas DataFrames
with an index equal to the coordinates along the corresponding mode of the xarray DataArray.
TLViz can seamlessly work with both labelled and unlabelled data and decompositions,
but we recommend using the labelled variant whenever possible.

## What are tensor factorisations?¶

Similarly to matrix factorisation methods, tensor factorisation methods decompose a tensor into low-rank components. These components can be very informative and give insight into the patterns in the data. One of the most commonly used tensor factorisation methods is PARAFAC (also known as CP and CPD). PARAFAC can be considered a generalisation of two-way methods such as principal component analysis (PCA) and nonnegative matrix factorisation (NMF) to higher-order data. Let us here see how PARAFAC decomposition relates to matrix decomposition. With matrix factorisation, we describe a matrix, \(\mathbf{X}\), as the outer product of two-factor matrices:

where \(\mathbf{A}\) and \(\mathbf{B}\) are *factor matrices* that contain the patterns along the rows and
columns of \(\mathbf{X}\). The first component is represented by the first columns of \(\mathbf{A}\) and \(\mathbf{B}\),
the second component is represented by the second columns and so forth.

We can look at an example to better understand these factor matrices. If \(\mathbf{X}\) is a matrix of movie scores given by various users, then each component could represent a genre and the \(i\)-th row of \(\mathbf{A}\) could contain information about which movie genres the \(i\)-th user likes. Likewise, the \(j\)-th row of \(\mathbf{B}\) could contain information about how “strongly” each movie belonged to each genre.

The next step is to generalise this for more dimensions. We may for example have time. In that case, we have a tensor, \(\mathcal{X}\), which represents how much different people like different movies at different time points. To see how we can matrix factorisation for such a case, we first rewrite the equation above so we consider each entry, \(x_{ij}\), of \(\mathbf{X}\) instead:

If we now introduce a third mode (represented by a new index, \(k\)), we see an obvious way to extend this

Here, we have three sets of factor matrices, \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{C}\). In the movie example, the first two factor matrices still represent the same. The third-factor matrix, \(\mathbf{C}\) represents how popular the different genres are at different time points.

The model we just described is called the PARAFAC, CP or CPD model.
However, it is also common to introduce *weights* to the components.
These weights represent each component’s “signal strength” (similar to a singular value if you are familiar with the SVD).
If we include the weights in the equation
above, we get

This overview was only a very brief introduction to tensor factorisations. For a more thorough introduction, we recommend [KB09] (a thorough introduction to tensors) and [Bro97] (a thorough introduction to PARAFAC).