Factor tools

Functions:

check_cp_tensor_equal(cp_tensor1, cp_tensor2): Check if the factor matrices and weights are equal.
check_cp_tensors_equivalent(cp_tensor1, ...): Check if the decompositions are equivalent.
check_factor_matrix_close(factor_matrix1, ...): Check that all entries in a factor matrix are close; if labelled, label equality is also checked.
check_factor_matrix_equal(factor_matrix1, ...): Check that all entries in a factor matrix are equal; if labelled, label equality is also checked.
cosine_similarity(factor_matrix1, factor_matrix2): The average cosine similarity (Tucker congruence) with optimal column permutation.
degeneracy_score(cp_tensor): Compute the degeneracy score for a given decomposition.
distribute_weights(cp_tensor, weight_behaviour): Utility to distribute the weights of a CP tensor.
distribute_weights_evenly(cp_tensor): Ensure that the weight-vector consists of ones and all factor matrices have equal norm.
distribute_weights_in_one_mode(cp_tensor, mode): Normalise all factors and multiply the weights into one mode.
factor_match_score(cp_tensor1, cp_tensor2[, ...]): Compute the factor match score between cp_tensor1 and cp_tensor2.
get_cp_permutation(cp_tensor[, ...]): Find the optimal permutation between two CP tensors.
get_factor_matrix_permutation(...[, ...]): Find optimal permutation of the factor matrices.
normalise_cp_tensor(cp_tensor): Ensure that all factor matrices have unit norm, and all weight is stored in the weight-vector.
percentage_variation(cp_tensor[, dataset, ...]): Compute the percentage of variation captured by each component.
permute_cp_tensor(cp_tensor[, permutation, ...]): Permute the CP tensor.
tlviz.factor_tools.check_cp_tensor_equal(cp_tensor1, cp_tensor2, ignore_labels=False)[source]

Check if the factor matrices and weights are equal.

This will check if the factor matrices and weights are exactly equal to one another. It will not check if the two decompositions are equivalent. For example, if cp_tensor2 contains the same factors as cp_tensor1, but permuted, or with the weights distributed differently between the modes, then this function will return False. To check for equivalence, use check_cp_tensors_equivalent.

Parameters:
cp_tensor1 : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument

cp_tensor2 : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument

ignore_labels : bool

If True, then labels (i.e. DataFrame column names and indices) can differ.

Returns:
bool

Whether the decompositions are equal.

See also

check_cp_tensors_equivalent

Function for checking if two CP tensors represent the same dense tensor.

Examples

check_cp_tensor_equal checks for strict equality of the factor matrices and weights.

>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import check_cp_tensor_equal
>>> cp_tensor, dataset = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> check_cp_tensor_equal(cp_tensor, cp_tensor)
True


But it does not check the identity of the decompositions, only their numerical values

>>> cp_tensor2, dataset2 = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> check_cp_tensor_equal(cp_tensor, cp_tensor2)
True


Normalising a cp_tensor changes its values, so then we do not have strict equality of the factor matrices, even though the decomposition is equivalent

>>> from tlviz.factor_tools import normalise_cp_tensor
>>> normalised_cp_tensor = normalise_cp_tensor(cp_tensor)
>>> check_cp_tensor_equal(cp_tensor, normalised_cp_tensor)
False


Permutations will also make the numerical values of the cp_tensor change

>>> from tlviz.factor_tools import permute_cp_tensor
>>> check_cp_tensor_equal(cp_tensor, permute_cp_tensor(cp_tensor, permutation=[1, 2, 0]))
False

tlviz.factor_tools.check_cp_tensors_equivalent(cp_tensor1, cp_tensor2, rtol=1e-05, atol=1e-08, ignore_labels=False)[source]

Check if the decompositions are equivalent

This will check if the factor matrices and weights are equivalent, that is, whether they represent the same tensor. This differs from checking equality: if cp_tensor2 contains the same factors as cp_tensor1, but permuted, or with the weights distributed differently between the modes, then the two are not equal, but they are equivalent.

Parameters:
cp_tensor1 : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument

cp_tensor2 : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument

rtol : float

Relative tolerance (see numpy.allclose)

atol : float

Absolute tolerance (see numpy.allclose)

ignore_labels : bool

If True, then labels (i.e. DataFrame column names and indices) can differ.

Returns:
bool

Whether the decompositions are equivalent.

See also

check_cp_tensor_equal

Function for checking if two CP tensors have the same numerical value (have equal weights and factor matrices)

Examples

check_cp_tensors_equivalent checks if two CP tensors represent the same dense tensor

>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import check_cp_tensors_equivalent
>>> cp_tensor, dataset = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> cp_tensor2, dataset2 = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> check_cp_tensors_equivalent(cp_tensor, cp_tensor2)
True


Normalising a cp_tensor changes its values, but not which dense tensor it represents

>>> from tlviz.factor_tools import normalise_cp_tensor
>>> normalised_cp_tensor = normalise_cp_tensor(cp_tensor)
>>> check_cp_tensors_equivalent(cp_tensor, normalised_cp_tensor)
True


Permutations will also make the numerical values of the cp_tensor change, but not the dense tensor it represents

>>> from tlviz.factor_tools import permute_cp_tensor
>>> check_cp_tensors_equivalent(cp_tensor, permute_cp_tensor(cp_tensor, permutation=[1, 2, 0]))
True
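The distinction between equality and equivalence can also be illustrated without tlviz. The following is a minimal NumPy sketch, not the library's implementation: `to_dense` is a hypothetical helper that rebuilds a third-order tensor from weights and factor matrices, and the comparison uses plain `numpy.array_equal` and `numpy.allclose`.

```python
import numpy as np

rng = np.random.default_rng(0)
rank = 3
weights = rng.uniform(1, 2, size=rank)
A, B, C = (rng.standard_normal((n, rank)) for n in (4, 5, 6))


def to_dense(weights, factors):
    """Hypothetical helper: X[i, j, k] = sum_r w_r * A[i, r] * B[j, r] * C[k, r]."""
    A, B, C = factors
    return np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)


# Permute the components and move all the weight into the first factor matrix.
perm = [1, 2, 0]
A2 = A[:, perm] * weights[perm]
B2, C2 = B[:, perm], C[:, perm]
weights2 = np.ones(rank)

# The factor matrices differ numerically ...
factors_equal = np.array_equal(A, A2)
# ... but both decompositions describe the same dense tensor.
dense_equal = np.allclose(
    to_dense(weights, (A, B, C)), to_dense(weights2, (A2, B2, C2))
)
print(factors_equal, dense_equal)
```

Here the strict comparison fails while the dense tensors agree, which is the situation where check_cp_tensor_equal and check_cp_tensors_equivalent are expected to disagree.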

tlviz.factor_tools.check_factor_matrix_close(factor_matrix1, factor_matrix2, rtol=1e-05, atol=1e-08, ignore_labels=False)[source]

Check that all entries in a factor matrix are close; if labelled, label equality is also checked.

This function is similar to numpy.allclose, but works on both labelled and unlabelled factor matrices. If the factor matrices are labelled, then the DataFrame index and columns are also compared (unless ignore_labels=True).

Parameters:
factor_matrix1 : numpy.ndarray or pandas.DataFrame

Labelled or unlabelled factor matrix

factor_matrix2 : numpy.ndarray or pandas.DataFrame

Labelled or unlabelled factor matrix

rtol : float

Relative tolerance (see numpy.allclose)

atol : float

Absolute tolerance (see numpy.allclose)

ignore_labels : bool

If True, then labels (i.e. DataFrame column names and indices) can differ.

Returns:
bool

Whether the factor matrices are close.

Examples

check_factor_matrix_close checks if two factor matrices are close up to round off errors.

>>> from tlviz.factor_tools import check_factor_matrix_close
>>> import numpy as np
>>> A = np.arange(6).reshape(3, 2).astype(float)
>>> B = A + 1e-10
>>> check_factor_matrix_close(A, B)
True


If we make only one of them into a DataFrame, then the factor matrices are not close

>>> import pandas as pd
>>> A_labelled = pd.DataFrame(A)
>>> check_factor_matrix_close(A_labelled, B)
False
>>> check_factor_matrix_close(B, A_labelled)
False


If we turn B into a DataFrame too, it passes again

>>> B_labelled = pd.DataFrame(B)
>>> check_factor_matrix_close(A_labelled, B_labelled)
True


The index is checked for equality, so if we change the index of B_labelled, then the factor matrices are not close

>>> B_labelled.index += 1
>>> check_factor_matrix_close(A_labelled, B_labelled)
False


However, we can disable checking the labels by using the ignore_labels argument

>>> check_factor_matrix_close(A_labelled, B_labelled, ignore_labels=True)
True

tlviz.factor_tools.check_factor_matrix_equal(factor_matrix1, factor_matrix2, ignore_labels=False)[source]

Check that all entries in a factor matrix are equal; if labelled, label equality is also checked.

This function is similar to numpy.array_equal, but works on both labelled and unlabelled factor matrices. If the factor matrices are labelled, then the DataFrame index and columns are also compared (unless ignore_labels=True).

Parameters:
factor_matrix1 : numpy.ndarray or pandas.DataFrame

Labelled or unlabelled factor matrix

factor_matrix2 : numpy.ndarray or pandas.DataFrame

Labelled or unlabelled factor matrix

ignore_labels : bool

If True, then labels (i.e. DataFrame column names and indices) can differ.

Returns:
bool

Whether the factor matrices are equal.

Examples

check_factor_matrix_equal checks if two factor matrices are exactly the same.

>>> from tlviz.factor_tools import check_factor_matrix_equal
>>> import numpy as np
>>> A = np.arange(6).reshape(3, 2).astype(float)
>>> B = A.copy()
>>> check_factor_matrix_equal(A, B)
True


If they are only the same up to round off errors, then this function returns False

>>> check_factor_matrix_equal(A, B + 1e-10)
False


If we make only one of them into a DataFrame, then the factor matrices are not equal

>>> import pandas as pd
>>> A_labelled = pd.DataFrame(A)
>>> check_factor_matrix_equal(A_labelled, B)
False
>>> check_factor_matrix_equal(B, A_labelled)
False


If we turn B into a DataFrame too, it passes again

>>> B_labelled = pd.DataFrame(B)
>>> check_factor_matrix_equal(A_labelled, B_labelled)
True


The index is checked for equality, so if we change the index of B_labelled, then the factor matrices are not equal

>>> B_labelled.index += 1
>>> check_factor_matrix_equal(A_labelled, B_labelled)
False


However, we can disable checking the labels by using the ignore_labels argument

>>> check_factor_matrix_equal(A_labelled, B_labelled, ignore_labels=True)
True

tlviz.factor_tools.cosine_similarity(factor_matrix1, factor_matrix2)[source]

The average cosine similarity (Tucker congruence) with optimal column permutation.

The cosine similarity between two vectors is computed as

$\cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^\mathsf{T} \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|}$

This function returns the average cosine similarity between the column vectors of the two factor matrices, using the optimal column permutation.

Parameters:
factor_matrix1 : np.ndarray or pd.DataFrame

First factor matrix

factor_matrix2 : np.ndarray or pd.DataFrame

Second factor matrix

Returns:
float

The average cosine similarity.
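As an illustration of the quantity described above, here is a minimal NumPy sketch. It is not tlviz's implementation: `average_cosine_similarity` is a hypothetical name, the optimal permutation is found by brute force over all column orderings (only practical for a handful of columns), and the signed cosine is used, which is an assumption about how signs are treated.

```python
from itertools import permutations

import numpy as np


def average_cosine_similarity(X, Y):
    """Average column-wise cosine similarity under the best column permutation."""
    Xn = X / np.linalg.norm(X, axis=0)
    Yn = Y / np.linalg.norm(Y, axis=0)
    cosines = Xn.T @ Yn  # cosines[r, s] = cos(X[:, r], Y[:, s])
    rank = cosines.shape[0]
    rows = np.arange(rank)
    # Brute-force search over all column orderings (assumes equal column counts).
    return max(cosines[rows, list(p)].mean() for p in permutations(range(rank)))


rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
A_shuffled = 2.0 * A[:, [2, 0, 1]]  # same columns, reordered and rescaled
similarity = average_cosine_similarity(A, A_shuffled)
print(f"{similarity:.2f}")
```

Because the cosine ignores scale and the search undoes the reordering, the two matrices in this example are maximally similar.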

tlviz.factor_tools.degeneracy_score(cp_tensor)[source]

Compute the degeneracy score for a given decomposition.

PARAFAC models can be degenerate, which is a sign that we should be careful before interpreting that model. For a third order tensor, this generally manifests in a triple cosine of two components that approaches -1. That is

$\cos(\mathbf{a}_{r}, \mathbf{a}_{s}) \cos(\mathbf{b}_{r}, \mathbf{b}_{s}) \cos(\mathbf{c}_{r}, \mathbf{c}_{s}) \approx -1$

for some $$r \neq s$$, where $$\mathbf{A}, \mathbf{B}$$ and $$\mathbf{C}$$ are factor matrices and

$\cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^\mathsf{T} \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|}.$

Furthermore, the magnitudes of the degenerate components are unbounded and will approach infinity as the number of iterations increases.

Degenerate solutions typically signify that the decomposition is unreliable, and one should take care before interpreting the components. Degeneracy can, in fact, be a sign that the PARAFAC problem is ill-posed. For certain tensors, the least squares problem that must be solved to fit a PARAFAC model has no solution. In those cases, the “optimal” but unobtainable PARAFAC decomposition will have component vectors with infinite norm that point in opposite directions [KDS08].

There are several strategies to avoid degenerate solutions:

• Fitting models with more random initialisations

• Decreasing the convergence tolerance or increasing the number of iterations

• Imposing non-negativity constraints in all modes

• Imposing orthogonality constraints in at least one mode

• Changing the number of components

Both non-negativity constraints and orthogonality constraints will remove the potential ill-posedness of the CP model. We can, in fact, not obtain degenerate solutions when we impose such constraints [KDS08].

To measure degeneracy, we compute the degeneracy score, which is the minimum triple cosine (for a third-order tensor). A score close to -1 signifies a degenerate solution. A score of -0.85 is an indication of a troublesome model [Kri93] (as cited in [Bro97]).
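The minimum triple cosine can be sketched in a few lines of NumPy. This is an illustrative sketch under the definition given above, not the library's code; `min_triple_cosine` is a hypothetical helper and it assumes at least two components.

```python
import numpy as np


def min_triple_cosine(factors):
    """Smallest product of per-mode cosines over all pairs of distinct components."""
    rank = factors[0].shape[1]
    triple_cosines = np.ones((rank, rank))
    for factor in factors:
        normalised = factor / np.linalg.norm(factor, axis=0)
        # Entry [r, s] accumulates cos(a_r, a_s) * cos(b_r, b_s) * cos(c_r, c_s).
        triple_cosines *= normalised.T @ normalised
    off_diagonal = ~np.eye(rank, dtype=bool)  # exclude the r == s pairs
    return triple_cosines[off_diagonal].min()


rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 3)) for n in (10, 11, 12))
score_random = min_triple_cosine((A, B, C))

# Make component 1 point in the opposite direction of component 0 in every mode.
A[:, 1], B[:, 1], C[:, 1] = -A[:, 0], -B[:, 0], -C[:, 0]
score_degenerate = min_triple_cosine((A, B, C))
print(f"{score_random:.2f} {score_degenerate:.2f}")
```

With three modes, flipping the sign in every mode gives a triple cosine of (-1)^3 = -1 for that pair, so the second score reaches the lower bound.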

Note

There are other kinds of degeneracies too, for example three-component degeneracies, which manifest as two components of increasing magnitude and one other component equal to the negative sum of the former two [Paa00, Ste06]. However, it is the two-component degeneracy that is most commonly discussed in the literature [Bro97, KDS08, ZK02]. Still, if three or more components display weights that have a much higher magnitude than the data, there is reason to be concerned.

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.

Returns:
degeneracy_score : float

Degeneracy score, between -1 and 1. A score close to -1 signifies a degenerate solution. A score of -0.85 is an indication of a troublesome model [Kri93] (as cited in [Bro97]).

Examples

We begin by constructing a random simulated cp tensor and compute the degeneracy score

>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import degeneracy_score
>>> cp_tensor = simulated_random_cp_tensor((10, 11, 12), rank=3, seed=0)[0]
>>> print(f"Degeneracy score: {degeneracy_score(cp_tensor):.2f}")
Degeneracy score: 0.35


We see that (as expected) the random cp_tensor is not very degenerate. To simulate a tensor with two-component degeneracy, we can, for example, replace one of the components with a flipped copy of another component

>>> w, (A, B, C) = cp_tensor
>>> A[:,1] = -A[:, 0]
>>> B[:,1] = -B[:, 0]
>>> C[:,1] = -C[:, 0]
>>> print(f"Degeneracy score: {degeneracy_score(cp_tensor):.2f}")
Degeneracy score: -1.00


We see that this modified cp_tensor is degenerate.

tlviz.factor_tools.distribute_weights(cp_tensor, weight_behaviour, weight_mode=0)[source]

Utility to distribute the weights of a CP tensor.

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.

weight_behaviour : {"ignore", "normalise", "evenly", "one_mode"} (default="normalise")

How to handle the component weights.

• "ignore" - Do nothing

• "normalise" - Normalise all factor matrices

• "evenly" - All factor matrices have equal norm

• "one_mode" - The weight is allocated in one mode, all other factor matrices have unit norm columns.

weight_mode : int (optional)

Which mode to have the component weights in (only used if weight_behaviour="one_mode")

Returns:
tuple

The scaled CP tensor.

Raises:
ValueError

If weight_behaviour is not one of "ignore", "normalise", "evenly" or "one_mode".

See also

normalise_cp_tensor

Give all component vectors unit norm

distribute_weights_evenly

Give all component vectors the same norm and set the weight-array to one.

distribute_weights_in_one_mode

Keep all the weights in one factor matrix and set the weight-array to one.
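The three non-trivial weight behaviours can be sketched with plain NumPy. The helpers below (`normalise`, `distribute_evenly`, `distribute_in_one_mode`) are hypothetical stand-ins written for illustration, assuming positive weights and a CP tensor given as a weight vector plus a list of factor matrices; they are not the tlviz functions themselves.

```python
import numpy as np


def normalise(weights, factors):
    """Unit-norm columns in every factor matrix; all scale moves into the weights."""
    norms = [np.linalg.norm(f, axis=0) for f in factors]
    new_weights = weights * np.prod(norms, axis=0)
    return new_weights, [f / n for f, n in zip(factors, norms)]


def distribute_evenly(weights, factors):
    """Weights of one; each factor column gets norm w_r ** (1 / number of modes)."""
    weights, factors = normalise(weights, factors)
    scale = weights ** (1 / len(factors))
    return np.ones_like(weights), [f * scale for f in factors]


def distribute_in_one_mode(weights, factors, mode):
    """Weights of one; the chosen mode carries the scale, the others stay unit norm."""
    weights, factors = normalise(weights, factors)
    factors[mode] = factors[mode] * weights
    return np.ones_like(weights), factors


rng = np.random.default_rng(0)
weights = rng.uniform(1, 2, size=3)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]

w_even, f_even = distribute_evenly(weights, factors)
w_one, f_one = distribute_in_one_mode(weights, factors, mode=1)
print(np.linalg.norm(f_even[0], axis=0), np.linalg.norm(f_one[0], axis=0))
```

All three variants only rescale columns, so the dense tensor represented by the decomposition is unchanged; what differs is where the magnitude of each component is stored.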

tlviz.factor_tools.distribute_weights_evenly(cp_tensor)[source]

Ensure that the weight-vector consists of ones and all factor matrices have equal norm

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.

Returns:
tuple

The scaled CP tensor.

tlviz.factor_tools.distribute_weights_in_one_mode(cp_tensor, mode, axis=None)[source]

Normalise all factors and multiply the weights into one mode.

The CP tensor is scaled so all factor matrices except one have unit norm columns and the weight-vector contains only ones.

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.

mode : int

Which mode (axis) to store the weights in

axis : int (optional)

Alias for mode. If this is set, then no value is needed for mode

Returns:
tuple

The scaled CP tensor.

tlviz.factor_tools.factor_match_score(cp_tensor1, cp_tensor2, consider_weights=True, skip_mode=None, return_permutation=False, absolute_value=True, allow_smaller_rank=False)[source]

Compute the factor match score between cp_tensor1 and cp_tensor2.

The factor match score is used to measure the similarity between two sets of components. There are many definitions of the FMS, but one common definition for third order tensors is given by:

$\frac{1}{R} \sum_{r=1}^R \frac{\mathbf{a}_r^T \hat{\mathbf{a}}_r}{\|\mathbf{a}_r\| \|\hat{\mathbf{a}}_r\|} \frac{\mathbf{b}_r^T \hat{\mathbf{b}}_r}{\|\mathbf{b}_r\| \|\hat{\mathbf{b}}_r\|} \frac{\mathbf{c}_r^T \hat{\mathbf{c}}_r}{\|\mathbf{c}_r\| \|\hat{\mathbf{c}}_r\|},$

where $$\mathbf{a}, \mathbf{b}$$ and $$\mathbf{c}$$ are the component vectors for one of the decompositions and $$\hat{\mathbf{a}}, \hat{\mathbf{b}}$$ and $$\hat{\mathbf{c}}$$ are the component vectors for the other decomposition. Often, the absolute value of the inner products is used instead of just the inner products (i.e. $$|\mathbf{a}_r^T \hat{\mathbf{a}}_r|$$).

The above definition does not take the norm of the component vectors into account. However, sometimes, we also wish to compare their norm. In that case, set the consider_weights argument to True to compute

$\frac{1}{R} \sum_{r=1}^R \left(1 - \frac{|w_r - \hat{w}_r|}{\max\left( w_r, \hat{w}_r \right)}\right) \frac{\mathbf{a}_r^T \hat{\mathbf{a}}_r}{\|\mathbf{a}_r\| \|\hat{\mathbf{a}}_r\|} \frac{\mathbf{b}_r^T \hat{\mathbf{b}}_r}{\|\mathbf{b}_r\| \|\hat{\mathbf{b}}_r\|} \frac{\mathbf{c}_r^T \hat{\mathbf{c}}_r}{\|\mathbf{c}_r\| \|\hat{\mathbf{c}}_r\|}$

instead, where $$w_r = \|\mathbf{a}_r\| \|\mathbf{b}_r\| \|\mathbf{c}_r\|$$ and $$\hat{w}_r = \|\hat{\mathbf{a}}_r\| \|\hat{\mathbf{b}}_r\| \|\hat{\mathbf{c}}_r\|$$.

For both definitions above, there is a permutation indeterminacy. Two equivalent decompositions can have the same component vectors, but in a different order. To resolve this indeterminacy, we use the linear sum assignment solver available in SciPy [Cro16] to efficiently find the optimal permutation.
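To make the definitions concrete, the following NumPy sketch computes a factor match score for small decompositions. Several things here are assumptions rather than facts about tlviz: the function name `fms_sketch` is hypothetical, the score is averaged over components, the weight penalty is taken to be one minus the relative difference of the component norms, absolute cosines are used, and the permutation is found by brute force instead of a linear sum assignment solver.

```python
from itertools import permutations

import numpy as np


def fms_sketch(cp1, cp2, consider_weights=True):
    """Mean over components of the product of per-mode |cosines|, optionally penalised."""
    (w1, factors1), (w2, factors2) = cp1, cp2
    rank = len(w1)
    score = np.ones((rank, rank))
    norm1, norm2 = np.abs(w1).astype(float), np.abs(w2).astype(float)
    for f1, f2 in zip(factors1, factors2):
        n1, n2 = np.linalg.norm(f1, axis=0), np.linalg.norm(f2, axis=0)
        score *= np.abs((f1 / n1).T @ (f2 / n2))
        norm1, norm2 = norm1 * n1, norm2 * n2  # accumulate full component norms
    if consider_weights:
        # Assumed penalty: 1 - |w_r - w_hat_s| / max(w_r, w_hat_s)
        diff = np.abs(norm1[:, None] - norm2[None, :])
        score *= 1 - diff / np.maximum(norm1[:, None], norm2[None, :])
    rows = np.arange(rank)
    # Brute-force stand-in for the linear sum assignment step.
    return max(score[rows, list(p)].mean() for p in permutations(range(rank)))


rng = np.random.default_rng(0)
weights = np.ones(3)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]
permuted = (weights, [f[:, [2, 0, 1]] for f in factors])
doubled = (2 * weights, factors)

fms_permuted = fms_sketch((weights, factors), permuted)
fms_doubled = fms_sketch((weights, factors), doubled)
fms_doubled_no_penalty = fms_sketch((weights, factors), doubled, consider_weights=False)
print(f"{fms_permuted:.2f} {fms_doubled_no_penalty:.2f}")
```

In this sketch a pure reordering of the components does not lower the score, whereas doubling the weights only lowers it when the weight penalty is enabled.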

Parameters:
cp_tensor1 : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument

cp_tensor2 : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument

consider_weights : bool (default=True)

If True, then the weight-penalty is used (second equation above).

skip_mode : int or None (default=None)

Which mode to skip when computing the FMS. Useful if cross validation or split-half analysis is used.

return_permutation : bool (default=False)

Whether or not to return the optimal permutation of the factors

absolute_value : bool (default=True)

If True, then only magnitude of the congruence is considered, not the sign.

allow_smaller_rank : bool (default=False)

Only relevant if return_permutation=True. If True, then cp_tensor2 can have fewer components than cp_tensor1. Missing components are aligned with tlviz.factor_tools.NO_COLUMN (a slice that slices nothing).

Returns:
fms : float

The factor match score

permutation : list[int | object] (only if return_permutation=True)

List of ints used to permute cp_tensor2 so its components optimally align with cp_tensor1. If cp_tensor1 has a component with no corresponding component in cp_tensor2 (i.e. there are fewer components in cp_tensor2 than in cp_tensor1), then tlviz.factor_tools.NO_COLUMN (a slice that slices nothing) is used to indicate missing components.

Raises:
ValueError

If allow_smaller_rank=False and cp_tensor2 has fewer components than cp_tensor1.

Examples

>>> import numpy as np
>>> from tlviz.factor_tools import factor_match_score
>>> from tensorly.decomposition import parafac
>>> from tensorly.random import random_cp
>>> # Construct random cp tensor with TensorLy
>>> cp_tensor = random_cp(shape=(4,5,6), rank=3, random_state=42)
>>> X = cp_tensor.to_tensor()
>>> X_noisy = X + 0.05*np.random.RandomState(0).standard_normal(size=X.shape)
>>> # Decompose with TensorLy and compute FMS
>>> estimated_cp_tensor = parafac(X_noisy, rank=3, random_state=42)
>>> fms_with_weight_penalty = factor_match_score(cp_tensor, estimated_cp_tensor, consider_weights=True)
>>> print(f"Factor match score (with weight penalty): {fms_with_weight_penalty:.2f}")
Factor match score (with weight penalty): 0.95
>>> fms_without_weight_penalty = factor_match_score(cp_tensor, estimated_cp_tensor, consider_weights=False)
>>> print(f"Factor match score (without weight penalty): {fms_without_weight_penalty:.2f}")
Factor match score (without weight penalty): 0.99

tlviz.factor_tools.get_cp_permutation(cp_tensor, reference_cp_tensor=None, consider_weights=True, allow_smaller_rank=False)[source]

Find the optimal permutation between two CP tensors.

This function supports two ways of finding the permutation of a CP tensor: aligning the components with those of a reference CP tensor (if reference_cp_tensor is not None), or ordering the components in descending order with respect to their explained variation (if reference_cp_tensor is None).

This function uses the factor match score to compute the optimal permutation between two CP tensors. This is useful for comparison purposes, as two identical CP tensors may have permuted columns.

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.

reference_cp_tensor : CPTensor or tuple (optional)

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument. The tensor that cp_tensor is aligned with. If None, the components are ordered by their explained variation.

consider_weights : bool

Whether to consider the factor weights when the factor match score is computed.

Returns:
tuple

The permutation to use when permuting cp_tensor.

tlviz.factor_tools.get_factor_matrix_permutation(factor_matrix1, factor_matrix2, ignore_sign=True, allow_smaller_rank=False)[source]

Find optimal permutation of the factor matrices

Efficient estimation of the optimal permutation for two factor matrices. To find the optimal permutation, $$\sigma$$, we solve the following optimisation problem:

$\max_\sigma \sum_{r} \frac{\left|\mathbf{a}_{r}^\mathsf{T}\hat{\mathbf{a}}_{\sigma(r)}\right|} {\|\mathbf{a}_{r}\| \|\hat{\mathbf{a}}_{\sigma(r)}\|}$

where $$\mathbf{a}_r$$ is the $$r$$-th component vector for the first factor matrix and $$\hat{\mathbf{a}}_{\sigma(r)}$$ is the $$r$$-th component vector of the second factor matrix after permuting the columns.
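For a small number of columns, the optimisation problem above can be solved by exhaustive search, which makes the objective easy to inspect. The sketch below is illustrative only: `best_column_permutation` is a hypothetical helper, it assumes both matrices have the same number of columns, and it replaces the efficient solver with brute force.

```python
from itertools import permutations

import numpy as np


def best_column_permutation(X, Y, ignore_sign=True):
    """Return sigma maximising sum_r cos(X[:, r], Y[:, sigma[r]]) by exhaustive search."""
    cosines = (X / np.linalg.norm(X, axis=0)).T @ (Y / np.linalg.norm(Y, axis=0))
    if ignore_sign:
        cosines = np.abs(cosines)  # handle the sign indeterminacy
    rows = np.arange(cosines.shape[0])
    return max(
        permutations(range(cosines.shape[1])),
        key=lambda sigma: cosines[rows, list(sigma)].sum(),
    )


rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
A_hat = -A[:, [1, 2, 0]]  # reordered columns with flipped signs
sigma = best_column_permutation(A, A_hat)
aligned = A_hat[:, list(sigma)]  # columns now line up with A, up to sign
print(sigma)
```

Indexing the second matrix with the returned permutation lines its columns up with the first matrix; with `ignore_sign=True` the alignment is found even though every column has been negated.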

Parameters:
factor_matrix1 : np.ndarray or pd.DataFrame

First factor matrix

factor_matrix2 : np.ndarray or pd.DataFrame

Second factor matrix

ignore_sign : bool

Whether to take the absolute value of the inner products before computing the permutation. This is usually done because of the sign indeterminacy of component models.

allow_smaller_rank : bool (default=False)

If True, then the function can align a smaller matrix onto a larger one. Missing columns are aligned with tlviz.factor_tools.NO_COLUMN (a slice that slices nothing).

Returns:
permutation : list[int | slice]

List of ints used to permute factor_matrix2 so its columns optimally align with factor_matrix1. If factor_matrix1 has a column with no corresponding column in factor_matrix2 (i.e. there are fewer columns in factor_matrix2 than in factor_matrix1), then tlviz.factor_tools.NO_COLUMN (a slice that slices nothing) is used to indicate missing columns.

Raises:
ValueError

If allow_smaller_rank=False and factor_matrix2 has fewer columns than factor_matrix1.

tlviz.factor_tools.normalise_cp_tensor(cp_tensor)[source]

Ensure that all factor matrices have unit norm, and all weight is stored in the weight-vector

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.

Returns:
tuple

The scaled CP tensor.

tlviz.factor_tools.percentage_variation(cp_tensor, dataset=None, method='model')[source]

Compute the percentage of variation captured by each component.

The (possible) non-orthogonality of CP factor matrices makes it less straightforward to estimate the amount of variation captured by each component, compared to a model with orthogonal factors. To estimate the amount of variation captured by a single component, we therefore use the following formula:

$\text{fit}_i = \frac{\text{SS}_i}{SS_\mathbf{\mathcal{X}}}$

where $$\text{SS}_i$$ is the squared norm of the tensor constructed using only the i-th component, and $$SS_\mathbf{\mathcal{X}}$$ is the squared norm of the data tensor. If method="model", then $$SS_\mathbf{\mathcal{X}}$$ is instead the squared norm of the tensor constructed from the CP tensor using all factor matrices.
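The ratio above can be sketched directly in NumPy for a third-order CP tensor. This is a minimal illustration, not the library's code: `percentage_variation_sketch` is a hypothetical helper, the factor of 100 is an assumption made so that the result reads as a percentage, and passing no dataset is assumed to mean that the model itself is used as the reference.

```python
import numpy as np


def percentage_variation_sketch(weights, factors, dataset=None):
    """100 * SS_i / SS_X for each component i of a third-order CP tensor."""
    A, B, C = factors
    # The squared norm of a rank-one tensor is the product of squared vector norms.
    ss_components = weights**2 * np.prod(
        [np.sum(f**2, axis=0) for f in factors], axis=0
    )
    if dataset is None:
        # Compare against the model: the tensor rebuilt from all components.
        dataset = np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)
    return 100 * ss_components / np.sum(dataset**2)


rng = np.random.default_rng(0)
weights = np.array([3.0, 2.0, 1.0])
# QR gives factor matrices with orthonormal columns.
orthogonal = [np.linalg.qr(rng.standard_normal((n, 3)))[0] for n in (4, 5, 6)]
non_orthogonal = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]

total_orthogonal = percentage_variation_sketch(weights, orthogonal).sum()
total_non_orthogonal = percentage_variation_sketch(weights, non_orthogonal).sum()
print(f"{total_orthogonal:.1f} {total_non_orthogonal:.1f}")
```

With orthonormal factor matrices the rank-one terms are mutually orthogonal, so the per-component percentages add up to 100; with generic factor matrices the cross terms between components make the total deviate from 100.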

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument

dataset : np.ndarray

Data tensor that the cp_tensor is fitted against

method : {"data", "model", "both"} (default="model")

Which method to use for computing the fit.

Returns:
fit : float or tuple

The fit (depending on the method). If method="both", then a tuple is returned where the first element is the fit computed against the data tensor and the second element is the fit computed against the model.

Examples

There are two ways of computing the percentage variation. One method is to divide by the variation in the data, giving us the percentage variation of the data captured by each component. This approach will not necessarily sum to 100 since

1. the model will not explain all the variation.

2. the components are likely not orthogonal

Alternatively, we can divide by the variation in the model, which will give us the contribution of each component to the model. However, this may also not sum to 100 since the components may not be orthogonal.

>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import percentage_variation
>>> cp_tensor, X = simulated_random_cp_tensor((30, 10, 10), 5, noise_level=0.3, seed=0)
>>> print(percentage_variation(cp_tensor).astype(int))
[11  2  0  0 39]
>>> print(percentage_variation(cp_tensor, X, method="data").astype(int))
[11  2  0  0 37]


We see that the variation captured for each component sums to 50 when we compare with the data and 52 when we compare with the model. These low numbers are because the components are not orthogonal, which means that the magnitude of the data is not equal to the sum of the magnitudes of each component. We can also compute the percentage variation with the model and the data simultaneously:

>>> percent_var_data, percent_var_model = percentage_variation(cp_tensor, X, method="both")
>>> print(percent_var_data.astype(int))
[11  2  0  0 37]
>>> print(percent_var_model.astype(int))
[11  2  0  0 39]


If the noise level is 0, both methods should give the same variation percentages:

>>> cp_tensor, X = simulated_random_cp_tensor((30, 10, 10), 5, noise_level=0.0, seed=1)
>>> percent_var_data, percent_var_model = percentage_variation(cp_tensor, X, method="both")
>>> print(percent_var_data.astype(int))
[ 3 11  0 34  1]
>>> print(f"Sum of variation: {percent_var_data.sum():.0f}")
Sum of variation: 51
>>> print(percent_var_model.astype(int))
[ 3 11  0 34  1]
>>> print(f"Sum of variation: {percent_var_model.sum():.0f}")
Sum of variation: 51

tlviz.factor_tools.permute_cp_tensor(cp_tensor, permutation=None, reference_cp_tensor=None, consider_weights=True, allow_smaller_rank=False)[source]

Permute the CP tensor

This function supports three ways of permuting a CP tensor: aligning the components with those of a reference CP tensor (if reference_cp_tensor is not None), permuting the components according to a given permutation (if permutation is not None), or ordering the components in descending order with respect to their explained variation (if both reference_cp_tensor and permutation are None).

This function uses the factor match score to compute the optimal permutation between two CP tensors. This is useful for comparison purposes, as two identical CP tensors may have permuted columns.
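Applying a permutation to a CP tensor means reordering the weight vector and the columns of every factor matrix in the same way. The sketch below shows this for the "descending order" case. It is illustrative only: `sort_components_by_norm` is a hypothetical helper, and ordering by the norm of each rank-one term is an assumption standing in for the explained-variation ordering.

```python
import numpy as np


def sort_components_by_norm(weights, factors):
    """Reorder components so the rank-one terms have descending norm."""
    component_norms = np.abs(weights) * np.prod(
        [np.linalg.norm(f, axis=0) for f in factors], axis=0
    )
    permutation = np.argsort(-component_norms)
    # The same permutation is applied to the weights and to every factor matrix.
    return permutation, weights[permutation], [f[:, permutation] for f in factors]


weights = np.array([1.0, 5.0, 2.0])
factors = [np.eye(3), np.eye(3), np.eye(3)]
permutation, sorted_weights, sorted_factors = sort_components_by_norm(weights, factors)
print(permutation, sorted_weights)
```

Since the weights and all factor matrices are reordered together, the dense tensor represented by the decomposition stays the same; only the order of the components changes.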

Parameters:
cp_tensor : CPTensor or tuple

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.

permutation : tuple (optional)

Tuple with the column permutations. Either this or the reference_cp_tensor argument must be passed, not both.

reference_cp_tensor : CPTensor or tuple (optional)

TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument. The tensor that cp_tensor is aligned with. Either this or the permutation argument must be passed, not both.

consider_weights : bool

Whether to consider the factor weights when the factor match score is computed.

Returns:
tuple

Tuple representing cp_tensor optimally permuted.

Raises:
ValueError

If neither permutation nor reference_cp_tensor is provided

ValueError

If both permutation and reference_cp_tensor are provided