Factor tools
Functions:
- check_cp_tensor_equal: Check if the factor matrices and weights are equal.
- check_cp_tensors_equivalent: Check if the decompositions are equivalent.
- check_factor_matrix_close: Check that all entries in two factor matrices are close; if they are labelled, label equality is also checked.
- check_factor_matrix_equal: Check that all entries in two factor matrices are equal; if they are labelled, label equality is also checked.
- cosine_similarity: The average cosine similarity (Tucker congruence) with optimal column permutation.
- degeneracy_score: Compute the degeneracy score for a given decomposition.
- distribute_weights: Utility to distribute the weights of a CP tensor.
- distribute_weights_evenly: Ensure that the weight-vector consists of ones and all factor matrices have equal norm.
- distribute_weights_in_one_mode: Normalise all factors and multiply the weights into one mode.
- factor_match_score: Compute the factor match score between cp_tensor1 and cp_tensor2.
- get_cp_permutation: Find the optimal permutation between two CP tensors.
- get_factor_matrix_permutation: Find the optimal permutation of the factor matrices.
- normalise_cp_tensor: Ensure that all factor matrices have unit norm, and all weight is stored in the weight-vector.
- percentage_variation: Compute the percentage of variation captured by each component.
- permute_cp_tensor: Permute the CP tensor.
- tlviz.factor_tools.check_cp_tensor_equal(cp_tensor1, cp_tensor2, ignore_labels=False)
Check if the factor matrices and weights are equal.
This will check if the factor matrices and weights are exactly equal to one another. It will not check if the two decompositions are equivalent. For example, if cp_tensor2 contains the same factors as cp_tensor1, but permuted, or with the weights distributed differently between the modes, then this function will return False. To check for equivalence, use check_cp_tensors_equivalent.
- Parameters:
- cp_tensor1 : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument
- cp_tensor2 : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument
- ignore_labels : bool
If True, then labels (i.e. DataFrame column names and indices) can differ.
- Returns:
- bool
Whether the decompositions are equal.
See also
check_cp_tensors_equivalent
Function for checking if two CP tensors represent the same dense tensor.
Examples
check_cp_tensor_equal checks for strict equality of the factor matrices and weights.
>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import check_cp_tensor_equal
>>> cp_tensor, dataset = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> check_cp_tensor_equal(cp_tensor, cp_tensor)
True
It does not check whether the two arguments are the same object, only their numerical values:
>>> cp_tensor2, dataset2 = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> check_cp_tensor_equal(cp_tensor, cp_tensor2)
True
Normalising cp_tensor changes its values, so the factor matrices are no longer strictly equal, even though the decompositions are equivalent:
>>> from tlviz.factor_tools import normalise_cp_tensor
>>> normalised_cp_tensor = normalise_cp_tensor(cp_tensor)
>>> check_cp_tensor_equal(cp_tensor, normalised_cp_tensor)
False
Permutations also change the numerical values of cp_tensor:
>>> from tlviz.factor_tools import permute_cp_tensor
>>> check_cp_tensor_equal(cp_tensor, permute_cp_tensor(cp_tensor, permutation=[1, 2, 0]))
False
- tlviz.factor_tools.check_cp_tensors_equivalent(cp_tensor1, cp_tensor2, rtol=1e-05, atol=1e-08, ignore_labels=False)
Check if the decompositions are equivalent.
This will check if the factor matrices and weights are equivalent, that is, whether they represent the same tensor. This differs from checking equality in the sense that if cp_tensor2 contains the same factors as cp_tensor1, but permuted, or with the weights distributed differently between the modes, then they are not equal, but equivalent.
- Parameters:
- cp_tensor1 : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument
- cp_tensor2 : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument
- rtol : float
Relative tolerance (see numpy.allclose)
- atol : float
Absolute tolerance (see numpy.allclose)
- ignore_labels : bool
If True, then labels (i.e. DataFrame column names and indices) can differ.
- Returns:
- bool
Whether the decompositions are equivalent.
See also
check_cp_tensor_equal
Function for checking if two CP tensors have the same numerical value (have equal weights and factor matrices).
Examples
check_cp_tensors_equivalent checks if two CP tensors represent the same dense tensor:
>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import check_cp_tensors_equivalent
>>> cp_tensor, dataset = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> cp_tensor2, dataset2 = simulated_random_cp_tensor((10, 20, 30), 3, seed=0)
>>> check_cp_tensors_equivalent(cp_tensor, cp_tensor2)
True
Normalising cp_tensor changes its values, but not which dense tensor it represents:
>>> from tlviz.factor_tools import normalise_cp_tensor
>>> normalised_cp_tensor = normalise_cp_tensor(cp_tensor)
>>> check_cp_tensors_equivalent(cp_tensor, normalised_cp_tensor)
True
Permutations also change the numerical values of cp_tensor, but not the dense tensor it represents:
>>> from tlviz.factor_tools import permute_cp_tensor
>>> check_cp_tensors_equivalent(cp_tensor, permute_cp_tensor(cp_tensor, permutation=[1, 2, 0]))
True
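Two CP tensors are equivalent when they reconstruct the same dense tensor. The sketch below illustrates that idea with plain NumPy, assuming a CP tensor is a (weights, factors) tuple; it is not necessarily how tlviz implements check_cp_tensors_equivalent, and cp_to_dense and cp_tensors_equivalent_sketch are hypothetical helpers. It rebuilds both dense tensors with numpy.einsum and compares them with numpy.allclose, using the same default tolerances as above.

```python
import string

import numpy as np


def cp_to_dense(weights, factors):
    """Reconstruct the dense tensor sum_r w_r * (a_r outer b_r outer c_r ...)."""
    # "z" indexes the components; "a", "b", "c", ... index the tensor modes (up to 25 modes)
    modes = string.ascii_lowercase[: len(factors)]
    subscripts = "z," + ",".join(f"{mode}z" for mode in modes) + "->" + modes
    return np.einsum(subscripts, weights, *factors)


def cp_tensors_equivalent_sketch(cp_tensor1, cp_tensor2, rtol=1e-05, atol=1e-08):
    """Hypothetical helper: do two (weights, factors) tuples give the same dense tensor?"""
    dense1 = cp_to_dense(*cp_tensor1)
    dense2 = cp_to_dense(*cp_tensor2)
    if dense1.shape != dense2.shape:
        return False
    return bool(np.allclose(dense1, dense2, rtol=rtol, atol=atol))


rng = np.random.default_rng(0)
weights = rng.uniform(1, 2, size=3)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]

# Reorder the components and move all of the weight into the first mode
perm = [1, 2, 0]
permuted = (
    np.ones(3),
    [factors[0][:, perm] * weights[perm], factors[1][:, perm], factors[2][:, perm]],
)
print(cp_tensors_equivalent_sketch((weights, factors), permuted))  # True
```

Comparing dense tensors is simple but costs memory proportional to the full tensor size, so it is best suited to small examples like this one.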
- tlviz.factor_tools.check_factor_matrix_close(factor_matrix1, factor_matrix2, rtol=1e-05, atol=1e-08, ignore_labels=False)
Check that all entries in two factor matrices are close; if they are labelled, label equality is also checked.
This function is similar to numpy.allclose, but works on both labelled and unlabelled factor matrices. If the factor matrices are labelled, then the DataFrame index and columns are also compared (unless ignore_labels=True).
- Parameters:
- factor_matrix1 : numpy.ndarray or pandas.DataFrame
Labelled or unlabelled factor matrix
- factor_matrix2 : numpy.ndarray or pandas.DataFrame
Labelled or unlabelled factor matrix
- rtol : float
Relative tolerance (see numpy.allclose)
- atol : float
Absolute tolerance (see numpy.allclose)
- ignore_labels : bool
If True, then labels (i.e. DataFrame column names and indices) can differ.
- Returns:
- bool
Whether the factor matrices are close.
Examples
check_factor_matrix_close checks if two factor matrices are close up to round-off errors.
>>> import numpy as np
>>> from tlviz.factor_tools import check_factor_matrix_close
>>> A = np.arange(6).reshape(3, 2).astype(float)
>>> B = A + 1e-10
>>> check_factor_matrix_close(A, B)
True
If we make only one of them into a DataFrame, then the factor matrices are not close:
>>> import pandas as pd
>>> A_labelled = pd.DataFrame(A)
>>> check_factor_matrix_close(A_labelled, B)
False
>>> check_factor_matrix_close(B, A_labelled)
False
If we turn B into a DataFrame too, it passes again:
>>> B_labelled = pd.DataFrame(B)
>>> check_factor_matrix_close(A_labelled, B_labelled)
True
The index is checked for equality, so if we change the index of B_labelled, then the factor matrices are not close:
>>> B_labelled.index += 1
>>> check_factor_matrix_close(A_labelled, B_labelled)
False
However, we can disable checking the labels by using the ignore_labels argument:
>>> check_factor_matrix_close(A_labelled, B_labelled, ignore_labels=True)
True
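The label handling shown in these examples can be sketched as follows. factor_matrix_close_sketch is a hypothetical helper written for illustration, not tlviz's source. It assumes that a labelled and an unlabelled matrix never match unless ignore_labels=True (mirroring the doctests) and that labels are compared with pandas.Index.equals.

```python
import numpy as np
import pandas as pd


def factor_matrix_close_sketch(fm1, fm2, rtol=1e-05, atol=1e-08, ignore_labels=False):
    """Hypothetical re-implementation for illustration only."""
    is_labelled1 = isinstance(fm1, pd.DataFrame)
    is_labelled2 = isinstance(fm2, pd.DataFrame)
    if not ignore_labels:
        # A labelled and an unlabelled matrix are not considered close
        if is_labelled1 != is_labelled2:
            return False
        # Both labelled: the index and the columns must match exactly
        if is_labelled1 and not (fm1.index.equals(fm2.index) and fm1.columns.equals(fm2.columns)):
            return False
    values1, values2 = np.asarray(fm1), np.asarray(fm2)
    if values1.shape != values2.shape:
        return False
    # The numerical comparison itself is a plain numpy.allclose
    return bool(np.allclose(values1, values2, rtol=rtol, atol=atol))


A = np.arange(6).reshape(3, 2).astype(float)
A_labelled = pd.DataFrame(A)
B_labelled = pd.DataFrame(A + 1e-10)
print(factor_matrix_close_sketch(A_labelled, B_labelled))  # True
B_labelled.index += 1
print(factor_matrix_close_sketch(A_labelled, B_labelled))  # False
print(factor_matrix_close_sketch(A_labelled, B_labelled, ignore_labels=True))  # True
```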
- tlviz.factor_tools.check_factor_matrix_equal(factor_matrix1, factor_matrix2, ignore_labels=False)
Check that all entries in two factor matrices are equal; if they are labelled, label equality is also checked.
This function is similar to numpy.array_equal, but works on both labelled and unlabelled factor matrices. If the factor matrices are labelled, then the DataFrame index and columns are also compared (unless ignore_labels=True).
- Parameters:
- factor_matrix1 : numpy.ndarray or pandas.DataFrame
Labelled or unlabelled factor matrix
- factor_matrix2 : numpy.ndarray or pandas.DataFrame
Labelled or unlabelled factor matrix
- ignore_labels : bool
If True, then labels (i.e. DataFrame column names and indices) can differ.
- Returns:
- bool
Whether the factor matrices are equal.
Examples
check_factor_matrix_equal checks if two factor matrices are exactly the same.
>>> import numpy as np
>>> from tlviz.factor_tools import check_factor_matrix_equal
>>> A = np.arange(6).reshape(3, 2).astype(float)
>>> B = A.copy()
>>> check_factor_matrix_equal(A, B)
True
If they are only the same up to round-off errors, then this function returns False:
>>> check_factor_matrix_equal(A, B + 1e-10)
False
If we make only one of them into a DataFrame, then the factor matrices are not equal:
>>> import pandas as pd
>>> A_labelled = pd.DataFrame(A)
>>> check_factor_matrix_equal(A_labelled, B)
False
>>> check_factor_matrix_equal(B, A_labelled)
False
If we turn B into a DataFrame too, it passes again:
>>> B_labelled = pd.DataFrame(B)
>>> check_factor_matrix_equal(A_labelled, B_labelled)
True
The index is checked for equality, so if we change the index of B_labelled, then the factor matrices are not equal:
>>> B_labelled.index += 1
>>> check_factor_matrix_equal(A_labelled, B_labelled)
False
However, we can disable checking the labels by using the ignore_labels argument:
>>> check_factor_matrix_equal(A_labelled, B_labelled, ignore_labels=True)
True
- tlviz.factor_tools.cosine_similarity(factor_matrix1, factor_matrix2)
The average cosine similarity (Tucker congruence) with optimal column permutation.
The cosine similarity between two vectors is computed as
\[\cos (\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^\mathsf{T}}{\|\mathbf{x}\|}\frac{\mathbf{y}}{\|\mathbf{y}\|}\]
This function returns the average cosine similarity between the column vectors of the two factor matrices, using the optimal column permutation.
- Parameters:
- factor_matrix1 : np.ndarray or pd.DataFrame
First factor matrix
- factor_matrix2 : np.ndarray or pd.DataFrame
Second factor matrix
- Returns:
- float
The average cosine similarity.
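A minimal NumPy/SciPy sketch of the average cosine similarity with optimal column permutation, assuming scipy.optimize.linear_sum_assignment is used to match the columns. cosine_similarity_sketch is a hypothetical helper; it keeps the sign of the inner products exactly as in the formula above, and whether tlviz takes absolute values here is not stated.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_similarity_sketch(factor_matrix1, factor_matrix2):
    """Average column-wise cosine similarity after optimally matching the columns."""
    A = np.asarray(factor_matrix1, dtype=float)
    B = np.asarray(factor_matrix2, dtype=float)
    # Scale every column to unit norm
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    B = B / np.linalg.norm(B, axis=0, keepdims=True)
    # congruence[r, s] = cos(a_r, b_s)
    congruence = A.T @ B
    # Find the one-to-one column matching with the largest total congruence
    rows, columns = linear_sum_assignment(congruence, maximize=True)
    return float(congruence[rows, columns].mean())


rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
# The same column directions, but reordered and rescaled by positive factors
B = A[:, [2, 0, 1]] * np.array([2.0, 0.5, 3.0])
print(round(cosine_similarity_sketch(A, B), 6))  # 1.0
```

Because the columns are normalised first, positive rescaling does not affect the result, and the assignment step undoes the column reordering.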
- tlviz.factor_tools.degeneracy_score(cp_tensor)
Compute the degeneracy score for a given decomposition.
PARAFAC models can be degenerate, which is a sign that we should be careful before interpreting the model. For a third-order tensor, this generally manifests as a triple cosine between two components that approaches -1. That is,
\[\cos(\mathbf{a}_{r}, \mathbf{a}_{s}) \cos(\mathbf{b}_{r}, \mathbf{b}_{s}) \cos(\mathbf{c}_{r}, \mathbf{c}_{s}) \approx -1\]
for some \(r \neq s\), where \(\mathbf{A}, \mathbf{B}\) and \(\mathbf{C}\) are factor matrices and
\[\cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^\mathsf{T} \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|}.\]
Furthermore, the magnitudes of the degenerate components are unbounded and approach infinity as the number of iterations increases.
Degenerate solutions typically signify that the decomposition is unreliable, and one should take care before interpreting the components. Degeneracy can, in fact, be a sign that the PARAFAC problem is ill-posed. For certain tensors, the least squares problem used to fit PARAFAC models has no solution. In those cases, the “optimal” but unobtainable PARAFAC decomposition will have component vectors with infinite norm that point in opposite directions [KDS08].
There are several strategies to avoid degenerate solutions:
- Fitting models with more random initialisations
- Decreasing the convergence tolerance or increasing the number of iterations
- Imposing non-negativity constraints in all modes
- Imposing orthogonality constraints in at least one mode
- Changing the number of components
Both non-negativity constraints and orthogonality constraints remove the potential ill-posedness of the CP model. In fact, degenerate solutions cannot occur when such constraints are imposed [KDS08].
To measure degeneracy, we compute the degeneracy score, which is the minimum triple cosine (for a third-order tensor). A score close to -1 signifies a degenerate solution. A score of -0.85 is an indication of a troublesome model [Kri93] (as cited in [Bro97]).
For more information about degeneracy for component models see [ZK02] and [Bro97].
Note
There are other kinds of degeneracies too. For example, three-component degeneracies, which manifest as two components of increasing magnitude and one other component equal to the negative sum of the former two [Paa00, Ste06]. However, it is the two-component degeneracy that is most commonly discussed in the literature [Bro97, KDS08, ZK02]. Still, if three or more components display weights that have a much higher magnitude than the data, there is reason to be concerned.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.
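- Returns:
- float
The degeneracy score (the minimum triple cosine for a third-order tensor). Values close to -1 indicate a degenerate solution.
Examples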
We begin by constructing a random simulated CP tensor and computing the degeneracy score:
>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import degeneracy_score
>>> cp_tensor = simulated_random_cp_tensor((10, 11, 12), rank=3, seed=0)[0]
>>> print(f"Degeneracy score: {degeneracy_score(cp_tensor):.2f}")
Degeneracy score: 0.35
We see that (as expected) the random cp_tensor is not very degenerate. To simulate a tensor with two-component degeneracy, we can, for example, replace one of the components with a flipped copy of another component:
>>> w, (A, B, C) = cp_tensor
>>> A[:, 1] = -A[:, 0]
>>> B[:, 1] = -B[:, 0]
>>> C[:, 1] = -C[:, 0]
>>> print(f"Degeneracy score: {degeneracy_score(cp_tensor):.2f}")
Degeneracy score: -1.00
We see that this modified cp_tensor is degenerate.
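The degeneracy score can be sketched directly from the definition above: normalise the columns of every factor matrix, multiply the matrices of pairwise column cosines across the modes, and take the smallest off-diagonal entry. degeneracy_score_sketch is a hypothetical helper operating on a list of NumPy factor matrices (the weights do not matter because all columns are normalised). For N-way data it multiplies N cosines instead of three, which is an assumption about the generalisation.

```python
import numpy as np


def degeneracy_score_sketch(factors):
    """Smallest product of cos(x_r, x_s) over all modes, taken over pairs r != s."""
    rank = factors[0].shape[1]
    triple_cosine = np.ones((rank, rank))
    for F in factors:
        F_unit = F / np.linalg.norm(F, axis=0, keepdims=True)
        # Multiply in this mode's matrix of pairwise column cosines
        triple_cosine *= F_unit.T @ F_unit
    # The diagonal (r == s) is always 1, so only off-diagonal entries are considered
    off_diagonal = ~np.eye(rank, dtype=bool)
    return float(triple_cosine[off_diagonal].min())


rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 3)) for n in (10, 11, 12))
random_score = degeneracy_score_sketch([A, B, C])
print(f"Random factors: {random_score:.2f}")

# Make component 1 a flipped copy of component 0 in every mode
for F in (A, B, C):
    F[:, 1] = -F[:, 0]
print(f"Flipped copy: {degeneracy_score_sketch([A, B, C]):.2f}")  # Flipped copy: -1.00
```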
- tlviz.factor_tools.distribute_weights(cp_tensor, weight_behaviour, weight_mode=0)
Utility to distribute the weights of a CP tensor.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.
- weight_behaviour : {"ignore", "normalise", "evenly", "one_mode"} (default="normalise")
How to handle the component weights.
"ignore": Do nothing.
"normalise": Normalise all factor matrices.
"evenly": All factor matrices have equal norm.
"one_mode": The weight is allocated in one mode; all other factor matrices have unit norm columns.
- weight_mode : int (optional)
Which mode to have the component weights in (only used if weight_behaviour="one_mode").
- Returns:
- tuple
The scaled CP tensor.
- Raises:
- ValueError
If weight_behaviour is not one of "ignore", "normalise", "evenly" or "one_mode".
See also
normalise_cp_tensor
Give all component vectors unit norm
distribute_weights_evenly
Give all component vectors the same norm and set the weight-array to one.
distribute_weights_in_one_mode
Keep all the weights in one factor matrix and set the weight-array to one.
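To make the four weight behaviours concrete, here is a NumPy sketch operating on a (weights, factors) pair. distribute_weights_sketch is hypothetical; in particular, reading "equal norm" as "every component vector gets norm |w_r|^(1/N)" and keeping any negative sign in the first mode are assumptions. In every branch the dense tensor represented by the CP tensor is unchanged; only the scaling moves between the weight vector and the factor matrices.

```python
import numpy as np


def distribute_weights_sketch(weights, factors, weight_behaviour, weight_mode=0):
    """Hypothetical sketch of the four weight behaviours for a (weights, factors) pair."""
    weights = np.asarray(weights, dtype=float)
    factors = [np.asarray(F, dtype=float) for F in factors]
    if weight_behaviour == "ignore":
        return weights, factors
    if weight_behaviour not in ("normalise", "evenly", "one_mode"):
        raise ValueError(f"Unknown weight_behaviour: {weight_behaviour!r}")

    # Give every column unit norm and collect all scaling in one weight per component
    column_norms = [np.linalg.norm(F, axis=0) for F in factors]
    unit_factors = [F / norms for F, norms in zip(factors, column_norms)]
    total_weights = weights * np.prod(column_norms, axis=0)
    rank = len(total_weights)

    if weight_behaviour == "normalise":
        return total_weights, unit_factors
    if weight_behaviour == "evenly":
        # Every component vector gets norm |w_r| ** (1 / N); the sign stays in the first mode
        scale = np.abs(total_weights) ** (1 / len(factors))
        scaled = [F * scale for F in unit_factors]
        scaled[0] = scaled[0] * np.sign(total_weights)
        return np.ones(rank), scaled
    # "one_mode": unit-norm columns everywhere except weight_mode, which absorbs the weights
    scaled = list(unit_factors)
    scaled[weight_mode] = scaled[weight_mode] * total_weights
    return np.ones(rank), scaled


rng = np.random.default_rng(0)
weights = rng.uniform(1, 2, size=3)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]
even_weights, even_factors = distribute_weights_sketch(weights, factors, "evenly")
# Every mode now has the same column norms and the weight vector is all ones
print([np.round(np.linalg.norm(F, axis=0), 3) for F in even_factors])
```

In this sketch, the "normalise", "evenly" and "one_mode" branches play the roles described for normalise_cp_tensor, distribute_weights_evenly and distribute_weights_in_one_mode, respectively.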
- tlviz.factor_tools.distribute_weights_evenly(cp_tensor)
Ensure that the weight-vector consists of ones and all factor matrices have equal norm.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.
- Returns:
- tuple
The scaled CP tensor.
- tlviz.factor_tools.distribute_weights_in_one_mode(cp_tensor, mode, axis=None)
Normalise all factors and multiply the weights into one mode.
The CP tensor is scaled so all factor matrices except one have unit norm columns and the weight-vector contains only ones.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.
- mode : int
Which mode (axis) to store the weights in.
- axis : int (optional)
Alias for mode. If this is set, then no value is needed for mode.
- Returns:
- tuple
The scaled CP tensor.
- tlviz.factor_tools.factor_match_score(cp_tensor1, cp_tensor2, consider_weights=True, skip_mode=None, return_permutation=False, absolute_value=True, allow_smaller_rank=False)
Compute the factor match score between cp_tensor1 and cp_tensor2.
The factor match score (FMS) is used to measure the similarity between two sets of components. There are many definitions of the FMS, but one common definition for third-order tensors is given by:
\[\frac{1}{R} \sum_{r=1}^R \frac{\mathbf{a}_r^T \hat{\mathbf{a}}_r}{\|\mathbf{a}_r\| \|\hat{\mathbf{a}}_r\|} \frac{\mathbf{b}_r^T \hat{\mathbf{b}}_r}{\|\mathbf{b}_r\| \|\hat{\mathbf{b}}_r\|} \frac{\mathbf{c}_r^T \hat{\mathbf{c}}_r}{\|\mathbf{c}_r\| \|\hat{\mathbf{c}}_r\|},\]
where \(\mathbf{a}, \mathbf{b}\) and \(\mathbf{c}\) are the component vectors for one of the decompositions and \(\hat{\mathbf{a}}, \hat{\mathbf{b}}\) and \(\hat{\mathbf{c}}\) are the component vectors for the other decomposition. Often, the absolute value of the inner products is used instead of just the inner products (i.e. \(|\mathbf{a}_r^T \hat{\mathbf{a}}_r|\)).
The above definition does not take the norm of the component vectors into account. However, sometimes we also wish to compare their norms. In that case, set the consider_weights argument to True to compute
\[\frac{1}{R} \sum_{r=1}^R \left(1 - \frac{|w_r - \hat{w}_r|}{\max\left( w_r, \hat{w}_r \right)}\right) \frac{\mathbf{a}_r^T \hat{\mathbf{a}}_r}{\|\mathbf{a}_r\|\|\hat{\mathbf{a}}_r\|} \frac{\mathbf{b}_r^T \hat{\mathbf{b}}_r}{\|\mathbf{b}_r\|\|\hat{\mathbf{b}}_r\|} \frac{\mathbf{c}_r^T \hat{\mathbf{c}}_r}{\|\mathbf{c}_r\|\|\hat{\mathbf{c}}_r\|}\]
instead, where \(w_r = \|\mathbf{a}_r\| \|\mathbf{b}_r\| \|\mathbf{c}_r\|\) and \(\hat{w}_r = \|\hat{\mathbf{a}}_r\| \|\hat{\mathbf{b}}_r\| \|\hat{\mathbf{c}}_r\|\).
For both definitions above, there is a permutation indeterminacy: two equivalent decompositions can have the same component vectors, but in a different order. To resolve this indeterminacy, we use the linear sum assignment solver available in SciPy [Cro16] to efficiently find the optimal permutation.
- Parameters:
- cp_tensor1 : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument
- cp_tensor2 : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument
- consider_weights : bool (default=True)
If True, then the weight-penalty is used (second equation above).
- skip_mode : int or None (default=None)
Which mode to skip when computing the FMS. Useful if cross validation or split-half analysis is used.
- return_permutation : bool (default=False)
Whether or not to return the optimal permutation of the factors.
- absolute_value : bool (default=True)
If True, then only the magnitude of the congruence is considered, not the sign.
- allow_smaller_rank : bool (default=False)
Only relevant if return_permutation=True. If True, then cp_tensor2 can have fewer components than cp_tensor1. Missing components are aligned with tlviz.factor_tools.NO_COLUMN (a slice that slices nothing).
- Returns:
- fms : float
The factor match score
- permutation : list[int | object] (only if return_permutation=True)
List of ints used to permute cp_tensor2 so its components optimally align with cp_tensor1. If cp_tensor1 has a component with no corresponding component in cp_tensor2 (i.e. there are fewer components in cp_tensor2 than in cp_tensor1), then tlviz.factor_tools.NO_COLUMN (a slice that slices nothing) is used to indicate missing components.
- Raises:
- ValueError
If allow_smaller_rank=False and cp_tensor2 has fewer components than cp_tensor1.
Examples
>>> import numpy as np
>>> from tlviz.factor_tools import factor_match_score
>>> from tensorly.decomposition import parafac
>>> from tensorly.random import random_cp
>>> # Construct random cp tensor with TensorLy
>>> cp_tensor = random_cp(shape=(4,5,6), rank=3, random_state=42)
>>> X = cp_tensor.to_tensor()
>>> # Add noise
>>> X_noisy = X + 0.05*np.random.RandomState(0).standard_normal(size=X.shape)
>>> # Decompose with TensorLy and compute FMS
>>> estimated_cp_tensor = parafac(X_noisy, rank=3, random_state=42)
>>> fms_with_weight_penalty = factor_match_score(cp_tensor, estimated_cp_tensor, consider_weights=True)
>>> print(f"Factor match score (with weight penalty): {fms_with_weight_penalty:.2f}")
Factor match score (with weight penalty): 0.95
>>> fms_without_weight_penalty = factor_match_score(cp_tensor, estimated_cp_tensor, consider_weights=False)
>>> print(f"Factor match score (without weight penalty): {fms_without_weight_penalty:.2f}")
Factor match score (without weight penalty): 0.99
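A compact sketch of the factor match score as defined above (averaged over components, with the optional weight penalty), using scipy.optimize.linear_sum_assignment for the permutation. fms_sketch is hypothetical and ignores skip_mode, labels and unequal ranks. It also assumes that w_r is the absolute weight times the product of the column norms, which matches the definition above once the weights are multiplied into the factors.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fms_sketch(cp_tensor1, cp_tensor2, consider_weights=True, absolute_value=True):
    """Average congruence product over optimally matched components (hypothetical helper)."""
    weights1, factors1 = cp_tensor1
    weights2, factors2 = cp_tensor2
    # w_r: absolute weight times the product of the column norms in every mode
    w1 = np.abs(np.asarray(weights1, dtype=float))
    w2 = np.abs(np.asarray(weights2, dtype=float))
    congruence = np.ones((len(w1), len(w2)))
    for F1, F2 in zip(factors1, factors2):
        norms1 = np.linalg.norm(F1, axis=0)
        norms2 = np.linalg.norm(F2, axis=0)
        congruence *= (F1 / norms1).T @ (F2 / norms2)
        w1, w2 = w1 * norms1, w2 * norms2
    if absolute_value:
        congruence = np.abs(congruence)
    if consider_weights:
        # Penalty 1 - |w_r - w_s| / max(w_r, w_s) for every pair of components
        penalty = 1 - np.abs(w1[:, None] - w2[None, :]) / np.maximum(w1[:, None], w2[None, :])
        congruence = congruence * penalty
    rows, columns = linear_sum_assignment(congruence, maximize=True)
    return float(congruence[rows, columns].mean())


rng = np.random.default_rng(0)
weights = rng.uniform(1, 2, size=3)
factors = [rng.standard_normal((n, 3)) for n in (20, 25, 30)]
perm = [2, 0, 1]
# The same components in a different order, with every weight doubled
doubled = (2 * weights[perm], [F[:, perm] for F in factors])
print(round(fms_sketch((weights, factors), doubled, consider_weights=False), 6))  # 1.0
print(round(fms_sketch((weights, factors), doubled, consider_weights=True), 6))  # 0.5
```

Doubling every weight gives a penalty of 1 - w/(2w) = 0.5 per matched pair, so the weighted score halves while the unweighted score is unaffected by the scaling and the reordering.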
- tlviz.factor_tools.get_cp_permutation(cp_tensor, reference_cp_tensor=None, consider_weights=True, allow_smaller_rank=False)
Find the optimal permutation between two CP tensors.
This function supports two ways of finding the permutation of a CP tensor: aligning the components with those of a reference CP tensor (if reference_cp_tensor is not None), or finding the permutation so the components are in descending order with respect to their explained variation (if reference_cp_tensor is None).
This function uses the factor match score to compute the optimal permutation between two CP tensors. This is useful for comparison purposes, as two otherwise identical CP tensors may have permuted columns.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.
- reference_cp_tensor : CPTensor or tuple (optional)
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument. The tensor that cp_tensor is aligned with.
- consider_weights : bool
Whether to consider the factor weights when the factor match score is computed.
- Returns:
- tuple
The permutation to use when permuting cp_tensor.
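When no reference is given, ordering by explained variation only needs the squared norm of each rank-one component, since the denominator of the explained variation is the same for all components. For a rank-one tensor, the Frobenius norm is the absolute weight times the product of the column norms, so no dense reconstruction is needed. order_by_component_norm_sketch is a hypothetical helper illustrating this; tlviz may compute the order differently.

```python
import numpy as np


def order_by_component_norm_sketch(weights, factors):
    """Component indices sorted by decreasing norm of their rank-one tensors."""
    # The Frobenius norm of w * (a outer b outer ...) is |w| * ||a|| * ||b|| * ...
    component_norms = np.abs(np.asarray(weights, dtype=float))
    for F in factors:
        component_norms = component_norms * np.linalg.norm(F, axis=0)
    # Sorting by the norm gives the same order as sorting by its square
    return tuple(int(i) for i in np.argsort(-component_norms))


weights = [1.0, 3.0, 2.0]
factors = [np.eye(3), np.eye(3), np.eye(3)]  # unit-norm columns
print(order_by_component_norm_sketch(weights, factors))  # (1, 2, 0)
```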
- tlviz.factor_tools.get_factor_matrix_permutation(factor_matrix1, factor_matrix2, ignore_sign=True, allow_smaller_rank=False)
Find the optimal permutation of the factor matrices.
Efficient estimation of the optimal permutation for two factor matrices. To find the optimal permutation, \(\sigma\), we solve the following optimisation problem:
\[\max_\sigma \sum_{r} \frac{\left|\mathbf{a}_{r}^\mathsf{T}\hat{\mathbf{a}}_{\sigma(r)}\right|} {\|\mathbf{a}_{r}\| \|\hat{\mathbf{a}}_{\sigma(r)}\|}\]
where \(\mathbf{a}_r\) is the \(r\)-th component vector of the first factor matrix and \(\hat{\mathbf{a}}_{\sigma(r)}\) is the \(r\)-th component vector of the second factor matrix after permuting the columns.
- Parameters:
- factor_matrix1 : np.ndarray or pd.DataFrame
First factor matrix
- factor_matrix2 : np.ndarray or pd.DataFrame
Second factor matrix
- ignore_sign : bool
Whether to take the absolute value of the inner products before computing the permutation. This is usually done because of the sign indeterminacy of component models.
- allow_smaller_rank : bool (default=False)
If True, then the function can align a smaller matrix onto a larger one. Missing columns are aligned with tlviz.factor_tools.NO_COLUMN (a slice that slices nothing).
- Returns:
- permutation : list[int | slice]
List of ints used to permute factor_matrix2 so its columns optimally align with factor_matrix1. If factor_matrix1 has a column with no corresponding column in factor_matrix2 (i.e. there are fewer columns in factor_matrix2 than in factor_matrix1), then tlviz.factor_tools.NO_COLUMN (a slice that slices nothing) is used to indicate missing columns.
- Raises:
- ValueError
If allow_smaller_rank=False and factor_matrix2 has fewer columns than factor_matrix1.
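The optimisation problem above is a linear sum assignment problem on the matrix of (absolute) column congruences. The sketch below solves it with scipy.optimize.linear_sum_assignment. factor_matrix_permutation_sketch and NO_COLUMN_SKETCH are hypothetical names, and modelling NO_COLUMN as slice(0, 0) is an assumption based on the description "a slice that slices nothing".

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical stand-in for tlviz.factor_tools.NO_COLUMN: a slice that selects no columns
NO_COLUMN_SKETCH = slice(0, 0)


def factor_matrix_permutation_sketch(factor_matrix1, factor_matrix2, ignore_sign=True, allow_smaller_rank=False):
    """Columns of factor_matrix2 to pick so they align with the columns of factor_matrix1."""
    A = np.asarray(factor_matrix1, dtype=float)
    B = np.asarray(factor_matrix2, dtype=float)
    if B.shape[1] < A.shape[1] and not allow_smaller_rank:
        raise ValueError("factor_matrix2 has fewer columns than factor_matrix1")
    # congruence[r, s] = cos(a_r, b_s)
    congruence = (A / np.linalg.norm(A, axis=0)).T @ (B / np.linalg.norm(B, axis=0))
    if ignore_sign:
        congruence = np.abs(congruence)
    rows, columns = linear_sum_assignment(congruence, maximize=True)
    # Columns of factor_matrix1 without a partner keep the NO_COLUMN placeholder
    permutation = [NO_COLUMN_SKETCH] * A.shape[1]
    for r, s in zip(rows, columns):
        permutation[r] = int(s)
    return permutation


rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
B = -A[:, [2, 0, 1]]  # reordered columns with flipped signs
print(factor_matrix_permutation_sketch(A, B))  # [1, 2, 0]
print(factor_matrix_permutation_sketch(A, B[:, :2], allow_smaller_rank=True))  # [1, slice(0, 0, None), 0]
```

With an all-integer result, B[:, permutation] reorders the columns of B to line up with A; indexing with the placeholder slice instead yields an empty column block.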
- tlviz.factor_tools.normalise_cp_tensor(cp_tensor)
Ensure that all factor matrices have unit norm, and all weight is stored in the weight-vector.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.
- Returns:
- tuple
The scaled CP tensor.
- tlviz.factor_tools.percentage_variation(cp_tensor, dataset=None, method='model')
Compute the percentage of variation captured by each component.
The (possible) non-orthogonality of CP factor matrices makes it less straightforward to estimate the amount of variation captured by each component, compared to a model with orthogonal factors. To estimate the amount of variation captured by a single component, we therefore use the following formula:
\[\text{fit}_i = \frac{\text{SS}_i}{\text{SS}_\mathbf{\mathcal{X}}}\]
where \(\text{SS}_i\) is the squared norm of the tensor constructed using only the i-th component, and \(\text{SS}_\mathbf{\mathcal{X}}\) is the squared norm of the data tensor [EigenvectorResearch11]. If method="model", then \(\text{SS}_\mathbf{\mathcal{X}}\) is the squared norm of the tensor constructed from the CP tensor using all factor matrices.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument
- dataset : np.ndarray
Data tensor that the cp_tensor is fitted against
- method : {"data", "model", "both"} (default="model")
Which method to use for computing the fit.
- Returns:
- fit : float or tuple
The fit (depending on the method). If method="both", then a tuple is returned where the first element is the fit computed against the data tensor and the second element is the fit computed against the model.
Examples
There are two ways of computing the percentage variation. One method is to divide by the variation in the data, giving us the percentage variation of the data captured by each component. This approach will not necessarily sum to 100 since:
- the model will not explain all the variation, and
- the components are likely not orthogonal.
Alternatively, we can divide by the variation in the model, which will give us the contribution of each component to the model. However, this may also not sum to 100 since the components may not be orthogonal.
>>> from tlviz.data import simulated_random_cp_tensor
>>> from tlviz.factor_tools import percentage_variation
>>> cp_tensor, X = simulated_random_cp_tensor((30, 10, 10), 5, noise_level=0.3, seed=0)
>>> print(percentage_variation(cp_tensor).astype(int))
[11  2  0  0 39]
>>> print(percentage_variation(cp_tensor, X, method="data").astype(int))
[11  2  0  0 37]
We see that the variation captured by the individual components sums to 50 when we compare with the data and 52 when we compare with the model. These low numbers are because the components are not orthogonal, which means that the squared norm of the data is not equal to the sum of the squared norms of the individual components. We can also compute the percentage variation with the model and the data simultaneously:
>>> percent_var_data, percent_var_model = percentage_variation(cp_tensor, X, method="both")
>>> print(percent_var_data.astype(int))
[11  2  0  0 37]
>>> print(percent_var_model.astype(int))
[11  2  0  0 39]
If the noise level is 0, both methods should give the same variation percentages:
>>> cp_tensor, X = simulated_random_cp_tensor((30, 10, 10), 5, noise_level=0.0, seed=1)
>>> percent_var_data, percent_var_model = percentage_variation(cp_tensor, X, method="both")
>>> print(percent_var_data.astype(int))
[ 3 11  0 34  1]
>>> print(f"Sum of variation: {percent_var_data.sum():.0f}")
Sum of variation: 51
>>> print(percent_var_model.astype(int))
[ 3 11  0 34  1]
>>> print(f"Sum of variation: {percent_var_model.sum():.0f}")
Sum of variation: 51
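The formula for the fit can be sketched in NumPy for a (weights, factors) tuple. percentage_variation_sketch and cp_to_dense are hypothetical helpers, and returning percentages (100 times the fit) is an assumption based on the examples. With orthonormal factor matrices and noise-free data, the two methods agree and the percentages sum to 100, in contrast to the non-orthogonal examples.

```python
import string

import numpy as np


def cp_to_dense(weights, factors):
    """Dense tensor sum_r w_r * (a_r outer b_r outer ...), via einsum."""
    modes = string.ascii_lowercase[: len(factors)]
    subscripts = "z," + ",".join(f"{mode}z" for mode in modes) + "->" + modes
    return np.einsum(subscripts, weights, *factors)


def percentage_variation_sketch(weights, factors, dataset=None, method="model"):
    """100 * SS_i / SS_X for every component i (hypothetical helper)."""
    if method not in ("data", "model", "both"):
        raise ValueError(f"Unknown method: {method!r}")
    # SS_i: squared norm of the i-th rank-one tensor, w_i^2 times the squared column norms
    ss_components = np.asarray(weights, dtype=float) ** 2
    for F in factors:
        ss_components = ss_components * np.linalg.norm(F, axis=0) ** 2
    from_model = 100 * ss_components / np.sum(cp_to_dense(weights, factors) ** 2)
    if method == "model":
        return from_model
    from_data = 100 * ss_components / np.sum(np.asarray(dataset, dtype=float) ** 2)
    return from_data if method == "data" else (from_data, from_model)


weights = np.array([3.0, 2.0, 1.0])
factors = [np.eye(3), np.eye(3), np.eye(3)]  # orthonormal columns in every mode
X = cp_to_dense(weights, factors)  # noise-free data
from_data, from_model = percentage_variation_sketch(weights, factors, X, method="both")
print(round(from_model.sum(), 6))  # 100.0
```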
- tlviz.factor_tools.permute_cp_tensor(cp_tensor, permutation=None, reference_cp_tensor=None, consider_weights=True, allow_smaller_rank=False)
Permute the CP tensor
This function supports three ways of permuting a CP tensor: aligning the components with those of a reference CP tensor (if reference_cp_tensor is not None), permuting the components according to a given permutation (if permutation is not None), or permuting so the components are in descending order with respect to their explained variation (if both reference_cp_tensor and permutation are None).
This function uses the factor match score to compute the optimal permutation between two CP tensors. This is useful for comparison purposes, as two otherwise identical CP tensors may have permuted columns.
- Parameters:
- cp_tensor : CPTensor or tuple
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument.
- permutation : tuple (optional)
Tuple with the column permutations. Either this or the reference_cp_tensor argument must be passed, not both.
- reference_cp_tensor : CPTensor or tuple (optional)
TensorLy-style CPTensor object or tuple with weights as first argument and a tuple of components as second argument. The tensor that cp_tensor is aligned with. Either this or the permutation argument must be passed, not both.
- consider_weights : bool
Whether to consider the factor weights when the factor match score is computed.
- Returns:
- tuple
Tuple representing cp_tensor optimally permuted.
- Raises:
- ValueError
If neither permutation nor reference_cp_tensor is provided.
- ValueError
If both permutation and reference_cp_tensor are provided.
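Applying a given permutation amounts to reordering the weights and the columns of every factor matrix in the same way. A minimal sketch, assuming a (weights, factors) tuple and integer permutations only (no NO_COLUMN entries); apply_permutation_sketch is a hypothetical helper, not the tlviz implementation.

```python
import numpy as np


def apply_permutation_sketch(weights, factors, permutation):
    """New component r is old component permutation[r] (hypothetical helper)."""
    permutation = list(permutation)
    new_weights = np.asarray(weights)[permutation]
    # Reorder the columns of every factor matrix in the same way as the weights
    new_factors = [np.asarray(F)[:, permutation] for F in factors]
    return new_weights, new_factors


weights = np.array([1.0, 2.0, 3.0])
factors = [np.arange(6.0).reshape(2, 3), np.arange(12.0).reshape(4, 3)]
new_weights, new_factors = apply_permutation_sketch(weights, factors, [1, 2, 0])
print(new_weights)  # [2. 3. 1.]
```

Because the same reordering is applied to the weights and to every factor matrix, the permuted CP tensor represents the same dense tensor as the original.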