tensorly.preprocessing.svd_compress_tensor_slices
- svd_compress_tensor_slices(tensor_slices, compression_threshold=0.0, max_rank=None, svd='truncated_svd')
Compress data with the SVD for running PARAFAC2.
PARAFAC2 can be sped up massively for data where the number of rows in the tensor slices is much greater than their rank. In that case, we can compress the data by computing the SVD and fitting the PARAFAC2 model to the right singular vectors multiplied by the singular values. Then, we can “decompress” the decomposition by left-multiplying the \(B_i\)-matrices by the left singular vectors to get a decomposition as if it had been fitted to the uncompressed data. We can essentially think of this as running a PCA without centering the data for each tensor slice and fitting the PARAFAC2 model to the scores. Then, to get back the components, we left-multiply the \(B_i\)-matrices with the loading matrices.
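The compress/decompress round trip described above can be sketched for a single slice with plain NumPy. This is a minimal illustration, not the library's implementation: the slice sizes are made up, and `B_compressed` is a random stand-in for a \(B_i\)-matrix fitted to the compressed data rather than an actual PARAFAC2 factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# One tall slice: many rows, few columns (hypothetical sizes).
X = rng.standard_normal((1000, 8))

# Thin SVD: X = U @ diag(s) @ Vt, with U of shape (1000, 8).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Compressed data ("scores"): diag(s) @ Vt, shape (8, 8).
scores = s[:, None] * Vt

# Stand-in for a B_i matrix fitted to the compressed slice.
# It is random here; a real one would come from a PARAFAC2 fit.
B_compressed = rng.standard_normal((scores.shape[0], 3))

# "Decompress": left-multiply by the left singular vectors ("loadings").
B_full = U @ B_compressed

print(X.shape, "->", scores.shape)
print(B_compressed.shape, "->", B_full.shape)
print(np.allclose(U @ scores, X))
```

The model is fitted to an 8 x 8 matrix instead of a 1000 x 8 one, and the row dimension is restored only once, after fitting.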
[1] states that we can constrain our \(B_i\)-matrices to lie in a given vector space, \(\mathscr{V}_i\), by multiplying the data matrices with an orthogonal basis matrix that spans \(\mathscr{V}_i\). However, since we know that each \(B_i\) lies in the column space of \(X_i\), we can multiply the \(X_i\)-matrices by an orthogonal matrix that spans \(\text{col}(X_i)\) without affecting the fit of the model. Thus we can compress our data prior to fitting the PARAFAC2 model whenever the number of rows in our data matrices exceeds the number of columns (as the dimension of \(\text{col}(X_i)\) cannot exceed the number of columns).
To implement this, we use the SVD to get an orthogonal basis for the column space of \(X_i\). Moreover, since \(S_i V_i^T = U_i^T X_i\), we can skip an additional matrix multiplication by fitting the model to \(S_i V_i^T\).
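The identity \(S_i V_i^T = U_i^T X_i\), and the claim that the fit is unaffected, can be checked numerically. The sketch below uses NumPy on a random slice; `M` is an arbitrary matrix standing in for whatever model is fitted in the compressed space, which is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Identity from the text: S V^T equals U^T X, so the extra
# matrix product U^T X never has to be computed.
SVt = s[:, None] * Vt
identity_holds = np.allclose(SVt, U.T @ X)

# Arbitrary stand-in for a model of the compressed slice.
M = rng.standard_normal((6, 6))

# Since U has orthonormal columns and X = U @ SVt, the residual
# X - U @ M equals U @ (SVt - M) and has the same Frobenius norm.
err_full = np.linalg.norm(X - U @ M)
err_compressed = np.linalg.norm(SVt - M)

print(identity_holds, np.isclose(err_full, err_compressed))
```

Because the two residual norms agree for every `M`, minimizing the error on the compressed slice is equivalent to minimizing it on the original slice over models whose \(B_i\) lies in \(\text{col}(X_i)\).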
Finally, we note that this approach can also be implemented by truncating the SVD. If an appropriate threshold is set, this will not affect the fitted model in any major way.
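As a rough illustration of why a well-chosen truncation changes little, the following NumPy sketch truncates the SVD of a nearly rank-3 slice. The data, the noise level, and the value of `threshold` are all hypothetical. The Frobenius error introduced by the truncation equals the norm of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Nearly rank-3 slice: a rank-3 signal plus small noise.
signal = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 10))
X = signal + 1e-3 * rng.standard_normal((500, 10))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep singular values that are at least threshold * s[0].
threshold = 0.01  # hypothetical relative threshold
keep = s >= threshold * s[0]
U_k, s_k, Vt_k = U[:, keep], s[keep], Vt[keep]

# Truncated scores and the slice they represent.
scores = s_k[:, None] * Vt_k
X_approx = U_k @ scores

# Error of the truncation, absolute and relative to the data norm.
err = np.linalg.norm(X - X_approx)
rel_err = err / np.linalg.norm(X)
dropped_norm = np.sqrt(np.sum(s[~keep] ** 2))

print(scores.shape, rel_err, np.isclose(err, dropped_norm))
```

Only the noise-level singular values are discarded here, so the compressed slice has three rows while the relative error stays far below the threshold.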
Note
This can be thought of as a simplified version of the DPar2 approach for compressing PARAFAC2 models [2], which compresses all modes of \(\mathcal{X}\) to fit an approximate PARAFAC2 model.
- Parameters:
- tensor_slices : list of matrices
The data matrices to compress.
- compression_threshold : float (0 <= compression_threshold <= 1)
Threshold at which the singular values should be truncated. Any singular value less than compression_threshold * s[0] is set to zero. Note that if this is nonzero, then the found components will likely be affected.
- max_rank : int
The maximum rank to allow in the datasets after compression. This also serves to speed up the SVD calculation with matrices containing many rows and columns when paired with randomized SVD solving.
- svd : str, default is ‘truncated_svd’
Function to use to compute the SVD; acceptable values are listed in tensorly.SVD_FUNS.
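To make the interaction between compression_threshold and max_rank concrete, here is a small sketch of how the two limits could combine into the number of retained singular values. The helper `n_components_kept` is hypothetical and follows only the parameter descriptions above; it is not taken from the library's source.

```python
import numpy as np

def n_components_kept(s, compression_threshold=0.0, max_rank=None):
    """Hypothetical helper: count singular values kept under both limits.

    `s` must be sorted in descending order, as returned by np.linalg.svd.
    """
    if max_rank is not None:
        # Never keep more than max_rank singular values.
        s = s[:max_rank]
    # Drop values smaller than compression_threshold * s[0].
    return int(np.sum(s >= compression_threshold * s[0]))

s = np.array([10.0, 5.0, 1.0, 0.05, 0.001])

print(n_components_kept(s))                              # 5
print(n_components_kept(s, compression_threshold=0.01))  # 3
print(n_components_kept(s, max_rank=2))                  # 2
print(n_components_kept(s, compression_threshold=0.2, max_rank=4))  # 2
```

With the default threshold of 0.0 nothing is discarded, which matches the note that only a nonzero threshold is expected to affect the components.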
- Returns:
- list of matrices
The score matrices to which the PARAFAC2 model is fitted.
- list of matrices
The loading matrices, used to decompress the PARAFAC2 components after fitting to the scores.
References
[1] Helwig, N. E. (2017). Estimating latent trends in multivariate longitudinal data via Parafac2 with functional and structural constraints. Biometrical Journal, 59(4), 783-803. doi: 10.1002/bimj.201600045
[2] Jang, J. G., & Kang, U. (2022). DPar2: Fast and scalable PARAFAC2 decomposition for irregular dense tensors. In 38th International Conference on Data Engineering (ICDE) (pp. 2454-2467). IEEE.