tlda.TLDA
- class tlda.TLDA(n_topic, alpha_0, n_iter_train, n_iter_test, learning_rate, pca_batch_size=10000, third_order_cumulant_batch=1000, gamma_shape=1.0, smoothing=1e-06, theta=1, ortho_loss_criterion=1000, n_eigenvec=None, random_seed=None)
Class to learn topic-word distribution from a corpus of documents
- Attributes:
unwhitened_factors
Unwhitened learned factors of shape (n_topic, vocabulary_size)
Methods
fit(X[, order])
Compute the word-topic distribution for the entire dataset at once.
partial_fit(X_batch, batch_index[, save_folder])
Update the word-topic distribution using a batch of documents.
partial_fit_online(X_batch)
Update the word-topic distribution using a batch of documents in a fully online version.
transform([X, predict])
Transform the document-word matrix of a set of documents into a word-topic distribution and, when predict=True, a topic-document distribution.
- fit(X, order=None)
Compute the word-topic distribution for the entire dataset at once. Assumes that the whole dataset and the tensors required to compute its word-topic distribution fit in memory.
- Parameters:
- X: tensor of shape (self.n_documents, self.vocab)
All documents used to fit the word-topic distribution.
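As an illustration of the expected input layout, the following sketch builds a document-word count matrix with one row per document and one column per vocabulary word. The helper name `build_doc_word_matrix` and the toy corpus are hypothetical, and NumPy is used only for convenience; this page does not state which tensor type the class expects, so a conversion to the appropriate backend tensor may still be needed.

```python
import numpy as np


def build_doc_word_matrix(tokenized_docs):
    """Return (X, vocab), where X[i, j] counts word vocab[j] in document i.

    Hypothetical helper, not part of the tlda package.
    """
    # Fix a deterministic column order for the vocabulary.
    vocab = sorted({word for doc in tokenized_docs for word in doc})
    column = {word: j for j, word in enumerate(vocab)}

    # One row per document, one column per vocabulary word.
    X = np.zeros((len(tokenized_docs), len(vocab)), dtype=np.float32)
    for i, doc in enumerate(tokenized_docs):
        for word in doc:
            X[i, column[word]] += 1.0
    return X, vocab


docs = [["cat", "dog", "cat"], ["dog", "fish"]]
X, vocab = build_doc_word_matrix(docs)
print(X.shape)  # (n_documents, vocabulary size)
```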
- partial_fit(X_batch, batch_index, save_folder=None)
Update the word-topic distribution using a batch of documents. For a given batch, the first and second order cumulants need to be fit once, but the third order cumulant should be fit many times.
- Parameters:
- X_batch: tensor of shape (batch_size, self.vocab)
- batch_index: int
Index of the current batch. This is used to decide whether to update the first and second moments or only to whiten the batch.
- save_folder: str, default is None
Folder in which to store the whitened batches. If None, the whitened batches are recomputed at each iteration instead of being cached.
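One possible way to drive a batch-wise fit is sketched below. The helper `fit_in_batches` and its `n_passes` argument are hypothetical. The sketch assumes that each batch keeps the same `batch_index` on every pass over the data, so that a model could tell a batch it has already seen from a new one; this page does not confirm that calling pattern, so treat it as an assumption.

```python
def fit_in_batches(model, X, batch_size, n_passes=1):
    """Feed X to model.partial_fit in consecutive batches.

    Hypothetical helper: `model` is any object exposing
    partial_fit(X_batch, batch_index, save_folder=None), and X is any
    sliceable container of documents (list, array, tensor).
    """
    n_documents = len(X)
    for _ in range(n_passes):
        # Assumption: the same batch gets the same index on every pass.
        for batch_index, start in enumerate(range(0, n_documents, batch_size)):
            X_batch = X[start:start + batch_size]
            model.partial_fit(X_batch, batch_index)
    return model
```

The last batch is simply shorter when the number of documents is not a multiple of `batch_size`.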
- partial_fit_online(X_batch)
Update the word-topic distribution using a batch of documents in a fully online version. Meant for very large datasets, since we only do one gradient update for each batch in the third order cumulant calculation.
- Parameters:
- X_batch: tensor of shape (batch_size, self.vocab)
- property unwhitened_factors
Unwhitened learned factors of shape (n_topic, vocabulary_size)
On the first call, this will compute and store the unwhitened factors. Subsequent calls will simply return the stored value.
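The compute-once behaviour described above is a common lazy-property pattern, sketched here on a hypothetical stand-in class. `LazyFactors`, `_unwhiten`, and `n_computations` are illustrative names only, and the placeholder computation does not reflect how the library actually unwhitens its factors.

```python
class LazyFactors:
    """Hypothetical stand-in showing a compute-once, cached property."""

    def __init__(self, whitened_factors):
        self.whitened_factors = whitened_factors
        self._unwhitened_factors = None  # nothing computed yet
        self.n_computations = 0          # counts how often the work is done

    def _unwhiten(self):
        # Placeholder for the real computation, which this page does not describe.
        self.n_computations += 1
        return [list(row) for row in self.whitened_factors]

    @property
    def unwhitened_factors(self):
        # First access computes and stores; later accesses reuse the stored value.
        if self._unwhitened_factors is None:
            self._unwhitened_factors = self._unwhiten()
        return self._unwhitened_factors


factors = LazyFactors([[0.5, 0.5], [0.1, 0.9]])
first = factors.unwhitened_factors
second = factors.unwhitened_factors
print(first is second, factors.n_computations)
```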
- transform(X=None, predict=False)
Transform the document-word matrix of a set of documents into a word-topic distribution and, when predict=True, a topic-document distribution.
- Parameters:
- X: tensor of shape (n_documents, self.vocab)
Set of documents for which to predict the topic distribution.
- predict: whether to return both the topic-document distribution and the word-topic distribution, or just the word-topic distribution.