tlda.TLDA

class tlda.TLDA(n_topic, alpha_0, n_iter_train, n_iter_test, learning_rate, pca_batch_size=10000, third_order_cumulant_batch=1000, gamma_shape=1.0, smoothing=1e-06, theta=1, ortho_loss_criterion=1000, n_eigenvec=None, random_seed=None)[source]

Class to learn the topic-word distribution from a corpus of documents.
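
Example (a minimal instantiation sketch; the hyperparameter values below are illustrative assumptions, not recommended settings):

import tlda

model = tlda.TLDA(
    n_topic=20,            # number of topics to learn (illustrative)
    alpha_0=0.01,          # topic prior concentration (illustrative value)
    n_iter_train=1000,     # illustrative iteration counts
    n_iter_test=10,
    learning_rate=0.01,
    random_seed=0,
)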

Attributes:
unwhitened_factors

Unwhitened learned factors of shape (n_topic, vocabulary_size)

Methods

fit(X[, order])

Compute the word-topic distribution for the entire dataset at once.

partial_fit(X_batch, batch_index[, save_folder])

Update the word-topic distribution using a batch of documents.

partial_fit_online(X_batch)

Update the word-topic distribution using a batch of documents in a fully online version.

transform([X, predict])

Transform the document-word matrix of a set of documents into a word-topic distribution, and also into a topic-document distribution when predict=True.

fit(X, order=None)[source]

Compute the word-topic distribution for the entire dataset at once. Assumes that the whole dataset and the tensors required to compute its word-topic distribution fit in memory.

Parameters:
X : tensor of shape (self.n_documents, self.vocab)

All documents used to fit the word-topic distribution.
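
Example (a sketch of fitting an in-memory corpus, continuing the instantiation example above; the random counts stand in for a real bag-of-words matrix, and passing a tensorly tensor is an assumption about the accepted input type):

import numpy as np
import tensorly as tl

n_documents, vocab = 5000, 2000
rng = np.random.default_rng(0)

# Stand-in document-word counts; in practice X comes from a vectorized corpus.
X = tl.tensor(rng.poisson(0.1, size=(n_documents, vocab)).astype(float))

model.fit(X)
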
partial_fit(X_batch, batch_index, save_folder=None)[source]

Update the word-topic distribution using a batch of documents. For a given batch, the first and second order cumulants need to be fit once, but the third order cumulant should be fit many times.

Parameters:
X_batch : tensor of shape (batch_size, self.vocab)

batch_index : int

Index of the current batch. This is used to decide whether to update the first and second order moments or only whiten the batch.

save_folder : str, default is None

Folder in which to store the whitened batches. If None, the whitened batches will be recomputed at each iteration instead of being cached.
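
Example (a sketch of batched training, continuing the examples above; the random batches and the exact batch_index convention across repeated passes are assumptions, not part of the documented API):

batch_size, n_batches = 1000, 10

batches = [
    tl.tensor(rng.poisson(0.1, size=(batch_size, vocab)).astype(float))
    for _ in range(n_batches)
]

for _ in range(5):  # repeated passes so the third order cumulant is fit many times
    for i, X_batch in enumerate(batches):
        # batch_index lets the model decide whether to update the first and
        # second moments for this batch or only whiten it; save_folder caches
        # the whitened batches between passes.
        model.partial_fit(X_batch, batch_index=i, save_folder="whitened_batches")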

partial_fit_online(X_batch)[source]

Update the word-topic distribution using a batch of documents in a fully online version. Meant for very large datasets, since we only do one gradient update for each batch in the third order cumulant calculation.

Parameters:
X_batch : tensor of shape (batch_size, self.vocab)
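
Example (a sketch of fully online training on the batches from the previous example; each batch receives a single gradient update for the third order cumulant):

for X_batch in batches:   # e.g., batches streamed from disk
    model.partial_fit_online(X_batch)
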
property unwhitened_factors

Unwhitened learned factors of shape (n_topic, vocabulary_size)

On the first call, this will compute and store the unwhitened factors. Subsequent calls will simply return the stored value.
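
Example (assuming the model above has been fit):

factors = model.unwhitened_factors   # computed and cached on first access
print(factors.shape)                 # (n_topic, vocabulary_size)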

transform(X=None, predict=False)[source]

Transform the document-word matrix of a set of documents into a word-topic distribution, and also into a topic-document distribution when predict=True.

Parameters:
X : tensor of shape (n_documents, self.vocab)

Set of documents for which to predict the topic distribution.

predict : bool

Whether to return both the topic-document distribution and the word-topic distribution, or just the word-topic distribution.
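
Example (a sketch of inference on held-out documents, continuing the examples above; the random counts stand in for real documents, and the exact structure of the return value when predict=True is not specified here):

X_new = tl.tensor(rng.poisson(0.1, size=(100, vocab)).astype(float))

# Word-topic distribution only.
word_topic = model.transform(X_new)

# Word-topic distribution together with the topic-document distribution.
result = model.transform(X_new, predict=True)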