nmf_models.nmf_models_mod_updates.intNMF

class intNMF(n_topics, epochs=15, init='random', mod1_skew=0.5, seed=None, reg=None)[source]

Bases: object

Class to run intNMF on multiome (paired RNA and ATAC) data.

theta

cell x topic matrix (joint low dimensional embedding)

Type:

array-like

phi_rna

topic x gene matrix. Gives the loading matrix to define topics.

Type:

array-like

phi_atac

topic x region matrix. Gives the loading matrix to define topics.

Type:

array-like

loss

l2 norm of the reconstruction error i.e. ||X_rna - WH_rna||_2 + ||X_atac - WH_atac||_2

Type:

float

loss_atac

l2 norm for the reconstruction error of just the atac matrix i.e. ||X_atac - WH_atac||_2

Type:

float

loss_rna

l2 norm for the reconstruction error of just the rna matrix i.e. ||X_rna - WH_rna||_2

Type:

float
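As a minimal sketch of how the three loss attributes relate to one another, the snippet below recomputes them from dense NumPy arrays. It assumes the "l2 norm" of a matrix means the Frobenius norm, which is an assumption and not something this page confirms. The helper name `reconstruction_losses` and the toy data are hypothetical and not part of the package.

```python
import numpy as np


def reconstruction_losses(x_rna, x_atac, theta, phi_rna, phi_atac):
    """Return (loss, loss_rna, loss_atac) for dense inputs.

    Assumption: the matrix norm is the Frobenius norm, which is what
    np.linalg.norm computes for a 2-D array by default.
    """
    loss_rna = np.linalg.norm(x_rna - theta @ phi_rna)
    loss_atac = np.linalg.norm(x_atac - theta @ phi_atac)
    return loss_rna + loss_atac, loss_rna, loss_atac


# Toy example: 6 cells, 2 topics, 4 genes, 5 regions (all hypothetical).
rng = np.random.default_rng(0)
theta = rng.random((6, 2))
phi_rna = rng.random((2, 4))
phi_atac = rng.random((2, 5))

x_rna = theta @ phi_rna            # reconstructed exactly
x_atac = theta @ phi_atac + 1.0    # every entry is off by 1

loss, loss_rna, loss_atac = reconstruction_losses(
    x_rna, x_atac, theta, phi_rna, phi_atac
)
```

In this toy case the RNA term vanishes and the ATAC term equals the square root of the number of ATAC entries, so `loss` is simply the sum of the two per-modality values.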

Parameters:
  • n_topics (int) – The number of latent topics

  • epochs (int) – Number of iterations during optimisation. Defaults to 15.

  • init (string) – Method of initialising W and H. Defaults to 'random'.

  • mod1_skew (float) – Relative weighting of the two modalities, between 0 and 1. Defaults to 0.5.

  • reg (string) – Whether to include l1 or l2 regularisation (not yet implemented; TODO). Defaults to None.

  • seed (int) – Random seed to use. Defaults to None, i.e., no control of the random seed. Setting a seed is useful for reproducibility when using random initialisation.

Methods

fit

Optimise NMF.

fit(rna_mat, atac_mat, rna_names=None, atac_names=None)[source]

Optimise NMF.

Uses the accelerated hierarchical alternating least squares (HALS) algorithm proposed in https://arxiv.org/pdf/1107.5194.pdf, modified to jointly factorise two matrices. The only required arguments are the matrices to factorise. The GEX and ATAC matrices are expected in cell-by-feature format and should be scipy sparse matrices.

The optimisation solves min ||X_rna - (theta . phi_rna)||_2 and min ||X_atac - (theta . phi_atac)||_2 s.t. theta, phi_rna, phi_atac >= 0, so that theta holds the latent topic scores for each cell, and phi_rna and phi_atac allow reconstruction of X_rna and X_atac.
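The idea of sharing one theta across two factorisations can be illustrated with a short NumPy sketch. Note that this is not the accelerated HALS procedure used by `fit`: it swaps in plain multiplicative updates (Lee and Seung style) because they fit in a few lines. It also works on dense arrays, weights both modalities equally, and minimises the sum of squared Frobenius errors. The function name `joint_nmf_mu` and all of its arguments are hypothetical.

```python
import numpy as np


def joint_nmf_mu(x_rna, x_atac, n_topics, epochs=50, seed=0, eps=1e-10):
    """Jointly factorise two cell-by-feature matrices with a shared theta.

    Illustrative only: multiplicative updates, not accelerated HALS.
    """
    rng = np.random.default_rng(seed)
    n_cells = x_rna.shape[0]
    theta = rng.random((n_cells, n_topics))
    phi_rna = rng.random((n_topics, x_rna.shape[1]))
    phi_atac = rng.random((n_topics, x_atac.shape[1]))

    history = []
    for _ in range(epochs):
        # Update each loading matrix with theta held fixed.
        gram = theta.T @ theta
        phi_rna *= (theta.T @ x_rna) / (gram @ phi_rna + eps)
        phi_atac *= (theta.T @ x_atac) / (gram @ phi_atac + eps)

        # Update the shared theta using both modalities at once.
        numer = x_rna @ phi_rna.T + x_atac @ phi_atac.T
        denom = theta @ (phi_rna @ phi_rna.T + phi_atac @ phi_atac.T) + eps
        theta *= numer / denom

        # Track the squared joint objective that these updates minimise.
        err_rna = np.linalg.norm(x_rna - theta @ phi_rna) ** 2
        err_atac = np.linalg.norm(x_atac - theta @ phi_atac) ** 2
        history.append(err_rna + err_atac)

    return theta, phi_rna, phi_atac, history


# Toy low-rank data: 40 cells, 3 topics, 20 genes, 30 regions.
rng = np.random.default_rng(1)
true_theta = rng.random((40, 3))
x_rna = true_theta @ rng.random((3, 20))
x_atac = true_theta @ rng.random((3, 30))

theta, phi_rna, phi_atac, history = joint_nmf_mu(x_rna, x_atac, n_topics=3)
```

Because theta appears in both reconstruction terms, its update pools evidence from the RNA and ATAC matrices, which is what makes the embedding joint. The factors stay non-negative because each update multiplies non-negative values by non-negative ratios.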

Parameters:
  • rna_mat (scipy sparse matrix (or coercible) of single cell gene expression) –

  • atac_mat (scipy sparse matrix (or coercible) of single cell chromatin accessibility) –

  • rna_names (Optional list of gene names. Must have the same length as the number of columns in rna_mat) –

  • atac_names (Optional list of region names. Must have the same length as the number of columns in atac_mat) –

Returns:

Access the low-dimensional embedding with self.theta, or the loadings with self.phi_rna or self.phi_atac.

Return type:

self
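A hedged usage sketch follows. It assumes the class is importable from `nmf_models.nmf_models_mod_updates` (the path in this page's title) and that scipy is installed. Both imports are deferred into the function so the toy data can be built without them. The helper names `make_toy_counts` and `run_intnmf`, and the Poisson toy counts, are hypothetical.

```python
import numpy as np


def make_toy_counts(n_cells, n_features, seed):
    """Hypothetical helper: dense Poisson counts in cell-by-feature layout."""
    rng = np.random.default_rng(seed)
    return rng.poisson(0.5, size=(n_cells, n_features)).astype(float)


def run_intnmf(rna_dense, atac_dense, n_topics=5):
    """Fit intNMF and return (theta, phi_rna, phi_atac).

    Assumes the package and scipy are installed; imports are deferred so
    that defining this function does not require either.
    """
    from scipy import sparse
    from nmf_models.nmf_models_mod_updates import intNMF

    # fit expects scipy sparse matrices in cell-by-feature format.
    rna_mat = sparse.csr_matrix(rna_dense)
    atac_mat = sparse.csr_matrix(atac_dense)

    model = intNMF(n_topics, epochs=15, seed=0)
    model.fit(rna_mat, atac_mat)
    return model.theta, model.phi_rna, model.phi_atac


# Both matrices must describe the same cells, in the same row order.
rna = make_toy_counts(50, 30, seed=0)
atac = make_toy_counts(50, 80, seed=1)
```

With the package installed, `run_intnmf(rna, atac)` should return a cell x topic embedding together with the topic x gene and topic x region loadings described in the attributes above.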