cellmaps_coembedding package

Subpackages

Submodules

cellmaps_coembedding.cellmaps_coembeddingcmd module

cellmaps_coembedding.cellmaps_coembeddingcmd.main(args)[source]

Main entry point for program

Parameters:

args (list) – arguments passed on the command line, usually sys.argv[1:]

Returns:

return value of cellmaps_coembedding.runner.CellmapsCoEmbedder.run() or 2 if an exception is raised

Return type:

int
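The return-code contract described above can be sketched as follows. This is a minimal illustration, not the package's implementation: FakeRunner and the --fail flag are hypothetical stand-ins introduced only to make the example runnable without the real CellmapsCoEmbedder.

```python
import sys


class FakeRunner:
    """Hypothetical stand-in for an object exposing run() -> int."""

    def __init__(self, fail=False):
        self.fail = fail

    def run(self):
        # Simulate either a clean run (exit code 0) or a failure.
        if self.fail:
            raise RuntimeError("simulated failure")
        return 0


def main(args):
    """Return the runner's exit code, or 2 if any exception is raised."""
    try:
        # "--fail" is a made-up flag used only to trigger the error path.
        runner = FakeRunner(fail="--fail" in args)
        return runner.run()
    except Exception as exc:
        sys.stderr.write(f"Caught exception: {exc}\n")
        return 2
```

A script would typically pass sys.argv[1:] to main and hand the result to sys.exit.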

cellmaps_coembedding.exceptions module

exception cellmaps_coembedding.exceptions.CellmapsCoEmbeddingError[source]

Bases: Exception

Base exception for cellmaps_coembedding
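A base exception lets callers catch every package-specific error with a single except clause. The sketch below shows the pattern with hypothetical names (CoEmbeddingError, MissingInputError, load, safe_load); none of them are part of cellmaps_coembedding.

```python
class CoEmbeddingError(Exception):
    """Hypothetical stand-in for a package-wide base exception."""


class MissingInputError(CoEmbeddingError):
    """Hypothetical, more specific error derived from the base."""


def load(path):
    # Always fails here, purely to demonstrate the error path.
    raise MissingInputError(f"no embedding found at {path}")


def safe_load(path):
    """Catch any error derived from the base exception."""
    try:
        return load(path)
    except CoEmbeddingError as exc:
        return f"failed: {exc}"
```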

cellmaps_coembedding.runner module

class cellmaps_coembedding.runner.AutoCoEmbeddingGenerator(dimensions=128, outdir=None, embeddings=None, ppi_embeddingdir=None, image_embeddingdir=None, embedding_names=None, jackknife_percent=0.0, n_epochs=100, save_update_epochs=True, batch_size=16, triplet_margin=0.2, dropout=0.5, l2_norm=False, mean_losses=False)[source]

Bases: ProteinGPSCoEmbeddingGenerator

Generates co-embedding using proteingps

Deprecated since version 1.0.0: The embedding was renamed to proteingps. This class is now called ProteinGPSCoEmbeddingGenerator.

Initializes the ProteinGPSCoEmbeddingGenerator.

Parameters:
  • dimensions – The dimensionality of the embedding space (default: 128).

  • outdir – The output directory where embeddings should be saved.

  • embeddings – Embedding data.

  • ppi_embeddingdir – Directory containing protein-protein interaction embeddings.

  • image_embeddingdir – Directory containing image embeddings.

  • embedding_names – List of names corresponding to each type of embedding provided.

  • jackknife_percent – Percentage of data to withhold from training as a method of resampling (default: 0).

  • n_epochs – Number of epochs for which the model trains (default: 100).

  • save_update_epochs – Boolean indicating whether to save embeddings at regular epoch intervals (default: True).

  • batch_size – Number of samples per batch during training (default: 16).

  • triplet_margin – The margin value for the triplet loss during training (default: 0.2).

  • dropout – The dropout rate between layers in the neural network (default: 0.5).

  • l2_norm – If true, L2 normalize coembeddings
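To illustrate what withholding data via jackknife_percent could look like, here is a minimal sketch. It assumes the value is a fraction between 0 and 1 and that sampling is uniform over names; both are assumptions, and jackknife_split is a hypothetical helper, not a function of this package.

```python
import random


def jackknife_split(names, jackknife_percent=0.0, seed=0):
    """Split names into (training, withheld) lists.

    Assumes jackknife_percent is a fraction in [0, 1].
    """
    rng = random.Random(seed)
    n_withheld = int(len(names) * jackknife_percent)
    # Sample without replacement the names to leave out of training.
    withheld = set(rng.sample(list(names), n_withheld))
    training = [n for n in names if n not in withheld]
    return training, sorted(withheld)
```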

class cellmaps_coembedding.runner.CellmapsCoEmbedder(outdir=None, inputdirs=None, embedding_generator=None, name=None, organization_name=None, project_name=None, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, skip_logging=True, input_data_dict=None, provenance=None)[source]

Bases: object

Class to run algorithm

Constructor

Parameters:
  • outdir (str) – Directory to write the results of this tool

  • inputdirs – Directories where the embeddings to be coembedded are located (outputs of cellmaps_image_embedding and cellmaps_ppi_embedding)

  • embedding_generator

  • skip_logging (bool) – If True, skip logging; if None or False, do not skip logging

  • name (str)

  • organization_name (str)

  • project_name (str)

  • input_data_dict (dict)

generate_readme()[source]

get_coembedding_file()[source]

Gets the coembedding file

run()[source]

Runs CM4AI Generate COEMBEDDINGS

Returns:

class cellmaps_coembedding.runner.EmbeddingGenerator(dimensions=128, ppi_embeddingdir=None, image_embeddingdir=None, embeddings=None, embedding_names=None)[source]

Bases: object

Base class for implementations that generate network embeddings

Constructor

DROPOUT = 0.5
JACKKNIFE_PERCENT = 0.0
LATENT_DIMENSIONS = 128
N_EPOCHS = 100

get_dimensions()[source]

Gets number of dimensions this embedding will generate

Returns:

number of dimensions aka vector length

Return type:

int

get_embedding_inputdirs()[source]

Determines the input directories for embeddings by extracting the directory path from each embedding file path. If the path is already a directory, it’s returned as is.

Returns:

A list of directory paths for each embedding, derived from the embedding file paths.

Return type:

list
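The directory-resolution rule described for get_embedding_inputdirs() can be sketched with os.path alone. The function name embedding_inputdirs is hypothetical, and the real method may differ in detail; this only mirrors the documented behaviour.

```python
import os
import tempfile


def embedding_inputdirs(paths):
    """Return one directory per path.

    A path that is already a directory is returned as is;
    otherwise the directory containing the file is returned.
    """
    dirs = []
    for path in paths:
        if os.path.isdir(path):
            dirs.append(path)
        else:
            dirs.append(os.path.dirname(path))
    return dirs


def demo():
    """Build a throwaway directory with one file and resolve both."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # The file name is made up for the example.
        emb_file = os.path.join(tmpdir, "embedding.tsv")
        open(emb_file, "w").close()
        return embedding_inputdirs([emb_file, tmpdir]) == [tmpdir, tmpdir]
```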

get_next_embedding()[source]

Generator method that yields the next embedding. Subclasses should implement it using the yield statement

Raises:

NotImplementedError: Subclasses should implement this

Returns:

Embedding

Return type:

list
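The subclassing contract above can be sketched as follows. BaseGenerator and ConstantGenerator are hypothetical stand-ins, and the row layout (a name followed by its vector values) is an assumption made for illustration, not something this page specifies.

```python
class BaseGenerator:
    """Hypothetical stand-in for a base embedding generator."""

    def get_next_embedding(self):
        raise NotImplementedError("Subclasses should implement this")


class ConstantGenerator(BaseGenerator):
    """Yields an all-zero vector for every name (illustration only)."""

    def __init__(self, names, dimensions=4):
        self._names = list(names)
        self._dimensions = dimensions

    def get_next_embedding(self):
        # Using yield makes this method a generator, so callers can
        # consume embeddings one row at a time.
        for name in self._names:
            yield [name] + [0.0] * self._dimensions
```

A caller would then iterate with: for row in generator.get_next_embedding().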

class cellmaps_coembedding.runner.FakeCoEmbeddingGenerator(dimensions=128, ppi_embeddingdir=None, image_embeddingdir=None, embeddings=None, embedding_names=None)[source]

Bases: EmbeddingGenerator

Generates a fake coembedding for intersection of embedding dirs

Constructor

Parameters:
  • dimensions

get_next_embedding()[source]

Gets next embedding

Returns:
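One way to picture a fake coembedding over the intersection of inputs is the sketch below. It assumes each input is a collection of names and that the fake vectors are random numbers; fake_coembedding is a hypothetical function, and the actual class may generate its values differently.

```python
import random


def fake_coembedding(name_sets, dimensions=128, seed=0):
    """Yield [name, v1, ..., vN] for names present in every input.

    Assumes name_sets contains at least one collection of names.
    """
    rng = random.Random(seed)
    # Keep only names shared by all inputs, in a stable order.
    shared = sorted(set.intersection(*(set(names) for names in name_sets)))
    for name in shared:
        yield [name] + [rng.random() for _ in range(dimensions)]
```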

class cellmaps_coembedding.runner.MuseCoEmbeddingGenerator(dimensions=128, k=10, triplet_margin=0.1, dropout=0.5, n_epochs=100, n_epochs_init=100, outdir=None, embeddings=None, ppi_embeddingdir=None, image_embeddingdir=None, embedding_names=None, jackknife_percent=0.0)[source]

Bases: EmbeddingGenerator

Generates co-embedding using MUSE

Parameters:
  • dimensions

  • k – number of nearest neighbors used for clustering; the resulting clusters are used for the triplet loss

  • triplet_margin – margin for triplet loss

  • dropout – dropout between neural net layers

  • n_epochs – training epochs

  • n_epochs_init – initialization training epochs

  • outdir

  • ppi_embeddingdir

  • image_embeddingdir

  • jackknife_percent – percent of data to withhold from training
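For readers unfamiliar with the triplet_margin parameter, the standard triplet margin loss is sketched below using Euclidean distance. This is the textbook formulation, not necessarily the exact loss used by MUSE or by this package; triplet_loss is a hypothetical helper.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    # The loss is zero once the negative is farther from the anchor
    # than the positive by at least the margin.
    return max(0.0, d_pos - d_neg + margin)
```

A larger margin demands a wider gap between positive and negative distances before a triplet stops contributing to the loss.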

N_EPOCHS_INIT = 100

get_next_embedding()[source]

Returns:

class cellmaps_coembedding.runner.ProteinGPSCoEmbeddingGenerator(dimensions=128, outdir=None, embeddings=None, ppi_embeddingdir=None, image_embeddingdir=None, embedding_names=None, jackknife_percent=0.0, n_epochs=100, save_update_epochs=True, batch_size=16, triplet_margin=0.2, dropout=0.5, l2_norm=False, mean_losses=False)[source]

Bases: EmbeddingGenerator

Generates co-embedding using proteingps

Initializes the ProteinGPSCoEmbeddingGenerator.

Parameters:
  • dimensions – The dimensionality of the embedding space (default: 128).

  • outdir – The output directory where embeddings should be saved.

  • embeddings – Embedding data.

  • ppi_embeddingdir – Directory containing protein-protein interaction embeddings.

  • image_embeddingdir – Directory containing image embeddings.

  • embedding_names – List of names corresponding to each type of embedding provided.

  • jackknife_percent – Percentage of data to withhold from training as a method of resampling (default: 0).

  • n_epochs – Number of epochs for which the model trains (default: 100).

  • save_update_epochs – Boolean indicating whether to save embeddings at regular epoch intervals (default: True).

  • batch_size – Number of samples per batch during training (default: 16).

  • triplet_margin – The margin value for the triplet loss during training (default: 0.2).

  • dropout – The dropout rate between layers in the neural network (default: 0.5).

  • l2_norm – If true, L2 normalize coembeddings
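The effect of the l2_norm option can be illustrated with a plain L2 normalization of a single vector. The helper l2_normalize is hypothetical, and leaving zero vectors unchanged is an assumption of this sketch; how the package itself applies the normalization is not shown here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length.

    Zero vectors are returned unchanged to avoid dividing by zero.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]
```

After normalization, vectors differ only in direction, which makes cosine similarity and dot product equivalent.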

get_next_embedding()[source]

Iteratively generates embeddings by fitting the proteingps to the current data set.

Returns:

Yields the next embedding, produced by the proteingps embedder’s fit_predict method.

cellmaps_coembedding.utils module

Module contents

Top-level package for CM4AI Generate COEMBEDDINGS.