Usage

The cellmaps_coembedding tool takes image and Protein-Protein Interaction (PPI) embeddings and generates a co-embedding. The input embeddings can be generated by the cellmaps_image_embedding and cellmaps_ppi_embedding packages.

In a project

To use cellmaps_coembedding in a project:

import cellmaps_coembedding

Needed files

The output directories for the image embeddings (see Cell Maps Image Embedding) and protein-protein interaction network embeddings (see Cell Maps PPI Embedding) are required.

On the command line

For information, invoke cellmaps_coembeddingcmd.py -h

Usage

cellmaps_coembeddingcmd.py [outdir] [--embeddings EMBEDDING_DIR [EMBEDDING_DIR2 ...]] [OPTIONS]

Arguments

  • outdir

    The directory where the output will be written to.

Required (choose one)

  • --embeddings EMBEDDING_DIR [EMBEDDING_DIR2 ...]

    Paths to directories containing image and/or PPI embeddings. Each directory should contain a TSV file named image_emd.tsv or ppi_emd.tsv. Alternatively, provide paths to specific TSV files.

    Deprecated Flags (still functional but no longer required):

    • --ppi_embeddingdir

      The directory path created by cellmaps_ppi_embedding which has a TSV file containing the embeddings of the PPI network. For each row, the first value is assumed to be the gene symbol followed by the embeddings.

    • --image_embeddingdir

      The directory path created by cellmaps_image_embedding which has a TSV file containing the embeddings of the IF images. For each row, the first value is assumed to be the sample ID followed by the embeddings.
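
To make the row layout described above concrete, here is a minimal sketch of reading such a TSV file with the Python standard library. The function name load_embedding_tsv and the has_header switch are hypothetical and not part of cellmaps_coembedding. Whether the files carry a header row is an assumption, so the sketch lets the caller decide.

```python
import csv


def load_embedding_tsv(path, has_header=True):
    """Read an embedding TSV into {identifier: [float, ...]}.

    Assumption: the first column is the identifier (gene symbol or
    sample ID) and every remaining column is a numeric embedding value.
    Whether a header row exists is also an assumption, hence has_header.
    """
    embeddings = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header row, if any
        for row in reader:
            if not row:
                continue  # ignore blank lines
            embeddings[row[0]] = [float(value) for value in row[1:]]
    return embeddings
```

Calling load_embedding_tsv on a ppi_emd.tsv or image_emd.tsv file would return a dictionary keyed by the first column, which can be handy for checking dimensions before running the tool.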

Optional

  • --embedding_names

    Names corresponding to each filepath input in --embeddings.

  • --algorithm

    Algorithm to use for coembedding. Choices: ‘auto’, ‘muse’, ‘proteingps’. Defaults to ‘muse’. ‘auto’ is deprecated, and ‘proteingps’ should be used instead.

  • --latent_dimension

    Output dimension of the embedding. Default is 128.

  • --n_epochs_init

    Number of initial training epochs. Default is 200.

  • --n_epochs

    Number of training epochs. Default is 500.

  • --jackknife_percent

    Fraction of data to withhold from training. For example, a value of 0.1 withholds 10 percent of the data.

  • --mean_losses

    If set, use the mean of losses; otherwise, sum the losses.

  • --dropout

    Percentage to use for dropout layers in the neural network.

  • --l2_norm

    If set, L2 normalize coembeddings.

  • --fake_embedding

    If set, generates fake co-embeddings.

  • --logconf

    Path to the Python logging configuration file in the specified format.

  • --verbose or -v

    Increases verbosity of the logger to standard error for log messages in this module. Logging levels: -v = ERROR, -vv = WARNING, -vvv = INFO, -vvvv = DEBUG, -vvvvv = NOTSET.

  • --version

    Shows the version of the program.
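
The mapping between repeated -v flags and logging levels listed under --verbose can be sketched as follows. This is only an illustration of the documented mapping, not the tool's actual implementation. The function name verbosity_to_level is hypothetical, and the level used when no -v flag is given (CRITICAL here) is an assumption.

```python
import logging


def verbosity_to_level(count):
    """Map the number of -v flags to a Python logging level.

    Documented mapping: -v = ERROR, -vv = WARNING, -vvv = INFO,
    -vvvv = DEBUG, -vvvvv = NOTSET.
    Assumption: with no -v flag at all, the level is CRITICAL.
    """
    # Each additional -v lowers the threshold by one standard step (10),
    # never going below NOTSET (0).
    return max(logging.CRITICAL - 10 * count, logging.NOTSET)
```

For instance, verbosity_to_level(3) evaluates to logging.INFO, matching the -vvv entry above.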

Example usage

cellmaps_coembeddingcmd.py ./cellmaps_coembedding_outdir --embeddings ./cellmaps_image_embedding_outdir ./cellmaps_ppi_embedding_outdir
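
When the command needs to be launched from a Python script, the argument list can be assembled programmatically. The sketch below uses only flags documented above. The helper name build_coembedding_command is hypothetical, and the defaults simply mirror the documented ones (muse, 128).

```python
import shlex


def build_coembedding_command(outdir, embedding_dirs, algorithm="muse",
                              latent_dimension=128):
    """Assemble the argument list for cellmaps_coembeddingcmd.py.

    Hypothetical helper: it only builds the list and does not run it.
    """
    if not embedding_dirs:
        raise ValueError("at least one embedding directory is required")
    command = ["cellmaps_coembeddingcmd.py", outdir, "--embeddings"]
    command.extend(embedding_dirs)
    command.extend(["--algorithm", algorithm,
                    "--latent_dimension", str(latent_dimension)])
    return command


cmd = build_coembedding_command(
    "./cellmaps_coembedding_outdir",
    ["./cellmaps_image_embedding_outdir", "./cellmaps_ppi_embedding_outdir"],
)
# Show the shell-quoted command line that would be executed.
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute it, provided cellmaps_coembedding is installed and the script is on the PATH.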

Via Docker

Example usage

Coming soon...

Embedding Evaluation (additional functionality)

The cellmaps_coembedding.utils module provides functions for evaluating embeddings. It is not part of the standard workflow but is provided as additional functionality. It includes statistical analysis of similarity scores and visualization of embedding performance using enrichment tests.

The get_embedding_eval_data function computes enrichment effect sizes for various embeddings using a reference adjacency matrix (CORUM). It also saves KDE data for the MUSE embedding. The generate_embedding_evaluation_figures function automates the evaluation process by loading embeddings, computing effect sizes, and generating figures.

Returns:

  • sim_muse_data.csv: MUSE similarity scores.

  • embedding_eval.csv: Enrichment effect sizes for each embedding.

  • sim_muse.png: KDE plot for similarity scores.

  • embedding_eval.png: Enrichment comparison plot.

Usage Example

from cellmaps_coembedding.utils import generate_embedding_evaluation_figures

generate_embedding_evaluation_figures(
    coembedding='/path/to/coembedding',
    ppi='/path/to/ppiembedding',
    image='/path/to/imageembedding',
    outdir='/output/directory',
    num_samplings=1000,
    num_edges=1000
)
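
After the evaluation finishes, a quick check that the four files listed above were produced can be useful. The following is a minimal sketch. The helper name missing_evaluation_outputs is hypothetical, and it assumes the files are written directly into outdir rather than into a subdirectory.

```python
import os

# File names taken from the list of outputs above.
EXPECTED_OUTPUTS = (
    "sim_muse_data.csv",
    "embedding_eval.csv",
    "sim_muse.png",
    "embedding_eval.png",
)


def missing_evaluation_outputs(outdir):
    """Return the expected output files that are not present in outdir.

    Assumption: outputs are written to the top level of outdir.
    """
    return [name for name in EXPECTED_OUTPUTS
            if not os.path.isfile(os.path.join(outdir, name))]
```

An empty list from missing_evaluation_outputs('/output/directory') would indicate that all four files exist.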

UMAP Generation

Optionally, you can create UMAP visualizations of the generated embeddings by using the cellmaps_coembedding.utils helpers. These plots allow you to see how samples cluster in a 2D projection based on their embedding similarity.

Note

To generate UMAP plots, the umap-learn and seaborn Python packages must be installed (umap-learn is the package name on PyPI; it is imported as umap). For example, you can install them via:

pip install umap-learn seaborn

from cellmaps_coembedding.utils import generate_umap_of_embedding

generate_umap_of_embedding(emb_file='/path/to/embedding', outdir='/output/directory')

If you want to color the UMAP by label (for example, the localization of each protein in the cell), you can pass a dictionary containing the label-to-protein mapping via the label_map argument.

from cellmaps_coembedding.utils import generate_umap_of_embedding

generate_umap_of_embedding(emb_file='/path/to/embedding', outdir='/output/directory', label_map=location_dict)
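
The example above uses location_dict without defining it. One way to build such a mapping is sketched below, assuming label_map expects a dictionary from each label to a list of protein names; that structure is inferred from the description and should be checked against the function's documentation. The helper build_label_map and the protein and label names are hypothetical placeholders.

```python
from collections import defaultdict


def build_label_map(protein_labels):
    """Group (protein, label) pairs into {label: [protein, ...]}.

    Assumption: this label -> list-of-proteins layout is what the
    label_map argument expects.
    """
    label_map = defaultdict(list)
    for protein, label in protein_labels:
        label_map[label].append(protein)
    return dict(label_map)


# Placeholder annotations for illustration only, not real data.
location_dict = build_label_map([
    ("PROTEIN_A", "nucleus"),
    ("PROTEIN_B", "cytosol"),
    ("PROTEIN_C", "nucleus"),
])
```

The resulting location_dict can then be passed as label_map in the call shown above.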