CytoGEDEVO tutorial #2

The goal of this tutorial is to describe the workflow for aligning networks extracted from the DIP database, annotating them with GO data, and performing some simple GO term analysis on the aligned networks.

By the end of the tutorial, you will be able to draw conclusions about how well different scoring schemes produce biologically similar alignments. Some of the results obtained with this workflow are part of the supplementary material of the paper.

Some steps are somewhat advanced and require additional programming or external tools, but the extracted data and scripts are provided for convenience.

Prerequisites:

First, load the sapiens.elu and cerevisiae.elu networks.

Alignments:

Common settings for all 3:

These settings somewhat relax the extent to which nodes are paired and make the algorithm try harder before assuming convergence. To focus on the simplest possible scoring model, graphlets are excluded. For this particular setting, our tests have shown that graphlets do not improve the overall alignment, so they are not needed here.
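GEDEVO is an evolutionary algorithm. "Trying harder before assuming convergence" generally means tolerating more rounds without improvement before stopping. The exact parameter names are not listed here, so the following is only a generic sketch of such a patience-based stopping rule, assuming lower scores are better. `run_until_converged` and its arguments are hypothetical and are not GEDEVO's own code.

```python
from typing import Callable, Tuple


def run_until_converged(
    step: Callable[[], float], max_stale_rounds: int, max_rounds: int = 10_000
) -> Tuple[float, int]:
    """Call step() repeatedly (hypothetical helper; lower score = better).

    Stops once the best score has not improved for max_stale_rounds
    consecutive rounds, or after max_rounds rounds in total.
    Returns (best score, number of rounds run).
    """
    best = float("inf")
    stale = 0
    rounds = 0
    while rounds < max_rounds and stale < max_stale_rounds:
        score = step()
        rounds += 1
        if score < best:
            best = score
            stale = 0  # improvement: reset the patience counter
        else:
            stale += 1  # no improvement this round
    return best, rounds


# A score sequence with a plateau at 4 before a later improvement to 3.
scores = iter([5, 4, 4, 4, 3, 3, 3, 3])
print(run_until_converged(lambda: next(scores), max_stale_rounds=3))  # (3, 8)
```

With the same sequence, a patience of 2 would stop on the plateau with a best score of 4. A larger patience trades longer runs for a better chance of getting past such plateaus.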

On the basic panel, choose a file to write the alignment to. All three alignments are needed for this tutorial, so make sure to choose a distinct file name for each one.

Alignment #1:

Alignment #2:

Alignment #3:

Same as #2, except for these changes to the data file:

For an explanation of these parameters, see here.

Alternatively:

If you are low on memory or your machine cannot handle three alignments at once in Cytoscape, you may want to use the command-line version of GEDEVO instead of CytoGEDEVO:

The required parameters for the alignments above are as follows:

Further processing of results

As the next step, add Gene Ontology (GO)-derived score columns to the alignment result files.

The associated GO terms are provided as a pre-made file (uniprot_sprot.goshrink), but you can also generate this file yourself with the provided scripts.

Run the following command from the supplied scripts directory for each of the generated alignment result files:

This creates a <file>.txt.withgo file for each input file.

You can repeat this for alignment results generated by other aligners.
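The exact definitions of the columns written by the addgo script are not given in this section. The sketch below shows one plausible way to score GO agreement for an aligned node pair: the number of shared GO terms and their Jaccard index. It assumes a hypothetical tab-separated mapping format ("PROTEIN<TAB>GO:... GO:..."), which is not necessarily the layout of uniprot_sprot.goshrink. The function names are also hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_go_map(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse lines of the ASSUMED form 'PROTEIN<TAB>GO:0000001 GO:0000002 ...'."""
    go: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        protein, _, terms = line.partition("\t")
        go.setdefault(protein, set()).update(terms.split())
    return go


def go_pair_scores(
    pairs: Iterable[Tuple[str, str]], go: Dict[str, Set[str]]
) -> List[Tuple[str, str, int, float]]:
    """For each aligned pair (a, b), return (a, b, shared term count, Jaccard index)."""
    rows = []
    for a, b in pairs:
        terms_a = go.get(a, set())
        terms_b = go.get(b, set())
        shared = len(terms_a & terms_b)
        union = len(terms_a | terms_b)
        jaccard = shared / union if union else 0.0  # unannotated pairs score 0
        rows.append((a, b, shared, jaccard))
    return rows


go = parse_go_map([
    "P1\tGO:0000001 GO:0000002 GO:0000003",
    "Y1\tGO:0000002 GO:0000003 GO:0000004",
])
print(go_pair_scores([("P1", "Y1")], go))  # [('P1', 'Y1', 2, 0.5)]
```

Summing the shared-term counts over all pairs gives a single figure per alignment for comparison. This sketch does not claim that gedevoGOSumScore is defined this way.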

Prepared, GO-expanded alignment result files are available for download: tut2-alignments.zip

Re-import into CytoGEDEVO

Import each of the expanded alignment files into its corresponding network in CytoGEDEVO:

Alternatively, if you performed the alignments on the command line:

Repeat for the other two .withgo files.

Two additional data table columns are imported from the .withgo files:

Cleaning up

The alignments obtained are far too cluttered to be suitable for inspection, so clean them up a little.

For each aligned network pair:

Your image may look different, depending on which network you choose and which aligner you used. For example, MI-GRAAL and NETAL tend to produce one huge CCS, while with GEDEVO, CCS sizes vary depending on the parameters. More about this further down.
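CCS sizes come up repeatedly in the rest of this tutorial. The following sketch assumes that a CCS is a connected component of the subgraph made of edges conserved by the alignment. Such an edge is an edge (u, v) of the first network whose image (f(u), f(v)) is also an edge of the second network. Under that assumption, CCS sizes can be computed with a plain graph traversal. This is an illustrative sketch with hypothetical function names, not how CytoGEDEVO computes them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def conserved_edges(
    edges1: Iterable[Edge], edges2: Iterable[Edge], mapping: Dict[str, str]
) -> List[Edge]:
    """Edges (u, v) of G1 whose images (mapping[u], mapping[v]) are edges of G2.

    Both graphs are treated as undirected.
    """
    e2 = {frozenset(e) for e in edges2}
    out = []
    for u, v in edges1:
        if u in mapping and v in mapping and frozenset((mapping[u], mapping[v])) in e2:
            out.append((u, v))
    return out


def ccs_sizes(edges: Iterable[Edge]) -> List[int]:
    """Node counts of the connected components of an edge list, largest first."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    sizes = []
    for start in list(adj):
        if start in seen:
            continue
        # Iterative depth-first search over one component.
        stack = [start]
        seen.add(start)
        count = 0
        while stack:
            node = stack.pop()
            count += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(count)
    return sorted(sizes, reverse=True)


g1 = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
g2 = [("A", "B"), ("B", "C"), ("D", "E")]
f = {"a": "A", "b": "B", "c": "C", "d": "D", "e": "E"}
print(ccs_sizes(conserved_edges(g1, g2, f)))  # [3, 2]
```

Here the edge c-d is not conserved because C-D is missing from the second network, so the alignment splits into two CCSs of sizes 3 and 2.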

Visualization & exploration

Create a new style to highlight gedevoGOSumScore, using whichever coloring scheme works best for you.

A good configuration for this data set is:

It looks like this:


After applying this coloring scheme to the cleaned up networks, the result should look similar to this:

Using only GED for alignment yields rather low GO overlap:

(The more blue, the better)


Using 50% GED and 50% BLAST scores, GO overlap is higher, but the biological similarity of most nodes is still not very good:


Using override scoring (BLAST first, GED as fallback) yields quite high GO overlap but very small CCSs, which makes sense because topology is secondary here:
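Per-pair costs can illustrate the difference between the weighted and override schemes. The sketch below rests on several assumptions:

- GED and BLAST scores have already been normalized to costs in [0, 1], with lower being better.
- A missing BLAST hit is represented as None.
- The weighted scheme treats a missing hit as the worst cost.
- "Override" means "use BLAST if present, otherwise GED".

GEDEVO's actual normalization and tie handling may differ, and the function names are hypothetical.

```python
from typing import Optional


def weighted_cost(ged: float, blast: Optional[float], w_blast: float = 0.5) -> float:
    """Weighted mix of GED and BLAST costs (hypothetical helper).

    ASSUMPTION: a missing BLAST hit counts as the worst cost, 1.0.
    """
    b = 1.0 if blast is None else blast
    return (1.0 - w_blast) * ged + w_blast * b


def override_cost(ged: float, blast: Optional[float]) -> float:
    """Override scoring as read here: BLAST cost if available, else GED cost."""
    return ged if blast is None else blast


# name: (GED cost, BLAST cost or None)
pairs = {"x": (0.2, None), "y": (0.9, 0.1), "z": (0.1, 0.6)}
print(sorted(pairs, key=lambda k: weighted_cost(*pairs[k])))  # ['z', 'y', 'x']
print(sorted(pairs, key=lambda k: override_cost(*pairs[k])))  # ['y', 'x', 'z']
```

Under the weighted mix, pair z ranks first because its good GED cost compensates for a mediocre BLAST cost. Under override scoring, BLAST alone decides whenever a hit exists, so topology only matters for pairs without one.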

Note that the GO terms are taken from the manually curated UniProtKB/Swiss-Prot database. This should avoid the bias that automatically assigned GO terms based on sequence similarity would otherwise introduce.


This workflow is the same for other aligners, as long as their output file formats are compatible (or properly converted so that the addgo script and the CytoGEDEVO importer can handle them).

In terms of cascaded scoring, …

The final Cytoscape session file from this tutorial can be downloaded here:

The original session file used in supplement 2 also has alignments performed with NETAL and MI-GRAAL:

Given these results, it looks like topological similarity does not necessarily imply biological similarity, and the size or distribution of CCSs does not seem to allow deriving a biologically meaningful conclusion.