The goal of this tutorial is to describe the workflow to align networks extracted from the DIP database, annotate them with GO data, and perform some simple GO term analysis on the aligned network.
In the end, the tutorial will allow conclusions about how well different scoring schemes relate to biologically similar alignments. Some of the results obtained with this workflow are part of the supplementary material of the paper.
Some steps are somewhat advanced, and require additional programming / external tools, but the extracted data and scripts are provided for convenience.
First, load the sapiens.elu and cerevisiae.elu networks.
Common settings for all 3:
These settings slightly relax the extent to which nodes are paired and make the algorithm try harder before assuming convergence. To focus on the simplest scoring model possible, graphlets are excluded: for this particular setting, our tests have shown that graphlets do not improve the overall alignment, so they are not needed here.
On the basic panel, choose a file to write the alignment to. For this tutorial, all 3 alignments are needed, so make sure to choose a distinct file name for each alignment.
Same as #2, except for these changes to the data file:
For an explanation of these parameters, see here.
If you are low on memory or your machine cannot handle 3 alignments at once in Cytoscape, you may want to use the command-line version of GEDEVO instead of CytoGEDEVO:
The required parameters for the alignments above are as follows:
./gedevo --edgelist sapiens.elu --edgelist cerevisiae.elu -u --density 0.7 --maxiter 100 -c pairWeightGraphlets=0 --save 1.txt
./gedevo --edgelist sapiens.elu --edgelist cerevisiae.elu -u --density 0.7 --maxiter 100 -c pairWeightGraphlets=0 --save 2.txt --custom cerevisiae_sapiens.blastlistu auto distance clamp 1 0
./gedevo --edgelist sapiens.elu --edgelist cerevisiae.elu -u --density 0.7 --maxiter 100 -c pairWeightGraphlets=0 --save 3.txt --custom cerevisiae_sapiens.blastlistu auto distance override 0 0.0001
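The three command lines above differ only in the output file and the BLAST-related options, so they can be generated from one shared option string. The following is a minimal sketch, not part of the supplied scripts: the names COMMON, BLAST, gedevo_cmd and RUN are made up for illustration, and all GEDEVO options are copied verbatim from the commands above. By default it only prints the commands; it assumes ./gedevo and the input files are in the current directory when RUN=1 is set.

```shell
#!/bin/sh
# Options shared by all three alignments (copied from the commands above).
COMMON="--edgelist sapiens.elu --edgelist cerevisiae.elu -u --density 0.7 --maxiter 100 -c pairWeightGraphlets=0"
# BLAST-related prefix shared by alignments 2 and 3 (also copied verbatim).
BLAST="--custom cerevisiae_sapiens.blastlistu auto distance"

# Print the full command line for alignment 1, 2 or 3.
gedevo_cmd() {
    case "$1" in
        1) echo "./gedevo $COMMON --save 1.txt" ;;
        2) echo "./gedevo $COMMON --save 2.txt $BLAST clamp 1 0" ;;
        3) echo "./gedevo $COMMON --save 3.txt $BLAST override 0 0.0001" ;;
        *) echo "unknown alignment: $1" >&2; return 1 ;;
    esac
}

# Dry run by default; set RUN=1 to actually start the alignments one by one.
for i in 1 2 3; do
    cmd=$(gedevo_cmd "$i")
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$cmd"
    else
        echo "$cmd"
    fi
done
```

Running the alignments sequentially like this keeps only one alignment in memory at a time, which is the point of using the command-line version on a constrained machine.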
As the next step, add additional Gene Ontology (GO)-derived score columns to the alignment result files.
The associated GO terms are provided as a pre-made file (uniprot_sprot.goshrink), but you can also generate this file yourself with the provided scripts.
Run the following command from the supplied scripts directory for each of the generated alignment result files:
This creates a <file>.txt.withgo file for each input file.
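Before importing, it can help to confirm that every alignment result actually has its .withgo companion. The sketch below assumes the result files are named 1.txt, 2.txt and 3.txt as in the command-line variant of this tutorial; the function name missing_withgo is made up for illustration and is not part of the supplied scripts.

```shell
#!/bin/sh
# Print every given alignment file whose "<file>.withgo" companion
# is missing or empty. Prints nothing if all companions exist.
missing_withgo() {
    for f in "$@"; do
        [ -s "$f.withgo" ] || echo "$f"
    done
}

# Check the three alignment results from this tutorial.
missing_withgo 1.txt 2.txt 3.txt
```

Any file name printed here still needs to be processed before it can be imported.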
You can repeat this for alignment results generated by other aligners.
Prepared + GO-expanded alignment result files for download: tut2-alignments.zip
Import each of the expanded alignment files into its corresponding network in CytoGEDEVO:
Alternatively, if you performed the alignments on the command line:
.withgo file prepared above
Repeat for the other two
Two additional data table columns are imported from the .withgo file.
Since the alignments obtained are far too cluttered and not suitable for inspection, clean them up a little.
For each aligned network pair:
Select Nodes & edges that have CCS_size in [0, 2]
Invert the selection so that only nodes & edges with CCS_size > 2 are selected
Create a new network from selected nodes & edges:
This will extract all CCSs. Edges that previously connected individual CCSs will be excluded, thus leaving them nicely separated.
Apply Layout -> GEDEVO pair layouts -> Prefuse force directed layout -> (none):
Your image may be different, depending on which network you choose and the aligner you used. For example, MI-GRAAL and NETAL have a tendency to produce one huge CCS, while with GEDEVO, CCS sizes vary depending on parameters. More about this further down.
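The CCS_size filter from the clean-up steps above can also be reproduced outside Cytoscape, for example to count how many nodes survive it. The sketch below makes one loud assumption: that the node table has been exported as a tab-separated file whose header row contains a column named CCS_size. The function name filter_ccs and the file names in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Keep the header row plus all rows whose CCS_size value is greater than 2.
# Assumes a tab-separated table with a header column named "CCS_size".
filter_ccs() {
    awk -F '\t' '
        NR == 1 {
            # Locate the CCS_size column by name instead of by position.
            for (i = 1; i <= NF; i++) if ($i == "CCS_size") col = i
            if (!col) { print "no CCS_size column" > "/dev/stderr"; exit 1 }
            print; next
        }
        $col + 0 > 2
    ' "$1"
}

# Usage (file names are placeholders):
#   filter_ccs nodes.tsv > nodes_ccs.tsv
```

Looking the column up by its header name keeps the sketch independent of the column order in the exported table.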
Create a new style to highlight gedevoGOSumScore -- choose the coloring scheme that looks best for you.
A good configuration for this data set is:
It looks like this:
After applying this coloring scheme to the cleaned up networks, the result should look similar to this:
Using only GED for alignment shows a quite low GO overlap:
(The more blue, the better)
Using 50% GED and 50% BLAST scores, GO overlap is higher, but the biological similarity of most nodes is still not very good:
Using override scoring (BLAST first and GED as fallback), GO overlap is quite high, but the CCSs are very small (which makes sense since topology is secondary here):
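The visual comparison of the three scoring schemes can be backed by a single number per alignment, such as the mean gedevoGOSumScore. The sketch below again assumes that each node table has been exported as a tab-separated file with a header row; the function name column_mean and the file names in the usage comment are made up for illustration, and whether a higher or lower mean is better depends on how the score is defined.

```shell
#!/bin/sh
# Print the mean of a named numeric column in a tab-separated table.
# Usage: column_mean FILE COLUMN_NAME
column_mean() {
    awk -F '\t' -v name="$2" '
        # Header row: find the requested column by name.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Data rows: accumulate non-empty values of that column.
        col && $col != "" { sum += $col; n++ }
        END { if (n) printf "%.4f\n", sum / n; else exit 1 }
    ' "$1"
}

# Usage (file names are placeholders):
#   for f in 1.tsv 2.tsv 3.tsv; do
#       printf '%s\t' "$f"; column_mean "$f" gedevoGOSumScore
#   done
```

The function exits with a non-zero status if the column is missing or holds no values, so a typo in the column name does not silently produce a mean of zero.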
Note that the dataset used to assign GO terms is the manually curated UniprotKB database, which should exclude bias that would otherwise be introduced by automatically assigned GO terms based on sequence similarity.
This workflow is the same for other aligners, as long as their output file formats are compatible (or properly converted so that the addgo script and the CytoGEDEVO importer can handle them).
The final Cytoscape session file from this tutorial can be downloaded here:
The original session file used in supplement 2 also has alignments performed with NETAL and MI-GRAAL:
Given these results, it looks like topological similarity does not necessarily imply biological similarity, and the size or distribution of CCSs does not seem to allow deriving a biologically meaningful conclusion.