The functional similarity among terms of an ontology is evaluated by using Semantic Similarity Measures (SSM). In computational biology, biological entities such as genes or proteins are usually annotated with terms extracted from Gene Ontology (GO) and the most common application is to find the similarity or dissimilarity among two entities through the application of SSMs to their annotations. More recently, the extensive application of SSMs yielded to the Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs where the nodes are concepts (e.g. proteins) and each edge has an associated weight that represents the semantic similarity among related pairs of nodes. Community detection algorithms that analyse SSNs, such as protein complexes prediction or motif extraction, may reveal clusters of functionally associated proteins. Because SSNs have a high number of arcs with low weight, likened to noise, the application of classical clustering algorithms on raw networks exhibits low performance. To improve the performance of such algorithms, a possible approach is to simplify the structure of SSNs through a preprocessing step able to delete arcs likened to noise. Thus we propose a novel preprocessing strategy to simplify SSNs based on an hybrid global-local thresholding approach based on spectral graph theory. As proof of concept we demonstrate that community detection algorithms applied to filtered (thresholded) networks, have better performances in terms of biological relevance of the results, with respect to the use of raw unfiltered networks.

Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique

Cannataro M;Veltri P;Guzzi P
2014-01-01

Abstract

The functional similarity among terms of an ontology is evaluated by using Semantic Similarity Measures (SSM). In computational biology, biological entities such as genes or proteins are usually annotated with terms extracted from Gene Ontology (GO) and the most common application is to find the similarity or dissimilarity among two entities through the application of SSMs to their annotations. More recently, the extensive application of SSMs yielded to the Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs where the nodes are concepts (e.g. proteins) and each edge has an associated weight that represents the semantic similarity among related pairs of nodes. Community detection algorithms that analyse SSNs, such as protein complexes prediction or motif extraction, may reveal clusters of functionally associated proteins. Because SSNs have a high number of arcs with low weight, likened to noise, the application of classical clustering algorithms on raw networks exhibits low performance. To improve the performance of such algorithms, a possible approach is to simplify the structure of SSNs through a preprocessing step able to delete arcs likened to noise. Thus we propose a novel preprocessing strategy to simplify SSNs based on an hybrid global-local thresholding approach based on spectral graph theory. As proof of concept we demonstrate that community detection algorithms applied to filtered (thresholded) networks, have better performances in terms of biological relevance of the results, with respect to the use of raw unfiltered networks.
2014
978-3-319-08406-0
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/362505
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 3
social impact