April 11, 2017

BRANE Clust: cluster-assisted gene regulatory network inference refinement

The joined GRN inference and clustering algorithm BRANE Clust is just been published in BRANE Clust: cluster-assisted gene regulatory network inference refinement in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017.

Alternative versions are available as a preprint, on biorxiv, with a page and software and in HAL. Another brick in the BRANE series wall, a series of bioinformatics tools based on graphs and optimization, dedicated to -omics gene expression data for Gene Regulatory Network (GRN) inference.

While traditional Next-generation sequencing (NGS) pipelines often combine motley assumptions (correlation, normalization, clustering, inference), this work is an first step toward gracefully combining network inference and clustering. 

BRANE Clust works as a post-processing tool upon classical network thresholding refinement. From a complete weighted network (obtained from any network inference method) BRANE Clust favors edges both having higher weights (as in standard thresholding) and linking nodes belonging to a same cluster. It  relies on an optimization procedure. It  computes an optimal gene clustering (random walker algorithm) and an optimal edge selection jointly. The introduction of a clustering step in the edge selection process improves gene regulatory network inference. This is demonstrated on both synthetic (five networks of  DREAM4 and network 1 of DREAM5) and real (network 3 of DREAM5) data. These conclusions are drawn after comparing classical thresholding on CLR and GENIE3 networks to our proposed post-processing. Significant improvements in terms of Area Under Precision-Recall curve are obtained. The  predictive power on real data yields promising results: predicted links specific to BRANE Clust reveal plausible biological interpretation. GRN approaches that produce a complete weighted network to prune could benefit from BRANE Clust post-processing.

Escherichia coli network built using BRANE Clust on GENIE3 weights and containing 236 edges. Large dark gray nodes refers to transcription factors (TFs). Inferred edges also reported in the ground truth are colored in black while predictive edges are light gray. Dashed edges correspond to a link inferred by both BRANE Clust and GENIE3 while solid links refer to edges specifically inferred by BRANE Clust.
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells.
Building accurately the related graphs remains challenging due to the large number of possible solutions from available data. Nonetheless, enforcing a priori on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference thanks to cluster information. It works as a post-processing tool for inference methods (i.e. CLR, GENIE3). In BRANE Clust, the clustering is based on the inversion of a system of linear equations involving a graph-Laplacian matrix promoting a modular structure. Our approach is validated on DREAM4 and DREAM5 datasets with objective measures, showing significant comparative improvements. We provide additional insights on the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network evaluated using the STRING database. The comparative pertinence of clustering is discussed computationally (SIMoNe, WGCNA, X-means) and biologically (RegulonDB). BRANE Clust software is available at: