DETAILS ABOUT THE RESEARCH
The gene@home project is an implementation of the PC-IM algorithm, whose purpose is to expand Gene Regulatory Networks (GRNs). Each network is a graph that specifies the causal relationships within a set of genes, and helps in studying gene expression: the process through which DNA is transcribed into RNA and the RNA is translated into proteins.
Expanding a GRN means finding new genes related to the existing ones. This allows a deeper understanding of the phenomenon, so that its behavior can be forecast and, where appropriate, manipulated.
PC-IM tests the genes of the plant Arabidopsis thaliana, considered a model organism in biology. It receives as input a local GRN, called a Local Gene Network (LGN), a list of candidate genes for the expansion, and some gene expression data. During its execution it tries to establish whether causal relationships exist between those genes and the LGN, and it returns the new GRN as output.
The work of the algorithm can be divided into five steps:
1. Blocks creation
The candidate genes for the expansion are randomly partitioned into non-overlapping blocks. The reason is that the algorithm is more efficient when it works with fewer than 1000 variables, so the work is done on smaller networks.
Every block is merged with the input LGN, so that causal relationships between the two can be inferred, and the operation is repeated i times (where i is the number of iterations of the algorithm).
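The block creation step can be sketched as follows. This is only an illustration under stated assumptions: the function name make_blocks, the block_size parameter and the fixed seed are hypothetical and are not taken from the gene@home code.

```python
import random

def make_blocks(candidates, lgn_genes, block_size, iterations, seed=0):
    """For each iteration, return a list of blocks (LGN genes + one tile of candidates).

    Hypothetical helper: block_size is the total number of variables per block.
    """
    rng = random.Random(seed)
    tile_size = block_size - len(lgn_genes)
    if tile_size <= 0:
        raise ValueError("block_size must exceed the number of LGN genes")
    runs = []
    for _ in range(iterations):
        shuffled = list(candidates)
        rng.shuffle(shuffled)  # a new random partition at every iteration
        # Consecutive slices of the shuffled list cannot overlap.
        tiles = [shuffled[k:k + tile_size]
                 for k in range(0, len(shuffled), tile_size)]
        # Every tile is merged with the same input LGN.
        runs.append([list(lgn_genes) + tile for tile in tiles])
    return runs

candidates = [f"g{n}" for n in range(10)]
lgn = ["A", "B"]
runs = make_blocks(candidates, lgn, block_size=6, iterations=3)
```

With 10 candidates and room for 4 candidates per block, each of the 3 iterations yields 3 blocks, and every candidate appears in exactly one block per iteration.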
2. PC application
The PC algorithm runs on each block, using the gene expression data.
In particular, the PC (Peter-Clark) algorithm is an improvement on the SGS algorithm, a general procedure for discovering causal relationships, and finds the conditional dependencies of a graph. Starting from a complete undirected graph, it progressively deletes the edges for which, given the input data, it can deduce a conditional independence relation. Afterwards, it tries to orient the remaining edges by looking at the relationships that nodes of the graph have in common and by applying a set of rules.
The result is a network of genes and relationships, from which PC-IM extracts the sub-networks containing both old and new genes.
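The edge-deletion phase described above can be illustrated with a deliberately simplified sketch. It assumes continuous data, uses a Fisher z-test on (partial) correlations as the independence test, conditions on at most one variable, and skips edge orientation entirely. All names (correlation, independent, pc_skeleton) and the alpha threshold are illustrative assumptions, not the project's implementation.

```python
import math
import random
from itertools import combinations

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def independent(data, x, y, cond, alpha):
    """Fisher z-test: True if x and y look independent given cond (0 or 1 variable)."""
    r = correlation(data[x], data[y])
    if cond:
        z = cond[0]
        rxz = correlation(data[x], data[z])
        ryz = correlation(data[y], data[z])
        denom = math.sqrt(max((1 - rxz ** 2) * (1 - ryz ** 2), 1e-12))
        r = (r - rxz * ryz) / denom  # first-order partial correlation
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    n = len(data[x])
    stat = abs(math.atanh(r)) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha

def pc_skeleton(data, alpha=0.01):
    """Start from a complete undirected graph and delete edges judged independent."""
    genes = sorted(data)
    adjacent = {g: set(genes) - {g} for g in genes}
    for order in (0, 1):  # size of the conditioning set (simplification: at most 1)
        for x, y in combinations(genes, 2):
            if y not in adjacent[x]:
                continue
            neighbours = (adjacent[x] | adjacent[y]) - {x, y}
            for cond in combinations(sorted(neighbours), order):
                if independent(data, x, y, cond, alpha):
                    adjacent[x].discard(y)
                    adjacent[y].discard(x)
                    break
    return {frozenset((x, y)) for x in genes for y in adjacent[x]}

# Toy chain X -> Y -> Z: X and Z are correlated, but not once Y is known.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [a + rng.gauss(0, 1) for a in x]
z = [b + rng.gauss(0, 1) for b in y]
skeleton = pc_skeleton({"X": x, "Y": y, "Z": z})
```

On data of this kind the sketch is expected to keep the X-Y and Y-Z edges and, in most runs, to delete the X-Z edge once Y is used as the conditioning variable. The full PC algorithm also tests larger conditioning sets and then orients the surviving edges.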
3. Frequencies computation
The sub-networks from the previous step are used to build a single expansion list of genes, and the appearance frequency of each gene is computed.
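A minimal sketch of the frequency computation follows. It assumes that each iteration yields a list of undirected edges, and that the frequency of a gene is the fraction of iterations in which it is linked to at least one LGN gene. Both the data layout and this definition are assumptions made for illustration.

```python
from collections import Counter

def appearance_frequencies(subnetworks_per_iteration, lgn_genes):
    """Fraction of iterations in which each new gene is linked to the LGN."""
    iterations = len(subnetworks_per_iteration)
    counts = Counter()
    for edges in subnetworks_per_iteration:
        found = set()  # count each gene at most once per iteration
        for a, b in edges:
            if a in lgn_genes and b not in lgn_genes:
                found.add(b)
            elif b in lgn_genes and a not in lgn_genes:
                found.add(a)
        counts.update(found)
    return {gene: counts[gene] / iterations for gene in counts}

lgn_genes = {"A", "B"}
subnetworks = [
    [("A", "g1"), ("g2", "B")],
    [("A", "g1"), ("A", "B")],
    [("g1", "B"), ("g3", "g4")],  # g3-g4 does not touch the LGN and is ignored
    [("A", "B")],
]
frequencies = appearance_frequencies(subnetworks, lgn_genes)
# frequencies == {"g1": 0.75, "g2": 0.25}
```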
4. Internal performance assessment
PC-IM evaluates its own performance and establishes, using the relationships within the LGN, the frequency needed to obtain the best expansions. The possible false positives and false negatives are assessed through three evaluation measures: the Positive Predictive Value, the Sensitivity and the False Positive Rate. The Precision-Recall and Receiver Operating Characteristic curves are built, and the step returns the frequency closest to the ideal values.
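The three measures and the choice of a cut-off can be sketched as below. The sketch assumes that a set of genes with known status (truly related or not) is available, and it picks the cut-off whose point on the Precision-Recall plane lies closest to the ideal corner (PPV = 1, Sensitivity = 1). That selection rule, like the names rates and best_cutoff, is an illustrative assumption and not necessarily the criterion used by PC-IM.

```python
import math

def rates(frequencies, truth, cutoff):
    """PPV, Sensitivity and FPR when genes with frequency >= cutoff are accepted."""
    tp = sum(1 for g, f in frequencies.items() if f >= cutoff and truth[g])
    fp = sum(1 for g, f in frequencies.items() if f >= cutoff and not truth[g])
    fn = sum(1 for g, f in frequencies.items() if f < cutoff and truth[g])
    tn = sum(1 for g, f in frequencies.items() if f < cutoff and not truth[g])
    ppv = tp / (tp + fp) if tp + fp else 1.0          # TP / (TP + FP)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # TP / (TP + FN)
    fpr = fp / (fp + tn) if fp + tn else 0.0          # FP / (FP + TN)
    return ppv, sensitivity, fpr

def best_cutoff(frequencies, truth):
    """Cut-off whose (PPV, Sensitivity) point is closest to the ideal (1, 1)."""
    def distance(cutoff):
        ppv, sensitivity, _ = rates(frequencies, truth, cutoff)
        return math.hypot(1 - ppv, 1 - sensitivity)
    return min(sorted(set(frequencies.values())), key=distance)

known_freq = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.3, "e": 0.1}
known_truth = {"a": True, "b": True, "c": False, "d": True, "e": False}
cutoff = best_cutoff(known_freq, known_truth)
# cutoff == 0.3, where PPV = 0.75, Sensitivity = 1.0 and FPR = 0.5
```

Sweeping the cut-off over all observed frequencies and recording the (Sensitivity, PPV) and (FPR, Sensitivity) pairs gives the points of the Precision-Recall and ROC curves respectively.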
5. Cut-off frequency application
Based on the computed frequencies, the algorithm decides which genes in the expansion list are actually related to the input LGN and can therefore be returned as the final output.
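Applying the cut-off then amounts to a simple filter. The sketch below assumes a dictionary of gene frequencies and a cut-off value chosen in the previous step. The name apply_cutoff and the ordering of the output are hypothetical.

```python
def apply_cutoff(frequencies, cutoff):
    """Keep genes whose frequency reaches the cut-off, most frequent first."""
    kept = [(gene, f) for gene, f in frequencies.items() if f >= cutoff]
    # Sort by decreasing frequency, then by gene name for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))

expansion = apply_cutoff({"g1": 0.75, "g2": 0.25, "g3": 0.5}, cutoff=0.5)
# expansion == [("g1", 0.75), ("g3", 0.5)]
```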
PC-IM is still under development, but preliminary results have shown robustness and good performance in GRN expansion.