New experiment: gene network expansion of human pathologically relevant genes


Message boards : Science : New experiment: gene network expansion of human pathologically relevant genes

toma
Project scientist
Joined: 6 Jun 18
Posts: 3
Credit: 0
RAC: 0
Italy
Message 1329 - Posted: 6 Jun 2018, 12:04:16 UTC

The goal of this experiment is the identification of gene networks involving human genes of medical relevance for two broad families of human pathologies: motor neuron diseases (https://en.wikipedia.org/wiki/Motor_neuron_disease) and hematopoietic tumors (https://en.wikipedia.org/wiki/Tumors_of_the_hematopoietic_and_lymphoid_tissues).
Human gene network expansion will take advantage of the comprehensive gene expression dataset provided by the FANTOM project (http://fantom.gsc.riken.jp/5/data/).

[VENETO] boboviz
Joined: 12 Dec 13
Posts: 183
Credit: 4,641,505
RAC: 0
Italy
Message 1330 - Posted: 7 Jun 2018, 6:16:55 UTC

Very interesting!!

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,676,744
RAC: 1,154
Italy
Message 1331 - Posted: 7 Jun 2018, 9:07:42 UTC - in response to Message 1330.
Last modified: 7 Jun 2018, 9:08:29 UTC

I just wanted to add some technical details: the dataset we are using right now is very big, a floating-point matrix of 87,554 x 1,829 values. Its size on disk is ~0.5 GB, and for many genes it contains several different isoforms/transcripts. Because of this 'complexity', each iteration of the algorithm takes longer to compute than in the previous experiments.
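As a rough sanity check on the quoted size, the sketch below computes the raw size of a dense 87,554 x 1,829 matrix at 4-byte (float32) and 8-byte (float64) precision. The thread does not state the element type or the on-disk format, so these figures are estimates under that assumption only.

```python
ROWS, COLS = 87_554, 1_829  # matrix dimensions quoted in the post


def dense_size_bytes(rows: int, cols: int, bytes_per_value: int) -> int:
    """Raw size of a dense matrix, ignoring headers and compression."""
    return rows * cols * bytes_per_value


for name, width in (("float32", 4), ("float64", 8)):
    size = dense_size_bytes(ROWS, COLS, width)
    print(f"{name}: {size:,} bytes = {size / 2**30:.2f} GiB")
```

At single precision this comes to about 0.6 GiB, the same order of magnitude as the ~0.5 GB reported; a dense double-precision copy would need roughly twice that.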

Each single-gene expansion is split into 294 workunits. Every workunit you receive has a name like 142041_Hs_T155402-MAP1B_wu-(1 to 294), made of a counter, Hs, an internal T code and a mnemonic for the gene/isoform, followed by the workunit number. The T code is just a shortcut for the gene coordinates: for example, T155402 stands for chr5:71403265..71403276,+.
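For volunteers who want to sort or group their tasks, here is a minimal parser for the name layout described above. It assumes the workunit number appears as a plain integer after `wu-` (e.g. `..._wu-17`) and that the gene/isoform mnemonic never contains an underscore; the post does not guarantee either. `parse_workunit_name` and `parse_coordinates` are hypothetical helpers, not part of the project's software.

```python
import re
from dataclasses import dataclass

WUS_PER_GENE = 294  # workunits per single-gene expansion, from the post

# Assumed layout: <counter>_<species>_<Tcode>-<mnemonic>_wu-<index>
WU_PATTERN = re.compile(
    r"^(?P<counter>\d+)_(?P<species>[A-Za-z]+)_(?P<tcode>T\d+)"
    r"-(?P<mnemonic>[^_]+)_wu-(?P<index>\d+)$"
)
# Coordinate strings such as chr5:71403265..71403276,+
COORD_PATTERN = re.compile(
    r"^(?P<chrom>chr[^:]+):(?P<start>\d+)\.\.(?P<end>\d+),(?P<strand>[+-])$"
)


@dataclass(frozen=True)
class WorkunitName:
    counter: int
    species: str
    tcode: str
    mnemonic: str
    index: int


def parse_workunit_name(name: str) -> WorkunitName:
    """Split a workunit name into its fields (hypothetical helper)."""
    m = WU_PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognised workunit name: {name!r}")
    index = int(m["index"])
    if not 1 <= index <= WUS_PER_GENE:
        raise ValueError(f"workunit index out of range: {index}")
    return WorkunitName(
        counter=int(m["counter"]),
        species=m["species"],
        tcode=m["tcode"],
        mnemonic=m["mnemonic"],
        index=index,
    )


def parse_coordinates(coord: str) -> tuple[str, int, int, str]:
    """Return (chromosome, start, end, strand) from a coordinate string."""
    m = COORD_PATTERN.match(coord)
    if m is None:
        raise ValueError(f"unrecognised coordinates: {coord!r}")
    return m["chrom"], int(m["start"]), int(m["end"]), m["strand"]


wu = parse_workunit_name("142041_Hs_T155402-MAP1B_wu-17")
print(wu)
print(parse_coordinates("chr5:71403265..71403276,+"))
```

The mapping from a T code to its coordinates is internal to the project, so the sketch only parses a coordinate string once you already have it.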

At the moment we don't know how many experiments we will run with this dataset, which is why I haven't updated the 'Science status' page with the usual counters.

Col323
Joined: 23 Nov 16
Posts: 7
Credit: 1,329,132
RAC: 0
Angola
Message 1332 - Posted: 14 Jun 2018, 19:15:09 UTC - in response to Message 1331.

Thanks for updating the Science Status page with the information you can supply. I really enjoy seeing what progress has been made and at what rate, even if we have no idea how far down the path we are. We'll just keep crunching until we get there. :-D

[VENETO] boboviz
Joined: 12 Dec 13
Posts: 183
Credit: 4,641,505
RAC: 0
Italy
Message 1348 - Posted: 3 Sep 2018, 6:34:07 UTC

This experiment started almost three months ago.
Any preliminary results? How do you test our WUs? In vitro??

luca@gene [SSC11]
Project developer
Project tester
Project scientist
Joined: 20 Nov 13
Posts: 3
Credit: 205,242
RAC: 0
Italy
Message 1368 - Posted: 21 Sep 2018, 9:42:35 UTC
Last modified: 21 Sep 2018, 9:57:20 UTC

Moving to the FANTOM project dataset has been a big challenge for us. The original input contains a slightly different type of data (RNA-seq instead of microarray contrasts) and has ten times as many variables as the biggest data matrix we had used so far.

Humans are known to have around 20,000 protein-coding genes and 5,000 non-coding ones, but the technology employed in the FANTOM project made it possible to analyze expression at a higher resolution, including data for alternative splicing. The raw dataset reports expression levels for more than 200,000 transcripts in 1,829 conditions, which would have been too big to explore, so we had to filter it.
A closely related challenge comes from the high specificity of certain transcripts, which are expressed in only a few conditions. This translates into rows of data with very few non-zero entries, which are very likely to appear correlated with one another.
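To illustrate why such sparse rows are a problem for correlation-based analysis, the toy example below (made-up numbers, not FANTOM data) computes the Pearson correlation between two rows that are zero in 18 of 20 conditions and non-zero only in the last two. Pearson correlation is used here only as a familiar example; the thread does not say which dependence measure the project's algorithm relies on.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant row")
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(vx * vy)


# Two hypothetical transcripts expressed in only 2 of 20 conditions.
row_a = [0.0] * 18 + [5.0, 7.0]
row_b = [0.0] * 18 + [3.0, 9.0]

print(f"r = {pearson(row_a, row_b):.3f}")
```

The coefficient comes out above 0.9 even though, in the two conditions where both transcripts are expressed, their levels are not even in the same proportion (5:7 versus 3:9); the 18 shared zeros do most of the work.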

We are currently testing two versions of the input dataset. The first applies a general filter that removes rows with no HGNC code (www.genenames.org). The second uses finer-grained filtering, in which we have also removed transcripts with less than 10% non-zero entries and gene isoforms that are too similar to one another.
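The two filtering strategies can be sketched roughly as follows. This is a minimal illustration, not the project's pipeline: the `Row` type, the function names and especially the similarity test are assumptions. The post does not say how isoform similarity is measured, so the sketch uses a hypothetical rule that drops an isoform when its Pearson correlation with an already-kept isoform of the same gene exceeds `max_corr`.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    transcript_id: str
    hgnc_symbol: Optional[str]  # None when the row has no HGNC code
    values: tuple[float, ...]   # expression level in each condition


def pearson(x, y) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat constant rows as uncorrelated
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(vx * vy)


def filter_v1(rows):
    """Version 1: keep only rows that have an HGNC code."""
    return [r for r in rows if r.hgnc_symbol is not None]


def nonzero_fraction(row: Row) -> float:
    return sum(v != 0 for v in row.values) / len(row.values)


def filter_v2(rows, min_nonzero=0.10, max_corr=0.95):
    """Version 2: version 1, then drop sparse rows and similar isoforms."""
    kept: list[Row] = []
    for row in filter_v1(rows):
        # Remove transcripts with less than 10% non-zero entries.
        if nonzero_fraction(row) < min_nonzero:
            continue
        # Hypothetical similarity rule: skip an isoform that is too
        # correlated with an isoform of the same gene already kept.
        if any(k.hgnc_symbol == row.hgnc_symbol
               and pearson(k.values, row.values) > max_corr
               for k in kept):
            continue
        kept.append(row)
    return kept


rows = [
    Row("t1", "MAP1B", tuple(range(1, 11))),
    Row("t2", "MAP1B", tuple(2 * v for v in range(1, 11))),  # scaled t1
    Row("t3", "MAP1B", tuple(range(10, 0, -1))),
    Row("t4", None, (1,) * 10),                # no HGNC code
    Row("t5", "SOD1", (0,) * 9 + (5,)),        # exactly 10% non-zero
    Row("t6", "SOD1", (0,) * 10),              # all zeros
]
print([r.transcript_id for r in filter_v1(rows)])
print([r.transcript_id for r in filter_v2(rows)])
```

Because the similarity check is greedy, which isoform survives depends on the order of the input rows; a real pipeline might instead prefer, for example, the most highly expressed isoform.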

Most of our work at the moment is focused on understanding which of the two datasets performs "better".

Thank you for all your support.







Copyright © 2024 CNR-TN & UniTN