New experiment: gene network expansion of human pathologically relevant genes


Author	Message
toma Project scientist Send message Joined: 6 Jun 18 Posts: 3 Credit: 0 RAC: 0	Message 1329 - Posted: 6 Jun 2018, 12:04:16 UTC
	The goal of this experiment is the identification of gene networks involving human genes of medical relevance for two broad families of human pathologies: motor neuron diseases (https://en.wikipedia.org/wiki/Motor_neuron_disease) and hematopoietic tumors (https://en.wikipedia.org/wiki/Tumors_of_the_hematopoietic_and_lymphoid_tissues). Human gene network expansion will take advantage of the comprehensive gene expression dataset provided by the FANTOM project (http://fantom.gsc.riken.jp/5/data/).

[VENETO] boboviz Send message Joined: 12 Dec 13 Posts: 184 Credit: 4,642,321 RAC: 0	Message 1330 - Posted: 7 Jun 2018, 6:16:55 UTC
	Very interesting!!

valterc Project administrator Project tester Send message Joined: 30 Oct 13 Posts: 635 Credit: 34,757,094 RAC: 3	Message 1331 - Posted: 7 Jun 2018, 9:07:42 UTC - in response to Message 1330. Last modified: 7 Jun 2018, 9:08:29 UTC
	I just wanted to add some technical details. The dataset we are using right now is very big: a floating-point matrix of 87554 x 1829 values that takes up ~0.5 GB on disk and contains, for many genes, several isoforms/transcripts. Because of its 'complexity', each iteration of the algorithm takes longer to compute than in the previous experiments. Each single gene expansion is split into 294 workunits. Every workunit you receive has a name like 142041_Hs_T155402-MAP1B_wu-(1 to 294): a counter, Hs, an internal T code, a mnemonic for the gene/isoform, and the workunit number. The T code is just a shortcut for the gene coordinates; for example, T155402 is chr5:71403265..71403276,+. At the moment we don't know how many experiments we will run with this dataset, which is why I didn't update the 'Science status' page with the usual counters.
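For anyone who wants to sort or group their workunits by gene, here is a minimal Python sketch of how the name format described above could be split into its parts. The regular expressions are inferred from the single example name and coordinate string in this post, so they are assumptions rather than the project's official format; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class WorkunitName(NamedTuple):
    counter: int   # leading counter, e.g. 142041
    species: str   # species tag, e.g. "Hs"
    t_code: str    # internal T code, e.g. "T155402"
    gene: str      # gene/isoform mnemonic, e.g. "MAP1B"
    wu_index: int  # workunit number within the expansion


# Assumed pattern, inferred from the one example name in the post.
_WU_RE = re.compile(r"^(\d+)_([A-Za-z]+)_(T\d+)-(.+)_wu-(\d+)$")
# Assumed pattern for coordinates such as "chr5:71403265..71403276,+".
_COORD_RE = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)\.\.(\d+),([+-])$")


def parse_workunit_name(name: str) -> WorkunitName:
    """Split a workunit name into counter, species, T code, gene and index."""
    m = _WU_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised workunit name: {name!r}")
    counter, species, t_code, gene, index = m.groups()
    return WorkunitName(int(counter), species, t_code, gene, int(index))


def parse_coordinates(text: str) -> Tuple[str, int, int, str]:
    """Split a coordinate string into (chromosome, start, end, strand)."""
    m = _COORD_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised coordinates: {text!r}")
    chrom, start, end, strand = m.groups()
    return chrom, int(start), int(end), strand


if __name__ == "__main__":
    print(parse_workunit_name("142041_Hs_T155402-MAP1B_wu-17"))
    print(parse_coordinates("chr5:71403265..71403276,+"))
```

The gene part is matched greedily up to the final `_wu-`, so a mnemonic that itself contains hyphens or underscores would still be captured whole, assuming the suffix always has that form.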

Col323 Send message Joined: 23 Nov 16 Posts: 7 Credit: 1,329,132 RAC: 0	Message 1332 - Posted: 14 Jun 2018, 19:15:09 UTC - in response to Message 1331.
	Thanks for updating the Science Status page with the information you can supply. I really enjoy seeing what progress has been made and at what rate, even if we have no idea how far down the path we are. We'll just keep crunching until we get there. :-D

[VENETO] boboviz Send message Joined: 12 Dec 13 Posts: 184 Credit: 4,642,321 RAC: 0	Message 1348 - Posted: 3 Sep 2018, 6:34:07 UTC
	This project started almost 3 months ago. Any preliminary results? How do you test our WUs? In vitro?

luca@gene [SSC11] Project developer Project tester Project scientist Send message Joined: 20 Nov 13 Posts: 3 Credit: 205,242 RAC: 0	Message 1368 - Posted: 21 Sep 2018, 9:42:35 UTC Last modified: 21 Sep 2018, 9:57:20 UTC
	Moving to the FANTOM project dataset has been a big challenge for us. The original input contains a slightly different type of data (RNAseq instead of micro-array contrasts) and has 10 times the variables of the biggest data matrix we have used so far. Humans are known to have around 20,000 protein-coding genes and 5,000 non-coding ones, but the technology employed in the FANTOM project allowed them to analyze expression at a higher resolution, also reporting data for alternative splicing. The raw dataset reports expression levels for more than 200,000 transcripts in 1829 conditions, which would have been too big to explore, so we had to filter it. A closely related challenge comes from the high specificity of certain transcripts, which are expressed in only a few conditions. This translates into rows of data with very few non-zero entries, which are very likely to correlate with one another. We are currently testing two versions of the input dataset. The first applies a general filter that removes the rows with no HGNC code (www.genenames.org). The second applies a more fine-grained filter, in which we have also removed transcripts with less than 10% non-zero entries and gene isoforms that are too similar to one another. Most of the work at the moment is focused on understanding which of the two datasets performs "better". Thank you for all your support.
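To make the two filtering steps a bit more concrete, here is a rough Python sketch of the idea, not the actual pipeline. It assumes each transcript is stored as an HGNC symbol (or None when there is none) plus a list of expression values; the function names, the data layout and the toy values are all hypothetical. The removal of gene isoforms that are too similar to one another is not shown, since the similarity criterion is not described here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: transcript id -> (HGNC symbol or None, expression values)
Rows = Dict[str, Tuple[Optional[str], List[float]]]


def nonzero_fraction(values: List[float]) -> float:
    """Fraction of conditions in which the transcript has non-zero expression."""
    if not values:
        return 0.0
    return sum(1 for v in values if v != 0) / len(values)


def filter_transcripts(rows: Rows, min_nonzero: float = 0.10) -> Rows:
    """Keep rows that have an HGNC code and at least `min_nonzero` non-zero entries."""
    kept: Rows = {}
    for transcript_id, (hgnc, values) in rows.items():
        if not hgnc:
            continue  # first filter: no HGNC code
        if nonzero_fraction(values) < min_nonzero:
            continue  # second filter: less than 10% non-zero entries
        kept[transcript_id] = (hgnc, values)
    return kept


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the FANTOM dataset.
    toy: Rows = {
        "T1": ("MAP1B", [0.0] * 9 + [1.5]),    # 10% non-zero
        "T2": ("MAP1B", [0.0] * 19 + [2.0]),   # 5% non-zero
        "T3": (None, [1.0] * 10),              # no HGNC code
        "T4": ("GENE_A", [1.0, 0.0, 3.0, 0.0]),
    }
    print(sorted(filter_transcripts(toy)))
```

Passing `min_nonzero=0.0` applies only the HGNC filter, which roughly corresponds to the first version of the dataset described above.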
