Difference between revisions of "Genomika: Rozvojové projekty"

Latest revision as of 12:06, 26 April 2018

Information about the Genomika course

This page contains information on the subprojects for the final weeks of the semester.

MalGlo group

User trackDb, code management

Think about how to better manage changes to the browser code in future instances of the course
Explore the possibility of each user having their own trackDb
Start by reading the short info in /kentsrc/trackDb/makefile on the genomika server

# Browser supports multiple trackDb's so that individual developers
# can change things rapidly without stepping on other people's toes. 
...

Write a manual describing how to make your suggested changes, and test it

Rfam

Rfam http://rfam.xfam.org/ is a database of families of non-coding RNAs
It contains a covariance model for each family
The database can be downloaded and searched against a genome using the Infernal tool http://eddylab.org/infernal/
Do this search, then convert the output to an appropriate format and display it in the browser
Possibly use the BEDdetail format https://genome.ucsc.edu/FAQ/FAQformat.html#format1.7
After clicking on an Rfam match, the browser should display additional information about the match and a link to the Rfam database. You can achieve this with the following lines in trackDb.ra:

type bedDetail 14
url http://rfam.xfam.org/family/$$
urlLabel Rfam:

Example of BEDdetail format for an Rfam match (items should be tab-separated; the last column starts at "truncated:")

chrom chromStart chromEnd name score strand thickStart thickEnd reserved blockCount blockSizes chromStarts id description
contigA 75109 75380 Fungi_SRP-1 1002 - 75109 75109 0 1 271 0 RF01502 truncated: no, E-value: 3.5e-19
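A minimal sketch of the conversion step, in Python. It assumes the search was run with cmsearch and its --tblout option, and that each hit line has 18 whitespace-separated columns (sequence name first, then model name and accession, model and sequence coordinates, strand, truncation flag, bit score, E-value); check this against your own output before relying on it. The function name and the clamping of the bit score to the BED score range 0-1000 are illustrative choices, not part of the assignment.

```python
def tblout_to_beddetail(line):
    """Convert one hit line of (assumed) cmsearch --tblout output
    to a tab-separated bedDetail line with 14 fields."""
    f = line.split(None, 17)            # the last column is a free-text description
    seq_name, model_name, model_acc = f[0], f[2], f[3]
    seq_from, seq_to = int(f[7]), int(f[8])
    strand, trunc = f[9], f[10]
    bit_score, evalue = float(f[14]), f[15]

    # Hit coordinates are 1-based and reversed on the minus strand;
    # BED uses 0-based starts and end-exclusive ends.
    start = min(seq_from, seq_to) - 1
    end = max(seq_from, seq_to)

    # Assumption: use the rounded bit score, clamped to 0..1000.
    score = max(0, min(1000, round(bit_score)))

    description = f"truncated: {trunc}, E-value: {evalue}"
    cols = [seq_name, start, end, model_name, score, strand,
            start, start, 0, 1, end - start, 0, model_acc, description]
    return "\t".join(str(c) for c in cols)


def convert(lines):
    """Skip comment and empty lines, convert the rest."""
    return [tblout_to_beddetail(l) for l in lines
            if l.strip() and not l.startswith("#")]
```

Each hit becomes a single block spanning the whole match, with thickStart equal to thickEnd, as in the example line above.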

Further things which you might want to explore:
- Remove matches that correspond to tRNAscan-SE matches (try tool overlapSelect)
- From several overlapping matches keep only the strongest (try tool overlapSelect)
- More ambitious: Explore creating an image of each RNA structure and linking it to the info page for the match (as in the non-coding RNA track in the human genome browser - see for example http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1%3A16520585%2D16520658, display the non-coding RNA track and click on the tRNA match)
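The idea of keeping only the strongest of several overlapping matches can be sketched in plain Python. This is not what overlapSelect does internally; it is a simple greedy illustration that assumes matches are given as (chrom, start, end, name, score) tuples with BED-style coordinates, and it ignores strand. The function name is hypothetical.

```python
def keep_strongest(matches):
    """Greedy filter: go through matches from the highest score down and
    keep a match only if it overlaps no match already kept on the same
    sequence. Returns the kept matches sorted by position."""
    kept = []
    for m in sorted(matches, key=lambda m: m[4], reverse=True):
        overlaps = any(
            m[0] == k[0] and m[1] < k[2] and k[1] < m[2]  # half-open intervals
            for k in kept
        )
        if not overlaps:
            kept.append(m)
    return sorted(kept, key=lambda m: (m[0], m[1]))


if __name__ == "__main__":
    hits = [
        ("contigA", 100, 200, "a", 50),
        ("contigA", 150, 250, "b", 80),
        ("contigA", 300, 400, "c", 10),
    ]
    for hit in keep_strongest(hits):
        print(*hit, sep="\t")
```

The scan over kept matches is quadratic in the worst case, which is fine for a small genome; if matches on opposite strands should be treated as independent, the strand would have to be added to the comparison.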

Information for users

Each track should provide basic information for users in the HTML document displayed after clicking on the track name or the left bar of the browser image.
The information should summarize what is displayed, what the source of the data was, what program was used to produce the results, etc.
- keep it less technical, with a link to your GitHub wiki page for the track for potential developers replicating your work
See examples for tracks on the http://genome-euro.ucsc.edu/ browser
Also, the genome as a whole should have a description page. On the title page of http://genome-euro.ucsc.edu/ you see details of the selected assembly, e.g. for the guinea pig genome you see the text

Guinea pig Genome Browser - cavPor3 assembly
The Feb. 2008 Cavia porcellus draft assembly (Broad Institute cavPor3) was produced by the Broad Institute at MIT and Harvard.
...

You should create some explanatory text for your species and genome and make it display on the title page
- This already works for Yarrowia lipolytica on the genomika server, so you can try to find out how it was done
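One possible way to keep the track description pages consistent is to generate them from a small template. The sketch below is only an illustration: the section headings (Description, Methods, Credits) follow the layout commonly seen on UCSC track pages, while the function name, its parameters and the wording of the developer link are assumptions. Where the resulting HTML has to be stored so that the browser picks it up is not covered here.

```python
from html import escape


def track_description_html(description, methods, credits, wiki_url):
    """Build a short HTML fragment describing one track.

    All text arguments are plain text and are HTML-escaped;
    wiki_url points to the more technical notes for developers.
    """
    sections = [
        ("Description", description),
        ("Methods", methods),
        ("Credits", credits),
    ]
    parts = [f"<h2>{title}</h2>\n<p>{escape(text)}</p>"
             for title, text in sections]
    parts.append(
        f'<p>Details for developers: '
        f'<a href="{escape(wiki_url, quote=True)}">{escape(wiki_url)}</a></p>'
    )
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(track_description_html(
        "Matches of Rfam families of non-coding RNAs in the genome.",
        "The Rfam covariance models were searched against the genome with Infernal.",
        "Track prepared by the MalGlo group.",
        "https://example.org/wiki/rfam-track",  # placeholder URL
    ))
```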

MalSym group

Information about the Genomika course

Gene info pages

If you click on a gene or other displayed item in a well set-up genome browser, you get a page with more information about this item
These info pages do not work satisfactorily on our genomika browser
Look at all protein coding gene tracks in four browsers:
- sacCer3 in the original UCSC genome browser (http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=sacCer3), tracks NCBI RefSeq, SGD Genes, Ensembl Genes
- sacCer3 in our genomika genome browser (http://genomika.compbio.fmph.uniba.sk/cgi-bin/hgTracks?db=sacCer3), tracks NCBI RefSeq, SGD Genes, Ens. Genes, NCBI RefSeq (L), SGD Genes (L), Ens. Genes (L)
- yarLip1 in our genomika genome browser (http://genomika.compbio.fmph.uniba.sk/cgi-bin/hgTracks?db=yarLip1), tracks Ens. Genes (L), RefSeq Genes (L)
- malSym1 in our genomika genome browser (http://genomika.compbio.fmph.uniba.sk/cgi-bin/hgTracks?db=malSym1), track Ensemble Genes (should be renamed Genes from NCBI)
For each explored track, find out what gets displayed on the gene info page, whether there are any error messages, and whether the page contains a link to the source database (e.g. Ensembl, RefSeq, NCBI, SGD)
Explore how the differences in these info pages are encoded in the database and trackDb.ra
Suggest and implement improvements in these info pages on our browser in sacCer, yarLip, malSym and, after warning the other group, also in malGlo
The most comprehensive gene info pages use additional db tables downloaded from the UniProt database. This database is too large to be completely mirrored on our server. Can you suggest and implement a method for downloading only the parts of the database for our species and loading them into the tables? (You were downloading UniProt for one species, its "proteome", in task M; possibly it can be used here.)
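One part of the UniProt task, selecting only the entries of a single species, can be sketched as a filter over a UniProt flat text file. The sketch assumes the standard flat file layout, where each entry ends with a line starting with "//" and the organism is given on an OX line of the form "OX   NCBI_TaxID=...;". The function name is hypothetical, and loading the selected entries into the browser tables is not covered.

```python
import re


def filter_entries_by_taxon(lines, taxid):
    """Yield the lines of all flat-file entries whose OX line
    carries the given NCBI taxonomy ID.

    lines -- iterable of lines (e.g. an open file)
    taxid -- taxonomy ID as int or str
    """
    wanted = str(taxid)
    entry, keep = [], False
    for line in lines:
        entry.append(line)
        if line.startswith("OX"):
            m = re.search(r"NCBI_TaxID=(\d+)", line)
            if m and m.group(1) == wanted:
                keep = True
        elif line.startswith("//"):      # end of the current entry
            if keep:
                yield from entry
            entry, keep = [], False
```

Because the function is a generator, a large file can be processed entry by entry, for example by passing an open file and writing the result with writelines to an output file.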

Note:

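To explore how things work at UCSC, you can see the setup notes in their github (https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/doc), particularly the uniProt section and sacCer3.txt
You can also check their original trackDb.ra files (https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/trackDb/sacCer) - see also the parent directory and subdirectories
You can even explore the UCSC mysql database through their mysql server (http://genome.ucsc.edu/goldenPath/help/mysql.html)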
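As a hedged starting point for exploring the UCSC database, the following Python sketch only builds a command line for the standard mysql client; it does not run it. It assumes the public server genome-mysql.soe.ucsc.edu with user genome and no password (verify the current connection details on the help page linked above), and the example query assumes that the trackDb table has the columns tableName, type and url. The function name is hypothetical.

```python
import shlex


def ucsc_mysql_command(db, query, host="genome-mysql.soe.ucsc.edu"):
    """Return the argument list for querying a UCSC database
    with the command-line mysql client."""
    return [
        "mysql",
        "--user=genome",     # public read-only account (assumption, see lead-in)
        f"--host={host}",
        "-A",                # skip auto-rehash for faster startup
        "-D", db,
        "-e", query,
    ]


if __name__ == "__main__":
    cmd = ucsc_mysql_command(
        "sacCer3",
        "SELECT tableName, type, url FROM trackDb",
    )
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(cmd))
```

Comparing the output of such a query with the same query on the genomika server is one way to see how the gene tracks are configured differently.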

Blat and name search

Blat:

In the blue menu bar at the top of the genome browser screen find Tools->Blat. This is a fast alignment tool which finds sequences highly similar to your query.
In the genomika browser it seems to work for sacCer3 but not for the other three genomes. Make it work for all four and document your changes.

Name search:

The browser screen also contains a text input field, where you can enter particular coordinates but also other keywords, such as a gene name etc.
- Try searching for gene YDR157W in sacCer3
- Try searching for gene CAG83524 in yarLip1 - the gene is there but is not found, instead we get an error message
- Make the search work for gene identifiers in all 4 genomes (sacCer, yarLip, malGlo, malSym)
Possibly also allow searching for other entities (keywords from gene descriptions, tRNA anti-codons, domains from the Uniprot annotation track etc.)
- For example, searching for the keyword "ribosomal" in the UCSC sacCer genome browser returns a list of genes with ribosomal in their description - try: http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=sacCer3
Get rid of the misleading error message when the search is unsuccessful (see what error you get in the UCSC browser)

See the note in the previous task for information sources on how things are set up at UCSC

Information for users

Each track should provide basic information for users in the HTML document displayed after clicking on the track name or the left bar of the browser image.
The information should summarize what is displayed, what the source of the data was, what program was used to produce the results, etc.
- keep it less technical, with a link to your GitHub wiki page for the track for potential developers replicating your work
See examples for tracks on the http://genome-euro.ucsc.edu/ browser
Also, the genome as a whole should have a description page. On the title page of http://genome-euro.ucsc.edu/ you see details of the selected assembly, e.g. for the guinea pig genome you see the text

Guinea pig Genome Browser - cavPor3 assembly
The Feb. 2008 Cavia porcellus draft assembly (Broad Institute cavPor3) was produced by the Broad Institute at MIT and Harvard.
...

You should create some explanatory text for your species and genome and make it display on the title page
- This already works for Yarrowia lipolytica on the genomika server, so you can try to find out how it was done

@@ Line 1: / Line 1: @@
+Information about the [[Genomika]] course
+This page contains information on the subprojects for the final weeks of the semester.
 ==MalGlo group==
 ===User trackDb, code management===
@@ Line 30: / Line 34: @@
 * Further things which you might want to explore:
 ** Remove matches that correspond to tRNAScan-SE matches (try tool overlapSelect)
-** From several overlapping matches keep only the strongest
+** From several overlapping matches keep only the strongest (try tool overlapSelect)
-** More ambititios: Explore creating image of each RNA structure and somehow linking it to the info page for the match (as in non-coding RNA track in the human genome browser - go to http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1%3A16520585%2D16520658 &hgsid=227581906_XP4IhUVhkFQJxrfo1G3SjDkZZFkI chr1:16520585-16520658
+** More ambitious: Explore creating image of each RNA structure and somehow linking it to the info page for the match (as in non-coding RNA track in the human genome browser - see for example http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1%3A16520585%2D16520658, display non-coding RNA track and click on the tRNA match)
-http://genome-euro.ucsc.edu/cgi-bin/hgc?c=chr1&l=16512339&r=16554600&o=16520584&t=16520658&g=tRNAs&i=tRNA%2DAsn%2DGTT%2Dchr1%2D140
+===Information for users===
+* Each track should provide basic information for users in the HTML document displayed after clicking on track name or left bar of the browser image.
+* The information should summarize what is displayed, what was source of the data, what program was used to produce the results etc
+** keep it less technical, with a link to your github wiki page for the track for potential developers replicating your work
+* See examples for tracks on the http://genome-euro.ucsc.edu/ browser
+* Also, the genome as a whole should have a description page. On the title page of http://genome-euro.ucsc.edu/ you see details of the selected assembly, e.g. for the guinea pig genome you see text
+<pre>
+Guinea pig Genome Browser - cavPor3 assembly
+The Feb. 2008 Cavia porcellus draft assembly (Broad Institute cavPor3) was produced by the Broad Institute at MIT and Harvard.
+...
+</pre>
+* You should create some explanatory text for your species and genome and make it display on the title page
+** This already works for Yarrowia lipolytica on genomika server, so you can try to find out how it was done
+==MalSym group==
+Information about the [[Genomika]] course
+===Gene info pages===
+* If you click on a gene or other displayed item in a well-setup genome browser, you get a page with more information about this item
+* These info pages do not work satisfactorily on our genomika browser
+* Look at all protein coding gene tracks in four browsers:
+** sacCer3 in original UCSC genome browser [http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=sacCer3], tracks NCBI RefSeq, SGD Genes, Ensembl Genes
+** sacCer3 in our genomika genome browser [http://genomika.compbio.fmph.uniba.sk/cgi-bin/hgTracks?db=sacCer3], tracks NCBI RefSeq, SGD Genes, Ens. Genes, NCBI RefSeq (L), SGD Genes (L), Ens. Genes (L),
+** yarLip1 in our genomika genome browser [http://genomika.compbio.fmph.uniba.sk/cgi-bin/hgTracks?db=yarLip1], tracks Ens. Genes (L), RefSeq Genes (L)
+** malSym1 in our genomika genome browser [http://genomika.compbio.fmph.uniba.sk/cgi-bin/hgTracks?db=malSym1], track Ensemble Genes (should be renamed Genes from NCBI)
+* For each explored track, find out what gets displayed on the gene info page, whether there are any error messages, whether the page contains a link to the source database (e.g. Ensembl, RefSeq, NCBI, SGD)
+* Explore how the differences in these info pages are encoded in the database and trackDb.ra
+* Suggest and implement improvements in these info pages on our browser in sacCer, yarLip, malSym and after warning the other group also in malGlo
+* The most comprehensive gene info pages use additional db tables downloaded from the uniprot database. This database is too large to be completely mirrored on our server. Can you suggest and implement a method for downloading only parts of the database for our species and loading it to the tables? (You were downloading uniprot for one species, its "proteome" in task M, possibly it can be used here.)
+Note:
+* To explore how things work at UCSC, you can see setup notes in their github [https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/doc], particularly the uniProt section and sacCer3.txt
+* You can also check their original trackDb.ra files [https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/trackDb/sacCer] - see also parent directory and subdirectories
+* You can explore even the UCSC mysql database through their mysql server [http://genome.ucsc.edu/goldenPath/help/mysql.html]
+===Blat and name search===
+Blat:
+* In the blue menu bar on top of the genome browser screen find Tool->Blat. This is a fast alignment tool which finds sequences highly similar to your query.
+* In the genomika browser it seems to work for sacCer3 but not for the other three genomes. Make it work for all four, document your changes.
+Name search:
+* Browser screen also contains text input field, where you can enter particular coordinates but also other keywords, such as a gene name etc.
+** Try searching for gene YDR157W in sacCer3
+** Try searching for gene CAG83524 in yarLip1 - the gene is there but is not found, instead we get an error message
+** Make the search work for gene identifiers in all 4 genomes (sacCer, yarLip, malGlo, malSym)
+* Possibly also allow searching for other entities (keywords from gene descriptions, tRNA anti-codons, domains from Uniprot annotation track etc)
+** For example searching for keyword "ribosomal" in UCSC sacCer genome browser returns a list of genes with ribosomal in their description - try: [http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=sacCer3]
+* Get rid of misleading error message when search is unsuccessful (see what error you get in the UCSC browser)
+See the note in the previous task for information sources on how things are setup at UCSC
 ===Information for users===
 * Each track should provide basic information for users in the HTML document displayed after clicking on track name or left bar of the browser image.
-* The information should summarize what is displayed, what was source of the data, what program was used to produce the results etc, but
+* The information should summarize what is displayed, what was source of the data, what program was used to produce the results etc
+** keep it less technical, with a link to your github wiki page for the track for potential developers replicating your work
+* See examples for tracks on the http://genome-euro.ucsc.edu/ browser
+* Also, the genome as a whole should have a description page. On the title page of http://genome-euro.ucsc.edu/ you see details of the selected assembly, e.g. for the guinea pig genome you see text
+<pre>
+Guinea pig Genome Browser - cavPor3 assembly
+The Feb. 2008 Cavia porcellus draft assembly (Broad Institute cavPor3) was produced by the Broad Institute at MIT and Harvard.
+...
+</pre>
+* You should create some explanatory text for your species and genome and make it display on the title page
+** This already works for Yarrowia lipolytica on genomika server, so you can try to find out how it was done
