1-DAV-202 Data Management 2023/24
Previously 2-INF-185 Data Source Integration

· Grades from marked homeworks are on the server in file /grades/userid.txt
· Dates of project submission and oral exams:
Early: submit project May 24 9:00am, oral exams May 27 1:00pm (limit 5 students).
Otherwise submit project June 11, 9:00am, oral exams June 18 and 21 (estimated 9:00am-1:00pm, schedule will be published before exam).
Sign up for one of the exam days in AIS before June 11.
Remedial exams will take place in the last week of the exam period. Beware, there will not be much time to prepare a better project. Projects should be submitted as homeworks to /submit/project.
· Cloud homework is due on May 20 9:00am.


Genomika: Development projects


MalGlo group

User trackDb, code management

  • Think about how to better manage changes to the browser code in future instances of the course
  • Explore possibilities of each user having their own trackDb
  • Start by reading the short info in /kentsrc/trackDb/makefile on the genomika server
# Browser supports multiple trackDb's so that individual developers
# can change things rapidly without stepping on other people's toes. 
...
  • Write a manual describing how to make your suggested changes, and test it

Rfam

  • Rfam http://rfam.xfam.org/ is a database of families of non-coding RNAs
  • It contains a covariance model for each family
  • The database can be downloaded and searched against a genome using the Infernal tool http://eddylab.org/infernal/
  • Do this search, then convert the output to an appropriate format and display it in the browser
  • Possibly use the BEDdetail format https://genome.ucsc.edu/FAQ/FAQformat.html#format1.7
  • After clicking on an Rfam match, there should be some display of additional information about the match and a link to the Rfam database. You can achieve this by the following lines in trackDb.ra:
type bedDetail 14
url http://rfam.xfam.org/family/$$
urlLabel Rfam:

Example of BEDdetail format for an Rfam match (items should be tab-separated, the last column starts at "truncated:")

chrom chromStart chromEnd name score strand thickStart thickEnd reserved blockCount blockSizes chromStarts id description
contigA 75109 75380 Fungi_SRP-1 1002 - 75109 75109 0 1 271 0 RF01502 truncated: no, E-value: 3.5e-19
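
The conversion from Infernal output to a line like the one above could be sketched as follows. This is only a minimal sketch: it assumes the hits come from cmsearch run with --tblout in its default column layout, and the function name and the way the bit score is turned into a BED score are illustrative choices, not something prescribed here.

```python
def tblout_to_beddetail(line):
    """Convert one cmsearch --tblout hit line to a 14-field bedDetail line."""
    # Assumed column order (default cmsearch --tblout): target name, accession,
    # query name, accession, mdl, mdl from, mdl to, seq from, seq to, strand,
    # trunc, pass, gc, bias, score, E-value, inc, description of target.
    f = line.split()
    chrom, family, rfam_id = f[0], f[2], f[3]
    seq_from, seq_to, strand = int(f[7]), int(f[8]), f[9]
    trunc, bit_score, evalue = f[10], float(f[14]), f[15]

    # tblout coordinates are 1-based and reversed on the minus strand;
    # BED needs 0-based, half-open coordinates with start < end.
    start = min(seq_from, seq_to) - 1
    end = max(seq_from, seq_to)

    # Assumption: use the rounded bit score clamped to the usual BED range 0..1000.
    score = max(0, min(1000, round(bit_score)))

    description = f"truncated: {trunc}, E-value: {evalue}"
    fields = [chrom, start, end, family, score, strand,
              start, start,          # thickStart, thickEnd (no thick part)
              0, 1, end - start, 0,  # reserved, blockCount, blockSizes, chromStarts
              rfam_id, description]
    return "\t".join(str(x) for x in fields)


# Hypothetical hit line, made up for illustration only.
hit = "contigA - Fungi_SRP-1 RF01502 cm 1 250 75380 75110 - no 1 0.45 0.0 85.3 3.5e-19 ! -"
print(tblout_to_beddetail(hit))
```

Comment lines in the tblout file start with "#" and would need to be skipped before calling the function.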
  • Further things which you might want to explore:
    • Remove matches that correspond to tRNAScan-SE matches (try the overlapSelect tool)
    • From several overlapping matches keep only the strongest (try the overlapSelect tool)
    • More ambitious: Explore creating an image of each RNA structure and somehow linking it to the info page for the match (as in the non-coding RNA track in the human genome browser - see for example http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1%3A16520585%2D16520658, display the non-coding RNA track and click on the tRNA match)
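
To illustrate what "keep only the strongest" of several overlapping matches could mean, here is a small greedy sketch in plain Python. It is not a replacement for overlapSelect and does not use it. The tuple layout, the function name and the decision to ignore strand when testing overlap are all assumptions made for the example.

```python
def keep_strongest(hits):
    """Greedy filter: from overlapping hits keep the one with the highest score.

    hits: list of (chrom, start, end, name, score) tuples, BED-style
    0-based half-open coordinates. Strand is ignored (an assumption).
    """
    kept = []
    # Visit hits from strongest to weakest; keep a hit only if it does not
    # overlap anything that was already kept.
    for hit in sorted(hits, key=lambda h: h[4], reverse=True):
        chrom, start, end = hit[0], hit[1], hit[2]
        overlaps = any(k[0] == chrom and start < k[2] and k[1] < end
                       for k in kept)
        if not overlaps:
            kept.append(hit)
    # Return the survivors in genome order.
    return sorted(kept, key=lambda h: (h[0], h[1]))


# Made-up hits for illustration.
hits = [
    ("contigA", 100, 200, "a", 50),
    ("contigA", 150, 250, "b", 80),
    ("contigA", 250, 300, "c", 10),
    ("contigB", 100, 200, "d", 5),
]
print([h[3] for h in keep_strongest(hits)])
```

The inner check compares each hit with all kept hits, so the running time is quadratic; that is likely fine for the number of Rfam matches in one genome, but an interval index would scale better.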

Information for users

  • Each track should provide basic information for users in the HTML document displayed after clicking on the track name or the left bar of the browser image.
  • The information should summarize what is displayed, what the source of the data was, what program was used to produce the results, etc.
    • Keep it less technical, with a link to your github wiki page for the track for potential developers replicating your work
  • See examples for tracks on the http://genome-euro.ucsc.edu/ browser
  • Also, the genome as a whole should have a description page. On the title page of http://genome-euro.ucsc.edu/ you see details of the selected assembly, e.g. for the guinea pig genome you see text
Guinea pig Genome Browser - cavPor3 assembly
The Feb. 2008 Cavia porcellus draft assembly (Broad Institute cavPor3) was produced by the Broad Institute at MIT and Harvard.
...
  • You should create some explanatory text for your species and genome and make it display on the title page