1-DAV-202 Data Management 2023/24
Previously 2-INF-185 Data Source Integration

· Grades from marked homeworks are on the server in file /grades/userid.txt
· Dates of project submission and oral exams:
Early: submit project May 24 9:00am, oral exams May 27 1:00pm (limit 5 students).
Otherwise submit project June 11, 9:00am, oral exams June 18 and 21 (estimated 9:00am-1:00pm, schedule will be published before exam).
Sign up for one of the exam days in AIS before June 11.
Remedial exams will take place in the last week of the exam period. Beware, there will not be much time to prepare a better project. Projects should be submitted as homeworks to /submit/project.
· Cloud homework is due on May 20 9:00am.


Genomika: Information on Tracks


Information for the Genomika course

This page contains information on the tracks that you will create in the browser (both groups). We will add further information on some tracks in the coming days.

Comments to the task list

  • Task (A) is a prerequisite of all other tasks, the rest are mostly independent of each other.
  • Tasks are marked as fast (no significant computation required), medium (estimated computation up to 1 hour), or slow (longer computation, possibly several hours).
    • These times are only estimates; actual running times may differ. Consider reporting approximate running times in your documentation.
    • Fast tasks can be done entirely on the genomika server.
    • Students with accounts on the compbio research cluster may run medium and slow tasks there.
  • If you get stuck on one task, try to do at least the initial stages of another one. Coordinate within your group!
  • Document your work. Documentation should be independent of this page and of the documentation created last year: copy and modify relevant passages, and cite sources.
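One possible way to collect approximate running times for the documentation is a small wrapper that runs a command and logs its wall-clock time. This is only a sketch: the function name run_timed and the log format are hypothetical, and the command in the example is a placeholder rather than one of the actual tools.

```python
import subprocess
import sys
import time


def run_timed(argv, log_path=None):
    """Run a command and return (exit code, elapsed wall-clock seconds).

    If log_path is given, append one tab-separated line with the command
    and its running time, which can later be copied into the documentation.
    """
    start = time.perf_counter()
    result = subprocess.run(argv, check=False)
    elapsed = time.perf_counter() - start
    if log_path is not None:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{' '.join(argv)}\t{elapsed:.1f} s\n")
    return result.returncode, elapsed


if __name__ == "__main__":
    # Placeholder command; substitute the real tool invocation here.
    code, seconds = run_timed([sys.executable, "-c", "pass"])
    print(f"exit code {code}, {seconds:.1f} s")
```

For jobs lasting several hours it is enough to report the time rounded to minutes.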

Basic information on creating tracks

(A) Genome (fast)

(B) Protein coding genes and other items from the annotation (fast, needs A)

baseColorUseCds given
baseColorDefault genomicCodons

(C) RepeatMasker (slow, needs A)

(D) tRNAscan-SE (medium, needs A)

  • Run software for finding tRNA genes (for comparison with annotation)
  • Download software from http://lowelab.ucsc.edu/tRNAscan-SE/ (already installed on compbio servers as tRNAscan-SE command)
  • Convert the output to BED format with the script rna/tRNAscan-SEtoBED.py (on GitHub)
  • trackDb.ra record:
track tRNAs
shortLabel tRNA Genes
longLabel Transfer RNA Genes Identified with tRNAscan-SE
group genes
visibility hide
color 0,20,150
type bed 12
nextItemButton on
priority 10
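
To illustrate what the conversion to "bed 12" involves, here is a minimal sketch that turns one row of tRNAscan-SE tabular output into a BED line. It is not the rna/tRNAscan-SEtoBED.py script mentioned above. It assumes the default tab-separated output with columns sequence name, tRNA number, begin, end, tRNA type, anticodon, intron begin, intron end, score (1-based coordinates, begin greater than end on the reverse strand, intron bounds 0 when there is no intron). The function name, the item naming scheme and the constant score of 0 are choices made only for this example.

```python
def trnascan_row_to_bed12(row):
    """Convert one tRNAscan-SE output row (assumed layout) to a BED12 line."""
    fields = [f.strip() for f in row.rstrip("\n").split("\t")]
    seq_name, number = fields[0], fields[1]
    begin, end = int(fields[2]), int(fields[3])
    trna_type, anticodon = fields[4], fields[5]
    intron_begin, intron_end = int(fields[6]), int(fields[7])

    # Reverse-strand hits are reported with begin > end.
    strand = "+" if begin <= end else "-"
    # BED uses 0-based, half-open coordinates.
    start = min(begin, end) - 1
    stop = max(begin, end)

    if intron_begin > 0 and intron_end > 0:
        # Two exons around the intron become two BED blocks.
        intron_start = min(intron_begin, intron_end) - 1
        intron_stop = max(intron_begin, intron_end)
        block_sizes = [intron_start - start, stop - intron_stop]
        block_starts = [0, intron_stop - start]
    else:
        block_sizes = [stop - start]
        block_starts = [0]

    # Hypothetical naming scheme; score is left at 0 for simplicity.
    name = f"{seq_name}.tRNA{number}-{trna_type}{anticodon}"
    bed = [
        seq_name, start, stop, name, 0, strand,
        start, stop,  # thickStart, thickEnd: draw the whole item thick
        "0",          # itemRgb (unused; the track color applies)
        len(block_sizes),
        ",".join(map(str, block_sizes)) + ",",
        ",".join(map(str, block_starts)) + ",",
    ]
    return "\t".join(map(str, bed))


if __name__ == "__main__":
    print(trnascan_row_to_bed12("chr1\t1\t100\t172\tAla\tAGC\t0\t0\t65.3"))
```

Header lines at the top of the tRNAscan-SE output would need to be skipped before calling the function.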

(E) Augustus (slow, needs A)

  • Run gene finder Augustus, create track with predicted genes (for comparison with annotation)
  • Download and install software from http://bioinf.uni-greifswald.de/augustus/
    • Already installed on compbio servers
  • Example command line: augustus --uniqueGeneId=true --species=ustilago_maydis genome.fa > augustus.gtf
  • ustilago_maydis is a related fungal species used for training the parameters
  • The result needs to be converted from GTF to genePred format with gtfToGenePred (available on the genomika server), using the option -genePredExt
  • If you name your track augustus, the genome browser will recognize it automatically; there is no need to modify trackDb.ra
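
The two steps above (prediction, then conversion) can be sketched as a small script. The function names and the output file names augustus.gtf and augustus.gp are assumptions made for this example; the command-line options are the ones listed above. Running it requires augustus and gtfToGenePred to be on the PATH, so by default the script only prints the commands.

```python
import shlex
import subprocess


def augustus_steps(genome_fa, species="ustilago_maydis", prefix="augustus"):
    """Return a list of (argv, stdout_file) pairs for the two steps.

    stdout_file is the file that receives the command's standard output,
    or None if the command writes its own output file.
    """
    gtf = f"{prefix}.gtf"
    gene_pred = f"{prefix}.gp"  # hypothetical output name
    predict = ["augustus", "--uniqueGeneId=true",
               f"--species={species}", genome_fa]
    convert = ["gtfToGenePred", "-genePredExt", gtf, gene_pred]
    return [(predict, gtf), (convert, None)]


def run_steps(steps):
    """Run the steps in order, stopping at the first failure."""
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w", encoding="utf-8") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    # Print the commands only; call run_steps(...) to execute them.
    for argv, stdout_file in augustus_steps("genome.fa"):
        redirect = f" > {stdout_file}" if stdout_file else ""
        print(shlex.join(argv) + redirect)
```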

(F) Self-alignment (medium/slow, needs A)

(G) Chains between genomes (medium, needs A from both groups)

  • TODO: more info

(H) Protein-based chains between genomes (medium, needs A,B from both groups)

(I) Genomes for comparative genomics (fast, only one group)

  • Download genomes of additional Malassezia species (other than malGlo and malSym)
  • Use list here [3]
  • Rename chromosomes in the same way as in (A) and name the FASTA files systematically (malRes1.fa etc.)
  • Store the files in a directory on the genomika server
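
Renaming the sequences in a downloaded FASTA file could look roughly like the sketch below. The naming scheme used in (A) is not described on this page, so the mapping from original names to new names (here scaffold_A to chr1) is purely hypothetical, as is the function name.

```python
def rename_fasta_headers(lines, name_map):
    """Yield FASTA lines with each sequence name replaced via name_map.

    The sequence name is the first word after '>'; the rest of the header
    is dropped. A name missing from name_map raises KeyError, so that no
    sequence is silently left with its old name.
    """
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            old_name = words[0] if words else ""
            yield ">" + name_map[old_name] + "\n"
        else:
            yield line


if __name__ == "__main__":
    # Hypothetical input and mapping, for illustration only.
    fasta = [">scaffold_A some description\n", "ACGTACGT\n",
             ">scaffold_B\n", "TTGA\n"]
    mapping = {"scaffold_A": "chr1", "scaffold_B": "chr2"}
    print("".join(rename_fasta_headers(fasta, mapping)), end="")
```

For real files, the same function can be applied to an open input file and the result written to a systematically named output such as malRes1.fa.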

(J) Multiple whole-genome alignment (slow, needs A from both groups, I)

(K) Conservation by phyloP (medium, needs A,I,J)

(L) Conserved elements by phastCons (medium, needs A,I,J)

(M) Protein domain and other protein annotation from Uniprot (medium, needs A,B)

(N) Expression data from RNA-seq (medium/slow, needs A)

(O) Differences between strains (slow, needs A)