1-DAV-202 Data Management 2023/24
Previously 2-INF-185 Data Source Integration

Materials · Introduction · Rules · Contact
· Grades from marked homeworks are on the server in file /grades/userid.txt


Genomika: Informácie ku trackom

From MAD
Revision as of 10:16, 1 March 2018 by Brona (talk | contribs) (Created page with "Informácie k predmetu Genomika Na tejto stránke sú informácie k trackom ktoré budete vytvárať na browseri (obe skupiny). K niektorým trackom pridáme ďalšie inf...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Informácie k predmetu Genomika

Na tejto stránke sú informácie k trackom ktoré budete vytvárať na browseri (obe skupiny). K niektorým trackom pridáme ďalšie informácie v nasledujúcich dňoch.

Comments to the task list

  • Task (A) is a prerequisite of all other tasks, the rest are mostly independent of each other.
  • Tasks are marked as fast (no significant computation required), medium (estimated computation up to 1 hour), slow (longer computation, possibly several hours).
    • These times are only estimates, reality may vary. Perhaps provide actual running times (approximate) in your documentation.
    • Fast tasks can be done entirely on genomika server.
    • Students having accounts on compbio research cluster may run medium and slow tasks there.
  • If you get stuck on one task, you can try to do at least initial stages of another one. Coordinate within group!

Basic information on creating tracks

(A) Genome (fast)

(B) Protein coding genes and other items from the annotation (fast, needs A)

baseColorUseCds given
baseColorDefault genomicCodons

(C) RepeatMasker (slow, needs A)

(D) tRNAscan-SE (medium, needs A)

  • Run software for finding tRNA genes (for comparison with annotation)
  • Download software from http://lowelab.ucsc.edu/tRNAscan-SE/ (already installed on compbio servers as tRNAscan-SE command)
  • Convert output by script rna/tRNAscan-SEtoBED.py on github
  • trackDb.ra record:
track tRNAs
shortLabel tRNA Genes
longLabel Transfer RNA Genes Identified with tRNAscan-SE
group genes
visibility hide
color 0,20,150
type bed 12
nextItemButton on
priority 10

(E) Augustus (slow, needs A)

  • Run gene finder Augustus, create track with predicted genes (for comparison with annotation)
  • Download and install software from http://bioinf.uni-greifswald.de/augustus/
    • Already installed on compbio servers
  • Example of command line: augustus --uniqueGeneId=true --species=ustilago_maydis genome.fa > augustus.gtf
  • ustilago_maydis is a related fungal species used for training parameters
  • The result needs to be converted from gtf to genepred, by gtfToGenePred (at genomika server) with option -genePredExt
  • If you name your track augustus, genome browser will recognize it automatically, no need to modify trackDb.ra

(F) Self-alignment (medium/slow needs A)

(G) Chains between genomes (medium, needs A from both groups)

  • TODO: more info

(H) Protein-based chains between genomes (medium, needs A,B from both groups)

(I) Genomes for comparative genomics (fast, only one group)

  • Download genomes of additional Malassezia species (other than malGlo and malSym)
  • Use list here [3]
  • Rename chromosomes similarly as in A, name fasta files in a systematic way (malRes1.fa etc.)
  • Store files in a directory at genomika server

(J) Multiple whole-genome alignment (slow, needs A from both groups, I)

(K) Conservation by phyloP (medium, needs A,I,J)

(L) Conserved elements by phastCons (medium, needs A,I,J)

(M) Protein domain and other protein annotation from Uniprot (medium, needs A,B)

(N) Expression data from RNA-seq (medium/slow, needs A)

(O) Differences between strains (slow, needs A)