2-INF-185 Integrácia dátových zdrojov 2016/17

Submit HW10 and HW11 by Tuesday 30 May, 9:00.
Project submission deadlines:
1st deadline: Sunday 11 June, 22:00
2nd deadline: Wednesday 21 June, 22:00
Both deadlines are regular; the first is intended for students who are finishing their studies or who want to complete the course earlier. In both cases, short in-person meetings with the instructors (discussion of the project and finalizing the grade) will take place a few days after submission. Exact days and times will be arranged later. Submit projects in the same way as homework, to /submit/projekt




Program for today: basics of R (applied to biology examples)

  • very short intro as a lecture
  • tutorial as HW: read a bit of text, try some commands, extend/modify them as requested

In this course we cover several languages popular for scripting in bioinformatics: Perl, Python, R

  • their capabilities overlap, many extensions emulate strengths of one in another
  • choose a language based on your preference, level of knowledge, existing code for the task, rest of the team
  • quickly learn a new language if needed
  • also possibly combine, e.g. preprocess data in Perl or Python, then run statistical analyses in R, automate entire pipeline with bash or make


  • R is an open-source system for statistical computing and data visualization
  • Programming language, command-line interface
  • Many built-in functions, additional libraries
  • We will concentrate on useful commands rather than language features

Working in R

  • Run the command R and type commands in its command-line interface
    • it supports command history (up and down arrows, Ctrl-R) and completion of command names with the Tab key
> 1+2
[1] 3
  • Write a script to file, run it from command-line: R --vanilla --slave < file.R
  • Use RStudio, a graphical IDE [1]
    • Windows with editor of R scripts, console, variables, plots
    • Ctrl-Enter in editor executes current command in console
  • ?plot displays help for the plot command
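
As a small illustration of the points above, the following lines could be typed one by one at the R prompt, or saved in a script such as file.R and run non-interactively. This is only a sketch; the variable names are arbitrary and not part of the course materials.

```r
# A few basic commands; variable names (x, total, avg, squares) are arbitrary
x <- c(1, 2, 3, 4)     # create a numeric vector
total <- sum(x)        # sum of all elements
avg <- mean(x)         # arithmetic mean
squares <- x^2         # arithmetic is applied element-wise

# In a script, print() makes the output explicit
print(total)
print(avg)
print(squares)
```

At the interactive prompt, typing just the name of a variable (for example total) displays its value, so print() is mainly useful in scripts.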

Suggested workflow

  • work interactively in RStudio or on the command line, try various options
  • select useful commands, store in a script
  • run script automatically on new data/new versions, potentially as a part of a bigger pipeline

Additional information

Gene expression data

  • Gene expression: DNA->mRNA->protein
  • Level of gene expression: Extract mRNA from a cell, measure amounts of mRNA
  • Technologies: microarray, RNA-seq

Gene expression table

  • Rows: genes
  • Columns: experiments (e.g. different conditions or different individuals)
  • Each value is expression of a gene, i.e. relative amount of mRNA for this gene in the sample
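
A table of this shape can be represented in R as a matrix with named rows and columns. The sketch below uses three hypothetical genes and four hypothetical experiments; all names and values are invented purely for illustration.

```r
# Toy expression table: rows are genes, columns are experiments.
# Gene names, experiment names and values are all invented.
expr <- matrix(c(0.0,  1.2,  2.5,  2.1,
                 0.0, -0.3, -1.4, -1.8,
                 0.0,  0.1,  0.0, -0.2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("exp1", "exp2", "exp3", "exp4")))

dim(expr)             # number of rows (genes) and columns (experiments)
expr["geneB", ]       # values of one gene across all experiments
expr[, "exp3"]        # values of all genes in one experiment
rowMeans(expr)        # mean expression of each gene
```

Indexing by row selects a gene and indexing by column selects an experiment, which mirrors the layout described above.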

We will use microarray data for yeast:

  • Strassburg, Katrin, et al. "Dynamic transcriptional and metabolic responses in yeast adapting to temperature stress." Omics: a journal of integrative biology 14.3 (2010): 249-259. [2]
  • Downloaded from GEO database [3]
  • Data are already preprocessed: normalization, log2, etc.
  • We have selected only the cold conditions and only genes with an absolute change of at least 1
  • Data: 2738 genes, 8 experiments in a time series; yeast was moved from a normal temperature of 28 degrees C to cold conditions at 10 degrees C, and samples were taken after 0min, 15min, 30min, 1h, 2h, 4h, 8h and 24h in the cold
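
Because the values are on a log2 scale, a difference of 1 between two values corresponds to a two-fold change in the amount of mRNA. The sketch below illustrates this on invented numbers, using the eight time points as column names. The filtering rule shown (largest absolute difference from the 0min sample is at least 1) is only one possible reading of "absolute change at least 1" and is not necessarily the rule used to prepare the course data; gene names and values are hypothetical.

```r
times <- c("0min", "15min", "30min", "1h", "2h", "4h", "8h", "24h")

# Invented log2 expression values for three hypothetical genes
expr <- rbind(
  geneA = c(0.0,  0.2,  0.9,  1.6,  2.0,  1.8,  1.5,  1.1),
  geneB = c(0.0, -0.1,  0.1,  0.3, -0.2,  0.4,  0.2,  0.0),
  geneC = c(0.0, -0.5, -1.2, -2.0, -1.7, -1.0, -0.6, -0.4)
)
colnames(expr) <- times

# Difference of each sample from the 0min sample of the same gene
# (the 0min column is recycled down the rows, so each row uses its own value)
change <- expr - expr[, "0min"]

# Assumed filter: keep genes whose largest absolute change is at least 1
max_change <- apply(abs(change), 1, max)
selected <- expr[max_change >= 1, , drop = FALSE]
rownames(selected)

# Convert log2 differences to fold changes, e.g. log2 difference 2 means 4-fold
fold <- 2^change
fold["geneA", "2h"]
```

With these toy values only geneA and geneC pass the assumed filter, because geneB never moves more than 0.4 away from its starting value.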