1-DAV-202 Data Management 2024/25



Lr1

Revision as of 20:52, 14 April 2020 by Brona

HWr1

Program for this lecture: basics of R (applied to biology examples)

  • very short intro as a lecture
  • exercises have the form of a tutorial: read a bit of text, try some commands, extend/modify them as requested in individual tasks

In this course we cover several languages popular for scripting and data processing: Perl, Python, R.

  • Their capabilities overlap, and many extensions emulate the strengths of one language in another.
  • Choose a language based on your preference, your level of knowledge, existing code for the task, and the rest of the team.
  • Be prepared to learn a new language quickly if needed.
  • You can also combine them, e.g. preprocess data in Perl or Python, then run statistical analyses in R, and automate the entire pipeline with bash or make.

Introduction

  • R is an open-source system for statistical computing and data visualization
  • Programming language, command-line interface
  • Many built-in functions, additional libraries
  • We will concentrate on useful commands rather than language features
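As a small illustration of built-in functions and additional libraries, the following sketch calls a few functions from base R and loads the tools package, which is distributed with R itself. The vector values are made up for this example.

```r
# Made-up numbers, used only to demonstrate built-in functions
v <- c(2, 4, 6, 8)

m <- mean(v)        # arithmetic mean
s <- sd(v)          # sample standard deviation
print(summary(v))   # minimum, quartiles, mean, maximum

# Load an additional library; 'tools' is distributed with R itself
library(tools)
ext <- file_ext("file.R")   # file extension without the dot

cat("mean:", m, " sd:", s, " extension:", ext, "\n")
```

Typing ?mean or help(sd) in the console opens the documentation for the corresponding function.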

Working in R

Option 1: Run the command R and type commands in its command-line interface

  • It supports command history (up and down arrow keys, Ctrl-R) and completion of command names with the Tab key

Option 2: Write a script to a file and run it from the command line as follows (since R 4.0.0 the --slave option is also called --no-echo):
R --vanilla --slave < file.R
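For illustration, file.R could contain something like the following sketch. The computation is an arbitrary example, not part of the course materials, and the script prints its result explicitly with cat so that the output is easy to find when the script runs unattended.

```r
# Hypothetical contents of file.R
x <- 1:10                 # integers 1 to 10
squares <- x * x          # element-wise squares
total <- sum(squares)     # sum of the squares
cat("Sum of squares of 1..10:", total, "\n")
```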

Option 3: Use the rstudio command to open the RStudio graphical IDE

  • Sub-windows with an editor of R scripts, a console, variables, and plots
  • Ctrl-Enter in the editor executes the current command in the console
  • You can also install RStudio on your home computer and work there

In R, you can create plots. In the command-line interface they open in a separate window; in RStudio they open in one of the sub-windows.

x <- 1:10         # vector of integers 1 to 10
plot(x, x * x)    # scatter plot of x against x squared
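When a script runs non-interactively, it is often more convenient to write the plot to a file than to open a window. A minimal sketch using the pdf graphics device from base R; the output file name squares.pdf and the use of a temporary directory are assumptions made for this example.

```r
x <- 1:10
out_file <- file.path(tempdir(), "squares.pdf")

pdf(out_file, width = 6, height = 4)   # open a PDF device (size in inches)
plot(x, x * x, xlab = "x", ylab = "x squared", main = "Squares")
invisible(dev.off())                   # close the device to finish writing the file

cat("Plot written to", out_file, "\n")
```

All plotting commands issued between pdf() and dev.off() go to the file instead of the screen.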

Suggested workflow

  • Work interactively in RStudio or on the command line and try various options
  • Select the useful commands and store them in a script
  • Run the script automatically on new data or new versions, potentially as part of a bigger pipeline
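One way to make the selected commands reusable on new data is to wrap them in a function inside the script. The sketch below assumes a tab-separated input file with a header line and numeric columns; the function name column_means and the demo table are hypothetical.

```r
# Hypothetical helper: read a tab-separated table and return the mean of each column
column_means <- function(path) {
  tab <- read.table(path, header = TRUE, sep = "\t")
  colMeans(tab)
}

# Demo on a small made-up table written to a temporary file
demo_file <- tempfile(fileext = ".tsv")
demo <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
write.table(demo, demo_file, sep = "\t", quote = FALSE, row.names = FALSE)

result <- column_means(demo_file)
print(result)
```

Running the same function on a new file then only requires changing the path passed to it.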

Additional information

Gene expression data

  • Gene expression: DNA -> mRNA -> protein
  • Level of gene expression: Extract mRNA from cells, measure amounts of mRNA
  • Technologies: microarray, RNA-seq

Gene expression data

  • Rows: genes
  • Columns: experiments (e.g. different conditions or different individuals)
  • Each value is the expression of a gene, i.e. the relative amount of mRNA for this gene in the sample
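This layout maps naturally onto an R matrix with named rows and columns. In the sketch below, the gene names, experiment names and values are all made up for illustration.

```r
# Toy expression table: 3 genes (rows) x 2 experiments (columns); all values are made up
expr <- matrix(c(0.5, 1.2,
                 -0.3, 0.1,
                 2.0, 1.8),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("exp1", "exp2")))

print(dim(expr))               # number of genes, number of experiments
print(expr["geneA", ])         # one row: expression of geneA in all experiments
print(expr[, "exp2"])          # one column: all genes in experiment exp2
print(expr["geneC", "exp1"])   # a single value
```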

We will use microarray data for yeast:

  • Strassburg, Katrin, et al. "Dynamic transcriptional and metabolic responses in yeast adapting to temperature stress." Omics: a journal of integrative biology 14.3 (2010): 249-259.
  • Downloaded from the GEO database
  • The data are already preprocessed: normalization, logarithmic scale, etc.
  • We have selected only the cold conditions and only genes with an absolute change of at least 1
  • Data: 2738 genes, 8 experiments in a time series. Yeast was moved from the normal temperature of 28 degrees C to cold conditions at 10 degrees C, and samples were taken after 0min, 15min, 30min, 1h, 2h, 4h, 8h and 24h in the cold
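The exact definition of "absolute change" used for the selection is not given here. As one possible reading, the sketch below keeps genes whose value at some time point differs from the 0min value by at least 1. The gene names and values are made up; only the column labels follow the time points listed above.

```r
times <- c("0min", "15min", "30min", "1h", "2h", "4h", "8h", "24h")

# Toy data: 3 genes x 8 time points, made-up values
expr <- matrix(c(0.0,  0.1,  0.2,  0.1,  0.3,  0.2,  0.1, 0.0,
                 0.0,  0.4,  0.9,  1.5,  1.8,  1.2,  0.7, 0.3,
                 0.0, -0.2, -0.6, -1.1, -0.9, -0.5, -0.2, 0.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), times))

# Change of each gene relative to its own value at the first time point
change <- expr - expr[, "0min"]

# Largest absolute change per gene
max_abs_change <- apply(abs(change), 1, max)

# Keep genes with absolute change at least 1 (assumed criterion)
selected <- expr[max_abs_change >= 1, , drop = FALSE]
print(rownames(selected))
```

The drop = FALSE argument keeps the result a matrix even if only one gene passes the filter.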