Difference between revisions of "Lr1"

Revision as of 21:18, 14 April 2020

HWr1

Program for this lecture: basics of R (applied to biology examples)

A very short introduction in the form of a lecture
The exercises take the form of a tutorial: read a bit of text, try some commands, then extend or modify them as requested in the individual tasks

In this course we cover several languages popular for scripting and data processing: Perl, Python, R.

Their capabilities overlap, and many extensions emulate the strengths of one language in another.
Choose a language based on your preference, level of knowledge, existing code for the task, and the rest of the team.
Be ready to learn a new language quickly if needed.
You can also combine languages, e.g. preprocess data in Perl or Python, then run statistical analyses in R, and automate the entire pipeline with bash or make.

Introduction

R is an open-source system for statistical computing and data visualization
Programming language, command-line interface
Many built-in functions, additional libraries
- For example Bioconductor for bioinformatics
We will concentrate on useful commands rather than language features

Working in R

Option 1: Run command R, type commands in a command-line interface

It supports command history (up and down arrows, Ctrl-R) and completion of command names with the Tab key

Option 2: Write a script to a file, run it from the command-line as follows:
R --vanilla --slave < file.R
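
As a minimal sketch of what such a script file might contain (the contents below are an illustrative assumption, not part of the course materials), the following lines compute two simple statistics and write them to standard output:

```r
# Hypothetical contents of file.R (illustrative only)
x = c(2.5, 3.1, 4.7, 5.0)      # a small numeric vector
x_mean = mean(x)               # arithmetic mean
x_sd = sd(x)                   # sample standard deviation
cat("mean:", x_mean, "\n")     # cat writes plain text to standard output
cat("sd:", x_sd, "\n")
```

Writing results explicitly with cat or print keeps the output predictable when the script is later run as one step of a pipeline.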

Option 3: Use the rstudio command to open a graphical IDE

Sub-windows with editor of R scripts, console, variables, plots
Ctrl-Enter in editor executes the current command in console
You can also install RStudio on your home computer and work there

In R, you can create plots. In the command-line interface they open in a separate window; in RStudio they open in one of the sub-windows.

x=c(1:10)
plot(x,x*x)
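
When a script runs without a graphical window, it is often more convenient to write the plot to a file. The following sketch assumes a PDF file in a temporary directory is acceptable; the file name and the axis labels are illustrative choices:

```r
x = 1:10
# Hypothetical output location; any writable path would do
plot_file = file.path(tempdir(), "squares.pdf")
pdf(plot_file, width = 6, height = 4)   # open a PDF graphics device (size in inches)
plot(x, x * x, xlab = "x", ylab = "x squared", main = "Squares")
dev.off()                               # close the device so that the file is written
```

The plot command itself is unchanged; only the device that receives the drawing differs.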

Suggested workflow

work interactively in RStudio or on the command line, try various options
select useful commands, store in a script
run script automatically on new data/new versions, potentially as a part of a bigger pipeline
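
The last step of this workflow could look roughly as follows. This is a sketch under assumptions: the function name summarize_table is hypothetical, and the input is assumed to be a tab-separated table with a header line and row names in the first column.

```r
# Summarize one table; the function name is hypothetical
summarize_table = function(path) {
  data = read.table(path, header = TRUE, row.names = 1)
  colMeans(data)   # mean of each column
}

# Arguments given after --args on the command line, if any
cmd_args = commandArgs(trailingOnly = TRUE)
if (length(cmd_args) >= 1) {
  print(summarize_table(cmd_args[1]))
}
```

Such a script might then be run on a new file with a command along the lines of R --vanilla --slave --args new_data.tsv < file.R, where new_data.tsv is a placeholder file name.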

Additional information

Official tutorial
Seefeld, Linder: Statistics Using R with Biological Examples (pdf book)
Patrick Burns: The R Inferno (intricacies of the language)
Other books
Built-in help: ?plot displays help for the plot command
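
Besides the help pages, a few base R functions can help locate a command when its exact name is not known. A small sketch, with the search pattern chosen only for illustration:

```r
# Names of all visible objects whose name starts with "read."
read_functions = apropos("^read\\.")
print(read_functions)

# Argument list of a function, without opening its help page
print(args(read.table))
```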

Gene expression data

Gene expression: DNA -> mRNA -> protein
Level of gene expression: Extract mRNA from cells, measure amounts of mRNA
Technologies: microarray, RNA-seq

Gene expression data

Rows: genes
Columns: experiments (e.g. different conditions or different individuals)
Each value is the expression of a gene, i.e. the relative amount of mRNA for this gene in the sample
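
To make this layout concrete, here is a toy table with invented gene names, experiment names and values (none of them come from the real data set):

```r
# Toy expression table: 3 genes (rows) x 4 experiments (columns); all values are made up
expr = matrix(c(5.1, 5.3, 7.9, 8.2,
                2.0, 2.1, 2.2, 1.9,
                9.4, 9.0, 4.1, 4.4),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("ctrl1", "ctrl2", "treat1", "treat2")))
print(dim(expr))          # number of genes, number of experiments
print(expr["geneA", ])    # one row: expression of geneA in all experiments
print(expr[, "treat1"])   # one column: all genes in one experiment
```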

We will use microarray data for yeast:

Abbott, Derek A., et al. "Generic and specific transcriptional responses to different weak organic acids in anaerobic chemostat cultures of Saccharomyces cerevisiae." FEMS yeast research 7.6 (2007): 819-833.
Downloaded from the GEO database (accession GSE5926)
The data are already preprocessed (normalization, etc.); we will apply the logarithmic scale ourselves
Data: 6398 genes, 15 experiments: 5 conditions, 3 replicate experiments for each condition
- The first 3 experiments are control, that is, yeast grown in a usual medium
- In each of the remaining experiments a weak solution of an acid was added to the growing medium to observe how this influences the yeast
- We have 3 replicates for each of 4 different acids
- Columns 1,2,3 are control, columns 4,5,6 acetic acid, 7,8,9 benzoic acid, 10,11,12 propionic acid, and 13,14,15 sorbic acid
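
The column layout above can be written down in R once and then reused to select replicates. The sketch below uses a random toy matrix as a stand-in for the real table; the condition labels and variable names are illustrative assumptions:

```r
# Condition label for each of the 15 columns, following the layout listed above
conditions = rep(c("control", "acetic", "benzoate", "propionate", "sorbate"), each = 3)
# Column indices belonging to each condition, in the original order
columns_by_condition = split(1:15, factor(conditions, levels = unique(conditions)))
print(columns_by_condition$control)
print(columns_by_condition$sorbate)

# Toy stand-in for the real table: random values for 4 hypothetical genes
set.seed(1)
toy = matrix(runif(4 * 15, min = 1, max = 100), nrow = 4,
             dimnames = list(paste0("gene", 1:4), NULL))
# Mean of each gene over the three replicates of each condition
condition_means = sapply(columns_by_condition, function(cols) rowMeans(toy[, cols]))
print(dim(condition_means))
```

Keeping the grouping in one variable avoids repeating the column numbers in every later command.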

Read the microarray data, transform it to log scale, then work with table a:

input=read.table("/tasks/r1/acids.tsv", header=TRUE, row.names=1)
a = log(input)
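
To see what the two read.table options do without access to the course file, the same steps can be tried on a tiny file created on the fly. Gene names, column names and values below are invented for illustration:

```r
# Write a tiny tab-separated file to a temporary location
toy_file = tempfile(fileext = ".tsv")
writeLines(c("gene\tcontrol1\tcontrol2\tacetic1",
             "geneA\t100\t120\t400",
             "geneB\t50\t45\t10"), toy_file)

# header=TRUE: the first line holds column names
# row.names=1: the first column holds the gene names
toy_input = read.table(toy_file, header = TRUE, row.names = 1)
toy_a = log(toy_input)           # natural logarithm of every value
print(dim(toy_a))                # genes x experiments
print(head(toy_a))               # first rows of the table
print(summary(toy_a$control1))   # basic statistics of one column
```

Note that log in R computes the natural logarithm by default; log2 and log10 are available if a different base is preferred.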

