Difference between revisions of "Lbioinf3"

Revision as of 10:05, 18 March 2021

HWbioinf3

Polymorphisms

Individuals within species differ slightly in their genomes
Polymorphisms are genome variants which are relatively frequent in a population (e.g. at least 1%)
SNP: single-nucleotide polymorphism (a polymorphism which is a substitution of a single nucleotide)
Recall that most human cells are diploid, with one set of chromosomes inherited from the mother and the other from the father
At a particular location, a single human can thus have two different alleles (heterozygosity) or two copies of the same allele (homozygosity)
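The distinction between the two cases can be illustrated with a minimal Python sketch. The function name and the example genotypes are hypothetical and serve only as an illustration; the alleles are represented simply as single-letter strings.

```python
def zygosity(allele_a: str, allele_b: str) -> str:
    """Classify a diploid genotype at one position from its two alleles."""
    return "homozygous" if allele_a == allele_b else "heterozygous"


# Hypothetical genotypes of three individuals at one SNP position
# (one allele inherited from each parent).
for genotype in [("C", "C"), ("C", "T"), ("T", "T")]:
    print(genotype, zygosity(*genotype))
```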

Finding polymorphisms / genome variants

We compare sequencing reads coming from an individual to a reference genome of the species
First we align them, as in the exercises on genome assembly
Then we look for positions where a substantial fraction of reads does not agree with the reference (this process is called variant calling)
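The idea of variant calling can be sketched in a few lines of Python. This is a deliberately naive illustration, not how FreeBayes or other real variant callers work (they use statistical models of sequencing errors and genotypes). The function name, the input representation (one string of read bases per reference position) and the thresholds `min_fraction` and `min_depth` are all assumptions made for this example.

```python
from collections import Counter


def call_variants(reference, pileup, min_fraction=0.2, min_depth=4):
    """Report positions where many aligned read bases disagree with the reference.

    reference: reference sequence as a string
    pileup: pileup[i] is a string of read bases aligned to reference position i
    Returns tuples (1-based position, reference base, alternative base,
    number of reads supporting the alternative, read depth).
    """
    variants = []
    for index, (ref_base, bases) in enumerate(zip(reference, pileup)):
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to say anything about this position
        for base, count in Counter(bases).items():
            if base != ref_base and count / depth >= min_fraction:
                variants.append((index + 1, ref_base, base, count, depth))
    return variants


# Toy data: position 2 looks heterozygous C/T, the single A at position 3
# is more likely a sequencing error, and position 4 has too low coverage.
reference = "ACGT"
pileup = ["AAAAAA", "CCCTTT", "GGGGGA", "TT"]
print(call_variants(reference, pileup))
```

With the default thresholds, only position 2 is reported, because half of the reads there carry T instead of the reference C.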

Programs and file formats

For mapping, we will use BWA-MEM (you can also try Minimap2, as in the exercises on genome assembly)
For variant calling, we will use FreeBayes
For reads and read alignments, we will use FASTQ and BAM files, as in the previous lectures
For storing found variants, we will use VCF files
For storing genome intervals, we will use BED files
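When combining these formats, note that they use different coordinate conventions: positions in VCF files are 1-based, while BED intervals are 0-based and half-open (the start is included, the end is not). The following Python sketch checks whether a variant from a VCF line lies within a BED interval. The helper function names and the example lines are hypothetical, and only the first few tab-separated columns of each format are parsed.

```python
def parse_bed_line(line):
    """Return (chromosome, start, end) from the first three BED columns."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


def parse_vcf_line(line):
    """Return (chromosome, position, ref allele, list of alt alleles)."""
    chrom, pos, _identifier, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt.split(",")


def variant_in_interval(variant, interval):
    """Check if a 1-based VCF position lies in a 0-based half-open BED interval."""
    v_chrom, v_pos, _ref, _alt = variant
    chrom, start, end = interval
    # 1-based position p corresponds to 0-based p-1, so start <= p-1 < end
    return v_chrom == chrom and start < v_pos <= end


# Made-up example lines (not real variants or annotations)
interval = parse_bed_line("chr1\t99\t200\tregionA")
variant = parse_vcf_line("chr1\t100\t.\tC\tT\t50\tPASS\t.")
print(variant_in_interval(variant, interval))
```

Here the BED interval starting at 99 covers the 1-based positions 100 to 200, so the variant at position 100 is inside it.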

Human variants

For many human SNPs we already know something about their influence on phenotype and their prevalence in different parts of the world
There are various databases, e.g. dbSNP, OMIM, or user-editable SNPedia

UCSC genome browser

A short video for this section: [1]

On-line tool similar to IGV
http://genome-euro.ucsc.edu/
Nice interface for browsing genomes, with a lot of data for some genomes (particularly human), but not all sequenced genomes are represented

Basics

On the front page, choose Genomes in the top blue menu bar
Select a genome and its version, optionally enter a position or a keyword, press submit
On the browser screen, the top image shows a chromosome map; the selected region is marked in red
Below there is a view of the selected region and various tracks with information about this region
For example some of the top tracks display genes (boxes are exons, lines are introns)
Tracks can be switched on and off and configured in the bottom part of the page (the browser supports different display levels; full contains all information but takes a lot of vertical space)
Buttons for navigation are at the top (move, zoom, etc.)
Clicking on the browser figure gives you more information about a gene or other displayed item
In this lecture, we will need the GENCODE and dbSNP tracks; check e.g. gene ACTN3 and within it SNP rs1815739 in exon 15

Blat

For sequence alignments, the UCSC genome browser offers BLAT, a fast but less sensitive alignment tool (good for the same or very closely related species)
Choose Tools->Blat in the top blue menu bar, enter the DNA sequence below, and search in the human genome
- What is the identity level for the top found match? What is its span in the genome? (Notice that other matches are much shorter)
- Using the Details link in the left column, you can see the alignment itself; the Browser link takes you to the browser at the matching region

AACCATGGGTATATACGACTCACTATAGGGGGATATCAGCTGGGATGGCAAATAATGATTTTATTTTGAC
TGATAGTGACCTGTTCGTTGCAACAAATTGATAAGCAATGCTTTCTTATAATGCCAACTTTGTACAAGAA
AGTTGGGCAGGTGTGTTTTTTGTCCTTCAGGTAGCCGAAGAGCATCTCCAGGCCCCCCTCCACCAGCTCC
GGCAGAGGCTTGGATAAAGGGTTGTGGGAAATGTGGAGCCCTTTGTCCATGGGATTCCAGGCGATCCTCA
CCAGTCTACACAGCAGGTGGAGTTCGCTCGGGAGGGTCTGGATGTCATTGTTGTTGAGGTTCAGCAGCTC
CAGGCTGGTGACCAGGCAAAGCGACCTCGGGAAGGAGTGGATGTTGTTGCCCTCTGCGATGAAGATCTGC
AGGCTGGCCAGGTGCTGGATGCTCTCAGCGATGTTTTCCAGGCGATTCGAGCCCACGTGCAAGAAAATCA
GTTCCTTCAGGGAGAACACACACATGGGGATGTGCGCGAAGAAGTTGTTGCTGAGGTTTAGCTTCCTCAG
TCTAGAGAGGTCGGCGAAGCATGCAGGGAGCTGGGACAGGCAGTTGTGCGACAAGCTCAGGACCTCCAGC
TTTCGGCACAAGCTCAGCTCGGCCGGCACCTCTGTCAGGCAGTTCATGTTGACAAACAGGACCTTGAGGC
ACTGTAGGAGGCTCACTTCTCTGGGCAGGCTCTTCAGGCGGTTCCCGCACAAGTTCAGGACCACGATCCG
GGTCAGTTTCCCCACCTCGGGGAGGGAGAACCCCGGAGCTGGTTGTGAGACAAATTGAGTTTCTGGACCC
CCGAAAAGCCCCCACAAAAAGCCG
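To make the notion of identity level more concrete, the following Python sketch computes a simple percent identity of two aligned sequences as the fraction of alignment columns with matching bases. This is a simplified definition chosen for illustration; it is an assumption that gaps are written as `-`, and the value reported by BLAT may be computed somewhat differently. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences have the same base.

    Both inputs are rows of one pairwise alignment, so they must have
    the same length; gaps are written as '-'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy alignment with one gap and one mismatch in 9 columns
print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))
```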

@@ Line 6: / Line 6: @@
 * Individuals within species differ slightly in their genomes
 * Polymorphisms are genome variants which are relatively frequent in a population (e.g. at least 1%)
-* [https://ghr.nlm.nih.gov/primer/genomicresearch/snp SNP]: single-nucleotide polymorphism (a polymorphism which is a substitution of a single nucletide)
+* [https://ghr.nlm.nih.gov/primer/genomicresearch/snp SNP]: single-nucleotide polymorphism (a polymorphism which is a substitution of a single nucleotide)
 * Recall that most human cells are diploid, with one set of chromosomes inherited from the mother and the other from the father
 * At a particular location, a single human can thus have two different alleles (heterozygosity) or two copies of the same allele (homozygosity)
@@ Line 17: / Line 17: @@
 ==Programs and file formats==
 * For mapping, we will use <tt>[https://github.com/lh3/bwa BWA-MEM]</tt> (you can also try Minimap2, as in [[HWbioinf1|the exercises on genome assembly]])
-* For variant calling, we will use [https://github.com/ekg/freebayes Freebayes]
+* For variant calling, we will use [https://github.com/ekg/freebayes FreeBayes]
 * For reads and read alignments, we will use FASTQ and BAM files, as in the [[Lbioinf1|previous lectures]]
 * For storing found variants, we will use [http://www.internationalgenome.org/wiki/Analysis/vcf4.0/ VCF files]
