Difference between revisions of "HWbioinf2"

Revision as of 07:39, 4 April 2024

See also the lecture

Submit the protocol and the required files to /submit/bioinf2.

Then sort the resulting SAM file using samtools, store it as a BAM file, and create its index, as in the previous homework.
In addition to the BAM file, we produced a file rnaseq-star.SJ.out.tab containing the positions of detected introns (here called splice junctions or splices). Find the description of this file in the STAR manual (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf).
There are also additional files with logs and statistics.
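A minimal sketch of the sorting and indexing step, assuming STAR wrote its alignments under its default output name (the --outFileNamePrefix value followed by Aligned.out.sam, i.e. rnaseq-star.Aligned.out.sam) and that samtools is on the PATH. The function name sort_and_index is hypothetical, and the commands from the previous homework may differ in details.

```shell
# Sketch: sort STAR's SAM output into a BAM file and index it.
# Assumes samtools is on the PATH and STAR's default output name
# <prefix>Aligned.out.sam; the function name is hypothetical.
sort_and_index() {
  # $1 = STAR output prefix (e.g. rnaseq-star.), $2 = BAM file to create
  # samtools sort writes BAM by default; -o names the output file.
  samtools sort -o "$2" "${1}Aligned.out.sam" &&
    samtools index "$2"  # writes the index as $2.bai
}

# Usage on the homework data:
#   sort_and_index rnaseq-star. rnaseq-star.bam
```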

Examine the files to find out the answers to the following questions (you can do it manually by looking at the files, e.g., with the less command):

(a) How many reads were in the FASTQ file? How many of them were successfully mapped?

(b) How many introns (splice junctions) were predicted? How many of them are supported by more than one read?
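Both questions can be answered with standard shell tools. Below is a minimal sketch with hypothetical function names, assuming FASTQ records of exactly 4 lines (no wrapped sequence lines) and the SJ.out.tab column layout from the STAR manual, with column 7 counting uniquely mapping reads and column 8 multi-mapping reads; please verify the columns in the manual. The toy inputs are made up only to show that the functions run.

```shell
# Sketch for questions (a) and (b); function names are hypothetical.

# (a) Number of reads in a FASTQ file read from stdin.
# Assumes standard 4-line records (no wrapped sequence lines).
count_fastq_reads() {
  echo $(( $(wc -l) / 4 ))
}

# (b) Number of junctions in SJ.out.tab (stdin): one junction per line.
count_junctions() {
  awk 'END { print NR }'
}

# (b) Junctions supported by more than one read.
# Assumption (check the STAR manual): column 7 = uniquely mapping reads;
# replace $7 with ($7 + $8) to also count multi-mapping reads.
count_multi_read_junctions() {
  awk '$7 > 1 { n++ } END { print n + 0 }'
}

# Made-up toy SJ.out.tab lines (9 tab-separated columns), only for the demo
toy_sj() {
  printf 'chr1\t100\t200\t1\t1\t0\t3\t0\t30\nchr1\t300\t400\t2\t2\t0\t1\t0\t25\n'
}

printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' | count_fastq_reads  # prints 2
toy_sj | count_junctions             # prints 2
toy_sj | count_multi_read_junctions  # prints 1
```

On the real data one would run, for example, count_fastq_reads < rnaseq.fastq and count_multi_read_junctions < rnaseq-star.SJ.out.tab. For (a), STAR's summary log rnaseq-star.Log.final.out should also list lines such as "Number of input reads" and "Uniquely mapped reads number"; note that successfully mapped reads may also include reads mapped to multiple loci.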

Finally, convert the splice junction file to BED format, with one intron per line. It should have the following columns:

Sequence name (as in the SJ.out.tab file)
Start (beware: it should be 1 less than the number in the SJ.out.tab file, because SJ.out.tab uses 1-based coordinates while BED uses 0-based start coordinates)
End (as in the SJ.out.tab file)
Name (create some identifier, e.g. numbering the junctions sequentially)
Score (use the number of supporting reads as score)
Strand (+, -, or . if the strand is undefined)

For the conversion, you can write a short script in your favorite language or use a one-liner. The result should be named rnaseq-star.bed.
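One possible one-liner, written as awk inside a hypothetical shell function sj_to_bed that reads SJ.out.tab on standard input and writes BED on standard output. It is a sketch assuming the column layout described in the STAR manual (columns 2 and 3 are the first and last intron base, both 1-based; column 4 is a strand code with 0 = undefined, 1 = +, 2 = -; column 7 counts uniquely mapping reads). The names junc1, junc2, ... are an arbitrary choice.

```shell
# Sketch: convert STAR SJ.out.tab (stdin) to 6-column BED (stdout).
# Assumed SJ.out.tab columns (verify in the STAR manual):
#   1 sequence name, 2 first intron base (1-based), 3 last intron base (1-based),
#   4 strand code (0 = undefined, 1 = +, 2 = -), 7 uniquely mapping reads.
sj_to_bed() {
  awk 'BEGIN { OFS = "\t" }
  {
    strand = "."                     # code 0: undefined strand
    if ($4 == 1) strand = "+"
    else if ($4 == 2) strand = "-"
    # BED start is 0-based, hence -1; BED end is exclusive,
    # so the 1-based last intron base is used unchanged.
    print $1, $2 - 1, $3, "junc" NR, $7, strand
  }'
}

# Made-up toy input, only to show the output format (tab-separated):
printf 'chr1\t100\t200\t1\t1\t0\t3\t0\t30\nchr1\t300\t400\t0\t0\t0\t1\t0\t25\n' | sj_to_bed
# chr1  99   200  junc1  3  +
# chr1  299  400  junc2  1  .
```

Applied to the homework data: sj_to_bed < rnaseq-star.SJ.out.tab > rnaseq-star.bed. Using only column 7 as the score ignores multi-mapping reads; $7 + $8 would count both.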

Write your answers to the protocol. Submit the files rnaseq-star.bam and rnaseq-star.bed.

Task C: Visualizing in IGV

As before, run IGV as follows:

igv -g ref.fasta &

Open the additional files using the menu File -> Load from File: annot.gff, augustus-anidulans.gtf, augustus-human.gtf, rnaseq.bam

Exons are shown as thicker boxes, introns are thinner.
For each of the following questions, select a part of the sequence illustrating the answer and export a figure using File->Save image.
You can view these images using the eog command.

Questions:

(a) Create an image illustrating differences between Augustus with human parameters and the reference annotation, save as a.png. Briefly describe the differences in words.

(b) Find some differences between Augustus with A. nidulans parameters and the reference annotation. Store an illustrative figure as b.png. Which parameters yielded a more accurate prediction?

(c) Zoom in to one of the genes with a high expression level and try to find spliced read alignments supporting the annotated intron boundaries. Store the image as c.png.

Submit the files a.png, b.png, and c.png. Write your answers to the protocol.

Task B: the STAR commands that precede the sorting step (first build the genome index, then align the reads):

STAR --runMode genomeGenerate --genomeDir ref-index --genomeFastaFiles ref.fasta --genomeSAindexNbases 6
STAR --genomeDir ref-index --alignIntronMax 10000 --readFilesIn rnaseq.fastq --outFileNamePrefix rnaseq-star.
