1-DAV-202 Data Management 2024/25

Grades from marked homeworks are on the server in file /grades/userid.txt


HWcloud

Revision as of 21:16, 29 April 2020 by Brona

See also the lecture

For both tasks, submit your source code and the result of running it on the whole dataset (s3://idzbucket2). The code is expected to use the MRJob framework presented in the lecture.


Task A

Count the number of occurrences of each 4-mer in the provided data.
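
One possible shape for the map and reduce logic of Task A is sketched below. It is only an illustration under stated assumptions: the input is assumed to hold one read per line (the actual layout of the dataset is not described here), and the names kmer_mapper, kmer_reducer and run_locally are hypothetical. MRJob is deliberately not imported so that the sketch runs on a laptop; in a submission the two generator functions would become the mapper and reducer methods of an MRJob subclass.

```python
from collections import defaultdict

K = 4  # k-mer length taken from the task statement


def kmer_mapper(_, line):
    """Emit (kmer, 1) for every window of length K in one read.

    Assumption: each input line is a single read.
    """
    read = line.strip()
    for i in range(len(read) - K + 1):
        yield read[i:i + K], 1


def kmer_reducer(kmer, counts):
    """Sum the partial counts collected for one k-mer."""
    yield kmer, sum(counts)


def run_locally(lines):
    """Hypothetical stand-in for the framework: group mapper output by key,
    then feed each group to the reducer."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in kmer_mapper(None, line):
            groups[key].append(value)
    result = {}
    for key, values in groups.items():
        for kmer, total in kmer_reducer(key, values):
            result[kmer] = total
    return result
```

Calling run_locally(["ACGTACGT"]) on a single toy read counts ACGT twice and CGTA, GTAC and TACG once each. Reads shorter than four bases produce no output.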

Task B

Count the number of pairs of reads that overlap by exactly 30 bases, that is, the last 30 bases of one read are identical to the first 30 bases of the other read. You can ignore reverse complements.
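
One way to approach Task B is to key every read by its first 30 and last 30 bases, so that reads sharing a key end up in the same reducer. The sketch below illustrates this idea and is not a definitive solution. It assumes one read per line, counts ordered pairs (suffix read, prefix read), and does not check whether a pair also overlaps by more than 30 bases. The names overlap_mapper, overlap_reducer and count_overlapping_pairs are hypothetical, and MRJob is not imported so the code runs locally. In an MRJob job, summing the per-key results would need a further step.

```python
from collections import defaultdict

OVERLAP = 30  # overlap length taken from the task statement


def overlap_mapper(_, line, overlap=OVERLAP):
    """Emit the read's prefix tagged "P" and its suffix tagged "S".

    Assumption: each input line is a single read; shorter reads are skipped.
    """
    read = line.strip()
    if len(read) >= overlap:
        yield read[:overlap], "P"
        yield read[-overlap:], "S"


def overlap_reducer(key, tags):
    """For one 30-mer, every read ending with it pairs with every read
    starting with it, so the pair count is suffixes * prefixes."""
    prefixes = suffixes = 0
    for tag in tags:
        if tag == "P":
            prefixes += 1
        else:
            suffixes += 1
    yield key, prefixes * suffixes


def count_overlapping_pairs(lines, overlap=OVERLAP):
    """Hypothetical local driver: group by key, reduce, and sum the products."""
    groups = defaultdict(list)
    for line in lines:
        for key, tag in overlap_mapper(None, line, overlap):
            groups[key].append(tag)
    return sum(
        pairs
        for key, tags in groups.items()
        for _, pairs in overlap_reducer(key, tags)
    )
```

With a toy overlap of 3, count_overlapping_pairs(["AAACCC", "CCCGGG", "GGGTTT"], overlap=3) finds two pairs. Note that a read whose own prefix equals its own suffix is counted as a pair with itself in this sketch. Whether such self-pairs should be excluded is a decision the sketch leaves open.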

Hints: