1-DAV-202 Data Management 2023/24
Previously 2-INF-185 Data Source Integration

Grades from marked homeworks are on the server in the file /grades/userid.txt


HWcloud

Revision as of 15:36, 15 April 2021

See also the lecture

For both tasks, submit your source code and the result of running it on the whole dataset (s3://idzbucket2). The code is expected to use the MRJob framework presented in the lecture. The submit directory is /submit/cloud/

Task A

Count the number of occurrences of each 4-mer in the provided data.
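
The map and reduce steps for this kind of counting can be sketched in plain Python, without the framework, to check the logic on a small sample first. This is only an illustration under assumptions: it supposes that each input line holds a single read made up of bases only, and the names map_kmers, reduce_counts and count_kmers are hypothetical helpers, not part of MRJob or of the course materials.

```python
from collections import Counter

K = 4  # k-mer length asked for in Task A


def map_kmers(line, k=K):
    """Map step: yield (kmer, 1) for every k-mer in one input line.

    Assumption: the line contains a single read and nothing else.
    """
    seq = line.strip()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], 1


def reduce_counts(pairs):
    """Reduce step: sum the values of all pairs that share a key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def count_kmers(lines, k=K):
    """Run the map and reduce steps locally on an iterable of lines."""
    return reduce_counts(pair for line in lines for pair in map_kmers(line, k))
```

In an MRJob job, the body of map_kmers would typically go into the mapper method and the summation into the reducer method of the job class. The input format of the real dataset should be checked before relying on the one-read-per-line assumption.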

Task B

Count the number of pairs of reads that overlap by exactly 30 bases (the end of one read overlaps the beginning of the second read). You can ignore reverse complements.
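
One possible way to organise this computation, shown here as a plain-Python sketch rather than as the intended solution, is to key every read by its first 30 bases and by its last 30 bases, and then multiply the number of prefixes and suffixes that share a key. The sketch makes several assumptions that the task statement does not settle: each input line holds one read, identical reads count as distinct reads, pairs are ordered (suffix read, prefix read), and a read whose own prefix equals its own suffix is counted as a pair with itself. The names map_ends and count_overlapping_pairs are hypothetical.

```python
from collections import defaultdict

OVERLAP = 30  # overlap length asked for in Task B


def map_ends(line, overlap=OVERLAP):
    """Map step: yield (prefix, "P") and (suffix, "S") for one read.

    Assumption: the line contains a single read; reads shorter than
    the overlap length are skipped.
    """
    read = line.strip()
    if len(read) < overlap:
        return
    yield read[:overlap], "P"
    yield read[-overlap:], "S"


def count_overlapping_pairs(lines, overlap=OVERLAP):
    """Group the ends by sequence and count suffix/prefix matches."""
    # For each key: [number of reads with this prefix,
    #                number of reads with this suffix]
    groups = defaultdict(lambda: [0, 0])
    for line in lines:
        for key, tag in map_ends(line, overlap):
            groups[key][0 if tag == "P" else 1] += 1
    # Every suffix with a given sequence pairs with every prefix
    # that has the same sequence.
    return sum(prefixes * suffixes for prefixes, suffixes in groups.values())
```

With the overlap shortened to 3 for readability, the reads AAACCC, CCCGGG and GGGTTT give two pairs: the first overlaps the second, and the second overlaps the third. If self-pairs or duplicate reads should be treated differently, the reduce step would need an extra correction.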

Hints: