1-DAV-202 Data Management 2023/24
Previously 2-INF-185 Data Source Integration

· Grades from marked homeworks are on the server in file /grades/userid.txt
· Dates of project submission and oral exams:
Early: submit project May 24 9:00am, oral exams May 27 1:00pm (limit 5 students).
Otherwise submit project June 11, 9:00am, oral exams June 18 and 21 (estimated 9:00am-1:00pm, schedule will be published before exam).
Sign up for one of the exam days in AIS before June 11.
Remedial exams will take place in the last week of the exam period. Beware, there will not be much time to prepare a better project. Projects should be submitted as homeworks to /submit/project.
· Cloud homework is due on May 20 9:00am.


HWweb

From MAD
Revision as of 18:20, 12 March 2020 by Brona

See the lecture

Submit by copying requested files to /submit/web/username/

General goal: Scrape comments from user discussions on the sme.sk website. Store the comments made during the last month by several hundred users in an SQLite3 database.

Task A

Create an SQLite3 "database" with an appropriate schema for storing comments from SME.sk discussions. You will probably need tables for users and comments. You don't need to store which comment replies to which one, but do store the date and time when each comment was made.

Submit two files:

  • db.sqlite3 - the database
  • schema.txt - a brief description of your schema and the rationale behind it
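
As a starting point, a schema along these lines could be created with Python's built-in sqlite3 module. This is only a minimal sketch, not the required solution: the table names, the column names (id_user, name, posted_at, body) and the choice to store timestamps as ISO 8601 text are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one row per user, one row per comment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id_user INTEGER PRIMARY KEY,  -- numeric id taken from the profile URL
    name    TEXT
);
CREATE TABLE IF NOT EXISTS comments (
    id        INTEGER PRIMARY KEY,
    id_user   INTEGER NOT NULL REFERENCES users(id_user),
    posted_at TEXT NOT NULL,      -- date and time as ISO 8601 text
    body      TEXT NOT NULL,
    UNIQUE (id_user, posted_at, body)  -- lets a re-crawl skip known comments
);
CREATE INDEX IF NOT EXISTS comments_by_user ON comments (id_user, posted_at);
"""


def create_database(path):
    """Create the tables in the SQLite file at `path` and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```

Calling create_database("db.sqlite3") would produce the file to submit. The UNIQUE constraint is one possible way to make repeated crawls idempotent; schema.txt should explain whichever choice you actually make.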


Task B

Build a crawler that collects comments from sme.sk discussions. You have two options:

  • For fewer points: A script that takes the URL of a user profile (e.g. http://ekonomika.sme.sk/diskusie/user_profile.php?id_user=157432) and crawls that user's comments from the last month.
  • For more points: A script that takes a single starting URL (either a user profile or a discussion, your choice), automatically discovers users and crawls their comments.
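
For the second option, user discovery can be organised as a breadth-first search over pages. The sketch below is a rough illustration under stated assumptions: fetch and links_of are hypothetical callbacks that you would implement yourself (for example with an HTTP library and an HTML parser), and the regular expression only assumes that profile links look like the example URL above, with id_user as the first query parameter.

```python
import re
from collections import deque

# Matches profile links of the form shown in the assignment, e.g.
# http://ekonomika.sme.sk/diskusie/user_profile.php?id_user=157432
PROFILE_RE = re.compile(r"user_profile\.php\?id_user=(\d+)")


def extract_user_ids(html):
    """Return the set of user ids linked from one page of HTML."""
    return {int(match) for match in PROFILE_RE.findall(html)}


def discover_users(start_url, fetch, links_of, max_users=100):
    """Breadth-first search over pages, collecting user ids.

    fetch(url) returns the HTML of a page as a string.
    links_of(url, html) returns further URLs worth visiting.
    Both are hypothetical helpers supplied by the caller.
    The search stops once at least max_users ids were found
    or no unvisited pages remain.
    """
    seen_urls = {start_url}
    queue = deque([start_url])
    users = set()
    while queue and len(users) < max_users:
        url = queue.popleft()
        html = fetch(url)
        users |= extract_user_ids(html)
        for next_url in links_of(url, html):
            if next_url not in seen_urls:  # never visit a page twice
                seen_urls.add(next_url)
                queue.append(next_url)
    return users
```

Passing fetch and links_of as parameters keeps the traversal logic testable without network access. A real crawler should also pause between requests so that it does not overload the site.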

This crawler should store the comments in the SQLite3 database built in the previous task.
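
Storing the crawled comments might look roughly like the sketch below. It assumes a hypothetical schema (tables users and comments with the columns shown), that the parsing step, which is not shown, yields (datetime, text) pairs, and that "the last month" is approximated as the last 30 days. Adjust all three to your own design.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema, repeated here so that the example is self-contained.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (id_user INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE IF NOT EXISTS comments (
    id        INTEGER PRIMARY KEY,
    id_user   INTEGER NOT NULL REFERENCES users(id_user),
    posted_at TEXT NOT NULL,
    body      TEXT NOT NULL,
    UNIQUE (id_user, posted_at, body)
);
"""


def store_recent_comments(conn, id_user, name, comments, now=None, days=30):
    """Insert comments newer than `days` days; return the number of new rows.

    `comments` is an iterable of (posted_at, body) pairs, where posted_at
    is a datetime and body is the comment text.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    added = 0
    with conn:  # one transaction per user: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO users (id_user, name) VALUES (?, ?)",
            (id_user, name),
        )
        for posted_at, body in comments:
            if posted_at < cutoff:
                continue  # older than the requested window
            cur = conn.execute(
                "INSERT OR IGNORE INTO comments (id_user, posted_at, body) "
                "VALUES (?, ?, ?)",
                (id_user, posted_at.isoformat(timespec="seconds"), body),
            )
            added += cur.rowcount  # 0 when the comment was already stored
    return added
```

The parameterised queries (the ? placeholders) avoid quoting problems with comment text, and INSERT OR IGNORE together with the UNIQUE constraint lets the crawler be restarted without creating duplicates.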

Submit the following:

  • db.sqlite3 - the database
  • every Python script used for crawling
  • README (how to start your crawler)