2-INF-185 Integrácia dátových zdrojov 2017/18

Points for HW01 and HW04 can be found on the server in /grades/userid.txt



Lecture 6

  • Submit by copying the requested files to /submit/hw06/username/

General goal: Scrape last month's comments from several hundred sme.sk users and store them in an SQLite3 database.

Task A

Create an SQLite3 "database" with an appropriate schema for storing comments from SME.sk discussions. You will probably need tables for users and comments. You don't need to store which comment replies to which.

Submit two files:

  • db.sqlite3 - the database
  • schema.txt - a brief description of your schema and the rationale behind it
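
A minimal sketch of what such a schema could look like, using Python's built-in sqlite3 module. All table and column names here (users, comments, id_comment, posted_at, and so on) are illustrative assumptions, not a required layout; the only name taken from the assignment is id_user, which appears in the profile URL.

```python
import sqlite3

# Hypothetical schema: every table and column name below is an assumption
# made for illustration, not something prescribed by the assignment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id_user  INTEGER PRIMARY KEY,    -- id_user value from the profile URL
    nickname TEXT
);
CREATE TABLE IF NOT EXISTS comments (
    id_comment     INTEGER PRIMARY KEY,  -- assumed unique id of the comment
    id_user        INTEGER NOT NULL REFERENCES users(id_user),
    posted_at      TEXT NOT NULL,        -- ISO 8601 timestamp
    discussion_url TEXT,
    body           TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_comments_user ON comments(id_user);
"""


def create_database(path="db.sqlite3"):
    """Create the tables if they are missing and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clause
    conn.executescript(SCHEMA)
    return conn
```

Calling create_database() once produces db.sqlite3 in the current directory. Using the comment's own id as the primary key (if the site exposes one) would make repeated crawls safe against duplicates.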

Task B

Build a crawler that collects comments from sme.sk discussions. You have two options:

  • For fewer points: A script that takes the URL of a user profile (e.g. http://ekonomika.sme.sk/diskusie/user_profile.php?id_user=157432) and crawls that user's comments from the last month.
  • For more points: A script that takes one starting URL (either a user profile or a discussion, your choice), automatically discovers users, and crawls their comments.
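
The user-discovery part of the second option could be sketched roughly as follows, using only the standard library. It assumes that pages link to other users through URLs of the same form as the profile URL above (user_profile.php?id_user=...); that assumption, the helper names, and the caller-supplied fetch function are all hypothetical and would need to be checked against the real pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

# Profile URL pattern taken from the example in the assignment text.
PROFILE_URL = "http://ekonomika.sme.sk/diskusie/user_profile.php?id_user={}"


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def user_id_from_url(url):
    """Return the numeric id_user of a profile URL, or None."""
    parsed = urlparse(url)
    if not parsed.path.endswith("user_profile.php"):
        return None
    values = parse_qs(parsed.query).get("id_user")
    if values and values[0].isdigit():
        return int(values[0])
    return None


def discover_user_ids(html, base_url):
    """Return the set of user ids linked from one HTML page."""
    parser = LinkCollector()
    parser.feed(html)
    ids = set()
    for href in parser.hrefs:
        uid = user_id_from_url(urljoin(base_url, href))
        if uid is not None:
            ids.add(uid)
    return ids


def crawl_users(start_url, fetch, max_users=100):
    """Breadth-first user discovery; fetch(url) must return the page HTML."""
    seen = set()
    start_id = user_id_from_url(start_url)
    if start_id is not None:
        seen.add(start_id)
    visited, queue = set(), deque([start_url])
    while queue and len(seen) < max_users:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        for uid in discover_user_ids(fetch(url), url):
            if uid not in seen and len(seen) < max_users:
                seen.add(uid)
                queue.append(PROFILE_URL.format(uid))
    return seen
```

Passing fetch in as a parameter keeps the discovery logic testable offline; in a real run it would wrap an HTTP client and should probably add a delay between requests.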

The crawler should store comments in the SQLite3 database built in the previous task. Submit the following:

  • db.sqlite3 - the database
  • every Python script used for crawling
  • README (how to start your crawler)
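
Storing the crawled comments could look something like the sketch below. It assumes a hypothetical two-table layout (users and comments, with an assumed unique id_comment), interprets "last month" as the past 30 days, and expects the parsing step to hand over (id_comment, posted_at, body) tuples; none of these details come from the assignment itself.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical minimal schema; table and column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (id_user INTEGER PRIMARY KEY);
CREATE TABLE IF NOT EXISTS comments (
    id_comment INTEGER PRIMARY KEY,
    id_user    INTEGER NOT NULL REFERENCES users(id_user),
    posted_at  TEXT NOT NULL,
    body       TEXT NOT NULL
);
"""


def store_comments(conn, user_id, comments, now=None, days=30):
    """Insert one user's comments that are at most `days` days old.

    `comments` is an iterable of (id_comment, posted_at, body) tuples,
    where posted_at is a datetime. Rows already present are skipped,
    so re-running the crawler does not create duplicates.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)  # assumption: "last month" = 30 days
    rows = [
        (cid, user_id, posted_at.isoformat(), body)
        for cid, posted_at, body in comments
        if posted_at >= cutoff
    ]
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT OR IGNORE INTO users (id_user) VALUES (?)", (user_id,))
        conn.executemany(
            "INSERT OR IGNORE INTO comments (id_comment, id_user, posted_at, body) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
```

A typical call would be store_comments(sqlite3.connect("db.sqlite3"), 157432, parsed_comments) after the tables exist. INSERT OR IGNORE relies on the primary keys to make repeated runs idempotent.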