Difference between revisions of "HWweb"

Latest revision as of 10:14, 29 March 2024

Submit by copying requested files to /submit/web/username/

General goal: Scrape comments from user discussions at the pravda.sk website. Store comments from several (hundreds) users from the last month in an SQLite3 database.

Task A

Create SQLite3 "database" with appropriate schema for storing comments from pravda.sk discussions. For each comment you should be able to tell its content, date, time and who wrote it. You don't need to store which comment replies to which one. For each user you should be able to retrieve her/his comments and also her/his name. You will probably need tables for users and comments.

Submit two files:

db.sqlite3 - the database
schema.txt - a brief description of your schema and rationale behind it

Task B

Build a crawler, which crawls comments in pravda.sk discussions. You have two options:

For fewer points: Script which gets URL of a user (e.g. https://debata.pravda.sk/profil/debata/anysta/) and crawls the comments of this user from the last month.
For more points: Scripts which gets one starting URL (either user profile or some discussion, your choice) and automatically discovers users and crawls their comments.

This crawler should store the comments in SQLite3 database built in the previous task.

Submit the following:

db.sqlite3 - the database
every python script used for crawling
README (how to start your crawler)

@@ Line 7: / Line 7: @@
 <!-- /NOTEX -->
-'''General goal:''' Scrape comments from user discussions at the <tt>sme.sk</tt> website. Store comments  from several (hundreds) users from the last month in an SQLite3 database.
+'''General goal:''' Scrape comments from user discussions at the <tt>pravda.sk</tt> website. Store comments  from several (hundreds) users from the last month in an SQLite3 database.
 ===Task A===
-Create SQLite3 "database" with appropriate schema for storing comments from SME.sk discussions.
+Create SQLite3 "database" with appropriate schema for storing comments from pravda.sk discussions.
-You will probably need tables for users and comments. You don't need to store which comment replies to which one but store the date and time when the comment was made.
+For each comment you should be able to tell its content, date, time and who wrote it. You don't need to store which comment replies to which one.
+For each user you should be able to retrieve her/his comments and also her/his name.
+You will probably need tables for users and comments.
 <!-- NOTEX -->
@@ Line 23: / Line 25: @@
 ===Task B===
-Build a crawler, which crawls comments in sme.sk discussions.
+Build a crawler, which crawls comments in pravda.sk discussions.
 You have two options:
-* For fewer points: Script which gets URL of a user (e.g. http://ekonomika.sme.sk/diskusie/user_profile.php?id_user=157432) and crawls his comments from the last month.
+* For fewer points: Script which gets URL of a user (e.g. https://debata.pravda.sk/profil/debata/anysta/) and crawls the comments of this user from the last month.
 * For more points: Scripts which gets one starting URL (either user profile or some discussion, your choice) and automatically discovers users and crawls their comments.

Difference between revisions of "HWweb"

Latest revision as of 10:14, 29 March 2024

Task A

Task B

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools