2-INF-185 Integrácia dátových zdrojov 2017/18

Points for HW01 and HW04 can be found on the server in /grades/userid.txt


Lecture 7

  • Submit by copying requested files to /submit/hw07/username/

General goal: Build a simple website that lists all crawled users and, for each user, has a page with simple statistics about that user.

This lesson requires the crawled data from the previous lesson. If you don't have it, you can find a copy (and thank Baska) at: /tasks/hw07/db.sqlite3

Submit source code (web server and preprocessing scripts) and database files.

Task A

Create a simple Flask web application which:

  • Has a homepage with a list of all users (with links to their pages).
  • Has a page for each user with simple information about the user: their nickname, number of posts, and their last 10 posts.
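
The two pages above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the table name posts and its columns author, body and created are hypothetical (the schema of your crawled database will probably differ), and the route names and inline templates are illustrative choices, not a required design.

```python
import sqlite3

from flask import Flask, abort, g, render_template_string

app = Flask(__name__)
# ASSUMPTION: hypothetical default path; point this at your own database file.
app.config.setdefault("DB_PATH", "db.sqlite3")

INDEX_TEMPLATE = """
<h1>Users</h1>
<ul>
{% for user in users %}
  <li><a href="{{ url_for('user_page', nickname=user) }}">{{ user }}</a></li>
{% endfor %}
</ul>
"""

USER_TEMPLATE = """
<h1>{{ nickname }}</h1>
<p>Number of posts: {{ count }}</p>
<ol>
{% for post in posts %}<li>{{ post }}</li>{% endfor %}
</ol>
"""


def get_db():
    """Open one SQLite connection per request and cache it on flask.g."""
    if "db" not in g:
        g.db = sqlite3.connect(app.config["DB_PATH"])
        g.db.row_factory = sqlite3.Row
    return g.db


@app.teardown_appcontext
def close_db(exc):
    """Close the per-request connection, if one was opened."""
    db = g.pop("db", None)
    if db is not None:
        db.close()


@app.route("/")
def index():
    # ASSUMPTION: table "posts" with an "author" column (hypothetical schema).
    rows = get_db().execute(
        "SELECT DISTINCT author FROM posts ORDER BY author"
    ).fetchall()
    return render_template_string(INDEX_TEMPLATE, users=[r["author"] for r in rows])


@app.route("/user/<nickname>")
def user_page(nickname):
    db = get_db()
    count = db.execute(
        "SELECT COUNT(*) FROM posts WHERE author = ?", (nickname,)
    ).fetchone()[0]
    if count == 0:
        abort(404)  # unknown user
    # ASSUMPTION: "created" is a sortable timestamp column (hypothetical).
    rows = db.execute(
        "SELECT body FROM posts WHERE author = ? ORDER BY created DESC LIMIT 10",
        (nickname,),
    ).fetchall()
    return render_template_string(
        USER_TEMPLATE, nickname=nickname, count=count, posts=[r["body"] for r in rows]
    )
```

Assuming the file is saved as app.py (a hypothetical name), recent Flask versions can start it with: flask --app app run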

Task B

For each user, preprocess and store a list of their top 10 words and a list of the top 10 words typical for them (words they use much more often than other users do; come up with some simple heuristic). Show this information on the user's page.
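
One possible heuristic for "typical" words, shown only as a sketch: score each word by the ratio of its relative frequency in the user's posts to its (smoothed) relative frequency in everybody else's posts. The function names, the min_count threshold, the add-one smoothing, and the user_words table are all assumptions made for illustration; the input is assumed to be a dict mapping each nickname to a list of post texts.

```python
import re
import sqlite3
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_stats(posts_by_user, n=10, min_count=2):
    """Return {user: {"top": [...], "typical": [...]}} with up to n words each."""
    user_counts = {
        user: Counter(w for post in posts for w in tokenize(post))
        for user, posts in posts_by_user.items()
    }
    total = Counter()
    for counts in user_counts.values():
        total.update(counts)
    total_words = sum(total.values())

    stats = {}
    for user, counts in user_counts.items():
        user_words = sum(counts.values())
        rest_words = total_words - user_words

        def score(word):
            # Relative frequency for this user divided by the add-one
            # smoothed relative frequency among all other users.
            user_freq = counts[word] / user_words
            rest_freq = (total[word] - counts[word] + 1) / (rest_words + 1)
            return user_freq / rest_freq

        # Ignore very rare words, which would otherwise get inflated scores.
        candidates = [w for w, c in counts.items() if c >= min_count]
        stats[user] = {
            "top": [w for w, _ in counts.most_common(n)],
            "typical": sorted(candidates, key=score, reverse=True)[:n],
        }
    return stats


def save_stats(con, stats):
    """Store the lists in a (hypothetical) user_words table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS user_words "
        "(author TEXT, kind TEXT, pos INTEGER, word TEXT)"
    )
    con.execute("DELETE FROM user_words")
    rows = [
        (user, kind, pos, word)
        for user, lists in stats.items()
        for kind, words in lists.items()
        for pos, word in enumerate(words, start=1)
    ]
    con.executemany("INSERT INTO user_words VALUES (?, ?, ?, ?)", rows)
    con.commit()
```

Without a stop-word list, the plain "top" list will likely be dominated by very common words, which is one reason the ratio-based "typical" list tends to be more informative.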

Task C

Preprocess and store a list of the top three most similar users for each user (try to come up with some simple definition of similarity based on the text of their posts). Again, show this information on the user's page.
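
As one example of a simple similarity definition, the sketch below represents each user as a bag-of-words count vector and compares users by cosine similarity. This is just one option among many, and the function names and the input format (a dict mapping each nickname to a list of post texts) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_counts(posts):
    """Count lowercase word tokens over all posts of one user."""
    return Counter(w for post in posts for w in re.findall(r"\w+", post.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_users(posts_by_user, n=3):
    """Return {user: [up to n most similar other users, best first]}."""
    vectors = {user: word_counts(posts) for user, posts in posts_by_user.items()}
    result = {}
    for user, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != user
        ]
        # Highest similarity first; ties are broken alphabetically by nickname.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[user] = [other for _, other in scored[:n]]
    return result
```

This compares every pair of users, so the running time grows quadratically with the number of users; for a preprocessing script run once over a small crawl that is probably acceptable.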

Bonus: Try some simple topic modeling (e.g. a PCA-like decomposition such as TruncatedSVD from scikit-learn) and use it to find similar users.
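
A rough sketch of the bonus idea, assuming scikit-learn and NumPy are installed: build one TF-IDF document per user, reduce it to a few latent dimensions with TruncatedSVD, and rank users by cosine similarity in that reduced space. The function name, the choice of TF-IDF weighting, and the default n_components are assumptions for illustration; on real data the number of components would need tuning and must not exceed the vocabulary size.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similar_users_svd(posts_by_user, n_components=2, n=3, random_state=0):
    """Return {user: [up to n most similar other users]} in the SVD space."""
    users = sorted(posts_by_user)
    # One "document" per user: all of the user's posts joined together.
    docs = [" ".join(posts_by_user[user]) for user in users]

    tfidf = TfidfVectorizer().fit_transform(docs)
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    topics = svd.fit_transform(tfidf)  # shape: (n_users, n_components)

    sim = cosine_similarity(topics)
    np.fill_diagonal(sim, -np.inf)  # never report a user as similar to themselves

    result = {}
    for i, user in enumerate(users):
        order = np.argsort(-sim[i])[:n]
        result[user] = [users[j] for j in order if j != i]
    return result
```

Unlike PCA, TruncatedSVD does not center the data, which lets it work directly on the sparse TF-IDF matrix.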