2-INF-185 Integrácia dátových zdrojov 2016/17

Submit HW10 and HW11 by Tuesday, 30 May, 9:00.
Project submission deadlines:
1st deadline: Sunday, 11 June, 22:00
2nd deadline: Wednesday, 21 June, 22:00
Both deadlines are regular; the first is intended for students who are finishing their studies or who want to complete the course earlier. In both cases, short in-person meetings with the instructors (a discussion of the project and finalization of grades) will take place a few days after submission. Exact days and times will be arranged later. Submit projects the same way as homework, to /submit/projekt


HW08

  • Submit by copying the requested files to /submit/hw08/username/

General goal: Build a simple website that lists all crawled users and has a page for each user with simple statistics about that user.

This lesson requires the crawled data from the previous lesson. If you don't have it, you can find a copy (and thank Baska) at: /tasks/hw08/db.sqlite3

Submit source code (web server and preprocessing scripts) and database files.

Task A

Create a simple Flask web application which:

  • Has a homepage with a list of all users (with links to their pages).
  • Has a page for each user with simple information about the user: nickname, number of posts, and the last 10 posts.
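
The two pages above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a reference solution: the table name `posts` and its columns `id`, `author`, and `body` are hypothetical, so adjust the queries to the schema of your crawled database. The `DB_PATH` config key is likewise just a name chosen for this example.

```python
import sqlite3

from flask import Flask, abort, g, render_template_string

app = Flask(__name__)
# Assumption: location of the crawled database; change to your own copy.
app.config["DB_PATH"] = "db.sqlite3"

INDEX_TEMPLATE = """
<h1>Users</h1>
<ul>
{% for name in users %}
  <li><a href="{{ url_for('user_page', nickname=name) }}">{{ name }}</a></li>
{% endfor %}
</ul>
"""

USER_TEMPLATE = """
<h1>{{ nickname }}</h1>
<p>Number of posts: {{ count }}</p>
<ol>
{% for text in posts %}
  <li>{{ text }}</li>
{% endfor %}
</ol>
"""


def get_db():
    # Open one SQLite connection per request and cache it on flask.g.
    if "db" not in g:
        g.db = sqlite3.connect(app.config["DB_PATH"])
    return g.db


@app.teardown_appcontext
def close_db(exc):
    # Close the connection (if any) when the request context ends.
    db = g.pop("db", None)
    if db is not None:
        db.close()


@app.route("/")
def index():
    # Hypothetical schema: posts(id, author, body).
    rows = get_db().execute(
        "SELECT DISTINCT author FROM posts ORDER BY author"
    ).fetchall()
    return render_template_string(INDEX_TEMPLATE, users=[r[0] for r in rows])


@app.route("/user/<nickname>")
def user_page(nickname):
    db = get_db()
    count = db.execute(
        "SELECT COUNT(*) FROM posts WHERE author = ?", (nickname,)
    ).fetchone()[0]
    if count == 0:
        abort(404)  # unknown user
    # Assumes a larger id means a newer post.
    rows = db.execute(
        "SELECT body FROM posts WHERE author = ? ORDER BY id DESC LIMIT 10",
        (nickname,),
    ).fetchall()
    return render_template_string(
        USER_TEMPLATE, nickname=nickname, count=count, posts=[r[0] for r in rows]
    )
```

Saved as, say, app.py, this can be started with Flask's development server (`flask --app app run`). Inline template strings keep the sketch in one file; separate template files would work equally well.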

Task B

For each user, preprocess and store a list of their top 10 words and a list of the top 10 words typical for them (words they use much more often than other users do; come up with a simple heuristic). Show this information on the user's page.
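
One possible heuristic for "typical" words, shown here only as a sketch: compare a word's relative frequency in the user's posts with its relative frequency in everyone else's posts, with add-one smoothing so that words unseen elsewhere do not divide by zero. The function names, the `min_count` threshold, and the input format (a dict mapping each user to a list of post texts) are assumptions made for this example.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")


def tokenize(text):
    # Lowercase and split on non-word characters.
    return WORD_RE.findall(text.lower())


def word_counts(posts_by_user):
    """Map each user to a Counter of words over all of that user's posts."""
    return {
        user: Counter(w for post in posts for w in tokenize(post))
        for user, posts in posts_by_user.items()
    }


def top_words(counts, n=10):
    """The n most frequent words in one user's Counter."""
    return [w for w, _ in counts.most_common(n)]


def typical_words(user, counts_by_user, n=10, min_count=2):
    """Words the user uses disproportionately often compared to other users."""
    mine = counts_by_user[user]
    others = Counter()
    for other, counts in counts_by_user.items():
        if other != user:
            others.update(counts)
    my_total = sum(mine.values())
    others_total = sum(others.values())
    vocab = len(set(mine) | set(others))

    def score(word):
        # Ratio of add-one smoothed relative frequencies.
        p_mine = (mine[word] + 1) / (my_total + vocab)
        p_others = (others[word] + 1) / (others_total + vocab)
        return p_mine / p_others

    # Ignore very rare words, which would otherwise dominate the ratio.
    candidates = [w for w, c in mine.items() if c >= min_count]
    return sorted(candidates, key=lambda w: (-score(w), w))[:n]
```

Both lists can be computed once in a preprocessing script and written to an extra table, so the web server only needs to read them.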

Task C

Preprocess and store a list of the three most similar users for each user (try to come up with a simple definition of similarity based on the text of the posts). Again, show this information on the user's page.
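
As an illustration of one simple similarity definition, the sketch below uses cosine similarity between per-user word-count vectors. This is just one option among many, and the function names and input format (a dict mapping each user to a `Counter` of words) are assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two word-count Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(counts_by_user, n=3):
    """Map each user to the n other users with the highest cosine similarity."""
    result = {}
    for user, counts in counts_by_user.items():
        scored = [
            (cosine(counts, other_counts), other)
            for other, other_counts in counts_by_user.items()
            if other != user
        ]
        # Highest similarity first; break ties alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[user] = [other for _, other in scored[:n]]
    return result
```

Comparing every pair of users is quadratic in the number of users, which should be acceptable for a one-off preprocessing step on a small crawl. Raw counts favor very common words; weighting words (for example by inverse document frequency) may give more meaningful neighbors.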

Bonus: Try to use some simple topic modeling (e.g. PCA as in TruncatedSVD from scikit-learn) and use it for finding similar users.
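
A rough sketch of the bonus idea, assuming each user's posts have been concatenated into one text: build a TF-IDF matrix, reduce it with `TruncatedSVD`, and compare users by cosine similarity in the reduced space. Note that `TruncatedSVD` does not center the data, so this is closer to latent semantic analysis than to PCA in the strict sense. The function name, the input format, and the default number of components are choices made for this example.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similar_users_by_topics(docs_by_user, n_components=2, n=3):
    """Map each user to the n nearest users in the SVD-reduced TF-IDF space.

    docs_by_user: dict mapping a user to one string with all of their posts.
    """
    users = sorted(docs_by_user)
    tfidf = TfidfVectorizer().fit_transform([docs_by_user[u] for u in users])
    # n_components should be smaller than the vocabulary size.
    svd = TruncatedSVD(n_components=n_components, random_state=0)
    reduced = svd.fit_transform(tfidf)
    sims = cosine_similarity(reduced)
    np.fill_diagonal(sims, -np.inf)  # never pick the user themselves
    result = {}
    for i, user in enumerate(users):
        order = np.argsort(-sims[i])[:n]
        result[user] = [users[j] for j in order if j != i]
    return result
```

On real data, a larger `n_components` (tens to a few hundred) is more usual; the value 2 here only keeps the toy example small.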