1-DAV-202 Data Management 2023/24
Previously 2-INF-185 Data Source Integration

· Grades from marked homeworks are on the server in file /grades/userid.txt
· Dates of project submission and oral exams:
Early: submit project May 24 9:00am, oral exams May 27 1:00pm (limit 5 students).
Otherwise submit project June 11, 9:00am, oral exams June 18 and 21 (estimated 9:00am-1:00pm, schedule will be published before exam).
Sign up for one of the exam days in AIS before June 11.
Remedial exams will take place in the last week of the exam period. Beware, there will not be much time to prepare a better project. Projects should be submitted as homeworks to /submit/project.
· Cloud homework is due on May 20 9:00am.


HWflask

Revision as of 11:23, 25 March 2020

See the lecture

General goal: Build a simple website that lists all crawled users and, for each user, has a page with simple statistics about that user's posts.


Submit your source code (web server and preprocessing scripts) and database files. Copy these files to /submit/flask/username/

This lesson requires the crawled data from the previous lesson. If you don't have it, you can find it at /tasks/flask/db.sqlite3
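The table and column names inside db.sqlite3 depend on how your crawler from the previous lesson stored its data, so it may help to inspect the schema before writing any queries. A minimal sketch using only Python's standard sqlite3 module (the helper name describe_database is our own):

```python
import sqlite3


def describe_database(conn):
    """Return {table name: [column names]} for every table in an SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; field 1 is the column name
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in columns]
    return schema
```

For example, print(describe_database(sqlite3.connect("/tasks/flask/db.sqlite3"))) lists the tables and their columns. The .schema command of the sqlite3 command-line shell shows the same information as CREATE TABLE statements.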

Task A

Create a simple Flask web application which:

  • Has a homepage with a list of all users (with links to their pages).
  • Has a page for each user with basic information: the nickname, the number of posts and the last 10 posts of this user.
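The queries behind the two pages can be sketched with sqlite3 as below. The schema here is hypothetical: a single table posts(author, text, date) with ISO-formatted dates. Rename the table and columns to match your own database.

```python
import sqlite3

# HYPOTHETICAL schema: a table posts(author TEXT, text TEXT, date TEXT) with
# ISO-formatted dates. Adapt the table and column names to your database.


def all_users(conn):
    """Nicknames of all users, for the homepage list."""
    return [row[0] for row in conn.execute(
        "SELECT DISTINCT author FROM posts ORDER BY author")]


def user_page_data(conn, nickname, limit=10):
    """Return (number of posts, texts of the last `limit` posts) for one user."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE author = ?", (nickname,)).fetchone()
    # newest first; ISO date strings sort chronologically as plain text
    last_posts = [row[0] for row in conn.execute(
        "SELECT text FROM posts WHERE author = ? ORDER BY date DESC LIMIT ?",
        (nickname, limit))]
    return count, last_posts
```

In the Flask application, the homepage view would call all_users, and a view registered for a URL rule such as /user/<nickname> would call user_page_data and pass the results to render_template. The ? placeholders keep nicknames containing quotes from breaking the SQL.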

Task B

For each user, preprocess and store:

  • the list of the 10 most frequently used words
  • the list of the top 10 words typical for this user (words which this user uses much more often than other users; come up with some simple heuristic for measuring this)
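One possible heuristic for typical words (our own choice, not prescribed by the assignment) is the ratio between a word's relative frequency in the user's posts and its relative frequency over all posts, ignoring words the user wrote only a few times. A minimal sketch, assuming word counts are kept in collections.Counter objects:

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words made of letters only."""
    # [^\W\d_] matches any Unicode letter, so accented words stay intact
    return re.findall(r"[^\W\d_]+", text.lower())


def typical_words(user_counts, global_counts, k=10, min_count=3):
    """Words whose relative frequency for the user most exceeds the overall one.

    user_counts: Counter of one user's words.
    global_counts: Counter over all users, including this one, so the
    denominator below is never zero.
    """
    n_user = sum(user_counts.values())
    n_all = sum(global_counts.values())
    scores = {}
    for word, count in user_counts.items():
        if count < min_count:
            continue  # a word used once or twice gives an unreliable ratio
        scores[word] = (count / n_user) / (global_counts[word] / n_all)
    # highest ratio first, ties broken alphabetically so the output is stable
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

For example, if alice writes "apple", "banana" and "the" three times each and bob writes "the" six times and "banana" once, then "apple" scores highest for alice and "the" lowest. The min_count threshold is a tunable assumption.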

Show this information on the page of each user.

Hint: To get the most frequently used words for each user, you can use argsort from NumPy.
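A sketch of the argsort approach, assuming the words have already been tokenized per user and collected into a users × vocabulary count matrix (the helper names count_matrix and top_words are hypothetical):

```python
import numpy as np


def count_matrix(user_tokens):
    """Build a users x vocabulary matrix of word counts from {user: [tokens]}."""
    users = sorted(user_tokens)
    vocabulary = sorted({tok for toks in user_tokens.values() for tok in toks})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = np.zeros((len(users), len(vocabulary)), dtype=np.int64)
    for row, user in enumerate(users):
        for tok in user_tokens[user]:
            matrix[row, index[tok]] += 1
    return users, vocabulary, matrix


def top_words(counts, vocabulary, k=10):
    """The k most frequent words in one row of the count matrix."""
    counts = np.asarray(counts)
    # argsort is ascending, so sort the negated counts; "stable" keeps
    # tied words in vocabulary (alphabetical) order
    order = np.argsort(-counts, kind="stable")[:k]
    # words with zero count are dropped, so fewer than k may be returned
    return [vocabulary[i] for i in order if counts[i] > 0]
```

Computing these lists once in a preprocessing script and storing them (for example in an extra table) means the web server does not recompute them on every request.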

Task C

Preprocess and store the list of the three most similar users for each user (try to come up with some simple definition of similarity based on the text in the posts). Again, show this information on the user page.
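One simple definition of similarity (an assumption, not the only reasonable choice) is the cosine similarity between users' word-count vectors, i.e. the rows of a users × vocabulary count matrix. A minimal NumPy sketch:

```python
import numpy as np


def most_similar_users(matrix, users, k=3):
    """For each user, the k users with the most similar word-count vectors.

    Similarity is the cosine of the angle between rows of the matrix.
    """
    X = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # users without words keep an all-zero row
    X = X / norms
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)  # a user is never listed as similar to itself
    k = min(k, len(users) - 1)
    result = {}
    for i, user in enumerate(users):
        order = np.argsort(-sim[i], kind="stable")[:k]
        result[user] = [users[j] for j in order]
    return result
```

Raw counts let very common words dominate the similarity; weighting the counts (for example with TF-IDF) or dropping stop words before building the matrix are natural refinements.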

Bonus: Try to use some simple topic modeling (e.g. latent semantic analysis with TruncatedSVD from scikit-learn, a PCA-like decomposition that does not center the data) and use it for finding similar users.
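A sketch of the bonus, assuming scikit-learn is installed and a users × vocabulary count matrix is available: TruncatedSVD reduces each user to a few "topic" weights, and users are then compared by cosine similarity in that smaller space. The function name and the default n_topics=2 are illustrative only.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity


def similar_users_by_topics(matrix, users, n_topics=2, k=3, seed=0):
    """Compare users in a low-dimensional topic space instead of raw counts."""
    # n_topics must not exceed the vocabulary size
    svd = TruncatedSVD(n_components=n_topics, random_state=seed)
    # each row becomes a vector of n_topics topic weights for one user
    topics = svd.fit_transform(np.asarray(matrix, dtype=float))
    sim = cosine_similarity(topics)
    np.fill_diagonal(sim, -np.inf)  # exclude the user itself
    k = min(k, len(users) - 1)
    return {user: [users[j] for j in np.argsort(-sim[i], kind="stable")[:k]]
            for i, user in enumerate(users)}
```

On real forum data, n_topics would normally be much larger than 2, and the result depends on that choice; it is worth comparing the neighbours found this way with those from plain word counts.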