1-DAV-202 Data Management 2023/24
Previously 2-INF-185 Data Source Integration

· Grades from marked homeworks are on the server in file /grades/userid.txt
· Dates of project submission and oral exams:
Early: submit project May 24 9:00am, oral exams May 27 1:00pm (limit 5 students).
Otherwise submit project June 11, 9:00am, oral exams June 18 and 21 (estimated 9:00am-1:00pm, schedule will be published before exam).
Sign up for one of the exam days in AIS before June 11.
Remedial exams will take place in the last week of the exam period. Beware, there will not be much time to prepare a better project. Projects should be submitted as homeworks to /submit/project.
· Cloud homework is due on May 20 9:00am.


Lflask

Revision as of 11:08, 25 March 2020 by Brona

HWflask

In this lecture, we will use Python to process user comments obtained in the previous lecture.

  • We will display information about individual users as a dynamic website written in the Flask framework.
  • We will use simple text processing utilities from the Scikit-learn library to extract word use statistics from the comments.

Flask

Flask is a lightweight web framework for Python with a built-in development server. Using Flask, you can write a simple dynamic website in Python.


Running Flask

You can find a sample Flask application at /tasks/flask/simple_flask. Run it using these commands:

cd <your directory>
export FLASK_APP=main.py
export FLASK_ENV=development # this is optional, but recommended for debugging
# (FLASK_ENV was removed in Flask 2.3; on newer versions use: export FLASK_DEBUG=1)

# before running the following, change the port number
# so that no two users use the same number
flask run --port=4247

Flask starts a web server and serves the pages created in your Flask application. Keep it running for as long as you need to access these pages.

To view these pages, open a web browser on the same computer where Flask is running, e.g. chromium-browser http://localhost:4247/ (change the port number to the one used in Flask). If you want to run graphical applications, such as Chromium, from our server, do not forget the -XC switch in ssh.

You can also run Flask on the server and the browser on your own computer. Just replace localhost in the URL with the server address. However, firewalls may prevent you from accessing non-standard ports. You can get around this by creating a proxy:

  • On your local machine, create a SOCKS proxy server on port 8000 using the command
    ssh username@server_address -D 8000
  • Keep this SSH session open while working.
  • Set the SOCKS proxy in your browser to localhost with port 8000. Then all web traffic goes through the server via the SSH tunnel.


Structure of a Flask application

  • The provided Flask application resides in the main.py script.
  • Some functions in this script are annotated with decorators starting with @app.
  • The decorator @app.before_request marks a function which will be executed before each request from a web browser is processed. In this case, we open a database connection and store it in the special object g, which holds data for the duration of a single request.
  • The decorator @app.route('/') marks a function which will serve the main page of the application with URL http://localhost:4247/. Similarly, the decorator @app.route('/wat/<random_id>/') marks a function which will serve URLs of the form http://localhost:4247/wat/100, where the particular string which the user uses in the URL (here 100) will be stored in the random_id variable accessible within the function.
  • Functions serving a request return a string containing the requested webpage (typically an HTML document). For example, the function wat returns a simple string without any HTML markup.
  • To construct a full HTML document more easily, you can use the Jinja2 templating language, as is done in the home function. The template itself is in the file templates/main.html.
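
The structure described above can be sketched as follows. This is not the provided main.py, only a minimal illustration under several assumptions: the database is an in-memory SQLite database, the template is an inline string passed to render_template_string instead of the file templates/main.html, and the teardown_request function that closes the connection is an addition not mentioned in the text. Only the function names home and wat come from the lecture.

```python
import sqlite3

from flask import Flask, g, render_template_string

app = Flask(__name__)

# Assumption: an inline stand-in for templates/main.html
MAIN_TEMPLATE = "<html><body><h1>Hello {{ name }}</h1><p>1 + 1 = {{ value }}</p></body></html>"


@app.before_request
def before_request():
    # Runs before every request; g lives only for this one request.
    # Assumption: an in-memory SQLite database instead of the real one.
    g.db = sqlite3.connect(":memory:")


@app.teardown_request
def teardown_request(exception):
    # Hypothetical cleanup step: close the connection when the request ends.
    db = g.pop("db", None)
    if db is not None:
        db.close()


@app.route('/')
def home():
    # Use the per-request connection and fill the result into the template.
    value = g.db.execute("SELECT 1 + 1").fetchone()[0]
    return render_template_string(MAIN_TEMPLATE, name="world", value=value)


@app.route('/wat/<random_id>/')
def wat(random_id):
    # random_id holds the string from the URL, e.g. "100" for /wat/100/
    return f"wat {random_id}"
```

Saved as main.py, such a file could be started with flask run as described above. With render_template('main.html', ...) instead of render_template_string, Flask would look for the template in the templates directory.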


Processing text

The main tool we will use for processing text is the CountVectorizer class from the Scikit-learn library. It transforms a text into a bag-of-words representation: a list of the words that occur in the text, together with the count of each word. Example:

from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer(strip_accents='unicode')

texts = [
 "Ema ma mamu.",
 "Zirafa sa vo vani kupe a hneva sa."
]

t = vec.fit_transform(texts).toarray()  # dense NumPy array (todense() would return np.matrix)

print(t)
# prints:
# [[1 0 0 1 1 0 0 0 0]
#  [0 1 1 0 0 2 1 1 1]]

print(vec.vocabulary_)
# prints:
# {'vani': 6, 'ema': 0, 'kupe': 2, 'mamu': 4, 
# 'hneva': 1, 'sa': 5, 'ma': 3, 'vo': 7, 'zirafa': 8}

NumPy arrays

Array t in the example above is a NumPy array provided by the NumPy library. This library also offers many useful operations. First, let's create two matrices:

>>> import numpy as np
>>> a = np.array([[1,2,3],[4,5,6]])
>>> b = np.array([[7,8],[9,10],[11,12]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> b
array([[ 7,  8],
       [ 9, 10],
       [11, 12]])

We can add matrices of the same shape or multiply a matrix by a number:

>>> 3 * a
array([[ 3,  6,  9],
       [12, 15, 18]])
>>> a + 3 * a
array([[ 4,  8, 12],
       [16, 20, 24]])
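
Matrices a and b have different shapes (2x3 and 3x2), so they cannot be added directly, but they can be multiplied as matrices. A small sketch using the @ operator and the .T attribute (transpose); the names product and transposed_sum are only illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8], [9, 10], [11, 12]])

# Matrix product of a 2x3 and a 3x2 matrix is a 2x2 matrix
product = a @ b
print(product)
# prints:
# [[ 58  64]
#  [139 154]]

# After transposing b to shape 2x3, element-wise addition works
transposed_sum = a + b.T
print(transposed_sum)
# prints:
# [[ 8 11 14]
#  [12 15 18]]
```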

We can calculate the sum of all elements in a matrix, or sum along a chosen axis (axis=1 sums each row, axis=0 sums each column):

>>> np.sum(a)
21
>>> np.sum(a, axis=1)
array([ 6, 15])
>>> np.sum(a, axis=0)
array([5, 7, 9])
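
Sums along an axis are handy for normalization, for example converting counts in each row to fractions of the row total. A minimal sketch, assuming the keepdims option of np.sum is used to keep the result as a column so that the division works row by row; the names row_sums and fractions are only illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True keeps the result as a 2x1 column instead of a flat array
row_sums = np.sum(a, axis=1, keepdims=True)

# Each row is divided by its own sum, so every row now adds up to 1
fractions = a / row_sums
print(fractions[0, 2])
# prints:
# 0.5
```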

There are many other useful functions; see the NumPy documentation.