2-INF-185 Integrácia dátových zdrojov 2017/18

Points for HW01 and HW04 are available on the server in /grades/userid.txt




In this lecture we dive into SQLite3 and Python.


SQLite3 is a simple "database" stored in one file. Think of SQLite not as a replacement for Oracle but as a replacement for fopen(). Documentation: https://www.sqlite.org/docs.html

You can access an SQLite database either from the command line:

usamec@Darth-Labacus-2:~$ sqlite3 db.sqlite3
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE test(id integer primary key, name text);
sqlite> .schema test
CREATE TABLE test(id integer primary key, name text);
sqlite> .exit

Or from Python, through the sqlite3 module: https://docs.python.org/2/library/sqlite3.html
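
As a minimal sketch of the Python interface, the snippet below creates the same test table as the command-line session, inserts two rows and reads them back. The in-memory database and the inserted names are assumptions made for illustration only.

```python
import sqlite3

# ":memory:" keeps the database in RAM; pass a file name such as
# "db.sqlite3" to work with a database stored on disk.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE test(id integer primary key, name text)")

# "?" placeholders let the sqlite3 module quote the values safely.
db.executemany("INSERT INTO test(name) VALUES (?)", [("alice",), ("bob",)])
db.commit()

# fetchall() returns a list of tuples, one tuple per row.
rows = db.execute("SELECT id, name FROM test ORDER BY id").fetchall()
for row_id, name in rows:
    print(row_id, name)
db.close()
```

Because id is an integer primary key, SQLite assigns the values 1 and 2 automatically when they are not given in the INSERT.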


Python is a perfect language for almost anything. Here is a cheatsheet: http://www.cogsci.rpi.edu/~destem/igd/python_cheat_sheet.pdf

Scraping webpages

The simplest tool for scraping webpages is urllib2 (https://docs.python.org/2/library/urllib2.html). Example usage:

import urllib2  # Python 2 standard library; in Python 3 use urllib.request
f = urllib2.urlopen('http://www.python.org/')
print f.read()  # the raw HTML of the page

Or use the requests package:

import requests
r = requests.get("http://en.wikipedia.org")
print r.text  # body of the response, decoded to text

Parsing webpages

We use beautifulsoup4 for parsing HTML (http://www.crummy.com/software/BeautifulSoup/bs4/doc/). I recommend following the examples at the beginning of the documentation and the example about CSS selectors: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
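
A small sketch of CSS selectors in beautifulsoup4 might look as follows. The HTML string, its class names and the links in it are invented for this example; in a real script the HTML would come from the downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would be the downloaded HTML.
html = """
<html><body>
  <div class="article"><a href="/first">First post</a></div>
  <div class="article"><a href="/second">Second post</a></div>
  <div class="sidebar"><a href="/about">About</a></div>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so no extra package is needed.
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <a> nested inside a <div class="article">.
links = [(a.get_text(), a["href"]) for a in soup.select("div.article a")]
print(links)
```

The selector skips the link in the sidebar because its parent div has a different class.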

Parsing dates

You have two options: either use datetime.strptime or the dateutil package.
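
As an illustration of the first option, the sketch below parses one date with datetime.strptime. The input string and its day.month.year layout are assumptions; the format string has to be adapted to whatever the scraped page actually contains.

```python
from datetime import datetime

# Hypothetical date string as it might appear on a scraped page.
raw = "24.12.2017 18:30"

# %d.%m.%Y %H:%M means day.month.year hour:minute.
parsed = datetime.strptime(raw, "%d.%m.%Y %H:%M")
print(parsed.isoformat())
```

strptime raises ValueError when the string does not match the format. The dateutil package takes the other approach: its parser.parse function tries to guess the format, which is convenient when the pages are inconsistent but can misread ambiguous dates.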

Other useful tips

  • Don't forget to commit to your sqlite3 database (db.commit()).
  • CREATE TABLE IF NOT EXISTS can be useful at the start of your script.
  • Inspect element (right-click on an element) in Chrome can be very helpful.
  • Use the screen command for long-running scripts.
  • All packages are installed on the vyuka server. If you plan to use your own laptop, you need to install them using pip (preferably inside a virtualenv).
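
The first two tips can be sketched together. The example below assumes a hypothetical pages table and a database file in a temporary directory; it shows that a row inserted without db.commit() is gone after the connection is closed, and that CREATE TABLE IF NOT EXISTS can be run safely on every start.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.sqlite3")
SCHEMA = "CREATE TABLE IF NOT EXISTS pages(id INTEGER PRIMARY KEY, url TEXT)"

def count_rows(path):
    """Open the database, make sure the table exists, and count its rows."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)  # does nothing if the table is already there
    db.commit()
    n = db.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    db.close()
    return n

# Insert without committing: closing the connection discards the row.
db = sqlite3.connect(path)
db.execute(SCHEMA)
db.execute("INSERT INTO pages(url) VALUES (?)", ("http://www.python.org/",))
db.close()
lost = count_rows(path)

# Insert and commit: the row is stored in the file.
db = sqlite3.connect(path)
db.execute("INSERT INTO pages(url) VALUES (?)", ("http://www.python.org/",))
db.commit()
db.close()
kept = count_rows(path)

print(lost, kept)
```

For a long-running scraper it may be worth committing after every page or small batch, so that a crash loses only the work done since the last commit.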