2-INF-185 Integration of Data Sources 2016/17

Submit HW10 and HW11 by Tuesday 30.5. at 9:00.
Project submission deadlines:
1st deadline: Sunday 11.6. 22:00
2nd deadline: Wednesday 21.6. 22:00
Both deadlines are regular. The first one is meant for students who are finishing their studies or who want to complete the course earlier. In both cases, a few days after submission there will be short in-person meetings with the instructors (a discussion of the project and finalizing the grade). The exact days and times will be agreed on later. Submit projects the same way as homework assignments, to /submit/projekt


L07


In this lecture we dive into SQLite3 and Python.

SQLite3

SQLite3 is a simple SQL database engine that keeps the whole database in a single file, with no separate server process. Think of SQLite not as a replacement for Oracle but as a replacement for fopen(). Documentation: https://www.sqlite.org/docs.html

You can access an SQLite database either from the command line:

usamec@Darth-Labacus-2:~$ sqlite3 db.sqlite3
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE test(id integer primary key, name text);
sqlite> .schema test
CREATE TABLE test(id integer primary key, name text);
sqlite> .exit

Or from Python, using the sqlite3 module: https://docs.python.org/2/library/sqlite3.html
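This is a minimal sketch of the same table accessed through the Python interface (sqlite3 is in the standard library). It uses an in-memory database so that no file is left behind. The inserted names are made-up sample data.

```python
import sqlite3

# ':memory:' keeps the database in RAM for this demo; pass a filename such as
# 'db.sqlite3' to store it on disk like in the command-line session above.
db = sqlite3.connect(':memory:')
cur = db.cursor()

cur.execute('CREATE TABLE test(id integer primary key, name text)')

# Use ? placeholders instead of building SQL strings by hand: sqlite3 quotes
# the values itself, which also protects against SQL injection.
cur.execute('INSERT INTO test(name) VALUES (?)', ('Alice',))
cur.executemany('INSERT INTO test(name) VALUES (?)', [('Bob',), ('Carol',)])
db.commit()  # without this, the inserts are rolled back when the connection closes

cur.execute('SELECT id, name FROM test ORDER BY id')
rows = cur.fetchall()
print(rows)  # on Python 3: [(1, 'Alice'), (2, 'Bob'), (3, 'Carol')]
```

The `id integer primary key` column is filled in automatically, so the inserts only supply `name`.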

Python

Python is a perfect language for almost anything. Here is a cheatsheet: http://www.cogsci.rpi.edu/~destem/igd/python_cheat_sheet.pdf

Scraping webpages

The simplest tool for downloading webpages is urllib2 (https://docs.python.org/2/library/urllib2.html). Example usage:

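import urllib2  # Python 2 module; in Python 3 it became urllib.request

f = urllib2.urlopen('http://www.python.org/')  # file-like response object
print f.read()  # prints the raw HTML of the page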
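The example above runs only on Python 2. Below is a minimal sketch of a helper that runs on both Python 2 and 3 and returns decoded text instead of raw bytes. The name fetch_html, the UTF-8 default and the 30-second timeout are assumptions made for illustration; real pages may declare a different charset.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2, as used in this lecture


def fetch_html(url, encoding='utf-8', timeout=30):
    """Download url and return its body decoded to text (hypothetical helper).

    The fixed default encoding is an assumption; undecodable bytes are
    replaced instead of raising an error.
    """
    response = urlopen(url, timeout=timeout)  # timeout in seconds
    try:
        return response.read().decode(encoding, 'replace')
    finally:
        response.close()  # release the connection even if reading fails
```

For example, `fetch_html('http://www.python.org/')` returns the page as a string, ready to be handed to an HTML parser.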

Parsing webpages

We use beautifulsoup4 for parsing HTML (http://www.crummy.com/software/BeautifulSoup/bs4/doc/). I recommend following the examples at the beginning of the documentation and the section on CSS selectors: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
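Here is a minimal sketch of CSS selectors in beautifulsoup4, run on a small made-up HTML snippet. The class names `article` and `date` are hypothetical; on a real page you would look them up with Inspect element.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page; the class names are hypothetical.
html = """
<div class="article">
  <h2><a href="/a/1">First title</a></h2>
  <span class="date">30.5.2017</span>
</div>
<div class="article">
  <h2><a href="/a/2">Second title</a></h2>
  <span class="date">11.6.2017</span>
</div>
"""

# 'html.parser' is Python's built-in parser, so lxml does not have to be installed.
soup = BeautifulSoup(html, 'html.parser')

articles = []
# select() takes a CSS selector and returns a list of matching tags.
for div in soup.select('div.article'):
    link = div.select('h2 a')[0]  # first <a> inside an <h2> of this article
    date = div.select('span.date')[0].get_text(strip=True)
    articles.append((link.get_text(strip=True), link['href'], date))

print(articles)
```

Each tuple holds the link text, the `href` attribute and the date string. The date string still needs to be parsed into a real date.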

Parsing dates

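You have two options: either use datetime.strptime from the standard library, where you spell out the exact format, or use the dateutil package (https://dateutil.readthedocs.org/en/latest/parser.html), which guesses the format for you.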
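This is a minimal sketch of both options on day.month.year dates like the ones in the course header. The function names are hypothetical, and python-dateutil is a third-party package that has to be installed separately.

```python
from datetime import datetime


def parse_date_strptime(text):
    # Explicit format: day.month.year hour:minute, e.g. '30.5.2017 9:00'.
    # strptime also accepts day, month and hour without leading zeros.
    return datetime.strptime(text, '%d.%m.%Y %H:%M')


def parse_date_dateutil(text):
    # dateutil guesses the format itself. dayfirst=True makes '11.6.2017'
    # mean 11 June; the default is month first, giving 6 November.
    from dateutil import parser  # third-party: pip install python-dateutil
    return parser.parse(text, dayfirst=True)


print(parse_date_strptime('30.5.2017 9:00'))  # 2017-05-30 09:00:00
```

strptime fails loudly with a ValueError when a string does not match the format. That is often what you want when scraping a page whose format you know. dateutil is more forgiving but can silently guess wrong.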

Other useful tips

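  • Don't forget to commit changes to your SQLite database (db.commit()); uncommitted changes are rolled back when the connection is closed.
  • CREATE TABLE IF NOT EXISTS is useful at the start of your script, so it can be rerun without failing on an existing table.
  • Inspect element (right-click on an element) in Chrome is very helpful for finding the right tags and CSS selectors.
  • Use the screen command for long-running scripts, so they keep running after you log out.
  • All packages are installed on the vyuka server. If you plan to use your own laptop, you need to install them with pip (preferably inside a virtualenv).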
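The first two tips fit together in a scraper that you rerun many times. Below is a minimal sketch assuming a hypothetical `article` table keyed by URL. INSERT OR IGNORE is an addition not mentioned above; here it skips rows that are already stored.

```python
import sqlite3


def store_articles(db, articles):
    """Insert (url, title) pairs; safe to call repeatedly with the same data."""
    # Using the connection as a context manager commits on success and rolls
    # back if an exception is raised inside the block (it does not close db).
    with db:
        # IF NOT EXISTS: the script does not crash on its second run.
        db.execute('CREATE TABLE IF NOT EXISTS article('
                   'url text PRIMARY KEY, title text)')
        # OR IGNORE skips rows whose url is already stored (primary key clash),
        # so scraping the same page again does not create duplicates.
        db.executemany('INSERT OR IGNORE INTO article(url, title) VALUES (?, ?)',
                       articles)


db = sqlite3.connect(':memory:')  # use a filename such as 'db.sqlite3' in practice
sample = [('/a/1', 'First title'), ('/a/2', 'Second title')]  # made-up data
store_articles(db, sample)
store_articles(db, sample)  # second run: no error and no duplicate rows
print(db.execute('SELECT COUNT(*) FROM article').fetchone()[0])  # 2
```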