Vybrané partie z dátových štruktúr
2-INF-237, LS 2016/17

Úvod · Pravidlá · Prednášky · Prezentácia · Ako poznámkovať · Moodle
Táto stránka sa týka školského roku 2016/17. V školskom roku 2017/18 predmet vyučuje Jakub Kováč, stránku predmetu je https://people.ksp.sk/~kuko/ds


Dátové štruktúry pre externú pamäť

Z VPDS
Prejsť na: navigácia, hľadanie

Zdroje:

Osnova:

  • uvod do externej pamati z prednasky Erika Demaina
    • strucne opakovanie B-trees: struktura, hladanie, vkladanie, odhad vysky
    • strucne opakovanie mergesortu v externej pamati
  • uvod do cache-oblivious modelu z prednasky Erika Demaina
    • strucnu uvod do online algoritmov, kompetitivnost, paging
  • staticke vyhldavacie stromy (vEB layout) z prednasky Erika Demaina
  • nacrt ako zhruba vyzeraju dynamicke stromy (cache oblivious B-trees), nesli sme do vsetkych detailov

External memory model, I/O model

  • big and slow disk, fast memory of a limited size M (words)
  • disk reads/writes in blocks of size B (words)
  • when analyzing algorithms, count how many disk reads/writes are done


B-trees

  • I/O cost of binary search tree?
  • from CLRS 3rd edition
  • parameter T related to block size B (so that each node fits into a constant number of blocks)
  • each node v has some number v.n of keys
    • if it is an internal node, it has v.n+1 children
    • keys in a node are sorted
    • each subtree contains only values between two successive keys in the parent
  • all leaves are in the same depth
  • each node except root has at least T-1 keys and each internal node except root has at least T children
  • each node has at most 2T-1 keys and at most 2T children
  • height is O(log_T n)

Search:

  • O(log_T n) block reads

Insert:

  • find a leaf where the new key belongs
  • if leaf is full (2T-1 keys), split into two equal parts of size T-1 and insert median to the parent
    • new element can be inserted to one of the new leaves
  • we continue splitting ancestors until we find a node which is not full or until we reach the root. If root is full, we create a new root with 2 children (increasing the height of the tree)
  • O(log_T n) block reads and writes

Sorting

  • mergesort
    • I/O cost of ordinary implementation?
    • create sorted runs of size M using O(N/B) disk reads and writes
    • repeatedly merge M/B-1 runs into 1 - 1 pass uses N/B disk reads and writes
    • log_{M/B-1} (N/M) passes

Paging

  • cache has size k = M/B blocks, each holds one
  • sequence of page requests, if the requested page not in memory not in memory, bring it in and remove some other page (page fault)
    • goal is to minimize the number of page faults
  • optimal offline algorithm (knows the whole sequence of page requests)
    • At a page fault remove the page that will not be used for the longest time
  • example of an on-line algoritm: FIFO:
    • at a page fault remove page which is in the memory longest time
    • uses at most k times as many page faults as the optimal algorithm (k-competitive)
    • no deterministic alg. can be better than k-competitive
    • it is conservative: in a segment of requests containing at most k distinct pages it does at most k page faults
    • every conservative alg. is k-competitive
  • Compare a conservative paging alg. (e.g. FIFO) on memory with k blocks to optimum offline alg. on a smaller memory of size h - competitive ratio k/(k-h)
    • if h = k/2, we get competitive ratio 2
    • divide input sequence into maximal blocks, each containing k distinct elements (first element of the next block is distinct from the k elements of the previous block)
    • FIFO uses at most k page faults in each block
    • optimum has at most h sets of a block in memory at the beginning - at least k-h pages will cause a fault
    • in fact we can prove k/(k-h+1), works even for k=h