Selected Topics in Data Structures
2-INF-237, LS 2016/17

This page concerns the academic year 2016/17. In 2017/18 the course is taught by Jakub Kováč; the course page is https://people.ksp.sk/~kuko/ds


Data Structures for External Memory

Revision as of 08:53, 29 April 2015

External memory model, I/O model

Introduction

  • big and slow disk, fast memory of a limited size M words
  • disk reads/writes in blocks of size B words
  • when analyzing algorithms, count how many disk reads/writes (memory transfers) are done

Example: scanning through n elements (e.g. to compute their sum): Theta(ceiling of n/B) memory transfers
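The cost accounting for a scan can be sketched directly; `scan_transfers` is a hypothetical helper for illustration, not part of the model itself:

```python
import math

def scan_transfers(n, B):
    """Memory transfers needed to scan n consecutive words with block
    size B: the data spans ceil(n/B) blocks (one more if the array is
    not block-aligned, which does not change the Theta(n/B) bound)."""
    return math.ceil(n / B)

# summing n elements costs Theta(n/B) transfers, not Theta(n)
print(scan_transfers(1000, 8))   # 125 blocks for 1000 words with B = 8
```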

B-trees

  • I/O cost of binary search tree?
  • parameter T related to block size B (so that each node fits into a constant number of blocks)
  • each node v has some number v.n of keys
    • if it is an internal node, it has v.n+1 children
    • keys in a node are sorted
    • each subtree contains only values between two successive keys in the parent
  • all leaves are at the same depth
  • each node except root has at least T-1 keys and each internal node except root has at least T children
  • each node has at most 2T-1 keys and at most 2T children
  • height is O(log_T n)

Search:

  • O(log_T n) block reads
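The search can be sketched as follows. The node layout (a dict with a sorted list `keys` and a list `children`, empty at leaves) is a hypothetical in-memory stand-in for the on-disk nodes:

```python
def btree_search(node, key):
    """Search for key starting at node; each visited node costs O(1)
    block reads, so the whole search does O(log_T n) transfers."""
    i = 0
    while i < len(node['keys']) and key > node['keys'][i]:
        i += 1                      # linear scan; binary search also works
    if i < len(node['keys']) and node['keys'][i] == key:
        return node, i              # found key at position i of this node
    if not node['children']:
        return None                 # reached a leaf: key is absent
    return btree_search(node['children'][i], key)   # one more block read
```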

Insert:

  • find a leaf where the new key belongs
  • if the leaf is full (2T-1 keys), split it into two parts of T-1 keys each and insert the median into the parent (recursively)
    • new element is inserted to one of the new leaves
    • we continue splitting ancestors until we find a node which is not full or until we reach the root. If root is full, we create a new root with 2 children (increasing the height of the tree)
  • O(log_T n) transfers
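Insert with splitting can be sketched as below. This is a hypothetical in-memory model (dict nodes, minimum degree T = 2), and it uses the common single-pass variant that splits full nodes on the way down rather than splitting ancestors after the fact; the invariants maintained are the same:

```python
T = 2  # minimum degree: nodes hold T-1..2T-1 keys (root may hold fewer)

def node(keys=None, children=None):
    return {'keys': keys or [], 'children': children or []}

def split_child(parent, i):
    """Split the full child parent['children'][i] (it has 2T-1 keys) into
    two nodes with T-1 keys each and move the median key into parent."""
    full = parent['children'][i]
    right = node(full['keys'][T:], full['children'][T:])
    median = full['keys'][T - 1]
    full['keys'], full['children'] = full['keys'][:T - 1], full['children'][:T]
    parent['keys'].insert(i, median)
    parent['children'].insert(i + 1, right)

def insert(root, key):
    """Insert key and return the (possibly new) root; O(log_T n) transfers."""
    if len(root['keys']) == 2 * T - 1:     # full root: new root, height grows
        root = node(children=[root])
        split_child(root, 0)
    v = root
    while v['children']:                   # descend, splitting full nodes early
        i = sum(1 for k in v['keys'] if k < key)
        if len(v['children'][i]['keys']) == 2 * T - 1:
            split_child(v, i)
            if key > v['keys'][i]:
                i += 1
        v = v['children'][i]
    v['keys'].insert(sum(1 for k in v['keys'] if k < key), key)
    return root
```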

Sorting

mergesort

  • I/O cost of ordinary implementation?
  • create sorted runs of size M using O(N/B) transfers
  • repeatedly merge M/B-1 runs into one; each pass uses N/B disk reads and writes
  • log_{M/B-1} (N/M) passes
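One merging pass can be sketched with a min-heap, together with the pass-count formula from the bullets above; `merge_runs` and `merge_passes` are hypothetical helper names:

```python
import heapq
import math

def merge_runs(runs):
    """Merge sorted runs in a single pass using a min-heap; in the I/O
    model each of the M/B - 1 input runs keeps one block in memory,
    plus one output block, so one pass costs N/B reads and writes."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(runs[i]):            # advance within run i
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

def merge_passes(N, M, B):
    """Number of merge passes after run formation: ceil(log_{M/B-1}(N/M))."""
    return max(0, math.ceil(math.log(N / M, M // B - 1)))
```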

Summary

  • B-trees can do search tree operations (insert, delete, search/predecessor) in Theta(log_{B+1} n) = Theta(log n / log (B+1)) memory transfers
    • number of transfers compared to usual search tree running time: factor of 1/log B
  • search optimal in comparison model
    • proof outline through information theory
    • the goal of the search is to tell us between which two elements of the set the query is located
    • there are Theta(n) possible answers, thus Theta(log n) bits of information
    • in the normal comparison model, one comparison of the query to an element of the set gives us at most 1 bit of information
    • in the I/O model, reading a block of B elements and comparing the query to them gives us in the best case its position within these B elements, i.e. Theta(log B) bits
    • to get Theta(log n) bits of information for the answer, we need Theta(log n/log B) block reads
    • this can be transformed to a more formal and precise proof
  • sorting O((N/B) log_{M/B} (N/B)) memory transfers
    • number of memory transfers compared to usual sorting running time: factor of more than 1/B
    • also much better than sorting using B-trees, which would take O(N log_B N)
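The counting behind the information-theoretic argument above can be written out explicitly (assuming each block read only reveals the query's position among that block's B elements):

```latex
% One block read locates the query among B sorted elements, i.e. in one
% of at most B+1 gaps, so t block reads distinguish at most (B+1)^t outcomes.
% To separate the n+1 possible answers we need
(B+1)^t \ \ge\ n+1
\quad\Longrightarrow\quad
t \ \ge\ \log_{B+1}(n+1) \ =\ \Omega\!\left(\frac{\log n}{\log B}\right).
```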

Cache oblivious model

Introduction

  • in the I/O model the algorithm explicitly requests block transfers, knows B, controls memory allocation
  • in the cache oblivious model algorithm does not know B or M, memory operates as a cache
  • M/B slots, each holding one block from disk
  • algorithm requests reading a word from disk
    • if the block containing this word is in cache, no transfer is needed
    • replace one slot with block holding requested item, write original block if needed (1 or 2 transfers)
    • which one to replace: classical on-line problem of paging

Paging

  • cache has size k = M/B slots, each holds one page (above called block)
  • sequence of page requests; if the requested page is not in memory, bring it in and remove some other page (page fault)
    • goal is to minimize the number of page faults
  • optimal offline algorithm (knows the whole sequence of page requests)
    • At a page fault remove the page that will not be used for the longest time
  • example of an on-line algorithm: FIFO:
    • at a page fault remove the page that has been in memory the longest
    • uses at most k times as many page faults as the optimal algorithm (k-competitive)
    • no deterministic alg. can be better than k-competitive
    • it is conservative: in a segment of requests containing at most k distinct pages it does at most k page faults
    • every conservative alg. is k-competitive
  • Compare a conservative paging alg. (e.g. FIFO) on memory with k blocks to optimum offline alg. on a smaller memory of size h - competitive ratio k/(k-h)
    • if h = k/2, we get competitive ratio 2
    • divide input sequence into maximal blocks, each containing k distinct elements (first element of the next block is distinct from the k elements of the previous block)
    • FIFO uses at most k page faults in each block
    • optimum has at most h pages from a block in memory at the beginning - at least k-h pages will cause a fault
    • in fact we can prove k/(k-h+1), works even for k=h
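The two algorithms compared above can be simulated directly; `fifo_faults` and `opt_faults` are hypothetical helper names, and the sequence in the usage note is the standard textbook example:

```python
from collections import deque

def fifo_faults(requests, k):
    """Page faults of FIFO with k cache slots."""
    mem, order, faults = set(), deque(), 0
    for p in requests:
        if p not in mem:
            faults += 1
            if len(mem) == k:
                mem.remove(order.popleft())   # evict the oldest resident page
            mem.add(p)
            order.append(p)
    return faults

def opt_faults(requests, k):
    """Belady's offline optimum: evict the page whose next use is farthest."""
    mem, faults = set(), 0
    for i, p in enumerate(requests):
        if p not in mem:
            faults += 1
            if len(mem) == k:
                rest = requests[i + 1:]
                nxt = lambda q: rest.index(q) if q in rest else float('inf')
                mem.remove(max(mem, key=nxt))
            mem.add(p)
    return faults
```

For example, on the request sequence 1 2 3 4 1 2 5 1 2 3 4 5 with k = 3, FIFO incurs 9 faults while the optimum incurs 7, consistent with k-competitiveness.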

Back to cache-oblivious model

  • we analyze algorithms under the assumption that the paging algorithm uses the off-line optimum
  • instead it could use e.g. FIFO in memory 2M and increase the number of transfers by a constant factor
  • advantages of cache-oblivious model:
    • may adapt to changing M, B
    • good for a whole memory hierarchy (several levels of cache, disk, network,...)
  • scanning still Theta(ceiling of N/B)
  • search trees still Theta(log_{B+1} N)
  • sorting Theta((N/B) log_{M/B}(N/B)) but requires big enough M, i.e. M=Omega(B^{1+epsilon})

Static cache-oblivious search trees

  • first published by Harald Prokop, Master's thesis, MIT 1999 [1]
  • alternative to a binary search
    • exercise: how many transfers for binary search?
  • use perfectly balanced binary search tree
  • search algorithm as usual (follows path from root down as usual)
  • nodes are stored on disk in some order which we can choose
    • the goal is to choose it so that for any B, M and any search we use only few blocks
    • for example, what would happen if we store nodes in pre-order or level-order?
    • instead of simple orders, we will use van Emde Boas order (later in the course we will see van Emde Boas trees for integer keys)

van Emde Boas order

  • split tree of height lg n into top and bottom, each of height (1/2) lg n
  • top part is a small tree with about sqrt(n) vertices
  • bottom part consists of about sqrt(n) small trees, each size about sqrt(n)
  • each of these small trees is processed recursively and the results are concatenated
  • for example a tree with 4 levels is split into 5 trees with 2 levels, resulting in the following ordering:
          1
     2           3
  4     7    10    13
5  6  8  9 11 12 14 15
  • if B = (1/2) lg n, we need only 2 transfers
  • but we will show next that for any B<=M/2 and for any path from the root we need only Theta(log_{B+1} n) block transfers
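The ordering above can be generated recursively; a sketch, assuming the convention that the top part gets floor(h/2) of the h levels (either rounding works). Nodes are identified by their BFS (heap) index, and the function returns them in the order they are laid out on disk:

```python
def veb_layout(root=1, h=4):
    """vEB layout of a perfect binary tree of height h whose nodes are
    numbered 1..2^h - 1 in BFS (heap) order: lay out the top part, then
    each bottom subtree, left to right, each part recursively."""
    if h == 1:
        return [root]
    th = h // 2                      # height of the top part
    order = veb_layout(root, th)
    first = root << (th - 1)         # leftmost node on the top part's last level
    for leaf in range(first, first + (1 << (th - 1))):
        order += veb_layout(2 * leaf, h - th)       # left bottom subtree
        order += veb_layout(2 * leaf + 1, h - th)   # right bottom subtree
    return order
```

`veb_layout(1, 4)` yields [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]; inverting this permutation gives the labels drawn in the picture above (e.g. node 8, the left child of node 4, is stored at position 5).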

Analysis

  • find the level of recursion at which each small tree has height k such that the size of these trees is <=B, while trees one level higher have size >B
  • (1/2) lg B <= k <= lg B
  • a root-to-leaf path has length lg N and visits lg N / k trees of height k, which is Theta(lg N / lg B)
  • each tree of height k occupies at most B cells in memory (assuming each node fits into a word, otherwise everything is multiplied by a constant factor...)
  • traversing one such tree might need up to 2 memory transfers if there is a block boundary inside the interval storing the tree (the whole tree still fits in memory)

Dynamic cache oblivious trees

  • only rough sketch

uses "ordered file maintenance"

  • maintain n items in an array of size O(n) with gaps of size O(1)
  • updates: delete item, insert item between given two items (similar to linked list)
  • update rewrites interval of size at most O(log^2 n) amortized in O(1) scans
  • done by keeping appropriate density in a hierarchy of intervals
  • we will not cover details

simpler version of data structure

  • we keep elements in an "ordered file" in sorted order
  • build a full binary tree on top of array (segment tree)
  • each node stores maximum in its subtree
  • tree stored in vEB order
  • when array gets too full, double the size, rebuild everything

search

  • the search checks the max in the left child and decides whether to move left or right
  • basically follows a path from root, uses O(log_B n) transfers
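The search over the tree of subtree maxima can be sketched as below. For simplicity the tree is kept in implicit heap order rather than vEB order (the access pattern, a root-to-leaf path, is identical); `None` marks a gap in the ordered file, and all names are hypothetical:

```python
def build(leaves):
    """Build the tree of subtree maxima over the ordered file.

    leaves: list whose length is a power of two; None marks a gap.
    Since the file is sorted, a subtree's max is its right child's max,
    falling back to the left child when the right subtree is all gaps."""
    n = len(leaves)
    tree = [None] * (2 * n)          # implicit heap layout; tree[n:] are leaves
    tree[n:] = leaves
    for v in range(n - 1, 0, -1):
        tree[v] = tree[2 * v + 1] if tree[2 * v + 1] is not None else tree[2 * v]
    return tree

def search(tree, key):
    """Successor search: smallest stored element >= key, or None."""
    n = len(tree) // 2
    v = 1
    while v < n:
        left_max = tree[2 * v]
        v = 2 * v if left_max is not None and key <= left_max else 2 * v + 1
    x = tree[v]
    return x if x is not None and x >= key else None
```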

update

  • search to find the leaf
  • update "ordered file", resulting in changes in interval of size L (O(log^2 n) amortized)
  • then update all ancestors of these values in the tree by postorder traversal

analysis of update

  • again let k be the height of vEB subtrees which have size <=B while the next higher level has size >B
  • (1) first consider the bottom 2 levels: O(1+L/B) trees of height k need updating (+1 from ceilings)
  • each tree stored in <=2 blocks
  • the postorder traversal revisits the parent block between the child blocks, but if M>=4B, it stays in memory
  • overall O(log^2 n /B) transfers for bottom two layers
  • (2) part of the tree up to the lca of all changed leaves: O(L/B) nodes; even with one transfer per node, we are fine
  • (3) path from lca to root O(log_B n) transfers as in search
  • (1)+(2)+(3): O(log^2 n / B + log_B n) - too big for small values of B

improvement of update

  • split sorted data into blocks, each block of size between (1/4) lg n and lg n
  • each block stored sequentially on disk,
  • minimum element from each block used as a leaf in the above data structure, now contains only O(n/log n) elements
  • update/search inside block: 1 scan, O(log n/B)
  • if a block gets too large, split into two blocks (similarly to B-trees)
  • if a block gets too small, connect with adjacent block, maybe resplit
  • splitting/connecting propagates as an update to the structure for n/log n items
  • but this happens only once every log n updates, so the amortized update cost is divided by log n
  • however, O(log_B n) term still necessary for search

Sources