Dátové štruktúry pre celočíselné kľúče

Obsah

1 Introduction
2 Models of computations
- 2.1 Word RAM model
- 2.2 Other models
3 Predecessor problem
4 van Emde Boas data structure
5 Simple tree view
- 5.1 Predecessor, successor
- 5.2 Updates
6 Sources

Introduction

data structures that work only with integers
can be used to solve predecessor problem
van Emde Boas data structure, x-fast trees

Models of computations

for integer data structures
word - w-bit integer (unsigned)
words are over universe ${\mathcal {U}}=\{0,\dots ,2^{w}-1\}$ of size $u=2^{w}$
all input, output, queries, ... are with respect to w-bit words
memory is an array of words of some fixed size

Word RAM model

word Random Access Machine
standard model, it will be used in this lecture
memory is divided into S cells of size w
words can serve as pointers
it is practical to assume $w\geq \lg(S)$ , otherwise we can't index whole memory
if n is the problem size, it implies $w\geq \lg(n)$
processor may use in time following 'C-style' operations:
- $+,-,*,/,\%,\&,|,\gg ,\ll ,<,>,[~]$

Other models

there exist other models used with (not only) integer data structures
transdichotomous RAM model
- almost same as word RAM, but it can have perform arbitrary operations that access at most ${\mathcal {O}}(1)$ memory cells
cell probe model
- reading and writing costs 1, other operations are free
- unrealisitic, used for lower bounds
pointer machine, etc.

Predecessor problem

maintain a set S of n words over universe U
support following operations:
- insert an element of universe into set
- delete an element of set
- find the predecessor of an element from universe in S, ${\text{predecessor}}(x\in {\mathcal {U}})={\text{max}}\{y\in S~|~y<x\}$
- find the succecessor of an element from universe in S, ${\text{successor}}(x\in {\mathcal {U}})={\text{min}}\{y\in S~|~y>x\}$

van Emde Boas data structure

structure to solve predecessor problem
reinterpreted version by Bender and Farach-Colton
${\mathcal {O}}(\lg w)={\mathcal {O}}(\lg \lg u)$ time per operation, $\Theta (u)$ space
assuming n is not not much smaller compared with u, namely if $n=u^{{\Theta (1)}}$ , we get running time $O(\log \log n)$ compared to $O(\log n)$ with balanced binary search trees
goal is to bound all operations by recurrence $T(u)=T({\sqrt {u}})+{\mathcal {O}}(1)$ , so running time will be (e.g. by Master theorem applied to $\log u$ ) $T(u)={\mathcal {O}}(\lg \lg u)$
main idea is to split universe U into ${\sqrt {u}}$ clusters, each of size ${\sqrt {u}}$

Hierarchical coordinates

suppose we have universe of size u
for word x, we will write , where:
- $c=\lfloor x/{\sqrt {u}}\rfloor$ is cluster containing x
- $i=x{\text{ mod }}{\sqrt {u}}$ is x's index within cluster
- $x=c{\sqrt {u}}+i$
if we have binary numbers:
- just split binary representation of x in half
- c are high order bits, $c=x\gg w/2$
- i are low order bits, $i=x~\&~((1\ll w/2)-1)$
- $x=(c\ll w/2)~|~i$
conversion is in ${\mathcal {O}}(1)$ time

VEB recursive structure

van Emde Boas structure (VEB) V of size u with word size w
V consists of ${\sqrt {u}}$ VEB structures of size ${\sqrt {u}}$ and word size $w/2$ called clusters
each cluster is denoted by V.cluster[c], for $0\leq c<{\sqrt {u}}$
if $x=\langle c,i\rangle$ is contained in V and x is not the minimum, then i is contained in V.cluster[i]
in addition, we maintain a summary VEB of size ${\sqrt {u}}$ and word size $w/2$ , denoted by V.sumary
summary contains all integers c such that V.cluster[c] is not empty
we store minimum element of V separately in V.min (it is not in any cluster) or value None in case V is empty
we also store copy of maximum element of V in V.max

Operations

Successor

query for successor of $x=\langle c,i\rangle$ in VEB V
we first check if x is less then minimum of V, in that case minimum is the successor
if x is less than maximum in it's own cluster, then the successor of x must be in that cluster (it's either maximum of that cluster or some other element), so we can recursively search for the successor in that cluster
otherwise, x is greater than any element in it's own cluster, so the answer is minimum in first non empty cluster that goes after cluster c, we can find it by recursive search in summary
in each of three cases, we call recursion at most once on VEB of square root of size of V, so running time is bounded by recurrence $T(u)$
predecessor query works in very similar fashion

Successor(V, x = <c,i>) {
    if (x < V.min) return V.min;
    if (i < V.cluster[c].max) {
        return <c, Successor(V.cluster[c], i)>;
    } else {
        c' = Successor(V.summary, c);
        return <c', V.cluster[c'].min>;
    }
}

Insert

we want to insert $x=\langle c,i\rangle$ into VEB V
if minimum is None, it means V is empty and we just set minimum, maximum and return (since minimum is not stored recursively)
if x is less than minimum, we swap values of x and minimum, so we will insert the old minimum
if x is greater than maximum, we replace maximum with copy of x
we check, whether cluster c is empty
in case it is, we need to insert c into summary
at last, we insert i into x's cluster c
note that it can happen that we call recursion twice, but in that case cluster c was empty, so second insert (of i into cluster c) takes constant time, because it just sets minimum and returns
in any case, we have at most one deep (and at most one shallow) recursion, so running time is still bounded by aforementioned recurrence

Insert(V, x = <c,i>) {
    if (V.min == None) {
        V.min = V.max = x;
        return;
    }
    if (x < V.min) swap(x, V.min);
    if (x > V.max) V.max = x;
    if (V.cluster[c].min == None) {
        Insert(V.summary, c);
    }
    Insert(V.cluster[c], i);
}

Delete

we want to remove $x=\langle c,i\rangle$ from VEB V
if x is minimum of V, we first check whether the summary is empty
in case it is, x is the only element of V, so we just remove him and return
in case there are more elements, new minimum is going to be minimum of first non-empty cluster (since x is the overall minimum and is stored separately), which we can find easily in constant time (we look at minimum of summary and take minimum from that cluster)
now, we need to delete either minimum of first non-empty cluster (because we moved it into V.min) or x (case where x is not minimum, so it is in some cluster)
we delete our value recursively from it's cluster
in case cluster we delete element from is now empty, we need to delete it's number from summary by second recursive call
note that, same as with insert, in case of second recursive call, cluster in first call had only one element, so it was shallow recursion in constant time and we still have only one deep recursion
finally, we set min and max values accordingly

Delete(V, x = <c,i>) {
    if (x == V.min) {
        c = V.summary.min
        if (c == None) {
            V.min = None;
            return;
        }
        V.min = <c, i = V.cluster[c].min>;
    }
    Delete(V.cluster[c], i);
    if (V.cluster[c].min == None) {
        Delete(V.summary, c);
    }
    if (V.summary.min == None) {
        V.max = V.min;
    } else {
        c' = V.summary.max;
        V.max = <c', V.cluster[c'].max>;
    }
}

Simple tree view

original van Emde Boas data structure
it is similar to previous structure, but from tree point of view
take a bit vector v of size u, where i-th bit is 1, if integer i is in the structure and 0 otherwise
next, build full binary tree over vector v, where at each node we store the OR of its children

note that when we split tree in half of its height, first half represents summary (its leaves are summary bits) and ${\sqrt {u}}$ size trees in second half represent clusters
value of node in tree determine if there is some integer in interval that corresponds to that node
we will also store for each tree node minimum and maximum integer that is in its interval

Predecessor, successor

assume we are querying integer x and it is not in the set
take path from x to root in tree and look at first 1 on that path
let the last 0 be on node u and first 1 on node v
if u is left child of v, it means the successor must be in right child's tree and it is minimum of that subtree (since value of v is 1, there must be element in right subtree of v)
if u is right child of v, we can find predecessor of x as maximum from v's left subtree
to compute successor from predecessor and vice versa, we can simply store all elements of the structure in sorted linked list
to find 0-1 transition in x to root path, we use binary search on height
as value of each node is OR of its children, every element to root path is monotone, path starts as contiguous zeros followed by ones, so we can binary search for 0-1 transition
in RAM model, we can index directly into path, in other models (e.g. pointer machine), we must store pointers to ancestors to quickly jump in binary search
in case x is in the set, we just use forward or backward pointer
time complexity:
- as the binary tree has height w, cost of binary search is ${\mathcal {O}}(\lg w)$ or ${\mathcal {O}}(\lg \lg u)$

Updates

updating this structure is very expensive
per operation, we must update all nodes of leaf-to-root path, in ${\mathcal {O}}(\lg u)$ time

Sources

Erik Demain's lecture L11 from MIT: http://courses.csail.mit.edu/6.851/spring12/lectures/
Wikipedia article about x-fast trees: http://en.wikipedia.org/wiki/X-fast_trie
Space of vEB trees [1], [2]

Dátové štruktúry pre celočíselné kľúče

Obsah

Introduction

Models of computations

Word RAM model

Other models

Predecessor problem

van Emde Boas data structure

Hierarchical coordinates

VEB recursive structure

Operations

Successor

Insert

Delete

Simple tree view

Predecessor, successor

Updates

Sources

Navigačné menu

Osobné nástroje

Menné priestory

Varianty

Zobrazení

Operácie

Hľadať

Navigácia

Nástroje