Amortizované vyhľadávacie stromy a link-cut stromy

Predbežná verzia poznámok, nebojte sa upravovať: pripojiť informácie z prezentácie, pridať viac vysvetľujúceho textu a pod.

Obsah

[skryť]

1 Splay trees
2 Link-cut trees

Splay trees

intro on slides

Intuition and examples

if height of tree is large, search is expensive, we use potential
potential of path-like trees should be large, balanced trees smaller
what happens if we apply splay on a leaf on a path?
side benefit: if we search for some nodes more often, they tend to be near the root

Amortized analysis of splaying

real cost: number of rotations
D(x): number of descendants of node x, including x (size of a node)
r(x): log_2 (D(x)) rank of a node (we will denote log_2 by lg)
$\Phi (T)=\sum _{{x}}r(x)$
Exercise: asymptotically estimate potential of a path and balanced tree with n nodes

Lemma1 (proof later): consider one step of splaying x (1 or 2 rotations)

let r(x) be rank of x before splaying, r'(x) after splaying
amortized cost of one step of splaying is at most
- 3(r'(x)-r(x)) for zig-zag and zig-zig
- 3(r'(x)-r(x))+1 for zig

Lemma2: Amortized cost of splaying x to the root in a tree with n nodes is O(log n)

Let r_i be rank of x after i steps of splaying, let k be the number of steps
amortized cost is at most
- (use Lemma 1, +1 is from the last step of splaying)
telescopic sum, terms cancel, we get $3r_{k}-3r_{0}+1=O(\log n)$

Theorem: Amortized cost of insert, search and delete in splay tree is O(log n)

search:
- walk down to x, then splay x (if x not found, splay the last visited node)
- amortized cost is O(log n)
insert:
- insert x as in normal binary search tree, then splay x.
- Inserting increase rank of all nodes on the path P from root r to x
- consider node v and its parent p(v) on this path P
- D'(v) = 1 + D(v) <= D(p(v))
- r'(v) <= r(p(v))
- change in potential $\sum _{{v}}r'(v)-\sum _{{v}}r(v)\leq r'(r)+\sum _{{v\in P,v\neq r}}r(p(v))-r(v)=2r'(r)=O(\log n)$
- amortized cost of insert + splaying is thus O(log n)
delete:
- delete x as from binary search tree (may in fact delete node for predecessor or successor of x), then splay parent of deleted node
- deletion itself decreases potential, amortized cost of splaying is O(log n)
- amortized cost is O(log n)
overall time in each operation proportional to the length of the path traversed and therefore to the number of rotations done in splaying
- therefore amortized time O(log n) for each operation.

Proof of Lemma 1:

case zig: cost 1, r'(x) = r(y), all nodes except x and y keep their ranks
- amortized cost $1+r'(x)+r'(y)-r(x)-r(y)=1+r'(y)-r(x)<=1+r'(x)-r(x)<=1+3(r'(x)-r(x))$
case zig-zig: cost 2,
- amortized cost $2+r'(x)+r'(y)+r'(z)-r(x)-r(y)-r(z)$
- we have r'(x) = r(z), r'(y)<=r'(x), r(y)>=r(x)
- therefore am. cost at most $2+r'(x)+r'(z)-2r(x)$
- want 3(r'(x)-r(x)) therefore we need 2 <= 2r'(x)-r(x)-r'(z) or (r(x)-r'(x))+(r'(z)-r'(x))<=-2 or lg (D(x)/D'(x)) + lg (D'(z)/D'(x)) <= -2
- observe that D(x)+D'(z) <= D'(x)
- if we have a,b>0 s.t. a+b<=1, then lg a + lg b has maximum -2 at a=b=1/2.
  - how would you go about proving this?
  - binary logarithm is concave, i.e. lg(ta+(1-t)b)>= t lg(a)+(1-t)lg(b) for every t in [0,1]
  - set t = 1/2, (lg(a)+lg(b))/2 <= lg((a+b)/2) <= lg(1/2) = -1
- here a = D(x)/D'(x), b = D'(z)/D'(x)
case zig-zag is analogous
- amortized cost $2+r'(x)+r'(y)+r'(z)-r(x)-r(y)-r(z)$
- r'(x) = r(z), r(x)<=r(y)
- therefore am. cost at most $2+r'(y)+r'(z)-2r(x)$
- we will prove that 2 <= 2r'(x)-r'(y)-r'(z) and this will imply amortized cost is at most 2(r'(x)-r(x))
- we have D'(y)+D'(z) <= D'(x), use the same logarithmic trick as before

Collection of splay trees

Image now a collection of splay trees. The following operations can be done in O(log n) amortized time (each operation gets pointer to a node)

findMin(v): find minumum element in a tree containing v and make it root of that tree
- splay v, follow left pointers to min m, splay m, return m
join(v1, v2): all elements in tree of v1 must be smaller than all elements in tree of v2. Join these two trees into one
- m = findMin(v2), splay(v1), connect v1 as a left child of m, return m
- connecting increases rank of m by at most log n
splitAfter(x) - split tree containing v into 2 trees, one containing keys <=x, one containing keys > x, return root of the second tree
- splay(v), then cut away its right child and return it
- decreases rank of v
splitBefore(v) - split tree containing v into 2 trees, one containing keys <x, one containing keys >=x, return root of the first tree
- analogous, cut away left child

Link-cut trees

Union/find

Maintains a collection of disjoint sets, supports operations

union(v, w): connects sets containing v and w
find(v): returns representative element of set containing v (can be used to test of v and w are in the same set)

Can be used to maintain connected components if we add edges to the graph, also useful in Kruskal's algorithm for minimum spanning tree

Implementation

each set a (non-binary) trees, in which each node v has a pointer to its parent v.p
find(v) follows pointers to the root of the tree, return the root
union calls find for v and w and joins one with the other
if we keep track of tree height and always join shorter tree below higher tree plus do path compression, we get amortized time O(alpha(m+n,n)) where alpha is inverse ackermann function, extremely slowly growing (n = number of elements, m = number of queries)

Exercise: implement as a collection of splay trees

hint: want to use join for union, but what about the keys?
splay trees actually do not need to store key values if we do not search

Link/cut trees

Maintains a collection of disjoint rooted trees on n nodes

findRoot(v) find root of tree containing v
link(v,w) v a root, w not in tree of v, make v a child of w
cut(v) cut edge connecting v to its parent (v not a root)

O(log n) amortized per operation. We will show O(log^2) amortized time. Can be also modified to support these operations in worst-case O(log n) time, add more operations e.g. with weights in nodes.

Similar as union-find, but adds cut plus we cannot rearrange trees as needed as in union-find

Simpler version for paths

Simpler version for paths (imagine them as going upwards):

findPathHead(v) - highest element on path containing v
linkPaths(v, w) - join paths containing v and w (head of v's path will remain head)
splitPathAbove(v) - remove edge connecting v to its parent p, return some node in the path containing p
splitPathBelow(v) - remove edge connecting v to its chold c, return some node in the path containing c

Can be done by splay trees

keep each path in a splay tree, using position as a key (not stored anywhere)
findPathHead(v): findMin(v)
linkPaths(v, w): join(v, w)
splitPathAbove(v): splitBefore(v)
splitPathBelow(v): splitAfter(v)

Back to link/cut trees

separate edges in each tree into dashed and solid, among children of each node at most one connected by a solid edge
series of solid paths (some solid paths contain only a single vertex)
each solid path represented as a splay tree as above
each node remembers the dashed edge going to its parent (if any) - only in the head of a solid path
expose(v): make v the lower end of a solid path to root (make all edges on the path to root solid and make other edges dashed as necessary)
- assume can be done in O(log^2 n) amortized time
findRoot(v): findPathHead(expose(v))
link(v,w):
- expose(v) note: v is now 1-node solid path because v is a root
- expose(w)
- linkPaths(w, v)
cut(v):
- expose(v) note: v not a root and thus connected to its parent by a solid edge
- splitPathAbove(v)

expose(v)

series of splices where each splice adds one solid edge on the path from v to root

 
y = cutPathBellow(v);
if(y!=NULL) findPathHead(y).dashed = v;
while(true) {
  x = findPathHead(v);
  w = x.dashed;
  if(w == NULL) break;
  x.dashed = NULL;
  q = splitPathBelow(w);
  if(q != NULL) {
    findPathHead(q).dashed = w;
  }
  linkPaths(w, x); v = x;
}

we will prove that m link-cut operations with n nodes cause O(m log n) splices which means O(log n) splices amortized per expose
each splice O(log n) amortized time, therefore O(log^2 n) amortized time per link-cut operation

Heavy-light decomposition

edge from v to its parent p is called heavy if D(v)> D(p)/2, otherwise it is light. Observations:

each path from v to root at most lg n light edges because with each light edge the size of the subtree goes down by at least 1/2
each node has at most one child connected to it by a heavy edge

Proof that there are O(log n) splices amortized per expose

potential function: number of heavy dashed edges
cost: splice
assume expose creates L new light solid edges and H new solid heavy edges.
real cost L+H
it creates at most L+1 new heavy dashed edges (only splices which create new light solid edge can create heavy dashed edge plus 1 from child of v)
removes H heavy dashed edges (they become solid by splice)
change of potential <= L+1-H
amortized cost 2L+1 = O(log n) because all new light solid edges are light edge on a path from v to root
some operations change the tree and create new heavy dashed edges from light dashed edges
- consider edge from v to its parent p
- it can become heavy if v becomes heavier by linking in its subtree
  - this never happens because edge v->p would be solid before linking
- it can also become heavy if some other child x of p becomes lighter by cutting something in its subtree
  - after cuting, x is connected to p by a light edge
  - there are at most lg n vertices on the path exposed by cutting that are light after the cut and can play role of x
  - each x has at most one sibling v that becomes heavy
  - therefore there are at most lg new heavy dashed edges, which adds O(log n) splices to amortized cost of cuting

Fully dynamic connectivity

insert delete edges, insert delete isolated vertices, test if there is a path from v to w O(log^2 n) amortized
- Holm, Jacob, Kristian De Lichtenberg, and Mikkel Thorup. "Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity." Journal of the ACM (JACM) 48, no. 4 (2001): 723-760.
- uses link-cut trees as components
- good idea for presentation?
if no delete, can use union-find

Applications

lca least common ancestor:
- expose(v)
- expose(w)
- u = findPathHead(v)
- if u==root return v else return u.parent
- we have seen O(1) lca after preprocessing a static tree, this would be O(log n) dynamic
At the time of publication improved best known running time of maximum flow problem from O(nm log^2 n) to O(nm log n). Since then some improvements.

Amortizované vyhľadávacie stromy a link-cut stromy

Obsah

Splay trees

Intuition and examples

Amortized analysis of splaying

Collection of splay trees

Link-cut trees

Union/find

Link/cut trees

Simpler version for paths

Back to link/cut trees

Heavy-light decomposition

Proof that there are O(log n) splices amortized per expose

Fully dynamic connectivity

Applications

Navigačné menu

Osobné nástroje

Menné priestory

Varianty

Zobrazení

Operácie

Hľadať

Navigácia

Nástroje