Vybrané partie z dátových štruktúr
2-INF-237, LS 2016/17

Úvod · Pravidlá · Prednášky · Prezentácia · Ako poznámkovať · Moodle
Táto stránka sa týka školského roku 2016/17. V školskom roku 2017/18 predmet vyučuje Jakub Kováč, stránku predmetu je https://people.ksp.sk/~kuko/ds


Amortizované vyhľadávacie stromy a link-cut stromy

Z VPDS
Prejsť na: navigácia, hľadanie

Predbežná verzia poznámok, nebojte sa upravovať: pripojiť informácie z prezentácie, pridať viac vysvetľujúceho textu a pod.

Splay trees

  • intro on slides

Splay1.png Splay2.png Splay3.png

Intuition and examples

  • if height of tree is large, search is expensive, we use potential
  • potential of path-like trees should be large, balanced trees smaller
  • what happens if we apply splay on a leaf on a path?
  • side benefit: if we search for some nodes more often, they tend to be near the root

Amortized analysis of splaying

  • real cost: number of rotations
  • D(x): number of descendants of node x, including x (size of a node)
  • r(x): log_2 (D(x)) rank of a node (we will denote log_2 by lg)
  • \Phi (T)=\sum _{{x}}r(x)
  • Exercise: asymptotically estimate potential of a path and balanced tree with n nodes

Lemma1 (proof later): consider one step of splaying x (1 or 2 rotations)

  • let r(x) be rank of x before splaying, r'(x) after splaying
  • amortized cost of one step of splaying is at most
    • 3(r'(x)-r(x)) for zig-zag and zig-zig
    • 3(r'(x)-r(x))+1 for zig

Lemma2: Amortized cost of splaying x to the root in a tree with n nodes is O(log n)

  • Let r_i be rank of x after i steps of splaying, let k be the number of steps
  • amortized cost is at most 1+\sum _{{i=1}}^{k}3(r_{i}-r_{{i-1}})
    • (use Lemma 1, +1 is from the last step of splaying)
  • telescopic sum, terms cancel, we get 3r_{k}-3r_{0}+1=O(\log n)

Theorem: Amortized cost of insert, search and delete in splay tree is O(log n)

  • search:
    • walk down to x, then splay x (if x not found, splay the last visited node)
    • amortized cost is O(log n)
  • insert:
    • insert x as in normal binary search tree, then splay x.
    • Inserting increase rank of all nodes on the path P from root r to x
    • consider node v and its parent p(v) on this path P
    • D'(v) = 1 + D(v) <= D(p(v))
    • r'(v) <= r(p(v))
    • change in potential \sum _{{v}}r'(v)-\sum _{{v}}r(v)\leq r'(r)+\sum _{{v\in P,v\neq r}}r(p(v))-r(v)=2r'(r)=O(\log n)
    • amortized cost of insert + splaying is thus O(log n)
  • delete:
    • delete x as from binary search tree (may in fact delete node for predecessor or successor of x), then splay parent of deleted node
    • deletion itself decreases potential, amortized cost of splaying is O(log n)
    • amortized cost is O(log n)
  • overall time in each operation proportional to the length of the path traversed and therefore to the number of rotations done in splaying
    • therefore amortized time O(log n) for each operation.

Proof of Lemma 1:

  • case zig: cost 1, r'(x) = r(y), all nodes except x and y keep their ranks
    • amortized cost 1+r'(x)+r'(y)-r(x)-r(y)=1+r'(y)-r(x)<=1+r'(x)-r(x)<=1+3(r'(x)-r(x))
  • case zig-zig: cost 2,
    • amortized cost 2+r'(x)+r'(y)+r'(z)-r(x)-r(y)-r(z)
    • we have r'(x) = r(z), r'(y)<=r'(x), r(y)>=r(x)
    • therefore am. cost at most 2+r'(x)+r'(z)-2r(x)
    • want 3(r'(x)-r(x)) therefore we need 2 <= 2r'(x)-r(x)-r'(z) or (r(x)-r'(x))+(r'(z)-r'(x))<=-2 or lg (D(x)/D'(x)) + lg (D'(z)/D'(x)) <= -2
    • observe that D(x)+D'(z) <= D'(x)
    • if we have a,b>0 s.t. a+b<=1, then lg a + lg b has maximum -2 at a=b=1/2.
      • how would you go about proving this?
      • binary logarithm is concave, i.e. lg(ta+(1-t)b)>= t lg(a)+(1-t)lg(b) for every t in [0,1]
      • set t = 1/2, (lg(a)+lg(b))/2 <= lg((a+b)/2) <= lg(1/2) = -1
    • here a = D(x)/D'(x), b = D'(z)/D'(x)
  • case zig-zag is analogous
    • amortized cost 2+r'(x)+r'(y)+r'(z)-r(x)-r(y)-r(z)
    • r'(x) = r(z), r(x)<=r(y)
    • therefore am. cost at most 2+r'(y)+r'(z)-2r(x)
    • we will prove that 2 <= 2r'(x)-r'(y)-r'(z) and this will imply amortized cost is at most 2(r'(x)-r(x))
    • we have D'(y)+D'(z) <= D'(x), use the same logarithmic trick as before

Collection of splay trees

Image now a collection of splay trees. The following operations can be done in O(log n) amortized time (each operation gets pointer to a node)

  • findMin(v): find minumum element in a tree containing v and make it root of that tree
    • splay v, follow left pointers to min m, splay m, return m
  • join(v1, v2): all elements in tree of v1 must be smaller than all elements in tree of v2. Join these two trees into one
    • m = findMin(v2), splay(v1), connect v1 as a left child of m, return m
    • connecting increases rank of m by at most log n
  • splitAfter(x) - split tree containing v into 2 trees, one containing keys <=x, one containing keys > x, return root of the second tree
    • splay(v), then cut away its right child and return it
    • decreases rank of v
  • splitBefore(v) - split tree containing v into 2 trees, one containing keys <x, one containing keys >=x, return root of the first tree
    • analogous, cut away left child


Link-cut trees

Union/find

Maintains a collection of disjoint sets, supports operations

  • union(v, w): connects sets containing v and w
  • find(v): returns representative element of set containing v (can be used to test of v and w are in the same set)

Can be used to maintain connected components if we add edges to the graph, also useful in Kruskal's algorithm for minimum spanning tree

Implementation

  • each set a (non-binary) trees, in which each node v has a pointer to its parent v.p
  • find(v) follows pointers to the root of the tree, return the root
  • union calls find for v and w and joins one with the other
  • if we keep track of tree height and always join shorter tree below higher tree plus do path compression, we get amortized time O(alpha(m+n,n)) where alpha is inverse ackermann function, extremely slowly growing (n = number of elements, m = number of queries)

Exercise: implement as a collection of splay trees

  • hint: want to use join for union, but what about the keys?
  • splay trees actually do not need to store key values if we do not search

Link/cut trees

Maintains a collection of disjoint rooted trees on n nodes

  • findRoot(v) find root of tree containing v
  • link(v,w) v a root, w not in tree of v, make v a child of w
  • cut(v) cut edge connecting v to its parent (v not a root)

O(log n) amortized per operation. We will show O(log^2) amortized time. Can be also modified to support these operations in worst-case O(log n) time, add more operations e.g. with weights in nodes.

Similar as union-find, but adds cut plus we cannot rearrange trees as needed as in union-find

Simpler version for paths

Simpler version for paths (imagine them as going upwards):

  • findPathHead(v) - highest element on path containing v
  • linkPaths(v, w) - join paths containing v and w (head of v's path will remain head)
  • splitPathAbove(v) - remove edge connecting v to its parent p, return some node in the path containing p
  • splitPathBelow(v) - remove edge connecting v to its chold c, return some node in the path containing c

Can be done by splay trees

  • keep each path in a splay tree, using position as a key (not stored anywhere)
  • findPathHead(v): findMin(v)
  • linkPaths(v, w): join(v, w)
  • splitPathAbove(v): splitBefore(v)
  • splitPathBelow(v): splitAfter(v)

Back to link/cut trees

Link1.png

  • separate edges in each tree into dashed and solid, among children of each node at most one connected by a solid edge
  • series of solid paths (some solid paths contain only a single vertex)
  • each solid path represented as a splay tree as above
  • each node remembers the dashed edge going to its parent (if any) - only in the head of a solid path
  • expose(v): make v the lower end of a solid path to root (make all edges on the path to root solid and make other edges dashed as necessary)
    • assume can be done in O(log^2 n) amortized time
  • findRoot(v): findPathHead(expose(v))
  • link(v,w):
    • expose(v) note: v is now 1-node solid path because v is a root
    • expose(w)
    • linkPaths(w, v)
  • cut(v):
    • expose(v) note: v not a root and thus connected to its parent by a solid edge
    • splitPathAbove(v)

expose(v)

  • series of splices where each splice adds one solid edge on the path from v to root
 
y = cutPathBellow(v);
if(y!=NULL) findPathHead(y).dashed = v;
while(true) {
  x = findPathHead(v);
  w = x.dashed;
  if(w == NULL) break;
  x.dashed = NULL;
  q = splitPathBelow(w);
  if(q != NULL) {
    findPathHead(q).dashed = w;
  }
  linkPaths(w, x); v = x;
}
  • we will prove that m link-cut operations with n nodes cause O(m log n) splices which means O(log n) splices amortized per expose
  • each splice O(log n) amortized time, therefore O(log^2 n) amortized time per link-cut operation

Heavy-light decomposition

edge from v to its parent p is called heavy if D(v)> D(p)/2, otherwise it is light. Observations:

  • each path from v to root at most lg n light edges because with each light edge the size of the subtree goes down by at least 1/2
  • each node has at most one child connected to it by a heavy edge

Proof that there are O(log n) splices amortized per expose

  • potential function: number of heavy dashed edges
  • cost: splice
  • assume expose creates L new light solid edges and H new solid heavy edges.
  • real cost L+H
  • it creates at most L+1 new heavy dashed edges (only splices which create new light solid edge can create heavy dashed edge plus 1 from child of v)
  • removes H heavy dashed edges (they become solid by splice)
  • change of potential <= L+1-H
  • amortized cost 2L+1 = O(log n) because all new light solid edges are light edge on a path from v to root
  • some operations change the tree and create new heavy dashed edges from light dashed edges
    • consider edge from v to its parent p
    • it can become heavy if v becomes heavier by linking in its subtree
      • this never happens because edge v->p would be solid before linking
    • it can also become heavy if some other child x of p becomes lighter by cutting something in its subtree
      • after cuting, x is connected to p by a light edge
      • there are at most lg n vertices on the path exposed by cutting that are light after the cut and can play role of x
      • each x has at most one sibling v that becomes heavy
      • therefore there are at most lg new heavy dashed edges, which adds O(log n) splices to amortized cost of cuting

Fully dynamic connectivity

  • insert delete edges, insert delete isolated vertices, test if there is a path from v to w O(log^2 n) amortized
    • Holm, Jacob, Kristian De Lichtenberg, and Mikkel Thorup. "Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity." Journal of the ACM (JACM) 48, no. 4 (2001): 723-760.
    • uses link-cut trees as components
    • good idea for presentation?
  • if no delete, can use union-find

Applications

  • lca least common ancestor:
    • expose(v)
    • expose(w)
    • u = head(v)
    • if u==root return v else return u.parent
    • we have seen O(1) lca after preprocessing a static tree, this would be O(log n) dynamic
  • At the time of publication improved best known running time of maximum flow problem from O(nm log^2 n) to O(nm log n). Since then some improvements.