Vybrané partie z dátových štruktúr
2-INF-237, LS 2016/17

Úvod · Pravidlá · Prednášky · Prezentácia · Ako poznámkovať · Moodle
Táto stránka sa týka školského roku 2016/17. V školskom roku 2017/18 predmet vyučuje Jakub Kováč, stránku predmetu je https://people.ksp.sk/~kuko/ds


Prioritné rady

Z VPDS
Revízia z 14:30, 20. máj 2014; Brona (Diskusia | príspevky)

(rozdiel) ← Staršia verzia | Aktuálna úprava (rozdiel) | Novšia verzia → (rozdiel)
Prejsť na: navigácia, hľadanie

Nizsie su pracovne poznamky, ktore doplnujeu informaciu z prezentacie, je mozne ich rozsirit, doplnit samotnou prezentaciou

Fibonacci heaps

Analysis

ExtractMin

  • let c be degree of z, c<=D(n) = O(log n)
  • adding c children to root list:
    • cost c
    • delta Phi = c, amortized O(log n)

Consolidate:

  • t1 original number of nodes in the root list, t2 new
  • cost: t1 + D(n)
  • delta Phi: t2-t1
  • amortized t2+D(n), but t2<=D(n), therefore O(\log n)

DecreaseKey:

  • all except cascading cut:
    • cost O(1)
    • delta Phi <= 1
    • amortized O(1)

CascadingCut:

  • each recursive call cost 1
  • last recursive call: delta Phi <= 2 (amortized cost O(1))
  • other recursive call: delta Phi = -1 (amortized cost 0)

Overall DecreaseKey O(1) amortized

Maximum degree D(n)

Proof of Lemma 3:

  • Linking done only in Consolidate for 2 roots of equal degree.
  • When y_j was linked, y_1,...y_{j-1} were children, so at that time x.degree >= j-1.
  • Therefore at that time y_j.degree>= j-1
  • Since then y_j lost at most 1 child, so y_j.degree >= j2.

Proof of Lemma 4:

  • size of a node: size of its subtree (number of descendants)
  • let s_k be the minimum size of a node with degree k in any Fibonacci heap (will prove s_k>= phi^k)
  • s_k<= s_{k+1}*
  • Let z be a node of degree k and size s_k
  • Let y_1,...y_k be its children from earliest
  • s_k = 1 + sum_{j=1}^k size(y_j) >= 1 + size(y_1} + sum_{j=2}^k s_{y_j.degree}
  • >= 2 + sum_{j=2}^k s_{i-2} (lemma 3)
  • by induction on k: s_k >= F_{k+2}
    • trivial for k=0,1
    • assume k>=2, s_i>=F_{i+2} for i<=k-1
    • s_k >= 2+ sum_{j=2}^k s_{i-2} >= 2 + sum_{j=2}^k F_{i} (IH)
    • = 1 + sum_{j=0}^k F_{i} = F_{k+2} (lemma 1)
    • >= phi^k

Proof of Corollary:

  • let x be a node of degree k.
  • by lemma 4 n>=size(x) >= phi_k
  • therefore k<= log_phi(n)

Exercise

  • What sequence of operations will create a tree of depth n?
  • How to implement increaseKey?
  • Alternative delete (see slides)

Priority queues

Review

Operations        Bin. heap    Fib. heap 
                  worst case    amortized
MakeHeap()            O(1)       O(1)   
Insert(H, x)       O(log n)      O(1)   
Minimum(H)            O(1)       O(1)   
ExtractMin(H)      O(log n)    O(log n)   
Union(H1, H2)         O(n)       O(1)   
DecreaseKey(H, x, k)
                    O(log n)     O(1)   
Delete(H, x)        O(log n)   O(log n)   

Applications of priority queues

  • job scheduling on a server (store jobs in a max-based priority queue, always select the one with highest priority - needs operations insert and extract_min, possibly also delete -needs a handle)
  • event-driven simulation (store events, each with a time when it is supposed to happen. Always extract one event, process it and as a result potentially create new events - needs insert, extract_min Examples: queue in a bank or store: random customer arrival times, random service times per customer)
  • finding intersections of line segments [1]
    • Bentley–Ottmann algorithm - sweepline: binary search tree of line intersections) plus a priority queue with events (start of segment, end of segment, crossing)
    • for n segments and k intersections do 2n+k inserts, 2n+k extractMin, overall O((n+k)log n) time


  • heapsort (builds a heap from all elements or uses n time insert, then n times extract_min -implies a lower bound on time of insert+extract_min must be at least Omega(log n) in the comparison model)
  • aside: partial sorting - given n elements, print k in sorted order - selection O(n), then sort in O(k log k), together O(n + k log k)
  • incremental sorting: given n elements, build a data structure, then support extract_min called k times, k unknown in advance, running time a function of k and n
    • call selection each time O(kn)
    • heap (binary, Fibonacci etc) O(n + k log n) = O(n + k log k) because if k>= sqrt n, log n = Theta(log k) and of k < sqrt n, O(n) term dominates


  • Kruskal's algorithm (can use sort, but instead we can use heap as well, builds heap, then calls extract_min up to m times, but possibly fewer, plus needs union-find set structure) O(m log n)
    • process edges from smallest weight upwards, if it connects 2 components, add it. Finosh when we have a tree.
  • Prim's algorithm (build heap, n times extract-min, m times decrease_key - O(m log n) with binary heaps, O(m + n log n) with Fib heaps)
    • build a tree starting from one node. For each node outside tree keep the best cost to a node in a tree as a priority in a heap
  • Dijkstra's algorithm for shortest paths (build heap, n times extract_min, m times decrease_key = the same as Prim's)
    • keep nodes in a heap which do not have final distance. Set the distance of minimum node to final, update distances of tis neighbors

Meldable heaps

  • Advantages of Fibonacci compared to binary heap: faster decreaseKey (important for Prim's and Dijkstra's), faster union
  • meldable heaps support O(log n) union, but all standard operations run in O(log n) time (except for build in O(n) and min in O(1))
  • three versions: randomized, worst-case, amortized
  • all of them use a binary tree similar to binary heap, but not always fully balanced
    • min-heap property: key of child >= key of parent
  • key operation is union
    • insert: union of a single-node heap and original heap
    • extract_min: remove root, union of its two subtrees
  • delete and decreaseKey more problematic, different for different versions, need pointer to parent
    • delete: cut away a node, union its two subtrees, hang back, may need some rebalancing
    • decrease_key: use delete and insert (or cut away a node, decrease its key, union with the original tree)

Random meldable heap

Union(H1, H2) {
    if (H1 == null) return H2;
    if (H2 == null) return H1;
    if (H1.key > H2.key) { exchange H1 and H2 }
    // now we know H1.key <= H2.key
    if (rand() % 2==1) {
       H1.left = merge(H1.left, H2);
    } else {
       H1.right = merge(H1.right, H2);
    }
    return H1;
}

Analysis:

  • random walk in each tree from root to a null pointer (external node)
  • in each recursive call choose H1 or H2 based on key values, choose direction left or right randomly
  • running time proportional to the sum of lengths of these random walks

Lemma: The expected length of a random walk in a any binary tree with n nodes is at most lg(n+1).

  • let W be the length of the walk, E[W] its expected value.
  • Induction on n. For n=0 we have E[W] = 0 = lg(n+1). Assume the lemma hold for all n'<n
  • Let N1 be the size of left subtree, n2 the size of right subtree. n = n1 + n2 + 1
  • By the induction hypothesis E[W] <= 1 + (1/2)lg(n1+1) + (1/2)lg(n2+1)
  • lg is a concave function, meaning that (lg(x)+lg(y))/2 <= lg((x+y)/2)
  • E[W] <= 1 + lg((n1+n2+2)/2) = lg(n1+n2+2) = lg(n1+1)

Corollary: Time of Union is O(log n)

A different proof of the Lemma via entropy

  • let di be the depth of ith external node
  • binary tree with n nodes has n+1 external nodes
  • probability of random walk reaching external node i is pi = 2^{-di}, and so di = lg(1/pi)
  • E[W] = sum_{i=0}^n d_i p_i = sum_{i=0}^n p_i lg(1/pi)
  • we have obtained entropy of a distribution over n+1 values which is at most lg(n+1)

Leftist heaps

  • Mehta and Sahni Chap. 5
  • Crane, Clark Allan. "Linear lists and priority queues as balanced binary trees." (1972) (TR and thesis)
  • let s(x) be distance from x to nearest external node
    • s(x)=0 for external node x and s(x) = min(s(x.left), s(x.right))+1 for internal node x
  • height-biased leftist tree is a binary tree in which for every internal node x we have s(x.left)>=s(x.right)
Union(H1, H2) {
    if (H1 == null) return H2;
    if (H2 == null) return H1;
    if (H1.key > H2.key) { exchange H1 and H2 }
    // now we know H1.root.key <= H2.root.key
    HN = Union(H1.right, H2);
    if(HN.s <= H1.s) {
       return new node(H1.data, H1.left, HN) 
    } else {
       return new node(H1.data, HN, H1.left) 
    }
}
  • note: s value kept in each node,
  • recursion descends only along right edges

Lemma: (a) subtree with root x has at least 2^{s(x)}-1 nodes (b) if a subtree with root x has n nodes, s(x)<=lg(n+1) (c) the length of the path from node x using only edges to right child is s(x)

  • (a) all levels up to s(x) are filled with nodes, therefore the number of nodes is at least 1+2+4+...+2^{s(x)-1} = 2^{s(x)}-1
  • (b) an easy corollary of a (n>=2^s-1 => s<-lg(n+1))
  • (c) implied by height-biased leftist tree property

Corollary: Union takes O(log n) time

  • recursive calls along right edges from both roots - at most s(x)<=lg(n+1) such edges in a tree of size n

Skew Heaps

  • self-adjusting version of leftist heaps, no balancing values stored in nodes
  • Mehta and Sahni Chap. 6
Union(H1, H2) {
    if (H1 == null) return H2;
    if (H2 == null) return H1;
    if (H1.key > H2.key) { exchange H1 and H2 }
    // now we know H1.root.key <= H2.root.key
    HN = Union(H1.right, H2);
    return new node(H1.data, HN, H1.left) 
}

Recall: Heavy-light decomposition from Link/cut trees

  • D(x): number of descendants of node x, including x (size of a node)
  • An edge from v to its parent p is called heavy if D(v)> D(p)/2, otherwise it is light.

Observations:

  • each path from v to root at most lg n light edges because with each light edge the size of the subtree goes down by at least 1/2
  • each node has at most one child connected to it by a heavy edge

Potential function: the number of heavy right edges (right edge is an edge to a right child)

Amortized analysis of union:

  • Consider one recursive call. Let left child of H1 be A, right child B
  • Imagine cutting edges from root of H1 to it children A and B before calling recursion and creating new edges to Union(B,H2) and A after recursive call
  • Let real cost of one recursive call be 1
  • If edge to H1.B was heavy
    • cutting edges has delta Phi = -1
    • adding edges back does not create a heavy right edge, therefore delta Phi = 0
    • amortized cost is 0
  • If edge to H1.B was light
    • cutting edges has delta Phi = 0
    • adding edges creates at most one heavy right edge, therefore delta Phi <= 1
    • amortized cost is <= 2

All edges light edges we encounter in the algorithm are on the rightmost path in one of the two heaps, which means there are at most 2 lg n of the, (lg n in each heap). This gives is amortized cost at most 4 lg n.

Pairing heaps

  • Simple and fast in practice
  • hard to analyze amortized time
  • Fredman, Michael L.; Sedgewick, Robert; Sleator, Daniel D.; Tarjan, Robert E. (1986), "The pairing heap: a new form of self-adjusting heap", Algorithmica 1 (1): 111–129
  • Stasko, John T.; Vitter, Jeffrey S. (1987), "Pairing heaps: experiments and analysis", Communications of the ACM 30 (3): 234–249
  • Moret, Bernard M. E.; Shapiro, Henry D. (1991), "An empirical analysis of algorithms for constructing a minimum spanning tree", Proc. 2nd Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science 519, Springer-Verlag, pp. 400–411
  • Pettie, Seth (2005), "Towards a final analysis of pairing heaps", Proc. 46th Annual IEEE Symposium on Foundations of Computer Science, pp. 174–183
  • a tree with arbitrary degrees
  • each node pointer to the first child and next sibling (for delete and decreaseKey also pointer to previous sibling or parent if first child)
  • each child has key >= key of parent
  • linking two trees H1 and H2 s.t. H1 has smaller key:
    • make H2 the first child of H1 (original first child of H1 is sibling of H2)
    • subtrees thus ordered by age of linking from youngest to oldest
  • extractMin is the most complex operation, the rest quite lazy
  • insert: make a new tree of size 1, link with H
  • union: link H1 and H2
  • decreaseKey: cut x from its parent, decrease its key and link to H
  • delete: cut x from parent, do extractMin on this tree, then link result to H

ExtractMin

  • remove root, combine resulting trees to a new tree (actual cost up to n)
  • start by linking pairs (1st and 2nd, 3rd and 4th,...), last may remain unpaired
  • finally we link each remaining tree to the last one, starting from next-to-last back to the first (in the opposite order)
    • other variants of linking were studied as well

O(log n) amortized time can be proved by a similar analysis as in splay trees

  • imagine as binary tree - rename first child to left and next sibling to right
  • D(x): number of descendants of node x, including x (size of a node)
  • r(x): lg (D(x)) rank of a node
  • potential of a tree - sum of ranks of all nodes
  • details in Fredman et al 1986
  • Better bounds in Petie 2005: amortized O(log n) ExtractMin; O(2^{2\sqrt{log log n}}) insert, union, decreaseKey; but decreaseKey not constant as in Fibonacci heaps)

Soft heap

  • http://en.wikipedia.org/wiki/Soft_heap
  • Chazelle, B. 2000. The soft heap: an approximate priority queue with optimal error rate. J. ACM 47, 6 (Nov. 2000), 1012-1027.
  • Kaplan, H. and Zwick, U. 2009. A simpler implementation and analysis of Chazelle's soft heaps. In Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms (New York, New York, January 4––6, 2009). Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, 477-485. pdf
  • provides constant amortized time for insert, union, findMin, delete
    • but corrupts (increases) keys of up to epsilon*n elements, where epsilon is a fixed parameter in (0,1/2) and n is the number of insertions so far
    • useful to obtain minimum spanning tree in O(m alpha(m,n)) where alpha is the inverse of Ackermann's function