Vybrané partie z dátových štruktúr
2-INF-237, LS 2016/17
RMQ a LCA
Z VPDS
Predbežná verzia poznámok, nebojte sa upravovať: pripojiť informácie z prezentácie, pridať viac vysvetľujúceho textu a pod.
Obsah
Introduction to LCA, RMQ
- v is ancestor of u if it is on the path from u to the root
- lca(u,v): node of greatest depth in the intersection of the set of ancestors of u and the set of ancestors of v
- note that a node is ancestor of itself and if u is ancestor of v, then lca(u,v)=u
Task: preprocess tree T in O(n), answer lca(u,v) in O(1)
History: Harel and Tarjan 1984, Schieber a Vishkin 1988 (Gusfield book), Bender and Farach-Colton 2000 (this lecture)
LCA via RMQ
- zname algoritmy a triv. algoritmy na slajde
- definicia poli V, D, R + ich vytvorenie slajdy
- V -- visited nodes
- D -- their depths
- R -- first occurrence of node in V
- assume that LCA(u,v) = x (fig.) and that R[u]<=R[v]
- tree traversal visits x, then subtree with u, then again x, then subtree with v, then again x, and only after that it can get above x
- D[x] = min D[R[u]..R[v]]
- this holds even if LCA(u,v)=u (LCA(u,v) cannot be v if R[u]<R[v])
- if we can quickly find k = arg min_k in {i..j} D[k], the result is V[k]
- we have trasformed LCA to a new problem
Range minimum query RMQ
- goal: preprocess array A so that we can any two indices i and j quickly find index of minimum in A[i..j]
- Alg 1: no preprocessing, O(n) query: find minimum in interval
- Alg 2: O(n^2) preprocessing, O(1) query: precompute for all (i,j) in a table
- Alg 3: O(n log n) preprocessing, O(1) query: store minimum only for intervals of length 2^k
- M[i,k]: index of minimum in A[i..i+2^k-1] for k=1,2,...,floor(log n)
i 0 1 2 3 4 5 6 7 8 9 10 11 12 A[i] 0 1 2 1 2 3 2 1 0 1 0 1 0 -------------------------------------------- k=1 0 1 3 3 4 6 7 8 0 10 10 12 - k=2 0 1 3 3 7 8 8 8 8 10 - - - k=3 0 8 8 8 8 8 - - - - - - -
- Preprocessing M[i,k]: if A[M[i,k-1]]<A[M[i+2^{k-1},k-1]] then M[i,k]=M[i,k-1], else M[i,k]=M[i+2^{k-1},k-1]
- RMQ(i,j):
- let k=floor(log_2(j-i+1)) (max size of a block inside A[i..j]), floor(log x) can be precomputed for i=1..n
- if A[M[i,k]]<A[M[j-2^k+1,k]] return M[i,k] else return M[j-2^k+1,k] (figure of overlapping intervals)
+-1RMQ
- v poli D pre LCA plati, ze susedne cisla sa lisia o 1 D[i]-D[i+1] in {-1,1}
- Ukazeme si lepsi RMQ pre polia majuce tuto vlastnost, tento problem volame +-1RMQ
- rozdelime A na bloky velkosti m = log_2 n/2
- v kazdom bloku najdeme minimum, minima dame do pola A'[0..n'-1] a polohu tychto minim do pola M'[0..n'-1] kde n' = n/m = 2n/log_2 n
- pole A' predspracujeme pomocou alg 3 v case O(n'log n') = O(n)
- pri RMQ uvazujeme dva pripady (obrazok):
- i a j v roznych blokoch (bi a bj)
- spocitaj RMQ v A'[bi+1..bj-1] O(1)
- spocitaj minimum v bloku bi od i po koniec bloku (neskor)
- spocitaj minimum v bloku bj od zaciatku bloku po j (neskor)
- najdi minimum tychto troch cisel
- i a j v tom istom bloku
- spocitaj minimum medzi dvoma indexami v ramci bloku
- i a j v roznych blokoch (bi a bj)
- potrebujeme teda riesit +-1RMQ v ramci blokov
- reprezentacia bloku: prvy prvok + rozdiely medzi susedmi, napr 2,3,2,1 -> 2,+1,-1,-1
- na urcenie polohy minima nepotrebujeme vediet prvy provok, staci iba m-1 rozdielov +1,-1,-1
- kazdy blok sa teda da zapisat ako binarne cislo s m-1 bitmi
- najviac 2^{m-1}<=sqrt(n) roznych typov blokov
- pre kazdy typ predpocitaj vsetky odpovede (alg. 2) v case O(m^2)
- spolu teda O(2^{m-1} m^2) = O(sqrt(n)*log(n)^2) = o(n)
+-1RMQ predpracovanie:
- spocitaj typ kazdeho bloku, uloz do pola O(n)
- pre kazdy typ spocitaj odpovede o(n) (v priemere sa jeden typ opakuje cca sqrt(n) krat, takze usetrime)
- spocitaj pole A' O(n)
- predspracuj pole A' O(n)
spolu O(n)
RMQ(i,j):
- 2x minimum v bloku O(1)
- 1x minimum v A' O(1)
spolu O(1)
LCA predspracovanie:
- prehladanie stromu a vytvorenie poli O(n)
- +-1 RMQ predspracovanie O(n)
LCA(u,v) 2 pristupy do pola R, +-1RMQ na poli D, pristup do pola V O(1)
RMQ via LCA
- vieme teda dobre riesit +-1RMQ a pomocou neho aj LCA
- vseobecne RMQ vieme riesit pomocou LCA
- z pola A vytvorime karteziansky strom:
- v koreni ma polohu i najmensieho cisla
- v lavom podstrome rekurzive definovany strom pre cast pola a[0..i-1]
- v pravom podstrome rekurzivne definovany strom pre cast pola a[i+1..n]
- strom sa da vytvorit v O(n):
- ak uz mame strom pre a[0..i-1] a pridavame a[i], a[i] bude niekde na najpravejsej ceste
- ideme zo spodu po tejto ceste a najdeme prvy vrchol j, pre ktory a[j]<a[i]
- i dame ako prave dieta j, stare prave dieta j dame ako lave dieta i
- kazdy vrchol raz pridame do cesty a raz uberieme (vsetky vrcholy, ktore sme presli pri hladani j uz nie su na pravej ceste), teda celkovo O(n)
- potential function: number of vertices on the rightmost path
- RMQ(i,j) = LCA(ij)
- vezmime si uplne minimum v poli na poz. k: pre ktore dvojice bude vyslekodm RMQ(i,j)?
- pre tie, kde i<=k, j>=k, pre tieto je k aj LCA
- ostatne dvojice su bude uplne nalavo od k alebo uplne napravo, tam je strom definovany rekurzivne a teda tiez funguje
Segment trees
- root correspods to interval [0,n)
- leafs correspond to intervals of length 1, [i,i+1)
- if node corresponds to [i,j)
- left child corresponds to [i,k), right child to [k,j) where k = floor((i+j)/2)
- overall the number of nodes roughly 2n, height ceiling(lg n)
- each interval contains the result for its range (minimum, sum,...)
Decompose query interval [x,y) to a set of disjoint tree intervals (canonical decomposition):
- let current node has interval [i,j) and its left child interval [i,k)
- invariant: [i,j) overlaps with [x,y)
- if [i,j) is a subset of [x,y), include [i,j) in the decomposition and stop
- if [i,k) overlaps with [x,y), recurse on left child
- if [k,j) overlaps with [x,y), recurse on right child
Tree of recursive calls:
- sometimes recursion on only one child, sometimes both (split)
- first follow a single path from the root, each time moving to the left or right child
- last node before first split: smallest interval in the tree which contains whole [x,y), lca of leaves [x,x+1), [y,y+1)
- first split to both left and right: spliting point k inside [x,y)
- after that in the left branch [x,y) always covers right endpoint, therefore even if we split, right recursive calls will terminate in the next step with case 1
- similarly in the right branch [x,y) covers left endpoint, therefore left calls will terminate
- together at most 2 traversals from root to leaf plus a short call in each step, which means at most 4 ceil(lg n) vertices visited
Segment trees can also support dynamic case where we add operation set(i,x) which sets A[i] = x
- this would require large changes in prefix sum or O(n log n) RMQ data structures
- but here only O(log n) vertices on the path from the changed leaf to the root are affected
Print small
- mame pole cisel A, predspracovane pre RMQ
- chceme vypisat vsetky indexy k in {i..j} take ze A[i]<=x
void small(i,j,x) { if(j<i) return; k = rmq(i,j); if(a[k]<=x) { print k; small(i,k-1); small(k+1,j); } }
- analyzujte cas vypoctu vzhladom na p, kde p je pocet vypisanych indexov