Hešovanie: Rozdiel medzi revíziami

Verzia zo dňa a času 10:19, 17. január 2015

Zdroje

Prednaska L10 Erika Demaina z MIT: http://courses.csail.mit.edu/6.851/spring12/lectures/
Článok z Wikipédia o Bloom filtroch http://en.wikipedia.org/wiki/Bloom_filter
Učebnica Brass 2008 Advanced data structures (Bloom filters)

Osnova

vlastnosti hešovacích funkcií: totally random, universal, k-wise independent, simple tabulation;
chaining, perfect hashing, linear probing,
Bloom filters, locality sensitive hashing

V sylabe na skúšku je len perfect hashing, rozoberané na Erikovej prednáške

a bit string of length b, and k hash functions hi : U -> {1, . . . , b}.
insert(x): set bits h1 (x), . . . , hk (x) to 1.
test if x in the set: compute h1 (y), . . . , hk (y) and check whether all these bits are 1
- if yes, claim x is in the set, but possibility of error
- if no, answer no, surely true

Lemma: if hash functions totally random and independent and b = lg(e)kn bits, error is at most 2^{−k}.

proof later
totally random hash function impractical (need to store hash value for each element of the universe)
error independent of the size of the universe U
lg(e) is approx 1.44. for 1% error, k=7, about 10 bits per element
compare to a hash table, which needs at lead n lg u bits to store data items themselves

Use of Bloom filters

e.g. an approximate index of a larger data structure on a disk - if x not in the filter, do not bother looking at the disk, but small number of false positives not a problem

Proof of lemma

References

Bloom, Burton H. (1970), "Space/Time Trade-offs in Hash Coding with Allowable Errors", Communications of the ACM 13 (7): 422–426
Bloom filter. In Wikipedia, The Free Encyclopedia. Retrieved January 17, 2015, from http://en.wikipedia.org/w/index.php?title=Bloom_filter&oldid=641921171

@@ Riadok 11: / Riadok 11: @@
 V sylabe na skúšku je len perfect hashing, rozoberané na Erikovej prednáške
-===Bloom filters (1970)===
+===Bloom Filter (1970)===
 * a bit string of length b, and k hash functions hi : U -> {1, . . . , b}.
@@ Riadok 19: / Riadok 19: @@
 ** if no, answer no, surely true
-Lemma: if hash functions totally random and independent (impractical) and  b = lg(e)kn bits, error is at most 2^{−k}.
+Lemma: if hash functions totally random and independent and  b = lg(e)kn bits, error is at most 2^{−k}.
 * proof later
+* totally random hash function impractical (need to store hash value for each element of the universe)
 * error independent of the size of the universe U
 * lg(e) is approx 1.44. for 1% error, k=7, about 10 bits per element