Searching with Bloom Filters
BOLT Talks · March 18, 2021
Letting readers search my static sites.
That works, but...
Visitors had to download an 8 MB index file.
Could I have a search index for less?
“Is <this thing> in <that group>?”
You’ve likely used them already...
“Is this site in the naughty list?”
“Is this word in that document?”
- “Definitely not.”
- “Possibly yes.”
- An array of n bits.
- m independent hash functions.
data (any size) → value (fixed size)
Where am I supposed to find
m independent hash
gi(x) = h1(x) + ih2(x) + i2
Absurdly simple example
(10 bits, 1 hash function)
Reducing the error rate
(100 bits, 4 hash functions)
Counting Bloom filters
Relevant search results by term frequency.
Searching the site
One Bloom filter per document.