Bloom filter to test whether a given word is in a set or not in Java using MD5 hashing

Posted By: Achchuthan Yogarajah - 7:26 PM

Share

& Comment

There are many circumstances where we need to find out if something is a member of a set, and there are many algorithms for doing it. For example a spell checker checks to see whether a given word is in the dictionary. Holding 250,000 words in memory for a spell checker might be too big an overhead if your target environment has low memory.

Bloom filters are one solution for the above mentioned problem. It works as follows :
  • Take a big array of bits, initially all zero.
  • Then take each word from your dictionary of words.
  •  Produce ‘n’ independent hash values for each word. Set the bit positions of the array at the corresponding hash values. Sometimes there’ll be clashes, where the bit will already be set from some other word. This doesn't matter.
To check whether a given word is in the dictionary, do the following :

  • Produce the n hash values of the given word.
  • Then check to see if each of the bits corresponding to these hash values is set.
  • If any bit is not set, then you never loaded that word in, and you can say that word is not in the dictionary

To find the hash values, MD5 algorithm can be used. MD5 algorithm produces a 128-bit (16-byte) hash value and then take smaller hash values by extracting sequences of bits from the result.

Example : To store the word "computer"
MD5 value in decimal for the word "computer" is - 296852903797016122161588463667493754506
Taking 3 digits from the hash value, bit positions 296, 852, 903, 797, 016, 122, 161, 588, 463, 667,493, 754 and 506 needed to be set.

Example 2: ATX - Hash value: 71643585294187099263809470221633738211. Bit positions to set are 716, 435, 852, 941, 870, 992, 638, 094, 702, 216, 337, 382, 11

Implement the Bloom filter to test whether a given word is in a set or not. A set of terms are given in the file "terms.txt".

Implement the Bloom filter in Java


terms.txt
While your are run the program you must change this( BufferedReader br = new BufferedReader(new FileReader("C:/Users/Test/Downloads/2010SP007 - III M/2010SP007 - III M/terms.txt"));)



Output of this code :
Check for the word "chipset" in the list
--------------------------
Success : Word is in the List
+++++++++++++++++++++++++++
Check for the word "data" in the list
--------------------------
Success : Word is in the List
+++++++++++++++++++++++++++
Check for the word "achchuthan" in the list
--------------------------
Word is NOT in the List
+++++++++++++++++++++++++++

About Achchuthan Yogarajah

I’m passionate about Web Development and Programming and I go to extreme efforts to meet my passion. I’m a believer of learning the fundamentals first. I try to understand everything little bit more than the average.

2 comments :

Post a Comment

Thank you for vising Java90.blogspot.com

Copyright © 2016 Java Examples ACHCHUTHAN.ORG. Designed by Templateism .