How Cuckoo Filter Can Improve Existing Approximate Matching Techniques
Computer forensics, Cyber forensics, Data structures (Computer science)
Computer Engineering | Computer Sciences | Electrical and Computer Engineering | Forensic Science and Technology
In recent years, approximate matching algorithms have become an important component of digital forensic research and have been adopted in other fields as well. Several approaches exist, but sdhash and mrsh-v2 in particular attract the community's attention because of their good overall performance (runtime, compression, and detection rates). Although the two approaches proceed quite differently, their final output (the similarity digest) is very similar because both utilize Bloom filters. This data structure was presented in 1970 and has therefore been in use for a long time. Recently, a new data structure, the Cuckoo filter, was proposed, which is claimed to be faster and to have a smaller memory footprint than the Bloom filter.
In this paper we analyze the feasibility of the Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf, which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that using a Cuckoo filter yields a runtime improvement of approximately 37% and a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than that of mrsh-net, while the compression rate is twice that of the Bloom filter based fingerprint.
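For readers unfamiliar with the data structure, the following is a minimal sketch of a generic Cuckoo filter based on partial-key cuckoo hashing. It is not the mrsh-cf implementation; the class name `CuckooFilter`, the 16-bit fingerprint size, the bucket size of 4, and the use of SHA-256 as the hash function are all illustrative assumptions.

```python
import hashlib
import random


class CuckooFilter:
    """Toy Cuckoo filter (illustrative only, not the mrsh-cf code)."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index
        # in range and makes it its own inverse.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: bytes) -> int:
        # Assumed 16-bit fingerprint; zero is mapped to 1.
        return (self._hash(b"fp" + item) & 0xFFFF) or 1

    def _index(self, item: bytes) -> int:
        return self._hash(item) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # The other candidate bucket depends only on the current bucket
        # and the fingerprint, so entries can be moved without the item.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def insert(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random entry and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter is considered full; the last evicted fingerprint is dropped.
        return False

    def contains(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

In an approximate matching setting, the items passed to `insert` would plausibly be the chunks or features extracted from an input, and `contains` would answer membership queries against the stored set. Unlike a plain Bloom filter, this structure stores short fingerprints rather than setting bits, which is what makes deletion possible.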
Gupta, V. and Breitinger, Frank, "How Cuckoo Filter Can Improve Existing Approximate Matching Techniques" (2015). Electrical & Computer Engineering and Computer Science Faculty Publications. 34.
Gupta, V., & Breitinger, F. (2015). How Cuckoo Filter Can Improve Existing Approximate Matching Techniques. In James, Joshua I., Breitinger, Frank (Eds.) Digital Forensics and Cyber Crime (pp. 39-52). Springer International Publishing.