Electrical & Computer Engineering and Computer Science Faculty Publications

Similarity Hashing Based on Levenshtein Distances

Frank Breitinger, University of New Haven
Georg Ziroff, Hochschule Darmstadt
Steffen Lange, Hochschule Darmstadt
Harald Baier, Hochschule Darmstadt

Document Type

Book Chapter

Publication Date

2014

Subject: LCSH

Cyber forensics, Computer forensics, Hashing (Computer science)

Disciplines

Computer Engineering | Computer Sciences | Electrical and Computer Engineering | Forensic Science and Technology | Information Security

Abstract

Forensic investigations increasingly rely on automated pre-processing techniques to reduce the massive volumes of data that are encountered. This is usually accomplished by comparing fingerprints of files (typically cryptographic hashes) against existing databases. In addition to finding exact matches of cryptographic hashes, it is necessary to find approximate matches corresponding to similar files, such as different versions of a given file.

This paper presents a new stand-alone similarity hashing approach called saHash, which has a modular design and operates in linear time. saHash is almost as fast as SHA-1 and more efficient than other approaches for approximate matching. The similarity hashing algorithm uses four sub-hash functions, each producing its own hash value. The four sub-hashes are concatenated to produce the final hash value. This modularity enables sub-hash functions to be added or removed, e.g., if an exploit for a sub-hash function is discovered. Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score. The robustness of saHash is verified by comparing it with other approximate matching approaches such as sdhash.
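The idea of concatenating sub-hashes and deriving a lower bound on the Levenshtein distance from them can be illustrated with a minimal Python sketch. This is not the saHash implementation: the abstract does not name the four sub-hash functions, so the two features used here (byte length and byte histogram) and all function names (toy_hash, lower_bound, levenshtein) are assumptions chosen only for illustration. Each feature yields a value that a single insertion, deletion, or substitution can change by at most one, so the largest feature difference can never exceed the true edit distance.

```python
from collections import Counter


def sub_hash_length(data: bytes) -> int:
    # Hypothetical sub-hash 1: the length of the byte sequence.
    return len(data)


def sub_hash_histogram(data: bytes) -> tuple:
    # Hypothetical sub-hash 2: how often each of the 256 byte values occurs.
    counts = Counter(data)
    return tuple(counts.get(value, 0) for value in range(256))


def toy_hash(data: bytes) -> tuple:
    # Concatenate the sub-hashes into one hash value (modelled here as a tuple).
    return (sub_hash_length(data), sub_hash_histogram(data))


def lower_bound(hash_a: tuple, hash_b: tuple) -> int:
    # One edit operation changes the length by at most 1.
    length_bound = abs(hash_a[0] - hash_b[0])
    # One edit operation removes at most one surplus byte and
    # supplies at most one missing byte.
    surplus = sum(max(x - y, 0) for x, y in zip(hash_a[1], hash_b[1]))
    missing = sum(max(y - x, 0) for x, y in zip(hash_a[1], hash_b[1]))
    return max(length_bound, surplus, missing)


def levenshtein(a: bytes, b: bytes) -> int:
    # Reference dynamic-programming edit distance, used only to check the bound.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    a, b = b"kitten", b"sitting"
    print("lower bound:", lower_bound(toy_hash(a), toy_hash(b)))
    print("exact distance:", levenshtein(a, b))
```

Computing the bound needs only the two hash values, not the original inputs, which is what makes this style of comparison suitable for database lookups. The exact distance is computed here solely to confirm that the bound never overshoots.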

Comments

IFIP Advances in Information and Communication Technology series, Vol. 433

DOI

10.1007/978-3-662-44952-3_10

Repository Citation

Breitinger, Frank; Ziroff, Georg; Lange, Steffen; and Baier, Harald, "Similarity Hashing Based on Levenshtein Distances" (2014). Electrical & Computer Engineering and Computer Science Faculty Publications. 64.
https://digitalcommons.newhaven.edu/electricalcomputerengineering-facpubs/64

Publisher Citation

Breitinger, Frank; Ziroff, Georg; Lange, Steffen; Baier, Harald (2014): Similarity Hashing Based on Levenshtein Distances. In: Peterson, Gilbert; Shenoi, Sujeet (Eds.): Advances in Digital Forensics X, pp. 133-147, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44951-6.
