Towards a Process Model for Hash Functions in Digital Forensics

Document Type

Book Chapter

Publication Date


Subject: LCSH

Computer forensics, Cyber forensics, Hashing (Computer science)


Computer Engineering | Computer Sciences | Electrical and Computer Engineering | Forensic Science and Technology | Information Security


Forensic investigations are becoming increasingly difficult as the amount of data to be analyzed grows continuously. A common approach to automated file identification is the use of hash functions. The procedure is quite simple: a tool hashes all files on a seized device and compares the hashes against a database. Depending on the database, this allows investigators to discard non-relevant files (whitelisting) or to detect suspicious files (blacklisting).
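The hash-and-compare procedure can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and two in-memory sets of hex digests standing in for the reference database; the names `sha256_of`, `classify_files`, `known_good` and `known_bad` are hypothetical and do not come from the chapter.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_files(root, known_good, known_bad):
    """Sort every regular file below root into one of three buckets.

    known_good and known_bad are sets of hex digests (an assumption:
    a real reference database would be queried here instead).
    """
    result = {"discard": [], "suspicious": [], "unknown": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        file_hash = sha256_of(path)
        if file_hash in known_bad:        # blacklisting: flag the file
            result["suspicious"].append(path)
        elif file_hash in known_good:     # whitelisting: filter the file out
            result["discard"].append(path)
        else:                             # no match: left for manual review
            result["unknown"].append(path)
    return result
```

Files that match neither list end up in the `unknown` bucket, which is the part of the data set that still needs to be examined by other means.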

One can distinguish three kinds of algorithms: (cryptographic) hash functions, bytewise approximate matching, and semantic approximate matching (a.k.a. perceptual hashing). The main difference is the level at which they operate: semantic approximate matching operates on the semantic level, while the other two approaches operate on the byte level. Hence, investigators have three different approaches at hand to analyze a device.
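The difference between the two byte-level approaches can be made concrete with a small sketch. It is not one of the algorithms surveyed in the chapter: the similarity measure below is a toy Jaccard index over byte 4-grams, chosen only for illustration, and the names `byte_ngrams` and `toy_similarity` are hypothetical.

```python
import hashlib


def byte_ngrams(data, n=4):
    """Return the set of all length-n byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def toy_similarity(a, b, n=4):
    """Jaccard index of the byte n-gram sets of a and b (toy measure)."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Two inputs that differ in exactly one byte.
original = bytes(range(256)) * 4
changed = bytearray(original)
changed[100] ^= 0xFF
modified = bytes(changed)

# A cryptographic hash only answers "identical or not".
same_digest = (hashlib.sha256(original).digest()
               == hashlib.sha256(modified).digest())

# A byte-level similarity measure still reports a close match.
score = toy_similarity(original, modified)
```

Here `same_digest` is false although the inputs are nearly identical, while `score` stays close to 1. Semantic approximate matching is not covered by this sketch, since it requires interpreting the content (for example, decoding an image) rather than comparing bytes.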

First, this paper gives a comprehensive overview of existing approaches for bytewise and semantic approximate matching (for semantic approximate matching, we focus on image functions). Second, we compare implementations and summarize the strengths and weaknesses of all approaches. Third, we show, based on a sample use case, how to integrate these functions into an existing process model, the computer forensics field triage process model.



Publisher Citation

Rybalchenko, A., Steinebach, M., et al. (2014, December). Towards a Process Model for Hash Functions in Digital Forensics. In Gladyshev, P., Marrington, A., & Baggili, I. (Eds.), Digital Forensics and Cyber Crime: Fifth International Conference, ICDF2C 2013, Moscow, Russia, September 26-27, 2013, Revised Selected Papers (Vol. 132, pp. 170-186). Springer.