Document Type
Article
Publication Date
4-2019
Subject: LCSH
Computer crimes--Investigation, Computer forensics, Hashing (Computer science), Cyber forensics
Disciplines
Computer Engineering | Computer Sciences | Electrical and Computer Engineering | Forensic Science and Technology | Information Security
Abstract
In recent years, different strategies have been proposed to handle the problem of ever-growing digital forensic databases. One concept to deal with this data overload is data reduction, which essentially means separating the wheat from the chaff, e.g., filtering in forensically relevant data. A prominent technique in the context of data reduction is the use of hash-based solutions. Data reduction is achieved because hash values (of possibly large data inputs) are much smaller than the original input. Today's approaches to storing hash-based data fragments range from large-scale multithreaded databases to simple Bloom filter representations. One main focus has been the field of approximate matching, where sorting is a problem due to the fuzzy nature of approximate hashes. A crucial step during digital forensic analysis is to achieve fast query times during lookup (e.g., against a blacklist), especially under limited or ordinary resource availability. However, comparing different database and lookup approaches is considerably difficult, as most techniques differ in the use cases they consider and the features they integrate. In this work we discuss, reassess and extend three widespread lookup strategies suitable for storing hash-based fragments: (1) the hash database for hash-based carving (hashdb), (2) hierarchical Bloom filter trees (hbft), and (3) flat hash maps (fhmap). We outline the capabilities of the different approaches, integrate new extensions, discuss possible features, and perform a detailed evaluation with a special focus on runtime efficiency. Our results reveal major advantages for fhmap in terms of runtime performance and applicability. Hbft showed comparable runtime efficiency for lookups, but suffers from pitfalls with respect to extensibility and maintenance. Finally, hashdb performs worst in all evaluation scenarios in a single-core environment; however, it is the only candidate that offers full parallelization capabilities, transactional features, and single-level storage.
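To make the contrast between the lookup strategies named in the abstract more concrete, the following is a minimal, illustrative sketch (not the authors' implementation or any of the evaluated tools): it compares an exact flat hash set with a simple Bloom filter for blacklist lookups of fragment hashes. All parameters (filter size, number of hash functions, block size) are assumptions chosen for illustration only.

```python
# Minimal sketch: flat hash set vs. a simple Bloom filter for
# blacklist lookups of fragment hashes. Illustrative only; parameters
# (filter size m, hash count k, block size) are assumptions.
import hashlib


class BloomFilter:
    def __init__(self, m_bits=1 << 20, k=4):
        self.m = m_bits                      # size of the bit array
        self.k = k                           # number of hash functions
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k bit positions from two SHA-256 halves.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:16], "big")
        h2 = int.from_bytes(digest[16:], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        # May return false positives, but never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def fragment_hashes(data: bytes, block_size=4096):
    # Hash fixed-size fragments of an input, in the spirit of
    # hash-based carving on block-level artifacts.
    for off in range(0, len(data), block_size):
        yield hashlib.sha256(data[off:off + block_size]).digest()


# Build a blacklist with both structures and compare lookups.
blacklist_data = b"\x00" * 16384   # stand-in for known-bad content
flat_set = set()                   # exact membership, more memory per entry
bloom = BloomFilter()              # compact, probabilistic membership

for h in fragment_hashes(blacklist_data):
    flat_set.add(h)
    bloom.add(h)

probe = next(fragment_hashes(blacklist_data))
print(probe in flat_set)   # True (exact)
print(probe in bloom)      # True (no false negatives)
```

The trade-off this sketch hints at mirrors the abstract's framing: a flat hash structure gives exact answers at a higher per-entry memory cost, while a Bloom filter representation is far more compact but admits false positives, which is why the paper evaluates runtime efficiency and applicability of the concrete tools (hashdb, hbft, fhmap) rather than the data structures in isolation.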
DOI
10.1016/j.diin.2019.01.020
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Repository Citation
Liebler, Lorenz; Schmitt, Patrick; Baier, Harald; and Breitinger, Frank, "On Efficiency of Artifact Lookup Strategies in Digital Forensics" (2019). Electrical & Computer Engineering and Computer Science Faculty Publications. 87.
https://digitalcommons.newhaven.edu/electricalcomputerengineering-facpubs/87
Publisher Citation
Liebler, L., Schmitt, P., Baier, H., & Breitinger, F. (2019). On efficiency of artifact lookup strategies in digital forensics. Digital Investigation, 28, S116-S125.
Included in
Computer Engineering Commons, Electrical and Computer Engineering Commons, Forensic Science and Technology Commons, Information Security Commons
Comments
© 2019 The Author(s). Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)