Hashing of File Blocks: When Exact Matches Are Not Useful
Douglas White
Disclaimer
Trade names and company products are mentioned in the text or identified. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products are necessarily the best available for the purpose.
Statement of Disclosure
This research was funded by the National Institute of Standards and Technology Office of Law Enforcement Standards, the Department of Justice National Institute of Justice, the Federal Bureau of Investigation and the National Archives and Records Administration.
Identification of Issues
• A meaningless change of file contents drastically changes a hash value
• Amount of data input to investigation is immense
• Common hash algorithms cannot identify suspect files similar to known files
• Commonly used hash algorithms do not yield useful data on partial or deleted files
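
The first issue can be seen in a few lines. This is only an illustrative sketch: the byte strings are made-up stand-ins for file contents, and SHA-1 is used as one example of a common cryptographic hash, since the slides do not name a specific algorithm.

```python
import hashlib

# Hypothetical stand-in for a file: three 4096-byte runs of filler bytes.
original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096

# Flip a single bit in one byte; everything else is left untouched.
changed = bytearray(original)
changed[5000] ^= 0x01
modified = bytes(changed)

digest_original = hashlib.sha1(original).hexdigest()
digest_modified = hashlib.sha1(modified).hexdigest()

# Count the hex positions at which the two digests happen to agree.
agreeing = sum(a == b for a, b in zip(digest_original, digest_modified))

print(digest_original)
print(digest_modified)
print(agreeing, "of", len(digest_original), "hex digits agree")
```

The two digests are unrelated even though the inputs differ by one bit, so a whole-file hash gives no hint that the files are nearly identical.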
National Software Reference Library & Reference Data Set
The NSRL is conceptually three objects:
• A physical collection of software
• A database of meta-information
• A subset of the database, the Reference Data Set
The NSRL is designed to collect software from various sources and compute hashes of known applications. For the purpose of block hashes, we assume applications are benign.
Perturbing File Hashes
Use of cryptographic hashes to automatically identify files is absolute, too precise.
When dealing with morphing digital objects, such sorting leaves many files to be dealt with by manual review.
The NSRL hashset is commonly used to automatically remove benign known items from human processing, which is fail-safe.
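
A minimal sketch of this kind of known-file filtering follows. The file names, contents and the one-entry `known_benign` set are hypothetical placeholders, not NSRL data, and SHA-1 is an assumed choice of algorithm. One way to read "fail-safe" is shown here: anything that does not match exactly simply stays in the review pile.

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Return the SHA-1 hex digest of a file's full contents."""
    return hashlib.sha1(data).hexdigest()

# Hypothetical reference set holding the hash of one known benign file.
known_benign = {file_digest(b"contents of a known system file")}

# Hypothetical evidence: an exact copy, a slightly altered copy, and a new file.
evidence = {
    "system.dll": b"contents of a known system file",
    "patched.dll": b"contents of a known system file!",
    "notes.txt": b"something never seen before",
}

# Only exact matches are removed; everything else is left for human review.
needs_review = sorted(
    name for name, data in evidence.items()
    if file_digest(data) not in known_benign
)
print(needs_review)  # ['notes.txt', 'patched.dll']
```

The altered copy differs by one trailing byte, yet it lands in manual review alongside the genuinely unknown file.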
Reducing Data Inflow
NSRL file content hash values allow investigators to automatically remove benign known items from view.
Known benign data can be identified before it reaches investigators.
Is it technically possible to meaningfully reduce the amount of incoming data?
Block Hashes of Files
NSRL is investigating the usefulness of introducing the rigor of cryptographic digital file identification at a granular level which supports statistical identification of objects.
Block hashing applies the cryptographic algorithms to smaller-than-file-size portions of the suspect data.
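
As a rough sketch of the idea, the function below hashes each fixed-size slice of a byte string. The name `block_hashes` is hypothetical, SHA-1 is an assumed algorithm, and the 4096-byte default matches the block size discussed elsewhere in this deck. A final short block, if any, is hashed as-is.

```python
import hashlib

def block_hashes(data: bytes, block_size: int = 4096) -> list[str]:
    """Hash each consecutive block_size-byte slice of data."""
    return [
        hashlib.sha1(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]

# Hypothetical file of three blocks, and a copy with one byte changed in block 1.
original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
modified = original[:5000] + b"X" + original[5001:]

pairs = zip(block_hashes(original), block_hashes(modified))
unchanged = [index for index, (a, b) in enumerate(pairs) if a == b]
print(unchanged)  # [0, 2]
```

Two of the three blocks still match, so the modified file can be recognized as mostly known content even though its whole-file hash no longer matches.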
File Selection
NSRL investigated 4096-byte block hashes of Windows 2000 and Windows XP operating system files in our collection.
NSRL also collected installed file block hashes from physical and virtual machines.
Block Selection
NSRL investigated 4096-byte block hash values.
4096 bytes was the smallest window considered, based on tools, storage and statistical applicability.
.2% of collection < 16KB
3% of collection < 32KB
27% of collection < 128KB
Block Hashing Benefits
• File-based data reduction leaves an average of 30% of disk space for human investigation
• Incorporating block hashes reduces human review to 15% of disk space
• Assist in recognizing wiped media
• Assist in profiling media use
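
One way these benefits could be realized is to sort every block of a disk image into the known, unknown and zero categories named on a later slide. The sketch below assumes SHA-1, 4096-byte blocks and a made-up in-memory "image"; `classify_blocks` and `known_hashes` are hypothetical names, not part of any NSRL tool.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096

def sha1_hex(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()

def classify_blocks(image: bytes, known_hashes: set[str],
                    block_size: int = BLOCK_SIZE) -> Counter:
    """Count blocks that are all zero, match a known hash, or are unknown."""
    counts = Counter()
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        if not any(block):                      # every byte is 0x00
            counts["zero"] += 1
        elif sha1_hex(block) in known_hashes:   # exact match to a known block
            counts["known"] += 1
        else:                                   # left for human review
            counts["unknown"] += 1
    return counts

# Hypothetical image: two known blocks, three zeroed blocks, one unknown block.
known_block = b"K" * BLOCK_SIZE
known_hashes = {sha1_hex(known_block)}
image = known_block * 2 + bytes(BLOCK_SIZE) * 3 + b"U" * BLOCK_SIZE

counts = classify_blocks(image, known_hashes)
print(dict(counts))
```

Only the "unknown" share would need human review, and an image that is almost entirely "zero" is a hint that the media may have been wiped.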
Physical and Virtual Machines
• P-XP vs. P-XP = 83% (8,679 files)
• P-XP vs. V-XP = 85%
• V-XP vs. V-XP = 91%
• P-W2K vs. P-W2K = 85% (7,688 files)
• P-W2K vs. V-W2K = 89%
• V-W2K vs. V-W2K = 94%
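
The slide does not say how these percentages were computed. One plausible reading, assumed here purely for illustration, is the share of one machine's block hashes that also appear on the other machine. The function name and the placeholder hash strings below are hypothetical.

```python
def percent_shared(reference: set[str], other: set[str]) -> float:
    """Percentage of the reference machine's block hashes also found on the other."""
    if not reference:
        return 0.0
    return 100.0 * len(reference & other) / len(reference)

# Placeholder block hashes for two hypothetical installations.
machine_a = {"h1", "h2", "h3", "h4"}
machine_b = {"h2", "h3", "h4", "h5"}

print(percent_shared(machine_a, machine_b))  # 75.0
```

Under this assumed definition the measure is not symmetric when the two machines hold different numbers of distinct blocks, so the choice of reference machine matters.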
Known - Unknown - Zero
2nd 512 MB in W2K NTFS VM
Next Steps
Investigate a wider variety of applications
Automation & virtualization of installation
Comparison with “fuzzy” hashes
Storage in Bloom filter
Prototype disk block imager
“Smart unpacking” of remaining data
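
For the Bloom filter item, a toy sketch of the data structure is given below. It is not the NSRL design: the bit-array size, the number of hash functions and the use of seeded SHA-256 to derive bit positions are all arbitrary assumptions. A Bloom filter never misses an item that was added, but it can occasionally report an item that was never added.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by prefixing a one-byte seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Store the SHA-1 of one hypothetical known 4096-byte block.
known_blocks = BloomFilter()
block_hash = hashlib.sha1(b"K" * 4096).digest()
known_blocks.add(block_hash)

print(block_hash in known_blocks)  # True
```

The appeal for block hashes is space: the filter stores a fixed number of bits regardless of how many hashes are added, at the cost of a tunable false-positive rate.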
Contacts
Douglas [email protected]
Barbara Guttman
Software Diagnostics & Conformance Testing [email protected]
Sue Ballou, Office of Law Enforcement Standards
Rep. for State/Local Law [email protected]