Using Hierarchy Hash Files vs Per-Folder Hash Files
The discussion that follows still applies to icj 3.0, which also adds support for extended file attributes on macOS and Linux.
With extended attributes in addition to hierarchy files, icj is now by far the most powerful data integrity software solution on the market today.
Hierarchy files
Traditionally, IntegrityChecker (ic) and IntegrityChecker Java (icj) wrote hash validation data into an invisible file in each and every folder (".icj" for icj). Version 2.0 of IntegrityChecker Java was a major update that implemented hierarchy files, a significantly faster and less invasive approach to storing hash data for later validation.
The per-folder approach has both benefits and weaknesses. Its main benefit is that the hash data travels with the folder if that folder is moved or copied by itself. Hierarchy files lose that benefit, a gap that version 3.0 closes with extended attributes (not supported on Windows).
The weakness shows up when the goal is validating an entire folder hierarchy, and frequently an entire volume, whose items are not being moved but are being backed up and need verification. For example, Lloyd’s Mail folder contains 114344 folders, so per-folder mode must read and write 114344 ".icj" files (one per folder), versus a single ".icjh" hierarchy file in the top-level folder.
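To see how many ".icj" files per-folder mode would involve for a given hierarchy, count its folders. This is a minimal sketch using only the standard `find` and `wc` tools; the function name `count_folders` is hypothetical and not part of icj.

```shell
# count_folders DIR: print the number of folders in DIR, including DIR itself.
# In per-folder mode this is roughly how many ".icj" files would be read and written;
# in hierarchy mode the top-level folder gets a single ".icjh" instead.
count_folders() {
  # $(( )) strips the leading blanks that BSD/macOS wc adds to its output
  echo $(( $(find "$1" -type d | wc -l) ))
}
```

For example, `count_folders ~/Library/Mail` reports the number of folders, and therefore the number of per-folder ".icj" files, for the Mail folder.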
Appeal of hierarchy files
With hierarchy hash files in IntegrityChecker Java 2.0 and later, only a single ".icjh" file in each top-level folder must be read and written. This speeds up certain operations considerably. In addition to the ".icjh" file at the top of each folder hierarchy that is updated, IntegrityChecker automatically places hierarchy files in strategic locations within the hierarchy, as described below.
Also, hierarchy files can be created by the user wherever they are needed by running 'update' on a specific folder, e.g. to provide "travel-along" hash files for that folder. Even so, the vast majority of folders stay completely untouched by IntegrityChecker, with typically only a few dozen hierarchy files present within a given folder hierarchy. If required, per-folder ".icj" files can still be written in addition to ".icjh" hierarchy files.
Changing the hierarchy files mode preference
Open the icj preferences file (on macOS, "icj pref" will open it) and set ICJ_FILES_MODE to either "both" or "icjh" (the default):
ICJ_FILES_MODE = icjh
Hierarchy files are always used. The 'both' setting additionally writes traditional per-folder hash files (".icj" files), which is not recommended for most uses.
When and how to use hierarchy files
Hierarchy files (".icjh") are always written upon update, with per-folder .icj files optional via preferences, as shown above. They are “sticky”: they remain in place until a clean operation removes them.
Always do an 'icj update' on all the top-level folder(s) you intend to transfer/copy/back up to another volume. This is advised even in icj 3.0, so that the extended attributes are all updated as well.
Hierarchy files have several advantages over per-folder hash files:
- Folder modification dates change only for folders in which a hierarchy file is written. Contrast that with per-folder ".icj" files, which are written in every folder that is updated, thus changing the modification date of every folder in the hierarchy.
- Only a few hierarchy files are needed, which dramatically reduces the number of files to be backed up.
- Disk space usage is lower, even when hierarchy files are written in every folder 3 or 4 levels deep.
- Initial folder scanning before hashing operations is considerably faster.
For most users, a handful of hierarchy files suffices to cover all the bases: volumes, large subfolders, etc. Use them for:
- An entire volume.
- User home directory.
- Large folders or major project folders.
- Folders which are never likely to be split apart or to have subfolders moved around, e.g., the Mail folder.
In general, there is only a modest downside to using hierarchy files liberally; think of them as glomming together everything for a folder and all its subfolders. Even a few hundred hierarchy files can be processed extremely rapidly.
Hierarchy files are efficient: while they won’t speed up hashing itself, operations such as update and status can load hashes for half a million files in under 30 seconds on a fast computer. Furthermore, icj always knows which hierarchy file is authoritative and simply skips over any hierarchy files in subfolders, so in practice typically only one hierarchy file needs to be processed.
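The skipping behavior described above can be illustrated by listing the hierarchy files under a folder. This sketch assumes, as the paragraph suggests, that the ".icjh" in the folder being processed is the authoritative one; it is an illustration, not icj's actual logic, and `list_hierarchy_files` is a hypothetical name.

```shell
# list_hierarchy_files ROOT: list every ".icjh" file under ROOT.
# Assumption: if ROOT itself has a ".icjh", that file is authoritative ("used")
# and any ".icjh" in a subfolder would be "skipped"; otherwise they are just "nested".
list_hierarchy_files() {
  root="${1%/}"
  top="$root/.icjh"
  find "$root" -type f -name .icjh | while IFS= read -r f; do
    if [ "$f" = "$top" ]; then
      echo "used: $f"
    elif [ -f "$top" ]; then
      echo "skipped: $f"
    else
      echo "nested: $f"
    fi
  done
}
```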
Where IntegrityChecker creates hierarchy files
Hierarchy files are created in various places according to two methods.
Automatic selection of folders for writing hierarchy files
With HIERARCHY_FILES_MODE = auto, additional hierarchy files are written (or not) as follows:
- Hierarchy files are written for a folder based on its file count and total space used (recursive totals for each).
- Folders matching patterns found in [hierarchy.auto-write] cause a hierarchy file to be written in each matching folder.
- Folders matching patterns found in [hierarchy.no-auto-write] PREVENT a hierarchy file from being written in each matching folder.
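The three rules above can be sketched as a single decision function. Everything specific here is an assumption: the thresholds (`MIN_FILES`, `MIN_KB`) are made-up placeholders, the patterns are shell globs standing in for whatever syntax [hierarchy.auto-write] and [hierarchy.no-auto-write] actually use, and a no-auto-write match is assumed to take precedence. The function name `wants_hierarchy_file` is hypothetical.

```shell
# Hypothetical placeholders, NOT icj's actual values.
MIN_FILES=5000        # recursive file count that triggers a write
MIN_KB=10485760       # recursive space used (10 GiB, in KiB) that triggers a write

# wants_hierarchy_file DIR: exit status 0 if DIR should get a ".icjh", 1 otherwise.
wants_hierarchy_file() {
  dir="$1"
  # Stand-in for [hierarchy.no-auto-write]; assumed to override everything else.
  case "$dir" in
    */node_modules|*/.git) return 1 ;;
  esac
  # Stand-in for [hierarchy.auto-write].
  case "$dir" in
    */Mail|*/Photos) return 0 ;;
  esac
  # Otherwise decide on recursive totals: file count and space used.
  files=$(( $(find "$dir" -type f | wc -l) ))
  kb=$(du -sk "$dir" | cut -f1)
  [ "$files" -ge "$MIN_FILES" ] || [ "$kb" -ge "$MIN_KB" ]
}
```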
Implicit hierarchy files based on arguments
When an icj update is done, a hierarchy file (".icjh") is written in every specified top-level folder/volume (with no arguments, that is the current directory). For example:
# writes a hierarchy file for the volume (or folder) Photos:
icj update Photos
# writes hierarchy files for the folders (or volumes) AAA, BBB, CCC:
icj update AAA BBB CCC
# writes hierarchy files for all folders in the current directory:
icj update *
# hash the user home folder, writing a hierarchy file for it
icj update ~
# hash the Mail folder, writing a hierarchy file for it
icj update ~/Library/Mail
# hash the folder/volume "Photos", writing a hierarchy file for it
icj update Photos
# hash all folders in folder/volume "Photos", writing a hierarchy file in each
icj update Photos/*
# ditto, but within all subfolders of subfolders of Photos, then subfolders of Photos, then Photos itself
icj update Photos/*/*
icj update Photos/*
icj update Photos
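The bottom-up sequence above can also be generated one folder at a time. Note that the shell expands `*` to every item, plain files included, whereas a glob ending in `/` matches only directories. This is a sketch; `bottom_up_folders` is a hypothetical helper, not an icj command.

```shell
# bottom_up_folders ROOT: print ROOT's sub-subfolders, then its subfolders, then ROOT,
# one per line, in the same order as the three commands above but folders only.
# The [ -d ] test skips a pattern left unexpanded because nothing matched.
bottom_up_folders() {
  root="${1%/}"
  for d in "$root"/*/*/ "$root"/*/; do
    if [ -d "$d" ]; then printf '%s\n' "${d%/}"; fi
  done
  printf '%s\n' "$root"
}

# Usage (one 'icj update' per folder, deepest first):
#   bottom_up_folders Photos | while IFS= read -r d; do icj update "$d"; done
```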
When to use per-folder .icj files (not recommended for general use)
Per-folder .icj files are useful when many folders (subfolders, sub-sub-folders, etc.) are often being moved or copied elsewhere. However, support for extended file attributes on macOS and Linux makes this point largely moot.
Apart from that, the reasons for using per-folder .icj files are not very persuasive, so long as the appropriate folders all use hierarchy files; that is, run icj update on every subfolder you might move/copy/back up separately from its enclosing folder structure (see the examples above as well as [hierarchy.auto-write]).
When rearranging folders, it may be helpful to temporarily set ICJ_FILES_MODE = both, then, when done, set it back to ICJ_FILES_MODE = icjh, which will remove the per-folder .icj files.
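If you toggle this setting often, the edit can be scripted. This is a sketch that assumes the preferences file uses the "KEY = value" lines shown above and that you pass its path yourself (its location is not stated here; on macOS, "icj pref" opens it). The helper `set_files_mode` is hypothetical.

```shell
# set_files_mode PREF_FILE MODE: rewrite the ICJ_FILES_MODE line in PREF_FILE.
# Assumes the "KEY = value" format; if no ICJ_FILES_MODE line exists, nothing changes.
set_files_mode() {
  pref="$1" mode="$2"
  case "$mode" in
    both|icjh) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  scratch="$pref.tmp.$$"
  # Write to a scratch file, then move it into place (portable across GNU and BSD sed).
  sed "s/^ICJ_FILES_MODE *=.*/ICJ_FILES_MODE = $mode/" "$pref" > "$scratch" &&
    mv "$scratch" "$pref"
}

# Usage: set_files_mode "$PREF" both    # before rearranging folders
#        set_files_mode "$PREF" icjh    # when done
```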
Copyright © 2022 diglloyd Inc, all rights reserved