Using Hierarchy Hash Files vs Per-Folder Hash Files

The discussion that follows is still correct for icj 3.0, but v3.0 also includes support for extended file attributes on macOS and Linux.

With extended attributes alongside hierarchy files, icj is now by far the most powerful data integrity software solution on the market today.

Hierarchy files

Traditionally, IntegrityChecker (ic) and IntegrityChecker Java (icj) wrote hash validation data into an invisible file in each and every folder (".icj" for icj). Version 2.0 of IntegrityChecker Java was a major update that implemented hierarchy files, a significantly faster and less invasive approach to storing hash data for later validation.

The per-folder approach had both benefits and weaknesses. For example, a per-folder hash file is ideal when a folder is moved or copied by itself, because the hash data travels with the folder. Version 3.0 now addresses this case with extended attributes (though Windows lacks support for them).

On the other hand, a key goal is data validation of an entire folder hierarchy, and frequently an entire volume, where items are not being moved but are being backed up and need verification. For example, in Lloyd’s Mail folder there are 114344 folders, which means 114344 ".icj" files must be read and written (one per folder), versus a single ".icjh" hierarchy file in the top-level folder.
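
To gauge how many per-folder files a given tree would need, the folders can be counted with standard tools. This is a minimal sketch and not part of icj: count_folders is a name made up here, and it assumes folder names contain no newline characters.

```shell
#!/bin/sh
# Count the folders under a root (the root itself included).
# In per-folder mode each of these would carry its own ".icj" file.
count_folders() {
    # -type d matches directories only; tr strips the padding that
    # some wc implementations (e.g. on macOS) put before the number.
    find "$1" -type d | wc -l | tr -d ' '
}
```

For example, count_folders ~/Library/Mail reports how many ".icj" files per-folder mode would have to maintain for that tree, compared with a single ".icjh" file at its top.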

Appeal of hierarchy files

With hierarchy hash files in IntegrityChecker Java 2.0 and later, only a single ".icjh" file in each top-level folder must be read and written. This speeds up certain operations considerably. In addition to the ".icjh" file at the top of each folder hierarchy that is updated, IntegrityChecker automatically places hierarchy files in strategic locations within the hierarchy, as described below.

Also, hierarchy files can be created by the user wherever they are needed by running 'update' on a specific folder, e.g. to provide "travel-along" hash files for that folder. Even so, the vast majority of folders will stay completely untouched by IntegrityChecker, with typically only a few dozen hierarchy files present within a given folder hierarchy. If required, per-folder ".icj" files can still be written in addition to ".icjh" hierarchy files.

Changing the hierarchy files mode preference

Open the icj preferences file (macOS: "icj pref" will open it). Choices are "both" and "icjh" (default value).

ICJ_FILES_MODE = icjh

Hierarchy files are always used. The 'both' setting also uses traditional per-folder hash files (".icj" files, not recommended for most uses).
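
To check which mode is currently set without opening an editor, a line of the form shown above can be read with sed. This is only a sketch: pref_value is a hypothetical helper, the path of the preferences file must be supplied by the caller (its location is not assumed here), and the "KEY = value" layout is inferred from the example line.

```shell
#!/bin/sh
# Print the value of a "KEY = value" line from a preferences file.
# $1 = path to the preferences file, $2 = key name (e.g. ICJ_FILES_MODE).
pref_value() {
    file=$1
    key=$2
    # Accept optional whitespace around "=" and keep only the first
    # whitespace-free token after it; if the key appears more than
    # once, report the last occurrence.
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" "$file" | tail -n 1
}
```

Called as pref_value /path/to/prefs ICJ_FILES_MODE, it prints the configured value, or nothing if the key is absent.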

When and how to use hierarchy files

Hierarchy files (".icjh") are always written upon update, with per-folder .icj files optional via preferences, as shown above. They are “sticky” and will not be removed unless a clean is done that removes them.
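
Because hierarchy files persist, it can be handy to see where they currently exist in a tree. The sketch below uses plain find rather than any icj command; list_hierarchy_files is a hypothetical helper name, and it assumes the files are literally named ".icjh" as described above.

```shell
#!/bin/sh
# List every hierarchy file under the given root folder.
list_hierarchy_files() {
    # Match regular files named exactly ".icjh"; per-folder ".icj"
    # files are not matched by this pattern.
    find "$1" -type f -name '.icjh'
}
```

Running list_hierarchy_files Photos prints one path per hierarchy file found under Photos.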

Always do an 'icj update' on all the top-level folder(s) you intend to transfer/copy/backup to another volume. This is advised even in icj 3.0 so that file attributes are all updated as well.

Hierarchy files have several advantages over per-folder hash files:

For most users, a handful of hierarchy files suffice to cover all the bases, volumes, large subfolders, etc. Use them for:

In general, there is only modest downside to using hierarchy files liberally; think of them as glomming-together everything for a folder and all its subfolders. Even a few hundred hierarchy files can be processed extremely rapidly.

Hierarchy files are efficient. Using them won’t speed up hashing itself, but commands such as update and status can load hashes for half a million files in under 30 seconds on a fast computer. Furthermore, icj always knows which hierarchy file is authoritative and can simply skip over any hierarchy files in subfolders, so in practice typically only one hierarchy file need be processed.

Where IntegrityChecker creates hierarchy files

Hierarchy files are created in various places according to two methods.

Automatic selection of folders for writing hierarchy files

With HIERARCHY_FILES_MODE = auto, additional hierarchy files are written (or not) as follows:

Implicit hierarchy files based on arguments

When an icj update is done, a hierarchy file (".icjh") is written in every specified top-level folder/volume (with no arguments that just means whatever the current directory is). For example:

# writes a hierarchy file for the volume (or folder) Photos:
icj update Photos

# writes hierarchy files for the folders (or volumes) AAA, BBB, CCC:
icj update AAA BBB CCC

# writes hierarchy files for all folders in the current directory:
icj update *

# hash the user home folder, writing a hierarchy file for it
icj update ~

# hash the Mail folder, writing a hierarchy file for it
icj update ~/Library/Mail

# hash the folder/volume "Photos", writing a hierarchy file for it
icj update Photos

# hash all folders in folder/volume "Photos", writing a hierarchy file in each
icj update Photos/*

# ditto, but within all subfolders of subfolders of Photos, then subfolders of Photos, then Photos itself
icj update Photos/*/*
icj update Photos/*
icj update Photos
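
The bottom-up sequence above can also be generated with find, which lists folders only (a bare "*" glob matches files as well). This is a sketch under stated assumptions: folders_bottom_up is a hypothetical helper, it relies on the -mindepth and -maxdepth options available in GNU and BSD find, and unlike a glob it also lists hidden folders.

```shell
#!/bin/sh
# Print the folders under a root, deepest level first.
# $1 = root folder, $2 = how many levels below the root to include.
folders_bottom_up() {
    root=$1
    depth=$2
    while [ "$depth" -ge 0 ]; do
        # List only the folders that sit exactly $depth levels down;
        # depth 0 is the root itself.
        find "$root" -mindepth "$depth" -maxdepth "$depth" -type d
        depth=$((depth - 1))
    done
}

# Example (commented out so that nothing runs on load): update the
# sub-subfolders of Photos, then its subfolders, then Photos itself.
# folders_bottom_up Photos 2 | while IFS= read -r dir; do icj update "$dir"; done
```

Reading the list line by line assumes folder names contain no newline characters.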

When to use per-folder .icj files (not recommended for general use)

Per-folder .icj files are useful when many folders (subfolders, sub-sub-folders, etc.) are often being moved or copied elsewhere. However, support for extended file attributes on macOS and Linux makes this point largely moot.

Barring that, the reasons for using per-folder .icj files are not very persuasive, so long as appropriate folders all use hierarchy files; that is, use icj update on every subfolder you might move/copy/backup separately from its enclosing folder structure. See the examples above as well as the section "Automatic selection of folders for writing hierarchy files".

When rearranging folders, it may be helpful to temporarily enable ICJ_FILES_MODE = both, then when done, set it back to ICJ_FILES_MODE = icjh, which will remove the per-folder .icj files.
