icj 3.0: Extended File Attributes (xattrs or just "attrs")
IntegrityChecker Java 3.0 (icj 3.0) introduces support for extended file attributes, referred to in this documentation as "attrs".
Attrs are “metadata” associated with files and folders. Attrs are maintained by the operating system and cannot be seen in a file browser.
BENEFIT: with hash info in an attr, files can be moved anywhere because the hash information travels with them, even without any hierarchy file (icj or icjh file).
BENEFIT: in addition to hash info, file and folder IDs greatly enhance the ability of icj to find files that have been moved, and to inform accordingly.
Along with hierarchy files, icj is now by far the most powerful data integrity software solution on the market today.
Attributes used by icj
Two types of extended file attributes ("attrs") are used by icj. This is handled seamlessly during update and verify:
user.diglloyd.icj.HashInfo#S — stores hash information for a file
user.diglloyd.icj.ID#N — stores file ID or folder ID, unique for that file or folder
Both suffixes are part of the attribute name:
The #S suffix means “preserve when copying or backing up”, well supported by macOS. Even iCloud does so.
The #N suffix means “do not preserve when copying or backing up”, well supported by macOS.
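As an illustration of how the suffix is carried inside the attribute name itself, here is a minimal Python sketch that splits a name into its base and its suffix letters. The function names are hypothetical and not part of icj, and only the two meanings stated above are mapped; any other letter is reported as unknown.

```python
from typing import NamedTuple, Optional

# Meanings as stated in this documentation; other letters are deliberately not mapped.
SUFFIX_MEANING = {
    "S": "preserve when copying or backing up",
    "N": "do not preserve when copying or backing up",
}


class AttrName(NamedTuple):
    base: str   # attribute name without the suffix
    flags: str  # letters after the last '#', empty if there is no suffix


def split_attr_name(name: str) -> AttrName:
    """Split 'user.x.y#S' into ('user.x.y', 'S'); names without '#' have no flags."""
    base, sep, flags = name.rpartition("#")
    if not sep:
        return AttrName(name, "")
    return AttrName(base, flags)


def preservation_hint(name: str) -> Optional[str]:
    """Return the documented meaning of a single-letter suffix, or None if unknown."""
    return SUFFIX_MEANING.get(split_attr_name(name).flags)


if __name__ == "__main__":
    for attr in ("user.diglloyd.icj.HashInfo#S", "user.diglloyd.icj.ID#N"):
        print(attr, "->", preservation_hint(attr))
```

The suffix is plain text in the name, so a system that does not interpret it simply sees a longer attribute name.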
Operating system support
macOS supports attrs on all native file systems and some others (even some Windows file systems).
MS Windows file systems do not have attr support.
In general, ZFS does not support attrs.
Linux supports attrs on file systems supporting them. Linux operating system support is crude.
- Linux does not recognize the #S and #N suffixes.
- Linux does not automatically copy/preserve attributes.
- The 'cp' command (see the man page) does NOT copy attrs by default, and it offers no granularity over which attributes are copied. If attrs are preserved when copying, the user.diglloyd.icj.ID#N attr is inappropriately preserved as well (it should be unique to a single file). icj can do nothing about this, and the behavior can lead icj to find a "missing" file that is not the exact same original file. Not a big deal, but not ideal either.
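The two Linux pitfalls above (hash info lost by a plain copy, ID duplicated by an attribute-preserving copy) can be checked by comparing the attrs of a file and its copy. The following is a minimal Python sketch, not part of icj: the function names are hypothetical, attribute values are treated as opaque bytes because their format is not described here, and os.listxattr/os.getxattr are available in Python on Linux only.

```python
import os
from typing import Dict

HASH_ATTR = "user.diglloyd.icj.HashInfo#S"
ID_ATTR = "user.diglloyd.icj.ID#N"


def read_attrs(path: str) -> Dict[str, bytes]:
    """Read all extended attributes of a path (Linux-only calls in Python)."""
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}


def audit_copy(src: Dict[str, bytes], dst: Dict[str, bytes]) -> Dict[str, bool]:
    """Compare the attrs of an original and its copy.

    hash_info_lost: the original had hash info but the copy does not.
    id_duplicated:  the copy carries the same ID value as the original,
                    although an ID should belong to a single file.
    """
    return {
        "hash_info_lost": HASH_ATTR in src and HASH_ATTR not in dst,
        "id_duplicated": ID_ATTR in src and src[ID_ATTR] == dst.get(ID_ATTR),
    }


def audit_copy_on_disk(src_path: str, dst_path: str) -> Dict[str, bool]:
    """Convenience wrapper that reads both files' attrs from disk (Linux only)."""
    return audit_copy(read_attrs(src_path), read_attrs(dst_path))
```

A result with hash_info_lost set suggests the copy was made without preserving attrs; a result with id_duplicated set suggests an icj update on the copy is due.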
Tips when making backups
Operation with attrs is automatic when using icj. And on macOS, HashInfo attrs are even preserved with some cloud services, like Apple iCloud.
However, backups present some subtleties which icj cannot address by itself.
For existing backups, use the icj sync command to ensure attributes are brought up to date on the backup. That’s because most backup programs will not copy new attributes unless the file itself is re-copied. This need be done once and only once, since after that the backup program will naturally copy the file and its attributes (if supported and so-configured, generally by default).
Pre-backup:
1. Do icj update on all volumes and/or folders to be backed up. This ensures not only that icj/icjh files are updated, but that hash info attributes on files are also up to date.
2. On macOS, most backup programs preserve file attributes by default. Check the preferences in your backup program to be sure.
Post backup:
Use icj status to see the status of the backup; it will quickly show things like missing files (it will not actually verify data).
Regularly use icj verify on backup volumes, or at least a quick check via icj status.
After status/verify, an icj update may be appropriate in some scenarios:
Any time files and folders are copied (not moved), they lose their ID attribute on macOS. This is by design because a copy always necessarily has a different file ID (hard links being an oddball exception). Thus, doing icj update on the backup is a good idea, so as to update the file/folder IDs in the icjh files (and on Linux, to update the ID attributes, which are incorrect for the copies).
If a partial backup was done (excluding some items), then icjh files will contain information relating to files that do not exist (files that were not backed-up). Perform icj update on the backup after the backup is done.
In mixed scenarios such as multiple backups into different folders on a volume, perform icj update on the backup volume after the backup is done. This will generate one top-level icjh file as well as taking care of reconciling the icjh files vs files that were not backed-up.
Tips (very specific)
On Linux, take care with the 'cp' command and similar tools so that attributes are preserved via the --preserve option (with GNU cp, --preserve=xattr or --preserve=all; plain --preserve does not include extended attributes); see the man page for 'cp'. An icj update should also be done to insert correct file and folder IDs.
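For scripted copies on Linux, the macOS behavior can be approximated by copying with attrs and then dropping any attr whose name ends in #N. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, shutil.copy2 copies extended attributes on Linux only where the file system allows it, and this documentation does not say whether icj needs the stale ID removed before icj update, so treat the removal step as optional.

```python
import os
import shutil
from typing import Iterable, List


def never_preserve_names(names: Iterable[str]) -> List[str]:
    """Select attribute names carrying the '#N' (do not preserve) suffix."""
    return [name for name in names if name.endswith("#N")]


def copy_like_macos(src: str, dst: str) -> None:
    """Copy a file with its attrs, then drop '#N' attrs from the copy."""
    # copy2 copies data and metadata; on Linux this includes xattrs where supported.
    shutil.copy2(src, dst)
    if not hasattr(os, "listxattr"):
        return  # platform without the Linux xattr calls in Python
    try:
        for name in never_preserve_names(os.listxattr(dst)):
            os.removexattr(dst, name)
    except OSError:
        # File system without xattr support, or attribute already absent.
        pass
```

After such a copy, an icj update on the destination is still needed to insert correct file and folder IDs.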
Copyright © 2022 diglloyd Inc, all rights reserved