IntegrityChecker java (icj): 'dupes' command

The dupes command finds duplicate files and produces a report listing them. In some cases, large amounts of space can be saved this way. In Lloyd’s case, 700GB was saved by detecting duplicated folders of image files.

An update must be done first so that hash information is current. If it is not done, icj does its best to ignore any files of unknown status.

Additionally:

By default, only duplicate files larger than 32K are shown (this keeps the "noise" down).

* not generally recommended unless files will always remain in the same relative locations on the same volume.
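To illustrate the idea of hash-based duplicate detection, here is a minimal shell sketch that groups files by SHA-256 and lists those whose hash occurs more than once. It is not icj's implementation: the function names hash_file and list_dupes are hypothetical, the 32K default merely mirrors the threshold described above, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch only: illustrates hash-based duplicate detection, not icj internals.

# hash_file FILE: print the SHA-256 of FILE using whichever tool is installed
# (sha256sum is typical on Linux, shasum on macOS).
hash_file() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum < "$1" | cut -d ' ' -f 1
    else
        shasum -a 256 < "$1" | cut -d ' ' -f 1
    fi
}

# list_dupes DIR [MIN_KB]: print "hash<TAB>path" for every file larger than
# MIN_KB kilobytes (default 32) whose hash is shared with another file.
list_dupes() {
    dir=$1
    min_bytes=$(( ${2:-32} * 1024 ))
    find "$dir" -type f -size +"${min_bytes}"c | while IFS= read -r f; do
        printf '%s\t%s\n' "$(hash_file "$f")" "$f"
    done | sort | awk -F '\t' '
        { count[$1]++; line[NR] = $0; key[NR] = $1 }
        END { for (i = 1; i <= NR; i++) if (count[key[i]] > 1) print line[i] }'
}
```

Calling list_dupes on a folder prints the duplicate groups sorted by hash, so files with identical content appear on adjacent lines.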

Excluding folders for comparison purposes

The dupes.ignore preference in .icj_prefs can be used to exclude folders from consideration. For example, to exclude the folder /Work/testing, add it to dupes.ignore:

  [dupes.ignore]
  /Work/testing
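As a rough illustration of what excluding a folder means, the sketch below treats each ignored entry as a path prefix. This page does not specify how icj actually matches dupes.ignore entries, so prefix matching is an assumption, and is_ignored is a hypothetical helper name.

```shell
#!/bin/sh
# is_ignored PATH IGNORED_DIR...: succeed if PATH is one of the ignored folders
# or lies anywhere beneath one of them (assumed prefix semantics).
is_ignored() {
    path=$1
    shift
    for ignored in "$@"; do
        ignored=${ignored%/}    # tolerate a trailing slash on the entry
        case $path in
            "$ignored" | "$ignored"/*) return 0 ;;
        esac
    done
    return 1
}
```

With this rule, /Work/testing/img/a.jpg is excluded by the entry /Work/testing, while /Work/testing2/a.jpg is not.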

Command line usage

icj dupes [<path>]*

Like all commands, this one is recursive, scanning the entire folder hierarchy. If no path is specified, the current working directory is used.

--size option

The --size option specifies a minimum file size; files smaller than this are ignored. For example: --size 64K.
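For readers unsure what a value such as 64K amounts to in bytes, here is a small conversion sketch. It assumes binary multiples (1K = 1024 bytes), which this page does not state explicitly. Only the K suffix appears in this documentation, so the M and G branches are hypothetical extras, as is the function name size_to_bytes.

```shell
#!/bin/sh
# size_to_bytes SIZE: convert "64K", "4M", "1G" or a plain number to bytes.
# Assumption: suffixes are binary multiples (1K = 1024 bytes).
size_to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;    # no suffix: already a byte count
    esac
}
```

Under that assumption, size_to_bytes 64K prints 65536.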

--types option

The --types option specifies one or more file extensions, for example "txt", "jpg", "raw". Types are case insensitive.

The following types are special “smart” types:

More than one type can be specified with a comma (no spaces!), e.g., --types doc,docx,rtf,txt,html.
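The comma-separated, case-insensitive matching described above can be sketched as follows. This is an illustration only: has_type is a hypothetical helper, and it compares plain extensions without modeling the "smart" types.

```shell
#!/bin/sh
# has_type FILE TYPES: succeed if FILE's extension appears in the
# comma-separated TYPES list, ignoring case.
has_type() {
    name=${1##*/}                       # strip any leading folders
    case $name in
        *.*) ;;                         # has an extension, keep going
        *)   return 1 ;;                # no extension, cannot match
    esac
    ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
    types=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case ",$types," in
        *",$ext,"*) return 0 ;;
    esac
    return 1
}
```

For example, has_type IMG_0001.JPG raw,jpg succeeds, while has_type notes.txt doc,docx fails.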

--emit=<rm|clone|symlink|nop> option

By default, icj emits commands to remove or clone duplicate files. These commands can be pasted into a Terminal window for execution. Use some caution, because icj cannot know which files should be preferred to keep (though it applies some logic).

Using --emit nop suppresses emission of such commands.

Examples

Lines starting with "#" are comments.

# Show duplicate files in current working directory
icj dupes

# Show duplicate files in current working directory, don't emit any commands for dealing with them
icj dupes --emit nop

# Show duplicate files on volume Master (or folder Master within current directory):
icj dupes Master

# Show duplicate RAW and jpeg files in Master (jpg includes .jpg and .jpeg)
icj dupes Master --types RAW,jpg

# Show duplicate files of all types of at least 64K in size on all mounted volumes
icj dupes --size 64K /Volumes/*

# Show duplicate files at least 4K in size of type ".txt" and ".html" on volume Master
icj dupes --size 4K --types txt,html Master

Special note on using the clone feature

Clones require an APFS volume on macOS.

When run, icj dupes will emit a report that includes appropriate commands.

To generate commands suitable for making clones for duplicate files, use the --emit clone option, like this (append the folder or volume name to operate on):

icj dupes --emit clone

Cloning files immediately reclaims the disk space used by all duplicates except one. After cloning, all files look and behave the same, and there is no way to tell whether a file is a clone (even with the Finder or other programs). Therefore, icj will report the duplicates all over again! But there is no harm in re-cloning; it just won't reclaim space that has already been reclaimed.
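The sketch below shows what a single clone step looks like and checks that the content is unchanged afterwards. It relies on the macOS cp -c option, which copies using clonefile(2). The clone_file wrapper is hypothetical, and on other systems it falls back to an ordinary copy, which reclaims no space.

```shell
#!/bin/sh
# clone_file KEEP DUPE: replace DUPE with a clone of KEEP.
# On macOS, cp -c requests a clone; elsewhere this is only a plain copy.
clone_file() {
    if [ "$(uname -s)" = "Darwin" ]; then
        cp -c "$1" "$2"
    else
        cp "$1" "$2"    # fallback: ordinary copy, no space is reclaimed
    fi
}

# clone_and_verify KEEP DUPE: clone, then confirm both files still match.
clone_and_verify() {
    clone_file "$1" "$2" && cmp -s "$1" "$2"
}
```

Running clone_and_verify a second time on the same pair succeeds again, which matches the point above that re-cloning is harmless.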

The key decision is whether a clone is desirable, versus just removing the duplicate files. Sometimes you want a duplicate. Other times it is just a mistake. But the beauty of clones is that the decision can be deferred, and the space immediately reclaimed.

If you wish to remove the duplicate files instead, please note that while icj makes a very intelligent guess at which of the duplicates is the best one to keep, that is ultimately your own call:

icj dupes --emit rm
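Because removal cannot be undone, it can be worth re-checking a pair by hand before deleting anything. The following is a cautious sketch, not something icj provides: rm_if_identical is a hypothetical helper that removes the second file only when it is byte-for-byte identical to the first and is not the very same file.

```shell
#!/bin/sh
# rm_if_identical KEEP DUPE: remove DUPE only if it has the same content
# as KEEP and is a distinct file on disk.
rm_if_identical() {
    keep=$1
    dupe=$2
    if [ ! -f "$keep" ] || [ ! -f "$dupe" ]; then
        echo "skipped (missing file): $dupe" >&2
        return 1
    fi
    if [ "$keep" -ef "$dupe" ]; then
        echo "skipped (same file as the one to keep): $dupe" >&2
        return 1
    fi
    if cmp -s "$keep" "$dupe"; then
        rm -- "$dupe"
    else
        echo "skipped (contents differ): $dupe" >&2
        return 1
    fi
}
```

The -ef test also skips hard links to the kept file, since removing those would not reclaim any space.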

 

The report includes comment lines, which start with "#". All the lines (including the comment lines) can be pasted directly into a Terminal window to execute them.

For example, here is a report excerpt containing clone commands:

# 72177 bytes /Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-one/publish/js/jquery-1.4.2.min.js /Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-workstation/publish/js/jquery-1.4.2.min.js /Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-laptop/publish/js/jquery-1.4.2.min.js /Volumes/Master/diglloyd/DOMAINS/MPG/_diglloydTools/publish/js/jquery-1.4.2.min.js
cp -c "/Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-one/publish/js/jquery-1.4.2.min.js" "/Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-workstation/publish/js/jquery-1.4.2.min.js"
cp -c "/Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-one/publish/js/jquery-1.4.2.min.js" "/Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-laptop/publish/js/jquery-1.4.2.min.js"
cp -c "/Volumes/Master/diglloyd/DOMAINS/MPG/_defunct/_mpg-pro-one/publish/js/jquery-1.4.2.min.js" "/Volumes/Master/diglloyd/DOMAINS/MPG/_diglloydTools/publish/js/jquery-1.4.2.min.js"
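Before pasting a long report into a Terminal window, it can help to see how many commands it contains. The sketch below assumes the report has first been saved to a text file (how it gets there is up to you) and counts comment lines versus command lines. summarize_report is a hypothetical helper, not part of icj.

```shell
#!/bin/sh
# summarize_report FILE: count "#" comment lines and non-blank command lines
# in a saved report.
summarize_report() {
    awk '
        /^#/          { comments++ }
        /^[^#]/ && NF { commands++ }
        END { printf "comments=%d commands=%d\n", comments, commands }
    ' "$1"
}
```

For the excerpt above, saved to a file, this would report one comment line and three commands.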

 
