DELETE all those duplicate files but one based on md5 hash comparison in the current directory tree

June 10, 2009, by Software Livre Brasil
$ find . -type f -print0|xargs -0 md5sum|sort|perl -ne 'chomp;$ph=$h;($h,$f)=split(/\s+/,$_,2);print "$f"."\x00" if ($h eq $ph)'|xargs -0 rm -v --

This one-liner *deletes*, without any further confirmation, all but one of each set of 100% identical duplicates in the current directory tree (i.e., including files in its subdirectories), based on their md5 hash.

Good for cleaning up collections of mp3 files or pictures of your dog|cat|kids|wife that exist in a gazillion copies on your hard disk.

md5sum can be substituted with sha1sum without problems.

The actual filename is not taken into account; only the hash is used.

Whatever sort thinks is the first filename is kept.

It is assumed that the filename does not contain 0x00.

As per the good suggestion in the first comment, this one does a hard link instead:

find . -xdev -type f -print0 | xargs -0 md5sum | sort | perl -ne 'chomp; $ph=$h; ($h,$f)=split(/\s+/,$_,2); if ($h ne $ph) { $k = $f; } else { unlink($f); link($k, $f); }'
Submitted by masterofdisaster
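
After running the hard-link variant, the former duplicates should share one inode. A minimal sketch for checking this, assuming GNU find (the `-printf` action with `%i`, `%n` and `%p` is a GNU extension); the helper name `show_links` is hypothetical.

```shell
# Hypothetical helper: list regular files that have more than one hard link,
# as "inode link-count path", sorted so names sharing an inode are adjacent.
show_links() {
    find "${1:-.}" -xdev -type f -links +1 -printf '%i %n %p\n' | sort -n
}

# Usage: show_links .
```

Names that were hard-linked together appear on neighbouring lines with the same inode number.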

