Bash – Remove Duplicate Files

A common chore I run into is finding and removing duplicate files. Here’s a little bash script that helps clean up the mess:

#!/usr/bin/env bash

declare -r _root="${1:-.}"
declare -a _hashes=()
declare _file= _hash= _test= _seen=

# -print0 / read -d '' keeps filenames with spaces intact
while IFS= read -r -d '' _file; do
  _hash="$(sha1sum "${_file}" | awk '{print $1}')"
  _seen=0
  for _test in "${_hashes[@]}"; do
    [[ "${_test}" == "${_hash}" ]] && { _seen=1; break; }
  done
  if (( _seen )); then
    rm "${_file}"          # duplicate - remove it
  else
    _hashes+=("${_hash}")  # first copy - remember its checksum
  fi
done < <(find "${_root}" -type f -print0)
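Scanning the list of previous checksums for every file makes the script quadratic in the number of files. On bash 4+, an associative array keyed by checksum gives constant-time lookups instead. Here is a sketch of that variant, assuming bash 4+; the function name `dedupe` is mine, not part of the script above:

```shell
#!/usr/bin/env bash

# Remove duplicate regular files under the given root (default: .).
# Requires bash 4+ for associative arrays (declare -A).
dedupe() {
  local _root="${1:-.}"
  local _file _hash
  local -A _seen=()   # checksum -> first file seen with that content

  while IFS= read -r -d '' _file; do
    _hash="$(sha1sum "${_file}" | awk '{print $1}')"
    if [[ -n "${_seen[${_hash}]:-}" ]]; then
      rm "${_file}"                  # duplicate - remove it
    else
      _seen["${_hash}"]="${_file}"   # first occurrence - keep it
    fi
  done < <(find "${_root}" -type f -print0)
}
```

Which of two identical files survives depends on the order `find` visits them, same as in the original script.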

Let me know what you think! If you have suggestions or modifications, I’m open to them.
