How to Remove Duplicate Files In Linux?

To remove duplicate files in Linux, you can use various methods and commands. Here are some common approaches:

  1. Using fdupes: fdupes is a command-line tool that finds duplicate files by comparing file sizes and then file contents. It can be installed with a package manager such as apt or yum. Run fdupes followed by a directory path to list the duplicates it finds, and add -r to include subdirectories. With the -d option, fdupes prompts you to choose which files in each set to preserve and deletes the rest.
  2. Using rdfind: Similar to fdupes, rdfind finds duplicate files and can remove them. It can be installed with a package manager. Run rdfind followed by the directory path to scan. By default it only writes a report named results.txt to the current directory. To preview a deletion, run rdfind -dryrun true with the path; to delete the duplicates, run rdfind -deleteduplicates true with the path.
  3. Using find and md5sum: You can also find duplicates by combining the find command with the md5sum utility. First, use find to list all files in a directory recursively. Then pass those files to md5sum (for example with -exec), which computes a checksum of each file's contents; files with identical contents always produce identical checksums. Sorting the output by checksum places matching files next to each other, so you can review each group and delete the extra copies manually.
  4. Manual deletion: If you have a small number of files or want full control, you can manually identify and delete duplicates. You can use file managers or terminal commands like ls to list files, and then compare file sizes, names, or content to find duplicates. Once you identify them, you can delete them using the rm command.


Remember to exercise caution when deleting files, especially if you are not completely certain they are duplicates. It is advisable to make backups before performing deletion operations.
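
The find and md5sum approach can be sketched as a small shell function. This is a minimal, non-destructive sketch that assumes GNU coreutils (for the -w and --all-repeated options of uniq) and file names without newlines or backslashes. The function name list_duplicate_groups is hypothetical. It only prints groups of files with identical checksums and deletes nothing.

```shell
# list_duplicate_groups is a hypothetical helper name, not a standard command.
# Prints the md5sum lines of all files under "$1" that share a checksum with
# at least one other file. Groups are separated by blank lines.
list_duplicate_groups() {
    find "$1" -type f -exec md5sum {} + \
        | LC_ALL=C sort \
        | uniq -w32 --all-repeated=separate
}
```

For example, list_duplicate_groups ~/Documents prints one block of lines per set of identical files, which you can review before deciding what to delete.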


How to remove duplicate files in Linux using the rmlint command?

To remove duplicate files in Linux using the rmlint command, follow these steps:

  1. Install rmlint if it is not already installed. You can do this by running the following command in your terminal: sudo apt-get install rmlint
  2. Navigate to the directory containing the files you want to scan for duplicates. For example, if your files are in the Documents directory, use the following command to navigate to it: cd Documents
  3. Run rmlint on the current directory: rmlint . By default, rmlint scans the directory and all subdirectories, compares file contents, and reports the duplicates it finds, along with other "lint" such as empty files. It does not delete anything itself. Instead, it writes a shell script named rmlint.sh and a report named rmlint.json to the current directory.
  4. Examine the output. Within each group of identical files, rmlint treats one file as the original to keep and the others as duplicates to remove.
  5. Choose which files you want to remove. Open rmlint.sh in a pager or text editor, for example with less rmlint.sh, check the file paths it would remove, and delete the lines for any files you want to keep.
  6. To remove the duplicates, run the generated script: ./rmlint.sh The script asks for confirmation before it makes any changes.


Please note that the rmlint command can irreversibly delete files, so be cautious and ensure you have a backup before running it.
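
A minimal sketch of a scan-then-review workflow, assuming rmlint is installed and follows its documented default of writing a removal script named rmlint.sh to the current working directory. The function name scan_with_rmlint is hypothetical. The function runs rmlint in a temporary directory so that the generated files do not land inside the scanned tree, and it deletes nothing.

```shell
# scan_with_rmlint is a hypothetical helper name.
# Scans "$1" with rmlint and prints the path of the generated removal script.
# Nothing is deleted here; the script must be reviewed and run separately.
scan_with_rmlint() {
    if ! command -v rmlint >/dev/null 2>&1; then
        echo "rmlint is not installed" >&2
        return 127
    fi
    target=$(realpath "$1") || return 1
    workdir=$(mktemp -d) || return 1
    # rmlint's own report goes to stderr so that stdout carries only the path.
    ( cd "$workdir" && rmlint "$target" ) >&2 || return 1
    echo "$workdir/rmlint.sh"
}
```

For example, script=$(scan_with_rmlint ~/Documents) stores the script path, less "$script" shows what would be removed, and sh "$script" performs the removal after you have reviewed it.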


How to delete duplicate files in Linux without moving them to a different location?

To delete duplicate files in Linux without moving them to a different location, you can use the fdupes command-line tool. Here's how you can do it:

  1. Install fdupes if you don't have it already. If you are using a Debian-based distribution, run the following command: sudo apt update && sudo apt install fdupes
  2. Once installed, navigate to the directory that contains the files you want to check for duplicates. For example, if the files are in the /home/myuser/documents directory, use the cd command: cd /home/myuser/documents
  3. Run the fdupes command with the -d option, which lets you delete the unwanted files from each set of duplicates: fdupes -d . The . tells fdupes to check the current directory; add -r (fdupes -rd .) to include subdirectories. Note: Be cautious when using this command as it permanently deletes files.
  4. For each set of duplicates, fdupes displays a numbered list of the files and asks which ones to preserve. Enter the number of the file to keep (or several numbers separated by commas), or type all to keep the whole set. The files you do not select are deleted.
  5. Repeat the previous step until fdupes has gone through all the duplicates in the directory. To skip the prompts, add the -N option (fdupes -dN .), which keeps the first file in each set and deletes the rest without asking.


By following these steps, you can identify and delete duplicate files directly in their current location without the need to move them elsewhere.
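
A minimal sketch of a preview-then-delete workflow with fdupes, assuming fdupes is installed and provides the -r (recurse), -d (delete), and -N (no prompt) options described in its manual. The function names require_fdupes, preview_duplicates, and purge_duplicates are hypothetical.

```shell
# Hypothetical helper names; fdupes is the only external tool used.
require_fdupes() {
    command -v fdupes >/dev/null 2>&1 && return 0
    echo "fdupes is not installed" >&2
    return 127
}

# List sets of duplicate files under "$1" (recursively) without deleting anything.
preview_duplicates() {
    require_fdupes || return
    fdupes -r "$1"
}

# Keep the first file of each set under "$1" and delete the rest without prompting.
purge_duplicates() {
    require_fdupes || return
    fdupes -r -d -N "$1"
}
```

Running preview_duplicates /home/myuser/documents first shows exactly which sets fdupes considers identical; purge_duplicates on the same path then removes the extra copies in place.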


How to remove duplicate files by comparing checksums in Linux?

To remove duplicate files by comparing checksums in Linux, you can follow these steps:

  1. Open the terminal in Linux.
  2. Navigate to the directory where you want to search for duplicate files using the cd command. For example, if you want to search for duplicates in the "Documents" folder, use the command cd ~/Documents.
  3. Use the find command with the -exec option to calculate the MD5 checksum of every file under a directory, including its subdirectories. Replace directory with the directory to search: find directory -type f -exec md5sum {} + > checksums.txt This command creates a file named checksums.txt in the current directory, with one line per file that contains the checksum followed by the file path.
  4. Sort the checksums in the checksums.txt file using the sort command: sort -o checksums.txt checksums.txt
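  5. List the redundant copies. In the sorted file, lines with the same checksum are adjacent, so the following awk command prints every line whose checksum has already appeared, that is, every copy except the first in each group: awk 'seen[$1]++' checksums.txt > duplicates.txt (uniq -w32 -d is not suitable here, because it prints only one line per group and would leave extra copies behind whenever a file exists three or more times.)
  6. The entries in duplicates.txt are already in sorted order, so a second sort is not required. If you edit the file by hand, you can re-sort it in place: sort -o duplicates.txt duplicates.txt
  7. Use the cut command to extract the file paths from the duplicates.txt file. Each line consists of the 32-character checksum, two separator characters, and the path, so the path starts at character 35: cut -c35- duplicates.txt > duplicates_paths.txt (Splitting on spaces with cut -d' ' would truncate any path that contains a space.)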
  8. Open the duplicates_paths.txt file to see the paths of the duplicate files.
  9. Confirm the paths and make sure that these files are indeed duplicates.
  10. Use the rm command to remove the duplicate files. For example, to delete all duplicate files listed in duplicates_paths.txt, run the following command: xargs -d '\n' rm -f < duplicates_paths.txt


Remember to be cautious when deleting files. Double-check that the files listed in duplicates_paths.txt are indeed duplicate files before executing the rm command.
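
The steps above can be combined into two small shell functions. This is a minimal sketch that assumes GNU find, coreutils, and xargs, and file names without newlines or backslashes (md5sum escapes such names, which would shift the path column). The function names find_redundant_copies and delete_redundant_copies are hypothetical.

```shell
# Hypothetical helper names, not standard commands.
# Print the path of every file under "$1" whose MD5 checksum was already seen,
# i.e. every copy except the first (in sorted order) of each duplicate group.
find_redundant_copies() {
    find "$1" -type f -exec md5sum {} + \
        | LC_ALL=C sort \
        | awk 'seen[$1]++' \
        | cut -c35-
}

# Delete the files reported by find_redundant_copies for "$1".
# Review the output of find_redundant_copies before calling this.
delete_redundant_copies() {
    find_redundant_copies "$1" | xargs -d '\n' -r rm -f --
}
```

For example, find_redundant_copies ~/Documents prints the candidates for deletion; run delete_redundant_copies ~/Documents only after confirming that list.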
