Developers Arena: How to Remove Duplicate Lines in Unix

File Can be Sorted Alphabetically
Step 1: Make a backup of the file you are working with:
cp document.txt document.txt.bkup

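Step 2: Issue the command:
sort -u document.txt
This command sorts the file and prints the result with all duplicate lines removed. The file itself is not modified, so redirect the output to a new file if you want to keep it.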

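Step 3: Alternatively, get the same result by piping sort into uniq. Note that uniq removes only adjacent duplicate lines, not blank lines, so the input must be sorted first:
sort document.txt | uniq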
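
The sorted approach above can be sketched as a short shell session. The sample data, the scratch directory, and the output name sorted_unique.txt are assumptions made for illustration only; they are not part of the original steps.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data containing repeated lines.
printf 'pear\napple\npear\nbanana\napple\n' > document.txt

# Keep a backup, then write the sorted, de-duplicated lines to a new file.
cp document.txt document.txt.bkup
sort -u document.txt > sorted_unique.txt

cat sorted_unique.txt
```

Writing to a separate file avoids a common mistake: redirecting with > document.txt would truncate the input before sort reads it. If the result should replace the original, sort's -o option (sort -u -o document.txt document.txt) is generally the safer choice.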

File Cannot be Sorted Alphabetically
Step 1: Make a backup file:
cp document.txt document.txt.bkup

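Step 2: Issue the following awk command:
awk '!($0 in a) {a[$0];print}' document.txt > unique.txt
The command prints each line only the first time it appears, so the original line order is preserved. Your unique entries will be found in the file named unique.txt.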

Step 3: Copy the file of unique lines back over the original file:
cp unique.txt document.txt
This puts the unique entries back into the original file.
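
As a rough illustration of the awk steps above, the following sketch runs them on a small sample file and shows that the first occurrence of each line stays in its original position. The sample data and scratch directory are assumptions for the example.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data whose order matters.
printf 'zebra\napple\nzebra\nmango\napple\n' > document.txt
cp document.txt document.txt.bkup

# The array "a" records every line seen so far; a line is printed
# only if it is not yet a key in the array.
awk '!($0 in a) {a[$0];print}' document.txt > unique.txt

# Copy the de-duplicated lines back over the original file.
cp unique.txt document.txt

cat document.txt
```

The output keeps zebra before apple and mango, which sort -u would not. A shorter form that is often used for the same purpose is awk '!seen[$0]++', where seen is just another array name.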

Combine Two Files and Find the Duplicate Lines
Step 1: Make a backup of each file:
cp doc1.txt doc1.txt.bkup
cp doc2.txt doc2.txt.bkup

Step 2: Issue the command:
cat doc1.txt doc2.txt > combine.txt
This command concatenates doc1.txt and doc2.txt into the file combine.txt.

Step 3: Remove the duplicate lines.
Use either the sort and uniq commands or the awk command described above, with combine.txt as the input file.
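
The combine-and-deduplicate steps above might look like the sketch below. It also lists which lines were duplicated by using uniq -d, which is one possible reading of "find the duplicate lines" and not something the steps spell out. The sample data and the names duplicates.txt and unique.txt are assumptions for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files that share some lines.
printf 'alpha\nbeta\ngamma\n' > doc1.txt
printf 'beta\ndelta\nalpha\n' > doc2.txt

# Combine both files into one.
cat doc1.txt doc2.txt > combine.txt

# uniq -d prints one copy of each line that occurs more than once
# in its (sorted) input.
sort combine.txt | uniq -d > duplicates.txt

# Remove duplicates while keeping the order of first appearance.
awk '!($0 in a) {a[$0];print}' combine.txt > unique.txt

cat duplicates.txt
cat unique.txt
```

Note that uniq -d reports any repeated line in combine.txt, so a line repeated within a single input file would be listed even if the other file does not contain it.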
