Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
Abstract
Motivation: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282–283; Bioinformatics, 18, 77–82) describing an ultrafast protein sequence clustering progr...