An incremental clustering scheme for duplicate detection in large databases