Incorporating clustering into set similarity join algorithms: The SjClust framework