A technique for extracting sub-source similarities from information sources having different formats