Document layout analysis for semantic information extraction