Describe: A review on knowledge and information extraction from PDF documents and storage approaches