GUIDELINES FOR SUBMITTING ANNUAL REPORTS IN PDF FORMAT
REQUIREMENTS:
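- NO PASSWORD PROTECTION: ENSURE THAT PDF FILES ARE NOT PASSWORD-PROTECTED OR ENCRYPTED. THIS INCLUDES PERMISSION (OWNER) PASSWORDS THAT RESTRICT COPYING OR PRINTING, NOT ONLY PASSWORDS REQUIRED TO OPEN THE FILE. THE EXTRACTION PROCESS NEEDS UNRESTRICTED ACCESS TO THE CONTENT, WHICH IS NOT POSSIBLE IF THE PDF IS SECURED.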
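- SINGLE-PAGE VIEW: EACH PAGE OF THE PDF SHOULD CONTAIN EXACTLY ONE PAGE OF THE REPORT. DOUBLE-PAGE SPREADS (TWO FACING PAGES ON ONE PDF PAGE) CAN CONFUSE THE EXTRACTION PROCESS, FOR EXAMPLE BY MERGING TEXT FROM THE LEFT AND RIGHT PAGES, LEADING TO INACCURATE TEXT EXTRACTION AND SUMMARIES.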
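THE TWO REQUIREMENTS ABOVE CAN BE PRE-CHECKED BEFORE SUBMISSION. THE PYTHON SKETCH BELOW IS A MINIMAL ILLUSTRATION UNDER STATED ASSUMPTIONS, NOT PART OF THE OFFICIAL PROCESS: `looks_encrypted` IS A CRUDE HEURISTIC THAT SEARCHES THE RAW FILE BYTES FOR AN `/Encrypt` ENTRY, AND `likely_spreads` ASSUMES PAGE SIZES ARE ALREADY AVAILABLE AS (WIDTH, HEIGHT) PAIRS IN PDF POINTS (FOR EXAMPLE READ WITH A PDF LIBRARY SUCH AS pypdf). THE FUNCTION NAMES AND THE 1.2 WIDTH-TO-HEIGHT THRESHOLD ARE HYPOTHETICAL.

```python
# Minimal pre-submission check for the "no password protection" and
# "single-page view" requirements.
# Assumptions (hypothetical, not defined by these guidelines):
#   - page sizes are supplied as (width, height) pairs in PDF points;
#   - a page wider than SPREAD_RATIO times its height is a likely spread.

SPREAD_RATIO = 1.2  # hypothetical threshold; tune for your documents


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Crude heuristic: encrypted PDFs carry an /Encrypt entry in the trailer.

    A plain byte search can give false positives, so opening the file with
    a real PDF library is the more reliable check.
    """
    return b"/Encrypt" in pdf_bytes


def likely_spreads(page_sizes: list[tuple[float, float]]) -> list[int]:
    """Return 1-based page numbers whose width/height ratio suggests a spread."""
    return [
        number
        for number, (width, height) in enumerate(page_sizes, start=1)
        if height > 0 and width / height > SPREAD_RATIO
    ]


if __name__ == "__main__":
    # A4 portrait is about 595 x 842 points; two A4 pages side by side
    # form a landscape sheet of about 1190 x 842 points.
    sizes = [(595, 842), (1190, 842), (595, 842)]
    print(likely_spreads(sizes))  # [2]
    print(looks_encrypted(b"%PDF-1.7 trailer << /Root 1 0 R >>"))  # False
```

A LANDSCAPE PAGE IS NOT ALWAYS A SPREAD (A WIDE TABLE MAY BE PRINTED SIDEWAYS), SO FLAGGED PAGES ARE BEST REVIEWED MANUALLY RATHER THAN REJECTED AUTOMATICALLY.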
BEST PRACTICES:
- CONSISTENT FORMATTING: MAINTAIN CONSISTENT FORMATTING (SUCH AS HEADING STYLES, COLUMN LAYOUTS, AND TABLE STRUCTURE) ACROSS ALL PAGES. INCONSISTENT FORMATTING CAN LEAD TO ERRORS IN TEXT EXTRACTION AND SUMMARIZATION.
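- HIGH-QUALITY SCANS: FOR SCANNED DOCUMENTS, ENSURE THE SCANS ARE CLEAR AND OF HIGH RESOLUTION (300 DPI IS A COMMONLY RECOMMENDED MINIMUM). LOW-RESOLUTION SCANS REDUCE THE ACCURACY OF OPTICAL CHARACTER RECOGNITION (OCR), WHICH IS NEEDED TO EXTRACT TEXT FROM SCANNED PAGES.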
- CONSISTENT PAGE SIZES: ENSURE THAT ALL PAGES IN THE PDF HAVE CONSISTENT SIZES. VARYING PAGE SIZES CAN COMPLICATE THE EXTRACTION PROCESS.
- REMOVE WATERMARKS AND ANNOTATIONS: WATERMARKS, ANNOTATIONS (SUCH AS COMMENTS AND HIGHLIGHTS), AND SIMILAR OVERLAY ELEMENTS CAN INTERFERE WITH TEXT EXTRACTION, FOR EXAMPLE WHEN WATERMARK TEXT GETS MIXED INTO THE BODY TEXT. REMOVE THESE ELEMENTS IF POSSIBLE.
- METADATA AND BOOKMARKS: WHERE POSSIBLE, INCLUDE ACCURATE DOCUMENT METADATA (SUCH AS TITLE AND AUTHOR) AND BOOKMARKS FOR THE MAIN SECTIONS. THEY DO NOT DIRECTLY AFFECT TEXT EXTRACTION, BUT THEY HELP IN ORGANIZING AND REFERENCING THE DOCUMENTS.
- USE STANDARD FONTS: USE STANDARD FONTS THAT TEXT EXTRACTION TOOLS HANDLE RELIABLY. CUSTOM FONTS, ESPECIALLY THOSE WITHOUT A PROPER UNICODE CHARACTER MAPPING, CAN CAUSE THE EXTRACTED TEXT TO CONTAIN WRONG OR GARBLED CHARACTERS.
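FOR THE CONSISTENT PAGE SIZES PRACTICE, A SIMPLE CHECK IS TO FIND THE MOST COMMON PAGE SIZE AND LIST THE PAGES THAT DIFFER FROM IT. THE PYTHON SKETCH BELOW ASSUMES PAGE SIZES ARE ALREADY AVAILABLE AS (WIDTH, HEIGHT) PAIRS IN PDF POINTS; THE FUNCTION NAME `inconsistent_pages` AND THE 1-POINT TOLERANCE ARE HYPOTHETICAL CHOICES, NOT PART OF THESE GUIDELINES.

```python
from collections import Counter

# Hypothetical tolerance: sizes within 1 point of the most common size are
# treated as identical, absorbing rounding (A4 is about 595.28 x 841.89 pt).
TOLERANCE_PT = 1.0


def inconsistent_pages(page_sizes: list[tuple[float, float]]) -> list[int]:
    """Return 1-based numbers of pages whose size differs from the most common one."""
    if not page_sizes:
        return []
    # Round to whole points so near-identical sizes are counted together.
    rounded = [(round(width), round(height)) for width, height in page_sizes]
    common_width, common_height = Counter(rounded).most_common(1)[0][0]
    return [
        number
        for number, (width, height) in enumerate(page_sizes, start=1)
        if abs(width - common_width) > TOLERANCE_PT
        or abs(height - common_height) > TOLERANCE_PT
    ]


if __name__ == "__main__":
    # Three A4 pages (with rounding noise) and one US Letter page.
    sizes = [(595.28, 841.89), (595.0, 842.0), (612.0, 792.0), (595.3, 841.9)]
    print(inconsistent_pages(sizes))  # [3]
```

COMPARING AGAINST THE MOST COMMON SIZE, RATHER THAN AGAINST THE FIRST PAGE, KEEPS AN ODD-SIZED COVER PAGE FROM MAKING EVERY OTHER PAGE LOOK INCONSISTENT.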
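FOR SCANNED PAGES, THE EFFECTIVE RESOLUTION CAN BE ESTIMATED FROM THE SCAN'S PIXEL DIMENSIONS AND THE PAGE SIZE, BECAUSE ONE PDF POINT IS 1/72 INCH: EFFECTIVE DPI = PIXELS / (POINTS / 72). THE PYTHON SKETCH BELOW ASSUMES BOTH SIZES ARE KNOWN (FOR EXAMPLE FROM THE SCANNING SOFTWARE) AND THAT THE SCANNED IMAGE FILLS THE WHOLE PAGE; THE FUNCTION NAMES ARE HYPOTHETICAL, AND THE 300 DPI THRESHOLD IS A COMMONLY CITED OCR MINIMUM RATHER THAN A RULE OF THESE GUIDELINES.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch
MIN_DPI = 300  # commonly cited OCR minimum; an assumption, adjust as needed


def effective_dpi(image_px: int, page_pt: float) -> float:
    """Pixels per inch when an image spanning `image_px` pixels fills `page_pt` points."""
    return image_px / (page_pt / POINTS_PER_INCH)


def scan_is_sharp_enough(
    image_width_px: int,
    image_height_px: int,
    page_width_pt: float,
    page_height_pt: float,
    min_dpi: float = MIN_DPI,
) -> bool:
    """True if both horizontal and vertical resolution meet `min_dpi`."""
    horizontal = effective_dpi(image_width_px, page_width_pt)
    vertical = effective_dpi(image_height_px, page_height_pt)
    return min(horizontal, vertical) >= min_dpi


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(effective_dpi(2550, 612))  # 300.0
    print(scan_is_sharp_enough(2550, 3300, 612, 792))  # True
    print(scan_is_sharp_enough(1275, 1650, 612, 792))  # False (150 DPI)
```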