Analyze this document image and provide a comprehensive profile for adaptive extraction.

TASK: Learn the document's structure, typography, colors, and layout patterns. This analysis will be used to generate a document-specific extraction prompt.

OUTPUT JSON FORMAT:
{
  "document_profile": {
    "document_type": "certificate|form|report|invoice|letter|contract|id_card|receipt|statement|other",
    "document_subtype": "specific description (e.g., 'police clearance certificate', 'business registration form')",
    "confidence": 0.0-1.0,
    "layout_analysis": {
      "orientation": "portrait|landscape",
      "column_count": 1-4,
      "has_header": true|false,
      "has_footer": true|false,
      "has_sidebar": true|false,
      "primary_layout_pattern": "single_column|two_column|three_column|form_grid|table_heavy|mixed"
    },
    "region_definitions": {
      "header_zone": {
        "y_start": 0.0,
        "y_end": 0.0-1.0,
        "contains": ["list of content types: logo, organization_name, title, etc."]
      },
      "body_zone": {
        "y_start": 0.0-1.0,
        "y_end": 0.0-1.0,
        "primary_content": "description of main content"
      },
      "footer_zone": {
        "y_start": 0.0-1.0,
        "y_end": 1.0,
        "contains": ["page_number", "date", "qr_code", etc.]
      }
    },
    "typography_profile": {
      "font_families_detected": ["serif", "sans-serif", "monospace"],
      "size_distribution": {
        "xlarge": {"pixel_range": [min, max], "relative_to_body": 1.5-2.0},
        "large": {"pixel_range": [min, max], "relative_to_body": 1.2-1.5},
        "normal": {"pixel_range": [min, max], "relative_to_body": 1.0},
        "small": {"pixel_range": [min, max], "relative_to_body": 0.7-0.9}
      },
      "default_line_spacing": 1.0-2.0
    },
    "color_palette": {
      "dominant_colors": ["#RRGGBB", ...],
      "accent_colors": ["#RRGGBB", ...],
      "background_colors": ["#RRGGBB", ...],
      "border_color": "#RRGGBB|none"
    },
    "table_analysis": {
      "has_tables": true|false,
      "table_count": 0-N,
      "table_styles": [
        {
          "type": "bordered|borderless|partial",
          "border_color": "#RRGGBB",
          "header_style": "bold|background|both|none",
          "column_ratio": [0.3, 0.05, 0.65],
          "has_alternating_rows": true|false
        }
      ]
    },
    "visual_elements": {
      "logos": [{"position": "top-left|top-center|top-right", "approximate_size": "small|medium|large"}],
      "photos": [{"position": "...", "approximate_size": "passport|medium|large"}],
      "qr_codes": [{"position": "bottom-right|..."}],
      "signatures": [{"position": "..."}],
      "stamps": [{"position": "..."}],
      "barcodes": [{"position": "..."}]
    },
    "spacing_analysis": {
      "paragraph_spacing": "tight|normal|loose",
      "section_spacing": "minimal|standard|generous",
      "margin_style": "narrow|normal|wide"
    }
  }
}

ANALYSIS INSTRUCTIONS:

1. DOCUMENT TYPE DETECTION:
   - Examine overall structure, logos, titles, and content
   - Certificate: official documents, clearances, licenses
   - Form: structured fields with labels and values
   - Report: narrative text with sections
   - Invoice: billing information, line items
   - Letter: correspondence format
   - Contract: legal document with clauses
   - ID Card: identification document
   - Receipt: transaction record
   - Statement: account summary

2. ZONE BOUNDARY DETECTION:
   - Header: Look for logos, organization names, document titles at TOP
   - Footer: Look for page numbers, dates, QR codes, URLs at BOTTOM
   - Measure ACTUAL boundaries as fractions (0.0 to 1.0) of page height
   - Do NOT use default values - measure from the actual content

3. TYPOGRAPHY ANALYSIS:
   - Identify font size categories by measuring actual text heights
   - Calculate relative sizes compared to body text (normal = 1.0)
   - Note: pixel_range should reflect actual measured heights

4. COLOR EXTRACTION:
   - Identify dominant text colors (usually black)
   - Identify accent colors used for highlights, links, headers
   - Identify background colors (page and element backgrounds)
   - Identify border colors used in tables or frames

5. TABLE STRUCTURE:
   - Count tables and identify their styles
   - Measure actual column ratios from content positions
   - Note if tables have visible borders, headers, alternating rows

6. VISUAL ELEMENTS:
   - Locate logos, photos, signatures, stamps, QR codes, barcodes
   - Record position (e.g., "top-left", "body-right", "bottom-center")
   - Estimate size (small, medium, large, or specific like "passport")

7. SPACING PATTERNS:
   - Observe paragraph gaps, section breaks, margins
   - Classify as tight/normal/loose based on visual density

IMPORTANT:
- Measure actual values from the document, do NOT use generic defaults
- All positional values are fractions of the page dimensions (0.0 = top/left, 1.0 = bottom/right)
- Column ratios should sum to approximately 1.0
- If uncertain about a value, provide your best estimate with lower confidence
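
The numeric constraints stated above (confidence and zone boundaries within 0.0 to 1.0, column ratios summing to roughly 1.0, a fixed set of document types) can be checked mechanically on the returned JSON. The following is a minimal sketch in Python, assuming the response arrives as a JSON string. The function name `validate_profile` and the 0.05 tolerance are illustrative choices, not part of this prompt.

```python
import json
import math

# Allowed values copied from the "document_type" field of the output format.
ALLOWED_TYPES = {
    "certificate", "form", "report", "invoice", "letter",
    "contract", "id_card", "receipt", "statement", "other",
}


def validate_profile(raw: str, ratio_tolerance: float = 0.05) -> list[str]:
    """Return a list of problems found in a profile response (empty if none).

    Hypothetical helper: only the constraints spelled out in the prompt are checked.
    """
    problems = []
    profile = json.loads(raw).get("document_profile", {})

    if profile.get("document_type") not in ALLOWED_TYPES:
        problems.append("document_type is not one of the allowed values")

    confidence = profile.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")

    zones = profile.get("region_definitions", {})
    for name in ("header_zone", "body_zone", "footer_zone"):
        zone = zones.get(name)
        if zone is None:
            continue  # assumption: a page without a header or footer may omit the zone
        start, end = zone.get("y_start"), zone.get("y_end")
        if not all(isinstance(v, (int, float)) for v in (start, end)):
            problems.append(f"{name}: y_start and y_end must be numbers")
        elif not 0.0 <= start <= end <= 1.0:
            problems.append(f"{name}: expected 0.0 <= y_start <= y_end <= 1.0")

    tables = profile.get("table_analysis", {}).get("table_styles", [])
    for index, style in enumerate(tables):
        ratios = style.get("column_ratio", [])
        if ratios and not math.isclose(sum(ratios), 1.0, abs_tol=ratio_tolerance):
            problems.append(f"table {index}: column_ratio sums to {sum(ratios):.2f}")

    return problems
```

An empty list means the response satisfied these checks; any returned message could be used to re-prompt or to flag the profile as low confidence before it drives extraction.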