Detect all text blocks in the image. Group consecutive lines that form a single paragraph into one block. For each text block, output:
- bbox_2d: [x1, y1, x2, y2] coordinates
- text_content: the complete text of the block (join multi-line text with spaces)

Output a JSON array in this format: [{"bbox_2d": [x1, y1, x2, y2], "text_content": "text here"}, ...]
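Below is a minimal Python sketch of how a consumer might parse and sanity-check a response in this JSON array format. The function name `parse_text_blocks` is hypothetical. The x1 <= x2 and y1 <= y2 check is an assumption: it treats the box as two opposite corners, which the format does not state explicitly.

```python
import json
from typing import Any


def parse_text_blocks(raw: str) -> list[dict[str, Any]]:
    """Parse a JSON array of {"bbox_2d", "text_content"} objects and validate each one.

    Assumption: bbox_2d is [x1, y1, x2, y2] with (x1, y1) and (x2, y2) as opposite
    corners, so x1 <= x2 and y1 <= y2 is required here.
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")

    blocks = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not an object")

        bbox = item.get("bbox_2d")
        text = item.get("text_content")

        # bbox_2d must be exactly four numbers (bool is excluded because it subclasses int).
        if not (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox)
        ):
            raise ValueError(f"item {i}: bbox_2d must be four numbers")

        x1, y1, x2, y2 = bbox
        if x1 > x2 or y1 > y2:
            raise ValueError(f"item {i}: expected x1 <= x2 and y1 <= y2")

        if not isinstance(text, str):
            raise ValueError(f"item {i}: text_content must be a string")

        # Collapse newlines and repeated whitespace into single spaces,
        # matching the "join multi-line text with spaces" rule.
        blocks.append({"bbox_2d": [x1, y1, x2, y2], "text_content": " ".join(text.split())})

    return blocks
```

For example, `parse_text_blocks('[{"bbox_2d": [10, 20, 110, 60], "text_content": "Hello\\nworld"}]')` yields a single block whose text is `"Hello world"`. Raising `ValueError` on the first malformed item keeps the sketch simple. A real pipeline might prefer to skip or log bad items instead.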