How to extract image · jsvine/pdfplumber · Discussion #496

Hi @samkit-jain, thanks for the prompt reply and help. However, `.images` gives a list of dictionary objects with details of each image. I have attached a sample below. I want to save these images and run OCR on them. Please help me with this if you can. I know one method, cropping the image out of the page, but I am looking for a better solution. Thanks again for your help.

{'x0': Decimal('438.420'), 'y0': Decimal('104.640'), 'x1': Decimal('776.580'), 'y1': Decimal('507.360'), 'width': Decimal('338.160'), 'height': Decimal('402.720'), 'name': 'Im0', 'stream': <PDFStream(398): raw=48424, {'Subtype': /'Image', 'Length': 48423, 'Filter': /'DCTDecode', 'BitsPerComponent': 8, 'ColorSpace': PDFObjRef:378, 'Width': 500, 'Height': 595, 'Interpolate': True, 'Type': /'XObject'}>, 'srcsize': (Decimal('500'), Decimal('595')), 'imagemask': None, 'bits': 8, 'colorspace': [[/'ICCBased', <PDFStream(397): raw=2599, {'Length': 2598, 'Filter': /'FlateDecode', 'N': 3, 'Alternate': /'DeviceRGB'}>]], 'object_type': 'image', 'page_number': 1, 'top': Decimal('104.640'), 'bottom': Decimal('507.360'), 'doctop': Decimal('104.640')}
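
For reference, here is a minimal sketch of one way the image bytes might be saved directly instead of cropping. It rests on a few assumptions: that each `stream` entry is a pdfminer.six `PDFStream` whose `get_data()` returns the JPEG bytes unchanged when the filter is `DCTDecode` (as in the sample above), and that images with other filters can be skipped for now. The helper names `save_jpeg_images` and `extract_from_pdf` and the output file naming are hypothetical, not part of pdfplumber.

```python
from pathlib import Path


def save_jpeg_images(images, out_dir):
    """Write every JPEG-encoded entry of a `.images` list to out_dir.

    Assumption: image["stream"].get_data() yields the embedded bytes.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, image in enumerate(images):
        data = image["stream"].get_data()
        # JPEG data starts with the SOI marker FF D8. Anything else
        # (e.g. decoded FlateDecode pixel data) is skipped in this sketch.
        if not data.startswith(b"\xff\xd8"):
            continue
        # Hypothetical naming scheme: page number, index on the page, image name.
        target = out_dir / f"page{image['page_number']}_{index}_{image['name']}.jpg"
        target.write_bytes(data)
        saved.append(target)
    return saved


def extract_from_pdf(pdf_path, out_dir="images"):
    """Collect JPEG images from every page of a PDF (hypothetical helper)."""
    import pdfplumber  # imported lazily so save_jpeg_images has no dependency

    saved = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            saved.extend(save_jpeg_images(page.images, out_dir))
    return saved
```

Calling `extract_from_pdf("sample.pdf")` (the file name is a placeholder) would return the paths of the written `.jpg` files, which could then be handed to an OCR tool. Images stored with filters other than `DCTDecode` would need their pixel data rebuilt from `Width`, `Height`, `BitsPerComponent` and `ColorSpace`, which this sketch does not attempt.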