Unable to extract subset font name using the newer versions of PyMuPDF: 1.24.6 and 1.24.7
Description of the bug
I am trying to extract the subset font name from each span.
Although I enable subset font names with TOOLS.set_subset_fontnames(True), the output contains only the base font names, without the subset prefix:
Calibri
MinionPro
My expected output is as follows:
AGDFCT+Calibri
THUGIK+MinionPro
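For reference, the prefix I expect follows the usual PDF convention for subset fonts: a tag of six uppercase letters, then a plus sign, then the base font name. Below is a minimal sketch for checking whether a font name carries such a tag. Note that `split_subset_tag` is a hypothetical helper written for this report, not part of PyMuPDF.

```python
import re

# Subset tag convention: six uppercase letters followed by "+".
_SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None when no subset prefix is present.

    Hypothetical helper for illustration only, not a PyMuPDF function.
    """
    match = _SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


print(split_subset_tag("AGDFCT+Calibri"))  # expected form: tag present
print(split_subset_tag("Calibri"))         # form I actually get: no tag
```

Running this over the `span["font"]` values makes it easy to see that every name returned by 1.24.6 and 1.24.7 lacks the tag.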
How to reproduce the bug
The following code reproduces the bug:
import fitz
from fitz import TOOLS

# Ask PyMuPDF to keep the subset prefix (e.g. "AGDFCT+") in font names.
TOOLS.set_subset_fontnames(True)

doc = fitz.open("input.pdf")
for page in doc:
    for block in page.get_text("dict")["blocks"]:
        # Type 0 is a text block; image blocks have no "lines".
        if block["type"] == 0 and "lines" in block:
            for line in block["lines"]:
                for span in line["spans"]:
                    font_name = span["font"]
                    print(font_name)
doc.close()
PyMuPDF version
1.24.6
Operating system
Windows
Python version
3.8