bpo-31589: Add config for LaTeX handling of stray Unicode chars in PDF by jfbu · Pull Request #4069 · python/cpython

@JulienPalard unfortunately I can't describe a simple procedure that works every time. What I would do is use the utf8x option of inputenc to see what happens:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\begin{document}
№, ×, €
\end{document}

Then I run pdflatex on this document. It reports two errors: \textnumero undefined and \texteuro undefined. These are more explicit than what utf8 would give, which would only say something like Unicode char № (U+2116) not set up for use with LaTeX. I know the textcomp package provides additional symbols, so I try again with

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage{textcomp}
\begin{document}
№, ×, €
\end{document}

and it all works. Then I try my luck again with utf8

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\begin{document}
№, ×, €
\end{document}

and it works... Does adding \usepackage{textcomp} to your preamble solve your issue?

Edit: make sure to read the bottom of this comment before trying...

You can find the additional Unicode code points it defines in the file ts1enc.dfu (kpsewhich ts1enc.dfu returns /usr/local/texlive/2017/texmf-dist/tex/latex/base/ts1enc.dfu on my system). Perhaps Sphinx should do \usepackage{textcomp} by default.
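As a rough illustration of what such a .dfu file contributes, the mappings can also be written by hand with \DeclareUnicodeCharacter. This is only a sketch: it assumes the entries in ts1enc.dfu have this form, and with textcomp loaded the two declarations are redundant, so they are shown purely to make the mechanism visible.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}  % defines \textnumero and \texteuro
% Hand-written mappings, assumed to mirror entries in ts1enc.dfu
\DeclareUnicodeCharacter{2116}{\textnumero}  % U+2116 is the numero sign
\DeclareUnicodeCharacter{20AC}{\texteuro}    % U+20AC is the euro sign
\begin{document}
№, €
\end{document}
```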

This does not quite answer your question. Since utf8x (which works with the ucs package) has extensive support files, I sometimes dig into them to find out which font encoding and which font slot I should use in \newunicodechar. For example, suppose I am looking for ℂ, which is U+2102 (8450 in decimal).

$ kpsewhich ucs.sty
/usr/local/texlive/2017/texmf-dist/tex/latex/ucs/ucs.sty

$ pushd /usr/local/texlive/2017/texmf-dist/tex/latex/ucs
/usr/local/texlive/2017/texmf-dist/tex/latex/ucs ~/_texlatex/1711

$ grep -r 8450
data/uni-111.def:\uc@dclc{28450}{cjkbg5}{\u@cjk@Bgv1693}%
data/uni-111.def:\uc@dclc{28450}{cjkjis}{\jischar{3441}}%
data/uni-150.def:\uc@dclc{38450}{cjkbg5}{\u@cjk@Bgv05A7}%
data/uni-150.def:\uc@dclc{38450}{cjkgb}{\u@cjk@GB0933}%
data/uni-150.def:\uc@dclc{38450}{cjkjis}{\jischar{4B49}}%
data/uni-250.def:\uc@dclc{64071}{autogenerated}{\unichar{28450}}%
data/uni-250.def:\uc@dclc{64154}{autogenerated}{\unichar{28450}}%
data/uni-33.def:\uc@dclc{8450}{default}{\ensuremath{\mathbb C}}%

There are false positives, but I find the definition \ensuremath{\mathbb C}. Thus I can do

\newunicodechar{ℂ}{\ensuremath{\mathbb C}}

(the ams packages loaded by Sphinx provide \mathbb blackboard alphabet -- this is just an example).
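To make the ℂ example concrete, here is a minimal sketch of a complete document. It assumes pdflatex with the utf8 option and loads amssymb directly to get \mathbb (in a Sphinx build the ams packages are already loaded), so treat the preamble as an illustration rather than what Sphinx generates.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{amssymb}        % provides \mathbb (assumption: not already loaded)
\usepackage{newunicodechar}
% Map U+2102 to the definition found in the ucs data files
\newunicodechar{ℂ}{\ensuremath{\mathbb C}}
\begin{document}
The field ℂ, and in math mode $z \in ℂ$.
\end{document}
```

Because the definition is wrapped in \ensuremath, the character can be used both in running text and inside math mode.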

When I see that the utf8x definition would use some \text... macro, I try my luck with the textcomp package. Or I should have tried that first...

I understand the whole thing is a bit scary. And then we have an additional problem:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{times}% default for Sphinx
\begin{document}
№, ×, €
\end{document}

gives error:

./temp77.tex:8: Package textcomp Error: Symbol \textnumero not provided by
(textcomp)                font family ptm in TS1 encoding.
(textcomp)                Default family used instead.

which is ridiculous because this should only be a warning, not an error. The message says that it had to fall back to Computer Modern instead of the Times font. So to avoid that error we must do

\newunicodechar{№}{{\fontfamily{cmr}\selectfont\textnumero}}
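If several symbols hit the same textcomp error, the font switch can be factored into a small helper. The sketch below assumes the same preamble as above; \cmrsymbol is a hypothetical name introduced only for this illustration, not something provided by textcomp or Sphinx.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{times}
\usepackage{newunicodechar}
% \cmrsymbol is a hypothetical helper: typeset #1 in Computer Modern.
% The inner braces keep the font change local to the symbol.
\newcommand{\cmrsymbol}[1]{{\fontfamily{cmr}\selectfont #1}}
\newunicodechar{№}{\cmrsymbol{\textnumero}}
\begin{document}
№
\end{document}
```

The same helper can then be reused in further \newunicodechar lines for any other symbol that the Times family does not provide in TS1 encoding.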

Final MWE:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{times}
\usepackage{newunicodechar}
\newunicodechar{№}{{\fontfamily{cmr}\selectfont\textnumero}}
\begin{document}

№, ×, €
%\showoutput
\end{document}

...whow :-( ... well we did it...