bpo-31589: Add config for LaTeX handling of stray Unicode chars in PDF by jfbu · Pull Request #4069 · python/cpython
@JulienPalard unfortunately I can't describe a simple procedure that works every time. What I would do is pass the utf8x option to inputenc to see what happens:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\begin{document}
№, ×, €
\end{document}
Then I run pdflatex on this document. There are errors: \textnumero undefined, and \texteuro undefined. These errors are more explicit than those which would come from utf8, which would only say Unicode char № (U+2116) not set-up for LaTeX. I am aware that there is a package, textcomp, which provides additional symbols, so I try again with
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage{textcomp}
\begin{document}
№, ×, €
\end{document}
and it all works. Then I try my luck again with utf8
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\begin{document}
№, ×, €
\end{document}
and it works... Does adding \usepackage{textcomp} to your preamble solve your issues?
Edit: make sure to read the bottom of this comment before trying...
You can find the additional Unicode code points it defines in the file ts1enc.dfu (kpsewhich ts1enc.dfu returns /usr/local/texlive/2017/texmf-dist/tex/latex/base/ts1enc.dfu on my system). Perhaps Sphinx should do \usepackage{textcomp} by default.
This does not quite answer your question. As utf8x (which works with the ucs package) has extensive support files, I sometimes have to dig into them to find out which font encoding and which font slot I should use in \newunicodechar. For example, imagine I am looking for ℂ, which is U+2102.
- I convert 0x2102 to decimal 8450
- I move to the ucs directory in my TeX distribution and grep for 8450 there
$ kpsewhich ucs.sty
/usr/local/texlive/2017/texmf-dist/tex/latex/ucs/ucs.sty
$ pushd /usr/local/texlive/2017/texmf-dist/tex/latex/ucs
/usr/local/texlive/2017/texmf-dist/tex/latex/ucs ~/_texlatex/1711
$ grep -r 8450
data/uni-111.def:\uc@dclc{28450}{cjkbg5}{\u@cjk@Bgv1693}%
data/uni-111.def:\uc@dclc{28450}{cjkjis}{\jischar{3441}}%
data/uni-150.def:\uc@dclc{38450}{cjkbg5}{\u@cjk@Bgv05A7}%
data/uni-150.def:\uc@dclc{38450}{cjkgb}{\u@cjk@GB0933}%
data/uni-150.def:\uc@dclc{38450}{cjkjis}{\jischar{4B49}}%
data/uni-250.def:\uc@dclc{64071}{autogenerated}{\unichar{28450}}%
data/uni-250.def:\uc@dclc{64154}{autogenerated}{\unichar{28450}}%
data/uni-33.def:\uc@dclc{8450}{default}{\ensuremath{\mathbb C}}%
There are false positives (the matches on 28450 and 38450), but I find the definition \ensuremath{\mathbb C} on the last line. Thus I can do
\newunicodechar{ℂ}{\ensuremath{\mathbb C}}
(the ams packages loaded by Sphinx provide the \mathbb blackboard alphabet; this is just an example).
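As a sketch, here is a complete test file for that ℂ example. Loading amssymb to get \mathbb is my assumption for a standalone document (in Sphinx the ams packages are already loaded). The \number"2102 part is only there to let TeX itself do the hex to decimal conversion used for the grep.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{amssymb}% assumption: provides \mathbb in a standalone document
\usepackage{newunicodechar}
% Definition found in the ucs support files for code point 8450 (U+2102)
\newunicodechar{ℂ}{\ensuremath{\mathbb C}}
\begin{document}
% \number"2102 makes TeX read 2102 as hexadecimal and print it in decimal
U+2102 is decimal \number"2102, and it now typesets as ℂ.
\end{document}
```

Compile it with pdflatex; if the \newunicodechar line is removed, the "not set-up for LaTeX" error for U+2102 should come back.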
When I see that the utf8x definition would use some \text... macro, I try my luck with the textcomp package. Or I should have tried that first...
I understand the whole thing is a bit scary. And then we have an additional problem:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{times}% default for Sphinx
\begin{document}
№, ×, €
\end{document}
gives error:
./temp77.tex:8: Package textcomp Error: Symbol \textnumero not provided by
(textcomp) font family ptm in TS1 encoding.
(textcomp) Default family used instead.
which is ridiculous, because this should only be a warning, not an error. It says that it had to fall back to the default family (Computer Modern) instead of the Times font. This means that to avoid the error we must do
\newunicodechar{№}{{\fontfamily{cmr}\selectfont\textnumero}}
Final MWE:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{times}
\usepackage{newunicodechar}
\newunicodechar{№}{{\fontfamily{cmr}\selectfont\textnumero}}
\begin{document}
№, ×, €
%\showoutput
\end{document}
...whow :-(
... well we did it...
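If more symbols turn out to be missing from the Times TS1 font, the same trick could be wrapped in a small helper. This is only a sketch: \incmr is a hypothetical name of mine, not something provided by textcomp, newunicodechar or Sphinx, and it simply repeats the \fontfamily{cmr}\selectfont switch inside a group.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{times}
\usepackage{newunicodechar}
% Hypothetical helper: typeset #1 in Computer Modern, then restore the
% surrounding font (the inner braces keep the font switch local)
\newcommand{\incmr}[1]{{\fontfamily{cmr}\selectfont #1}}
\newunicodechar{№}{\incmr{\textnumero}}
\begin{document}
№, ×, €
\end{document}
```

Any other character that triggers the same textcomp error could then get its own \newunicodechar line using \incmr, at the price of a glyph that does not match the Times text around it.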