CX2: Should not transform named HTML entities into numeric HTML entities
CX2 should not transform named entities like &ndash; into their numeric equivalent (&#45;), because numeric entities are a lot less understandable.
Example on frwiki "Félicie de Hauteville" (published translation compared to the original):
(c. 1078 - c. 1102) compared to (c. 1078 – c. 1102)
(c. 1070 - 3 février 1116) compared to (c. 1070 – 3 February 1116)
(avant 1101 - ?) compared to (before 1101 – ?)
(1101 - 1er mars 1131) compared to (1101 – 1 March 1131)
Event Timeline
We may need to check if this is a side-effect of the content sanitization done for content coming from MT services (or if it is something that can be fixed as part of it). May be related to T213257.
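To illustrate how a sanitization step could produce this kind of side effect, here is a minimal sketch. It is an assumption for illustration only, not the actual CX or MT sanitization code: the function name `roundtrip_sanitize` is hypothetical, and the real pipeline may work differently. The sketch decodes entities to characters and then re-escapes non-ASCII characters, which loses the named form.

```python
import html


def roundtrip_sanitize(fragment: str) -> str:
    """Hypothetical sanitizer: decode entities, then re-escape non-ASCII."""
    # html.unescape turns "&ndash;" into the en dash character (U+2013).
    decoded = html.unescape(fragment)
    # xmlcharrefreplace writes each non-ASCII character as a decimal
    # numeric reference, so the original named entity is not restored.
    return decoded.encode("ascii", "xmlcharrefreplace").decode("ascii")


result = roundtrip_sanitize("(c. 1078 &ndash; c. 1102)")
print(result)  # (c. 1078 &#8211; c. 1102)
```

A round trip like this still keeps the en dash as a character. It would not by itself explain an en dash turning into a plain hyphen.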
@santhosh - there are several issues that I'd like you to review:
(1) I was checking the fix in cx2-testing on the same article translation: en:Felicia of Sicily -> fr:Félicie de Hauteville. The only MT option in cx2-testing for en-fr translation is Yandex which does not work, so the translation fell back to 'Copy original content'. I used this option for translating problematic content, e.g. (c. 1078 – c. 1102). The result: (c. 1078 – c. 1102) was translated with 'Copy original content' to (c. 1078 - c. 1102) upon publishing.
(2) Translating en -> es with Apertium MT (the text is replaced with 'Lorem ipsum' in the sample below to avoid the "too much unmodified text" warning), I still see – -> –:
'''Id magna nec elit laoreet commodo quis eget nisi''' (c. 1078 @– c. 1102) es un nombre que está utilizado para una.
- [[Sofía de Hungría|Sophia]] (antes de que 1101 @– ?), mujer de un húngaro noble
- King [[Esteban II de Hungría|Stephen II de Hungría]] (1101 @– 1 Marcha 1131)
- Ladislaus (?)
(3) Regarding the statement "CX2 should not transform things like &ndash; into their numeric equivalent &#45;":
The numeric equivalent for &ndash; is not &#45; (http://www.howtocreate.co.uk/sidehtmlentity.html):
Dec | Hex | Entity |
---|---|---|
&#8211; | &#x2013; | &ndash; |
There is no named entity equivalent for &#45;.
(4) &ndash; represents the en dash, which is used for ranges and is correctly used in the article for (c. 1078 – c. 1102). &#45; represents the hyphen, which is incorrect for (c. 1078 - c. 1102) according to English grammar rules.
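The equivalences in point (3) can be checked with a short script. This is a sketch using Python's standard `html` module, not anything taken from CX itself. Note that `codepoint2name` covers only the HTML 4.01 named entities.

```python
import html
from html.entities import codepoint2name

EN_DASH = "\u2013"
HYPHEN_MINUS = "-"

# All three spellings of the en dash decode to the same character.
for ref in ("&ndash;", "&#8211;", "&#x2013;"):
    assert html.unescape(ref) == EN_DASH

# &#45; decodes to the hyphen-minus, which is a different character.
assert html.unescape("&#45;") == HYPHEN_MINUS
assert HYPHEN_MINUS != EN_DASH

print(ord(EN_DASH), hex(ord(EN_DASH)))        # 8211 0x2013
print(codepoint2name.get(ord(EN_DASH)))       # ndash
print(codepoint2name.get(ord(HYPHEN_MINUS)))  # None
```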