Garrett Wollman reported privately that XEmacs 21.4.22, the current
stable version, doesn't work with the UTF-8 recently introduced
into our commentary. For example, the UTF-8 character '−'
(MINUS SIGN), which is stored as the three bytes "\342\210\222",
displays as 'â\210\222'. For proper names this is annoying but
tolerable, as there's little loss in utility from (say) 'Racoviță'
to its display form 'RacoviÈ\233Ä\203'. But for symbols this is a
real pain that can make it hard to understand the documentation, e.g.,
'Release 2014e – 2014-06-12 21:53:52 −0700' displays as
'Release 2014e â\200\223 2014-06-12 21:53:52 â\210\2220700'.
To work around this problem, make the following substitutions in
commentary to mostly revert these symbols to their pre-UTF-8 versions:
'§' -> 'section', '°' -> 'degrees', '±' -> '+-', '–' -> '-' (en
dash), '—' -> '--' (em dash), '′' -> "'", '″' -> '"', '→' -> '->',
'−' -> '-' (minus sign), '≤' -> '<='. Leave proper names and
foreign words in UTF-8.
Most of this patch is to commentary, to distinguish better among
the similar characters hyphen '-', minus sign '−', en dash '–' (used
for ranges), and em dash '—'. Render ordinary dashes in text as
spaced en dashes, as this gives better-looking results in monospaced
fonts.
* yearistype.sh, zic.c: Reword diagnostics so that they do not use
hyphens as if they were dashes. Since we are sticking to ASCII in
programs, we can't use proper dashes in diagnostics.
In several cases our text files had commentary like this:
# Egypt to cancel daylight saving time
# <a href="http://www.almasryalyoum.com/en/node/407168">
# http://www.almasryalyoum.com/en/node/407168
# </a>
where lines 2 and 4 let the file be viewed by a browser with the
URL being clickable. In practice this approach does not work with
many modern browsers (I tested with Firefox and with Chrome) and
it clutters up the data, so in cases like the above this patch
removes lines 2 and 4.
(Problem reported by Steffen "Daode" Nurpmeso.)