* zdump.c, zic.c: Don't include <ctype.h>, because the behavior of
its macros is locale-dependent if HAVE_GETTEXT, and we want the C
locale's semantics. Instead, use portable replacements for
ctype.h operations, with the desired semantics. All uses of
isalpha etc. replaced by calls to new functions is_alpha etc. or
by inline code.
(isascii): Remove.
(is_alpha): New function.
* zic.c (doabbr): Simplify by using is_alpha.
(is_space): New function.
* NEWS: Document the above.
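As an illustration of the ctype.h replacements described above, here is a
minimal sketch of locale-independent classifiers. The names is_alpha and
is_space come from this entry; the bodies are an assumption about one
reasonable way to get C-locale semantics, not a copy of the code in zic.c.

```c
#include <stdbool.h>

/* Sketch only: return true if C is a letter in the C locale.
   The letters are listed one by one, so the result depends neither on
   the current locale nor on letters being contiguous in the execution
   character set.  */
static bool
is_alpha(char c)
{
  switch (c) {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
    case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
    case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
    case 'V': case 'W': case 'X': case 'Y': case 'Z':
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
    case 'v': case 'w': case 'x': case 'y': case 'z':
      return true;
    default:
      return false;
  }
}

/* Sketch only: return true if C is white space in the C locale.  */
static bool
is_space(char c)
{
  switch (c) {
    case ' ': case '\f': case '\n': case '\r': case '\t': case '\v':
      return true;
    default:
      return false;
  }
}
```

Because the argument is a plain char rather than an int, callers need no
cast to unsigned char, which is a common source of bugs with the ctype.h
macros.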
* zic.c (componentcheck, namecheck): Warn about all '.'s in
the file name, not merely about "." and ".." file name components.
* zic.8 (DESCRIPTION), NEWS: Document this.
For *.txt files, UTF-8 is not that important, and avoiding it
is better for users with older or misconfigured manpage readers.
* Makefile (check_character_set): Check that *.txt files
contain only safe characters.
($(MANTXTS)): Build *.txt files in the C locale, not in en_US.utf8.
Also, document the longstanding exceptions to POSIX.
Derived from Arthur David Olson's patch in:
http://mm.icann.org/pipermail/tz/2014-June/021052.html
* Theory: Document the exceptional file name components in 'etcetera'
and 'backward' that contain '+' or digits, or are too long.
* zic.8 (DESCRIPTION): Document the change to zic.
* zic.c (component_length_check, namecheck): New functions.
(dolink, writezone): Use them to check that file name components
obey the POSIX rules, and do not contain "." or "..".
* NEWS: Document the above.
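The component checks above can be sketched roughly as follows. This log
does not spell out the exact rules that zic enforces, so the sketch
assumes the POSIX portable filename character set, a 14-byte component
limit (the POSIX minimum for NAME_MAX), and no leading '-'. The names
component_ok and name_ok are hypothetical and are not the functions
added to zic.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit: the smallest NAME_MAX that POSIX allows.  */
enum { COMPONENT_LEN_MAX = 14 };

/* Assumed character set: the POSIX portable filename character set.  */
static const char portable_chars[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  "abcdefghijklmnopqrstuvwxyz"
  "0123456789._-";

/* Return true if the LEN bytes at START form an acceptable component.  */
static bool
component_ok(const char *start, size_t len)
{
  size_t i;
  if (len == 0 || len > COMPONENT_LEN_MAX)
    return false;
  if (start[0] == '-')
    return false;
  /* Reject the special components "." and "..".  */
  if (start[0] == '.' && (len == 1 || (len == 2 && start[1] == '.')))
    return false;
  for (i = 0; i < len; i++)
    if (start[i] == '\0' || !strchr(portable_chars, start[i]))
      return false;
  return true;
}

/* Return true if every '/'-separated component of NAME is acceptable.  */
static bool
name_ok(const char *name)
{
  const char *component = name;
  for (;;) {
    size_t len = strcspn(component, "/");
    if (!component_ok(component, len))
      return false;
    if (component[len] == '\0')
      return true;
    component += len + 1;
  }
}
```

Under these assumptions name_ok accepts "America/New_York" and rejects
"Etc/../x" as well as "Etc/GMT+1", the latter because '+' is outside the
assumed character set.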
* Makefile (NONSYM_CHAR, NONSYM_LINE): New macros.
(SAFE_SHARP_LINE): Don't let comments contain troublesome symbols.
(check_character_set): Don't let troublesome symbols appear in
source files, and don't let non-ASCII appear in .html files.
Change attributions for contributors' names using the Latin alphabet
only where it was possible to reconstruct from mailing list records.
Leave attributions for Андрей Чернов, Иофис Герман Романович, and
李耀宗 with their romanized versions instead (Andrey A. Chernov, German
Iofis, and Lee Yiu Chung, respectively).
(Thanks to Garrett Wollman for reporting the problem.)
* NEWS, Theory, northamerica: Add "coding: utf-8" directives so
that GNU Emacs, when run in a unibyte locale, does not mistakenly
interpret these files to be unibyte. This change should make no
difference if you use Emacs to read these files in a UTF-8 locale.
I didn't bother to add coding directives to files where GNU Emacs
already deduces the encoding correctly in a Latin-1 locale.
Garrett Wollman reported privately that XEmacs 21.4.22, the current
stable version, doesn't work with the UTF-8 recently introduced
into our commentary. For example, the UTF-8 character '−'
(MINUS SIGN), which is stored as the three bytes "\342\210\222",
displays as 'â\210\222'. For proper names this is annoying but
tolerable, as there's little loss in utility from (say) 'Racoviță'
to its display form 'RacoviÈ\233Ä\203'. But for symbols this is a
real pain that can make it hard to understand the documentation, e.g.,
'Release 2014e – 2014-06-12 21:53:52 −0700' displays as
'Release 2014e â\200\223 2014-06-12 21:53:52 â\210\2220700'.
To work around this problem, make the following substitutions in
commentary to mostly revert these symbols to their pre-UTF-8 versions:
'§' -> 'section', '°' -> 'degrees', '±' -> '+-', '–' -> '-' (en
dash), '—' -> '--' (em dash), '′' -> "'", '″' -> '"', '→' -> '->',
'−' -> '-' (minus sign), '≤' -> '<='. Leave proper names and
foreign words in UTF-8.
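The byte sequences quoted above follow directly from the UTF-8 encoding
rules. The sketch below uses a hypothetical helper, utf8_encode, that is
not part of the tz code, to show how U+2212 (MINUS SIGN) becomes
"\342\210\222" and U+2013 (EN DASH) becomes "\342\200\223". It assumes
its argument is a valid Unicode scalar value.

```c
#include <stddef.h>

/* Sketch only: encode code point CP as UTF-8 into BUF and return the
   number of bytes written.  CP is assumed to be a valid scalar value
   (at most 0x10FFFF and not a surrogate); no validation is done.  */
static size_t
utf8_encode(unsigned long cp, unsigned char buf[4])
{
  if (cp < 0x80) {
    buf[0] = (unsigned char) cp;
    return 1;
  }
  if (cp < 0x800) {
    buf[0] = (unsigned char) (0xC0 | (cp >> 6));
    buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    buf[0] = (unsigned char) (0xE0 | (cp >> 12));
    buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  }
  buf[0] = (unsigned char) (0xF0 | (cp >> 18));
  buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}
```

For U+2212 the three bytes are 0xE2, 0x88 and 0x92, i.e., octal 342, 210
and 222. A display that treats each byte as Latin-1 renders 0xE2 as 'â'
and shows the other two bytes as octal escapes, which matches the
garbled output reported above.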
These changes pacify gcc 4.8.2-19ubuntu1 on Ubuntu 14.04,
and take advantage of recently-added GCC options when compiling
with GCC_DEBUG_FLAGS.
* Makefile (GCC_DEBUG_FLAGS): Add -Wdeclaration-after-statement,
-Wjump-misses-init, -Wsuggest-attribute=format.
* date.c (copyright, sccsid):
* strftime.c (sccsid):
Remove unused vars.
* date.c (main):
* localtime.c (tzparse):
Remove no-longer-needed uses of INITIALIZE; GCC is smart enough to
figure these out on its own now.
* localtime.c (gmtsub): Redo initialization of gmt_is_set to pacify GCC.
Retry malloc next time even if it failed this time.
* private.h, zdump.c (GNUC_or_lint): Remove, as it provoked a GCC
diagnostic about unused macros in some cases. Instead, just use
'lint'. All uses removed.
(TZ_DOMAIN): Don't define unless needed, as otherwise it provokes
a GCC diagnostic about unused macros.
* private.h (INITIALIZE): Simplify accordingly.
* NEWS: Document this.
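The gmtsub change above ("Retry malloc next time even if it failed this
time") amounts to setting the gmt_is_set flag only after allocation
succeeds. Here is a minimal sketch of that pattern. The names gmtptr and
gmt_is_set appear in localtime.c, but struct state's contents, the
ensure_gmt function, and the injectable allocator are hypothetical and
exist only to make the sketch self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real time zone state.  */
struct state { int typecnt; };

/* Allocator type, injected so that failure can be simulated.  */
typedef void *(*alloc_fn)(size_t);

static struct state *gmtptr;
static bool gmt_is_set;

/* Sketch only: set up the GMT state on first use.  The flag is set
   only when allocation succeeds, so a failed attempt is retried on
   the next call instead of being remembered as "done".  */
static bool
ensure_gmt(alloc_fn alloc)
{
  if (!gmt_is_set) {
    gmtptr = alloc(sizeof *gmtptr);
    if (gmtptr != NULL) {
      gmtptr->typecnt = 0;
      gmt_is_set = true;
    }
  }
  return gmt_is_set;
}

/* Stand-in allocator that always fails, used to exercise the retry.  */
static void *
failing_alloc(size_t size)
{
  (void) size;
  return NULL;
}
```

Calling ensure_gmt(failing_alloc) returns false and leaves the flag
clear; a later ensure_gmt(malloc) then succeeds, and subsequent calls
do not allocate again.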
Most of this patch is to commentary, to distinguish better among
the similar characters hyphen '-', minus sign '−', en dash '–' (used
for ranges), and em dash '—'. Render ordinary dashes in text as
spaced en dashes, as this gives better-looking results in monospaced
fonts.
* yearistype.sh, zic.c: Reword diagnostics so that they do not use
hyphens as if they were dashes. Since we are sticking to ASCII in
programs, we can't use proper dashes in diagnostics.
In several cases our text files had commentary like this:
# Egypt to cancel daylight saving time
# <a href="http://www.almasryalyoum.com/en/node/407168">
# http://www.almasryalyoum.com/en/node/407168
# </a>
where lines 2 and 4 let the file be viewed by a browser with the
URL being clickable. In practice this approach does not work with
many modern browsers (I tested with Firefox and with Chrome) and
it clutters up the data, so in cases like the above this patch
removes lines 2 and 4.
(Problem reported by Steffen "Daode" Nurpmeso.)
This was appropriate before Unicode, where "`" and "'" often balanced
in visual output, but that's been obsolete for a while.
Overall, single-quote 'like this' instead of `like this', and
double-quote "like this" or (in typeset output) “like this”
instead of ``like this''. While we're at it, fix some glitches
with hyphens versus minus versus en dash versus em dash.
* Makefile ($(MANTXTS)): Use UTF-8 locale to produce *.txt output.
* checktab.awk, tzselect.ksh:
Quote 'like this' in diagnostics, instead of `like this'.
* date.1, newctime.3, newtzset.3, time2posix.3, zic.8 (q, lq, rq):
New macros. Use them for better double-quoting in output.
* workman.sh: Tell Perl that its stdin and stdout are UTF-8.
* NEWS: Document this.
* localtime.c (time1): Shrink size of 'types' local back down
from TZ_MAX_TIMES to TZ_MAX_TYPES. The smaller value suffices,
since nseen can grow to at most TZ_MAX_TYPES. (Thanks to
Arthur David Olson.)
* NEWS: Remove mention of non-fix.
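To see why TZ_MAX_TYPES entries suffice for 'types', consider a sketch
of the bookkeeping involved. Each type index is recorded at most once,
so nseen cannot exceed the number of distinct types regardless of how
many transition times there are. The function name collect_types and
the constant MAX_TYPES are hypothetical; the value 256 is an assumption
chosen so that every unsigned char is a valid index.

```c
#include <stdbool.h>

/* Assumed bound on distinct types, standing in for TZ_MAX_TYPES.  */
enum { MAX_TYPES = 256 };

/* Sketch only: scan the per-transition type indexes from last to
   first and store each distinct index once in OUT.  Return the number
   of distinct indexes seen, which is at most MAX_TYPES.  */
static int
collect_types(const unsigned char *time_types, int timecnt,
              unsigned char out[MAX_TYPES])
{
  bool seen[MAX_TYPES] = { false };
  int nseen = 0;
  int i;
  for (i = timecnt - 1; i >= 0; i--)
    if (!seen[time_types[i]]) {
      seen[time_types[i]] = true;
      out[nseen++] = time_types[i];
    }
  return nseen;
}
```

With transition types {0, 1, 0, 2, 1}, the scan records 1, then 2, then
0, and returns 3 even though there are five transitions.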
* localtime.c (tzload) [ALL_STATE]:
Redo to keep the stack frame tiny.
(time1): Shrink sizes of local arrays so that this stack frame
doesn't take much more than about TZ_MAX_TIMES bytes total.
* Makefile (DOCS): Move $(WEB_PAGES) here ...
(MISC): ... from here, so that they can contain UTF-8.
* tz-art.htm, tz-link.htm: Switch to UTF-8 encoding.
Prefer UTF-8 to HTML entities, e.g., prefer '–' to '&ndash;'.
* Makefile (INVALID_CHAR1): Rename to SAFE_CHARSET1.
(INVALID_CHAR2): Rename to SAFE_CHARSET2.
(INVALID_CHAR3): Rename to SAFE_CHARSET3.
(INVALID_CHAR): Remove.
(SAFE_CHARSET, SAFE_CHAR, SAFE_LINE, SAFE_SHARP_LINE, VALID_LINE):
New macros.
(check_character_set): Allow any UTF-8 character in some contexts,
including comments in the data, and in some auxiliary commentary files.
(Thanks to Milamber Space Network.)
* africa (Morocco): 2014 transitions are Jun 28 03:00 and Aug 2 02:00.
Guess a similar pattern for future years. Fix white space to be
more consistent.
* NEWS: Document this.
* africa (Egypt): 2014 transitions are June 26 and July 31 at 24:00.
(Thanks to Imed Chihi.) Guess a similar pattern for future years,
namely, DST is not observed in Ramadan, nor in the days following the
last Thursday before Ramadan, nor in the days preceding the first
Friday after Ramadan.
* NEWS: Document this.