Entries for characters which have “IGNORE” on all 4 levels like:
<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
are changed into:
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like this so it is good to use the code point as a tie-break.
* localedata/locales/iso14651_t1_common: Use the code point of a
character in the fourth collation level instead of IGNORE for all
entries which have IGNORE on all 4 levels.
* localedata/locales/iso14651_t1_common: Add some convenient collation
symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
rules similar to those in CLDR.
* localedata/locales/iso14651_t1_common: The new version of this
file downloaded from ISO contained several syntax errors which
are fixed by this patch.
[BZ #14095] - Review / update collation data from Unicode / ISO 14651
File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt
Updating this file alone is not enough, there are problems in the new
file which need to be fixed and the collation rules for many locales
need to be adapted. This is done by the following patches.
This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq
[BZ #14095]
* localedata/locales/iso14651_t1_common: Update file to
latest version from ISO (ISO14651_2016_TABLE1_en.txt).
LC_TIME in these 4 locales is identical, using “copy "es_BO"” makes
that more obvious.
[BZ #22646]
* localedata/locales/es_CL (LC_TIME): copy "es_BO".
* localedata/locales/es_CU (LC_TIME): copy "es_BO".
* localedata/locales/es_EC (LC_TIME): copy "es_BO".
[BZ #10871]
* localedata/locales/ru_RU (mon): Rename to...
(alt_mon): This.
(abmon): Rename to...
(ab_alt_mon): This.
(mon): Import from CLDR (genitive case).
(abmon): Copy from the old content except the 5th month which is
now in the genitive case, even when abbreviated.
* localedata/locales/ru_UA: Likewise.
* time/tst-strptime.c (day_tests): Add an actual example of
a difference between %b and %Ob in Russian.
Primary month names are in a genitive case now, alternative month names
are in a nominative case.
The alternative digits hack is no longer needed and has been removed.
[BZ #10871]
* localedata/locales/uk_UA (mon): Renamed to...
(alt_mon): This.
(alt_digits): "0" removed and then renamed to...
(mon): This.
(date_fmt): Definition changed not to use the alternative
digits hack.
[BZ #10871]
* localedata/locales/pl_PL: Alternative month names added,
primary month names are genitive now.
* time/tst-strptime.c (day_tests): Actually use a genitive case
of a month name in Polish language.
Reported-by: Robert Pluim <rpluim@gmail.com>
* localedata/locales/gu_IN (LC_IDENTIFICATION): Fix an obvious typo
in date: "2004-14-09" should be "2004-09-14".
* localedata/locales/lo_LA: Fix an obvious typo in date in the header:
"2003-15-09" should be "2003-09-15".
* localedata/locales/bho_NP (LC_IDENTIFICATION): Fix an obvious typo
in date: "2017-24-07" should be "2017-07-24".
* localedata/locales/mai_IN: Likewise.
* localedata/locales/mai_NP: Likewise.
The current date format prefixes one-digit days with a space, resulting
in ugly two spaces:
$ LC_ALL=hu_HU.UTF-8 date
2018. jan. 1., hétfő, 21:25:35 CET
^^
The official orthography rules doesn't contain an explicit rule about
this (which already gives no sane reason for double space), and an
implicit example of "1848. március 9." under bullet point 296 at
http://helyesiras.mta.hu/helyesiras/default/akh12 contains a single
space only. It's sure not convincing on an HTML page, but I confirm
that the official book edition (e.g.
https://www.libri.hu/en/konyv/a-magyar-helyesiras-szabalyai-32.html)
also contains a single space there.
[BZ #22657]
* localedata/locales/hu_HU (d_t_fmt): Avoid a leading space
before the day number which may produce a double space.
(date_fmt): Likewise.
[BZ #22524]
* localedata/Makefile: Add lt_LT.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/lt_LT.UTF-8.in: New file for testing the collation.
* localedata/locales/lt_LT (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22515]
* localedata/Makefile: Add hsb_DE.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/hsb_DE.UTF-8.in: New file for testing the collation.
* localedata/locales/hsb_DE (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22517]
* localedata/Makefile: Add et_EE.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/et_EE.UTF-8.in: New file for testing the collation.
* localedata/locales/et_EE (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22527]
* localedata/locales/tr_TR (LC_COLLATE): Base collation rules
on iso14651_t1. A test file localedata/tr_TR.UTF-8.in is already
available, this rewrite of the collation rules does reproduce
the test file in the same order.
[BZ #10580]
* localedata/locales/hr_HR (LC_TIME): Use two letters for the
digraphs in the month and day names. Using single code points for
digraphs is deprecated. While there are dedicated Unicode
codepoints, for the digraphs, these are included for backwards
compatibility and modern texts use a sequence of Basic Latin
characters. See: https://www.unicode.org/faq/ligature_digraph.html
This makes the month and day names agree exactly with CLDR now,
CLDR does not use the single code points for the digraphs either.
According to CLDR, collation rules for Serbian and Bosnian
should be the same as for Croatian.
[BZ #22534]
* localedata/Makefile: Add sr_RS.UTF-8 and bs_BA.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/bs_BA.UTF-8.in: New file (same as hr_HR.UTF-8.in).
* localedata/sr_RS.UTF-8.in: New file (same as hr_HR.UTF-8.in).
* localedata/locales/bs_BA (LC_COLLATE): Use “copy "hr_HR"”.
* localedata/locales/sr_RS (LC_COLLATE): Use “copy "hr_HR"”.
[BZ #10580]
* localedata/locales/hr_HR (LC_COLLATE): Base collation rules on
iso14651_t1.
* localedata/locales/hr_HR (LC_TIME): Sync month and day names with
CLDR (except use ligatures for the digraphs, CLDR does not use
the ligatures), add first_workday, some fixes in the date and time
formats.
* localedata/locales/hr_HR (LC_CTYPE): Add transliteration rules
for Đ and đ.
* localedata/locales/hr_HR (LC_MONETARY): Change currency_symbol to
lower case. p_cs_precedes and n_cs_precedes should be 0 instead of 1.
Add int_p_cs_precedes and int_n_cs_precedes.
* localedata/locales/hr_HR (LC_NUMERIC): Change thousands_sep to
"<U202F>" (NARROW NO-BREAK SPACE) and grouping to 3;3 (Agrees with
LC_MONETARY now).
* localedata/locales/hr_HR (LC_TELEPHONE): Add tel_dom_fmt.
* localedata/locales/hr_HR (LC_NAME): Add name_mr, name_mrs, and
name_miss.
* localedata/locales/hr_HR (LC_ADDRESS): Add country_post, country_isbn,
and lang_lib. Change postal_fmt.
change
[BZ #17750]
* Makefile: add fr_CA.UTF-8 to test-input and LOCALES.
* localedata/fr_CA.UTF-8.in: New file with test data for backward
accents sorting.
* localedata/fr_FR.UTF-8.in: Fix test data for forward accents
sorting.
* localedata/locales/cs_CZ (LC_COLLATE): Remove “define DIACRIT_FORWARD”
* localedata/locales/de_DE (LC_COLLATE): Likewise.
* localedata/locales/hu_HU (LC_COLLATE): Likewise.
* localedata/locales/lb_LU (LC_COLLATE): Likewise.
* localedata/locales/yuw_PG (LC_COLLATE): Likewise.
* localedata/locales/fr_CA (LC_COLLATE): Add “define DIACRIT_BACKWARD”
* localedata/locales/iso14651_t1_common: Use “ifdef DIACRIT_FORWARD”
instead of “ifdef DIACRIT_BACKWARD”.
The only locale which currently needs backward accents sorting is fr_CA.
Therefore, forward accents sorting should be the default.
Before this patch, backwards accent sorting was the default and all
locales except fr_CA had to use
define DIACRIT_FORWARD
before
copy "iso14651_t1"
Most locales didn’t do that and thus got the inappropriate backwards accents sorting
by accident. Now only the fr_CA locale needs to use
define DIACRIT_BACKWARD
before
copy "iso14651_t1"
Original patch slightly modified by: Mike FABIAN <mfabian@redhat.com>
[BZ #22336]
* localedata/locales/cs_CZ (LC_COLLATE): Use “copy "iso14651_t1"”
and implement the collation rules for cs from CLDR on top of that.
* Makefile: Add cs_CZ.UTF-8 to test-input and to the list
of locales to be built for testing.
* cs_CZ.UTF-8.in: New file with test data to test the Czech sorting.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #22469]
* localedata/locales/pl_PL (LC_COLLATE): Use “copy "iso14651_t1"”
and implement the collation rules for pl from CLDR on top of that.
* Makefile: Add pl_PL.UTF-8 to test-input and to the list
of locales to be built for testing.
* pl_PL.UTF-8.in: New file with test data to test the Polish sorting.
[BZ #15537]
* localedata/locales/lv_LV (LC_COLLATE): Fix collation by
using “copy "iso14651_t1"” and then implementing the
collation rules for lv from CLDR on top of that.
* Makefile: Add lv_LV.UTF-8 to test-input and to the list
of locales to be built for testing.
* lv_LV.UTF-8.in: New file with test data to test the Latvian
sorting.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Following the previous work by Carlos O'Donell the category of LC_CTYPE
is correctly set to "i18n:2012" rather than "unicode:2014" and the
i18n_ctype file is once again regenerated from scratch to make sure it
does not contain any manual additions except the copyright message.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* localedata/unicode-gen/gen_unicode_ctype.py (output_head):
category of LC_CTYPE set to "i18n:2012".
* localedata/locales/i18n_ctype: Regenerate.
[BZ #19485]
* localedata/locales/csb_PL (LC_TIME): Fix “abmon” for March
and use a better translation for March in “mon”.
* localedata/locales/csb_PL: Use more ASCII to improve the
readability of the source.
[BZ #13953]
* localedata/locales/km_KH: Use ASCII as much
as possible for better readability of the source and
remove useless comments.
* localedata/locales/km_KH (LC_TIME): Remove era stuff, it
was commented out and apparently wrong anyway because it was
using Lao characters. If Buddhist era should be used
for km_KH, a native speaker should write the correct formaat
for Khmer.
* localedata/locales/km_KH (LC_TIME): Add first_weekday 1
(According to CLDR, the first weekday for Cambodia is Sunday).
* localedata/locales/km_KH (LC_NAME): Remove name_mr and name_mrs
(These were using Lao characters which must be wrong. If we get
the correct data from a native speaker, we could add it back, until
then it is better not to have name_mr and name_mrs at all than
having it wrong).
[BZ #15260]
* localedata/locales/doi_IN (LC_MESSAGES): Match only for the
first letters of yesstr and nostr in yesexpr and noexpr,
not for the full words.
* localedata/locales/hne_IN (LC_MESSAGES): Likewise.
* localedata/locales/kok_IN (LC_MESSAGES): Likewise.
* localedata/locales/mr_IN (LC_MESSAGES): Likewise.
* localedata/locales/sat_IN (LC_MESSAGES): Likewise.
* localedata/locales/km_KH (LC_MESSAGES): Match also for the
first letters of yesstr and nostr in yesexpr and noexpr,
until now only English was matched in yesexpr and noexpr.
* localedata/locales/tl_PH (LC_MESSAGES): Use “copy "fil_PH"”
instead of “copy "en_US"”. CLDR has yesstr and nostr data for
fil but not for tl. As tl and fil are very similar, using fil
is probably better than using English.
Pablo was l10n/i18n coordinator back in the old days but MandrakeSoft is
dead now
* localedata/locales/br_FR (LC_IDENTIFICATON): Add
Thierry Vignaud <thierry.vignaud@gmail.com> as the contact
for the br_FR locale.
"Ket" is the the most used negative answer, as it's the negative answer
to a positively phrased question
It's used as it or with the verb ("Ne ran ket", ...)
As such, "Ket" is used in most translations.
"Nann" is less used as it's the negative answer to a negatively phrased
question
See https://en.wikipedia.org/wiki/Yes_and_no for explanations about
languages with 3 or 4 form systems.
We still keep "Nn" for short answers as:
- new learners are used to "Non" in french
- and they often misuses "Nann"
- for compatibility with english
[BZ #21706]
* localedata/locales/br_FR (LC_MESSAGES): Fix nostr.
After the transition to generating a distinct file for Unicode ctype
information e.g. i18n_ctype, the check target was left with the wrong
target name. This patch fixes the check target and regenerates the
files with more information than previously used, filling in the the
LC_IDENTIFICATION data.
Tested on x86_64 by regenerating from Unicode source files, and
running checks. Tested by subsequently rebuilding all locales.
No regressions in testsuite.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Reported-by: Rafal Luzynski <digitalfreak@lingonborough.com>