Revision history for Perl module Unicode::Collate. 1.31 Sat Aug 21 20:02:21 2021 - Someone, not via rt.cpan.org, proposed patches for EBCDIC. Thanks! - Here is an experimental EBCDIC support, that the author has not used yet. - Since this distribution on CPAN has a pure-perl module without XSUB, then the internal values should be Unicode, but not native. - All t/*.t files include sub _pack_U and _unpack_U as well as sub ok. 1.30 Sun Jun 6 21:33:26 2021 - [rt.cpan.org #133952] mkheader subject to "print on closed filehandle" warnings 1.29 Sun Sep 27 08:56:06 2020 - DUCET is updated (for Unicode 13.0.0) as Collate/allkeys.txt. - The default UCA_Version is 43. - added khitan.t in t. - Locale/*.pl and CJK/Korean.pm are updated. 1.28 Tue Sep 22 22:14:48 2020 - DUCET is updated (for Unicode 12.1.0) as Collate/allkeys.txt. - The default UCA_Version is 41. - UCA_Version 38 & 40 (for Unicode 11.0.0 & 12.0.0) are also supported. - [rt.cpan.org #133311] Make Makefile.PL strict - Locale/*.pl and CJK/Korean.pm are updated. 1.27 Wed Jan 2 19:42:56 2019 - DUCET is updated (for Unicode 10.0.0) as Collate/allkeys.txt. - The default UCA_Version is 36. - Locale/*.pl and CJK/Korean.pm are updated. 1.26 Mon Dec 31 14:32:07 2018 - U::C::Locale newly supports locale: cu. - tailoring Cyrillic YI as BYELORUSSIAN-UKRAINIAN I with DIAERESIS. (affected locale: kk) - added loc_cu.t in t. 1.25 Wed Nov 22 20:48:48 2017 - Makefile.PL: [rt.cpan.org #123631] Switch Unicode::Collate to XSLoader 1.24 Sun Nov 19 22:06:03 2017 - xs: [rt.cpan.org #123631] Switch Unicode::Collate to XSLoader 1.23 Mon Nov 13 19:10:28 2017 - Now UCA_Version 36 (for Unicode 10.0.0) is supported. * But the default UCA_Version is still 34. - added nushu.t in t. 1.22 Sat Nov 11 10:53:35 2017 - internal: someone suggests using 'exists' for checking the truth of $collator->{mapping}{$variable} and $collator->{maxlength}{$variable}, where $variable may stand for codepoints whose mapping is not defined; though such a problem was not reproduced on my environment. 1.21 Sat Nov 4 10:49:19 2017 - mklocale: [rt.cpan.org #121664] . removed from @INC (take 2) - DUCET is updated (for Unicode 9.0.0) as Collate/allkeys.txt. - The default UCA_Version is 34. - added tangut.t in t. - Locale/*.pl and CJK/Korean.pm are updated. 1.20 Fri Nov 3 11:50:21 2017 - XS: [rt.cpan.org #121664] . removed from @INC - U::C::Locale newly supports locales: dsb, lkt. 1.19 Sat Dec 3 09:32:31 2016 - U::C::Locale newly supports locales: he, vo. - locales updated to CLDR 24: az, haw. - locale updated to CLDR 26: et. 1.18 Sat Nov 5 21:14:35 2016 - U::C::Locale newly supports locale: de_AT_phonebook. - locales updated to CLDR 23: as, ca. - removed locale fr (French) other than Canadian according to CLDR 1.9.0. 1.17 Wed Oct 26 21:46:12 2016 - DUCET is updated (for Unicode 8.0.0) as Collate/allkeys.txt. - The default UCA_Version is 32. - U+9FCD..U+9FD5 and U+2B820..U+2CEA1 are new CJK unified ideographs. - Cyrillic contractions except SHORT I are removed from DUCET. * modified locales: be, bs_Cyrl, kk, mk, sr, uk. * removed locales: bg, ru. - Locale/*.pl and CJK/Korean.pm are updated. 1.16 Mon Oct 24 21:31:12 2016 - modified tests for the Cyrillic script before updating DUCET: loc_be.t, loc_kk.t, loc_mncy.t, loc_ru.t, loc_uk.t. 1.15 Sat Oct 22 08:36:07 2016 - U::C::Locale newly supports locale: ug_Cyrl. cf. [rt.cpan.org #117512] - added loc_mncy.t, loc_ugcy.t in t. - modified tests for the Cyrillic script: loc_be.t, loc_bg.t, loc_bscy.t, loc_kk.t, loc_mk.t, loc_ru.t, loc_sr.t, loc_uk.t. 1.14 Sat Jul 11 13:20:03 2015 - [rt.cpan.org #105621] mklocale fails as it creates Locale directory without executable bit. 1.13 Sat Jul 11 12:13:50 2015 - something to remove 'use Unicode::Collate' from CJK/Korean.pm cf. [rt.cpan.org #105791] 1.12 Mon Mar 16 20:21:15 2015 - XS: [rt.cpan.org #102663] IRIX 6.5 failures with Unicode::Collate (porting: avoid non-zero values in the initializer of an array) 1.11 Tue Feb 17 21:23:03 2015 - XS: [rt.cpan.org #102024] remove extranous const casting 1.10 Thu Jan 15 21:37:58 2015 - XS: const &c [rt.cpan.org #101170] [PATCH] - Makefile.PL: [rt.cpan.org #101500] [PATCH] 1.09 Thu Dec 18 21:39:18 2014 - XS: a workaround for perl 5.6.x to handle noncharacters U+FFFF etc. is abandoned. Perl 5.8.0 or later is recommended for handling these noncharacters. 1.08 Sat Dec 6 20:12:55 2014 - DUCET is updated (for Unicode 7.0.0) as Collate/allkeys.txt. - The default UCA_Version is 30. - *.pl and *.pm are updated so that they have same the version number. 1.07 Tue May 27 23:18:23 2014 - XS: for the world without utf8n_to_uvuni(). 1.06 Tue May 27 21:11:09 2014 - 0.67's improved discontiguous contractions are invalidated by default and are supported as a parameter 'long_contraction.' 1.05 Sat May 24 16:30:42 2014 - XS: avoid unused expression 1; for no-op. (see [rt.cpan.org #95866] compilation noise) 1.04 Sat Dec 7 11:34:18 2013 - XS: a workaround for perl 5.6.x to handle U+FFFF correctly. unpack_U() is implemented by using XS again as well as that in 1.02, but now that is used only in the versions before perl 5.8.0. 1.03 Sun Dec 1 21:45:46 2013 - XS: now unpack_U() uses unpack('U*') in pure perl. avoid XS for the internal "utf8" encoding of perl. 1.02 Sun Nov 10 18:39:37 2013 - POD: fix [rt.cpan.org #90170] about iso-8859-1 letters in pod. E<> is used for the compatibility with perl 5.6.1 and possibly EBCDIC. - 1.01 forgot to increase the version number of CJK/Korean.pm. - modified tests: cjkrange.t, compatui.t, hangtype.t, illegal.t, loc_ja.t, loc_ta.t, overcjk0.t, overcjk1.t, view.t in t. 1.01 Sat Nov 2 19:00:38 2013 - DUCET is updated (for Unicode 6.3.0) as Collate/allkeys.txt. - The default UCA_Version is 28. - Locale/*.pl (except fr.pl) and CJK/Korean.pm are updated. - modified tests: loc_es.t, loc_estr.t, rewrite.t, version.t in t. 1.00 Sun Oct 27 13:22:17 2013 - When a subroutine by 'overrideOut' taking a out-of-range value and returning undef, now the value is treated as if it were U+FFFD. * 0.99 wrongly calculates implicit weights based on out-of-range values. - Assertion using unpack 'U' is added. If not only pack('U') but also unpack('U') of CORE:: don't work as expected, this module will die. 0.99 Sun Sep 1 12:46:14 2013 - by default out-of-range values are treated as if it were U+FFFD when UCA_Version >= 22. - supported overriding out-of-range values (see 'overrideOut' in POD). - modified tests: override.t, illegal.t in t. 0.98 Sat Jun 15 19:44:06 2013 - typo (see [rt.cpan.org #85655] typo fixes) 0.97 Sat Dec 22 14:25:50 2012 - bug fix: XS of 0.96 (if UCA_Version is 9 to 11) wrongly referred to DUCET for completely ignorable characters, even though the collator doesn't use DUCET. - separated t/notable.t from t/test.t. 0.96 Sat Dec 15 19:43:10 2012 - special noncharancter tailorings ('highestFFFF' and 'minimalFFFE') * some locales are modified for 'highestFFFF': as, bn, fa, gu, hi, hy, kn, kok, mr, or, sa, si, si_dict, ta, te, th, ur. - U::C::Locale now allows 'entry' to add or override mappings. - bug fix: using DUCET through XS wrongly prevented completely ignorable characters from tailoring. - modified tests: default.t, loc_as.t, loc_bn.t, loc_fa.t, loc_gu.t, loc_hi.t, loc_hy.t, loc_kn.t, loc_kok.t, loc_mr.t, loc_or.t, loc_sa.t, loc_si.t, loc_sidt.t, loc_ta.t, loc_te.t, loc_test.t, loc_th.t, loc_ur.t, nonchar.t in t. 0.95 Sat Dec 8 15:11:09 2012 - U::C::Locale newly supports locales: bs_Cyrl, ee. - locale updated to CLDR 21: uk. - locales updated to CLDR 22: th, to. - added loc_bscy.t, loc_ee.t in t. - modified tests: loc_th.t, loc_to.t, loc_uk.t in t. 0.94 Fri Nov 23 18:45:53 2012 - U::C::Locale newly supports locale: zh__zhuyin. - added Unicode::Collate::CJK::Zhuyin for zh__zhuyin. - doc: added CAVEAT to CJK/Stroke.pm - modified tests: loc_cjk.t, loc_cjkc.t in t. - added cjk_zy.t, loc_zhzy.t in t. 0.93 Sun Nov 18 18:13:42 2012 - DUCET is updated (for Unicode 6.2.0) as Collate/allkeys.txt. - The default UCA_Version is 26. - Locale/*.pl (except fr.pl) and CJK/Korean.pm are updated. - modified tests: loc_es.t, loc_estr.t, version.t in t. 0.92 Wed Nov 14 20:58:19 2012 - fix: index() etc. with preprocess/normalization should be always croaked. - doc: referred to the latest UTS #10 and updated its section numbers. - supported the identical level (see 'identical' in POD). - Now UCA_Version 26 (for Unicode 6.2.0) is supported. * But the default UCA_Version is still 24. - added ident.t in t. - modified tests: cjkrange.t, compatui.t, hangtype.t, index.t, overcjk0.t, overcjk1.t, test.t, view.t in t. 0.91 Sun Nov 4 17:00:20 2012 - XSUB: use PERL_NO_GET_CONTEXT (see perlguts) (see [rt.cpan.org #80313]) 0.90 Sun Sep 23 10:42:26 2012 - perl 5.11.0 or later: Install to 'site' instead of 'perl' (see [rt.cpan.org #79800]) 0.89 Sat Mar 10 20:19:11 2012 - avoid "use Test". 0.88 Mon Mar 5 21:56:13 2012 - DUCET is updated (for Unicode 6.1.0) as Collate/allkeys.txt. - U+9FCC is a new CJK unified ideograph. - The default UCA_Version is 24. - Locale/*.pl (except fr.pl) and CJK/Korean.pm are updated. - modified tests: cjkrange.t, compatui.t, hangtype.t, loc_cjkc.t, loc_es.t, loc_estr.t, overcjk0.t, overcjk1.t, version.t in t. 0.87 Sat Nov 26 17:01:42 2011 - Now Locale/*.pl files are searched in @INC. (see [rt.cpan.org #72666]) - added locale_version method to access the version number of Locale/*.pl. 0.86 Wed Nov 23 17:16:00 2011 - tailored compatibility ideographs as well as unified ideographs for the locales: ja, ko, zh__big5han, zh__gb2312han, zh__pinyin, zh__stroke. - added loc_cjkc.t in t. 0.85 Sat Nov 19 20:01:57 2011 - U::C::Locale newly supports locales: bn, sa. - updated some locales to CLDR 2.0 : zh__pinyin, zh__stroke. * supported compatibility decomposable characters and U+FDD0 indexes. * updated CJK/Pinyin.pm and CJK/Stroke.pm. - added loc_bn.t, loc_cjk.t, loc_sa.t in t. 0.84 Sun Nov 6 14:44:51 2011 - U::C::Locale supports script codes. - U::C::Locale newly supports locales: fa, sr_Latn, ur. - added loc_fa.t, loc_srla.t, loc_ur.t in t. 0.83 Sun Oct 30 20:22:04 2011 - mklocale: auto-generate equivalents for suppressed contractions. * be.txt, bg.txt, kk.txt, mk.txt, ru.txt, sr.txt, uk.txt in data are simplified. * but no Locale/*.pl will be modified. 0.82 Sun Oct 30 10:03:48 2011 - U::C::Locale newly supports locales: si, si__dictionary, sv__reformed, ta, te, th, wae. - added loc_si.t, loc_sidt.t, loc_svrf.t, loc_ta.t, loc_te.t, loc_th.t, loc_wae.t in t. - updated some locales to CLDR 2.0 : sk, sr, sv, uk. - updated CJK/Pinyin.pm according to CLDR 2.0. 0.81 Sun Oct 23 21:32:36 2011 - U::C::Locale newly supports locales: ml, mr, or, pa. - added loc_ml.t, loc_mr.t, loc_or.t, loc_pa.t in t. - updated some locales to CLDR 2.0 : mk, mt, nb, nn, ro, ru. 0.80 Sun Oct 9 21:00:21 2011 - U::C::Locale newly supports locales: bs, hi, kn, kok, ln. - added loc_bs.t, loc_hi.t, loc_kn.t, loc_kok.t, loc_ln.t in t. - updated some locales to CLDR 2.0 : ha, hr, kk, lt. 0.79 Sun Oct 2 20:31:01 2011 - pod: [rt.cpan.org #70241] Fix minor grammar error in manpage by Harlan Lieberman-Berg. - 'suppress' no longer affects contractions via 'entry'. - U::C::Locale newly supports locales: as, fi__phonebook, gu. - added loc_as.t, loc_fiph.t, loc_gu.t in t. - updated some locales to CLDR 2.0 : ar, be, bg. 0.78 Mon Jul 25 21:29:50 2011 - tried fixing the tarball with world writable files. ( http://www.perlmonks.org/?node_id=731935 ) 0.77 Sun Jul 3 21:15:08 2011 - xs: [perl #93470] [PATCH] consting in Collate.xs by Robin Barker. 0.76 Sun May 15 10:06:59 2011 - updated CJK/Pinyin.pm and CJK/Stroke.pm according to CLDR 1.9.1. (type='pinyin' alt='short' and type='stroke' alt='short' respectively) 0.75 Sat May 7 21:07:38 2011 - supported ignore_level2 and rewrite. - added iglevel2.t and rewrite.t in t. 0.74 Mon Mar 21 19:07:38 2011 - removed locale sw (Swahili) according to CLDR 1.9.0. (removed files: Collate/Locale/sw.pl and data/sw.txt) - shifted primary weights of letters > Z for some languages. (affected locales: da, fi, fo, kl, nb, nn, sv) 0.73 Sun Mar 6 13:24:22 2011 - DUCET is updated (for Unicode 6.0.0) as Collate/allkeys.txt. - The default UCA_Version is 22. - Locale/*.pl (except fr.pl and ko.pl) and CJK/Korean.pm are updated. - test: compare allkeys.txt's version with Base_Unicode_Version in t/default.t. 0.72 Sat Jan 22 17:28:32 2011 - xs: fix mixing char* and U8*. 0.71 Tue Jan 18 22:29:44 2011 - t/loc_test.t should not fail without Unicode::Normalize. 0.70 Sun Jan 16 20:31:07 2011 - Now U::C::Locale->new will use the compiled DUCET via XS if available. added some tests in t/loc_test.t. 0.69 Sat Jan 15 19:41:11 2011 - clarified about XSUB. revised INSTALL in README. - xs: flag passed to utf8n_to_uvuni(). - doc and comments: [perl #81876] Fix typos by Peter J. Acklam. 0.68 Tue Nov 23 20:17:22 2010 - doc: clarified about (backwards => [ ]) and (backwards => undef). - separated t/backwds.t from t/test.t. - added cjk_b5.t, cjk_gb.t, cjk_ja.t, cjk_ko.t, cjk_py.t, cjk_st.t in t for CJK/*.pm without Locale.pm. 0.67 Sun Nov 14 11:38:59 2010 - supported UCA_Version 22 for Unicode 6.0.0. * U+2B740..U+2B81D are new CJK unified ideographs. * noncharacters (e.g. U+FFFF) should be overridable, not be ignored. * DUCET is NOT updated, as no maint perl supports Unicode 6.0.0. Thus the default UCA_Version is still 20. - added t/nonchar.t. - improved discontiguous contractions of 3 or more characters. (e.g. 0FB2 0F71 0F80 and 0FB3 0F71 0F80) - auxiliary: now 'mklocale' also copes with Korean.pm according to DUCET. 0.66 Sun Nov 7 10:47:30 2010 - U::C::Locale newly supports locale: ko. - added Unicode::Collate::CJK::Korean for ko. - added t/loc_ko.t. - 12 compat. ideographs (e.g. U+FA0E) are treated as unified ideographs. (though DUCET also does it, now Unicode::Collate does it without DUCET.) - added t/compatui.t. * Ideographs Ext.B (U+20000..U+2A6D6) can be overridden with UCA_Version 8. This is a long-standing behavior from Unicode::Collate 0.11 to 0.63. A wrong fix at 0.64 should be abandoned. 0.65 Wed Nov 3 13:10:20 2010 - U::C::Locale newly supports locale: zh and its some variants. (zh__big5han, zh__gb2312han, zh__pinyin, zh__stroke) - added Unicode::Collate::CJK::Big5 for zh__big5han. - added Unicode::Collate::CJK::GB2312 for zh__gb2312han. - added Unicode::Collate::CJK::Pinyin for zh__pinyin. - added Unicode::Collate::CJK::Stroke for zh__stroke. - added loc_zh.t, loc_zhb5.t, loc_zhgb.t, loc_zhpy.t, loc_zhst.t in t. 0.64 Sun Oct 31 14:17:29 2010 - U::C::Locale newly supports locale: ja. - added Unicode::Collate::CJK::JISX0208 for ja. - added loc_ja.t, loc_jait.t, loc_japr.t in t. - a subroutine specified in 'overrideCJK' or 'overrideHangul' is allowed to return an integer or undef value. - fix: Ideographs Ext.B (U+20000..U+2A6D6) are assigned in Unicode 3.1, then 'overrideCJK' should not override them with UCA_Version 8. * sorry, this fix is based on a wrong idea. reverted at 0.66. - separated t/overcjk0.t and t/overcjk1.t from t/override.t. 0.63 Sun Oct 10 22:13:21 2010 - supported suppress contractions (see 'suppress' in POD). - internal for 'hangul_terminator' in getSortKey(). - U::C::Locale newly supports locales: be, bg, kk, mk, ru, sr. - added loc_be.t, loc_bg.t, loc_cyrl.t, loc_kk.t, loc_mk.t, loc_ru.t, loc_sr.t in t. - added tailoring with U+0340 or U+0341 instead of U+0300 or U+0301. (affected locales: hr, is, pl, se, to, wo) 0.62 Wed Oct 6 21:35:54 2010 - U::C::Locale newly supports locales: ar, hu, hy, se, to, uk. - added loc_ar.t, loc_hu.t, loc_hy.t, loc_se.t, loc_to.t, loc_uk.t in t. - Vietnamese (vi): added tailoring for U+0340 and U+0341. 0.61 Sat Oct 2 11:41:29 2010 - U::C::Locale newly supports locales: hr, ig, sq. - added loc_hr.t, loc_ig.t, loc_sq.t in t. - precomposed e-dot-below, o-dot-below, o-tilde are tailored as well. (affected locales: et, yo) - Vietnamese (vi): added contractions for non-blocked decompositions * base + dot-below + mark such as a\x{323}\x{306}, \x{1EA1}\x{306} etc. * base + tone + horn such as o\x{309}\x{31B}, \x{1ECF}\x{31B} etc. 0.60 Thu Sep 23 21:37:36 2010 - bug fix: index() [and its friends including gmatch()] didn't remove ignorable characters in the substring correctly. Thanks for the bug report: http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2010-09/msg00014.html - U::C::Locale newly supports locales: de__phonebook, nso, om, tn, vi. - added loc_de.t, loc_deph.t, loc_nso.t, loc_om.t, loc_tn.t, loc_vi.t in t. - precomposed a-breve, a-circ, e-circ, o-circ are tailored as well. (affected locales: ro, sk, sv) 0.59 Sun Sep 5 17:03:52 2010 - U::C::Locale newly supports locales: az, fil, ha, lt, mt, tr, wo, yo. - added loc_az.t, loc_fil.t, loc_ha.t, loc_lt.t, loc_mt.t, loc_tr.t, loc_wo.t, loc_yo.t in t. - precomposed a-uml, o-uml, and u-uml are tailored as well. (affected locales: da, et, fi, fo, is, kl, nb, nn, sk, sv) 0.58 Sun Aug 29 19:56:50 2010 - U::C::Locale newly supports locales: af, cy, da, fo, haw, is, kl, sw. - added loc_af.t, loc_cy.t, loc_da.t, loc_fo.t, loc_haw.t, loc_is.t, loc_kl.t, loc_sw.t in t. 0.57 Sun Aug 22 22:39:58 2010 - U::C::Locale newly supports locales: ca, et, fi, lv, sk, sl. - added loc_ca.t, loc_et.t, loc_fi.t, loc_lv.t, loc_sk.t, loc_sl.t in t. 0.56 Sun Aug 8 20:24:03 2010 - Unicode::Collate::Locale newly supports locales: eo, nb, ro, sv. - added loc_eo.t, loc_es.t, loc_estr.t, loc_nb.t, loc_ro.t, loc_sv.t in t. * renamed t/locale_{xy}.t to t/loc_{xy}.t (for safer 8.3 names) (loc_cs.t, loc_fr.t, loc_nn.t, loc_pl.t, loc_test.t) 0.55 Sun Aug 1 21:21:23 2010 - incorporated Unicode::Collate::Locale with some changes. see: http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2004-03/msg00030.html - supported locales: cs, es, es__traditional, fr, nn, pl. * added t/locale*.t that uses DUCET. (locale_cs.t, locale_fr.t, locale_nn.t, locale_pl.t, locale_test.t) - data/*.txt and mklocale for preparation of Locale/*.pl from DUCET. 0.54 Sun Jul 25 21:37:04 2010 - Now UCA Revision 20 (based on Unicode 5.2.0). - DUCET is also updated (for Unicode 5.2.0) as Collate/allkeys.txt, which is required to test this module. - U+9FC4..U+9FCB and U+2A700..U+2B734 are new CJK unified ideographs. - Many hangul jamo are assigned (affecting hangul_terminator). * Now XSUB will be built by default. (XSUB needs a C compiler.) To build pure perl, run disableXS before Makefile.PL. * DUCET will be compiled when XS is used. Explicit saying