Sort::ArbBiLex

Sort::ArbBiLex is a Perl module that can make sort functions for arbitrary sort orders.
Download

Sort::ArbBiLex Ranking & Summary

Advertisement

  • Rating:
  • License:
  • Perl Artistic License
  • Price:
  • FREE
  • Publisher Name:
  • Sean M. Burke
  • Publisher web site:
  • http://search.cpan.org/~sburke/

Sort::ArbBiLex Tags


Sort::ArbBiLex Description

Sort::ArbBiLex is a Perl module that can make sort functions for arbitrary sort orders. Sort::ArbBiLex is a Perl module that can make sort functions for arbitrary sort orders.SYNOPSIS use Sort::ArbBiLex ( 'fulani_sort', # ask for a &fulani_sort to be defined "a A c C ch Ch CH ch' Ch' CH' e E l L lh Lh LH n N r R s S u U z Z " ); @words = ; @stuff = fulani_sort(@words); foreach (@stuff) { print "n" }CONCEPTSWriting systems for different languages usually have specific sort orders for the glyphs (characters, or clusters of characters) that each writing system uses. For well-known national languages, these different sort orders (or someone's idea of them) are formalized in the locale for each such language, on operating system flavors that support locales. However, there are problems with locales; cf. perllocale. Chief among the problems relevant here are:* The basic concept of "locale" conflates language/dialect, writing system, and character set -- and country/region, to a certain extent. This may be inappropriate for the text you want to sort. Notably, this assumes standardization where none may exist (what's THE sort order for a language that has five different Roman-letter-based writing systems in use?).* On many OS flavors, there is no locale support.* Even on many OS flavors that do suport locales, the user cannot create his own locales as needed.* The "scope" of a locale may not be what the user wants -- if you want, in a single program, to sort the array @foo by one locale, and an array @bar by another locale, this may prove difficult or impossible.In other words, locales (even if available) may not sort the way you want, and are not portable in any case.This module is meant to provide an alternative to locale-based sorting.This module makes functions for you that implement bi-level lexicographic sorting according to a sort order you specify. "Lexicographic sorting" means comparing the letters (or properly, "glyphs", as I'll call them here, when a single glyph can encompass several letters, as with digraphs) in strings, starting from the start of the string (so that "apple" comes after "apoplexy", say) -- as opposed to, say, sorting by numeric value. "Lexicographic sorting" is sometimes used to mean just "ASCIIbetical sorting", but I use it to mean the sort order used by lexicographers, in dictionaries (at least for alphabetic languages).Consider the words "resume" and "rsum" (the latter should display on your POD viewer with acute accents on the e's). If you declare a sort order such that e-acute ("") is a letter after e (no accent), then "rsum" (with accents) would sort after every word starting with "re" (no accent) -- so "rsum" (with accents) would come after "reward".If, however, you treated e (no accent) and e-acute as the same letter, the ordering of "resume" and "rsum" (with accents) would be unpredictable, since they would count as the same thing -- whereas "resume" should always come before "rsum" (with accents) in English dictionaries.What bi-level lexicographic sorting means is that you can stipulate that two letters like e (no accent) and e-acute ("") generally count as the same letter (so that they both sort before "reward"), but that when there's a tie based on comparison that way (like the tie between "resume" and "rsum" (with accents)), the tie is broken by a stipulation that at a second level, e (no accent) does come before e-acute ("").(Some systems of sort order description allow for any number of levels in sort orders -- but I can't imagine a case where this gets you anything over a two-level sort.)Moreover, the units of sorting for a writing system may not be characters exactly. In some forms of Spanish, ch, while two characters, counts as one glyph -- a "letter" after c (at the first level, not just the second, like the e in the paragraph above). So "cuerno" comes before "chile". A character-based sort would not be able to see that "ch" should count as anything but "c" and "h". So this library doesn't assume that the units of comparison are necessarily individual characters. Requirements: · Perl


Sort::ArbBiLex Related Software