GuICU 0.0


GuICU: a Guile Unicode Library

This manual is for GuICU, a Guile internationalization library (version 0.0, 22 December 2007). GuICU provides bindings to functions from the International Components for Unicode library, which provides functionality for multilingualization.

Copyright © 2007 Michael L. Gran

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

--- The Detailed Node Listing ---

Introduction

Tutorial

Reference



1 Introduction

GuICU is a Guile module that provides Unicode string functions. This manual corresponds to version 0.0.

Guile is one of GNU's implementations of the Scheme language.

Unicode is a standard that hopes to provide encoding and text processing algorithms for all the world's languages (see Unicode Consortium 2007).

Among the implementations of the Unicode standard is that of the International Components for Unicode (ICU). It provides a Unicode library for C/C++ and Java (see http://www.icu-project.org/).

This library, GuICU, wraps some of the functionality of the ICU library as a Guile module.



1.1 Unicode Algorithms Supported by GuICU

The Unicode standard has two main ideas: how to make a binary representation for text in any language, and how to do standard string operations on those representations.

GuICU wraps a subset of the operations suggested by the Unicode standard. Here are the primary services provided by GuICU:



1.2 Installation

First and foremost, to use this library, one must have the Guile language itself. Guile must be installed before this library can be installed. GNU Guile's homepage is http://www.gnu.org/software/guile/. This library has been tested with version 1.8.2. As always, check your operating system distribution to see if it has been packaged, but, if not, the source can be downloaded from the homepage.

Second, the C-language version of the ICU library (ICU4C) must be installed. ICU's homepage is http://icu-project.org/, and the ICU4C library can be obtained from there. This library has been tested with ICU4C version 3.8.0.

Once the prerequisites are installed, this library can be installed. Standard installation directions are found in the distribution in the file INSTALL.

GuICU, once installed, provides one public module (guicu icu) and one private module (guicu raw), which are placed in the Guile site directory (usually /usr/local/share/guile/site). It also installs a Guile C library, libguile-guicu-v-0.la, in the library directory (usually /usr/local/lib).

It may be necessary to adjust your library search path (LTDL_LIBRARY_PATH) and your Guile search path (GUILE_LOAD_PATH) to get it to work, depending on your install.



1.3 Invoking GuICU

Guile programs that use this library should invoke it with (use-modules (guicu icu)).1



2 Tutorial

In this chapter, a demonstration of some of the features of the library will be given. Unicode strings will be created, modified, and displayed.

For these examples to operate as expected, the terminal emulator used must be set up to display UTF8 characters. For most users of modern GNU/Linux/BSD distributions, this will probably already be true.

On my GNU/Linux distribution, I accomplish this by the following. First, I ensure that the LANG environment variable is set up to use UTF8. On my machine I have LANG=en_US.utf8. This indicates that my machine uses American English and displays strings using the UTF8 encoding. I chose this option from the catalog of options that my distribution had preinstalled in the /usr/lib/locale directory. Your system will likely differ somewhat.

For my examples, I used xterm as my terminal emulator, which uses as default a font (-misc-fixed-*) that contains a useful subset of the Unicode glyphs, including a fair number of the Chinese, Japanese and Korean (CJK) glyphs. As of this date, no commonly available terminal emulator has a font that can print all of the glyphs. You should ensure that your terminal emulator is using a font that covers many Unicode characters.

For Arabic and Hebrew, I instead use the Monospace family of fonts (xterm -fa Monospace).



2.1 Types and encodings

GuICU functions typically operate on one of two types: Unicode characters and Unicode strings.

— Data Type: codepoint

Unicode characters are encoded as Guile integers that range from 0 to 1114111 (aka hex 10FFFF). In general, common characters have lower values and obscure or rare characters have higher values. Unicode characters 32 to 127 are the same as the ASCII characters 32 to 127.

Since, for Guile and C, the words char and uchar already have meaning, the term codepoint will be used in the names of functions that take Unicode characters.

— Data Type: ustring

In this library, Unicode strings are encoded as an opaque type, called a ustring. A ustring is a SMOB that contains an efficient representation of a Unicode string. Most of the GuICU functions operate on ustring-typed values. To create a ustring, one would use a function such as string->ustring, which converts a standard Guile string into the opaque type.



2.2 Characters

Here are some simple examples to demonstrate some of the character functions. As noted in Types and encodings, a Unicode character (a codepoint) is encoded as an integer between 0 and #x10FFFF, inclusive.

2.2.1 Codepoint names

The common practice for referring to a Unicode letter in printed material is to use the following format.

U+00F1 LATIN SMALL LETTER N WITH TILDE

The first part “U+” indicates that this is a Unicode character. The four-digit hexadecimal number “00F1” is its number. The remainder is its official Unicode name, which is usually written in all capital letters.
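As a small illustration of this notation, the following sketch formats an integer codepoint in the U+XXXX style using only core Scheme procedures. The name codepoint->u+ is hypothetical and is not part of GuICU.

```scheme
;; Format codepoint CP as "U+" followed by at least four uppercase
;; hexadecimal digits.  Hypothetical helper, not a GuICU procedure.
(define (codepoint->u+ cp)
  (let* ((hex (string-upcase (number->string cp 16)))
         (pad (max 0 (- 4 (string-length hex)))))
    (string-append "U+" (make-string pad #\0) hex)))

(display (codepoint->u+ #xf1))   ; prints U+00F1
(newline)
```

Codepoints above #xFFFF simply use as many digits as they need, so #x10FFFF is written as U+10FFFF.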

The Unicode name for a codepoint can be found by using codepoint-name.

     (codepoint-name #x1e51)
         => "LATIN SMALL LETTER O WITH MACRON AND GRAVE"

2.2.2 Binary properties of codepoints

A character can have one or more of many binary properties: it can be alphabetic, numeric, a space character, a control character, etc. Consider the following properties of U+1E51 latin small letter o with macron and grave:

     ;; Is it alphabetic?
     (codepoint-alpha? #x1e51)
         => #t
     
     ;; Is it uppercase?
     (codepoint-upper? #x1e51)
         => #f
     
     ;; Is it a space-type character?
     (codepoint-blank? #x1e51)
         => #f

For the complete list of binary properties, see Codepoints.

2.2.3 Case conversion

Codepoints can be converted between uppercase, lowercase, foldcase, and titlecase.

Titlecase letters are usually single codepoints that can be decomposed into two characters where the first character is uppercase and the second character is lowercase. For example, some languages consider the letter pair DZ (U+01F1 latin capital letter dz) to be a single logical letter. As such, dz (U+01F3 latin small letter dz) would be the lowercase form and Dz (U+01F2 latin capital letter d with small letter z) would be the titlecase form.

Converting a letter to uppercase and then to lowercase is called case folding.

As an example of the case conversion functions, consider the following statements:

     (codepoint-name #x0041)
         => "LATIN CAPITAL LETTER A"
     
     (codepoint-name (codepoint-downcase #x0041))
         => "LATIN SMALL LETTER A"
     
     (codepoint-name #x0031)
         => "DIGIT ONE"
     
     (codepoint-name (codepoint-upcase #x0031))
         => "DIGIT ONE"

In the first pair of statements, codepoint-downcase returns the lowercase version of “A”. In the second pair of statements, codepoint-upcase returns its input value, because there is no uppercase version of U+0031 digit one.

2.2.4 Screen width

For terminal applications where monospaced fonts are used, most graphical characters will occupy one to two columns. Latin letters and numbers are usually 1 column wide. Most Chinese, Japanese, and Korean ideographs take two columns per character. Some codepoints, such as combining accents, are not meant to be used as standalone characters, and thus can be said to “occupy” zero columns.

There are a couple of functions that guess the screen width of a character. codepoint-xterm-width returns a value that is appropriate for the xterm terminal emulator. codepoint-wcwidth uses the library function wcwidth to determine the screen width on the console. Whether either one will be correct depends on the user's system.

Most East Asian characters are wide characters, but a few are not. The function codepoint-east-asian-width returns a category for each character, indicating whether it is narrow, wide, fullwidth, or halfwidth.

For example:

     ;; DIGIT ONE
     (codepoint-xterm-width #x0031)
         => 1
     
     ;; HIRAGANA LETTER A
     (codepoint-xterm-width #x3042)
         => 2
     (codepoint-east-asian-width #x3042)
         => wide
     
     ;; HALFWIDTH KATAKANA LETTER SMALL E
     (codepoint-xterm-width #xff6a)
         => 1
     
     (codepoint-east-asian-width #xff6a)
         => halfwidth



2.3 Strings

Most of the ICU functions operate on variables of type ustring, which is an opaque type used for operating with the lower-level library. So, the first step will usually be to convert Scheme data into a ustring.

To make a ustring from a UTF-8-encoded string, use either string->ustring or its shorthand version _u:

     (string->ustring "abc")
         => #<ustring 0x80c7fb8>
     (_u "abc")
         => #<ustring 0x8104f30>

The hex value in the result is the C pointer location of the ustring.

To make a ustring from a string encoded in something other than UTF-8, use codepage-string->ustring.

     (codepage-string->ustring "más" "iso-8859-1")
         => #<ustring 0x80c7fb8>

And to make a ustring from a list of codepoints, use codepoint-list->ustring.

     (codepoint-list->ustring '(#x0061 #x006C #x0067 #x00FA #x006E))
         => #<ustring 0x807d050>

The reverse operations are similar.

     (ustring->string (codepoint-list->ustring '(100 101 102)))
         => "def"

There is also a shorthand for ustring->string, namely _s.

This brings us to the first useful operation that the library can provide: codepage conversion. By combining codepage-string->ustring and ustring->codepage-string, codepage conversion is simple. In the following example, the Spanish word más gets converted from ISO-8859-1 to UTF-8.

     ;; The word más
     (define str "m\xe1s")
     (string->list str)
         => (#\m #\341 #\s)
     (string->list (_s (codepage-string->ustring str "iso-8859-1")))
         => (#\m #\303 #\241 #\s)
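The two characters #\303 and #\241 in the result are the two bytes (in octal) of the UTF-8 encoding of U+00E1. The arithmetic behind that encoding can be sketched in plain Scheme. The procedure codepoint->utf8-bytes is a hypothetical helper written for this illustration; it is not part of GuICU, and it performs no validity checking.

```scheme
;; Return the UTF-8 encoding of codepoint CP as a list of byte values.
;; Hypothetical helper for illustration only; CP is assumed to be a
;; valid codepoint between 0 and #x10FFFF.
(define (codepoint->utf8-bytes cp)
  (define (continuation-byte n)
    ;; Low six bits of N, tagged with the bit pattern 10xxxxxx
    (+ #x80 (remainder n 64)))
  (cond
   ;; 1 byte: 0xxxxxxx
   ((< cp #x80)
    (list cp))
   ;; 2 bytes: 110xxxxx 10xxxxxx
   ((< cp #x800)
    (list (+ #xc0 (quotient cp 64))
          (continuation-byte cp)))
   ;; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
   ((< cp #x10000)
    (list (+ #xe0 (quotient cp 4096))
          (continuation-byte (quotient cp 64))
          (continuation-byte cp)))
   ;; 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   (else
    (list (+ #xf0 (quotient cp 262144))
          (continuation-byte (quotient cp 4096))
          (continuation-byte (quotient cp 64))
          (continuation-byte cp)))))

;; Show the bytes of U+00E1 in octal, as Guile prints them above.
(display (map (lambda (b) (number->string b 8))
              (codepoint->utf8-bytes #xe1)))   ; prints (303 241)
(newline)
```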

2.3.1 String Presentation

String Presentation or visualization is the process of making a visual representation of the logical string. In this section, a series of increasingly complex presentation engines will be demonstrated.

For ASCII-encoded English, string presentation is not hard. An ASCII string has letters stored in logical first-to-last order. Each character in the string creates exactly one grapheme on the screen. Each grapheme is one column wide. To present the string, one must write the letters left-to-right. To find an appropriate place to wrap a line, usually one just breaks after a space.

The procedure ustring-line-break returns the indices where linebreaks can occur. It classifies line break possibilities into two categories: soft and hard. Soft linebreaks are optional. Hard linebreaks are mandatory, usually because there is a <CR> in the string. For example,

     (ustring-line-break (_u "Hi Mom!"))
         => ((3 7) (soft soft))

This string “Hi Mom!” can begin a new line in one of two places: beginning with the first letter in “Mom!”, which is position 3, or after the end of the string, which is position 7. Both possibilities are soft because there is no <CR> or other line-ending characters in the string.

For another example, imagine a system where the screen is 22 columns wide.2 If the first hard break possibility of a string occurs before 22 columns have passed, the string should break there. Otherwise, the string should break at the last soft break possibility before 22 columns.

Here is a fragment of a poem.3

     (define poem
       (_u "What passing-bells for these who die as cattle?\n"))
     (ustring-line-break poem)
         => ((5 13 19 23 29 33 37 40 48)
             (soft soft soft soft soft soft soft soft hard))

The function ustring-line-break has returned a list of candidate line breaks. Among these candidate line breaks, on the hypothetical 22 column display, the best place to begin a new line would be at character 19, which is the beginning of the word “for”.
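To make the shape of these results easier to experiment with, here is a crude, ASCII-only approximation written in plain Scheme. It is only a sketch: the name ascii-line-breaks is hypothetical, it works on ordinary Guile strings rather than ustrings, and it knows only three rules (break after a space, after a hyphen, and at a newline), whereas ustring-line-break applies the full Unicode rules through ICU.

```scheme
;; Return a list of two lists, break positions and break types, in the
;; same shape as the results shown above.  Hypothetical helper for
;; illustration; not a GuICU procedure.
(define (ascii-line-breaks s)
  (let ((len (string-length s)))
    (let loop ((i 1) (cols '()) (types '()))
      (if (> i len)
          (list (reverse cols) (reverse types))
          (let ((prev (string-ref s (- i 1))))
            (cond
             ;; A newline forces a break before the next position.
             ((char=? prev #\newline)
              (loop (+ i 1) (cons i cols) (cons 'hard types)))
             ;; The end of the string is always a candidate.
             ((= i len)
              (loop (+ i 1) (cons i cols) (cons 'soft types)))
             ;; A break is allowed after a space or a hyphen, before
             ;; the next non-space character.
             ((and (or (char=? prev #\space) (char=? prev #\-))
                   (not (char=? (string-ref s i) #\space)))
              (loop (+ i 1) (cons i cols) (cons 'soft types)))
             (else
              (loop (+ i 1) cols types))))))))

(display (ascii-line-breaks "Hi Mom!"))   ; prints ((3 7) (soft soft))
(newline)
```

For the two strings used in this section, these three rules happen to give the same positions as the results shown above; that will not hold for text in general.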

So, our first presentation engine should display characters 0 to 18, and then repeat the wrapping process with the remainder of the string. The function usubstring extracts a substring from the ustring. Below, the first screen line of the poem is removed, and the remaining characters are checked for the next line break.

     (set! poem (usubstring poem 19))
     (ustring-line-break poem)
         => ((4 10 14 18 21 29)
             (soft soft soft soft soft hard))

Here, the last soft break before our 22 columns are used up is at column 21, the beginning of the word “cattle”.

After the second screen line, only 7 characters and a <CR> remain. Thus, the remainder of the string would fit on line three.

Now, let me present a complete toy presentation engine.

First, let's define a string.

     (define str (_u (string-append
                      "The studio was filled with the rich "
                      "odour of roses, and when the light "
                      "summer wind stirred amidst the trees "
                      "of the garden,")))

In this case, our hypothetical terminal has 40 columns.

     (define max-cols 40)

Now, let's create a utility function to find the last line break that fits on a line of N columns.

     ;; Take line break possibilities COLS and TYPES, and find
     ;; the best line break position for a screen of N columns
     (define (find-best-wrap n cols types)
       (let loop ((prev 0) (cols cols) (types types))
         (cond
          ;; No more candidates: use the last one seen
          ((null? cols)
           prev)
          ;; Next candidate is past the right margin: use the last
          ;; one that fit, or this one if none fit, so that the
          ;; result is never zero
          ((>= (car cols) n)
           (if (zero? prev) (car cols) prev))
          ;; Reached a hard break
          ((eq? 'hard (car types))
           (car cols))
          ;; Keep looking
          (else
           (loop (car cols) (cdr cols) (cdr types))))))

Then loop over the string, printing it line by line. This introduces a new function ustring-null? that tests if a ustring contains zero codepoints.

     ;; Display a wrapped string given a long string STR
     (let loop ((str str))
       (if (not (ustring-null? str))
           (let* ((breaks (ustring-line-break str))
                  (break-cols (car breaks))
                  (break-types (cadr breaks))
                  (wrap-at (find-best-wrap max-cols break-cols break-types)))
             ;; Print up to the break, dropping any trailing whitespace
             (display (string-trim-right (_s (usubstring str 0 wrap-at))))
             (newline)
             (loop (usubstring str wrap-at)))))

This should display the following:

The studio was filled with the rich
odour of roses, and when the light
summer wind stirred amidst the trees
of the garden,

To expand on this to make a more complete presentation engine, it will be necessary to deal with wide and narrow characters, as well as bidirectionalization.

The procedure ustring-xterm-width-list can be used to get the column locations of the characters in a string:

     (ustring-xterm-width-list (_u "abc"))
         => (1 2 3)
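To wrap text that mixes narrow and wide characters, the break positions, which count codepoints, have to be translated into screen columns before they are compared with the screen width. Below is a minimal sketch of that translation in plain Scheme. It assumes, based only on the single example above, that the width list holds cumulative columns (element k is the column just past codepoint k). The helper names and the sample data are made up for illustration and are not part of GuICU.

```scheme
;; Column reached after INDEX codepoints, given the cumulative column
;; list CUM-COLS.  Index 0 is the start of the line.
;; Hypothetical helper, not a GuICU procedure.
(define (break-index->column cum-cols index)
  (if (zero? index)
      0
      (list-ref cum-cols (- index 1))))

;; Translate a list of break positions into screen columns.
(define (break-columns cum-cols breaks)
  (map (lambda (i) (break-index->column cum-cols i)) breaks))

;; Made-up data: "a", one double-width character, a space, and "b".
(define sample-cum-cols '(1 3 4 5))
(define sample-breaks '(3 4))

(display (break-columns sample-cum-cols sample-breaks))   ; prints (4 5)
(newline)
```

A width-aware wrapping loop would compare these columns with the screen width, but still cut the string at the original codepoint positions.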

2.3.2 String Transformations

There is a rich set of string transforms available.

To show their effects, a debugging function ustring-dump is used.4 It verbosely prints the names of each character in a ustring. For the examples, the call to ustring-dump is implied, even if it is not included explicitly.

     (ustring-dump (_u "abc"))
     =>
     U+0061 LATIN SMALL LETTER A
     U+0062 LATIN SMALL LETTER B
     U+0063 LATIN SMALL LETTER C

First, there are the case transforms: ustring-upcase, ustring-downcase, ustring-foldcase, ustring-titlecase. Note that these are affected by the locale.

     (_s (ustring-upcase (_u "aBc")))
         => "ABC"
     (_s (ustring-downcase (_u "aBc")))
         => "abc"
     (_s (ustring-titlecase (_u "aBc")))
         => "Abc"

There are string normalization transforms. In Unicode, an accented letter can be represented either in a precomposed form, where the letter and its accent share a single codepoint, or in a decomposed form, where the base letter and each combining accent have separate codepoints. Here a string is decomposed.

     (ustring #x00d1)
     =>
         U+00D1 LATIN CAPITAL LETTER N WITH TILDE
     
     (ustring-normalize-nfd (ustring #x00d1))
     =>
         U+004E LATIN CAPITAL LETTER N
         U+0303 COMBINING TILDE

Now, a string is composed, combining base letters and accents into precomposed forms when possible.

     (ustring #x006e #x0303)
     =>
         U+006E LATIN SMALL LETTER N
         U+0303 COMBINING TILDE
     
     (ustring-normalize-nfc (ustring #x006e #x0303))
     =>
         U+00F1 LATIN SMALL LETTER N WITH TILDE

There is the bidirectionalization transform, which takes a string in logical, first-to-last order and transforms it into visual, left-to-right order. Runs of right-to-left characters, such as Hebrew and Arabic, are reversed.

     (ustring #x05d7 #x5d1 #x05e8)
     =>
         U+05D7 HEBREW LETTER HET
         U+05D1 HEBREW LETTER BET
         U+05E8 HEBREW LETTER RESH
     
     (ustring-bidi-visualize
      (ustring #x05d7 #x5d1 #x05e8)
      *ubidi-default-ltr* *ubidi-reorder-default* *ubidi-option-default* 0)
     =>
         U+05E8 HEBREW LETTER RESH
         U+05D1 HEBREW LETTER BET
         U+05D7 HEBREW LETTER HET

There is the Arabic shaping transform, where logical Arabic letters are changed to their correct cursive forms for proper presentation.

     (ustring #x0627 #x0644 #x0628 #x0627 #x0628)
     =>
         U+0627 ARABIC LETTER ALEF
         U+0644 ARABIC LETTER LAM
         U+0628 ARABIC LETTER BEH
         U+0627 ARABIC LETTER ALEF
         U+0628 ARABIC LETTER BEH
     
     (ustring-shape-arabic
      (ustring #x0627 #x0644 #x0628 #x0627 #x0628)
      *u-shape-letters-shape*)
     =>
         U+FE8D ARABIC LETTER ALEF ISOLATED FORM
         U+FEDF ARABIC LETTER LAM INITIAL FORM
         U+FE92 ARABIC LETTER BEH MEDIAL FORM
         U+FE8E ARABIC LETTER ALEF FINAL FORM
         U+FE8F ARABIC LETTER BEH ISOLATED FORM

Okay, that was your quick tour of GuICU. There are a lot more functions listed in the reference than have been demonstrated here.



3 Reference



3.1 Codepoints

The following procedures operate on Unicode characters, or codepoints. It is invalid to pass them character constants, such as #\x, without first converting them to integers.

A codepoint is an integer between 0 and #x10ffff. Some values in that range are not considered Unicode characters. The surrogate codepoints (U+D800 to U+DFFF), the noncharacters U+FFFE and U+FFFF (U+FFFE is the byte-swapped form of the byte-order mark U+FEFF), and the noncharacter range U+FDD0 to U+FDEF are not valid codepoints.
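The rule just described can be written down as a plain Scheme predicate. This is only a sketch of the ranges listed above, under the hypothetical name sketch-codepoint?; the library's own codepoint? is implemented on top of ICU and may apply further checks.

```scheme
;; #t if X is an exact integer in the Unicode range that falls outside
;; the excluded ranges listed above.  Hypothetical helper, not the
;; GuICU procedure codepoint?.
(define (sketch-codepoint? x)
  (and (integer? x)
       (exact? x)
       (<= 0 x #x10ffff)
       (not (<= #xd800 x #xdfff))     ; surrogates
       (not (<= #xfdd0 x #xfdef))     ; U+FDD0 to U+FDEF
       (not (<= #xfffe x #xffff))))   ; U+FFFE and U+FFFF

(display (sketch-codepoint? #x0041))   ; prints #t
(newline)
(display (sketch-codepoint? #xd800))   ; prints #f
(newline)
```

Because the first test is integer?, a character constant such as #\x is rejected rather than causing an error; convert it with char->integer first.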

— Procedure: codepoint? char

Returns #t if char is a valid codepoint. Otherwise, it returns #f.

— Procedure: codepoint-name char

Returns the Unicode name of the character as a string.

3.1.1 Binary properties of codepoints

— Procedure: codepoint-alpha? char
— Procedure: codepoint-alphabetic? char

Returns #t if the codepoint is alphabetic.

— Procedure: codepoint-lower? char
— Procedure: codepoint-lower-case? char

Returns #t if char is lowercase.

— Procedure: codepoint-upper? char
— Procedure: codepoint-upper-case? char

Returns #t if char is uppercase.

— Procedure: codepoint-punct? char

Returns #t if char is punctuation.

— Procedure: codepoint-digit? char
— Procedure: codepoint-numeric? char

Returns #t if char is a number. Non-decimal numbers, like Roman numerals, are not included.

— Procedure: codepoint-xdigit? char

Returns #t if char is a decimal number or a hexadecimal digit.

— Procedure: codepoint-alnum? char

Returns #t if char is alphabetic or is a decimal number.

— Procedure: codepoint-space? char
— Procedure: codepoint-whitespace? char

Returns #t if char is whitespace. This includes vertical whitespace like carriage returns, linefeeds, vertical tabs, and form feeds.

— Procedure: codepoint-blank? char

Returns #t if char is horizontal whitespace.

— Procedure: codepoint-cntrl? char
— Procedure: codepoint-control? char

Returns #t if char is a control character.

— Procedure: codepoint-graph? char

Returns #f if char is a space, control, surrogate, or unassigned character. Essentially, it returns #t if printing this character would use ink.

— Procedure: codepoint-print? char

Returns #f if char is vertical whitespace, a control character, a surrogate, or is unassigned.

3.1.2 Codepoint case conversion

— Procedure: codepoint-upcase char
— Procedure: codepoint-downcase char

Returns the uppercase or lowercase version of char if it exists and can be represented as a codepoint. If it does not exist or cannot be represented as a codepoint, then char is returned.

— Procedure: codepoint-titlecase char
— Procedure: codepoint-foldcase char

Returns the titlecase or foldcase version of char if it exists and can be represented as a codepoint. If it does not exist or cannot be represented as a codepoint, then char is returned.

— Procedure: codepoint-foldcase-exclude-special-i char

Returns the foldcase of char, excluding special processing for Turkish letters “i”.

3.1.3 Screen width

— Procedure: codepoint-xterm-width char

Returns the number of screen columns that the char is likely to occupy on a terminal or terminal emulator. As this is dependent on the user's setup, it may not be correct in all circumstances.

— Procedure: codepoint-wcwidth char

Returns the number of screen columns that char should have according to the system's wcwidth routine, or -1 if the width is unknown.

The behavior of this procedure depends on the LC_CTYPE category of the current locale.

— Procedure: codepoint-east-asian-width char

Returns the width of char according to the Unicode East Asian width database. The return value will be one of the following symbols: wide, narrow, fullwidth, halfwidth, neutral, or ambiguous.

3.1.4 Comparison

Since codepoints are integers, they can be compared using the standard operators =, <, etc.

There are procedures for case insensitive comparison of codepoints.

— Procedure: codepoint-ci=? ...
— Procedure: codepoint-ci<=? ...
— Procedure: codepoint-ci<? ...
— Procedure: codepoint-ci>? ...
— Procedure: codepoint-ci>=? ...

Given zero or more codepoints, do a case-insensitive comparison. If only zero or one codepoints are given, the result is #t. The default behavior with reference to Turkish letters “i” is assumed.
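The variadic behavior described here (every adjacent pair must satisfy the relation, and zero or one arguments trivially succeed) can be sketched in plain Scheme. The names ascii-fold and chain-ci are hypothetical, and the folding step is a stand-in that handles only the ASCII letters A to Z; the real procedures use ICU's full case folding.

```scheme
;; Stand-in for case folding: lowercases ASCII capital letters only.
;; Hypothetical helper for illustration.
(define (ascii-fold cp)
  (if (<= #x41 cp #x5a) (+ cp #x20) cp))

;; Apply the two-argument relation PRED to each adjacent pair of the
;; folded codepoints.  With zero or one codepoints the result is #t.
(define (chain-ci pred . cps)
  (let loop ((xs (map ascii-fold cps)))
    (or (null? xs)
        (null? (cdr xs))
        (and (pred (car xs) (cadr xs))
             (loop (cdr xs))))))

(display (chain-ci = #x41 #x61 #x41))   ; A a A, prints #t
(newline)
(display (chain-ci < #x61 #x41))        ; a A, prints #f
(newline)
```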



3.2 UStrings

3.2.1 Predicates

Functions that help test if a ustring has a given property.

— Procedure: ustring? str

Returns #t if str is a ustring.

— Procedure: ustring-null? str

Returns #t if str is a zero-length ustring, that is, if it contains no codepoints.

— Procedure: ustring-any pred str

Returns a non-false value if pred is true for any of the codepoints in ustring str. pred can be a ustring, or it can be a procedure that takes one argument which is a codepoint and returns a boolean value.

— Procedure: ustring-every pred str

Returns a non-false value if pred is true for all of the codepoints in ustring str. pred can be a ustring, or it can be a procedure that takes one argument which is a codepoint and returns a boolean value.

3.2.2 Constructors, Codepages, and List/UString conversion

These functions create ustring types from other Guile types.

— Procedure: string->ustring str
— Procedure: _u str

If str is a properly-encoded UTF-8 string, a ustring is returned. If str is not proper UTF-8, an error is thrown.

— Procedure: ustring->string str
— Procedure: _s str

Returns a string that contains the UTF-8 representation of the ustring str.

— Procedure: codepage-name cp

Given a string cp that is the name of a codepage, this function returns ICU's preferred name for that codepage, or #f if it does not understand that codepage.

The returned string can be used as the codepage argument of the functions codepage-string->ustring and ustring->codepage-string.

— Procedure: display-codepage-names

Display the codepages from which the library can convert strings to Unicode strings. An entry from this list can be used as the codepage argument of the functions codepage-string->ustring and ustring->codepage-string.

Some sample codepages are

US-ASCII
A 7-bit encoding for English.
ISO-8859-1
An 8-bit encoding for western European languages.
iso-8859_10-1988
An 8-bit encoding for Nordic languages.
iso-8859_11-2001
An 8-bit encoding for Thai.
iso-8859_14-1988
An 8-bit encoding for Gaelic.

— Procedure: codepage-string->ustring str cp

Given a string str encoded in codepage cp, a ustring is returned.

— Procedure: ustring->codepage-string str cp

Returns a string that contains the codepage-dependent representation of ustring str. Codepoints that cannot be represented in the indicated codepage will be dropped.

Here's a tip. Try converting a ustring to its compatibility (NFKD) decomposition before converting it to a limited character set, like US-ASCII or ISO-8859-1.

— Procedure: codepoint-list->ustring lst

Returns a ustring generated from a list of codepoints.

— Procedure: ustring->codepoint-list str

Returns a list of codepoints generated from the ustring str.

— Procedure: ustring ch1 ch2 ... chN

Creates a ustring from the codepoints ch1 through chN.

— Procedure: make-ustring K chr

Returns a ustring containing K copies of codepoint chr.

— Procedure: ustring-dump str

A debugging convenience function that prints to the screen the names of the characters in str.

3.2.3 String Selection

— Procedure: ustring-length str

Returns the number of codepoints in str.

— Procedure: ustring-ref str k

Returns the kth codepoint of ustring str.

— Procedure: usubstring str [start [end]]

Returns a new ustring extracted from str between codepoint locations start (inclusive) and end (exclusive). start will default to zero, and end will default to the number of codepoints in str.

— Procedure: ustring-take str k

Return a ustring containing the first k codepoints of str.

— Procedure: ustring-drop str k

Return a ustring containing all but the first k codepoints of str.

— Procedure: ustring-take-right str k

Return a ustring containing the last k codepoints of str.

— Procedure: ustring-drop-right str k

Return a ustring containing all but the last k codepoints of str.
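The four take and drop procedures are all special cases of extracting a substring with suitable start and end positions. The sketch below shows that relationship on ordinary Guile strings using the core procedure substring. The -sketch names are hypothetical, and the analogy assumes that the ustring versions count codepoints in the same way that these count characters.

```scheme
;; Hypothetical string analogues of the ustring take/drop procedures.
(define (take-sketch s k)
  (substring s 0 k))

(define (drop-sketch s k)
  (substring s k (string-length s)))

(define (take-right-sketch s k)
  (substring s (- (string-length s) k) (string-length s)))

(define (drop-right-sketch s k)
  (substring s 0 (- (string-length s) k)))

(display (list (take-sketch "abcdef" 2)
               (drop-sketch "abcdef" 2)
               (take-right-sketch "abcdef" 2)
               (drop-right-sketch "abcdef" 2)))   ; prints (ab cdef ef abcd)
(newline)
```

For any valid k, appending the take result to the drop result gives back the original string, and likewise for drop-right followed by take-right.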

3.2.4 String Modification

— Procedure: ustring-set! str k chr

Store chr in codepoint position k of ustring str. The position k must be a valid position for str.

3.2.5 String Comparison

— Procedure: ustring=? s1 s2 ...
— Procedure: ustring<? s1 s2 ...
— Procedure: ustring<=? s1 s2 ...
— Procedure: ustring>? s1 s2 ...
— Procedure: ustring>=? s1 s2 ...

Returns #t if s1 and s2 (and, optionally, more ustrings) have a given lexicographic relationship. s1 and s2 will be equal if their normalized forms are the same length and have the same codepoints.

— Procedure: ustring-ci=? s1 s2 ...
— Procedure: ustring-ci<? s1 s2 ...
— Procedure: ustring-ci<=? s1 s2 ...
— Procedure: ustring-ci>? s1 s2 ...
— Procedure: ustring-ci>=? s1 s2 ...

Returns #t if s1 and s2 (and, optionally, more ustrings) have a given lexicographic relationship. s1 and s2 will be equal if their normalized, case-folded forms are the same length and have the same codepoints.

— Procedure: ustring-raw=? s1 s2

Returns #t if s1 and s2 have the same UTF-16 representation. No normalization occurs before comparison.

3.2.6 Searching

— Procedure: ustring-contains s1 s2 options [start1 [end1 [start2 [end2]]]]

Returns the index where s2 occurs in s1, or #f if it is not found. If start1 and end1 are set, the search is restricted to the substring of s1 between start1 (inclusive) and end1 (exclusive). If start2 and end2 are set, only that substring of s2 is matched.

3.2.7 Alphabetic Case Mapping

For letters that have cases, these functions modify their case. Titlecase is when the first letter of each word is capitalized. Foldcase is when a string is converted to uppercase and then back to lowercase. For some languages, this is different from the conversion to lowercase.

Note that the rules for case conversion depend on the locale. To check the locale, one could try (setlocale LC_ALL) to read the system locale. To check that GuICU has understood the system locale, the following function is given.

— Procedure: get-icu-locale

Returns a string containing the underlying library's understanding of the locale to use for locale-specific ustring functions.

— Procedure: ustring-upcase str [start [end]]

Returns a ustring containing the uppercase of str. If start and end are set, it returns a copy of str in which the codepoints from start to end have been converted to uppercase. The returned ustring may have a different number of codepoints than str.

— Procedure: ustring-downcase str [start [end]]
— Procedure: ustring-titlecase str [start [end]]
— Procedure: ustring-foldcase str [start [end]]

Similar to ustring-upcase, but for the lowercase, titlecase, and case folding operations.

— Procedure: ustring-upcase! str [start [end]]
— Procedure: ustring-downcase! str [start [end]]
— Procedure: ustring-titlecase! str [start [end]]
— Procedure: ustring-foldcase! str [start [end]]

As above, except that the ustring str is modified in place. The return value is unspecified.

— Procedure: ustring-foldcase-exclude-special-i str [start [end]]
— Procedure: ustring-foldcase-exclude-special-i! str [start [end]]

As above, except that it does not distinguish between dotted and dotless letters “i”.

3.2.8 Appending strings

— Procedure: ustring-append . args

Return a ustring that is formed by appending the args of type ustring.

3.2.9 Normalization aka Normalisation

String normalization is the process of replacing one sequence of characters with another equivalent sequence.

For Latin alphabets, there is canonical equivalence between precomposed characters, like U+00E1 latin small letter a with acute, and their decompositions, here U+0061 latin small letter a followed by U+0301 combining acute accent.

There is compatibility equivalence between characters that appear approximately the same, such as U+FF21 fullwidth latin capital letter a and U+0041 latin capital letter a, or U+00B5 micro sign and U+03BC greek small letter mu.

As a consequence, there are functions to transform a ustring to one of four normalizations.

Normalization Form D (NFD) is a canonical decomposition, typically splitting precomposed characters into a base character and a set of combining marks.

Normalization Form C (NFC) is a canonical composition. It combines base characters with combining marks into precomposed characters.

Normalization Form KD (NFKD) is a compatibility decomposition, splitting precomposed characters and replacing characters with their more common compatibility equivalents.

Normalization Form KC (NFKC) is like NFKD, with the additional step that base characters with combining marks are replaced with their precomposed forms.

— Procedure: ustring-normalize-nfc str
— Procedure: ustring-normalize-nfd str
— Procedure: ustring-normalize-nfkc str
— Procedure: ustring-normalize-nfkd str

Given a ustring str, returns a new ustring that contains the normalized form of str.

— Procedure: ustring-normalise-nfc str
— Procedure: ustring-normalise-nfd str
— Procedure: ustring-normalise-nfkc str
— Procedure: ustring-normalise-nfkd str

Given a ustring str, returns a new ustring that contains the normalised form of str.
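The four forms can be tried out on the characters mentioned above. The following sketch is written in Python, not Guile, and uses the standard unicodedata module rather than GuICU; it only illustrates what the Unicode normalization forms do to those codepoints. The helper name all_forms is made up for illustration.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def all_forms(s):
    """Return a dict mapping each normalization form name to the normalized string."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}


precomposed = "\u00e1"   # U+00E1 latin small letter a with acute
decomposed = "a\u0301"   # U+0061 followed by U+0301 combining acute accent
fullwidth_a = "\uff21"   # U+FF21 fullwidth latin capital letter a
micro_sign = "\u00b5"    # U+00B5 micro sign

# Canonical forms convert between the precomposed and decomposed spellings.
nfd_of_precomposed = all_forms(precomposed)["NFD"]
nfc_of_decomposed = all_forms(decomposed)["NFC"]

# Only the compatibility forms replace the fullwidth letter and the micro sign.
nfkc_of_fullwidth = all_forms(fullwidth_a)["NFKC"]
nfc_of_fullwidth = all_forms(fullwidth_a)["NFC"]
nfkd_of_micro = all_forms(micro_sign)["NFKD"]
```

The canonical forms leave U+FF21 alone, because it is only compatibility-equivalent to U+0041.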

3.2.10 Boundary analysis and line-wrapping

These functions return lists of the locations where a given type of break is allowed.

A grapheme is, for languages written in the Latin script, a base character with its optional combining marks. Usually a grapheme appears as a single “character” when printed, even if its representation has multiple codepoints. Some Hangul codepoints also combine to form single graphemes.

Note that all of these functions are affected by the current locale. The command setlocale can be used to get or set the locale, and get-icu-locale can be used to return this library's understanding of the current locale.

— Procedure: ustring-grapheme-break str [start [end]]

Returns a pair of lists. The first list holds the locations of the graphemes in ustring str (optionally only those between positions start and end). The second list holds the type of each grapheme. For now, this is always the symbol graph.
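To make the idea of a grapheme location concrete, here is a deliberately simplified sketch in Python, not Guile, that does not call GuICU. It treats every codepoint that is not a combining mark as the start of a grapheme. That is only an approximation of the real Unicode rules (it ignores Hangul, for example), and it assumes that a “location” means the index of the first codepoint of each grapheme, which the text above does not state. The function name grapheme_starts is made up for illustration.

```python
import unicodedata


def grapheme_starts(s):
    """Return (start indices, types) using a simplified clustering rule.

    A new grapheme starts at index 0 and at every codepoint whose canonical
    combining class is zero, that is, at every non-combining character.
    """
    starts = [i for i, ch in enumerate(s)
              if i == 0 or unicodedata.combining(ch) == 0]
    return starts, ["graph"] * len(starts)


# "a" + combining acute accent + "b": three codepoints, two graphemes.
locations, kinds = grapheme_starts("a\u0301b")
```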

— Procedure: ustring-word-break str [start [end]]

Returns the locations and types of allowable word breaks as a list of two lists. The first list holds the codepoint location of each word break. The second list holds one of the following symbols for each possible word break:

number
for “words” that appear to be numbers,
letter
for words that contain non-CJK letters,
kana
for words containing kana characters,
ideo
for words containing ideographic characters,
none
for “words” that do not fit in any other category.

— Procedure: ustring-line-break str [start [end]]

Returns the locations and types of allowable line breaks as a list of two lists. The first list holds the codepoint location of each line break. The second list holds one of the following symbols for each possible line break:

soft
where a line break is acceptable but not required
hard
where a line break is mandatory, usually because there is a <CR> in the string

— Procedure: ustring-sentence-break str [start [end]]

Returns the locations and types of allowable sentence breaks as a list of two lists. The first list holds the codepoint location of each sentence break. The second list holds one of the following symbols for each possible sentence break:

term
for sentences ended by punctuation
sep
for sentences ended by <CR>, <LF> or the end of input.
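Because the break procedures return two parallel lists, callers will often want to pair each location with its type. The following sketch is written in Python, not Guile, and does not call GuICU. The sample value is written out by hand purely to show the shape of the data and is not real output from ustring-line-break. The function name pair_breaks is made up for illustration.

```python
def pair_breaks(result):
    """Turn [locations, types] into a list of (location, type) tuples."""
    locations, kinds = result
    if len(locations) != len(kinds):
        raise ValueError("locations and types must have the same length")
    return list(zip(locations, kinds))


# Hypothetical return value, written by hand for illustration only.
sample = [[6, 12, 13], ["soft", "soft", "hard"]]

pairs = pair_breaks(sample)

# Select only the mandatory breaks.
mandatory = [location for location, kind in pairs if kind == "hard"]
```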

3.2.11 Console width

Since console applications are common, here are functions that describe the number of columns a given character or string would take when printed on a console, such as an xterm.

— Procedure: ustring-xterm-width str [start [end]]

Return the number of columns that ustring str would take to print on a console. If start and end are provided, it returns the console width of the substring between start and end.

While the width should be valid for common terminal emulators, your mileage may vary.

— Procedure: ustring-xterm-width-list str [start [end]]

Return a list of the column locations that the codepoints of ustring str would occupy when printed on an xterm-like console. If start and end are provided, the list covers only the substring between start and end.
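The idea behind these two procedures can be approximated with a few lines of code. The following sketch is written in Python, not Guile, and does not call GuICU. It assumes a rough rule that is common in terminal emulators: combining marks take no column, East Asian wide and fullwidth characters take two, and everything else takes one. It also assumes that a “column location” is the zero-based column where each codepoint starts. The function names are made up for illustration.

```python
import unicodedata


def column_width(ch):
    """Approximate number of console columns used by a single codepoint."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn over the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two columns
    return 1


def console_width(s):
    """Approximate number of columns needed to print s."""
    return sum(column_width(ch) for ch in s)


def console_columns(s):
    """Zero-based starting column of each codepoint of s."""
    columns = []
    column = 0
    for ch in s:
        columns.append(column)
        column += column_width(ch)
    return columns


width_of_kanji = console_width("\u65e5\u672c")   # two wide characters
columns_mixed = console_columns("a\u65e5b")      # narrow, wide, narrow
```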

3.2.12 Bidirectionalization (aka Bidirectionalisation)

Most European languages are written left to right. Arabic and Hebrew are among the languages written right to left. In both cases, Unicode strings are stored in logical order: they are not stored left-to-right in the string, but instead are stored from first letter to be output to last letter to be output.

For some windowing systems or consoles that have a preference for left-to-right text, it will be necessary to convert strings from logical order to left-to-right visual order before they are displayed.

— Procedure: ustring-bidi-visualize str level mode options write-options

Given a ustring str that represents a line of text, return a new ustring where the text is in left-to-right visual order. Hebrew and Arabic substrings will be reversed where they occur.

The level indicates the underlying directional preference for the paragraph as a whole. A level of zero will firmly set the underlying directional preference for the paragraph as left-to-right. A level of 1 will give right-to-left. A level of *ubidi-default-ltr* will try to determine the level from str, and default to left-to-right if it cannot be determined. A level of *ubidi-default-rtl* does the opposite.

The mode is one of the following constants.

*ubidi-reorder-default*
Use default behavior.
*ubidi-reorder-numbers-special*
When a word begins with digits and ends with right-to-left letters, the visualization will have the visualized right-to-left letters followed by digits. The default behavior would be to have digits followed by the visualized right-to-left letters.
*ubidi-reorder-group-numbers-with-r*
Numbers will usually be visualized right-to-left unless bookended by text that is left-to-right.

The options argument is one of the following constants.

*ubidi-option-default*
Use default behavior.
*ubidi-option-remove-controls*
If the special Unicode characters U+200E left-to-right mark and U+200F right-to-left mark were used in str to clarify the ordering of a passage, this option removes those characters in the returned string.

The write-options argument is zero or the logior of one or more of the following integer constants.

*ubidi-do-mirroring*
Replace characters with their mirror-image mappings, if they have mirror-image mappings. Primarily this is for characters such as parentheses. In the visualized text, a right parenthesis will be replaced by a left parenthesis and vice versa.
*ubidi-keep-base-combining*
When outputting right-to-left text, combining characters will still appear to the “right” of the base characters in visualized strings. This option is likely necessary in console applications.
*ubidi-output-reverse*
After all other processing is complete, reverse the codepoints in the text before output.
*ubidi-remove-bidi-controls*
If the special Unicode characters U+200E left-to-right mark and U+200F right-to-left mark were used in str to clarify the ordering of a passage, this option removes those characters in the returned string.

This option is redundant, as *ubidi-option-remove-controls* could have been set in options.

— Procedure: ustring-bidi-visualize-and-map str level mode options write-options

This is the same as ustring-bidi-visualize except that the return value is a list in which the first element is the visualized ustring, and the second element is a list that provides a logical-to-visual index map. Some values in the map may be #f if *ubidi-option-remove-controls* was set, which indicates that a codepoint in str does not exist in the returned ustring.

— Procedure: ustring-bidi-visualise
— Procedure: ustring-bidi-visualise-and-map

Same as ustring-bidi-visualize and ustring-bidi-visualize-and-map respectively.
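The relationship between logical order, visual order and the index map can be shown with a toy example. The following sketch is written in Python, not Guile, and does not call GuICU. It is not the Unicode bidirectional algorithm: it merely reverses each run of strongly right-to-left characters and leaves everything else in place, ignoring spaces, digits, punctuation, mirroring and embedding levels. The function names are made up for illustration.

```python
import unicodedata


def is_rtl(ch):
    """True for strongly right-to-left characters (Hebrew, Arabic letters)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def visualize_toy(logical):
    """Return (visual string, logical-to-visual index map) for a toy reordering."""
    visual_from = []  # visual_from[v] is the logical index shown at visual position v
    i = 0
    n = len(logical)
    while i < n:
        if is_rtl(logical[i]):
            # Find the end of this right-to-left run and emit it reversed.
            j = i
            while j < n and is_rtl(logical[j]):
                j += 1
            visual_from.extend(range(j - 1, i - 1, -1))
            i = j
        else:
            visual_from.append(i)
            i += 1
    visual = "".join(logical[k] for k in visual_from)
    logical_to_visual = [0] * n
    for v, k in enumerate(visual_from):
        logical_to_visual[k] = v
    return visual, logical_to_visual


# "ab", then Hebrew alef, bet, gimel in logical order, then "cd".
visual, index_map = visualize_toy("ab\u05d0\u05d1\u05d2cd")
```

In the result, the three Hebrew letters appear in reverse order, and the map records that logical position 2 is shown at visual position 4 and vice versa.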

3.2.13 Arabic Shaping

In Arabic languages, a letter looks different depending on its position in a word. An Arabic letter can have one of four presentations: initial, medial, final, or isolated. A Unicode string containing Arabic text will usually contain the logical letter and rely on the display to choose its presentation form. Some displays have the capability to determine the presentation form and some do not.

For systems that do not have that capability, each Arabic letter must be converted into one of its four presentation forms. This is called shaping.

Also, Arabic languages have their own characters for the digits 0 through 9. In the shaping process, European digits can be replaced with Arabic digits.
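The codepoints involved can be inspected directly. The following sketch is written in Python, not Guile, and does not call GuICU or perform real shaping. It lists the four presentation forms of one letter, shows that compatibility normalization maps each of them back to the logical letter, and replaces European digits with Arabic digits using a plain translation table. All names in it are made up for illustration.

```python
import unicodedata

BEH = "\u0628"  # U+0628 arabic letter beh, the logical (abstract) letter

# The four presentation forms of beh.
BEH_FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}

# NFKC maps every presentation form back to the logical letter. This is a
# side effect of compatibility normalization, not a general unshaping routine.
unshaped = {name: unicodedata.normalize("NFKC", form)
            for name, form in BEH_FORMS.items()}

# European digits 0-9 to Arabic digits U+0660-U+0669, and to the extended
# digits U+06F0-U+06F9 that are more common in Persian or Urdu.
EN_TO_AN = str.maketrans("0123456789",
                         "".join(chr(0x0660 + d) for d in range(10)))
EN_TO_AN_EXTENDED = str.maketrans("0123456789",
                                  "".join(chr(0x06F0 + d) for d in range(10)))

year_an = "2007".translate(EN_TO_AN)
year_an_extended = "2007".translate(EN_TO_AN_EXTENDED)
```

Digit replacement is a one-for-one substitution, so it never changes the number of codepoints in the string.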

— Procedure: ustring-shape-arabic str options

Given a ustring str containing unshaped Arabic, return a ustring with shaped Arabic.

Somewhat oddly, if zero is used as the (default) option, this procedure just returns an identical copy of str: Arabic letters are untouched and digits are unmodified.

options is a logior of the following:

*u-shape-letters-shape*
Replace abstract letters with shaped presentations. This should usually be used.
*u-shape-letters-unshape*
Replace shaped presentations with abstract letters.
*u-shape-letters-shape-tashkeel-isolated*
Replace abstract letters with shaped presentations, including tashkeel forms.
*u-shape-text-direction-logical*
Assume that the characters in the string are in logical order.
*u-shape-text-direction-visual-ltr*
Assume that the characters in the string are in visual left-to-right order.
*u-shape-digits-en2an*
Replace European numbers with Arabic numbers.
*u-shape-digits-an2en*
Replace Arabic numbers with European numbers.
*u-shape-digits-alen2an-init-lr*
Replace European digits with Arabic digits if, by context, the surrounding text is Arabic. For digits that appear without sufficient context, European digits are assumed.
*u-shape-digits-alen2an-init-al*
Replace European digits by Arabic digits if, by context, the surrounding text is Arabic. For digits that appear without sufficient context, Arabic digits are assumed.
*u-shape-digit-type-an*
When converting to Arabic digits, use standard Arabic digits.
*u-shape-digit-type-an-extended*
When converting to Arabic digits, use a digit form more common in Persian or Urdu.
*u-shape-aggregate-tashkeel*
When an Arabic shadda appears before one of dammatan, kasratan, fatha, damma or kasra, replace it with ligature forms.
*u-shape-preserve-presentation*
When shaping a string that already has some characters converted to presentation forms, do not alter the presentation forms.

Except when the option *u-shape-aggregate-tashkeel* has been chosen, the returned ustring should have the same number of codepoints as str.



Appendix A Bibliography

[Unicode Consortium 2007]

The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA: Addison-Wesley, 2007. ISBN 0-321-48091-0) (http://www.unicode.org/versions/Unicode5.0.0/).



Appendix B GNU Free Documentation License

Version 1.2, November 2002
     Copyright © 2000,2001,2002 Free Software Foundation, Inc.
     59 Temple Place, Suite 330, Boston, MA  02111-1307, USA
     
     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.
  1. PREAMBLE

    The purpose of this License is to make a manual, textbook, or other functional and useful document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

    This License is a kind of “copyleft”, which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

    We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

  2. APPLICABILITY AND DEFINITIONS

    This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The “Document”, below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as “you”. You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

    A “Modified Version” of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

    A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

    The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

    The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

    A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not “Transparent” is called “Opaque”.

    Examples of suitable formats for Transparent copies include plain ascii without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

    The “Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, “Title Page” means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

    A section “Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve the Title” of such a section when you modify the Document means that it remains a section “Entitled XYZ” according to this definition.

    The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

  3. VERBATIM COPYING

    You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

    You may also lend copies, under the same conditions stated above, and you may publicly display copies.

  4. COPYING IN QUANTITY

    If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

    If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

    If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

    It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

  5. MODIFICATIONS

    You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

    1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
    2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
    3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
    4. Preserve all the copyright notices of the Document.
    5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
    6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
    7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
    8. Include an unaltered copy of this License.
    9. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
    10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
    11. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
    12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
    13. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.
    14. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.
    15. Preserve any Warranty Disclaimers.

    If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

    You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

    You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

    The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

  6. COMBINING DOCUMENTS

    You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

    The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

    In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements.”

  7. COLLECTIONS OF DOCUMENTS

    You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

    You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

  8. AGGREGATION WITH INDEPENDENT WORKS

    A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

    If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

  9. TRANSLATION

    Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

    If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

  10. TERMINATION

    You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

  11. FUTURE REVISIONS OF THIS LICENSE

    The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

    Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

B.0.1 ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

       Copyright (C)  YEAR  YOUR NAME.
       Permission is granted to copy, distribute and/or modify this document
       under the terms of the GNU Free Documentation License, Version 1.2
       or any later version published by the Free Software Foundation;
       with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
       Texts.  A copy of the license is included in the section entitled ``GNU
       Free Documentation License''.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with...Texts.” line with this:

         with the Invariant Sections being LIST THEIR TITLES, with
         the Front-Cover Texts being LIST, and with the Back-Cover Texts
         being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.



Index


Footnotes

[1] What! A one sentence section? Apparently it is a requirement that there be an “invoking” section in every GNU manual. So here it is.

[2] Yeah, rockin' the VIC-20. 22 characters by 23 lines is all anyone should need.

[3] Wilfred Owen, Anthem for Doomed Youth

[4] I'd like to just display the strings, but, apparently, texinfo is kind of crap at non-European languages.