text-0.11.3.1: An efficient packed Unicode text type.

Portability: GHC
Stability: experimental
Maintainer: bos@serpentine.com, rtomharper@googlemail.com, duncan@haskell.org
Safe Haskell: None

Data.Text.Internal

Description

A module containing private Text internals. This exposes the Text representation and low level construction functions. Modules which extend the Text system may need to use this module.

You should not use this module unless you are determined to monkey with the internals, as the functions here do just about nothing to preserve data invariants. You have been warned!

Types

Internally, the Text type is represented as an array of Word16 UTF-16 code units. The offset and length fields in the constructor are in these units, not units of Char.
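
To illustrate why counts in code units differ from counts in Char, here is a minimal sketch that depends only on base. The helpers `encodeUtf16` and `codeUnitLength` are hypothetical names introduced for this example and are not part of the text package; the sketch assumes its input contains no surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one Char as UTF-16 code units (hypothetical helper).
-- Assumes the Char is a Unicode scalar value, not a surrogate.
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]   -- fits in a single code unit
  | otherwise   = [hi, lo]           -- needs a surrogate pair
  where
    n  = ord c
    m  = n - 0x10000
    hi = fromIntegral (0xD800 + (m `shiftR` 10))  -- leading surrogate
    lo = fromIntegral (0xDC00 + (m .&. 0x3FF))    -- trailing surrogate

-- | Number of Word16 code units needed to store a String.
codeUnitLength :: String -> Int
codeUnitLength = sum . map (length . encodeUtf16)
```

For the two-character string "a\x1D11E", `length` is 2 but `codeUnitLength` is 3, because U+1D11E lies outside the Basic Multilingual Plane and occupies two code units.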

Invariants that all functions must maintain:

  • Since the Text type uses UTF-16 internally, it cannot represent characters in the reserved surrogate code point range U+D800 to U+DFFF. To maintain this invariant, the safe function maps Char values in this range to the replacement character (U+FFFD, '�').
  • A leading (or "high") surrogate code unit (0xD800–0xDBFF) must always be followed by a trailing (or "low") surrogate code unit (0xDC00–0xDFFF). A trailing surrogate code unit must always be preceded by a leading surrogate code unit.
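
The pairing invariant can be expressed as a check over a plain list of code units. This is only a sketch for reasoning about the rule: `wellFormedUtf16`, `isLeading`, and `isTrailing` are hypothetical names, and the check works on `[Word16]` rather than on the package's own array type.

```haskell
import Data.Word (Word16)

-- | Classify a code unit as a leading or trailing surrogate.
isLeading, isTrailing :: Word16 -> Bool
isLeading  w = w >= 0xD800 && w <= 0xDBFF
isTrailing w = w >= 0xDC00 && w <= 0xDFFF

-- | True when every leading surrogate is immediately followed by a
-- trailing surrogate and no trailing surrogate appears on its own.
wellFormedUtf16 :: [Word16] -> Bool
wellFormedUtf16 [] = True
wellFormedUtf16 (w : ws)
  | isLeading w = case ws of
      (t : rest) | isTrailing t -> wellFormedUtf16 rest
      _                         -> False   -- unpaired leading surrogate
  | isTrailing w = False                   -- unpaired trailing surrogate
  | otherwise    = wellFormedUtf16 ws
```

A sequence that ends in a leading surrogate, or that starts with a trailing one, fails the check.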

data Text

A space efficient, packed, unboxed Unicode text type.

Constructors

Text !Array !Int !Int 

Instances

Eq Text 
Data Text 
Ord Text 
Read Text 
Show Text 
Typeable Text 
IsString Text 
NFData Text 
Monoid Text 

Construction

text :: Array -> Int -> Int -> Text

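Smart constructor. Builds a Text from a backing array, an offset, and a length; the offset and length are measured in Word16 code units.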

textP :: Array -> Int -> Int -> Text

Construct a Text without invisibly pinning its byte array in memory if its length has dwindled to zero.

Safety

safe :: Char -> Char

Map a Char to a Text-safe value.

UTF-16 surrogate code points are not included in the set of Unicode scalar values, but are unfortunately admitted as valid Char values by Haskell. They cannot be represented in a Text. This function remaps those code points to the Unicode replacement character (U+FFFD, '�'), and leaves other code points unchanged.
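
The remapping described above can be sketched with base alone. The name `safeSketch` is hypothetical and is used to make clear that this is an illustration of the documented behaviour, not necessarily how the library defines `safe`.

```haskell
import Data.Char (ord)

-- | Replace surrogate code points (U+D800 to U+DFFF) with U+FFFD and
-- leave every other Char unchanged. Illustrative only.
safeSketch :: Char -> Char
safeSketch c
  | n >= 0xD800 && n <= 0xDFFF = '\xFFFD'
  | otherwise                  = c
  where
    n = ord c
```

Under this sketch, `map safeSketch "a\xD800z"` yields a three-character string whose middle character is the replacement character.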

Code that must be here for accessibility

empty :: Text

O(1) The empty Text.

Utilities

firstf :: (a -> c) -> Maybe (a, b) -> Maybe (c, b)

Apply a function to the first element of an optional pair.
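
A function with this signature and description could plausibly be written as follows. The name `firstfSketch` is hypothetical; the sketch shows one definition consistent with the type, not a claim about the library's source.

```haskell
-- | Apply a function to the first component of an optional pair,
-- leaving the second component and the Nothing case untouched.
firstfSketch :: (a -> c) -> Maybe (a, b) -> Maybe (c, b)
firstfSketch f (Just (a, b)) = Just (f a, b)
firstfSketch _ Nothing       = Nothing
```

A helper of this shape is convenient with uncons-style results of type Maybe (a, b), where only the head needs transforming.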

Debugging

showText :: Text -> String

A useful show-like function for debugging purposes.