A Brand New Spec, S15

So, for the past few days I’ve been working on a provisional S15 mostly for fun. I was considering TimToady’s long-ago suggestion of developing a libicu replacement tuned to Perl 6’s needs, and after learning some interesting things about NFG, I finally got around to writing an S15.

After those few days, S15 has become “good enough” for inclusion in the specs repository. Now anyone with commit access to the specs repository can improve it, as can anyone who forks the repo :) .

See it here.

The contents of S15 are far from finished. There’s a lot of stuff that still needs working out, such as what the Stringy and Unicodey roles should do, whether Uni is a rope of multiple Normalization Forms or just a simple string containing that mixture, and how the string operators should behave now. For instance,

Str ~ Str

Concatenates two strings and results in a Str. But what happens when you try

Uni ~ NFC

or any of the other multitude of combinations of string types?

What’s Next?

There are three things I see that I could do at this time:

  1. Write and fudge a bunch of S15 tests. This seems to me to be the most important thing, as it allows us to see how coding with these new things feels before they ever begin to work.
  2. Copy a bunch of S15 information to the rest of the spec. This involves at least, off the top of my head, S05, S32::Str(ing), and S02. Undoubtedly more.
  3. Start migrating the other specs to Pod6. The S15 I placed in the repository makes it the second Pod6-written document in the specs repository. I should think that now’s a good time to migrate the rest of the specs, and modify/replace the relevant scripts in the mu repository to handle Pod6. All this work would of course happen in branches.

The list is in about the order I plan on doing these things, assuming others don’t work on these things first :) .

So please, read our not-yet-stellar provisional draft S15, and get ready for the Unicode Future™.


2 Responses to A Brand New Spec, S15

  1. Denormalized Unicode strings are still valid Unicode strings, but S15 states that they are “considered an error in mismatching types.”

    UAX #15 has some details about concatenation of normalized strings in section 1.4 (http://www.unicode.org/reports/tr15/#Concatenation): “In using normalization functions, it is important to realize that none of the Normalization Forms are closed under string concatenation. That is, even if two strings X and Y are normalized, their string concatenation X+Y is not guaranteed to be normalized. This even happens in NFD, because accents are canonically ordered, and may rearrange around the point where the strings are joined.”

    Fortunately section 9.1 (http://www.unicode.org/reports/tr15/#Stable_Code_Points) provides info on optimization of concatenating normalized strings while always producing a normalized result. This is only relevant though for the same normalization form.

    Anyway, thanks for working on this! I plan on helping out.

    • lueinc says:

      Thanks, I’ll be sure to look into that stuff. (I’ll likely read the entire Standard anyway at some point :P).

      I suppose I wasn’t clear enough; I was referring to not knowing which string type to use in the case of concatenation. Though perhaps on second thought it should just return a Uni there, as Uni is the type where you can throw in a bunch of differently NF’d sequences.
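
To make the concatenation point from the comments concrete, here is a minimal sketch in Python (not Perl 6), using only the standard unicodedata module. The helper names is_in_form and concat_normalized are made up for this illustration and are not part of S15 or any proposed API. Renormalizing the whole joined string is the naive approach, not the optimization UAX #15 describes.

```python
import unicodedata


def is_in_form(form, s):
    """Return True if s is already in the given normalization form."""
    return unicodedata.normalize(form, s) == s


def concat_normalized(form, x, y):
    """Hypothetical helper: join two strings, then renormalize the result.

    This is the naive approach; it rescans the whole joined string
    instead of only the region around the join point.
    """
    return unicodedata.normalize(form, x + y)


# NFC: "e" and a lone U+0301 COMBINING ACUTE ACCENT are each NFC,
# but joined they compose into the single code point U+00E9.
x, y = "e", "\u0301"
print(is_in_form("NFC", x), is_in_form("NFC", y), is_in_form("NFC", x + y))
print([hex(ord(c)) for c in concat_normalized("NFC", x, y)])

# NFD: U+0301 (combining class 230) followed by U+0323 COMBINING DOT BELOW
# (combining class 220) is out of canonical order, so the marks get
# rearranged around the join point.
x, y = "a\u0301", "\u0323"
print(is_in_form("NFD", x), is_in_form("NFD", y), is_in_form("NFD", x + y))
print([hex(ord(c)) for c in concat_normalized("NFD", x, y)])
```

Whatever type the ~ operator ends up returning, something like the renormalization step above (or the stable-code-point optimization mentioned in the comment) would presumably be needed whenever the result is supposed to be in a single normalization form.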
