Buffers Aren’t Strings

So, the issue of this post has recently come up once again, because this:

my $buffer = Buf.new(65, 66, 67);
my $string = "ABC";
say $buffer eq $string;

infinitely recurses in Rakudo. Why? It’s because both Buf and Str do Stringy, and when eq is given disparate types, it calls .Stringy on both of them. That returns a Buf for the Buf and a Str for the Str, so the operands are exactly as disparate as before, and eq goes around again.

Str.Stringy being a Str is normal and expected, but Buf.Stringy is the problem. If Buf didn’t do Stringy, it would be converted into a Stringy object that isn’t itself (like 4.Stringy, which is why "4" eq 4 works).
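The coercion that makes "4" eq 4 work can be checked directly. This is a minimal sketch, assuming a reasonably current Rakudo; it deliberately leaves out the Buf case, whose behaviour depends on the implementation.

```raku
# An Int is not Stringy itself, so .Stringy hands back a different type.
say 4.Stringy.^name;   # Str

# eq therefore ends up comparing two Strs.
say "4" eq 4;          # True
```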

This is indicative of what I think is a huge problem in Perl 6: Bufs should not be considered Stringy at all. Since the last time this discussion came up didn’t go so well, I thought I’d put up a blog post on my thoughts, to avert the problems with trying to convey the same information on IRC.

So, Perl 6 regards strings as a high-level sequence of characters. Unlike other programming languages, you’re not required to pay attention to how strings are actually stored, or encoded, to manipulate them as you would expect. Strings in Perl 6 don’t know their storage at all, so if you do in fact need to manipulate the bytes making up a string’s storage, you have to .encode the string to a buffer, and .decode that buffer when you want a string again.
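Here is a small sketch of that round trip, assuming a current Rakudo and UTF-8 as the encoding. The variable names are made up for illustration.

```raku
my $string = "déjà vu";                  # seven characters
my $buffer = $string.encode("utf-8");    # the same text as bytes

say $string.chars;    # counts characters
say $buffer.elems;    # counts bytes; larger here, since é and à take two bytes each in UTF-8

# Decoding with the same encoding gives the original string back.
my $back = $buffer.decode("utf-8");
say $back eq $string; # True
```

The string never says how it is stored; the encoding only shows up at the point where you ask for bytes.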

Now, I can’t say for sure why Buf does Stringy in the first place; it’s the only thing in Perl 6 I know of where the implicit definition of the word “string” is much more general than the text-based definition we’re familiar with. What I can say though, is what I find wrong with this:

Textual data is only a subset of what buffers can handle. Buffers in Perl 6 are used to handle binary data, for example reading a binary file. This is the kind of thing buffers are designed to handle. Some of that data could be text, but that’s not all it could be. So why inherit a role that handles only some of the data you receive? Rats don’t inherit Int to handle numbers whose denominators are 1, after all.

Important to note here is that while text data is a proper subset of binary data, the Stringy role that deals with text data isn’t similarly related to the Buf role that deals with binary data. There may be some overlap, but neither fits inside the other. This brings us to the larger issue…

Strings and buffers aren’t the same. The match method for strings doesn’t make too much sense for buffers. Going the other direction, the bitwise AND operator for buffers makes no sense for strings, which don’t know their bit patterns in the first place.

However, because buffers and strings are currently linked as they are, both need to support (in some fashion) operations that are truly only meant for the other. This is, I think, the biggest and most substantive problem. Buffers and strings aren’t similar. There is no good way to relate strings and buffers without getting a clunky mess.

The best evidence for this is S03’s coverage of the buffer bitwise operators. Except for the shift operators[1], every single one mentions coercion of string types to some buffer type, and then says coercion probably indicates a design error. The design error is trying to say that buffers are string-like.

These issues can be fixed by simply not saying Buf does Stringy. The Stringy role is the basis for all high-level string types; the Buf role is the basis for all low-level buffer types. They do separate things, and have separate purposes. Creating this link between them serves no purpose other than to cause possible design errors and issues with infinite recursion.

This leads to a particular problem though: those bitwise ops. The ~ character signifies string-like stuff in Perl 6, which (as I’ve established) buffers aren’t. Which necessitates a new symbol. Problem is, looking at my ordinary keyboard, the only ASCII symbol that doesn’t mean something somewhere in Perl 6 yet is the backtick. Sadly, I don’t think many people will enjoy `+ and `> for their bitwise ops 🙂 , so we’ll need to go past ASCII, and come up with a Texas variant too. Some possible ideas I’ve come across so far:

€& €| €^ €> €<    (E&) (E|) (E^) (E>) (E<)

Flimsily based on the theme set by $ and ¢ --- $calar, ¢apture, and €xposed
binary data, of course. (parens in the Texas version like set ops', to avoid
thinking E is a metaop)

⅋& ⅋| ⅋^ ⅋> ⅋<    (&&) (&|) (&^) (&>) (&<)

Flimsily based on the fact that ⅋ looks cool. (parens in Texas version to
prevent conflict with &&)

⋈& ⋈| ⋈^ ⋈> ⋈<    ><& ><| ><^ ><> ><<

Bowties are cool.

Additionally, there’s the question of what kinds of methods and operators on Buf we should see, to which the answer is simple: array-like things, rather than string-like things. Bufs should be seen as a kind of list, really. (This means postcircumfix:<[]> instead of .subbuf, .push instead of infix:<~>, etc.)
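To give a feel for the list-like view, here is a rough sketch assuming a current Rakudo, where Buf already offers positional access and .push. It is only an illustration of the style of interface argued for above, not a proposal for exact method names.

```raku
my $buf = Buf.new(65, 66, 67);

say $buf[0];       # 65, indexing like an array
say $buf.elems;    # 3

$buf[0] = 97;      # elements of a Buf can be assigned to
$buf.push(68);     # grow the buffer the way you would grow a list
say $buf.list;     # (97 66 67 68)

# The string-flavoured equivalent of a slice, for comparison.
say $buf.subbuf(1, 2);
```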

Finally, just to clear up this potential point: utf8, utf16, and all the other Unicode encoding scheme[2] Blobs shouldn’t do Unicodey. This is because the point of those blob types, to enforce an encoding scheme, isn’t handled by Unicodey (a high-level string-like role), and the only other stuff Unicodey offers is for string-based stuff, not buffer-based stuff.[3]

I realize that this isn’t the end of the discussion (we’ve got a buffer symbol to decide, after all :P). However, I don’t think I’ll ever be convinced that Buf does Stringy is right; they are just too distinct for this association to be useful, and they are distinct enough for this association to be harmful.[4]

I think separating the two would lead to better things, for Buf especially. For instance, I have my suspicions that Perl 6’s version of pack and unpack will be heavily centered on Buf.[5] 🙂

Perl 6 roles are usually adjectives, not nouns. Shouldn’t Buf be Buffy then?

[1] The buffer bitwise shift operators have no descriptions in S03 in the first place, and in any case are implied to handle strings much like the other buffer bitwise ops.
[2] Yes, scheme, not form. I’d like to see utf16le, utf16be, utf32le, and utf32be be added as types. My experience writing S15 tells me that specifying endianness with :be and :le adverbs is a poor reimplementation of the type system 🙂 . utf16 and utf32 would be kept as BOM-using (but not -requiring) variants, as they are encoding schemes too.
[3] The fact that the utf buffers are guaranteed to be holding textual data would suggest it’s ok for them to do Unicodey, and thus also Stringy. However, if you need to be doing string-like operations on your data, might I suggest our lovely collection of Unicode string types 😀 !
[4] If you think otherwise, shouldn’t Array do Stringy too? Practically the same thing 🙂 .
[5] Especially when you consider that the only functionality of pack/unpack Perl 6 is lacking is the ability to easily interchange complex data with low-level APIs, and thus it’s the only thing pack/unpack in Perl 6 needs to do.

This entry was posted in Think Tank.
