Thoughts on starting NQP on LLVM

Lately I’ve been getting into certain project ideas (which I’ll specify later if they ever become anything) that would require a GUI of some kind. Although I’ve had some success learning SFML, I’m not much of a fan of C and C++. Not because they’re bad languages, but because I feel I’m spending more time making sure I’ve specified types and type casts correctly, and things like that, than actually coding [1].

Also, my last C++ program segfaults right at a for loop declaration, for no discernible reason.

So, I decided to take a look at Zavolaj, which allows you to use external libraries, such as GUI toolkits, in Perl 6. I’m feeling excited about using Zavolaj and Perl 6 instead of C++ now. There’s just one roadblock left for me.

Parrot. Rakudo on Parrot is slooow, so slow that the last version of Rakudo I successfully compiled locally is 2011.09-46-g1c2c2d4, the culprit being the compilation of CORE.setting. I’ve barely tried since then (only once or twice more), because my machine apparently doesn’t have enough memory to compile it.

Now I could just continue using C++ and live with it, but that won’t be very enjoyable, at least for a while [2]. So I wish to finally begin porting NQP to LLVM, so that I can ultimately have Rakudo run natively.

How to Port NQP

I see two ways of going about this task. The first is rewriting NQP, that is, starting from scratch and replacing all the PIR-generating parts of the current NQP with LLVM IR-generating parts.

Rewriting would seem to me to be the first choice, because it avoids the potential complexity of modifying parts of the existing code (bear in mind I haven’t examined NQP’s code deeply enough to know if this is more complex than a rewrite). A rewrite would also allow the option of not writing any of NQP in NQP, as long as the self-compiling part of the original NQP isn’t essential to its operation [3].

The one problem is that the features of the NQP language aren’t well-documented. Except for a very helpful wikibooks page, I haven’t found much in the way of such documentation. That would mean inspecting the original code and asking many, many questions on #perl6 to get all the information needed.

Option two is rewriting the PIR-generating parts of NQP to generate LLVM IR instead. Hopefully this code resides in the .nqp files of NQP 🙂 . After that, we have two choices. The first is that we could compile this LLVM IR-generating NQP to Parrot and update the bootstrap so we get .ll and .bc files instead of .pir and .pbc files, respectively. The other choice is to manually rewrite the .pir and .pbc files and then have a completely LLVM-based NQP compiler.

This option could be either as simple as “just modify the nqp files in this directory and you’ll be targeting a new backend!” or as complex as “every single file needs modification. Good luck!”.
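To make the second option a bit more concrete, here is a minimal sketch of what "generate LLVM IR instead of PIR" means at the smallest scale. It is written in Python rather than NQP, and everything in it (the `Num`, `Var` and `Add` node classes, `LLVMEmitter`, `emit_function`) is a hypothetical illustration, not NQP's actual code generator. It only handles 64-bit integer addition.

```python
from dataclasses import dataclass


# Hypothetical mini-AST; NQP's real tree (PAST) is far richer than this.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


class LLVMEmitter:
    """Walks an expression tree and collects LLVM IR instructions as text."""

    def __init__(self):
        self.lines = []
        self.counter = 0

    def fresh(self):
        # LLVM IR is in SSA form, so every result gets a new temporary name.
        name = f"%t{self.counter}"
        self.counter += 1
        return name

    def emit_expr(self, node):
        if isinstance(node, Num):
            return str(node.value)
        if isinstance(node, Var):
            return f"%{node.name}"
        if isinstance(node, Add):
            lhs = self.emit_expr(node.left)
            rhs = self.emit_expr(node.right)
            tmp = self.fresh()
            self.lines.append(f"  {tmp} = add i64 {lhs}, {rhs}")
            return tmp
        raise TypeError(f"unsupported node: {node!r}")


def emit_function(name, params, body):
    """Return the text of an LLVM IR function taking and returning i64."""
    emitter = LLVMEmitter()
    result = emitter.emit_expr(body)
    args = ", ".join(f"i64 %{p}" for p in params)
    header = f"define i64 @{name}({args}) {{"
    return "\n".join([header, "entry:", *emitter.lines, f"  ret i64 {result}", "}"])


if __name__ == "__main__":
    # The printed text could be saved as a .ll file.
    print(emit_function("add_one", ["a"], Add(Var("a"), Num(1))))
```

The point is only that the backend is a tree walk that prints a different instruction set. Whether NQP's PIR generation is isolated enough to be swapped out this cleanly is exactly the open question above.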

I hope to get started on this soon, so that I can use Perl 6 all the time 🙂 .

1. Of course, I haven’t been coding C++ for long, so this could be just me being a beginner.
2. My general view on C++ is that every time I use C++, I appreciate Perl 6 more.
3. Maybe it’s a bootstrapped compiler because PIR isn’t very enjoyable 🙂 . I’ll likely find out the reason very soon.

This entry was posted in Think Tank.

2 Responses to Thoughts on starting NQP on LLVM

  1. This is the appropriate weblog for anybody who desires to find out about this topic. You understand so much that it’s almost exhausting to argue with you (not that I actually would need…HaHa). You undoubtedly put a new spin on a subject that’s been written about for years. Great stuff, simply great!

  2. Daniel Ruoso says:

    Ok, so I have been thinking about this for a long time already. So I’ll just serialize everything that is in my brain right now. I hope it is useful 🙂

    The first two things I have to say are actually two lessons I learned when working on the SMOP experiment, and I still hold them to be true for any low-level implementation (as in, one not using a virtual machine with high-level stack management).

    1) Continuation-Passing-Style: This is absolutely required due to all the possible diversions that the code execution may take (as in, phasers, control, catch and others). This is possible in LLVM using the “fastcc” calling convention, which allows tail calls to be optimized.

    2) Representation Polymorphism: It is very important to understand that, with the exception of the native types, every Perl 6 type is the equivalent of a C++ template, in that it only becomes concrete when you choose a representation. There is an important twist: you can actually use types realized in different representations interchangeably, which means that you should be able to step back and use the Meta-OO APIs to dispatch the call.
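As a rough illustration of the first lesson, here is a small continuation-passing-style sketch in Python. The names (`trampoline`, `safe_div_cps`, `average_cps`, `run`) are made up for this example and have nothing to do with SMOP's or NQP's real code. Each function receives a "return" continuation and an "exception" continuation instead of returning normally, and a trampoline stands in for the tail calls that a low-level backend would rely on.

```python
def trampoline(thunk):
    """Keep calling thunks until a non-callable final value comes back."""
    while callable(thunk):
        thunk = thunk()
    return thunk


def safe_div_cps(a, b, on_return, on_exception):
    # Instead of returning or raising, pick which continuation runs next.
    if b == 0:
        return lambda: on_exception("division by zero")
    return lambda: on_return(a / b)


def average_cps(total, count, on_return, on_exception):
    # A tail call in CPS: hand both continuations straight to the callee.
    return lambda: safe_div_cps(total, count, on_return, on_exception)


def run(total, count):
    # The final continuations produce plain tuples, which stop the trampoline.
    return trampoline(
        lambda: average_cps(
            total,
            count,
            on_return=lambda value: ("ok", value),
            on_exception=lambda message: ("caught", message),
        )
    )


if __name__ == "__main__":
    print(run(10, 2))
    print(run(1, 0))
```

Because the exception path is just another continuation, diverting control (for a CATCH block, a phaser, and so on) needs no stack unwinding. The C stack never grows across calls, which is why the function's lexpad has to live somewhere else.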

    Now… I have mostly been thinking about generating C code to compile with LLVM (because of the fastcc calling convention and CPS). The way this would work is that we would generate, for each piece of NQP code, different ABI levels, from a truly C-level call to a completely abstracted invocation, and the compiler would be able to decide to which level it could go (depending on the actual types available in the lexical scope). The different levels would be:

    level 0) the function is exposed as a C symbol with the invocation completely resolved at compile time. You can only receive native types as arguments on the stack and return native types on the stack, without support for exceptions of any kind, and you can only call other functions of the same level. At this level you just use the C stack as usual.

    level 1) the function is exposed as a C symbol with the invocation completely resolved at compile time, and you receive non-native types of known representations (e.g.: p6object Int) and return native or non-native types of known representations by using a low-level calling convention (i.e.: no capture object). This already requires CPS (with full support for exceptions), so each function is potentially split into several sections as you call other functions, and the function lexpad becomes a garbage-collected item. A function in this level can already call functions in any other level.

    level 2) the function is exposed as a C symbol with the invocation completely resolved at compile time, and you receive non-native types of unknown representations (e.g.: an Int, but it could be p6object or p5SV) by using a low-level calling convention (i.e.: no capture object). This already requires your generated code to perform calls to those objects in level 3, but this function is still on level 2.

    level 3) the function cannot be completely resolved at compile time to a C symbol, so you need to perform the entire high-level dispatch by first retrieving the function from the object (using .^can) and then calling the function (using the equivalent of .postcircumfix:), but you only use the known representation for the Capture object.

    level 4) same as level 3, but the Capture object may have an unknown representation.
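One way to read the five levels is as a decision the compiler makes per call site from what it knows statically. The Python sketch below is only an interpretation of the list above: `Kind` and `choose_level` are hypothetical names, and it looks at argument types only, ignoring return types and the "same level only" restriction on level 0 callers.

```python
from enum import Enum


class Kind(Enum):
    """What the compiler knows about one argument at compile time (hypothetical)."""

    NATIVE = 0        # e.g. a native int passed on the C stack
    KNOWN_REPR = 1    # non-native type whose representation is known
    UNKNOWN_REPR = 2  # non-native type whose representation is not known


def choose_level(arg_kinds, resolved_at_compile_time, capture_repr_known=True):
    """Pick the lowest ABI level that is still safe for a call site."""
    if not resolved_at_compile_time:
        # Full high-level dispatch; only the Capture's representation matters.
        return 3 if capture_repr_known else 4
    if any(kind is Kind.UNKNOWN_REPR for kind in arg_kinds):
        return 2
    if any(kind is Kind.KNOWN_REPR for kind in arg_kinds):
        return 1
    return 0


if __name__ == "__main__":
    print(choose_level([Kind.NATIVE, Kind.NATIVE], True))
    print(choose_level([Kind.KNOWN_REPR, Kind.UNKNOWN_REPR], True))
    print(choose_level([Kind.NATIVE], False, capture_repr_known=False))
```

The ordering of the checks encodes the idea that a single argument of unknown representation is enough to push the whole call up a level.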

    The exposure of C symbols can use some sort of name-mangling the same way C++ does so that it encodes the presumed information in each level.
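A name-mangling scheme along those lines could look something like the sketch below. The format is invented for illustration (a `p6L<level>` prefix followed by length-prefixed parts, loosely in the spirit of C++ mangling), and it assumes the function and type names are already valid C identifier characters.

```python
def mangle(name, level, arg_types):
    """Encode the ABI level and the presumed argument types into a C symbol.

    Hypothetical format: p6L<level>_<len><name>_<len><type>_...
    """
    parts = [f"p6L{level}", f"{len(name)}{name}"]
    for type_name in arg_types:
        # Length prefixes keep the encoding unambiguous when names contain "_".
        parts.append(f"{len(type_name)}{type_name}")
    return "_".join(parts)


if __name__ == "__main__":
    print(mangle("add", 0, ["int", "int"]))
    print(mangle("add", 1, ["p6object_Int", "p6object_Int"]))
```

With something like this, the same NQP routine could be exported several times, once per level it was compiled for, and a caller would link against whichever symbol matches what it knows.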

    As I said, I have been thinking about this for a while. I hope it is useful… 🙂
