Building an Antlr4 Language Target

As mentioned in the last post, I need to create an Antlr4 language target for Janet (the language Tuplet will compile to). There are 4 minor problems.

  1. I've never created an Antlr language target.
  2. There are no existing Antlr language targets for a Lisp to learn or copy from.
  3. I haven't touched Java since like 2009 (13 years).
  4. I've never programmed anything more sophisticated than a "hello world" in Janet.

So that's "fun".

Step 1

I need to create a template for a Janet module. So, I need to learn how Janet handles modules.

Sidetrack 1: function names

This, of course, leads me to think about functions, which … long story short, I just added the ability to have some funkier characters in function names:

fragment FUNCTION_ID_CONTINUE
 : ID_START
 | [/><=-]
 | DIGIT
 ;

Now you can have an x->string: function, an x==y: function, or an x2d->3d: function. I'm still not allowing a function name to start with anything other than a letter. Sadly, that's still only ASCII letters, because I haven't figured out the incantation for "any letter of any language" that doesn't also include underscore, which regexp seems to think is a "word character".
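For what it's worth, some regex engines do have a spelling for "a word character that isn't a digit or an underscore". Here's a rough Python sketch of the naming rule, assuming ID_START is just an ASCII letter (that fragment isn't shown above) and that a function name always ends in a colon. FUNCTION_NAME, UNICODE_LETTER, is_function_name and is_letter are names made up for this example, not part of the Tuplet grammar.

```python
import re

# Assumption: ID_START is an ASCII letter; the real fragment isn't shown in the post.
ID_START = r"[A-Za-z]"
# Mirrors FUNCTION_ID_CONTINUE: ID_START | [/><=-] | DIGIT
ID_CONTINUE = r"[A-Za-z0-9/><=-]"
# Assumption: a function name is a start letter, any continue characters, then a colon.
FUNCTION_NAME = re.compile(rf"{ID_START}{ID_CONTINUE}*:")

# [^\W\d_] means "a word character that is neither a decimal digit nor an underscore".
# For str patterns in Python 3 that is approximately "any letter of any language";
# treat it as an approximation rather than an exact letter class.
UNICODE_LETTER = re.compile(r"[^\W\d_]")


def is_function_name(text: str) -> bool:
    """True if text looks like a function name under the assumed rule."""
    return FUNCTION_NAME.fullmatch(text) is not None


def is_letter(ch: str) -> bool:
    """True if ch is a single word character that is not a digit or underscore."""
    return UNICODE_LETTER.fullmatch(ch) is not None
```

With this, is_function_name accepts "x->string:", "x==y:" and "x2d->3d:" and rejects "2x:", while is_letter accepts "é" and rejects "_". On the grammar side, Antlr 4.7 and later also accept Unicode property escapes such as [\p{L}] in lexer character sets, which may be the more direct route.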

Sidetrack 2: private functions

Janet has a concept of private functions that aren't exported with the module. Also private constants.

I have wanted to implement this but haven't gotten around to it. Now it's relevant. In Janet, a private function looks like (defn- ...) while a public function is (defn ...). Constants are similar: (def- ...) is a private constant and (def ...) is a public one. I say constants because they're not mutable. Variables, oddly, take a second parameter instead: (var *private-var* :private)

# public constant
(def my-public-constant "docs" :abc)
# private constant
(def- my-private-constant "docs" :abc)

# public function
(defn my-function "docs" ... )
# private function
(defn- my-private-function "docs" ... )

# public variable
(var *api-var* "docs" nil)
# private variable
(var *private-var* :private "docs" 123)
# ^^^ :private is plain metadata here; def- and defn- are shorthand for the same thing
# Janet's core does define var- too, so this could be (var- *private-var* "docs" 123)
# and yes, :private works with def as well

It seems easier to just handle all the forms of function definition up front rather than going back and handling it later, so, now I need to decide how to handle privacy in my lang. Are there just 2 levels: Private and Public? Or maybe 3 like Java: Private, Protected, and Public. I think I can get pretty far with just 2. Realistically I haven't encountered any practical use of Ruby's protected in my day job in years. I seem to recall it being useful in Java ~20 years ago…

I know I said handle it all now, but "protected" is different. It opens up a lot of questions and complexity. So, we'll deal with that as we need it. If we need it.

So how to handle private? My initial instinct was to emulate Ruby and add a private keyword, so that everything after it in the file is private. But that leads to a weird separation between a public method and the private internal ones it relies on. Also, there's no way to visually distinguish a public method name from a private one, and that violates two of the guiding principles.

So, new rule. Starting a function name with p- designates it as a private function. Thus, when you see p-multiply: you know that you're looking at a private function that, presumably, multiplies things. @P_THINGY would be a private global variable that can't be seen from outside the module. Non-global variables are always scoped to the function or namespace, and should never be directly accessible from outside.

This doesn't involve any grammar changes, because p-foo: is already a supported function name. It would involve parser changes, but I haven't written the parser yet because I haven't written the language target, which is what I'm ostensibly supposed to be doing right now.
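Just to pin the rule down, here's a tiny Python sketch of how the naming convention could pick the Janet defining form. It's an illustration only, not the parser: is_private and janet_defn_head are hypothetical helpers, and keeping the p- prefix in the emitted Janet symbol is an assumption.

```python
PRIVATE_FUNCTION_PREFIX = "p-"   # p-multiply: is a private function
PRIVATE_GLOBAL_PREFIX = "@P_"    # @P_THINGY is a private global variable


def is_private(name: str) -> bool:
    """True for names that follow the private naming convention."""
    return name.startswith((PRIVATE_FUNCTION_PREFIX, PRIVATE_GLOBAL_PREFIX))


def janet_defn_head(tuplet_name: str) -> str:
    """Opening of the Janet definition for a Tuplet function name.

    Assumption: the Janet symbol keeps the p- prefix and only drops
    Tuplet's trailing colon.
    """
    symbol = tuplet_name.rstrip(":")
    form = "defn-" if is_private(tuplet_name) else "defn"
    return f"({form} {symbol}"
```

So janet_defn_head("p-multiply:") gives "(defn- p-multiply" and janet_defn_head("multiply:") gives "(defn multiply". A name like print: stays public, because the prefix is p- and not just p.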

Something something modules

By default, modules correspond one-to-one with source files in Janet, although you may override this and structure modules however you like. - Janet docs

Sidetrack 3: case statements and if statements

For if statements the plan was to just go with standard Scheme for now:

(if test true_response false_response)

So, for example

if: true
    println: "twas true"
    println: "twas false"

Simple, right?

Then I realized I have the potential for variables of multiple types, which meant my contract grammar needed to support that. I thought I'd handled that case but went to check. Nope. Turns out I only handled it in a former incarnation of contracts, which were just glorified type annotations.

So, I went to implement the grammar for that, and in the process decided to write an example implementation of the contract: function. Well, that involved the need for a case: function, which I hadn't tried implementing, and that led to the following if statement:

var: a_list [1 2 3]
if: >: size: a_list
		2
	println: "big list"
	println: "small list"
	return: true

That should convert to

(if (> (size a_list) 2)
	(println "big list")
	(println "small list")
	true)

but the Tuplet version completely violates the guideline of removing ambiguity.

This isn't an option

if: >: size: a_list 2

because 2 becomes an argument to size:, not >:

This works, but it's gross.

if:
    >: size: a_list
        2

This is clearer, but still gross and annoying to type:

if:
     >:
         size: a_list
         2

Really, I need a grouping operator that isn't just another array.

Parentheses would be the most obvious but I want to reserve those for raw lisp. I'm thinking there are 2 obvious options:

  • grouping with angle brackets < ... >. This works because calls to the less-than and greater-than functions always have a trailing colon (<: and >:), so you know it's a function call.
  • grouping with parentheses, which means doing something else for raw lisp…

    • ~(raw lisp) or l(raw lisp) or lisp(raw lisp)

I'm leaning towards angle brackets because people tend to be familiar with their use in HTML as a grouping, and it leaves the lisp completely unadulterated.

We can also say that the first thing in a grouping is going to be a function that gets called with the remaining arguments, just like lisp.

So, that gives us:

if: >: <size: a_list> 2

but also this unfortunate case

if: <>: <size: a_list> 2>

I think that would actually work, because <>: won't be interpreted as a function: functions have to start with a letter unless they're one of the "special functions". And I don't like the idea of using <>: as a replacement for !=:, because you should say what you mean, not use some esoteric shorthand for "is greater than or less than". So, there is no special case for <>:.

So, it would work, but it'd be ugly. That'd be the developer's choice, though, not a language requirement. There's no stopping people from writing ugly code.
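To convince myself the grouping rule hangs together, here's a rough Python sketch of it for a single line of Tuplet. It is not the Antlr grammar, just a toy. It assumes a function call greedily takes every argument up to the end of the line or a closing >, and that the "special functions" are exactly <:, >:, <=: and >=: (that set is a guess). COMPARISONS, tokenize, parse, to_lisp and convert are all names invented for this sketch.

```python
# Assumed set of "special functions" that may start with < or >.
COMPARISONS = {"<:", ">:", "<=:", ">=:"}


def tokenize(line: str) -> list[str]:
    """Split on whitespace, peeling group openers/closers off each chunk."""
    tokens = []
    for chunk in line.split():
        closers = 0
        # Leading "<" opens a group, unless the chunk is itself a comparison call.
        while chunk not in COMPARISONS and chunk.startswith("<"):
            tokens.append("<")
            chunk = chunk[1:]
        # Trailing ">" closes a group, with the same exception.
        while chunk not in COMPARISONS and chunk.endswith(">"):
            closers += 1
            chunk = chunk[:-1]
        if chunk:
            tokens.append(chunk)
        tokens.extend(">" * closers)
    return tokens


def parse(tokens: list[str], pos: int = 0):
    """Parse one expression starting at pos; return (tree, next_pos)."""
    tok = tokens[pos]
    if tok == "<":
        tree, pos = parse(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ">":
            raise SyntaxError("unclosed group")
        return tree, pos + 1
    if tok.endswith(":"):
        call = [tok[:-1]]  # drop the trailing colon
        pos += 1
        # Greedy: take arguments until end of line or a group closer.
        while pos < len(tokens) and tokens[pos] != ">":
            arg, pos = parse(tokens, pos)
            call.append(arg)
        return call, pos
    return tok, pos + 1  # plain atom


def to_lisp(tree) -> str:
    """Render nested lists as an s-expression."""
    if isinstance(tree, list):
        return "(" + " ".join(to_lisp(t) for t in tree) + ")"
    return tree


def convert(line: str) -> str:
    tree, _ = parse(tokenize(line))
    return to_lisp(tree)
```

Under those assumptions, convert("if: >: <size: a_list> 2") and convert("if: <>: <size: a_list> 2>") both give (if (> (size a_list) 2)), while the ungrouped convert("if: >: size: a_list 2") gives (if (> (size a_list 2))), which is exactly the problem the grouping is meant to solve.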

So, time to add angle brackets to the grammar….

In the process of doing that I discovered that there's a problem with the grammar with regard to return. It only seems to want to lex correctly as a function, return:, but I was getting a headache and decided it wasn't worth fighting over. So, for now I return with return:, not return. A fight for another day. I think it should be without the colon, because it's not going to behave like a normal function: it can break you out of a loop, like break, or break out of a function early.