Quick Progress!
Parsing for fun and profit
I am not sure whether it was my previous experience with parsing and ANTLR (or Irony), or how clean and expressive the GOLD parser framework is, but I've been able to create a working parser in just a few days of coding. And by working, I mean it parses and understands valid LLVM IR straight from the project's documentation:
I actually implemented a lot more operations and keywords than were needed to parse the example. It was just easier that way. I am now going to start building an AST. I've decided to be a bit aggressive because I think the AST will drive some grammar refactoring, and I'd like to do that refactoring sooner rather than later, while the grammar is still fairly small, using the AST as a guide. (Think BDD, but the behavior is defined for me, in CIL... I just need to start outputting CIL to validate/test!) This will also allow me to structure the document better and add more comments for posterity.
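To make the AST-then-CIL idea concrete, here is a minimal sketch in Python (not the project's actual code). The node classes `Const`, `Local`, `Add` and `Ret`, and the `emit` function, are hypothetical names invented for illustration. It assumes every LLVM value maps to a named CIL local and covers only a 32-bit `add` and `ret`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Const:
    """An integer literal operand, e.g. the 2 in `add i32 %x, 2`."""
    value: int


@dataclass
class Local:
    """A named value such as %x (stored without the % sigil)."""
    name: str


Operand = Union[Const, Local]


@dataclass
class Add:
    """AST node for `%result = add i32 lhs, rhs`."""
    result: str
    lhs: Operand
    rhs: Operand


@dataclass
class Ret:
    """AST node for `ret i32 value`."""
    value: Operand


def load(op: Operand) -> str:
    """Return the CIL instruction that pushes an operand onto the stack."""
    if isinstance(op, Const):
        return f"ldc.i4 {op.value}"
    return f"ldloc {op.name}"


def emit(nodes) -> List[str]:
    """Walk the (hypothetical) AST nodes in order and emit CIL mnemonics."""
    out: List[str] = []
    for node in nodes:
        if isinstance(node, Add):
            # Register-style IR becomes stack-style CIL: push, push, add, store.
            out += [load(node.lhs), load(node.rhs), "add", f"stloc {node.result}"]
        elif isinstance(node, Ret):
            out += [load(node.value), "ret"]
        else:
            raise TypeError(f"unsupported node: {node!r}")
    return out


if __name__ == "__main__":
    # Roughly: %sum = add i32 %x, 2 ; ret i32 %sum
    for line in emit([Add("sum", Local("x"), Const(2)), Ret(Local("sum"))]):
        print(line)
```

Even a toy like this shows why the AST can guide grammar refactoring: each grammar rule has to surface exactly the fields (result name, operands) that the emitter needs.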
So my next steps:
1. Create an AST
2. Generate IL with the AST
3. Produce valid, running CIL
4. Refactor the GOLD grammar
5. Add missing operations and instructions to the grammar
6. Add the respective AST logic
7. Find a small C (or C++) library to test the grammar and IL generator (lzip is a contender)
8. Find a libc to bootstrap: uClibc or newlib are contenders
I am hoping to have 1-3 done in the next few days, and 4-6 soon after that. I have very little experience with bootstrapping, so I'll probably look at the emscripten project as a guide. For those who don't know (or if I am using the wrong terminology): a general C/C++ library makes standard calls (syscalls, printf, cout, etc.) out to what is known as libc (though that's a specific library; there are many implementations). I will need to create a shim that routes those calls to the respective mscorlib class/method, or find a suitable replacement library on the .NET Framework (take SDL or OpenGL, for example...).
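As a rough illustration of the shim idea, the sketch below (Python, purely for brevity) keeps a lookup table from libc symbol names to .NET members and rewrites a call target accordingly. The table `LIBC_SHIMS` and the function `lower_call` are hypothetical. The chosen .NET methods exist, but the pairings are assumptions: a real shim would also have to marshal arguments (for example `char*` to `string`) and reconcile return types.

```python
# Hypothetical table: libc symbol -> CIL method reference in mscorlib.
# The signatures do not line up exactly with their C counterparts
# (puts returns int, WriteLine returns void), so this is a sketch only.
LIBC_SHIMS = {
    "puts": "void [mscorlib]System.Console::WriteLine(string)",
    "putchar": "void [mscorlib]System.Console::Write(char)",
    "exit": "void [mscorlib]System.Environment::Exit(int32)",
    "abs": "int32 [mscorlib]System.Math::Abs(int32)",
}


def lower_call(symbol: str) -> str:
    """Return a CIL `call` instruction for an external libc symbol."""
    target = LIBC_SHIMS.get(symbol)
    if target is None:
        # No mapping yet: this symbol would have to come from a real libc port.
        raise KeyError(f"no shim registered for libc symbol {symbol!r}")
    return f"call {target}"


if __name__ == "__main__":
    print(lower_call("abs"))
```

Symbols that fall through the table are the ones a ported libc (or a replacement .NET library) would have to supply.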
A note to those trying to create a grammar for LLVM IR: you see that c"<String>" notation? It's not mentioned ANYWHERE in their documentation. I am assuming it is shorthand for creating a character array from the given string. I've found a few other little syntax surprises hidden in the example code of the documentation. It's not too bad... yet. Because I am not worrying about my parser validating the syntax, I think I may even be able to ignore some of those surprises.
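Under that assumption (c"..." is a character array), a token can be turned into raw bytes with a small decoder. This is a sketch, not a confirmed spec: it assumes the only escapes are a backslash followed by two hex digits (as in `\0A` and `\00` in the documentation's examples) and a doubled backslash. `decode_c_string` is a hypothetical helper name.

```python
import re

# Whole token: c"...", where the body is plain characters, \XX hex escapes,
# or a doubled backslash. Anything else is rejected.
_C_STRING = re.compile(r'c"((?:[^"\\]|\\[0-9A-Fa-f]{2}|\\\\)*)"')
# A single escape inside an already validated body.
_ESCAPE = re.compile(r'\\(\\|[0-9A-Fa-f]{2})')


def decode_c_string(token: str) -> bytes:
    """Decode an LLVM IR c"..." constant into the bytes of the array."""
    match = _C_STRING.fullmatch(token)
    if match is None:
        raise ValueError(f'not a c"..." constant: {token!r}')
    body = match.group(1)
    out = bytearray()
    pos = 0
    for esc in _ESCAPE.finditer(body):
        # Copy the literal text before this escape, then the escaped byte.
        out += body[pos:esc.start()].encode("utf-8")
        code = esc.group(1)
        out.append(0x5C if code == "\\" else int(code, 16))
        pos = esc.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    print(decode_c_string(r'c"Hello World\0A\00"'))
```

The length of the decoded result can then be compared against the declared array type (for example `[13 x i8]`), even when the parser itself is not validating syntax.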