Construyendo el compilador Parte II: ANTLR V4 y Python

 Python y ANTLR V4 (cuatro versiones, si, lo repito)


Bienvenidos a otro POST de mi blog Compiladores, café y curiosidad. Las tres "C" de mi vida. 
Esta semana la entrada no será tan técnica, bueno un poco nomás.
Quisiera mostrarle al lector las bondades de usar ANTLR con Python 3, las cuales pueden llegar a ser muchas dependiendo el lugar y el amor que le tengamos al lenguaje. 

ANTLR originalmente solo compila y genera el lexer y el parser para el lenguaje JAVA. Esta configuracion por defecto hace que podamos "casarnos" con JAVA o buscar una alternativa. He experimentado los ultimos meses con Python a una escala mayor. De manera que me siento mucho más comodo con ese lenguaje. ANTLR es bondadoso, y nos permite cambias de lenguaje con tan solo especificar un parametro al momento de generar la gramática. 

En especial, es este parámetro.
antlr -Dlanguage=Python3 decaf.g4

Como vemos, es sumamente sencillo lograr un cambio en el lenguaje target que generamos. Esto sin embargo trae otros inconvenientes. Java necesitaba ser compilado con el JDK, y luego podíamos usar una GUI para representar el arbol, sin embargo con python no contamos con una interfaz gráfica que haga ese trabajo "sucio" por nosotros. 
Para resolver este problema, vamos a crear un pequeño programa en python que permita "leer" el arbol, y luego imprimirlo en consola. El programa en esencia es este


from antlr4 import *
from antlr4.tree.Trees import TerminalNode
from decafLexer import decafLexer
from decafListener import decafListener
from decafParser import decafParser
import sys


class KeyPrinter(decafListener):
    def exitKey(selfctx):
        print("Hola: %s" % ctx.ID())


def traverse(treerule_namesindent=0):
    if tree.getText() == "<EOF>":
        return
    elif isinstance(tree, TerminalNode):
        print("{0}T='{1}'".format("  " * indent, tree.getText()))
    else:
        print("{0}R='{1}'".format("  " * indent,
                                  rule_names[tree.getRuleIndex()]))
        if (tree.children != None):
            for child in tree.children:
                traverse(child, rule_names, indent + 1)


def main():
    print("Imprimiendo el árbol..................")
    data = open('antlr/Python3/programs/test.txt').read()
    lexer = decafLexer(InputStream(data))
    stream = CommonTokenStream(lexer)
    parser = decafParser(stream)
    tree = parser.start()

    printer = KeyPrinter()
    walker = ParseTreeWalker()
    walker.walk(printer, tree)

    traverse(tree, parser.ruleNames)


if __name__ == '__main__':
    main()



Como notamos, este programa permite que el arbol sea impreso, este codigo fue extraido de la documentacion oficial de ANTLR. 
El resultado es la representacion del arbol, pero en consola. Asi:

R='start'
  T='class'
  T='Program'
  T='{'
  R='declaration'
    R='structDeclaration'
      T='struct'
      R='id_tok'
        T='A'
      T='{'
      R='varDeclaration'
        R='varType'
          T='int'
        R='id_tok'
          T='a'
        T=';'
      T='}'
      T=';'
  R='declaration'
    R='structDeclaration'
      T='struct'
      R='id_tok'
        T='B'
      T='{'
      R='varDeclaration'
        R='varType'
          T='int'
        R='id_tok'
          T='b'
        T='['
        T='5'
        T=']'
        T=';'
      R='varDeclaration'
        R='varType'
          T='struct'
          R='id_tok'
            T='A'
        R='id_tok'
          T='c'
        T=';'
      T='}'
      T=';'
  R='declaration'
    R='varDeclaration'
      R='varType'
        T='struct'
        R='id_tok'
          T='A'
      R='id_tok'
        T='y'
      T=';'
  R='declaration'
    R='varDeclaration'
      R='varType'
        T='struct'
        R='id_tok'
          T='A'
      R='id_tok'
        T='z'
      T=';'
  R='declaration'
    R='methodDeclaration'
      R='methodType'
        T='void'
      R='id_tok'
        T='main'
      T='('
      R='parameter'
        T='void'
      T=')'
      R='block'
        T='{'
        R='varDeclaration'
          R='varType'
            T='struct'
            R='id_tok'
              T='B'
          R='id_tok'
            T='y'
          T='['
          T='5'
          T=']'
          T=';'
        R='varDeclaration'
          R='varType'
            T='int'
          R='id_tok'
            T='i'
          T=';'
        R='varDeclaration'
          R='varType'
            T='int'
          R='id_tok'
            T='j'
          T=';'
        R='varDeclaration'
          R='varType'
            T='int'
          R='id_tok'
            T='k'
          T=';'
        R='statement'
          R='location'
            R='id_tok'
              T='i'
          T='='
          R='expression'
            R='literal'
              R='int_literal'
                T='0'
        R='statement'
          T=';'
        R='statement'
          R='location'
            R='id_tok'
              T='j'
          T='='
          R='expression'
            R='literal'
              R='int_literal'
                T='0'
        R='statement'
          T=';'
        R='statement'
          R='location'
            R='id_tok'
              T='z'
            T='.'
            R='location'
              R='id_tok'
                T='a'
          T='='
          R='expression'
            R='literal'
              R='int_literal'
                T='3'
        R='statement'
          T=';'
        R='statement'
          T='while'
          T='('
          R='expression'
            R='expression'
              R='location'
                R='id_tok'
                  T='i'
            R='op'
              R='rel_op'
                T='<='
            R='expression'
              R='literal'
                R='int_literal'
                  T='10'
          T=')'
          R='block'
            T='{'
            R='statement'
              R='location'
                R='id_tok'
                  T='y'
                T='['
                R='expression'
                  R='location'
                    R='id_tok'
                      T='j'
                T=']'
                T='.'
                R='location'
                  R='id_tok'
                    T='b'
                  T='['
                  R='expression'
                    R='literal'
                      R='int_literal'
                        T='0'
                  T=']'
              T='='
              R='expression'
                R='methodCall'
                  R='id_tok'
                    T='InputInt'
                  T='('
                  T=')'
            R='statement'
              T=';'
            R='statement'
              T='if'
              T='('
              R='expression'
                R='expression'
                  R='location'
                    R='id_tok'
                      T='y'
                    T='['
                    R='expression'
                      R='location'
                        R='id_tok'
                          T='j'
                    T=']'
                    T='.'
                    R='location'
                      R='id_tok'
                        T='b'
                      T='['
                      R='expression'
                        R='literal'
                          R='int_literal'
                            T='0'
                      T=']'
                R='op'
                  R='eq_op'
                    T='=='
                R='expression'
                  R='literal'
                    R='int_literal'
                      T='5'
              T=')'
              R='block'
                T='{'
                R='statement'
                  R='location'
                    R='id_tok'
                      T='y'
                    T='['
                    R='expression'
                      R='location'
                        R='id_tok'
                          T='j'
                    T=']'
                    T='.'
                    R='location'
                      R='id_tok'
                        T='b'
                      T='['
                      R='expression'
                        R='literal'
                          R='int_literal'
                            T='0'
                      T=']'
                  T='='
                  R='expression'
                    R='location'
                      R='id_tok'
                        T='z'
                      T='.'
                      R='location'
                        R='id_tok'
                          T='a'
                R='statement'
                  T=';'
                R='statement'
                  R='location'
                    R='id_tok'
                      T='k'
                  T='='
                  R='expression'
                    R='location'
                      R='id_tok'
                        T='factorial'
                R='statement'
                  R='expression'
                    T='('
                    R='expression'
                      R='methodCall'
                        R='id_tok'
                          T='ReturnNumber'
                        T='('
                        T=')'
                    T=')'
                  T=';'
                R='statement'
                  R='methodCall'
                    R='id_tok'
                      T='OutputInt'
                    T='('
                    R='arg'
                      R='expression'
                        R='location'
                          R='id_tok'
                            T='k'
                    T=')'
                  T=';'
                T='}'
            R='statement'
              R='location'
                R='id_tok'
                  T='y'
                T='['
                R='expression'
                  R='location'
                    R='id_tok'
                      T='j'
                T=']'
                T='.'
                R='location'
                  R='id_tok'
                    T='c'
                  T='.'
                  R='location'
                    R='id_tok'
                      T='a'
              T='='
              R='expression'
                R='location'
                  R='id_tok'
                    T='factorial'
            R='statement'
              R='expression'
                T='('
                R='expression'
                  R='location'
                    R='id_tok'
                      T='y'
                    T='['
                    R='expression'
                      R='location'
                        R='id_tok'
                          T='j'
                    T=']'
                    T='.'
                    R='location'
                      R='id_tok'
                        T='b'
                      T='['
                      R='expression'
                        R='literal'
                          R='int_literal'
                            T='0'
                      T=']'
                T=')'
              T=';'
            R='statement'
              R='methodCall'
                R='id_tok'
                  T='OutputInt'
                T='('
                R='arg'
                  R='expression'
                    R='location'
                      R='id_tok'
                        T='y'
                      T='['
                      R='expression'
                        R='location'
                          R='id_tok'
                            T='j'
                      T=']'
                      T='.'
                      R='location'
                        R='id_tok'
                          T='c'
                        T='.'
                        R='location'
                          R='id_tok'
                            T='a'
                T=')'
              T=';'
            R='statement'
              R='location'
                R='id_tok'
                  T='i'
              T='='
              R='expression'
                R='expression'
                  R='location'
                    R='id_tok'
                      T='i'
                R='op'
                  R='arith_op'
                    T='+'
                R='expression'
                  R='literal'
                    R='int_literal'
                      T='1'
            R='statement'
              T=';'
            T='}'
        T='}'
  R='declaration'
    R='methodDeclaration'
      R='methodType'
        T='int'
      R='id_tok'
        T='factorial'
      T='('
      R='parameter'
        R='parameterType'
          T='int'
        R='id_tok'
          T='n'
      T=')'
      R='block'
        T='{'
        R='statement'
          T='if'
          T='('
          R='expression'
            R='expression'
              R='location'
                R='id_tok'
                  T='n'
            R='op'
              R='eq_op'
                T='=='
            R='expression'
              R='literal'
                R='int_literal'
                  T='0'
          T=')'
          R='block'
            T='{'
            R='statement'
              T='return'
              R='expression'
                R='literal'
                  R='int_literal'
                    T='1'
              T=';'
            T='}'
          T='else'
          R='block'
            T='{'
            R='statement'
              T='return'
              R='expression'
                R='expression'
                  R='location'
                    R='id_tok'
                      T='n'
                R='op'
                  R='arith_op'
                    T='*'
                R='expression'
                  R='methodCall'
                    R='id_tok'
                      T='factorial'
                    T='('
                    R='arg'
                      R='expression'
                        R='expression'
                          R='location'
                            R='id_tok'
                              T='n'
                        R='op'
                          R='arith_op'
                            T='-'
                        R='expression'
                          R='literal'
                            R='int_literal'
                              T='1'
                    T=')'
              T=';'
            T='}'
        T='}'
  R='declaration'
    R='methodDeclaration'
      R='methodType'
        T='void'
      R='id_tok'
        T='OutputInt'
      T='('
      R='parameter'
        R='parameterType'
          T='int'
        R='id_tok'
          T='n'
      T=')'
      R='block'
        T='{'
        T='}'
  R='declaration'
    R='methodDeclaration'
      R='methodType'
        T='int'
      R='id_tok'
        T='InputInt'
      T='('
      R='parameter'
        T='void'
      T=')'
      R='block'
        T='{'
        R='statement'
          T='return'
          R='expression'
            R='literal'
              R='int_literal'
                T='0'
          T=';'
        T='}'
  R='declaration'
    R='methodDeclaration'
      R='methodType'
        T='int'
      R='id_tok'
        T='ReturnNumber'
      T='('
      R='parameter'
        T='void'
      T=')'
      R='block'
        T='{'
        R='statement'
          T='return'
          R='expression'
            R='location'
              R='id_tok'
                T='z'
              T='.'
              R='location'
                R='id_tok'
                  T='a'
          T=';'
        T='}'
  T='}'

Como vemos, es bastante interesante lo que se puede lograr. 

Comentarios

Entradas populares de este blog

Construyendo el compilador Parte I: ANTLR V4 (ya van 4 versiones, increible), café y una miradita a los parser y lexer generados con Java

Construyendo el compilador Parte V: Finalizando el analizar Semántico

Construyendo el compilador Parte III: ANTLR V4, Listener vs Visitor y la tabla de Símbolos