ANTLR in C#

In the previous post Starting out with ANTLR we look at the basics of creating a grammar and generating code from it, now let’s take that very simple grammar and integrate it into a C# application.

Here’s the grammar again (from our grammar file queryLanguage.g4)

Note: We’re going to capitalize the grammar name as this will then by more in the style of C# class names.

grammar QueryLanguage;

query
    : expression
    ;

expression
    : STRING
    | NUMBER
    | expression 'AND' expression
    | expression 'OR' expression
    ;

WS  : (' '|'\t'|'\r'|'\n')+ -> skip;

STRING : '"' .*? '"';
SIGN
   : ('+' | '-')
   ;
NUMBER  
    : SIGN? ( [0-9]* '.' )? [0-9]+;

The ANTLR4 JAR is not compatible with the ANTRL4 Nuget package, so instead for our Example application we’ll use an alternative, the Antlr4 CodeGenerator, so follow these steps to create an application

  • Create a .NET Core Console application
  • Editor the SDK project file and change netcoreapp3.1 to net472
  • Add the ANTLR4.Runtime and Antlr4.CodeGenerator Nuget packages
  • Add your QueryLanguage.g4 grammar to the project

If you select the .g4 file you can now view the properties for that file within Visual Studio 2019 and (if you wish to) change what’s generated by ANTLR. Let’s just ensure Generate Visitor is Yes.

For some reason a .NET framework 4.7.2 project does not include the properties and whilst we can edit the .csproj file to get things working, I’ve found the above steps the simplest to get ANTLR running in a .NET application at the time of writing.

I’ve found I do still need to edit the .csproj file to add the following

<ItemGroup>
  <Antlr4 Update="QueryLanguage.g4">
    <Listener>false</Listener>
    <CustomToolNamespace>Example.Generated</CustomToolNamespace>
  </Antlr4>
</ItemGroup>

<PropertyGroup>
  <Antlr4UseCSharpGenerator>True</Antlr4UseCSharpGenerator>
</PropertyGroup>

Change Example.Generated to the preferred namespace for the generated files.

Now build the project and if all goes well there should be no errors and the ANTLR code should be generated in obj/Debug/net472 (or whatever configuration you’re using).

Let’s now make some changes to our grammar to make writing Visitor code simpler by adding labels to our expression code, the changes are listed below

expression
    : STRING #String
    | NUMBER #Number
    | expression 'AND' expression #And
    | expression 'OR' expression  #Or
    ;

We use # to create a label and this will turn into a Visit function with the label, i.e. VisitAnd, VisitoOr etc.

All we’re going to do with this grammar is use the Visitor pattern/class to generate code where strings are all lowercase, AND becomes & and OR becomes |, ofcourse you could produce byte code or do all sorts of things with your input.

Create a new file name QueryLanguageVisitor.cs and it should look like this

using Example.Generated;

namespace Example
{
  public class QueryLanguageVisitor : QueryLanguageBaseVisitor<string>
  {
    public override string VisitString(QueryLanguageParser.StringContext context)
    {
      return context.GetText().ToLower();
    }

    public override string VisitNumber(QueryLanguageParser.NumberContext context)
    {
      return context.GetText();
    }

    public override string VisitAnd(QueryLanguageParser.AndContext context)
    {
      return Visit(context.expression(0)) + "&" + Visit(context.expression(1));
    }

    public override string VisitOr(QueryLanguageParser.OrContext context)
    {
      return Visit(context.expression(0)) + "|" + Visit(context.expression(1));
    }
  }
}

As you can see from the above code, we subclass QueryLanguageBaseVisitor (a generated file) and the generic parameter is set as a string as our result of running through the QueryLanguageVisitor will simply be another string.

In the case of the AND and OR which ofcourse are binary expressions, i.e. require two parameters either side of the AND or OR and these may themselves be expression, hence we call Visit those expressions.

At this point, we have nothing to actually run the QueryLanguageVisitor so in the Main method place the following code

// add these using clauses
// using Antlr4.Runtime;
// using Example.Generated;

// example expression
var expression = "\"HELLO\" AND 123";

var inputStream = new AntlrInputStream(expression);
var lexer = new QueryLanguageLexer(inputStream);
var tokenStream = new CommonTokenStream(lexer);
var parser = new QueryLanguageParser(tokenStream);

var visitor = new QueryLanguageVisitor();
var query = parser.query();
var result = visitor.Visit(query);

In the code above, we create an ANTLR input stream (you can ofcource use an AntlrFileStream if you’re taking input from a file). Next we use our generated lexer which is passed into the CommonTokenStream and this in turn is passed into our generated QueryLanguageParser.

Finally we create our newly added QueryLanguageVisitor which will have functions based upon our grammar, in our case the startRule is query hence we call this method and pass the result into the Visit method of our QueryLanguageVisitor. The result (assuming no errors) will be

"hello" & 123

A more fully featured (i.e. includes error handling) implementation would be as follows (concepts and code snippets taken from various existing samples)

public class ParserResult
{
  public bool IsValid { get; internal set; }
  public int ErrorPosition { get; internal set; } = -1;
  public string ErrorMessage { get; internal set; }
  public string Result { get; internal set; }
}

public static class Query
{
  public static ParserResult Parse(string expression, bool secondRun = false)
  {
    if (String.IsNullOrWhiteSpace(expression))
    {
      return new ParserResult
      {
        IsValid = true,
        Result = null
      };
    }

    var inputStream = new AntlrInputStream(expression);
    var lexer = new QueryLanguageLexer(inputStream);
    var tokenStream = new CommonTokenStream(lexer);
    var parser = new QueryLanguageParser(tokenStream);

    lexer.RemoveErrorListeners();
    parser.RemoveErrorListeners();
    var customErrorListener = new QueryLanguageErrorListener();
    parser.AddErrorListener(customErrorListener);
    var visitor = new QueryLanguageVisitor();

    var queryExpression = parser.query();
    var result = visitor.Visit(queryExpression);
    var isValid = customErrorListener.IsValid;
    var errorLocation = customErrorListener.ErrorLocation;
    var errorMessage = customErrorListener.ErrorMessage;
    if (result != null)
    {
      isValid = false;
    }

    if (!isValid && !secondRun)
    {
      var cleanedFormula = string.Empty;
      var tokenList = tokenStream.GetTokens().ToList();
      for (var i = 0; i < tokenList.Count - 1; i++)
      {
        cleanedFormula += tokenList[i].Text;
      }
      var originalErrorLocation = errorLocation;
      var retriedResult = Parse(cleanedFormula, true);
      if (!retriedResult.IsValid)
      {
        retriedResult.ErrorPosition = originalErrorLocation;
        retriedResult.ErrorMessage = errorMessage;
      }
      return retriedResult;
    }
    return new ParserResult
    {
      IsValid = isValid,
      Result = isValid || result != null 
        ? result
        : null,
      ErrorPosition = errorLocation,
      ErrorMessage = isValid ? null : errorMessage
    };
  }
}

public class QueryLanguageErrorListener : BaseErrorListener
{
  public bool IsValid { get; private set; } = true;
  public int ErrorLocation { get; private set; } = -1;
  public string ErrorMessage { get; private set; }

  public override void ReportAmbiguity(
    Parser recognizer, DFA dfa, 
    int startIndex, int stopIndex, 
    bool exact, BitSet ambigAlts, 
    ATNConfigSet configs)
  {
    IsValid = false;
  }

  public override void ReportAttemptingFullContext(
    Parser recognizer, DFA dfa, 
    int startIndex, int stopIndex, 
    BitSet conflictingAlts, SimulatorState conflictState)
  {
    IsValid = false;
  }

  public override void ReportContextSensitivity(
    Parser recognizer, DFA dfa, 
    int startIndex, int stopIndex, 
    int prediction, SimulatorState acceptState)
  {
    IsValid = false;
  }

  public override void SyntaxError(
    IRecognizer recognizer, IToken offendingSymbol, 
    int line, int charPositionInLine, 
   string msg, RecognitionException e)
 {
   IsValid = false;
   ErrorLocation = ErrorLocation == -1 ? charPositionInLine : ErrorLocation;
   ErrorMessage = msg;
 }
}

Now the code that uses our parser simply looks like this (and includes error handling)

var expression = "\"HELLO\" AND 123";
var result = Query.Parse(expression);