Implementing a lexer
ReSharper requires a custom language to create a lexer that implements (at least) the ILexer interface:
The IBuffer is given to the lexer in the constructor, via ILexerFactory.
Clients of the lexer will follow these steps:
1. Call Start to get the lexer to recognise the first token.
2. Retrieve the current token type from the TokenType property, which will be a singleton instance of a language-specific class that derives from TokenNodeType (see the guide on Token Node Types for more details).
3. Use the TokenStart and TokenEnd properties to retrieve the offsets of the token start and end in the text buffer. This is required because the token type is a singleton instance, and therefore cannot contain details about the location and length of the token itself. The start offset is inclusive, and the end offset is exclusive, just like TextRange (e.g. given "Hello world", the text range (0, 5) returns "Hello").
4. Call the Advance method repeatedly to move to the next token, which will update the TokenType, TokenStart and TokenEnd properties with the information about the current token and location.
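The steps above can be sketched as a small, self-contained program. The types below are stand-ins declared locally for illustration, not the SDK's own: a plain string replaces IBuffer, WordLexer and LexerClient are hypothetical names, and the convention that TokenType is null once the input is exhausted is an assumption.

```csharp
using System.Collections.Generic;

namespace LexerSketch
{
    // Stand-in for the SDK's TokenNodeType: one shared instance per kind of token.
    public sealed class TokenNodeType
    {
        public static readonly TokenNodeType Word = new TokenNodeType("WORD");
        public static readonly TokenNodeType Whitespace = new TokenNodeType("WHITESPACE");

        public string Name { get; }
        private TokenNodeType(string name) { Name = name; }
    }

    // Stand-in for the lexer contract, limited to the members used in the steps above.
    public interface ILexer
    {
        void Start();
        void Advance();
        TokenNodeType TokenType { get; }   // assumed to be null once the input is exhausted
        int TokenStart { get; }            // inclusive
        int TokenEnd { get; }              // exclusive
    }

    // Toy lexer: splits the text into runs of whitespace and runs of anything else.
    public sealed class WordLexer : ILexer
    {
        private readonly string myText;    // a plain string stands in for IBuffer

        public WordLexer(string text) { myText = text; }

        public TokenNodeType TokenType { get; private set; }
        public int TokenStart { get; private set; }
        public int TokenEnd { get; private set; }

        public void Start() { TokenEnd = 0; Advance(); }

        public void Advance()
        {
            TokenStart = TokenEnd;
            if (TokenStart >= myText.Length) { TokenType = null; return; }

            bool isSpace = char.IsWhiteSpace(myText[TokenStart]);
            int end = TokenStart;
            while (end < myText.Length && char.IsWhiteSpace(myText[end]) == isSpace) end++;

            TokenEnd = end;
            TokenType = isSpace ? TokenNodeType.Whitespace : TokenNodeType.Word;
        }
    }

    public static class LexerClient
    {
        // Walks the whole text and describes each token as TYPE[start,end)='text'.
        public static List<string> Describe(string text)
        {
            var result = new List<string>();
            ILexer lexer = new WordLexer(text);
            lexer.Start();                                   // step 1
            while (lexer.TokenType != null)                  // step 2
            {
                // Step 3: the token type carries no location, so ask the lexer for the offsets.
                string tokenText = text.Substring(lexer.TokenStart, lexer.TokenEnd - lexer.TokenStart);
                result.Add(lexer.TokenType.Name + "[" + lexer.TokenStart + "," + lexer.TokenEnd + ")='" + tokenText + "'");
                lexer.Advance();                             // step 4
            }
            return result;
        }
    }
}
```

Because the end offset is exclusive, the length of a token is simply TokenEnd minus TokenStart, which is what the Substring call relies on.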
The CurrentPosition property is a lexer-specific object that encapsulates the information required by the lexer to save and restore the current location. The LexerStateCookie class can be used by parsers to make it easy to roll back to a specific state in the lexer. It implements the IDisposable interface, so it can be used in a using statement:
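A minimal sketch of that pattern follows. Everything is declared locally so that the example is self-contained: the ILexer and LexerStateCookie types here are simplified stand-ins rather than the SDK's classes, the Create factory method and the restore-on-Dispose behaviour are assumptions, and CharLexer and Lookahead.Peek are hypothetical names.

```csharp
using System;

namespace LexerSketch
{
    // Stand-in for the part of the lexer contract that matters here.
    public interface ILexer
    {
        void Start();
        void Advance();
        string TokenType { get; }              // a string stands in for TokenNodeType; null at end of input
        object CurrentPosition { get; set; }
    }

    // Toy lexer in which every character is one token, so its whole state is an offset.
    public sealed class CharLexer : ILexer
    {
        private readonly string myText;
        private int myOffset;

        public CharLexer(string text) { myText = text; }

        public void Start() { myOffset = 0; }
        public void Advance() { if (myOffset < myText.Length) myOffset++; }
        public string TokenType => myOffset < myText.Length ? myText[myOffset].ToString() : null;

        public object CurrentPosition
        {
            get { return myOffset; }           // the int is boxed into an object here
            set { myOffset = (int)value; }
        }
    }

    // Hypothetical cookie: captures the position on creation and restores it on Dispose.
    public sealed class LexerStateCookie : IDisposable
    {
        private readonly ILexer myLexer;
        private readonly object mySavedPosition;

        private LexerStateCookie(ILexer lexer)
        {
            myLexer = lexer;
            mySavedPosition = lexer.CurrentPosition;
        }

        public static LexerStateCookie Create(ILexer lexer) { return new LexerStateCookie(lexer); }

        public void Dispose() { myLexer.CurrentPosition = mySavedPosition; }
    }

    public static class Lookahead
    {
        // Returns the type of the token `count` tokens ahead, leaving the lexer where it was.
        public static string Peek(ILexer lexer, int count)
        {
            using (LexerStateCookie.Create(lexer))
            {
                for (int i = 0; i < count; i++) lexer.Advance();
                return lexer.TokenType;        // evaluated before Dispose rolls the lexer back
            }
        }
    }
}
```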
This can be used to implement lookahead, retrieving a number of tokens ahead, then rolling back to the current position (see Lexer Utility Methods for more details).
Strongly typed lexers
The ILexer interface exposes the CurrentPosition as an object, to allow lexers maximum flexibility for storing state about the current position: the lexer can return any object it wishes. However, if the lexer wishes to return a value type, this can add boxing allocations, so the lexer can also implement ILexer<TState>:
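As a rough sketch, the generic interface can redeclare the property with the new modifier. The declarations below are local stand-ins reduced to the members relevant here, not the SDK's definitions; OffsetLexer and TypedPositionDemo are hypothetical names, and the explicit implementation of the untyped property is just one way of satisfying both interfaces.

```csharp
namespace LexerSketch
{
    // Stand-in for the untyped contract.
    public interface ILexer
    {
        void Advance();
        object CurrentPosition { get; set; }
    }

    // Stand-in for the typed contract: `new` hides the object-typed property.
    public interface ILexer<TState> : ILexer
    {
        new TState CurrentPosition { get; set; }
    }

    // Toy lexer whose entire state is an integer offset.
    public sealed class OffsetLexer : ILexer<int>
    {
        private int myOffset;

        public void Advance() { myOffset++; }

        // Strongly typed: callers holding ILexer<int> or OffsetLexer get the int without boxing.
        public int CurrentPosition
        {
            get { return myOffset; }
            set { myOffset = value; }
        }

        // For callers that only know the untyped interface; this path still boxes.
        object ILexer.CurrentPosition
        {
            get { return myOffset; }
            set { myOffset = (int)value; }
        }
    }

    public static class TypedPositionDemo
    {
        // Saves the position, moves ahead two tokens, restores, and reports where the lexer ended up.
        public static int SaveAdvanceRestore(ILexer<int> lexer)
        {
            int saved = lexer.CurrentPosition;   // resolves to the typed property, no boxing
            lexer.Advance();
            lexer.Advance();
            lexer.CurrentPosition = saved;
            return lexer.CurrentPosition;
        }
    }
}
```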
This hides (shadows) the CurrentPosition property so that it is of type TState instead of object. This allows a value type to be returned without boxing allocations. For example, if the lexer only requires an integer position (such as a caching lexer), it can implement ILexer<int>, and avoid boxing the int into object.
Similarly, the lexer can implement its state object as a struct, and return it as a strongly typed item:
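A minimal sketch, assuming a hypothetical LexerState struct with three fields; the interfaces are local stand-ins reduced to the one property that matters, and StructStateLexer with its MoveTo helper exists only for this illustration.

```csharp
namespace LexerSketch
{
    // Local stand-ins for the untyped and typed contracts.
    public interface ILexer
    {
        object CurrentPosition { get; set; }
    }

    public interface ILexer<TState> : ILexer
    {
        new TState CurrentPosition { get; set; }
    }

    // Hypothetical state: everything this toy lexer needs to resume at a given token.
    public struct LexerState
    {
        public int TokenStart;
        public int TokenEnd;
        public uint LexicalState;
    }

    public sealed class StructStateLexer : ILexer<LexerState>
    {
        private LexerState myState;

        // Illustration-only helper that pretends the lexer has moved to a new token.
        public void MoveTo(int start, int end)
        {
            myState.TokenStart = start;
            myState.TokenEnd = end;
        }

        // Returns a copy of the struct; no heap allocation is involved.
        public LexerState CurrentPosition
        {
            get { return myState; }
            set { myState = value; }
        }

        // The untyped path boxes the struct.
        object ILexer.CurrentPosition
        {
            get { return myState; }
            set { myState = (LexerState)value; }
        }
    }
}
```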
The struct is copied by value to the caller of CurrentPosition, and no boxing allocations take place.
Incremental lexers
ReSharper includes infrastructure for incremental lexing, that is, only lexing the parts of a file that change, and reusing existing tokens for the rest of the file. Most of the work is handled by a caching lexer, and is covered in more detail in the section on incremental parsing.
The custom language lexer can implement the ILexerEx and IIncrementalLexer interfaces:
These interfaces expose the lexer state as a uint value. If the lexer is built with CsLex, this state can be the yy_lexical_state value, which is used to decide when specific regular expression rules are applied. Alternatively, it can be used as a lookup into other (static) values, or used to encode more state information into the bits of the uint (the C# lexer uses this strategy to encode a stack of items).
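One way to encode a small stack into the bits of a uint is sketched below. This is a hypothetical encoding chosen for illustration, not the one used by the C# lexer: each item takes four bits, item values run from 1 to 15 so that 0 can mean "empty", and the stack holds at most eight items.

```csharp
using System;

namespace LexerSketch
{
    // Hypothetical encoding: up to eight 4-bit items (values 1..15) packed into one uint.
    // The top of the stack lives in the lowest four bits; a value of 0 means the stack is empty.
    public static class PackedStack
    {
        public const uint Empty = 0;

        public static uint Push(uint state, uint item)
        {
            if (item == 0 || item > 0xF) throw new ArgumentOutOfRangeException(nameof(item));
            // Items are never zero, so non-zero top bits mean all eight slots are in use.
            if ((state >> 28) != 0) throw new InvalidOperationException("Stack is full.");
            return (state << 4) | item;
        }

        // Returns the top item, or 0 if the stack is empty.
        public static uint Peek(uint state) { return state & 0xF; }

        // Returns the state with the top item removed.
        public static uint Pop(uint state) { return state >> 4; }
    }
}
```

Since the whole stack is a single value, it can be cached per token and compared with a plain equality check.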
The IIncrementalLexer interface has a Start method, which allows the lexer to start from an arbitrary point in the text buffer, without having to parse the preceding part of the file first. It takes a start and end offset, and also the uint state value returned from ILexerEx.LexerStateEx. These values will have been cached from a previous scan of the text buffer. The TokenBuffer and CachingLexer classes implement this.
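The restart can be illustrated with a toy lexer in which a double quote toggles between two states. The interfaces below are local stand-ins that contain only the members named above, with assumed signatures; QuoteLexer and IncrementalDemo are hypothetical, as is the convention that LexerStateEx holds the state needed to re-lex from the start of the current token.

```csharp
using System.Collections.Generic;

namespace LexerSketch
{
    // Local stand-ins containing only the members discussed in the text.
    public interface ILexerEx
    {
        uint LexerStateEx { get; }
    }

    public interface IIncrementalLexer : ILexerEx
    {
        void Start(int startOffset, int endOffset, uint state);
    }

    // Toy lexer: a double quote toggles between state 0 (code) and state 1 (inside a string).
    public sealed class QuoteLexer : IIncrementalLexer
    {
        private const uint InCode = 0, InString = 1;
        private readonly string myText;
        private int myEnd;
        private uint myNextState;                        // state in effect after the current token

        public QuoteLexer(string text) { myText = text; }

        public string TokenType { get; private set; }    // null at the end of the range
        public int TokenStart { get; private set; }
        public int TokenEnd { get; private set; }
        public uint LexerStateEx { get; private set; }   // state needed to re-lex from TokenStart

        public void Start() { Start(0, myText.Length, InCode); }

        public void Start(int startOffset, int endOffset, uint state)
        {
            TokenEnd = startOffset;
            myEnd = endOffset;
            myNextState = state;
            Advance();
        }

        public void Advance()
        {
            TokenStart = TokenEnd;
            LexerStateEx = myNextState;
            if (TokenStart >= myEnd) { TokenType = null; return; }

            if (myText[TokenStart] == '"')
            {
                TokenEnd = TokenStart + 1;
                TokenType = "QUOTE";
                myNextState = LexerStateEx == InCode ? InString : InCode;
                return;
            }

            int end = TokenStart;
            while (end < myEnd && myText[end] != '"') end++;
            TokenEnd = end;
            TokenType = LexerStateEx == InString ? "STRING_TEXT" : "CODE";
        }
    }

    public static class IncrementalDemo
    {
        // Collects "TYPE@offset" for every token from the lexer's current token onwards.
        public static List<string> Scan(QuoteLexer lexer)
        {
            var tokens = new List<string>();
            while (lexer.TokenType != null)
            {
                tokens.Add(lexer.TokenType + "@" + lexer.TokenStart);
                lexer.Advance();
            }
            return tokens;
        }
    }
}
```

In this sketch, a caller would record TokenStart and LexerStateEx for each token during a full scan, and later pass a recorded pair to Start to resume there. Restarting inside the quoted text with state 0 instead of the recorded state 1 would classify that text as CODE, which is why the state has to be cached alongside the offset.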
More details are in the section on incremental parsing.