3 Design of version 2

This chapter contains the design information of gaSQLParser version 2.

I believe that the structure of the first version of this component is too "heavy". To make it more lightweight, the following changes should be made:

Make the parsing process more modular. In the current implementation, parsing is executed as one indivisible process, which runs as follows:
1. The next token is identified in the SQL string.
2. The token is forwarded to the SQL object builder.
3. The SQL object builder tries to guess how to handle the current token.
This execution process has some disadvantages:
1. No forward search is possible while guessing how to handle the current token. For some tokens it is impossible to guess correctly without forward search.
2. Because guesses can be wrong, there must be some mechanism to reparse already parsed tokens. This adds complexity to the code.
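As an illustration of the first disadvantage, consider an identifier in a select list such as `t.name`: whether `t` is a column or a table qualifier cannot be decided until the following token is seen. The sketch below is written in Python rather than in the component's own language, and the function `classify_identifier` and the plain list-of-strings token representation are hypothetical. It only shows how a complete token list makes the decision possible without guessing.

```python
def classify_identifier(tokens, index):
    """Classify a select-list identifier by peeking at the next token.

    `tokens` is a list of token strings with whitespace and comments
    already removed (an assumption made for this sketch).
    """
    next_token = tokens[index + 1] if index + 1 < len(tokens) else None
    if next_token == ".":
        # A dot follows, so this identifier qualifies the name after it.
        return "qualifier"
    return "column"


tokens = ["SELECT", "t", ".", "name", ",", "age", "FROM", "t"]
first = classify_identifier(tokens, 1)   # "t" is followed by "."
second = classify_identifier(tokens, 3)  # "name" is followed by ","
```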
Handling of whitespace and comments. Currently, they are parsed inline, in the order they appear in the SQL statement. This means that all object-building code is cluttered with "wait for the next non-comment token" code fragments.
Internal storage system. Currently, the storage system in use is a linked list with the ability to create mirror lists. Every SQL building object (FieldRef, for example) is aware of all tokens between its start and end positions. It also has to maintain the correct order of those tokens when the SQL is modified. This means that one token is "owned" by several objects, and the topmost SQL builder object (some kind of SQLStatement, TgaSelectStm for example) owns all of them. This ownership model makes writing new parsing objects much more complex.

So, to overcome the problems listed above, I plan to structure the component as follows:

The parsing process would be built up from the following steps:
1. Tokenizing. The SQL string is split into SQL tokens, which are stored in a list. Tokenizing is already implemented in TgaSimpleSQLParser; the only thing left to do is to store the parsed tokens in some kind of list.
2. Token list normalizing. All whitespace and comments should be removed from the list. Normalizing might be done in place on the token list, but it may be easier to copy the wanted tokens into another list. The whitespace and comments probably have to be stored somewhere.
3. SQL structure building. As all tokens are already parsed, this step can use forward search, so there is no need to implement parse reversing.
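The first two steps can be sketched roughly as follows. This is a Python illustration only, not the TgaSimpleSQLParser implementation: the regular expression, the `Token` record and the functions `tokenize` and `normalize` are all hypothetical. The sketch takes the "copy into another list" option and keeps the removed whitespace and comments in a separate list, so that they are not lost.

```python
import re
from dataclasses import dataclass

# Hypothetical token kinds; the real tokenizer may classify differently.
TOKEN_PATTERN = re.compile(
    r"(?P<comment>--[^\n]*|/\*.*?\*/)"
    r"|(?P<whitespace>\s+)"
    r"|(?P<string>'(?:[^']|'')*')"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<identifier>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<symbol>.)",
    re.DOTALL,
)


@dataclass
class Token:
    kind: str
    text: str
    position: int  # offset of the token in the original SQL string


def tokenize(sql):
    """Step 1: split the SQL string into a flat list of tokens."""
    return [Token(m.lastgroup, m.group(), m.start())
            for m in TOKEN_PATTERN.finditer(sql)]


def normalize(tokens):
    """Step 2: copy meaningful tokens to a new list, set the rest aside."""
    significant, trivia = [], []
    for token in tokens:
        if token.kind in ("whitespace", "comment"):
            trivia.append(token)
        else:
            significant.append(token)
    return significant, trivia


sql = "SELECT a, -- first\n b FROM t /* x */ WHERE c = 'it''s'"
all_tokens = tokenize(sql)
significant, trivia = normalize(all_tokens)
```

Because every token records its position, the whitespace and comments set aside in `trivia` could later be merged back in their original places, which is one possible answer to where they should be stored.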
Every token would be owned by exactly one SQL statement part. A statement part contains one or more building blocks. A building block is either a SQL token or another statement part. Whitespace and comments are not considered building blocks.
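The ownership rule can be pictured as a simple tree. The following Python sketch is an assumption about how such a structure might look, not the planned design itself: the class names `SqlToken` and `StatementPart` and the method `tokens` are hypothetical. Each token appears in the block list of exactly one part, and the full token sequence is recovered by walking the tree.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SqlToken:
    text: str


@dataclass
class StatementPart:
    name: str
    # Each building block is either a token or a nested statement part.
    blocks: List[Union[SqlToken, "StatementPart"]] = field(default_factory=list)

    def tokens(self):
        """Yield every token below this part, in statement order."""
        for block in self.blocks:
            if isinstance(block, SqlToken):
                yield block
            else:
                yield from block.tokens()


# SELECT t.name FROM t, with the field reference as a nested part.
field_ref = StatementPart(
    "FieldRef", [SqlToken("t"), SqlToken("."), SqlToken("name")]
)
select = StatementPart(
    "SelectStatement",
    [SqlToken("SELECT"), field_ref, SqlToken("FROM"), SqlToken("t")],
)
texts = [token.text for token in select.tokens()]
```

In this arrangement the outer statement never holds the tokens of `field_ref` directly; it reaches them only through the nested part, so a part can change its own tokens without any other object having to update a list.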