Chapter 3
Design of version 2
This chapter describes the design of gaSQLParser version 2.
3.1 High-level design of the component
I believe that the structure of the first version of this component is too "heavy". To
make it more lightweight, the following changes should be made:
- Make the parsing process more modular. In the current implementation,
parsing runs as one indivisible process. For each token, the following
steps are executed:
- The next token is identified in the SQL string.
- The token is forwarded to the SQL object builder.
- The SQL object builder tries to guess how to handle the current token.
This execution process has some disadvantages:
- No lookahead is possible while guessing how to handle the current
token. For some tokens, the right guess cannot be made without looking
at the tokens that follow.
- As guesses can be wrong, there must be some mechanism to reparse
already parsed tokens. This adds complexity to the code.
- Handling of whitespace and comments. Currently, they are parsed inline, in the
order they appear in the SQL statement. This means that all object-building code
is cluttered with "wait for the next non-comment token" code fragments.
- Internal storage system. Currently, the storage system in use is a linked list
with the ability to create mirror lists. Every SQL building object (FieldRef, for
example) is aware of all tokens between its start and end positions. It also has
to maintain the correct order of those tokens when the SQL is modified. This
means that one token is "owned" by several objects, and the topmost SQL builder
object (some kind of SQLStatement, TgaSelectStm for example) owns all of them.
This ownership model makes writing new parsing objects considerably more
complex.
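The lookahead problem from the first item above can be illustrated with a small sketch. It is written in Python purely for illustration; the function name and the role labels are hypothetical and do not correspond to anything in gaSQLParser. The point is only that the role of an identifier cannot be decided from the identifier itself, so a builder that sees one token at a time has to guess.

```python
# Hypothetical illustration; none of these names exist in gaSQLParser.

def classify_identifier(tokens, i):
    """Decide the role of the identifier at tokens[i] by looking at tokens[i + 1]."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    if nxt == ".":
        return "table qualifier"   # e.g. orders.total
    if nxt == "(":
        return "function name"     # e.g. upper(name)
    return "column name"


# Whitespace is already left out of this token list to keep the example short.
tokens = ["SELECT", "orders", ".", "total", ",", "upper", "(", "name", ")"]
roles = {tokens[i]: classify_identifier(tokens, i) for i in (1, 3, 5, 7)}
```

Without access to the following token, "orders", "upper" and "name" look identical to the builder, which is why the single-pass design needs a way to undo wrong guesses.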
So, to overcome the problems listed above, I plan to structure the component as
follows:
- The parsing process would be built from the following steps:
- Tokenizing. The SQL string is split into SQL tokens, which are stored in
a list. Tokenizing is already implemented in TgaSimpleSQLParser; the
only thing left to do is to store the parsed tokens in a list.
- Token list normalizing. All whitespace and comments should be removed
from the list. Normalizing might be done in place on the token list,
but it may be easier to copy the wanted tokens into another list.
The whitespace and comments will probably have to be stored somewhere.
- SQL structure building. As all tokens are already available at this
point, the builder can use lookahead, so there is no need to implement
reparsing.
- Every token would be owned by exactly one SQL statement part. A
statement part contains one or more building blocks. A building block is either
an SQL token or another statement part. Whitespace and comments are not
considered building blocks.
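The three steps and the ownership rule above could look roughly like the following sketch. It is written in Python for illustration only and is not the planned implementation: Token, Part, tokenize, normalize, build_select and owned_tokens are all hypothetical names, the tokenizer is a simplified stand-in for TgaSimpleSQLParser, and keeping whitespace and comments in a side list together with their original positions is only one possible answer to where they should be stored.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Token:
    kind: str  # "ws", "comment", "word", "number", "string" or "symbol"
    text: str


@dataclass
class Part:
    """A statement part; each building block is a Token or a nested Part."""
    name: str
    blocks: List[Union[Token, "Part"]] = field(default_factory=list)


# Simplified token grammar (an assumption, not the real tokenizer rules).
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>--[^\n]*|/\*.*?\*/)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>'(?:[^']|'')*')
  | (?P<symbol>.)
""", re.VERBOSE | re.DOTALL)


def tokenize(sql: str) -> List[Token]:
    """Step 1: split the SQL string into a flat token list."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sql)]


def normalize(tokens: List[Token]) -> Tuple[List[Token], List[Tuple[int, Token]]]:
    """Step 2: copy the wanted tokens into a new list and keep whitespace
    and comments aside, together with their original positions."""
    kept, trivia = [], []
    for pos, tok in enumerate(tokens):
        if tok.kind in ("ws", "comment"):
            trivia.append((pos, tok))
        else:
            kept.append(tok)
    return kept, trivia


def build_select(tokens: List[Token]) -> Part:
    """Step 3: build the structure; lookahead is a plain index into the list."""
    stmt = Part("select_statement")
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if (tokens[i].kind == "word" and nxt is not None
                and nxt.text == "." and i + 2 < len(tokens)):
            # Qualified reference such as o.total: one nested part owns all three tokens.
            stmt.blocks.append(Part("field_ref", tokens[i:i + 3]))
            i += 3
        else:
            stmt.blocks.append(tokens[i])
            i += 1
    return stmt


def owned_tokens(part: Part) -> List[Token]:
    """Flatten a part back into its tokens, in order."""
    out: List[Token] = []
    for block in part.blocks:
        out.extend(owned_tokens(block) if isinstance(block, Part) else [block])
    return out


sql = "SELECT o.total, /* net */ name FROM orders o -- alias"
kept, trivia = normalize(tokenize(sql))
stmt = build_select(kept)
```

In this sketch every token in kept appears in exactly one Part, so flattening stmt with owned_tokens gives back the normalized token list unchanged. The builder never sees whitespace or comments, which removes the need for "wait for the next non-comment token" code.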