Perhaps
the trickiest part of data migration is the source application
migration. The largest problem is the large number of different
programming languages, and different methods for accessing
databases in all of those languages.
The source migration piece of DATADrone5 is, again very modular.
Users indicate the type of source from the list of available
modules. The module then opens the files associated with the
specific type selected. The location of the customer application
code is located and DATADrone5 finds applicable pieces of
code that need to be changed via a two pass analysis and makes
those changes. DATADrone5 then converts both the language
specific code and the SQL; writing the new source out to a
new file prefixed with “2oS_” for easy identification.
Source code migration is accomplished through a process that
searches the source code looking for pieces that interact
with the database by identification of “reserve”
words of the source “SQL” language and extensions
native to the source implementation of SQL. After it identifies
those pieces, it then goes through a process of parsing the
piece, breaking it down into a syntax tree. The individual
elements of the syntax tree are then translated into their
corresponding target elements for the system that the code
is being translated to. Then the syntax tree is put back together
into a statement table, and the new statements replace the
original in the current source code.
For instance, to connect to a database in Informix ESQL/C,
the following syntax is used:
EXEC SQL CONNECT TO 'migrate_from' WITH CONCURRENT TRANSACTION;
DATADrone5 is able to identify this piece of source to convert
because ESQL/C statements begins with 'EXEC SQL' and ends
with a ';'. After identifying an applicable piece of source
code to convert, we need to parse the statement into a syntax
tree. The 'EXEC SQL' and ';' are placed in branches that are
tagged as the starting and ending portions of the statement.
The CONNECT TO ' is placed in a branch that is tagged as the
statement. The contents of 'migrate_from' are placed in a
branch that is identified as the database name. The section
'WITH CONCURRENT TRANSACTION’ is stored in a branch
which identifies it as the type of lock-block this connection
should have.
Now that we have a syntax tree for the ESQL/C statement, we
can take the individual pieces and reassemble them into the
equivalent Embedded SQL statements for the target. In this
example, the only portions that have to be modified are the
name of the database what we are connecting to, and the equivalent
statement for the type of lock-block identified in the ESQL/C
statement. The resulting statement looks like this for DB2
UDB:
EXEC SQL CONNECT TO 'migrate_to' IN SHARE MODE;
The above is a very simple example and really only demonstrates
how DATADrone5 converts code that is specific to ESQL/C. The
module for converting SQL queries between databases is shared
by each of the language specific modules. Its responsibility
is to convert any code that is tagged as an SQL query from
one database to the other. This has been built into a separate
module because the code that converts SQL queries is language
independent.
An example of an SQL query that would require migration
is:
SELECT
cust_id,
DECODE(SUBSTR(cust_type, 0, 1), 'A', 'a type', 'B', 'b type',
'C', 'c type', 'default')
FROM customers;
The important part of this statement is the DECODE. DB2 UDB,
for example does not have a DECODE, but other engines do.
The DB2 equivalent is the CASE statement. Functionally, the
parser is going to break down the query into a syntax tree
in like manner as with the ESQL/C statement above. The 'SELECT'
will be placed on a branch that will be tagged identifying
this as a select query. The database name 'customers' will
be stored on a branch tagged as the table name. The two fields
will get their own branches, identified by both the fact that
they are fields to be returned by the select query and the
order that they are presented in. The DECODE field will have
additional branches for each of the parameters passed to it.
SUBSTR is different though. As a statement that does not need
to be translated, it will not have separate for its parameters.
Once the entire query has been parsed into a syntax tree
the module can go to work putting it back together as a functional
DB2 query. The only changes to this example is converting
the DECODE into a CASE, and that conversion is performed once
we have the syntax tree. Simply, each of the parameters passed
to the DECODE are placed in their appropriate locations in
the CASE.
SELECT
cust_id,
CASE SUBSTR(cust_type, 0, 1)
WHEN 'A' THEN 'a type'
WHEN 'B' THEN 'b type'
WHEN 'C' THEN 'c type'
ELSE 'default'
END
FROM customers;
In conclusion, the source code migration basically happens
in three steps. First, we parse identified source code looking
for applicable portions of the code to migrate. Then, we generate
a syntax tree for the applicable code. If that code contains
an SQL query, or is an SQL query, then the SQL query is parsed
into an SQL syntax tree. Finally, we take the syntax trees,
convert any of the statements in them to their migrated equivalents,
and then put them back together again.
|