Perhaps
the trickiest part of data migration is the source migration.
The largest problem is the large number of different programming
languages, and different methods for accessing databases in
all of those languages.
The source migration piece of DataDrone3 is very modular.
Users indicate the type of source from the list of available
modules. The module then opens the files associated with the
specific type selected. The location of the customer application
code is located and DataDrone3 finds applicable pieces of
code that need to be changed via a two pass analysis and makes
those changes. DataDrone3 then converts both the language
specific code and the SQL then writing the new source out
to a new file prefixed with “DD3_” for easy identification.
Source code migration is accomplished through a process that
searches the source code looking for pieces that interact
with the database. After it identifies those pieces, it then
goes through a process of parsing the piece, breaking it down
into a syntax tree. The individual elements of the syntax
tree are then translated into their corresponding elements
for the system that the code is being translated into. Then
the syntax tree is put back together into a statement, and
the new statement replaces the original one in the current
source code.
For instance, to connect to a database in Informix ESQL/C,
the following syntax is used:
EXEC SQL CONNECT TO 'migrate_from' WITH CONCURRENT TRANSACTION;
DataDrone3 is able to identify this piece of source to convert
because the vast majority of ESQL/C statements begins with
'EXEC SQL' and ends with a ';'. After identifying an applicable
piece of source code to convert, we need to parse the statement
into a syntax tree. The 'EXEC SQL' and ';' are placed in branches
that are tagged as the starting and ending portions of the
statement. The 'connect to' is placed in a branch that is
tagged as the statement. The contents of 'migrate_from' are
placed in a branch that is identified as the database name.
'WITH CONCURRENT TRANSACTION', is stored in a branch which
identifies it as the type of lock-block this connection should
have.
Now that we have a syntax tree for the ESQL/C statement,
we can take the individual pieces and reassemble them into
the equivalent Embedded SQL statements. In this example, the
only portions of the example that have to be modified are
the name of the database what we are connecting to, and the
equivalent statement for the type of lock-block identified
in the ESQL/C statement. The resulting statement looks like
this:
EXEC SQL CONNECT TO 'migrate_to' IN SHARE MODE;
The above is a very simple example and really only demonstrates
how DataDrone3 converts code that is specific to ESQL/C. The
module for converting SQL queries between databases is shared
by each of the language specific modules. Its responsibility
is to convert any code that is tagged as an SQL query from
one database to the other. This has been built into a separate
module because the code that converts SQL queries is language
independent.
An example of an SQL query that would require migration is:
SELECT
cust_id,
DECODE(SUBSTR(cust_type, 0, 1), 'A', 'a type', 'B', 'b type',
'C', 'c type', 'default')
FROM customers;
The important part of this statement is the DECODE. DB2 does
not have a DECODE, but Informix does. The DB2 equivalent is
the CASE statement. Functionally, the parser is going to break
down the query into a syntax tree in like manner as with the
ESQL/C statement above. The 'SELECT' will be placed on a branch
that will be tagged identifying this as a select query. The
database name 'customers' will be stored on a branch tagged
as the table name. The two fields will get their own branches,
identified by both the fact that they are fields to be returned
by the select query and the order that they are presented
in. The DECODE field will have additional branches for each
of the parameters passed to it. SUBSTR is different though.
As a statement that does not need to be translated, it will
not have separate for its parameters.
Once the entire query has been parsed into a syntax tree
the module can go to work putting it back together as a functional
DB2 query. The only changes to this example is converting
the DECODE into a CASE, and that conversion is performed once
we have the syntax tree. Simply, each of the parameters passed
to the DECODE are placed in their appropriate locations in
the CASE.
SELECT
cust_id,
CASE SUBSTR(cust_type, 0, 1)
WHEN 'A' THEN 'a type'
WHEN 'B' THEN 'b type'
WHEN 'C' THEN 'c type'
ELSE 'default'
END
FROM customers;
In conclusion, the source code migration basically happens
in three steps. First, we parse identified source code looking
for applicable portions of the code to migrate. Then, we generate
a syntax tree for the applicable code. If that code contains
an SQL query, or is an SQL query, then the SQL query is parsed
into an SQL syntax tree. Finally, we take the syntax trees,
convert any of the statements in them to their migrated equivalents,
and then put them back together again.
|