Perhaps the trickiest part of data migration is the source application migration. The largest problem is the large number of different programming languages, and different methods for accessing databases in all of those languages.
The source migration piece of DATADrone4 is, again very modular. Users indicate the type of source from the list of available modules. The module then opens the files associated with the specific type selected. The location of the customer application code is located and DATADrone4 finds applicable pieces of code that need to be changed via a two pass analysis and makes those changes. DATADrone4 then converts both the language specific code and the SQL; writing the new source out to a new file prefixed with “2oS_” for easy identification.
Source code migration is accomplished through a process that searches the source code looking for pieces that interact with the database by identification of “reserve” words of the source “SQL” language and extensions native to the source implementation of SQL. After it identifies those pieces, it then goes through a process of parsing the piece, breaking it down into a syntax tree. The individual elements of the syntax tree are then translated into their corresponding target elements for the system that the code is being translated to. Then the syntax tree is put back together into a statement table, and the new statements replace the original in the current source code.
For instance, to connect to a database in Informix ESQL/C, the following syntax is used: EXEC SQL CONNECT TO 'migrate_from' WITH CONCURRENT TRANSACTION;
DATADrone4 is able to identify this piece of source to convert because ESQL/C statements begins with 'EXEC SQL' and ends with a ';'. After identifying an applicable piece of source code to convert, we need to parse the statement into a syntax tree. The 'EXEC SQL' and ';' are placed in branches that are tagged as the starting and ending portions of the statement. The CONNECT TO ' is placed in a branch that is tagged as the statement. The contents of 'migrate_from' are placed in a branch that is identified as the database name. The section 'WITH CONCURRENT TRANSACTION’ is stored in a branch which identifies it as the type of lock-block this connection should have.
Now that we have a syntax tree for the ESQL/C statement, we can take the individual pieces and reassemble them into the equivalent Embedded SQL statements for the target. In this example, the only portions that have to be modified are the name of the database what we are connecting to, and the equivalent statement for the type of lock-block identified in the ESQL/C statement. The resulting statement looks like this for DB2 UDB:
EXEC SQL CONNECT TO 'migrate_to' IN SHARE MODE;
The above is a very simple example and really only demonstrates how DATADrone4 converts code that is specific to ESQL/C. The module for converting SQL queries between databases is shared by each of the language specific modules. Its responsibility is to convert any code that is tagged as an SQL query from one database to the other. This has been built into a separate module because the code that converts SQL queries is language independent.
An example of an SQL query that would require migration is:
DECODE(SUBSTR(cust_type, 0, 1), 'A', 'a type', 'B', 'b type', 'C', 'c type', 'default')
The important part of this statement is the DECODE. DB2 UDB, for example does not have a DECODE, but other engines do. The DB2 equivalent is the CASE statement. Functionally, the parser is going to break down the query into a syntax tree in like manner as with the ESQL/C statement above. The 'SELECT' will be placed on a branch that will be tagged identifying this as a select query. The database name 'customers' will be stored on a branch tagged as the table name. The two fields will get their own branches, identified by both the fact that they are fields to be returned by the select query and the order that they are presented in. The DECODE field will have additional branches for each of the parameters passed to it. SUBSTR is different though. As a statement that does not need to be translated, it will not have separate for its parameters.
Once the entire query has been parsed into a syntax tree the module can go to work putting it back together as a functional DB2 query. The only changes to this example is converting the DECODE into a CASE, and that conversion is performed once we have the syntax tree. Simply, each of the parameters passed to the DECODE are placed in their appropriate locations in the CASE.
CASE SUBSTR(cust_type, 0, 1)
WHEN 'A' THEN 'a type'
WHEN 'B' THEN 'b type'
WHEN 'C' THEN 'c type'
In conclusion, the source code migration basically happens in three steps. First, we parse identified source code looking for applicable portions of the code to migrate. Then, we generate a syntax tree for the applicable code. If that code contains an SQL query, or is an SQL query, then the SQL query is parsed into an SQL syntax tree. Finally, we take the syntax trees, convert any of the statements in them to their migrated equivalents, and then put them back together again.