Friday, March 19, 2010

How to do database migration testing/ ETL testing effectively and quickly?

My earlier post, How to do real database testing (10 tips to perform serious database tests)?, turned out to be quite popular. You should know about database migration testing too. View my video on Database Migration Testing/ ETL testing (the volume is a bit low so, if needed, please turn Subtitles on by clicking Cc in the YouTube player) or read on...

Database migration testing is needed when you move data from the old database(s) to a new database. The old database is called the legacy database or the source database and the new database is called the target database or the destination database. Database migration may be done manually but it is more common to use an automated ETL (Extract-Transform-Load) process to move the data. In addition to mapping the old data structure to the new one, the ETL tool may incorporate certain business-rules to increase the quality of data moved to the target database.

Now, the question arises regarding the scope of your database migration testing. Here are the things that you may want to test.
1. All the live (not expired) entities e.g. customer records, order records are loaded into the target database. Each entity should be loaded just once i.e. there should not be a duplication of entities.
2. Every attribute (present in the source database) of every entity (present in the source database) is loaded into the target database.
3. All data related to a particular entity is loaded in each relevant table in the target database.
4. Each required business rule is implemented correctly in the ETL tool.
5. The data migration process performs reasonably fast and without any major bottleneck.

Next, let us see the challenges that you may face in database migration testing.
1. The data in the source database(s) changes during the test.
2. Some source data is corrupt.
3. The mappings between the tables/ fields of the source databases(s) and target database are changed by the database development/ migration team.
4. A part of the data is rejected by the target database.
5. Due to the slow database migration process or the large size of the source data, it takes a long time for the data to be migrated.

The test approach for database migration testing consists of the following activities:

I. Design the validation tests
In order to test database migration, you need to use SQL queries (created either by hand or using a tool e.g. a query creator). You need to create the validation queries to run against both the source as well as the target databases. Your validation queries should cover the scope defined by you. It is common to arrange the validation queries in a hierarchy e.g. you want to test if all the Orders records have migrated before you test for all OrderDetails records. Put logging statements within your queries for the purpose of effective analysis and bug reporting later.

II. Set up the test environment
The test environment should contain a copy of the source database, the ETL tool (if applicable) and a clean copy of the target database. You should isolate the test environment so that it does not change externally.

III. Run your validation tests
Depending on your test design, you need not wait for the database migration process to finish before you start your tests.

IV. Report the bugs
You should report the following data for each failed test:
    a. Name of the entity that failed the test
    b. Number of rows or columns that failed the test
    c. If applicable, the database error details (error number and error description)
    d. Validation query
    d. User account under which you run your validation test
    e. Date and time the test was run

Keep the tips below in mind to refine your test approach:

1. You should take a backup of the current copies of the source and target databases. This would help you in case you need to re-start your test. This would also help you in reproducing any bugs.
2. If some source data is corrupt (e.g. unreadable or incomplete), you should find out if the ETL tool takes any action on such data. If so, your validation tests should confirm these actions. The ETL tool should not simply accept the corrupt data as such.
3. If the mappings between the tables/ fields of the source and target databases are changed frequently, you should first test the stable mappings.
4. In order to find out the point of failure quickly, you should create modular validation tests. If your tests are modular, it may be possible for you to execute some of your tests before the data migration process finishes. Running some tests while the data migration process is still running would save you time.
5. If the database migration process is manual, you have to run your validation queries externally. However, if the process uses an ETL tool, you have the choice to integrate your validation queries within the ETL tool.

I hope that you are comfortable with the concept of database migration testing. (whether  data is migrated between binary files and an RDBMS or between RDBMSs (Oracle, SQL Server, Informix or Sybase)).