A Little Background –
IBM Change Data Capture is part of the IBM Infosphere suite and is mainly used as part of the ETL process. It has great transformation and data replication capabilities and supports most of the enterprise databases products.
For replication, CDC captures changed data directly from database logs rather than querying the database which makes it quite efficient. It provides near real time replication but that is dependent on a lot of parameters like network latency, I/O, etc.
Since CDC has such great replication capabilities, its application can be more just the ETL process.
The Use Case –
In one such case, we are using CDC to replicate Maximo database to create replicas of Maximo for different sites. Using CDC, we are replicating just the site specific data to the replicas which helps in keeping the database size low compared to main data center.
These replicas act as standby instances of Maximo which helps maintain high availability of Maximo at site for critical assets.
Bidirectional replication is enabled to sync back data from replicas to data center in case standby instance is used as primary node due to unavailability of the data center Maximo.
The Problem Statement –
CDC replicates data very efficiently but when it comes to schema changes, its a problem. CDC does has the provision to replicate schema changes but it comes with a lot of limitations. I’ll discuss those limitations in detail for DB2 database but before that lets see what these changes are.
When a bug fix, enhancement or ifix is applied to Maximo, these have to be applied to the Maximo replicas as well. These patches may include schema changes like column length change or new stored procedure, triggers etc.
Let say we have 20 sites and we maintain a standby instance of Maximo for each of the sites. Applying these patches to all 20 instances would be a time consuming activity and would require a large amount of production downtime.
For CDC to replicate data, schema definition at both source and destination databases must be identical. So these patch deployments on all instances becomes a necessity.
We could try to replicate schema through CDC. Here are the considerations and limitations of replication schema through CDC for DB2 for LUW ver. 11.3.0–
Consideration –
The following InfoSphere CDC issues should be taken into consideration before you attempt DDL replication:
- A table targeted for DDL replication cannot be involved in any other InfoSphere CDC table mapping. That is to say, you cannot mirror from two different source tables to a single target table.
- Conflict Detection and Resolution is not supported for DDL replication.
- Differential refresh and Refreshing a Subset of Rows are not supported for tables for which DDL operations are being replicated.
- Derived columns and derived expressions are not supported for tables for which DDL operations are being replicated.
- LOB columns are selected from the database at the time of replication using the key or unique index (if any) associated with the source table. Therefore, only the current image of a LOB column field in a source table will be sent at the time of replication. If latency is present for a subscription that is replicating DDL operations and there are changes to the list of columns which make up the key used for searching, the target column may contain a null value until the next DML change on that row. If latency is present and the key of the row changes, the target column may contain a null or incorrect value
- Bidirectional replication is not supported for DDL replication.
- When InfoSphere CDC encounters certain object types that cannot be replicated, such as UDTs (user-defined columns), the table will be parked. You will need to determine if the unsupported table is essential to your replication solution. If you decide that it is not essential, you should modify your rule set to exclude the table. If you determine that the table is essential, the table will have to be dropped, re-created and its structure changed in order to be supported for DDL replication.
Limitations –
The following types of DDL changes can be replicated by InfoSphere CDC for DB2 –
- CREATE TABLE
- DROP TABLE
- ALTER TABLE ADD COLUMN
- ALTER TABLE ALTER COLUMN SET DATA TYPE
Table-related objects for which DDL replication is NOT supported by InfoSphere CDC for DB2 –
- Views
- Synonyms
- Triggers
- Materialized query tables
- Tables containing user-defined types
Database-related objects for which DDL replication is NOT supported by InfoSphere CDC for DB2 –
- Functions
- Stored procedures
- Packages
- Java classes
- Database links
- Roles
- Directories
- Dimensions
- Libraries
- Profiles
- Users
- Sequences
- Tablespaces
- Schemas
Conclusion –
With all these limitations, replication of schema through CDC is not a viable solution. A better approach would be to use deployment scripts for patch deployment on all the instances.
If you have any comments or opinions, do write them in the comments section.
Have a great day!