Nowadays, organizations deal with heterogeneous information that must be managed and maintained. Given the departmental structure of many organizations, each department manages its own data sources, so organizational data ends up spread across multiple systems. This situation often produces data inconsistencies, missing information, and duplicated, non-normalized data. These data quality problems can lead to inefficient decision making; as a consequence, the organization may waste resources or reduce its profits. This is exactly the situation in the client's office today.
Client management needs to be informed about the business situation through reports at appropriate levels. To achieve this, the IT team needs to access data held at various locations, each with different infrastructure supporting the software products used there.
For example, consider an organization whose commercial department needs to analyse its customers' invoicing behaviour in order to assess the timeliness of payments. The system used for collecting this information is an ERP (Enterprise Resource Planning) system, whose function is to integrate information from multiple business processes such as sales, marketing, production and human resources.
Consider as well the marketing department, which needs more specific information about customers. The system used here is a CRM (Customer Relationship Management) system; it aims at managing information about customers and improving the relationship between them and the organization. In fact, it provides information to coordinate all business processes that deal with customers with regard to sales and marketing campaigns. The two data sources are represented in the figure below as tables named Client_COM and Client_MKT, respectively. Both the data schemas and some data instances are shown.
In the figure above, we observe that the information is not consistent across the two sources. In fact, the zip code of client “Anas” in the Client_COM table (on the left) does not match the corresponding value stored in the Client_MKT table (on the right). Moreover, there are customers whose information is kept in only one of the systems. For example, the information about customer “Ruby” is stored only in the Client_MKT table. These data quality problems may have a big impact on the organization. In particular, we identify the following three important consequences of the existence of data quality problems.
First, a client may not be considered in a situation where he should be. Given the existence of a marketing campaign tailored to each customer's needs, the client “Philip” may be unaware of the campaign since there is no record referring to him in the Client_MKT table. He would therefore miss a campaign that could possibly increase his satisfaction with the services of the organization.
Second, performing an action based on inconsistent and outdated information can harm the organization. An example that illustrates this is the case in which the client “Maria” changes her address to “Ram Nagar, Coimbatore” and communicates this change to the organization. The new address is registered within the commercial department. Unaware of this update, the marketing department keeps sending advertisements to the client's old address, thus wasting resources.
Third, a client may be contacted several times regarding the same subject because the information is replicated in both systems, which can lead to customer dissatisfaction. This might occur with the client “Maria” during a campaign whose goal is to contact all the existing customers recorded in the Client_MKT and Client_COM tables.
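Inconsistencies of this kind can be detected mechanically. The sketch below is a minimal illustration, assuming hypothetical record layouts for the two tables (client id as key, with name and zip fields); it reports mismatched zip codes and clients present in only one source.

```python
# Minimal sketch: detect inconsistencies between two client tables.
# The field names (name, zip) and the sample rows are illustrative assumptions.

client_com = {1: {"name": "Anas", "zip": "641001"},
              2: {"name": "Maria", "zip": "641012"},
              3: {"name": "Philip", "zip": "641006"}}

client_mkt = {1: {"name": "Anas", "zip": "641099"},   # zip differs from Client_COM
              2: {"name": "Maria", "zip": "641012"},
              4: {"name": "Ruby", "zip": "641020"}}   # present only in Client_MKT

def audit(com, mkt):
    """Return (ids with mismatched zip, ids only in com, ids only in mkt)."""
    shared = com.keys() & mkt.keys()
    mismatched = sorted(i for i in shared if com[i]["zip"] != mkt[i]["zip"])
    only_com = sorted(com.keys() - mkt.keys())
    only_mkt = sorted(mkt.keys() - com.keys())
    return mismatched, only_com, only_mkt

mismatched, only_com, only_mkt = audit(client_com, client_mkt)
print(mismatched, only_com, only_mkt)  # [1] [3] [4]
```

Such an audit does not fix the problems, but it quantifies them, which is useful before choosing an integration approach.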
What we need
Several existing techniques provide partial solutions for the data quality problems identified above.
We highlight the following two:
(i) Data integration and cleaning in a data warehousing architecture or
(ii) Virtual Data Integration.
The first type of technology consists of a data integration and cleaning (ETL) tool that transforms data coming from heterogeneous sources and produces cleaned target data. If the target is a single repository supporting data analysis, it is called a data warehouse. The main problem associated with this approach is that the source systems remain independent of any data repository produced, so the data quality problems existing in the sources will continue to exist. To eliminate them, the organization's business processes must use only the data stored in the integrated repositories.
If the target data repository is a data warehouse, a process called data refreshment updates it according to the modifications occurring in the data sources. Data refreshment is usually performed periodically (e.g., daily or weekly); where business processes need constantly updated information, it is performed continuously, so that the integrated repository is updated as soon as changes occur in the source systems.
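The extract-transform-load cycle described above can be sketched as follows. This is a minimal illustration only: sqlite3 in-memory databases stand in for the real source systems and warehouse, and the table names, schemas and conflict rule (commercial data wins) are assumptions.

```python
import sqlite3

# Sketch of an ETL refresh: extract from two sources, clean, load into a
# warehouse table. Schemas and the in-memory databases are assumptions.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE client_com (id INT, name TEXT, zip TEXT);
    CREATE TABLE client_mkt (id INT, name TEXT, zip TEXT);
    INSERT INTO client_com VALUES (1, ' Anas ', '641001'), (2, 'Maria', '641012');
    INSERT INTO client_mkt VALUES (1, 'Anas', '641099'), (4, 'Ruby', '641020');
""")

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_client (id INT PRIMARY KEY, name TEXT, zip TEXT)")

def refresh():
    # Extract the union of both sources; clean by trimming names. The
    # commercial system is (arbitrarily) treated as authoritative on conflicts.
    rows = {}
    for table in ("client_mkt", "client_com"):   # com processed last, so it wins
        for cid, name, zc in src.execute(f"SELECT id, name, zip FROM {table}"):
            rows[cid] = (cid, name.strip(), zc)
    # Load: replace the warehouse contents with the cleaned snapshot.
    dw.execute("DELETE FROM dim_client")
    dw.executemany("INSERT INTO dim_client VALUES (?, ?, ?)", rows.values())
    dw.commit()

refresh()
print(sorted(dw.execute("SELECT id, name, zip FROM dim_client")))
```

Running `refresh()` periodically corresponds to the data refreshment process; the warehouse is clean, but the sources themselves are untouched, which is exactly the limitation noted above.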
Our Situation: This is the technique / technology we shall use for our planned requirement.
The second type, virtual data integration, is a technique in which a virtual unified data view is defined over a set of data sources. Data is kept only in the source systems. The virtual integration architecture comprises a mediator (which exports a global data schema) and one wrapper for each data source (which exports a source schema). The mediator keeps the mapping between its schema and the schemas of the data sources. A query posed to the mediator is decomposed into as many queries as there are data sources able to answer the query totally or partially. Each wrapper is responsible for translating the initial query into a partial query understood by its data source, then translating the result and forwarding it back to the mediator. Finally, the mediator integrates the responses of the wrappers and produces the result of the query originally posed. Although it is a cheap, fast and non-intrusive method for data integration, virtual data integration does not solve all the problems mentioned in the motivating example. In fact, similarly to the previous data integration and cleaning approach, the organization's business processes must be modified so that data stored in the sources is accessed via the mediator. In addition, this solution does not solve the data quality problems.
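For completeness, the mediator / wrapper interaction can be sketched as below. All class, field and variable names are illustrative assumptions; a real mediator would also rewrite each sub-query against the schema of its source.

```python
# Sketch of virtual data integration: a mediator decomposes a query over
# wrappers, and each wrapper answers from its own source.

class Wrapper:
    """Translates mediator queries for one data source (here, a list of rows)."""
    def __init__(self, rows):
        self.rows = rows                      # stands in for a real data source
    def query(self, name):
        return [r for r in self.rows if r["name"] == name]

class Mediator:
    """Exports a global schema and integrates the wrappers' answers."""
    def __init__(self, wrappers):
        self.wrappers = wrappers
    def query(self, name):
        results = []
        for w in self.wrappers:               # one sub-query per data source
            results.extend(w.query(name))
        return results                        # integrated answer

com = Wrapper([{"name": "Maria", "zip": "641012"}])
mkt = Wrapper([{"name": "Maria", "zip": "641012"}, {"name": "Ruby", "zip": "641020"}])
mediator = Mediator([com, mkt])
print(mediator.query("Ruby"))   # answered only by the marketing source
```

Note that the query for “Maria” would return two rows, one per source: the mediator integrates answers but, as stated above, does not by itself resolve duplicates or inconsistencies.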
Our Situation: This technique / technology will not do what we expect because we are not planning to modify the existing databases.
Three different approaches:
1. Classic client/server approach: In this approach we do not put any database in the units; the applications simply access the central server.
Pros: It is the way databases were originally conceived.
Cons: With high latency, complex operations can become slow, and stored procedures may be needed to reduce the number of round trips.
Our Situation: This approach will not meet our expectations since the databases are different in each unit.
2. Replicated local databases: In this approach each unit keeps a local database, and changes are propagated between the units and the server.
Pros: Better responsiveness, as read operations are entirely local and write operations are propagated asynchronously.
Cons: Somewhat more complex to set up (especially with several units / locations), and a breach in any unit's database could potentially compromise the whole system.
Our Situation: This approach will not meet our expectations since we cannot modify the databases at the different units; custom-developed software is already using them.
3. Database Synchronization: Do all work locally and, from time to time (once an hour, daily, weekly, whatever), write a summary with all new / modified records since the last sync operation and send it to the server.
Pros: Can work over an unreliable network; fast; easy to check (if the summary is readable).
Cons: We don't have real-time information.
Our Situation: This approach may be the right one for our requirement.
3.1 One-Way or Unidirectional Synchronization
This form of synchronization consumes the data from the different locations but updates / inserts it in only a single database. The change therefore occurs only in the database where it is inserted / updated. This is the approach we are planning for.
The advantages of this are:
• New and updated records are copied from the source to the destination entity.
• New records added to the source entity are copied to the destination entity.
• Records deleted in the source are also deleted from the destination entity.
• Updated records in the source replace older records in the destination.
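A minimal sketch of these one-way rules, assuming source and destination are simple keyed record sets, might look like this:

```python
# Sketch of one-way synchronization: changes flow from the source database
# to the destination only. The record shapes are illustrative assumptions.

def sync_one_way(source, destination):
    """Make destination an exact copy of source (insert / update / delete)."""
    for key, record in source.items():
        if destination.get(key) != record:
            destination[key] = record        # copy new and updated records
    for key in list(destination.keys() - source.keys()):
        del destination[key]                 # remove records deleted at source

source = {1: ("Anas", "641001"), 2: ("Maria", "641012")}
destination = {1: ("Anas", "641099"), 3: ("Philip", "641006")}

sync_one_way(source, destination)
print(destination)
```

After the call, the destination holds exactly the source's records: the stale zip for client 1 is overwritten, client 2 is inserted, and client 3 (deleted at the source) is removed.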
3.2 Two-Way or Bidirectional Synchronization
This form of synchronization consumes the data from the different locations and updates / inserts it in both databases, so a change occurs in both databases. This approach is not what we are looking for.
After the initial discussion between the Salzer IT organization and the IT heads, the following goals were derived. Some of them are key criteria:
a) Reduce Operating Expense / Total Cost
The current version of the database platform is customized according to the needs of each location, so altering / modifying the database objects / platform would be tedious and complex because many custom applications have already been written and are in use. Since they meet their purpose, there is no need to re-invent the wheel; instead, we consume the required inputs from the locations to generate robust reports.
b) Risk Proof the Business
No clear roadmap has been laid out to enforce a standard database design across the various locations, which poses a risk. So, instead of spending a lot of effort stabilizing the database design, a better approach is to build a master database that simply communicates with the databases at the various units, consumes the required inputs and populates them into the master database. The only requirement is to grant the service access to communicate with the databases; for this, we need to give permission (allow access) on the SQL ports.
c) Standardize Database Technology
Many database versions with numerous tables across the various IT locations add complexity to fulfilling the requirement. It would therefore be better to build a Central Data Source and use only the tables that are mandatory to generate the reports.
d) Improve Operational Efficiency
The databases at the various locations consume ever more disk storage for database objects, which leads to backups consuming a significant amount of valuable resources and to high CPU utilization when executing queries against the databases. Managing objects across numerous databases also adds effort overhead and increases response time for core business functions; the master database approach described above avoids this.
e) Gain Competitive Business Advantages / Increase Business Value
Processing the data that is transferred / synced from the different locations into the master database and generating analytics reports will help management see where the business stands and predict future business.
f) Achieve Interoperability and Internationalization
The underlying databases used by the applications at the various locations pose challenges in integrating with the databases of other locations. The proposed approach is expected to support global business operations by using a single unified version of the database.
At a very high level, the data sync / migration process involves the following steps.
a) Scope identification – identify the data objects and data, for each object that needs to be migrated
b) Data Mapping – Map the data from the source to target object. In case source and target have different data models, transformation and mapping would be essential for migration
c) Option selection – identify the migration option suitable, as per system needs such as the time taken and target DB as well as the data needs like transformation and volume
d) Migration / Sync – Perform data migration / sync to the destination system using the selected solution
e) Validation – perform audits, validations and acceptance tests to validate and certify the data at the destination.
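Step (b), data mapping, can be illustrated with a small sketch. The column names and the mapping table below are purely hypothetical; the real mapping would be derived during scope identification.

```python
# Sketch of the data-mapping step: source fields are renamed (and could be
# transformed) into the target model. The mapping itself is an assumption.

FIELD_MAP = {          # source column -> target column
    "cust_name": "name",
    "pin": "zip",
}

def transform(source_row):
    """Map one source record onto the target schema."""
    return {target: source_row[src] for src, target in FIELD_MAP.items()}

row = {"cust_name": "Maria", "pin": "641012"}
print(transform(row))  # {'name': 'Maria', 'zip': '641012'}
```

When source and target data models differ more substantially, the `transform` function is also where type conversions and value normalization would be applied.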
For the one-way data sync process (data sync is essentially the synchronization of data from one system to another), we will develop a Database Synchronization Service (DSS) to perform data movement across systems. The Database Synchronization Service is the software service we plan to develop to keep one or more secondary databases synchronized with the primary database. Beyond the data sync process, we may also need data migration, data replication and data backup / recovery solutions, since these also provide the capability of copying data from one system to another.
The planned export / import mechanism offers the following benefits:
• Enables very high-speed movement of bulk data and metadata
• Simplest approach to export objects from source to target
• Performed as a server-side job and hence more efficient (unlike legacy exp/imp, where the dump file (SQL queries) is created at the client location)
• Centralized data center consolidating information from remote sites; useful for any industry, e.g. central order processing
• Useful for multiple units serving different types of customers
• Data feeds to an operational data store / data warehouse, supporting operational business
• Being a server-side job, parallel processing can be used to make data export and import faster
• Automatically tunes the export and import processes; tuning parameters like BUFFER, COMMIT, COMPRESS, DIRECT and RECORDLENGTH are not required
This stage has ten steps, with some significant milestones in the overall database sync program.
The steps are:
I) Infrastructure Setup and Requirements Gathering
II) Baseline Testing and Detailed Solution Design
III) Database Migration
IV) Application Remediation / Porting / Migration
V) Unit Testing
VI) System and Integration Testing
VII) Regression Testing
VIII) Performance Testing
IX) Retrofitting
X) User Acceptance Testing
i) Infrastructure Setup and Requirements Gathering
The steps involved in this process are
a) Baseline the inventory of application components and database objects
b) Gather non-functional requirements (performance, availability, scalability)
c) Plan and provide the infrastructure (e.g., desktops, virtual desktops, thin clients, development tools, testing tools, connectivity drivers, etc.) to the project team
d) Plan for the capacity and build the infrastructure for all the environments
e) Provide access to the project team and testing team to perform baseline testing
ii) Baseline Testing and Detailed Solution Design
The steps involved in this process are
a) Perform functional and performance baseline testing of the application
b) Define detailed solution design covering the database layout, database security, data
replication / synchronization, disaster recovery, configuration layout and setup, and
equivalents for the unsupported database features of the target platform
c) Define detailed approach and methodology for porting / converting / remediating
application components, migrating stored procedures and triggers, migrating schema and
data, comparing data for verification and validation
d) Document production roll-out / cut-over / go-live process in detail
e) Analyze and group the existing application interface and report generation components
based on technology stack, business functionality, etc.
f) Analyze and group the database objects such as stored procedures and triggers based on their dependencies and the built-in SQL operations performed on unsupported column data types
g) Analyze other database objects such as unsupported column data types, data compression,
and internal data storage representation aspect such as Unicode impact and code page
specifics of data, etc.
h) Perform proof of concept to evaluate and select the tools / utilities to perform stored
procedure migration, SQL migration, data object migration, schema migration, data
masking / scrambling, data validation, data extraction / transformation / loading,
performance testing and load testing.
iii) Database Migration
The steps involved in this process are
a) Configure tools and build the required scripts and utilities for the database migration tasks
b) Execute the database migration tasks in the following sequence:
a. Database schema migration – migration of database objects such as databases, table spaces, tables, views, indices and constraints (primary key, foreign key, unique)
b. Migration of table structure with equivalent data types in the target database
platform for the unsupported column data types
c. Database security setup migration
d. Data migration
e. Data reconciliation
f. Data comparison and validation
g. Database housekeeping processes and scripts migration (backup, defrag, etc.)
At the end of the database migration step, the project team will have fully tested processes, procedures and package builds for the entire database migration program.
iv) Application Remediation / Porting / Migration
The steps involved in this process are
a) Configure tools and build the scripts and procedures for application remediation.
b) Group the applications based on business functions and execute the application
remediation / porting tasks in the recommended sequence:
a. Migration of stored procedures, triggers, business user queries and reports
b. Remediation of front-end application components (developed in languages such as VB, PHP, JSP, etc.)
c. Porting of back-end application components (developed in scripting languages such as Perl, shell, etc.)
d. Remediation of report components (developed using products such as Crystal Reports, Cognos, Business Objects, etc.)
e. Remediation of interface application components.
v) Unit Testing
a) Unit testing ensures that the individual application components (stored procedures, triggers, front-end application components, back-end application components, report components, interface components, queries) work as expected post remediation / conversion / porting / migration. Unit testing will usually be performed by the development migration team to detect and fix defects during component migration. The key documentation artifacts are the test cases, test scenarios, test plan, and the testing approach that drives the entire unit testing exercise.
vi) System and Integration Testing
System testing is carried out to verify and certify the end-to-end business functionality of the
migrated IT system. In addition to this, the interface systems will also be tested to prove that the
entire IT system works in a coherent fashion and produces the expected results post conversion /
migration. Automated testing tools and scripts can be leveraged to run the system test cycle
automatically and compare the results of the test.
vii) Regression Testing
Regression testing is carried out using the baseline test cases for the migrated / converted / ported / remediated application components. It compares the results with the baseline test results to:
• Ensure the functional correctness of the components
• Detect and fix the defects in the components if there is any deviation
viii) Performance Testing
Performance testing is carried out to certify that the performance of the target state application is
either equivalent to the current situation or exceeds the defined goals. Performance testing tools
accelerate the performance testing exercise. It is also important to select the right candidates such
as business critical transactions and batch jobs for the scope of the performance testing exercise.
Performance testing needs to be conducted in a dedicated environment that mimics the production environment in terms of environment setup, data volume, server configuration, concurrent users, etc., in order to obtain an accurate picture of the performance behavior of the migrated target system.
ix) Retrofitting
Retrofitting is the key step in applying functional changes to the migrated application components
and schema changes to the migrated database objects (component inventory baseline identified
prior to analysis and design phase) which have already been implemented post baselining
exercise. It is important to have a freeze period for the functional releases during retrofitting
phase so that all the changes are consolidated and applied in the migrated components at one go.
The testing exercise needs to be carried out to certify the functional correctness of the migrated
components post implementation of changes.
x) User Acceptance Testing
User Acceptance Testing (UAT) is executed to validate the functionality of end-to-end business
process, system transactions and business / IT user access to the migrated application system. The
key focus of UAT is to identify and provide fixes for areas where business user requirements are
incorrectly implemented in the system. The objective of the UAT is to obtain sign off from the
business users, indicating that the delivered / migrated application system with new database
technology meets requirements and is fit for production cut-over.
How Synchronization is expected to work
1. Configure / input the required information that is mandatory for the database sync.
2. Check whether a network connection is available.
3. If a network connection is not available, keep checking until it becomes available.
4. If the network connection is available, try to connect to the database as defined in the configuration.
5. If the connection succeeds, perform the sync process.
6. If the connection fails, try to connect to the next database, repeating steps 4 and 5.
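The workflow above can be sketched as follows. The helper functions (network_available, connect, do_sync) are hypothetical stand-ins for the real DSS routines, injected here so the control flow can be shown in isolation.

```python
import time

# Sketch of the synchronization workflow: wait for the network, then try each
# configured database in turn and sync against the first one that connects.

def run_sync(config, network_available, connect, do_sync, poll_seconds=0):
    """Return the name of the database that was synced, or None."""
    while not network_available():           # steps 2-3: wait for the network
        time.sleep(poll_seconds)
    for db in config["databases"]:           # steps 4-6: try each database
        connection = connect(db)
        if connection is not None:           # step 5: connection succeeded
            do_sync(connection)
            return db
    return None                              # no database was reachable

# Illustrative run with stub functions standing in for real DSS routines.
synced = []
result = run_sync(
    {"databases": ["unit_a", "unit_b"]},
    network_available=lambda: True,
    connect=lambda db: None if db == "unit_a" else db,  # unit_a unreachable
    do_sync=synced.append,
)
print(result, synced)
```

In this stubbed run, unit_a fails to connect, so the service falls through to unit_b and syncs there, mirroring step 6 of the workflow.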
We are an integrated development and business promoting firm and consultancy, based in India.
Our top management consists of highly experienced and qualified professionals having minimum of
ten years of specialized experience in different sectors.
Founded in the year 2008 our company has provided business services to a growing list of highly
We take special pride in our skill in providing viable and successful marketing strategies and analysis to clients. In addition, our handling of promotional activities is well known for efficiency and
We propose a Time and Material Model (T & M) for the development of this project. We can assure you that
our time estimates will not vary more than 10% unless there is a vast change in the requirement.
We estimate it would take 60 man-days to develop the Database Synchronization Service (DSS), including analysis, design, planning, coding, unit testing and documentation. Given below is the high-level breakdown; the detailed breakdown is in the “Database Sync Service - App-proposal.xlsx” Excel sheet.
Total Hours: 480
Total Price: 76,800
Once we have received your initial profile, our team of professionals, with representatives from the appropriate sections, will be available to discuss how we can become your effective business partner and take care of your needs.