Recap Best Practices: Part 1
Welcome back! I am revisiting my data integration series, highlighting the data integration best practices I have learned since my junior consultant days; these are the practices I abide by when designing and building data migrations/integrations with Dynamics 365.
My previous blog discussed the following best practices:
- Error reporting
- Integration Accounts
Part 2: Staging Databases
It took several large Data Migration projects to realise that loading large amounts of data directly from the source data set into the target system was not the most practical solution; the size and complexity of the data migration profiles made them very inefficient. For data migrations where merging of one or more source data sets was also required, it was clear this should not be completed in CRM; it often required numerous clear downs until the logic was correct, or post-migration clean-up activities to remove duplicate data. I needed a solution that sat in the middle of source and target with the following properties:
- Ability to merge the data sets.
- Be a familiar platform that is compatible with the data integration tools I use.
- Allow for data transformation to take place before load into CRM.
After discussions with colleagues and reading recommendations from the CRM community, it was clear that a staging area was required. It had never occurred to me to add another component to a data migration; my thinking about data migrations/integrations had always been about moving data from source to target, with no "via" option.
Location, Location, Location
When I first started out with CRM, most of my projects were On-Premise based; customers had numerous on-premise servers (mostly VMs) but also dedicated SQL instances for the various applications they hosted (including CRM).
Additionally, the data integration tools I used were also on premise (i.e. Scribe Insight), so where better to host the new Staging database than in SQL! CRM is built on top of it, it reports from it and so does Scribe; perfect – a Staging Database.
When I talk about staging databases, I refer to a new database which is independent of any data system I am integrating with. This will be either hosted in an On-Premise Microsoft SQL Server instance or Online in Azure SQL.
Tip if using Azure SQL – locate the Staging Database in the same data-center/region as (or as close as possible to) your Dynamics 365 instance to reduce the distance the data has to travel.
So, what are Staging Databases?
Staging Databases are used as temporary storage locations between two (or more) systems; they allow integration consultants to:
- Merge multiple source (and target) data sets whilst de-duplicating the data (see the sketch after this list).
- Perform the standard ETL operations (Extraction of data, Transformation of data and Load of the data into the Staging database tables) so data is in the required format before loading into the target system.
- Provide a complete view of all merged and collated data in a single location before it is loaded into the Target (such as row counts, data volumes and size).
- Allow loading of transformed data into the target system without impacting the Source Systems.
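As an illustration of the merge/de-duplication point, here is a minimal T-SQL sketch; the source tables (src.SystemA_Contacts, src.SystemB_Contacts), the staging table (stg.Contact) and the column names are hypothetical names for this example, not a standard:

```sql
-- Hypothetical example: merge contacts from two source systems into one staging
-- table, keeping only the most recently modified row per email address.
;WITH AllSources AS (
    SELECT Email, FirstName, LastName, ModifiedOn, 'SystemA' AS SourceSystem
    FROM   src.SystemA_Contacts
    UNION ALL
    SELECT Email, FirstName, LastName, ModifiedOn, 'SystemB'
    FROM   src.SystemB_Contacts
),
Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ModifiedOn DESC) AS rn
    FROM   AllSources
)
INSERT INTO stg.Contact (Email, FirstName, LastName, SourceSystem)
SELECT Email, FirstName, LastName, SourceSystem
FROM   Ranked
WHERE  rn = 1;  -- rn = 1 keeps a single, de-duplicated row per email
```

Keeping a SourceSystem column means every staged row can still be traced back to where it came from.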
Why use them?
Staging Databases should be considered when there is a large amount of data (high data volume), or when multiple data sources need to be merged together and a fair amount of merging or transformation is required. Completing these tasks as part of a direct source-to-target integration can massively impact the performance of the integration (and of the data systems) and the quality of the integrated data sets, leaving the integration prone to errors and failure.
Performing all the merging and data transformation in the staging database will increase the performance of the data load into Dynamics 365 (which we all know can take a long time); performance will be discussed later in this series.
Data comes in all shapes, sizes, formats and localisations; throwing this all directly at Dynamics 365 will require significant conversion (e.g. Date Fields, Currencies, related records, Option Set mapping), which will reduce the efficiency of the data load.
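To give a flavour of what such a conversion looks like inside the staging database, here is a small sketch; the stg.Account table and the EstablishedOnRaw/EstablishedOn columns are assumptions for illustration only:

```sql
-- Normalise a dd/mm/yyyy date string into a proper DATETIME before the load;
-- TRY_CONVERT returns NULL instead of failing when the source value is bad.
UPDATE stg.Account
SET    EstablishedOn = TRY_CONVERT(DATETIME, EstablishedOnRaw, 103)  -- style 103 = dd/mm/yyyy
WHERE  EstablishedOnRaw IS NOT NULL;

-- Rows that failed to convert can then be reported on before the CRM load:
SELECT StagingId, EstablishedOnRaw
FROM   stg.Account
WHERE  EstablishedOnRaw IS NOT NULL
AND    EstablishedOn IS NULL;
```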
Do not underestimate the importance of good quality, timely data; integrations that are very slow and frequently error will never be well received by the customer.
Data Integrations require time and investment, and it is important to design a solid, good-quality integration. The same applies when designing and planning Data Migrations; these are underestimated in almost every project, yet they usually cause the longest delays a project will experience.
A couple of years ago I attended a Scribe User Conference in Amsterdam and was talking with one of the European Scribe MVPs, Martin Buebl, about his latest data project (which went very well). He coined the phrase "You will never hear anyone talk about a good data migration, only bad ones". One of the standout reasons his project went so well (barring his expert knowledge of data integrations and the tool) was that he used a staging database to merge together a few systems (and by a few I mean a lot)!
Not every scenario requires them, for example:
Simple integrations, where only a small amount of data is transferred from a single system, may not warrant the time and investment spent building a staging database. The integration may also have a transactional/near real-time requirement, where the addition of a staging component would slow down the end-to-end integration time.
Think about your integration, consider whether the design could benefit from a staging database and discuss it with your clients.
Building the Staging Database
Before you can use the staging database, you first need to define the schema of the Tables and Columns required.
When I design an integration which requires a staging database, I try to re-create the schema of the Dynamics 365 Entities and Fields (and data types) that I will be mapping to inside the staging database. Whilst this may seem overkill and unnecessary to some, there is method to my madness: it makes building the mapping profiles easier, and I rarely find myself having to reach for the data mapping document (which is a future best practice). There is an added bonus if the third-party tool has "Auto Map", where similar or identical column names are mapped automatically (Scribe does!).
I also learned to create (and drop/recreate) the database tables with SQL scripts instead of building them manually. Re-using scripts between projects is especially useful for the out-of-the-box entities and fields; these will not change, and you just need to add the custom columns and data types that are bespoke to that customer.
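As a minimal sketch of such a script: the column names below mirror real Dynamics 365 account attributes, but the stg schema, the column lengths and the trimmed selection of fields are my own illustrative assumptions:

```sql
-- Re-runnable build script: dropping and recreating gives a clean slate between test loads.
IF SCHEMA_ID('stg') IS NULL
    EXEC('CREATE SCHEMA stg');

IF OBJECT_ID('stg.Account', 'U') IS NOT NULL
    DROP TABLE stg.Account;

CREATE TABLE stg.Account (
    StagingId     INT IDENTITY(1,1) PRIMARY KEY,  -- unique staging row ID (see tips below)
    -- Column names mirror the Dynamics 365 account entity so "Auto Map" can match them:
    name          NVARCHAR(160) NULL,
    accountnumber NVARCHAR(20)  NULL,
    telephone1    NVARCHAR(50)  NULL,
    industrycode  INT           NULL,             -- Option Set (Picklist) value
    -- Customer-specific custom fields are appended per project, e.g.:
    new_legacyid  NVARCHAR(50)  NULL
);
```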
Another benefit of recreating the Dynamics 365 schema in the staging database is that it makes the data mapping profile simpler: you already know the target location, and you will be able to reduce the number of target profiles you have (maybe even down to a single profile!).
Tips when using Staging Databases
- Build the tables with a unique Staging Database ID column backed by an index (helps with lookup performance), so you can relate imported data back to its source row; this also helps with Error reporting and row identification if you pass this information to CRM (see the housekeeping script after this list).
- Build once your CRM Solution Design (data model and schema) has been completed, signed off and maybe even built, so that you have a reference of what the target solution will look like and the data mapping can be completed.
- Build an Option Set mapping table allowing you to map source values to target Dynamics 365 Picklist values (a sample table follows this list).
- Think about building the error/transaction tables to aid with error reporting and investigation.
- Set the database Recovery Model to Simple (to reduce transaction log growth during the heavy initial loads).
- Back up your database frequently, and immediately after you have finished building it (Backup and Restore allows for faster clear downs than deleting all rows/tables).
- Build and maintain (rebuild) indexes on larger tables, targeting the key columns that will be searched frequently (not really required for tables with low volumes of data).
- Build useful Views which join together Tables from a system perspective, instead of repeating the same Joins in your data queries.
- Avoid running data integrations/ETL profiles during your maintenance jobs on the staging database!
- If using an On-Premise database, make sure the data (MDF) and log (LDF) files are on separate drives.
- Allow the SQL Server instance more than 4GB of RAM!
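Here is a minimal sketch of the Option Set mapping table mentioned above; the table, schema and column names (including IndustryRaw) are illustrative assumptions, not a standard:

```sql
-- Illustrative Option Set mapping table (names are hypothetical, adjust per project).
CREATE TABLE stg.OptionSetMap (
    EntityName    NVARCHAR(100) NOT NULL,   -- e.g. 'account'
    AttributeName NVARCHAR(100) NOT NULL,   -- e.g. 'industrycode'
    SourceValue   NVARCHAR(200) NOT NULL,   -- the value as it arrives from the source system
    TargetValue   INT           NOT NULL,   -- the Dynamics 365 Option Set integer value
    CONSTRAINT PK_OptionSetMap PRIMARY KEY (EntityName, AttributeName, SourceValue)
);

-- Example usage: resolve the source text into the target Picklist value during transformation.
UPDATE a
SET    a.industrycode = m.TargetValue
FROM   stg.Account a
JOIN   stg.OptionSetMap m
       ON  m.EntityName    = 'account'
       AND m.AttributeName = 'industrycode'
       AND m.SourceValue   = a.IndustryRaw;  -- IndustryRaw: hypothetical raw source column
```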
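And a short housekeeping script covering the indexing, Simple Recovery Model and backup/restore tips in one place; it assumes a database called StagingDB, the stg.Account table sketched earlier and a backup drive path of my own invention:

```sql
-- Keep the transaction log small while bulk loading (Simple Recovery Model tip).
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

-- Index the key columns used for lookups and error tracing
-- (StagingId already has an index via its PRIMARY KEY).
CREATE NONCLUSTERED INDEX IX_Account_accountnumber ON stg.Account (accountnumber);

-- Take a full backup immediately after the build; restoring it is a fast "clear down".
BACKUP DATABASE StagingDB TO DISK = N'D:\Backups\StagingDB_clean.bak' WITH INIT;
-- RESTORE DATABASE StagingDB FROM DISK = N'D:\Backups\StagingDB_clean.bak' WITH REPLACE;
```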
This concludes my best practice advice around Staging Databases and why you may wish to use one; the last article in this series will focus on:
- Performance Best Practices
- Additional Tips and Pointers
Thanks for reading,