Recap Best Practices: Part 2
My previous blog discussed the use of Staging Databases as part of integration design and some tips and pointers whilst building a staging database (if you decide to use one that is).
My background with data integration is using either out of the box/free tools or 3rd party data integration software (like Scribe or KingswaySoft) and the methods for designing and building data integration have evolved over time. This blog will detail the best practices when dealing with performance issues that I have encountered over the years.
You can apply some (if not all) of these principles for bespoke data integration tools, such as:
- Web Services
- Console Applications
- Utilising a Service Bus (like Azure for integration messaging)
This will be my final* post when discussing best practices with data integration, most of which can be applied when using any tool to complete the integration.
*Star Wars did not end with a trilogy.
Part 3 : Performance Issues and Improving Performance
Integrating with Dynamics 365 can be a painful learning curve, some of the issues (that I would see as performance related) I have encountered are:
- Understand of Entity Relationships (I mean who loves integrating with the Activity and Pointer Entities??) which have lead to poor profile design operations being completed by the integration.
- Data import speed (e.g. records per second or records/second) slowing over time.
- Network Latency/Connection errors which can arise at any time during the integration.
- SQL Timeouts due to high load/queries on the D365 database
- Error exceptions such plugins/workflow failing due to lack of resource or Dynamics 365 providing us with as little detail as possible
- Server Resources (source related)
- Database Log File growth (staging database – source)
- SQL Locks (source)
- And the list goes on
Additionally, big factor to consider with data integrations and loading data directly into Dynamics 365 is that this can only be achieved via two supported methods:
- Directly through the Dynamics 365 interface, manually entering though forms, bulk editing or uploading a file using the out of the box import tool.
- Via the API using one of the many free, 3rd party or custom tools available which connect directly to one of the Dynamics 365 endpoints.
There is no supported method for loading data directly into the Dynamics 365 database, and if anyone asks you to do this, just say NO. So even if you are using a Staging DB no SQL – SQL transfers are NOT allowed.
With Dynamics 365 Online, Microsoft have protected customers by this method of tomfoolery by restricting access (or even visibility) to the Dynamics 365 Database!
All Dynamics 365 data integrations must be made through the API, which means they:
- Adhere to the Security Model of the User connecting to Dynamics and performing data load operations via their Security Roles.
- Trigger any system events or custom automation (bespoke) when record operations occur (e.g. create, update, delete etc.), such as:
- createdon is populated
- Back-end relationships are created in the database with other records in the system
- When records are created, CRM populates the createdby, owner and so on.
- Caching of data/sharing of records stored to the PoA table.
- And all additional
other magical thingssystem operations Dynamics 365 completes ( using the Force!) (such as creating an Audit logs trail etc.).
Considering the above items, we can see:
- Firstly why loading data has to be (and can only be) loaded through the API.
- Why performance maybe slowed by the Dynamics 365 and flowing through the application Layer to the database layer.
Whilst investigating why performance is being impacted, it also makes sense that we investigate ways of improving the performance of data integrations with the aim of improving:
- Data load times and record/second.
- The quality and reliability of you integrations
I will walk though a common data load scenario which will highlight a few of the performance issues are listed above and are usually encountered when completing a data migration to Dynamics 365.
I will then review the migration and make suggestions/ list my best practices on how to overcome or help mitigate these issues.
Sales Load Example
For this example, I will walk through a very basic scenario of importing data without using any special tips or best practices; this also assumes that the the data integration tool is importing 1 record at a time (discussed later on). This is a direct source to target data import and all values are made up for the purposes of this example.
If you have the requirement to load a medium size data set (e.g. the last 5 years worth of data) for a customer, and say:
- 15,000 Accounts
- 100,000 Contacts
- 6000 Opportunities
- 12,500 Opportunity Products
Dynamics 365 Online data migration from a data source.
We will use the example Sales load highlighted in my first blog, the load order would be as follows:
- Opportunity with grouped Opportunity Products
The loading of the Account data into Dynamics 365 will be relatively painless; it is a simple profile to just perform a Create operation into the CRM database. The Account data set is relatively small and this will probably complete within about 20-25 mins.
When measuring import speed, you will probably hear people talking about records/second (which a 3rd party tool will calculate for you). The average Account import speed here is about 10-15 records/second.
The next step is to load the Contact data into Dynamics 365, this profile is slightly more complicated in the fact that we have to also search for an existing Account to firstly set as the Parent Account ID and also update it with the Primary Contact value.
So instead of the one load operation to Dynamics 365, we have 3 operations with each operation requiring round trips to and from Dynamics 365 to the data integration tool:
- Create (Contact)
- Search and Update (Account and Contact)
- Update (Account)
Because of the additional work the profile has to do, the import speed may go down to an average of 3-4 records/second as the profile is having to do more work (roughly 3 times). This will take approximately 7 hours to import 100,000 Contact records.
This process is actually grouping and loading Opportunities and child Opportunity Product records together and loading these in the same data profile and contains 4 primary operations:
- Search (Account)
- Create (Opportunity)
- Search (Product)
- Create (Opportunity Product)
With the added complexity of grouping related data together (Opportunity and Opportunity Products) at the source, the profile may only achieve an average maximum of 1-2 records/second; with a maximum of 6000 rows the load time will be around 50 minutes.
Reviewing the Sale Process load example it can seen that increasing the complexity of a data profile can reduce the performance of the import speed.
For low-volume data loads, the time differences can be in minutes or hours which do not impact project timelines too much.
But, consider the impact to projects where the data sets contain millions of rows (i.e. high-volume); the potential increase in project timelines can be calculated in either days or weeks which your customers will not be happy with.
Identifying performance gains will be critical in these scenarios to reduce the time it takes to integrate the data.
So, how can we go about ways of improving data integration performance?
- Environment Optimisation – manipulating the environments around the data such as:
- Storage location
- Server Specs (On Premise only)
- Network Performance
- Data Profile Optimisation – breaking down the steps with your data profiles.
- Matching Optimisation – increasing the efficiency when matching to records in Dynamics 365.
- Record Ownership – assigning record owners in Dynamics 365.
- Record Status – Saving your record statuses until last.
There will be different environments for every single data integration, and there will be different things you can try to help improve performance without touching the data integration itself. The changes could be either hardware (e.g. increase Server Memory) or software configuration (e.g. creating a database index) optimisations.
So lets look at the environmental changes we can complete to increase performance when integrating data with Dynamics 365 – first we can split out the environments as follows:
- Source – You may have one or more source systems if merging from multiple systems; if you have a staging database this will be your source system for your data profiles when loading data into Dynamics 365.
- Integration Tool Resource – this maybe under your control (i.e. On-premise server)
- Transit – Network connection between Source, Target and the Integration Tool
- Target – Dynamics 365 (Online or On Premise).
I will use the Staging Database approach with data integrations to provide the context of the environment, if using a file then there is not too much you can do except clear down erroneous rows or the 63000 extra blank rows that Excel sometimes like to add.
So with Staging Databases, I will make the assumption that this is a SQL database (but same tips will apply to additional database types like MySQL.
Firstly I recommend you look standard best practices for optimising a database; I will list out a few of the key ones that you can actually perform yourself.
- Create Useful Indexes.
- Shrinking the database – often (if using Full Recovery mode).
- Keep on top of Database maintenance plans.
- Make sure the following routines do not interrupt/occur when your data integration is under heavy load as this will negatively impact the read speed and queries being executed against tables with the potential of generating SQL Timeouts:
- Index rebuild
The following tasks may require an Customer System Administrator to complete:
- Limit your SQL Instance Memory to 1/2 or 3/4 of the SQL Server the instance is located on, leaving the rest available for the Operating System to use.
- Log Files on separate hard drives
- Increase the SQL Server allocated memory (to enable the above!
Integration Tool Resource
Some integration tools (like Scribe Insight) require their own Server with its own set of resources (and SQL database); I recommend using the following as a guide when choosing a location for your data integration tool:
- Take the recommended System Requirements, and then double them.
For bespoke solutions e.g. console apps or web services, make sure they have:
- Decent connection to Dynamics 365.
- Adequate memory and processing power to function under extremely high load.
- A Service Account with appropriate security privileges and access to Dynamics 365.
If these are hosted in Azure and connecting to Dynamics 365 Online, how about moving to the same data center as where your Dynamics 365 instance is located? This will reduce network latency by reducing the number of connections it needs to make to Dynamics 365 via other servers.
For all tools and bespoke solutions console apps or web services – make sure they have a decent connection to Dynamics 365 (as stated above).
- Ensure that you are using a secure and encrypted HTTPS connection (applies to On Premise only).
- Make sure all IP ranges have been authorised through your Firewall for Dynamics 365
Loading data into Dynamics 365 can be affected by any number of items once the connection to D365 has been established. There are several tips I recommend completing whilst loading data into Dynamics 365:
- Disable any automation such as Workflows and Plugin or SLA which maybe triggered on import create/update.
- Turn off Email Synchronisation (just in case any get created and sent out).
- Postpone some of the out of the box internal maintenance schedules in Dynamics 365 (e.g. bulk deletion tasks).
- For On-Premise deployments, standard Database practices should be completed regularly.
- Add more resources to the CRM and SQL servers, and split out the Front and Back End server roles onto different Servers with load balancing.
Data Profile Optimisation
We have looked at my recommended best practices from an environmental point of view, next we can look at tweaking the way your data profiles operate and suggesting improvements in logic.
Profile Load Order
The first item to think about is your profile load order, each data integration will require a profile order where the data should be ordered based on the required data to create that record type. The following diagram shows an example scenario for a typical data migration.
The decision could be made to either start with Leads or start with the Product Catalog (Products and Price Lists).
My best practices when planning a data integration and the profiles ordering logic are:
- Understanding the essential “must-have” data for production go-live and build a import plan accordingly.
- Identifying independent data sets and/or configuration data and focusing on these areas first unless it is not required for go -live or flagged as “nice to have”.
- Determining your key data sets which are prerequisites for other data sets (e.g. Accounts are required before you can import Opportunities or Cases).
- Import smaller data-sets first for quick wins; if you encounter issues with a larger load and are required to be restarted/fixed then at least you have got some data in and you can concentrate of fixing the larger data-set.
- Finish with Activities/Notes and Attachments; these are often the most complicated and difficult tasks to complete; Activities can also be related to any of the proceeding data (e.g. via the Regarding field) and so I will tackle the hardest items until last. Also – Historic Activities are most likely never going to change; these could also be imported after (not part of any critical path) production go-live.
When querying source data, only select the exact columns you need to process as part of your data integration. Any additional Columns queried, but are not used become a waste of resource and can impact the performance of data profiles.
So in an example source Contact SQL Query:
SELECT [FirstName], [SurName], [Email], [PostCode], [Status]
WHERE [STATUS] = ‘Active’
Instead of the below query which returns all Columns.
SELECT * FROM CUSTOMER
WHERE [STATUS] = ‘Active’
Populating Lookup fields often requires some element of validation to ensure that the record GUID exists; e.g:
- Profile will first check to see if the record exists
- Retrieves the record GUID
- Sets the GUID in the create/update operation.
This means that for every lookup value that is set requires as part of the data integration, an extra query to and from Dynamics 365 is required before the Create/Update operation would occur. This will slow down the time it takes to complete the transaction for this row (e.g. the record/second would decrease).
On the Case Entity, you may have several lookup GUIDs to query before the Case can be created:
If we had these GUIDS before we started the import (as we should have imported the prerequisite data beforehand), then we could populate these fields without having to perform the additional query.
This would reduce the number of steps the profile has to complete to just a single Create Case step and improve the performance of data integration import speed (records/second).
For smaller data-sets, this may not be required as the data does not take much time to load; but for the larger data-sets we can see that improvements to import speed can have a positive impact on the total time it takes to load the entire data set.
To record the GUIDS of the pre-requisite data (especially configuration/Account/Contact data) I will always write back to the source data row in my staging database with the record GUID in the original profile. Or, I will build a separate profile which query Dynamics 365 (as the source) and targets the staging database. This will literally just write the GUID back to the Staging database source row.
So back to my Case import example, assuming I had captured the record GUIDS of each lookup, I could either:
- Join to the Account,Contact and Product Tables to include the GUID columns in my Case query.
- Perform a simple SQL Update Script to populate these values (this will take seconds).
Larger Volume Data-sets
Identify your largest data sets, some of these may be simple but some of them maybe complex. The aim is to improve the import speed (record/second) and the larger data-sets will gain the most from performance improvements.
The downside is, importing large volumes of data will eventually slow down table performance in Dynamics 365, which also will have a negative effect when trying to match to records as part of data profiles.
Reduce the number of steps in the profile by completing additional ETL transformations on the Staging Database (if available) – like pre-populating lookup fields; aim for straight inserts or straight update.
For simple – high volume profiles, many tools (even bespoke Console Apps/Web Services_ can take advantage of the Bulk methods (ExecuteMultiple) provided by Dynamics 365 where you can submit data in blocks as part of a single transaction. These are designed to increase the throughput of your data to Dynamics 365.
Nearly all data integrations will involve some sort of matching criteria, whether this is to return related record ID’s to populate lookup fields on import or when matching for either create new or update existing records.
Matching to records will often involve matching on key fields (in most cases a single field or key, in others the matching criteria may involve multiple field values.
Single key matching example – matching to Orders:
- OrderNumber exact match
Multiple key matching example – matching to Contacts:
- First Name AND
- Last Name AND
- Email Address AND
- Postcode all exact matches
Whilst matching to these values may be required and cannot be changed. we can actually create indexes on these columns quite easily. In Dynamics 365, you can directly add these to the Quick Find View – Find Columns for that Entity; Dynamic 365 will create a database index for you automatically, this is extremely useful for Online environments where you can directly affect the database without touching it.
Additionally – it may be useful to also create indexes on the source columns used for matching over larger data-sets.
Where unique records are required to be matched, it may be worth considering creating an Alternate Key in Dynamics 365. Ben Hosking has written a very good article on alternate keys here. This will also create an index in the CRM database which improves lookup times.
Assigning records to User or Team can take a while even through the web interface, and this may be additionally impacted by cascading relationships to the records children (if applicable) where is child is also assigned to the User or Team. So this may have a negative impact on the data import speed
Where the requirement for records to be have their ownership during a data migration/integration, I considered it best practice on larger data sets or records where cascading relationship exists (e.g. Accounts and Opportunities) to either:
- Recommended – Split out the record assignment into a completely separate profile after the data has been integrated with Dynamics 365. This will increase the performance of data integration initially creating or updating the record; the Assign action can then be completed post load.
- Or have an additional relationship to the User Entity (i.e. Assign Owner), populate that during the migration and then have Workflow post Migration/Integration set the Owner to that of the Assign Owner field.
Tip: Ensure Users:
- Have been created
- Assigned the correct Product License (if Online)
- Security Roles have been assigned which allows them to own the record type you are integrating with.
Completing Activities or setting record Statuses can have the same impact that assigning record ownership can have (e.g. cascade behavior). Also, depending on when the profile sets the status, further updates cannot (should not be in most scenarios) be made to those records through the data integration; especially with Activities – you will receive a hard Dynamics 365 error.
So my recommended approach is to consider having a separate record status profile as this will increase the performance of data integration initially creating or updating the record; then a profile to set the record Status executed post load (assuming no further record updates are required).
This completes the final blog*: Performance Best Practices blog for the data integration series, I hope you can start incorporating some of these principles into your data integration designs.
Thanks for reading,
*I lied, there is another post to come with more best practices!