Sunday, April 15, 2018

Data Architecture and Management Designer: Things to consider before appearing for this exam

Today is a very special day for me: I have completed my Application Architect certification by passing the second exam (Data Architecture and Management Designer) on my Architect journey, and this is my 100th blog post on Salesforce.

I would like to say a big thanks to the Salesforce community, who motivated me and supported me in achieving this.



Through this blog, I am going to share my preparation experience and tips for everyone who is planning to take this exam.
  • For self-study, please refer to the Resource Guide for Data Architecture and Management Designer, which contains links to the study material for the different topics. If you complete this, then you are ready for the exam.
  • Development knowledge is not necessary for this exam, but if you have development experience you will find the Large Data Volume considerations easier to understand.
  • The exam contains a lot of scenario-based questions. Most of them ask which decision or step you would take, as a Data Architect, for a given problem.
  • Know the different ways of maintaining data quality.

Below are the different topics you need to cover for the exam. I have also gathered the information below from different sources into one place for a quick overview.
  • Duplicate Management
Please go through matching rules and duplicate rules and how to configure them.
Know the different options you can configure when a duplicate is detected (alert, block, report, etc.).
  • Data Archiving and Purging
There were many questions on how to implement data archiving for specific scenarios. Also cover the cases in which you should use ETL tools.

Different points to remember:
    1. Aggregate information on the parent object and delete the child records if reporting is required only on parent information.
    2. Always plan for data archiving and purging, as data keeps growing with time.
    3. Records in the Recycle Bin also affect query performance, so if you don't need the records, perform a hard delete.
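Here is a minimal sketch of the first point, rolling child rows up to one summary per parent before the children are archived or deleted. The dictionary keys `parent_id` and `amount` and the function name are made up for illustration; in a real org the rows would come from an export and the summary would be written to fields on the parent record.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_children(children):
    """Roll child rows up to one summary (count and total) per parent.

    `children` is an iterable of dicts with the hypothetical keys
    'parent_id' and 'amount'.
    """
    totals = defaultdict(lambda: {"count": 0, "total": Decimal("0")})
    for row in children:
        summary = totals[row["parent_id"]]
        summary["count"] += 1
        # Decimal avoids float rounding when adding currency amounts.
        summary["total"] += Decimal(str(row["amount"]))
    return dict(totals)


# Hypothetical exported child rows.
line_items = [
    {"parent_id": "A1", "amount": "10.50"},
    {"parent_id": "A1", "amount": "4.50"},
    {"parent_id": "B2", "amount": "7.00"},
]
summaries = summarize_children(line_items)
```

Once the summaries are safely stored on the parent records, the child rows can be archived and purged without losing the numbers needed for reporting.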
  • MDM Solution
When customer data is spread across different applications, with each application storing different data, implement an MDM solution to have a single source of truth for the records.
  • PK Chunking
You can use PK Chunking with most standard objects and all custom objects.

To enable the feature you specify the header ‘Sforce-Enable-PKChunking’ on the job request for your Bulk API query.

                           Sforce-Enable-PKChunking: TRUE

By default the Bulk API will split the query into 100,000 record chunks. You can use the ‘chunkSize’ header field to configure smaller chunks or larger ones up to 250,000. Larger chunk sizes will use up fewer Bulk API batches, but may not perform as well. For each object you are extracting, you might need to experiment a bit to determine the optimal chunk size.

                       Sforce-Enable-PKChunking: chunkSize=250000;

You can perform filtering while using PK Chunking by simply including a WHERE clause in the Bulk API query. In this case, there may be fewer records returned for a chunk than the number you have specified in ‘chunkSize’.

If an object is supported, you can also use PK Chunking to query the object’s sharing table. In this case, determining the chunks is more efficient if the boundaries are defined on the parent object record IDs, rather than the share table record IDs. To take advantage of this, you should set the value of the Parent header field to the name of the parent object. For example, when querying OpportunityShare, set Parent to Opportunity.

For example:
A customer is planning a security audit and wants to identify all the manual shares that exist on their Account records. To do this, they can perform a bulk query on AccountShare, using the filter WHERE RowCause = 'Manual', with a header like this:

                   Sforce-Enable-PKChunking: chunkSize=250000; parent=Account 
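
The header values above can be assembled in code. This is only a sketch: the helper name `pk_chunking_header` is mine, and the value `TRUE` for the case with no options is what I recall from the Bulk API documentation, so please verify it before relying on it.

```python
HEADER_NAME = "Sforce-Enable-PKChunking"
MAX_CHUNK_SIZE = 250_000  # upper limit mentioned in the text above


def pk_chunking_header(chunk_size=None, parent=None):
    """Return a (name, value) pair for the PK Chunking request header."""
    if chunk_size is None and parent is None:
        # Assumption: with no options, the header is simply switched on.
        return (HEADER_NAME, "TRUE")
    parts = []
    if chunk_size is not None:
        if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
            raise ValueError("chunkSize must be between 1 and 250,000")
        parts.append(f"chunkSize={chunk_size}")
    if parent is not None:
        # Used when querying a share table, e.g. AccountShare -> Account.
        parts.append(f"parent={parent}")
    return (HEADER_NAME, "; ".join(parts))


name, value = pk_chunking_header(chunk_size=250_000, parent="Account")
print(f"{name}: {value}")
```

The pair can be added to the headers of whatever HTTP client creates the Bulk API job.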
  • Data Governance & Data Stewardship
Data governance defines what type of information is required, who can create and update it, and the quality rules that apply (data integrity, usability and security).

Data stewardship covers how data is maintained and distributed to its users.
  • Skinny Tables
You can refer to this blog for a Skinny Tables overview.
  • Bulk API
Cover the different scenarios for when to use parallel mode or serial mode for the Bulk API.
If you are getting lock contention errors, load the child records ordered by parent record Id.
If you are getting group membership lock errors, use serial mode for the Bulk API.
Lock contention errors may occur if you have master-detail relationships, lookup relationships, or roll-up summary fields defined.
Avoid full reload operations, as they consume more resources. Try to perform incremental uploads instead.
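
Ordering child records by parent record Id can be done on the CSV file before it is loaded. Below is a small sketch using only the Python standard library; the column names and sample Ids are made up for illustration.

```python
import csv
import io


def sort_rows_by_parent(csv_text, parent_column):
    """Return the CSV text with its data rows sorted by the parent Id column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames  # reading this consumes the header row
    # sorted() is stable, so rows of the same parent keep their original order.
    rows = sorted(reader, key=lambda row: row[parent_column])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical Contact rows that look up to two Accounts.
contacts_csv = (
    "LastName,AccountId\n"
    "Smith,001B\n"
    "Jones,001A\n"
    "Lee,001B\n"
    "Khan,001A\n"
)
sorted_csv = sort_rows_by_parent(contacts_csv, "AccountId")
print(sorted_csv)
```

With the rows grouped like this, children of the same parent tend to land in the same batch, so parallel batches are less likely to compete for a lock on the same parent record.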

Please refer to the LDV Best Practices link. It covers most of the topics for this exam.

Problem Statements

Slow report on a large object 
  • Document your org’s indexed fields.
  • Learn index selectivity rules.
  • Build reports that use indexes.
  • Don’t use formula fields (returning non-deterministic values) in filters, as they are calculated on the fly, which reduces performance.
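
To make the index selectivity rules a bit more concrete, here is a rough sketch of the threshold calculation. The numbers are the ones I remember from Salesforce's query optimization guidance (standard index: 30% of the first million records plus 15% of the rest, capped at 1 million; custom index: 10% plus 5%, capped at 333,333). Treat them as assumptions and check the current documentation, and note that the function names are my own.

```python
def selectivity_threshold(total_records, custom_index=False):
    """Approximate the maximum number of matching rows for a selective filter."""
    first_million = min(total_records, 1_000_000)
    remainder = max(total_records - 1_000_000, 0)
    if custom_index:
        # Assumed rule: 10% of the first million, 5% beyond, cap 333,333.
        threshold = first_million * 10 // 100 + remainder * 5 // 100
        return min(threshold, 333_333)
    # Assumed rule: 30% of the first million, 15% beyond, cap 1,000,000.
    threshold = first_million * 30 // 100 + remainder * 15 // 100
    return min(threshold, 1_000_000)


def is_selective(matching_records, total_records, custom_index=False):
    """True when the filter returns fewer rows than the threshold."""
    return matching_records < selectivity_threshold(total_records, custom_index)


# An object with 2 million records and a filter on a custom indexed field.
print(selectivity_threshold(2_000_000, custom_index=True))
print(is_selective(120_000, 2_000_000, custom_index=True))
```

A report filter that matches more rows than the threshold is unlikely to use the index, which is one reason reports on large objects become slow.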
 
Slow bulk data load 
  • Cleanse and transform data pre-load.
  • Disable triggers, validations, assignment rules, and workflow rules pre-load.
  • Use the Bulk API.
  • Keep a Full Queue : The Force.com platform cannot process what it doesn’t have the opportunity to process, and slowly feeding batches and jobs to Salesforce causes platform threads to sit idle when they could be processing batches. Always try to keep at least 20 batches on the queue at any given time. 
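
The "keep a full queue" tip can be sketched as follows: split the records into batches and top the queue up to 20 whenever it drops. The batch size of 10,000 is the Bulk API per-batch record limit as I know it, and both helper names are hypothetical; the actual submission call depends on your client library and is left out.

```python
from itertools import islice


def make_batches(records, batch_size=10_000):
    """Yield lists of at most `batch_size` records each."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def top_up(queued_count, pending_batches, target=20):
    """Return the pending batches to submit now so the queue reaches `target`."""
    needed = max(target - queued_count, 0)
    return pending_batches[:needed]


# 45,000 hypothetical records give four full batches and one partial batch.
batches = list(make_batches(range(45_000)))
# If 18 batches are already queued, submit 2 more to reach 20.
to_submit = top_up(queued_count=18, pending_batches=batches)
print(len(batches), len(to_submit))
```

In a real loader you would poll the job status, call something like `top_up` with the current queue depth, and submit whatever it returns.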

Lock contention problems during data load
  • Pre-sort the child records by parent Id in the CSV file to lessen the chance of parent record lock contention among parallel load batches.
  • Deferring the org’s sharing calculations until the data load finishes can significantly improve both load and sharing calculation performance.
  • For optional lookup fields, you can avoid the locks by setting the Clear the value of this field option, which does more than just tell Salesforce what to do if your lookup record is deleted. When you set this option, whenever a record that has a lookup field to this lookup record is inserted or updated, Salesforce doesn't lock the lookup records; instead it only validates that the lookup values exist.
  • Roll Up Summary Fields : Salesforce locks those master records so it can update the appropriate roll-up summary field values. If detail records that look up to the same master record are updated simultaneously in separate batches, and those updates affect roll-up summary fields on the master record, there is a high risk that these updates will cause lock exceptions.
  • Disable triggers and workflow rules to improve performance. Whenever a trigger performs DML on other records, Salesforce locks those records, which may cause lock contention errors in a parallel load. Likewise, when a workflow rule updates fields, it locks the records, which can result in lock contention errors.
  • Group Membership Locks: For a few special operations, Salesforce uses organization-wide group membership locks. To avoid lock exceptions when performing the following operations, you must use serial processing for your data load. 
    1. Adding users who are assigned to roles 
    2. Changing users’ roles 
    3. Adding a role to the role hierarchy or a territory to the territory hierarchy 
    4. Changing the structure of the role hierarchy or the territory hierarchy 
    5. Adding or removing members from public or personal groups, roles, territories, or queues 
    6. Changing the owner of an account that has at least one community role or portal role associated with it to a new owner who is assigned to a different role than the original owner

Full database backups are slow
  • Perform incremental data backups, backing up only the data that is new or updated since the previous incremental backup. When doing this, use queries that filter records using SystemModstamp (a standard field on all objects that has an index) rather than the LastModifiedDate field (not indexed).
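
An incremental backup query along these lines could be built like this. It is a minimal sketch: the function name and the choice of fields are mine, and the timestamp of the previous backup is assumed to be stored somewhere by your backup job.

```python
from datetime import datetime, timezone


def incremental_backup_query(object_name, fields, last_backup):
    """Build a SOQL query for records changed since the previous backup."""
    if last_backup.tzinfo is None:
        raise ValueError("last_backup must be timezone-aware")
    # SOQL dateTime literals are written unquoted, here in UTC.
    stamp = last_backup.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # >= rather than > so records changed exactly at the boundary are not missed.
    return (
        f"SELECT {', '.join(fields)} FROM {object_name} "
        f"WHERE SystemModstamp >= {stamp}"
    )


previous_run = datetime(2018, 4, 15, 0, 0, tzinfo=timezone.utc)
query = incremental_backup_query("Account", ["Id", "Name"], previous_run)
print(query)
```

Because the filter is on SystemModstamp, the query can use the index on that field instead of scanning the whole object.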

When to use the serial load option rather than parallel?
  • When you insert group members or users who are assigned to roles—or perform any other data load that requires group membership operations—Salesforce uses organization-wide group membership locks to ensure data integrity and security.
  • If you are not able to manage or avoid lock contention errors, switch to serial mode for the data load.

What are Strategies for Addressing Lags in Search Indexing?

Depending on the current load and utilization of indexing servers, asynchronous text index updates may lag behind actual transactions. This lag means that stale search indexes can lead to search results not entirely representative of the current database records.

The amount of time necessary to update search indexes is directly related to the amount of text data that such loads modify, and can be quite lengthy in some cases.
So how can you architect acceptable solutions that address inevitable lags in search indexing after data loads? Here are a couple of things to consider.
  1. Disable full-text search indexing for custom objects (especially large ones) that don’t need to be searchable. 
  2. Instead of relying on the full-text search engine and SOSL, implement your application’s search feature using SOQL. Because SOQL queries target the transactional database, they’ll always return results that correspond to the latest set of committed records.
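
As a rough illustration of the second option, a simple name search can be expressed as a SOQL query with a LIKE filter. The helper names are hypothetical, and using a trailing wildcard only (a prefix match) is my own choice to keep the filter index-friendly, not something the guidance prescribes.

```python
def escape_soql_like(term):
    """Escape characters that are special inside a SOQL LIKE string literal."""
    # Backslash goes first so the escapes added afterwards are not doubled.
    for char in ("\\", "'", "%", "_"):
        term = term.replace(char, "\\" + char)
    return term


def name_search_query(object_name, term):
    """Build a prefix search on the Name field using SOQL instead of SOSL."""
    return (
        f"SELECT Id, Name FROM {object_name} "
        f"WHERE Name LIKE '{escape_soql_like(term)}%'"
    )


print(name_search_query("Account", "O'Brien"))
```

Since this runs against the transactional database, records loaded a moment ago show up immediately, even if the search index is still catching up.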

Hope this will help!!!
