Expected behavior of Document Intelligence classification with dynamic banner content

Question

IT Cognity 0

Hello,

We are using Azure AI Document Intelligence for document classification/data extraction in a Salesforce-based digital onboarding flow.

We are attaching two sample utility bill documents/images that should belong to the same functional document category. However, classification is not happening correctly.

In these samples, there are visible differences between the documents. One difference is the marketing/information banner area. The banner is placed in the same predefined section of the document and has approximately the same layout/position, but the actual banner content/image can change dynamically. This content is managed directly by Marketing/COPS teams without technical involvement.

At the same time, there are also other visual/layout differences between the sample documents, such as branding, colors, section styling, spacing, field positioning, and general template presentation.

We would like Microsoft’s official guidance for this specific scenario and for the general expected behavior of Document Intelligence classification.

Specifically

Based on the attached sample documents, is the classification issue more likely caused by the dynamic banner content itself, or by the broader layout/template differences between the documents?

For documents where the banner is always placed in the same section, with the same approximate position and dimensions, but only the banner image/content changes dynamically, is Document Intelligence expected to classify the document correctly without retraining?

More generally, if dynamic banner content in the same layout can potentially affect classification, what is Microsoft’s recommended approach?

- Should the banner area be masked/excluded before analysis?

- Should the model be trained with multiple representative banner variations?

Our goal is to confirm whether the current behavior is expected product behavior, whether the banner alone could explain the issue, or whether the misclassification is more likely caused by the combination of banner changes and other layout/template differences.

SRILAKSHMI C 19,110 Reputation points Microsoft External Staff Moderator

2026-05-27T15:22:29.4633333+00:00
Hello @IT Cognity

Thank you for reaching out to Microsoft Q&A.

Based on the behavior you described, the current classification result is more likely related to the overall combination of layout/template differences rather than the dynamic banner content alone.

In Azure AI Document Intelligence, classification models primarily learn from both structural layout features and visual/textual cues across the entire document page, including:

• Relative positioning of sections and fields

• Text distribution and alignment

• Visual structure and spacing

• Repeating layout patterns

• Headers, branding areas, tables, and section organization

From the details shared, small graphic swaps such as dynamic marketing banners can influence classification, but it is usually the combination of:

• Banner/image changes

• Branding/theme/color differences

• Section styling modifications

• Spacing and alignment changes

• Field positioning differences

• General template presentation changes

that pushes the document toward a different classification boundary.

In your case, it is likely that both the banner variation and the broader template/layout differences are contributing, with the broader layout changes probably having a larger impact than the banner alone.

Regarding the banner specifically:

If the banner always appears in approximately the same location, with similar dimensions/layout, and only the banner image/content changes dynamically, this alone would typically not be expected to completely break classification in a well-generalized model.

However, classification models do not automatically ignore those regions by default. The model still evaluates visual and textual signals across the full page. Because of this, entirely unseen banner graphics or highly dynamic promotional content can still affect classification confidence, especially when combined with additional template/layout variations.

If the banner is the only changing element and the classifier has been trained using enough representative banner variations, the model can usually generalize successfully. However, with completely unseen graphics, classification accuracy may still degrade unless those variations are represented during training or the region is excluded from analysis.

General recommendations for this scenario are:

Train with representative variations: Include multiple real-world document variants during classifier training, especially samples containing:

• Different banner versions

• Seasonal/promotional content changes

• Branding/theme variations

• Known template variants used in production

This helps the classifier generalize better across expected visual changes.

Focus on stable layout regions: If possible, maintain consistency in the core business-relevant document structure across versions, including key field positions and section organization.

Consider masking or excluding highly dynamic visual regions (optional): If the banner area is highly dynamic (large image changes, promotional graphics, major color/theme shifts, variable text blocks, etc.), preprocessing the document to mask, blur, or exclude that region before classification can improve consistency.

This approach is especially useful when:

• The banner appears in fixed coordinates/regions

• Marketing content changes frequently

• The stable document structure is otherwise highly consistent

• The banner introduces significant visual noise unrelated to document identity

Use a hybrid approach for best robustness: In many production implementations, customers achieve the best results by combining both approaches:

• Masking or suppressing high-variance graphic regions AND

• Training the classifier with multiple representative real-world banner variations

This typically provides stronger robustness against future marketing/content changes.

Avoid grouping substantially different templates into a single class without sufficient representation: If the documents differ significantly beyond the banner area, additional representative training samples may be required, or the documents may need to be modeled as separate template variants/classes depending on the business requirement.

Please refer to this: Custom classification models in Document Intelligence https://learn.microsoft.com/azure/ai-services/document-intelligence/train/custom-classifier

I hope this helps. Do let me know if you have any further queries.

Thank you!
IT Cognity 0 Reputation points

2026-05-29T08:58:19.21+00:00

Hello @SRILAKSHMI C,
Just to confirm the recommended implementation: if we choose to mask or blur the dynamic banner area, should this preprocessing be applied only to the training documents, or do you recommend applying the same preprocessing to every production document before classification as well?
Jerald Felix 13,500 Reputation points Volunteer Moderator

2026-05-29T10:23:56.31+00:00

Hello IT Cognity

Greetings! Thanks for your follow-up question!

That's a great and very practical clarification to ask, and the answer is important to get right. Yes, you must apply the same masking or blurring preprocessing to both your training documents AND every production document at inference time. This is not optional; it is a fundamental requirement for the approach to work correctly.

Here's why this matters and how to implement it properly:

Why Both Training and Production Must Be Treated the Same

The Document Intelligence classification model learns patterns from the documents you train it on. If your training documents have the banner area masked but your production documents still contain the dynamic banner, the model will encounter visual features during inference that it never saw during training. This mismatch will actually make classification worse, not better: the model will be confused by the unexpected content in the region it learned to treat as blank or neutral.

The golden rule is: whatever preprocessing you apply to training documents must be applied identically to every production document before it is sent to the classifier.

Step 1: Define the Banner Region as Fixed Coordinates

Since you confirmed the banner always appears in the same section with approximately the same position and dimensions, define the banner region as a fixed bounding box (for example, x, y, width, height in pixels or percentage of page). Document this as a constant in your preprocessing pipeline so it is applied consistently every time.
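As a minimal sketch of Step 1, the banner region can be stored once as page fractions and converted to pixels per document, so that scans at different resolutions still map to the same area. The name `BANNER_REGION`, the helper `to_pixel_box`, and the numeric values are hypothetical; the real position has to be measured on your own bills.

```python
from typing import NamedTuple


class FractionalBox(NamedTuple):
    """Banner region expressed as fractions of page width/height (0.0 to 1.0)."""
    left: float
    top: float
    width: float
    height: float


# Hypothetical values for illustration only: measure the real banner on your documents.
BANNER_REGION = FractionalBox(left=0.05, top=0.70, width=0.90, height=0.20)


def to_pixel_box(region: FractionalBox, page_width: int, page_height: int) -> tuple[int, int, int, int]:
    """Convert a fractional region to (x0, y0, x1, y1) pixel coordinates, clamped to the page."""
    x0 = max(0, round(region.left * page_width))
    y0 = max(0, round(region.top * page_height))
    x1 = min(page_width, round((region.left + region.width) * page_width))
    y1 = min(page_height, round((region.top + region.height) * page_height))
    return x0, y0, x1, y1
```

Keeping the region as a single constant means training and production code cannot drift apart on where the banner is assumed to be.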

Step 2: Apply Masking in Your Preprocessing Pipeline

Before sending any document, whether training or production, to the Document Intelligence classifier, apply the mask to the defined banner region. You can do this using standard image processing libraries such as Pillow in Python or any image manipulation library available in your Salesforce integration layer. Replace the banner area with a solid neutral color (white or light grey works well) so the model sees a consistent blank region regardless of what the original banner contained.
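The masking step itself can be sketched as follows. To keep the example dependency-free it works on a plain row-major grid of RGB tuples rather than a real image; `mask_region` is a hypothetical helper, not part of any SDK. With Pillow, a call such as `ImageDraw.Draw(image).rectangle(box, fill=fill)` should draw the equivalent filled rectangle on an actual image.

```python
Pixel = tuple[int, int, int]


def mask_region(
    pixels: list[list[Pixel]],
    box: tuple[int, int, int, int],
    fill: Pixel = (255, 255, 255),
) -> list[list[Pixel]]:
    """Return a copy of a row-major RGB grid with the box filled by a solid color.

    box is (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive here.
    Coordinates outside the page are clipped instead of raising.
    """
    x0, y0, x1, y1 = box
    masked = [list(row) for row in pixels]  # copy rows so the input stays untouched
    for y in range(max(0, y0), min(len(masked), y1)):
        row = masked[y]
        for x in range(max(0, x0), min(len(row), x1)):
            row[x] = fill
    return masked


# Usage: a tiny 4x3 black "page" with a 2x1 banner area masked in white.
page = [[(0, 0, 0)] * 4 for _ in range(3)]
masked_page = mask_region(page, (1, 1, 3, 2))
```

Returning a copy rather than editing in place makes it easy to keep the original document for extraction while sending only the masked version to the classifier.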

Step 3: Retrain the Classifier with Masked Training Documents

Once your preprocessing pipeline is ready, go back and apply the masking to all your existing training documents and retrain the custom classification model from scratch using the masked versions. Do not mix masked and unmasked training samples; the entire training set should be preprocessed consistently.

Step 4: Apply the Same Preprocessing to Every Production Document

In your Salesforce-based onboarding flow, every document that arrives for classification must pass through the same masking step before being sent to the Document Intelligence API. This should be a mandatory step in your pipeline, not an optional one.

Step 5: Validate Before Going Live

After retraining with masked documents, test the classifier with a sample set of production documents that have also been masked, especially documents with banner variations that were not in the training set. Confirm that classification confidence improves and that the model is no longer affected by banner content changes before deploying to production.

In short, masking only the training data without masking production data will not help and may make things worse. The preprocessing must be symmetric across both phases. This is the correct and recommended implementation.

If this answer helps, kindly accept it, which will help others who have similar questions.
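One way to make the Step 5 check concrete is to group test results by banner variant and compare accuracy and mean confidence between variants seen and not seen in training. This is only a sketch: the `Prediction` record and the `banner_variant` label are assumptions about how you tag your own test documents, and the predicted type and confidence are assumed to have been copied from the classifier results beforehand.

```python
from collections import defaultdict
from typing import NamedTuple


class Prediction(NamedTuple):
    banner_variant: str   # hypothetical label you assign to each test document
    expected_type: str    # ground-truth document class
    predicted_type: str   # class returned by the classifier
    confidence: float     # confidence returned by the classifier


def summarize_by_variant(predictions: list[Prediction]) -> dict[str, dict[str, float]]:
    """Compute count, accuracy, and mean confidence per banner variant."""
    grouped: dict[str, list[Prediction]] = defaultdict(list)
    for p in predictions:
        grouped[p.banner_variant].append(p)

    summary = {}
    for variant, items in grouped.items():
        correct = sum(1 for p in items if p.predicted_type == p.expected_type)
        summary[variant] = {
            "count": len(items),
            "accuracy": correct / len(items),
            "mean_confidence": sum(p.confidence for p in items) / len(items),
        }
    return summary
```

If the unseen-banner group scores close to the seen-banner group after masking, that suggests the banner is no longer driving the result; a large remaining gap points back to the broader template differences.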

Best Regards,

Jerald Felix.
SRILAKSHMI C 19,110 Reputation points Microsoft External Staff Moderator

2026-05-31T16:46:41.9933333+00:00

Hi @IT Cognity

Thank you for the follow-up question.

If you decide to use masking or blurring of the dynamic banner region, the recommended approach is to apply the same preprocessing consistently to both the training documents and the production documents used for classification.

The reason is that the classifier learns patterns from the training data. If the banner is masked during training but remains visible during inference (production), the model will encounter visual content that it was not trained to interpret, which can negatively impact classification accuracy. Similarly, if the banner is visible during training but masked during production, the model may rely on information that is no longer available at inference time.

For best results, the preprocessing pipeline should be consistent across the entire lifecycle:

• Training documents --> banner masked/blurred

• Validation/testing documents --> banner masked/blurred

• Production documents --> banner masked/blurred

This ensures the model learns and evaluates documents using the same visual representation.
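One possible way to enforce this consistency in code is to fingerprint the preprocessing settings used at training time and refuse to classify when production settings differ. This is a sketch under assumptions: the config keys shown are hypothetical, and storing the fingerprint alongside the trained classifier is a design choice of your pipeline, not a Document Intelligence feature.

```python
import hashlib
import json


def preprocessing_fingerprint(config: dict) -> str:
    """Hash a preprocessing config in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_same_preprocessing(training_config: dict, production_config: dict) -> None:
    """Raise if production preprocessing differs from what the classifier was trained with."""
    if preprocessing_fingerprint(training_config) != preprocessing_fingerprint(production_config):
        raise ValueError("Production preprocessing does not match the training preprocessing")


# Hypothetical config: the same masking settings used for training, validation, and production.
TRAINING_CONFIG = {"mask_banner": True, "banner_box": [0.05, 0.70, 0.90, 0.20], "fill": [255, 255, 255]}
```

A check like this turns a silent accuracy regression (for example, someone changing the mask box only in production) into an explicit error.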

That said, if the banner occupies a relatively small area and you can provide sufficient representative banner variations during training, masking may not be necessary. In many cases, training with diverse real-world banner examples is the preferred first approach because it allows the model to generalize while preserving the full document content.

Therefore, our general recommendation would be:

1. First, train and evaluate the classifier using representative samples that include the expected banner variations.

2. If banner variability continues to impact classification performance, consider introducing masking/blurring.

3. If masking/blurring is adopted, apply it consistently to both training and production documents.

This approach typically provides the most reliable and predictable classification behavior.

Thank you!
IT Cognity 0 Reputation points

2026-06-02T14:30:50.76+00:00
Hello,

Thank you for the previous clarification regarding dynamic banner content.

We would like to ask a broader architectural / product guidance question for our use case.

Currently, our Azure AI Document Intelligence implementation uses the following flow:

1. Custom classifier

2. Route to the corresponding custom extraction model based on the predicted document class

3. Extract required fields
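For readers following along, the routing step of this flow can be sketched as below. The class names, model IDs, and the confidence threshold are all hypothetical placeholders, and the predicted document type and confidence are assumed to have already been read from the classifier's result.

```python
from typing import Optional

# Hypothetical mapping from predicted document class to custom extraction model ID.
EXTRACTION_MODEL_BY_DOC_TYPE = {
    "electricity_bill": "electricity-extractor-v1",
    "gas_bill": "gas-extractor-v1",
}

# Assumed threshold: below this, the prediction is not trusted for automatic routing.
MIN_CONFIDENCE = 0.80


def choose_extraction_model(doc_type: str, confidence: float) -> Optional[str]:
    """Return the extraction model ID for a prediction, or None to send the document to manual review."""
    if confidence < MIN_CONFIDENCE:
        return None
    return EXTRACTION_MODEL_BY_DOC_TYPE.get(doc_type)
```

Because every extraction depends on this lookup, any banner or layout change that lowers classifier confidence blocks extraction too, which is the coupling the question is about.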

The documents are mainly utility bills, for example electricity bills and natural gas bills. For those bills, our main business requirement is not necessarily to identify the exact supplier/company template. The most important requirement is to extract the required business fields correctly from the document.

The challenge is that suppliers may change their document layouts, banners, branding, styling, spacing, field positions, and general template presentation over time. These changes currently affect classification. We are looking for a solution that does not need constant retraining of custom classifier and custom extraction models every time a supplier makes minor layout or banner changes.

Based on this, we would like Microsoft’s recommended approach for this type of scenario.

Should we continue using Azure AI Document Intelligence with a custom classifier and separate custom extraction models, or would Microsoft recommend a different approach for documents with frequent layout/template variations?

Also, how would Microsoft recommend designing and training the selected models so that they are less dependent on supplier-specific layouts, banners, branding, and field positions?

Could you please advise what Microsoft recommends as the most suitable Azure AI approach for this scenario, and how the models should be designed, trained, and validated to reliably extract the required fields while reducing dependency on supplier-specific layouts, banners, and template changes?

Thank you.

Answer 1

Hello @IT Cognity

Thank you for bringing this excellent architectural question to our attention. We understand how critical it is to build a resilient, low-maintenance document extraction architecture for your business.

For documents subject to frequent layout, branding, and template variations, such as utility bills, Microsoft highly recommends transitioning from multiple Custom Template models to a single Custom Neural (or Custom Generative) extraction model.

Recommended Architectural Shift

Instead of using a Custom Classifier to route to supplier-specific Custom Template models, we advise consolidating this workflow.

• Custom Template models rely heavily on static visual structures and positions, making them brittle to spacing or banner changes.

• Custom Neural models utilize deep learning to recognize document semantics and structure, allowing them to reliably extract key-value pairs from semi-structured and unstructured documents entirely independent of strict layout adherence.

Design and Training Best Practices

To effectively design your models so they are resilient to supplier-specific layouts:

• Consolidate to a Single Model: Unify your extraction requirements into a single Custom Neural extraction model. You no longer need to strictly classify and route by supplier if the fields to extract remain the same.

• Diversify Training Data: Curate a highly diverse training dataset that represents the full spectrum of your production variants. Include at least 5-10 samples from each major supplier, intentionally capturing variations in banners, spacing, and styling. The neural model requires this diversity to learn the context of a field (e.g., recognizing an "Amount Due" label) rather than its absolute visual position.

• Evaluate Generative Capabilities: If your utility bills vary wildly, we also recommend evaluating the Custom Generative extraction capabilities, which excel at handling entirely unstructured and dynamic templates with minimal training data.

By adopting a Custom Neural approach trained on a diverse representative dataset, you will maximize extraction accuracy while significantly reducing the operational overhead of constant model retraining.
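Before training, the diversity guidance above can be checked mechanically. The sketch below counts labeled samples per supplier and reports any supplier below a minimum; the helper name and supplier labels are hypothetical, and the default of 5 is simply the lower end of the 5-10 range mentioned in this answer.

```python
from collections import Counter
from typing import Iterable

# Lower end of the "5-10 samples from each major supplier" guidance quoted above.
MIN_SAMPLES_PER_SUPPLIER = 5


def underrepresented_suppliers(
    sample_suppliers: Iterable[str],
    expected_suppliers: Iterable[str] = (),
    minimum: int = MIN_SAMPLES_PER_SUPPLIER,
) -> dict[str, int]:
    """Return {supplier: sample_count} for suppliers with fewer than `minimum` training samples.

    sample_suppliers holds one supplier label per training document.
    expected_suppliers lists suppliers that must be covered even if no sample exists yet.
    """
    counts = Counter(sample_suppliers)
    suppliers = set(counts) | set(expected_suppliers)
    # Counter returns 0 for suppliers that have no samples at all.
    return {s: counts[s] for s in sorted(suppliers) if counts[s] < minimum}
```

The same count could be kept per banner or template variant within a supplier if you track that label for your training documents.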

If you found this to be helpful, please Upvote and mark it as an Accepted answer. This helps others find relevant help with similar issues on the platform.

Best regards,

Andrew S Taylor
