Processing Invoices with AWS Textract

Mar 12, 2025 - 15:51

As more and more companies digitize, the need for Intelligent Document Processing (IDP) has grown. Transactions and payments are part of almost every business. Here we present a minimal IDP system for processing invoices, built with Amazon Textract and Amazon Augmented AI (A2I).

[Figure: IDP high-level design (HLD)]

Amazon Textract is a managed AWS service that specializes in text detection and analysis. It offers several specialized APIs for different text tasks, including:

  • Detecting text
  • Analyzing documents
  • Analyzing expenses/invoices
  • Analyzing identity documents (IDs)
  • Analyzing lending documents
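To make the list above concrete, here is a minimal Python sketch that maps each task family to the Textract operation names that correspond to it (boto3 exposes them as snake_case methods, for example `analyze_expense`). The task keys and the `operation_for` helper are hypothetical names introduced for illustration.

```python
# Hypothetical mapping: task family -> (synchronous operation, asynchronous operation).
# None means Textract has no operation for that task in that mode.
TEXTRACT_OPERATIONS = {
    "detect_text": ("DetectDocumentText", "StartDocumentTextDetection"),
    "analyze_document": ("AnalyzeDocument", "StartDocumentAnalysis"),
    "analyze_expense": ("AnalyzeExpense", "StartExpenseAnalysis"),
    "analyze_id": ("AnalyzeID", None),                  # synchronous only
    "analyze_lending": (None, "StartLendingAnalysis"),  # asynchronous only
}


def operation_for(task: str, asynchronous: bool) -> str:
    """Return the Textract operation name for a task in the requested mode."""
    sync_op, async_op = TEXTRACT_OPERATIONS[task]
    op = async_op if asynchronous else sync_op
    if op is None:
        mode = "asynchronous" if asynchronous else "synchronous"
        raise ValueError(f"{task} has no {mode} operation")
    return op


print(operation_for("analyze_expense", asynchronous=True))  # StartExpenseAnalysis
```

For invoices, the expense operations are the natural fit; the other entries are there to show that not every task is available in both modes.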

Textract Features:

  • Supported text types: detects both typed (printed) and handwritten text.

  • Supported Modes: synchronous calls and asynchronous calls.

  • Supported Languages: English, Spanish, German, Italian, French, and Portuguese.

  • Supported File Formats: PDF, TIFF, JPEG, and PNG. Synchronous calls accept single-page documents in any of these formats; multi-page documents must be PDF or TIFF and processed asynchronously.

  • Fine-Tuning & Customization: Textract can be fine-tuned to specific needs by adding adapters. Adapters are components that plug into the pre-trained Amazon Textract deep learning model and customize its output for your business-specific documents. You create an adapter for your use case by annotating (labeling) sample documents and training the adapter on those annotated samples.

  • Input Resolution: ideally at least 150 DPI. The minimum height for text to be detected is 15 pixels; at 150 DPI, this corresponds to roughly 8-point font.
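The resolution rule is simple unit arithmetic: a point is 1/72 inch, so text height in pixels is roughly the point size / 72 * DPI. The Python sketch below checks the numbers quoted above; it treats the nominal font size as the text height (an approximation), and the helper names are our own.

```python
# Unit conversion: 1 point = 1/72 inch, so pixels = points / 72 * DPI.
# Treats the nominal font size as the text height, which is an approximation.
MIN_TEXT_HEIGHT_PX = 15  # minimum detectable text height quoted above


def text_height_px(font_pt: float, dpi: float) -> float:
    """Approximate height in pixels of text set at font_pt points."""
    return font_pt / 72 * dpi


def min_font_pt(dpi: float) -> float:
    """Smallest font size in points that reaches MIN_TEXT_HEIGHT_PX at this DPI."""
    return MIN_TEXT_HEIGHT_PX * 72 / dpi


print(f"{text_height_px(8, 150):.1f} px")  # 16.7 px
print(f"{min_font_pt(150):.1f} pt")        # 7.2 pt
```

So 8-point text scanned at 150 DPI clears the 15-pixel minimum with a small margin, and scanning at a lower DPI raises the smallest readable font size proportionally.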

Synchronous vs Asynchronous processing:

Textract has 2 processing modes: synchronous and asynchronous.

Synchronous processing:

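  • Supports ONLY single-page documents; intended for latency-critical use cases.

  • Supports PDF, TIFF, JPEG, and PNG formats.

  • Accepts a document stored in S3 or base64-encoded document bytes passed in the request.

  • Has built-in integration with A2I (through the HumanLoopConfig parameter of AnalyzeDocument).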

Asynchronous processing:

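  • Supports single-page and multi-page documents.

  • Supports PDF, TIFF, JPEG, and PNG; multi-page documents must be PDF or TIFF.

  • Requires the document to be stored in S3; bytes cannot be streamed in the request.

  • Has NO built-in integration with A2I; human review must be started with a custom call.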
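The rules above can be turned into a small routing helper. This sketch assumes the invoice is already in S3 and that you know its page count; `build_expense_request` is a hypothetical name, and it only builds the arguments for the boto3 Textract methods `analyze_expense` and `start_expense_analysis` rather than calling AWS.

```python
from pathlib import PurePosixPath

SUPPORTED_FORMATS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png"}
MULTIPAGE_FORMATS = {".pdf", ".tif", ".tiff"}


def build_expense_request(bucket: str, key: str, page_count: int) -> tuple[str, dict]:
    """Choose sync or async expense analysis and build its boto3 keyword arguments."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    suffix = PurePosixPath(key).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {suffix or key}")
    s3_object = {"S3Object": {"Bucket": bucket, "Name": key}}
    if page_count == 1:
        # Single page: synchronous call, results come back in the response.
        return "analyze_expense", {"Document": s3_object}
    if suffix not in MULTIPAGE_FORMATS:
        raise ValueError("multi-page documents must be PDF or TIFF")
    # Multi-page: asynchronous job; the document must already be in S3.
    return "start_expense_analysis", {"DocumentLocation": s3_object}
```

With boto3 installed, the result could be used as `method, kwargs = build_expense_request(...)` followed by `getattr(boto3.client("textract"), method)(**kwargs)`. In practice the asynchronous branch would also pass a `NotificationChannel` so completion is reported through SNS.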

Textract SNS notification:

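A Textract asynchronous call can notify an SNS topic when the job completes. It is important to note that if the IAM role Textract uses to publish is based on the AWS managed AmazonTextractServiceRole policy, the topic name MUST start with AmazonTextract, because that policy only allows publishing to AmazonTextract* topics.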
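A cheap safeguard is to check the topic name before starting a job. This sketch builds the `NotificationChannel` argument that Textract's Start* operations accept (`SNSTopicArn` plus `RoleArn`); the `notification_channel` helper is hypothetical, and the prefix check assumes the managed AmazonTextractServiceRole policy is in use.

```python
def notification_channel(topic_arn: str, role_arn: str) -> dict:
    """Build the NotificationChannel argument for a Textract Start* call.

    Assumes the publishing role relies on the AmazonTextractServiceRole
    managed policy, which only allows topics named AmazonTextract*.
    """
    # SNS topic ARN format: arn:aws:sns:<region>:<account-id>:<topic-name>
    topic_name = topic_arn.split(":")[-1]
    if not topic_name.startswith("AmazonTextract"):
        raise ValueError(f"topic name must start with 'AmazonTextract': {topic_name}")
    return {"SNSTopicArn": topic_arn, "RoleArn": role_arn}
```

The returned dictionary would be passed as `NotificationChannel=...` to a call such as `start_expense_analysis`.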

Also note that to track batch job status, you should subscribe an SQS queue to that topic and monitor the queue, NOT poll Textract Get* operations (such as GetExpenseAnalysis), because those calls can be throttled.
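A minimal sketch of the queue-based approach follows. It assumes an SQS queue subscribed to the Textract SNS topic, and that the Textract notification carries `JobId` and `Status` fields. `parse_textract_notification` and `poll_once` are hypothetical helpers; `sqs` is expected to be a boto3 SQS client, though any object with the same `receive_message`/`delete_message` methods works, which keeps the sketch testable.

```python
import json


def parse_textract_notification(sqs_body: str) -> tuple[str, str]:
    """Extract (JobId, Status) from an SQS message fed by Textract's SNS topic."""
    payload = json.loads(sqs_body)
    # Without raw message delivery, SNS wraps the Textract message in an
    # envelope whose "Message" field holds the original JSON as a string.
    if payload.get("Type") == "Notification" and "Message" in payload:
        payload = json.loads(payload["Message"])
    return payload["JobId"], payload["Status"]


def poll_once(sqs, queue_url: str) -> list[tuple[str, str]]:
    """Long-poll the queue once and return (JobId, Status) for each message."""
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    results = []
    for msg in resp.get("Messages", []):  # key is absent when the queue is empty
        results.append(parse_textract_notification(msg["Body"]))
        # Deleting right after parsing keeps the sketch short; a real pipeline
        # would delete only after fetching results with the matching Get* call.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return results
```

Each `SUCCEEDED` job ID would then be fetched exactly once with the matching Get* operation (for example `get_expense_analysis(JobId=...)`), instead of polling Textract repeatedly.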

Amazon Augmented AI (A2I)

Amazon A2I is a fully managed service that makes it easier to add human validation to ML predictions, removing the need to build your own human review systems or manage large numbers of human analysts. A2I brings humans into an automated machine learning loop: human reviewers can step in when a model cannot make a high-confidence prediction, or audit its predictions.
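For invoices, the "low-confidence prediction" trigger can be sketched against an AnalyzeExpense response, where each summary field carries a `Type` and a `ValueDetection` with a `Confidence` percentage (0 to 100). The 90.0 threshold and the `fields_needing_review` helper are assumptions for illustration, not A2I defaults.

```python
def fields_needing_review(expense_response: dict, threshold: float = 90.0) -> list[str]:
    """Return summary-field types whose extracted value is below the threshold."""
    flagged = []
    for doc in expense_response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            value = field.get("ValueDetection", {})
            # A missing confidence is treated as 0, so the field gets reviewed.
            if value.get("Confidence", 0.0) < threshold:
                flagged.append(field.get("Type", {}).get("Text", "UNKNOWN"))
    return flagged
```

A non-empty result is the signal to send the invoice to a human; an empty one lets it flow straight through.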

A2I has 3 main components:

  • A Work Team/Workforce (the “who”): the people who validate the predictions. It can be one of three types: an Amazon Mechanical Turk workforce, a vendor-managed workforce, or a private workforce.

  • A Human Task UI/Template (the “what”): defines what reviewers see in the worker task interface when reviewing a task.

  • A Workflow Definition (the “when”): defines the configuration and the conditions that trigger a human review. It references the workforce and the UI template, plus other settings, such as how many reviewers should review each task and the maximum time allowed for a review.
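The three components map onto the arguments of SageMaker's `create_flow_definition` call. This sketch only builds the keyword arguments; the helper name, task title, description, and default values are placeholders. With a custom integration the "when" is decided by your own code (for example, a confidence check), whereas built-in integrations also set `HumanLoopRequestSource` and `HumanLoopActivationConfig`.

```python
def flow_definition_request(name: str, role_arn: str, workteam_arn: str,
                            task_ui_arn: str, output_s3_uri: str,
                            reviewers: int = 1, time_limit_s: int = 3600) -> dict:
    """Keyword arguments for SageMaker create_flow_definition (custom integration)."""
    return {
        "FlowDefinitionName": name,
        "RoleArn": role_arn,  # role A2I assumes, e.g. to read input images from S3
        "OutputConfig": {"S3OutputPath": output_s3_uri},
        "HumanLoopConfig": {
            "WorkteamArn": workteam_arn,             # the "who"
            "HumanTaskUiArn": task_ui_arn,           # the "what"
            "TaskTitle": "Review invoice fields",
            "TaskDescription": "Check the values Textract extracted with low confidence.",
            "TaskCount": reviewers,                  # reviewers per task
            "TaskTimeLimitInSeconds": time_limit_s,  # maximum review time
        },
    }
```

With boto3, this could be submitted as `boto3.client("sagemaker").create_flow_definition(**flow_definition_request(...))`.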

A2I has built-in integrations with Amazon Rekognition (DetectModerationLabels) and SYNCHRONOUS Textract calls (AnalyzeDocument).
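The difference between the built-in and custom paths shows up in the request shapes. The sketch below assumes A2I's human loop naming rule (lowercase letters, digits, and hyphens, at most 63 characters); the helper names are hypothetical. The first dictionary would be passed as `HumanLoopConfig` to a synchronous `analyze_document` call with `FeatureTypes=["FORMS"]`; the second as keyword arguments to `start_human_loop` on a `sagemaker-a2i-runtime` client, which is how an asynchronous pipeline would start a review itself.

```python
import json
import re

# Assumed A2I naming rule: lowercase letters, digits and hyphens,
# starting and ending with a letter or digit, at most 63 characters.
_LOOP_NAME = re.compile(r"[a-z0-9](-*[a-z0-9])*")


def _check_loop_name(name: str) -> None:
    if len(name) > 63 or not _LOOP_NAME.fullmatch(name):
        raise ValueError(f"invalid human loop name: {name!r}")


def builtin_loop_config(loop_name: str, flow_definition_arn: str) -> dict:
    """HumanLoopConfig for a synchronous Textract analyze_document call."""
    _check_loop_name(loop_name)
    return {"HumanLoopName": loop_name, "FlowDefinitionArn": flow_definition_arn}


def custom_loop_request(loop_name: str, flow_definition_arn: str,
                        review_input: dict) -> dict:
    """Keyword arguments for start_human_loop (sagemaker-a2i-runtime)."""
    _check_loop_name(loop_name)
    return {
        "HumanLoopName": loop_name,
        "FlowDefinitionArn": flow_definition_arn,
        # InputContent is a JSON string; its fields must match what your
        # human task UI template expects (you define that shape, not A2I).
        "HumanLoopInput": {"InputContent": json.dumps(review_input)},
    }
```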

Notes:

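The IAM role that SageMaker assumes for the A2I workflow definition must have read permission on the S3 bucket where the images are stored. The bucket MUST also have CORS enabled, so that the reviewer's browser can load the images in the task UI.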
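Both requirements can be expressed as configuration. This sketch builds a permissive CORS rule of the kind the A2I documentation describes (you may want to narrow `AllowedOrigins`) and a read-only IAM policy document for the role; the helper names are hypothetical. The CORS dictionary could be applied with boto3's `s3.put_bucket_cors(Bucket=..., CORSConfiguration=...)`.

```python
import json


def a2i_cors_configuration() -> dict:
    """CORS rules letting the reviewer's browser GET images from the bucket."""
    return {
        "CORSRules": [
            {
                "AllowedHeaders": [],
                "AllowedMethods": ["GET"],
                "AllowedOrigins": ["*"],  # consider restricting this
                "ExposeHeaders": [],
            }
        ]
    }


def a2i_read_policy(bucket: str) -> str:
    """IAM policy document granting read access to objects in the bucket."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }],
    })
```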

Note that throughput is limited PER workflow definition: at most 100 in-flight human loops at a time for a Mechanical Turk workforce and 5,000 for a private workforce. One way to work around this is to create more than one workflow definition and distribute the load across them.
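One simple way to spread the load is to hash a document identifier onto a list of equivalent workflow definitions. This is a sketch under the assumption that all listed definitions share the same workforce, UI template, and settings; `pick_flow_definition` is a hypothetical helper. Hashing, rather than round-robin, keeps a retried document on the same definition.

```python
import hashlib


def pick_flow_definition(document_id: str, flow_definition_arns: list[str]) -> str:
    """Deterministically map a document to one of several flow definitions."""
    if not flow_definition_arns:
        raise ValueError("need at least one flow definition ARN")
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a slot.
    index = int.from_bytes(digest[:8], "big") % len(flow_definition_arns)
    return flow_definition_arns[index]
```

Note that this spreads documents evenly but does not track how many loops are actually in flight; a pipeline close to the limits would also need to check that.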