Microsoft recently announced the public preview of built-in actions for document parsing and chunking in Logic Apps Standard. These actions are designed to streamline Retrieval-Augmented Generation (RAG)-based ingestion for generative AI applications. With them, the company is further investing in the artificial intelligence capabilities of its low-code offerings.
According to the company, these out-of-the-box operations make it easy for developers to ingest documents and files containing both structured and unstructured data into Azure AI Search without writing or managing custom code. The new data manipulation actions, “Parse Document” and “Chunk Text,” convert content in formats such as PDF, CSV, and Excel into tokenized strings and split it into manageable chunks based on the number of tokens. This matters because Azure AI Search and Azure OpenAI require tokenized input and enforce token limits.
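To make the chunking step concrete, here is a minimal Python sketch of token-bounded splitting with overlap, in the spirit of the “Chunk Text” action. It is illustrative only: it counts whitespace-separated words as a stand-in for the subword tokens that Azure OpenAI models actually count, and the function name and parameters are assumptions, not the Logic Apps implementation.

```python
def chunk_text(text: str, max_tokens: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace tokens.

    Real tokenizers count subword tokens; whitespace splitting here is a
    simplification for illustration. Consecutive chunks share `overlap`
    tokens so context is not lost at chunk boundaries.
    """
    if max_tokens <= overlap:
        raise ValueError("max_tokens must exceed overlap")
    tokens = text.split()
    chunks = []
    step = max_tokens - overlap  # how far the window advances each iteration
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # the final window already covers the tail of the text
    return chunks
```

In a real pipeline the chunks produced this way would each stay under the token limit of the downstream embedding or completion model.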
Divya Swarnkar, program manager at Microsoft, wrote:
These actions are built on the Apache Tika toolkit and parser library to parse thousands of file types in multiple languages, including PDF, DOCX, PPT, HTML, etc. Seamlessly read and parse documents from almost any source without any custom logic or configuration.
(Source: Tech Community blog post)
Wessel Beulink, cloud architect at Rubicon, concluded in a blog post about the new actions:
The document parsing and chunking capabilities of Azure Logic Apps open up many automation possibilities. From legal workflows to customer support, these capabilities allow businesses to leverage AI for more innovative document processing. By leveraging low-code RAG ingestion, organizations can simplify the integration of AI models for smoother data ingestion, enhanced search capabilities, and more efficient knowledge management.
The blog post touches on various use cases, such as integrating analytics into AI workflows to streamline document processing, enabling AI-powered chatbots to ingest and retrieve relevant information for customer support, and breaking data into manageable pieces to improve knowledge management and searchability.
Additionally, Logic Apps provides out-of-the-box templates for RAG ingestion that easily connect to familiar data sources such as SharePoint, Azure Files, SFTP, and Azure Blob Storage. These templates save developers time and allow them to customize workflows to suit their needs.
Kamaljeet Kharbanda, a master’s student in data science, wrote in a blog post on Medium that RAG transforms enterprise data processing by combining a deep knowledge base with the powerful analytical capabilities of large language models (LLMs). This synergy enables advanced interpretation of complex datasets, which is essential for competitive advantage in today’s digital ecosystem.
Low-code/no-code platforms such as Azure AI Studio, Amazon Bedrock, Vertex AI, and Logic Apps make advanced AI capabilities broadly accessible. Alongside these cloud solutions, frameworks such as LangChain and LlamaIndex provide a robust environment for implementing customized AI functionality in a code-first manner.
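For contrast with the low-code actions, a code-first ingestion-and-retrieval loop can be sketched in a few lines of plain Python. This is a toy stand-in, not the API of LangChain or LlamaIndex: the function names are invented for illustration, chunking is fixed-size by word count, and retrieval is simple term overlap rather than vector similarity.

```python
def build_index(docs: list[str], chunk_size: int = 20) -> list[str]:
    """Ingest documents: split each into fixed-size word chunks and collect them."""
    index = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            index.append(" ".join(words[i:i + chunk_size]))
    return index

def retrieve(index: list[str], query: str) -> str:
    """Return the chunk sharing the most terms with the query.

    A production system would embed chunks and query, then rank by
    vector similarity; set intersection is the simplest analogue.
    """
    q = set(query.lower().split())
    return max(index, key=lambda chunk: len(q & set(chunk.lower().split())))
```

The retrieved chunk would then be passed to an LLM as grounding context, which is the "augmented generation" half of RAG.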