Watch Sorcero Integrations Engineer Juan Pablo Ugarte demo DocMeta, an editor for the pipelines used to ingest unstructured text and other content from PDF files into the Sorcero Language Intelligence platform. With the interactive DocMeta tool, developers select the areas of a PDF to which they want to apply Language Intelligence capabilities. Then, with the click of a button, DocMeta creates an ingestion pipeline.
DocMeta solves an important challenge in PDF ingestion, the coherent extraction of passages of text from different layouts, by making custom pipelines simple to construct. Building one is as easy as mousing over the target text or images and clicking "Export."
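To make the idea concrete, here is a minimal sketch of how selected regions might be turned into an ordered pipeline description. It is an illustration only, not DocMeta's actual export format: the `Region` fields, the `build_pipeline` helper, and the `extract_text` / `extract_image` step names are all assumptions invented for this example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Region:
    """A rectangular selection on one PDF page (hypothetical schema)."""
    page: int            # 1-based page number
    x: float             # left edge, in PDF points
    y: float             # top edge, in PDF points
    width: float
    height: float
    kind: str = "text"   # "text" or "image"


def build_pipeline(name: str, regions: List[Region]) -> dict:
    """Turn selected regions into an ordered list of extraction steps."""
    # Sort by page, then top-to-bottom, then left-to-right, so passages
    # come out in reading order no matter the order they were selected in.
    ordered = sorted(regions, key=lambda r: (r.page, r.y, r.x))
    steps = [{"op": f"extract_{r.kind}", "region": asdict(r)} for r in ordered]
    return {"pipeline": name, "steps": steps}


# Regions as a user might select them, deliberately out of reading order.
selected = [
    Region(page=2, x=72, y=100, width=450, height=200),
    Region(page=1, x=72, y=400, width=450, height=120, kind="image"),
    Region(page=1, x=72, y=90, width=450, height=250),
]
pipeline = build_pipeline("example-report", selected)
print(json.dumps(pipeline, indent=2))
```

Sorting the selections into reading order is one simple way to keep extracted passages coherent across different layouts; a real tool would likely need to handle multi-column pages and other cases this sketch ignores.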
"In the [Sorcero] ingestion team, we take care of important documents from different source types, process them, and generate a standard output that the AI system can easily understand," said Ugarte.
DocMeta is based on Mozilla's PDF viewer and integrates seamlessly with the Sorcero platform's utility sets. It is a web service implemented in Python using the FastAPI framework.
Contact Sorcero to learn how Language Intelligence can give your development teams the tools to empower experts at your enterprise.