
Docling: the easy way to work with files and LLMs
Working with files and language models gets messy quickly. Formats differ, text and images both need processing, and wiring everything into a model often makes workflows inefficient. That’s where Docling comes in: an open-source IBM library that offers a single, unified way to prepare files for LLMs.
What is Docling?
Docling is an open-source library from IBM Research designed to make the interaction between documents and language models much simpler. Its main goal is to streamline file processing, letting developers integrate analysis, extraction, and data handling features powered by AI models directly into their projects. One of its key ideas is to provide a common document representation format, so you don’t need to juggle different data structures depending on the file type.
Supported file types
Docling works with a wide range of files, making it very flexible. Among the most common:
- PDF: with OCR, text, and image extraction. (Tip: if you don’t need OCR, turn it off—it’s the most resource-intensive part.)
- Images (JPG, PNG)
- Text files (TXT)
- Word documents (DOCX)
- Spreadsheets (XLSX): best to stick to plain values, without formulas or macros.
- Presentations (PPTX): includes table support
- Markdown (MD)
This wide coverage makes Docling especially useful for document analysis, data migration, or automating workflows.
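As a rough sketch of the OCR tip in the PDF entry above, the snippet below builds a converter with OCR switched off. It assumes a recent Docling release in which `PdfPipelineOptions` exposes a `do_ocr` flag and `DocumentConverter` accepts `format_options`; the helper name `build_pdf_converter` is made up for illustration, so check the names against the version you have installed.

```python
def build_pdf_converter(do_ocr: bool = False):
    """Return a Docling converter whose PDF pipeline runs with or without OCR.

    Hypothetical helper. Docling is imported lazily because it is a heavy
    dependency, so merely defining this function stays cheap.
    """
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = do_ocr  # OCR is the most resource-intensive step

    # Apply the options to PDF inputs only; other formats keep their defaults
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=options),
        }
    )
```

Calling `build_pdf_converter().convert("example.pdf")` would then process a digitally generated PDF without the OCR stage. Scanned PDFs still need `do_ocr=True`, since they contain no embedded text layer.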
Strengths
One of Docling’s biggest strengths is how well it handles vision tasks. It doesn’t just process text; it also interprets images inside documents, which gives it a genuinely multimodal approach. Using vision models hosted on Hugging Face, it can extract content from charts, diagrams, and tables and feed it into the same flow as the text analysis.
It’s also very flexible: Docling plays nicely with existing AI pipelines and works well alongside tools like LangChain, LangGraph, and CrewAI. That makes it a natural fit for RAG (Retrieval-Augmented Generation) setups or for agents that need to deal with structured knowledge.
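To make the RAG use case a bit more concrete, here is a minimal sketch that splits Markdown text (for example, the Markdown a document converter exports) into one chunk per heading, ready to be embedded and indexed. This is a plain-Python stand-in written for illustration, not Docling's own chunking utilities; the `Chunk` class and `chunk_markdown_by_heading` function are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # empty string for text that precedes the first heading
    text: str


def chunk_markdown_by_heading(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading section.

    Sections with no body text are skipped. Lines starting with '#'
    inside code blocks are not special-cased in this sketch.
    """
    chunks: list[Chunk] = []
    heading = ""
    lines: list[str] = []

    def flush() -> None:
        # Store the section collected so far, if it has any content
        body = "\n".join(lines).strip()
        if body:
            chunks.append(Chunk(heading=heading, text=body))

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()
            heading = match.group(1).strip()
            lines = []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each `Chunk` keeps its heading, which can be stored as metadata next to the embedding so that retrieved passages carry some context about where they came from.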
Weaknesses
The same thing that makes Docling powerful can also be a drawback. Since it relies on Hugging Face models, you’ll need to download them locally. That means your environment must have enough storage and compute resources—something that could be tricky for resource-limited setups or cloud deployments without strong infrastructure.
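To get a feel for the storage cost, a small sketch like the following can report how much disk space a model cache occupies. It assumes the models end up in the Hugging Face hub cache, which defaults to `~/.cache/huggingface/hub` unless the `HF_HOME` environment variable points elsewhere; your setup may store them in a different directory, in which case pass that path instead. Both function names are hypothetical.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    if not root.is_dir():
        return 0
    total = 0
    for path in root.rglob("*"):
        # Skip symlinks so files linked from several places are counted once
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total


def default_hf_cache() -> Path:
    """Return the assumed default location of the Hugging Face hub cache."""
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"
```

For example, `directory_size_bytes(default_hf_cache()) / 1e9` gives the cache size in gigabytes, which is handy when sizing a container image or a cloud volume.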
And, as a relatively new project, its documentation and community are still growing. Expect a bit of a learning curve and possibly some time spent digging through code to figure things out.
MCP support
Docling also supports the Model Context Protocol (MCP), an open protocol that standardizes how LLM applications connect to external tools and data sources. The idea is a common interface, so that models and systems can interoperate without custom glue code for each integration. While MCP is still in its early stages, Docling’s alignment with it positions the library as a forward-looking tool for enterprise AI.
Processing a document with Docling and sending it to an LLM
Here’s a quick and practical example of extracting text from a PDF using Docling and then sending it to a language model for automatic summarization.
Requirements
pip install docling openai
Python example
import os

from docling.document_converter import DocumentConverter
from openai import OpenAI

# Convert the PDF into Docling's unified document representation
converter = DocumentConverter()
result = converter.convert("example.pdf")

# Export the parsed document as Markdown text for the prompt
text = result.document.export_to_markdown()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
res = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an assistant that summarizes documents in Spanish."},
        {"role": "user", "content": "Summarize the following text in 5 lines:"},
        # Only the first 6,000 characters are sent, to keep the prompt small
        {"role": "user", "content": text[:6000]},
    ],
    temperature=0.2,
)
print(res.choices[0].message.content)
Quick notes
- Replace example.pdf with your actual file path.
- Make sure your OpenAI key is defined in your environment as OPENAI_API_KEY, along with a Hugging Face token if the models you use require one.
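The example above sends only the first 6,000 characters of the document (`text[:6000]`), so anything after that point is silently ignored. One possible workaround, sketched below under the assumption that paragraphs are separated by blank lines, is to pack paragraphs into parts that each fit the same character budget. The function name `split_into_budgeted_parts` is hypothetical.

```python
def split_into_budgeted_parts(text: str, max_chars: int = 6000) -> list[str]:
    """Greedily pack paragraphs into parts of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    parts: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single oversized paragraph is cut into hard slices
        while len(paragraph) > max_chars:
            if current:
                parts.append(current)
                current = ""
            parts.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would exceed the budget: start a new part
            parts.append(current)
            current = paragraph
    if current:
        parts.append(current)
    return parts
```

Each part can then be summarized with its own request, and the partial summaries combined in a final call. That costs more requests than truncation but covers the whole document.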
Conclusion
Docling is a promising library for anyone dealing with complex documents in AI workflows. Its multimodal capabilities, seamless integration with LLMs, and alignment with emerging standards make it a powerful tool. On the flip side, infrastructure requirements and its early stage mean you might face some hurdles in production.
Overall, Docling is an exciting move from IBM that points toward more unified and efficient ways of working with files and LLMs—and it’s likely we’ll see it gain more traction as its community and ecosystem continue to grow.