DocLang aims to make documents readable by AI, not humans – cio.com

AIs struggle to understand documents designed for humans; the DocLang working group seeks to flip that imbalance with its specification for machine-readable business documents “built from the ground up for LLM tokenizers.”
The working group, founded by IBM, Nvidia, and Red Hat and hosted by the Linux Foundation’s LF AI & Data project, aims to create an open, universal, AI-native document format designed to improve how enterprises prepare, exchange, and govern document data for AI systems. ABBYY and HumanSignal will also be involved in its development, and other contributors are welcome.
“Enterprises today work across a fragmented landscape of document formats, including PDFs, JPEGs, and other file types built primarily for human consumption rather than AI interpretation,” the group said in its launch announcement.
“This disconnect can introduce complexity, raise costs, and reduce reliability when extracting meaning from business documents,” as organizations increasingly rely on generative AI and agentic systems, it said.
Mark Collier, executive director of LF AI & Data, said the goal of the DocLang Specification Working Group is to “develop a vendor-neutral, interoperable standard that helps organizations prepare document data for AI more reliably, transparently, and at scale.”
DocLang defines a structured, machine-readable format for documents of any type, much as JSON does for data, that any tool can implement and any pipeline can consume. It builds on Docling, a document processing toolkit hosted by LF AI & Data that can transform human-readable PDFs, word processor documents, or spreadsheets into structured data.
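To make the idea of a structured, machine-readable document concrete, here is a minimal Python sketch. It does not use the DocLang specification or its schema, which are not described here; the `Block` type, its field names, the `to_structured_json` helper, and the sample invoice are all hypothetical, chosen only to show how headings, paragraphs, and tables might be serialized as typed elements rather than as page layout.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Block:
    """One typed element of a document (hypothetical, not the DocLang schema)."""
    kind: str                                  # e.g. "heading", "paragraph", "table"
    text: str = ""                             # used by headings and paragraphs
    rows: list = field(default_factory=list)   # used only when kind == "table"


def to_structured_json(title: str, blocks: list) -> str:
    """Serialize a document as a JSON object whose blocks carry explicit types."""
    doc = {"title": title, "blocks": [asdict(b) for b in blocks]}
    return json.dumps(doc, indent=2)


# Made-up sample content, for illustration only.
invoice = [
    Block(kind="heading", text="Invoice 1001"),
    Block(kind="paragraph", text="Payment is due within 30 days."),
    Block(kind="table", rows=[["Item", "Amount"], ["Consulting", "1200.00"]]),
]

serialized = to_structured_json("Invoice", invoice)
print(serialized)

# A consumer can recover the structure without guessing at visual layout.
parsed = json.loads(serialized)
table_rows = [b["rows"] for b in parsed["blocks"] if b["kind"] == "table"]
```

Because each block states what it is, a downstream pipeline can select the tables or headings directly instead of inferring them from the position of text on a page.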
Something like DocLang is needed, said independent technology analyst Carmi Levy. “Existing document standards have done an admirable job allowing global stakeholders to confidently collaborate for decades, but it’s becoming increasingly clear that they are in desperate need of an update as AI reshapes the rules around how work gets done,” he explained.
Largely static document types, he said, “can be somewhat limiting when AI is redefining the very word, ‘document.’ In many ways, AI-age documents are far more iterative and dynamic than what they once were, and the definitions need to evolve with the times. The documents we currently live with simply weren’t designed for the AI age.”
Within that context, Levy said, “DocLang represents an early, best hope of achieving some kind of foundational baseline for document standards, one that will hopefully allow more intelligent, more efficient, lower-risk workflows than is currently the case.”
Taking an open-source, vendor-agnostic approach to the process ensures that the needs of the collective will take precedence over those of specific vendors, he said, adding, “earlier standards-setting efforts around networking, documentation, the web, and the cloud powered the free-flowing digital landscape that defines modern life.”
An AI-centric documentation standard will carry that reality into the next generation of technology, said Levy.
The entire concept of LLMs, said Jason Andersen, principal analyst at Moor Insights & Strategy, “involves using natural human languages. The computer is supposed to understand us without us changing our syntax or language. Forcing a syntax on users is exactly what we have today with SEO and more advanced programming languages.”
With something like DocLang, where the standard can be applied to content ingestion, he said, “I would be OK with that being automated, which seems to be the intent. The use case I envision is that when I upload a document to an agent, a skill can be run to preprocess the document into the DocLang standard format, saving tokens.”
That makes sense, he said, adding that he thinks it’s good “if it can help generate outputs, like a visualization, that can be shared outside an AI tool. On that front, that is also why I am liking Web MCP, since you are just adding some code to the page, like CSS or JavaScript, and the consumer, in this case, an AI browser or skill, is better equipped to handle the site.”
The point, he said, is, “these standards need to preserve the fact that humans can still do what they want, and do not need to know any coding to be proficient. In terms of governance, I am not sure if it matters.”
But one analyst did foresee governance problems arising from DocLang’s use.
Yaz Palanichamy, senior research analyst at Info-Tech Research Group, said DocLang adoption will require organizations to implement and review controls in order to scale its use accountably and securely.

Paul Barker is a freelance journalist whose work has appeared in a number of technology magazines and online, including IT World Canada, Channel Daily News, and Financial Post. He covers topics ranging from cybersecurity issues and the evolving world of edge computing to information management and artificial intelligence advances.
Paul was the founding editor of Dot Commerce Magazine, and held editorial leadership positions at Computing Canada and ComputerData Magazine. He earned a B.A. in Journalism from Ryerson University.