Unisound Releases the Unisound U1-OCR Large Model: The First Industrial-Grade Document Intelligence Foundation Model, Opening the OCR 3.0 Era

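BEIJING, Feb 28, 2026 - (ACN Newswire via SeaPRwire.com) - On Feb 26, Unisound officially launched "Unisound U1-OCR", a document intelligence foundation model. As the first industrial-grade document intelligence foundation, the model breaks the boundaries of traditional document processing and sets a new industry benchmark with five core strengths: SOTA performance, trustworthy and verifiable results, out-of-the-box use, efficient deployment, and strong adaptability.

I. Technology Leap: From OCR 2.0 to 3.0

Document intelligence refers to the use of artificial intelligence to automatically read and understand document images, and to read, understand, classify, and extract key information from their content. Traditional vision approaches (OCR 1.0, typified by CRNN) can only recognize text, while the newer multimodal approaches (OCR 2.0, typified by VLMs) offer preliminary layout understanding. Unisound U1-OCR opens the OCR 3.0 era: building on layout understanding, it reads the deeper semantics of a document to perform automatic classification and business-level information extraction, a qualitative leap from "character perception" to "document cognition".

II. Leading Performance: Top Global Tier Across Multiple Authoritative Benchmarks

Unisound U1-OCR is a document intelligence understanding model at the international state of the art (SOTA). Its core advantage is that it breaks the bottleneck of traditional models that "only read text and do not understand layout", so it can "read" complex documents the way a human expert does.

To meet the OCR 3.0 requirement for business-level structured extraction, Unisound U1-OCR adopts a ViT+LLM architecture. The vision encoder uses the NaViT architecture to process document resolution dynamically, and the model has on the order of 3B parameters, balancing computational efficiency against the capacity needed to understand deep document semantics. Beyond this, the model introduces several innovations.

First, it "understands structure before reading content". Traditional models tend to read rigidly in sequence, whereas Unisound U1-OCR pioneers a "semantic-driven + dynamic focus" strategy. Like a human reader, it first sorts out the hierarchy of the document's table of contents and headings, then extracts content on demand. The model automatically builds a "semantic map" of the document and accurately identifies how headings, figures and tables, and body text relate to one another, so it can extract information in an orderly way even in extreme cases of chaotic layout.

Second, it has keen "spatial awareness". Through a strengthened spatial alignment module, the model makes full use of the position of text on the page and actively interprets the spatial layout among elements. Combined with dynamic resolution, it accurately restores document structure for both dense tables and mixed text-and-image layouts, resolving the spatial blind spot that caused earlier models to attribute content to the wrong element.

In addition, the model uses Multi-Token Prediction (MTP): when predicting the current token, it also takes into account the probability distribution over several future tokens, which substantially improves logical coherence in long documents. Together with a full-task reinforcement learning strategy, this strengthens the model's global foresight of layout structure and raises generation efficiency at inference time by more than 80%.

In training, a multi-task collaborative reinforcement training scheme deeply aligns document structure restoration, document classification, and information extraction. The reinforcement strategy optimizes a dual "semantics + coordinates" objective, with targeted reinforcement of IoU accuracy for coordinate tracing, which curbs localization hallucinations and ensures the outputs are physically credible. Multi-level resolution perturbation and a mask sampling strategy significantly improve the model's understanding of document images across scenarios.

With these innovations, Unisound U1-OCR achieves industry SOTA results on multiple authoritative tests, moving from "recognizing text" to "understanding documents".

1. SOTA on OmniDocBench V1.5

On OmniDocBench V1.5, Unisound U1-OCR scores 95.1, a SOTA result (see Figure 1), ahead of mainstream models including GLM-OCR, Deepseek-OCR2, Gemini-3-Pro, and GPT-5.2, with gains in both accuracy and generalization.

Figure 1: Unisound U1-OCR evaluation scores on OmniDocBench V1.5 compared with other models

2. SOTA on D4LA

On D4LA, the model reaches an F1 score of 90.8, well ahead of DocLayout-YOLO (87.3) and PP-StructureV3 (86.0). Without fine-tuning, it parses 11 categories of highly complex documents, such as academic papers and financial statements, with high accuracy.

3. SOTA on DocLayNet

On DocLayNet, the model reaches an F1 score of 95.9, surpassing models such as MinerU 2.5 and PP-StructureV3. It shows clear advantages on difficult tasks such as table recognition, cross-page association, and tiny-text detection, and is highly robust.

4. SOTA on business evaluations

In internal business tests, its information extraction and document classification surpass mainstream general-purpose commercial and open-source models such as Gemini-2.5-Flash and Qwen-235B-VL. The lead is especially pronounced in business-heavy scenarios such as medical admission records and discharge summaries, where Unisound U1-OCR, at 3B parameters, outperforms larger general-purpose VLMs. Compared with smaller document-parsing models, it performs better on deep semantic understanding tasks such as business-level information extraction, thanks to the innovations described above.

III. Built for Real-World Scenarios: Four Core Capabilities Take U1-OCR from "Understanding" to "Execution"

As the document intelligence foundation model that opens the OCR 3.0 era, Unisound U1-OCR not only achieves SOTA results on general benchmarks but is also built around industrial needs, with four core capabilities that carry it from "understanding" to "execution" in business deployment.

1. Trustworthy and traceable: precise provenance, verifiable results

The model's original "coordinate-text-semantics" fusion architecture enables pixel-level localization and a complete evidence chain. While extracting information, the system marks exactly where each item comes from in the document, making review transparent and traceable, safeguarding the credibility of results at the technical level, and addressing the long-standing industry problem that traditional document processing produces results that cannot be verified. In an enterprise review scenario, for example, reviewers no longer need to search through the original text; clicking an extracted result highlights its original location in real time. This human-machine loop cuts review time to seconds and minimizes the rate of missed errors in manual review, delivering "trustworthy AI".

2. Business integration: out of the box, Agent Ready

General-purpose OCR tools have limits in specialized domains. For example, the logical relationship among "Self-pay 1" (自付一), "Self-pay 2" (自付二), and "personal out-of-pocket" (個人自費) on a medical insurance settlement statement, or the rule for checking that an amount written in words matches the amount in figures in a contract, both require domain knowledge. On top of the base model, Unisound U1-OCR incorporates Unisound's accumulated industry knowledge in healthcare, finance, and other fields, so the model can run multi-field cross-validation based on business logic. In internal business tests, classification accuracy across more than 50 common business document types exceeds 99%.

3. Efficient deployment, secure and controllable

The model fully supports private and offline deployment and runs stably without external network access, matching the data privacy requirements of high-security sectors such as government, healthcare, and finance. With optimizations such as layout-level parallel decoding and a multi-token prediction architecture, a document of more than ten pages can be processed end to end in a few seconds, putting industrial-grade document intelligence within easy reach.

4. Strong adaptability for complex scenarios

Across the extreme document scenarios enterprises encounter in practice, including non-standard photographs, bent or blurred documents, complex decorative layouts, and mixed-language text, Unisound U1-OCR maintains stable, high-accuracy performance, removing the reliance of traditional technology on standardized documents and fitting the full range of real enterprise needs.

Media contact: june@intelligentjoy.com

Copyright 2026 ACN Newswire via SeaPRwire.com. All rights reserved. www.acnnewswire.com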

Unisound U1-OCR: The First Industrial-Grade Document Intelligence Foundation Model, Ushering in the OCR 3.0 Era

BEIJING, Feb 28, 2026 - (ACN Newswire via SeaPRwire.com) - On Feb 26, Unisound officially launched Unisound U1-OCR, the world's first industrial-grade foundation model for document intelligence. The release ushers in the OCR 3.0 era and sets a new industry standard with five core strengths: SOTA performance, verifiable results, out-of-the-box functionality, efficient deployment, and robust adaptability.

Document intelligence uses AI to automatically read, understand, and classify digitized documents and to extract key information from them. OCR 1.0 enabled only basic text recognition, while OCR 2.0 added preliminary layout understanding. U1-OCR moves to OCR 3.0, going beyond layout recognition to deliver deep semantic insight, automatic document classification, and business-level information extraction, a shift from "character perception" to "document cognition".

As a SOTA-level document intelligence model, U1-OCR resolves the longstanding bottleneck of traditional models that "recognize text but fail to grasp layout", enabling it to interpret complex documents as a human expert would. It pioneers a "semantic-driven + dynamic focus" strategy: it first maps the document's hierarchy of headings and structural metadata, then extracts content on demand, building a semantic map that identifies the relationships among titles, charts, and text even in disorganized layouts. Its enhanced spatial alignment module uses positional data to accurately restore document structure for dense tables and mixed text-image content, mitigating spatial recognition errors.
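
The "structure first, content on demand" idea described above can be made concrete with a minimal Python sketch. This is not Unisound's implementation: it assumes layout elements have already been detected in reading order and labeled with a hypothetical `kind` ("heading" or "text") and heading `level`, and the `Element`, `Node`, `build_outline`, and `find_section` names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """A detected layout element (hypothetical schema, for illustration)."""
    kind: str       # "heading" or "text"
    text: str
    level: int = 0  # heading depth: 1 = top level; unused for "text"


@dataclass
class Node:
    """One heading in the outline, with its body text and sub-headings."""
    title: str
    level: int
    body: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def build_outline(elements: List[Element]) -> Node:
    """Group elements, given in reading order, under their nearest enclosing heading."""
    root = Node(title="<document>", level=0)
    stack = [root]  # path from the root to the heading currently open
    for el in elements:
        if el.kind == "heading":
            # Close headings at the same or a deeper level, but never pop the root.
            while len(stack) > 1 and stack[-1].level >= el.level:
                stack.pop()
            node = Node(title=el.text, level=el.level)
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].body.append(el.text)
    return root


def find_section(node: Node, title: str) -> Optional[Node]:
    """Depth-first lookup of a heading by exact title."""
    if node.title == title:
        return node
    for child in node.children:
        hit = find_section(child, title)
        if hit is not None:
            return hit
    return None


# Made-up example elements; the text values are placeholders.
elements = [
    Element("heading", "Discharge Summary", 1),
    Element("heading", "Diagnosis", 2),
    Element("text", "Example diagnosis text."),
    Element("heading", "Medication", 2),
    Element("text", "Example medication text."),
]
outline = build_outline(elements)
section = find_section(outline, "Medication")
print(section.body)  # prints ['Example medication text.']
```

Building the outline once and then looking up only the section of interest mirrors the two-stage reading order the release describes; a real system would also attach figures, tables, and page coordinates to each node.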
Equipped with Multi-Token Prediction (MTP) and full-task reinforcement learning, it improves inference efficiency by more than 80% while maintaining logical coherence across long documents. Trained with multi-task collaborative reinforcement learning and optimized for both semantics and coordinates, U1-OCR suppresses spatial hallucinations for reliable outputs and achieves SOTA results across major benchmarks: it scores 95.1 on OmniDocBench V1.5, outperforming leading models such as GLM-OCR and Gemini-3-Pro; reaches F1 scores of 90.8 on D4LA and 95.9 on DocLayNet, excelling in table recognition and cross-page association; and outperforms models such as Gemini-2.5-Flash and Qwen-2.5-VL in internal business tests, with standout performance on medical documents such as admission and discharge records.

Figure: Comparison of Unisound U1-OCR evaluation scores on OmniDocBench V1.5

Built for real-world industrial applications, U1-OCR features four key capabilities that bridge the gap between document understanding and business action. Its proprietary "coordinate-text-semantics" architecture enables pixel-level positioning and full evidence traceability, making audit processes transparent and efficient. Integrated with Unisound's industry expertise in healthcare and finance, it achieves over 99% classification accuracy across more than 50 common business document types and supports cross-field logical verification with zero-shot capabilities. It supports private on-premise and offline deployment and processes documents efficiently, meeting the strict data privacy requirements of the government, healthcare, and finance sectors while lowering hardware costs.
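
The evidence traceability described above pairs each extracted value with its location on the page. Localization quality of this kind is commonly measured with intersection over union (IoU) between a predicted box and a reference box. The Python below is a sketch under stated assumptions, not U1-OCR's output format: the `ExtractedField` schema, the `(x0, y0, x1, y1)` pixel box convention, the 0.5 threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Axis-aligned box as (x0, y0, x1, y1) in pixels, with x0 < x1 and y0 < y1 (assumed convention).
Box = Tuple[float, float, float, float]


@dataclass
class ExtractedField:
    """An extracted value together with where it was found (hypothetical schema)."""
    name: str
    value: str
    page: int
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_well_located(pred: ExtractedField, reference_box: Box, threshold: float = 0.5) -> bool:
    """Treat a field as correctly traced when its box overlaps the reference enough."""
    return iou(pred.box, reference_box) >= threshold


# Made-up example: a predicted box slightly shorter than the reference box.
total = ExtractedField("total_amount", "1,280.00", page=1, box=(100, 200, 300, 240))
reference = (100, 200, 300, 250)
print(round(iou(total.box, reference), 2))  # prints 0.8
print(is_well_located(total, reference))    # prints True
```

Keeping the page number and box next to each value is what lets a review interface highlight the source region when a reviewer clicks a result.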
Most notably, it delivers stable, high-precision performance in extreme scenarios, including non-standard photos, blurred documents, complex formatting, and multilingual text, freeing businesses from reliance on standardized document formats.

Validated in real-world use cases, U1-OCR enables visual traceability of extracted information, automatic classification of mixed documents, intelligent image purification for cluttered layouts, and accurate recognition of complex nested tables with full structural retention.

The launch of U1-OCR marks AI's evolution from simple text recognition to business logic comprehension, a key step for Unisound toward AGI. By taking multimodal documents as a knowledge entry point, Unisound is giving machines autonomous reasoning and evidence traceability capabilities, driving AI from perceptual intelligence to cognitive intelligence. Its vision is to build a general intelligent agent that reads, thinks, and solves complex problems as humans do, turning every document into a stepping stone to AGI.

CONTACT: june@intelligentjoy.com

Copyright 2026 ACN Newswire via SeaPRwire.com. All rights reserved. www.acnnewswire.com