@Yehuda It is truly amazing on iPhone and Mac, you can directly select text from ANY image, even handwritten, like this. No external software needed, it is built in.
I used to think you needed #OCR to rescue older books into the easily browsable digital world.
What I use OCR for today: I print e-mails to PDF files in which the information I need is embedded as numerous screenshots/bitmap images, and then run #OCRmyPDF over them so I can search them.
#Digitalization: such a big word, sometimes a campaign slogan, used to win over people who cannot even use e-mail properly.
➤ A versatile OCR pipeline built for machine learning training
✤ https://github.com/ses4255/Versatile-OCR-Program
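The GitHub project "Versatile-OCR-Program" provides a multimodal OCR pipeline optimized for machine learning training. It can process complex educational materials containing text, figures, mathematical formulas, tables, and charts, and produces structured output such as JSON or Markdown for convenient model training. The system supports multiple languages (Japanese, Korean, English) and has high accuracy, making it especially suitable for academic datasets.
+ For researchers who need to process large numbers of academic papers or exam questions, this project is a lifesaver! It extracts and structures data automatically, saving a great deal of manual processing time.
+ The system not only supports multiple languages but also handles mathematical formulas and charts quite well, which makes me look forward to its use in education.
#MachineLearning #OCR #OpenSourceProject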
https://github.com/ses4255/Versatile-OCR-Program #Technology #Innovation #HackerNews #ngated
OCR pipeline for ML training (tables, diagrams, math, multilingual)
https://github.com/getomni-ai/benchmark/blob/main/README.md #openSource #GitHub #techTrends #AI #models #criticism #HackerNews #ngated
Qwen-2.5-32B is now the best open source OCR model
https://github.com/getomni-ai/benchmark/blob/main/README.md
#HackerNews #Qwen-2.5-32B #OpenSource #OCR #BestModel #AI #Technology
➤ An open-source tool for evaluating the OCR capabilities of large language models
✤ https://github.com/getomni-ai/benchmark/blob/main/README.md
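This document describes the Omni OCR benchmark tool, which evaluates the OCR and data extraction capabilities of various large multimodal models (such as gpt-4o). The benchmark compares the OCR accuracy of traditional OCR providers and language models, and provides an open-source evaluation dataset and methodology. The main evaluation metrics are JSON accuracy and text similarity. It also gives detailed steps for running the benchmark and a list of supported models, including closed-source and open-source LLMs as well as cloud OCR providers. Users can run the tests by configuring model parameters and API keys, then view the results.
+ This benchmark is very helpful for research on applying OCR and LLMs to data extraction, allowing a more objective comparison of different models' strengths and weaknesses.
+ The open-source evaluation approach is reassuring; you can extend the test scope to your own needs, and it is a valuable reference.
#ArtificialIntelligence #OCR #Benchmark #LargeLanguageModels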
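For anyone wondering what "JSON accuracy" and "text similarity" could look like in practice, here is a minimal sketch. The definitions are assumptions made for illustration (share of matching leaf fields, and Levenshtein distance normalized by the longer string), not the benchmark's own code, and all function names are hypothetical.

```python
# Illustrative only: assumed metric definitions, not taken from the benchmark repo.
_MISSING = object()  # sentinel so a missing field never equals a real None value


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a {path: leaf_value} mapping."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            items.update(flatten(child, f"{prefix}[{index}]"))
    else:
        items[prefix] = value
    return items


def json_accuracy(expected, predicted):
    """1 - (wrong, missing, or extra leaf fields / expected leaf fields), floored at 0."""
    exp, pred = flatten(expected), flatten(predicted)
    if not exp:
        return 1.0 if not pred else 0.0
    wrong = sum(1 for path, val in exp.items() if pred.get(path, _MISSING) != val)
    extra = sum(1 for path in pred if path not in exp)
    return max(0.0, 1 - (wrong + extra) / len(exp))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def text_similarity(expected, predicted):
    """Edit distance normalized to the range 0..1, where 1.0 means identical."""
    longest = max(len(expected), len(predicted))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(expected, predicted) / longest
```

Calling `json_accuracy(ground_truth, model_output)` on two parsed JSON objects and `text_similarity(ground_truth_text, ocr_text)` on two strings each gives a score between 0 and 1, which makes different OCR tools comparable on the same documents.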
This week on Oxytude: Hebdoxytude 401, the news on new technologies and accessibility.
Microsoft Photos for Windows gets a refresh: many AI features on the way
#Updates #AI #App #Copilot #FileExplorer #Photos #MagicEraser #ArtificialIntelligence #JXL #MicrosoftPhotos #News #WhatsNew #OCR #Software #TechNews #Technology #Windows10 #Windows11 #WindowsInsider
https://www.ceotech.it/microsoft-foto-di-windows-si-rinnova-tante-funzioni-ai-in-arrivo/
Microsoft Photos: OCR feature and more AI integration
Microsoft is rolling out an updated Photos app to Windows Insiders. It can extract text using OCR and integrates more AI.
So if you’re using Mastodon on the web, you can press the
On Mac/iOS, you can select text on images as if they were text by clicking/tapping and dragging and paste that in (might be more accurate; that’s what I did).
PS. This was meant to be a reply to https://mastodon.social/@fatbrit/114215995914155838 but somehow didn’t get threaded correctly (was using the web client instead of Mona. I somehow manage to do that there sometimes. Has happened before.) :)
Trying some modern OCR tools recently: marker - https://github.com/VikParuchuri/marker and Mistral OCR - https://mistral.ai/news/mistral-ocr
I last looked at this in 2015, when James Bond entered the public domain: https://www.hotelexistence.ca/james-bond-enters-public-domain-in-canada-for-now/ . With 2015 tools, the OCRed output of Bond books was poor.
The new generation is better, but still requires human review.
I admire the quality of work done by Project Gutenberg in their creation of digital editions of books in the public domain.
At the weekend, my work phone is my faithful companion: "event coverage" of the Camp Canis season opener. As always, I am right in the thick of it, so to speak. From my home office. :D
An ultra-compact #AI model for document #OCR by #IBM and #HuggingFace
2/2 re #OCR
All three were set to #rotate and #deskew. None rotated the page that was sideways, but they all #deskewed pages that needed it. Kofax was the speediest of the bunch, then #OCRmyPDF not far behind and #Foxit was by far the slowest.
File size: Foxit produced the smallest files, while #Kofax created files double the original size. OCRmyPDF struggled here, ballooning files to at least 6 times the original size.
1/2 re OCR
I got to do a fun test at work with #OCR: #Foxit Phantom with ABBYY FineReader from 2013, #Kofax Power PDF from 2020, and #OCRmyPDF via #WSL with #Tesseract as the OCR engine.
The best OCR results came from OCRmyPDF. Second was Kofax, and lagging behind was the over-10-year-old Foxit. OCRmyPDF performed great: it picked up a few more characters, especially in fuzzy scanned text, and it even got some handwritten text.
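For readers who want to try the OCRmyPDF part of a test like this, here is a minimal sketch of an invocation with rotation and deskewing switched on. It assumes the `ocrmypdf` command-line tool is installed and on the PATH; the helper names, file names, and language default are hypothetical, and the poster's actual settings are not known.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng"):
    """Build an ocrmypdf command line with page rotation and deskewing enabled."""
    return [
        "ocrmypdf",
        "--rotate-pages",        # try to detect and fix page orientation
        "--deskew",              # straighten slightly crooked scans
        "--language", language,  # Tesseract language code (assumed default: eng)
        input_pdf,
        output_pdf,
    ]


def run_ocr(input_pdf, output_pdf, language="eng"):
    """Run ocrmypdf if it is available; raise a clear error otherwise."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    command = build_ocrmypdf_command(input_pdf, output_pdf, language)
    # check=True raises CalledProcessError if ocrmypdf exits with an error
    subprocess.run(command, check=True)
    return output_pdf
```

A call such as `run_ocr("scan.pdf", "scan_ocr.pdf")` (hypothetical file names) would write a searchable copy next to the original, so the two file sizes can then be compared.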
Why extracting data from PDFs is still a nightmare for data experts https://arstechnica.com/ai/2025/03/why-extracting-data-from-pdfs-is-still-a-nightmare-for-data-experts/ #AI #OCR #LLMs
For its 15th anniversary, #erara has received a few new features: #NamedEntityRecognition, #NamedEntityLinking, and improved full-text recognition.
#PowerToys has added an AI thingy for #OCR and text transformation. Joy! (#NotReally).