
scantool

Alpha-stage build of a tool that turns a batch of duplex-scanned PNG or JPG images (separated by divider pages) into OCR-ready PDF documents.

Setup

  1. Clone this repo to your system:
git clone https://codeberg.org/samsy/scantool.git
  2. Build and start the containers with Docker Compose:
docker compose up -d
  3. Put your scanned images into the ./import directory. The script expects a scanned divider page between documents (see barcode_divider.pdf).

  4. Start the conversion through the integrated FastAPI endpoint:

curl -X POST http://localhost:51822/scan -d '{"input_folder":"/data/import","output_folder":"/data/ocr_output"}' -H "Content-Type: application/json"

* Instead of localhost, you can also use the host's internal IP or the container name (ocr-python).
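The same call can be made from Python. The following is a minimal sketch using only the standard library; it mirrors the curl command above. The helper names `build_scan_request` and `run_scan` are hypothetical, and passing the optional `script` name as a query parameter on `/scan` is an assumption based on the notes in this README, not confirmed against api.py.

```python
import json
import urllib.parse
import urllib.request

# Assumption: host and port are taken from the curl example in this README.
BASE_URL = "http://localhost:51822"


def build_scan_request(input_folder, output_folder, script=None, base_url=BASE_URL):
    """Build (but do not send) the POST request for the /scan endpoint."""
    url = f"{base_url}/scan"
    if script is not None:
        # Assumption: the script name is passed as ?script=... on /scan.
        url += "?" + urllib.parse.urlencode({"script": script})
    payload = json.dumps(
        {"input_folder": input_folder, "output_folder": output_folder}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scan(input_folder, output_folder, script=None):
    """Send the request and return the raw response body as text."""
    request = build_scan_request(input_folder, output_folder, script)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example (requires the running container):
# print(run_scan("/data/import", "/data/ocr_output"))
```

Splitting request construction from sending keeps the payload easy to inspect without a running container.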

Note:

  • The /scripts endpoint lists all available scripts, which is handy for checking what is mounted.
  • The script parameter selects the script to run, e.g. ?script=ocr_split_pdf_only.py; the default remains ocr_split.py.
  • A security check prevents path traversal (e.g. ?script=../../etc/passwd).
  • File-extension detection now also accepts .pdf as input, not only images.
  • The output path is built with os.path.splitext instead of a hard-coded .replace(".png", ".pdf").

Alternative: start converting directly:

# Without the blank-page check:
docker compose exec ocr-python python ocr_split_no_blank.py /data/import /data/ocr_output
# Just do everything:
docker compose exec ocr-python python ocr_split.py /data/import /data/ocr_output
# Only remove blank pages (maybe change from png to jpg); no PDF creation or OCR:
docker compose exec ocr-python python remove_blanks.py /data/import /data/ocr_output
  5. Done. (The script output is in German; feel free to translate it.)
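For readers curious about the path traversal check and the os.path.splitext based output path mentioned in the API notes, here is a minimal sketch of how such logic could look. It is not the code from api.py: the function names `resolve_script` and `output_pdf_path` and the `/scripts` mount point used in the example are assumptions for illustration only.

```python
import os

DEFAULT_SCRIPT = "ocr_split.py"  # default script name, as stated in this README


def resolve_script(scripts_dir, script=DEFAULT_SCRIPT):
    """Return the absolute path of `script` or raise if it escapes `scripts_dir`."""
    scripts_root = os.path.realpath(scripts_dir)
    candidate = os.path.realpath(os.path.join(scripts_root, script))
    # After resolving ".." and symlinks, the candidate must still be
    # located inside the scripts directory.
    if candidate == scripts_root:
        raise ValueError(f"not a script file: {script!r}")
    if os.path.commonpath([scripts_root, candidate]) != scripts_root:
        raise ValueError(f"script outside of scripts directory: {script!r}")
    if not candidate.endswith(".py"):
        raise ValueError(f"not a Python script: {script!r}")
    return candidate


def output_pdf_path(input_path, output_folder):
    """Derive the output PDF path from any input file name (.png, .jpg, .pdf, ...)."""
    stem, _extension = os.path.splitext(os.path.basename(input_path))
    return os.path.join(output_folder, stem + ".pdf")
```

With this approach, a request such as ?script=../../etc/passwd resolves to a path outside the scripts directory and is rejected, and an input like scan_001.jpg maps to scan_001.pdf, which a hard-coded .replace(".png", ".pdf") would not handle.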