scantool

Alpha-stage build of a tool that turns a batch of duplex-scanned PNG or JPG images (with divider pages between documents) into OCR-ready PDF documents.

Setup

Clone this repo to your system:

git clone https://codeberg.org/samsy/scantool.git

Build and start the containers with Docker Compose:

docker compose up -d

Put some scanned images in your ./import directory. The script needs a divider page scanned between documents (see barcode_divider.pdf).
Start the conversion through the integrated FastAPI service:

curl -X POST http://localhost:51822/scan -d '{"input_folder":"/data/import","output_folder":"/data/ocr_output"}' -H "Content-Type: application/json"

* Instead of localhost you can also use the host's internal IP or the container name (ocr-python).
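The same request can also be built from Python. The following is a minimal sketch using only the standard library; the helper names `build_scan_request` and `send` are hypothetical, and passing `script` as a query parameter on `/scan` is an assumption based on the `?script=...` note in this README, not a confirmed part of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:51822"  # host and port taken from the curl example


def build_scan_request(input_folder, output_folder, script=None, base_url=BASE_URL):
    """Build the POST /scan request (hypothetical helper).

    Assumption: `script` is passed as a query parameter (?script=...).
    """
    url = f"{base_url}/scan"
    if script is not None:
        url += "?" + urlencode({"script": script})
    body = json.dumps(
        {"input_folder": input_folder, "output_folder": output_folder}
    ).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return (status code, response text)."""
    with urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

With the container running, `send(build_scan_request("/data/import", "/data/ocr_output"))` should correspond to the curl call above.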

Note:

- /scripts endpoint: lists all available scripts, handy for checking what is mounted.
- Script parameter: ?script=ocr_split_pdf_only.py; the default remains ocr_split.py.
- Security check: prevents path traversal (e.g. ?script=../../etc/passwd).
- File extensions: .pdf is now also recognized as input, not only images.
- Output path: uses os.path.splitext instead of a hard-coded .replace(".png", ".pdf").

Alternative: start the conversion directly:

# Skip the blank-page check:
docker compose exec ocr-python python ocr_split_no_blank.py /data/import /data/ocr_output
# Run the full pipeline:
docker compose exec ocr-python python ocr_split.py /data/import /data/ocr_output
# Only remove blank pages (possibly converting PNG to JPG); no PDF creation or OCR:
docker compose exec ocr-python python remove_blanks.py /data/import /data/ocr_output
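The path-traversal check and the os.path.splitext output path mentioned in the note above could look roughly like the sketch below. It is not the code from api.py: the function names and the `/scripts` mount path are hypothetical and only illustrate the idea.

```python
import os


def resolve_script(name, scripts_dir="/scripts"):
    """Return the absolute path of `name` inside scripts_dir.

    Raises ValueError if the resolved path escapes scripts_dir,
    e.g. for name="../../etc/passwd". scripts_dir is an assumed mount point.
    """
    base = os.path.realpath(scripts_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    # commonpath equals base only when candidate lies inside base
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"script outside {scripts_dir}: {name!r}")
    return candidate


def pdf_output_path(input_path, output_folder):
    """Map any input file (.png, .jpg, .pdf, ...) to <output_folder>/<stem>.pdf."""
    stem, _ext = os.path.splitext(os.path.basename(input_path))
    return os.path.join(output_folder, stem + ".pdf")
```

Using splitext instead of `.replace(".png", ".pdf")` matters for non-PNG input: `"page.jpg".replace(".png", ".pdf")` leaves the name unchanged, while `pdf_output_path("page.jpg", "/data/ocr_output")` yields a `.pdf` name.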

Done. (The script output is in German; feel free to translate it.)
