  • Python 97.6%
  • Dockerfile 2.4%
martin 78269154a4 prepend_vorlage.py
Places the file vorlage.png in front of every image file in the input folder and creates
a two-page PDF from the pair: vorlage.png + <datei>.png -> <dateiname>-fertig.pdf
2026-03-12 11:26:10 +00:00
dockerbuild add watchdog 2025-10-20 20:09:22 +02:00
import Initial clean commit 2025-10-17 09:09:53 +02:00
ocr_output Initial clean commit 2025-10-17 09:09:53 +02:00
python_scripts prepend_vorlage.py 2026-03-12 11:26:10 +00:00
.gitignore Initial clean commit 2025-10-17 09:09:53 +02:00
barcode_divider.pdf Initial clean commit 2025-10-17 09:09:53 +02:00
docker-compose.yml removed hashtags 2025-10-17 11:22:32 +02:00
LICENSE Initial clean commit 2025-10-17 09:09:53 +02:00
README.md README.md updated 2025-10-17 11:31:42 +02:00

scantool

Alpha-stage build of a tool that turns a batch of duplex-scanned PNG or JPG images (separated by divider pages) into OCR-ready PDF documents.

Setup

  1. Clone this repo to your system:
git clone https://codeberg.org/samsy/scantool.git
  2. Build and start the containers with Docker Compose:
docker compose up -d
  3. Put your scanned images in the ./import directory. The script needs a divider page scanned between documents; see barcode_divider.pdf.

  4. Start the conversion through the integrated FastAPI endpoint:

curl -X POST http://localhost:51822/scan -d '{"input_folder":"/data/import","output_folder":"/data/ocr_output"}' -H "Content-Type: application/json"

* localhost can be replaced by the host's internal IP or by the Docker container name (ocr-python).
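
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the URL, port, and JSON keys are copied from the curl example above, while the helper names `build_scan_request` and `trigger_scan` are hypothetical and not part of scantool. The response format is not documented here, so the sketch simply returns the raw body.

```python
import json
import urllib.request

# Values copied from the curl example; adjust host and port to your setup.
SCAN_URL = "http://localhost:51822/scan"


def build_scan_request(input_folder, output_folder, url=SCAN_URL):
    """Build the POST request for /scan (hypothetical helper, not part of scantool)."""
    payload = {"input_folder": input_folder, "output_folder": output_folder}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_scan(input_folder, output_folder, url=SCAN_URL, timeout=600):
    """Send the request and return the raw response body as text.

    The timeout is an assumption; conversion time depends on the batch size.
    """
    request = build_scan_request(input_folder, output_folder, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the containers running, `print(trigger_scan("/data/import", "/data/ocr_output"))` should behave like the curl call.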

Alternatively, run the conversion scripts directly:

# Convert without checking for blank pages:
docker compose exec ocr-python python ocr_split_no_blank.py /data/import /data/ocr_output
# Run the full pipeline:
docker compose exec ocr-python python ocr_split.py /data/import /data/ocr_output
# Only remove blank pages (possibly converting PNG to JPG), without creating PDFs or adding OCR:
docker compose exec ocr-python python remove_blanks.py /data/import /data/ocr_output
  5. Done. The output is in German; feel free to translate it.
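
The direct script invocations can also be driven from a small Python wrapper on the host, for example from a cron job. This is only a sketch: the service name, script names, and paths are taken from the commands in this README, while `build_exec_command` and `run_script` are hypothetical helpers that are not part of the repository.

```python
import subprocess

SERVICE = "ocr-python"  # Compose service name used in the commands in this README


def build_exec_command(script, input_dir="/data/import", output_dir="/data/ocr_output"):
    """Return the argument list for running one scantool script in the container."""
    return ["docker", "compose", "exec", SERVICE, "python", script, input_dir, output_dir]


def run_script(script, input_dir="/data/import", output_dir="/data/ocr_output"):
    """Run the script and return its exit code (0 usually means success)."""
    completed = subprocess.run(build_exec_command(script, input_dir, output_dir), check=False)
    return completed.returncode
```

For example, `run_script("ocr_split.py")` mirrors the full-pipeline command. Run it from the directory that contains docker-compose.yml. When no terminal is attached, `docker compose exec` may need its `-T` flag added to the argument list.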