Role of OCR
Role of OpenCV
How to verify image content
Installing Tesseract on Windows (Very Easy)
OCR (Optical Character Recognition) is a technology that converts images containing text—such as scanned documents, photographs of signs, or printed pages—into machine-readable text data. This allows for the digitization of printed or handwritten information, making it possible to electronically search, edit, or process the captured text. Common applications of OCR include document digitization, automated data entry, and extracting information from invoices or receipts.
OpenCV (Open Source Computer Vision Library) is a powerful, open-source set of tools and algorithms focused on real-time computer vision and image processing. Its primary purpose is to manipulate images and videos through operations like filtering, resizing, object detection, and feature extraction. In the context of OCR, OpenCV is often used for preprocessing images—such as noise removal, contrast enhancement, and perspective correction—to improve the accuracy of text recognition engines like Tesseract
Key Purposes
OCR: Turns images containing text into editable and searchable text by recognizing characters and words in various languages and fonts. This is crucial for automating data entry, making physical documents machine-accessible, and supporting technologies used by the visually impaired.
OpenCV: Provides a toolkit for handling and transforming images or video frames. It is widely used in applications like object detection, feature matching, face recognition, and preprocessing steps for OCR. OpenCV can improve OCR results by preparing images (e.g., cleaning up noise or correcting orientation) before text extraction.
How to verify image content
requirements.txt
flask
flask-cors
opencv-python
pytesseract
Pillow
pip install -r requirements.txt
config.py
import os
class Config:
SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:root@localhost:5433/flask_database'
SQLALCHEMY_TRACK_MODIFICATIONS = False
SECRET_KEY = "dev-key"
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
UPLOAD_FOLDER = os.path.join(BASE_DIR, "uploads")
MAX_CONTENT_LENGTH = 10 * 1024 * 1024 # 10 MB
# ✅ Add this line for Windows OCR
TESSERACT_CMD = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
3) app.py
from flask import Flask
from flask_cors import CORS
from config import Config
from views import register_blueprints
import os
def create_app():
app = Flask(__name__)
app.config.from_object(Config)
# create uploads dir if missing
os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
CORS(app, resources={r"/*": {"origins": "*"}})
register_blueprints(app)
@app.get("/")
def home():
return "Flask OK. Open /image/hello"
return app
if __name__ == "__main__":
app = create_app()
app.run(debug=True)
4) views/init.py
from .image_views import image_bp
def register_blueprints(app):
app.register_blueprint(image_bp, url_prefix="/image")
5) views/image_views.py
import os
import cv2
import pytesseract
from flask import Blueprint, jsonify, request, render_template, current_app
from werkzeug.utils import secure_filename
image_bp = Blueprint("image_bp", __name__)
ALLOWED_EXT = {"png", "jpg", "jpeg", "webp", "bmp", "tif", "tiff"}
def allowed(filename: str) -> bool:
return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXT
def preprocess_for_ocr(path: str):
"""Basic CV pipeline: grayscale -> denoise -> threshold -> (optional) morphology."""
img = cv2.imread(path)
if img is None:
return None
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# slight denoise
gray = cv2.bilateralFilter(gray, 9, 75, 75)
# adaptive threshold helps with variable lighting
th = cv2.adaptiveThreshold(gray, 255,
cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 31, 9)
return th
def run_ocr(image_mat):
# you can tune config; good defaults for ID/card-like docs:
config = "--oem 3 --psm 6"
text = pytesseract.image_to_string(image_mat, config=config)
# normalize whitespace
return "\n".join([line.strip() for line in text.splitlines() if line.strip()])
def verify_text_rules(txt: str, doc_type: str):
"""Very simple rule checks per document type. Expand as needed."""
t = txt.lower()
result = {"doc_type": doc_type, "checks": []}
def add(name, ok):
result["checks"].append({"name": name, "ok": bool(ok)})
if doc_type == "insurance":
add("mentions 'insurance'", "insurance" in t)
add("has policy/policy no", ("policy" in t) or ("policy no" in t) or ("policy number" in t))
add("has date", any(k in t for k in ["date", "valid", "expiry", "exp", "validity"]))
elif doc_type == "pollution":
add("mentions 'pollution' or 'PUC'", ("pollution" in t) or ("puc" in t))
add("has 'certificate'", "certificate" in t)
add("has validity/date", any(k in t for k in ["valid", "expiry", "exp", "date"]))
elif doc_type == "rc":
add("mentions 'registration' or 'RC'", ("registration" in t) or ("rc" in t))
add("has 'vehicle'/'owner'", ("vehicle" in t) or ("owner" in t))
add("has number/id", any(k in t for k in ["no", "number", "regn", "id"]))
else:
# generic
add("text present", len(t) > 10)
result["ok"] = all(c["ok"] for c in result["checks"]) and len(result["checks"]) > 0
return result
# ---------- Routes ----------
@image_bp.get("/hello")
def hello_flask():
return render_template("index.html", message="Upload a document to verify")
@image_bp.post("/verify")
def verify_image():
# Accept both HTML form and raw API
doc_type = request.form.get("doc_type") or request.args.get("doc_type") or "insurance"
f = None
# 1) multipart form upload
if "image" in request.files:
f = request.files["image"]
# 2) raw JSON base64 (optional future)
# elif request.is_json:
# ...
if not f or f.filename == "":
err = "No file uploaded. Field name must be 'image'."
if request.accept_mimetypes.accept_html:
return render_template("index.html", message=err, extracted_text="", result=None), 400
return jsonify({"ok": False, "error": err}), 400
if not allowed(f.filename):
err = "Unsupported file type."
if request.accept_mimetypes.accept_html:
return render_template("index.html", message=err, extracted_text="", result=None), 400
return jsonify({"ok": False, "error": err}), 400
filename = secure_filename(f.filename)
dest = os.path.join(current_app.config["UPLOAD_FOLDER"], filename)
f.save(dest)
# Optional: on Windows set Tesseract binary if configured
tcmd = getattr(current_app.config, "TESSERACT_CMD", None) or current_app.config.get("TESSERACT_CMD")
if tcmd:
pytesseract.pytesseract.tesseract_cmd = tcmd
mat = preprocess_for_ocr(dest)
if mat is None:
err = "Failed to read image."
if request.accept_mimetypes.accept_html:
return render_template("index.html", message=err, extracted_text="", result=None), 400
return jsonify({"ok": False, "error": err}), 400
text = run_ocr(mat)
result = verify_text_rules(text, doc_type)
# If the request is HTML form, render page with results
if request.accept_mimetypes.accept_html:
return render_template("index.html",
message="Verification complete",
extracted_text=text,
result=result)
# Else, return JSON for API clients (e.g., Laravel)
return jsonify({
"ok": True,
"filename": filename,
"doc_type": doc_type,
"extracted_text": text,
"result": result
})
templates/index.html
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<title>Document Verification</title>
<style>
body { font-family: system-ui, -apple-system, Segoe UI, Roboto, Arial, sans-serif; margin: 40px; }
.card { max-width: 720px; padding: 20px; border: 1px solid #ddd; border-radius: 12px; }
h1 { margin-top: 0; }
label { display:block; margin: 12px 0 6px; font-weight: 600; }
.row { display:flex; gap:12px; align-items:center; }
.btn { padding: 10px 16px; border: 0; border-radius: 8px; cursor: pointer; }
.btn-primary { background: #0ea5e9; color: #fff; }
pre { white-space: pre-wrap; background: #fafafa; padding: 12px; border-radius: 8px; border: 1px solid #eee; }
.checks { margin-top: 8px; }
.ok { color: #16a34a; }
.bad { color: #dc2626; }
</style>
</head>
<body>
<div class="card">
<h1>Verify Document</h1>
{% if message %}<p><strong>{{ message }}</strong></p>{% endif %}
<form action="/image/verify" method="post" enctype="multipart/form-data" accept="text/html">
<label for="doc_type">Document Type</label>
<select id="doc_type" name="doc_type">
<option value="insurance">Insurance</option>
<option value="pollution">Pollution Certificate</option>
<option value="rc">RC Document</option>
</select>
<label for="image">Upload Image</label>
<input id="image" type="file" name="image" required />
<div class="row" style="margin-top: 16px;">
<button class="btn btn-primary" type="submit">Verify</button>
</div>
</form>
{% if extracted_text is defined %}
<h3>Extracted Text</h3>
<pre>{{ extracted_text }}</pre>
{% endif %}
{% if result is defined %}
<h3>Checks ({{ result.doc_type | capitalize }})</h3>
<div class="checks">
{% for c in result.checks %}
<div>
<strong>{{ c.name }}:</strong>
<span class="{{ 'ok' if c.ok else 'bad' }}">{{ 'OK' if c.ok else 'Missing' }}</span>
</div>
{% endfor %}
<p><strong>Overall:</strong> <span class="{{ 'ok' if result.ok else 'bad' }}">{{ 'PASS' if result.ok else 'FAIL' }}</span></p>
</div>
{% endif %}
</div>
</body>
</html>
Run
# in your venv (myenv)
pip install -r requirements.txt
# Windows: install Tesseract (see below), then:
python app.py
# Visit: http://127.0.0.1:5000/image/hello
Installing Tesseract on Windows (Very Easy)
Download installer:
https://github.com/UB-Mannheim/tesseract/wiki
Install (Next → Next → Finish)
Note down install path:
Step 1: Open Environment Variables
Press Windows Key
Search “Environment Variables”
Open Edit the system environment variables
Click Environment Variables
Step 2: Edit the PATH
Under System variables, select Path
Click Edit
Click New
Paste this:
C:\Program Files\Tesseract-OCR\
Step 3: Save & Close (OK → OK → OK)
Step 4: Confirm success
Open PowerShell and run:
tesseract --version
Top comments (0)