Production API · AI-Powered Document Engine

FICA Document Processing API

AI classification · Data extraction · Onboarding automation

A production-grade backend API built with Node.js and PHP that uses OpenAI to classify and extract structured data from South African FICA documents — ID documents, bank statements, and payslips — in PDF or image form. The API accepts multiple files per request, identifies the document type, extracts every relevant field, and returns clean structured JSON ready to plug into onboarding, CRM, or credit workflows.

Node.js · PHP · OpenAI · REST API · PDF Processing · OCR · Multi-File Upload · JSON

Public Demo — Sensitive Data Masked

The API itself returns the full extracted data set. The demo frontend shown in these screenshots intentionally masks sensitive fields (ID numbers, account numbers, salary figures, addresses, etc.) on the client side because this portfolio is publicly accessible. In production, the API response is consumed directly by internal onboarding systems.
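The client-side masking described above could look roughly like the sketch below. The key names (`idNumber`, `accountNumber`, and so on) and the choice to keep the last four characters visible are assumptions made for illustration; the page does not state the demo's actual field names or masking rule.

```javascript
// Hypothetical list of keys the demo frontend treats as sensitive.
const SENSITIVE_KEYS = new Set([
  "idNumber",
  "accountNumber",
  "grossSalary",
  "netSalary",
  "address",
]);

// Replace all but the last `visible` characters with asterisks.
// Values no longer than `visible` are masked completely.
function maskValue(value, visible = 4) {
  const text = String(value);
  if (text.length <= visible) return "*".repeat(text.length);
  return "*".repeat(text.length - visible) + text.slice(-visible);
}

// Return a copy of one document's fields with sensitive values masked.
// The original object from the API response is left untouched.
function maskDocument(fields) {
  const masked = {};
  for (const [key, value] of Object.entries(fields)) {
    masked[key] = SENSITIVE_KEYS.has(key) ? maskValue(value) : value;
  }
  return masked;
}
```

Because the function returns a copy, the unmasked API response stays available to any internal consumer while only the rendered view is redacted.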

AI-Driven
OpenAI Classification
3 Doc Types
ID / Bank / Payslip
Batch Upload
Multi-File Processing
PDF + Image
Mixed Format Support
Structured JSON
Ready For Integration

Problem, Solution & Outcome

Built to remove the bottleneck of manual FICA document handling during customer onboarding

The Problem

Onboarding teams were manually opening every FICA document, identifying whether it was an ID, payslip, or bank statement, then typing the same fields into the CRM by hand. This was slow, error-prone, hard to audit, and impossible to scale during high-volume application periods.

The Solution

A backend API built in Node.js + PHP that accepts a batch of files (PDF or image), automatically classifies each document using OpenAI, runs tailored extraction prompts per document type, and returns a clean, structured JSON response containing every field it detects. Encrypted PDFs are supported through per-file passwords.

The Outcome

Manual capture time per customer dropped dramatically, data quality improved through consistent structured output, and onboarding teams could focus on verification instead of typing. The API now plugs directly into downstream CRM and credit workflows for fully automated FICA intake.

API Pipeline

From raw upload to structured JSON in one round trip

01

Receive & Validate

The API accepts a batch of files (PDF, JPG, PNG, GIF, BMP, WebP, or TIFF) with optional per-file passwords for encrypted PDFs.
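As a rough illustration of this step, the sketch below checks each file in a batch against the formats and the 10 MB cap listed under Key Features. The file shape (`name`, `sizeBytes`), the function name, and the extra `jpeg` and `tif` spellings are assumptions, not the production implementation.

```javascript
// Extensions taken from the formats listed on this page, plus the common
// alternative spellings "jpeg" and "tif" (an assumption).
const ALLOWED_EXTENSIONS = new Set([
  "pdf", "jpg", "jpeg", "png", "gif", "bmp", "webp", "tif", "tiff",
]);
const MAX_BYTES = 10 * 1024 * 1024; // 10 MB per file

// Each file is assumed to look like { name: string, sizeBytes: number }.
// Returns the files that pass, plus a reason for each one that does not.
function validateBatch(files) {
  const accepted = [];
  const rejected = [];
  for (const file of files) {
    const ext = file.name.includes(".")
      ? file.name.split(".").pop().toLowerCase()
      : "";
    if (!ALLOWED_EXTENSIONS.has(ext)) {
      rejected.push({ name: file.name, reason: "unsupported file type" });
    } else if (file.sizeBytes > MAX_BYTES) {
      rejected.push({ name: file.name, reason: "file exceeds 10 MB" });
    } else {
      accepted.push(file);
    }
  }
  return { accepted, rejected };
}
```

Reporting rejections per file, rather than failing the whole request, lets the rest of a mixed batch continue through the pipeline.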

02

Classify Document

OpenAI is used to determine whether each document is an ID, bank statement, or payslip — no manual tagging required.
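One way to keep the classification step predictable is to constrain the model to a fixed set of labels and reject anything else. The sketch below assumes hypothetical label names and prompt wording, and takes the model call as an injected `askModel` function so that it does not depend on any particular OpenAI client method.

```javascript
// Hypothetical label names for the three supported document types.
const DOC_TYPES = ["id_document", "bank_statement", "payslip"];

// Build the instruction text sent to the model (wording is illustrative).
function buildClassificationPrompt() {
  return (
    "Classify the attached South African FICA document. " +
    `Answer with exactly one of: ${DOC_TYPES.join(", ")}.`
  );
}

// Map the model's free-text answer onto a known label, or "unknown"
// when the answer is not one of the expected labels.
function parseClassification(answer) {
  const cleaned = String(answer).trim().toLowerCase().replace(/[^a-z_]/g, "");
  return DOC_TYPES.includes(cleaned) ? cleaned : "unknown";
}

// askModel is an injected async function: (prompt, file) => string.
async function classifyDocument(file, askModel) {
  const answer = await askModel(buildClassificationPrompt(), file);
  return parseClassification(answer);
}
```

Returning "unknown" instead of the closest match keeps unexpected documents out of the type-specific extraction step.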

03

Extract Fields

Document-specific prompts pull every relevant field (names, ID numbers, balances, salary breakdown, etc.) into structured data.
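A document-specific prompt can be assembled from a per-type field list. The sketch below reuses the field names shown under Supported Document Types; the type keys, the prompt wording, and the function name are assumptions rather than the prompts used in production.

```javascript
// Field lists copied from the "Supported Document Types" section.
// The map keys are hypothetical label names.
const FIELDS_BY_TYPE = new Map([
  ["id_document", ["Full Names", "ID Number", "Date of Birth", "Gender", "Nationality", "ID Type"]],
  ["bank_statement", ["Account Holder", "Bank", "Account No.", "Branch Code", "Address", "Period", "Balances"]],
  ["payslip", ["Employee", "Employer", "Job Title", "Department", "Gross / Net", "Tax / UIF", "Account No."]],
]);

// Build the extraction instructions for one classified document.
function buildExtractionPrompt(docType) {
  const fields = FIELDS_BY_TYPE.get(docType);
  if (!fields) {
    throw new Error(`No extraction prompt for type: ${docType}`);
  }
  const label = docType.replace(/_/g, " ");
  return [
    `Extract the following fields from this ${label}:`,
    ...fields.map((field) => `- ${field}`),
    'Reply with a single JSON object. Use "N/A" for any field that is not present.',
  ].join("\n");
}
```

Keeping the field lists in data rather than in prompt text means a new document type only needs a new map entry.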

04

Return JSON

API responds with one structured JSON object per document, grouped by classification, ready to ingest into the CRM.
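Grouping the per-document results by classification might be done as in the sketch below. The result shape (`classification`, `fileName`, `fields`) is an assumption for illustration; the page does not publish the actual response schema.

```javascript
// Each result is assumed to look like:
//   { classification: "payslip", fileName: "may.pdf", fields: { ... } }
// Returns an object keyed by classification, each holding a list of documents.
function groupByClassification(results) {
  const grouped = {};
  for (const { classification, fileName, fields } of results) {
    if (!grouped[classification]) {
      grouped[classification] = [];
    }
    grouped[classification].push({ fileName, fields });
  }
  return grouped;
}
```

A CRM integration can then read, for example, every payslip in the batch from a single key without filtering the whole response.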

Supported Document Types

Tailored extraction logic per document type for maximum accuracy

South African ID

Smart Card or Green Book — supports both formats and full identity extraction.

Full Names · ID Number · Date of Birth · Gender · Nationality · ID Type
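An extracted ID number can be cross-checked against the other ID fields, because a South African ID number is 13 digits: a YYMMDD birth date, a four-digit sequence where 5000 and above indicates male, a citizenship digit, a legacy digit, and a Luhn check digit. The sketch below shows such a check; whether the production API performs it is not stated on this page, and the function and property names are hypothetical.

```javascript
// Standard Luhn checksum over a string of digits.
function luhnValid(digits) {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Decode the parts of a 13-digit South African ID number.
// The century of birth is not encoded, so the date is returned as YYMMDD.
function parseSaIdNumber(idNumber) {
  if (!/^\d{13}$/.test(idNumber)) {
    return { valid: false };
  }
  const month = Number(idNumber.slice(2, 4));
  const day = Number(idNumber.slice(4, 6));
  const dateLooksValid = month >= 1 && month <= 12 && day >= 1 && day <= 31;
  return {
    valid: dateLooksValid && luhnValid(idNumber),
    birthDateYYMMDD: idNumber.slice(0, 6),
    gender: Number(idNumber.slice(6, 10)) >= 5000 ? "Male" : "Female",
    citizenship: idNumber[10] === "0" ? "SA citizen" : "Permanent resident",
  };
}
```

If the decoded birth date or gender disagrees with the separately extracted fields, the document can be flagged for manual verification.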

Bank Statement

Detects all major SA banks and pulls account, balance, and statement metadata.

Account Holder · Bank · Account No. · Branch Code · Address · Period · Balances

Payslip

Full salary breakdown including deductions, employer details, and banking info.

Employee · Employer · Job Title · Department · Gross / Net · Tax / UIF · Account No.

Key Features

Built for real onboarding workflows — not just a tech demo

AI Classification

Uses OpenAI to automatically identify document type — no manual tagging.

Field-Level Extraction

Pulls every relevant data point per document type into a structured object.

Multi-File Batches

Process several documents in one API call with mixed types and formats.

PDF + Image Support

Handles PDF, JPG, PNG, GIF, BMP, WebP, and TIFF inputs up to 10 MB each.

Encrypted PDF Handling

Per-file password support for password-protected bank statements and payslips.
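Per-file passwords need to be matched to the right upload before any PDF is opened. The sketch below pairs them by file name; the request shape (a `passwordsByName` object sent alongside the files) is an assumption, since the page does not describe how passwords are submitted.

```javascript
// files: [{ name: "statement.pdf", ... }]
// passwordsByName: { "statement.pdf": "secret" }  (hypothetical request field)
// Returns new file objects, each with a `password` of either a string or null.
function attachPasswords(files, passwordsByName = {}) {
  return files.map((file) => {
    const hasPassword = Object.prototype.hasOwnProperty.call(
      passwordsByName,
      file.name
    );
    return { ...file, password: hasPassword ? passwordsByName[file.name] : null };
  });
}
```

Files without an entry get `null`, so unencrypted documents in the same batch pass through unchanged.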

Structured JSON Output

Clean, predictable JSON shape per document type — easy to consume downstream.

Public Demo Masking

Demo frontend masks sensitive fields client-side; raw API returns full data.

Plug-In Integration

Designed to slot directly into onboarding, CRM, or credit workflows.

Confidence-Aware

Returns N/A for missing fields rather than guessing — safer for compliance.
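The "N/A rather than a guess" rule can be enforced after extraction by normalising every expected field. The sketch below is one way to do that; the function name and the camelCase keys in the usage example are hypothetical.

```javascript
// Ensure every expected key is present in the output.
// Absent, null, or blank values become the literal string "N/A".
function fillMissing(extracted, expectedKeys) {
  const result = {};
  for (const key of expectedKeys) {
    const value = extracted[key];
    const missing =
      value === undefined || value === null || String(value).trim() === "";
    result[key] = missing ? "N/A" : value;
  }
  return result;
}

// Usage: the model returned no job title and a blank department.
const normalised = fillMissing(
  { employee: "Jane Doe", department: "  " },
  ["employee", "jobTitle", "department"]
);
console.log(normalised);
```

Downstream systems then see the same keys for every document of a given type and can treat "N/A" as an explicit signal to request the missing information.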

Fast Turnaround

Processes multi-document batches in seconds rather than minutes of manual work.

SA-Specific Logic

Tailored for South African ID formats, local banks, and local payslip layouts, including tax and UIF deductions.

Production-Ready

Currently in active use — not a prototype or proof-of-concept demo.

Technology Stack

Built end-to-end with a focus on reliability, accuracy, and integration

Backend

Node.js · PHP · REST API

AI / NLP

OpenAI · Classification · Extraction

Document Handling

PDF Parsing · Encrypted PDFs · Image OCR

Integration

JSON · Multipart Upload · Auth

Need a similar AI-powered document API?

If you're processing high volumes of documents — onboarding, KYC, FICA, or anything that needs classification and structured extraction — let's talk.