Hacettepe University · Graduation Project 2026

Nafie

Regulation-focused RAG assistant
FastAPI · Flutter · Turkish retrieval

Answers grounded in official Markdown corpora using hybrid search, reranking, and optional GraphRAG. NAFIE and other generation backends (local Gemma GGUF, Gemini API) are configurable. Transcript-aware personal mode and course forums add student context — this system does not replace official student affairs.

App UI NAFIE model

RAG

Hybrid + rerank

BGE-M3 · ChromaDB

Turkish-first

Article-aware chunking

473M

NAFIE (research)

Turkish LM from scratch

Features

System Highlights

Overview of components in the production backend and Flutter client: Turkish-first retrieval, configurable LLMs, and student context (transcripts + forums).

Multiple generation backends

Default deployment uses local Gemma 4 GGUF (llama-cpp-python); optional Google Gemini API and research model NAFIE (Hugging Face). NAFIE is a 473M-parameter decoder-only Transformer trained from scratch for Turkish.

Local GGUF · low latency
Gemini / NAFIE optional
Guardrails and session management

Turkish retrieval

Turkish BGE-M3 (or alternatives per config) for morphology-friendly vectors; hybrid ranking (dense + sparse) and optional BGE reranker.

Hybrid search + RRF
Persistent ChromaDB index
Main corpus vs. forum collections

Article-aware chunking

Markdown regulation texts are split at article and sentence boundaries; low-quality chunks can be filtered from context.

Smart article groups
Overlapping windows
Optional GraphRAG neighbors

Deterministic checks

For numeric thresholds and eligibility questions, rules and source text take priority; model output alone is not official verification.

Prompt rules by question type
Context selection via score thresholds
Transcript summary in personal mode

Multi-turn conversation

Follow-up questions can extend the retrieval query with prior user messages; course-level RAG via forum context and `/forum=` commands.

General / personal mode
Forum wiki + messages + documents
JWT-secured API

Model

NAFIE — Built for Turkish from Scratch

The chatbot backend selects a generation engine at deploy time (e.g. local Gemma 4 GGUF, Gemini API, or a NAFIE checkpoint). NAFIE is a 473M-parameter decoder-only Transformer trained from scratch for Turkish on 4 × H100 GPUs via TÜBİTAK ULAKBIM TRUBA.

Architecture

NAFIE adopts a decoder-only Transformer with four modern improvements over the baseline:

Multi-Query Attention (MQA)

Single key/value head shared across all query heads — significantly lower memory footprint and faster inference with minimal quality loss.

Rotary Position Embedding (RoPE)

Injects absolute and relative position into query/key vectors via rotation matrices — better length generalization than sinusoidal encoding.

SwiGLU Feed-Forward

Replaces ReLU with a gated activation (SiLU ⊙ linear gate) for richer non-linear representations within each Transformer block.

Flash Attention

IO-aware attention computation that matches GPU memory hierarchy — reducing both memory usage and wall-clock training time.

Configuration

Total Parameters473M

Transformer Layers36

Embedding Dimension1,024

Attention Heads4 (MQA)

FFN Hidden Dim2,816

Context Length1,024 tokens

Vocabulary Size65,500

Attention TypeMulti-Query Attention

Positional EncodingRoPE (θ = 10,000)

ActivationSwiGLU

NormalizationRMSNorm (pre-norm)

Weight TyingEmbedding ↔ lm_head

Custom Turkish BPE Tokenizer

Turkish's agglutinative morphology makes standard multilingual tokenizers inefficient — a single word can carry many suffixes, fragmenting the token budget. NAFIE uses a purpose-built BPE tokenizer trained on Turkish corpora:

Vocabulary size

65,500 tokens

Training data

TDK dict · COSMOS · literary works

Special tokens

Normalization

NFKC Unicode

Progressive Layer Growth

Rather than training a 36-layer model from scratch, NAFIE grew progressively — starting at 24 layers and expanding in stages. This improves convergence stability and lets each new layer inherit strong representations from prior stages. Total pre-training: ~200B tokens.

Base training — 24 layers, 3 epochs

~22–24 h

First expansion — 24→32 layers (new layers only)

~17 h

Full fine-tuning — all 32 layers, 1 epoch

~17 h

Final expansion — 32→36 layers (new layers only)

~17 h

Final full training — all 36 layers

~36 h

Infrastructure: 4 × NVIDIA H100 GPUs · TÜBİTAK ULAKBIM TRUBA · PyTorch DDP + NCCL · bfloat16

Supervised Fine-Tuning (SFT)

After pre-training, NAFIE was fine-tuned on ~101,000 Turkish instruction samples to gain instruction-following and dialogue capabilities — significantly improving next-token prediction accuracy.

90,000 samples

High-quality chain-of-thought data (synthetic and distilled) including ~2,700 multi-turn dialogues

10,000 samples

Turkish Open Orca — general instruction-following

1,000 samples

Basic arithmetic in Turkish

Validation Accuracy

52.9%→69.41%

Perplexity (val)

—→3.8

Architecture

RAG Pipeline

Documents are indexed offline; queries flow through hybrid retrieval, optional GraphRAG, and the selected LLM. The Flutter client sends sessions, transcripts, and forum context to FastAPI.

Indexing pipeline

Markdown corpus

document_md

Article chunker

Smart splitting

Embedding

BGE-M3 etc.

Hybrid search

Dense + sparse + RRF

LLM generation

GGUF / Gemini / NAFIE

Response

Sources + metrics

ChromaDB

Persistent vector cache; corpus fingerprint controls re-embedding

Runtime

User question

Natural language or forum command

Mode & guardrails

General / personal · safety filters

Conversation memory

Follow-up query expansion

Retrieval

Hybrid · min_score · optional reranker and GraphRAG

LLM generation

Selected backend · low temperature · grounded prompt

Response + sources

Cited passages, scores, and timing metrics

Why hybrid search?

Vector similarity alone can fall short on Turkish legal text; combining keywords with RRF yields more stable results.

Why local GGUF?

Run generation offline for development and demos; production can switch to Gemini or NAFIE.

Why low temperature?

Regulation Q&A favors consistency; source context and score thresholds help reduce hallucination risk.

Results

Measurable Outcomes

Summary of reported or target metrics for the model and system components. Live latency and success rates depend on the chosen LLM and hardware.

3.8

NAFIE perplexity (SFT)

Validation perplexity after SFT on ~101k Turkish instruction examples — a strong signal for Turkish generation quality.

Post-SFT

69.41%

NAFIE validation accuracy

Next-token accuracy rose from 52.9% to 69.41% after SFT — a clear gain in instruction following.

Up from 52.9%

473M

Research-scale LM

A compact decoder-only model trained specifically for Turkish — suitable for experimentation alongside production GGUF / API backends.

36 layers · MQA

200B

Pre-training tokens

Five-stage layer growth over ~200 billion Turkish tokens — web, Wikipedia, COSMOS, TDK, and instruction pairs.

4 × H100 · TRUBA

Hybrid

Retrieval stack

Production chatbot uses dense + sparse hybrid search, persistent ChromaDB cache, and optional cross-encoder reranking.

BGE-M3 · Chroma

+20%

Turkish retrieval (reference)

Turkish-focused embeddings are designed to outperform generic multilingual models on academic Turkish text (design target; depends on your evaluation setup).

vs. multilingual baseline

Technology

Stack

Open-source, production-friendly components: Turkish retrieval, local inference, and a cross-platform mobile client.

⚡

FastAPI

Backend API

Chat, auth, forums, OCR/STT, and health endpoints.

📱

Flutter

Client

Android and Windows desktop; Aurora design system.

🦙

llama-cpp-python

Local GGUF

Default generation: Gemma 4 GGUF; configurable CPU/GPU layers.

🔢

sentence-transformers

Embedding

Turkish BGE-M3 and alternative embedders per config.

🗄️

ChromaDB

Vector store

Persistent cache for main corpus and forum collections.

🔥

PyTorch

Rerank / HF

Cross-encoder reranker and optional NAFIE / HF loading.

✨

Google Gemini

Optional API

Cloud generation and GraphRAG entity extraction.

🎙️

EasyOCR · Whisper

Multimodal input

Image-to-text and audio transcription endpoints.

Models

Gemma 4 GGUF

Production (default)

Default local generation · UD-Q4_K_XL · llama.cpp

NAFIE

Research LLM

473M · 36 layers · HF / Torch · optional

Gemini API

Optional

Cloud generation · low latency option

Turkish BGE-M3

Embedding

1024-dim · hybrid retrieval

Screenshots

Application UI

Sample screens from Nafie’s Flutter client: source summaries, regulation Q&A, course forums, and transcript context in one flow — targeting a consistent experience on Android and Windows.

Source citations

Answers can be linked to relevant regulation passages.

◇

Forums & transcript

Course forums and transcript summaries enrich context for signed-in users.

Aurora theme

Light and dark palettes aligned with the Flutter app’s Aurora design system.

Nafie · Flutter