Nafie
Regulation-focused RAG assistant
FastAPI · Flutter · Turkish retrieval
Answers grounded in official Markdown corpora using hybrid search, reranking, and optional GraphRAG. NAFIE and other generation backends (local Gemma GGUF, Gemini API) are configurable. Transcript-aware personal mode and course forums add student context — this system does not replace official student affairs.
Hybrid + rerank
BGE-M3 · ChromaDB
Turkish-first
Article-aware chunking
NAFIE (research)
Turkish LM from scratch
System Highlights
Overview of components in the production backend and Flutter client: Turkish-first retrieval, configurable LLMs, and student context (transcripts + forums).
Multiple generation backends
Default deployment uses local Gemma 4 GGUF (llama-cpp-python); optional Google Gemini API and research model NAFIE (Hugging Face). NAFIE is a 473M-parameter decoder-only Transformer trained from scratch for Turkish.
- Local GGUF · low latency
- Gemini / NAFIE optional
- Guardrails and session management
Turkish retrieval
Turkish BGE-M3 (or alternatives per config) for morphology-friendly vectors; hybrid ranking (dense + sparse) and optional BGE reranker.
- Hybrid search + RRF
- Persistent ChromaDB index
- Main corpus vs. forum collections
Article-aware chunking
Markdown regulation texts are split at article and sentence boundaries; low-quality chunks can be filtered from context.
- Smart article groups
- Overlapping windows
- Optional GraphRAG neighbors
Deterministic checks
For numeric thresholds and eligibility questions, rules and source text take priority; model output alone is not official verification.
- Prompt rules by question type
- Context selection via score thresholds
- Transcript summary in personal mode
Multi-turn conversation
Follow-up questions can extend the retrieval query with prior user messages; course-level RAG via forum context and `/forum=` commands.
- General / personal mode
- Forum wiki + messages + documents
- JWT-secured API
NAFIE — Built for Turkish from Scratch
The chatbot backend selects a generation engine at deploy time (e.g. local Gemma 4 GGUF, Gemini API, or a NAFIE checkpoint). NAFIE is a 473M-parameter decoder-only Transformer trained from scratch for Turkish on 4 × H100 GPUs via TÜBİTAK ULAKBIM TRUBA.
Architecture
NAFIE adopts a decoder-only Transformer with four modern improvements over the baseline:
Multi-Query Attention (MQA)
Single key/value head shared across all query heads — significantly lower memory footprint and faster inference with minimal quality loss.
Rotary Position Embedding (RoPE)
Injects absolute and relative position into query/key vectors via rotation matrices — better length generalization than sinusoidal encoding.
SwiGLU Feed-Forward
Replaces ReLU with a gated activation (SiLU ⊙ linear gate) for richer non-linear representations within each Transformer block.
Flash Attention
IO-aware attention computation that matches GPU memory hierarchy — reducing both memory usage and wall-clock training time.
Configuration
Custom Turkish BPE Tokenizer
Turkish's agglutinative morphology makes standard multilingual tokenizers inefficient — a single word can carry many suffixes, fragmenting the token budget. NAFIE uses a purpose-built BPE tokenizer trained on Turkish corpora:
Vocabulary size
65,500 tokens
Training data
TDK dict · COSMOS · literary works
Special tokens
<s> </s> <pad> <unk>
Normalization
NFKC Unicode
Progressive Layer Growth
Rather than training a 36-layer model from scratch, NAFIE grew progressively — starting at 24 layers and expanding in stages. This improves convergence stability and lets each new layer inherit strong representations from prior stages. Total pre-training: ~200B tokens.
Base training — 24 layers, 3 epochs
~22–24 hFirst expansion — 24→32 layers (new layers only)
~17 hFull fine-tuning — all 32 layers, 1 epoch
~17 hFinal expansion — 32→36 layers (new layers only)
~17 hFinal full training — all 36 layers
~36 hInfrastructure: 4 × NVIDIA H100 GPUs · TÜBİTAK ULAKBIM TRUBA · PyTorch DDP + NCCL · bfloat16
Supervised Fine-Tuning (SFT)
After pre-training, NAFIE was fine-tuned on ~101,000 Turkish instruction samples to gain instruction-following and dialogue capabilities — significantly improving next-token prediction accuracy.
High-quality chain-of-thought data (synthetic and distilled) including ~2,700 multi-turn dialogues
Turkish Open Orca — general instruction-following
Basic arithmetic in Turkish
Validation Accuracy
Perplexity (val)
RAG Pipeline
Documents are indexed offline; queries flow through hybrid retrieval, optional GraphRAG, and the selected LLM. The Flutter client sends sessions, transcripts, and forum context to FastAPI.
Indexing pipeline
Markdown corpus
document_md
Article chunker
Smart splitting
Embedding
BGE-M3 etc.
Hybrid search
Dense + sparse + RRF
LLM generation
GGUF / Gemini / NAFIE
Response
Sources + metrics
ChromaDB
Persistent vector cache; corpus fingerprint controls re-embedding
Runtime
User question
Natural language or forum command
Mode & guardrails
General / personal · safety filters
Conversation memory
Follow-up query expansion
Retrieval
Hybrid · min_score · optional reranker and GraphRAG
LLM generation
Selected backend · low temperature · grounded prompt
Response + sources
Cited passages, scores, and timing metrics
Why hybrid search?
Vector similarity alone can fall short on Turkish legal text; combining keywords with RRF yields more stable results.
Why local GGUF?
Run generation offline for development and demos; production can switch to Gemini or NAFIE.
Why low temperature?
Regulation Q&A favors consistency; source context and score thresholds help reduce hallucination risk.
Measurable Outcomes
Summary of reported or target metrics for the model and system components. Live latency and success rates depend on the chosen LLM and hardware.
3.8
NAFIE perplexity (SFT)
Validation perplexity after SFT on ~101k Turkish instruction examples — a strong signal for Turkish generation quality.
69.41%
NAFIE validation accuracy
Next-token accuracy rose from 52.9% to 69.41% after SFT — a clear gain in instruction following.
473M
Research-scale LM
A compact decoder-only model trained specifically for Turkish — suitable for experimentation alongside production GGUF / API backends.
200B
Pre-training tokens
Five-stage layer growth over ~200 billion Turkish tokens — web, Wikipedia, COSMOS, TDK, and instruction pairs.
Hybrid
Retrieval stack
Production chatbot uses dense + sparse hybrid search, persistent ChromaDB cache, and optional cross-encoder reranking.
+20%
Turkish retrieval (reference)
Turkish-focused embeddings are designed to outperform generic multilingual models on academic Turkish text (design target; depends on your evaluation setup).
Stack
Open-source, production-friendly components: Turkish retrieval, local inference, and a cross-platform mobile client.
FastAPI
Backend API
Chat, auth, forums, OCR/STT, and health endpoints.
Flutter
Client
Android and Windows desktop; Aurora design system.
llama-cpp-python
Local GGUF
Default generation: Gemma 4 GGUF; configurable CPU/GPU layers.
sentence-transformers
Embedding
Turkish BGE-M3 and alternative embedders per config.
ChromaDB
Vector store
Persistent cache for main corpus and forum collections.
PyTorch
Rerank / HF
Cross-encoder reranker and optional NAFIE / HF loading.
Google Gemini
Optional API
Cloud generation and GraphRAG entity extraction.
EasyOCR · Whisper
Multimodal input
Image-to-text and audio transcription endpoints.
Models
Gemma 4 GGUF
Production (default)Default local generation · UD-Q4_K_XL · llama.cpp
NAFIE
Research LLM473M · 36 layers · HF / Torch · optional
Gemini API
OptionalCloud generation · low latency option
Turkish BGE-M3
Embedding1024-dim · hybrid retrieval
Application UI
Sample screens from Nafie’s Flutter client: source summaries, regulation Q&A, course forums, and transcript context in one flow — targeting a consistent experience on Android and Windows.
Source citations
Answers can be linked to relevant regulation passages.
Forums & transcript
Course forums and transcript summaries enrich context for signed-in users.
Aurora theme
Light and dark palettes aligned with the Flutter app’s Aurora design system.

Five screenshots · navigate with arrows or dots · Flutter (Android / Windows)
Product Demo
Watch a short walkthrough of the app in action.
Project Team
Graduation project, Department of Computer Engineering, Hacettepe University.

Ufuk Ağaya
Developer

Halis Ebrar Cengiz
Developer

Bedirhan Gençaslan
Developer

Prof. Mehmet Önder Efe
Project advisor
Department of Computer Engineering, Hacettepe University

Hacettepe University
Department of Computer Engineering
Graduation project · 2025–2026