Hacettepe University · Graduation Project 2026

Nafie

Regulation-focused RAG assistant

FastAPI · Flutter · Turkish retrieval

Answers are grounded in official Markdown corpora using hybrid search, reranking, and optional GraphRAG. The generation backend is configurable: local Gemma GGUF, the Gemini API, or NAFIE. A transcript-aware personal mode and course forums add student context. The system does not replace official student affairs.

RAG

Hybrid + rerank

BGE-M3 · ChromaDB

TR

Turkish-first

Article-aware chunking

473M

NAFIE (research)

Turkish LM from scratch

System Highlights

Overview of components in the production backend and Flutter client: Turkish-first retrieval, configurable LLMs, and student context (transcripts + forums).

Multiple generation backends

The default deployment runs a local Gemma 4 GGUF model through llama-cpp-python. The Google Gemini API and the research model NAFIE (loaded via Hugging Face) are optional alternatives. NAFIE is a 473M-parameter decoder-only Transformer trained from scratch for Turkish.

  • Local GGUF · low latency
  • Gemini / NAFIE optional
  • Guardrails and session management

Turkish retrieval

Turkish BGE-M3 (or alternatives per config) for morphology-friendly vectors; hybrid ranking (dense + sparse) and optional BGE reranker.

  • Hybrid search + RRF
  • Persistent ChromaDB index
  • Main corpus vs. forum collections
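The RRF step named above can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not the project's code: the function name `rrf_fuse`, the document ids, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is a common default, assumed here rather than taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a dense and a sparse retriever.
dense = ["art12", "art7", "art3"]
sparse = ["art7", "art40", "art12"]
fused = rrf_fuse([dense, sparse])
print(fused)
```

Because RRF uses only ranks, it does not need the dense and sparse scores to be on a comparable scale.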

Article-aware chunking

Markdown regulation texts are split at article and sentence boundaries; low-quality chunks can be filtered from context.

  • Smart article groups
  • Overlapping windows
  • Optional GraphRAG neighbors
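A rough sketch of article-aware splitting with overlapping sentence windows follows. It assumes articles begin with a line starting with `MADDE <number>`, as is common in Turkish regulations; the regex, function names, and window sizes are illustrative assumptions, not the project's chunker.

```python
import re

# Assumption: each article starts on a line beginning with "MADDE <n>",
# optionally preceded by Markdown heading marks.
ARTICLE_RE = re.compile(r"^(?:#+\s*)?MADDE\s+(\d+)", re.IGNORECASE | re.MULTILINE)

def split_articles(markdown):
    """Return (article_number, text) pairs; text before the first article is dropped."""
    matches = list(ARTICLE_RE.finditer(markdown))
    articles = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        articles.append((int(m.group(1)), markdown[m.start():end].strip()))
    return articles

def sentence_windows(text, size=3, overlap=1):
    """Group sentences into windows of `size` sentences sharing `overlap` sentences."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    windows = []
    for start in range(0, len(sentences), size - overlap):
        windows.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break
    return windows

doc = "# Yönetmelik\n\nMADDE 1 - Amaç budur. Kapsam şudur.\n\nMADDE 2 - A. B. C. D.\n"
articles = split_articles(doc)
windows = sentence_windows("A. B. C. D.", size=3, overlap=1)
```

Splitting at article boundaries first keeps a window from mixing sentences of two different articles.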

Deterministic checks

For numeric thresholds and eligibility questions, rules and source text take priority; model output alone is not official verification.

  • Prompt rules by question type
  • Context selection via score thresholds
  • Transcript summary in personal mode
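Context selection by score threshold might look like the sketch below. The `Hit` type, the default `min_score=0.35`, and the `max_chunks` cap are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    chunk_id: str
    text: str
    score: float  # retrieval or reranker score, higher is better

def select_context(hits, min_score=0.35, max_chunks=4):
    """Keep only hits at or above min_score, best first, capped at max_chunks.

    An empty result lets the caller decline to answer instead of passing
    weakly related text to the model. Threshold and cap are assumed values.
    """
    kept = [h for h in hits if h.score >= min_score]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:max_chunks]

hits = [Hit("a", "...", 0.82), Hit("b", "...", 0.20), Hit("c", "...", 0.51)]
context = select_context(hits)
```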

Multi-turn conversation

Follow-up questions can extend the retrieval query with prior user messages; course-level RAG via forum context and `/forum=` commands.

  • General / personal mode
  • Forum wiki + messages + documents
  • JWT-secured API
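The two ideas above, follow-up query expansion and `/forum=` commands, can be sketched as follows. Only the `/forum=` prefix comes from the description; the argument format (a course code followed by the question), the function names, and the `max_prior` limit are assumptions.

```python
import re

# Assumption: the command looks like "/forum=<course> <question>".
FORUM_RE = re.compile(r"^/forum=(\S+)\s*(.*)$", re.DOTALL)

def parse_forum_command(message):
    """Return (forum_id, question); forum_id is None for ordinary messages."""
    text = message.strip()
    m = FORUM_RE.match(text)
    if not m:
        return None, text
    return m.group(1), m.group(2).strip()

def expand_query(question, history, max_prior=2):
    """Prepend the most recent prior user messages to the retrieval query."""
    prior = [turn for turn in history[-max_prior:] if turn]
    return " ".join(prior + [question])

forum, question = parse_forum_command("/forum=BBM101 Vize ne zaman?")
query = expand_query("Peki yaz okulunda?", ["Kaç AKTS alabilirim?"])
```

Expanding only the retrieval query, not the prompt, keeps a short follow-up such as "Peki yaz okulunda?" searchable without changing what the user asked.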

NAFIE — Built for Turkish from Scratch

The chatbot backend selects a generation engine at deploy time (e.g. local Gemma 4 GGUF, Gemini API, or a NAFIE checkpoint). NAFIE is a 473M-parameter decoder-only Transformer trained from scratch for Turkish on 4 × H100 GPUs via TÜBİTAK ULAKBIM TRUBA.

Architecture

NAFIE is a decoder-only Transformer that departs from the original Transformer design in four ways:

Multi-Query Attention (MQA)

Single key/value head shared across all query heads — significantly lower memory footprint and faster inference with minimal quality loss.
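The memory saving can be made concrete with key/value cache arithmetic. The sketch below uses the layer count, head count, and context length from the configuration, and assumes a head dimension of 1,024 / 4 = 256 and 2-byte (bfloat16) cache entries.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the key/value cache for one sequence; the factor 2 covers K and V."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Assumed head_dim = 1024 / 4 = 256; other figures are from the configuration.
mha = kv_cache_bytes(n_layers=36, n_kv_heads=4, head_dim=256, seq_len=1024)
mqa = kv_cache_bytes(n_layers=36, n_kv_heads=1, head_dim=256, seq_len=1024)

print(mha // 2**20, "MiB with one K/V head per query head")  # 144 MiB
print(mqa // 2**20, "MiB with a single shared K/V head")      # 36 MiB
```

With four query heads, sharing one key/value head cuts the cache by a factor of four under these assumptions.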

Rotary Position Embedding (RoPE)

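Rotates query/key vectors by position-dependent angles, so each vector encodes its absolute position while attention scores depend only on relative offsets. This tends to generalize to longer sequences better than additive sinusoidal encodings.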
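A small pure-Python sketch shows the property that matters: after rotation, the query/key dot product depends only on the distance between positions. It pairs adjacent dimensions (implementations differ on pairing) and uses θ = 10,000; the vectors are arbitrary example values.

```python
import math

def rope(vec, pos, theta=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same offset (3) at very different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
```

The two scores agree up to floating-point error, which is what "relative position" means here.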

SwiGLU Feed-Forward

Replaces ReLU with a gated activation (SiLU ⊙ linear gate) for richer non-linear representations within each Transformer block.
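As an illustration, a SwiGLU feed-forward block computes `W_down · (SiLU(W_gate · x) ⊙ (W_up · x))`. The sketch below uses plain Python lists and tiny made-up weight matrices; the names `W_gate`, `W_up`, and `W_down` are conventional labels, not identifiers from NAFIE's code.

```python
import math

def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    """Gated feed-forward: the SiLU branch scales the linear branch elementwise."""
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

# Toy sizes: model dim 2, hidden dim 3 (NAFIE uses 1,024 and 2,816).
x = [1.0, 2.0]
W_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_up = [[1.0, 1.0], [1.0, -1.0], [0.5, 0.0]]
W_down = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]
y = swiglu_ffn(x, W_gate, W_up, W_down)
```

The gate adds a third weight matrix per block, which is why SwiGLU models typically use a hidden width below the classic 4× model dimension.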

Flash Attention

IO-aware exact attention that tiles the computation to fit fast on-chip GPU memory, reducing both memory usage and wall-clock training time.

Configuration

Total Parameters: 473M
Transformer Layers: 36
Embedding Dimension: 1,024
Attention Heads: 4 (MQA)
FFN Hidden Dim: 2,816
Context Length: 1,024 tokens
Vocabulary Size: 65,500
Attention Type: Multi-Query Attention
Positional Encoding: RoPE (θ = 10,000)
Activation: SwiGLU
Normalization: RMSNorm (pre-norm)
Weight Tying: Embedding ↔ lm_head
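The 473M total can be reproduced from the other rows. The count below is a back-of-the-envelope check under stated assumptions: no bias terms, head dimension = 1,024 / 4 = 256, one shared key/value head, two RMSNorm weight vectors per layer plus a final one, and the embedding counted once because it is tied to `lm_head`.

```python
def count_params(n_layers=36, d_model=1024, n_heads=4, d_ffn=2816, vocab=65_500):
    """Estimate parameters for an MQA + SwiGLU decoder with tied embeddings."""
    head_dim = d_model // n_heads          # assumed: 256
    attn = d_model * d_model               # query projection
    attn += 2 * d_model * head_dim         # single shared key and value heads
    attn += d_model * d_model              # output projection
    ffn = 3 * d_model * d_ffn              # gate, up, and down matrices
    norms = 2 * d_model                    # two RMSNorm weight vectors per layer
    per_layer = attn + ffn + norms
    embedding = vocab * d_model            # shared with lm_head (weight tying)
    final_norm = d_model
    return n_layers * per_layer + embedding + final_norm

total = count_params()
print(f"{total:,}")  # 472,945,664
```

Under these assumptions the layers contribute about 406M parameters and the tied embedding about 67M.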

Custom Turkish BPE Tokenizer

Turkish's agglutinative morphology makes standard multilingual tokenizers inefficient — a single word can carry many suffixes, fragmenting the token budget. NAFIE uses a purpose-built BPE tokenizer trained on Turkish corpora:

Vocabulary size

65,500 tokens

Training data

TDK dict · COSMOS · literary works

Special tokens

<s> </s> <pad> <unk>

Normalization

NFKC Unicode
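NFKC normalization folds compatibility characters and composes combining sequences before tokenization, so visually identical strings map to the same tokens. A short sketch with Python's standard `unicodedata` module (the sample strings are illustrative):

```python
import unicodedata

def normalize(text):
    """Apply NFKC: compatibility decomposition followed by canonical composition."""
    return unicodedata.normalize("NFKC", text)

# "I" + combining dot above composes to the single Turkish letter "İ" (U+0130).
composed = normalize("I\u0307stanbul")

# The "ﬁ" ligature (U+FB01) and full-width digits fold to plain characters.
ligature = normalize("\ufb01nal")
digits = normalize("\uff12\uff10\uff12\uff16")
```

Note that NFKC does not change case; Turkish dotted and dotless i stay distinct.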

Progressive Layer Growth

Rather than training all 36 layers from the start, NAFIE was grown progressively: training began at 24 layers and the stack was expanded in stages to 32 and then 36. The aim is more stable convergence, with each new layer building on representations learned in earlier stages. Total pre-training: ~200B tokens.

1. Base training: 24 layers, 3 epochs (~22–24 h)
2. First expansion: 24→32 layers, new layers only (~17 h)
3. Full fine-tuning: all 32 layers, 1 epoch (~17 h)
4. Final expansion: 32→36 layers, new layers only (~17 h)
5. Final full training: all 36 layers (~36 h)

Infrastructure: 4 × NVIDIA H100 GPUs · TÜBİTAK ULAKBIM TRUBA · PyTorch DDP + NCCL · bfloat16
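The five stages can be written down as a schedule of which layers receive gradient updates. This is a sketch of the bookkeeping only: it assumes new layers are appended on top of the existing stack, which the description does not specify, and the stage names are made up.

```python
# (stage name, depth after the stage, which layers train)
STAGES = [
    ("base", 24, "all"),
    ("expand-1", 32, "new"),
    ("full-1", 32, "all"),
    ("expand-2", 36, "new"),
    ("full-2", 36, "all"),
]

def trainable_layers(stages):
    """Return (name, depth, trainable layer indices) for each stage.

    Assumes newly added layers take the highest indices (appended on top).
    """
    plan = []
    prev_depth = 0
    for name, depth, mode in stages:
        if mode == "new":
            layers = list(range(prev_depth, depth))  # only the added layers
        else:
            layers = list(range(depth))              # the whole stack
        plan.append((name, depth, layers))
        prev_depth = depth
    return plan

plan = trainable_layers(STAGES)
for name, depth, layers in plan:
    print(f"{name}: depth {depth}, {len(layers)} trainable layers")
```

In a PyTorch training loop, the layers outside each stage's list would typically have `requires_grad` set to `False`.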

Supervised Fine-Tuning (SFT)

After pre-training, NAFIE was fine-tuned on ~101,000 Turkish instruction samples to gain instruction-following and dialogue capabilities. The data mix is listed below, followed by validation next-token accuracy before and after SFT and the final validation perplexity.

90,000 samples

High-quality chain-of-thought data (synthetic and distilled) including ~2,700 multi-turn dialogues

10,000 samples

Turkish Open Orca — general instruction-following

1,000 samples

Basic arithmetic in Turkish

Validation Accuracy

52.9% → 69.41%

Perplexity (val)

3.8
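Perplexity is the exponential of the mean per-token negative log-likelihood, so the reported 3.8 corresponds to a validation loss of about 1.335 nats per token. The sketch assumes natural-log losses averaged over tokens, which is the usual convention but is not stated above.

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Loss that corresponds to the reported perplexity of 3.8.
loss = math.log(3.8)
print(round(loss, 3))  # 1.335

# Sanity check: a constant per-token loss reproduces the perplexity.
ppl = perplexity([loss] * 10)
```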

RAG Pipeline

Documents are indexed offline; queries flow through hybrid retrieval, optional GraphRAG, and the selected LLM. The Flutter client sends sessions, transcripts, and forum context to FastAPI.

Indexing pipeline

Markdown corpus

document_md

Article chunker

Smart splitting


Embedding

BGE-M3 etc.

Hybrid search

Dense + sparse + RRF

LLM generation

GGUF / Gemini / NAFIE

Response

Sources + metrics


ChromaDB

Persistent vector cache; corpus fingerprint controls re-embedding
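One way a corpus fingerprint can gate re-embedding is to hash file paths, file contents, and the embedder name, then compare against the value stored with the index. The function names and the choice of SHA-256 below are assumptions for illustration.

```python
import hashlib

def corpus_fingerprint(files, embedder_name):
    """Hash paths and contents (in sorted order) together with the embedder name.

    `files` maps a path to its text. Any edit, rename, addition, removal,
    or embedder change yields a different fingerprint.
    """
    h = hashlib.sha256()
    h.update(embedder_name.encode("utf-8"))
    for path in sorted(files):
        h.update(b"\0" + path.encode("utf-8") + b"\0")
        h.update(files[path].encode("utf-8"))
    return h.hexdigest()

def needs_reembedding(files, embedder_name, stored_fingerprint):
    """True when the stored index no longer matches the corpus on disk."""
    return corpus_fingerprint(files, embedder_name) != stored_fingerprint

# Hypothetical corpus contents.
corpus = {"yonetmelik.md": "MADDE 1 ...", "staj.md": "MADDE 5 ..."}
stored = corpus_fingerprint(corpus, "bge-m3")
edited = {**corpus, "staj.md": "MADDE 5 (changed)"}
```

Including the embedder name means switching models invalidates the cached vectors even when the text is unchanged.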

Runtime

User question

Natural language or forum command

Mode & guardrails

General / personal · safety filters

Conversation memory

Follow-up query expansion

Retrieval

Hybrid · min_score · optional reranker and GraphRAG

LLM generation

Selected backend · low temperature · grounded prompt

Response + sources

Cited passages, scores, and timing metrics

Why hybrid search?

Vector similarity alone can fall short on Turkish legal text; combining keywords with RRF yields more stable results.

Why local GGUF?

Generation runs offline, which suits development, demos, and the default deployment; Gemini or NAFIE can be selected through configuration.

Why low temperature?

Regulation Q&A favors consistency; source context and score thresholds help reduce hallucination risk.

Measurable Outcomes

Summary of reported or target metrics for the model and system components. Live latency and success rates depend on the chosen LLM and hardware.

3.8

NAFIE perplexity (SFT)

Validation perplexity after SFT on ~101k Turkish instruction examples. Lower is better; it measures fit to the validation set and is only an indirect indicator of generation quality.

Post-SFT

69.41%

NAFIE validation accuracy

Next-token accuracy on the validation set rose from 52.9% to 69.41% after SFT, a gain of about 16.5 percentage points.

Up from 52.9%

473M

Research-scale LM

A compact decoder-only model trained specifically for Turkish — suitable for experimentation alongside production GGUF / API backends.

36 layers · MQA

200B

Pre-training tokens

Five-stage layer growth over ~200 billion Turkish tokens — web, Wikipedia, COSMOS, TDK, and instruction pairs.

4 × H100 · TRUBA

Hybrid

Retrieval stack

Production chatbot uses dense + sparse hybrid search, persistent ChromaDB cache, and optional cross-encoder reranking.

BGE-M3 · Chroma

+20%

Turkish retrieval (reference)

Turkish-focused embeddings are designed to outperform generic multilingual models on academic Turkish text (design target; depends on your evaluation setup).

vs. multilingual baseline

Stack

Open-source, production-friendly components: Turkish retrieval, local inference, and a cross-platform mobile client.

FastAPI

Backend API

Chat, auth, forums, OCR/STT, and health endpoints.

📱

Flutter

Client

Android and Windows desktop; Aurora design system.

🦙

llama-cpp-python

Local GGUF

Default generation: Gemma 4 GGUF; configurable CPU/GPU layers.

🔢

sentence-transformers

Embedding

Turkish BGE-M3 and alternative embedders per config.

🗄️

ChromaDB

Vector store

Persistent cache for main corpus and forum collections.

🔥

PyTorch

Rerank / HF

Cross-encoder reranker and optional NAFIE / HF loading.

Google Gemini

Optional API

Cloud generation and GraphRAG entity extraction.

🎙️

EasyOCR · Whisper

Multimodal input

Image-to-text and audio transcription endpoints.

Models

Gemma 4 GGUF

Production (default)

Default local generation · UD-Q4_K_XL · llama.cpp

NAFIE

Research LLM

473M · 36 layers · HF / Torch · optional

Gemini API

Optional

Cloud generation · low latency option

Turkish BGE-M3

Embedding

1024-dim · hybrid retrieval

Application UI

Sample screens from Nafie’s Flutter client: source summaries, regulation Q&A, course forums, and transcript context in one flow — targeting a consistent experience on Android and Windows.

Source citations

Answers can be linked to relevant regulation passages.

Forums & transcript

Course forums and transcript summaries enrich context for signed-in users.


Aurora theme

Light and dark palettes aligned with the Flutter app’s Aurora design system.

[Screenshot carousel: five screens from the Nafie Flutter client (Android / Windows); the first shows sign-in and authentication]

Product Demo

Watch a short walkthrough of the app in action.

Project Team

Graduation project, Department of Computer Engineering, Hacettepe University.

Ufuk Ağaya

Developer

Halis Ebrar Cengiz

Developer

Bedirhan Gençaslan

Developer

Advisor

Prof. Mehmet Önder Efe

Project advisor

Department of Computer Engineering, Hacettepe University


Graduation project · 2025–2026