DOCS = {
"transformer_architecture.md": textwrap.dedent("""\
# Transformer Architecture
## Overview
The Transformer is a deep learning architecture introduced in "Attention Is All
You Need" (Vaswani et al., 2017). It replaced recurrent networks with a
self-attention mechanism, enabling parallel training and better long-range
dependency modelling.
## Key Components
- **Multi-Head Self-Attention**: Computes attention in h parallel heads, each
with its own learned Q/K/V projections, then concatenates and projects (a
sketch follows this list).
- **Feed-Forward Network (FFN)**: Two linear layers with a ReLU activation,
applied position-wise.
- **Positional Encoding**: Sinusoidal or learned embeddings that inject
sequence-order information, since attention is permutation-invariant.
- **Layer Normalisation**: Applied before (Pre-LN) or after (Post-LN) each
sub-layer, stabilising gradients.
- **Residual Connections**: Added around each sub-layer to ease gradient flow.
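As a rough sketch of the multi-head self-attention component above, the NumPy
snippet below implements unbatched, unmasked scaled dot-product and multi-head
attention; the random matrices stand in for learned projections, so it shows
shapes and dataflow only, not a trained model.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(x, heads, d_model):
    d_k = d_model // heads
    rng = np.random.default_rng(0)
    outs = []
    for _ in range(heads):
        # Per-head Q/K/V projections; random here in place of learned weights.
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
        outs.append(attention(x @ Wq, x @ Wk, x @ Wv))
    Wo = rng.normal(size=(d_model, d_model))
    return np.concatenate(outs, axis=-1) @ Wo  # concatenate heads, project

x = np.random.default_rng(1).normal(size=(5, 16))  # 5 tokens, d_model = 16
print(multi_head(x, heads=4, d_model=16).shape)    # (5, 16)
```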
## Encoder vs Decoder
The encoder stack processes input tokens bidirectionally (e.g. BERT).
The decoder stack uses causal (masked) attention over previous outputs;
encoder-decoder models (e.g. T5) add cross-attention over encoder outputs,
while decoder-only models (e.g. GPT) drop the encoder and cross-attention
entirely.
## Scaling Laws
Kaplan et al. (2020) showed that model loss decreases predictably as a power
law with compute, data, and parameter count. This motivated GPT-3 (175B) and
subsequent large language models.
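Concretely, the fitted curves take a power-law form such as
L(N) ≈ (N_c / N)^α_N, with exponents reported around α_N ≈ 0.076 for
parameters, α_D ≈ 0.095 for data, and α_C ≈ 0.050 for compute, so each
order-of-magnitude increase in scale buys a roughly constant fractional
drop in loss.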
## Limitations
- Quadratic complexity in sequence length: O(n^2)
- No inherent recurrence -> long-context challenges
- High memory footprint during training
## References
Vaswani et al. (2017). Attention Is All You Need. NeurIPS.
Kaplan et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
"""),
"rag_systems.md": textwrap.dedent("""\
# Retrieval-Augmented Generation (RAG)
## Definition
RAG augments a generative LLM with a retrieval step: given a query, relevant
documents are fetched from a corpus and prepended to the prompt, giving the
model grounded context beyond its training data.
## Architecture
1. **Indexing Phase** — Documents are chunked, embedded via a bi-encoder
(e.g. text-embedding-3-large), and stored in a vector database (e.g.
Faiss, Pinecone, Weaviate).
2. **Retrieval Phase** — The user query is embedded; an approximate
nearest-neighbour (ANN) search returns the top-k chunks.
3. **Generation Phase** — Retrieved chunks + query are passed to the LLM
which synthesises a final answer.
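A minimal sketch of the three phases above. The hash-based embed() is a toy
stand-in for a real bi-encoder, and the final LLM call is elided (any
chat-completion API would slot in at the prompt); this illustrates the
dataflow, not a production pipeline.

```python
import numpy as np

def embed(text, dim=64):
    # Toy embedding: hash each token into a bag-of-words vector, L2-normalise.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# 1. Indexing: chunk (whole docs here), embed, store as a matrix "index".
corpus = ["Transformers use self-attention.",
          "BM25 is a sparse retrieval method.",
          "Vector databases store embeddings."]
index = np.stack([embed(c) for c in corpus])

# 2. Retrieval: embed the query; cosine similarity is a dot product of
#    unit vectors, so top-k is an argsort away.
query = "How do transformers work?"
top_k = np.argsort(index @ embed(query))[::-1][:2]

# 3. Generation: prepend retrieved chunks to the prompt for the LLM.
prompt = "Context: " + " ".join(corpus[i] for i in top_k) + " Question: " + query
```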
## Variants
- **Dense Retrieval**: DPR, Contriever — queries and docs in the same space.
- **Sparse Retrieval**: BM25 — term frequency-based, no embeddings needed.
- **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse
(see the sketch after this list).
- **Re-ranking**: A cross-encoder re-scores the top-k before the LLM sees them.
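Reciprocal Rank Fusion is simple enough to show in full; the k = 60 constant
comes from Cormack et al.'s original formulation, and the ranked lists here
are toy data.

```python
def rrf(rankings, k=60):
    # Each ranker contributes 1 / (k + rank) per document; sum, then re-sort.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d2"]   # ranked list from a dense retriever
sparse = ["d1", "d4", "d3"]   # ranked list from BM25
print(rrf([dense, sparse]))   # d1 and d3, ranked by both, rise to the top
```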
## Challenges
- Context window limits: long retrieved passages may not fit.
- Retrieval quality is a hard ceiling on generation quality.
- Chunking strategy significantly affects recall.
- Multi-hop questions require iterative retrieval (IRCoT, ReAct).
## Relationship to Transformers
RAG systems rely on transformer-based encoders for embedding and decoder
models for generation. The quality of the embedding model directly determines
retrieval precision and recall.
## References
Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive
NLP Tasks. NeurIPS.
Gao et al. (2023). Retrieval-Augmented Generation for Large Language Models:
A Survey. arXiv:2312.10997.
"""),
"knowledge_graph_integration.md": textwrap.dedent("""\
# Knowledge Graphs and LLM Integration
## What is a Knowledge Graph?
A knowledge graph (KG) is a directed labelled graph of entities (nodes) and
relations (edges): (subject, predicate, object) triples, e.g.
(Vaswani, authored, "Attention Is All You Need").
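In code, such triples are just tuples, and a one-hop query is a filter; a toy
sketch, not a real graph store:

```python
# Triples as plain Python tuples; a comprehension answers the one-hop
# query "what did Vaswani author?".
triples = [
    ("Vaswani", "authored", "Attention Is All You Need"),
    ("Attention Is All You Need", "introduced", "the Transformer"),
]
answers = [o for s, p, o in triples if s == "Vaswani" and p == "authored"]
print(answers)  # ['Attention Is All You Need']
```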
## Why Combine KGs with LLMs?
LLMs hallucinate facts; KGs provide structured, verifiable ground truth.
KGs are hard to query in natural language; LLMs provide the interface.
Together they enable faithful, grounded, explainable question answering.
## Integration Strategies
### KG-Augmented Generation (KGAG)
Retrieve triples or sub-graphs instead of text chunks, serialise them into
text, and place that text in the LLM prompt.
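A hypothetical serialisation step (prompt templates vary widely in practice):

```python
def serialise(triples):
    # Flatten triples into plain sentences the LLM can condition on.
    return " ".join(f"{s} {p} {o}." for s, p, o in triples)

facts = serialise([("Vaswani", "authored", "Attention Is All You Need")])
prompt = "Facts: " + facts + " Question: Who wrote the Transformer paper?"
```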
### LLM-Assisted KG Construction
LLMs extract (subject, relation, object) triples from unstructured text,
reducing manual curation effort significantly.
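One possible shape of that extraction loop; complete() is a placeholder for
any LLM client, not a specific API, and the line format is an assumption.

```python
PROMPT = ("Extract (subject, relation, object) triples from the text. "
          "Return one per line as: subject | relation | object. Text: ")

def extract_triples(text, complete):
    triples = []
    for line in complete(PROMPT + text).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:          # keep only well-formed lines
            triples.append(tuple(parts))
    return triples
```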
### GraphRAG (Microsoft Research, 2024)
GraphRAG extracts an entity knowledge graph from the corpus, clusters it into
communities, and generates a summary per community. At query time, a
map-reduce over the community summaries produces the answer, which
outperforms flat-vector RAG on global sensemaking questions.
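Schematically, the query-time map-reduce looks like this (a paraphrase of the
published approach, not Microsoft's actual code; llm is any completion
callable):

```python
def graphrag_answer(query, community_summaries, llm):
    # Map: answer the query against each community summary independently.
    partial = [llm(f"Using only this summary, answer '{query}': {s}")
               for s in community_summaries]
    # Reduce: merge the partial answers into one final response.
    return llm(f"Combine these partial answers to '{query}': " + " ".join(partial))
```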
## Challenges
- KG construction quality depends on extraction LLM accuracy.
- Graph databases add infrastructure complexity.
- Ontology design requires domain expertise.
- KGs go stale without continuous update pipelines.
## Relation to RAG and Transformers
KG integration addresses two key RAG limitations: lack of structured reasoning
and inability to follow multi-hop relations.
## References
Pan et al. (2024). Unifying Large Language Models and Knowledge Graphs:
A Roadmap. IEEE TKDE.
"""),
}