How to Build a RAG Pipeline Over Your Company Docs

End-to-end guide: ingest, chunk, embed, store, retrieve, answer with citations. Real costs at each scale.

Every company wants "ChatGPT for our docs". Here's the actual build, with the choices we'd make today.

The 5-step pipeline

  1. Ingest docs (PDF/HTML/Markdown/Notion) into raw text
  2. Chunk into 500-token pieces with 50-token overlap
  3. Embed with text-embedding-3-small ($0.02 / 1M tokens)
  4. Store in pgvector (Supabase, free tier handles 10K docs)
  5. Retrieve top-k + send to GPT-4o-mini with a "cite your sources" prompt
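
Step 2 is the one that is easiest to see in code. Here is a minimal sketch of fixed-size chunking with overlap. It assumes whitespace-separated words stand in for tokens; a real pipeline would count with the embedding model's tokenizer. `chunk_tokens` is a hypothetical helper, not a library function.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Assumption: whitespace-separated words approximate real tokens.
words = ("word " * 1200).split()
pieces = chunk_tokens(words)
print([len(p) for p in pieces])  # [500, 500, 300]
```

Each chunk repeats the last 50 tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.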

Tools to use

Cost at 1K docs / 10K queries/month

That's it. $15/month for a production RAG system. Most "enterprise RAG vendors" charge $500-5000/mo for this.
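
The embedding share of that bill is simple arithmetic from the price quoted in step 3. This sketch covers embeddings only. The average document and question lengths are made-up assumptions, so measure your own corpus. Database and generation costs are left out because they depend on current provider pricing.

```python
EMBED_PRICE_PER_M = 0.02  # USD per 1M tokens for text-embedding-3-small (from step 3)


def embedding_cost_usd(total_tokens, price_per_million=EMBED_PRICE_PER_M):
    """Cost of embedding total_tokens tokens at a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million


# Assumption: 1K docs averaging 2,000 tokens each.
index_cost = embedding_cost_usd(1_000 * 2_000)
# Assumption: 10K queries/month averaging 30 tokens per question.
query_cost = embedding_cost_usd(10_000 * 30)
print(f"one-time indexing: ${index_cost:.4f}, monthly query embeddings: ${query_cost:.4f}")
```

Under these assumptions embedding costs a few cents, so most of any monthly bill would come from the generation model and hosting.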

The 80% you can build in a weekend

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.vectorstores.pgvector import PGVector

# Ingest
# load_documents is not defined here: substitute your own loader.
# PG_URL and question are also assumed to be defined elsewhere.
docs = load_documents("company_docs/")
# Note: chunk_size counts characters by default, not tokens.
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed + store
vector_store = PGVector.from_documents(
    chunks,
    OpenAIEmbeddings(model="text-embedding-3-small"),
    connection_string=PG_URL,
)

# Query
results = vector_store.similarity_search(question, k=5)
context = "\n\n".join([d.page_content for d in results])
# invoke returns a message object; the reply text is in answer.content
answer = ChatOpenAI(model="gpt-4o-mini").invoke(
    f"Answer using only this:\n{context}\n\nQuestion: {question}"
)
```
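
The prompt above keeps the model inside the retrieved context but does not ask for citations, which step 5 calls for. One way to get them is to number each chunk and tell the model to reference those numbers. This sketch assumes each retrieved chunk carries a `source` entry in its metadata (check what your loader sets). `build_cited_prompt` is a hypothetical helper, and the example chunks are made up.

```python
def build_cited_prompt(question, chunks):
    """Build a prompt whose context blocks are numbered so answers can cite them.

    chunks: list of (source, text) pairs, e.g. built from retrieved documents as
    [(d.metadata.get("source", "unknown"), d.page_content) for d in results].
    """
    blocks = [
        f"[{i}] (source: {source})\n{text}"
        for i, (source, text) in enumerate(chunks, start=1)
    ]
    context = "\n\n".join(blocks)
    return (
        "Answer using only the numbered sources below. "
        "Cite the sources you used as [1], [2], ... after each claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Made-up example chunks for illustration.
prompt = build_cited_prompt(
    "How many vacation days do new hires get?",
    [
        ("handbook.md", "New hires receive 20 vacation days."),
        ("faq.html", "Vacation requests go through the HR portal."),
    ],
)
print(prompt)
```

Because the numbers map back to source names, the application can turn `[1]` in the answer into a link to the original document.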

The 20% that takes you to production
