TL;DR — the build playbook
A useful custom GPT (or Claude Project, or Gemini Gem) ships in 1–3 days for internal use, 2–6 weeks for production. The order: scope → data prep → system prompt → tools → eval → deploy → iterate. The most common failure mode is jumping straight to prompt engineering before deciding the one job the assistant exists to do. Pick the job first.
"Build a custom GPT" went from novel feature to commodity expectation between 2024 and 2026. Every team has a custom GPT for support, sales, onboarding, internal Q&A. The difference between the ones that get daily use and the ones that get forgotten isn't the model — it's the scoping and the eval discipline.
This is the practical playbook for shipping a custom assistant that your team (or customers) actually open every day.
Step 1 — Scope ruthlessly
The number-one mistake: trying to build "the everything assistant for our company." Useful assistants do ONE job for ONE persona. Write down:
- The job: "Answer customer support tickets about shipping & returns" (not "help with anything").
- The persona: "Our buyer support team handling first-line tickets" (not "everyone").
- The success metric: "Reduce average ticket resolution time from 14 min to 4 min" (not "improve support").
- The before / after journey: What does the user do today? What do they do after the assistant?
If you can't write all four in 10 minutes, your scope is too vague. Tighten it before touching ChatGPT.
Step 2 — Prepare the knowledge base
For most business GPTs, the knowledge base is the moat — the model is interchangeable. Collect:
- FAQs that answer the top 50 questions in your domain
- SOPs / runbooks the team actually follows
- Product documentation (cleaned of marketing fluff)
- Past tickets or conversations with high-quality resolutions
- Style guides if you have tone-of-voice requirements
Sweet spot: 100–500 well-curated documents. More than that and retrieval quality degrades unless you've invested in good chunking + reranking.
Cleaning rules: strip nav and footer cruft, deduplicate, convert tables to text the model can reason about, and add document-level metadata (last-updated date, owner, jurisdiction).
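A minimal sketch of those cleaning rules, assuming documents are already loaded as plain text. The `Doc` class, its field names, and the "keep the newest copy" policy are illustrative assumptions, not a prescribed pipeline.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical record: body text plus the document-level metadata listed above.
    text: str
    last_updated: str  # ISO date such as "2026-01-15"
    owner: str
    jurisdiction: str


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so near-identical copies hash the same."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs: list[Doc]) -> list[Doc]:
    """Keep only the most recently updated copy of each distinct text."""
    newest: dict[str, Doc] = {}
    for doc in docs:
        key = hashlib.sha256(normalize(doc.text).encode("utf-8")).hexdigest()
        kept = newest.get(key)
        # ISO dates sort correctly as plain strings.
        if kept is None or doc.last_updated > kept.last_updated:
            newest[key] = doc
    return list(newest.values())


def table_to_text(headers: list[str], rows: list[list[str]]) -> str:
    """Flatten a table into one 'header: value' sentence per row."""
    return "\n".join(
        "; ".join(f"{h}: {v}" for h, v in zip(headers, row)) + "."
        for row in rows
    )
```

Exact-hash deduplication only catches copies that differ in case or whitespace; near-duplicates with reworded sentences would need a fuzzier comparison.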
Step 3 — Write the system prompt
A good business-GPT system prompt is 200–800 tokens and contains five things:
- Role: "You are the Nexora Support Assistant for first-line customer questions."
- Tone: "Concise, warm, never apologetic for the product. Use plain English."
- Boundaries: "Do not promise refunds. Do not quote prices outside the published pricing page. Do not write code."
- Escalation paths: "If you can't find a confident answer, end with 'I'll connect you with a human teammate — one moment.' and call the escalate_to_human tool."
- Format expectations: "Answer in 1–4 sentences. Use bullet points only when listing distinct items."
Test the prompt against 20 representative questions before iterating further. Most "the GPT is hallucinating" complaints are actually under-specified system prompts.
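One way to keep the five parts honest is to assemble the prompt from named pieces and refuse to build it when one is missing. This is a sketch: the part texts are shortened from the examples above, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```python
# The five parts named above; wording is shortened from the examples in this post.
REQUIRED_PARTS = ("Role", "Tone", "Boundaries", "Escalation", "Format")

EXAMPLE_PARTS = {
    "Role": "You are the Nexora Support Assistant for first-line customer questions.",
    "Tone": "Concise, warm, never apologetic for the product. Use plain English.",
    "Boundaries": "Do not promise refunds. Do not write code.",
    "Escalation": "If you can't find a confident answer, call the escalate_to_human tool.",
    "Format": "Answer in 1-4 sentences. Use bullet points only when listing distinct items.",
}


def build_system_prompt(parts: dict[str, str]) -> str:
    """Join the five parts in a fixed order, failing loudly if any is empty."""
    missing = [name for name in REQUIRED_PARTS if not parts.get(name, "").strip()]
    if missing:
        raise ValueError(f"system prompt is missing: {', '.join(missing)}")
    return "\n\n".join(f"{name}: {parts[name].strip()}" for name in REQUIRED_PARTS)


def rough_token_count(text: str) -> int:
    # Assumption: roughly 4 characters per token for English text.
    return len(text) // 4


def within_budget(prompt: str, low: int = 200, high: int = 800) -> bool:
    """Check the prompt against the 200-800 token range suggested above."""
    return low <= rough_token_count(prompt) <= high
```

The shortened example lands well under 200 tokens by this estimate, which is a hint that a real prompt needs more detail on boundaries and escalation than a one-line summary.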
Step 4 — Add tools (function calling)
Tools turn a question-answerer into something useful. Keep it minimal — 3–7 tools maximum. Common ones:
- search_knowledge_base(query): RAG retrieval
- look_up_order(order_id): pull live data
- look_up_customer(email): pull customer record
- escalate_to_human(reason): route to your support system
- create_followup_ticket(summary): write back to CRM
Each tool needs a strict JSON schema (parameter types, descriptions, required fields). The model's tool-call accuracy is downstream of how clear your schemas are.
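As an illustration, here is what a strict schema for look_up_order might look like, with a small check run on the model's arguments before anything touches live data. The name/description/parameters layout follows the common JSON Schema style for tool definitions; the exact wrapper differs by provider, and the descriptions and the `validate_arguments` helper are assumptions made for this sketch.

```python
import json

LOOK_UP_ORDER = {
    "name": "look_up_order",
    "description": "Fetch the live status of one order. Use only when the user gives an order ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order identifier exactly as shown in the confirmation email.",
            }
        },
        "required": ["order_id"],
        "additionalProperties": False,
    },
}

_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(tool: dict, raw_arguments: str) -> dict:
    """Parse the model's JSON argument string and check it against the tool schema."""
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    schema = tool["parameters"]
    props = schema["properties"]
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = [key for key in args if key not in props]
    if unknown and schema.get("additionalProperties") is False:
        raise ValueError(f"unexpected fields: {unknown}")
    for key, value in args.items():
        if key not in props:
            continue
        declared = props[key]["type"]
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean fields.
        if isinstance(value, bool) and declared != "boolean":
            raise ValueError(f"{key} must be of type {declared}")
        if not isinstance(value, _JSON_TYPES[declared]):
            raise ValueError(f"{key} must be of type {declared}")
    return args
```

A full JSON Schema validator covers far more (patterns, enums, nested objects); this only checks required fields, unknown fields, and primitive types.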
Step 5 — Deploy and measure
Deploy internally first (always). Watch for two weeks while you measure:
- Resolution rate: % of conversations where the user got their answer without escalation
- Escalation rate: % where the assistant correctly punted to a human
- Hallucination rate: % where the assistant made up a fact (sample 5% of conversations and review manually)
- Satisfaction: 👍/👎 inline rating, weekly
Don't expose to customers until resolution rate > 70% AND hallucination rate < 2%.
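The launch gate is simple arithmetic, sketched below. The `Conversation` fields are hypothetical log columns; the hallucination rate is computed only over the manually reviewed sample, and the escalation count here includes every escalation, since judging whether one was correct needs a human.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical log record for one conversation.
    resolved_without_escalation: bool
    escalated: bool
    reviewed: bool = False      # True if it was part of the manual review sample
    hallucinated: bool = False  # only meaningful when reviewed is True


def rates(conversations: list[Conversation]) -> dict[str, float]:
    """Compute resolution, escalation, and hallucination rates as fractions."""
    total = len(conversations)
    reviewed = [c for c in conversations if c.reviewed]
    if total == 0 or not reviewed:
        raise ValueError("need at least one conversation and one reviewed conversation")
    return {
        "resolution": sum(c.resolved_without_escalation for c in conversations) / total,
        "escalation": sum(c.escalated for c in conversations) / total,
        "hallucination": sum(c.hallucinated for c in reviewed) / len(reviewed),
    }


def ready_for_customers(
    r: dict[str, float],
    min_resolution: float = 0.70,
    max_hallucination: float = 0.02,
) -> bool:
    """Apply the gate: resolution above 70% AND hallucination below 2%."""
    return r["resolution"] > min_resolution and r["hallucination"] < max_hallucination
```

With a small review sample the hallucination estimate is noisy: one bad answer in 50 reviewed conversations already sits at the 2% line and fails the gate.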
Step 6 — Set up a real eval harness
An eval harness is a fixed set of input/expected-output pairs that you run after every prompt or tool change to catch regressions. Tools that help in 2026: LangSmith, Helicone, Langfuse, Braintrust.
Start with 30–50 representative cases. Add a case every time a user complaint surfaces a real failure mode. Run before every deploy. This is the single biggest determinant of whether your custom GPT keeps getting better or quietly degrades.
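Before adopting a hosted tool, a harness can be as small as the sketch below. It assumes the assistant is wrapped as a function from question to answer, and it uses phrase checks as a crude stand-in for comparing against a full expected output; `EvalCase` and `run_evals` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    name: str
    question: str
    must_contain: list[str] = field(default_factory=list)
    must_not_contain: list[str] = field(default_factory=list)


def run_evals(assistant: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Run every case and return human-readable failures (empty list means all passed)."""
    failures = []
    for case in cases:
        answer = assistant(case.question).lower()
        for phrase in case.must_contain:
            if phrase.lower() not in answer:
                failures.append(f"{case.name}: missing {phrase!r}")
        for phrase in case.must_not_contain:
            if phrase.lower() in answer:
                failures.append(f"{case.name}: contains forbidden {phrase!r}")
    return failures


def stub_assistant(question: str) -> str:
    # Stand-in for the real model call, so the harness itself can be tested.
    return "You can return items within 30 days."


CASES = [
    EvalCase("returns-window", "How long do I have to return an item?", must_contain=["30 days"]),
    EvalCase("no-refund-promise", "Can I get my money back?", must_not_contain=["we will refund"]),
]

if __name__ == "__main__":
    problems = run_evals(stub_assistant, CASES)
    # Exit with an error when anything regressed, so a deploy script can stop.
    raise SystemExit("\n".join(problems) if problems else 0)
```

Each user complaint that exposes a real failure becomes one more `EvalCase` in the list, which is how the set grows from the initial 30–50.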
Step 7 — Monetisation paths
If you're building for revenue:
- OpenAI GPT Store: OpenAI started revenue-sharing in 2024 and expanded in 2025–2026. Reach is huge, control is limited, take rate is opaque.
- Wrap behind your own paywall: Build a thin UI, charge per seat or per query. Use Stripe / Paddle for billing. You own the relationship and the margin.
- Embed inside your existing product: The GPT becomes a feature, not a SKU. Easiest sale to existing customers.
- White-label for agencies / consultants: Build once, license per client. Higher ACV, fewer customers.
FAQ
What's the difference between an OpenAI GPT and a Claude Project?
OpenAI GPTs run inside ChatGPT, support Actions (calls to external APIs described by an OpenAPI schema) and a code interpreter, and can be shared in the GPT Store. Claude Projects run inside Claude, focus on persistent knowledge and longer-running collaboration, and don't have a public marketplace. GPTs are better for product-style builds; Projects are better for internal team workflows with sensitive context.
How long does it take to build a custom GPT?
A useful internal GPT: 1–3 days. A production-grade customer-facing GPT with evals, monitoring, and human escalation: 2–6 weeks. The main time sinks are data cleanup, prompt iteration, and writing the eval harness.
How much does it cost?
For internal use on existing ChatGPT Team / Plus / Claude Pro subscriptions: $0 incremental. For a customer-facing deployment via API: typically $300–$2,000/month at moderate scale, dominated by LLM call costs. A specialist-built end-to-end build is usually $4K–$25K depending on scope.
Do I need RAG or can I just upload docs?
For under 30 documents and short corpora: just upload them as Knowledge, merging documents into fewer files if you would otherwise exceed the file cap. For 30+ docs, long content, or anything you need to update frequently, build a proper RAG pipeline with a vector DB, embeddings, and a search tool the model calls. The upload limit on GPTs (about 20 files, each up to 2M tokens) is real and ungraceful.
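To make the shape of that search tool concrete, here is a toy version. It is not a real RAG pipeline: word-count vectors and cosine similarity stand in for embeddings and a vector DB, and the chunk sizes are arbitrary assumptions. Only the overall flow (chunk, score, return the top matches) carries over.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split text into overlapping word windows."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]


def _vector(text: str) -> Counter:
    # Toy stand-in for an embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_knowledge_base(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return up to top_k chunks ranked by similarity, dropping zero-score chunks."""
    q = _vector(query)
    scored = sorted(((_cosine(q, _vector(c)), c) for c in chunks), key=lambda p: p[0], reverse=True)
    return [c for score, c in scored if score > 0][:top_k]
```

Because the index lives outside the GPT, updating a document means re-chunking and re-indexing that one file rather than re-uploading Knowledge.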
Can I monetise a custom GPT?
On the OpenAI GPT Store, yes — OpenAI launched revenue sharing for popular GPTs in 2024 and expanded it through 2025. Outside the store, you can wrap your custom assistant behind your own paywall or charge per seat. The Store route gives you reach; the wrap route gives you control and margin.
Which model should I start with?
For internal Q&A: Claude Sonnet 4.6 or Haiku 4.5 — they're cheap, fast, and have strong tool-use. For consumer-facing in the GPT ecosystem: GPT-4o-class models with the Store. For agentic workflows that call many tools and reason for several steps: Claude Sonnet 4.6 is usually the cheapest competent option.