For Sceptre-class products that put a streaming AI pipeline at the core, Archiet generates the full real-time stack, including provider cascades, vector store adapters, backpressure, and BYOK (bring-your-own-key), so the differentiator is the architecture, not the scaffolding.
Triggered by rag_pipeline, streaming_stt, vector_search, or llm_orchestration capabilities, or when functional requirements mention transcribe, voice, chat, embed, or semantic search.
Primary provider (Azure Speech, Deepgram, or OpenAI Realtime), three retry attempts with exponential backoff, then fall through to the next provider. Local Whisper serves as the last-resort fallback so a cloud outage doesn’t take you down.
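A minimal sketch of that cascade in Python, assuming an injected call_provider client; the provider order mirrors the list above, everything else (names, error handling) is illustrative, not Archiet's generated code:

```python
import asyncio
import random

# Illustrative provider order from the text: cloud providers first, local Whisper last.
PROVIDERS = ["azure_speech", "deepgram", "openai_realtime", "local_whisper"]

async def transcribe_with_cascade(audio: bytes, call_provider, max_attempts: int = 3) -> str:
    """Try each provider in order; retry transient failures with exponential backoff."""
    last_error: Exception | None = None
    for provider in PROVIDERS:
        for attempt in range(max_attempts):
            try:
                return await call_provider(provider, audio)
            except Exception as exc:  # real code would catch transient errors only
                last_error = exc
                # 0.5 s, 1 s, 2 s, plus jitter so concurrent retries spread out
                await asyncio.sleep(0.5 * 2 ** attempt + random.random() * 0.1)
        # three failed attempts: fall through to the next provider in the cascade
    raise RuntimeError("all STT providers exhausted") from last_error
```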
Identical input text within 24 hours hits the cache and isn’t re-embedded. Token spend on repeat queries is zero, and latency drops from an API round trip to a cache lookup.
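A sketch of that cache layer, assuming Redis as the store and redis-py's asyncio client; embed_cached and embed_fn are illustrative names, not Archiet's API:

```python
import hashlib
import json

import redis.asyncio as redis  # assumes redis-py >= 4.2

CACHE_TTL_S = 24 * 60 * 60  # the 24-hour window described above

r = redis.Redis()

async def embed_cached(text: str, embed_fn) -> list[float]:
    """Return a cached embedding for identical input text; otherwise embed and store."""
    key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
    hit = await r.get(key)
    if hit is not None:
        return json.loads(hit)  # cache hit: zero token spend
    vector = await embed_fn(text)  # cache miss: one paid embedding call
    await r.set(key, json.dumps(vector), ex=CACHE_TTL_S)
    return vector
```

Keying on a SHA-256 of the input keeps cache keys fixed-length regardless of document size, and the TTL bounds staleness if the embedding model is ever swapped.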
pgvector by default; opt-in adapters for Pinecone, Weaviate, and Qdrant. Hybrid retrieval combines vector similarity with BM25 lexical search for better recall on technical terminology.
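The section doesn't say how the two rankings are merged; reciprocal rank fusion is one common choice, sketched here with illustrative document ids:

```python
from collections import defaultdict

def reciprocal_rank_fusion(vector_hits: list[str], bm25_hits: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked result lists (document ids) into one hybrid ranking.

    RRF scores each document by 1 / (k + rank) in every list it appears in,
    so items ranked well by either retriever surface near the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for hits in (vector_hits, bm25_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. vector search nails the semantics, BM25 nails an exact technical term:
# reciprocal_rank_fusion(["d3", "d1", "d7"], ["d7", "d2"]) -> ["d7", "d3", "d1", "d2"]
```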
Wraps the existing 6-provider llm_service cascade (OpenRouter primary; Anthropic / OpenAI / Google / DeepSeek / HuggingFace fallback) with streaming token delivery and backpressure. Per-tenant BYOK via WorkspaceLLMKey.
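A sketch of streaming fallback over that cascade, assuming an open_stream factory per provider; the provider order comes from the text above, the function signature is an assumption:

```python
from collections.abc import AsyncIterator, Callable

# Provider order from the 6-provider cascade described above.
CASCADE = ["openrouter", "anthropic", "openai", "google", "deepseek", "huggingface"]

async def stream_completion(
    prompt: str,
    open_stream: Callable[[str, str, str | None], AsyncIterator[str]],
    tenant_key: str | None = None,  # BYOK: per-tenant key, e.g. a WorkspaceLLMKey value
) -> AsyncIterator[str]:
    """Yield tokens from the first provider whose stream completes."""
    last_error: Exception | None = None
    for provider in CASCADE:
        started = False
        try:
            async for token in open_stream(provider, prompt, tenant_key):
                started = True
                yield token
            return  # stream finished normally
        except Exception as exc:
            last_error = exc
            if started:
                # Tokens already reached the client; falling through now would
                # duplicate output, so surface the error instead.
                raise
    raise RuntimeError("all LLM providers exhausted") from last_error
```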
Full lifecycle generated: connect, auth, stream-up (audio chunks), stream-down (transcription tokens), close. Backpressure slows the sender if the downstream WebSocket buffer fills.
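A minimal backpressure sketch: a bounded queue between the audio source and the WebSocket sender, where read_chunk and send_chunk stand in for whatever transport the generated code actually uses:

```python
import asyncio

async def pump(read_chunk, send_chunk, max_buffered: int = 32) -> None:
    """Relay chunks through a bounded queue so a slow consumer throttles the producer."""
    queue: asyncio.Queue[bytes | None] = asyncio.Queue(maxsize=max_buffered)

    async def producer() -> None:
        while (chunk := await read_chunk()) is not None:
            await queue.put(chunk)  # blocks when the buffer is full: backpressure
        await queue.put(None)  # sentinel: upstream closed the stream

    async def consumer() -> None:
        while (chunk := await queue.get()) is not None:
            await send_chunk(chunk)

    await asyncio.gather(producer(), consumer())
```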
Every LLM, STT, and embedding call writes a row to llm_request_log with token count and estimated cost. Per-tenant rate limiting via a Redis token bucket whose refill rate is set by LLM_REQUESTS_PER_MIN.
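A sketch of the token bucket only (the llm_request_log write isn't shown), assuming Redis >= 4.0 and redis-py; LLM_REQUESTS_PER_MIN comes from the text, while the Lua script and key layout are illustrative:

```python
import time

import redis  # assumes redis-py

LLM_REQUESTS_PER_MIN = 60  # stand-in value; the real setting comes from configuration

# Atomic take-one-token script: refill by elapsed time, cap at capacity, deny if empty.
TOKEN_BUCKET = """
local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens') or ARGV[1])
local ts     = tonumber(redis.call('HGET', KEYS[1], 'ts') or ARGV[3])
local refill = (tonumber(ARGV[3]) - ts) * tonumber(ARGV[2])
tokens = math.min(tonumber(ARGV[1]), tokens + refill)
if tokens < 1 then return 0 end
redis.call('HSET', KEYS[1], 'tokens', tokens - 1, 'ts', ARGV[3])
redis.call('EXPIRE', KEYS[1], 120)
return 1
"""

r = redis.Redis()
check = r.register_script(TOKEN_BUCKET)

def allow_request(tenant_id: str) -> bool:
    """Atomically take one token from the tenant's bucket; False means rate-limited."""
    capacity = LLM_REQUESTS_PER_MIN
    rate_per_s = LLM_REQUESTS_PER_MIN / 60.0
    return bool(check(keys=[f"rl:{tenant_id}"], args=[capacity, rate_per_s, time.time()]))
```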
Full real-time surface (STT cascade + embeddings + vector search + LLM + WebSocket) on Python and NestJS stacks. LLM orchestration and embeddings on the rest.
Where this generator answers a procurement question on day one.