Large Language Model (LLM) stack

The LLM (Large Language Model) stack refers to the comprehensive set of technologies, tools, and processes used to build, deploy, scale, and maintain applications powered by large language models. This stack has evolved rapidly with the rise of generative AI, addressing unique challenges like high computational demands, data management, model unpredictability, and security concerns. It encompasses everything from foundational hardware to high-level application orchestration, enabling developers to create robust AI systems for tasks such as natural language processing, content generation, and autonomous agents.

The LLM stack operates as a pipeline: Data from pipelines is used to train or fine-tune foundation models on specialized hardware. Trained models are deployed for inference, where orchestration tools like LangChain build complex workflows, often incorporating RAG for accuracy. Security and validation layers filter inputs/outputs, while observability monitors everything to feed back into testing for continuous improvement. For example, in a chatbot app, user queries hit the orchestration layer, retrieve context from vector DBs, run through inference on Groq hardware, get validated by Guardrails, and are logged via Helicone for analysis.
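The flow described above can be sketched end to end in Python. Every function below is an illustrative stub standing in for a real component (vector database, hosted model endpoint, guardrail, logging service); none of these names are actual library APIs:

```python
def retrieve_context(query: str) -> str:
    """Orchestration layer: stand-in for a vector-DB lookup (RAG)."""
    docs = {"refund": "Refunds are processed within 5 business days."}
    return next((v for k, v in docs.items() if k in query.lower()), "")

def run_inference(prompt: str) -> str:
    """Inference layer: stand-in for a call to a hosted model endpoint."""
    return f"Answer based on: {prompt}"

def validate(text: str) -> bool:
    """Guardrail: block empty or over-long outputs."""
    return 0 < len(text) < 2000

def log_event(stage: str, payload: str) -> None:
    """Observability: stand-in for a logging tool like Helicone."""
    print(f"[{stage}] {payload[:60]}")

def handle_query(query: str) -> str:
    log_event("input", query)
    context = retrieve_context(query)           # RAG retrieval
    prompt = f"Context: {context}\nUser: {query}"
    answer = run_inference(prompt)              # model call
    if not validate(answer):                    # output guardrail
        answer = "Sorry, I can't help with that."
    log_event("output", answer)
    return answer

print(handle_query("How long does a refund take?"))
```

The key structural point: retrieval, inference, validation, and logging are separate layers, so any one stub can be swapped for a real service without touching the others.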

The 7 layers of the LLM stack:

  • Lower layers (1-3) focus on building reliable models from huge volumes of clean data.

  • Middle layers (4-5) handle reasoning chains and efficient, safe model serving.

  • Top layers (6-7) make LLM power accessible within real-world software and business contexts.

Comparing the 7 Layers of the LLM Stack:

Here's a summary of each layer in the typical LLM stack, highlighting the key components and how each layer connects to the full lifecycle of a large language model, from data to deployed applications.


  1. Data Acquisition
     Purpose: Gather vast language data to build and train LLMs.
     Key components: web crawlers and APIs; document parsers; data labeling/curation; data storage/data lakes.

  2. Data Preprocessing & Management
     Purpose: Clean, filter, structure, and govern the data.
     Key components: deduplication and normalization; tokenization and vocabulary creation; PII removal and privacy tools; versioned datasets.

  3. Model Selection & Training
     Purpose: Build, optimize, and train LLMs.
     Key components: model architectures (e.g., Transformer variants); distributed training frameworks; GPUs/TPUs and accelerators; checkpointing and early stopping; fine-tuning and RLHF.

  4. Orchestration & Pipelines
     Purpose: Enable modular workflows and complex reasoning.
     Key components: agent frameworks (e.g., LangChain, Haystack); memory modules and scratchpads; chaining of multiple models/tools; Retrieval-Augmented Generation (RAG) pipelines.

  5. Inference & Execution
     Purpose: Run models at scale, safely and efficiently.
     Key components: serving endpoints/APIs; prompt management/templating; guardrails for safety and content filtering; caching and load balancing.

  6. Integration Layer
     Purpose: Connect LLMs with business systems and data.
     Key components: API gateways and SDKs; authentication/authorization modules; plugins and connectors (e.g., Salesforce, SQL databases); usage metering and monitoring.

  7. Application Layer
     Purpose: Deliver user-facing solutions and automation.
     Key components: chatbots, copilots, and text analytics; custom enterprise apps; reporting/analytics dashboards; workflow automation.
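As a small, concrete illustration of layer 2, here is a toy tokenization and vocabulary-creation step using only the Python standard library. Real stacks use subword schemes such as BPE rather than whitespace splitting; this is only a sketch of the idea:

```python
from collections import Counter

corpus = ["the cat sat", "the cat ran", "a dog ran"]

# Whitespace tokenization (a stand-in for real subword tokenizers).
tokens = [tok for line in corpus for tok in line.split()]

# Vocabulary creation: map each token to an integer id, most frequent first.
# Counter.most_common orders equal counts by first encounter, so ids are stable.
vocab = {tok: i for i, (tok, _) in enumerate(Counter(tokens).most_common())}

# Encoding a sentence into token ids, as a training pipeline would.
encoded = [vocab[t] for t in "the cat ran".split()]
print(vocab)
print(encoded)
```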


ref:

Decoding the LLM stack for future AI applications @ https://medium.com/@KapilDaga/decoding-the-llm-stack-for-future-ai-applications-97e5250dbc79

sarvam.ai overview

sarvam.ai is a Bengaluru-based AI company building a full-stack generative AI platform for India. It has been selected under the IndiaAI Mission to build India's first indigenous, sovereign Large Language Model (LLM). The goal is to create AI that can understand and respond naturally in multiple Indian languages, including through voice-based interactions.

Sarvam AI has publicly launched Sarvam-M, a flagship 24-billion-parameter model built on top of Mistral Small, a French open-source language model. Sarvam-1, a smaller 2-billion-parameter model developed from scratch, offers token-efficient support for major Indian languages and outperforms similar-scale global LLMs.


Key Products and Initiatives:

  • Sarvam-M: This is one of their large language models, described as a multilingual, hybrid-reasoning, text-only model. It is specifically fine-tuned for Indic languages and has shown strong performance on benchmarks related to Indian languages, math, and programming.

  • Sarvam Samvaad: A platform to build, customize, and launch AI agents that can operate in 11 Indian languages across various channels like phone calls, WhatsApp, and websites.

  • Open-Source Contributions: Sarvam.ai actively contributes to open-source projects, including their "OpenHathi" series, to help foster the broader Indian AI ecosystem.
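The chat-completions API referenced in the links below could be called roughly as sketched here. This is a hypothetical example: the endpoint URL, model identifier ("sarvam-m"), and auth header name are all assumptions modelled on common OpenAI-style APIs, not confirmed details; consult the sarvam.ai API reference for the actual contract before use:

```python
import json
import urllib.request

# Assumed endpoint URL -- verify against the official API docs.
API_URL = "https://api.sarvam.ai/v1/chat/completions"

def build_request(api_key: str, user_message: str) -> urllib.request.Request:
    """Build an HTTP request for an OpenAI-style chat completion call."""
    payload = {
        "model": "sarvam-m",  # assumed model identifier
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "api-subscription-key": api_key,  # assumed auth header name
        },
    )

def sarvam_chat(api_key: str, user_message: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(api_key, user_message)) as resp:
        return json.load(resp)
```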

ref:

sarvam.ai @ https://dashboard.sarvam.ai/

sarvam.ai chat @ https://dashboard.sarvam.ai/chat

sarvam.ai API @ https://docs.sarvam.ai/api-reference-docs/chat/completions

India Stack @ https://indiastack.org/

iSPIRT (a non-profit think tank that builds public goods for Indian product startups to thrive and grow) @ https://github.com/iSPIRT

AI Agent vs AI Workflow

AI Agent and AI Workflow are related but distinct concepts in the field of Artificial Intelligence.

  • AI Agent is an autonomous system capable of perceiving, deciding, and acting in its environment, often in real-time. 
  • AI Workflow is a structured process for creating, deploying, and maintaining AI models, typically staged and planned.

Analogy: Recipe vs Chef

  • AI Workflow is like a detailed recipe: You follow the steps precisely to get a predictable dish. If an ingredient is missing, you stop or notify someone. The "AI" might be a smart scale telling you how much flour to add, but you're still following a set procedure.
  • AI Agent is like a seasoned chef: You tell the chef to "make a delicious Italian dinner". The chef knows various recipes, can adapt to available ingredients, troubleshoot if something goes wrong, and might even invent a new dish based on the desired goal. The chef has autonomy and the ability to plan and execute dynamically.

AI Workflow (or AI-powered Workflow):

    An AI workflow is a structured, step-by-step process where one or more steps are enhanced or automated by Artificial Intelligence capabilities. It's essentially a traditional workflow that has been infused with AI to improve its efficiency, accuracy, or intelligence. The sequence of steps is largely pre-determined and follows a set logic or rules. As the steps are defined, the outcomes are generally predictable and consistent.

Examples:
  • Automated Invoice Processing: OCR (AI) extracts data from invoices, which then flows through a system for validation, approval (human or AI-assisted), and payment.
  • Customer Support Routing: NLP (AI) analyzes a customer's query to automatically route it to the correct department or provide a canned response.
  • Predictive Maintenance: Sensors collect data (workflow step), ML (AI) analyzes it to predict equipment failure (AI step), triggering a maintenance order (subsequent workflow step).
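The fixed, pre-determined nature of an AI workflow can be sketched with the invoice example above: the order of steps is hard-coded, and only individual steps are "AI" (faked here with trivial rules). All function names are illustrative placeholders, not a real OCR or payments API:

```python
def ocr_extract(invoice_text: str) -> dict:
    """AI step: pretend OCR that parses 'vendor:amount' strings."""
    vendor, amount = invoice_text.split(":")
    return {"vendor": vendor, "amount": float(amount)}

def validate_invoice(data: dict) -> bool:
    """Fixed rule: reject non-positive amounts."""
    return data["amount"] > 0

def approve(data: dict) -> str:
    """Fixed rule: small invoices auto-approved, large ones escalated."""
    return "auto-approved" if data["amount"] < 1000 else "needs human approval"

def process_invoice(invoice_text: str) -> str:
    # The step sequence is scripted and never changes at runtime.
    data = ocr_extract(invoice_text)      # step 1 (AI-assisted)
    if not validate_invoice(data):        # step 2 (fixed rule)
        return "rejected"
    return approve(data)                  # step 3 (fixed rule)

print(process_invoice("Acme:250.00"))     # → auto-approved
```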

AI Agent:

    An AI agent is a software entity designed to autonomously perceive its environment, reason, plan, and take actions to achieve a specific goal with minimal human intervention. Unlike a workflow that follows a script, an agent can dynamically determine its course of action based on new information and its understanding of the goal. Agents often use sophisticated AI models (such as Large Language Models, LLMs) to reason about the problem, break complex goals into subtasks, and devise a plan to achieve them.

Examples:

  • Personalized Digital Assistant: An agent that can book travel, manage calendars, find information online, and respond to emails, adapting its approach based on user preferences and real-time changes.
  • Autonomous Code Generation and Debugging: An agent tasked with building a feature might research documentation, write code, run tests, identify errors, and debug the code until the feature is complete.
  • Advanced Customer Service Agent: Beyond routing, an agent that can dynamically troubleshoot complex issues, access multiple internal systems, learn from customer interactions, and even initiate follow-up actions without explicit human instruction for each step.
  • Research Assistant: An agent assigned to research a topic might formulate search queries, synthesize information from multiple sources, summarize findings, and even generate a report, adapting its research strategy as it discovers new information.
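By contrast with the scripted workflow, a toy agent loop chooses its next action at runtime from the current state instead of following a fixed step order. The planner below is a trivial rule-based stand-in for the LLM reasoning a real agent would use, and the tools are stubs:

```python
def search(state):
    """Tool stub: gather information."""
    state["notes"].append("found documentation")
    return state

def write_code(state):
    """Tool stub: produce a candidate implementation."""
    state["code"] = "def feature(): return 42"
    return state

def run_tests(state):
    """Tool stub: check whether the goal is met."""
    state["tests_pass"] = "code" in state
    return state

TOOLS = {"search": search, "write_code": write_code, "run_tests": run_tests}

def choose_action(state):
    """Stand-in for LLM planning: pick the next tool from current state."""
    if not state["notes"]:
        return "search"
    if "code" not in state:
        return "write_code"
    if not state["tests_pass"]:
        return "run_tests"
    return None  # goal reached

def run_agent(max_steps: int = 10) -> dict:
    state = {"notes": [], "tests_pass": False}
    for _ in range(max_steps):          # perceive-plan-act loop
        action = choose_action(state)
        if action is None:
            break
        state = TOOLS[action](state)
    return state

print(run_agent())
```

The difference from the workflow sketch is structural: here no step order is written down anywhere; the loop re-plans after every action, so the agent would recover if, say, tests failed and more work were needed.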

ref:

    Gemini LLM response @ https://gemini.google.com/app/03989649287ea08e

    Meta AI LLM response @ https://www.meta.ai/prompt/0dc5102b-9e90-437f-a810-01d98c5a3b7c

    ChatGPT LLM response @ https://chat.chatbotapp.ai/chats/-OR6XiaFDXl8y3Hz1yQ2?model=4o-mini

    Automation vs. AI Workflow vs. AI Agent: Making Sense of the Buzzwords @ https://www.linkedin.com/pulse/automation-vs-ai-workflow-agent-making-sense-buzzwords-akarsh-kain-epmwc/

    Automations vs AI Workflows vs AI Agents: Understanding the Key Differences @ https://www.linkedin.com/pulse/automations-vs-ai-workflows-agents-understanding-key-tyler-mcgregory-w1c8e/

Periodic table of machine learning

MIT researchers have created a periodic table that shows how more than 20 classical machine-learning algorithms are connected. The new framework sheds light on how scientists could fuse strategies from different methods to improve existing AI models or come up with new ones.

The periodic table stems from one key idea: All these algorithms learn a specific kind of relationship between data points. While each algorithm may accomplish that in a slightly different way, the core mathematics behind each approach is the same. Building on these insights, the researchers identified a unifying equation that underlies many classical AI algorithms. They used that equation to reframe popular methods and arrange them into a table, categorizing each based on the approximate relationships it learns.
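From memory of the I-Con paper cited below, the unifying objective is roughly an averaged KL divergence between a fixed "supervisory" distribution p over each point's neighbors and a learned distribution q. The notation here is a paraphrase, so treat it as a sketch rather than the paper's exact formula:

```latex
% Sketch of the I-Con unifying objective (paraphrased from memory):
% each classical algorithm corresponds to a choice of the target
% neighborhood distribution p and the parameterization of q.
\mathcal{L}(\phi) \;=\; \mathbb{E}_{i}\!\left[\, D_{\mathrm{KL}}\big( p(\cdot \mid i) \,\|\, q_{\phi}(\cdot \mid i) \big) \right]
```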

Just like the periodic table of chemical elements, which initially contained blank squares that scientists later filled in, the periodic table of machine learning also has empty spaces. These spaces predict where algorithms should exist but have not yet been discovered.

ref:

MIT Insights: Periodic table of machine learning @ https://news.mit.edu/2025/machine-learning-periodic-table-could-fuel-ai-discovery-0423

I-CON: A UNIFYING FRAMEWORK FOR REPRESENTATION LEARNING paper(Published as a conference paper at ICLR 2025) @ https://openreview.net/pdf?id=WfaQrKCr4X