The layers of modern AI infrastructure
In the last couple of years, our digital world has evolved rapidly with AI. Although research in this field has been going on for decades, modern AI adoption started around the 2010s and has been booming since late 2022.
But the modern AI that we interact with doesn't just consist of an algorithm running on a server. It is a complex system of applications, deep learning models, high-speed networks, and powerful hardware.
Let's explore the different layers of modern AI, starting from the top with the Application Layer, the part we see in our day-to-day lives, and working down to the Infrastructure Layer, which includes the processing hardware and the energy sources that power data centres.
Application Layer
This is the topmost layer of the AI stack. The application layer consists of the AI-powered features and products we interact with on a daily basis.
Without applications, building and running models serves little purpose: the whole point of artificial intelligence is to enhance what users can do in the digital space.
YouTube's auto-generated video captions, Meta's live translation feature for reels, Spotify's recommended music albums, self-driving features in cars, and facial recognition in our smartphones are all examples of AI applications.
The AI system constantly updates its behaviour by learning from user interactions through the application, and the user becomes part of this loop. As an example, Gmail's classifier automatically sorts emails into categories like Primary, Promotions and Spam. A user can always reclassify a particular email, say from Inbox to Spam or vice versa. The system updates internally based on this interaction and applies it to new incoming emails, slowly becoming a highly personalised email filter.
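Here is a toy sketch of that feedback loop using scikit-learn's online learner. The emails and labels are made up for illustration; a real system like Gmail's is far more sophisticated.

```python
# A toy feedback loop: a spam classifier that learns from user corrections.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)
classifier = SGDClassifier(loss="log_loss")  # logistic regression, trained incrementally

# Initial training on a small labelled batch (1 = spam, 0 = not spam).
emails = ["win a free prize now", "meeting notes attached", "cheap pills online"]
labels = [1, 0, 1]
classifier.partial_fit(vectorizer.transform(emails), labels, classes=[0, 1])

# A user moves a misclassified email from Inbox to Spam. That single
# interaction becomes a new training example, nudging future predictions.
user_corrected_email = ["limited time offer just for you"]
classifier.partial_fit(vectorizer.transform(user_corrected_email), [1])

print(classifier.predict(vectorizer.transform(["exclusive offer inside"])))
```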
Orchestration Layer
Below the Application Layer sits the Orchestration Layer. This is where AI models are connected to the real environment: servers here manage the model and coordinate its behaviour with external services and APIs.
Usually, an AI model is "frozen" after training: its internal weights are fixed and do not change during inference. A model also only knows about data up to its training cutoff date.
To keep the model useful, techniques like RAG (Retrieval-Augmented Generation) and protocols like MCP (Model Context Protocol) connect it to external data stores such as vector databases. This lets the model combine its existing knowledge with freshly retrieved data and generate more accurate responses.
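A minimal sketch of the RAG flow looks something like the following. The embed() and vector_store.search() helpers are hypothetical stand-ins for a real embedding model and vector database client.

```python
# Retrieval-Augmented Generation: fetch relevant documents, then prompt.
def answer_with_rag(question: str, vector_store, llm, embed) -> str:
    # 1. Embed the user's question into the same vector space as the documents.
    query_vector = embed(question)

    # 2. Retrieve the most semantically similar documents from the store.
    documents = vector_store.search(query_vector, top_k=3)

    # 3. Stuff the retrieved text into the prompt as fresh context,
    #    so the frozen model can answer using data it was never trained on.
    context = "\n\n".join(doc.text for doc in documents)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.generate(prompt)
```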
An AI model can also be used to build agents: autonomous software that perceives and acts on its environment, reasons through its steps, and uses tools to achieve its objectives.
As an example, Claude Code, a widely used terminal agent, lives and acts in the terminal, writes code, and runs commands on the user's behalf. This lets an AI model 'see' a particular environment, use the tools available through the agent, and complete the task.
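At its core, an agent is a loop. The sketch below captures the spirit of terminal agents like Claude Code; the llm client and the tool functions are hypothetical, and real agents add sandboxing, permission checks, and richer tool schemas.

```python
# A bare-bones agent loop: reason, act, observe, repeat.
def run_agent(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # The model reasons over the history and decides the next action.
        action = llm.next_action("\n".join(history))
        if action.name == "finish":
            return action.argument  # final answer for the user
        # Otherwise, execute the chosen tool (e.g. read_file, run_command)
        # and feed the observation back so the model can 'see' the result.
        observation = tools[action.name](action.argument)
        history.append(f"{action.name}({action.argument}) -> {observation}")
    return "Stopped: step limit reached"
```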
Model Layer
This is the brain of the AI stack, where all the foundation models live. All machine learning and deep learning models are essentially large-scale mathematical structures that turn an input into an output using statistics, linear algebra, calculus, and lots of matrix multiplications.
What is a model?
A typical model contains weights, biases and activation functions, which combine to form a neuron. The weights and biases are adjusted during training, which is when the model is said to be 'learning' the data. Together, these weights and biases are called the parameters of the model. Usually, the more parameters a model has, the more computationally expensive it is to run, but the output quality tends to improve.
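A single neuron is simple enough to write out by hand. This NumPy sketch shows the pieces named above: weights, a bias, and an activation function; real models stack millions of these.

```python
# A single artificial neuron: weighted sum, bias, activation.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # a common activation function

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    # Weighted sum of inputs plus bias, passed through the activation.
    return relu(np.dot(inputs, weights) + bias)

x = np.array([0.5, -1.2, 3.0])      # incoming signals
w = np.array([0.8, 0.1, -0.4])      # learned during training
b = 0.2                             # learned during training
print(neuron(x, w, b))              # the neuron's output
```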
Each model comes with a structural design, called its architecture, that dictates how an incoming input is processed and how neurons are connected to each other. This is decided before training starts and is never changed thereafter. Some examples include Convolutional Neural Networks (CNNs), feed-forward neural networks, and the Transformer architecture.
Many of the most powerful models are proprietary: their architecture and weights are not publicly available, and companies like Anthropic expose them only through APIs or commercial apps. But open-source models like Qwen, DeepSeek, and Mistral are catching up and becoming nearly as capable.
Model deployment
This layer also includes two more things: model deployment and fine-tuning.
Fine-tuning is a technique where a pre-trained model is further trained on a smaller, more specific dataset so that it adapts to a particular task or domain. For example, we can take a pre-trained Stable Diffusion model, fine-tune it on a dataset of someone's personal photos, and it can then generate new images of that person from a text prompt.
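A popular way to fine-tune cheaply is to train small adapter matrices instead of all the original weights. The sketch below assumes the Hugging Face transformers and peft libraries and wraps a small language model with a LoRA adapter; image-model fine-tunes like the Stable Diffusion example above follow the same idea with different pipelines.

```python
# A minimal sketch of parameter-efficient fine-tuning with LoRA.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

# Train only small adapter matrices instead of all the original weights.
adapter = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
model = get_peft_model(base, adapter)
model.print_trainable_parameters()  # a tiny fraction of the full model
# ...then train on the small domain-specific dataset as usual.
```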
While most models are deployed on server-grade GPUs, it is possible to 'shrink' a model down so it can run on less powerful hardware. To achieve this, engineers typically use techniques like pruning, quantisation, offloading layers to the CPU and RAM, and efficient model formats like GGUF or Safetensors.
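Quantisation is easy to illustrate: map 32-bit float weights to 8-bit integers plus a scale factor, cutting memory to roughly a quarter. Real schemes, such as those used in GGUF files, are more elaborate, but this captures the idea.

```python
# A minimal illustration of symmetric int8 quantisation.
import numpy as np

weights_fp32 = np.random.randn(1000).astype(np.float32)

# One scale factor for the whole tensor, chosen so the largest weight fits.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# At inference time, weights are dequantised back (with some precision loss).
restored = weights_int8.astype(np.float32) * scale
print("max error:", np.abs(weights_fp32 - restored).max())
print("memory: %d bytes -> %d bytes" % (weights_fp32.nbytes, weights_int8.nbytes))
```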
Data Layer
The data layer includes vector storage, big-data systems, context caches, and storage software services.
Data pipelines
This is where all the data and knowledge behind a model lives. It involves data collection, storage, transformation and preparation for AI models. For a good-quality model, a large volume of high-quality data is essential.
Data can be obtained from various sources, such as databases, user interactions and information from existing apps, and scraping the internet.
The data obtained this way can be pretty messy and unstructured. Data scientists often analyse it and build pipelines that move this raw data, apply transformations, convert it to specific formats, and store it in destinations such as databases or data warehouses.
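A toy version of such a pipeline might look like this. The file names and fields are made up; production pipelines typically run on tools like Airflow or Spark.

```python
# Extract -> transform -> load, in miniature.
import pandas as pd

# Extract: raw, messy data from a source (here, a CSV file).
raw = pd.read_csv("raw_user_events.csv")

# Transform: drop duplicates and missing rows, normalise a text column.
clean = (
    raw.drop_duplicates()
       .dropna(subset=["event_text"])
       .assign(event_text=lambda df: df["event_text"].str.lower().str.strip())
)

# Load: write the prepared data to a destination (here, Parquet files,
# a common format for warehouses and training datasets).
clean.to_parquet("prepared_user_events.parquet")
```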
Companies that collect, process and store data must adhere to privacy laws such as the GDPR or the CCPA. Frontier labs and big tech companies often do not disclose how they collect this information, which raises privacy and ethics concerns.
Vectors and Context
The knowledge and information a model learns during training is represented as high-dimensional arrays of numbers called embeddings, or vectors. These numbers capture the semantic relationships between real-world objects and concepts.
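Semantic similarity between embeddings is usually measured with cosine similarity. The vectors below are tiny and hand-made for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
# Similar concepts end up with similar vectors.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.9, 0.8, 0.1])
kitten = np.array([0.85, 0.75, 0.2])
car    = np.array([0.1, 0.2, 0.95])

print(cosine_similarity(cat, kitten))  # high: semantically close
print(cosine_similarity(cat, car))     # low: semantically distant
```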
For large language models, context is an important aspect. Context is the background information available to the model: chat and prompt history plus specific instructions. Think of it as the model's working memory: it shapes how the next response is generated based on the conversation, user behaviour, and the current prompt.
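Because that working memory is finite, applications trim the conversation to a token budget before each request. This sketch assumes a rough 4-characters-per-token estimate and a made-up budget, purely for illustration.

```python
# Context as working memory: keep the most recent messages that fit.
def build_context(system_prompt: str, history: list[dict], budget_tokens: int = 4000):
    messages = [{"role": "system", "content": system_prompt}]
    used = len(system_prompt) // 4  # crude token estimate
    kept = []
    for message in reversed(history):
        cost = len(message["content"]) // 4
        if used + cost > budget_tokens:
            break  # older messages fall out of the working memory
        kept.append(message)
        used += cost
    return messages + list(reversed(kept))
```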
Infrastructure Layer
This layer consists of all the hardware that powers the layers above: data centres, high-speed physical networks, processing units and chips, storage devices like HDDs and SSDs, and high-bandwidth memory.
Chips and Processors
While a CPU is a basic necessity for any computer, a server that trains or deploys AI models will also contain Graphics Processing Units (GPUs). Originally designed for gaming and graphics, the GPU's architecture supports massive data parallelism (applying the same instruction to many pieces of data at once), which makes it ideal for running AI models. Every GPU carries its own memory, called VRAM (video RAM), where the model's parameters are held while it runs.
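You can see this from Python directly, assuming PyTorch with CUDA support. Moving tensors to the GPU places them in VRAM, and a big matrix multiplication is exactly the data-parallel work GPUs excel at.

```python
# A quick look at the GPU from PyTorch.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print("running on:", device)

# Same operation over many elements: ideal work for a GPU.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b
print("result shape:", c.shape)

if device == "cuda":
    # Rough VRAM usage of the tensors we just allocated.
    print("VRAM allocated: %.1f MB" % (torch.cuda.memory_allocated() / 1e6))
```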
Nvidia dominates the GPU market, with Intel and AMD slowly catching up. Nvidia's lead comes from the CUDA software ecosystem it maintains for its chips, its rapid cadence of new GPU architectures (Ampere, Hopper, Blackwell, and the upcoming Rubin), and interconnect technologies like NVLink that link large numbers of GPUs together, all of which make it hard for any other company to displace.
While nearly every AI data centre contains Nvidia chips, hyperscalers are also building their own ASICs (Application-Specific Integrated Circuits) to reduce their reliance on Nvidia. Examples include Google with its TPUs (Tensor Processing Units), Amazon with its Trainium and Inferentia chips, and Apple with Apple silicon for on-device AI.
Memory
Memory is an important aspect of modern compute systems. Essentially all computer RAM is made by three companies: Samsung Electronics, SK Hynix and Micron. But the RAM in our phones and laptops is very different from what data centres use.
Server RAM includes error-correcting code (ECC) to avoid data corruption, adds a register chip that acts as a middleman between the CPU and the memory, and is designed to run at more conservative speeds than consumer RAM for stability.
With demand for AI infrastructure soaring, manufacturers are redirecting capacity towards server-grade memory, with Micron announcing that it would retire its consumer brand Crucial to focus on data-centre products such as high-bandwidth memory. As a result, RAM prices have risen sharply, leading to a RAM shortage, and your next phone or laptop is likely to cost significantly more for the same amount of RAM.
Personal AI
While infrastructure typically means large data centres and networks, it could also be your personal laptop or mobile phone. Companies like Qualcomm and Apple already ship specialised chips called Neural Processing Units (NPUs), which are highly power-efficient and can run AI model inference locally, without requiring a connection to the internet or any remote server.
Nvidia and AMD also sell their gaming GPUs to consumers, and these are fully capable of training and running AI models at a much smaller scale. You can download, configure and run open-source models locally, or even train an AI model of your own on custom datasets.
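Running an open-source model locally can be as simple as the sketch below, assuming the Hugging Face transformers library. The model here is a small example; larger open models like Qwen or Mistral work the same way if your GPU has the VRAM.

```python
# Local text generation with an open-source model.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("The layers of modern AI infrastructure are", max_new_tokens=40)
print(result[0]["generated_text"])
```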
If you do not have access to powerful hardware, many cloud providers like Google, RunPod and Vast AI make it possible to rent hardware at affordable hourly rates.
Conclusion
In this blog, I touched on all the layers of modern AI infrastructure.
This is not a hard-and-fast way of dividing up the current infrastructure, but one useful way to understand what makes up today's AI systems. Each layer acts as a foundation for the layers above it, and building a successful AI system means managing every part of this flow properly.
On a large scale, different companies work on different layers across the stack. One exception is Google, which owns the entire AI stack: from its own data centres and chips, offered through Google Cloud, to applications like Google Search and YouTube.
As I mentioned earlier, it is also possible to own the entire AI stack at a personal scale. The only challenge is setting up and configuring the different environments and orchestrating the services, which can be a fun learning experience in its own right!
What we see in terms of AI in our day-to-day lives is just the tip of the iceberg, and we often overlook the efforts of engineers and researchers to make this technology a reality.