The Growing Demand for On-Premise AI Processing
Artificial intelligence (AI) models continue to evolve, demanding increasingly powerful computing infrastructure. While cloud-based AI solutions offer convenience, many businesses prefer on-premise or private cloud deployments to ensure data privacy, reduce latency, and control costs. Stackscale's dedicated servers with high-performance NVIDIA GPUs provide the ideal infrastructure for running advanced AI models like DeepSeek-R1 using Ollama.
What is DeepSeek-R1?
DeepSeek-R1 is an open-source AI model designed for complex reasoning tasks such as programming problem-solving, advanced mathematics, and logical processing. Unlike cloud-dependent AI solutions, DeepSeek-R1 can run locally, ensuring privacy and eliminating the need to send sensitive data to external servers. This makes it an excellent choice for companies prioritizing data sovereignty and compliance with strict security regulations.
Why Choose DeepSeek-R1?
- Reinforcement Learning vs. Supervised Training
- DeepSeek-R1 uses reinforcement learning (RL), allowing it to refine its reasoning ability through trial and error rather than relying solely on pre-labeled training data.
- Cost Efficiency
- Running DeepSeek-R1 locally eliminates the pay-per-use costs of cloud-based AI models.
- Available in versions ranging from 1.5B to 70B parameters, suitable for standard GPUs and even high-performance CPUs.
- The full-scale 671B parameter model requires enterprise-grade hardware, ideal for highly complex tasks.
- Open-Source Flexibility
- DeepSeek-R1 can be customized, fine-tuned, and integrated into proprietary applications without vendor lock-in.
- Supports integration with private APIs and AI pipelines for enterprise applications.
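As a minimal sketch of what such an integration can look like: once the model is served through Ollama (introduced in the next section), any internal application can reach it over a plain HTTP API. The call below uses Ollama's standard REST endpoint and default port; the prompt is illustrative.

```bash
# Query a locally served DeepSeek-R1 model through Ollama's REST API.
# Assumes Ollama is running on the same host on its default port (11434)
# and that the deepseek-r1:8b model has already been pulled.
curl -s http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Summarize the main GDPR requirements for storing customer data.",
  "stream": false
}'
```

Because the endpoint is plain HTTP on the local network, it slots into existing pipelines without sending any data to an external provider.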
What is Ollama?
Ollama is an open-source framework designed to run large language models (LLMs) on local infrastructure. It allows users to execute AI workloads on bare-metal servers or in cloud environments without an internet connection, making it a powerful tool for businesses concerned with security, latency, and cost control.
Key Features of Ollama:
- Runs AI models locally without requiring cloud access.
- Compatible with various LLMs, including DeepSeek-R1.
- Optimized for GPUs like NVIDIA Tesla T4, L4, and L40S, which are available in Stackscale’s dedicated servers.
- Supports Open WebUI, a graphical interface for managing AI interactions.
Why Use Stackscale’s Dedicated Servers for DeepSeek-R1?
High-Performance NVIDIA GPUs
Stackscale offers bare-metal servers and private cloud nodes equipped with high-end NVIDIA GPUs optimized for AI workloads. These include:
| GPU Model | Memory | Tensor Cores | FP32 Performance |
|---|---|---|---|
| NVIDIA Tesla T4 | 16 GB | 320 | 8.1 TFLOPS |
| NVIDIA L4 | 24 GB | 240 | 30.3 TFLOPS |
| NVIDIA L40S | 48 GB | 568 | 91.6 TFLOPS |
With this level of computational power, businesses can deploy, fine-tune, and run AI models at peak efficiency.
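As a rough rule of thumb for matching a model variant to GPU memory: a 4-bit quantized model needs about half a byte per parameter, plus overhead for the KV cache and runtime buffers. The 20% overhead factor below is an assumption; actual usage varies with context length and quantization.

```bash
# Back-of-the-envelope VRAM estimate (assumption: ~0.5 bytes per weight at
# 4-bit quantization, plus ~20% overhead for KV cache and runtime buffers).
PARAMS_B=8   # model size in billions of parameters (e.g., deepseek-r1:8b)
echo "scale=1; $PARAMS_B * 0.5 * 1.2" | bc   # ~4.8 GB, fits a 16 GB Tesla T4
```

By this estimate, the smaller distilled variants fit comfortably on a Tesla T4 or L4, while larger variants call for the L40S.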
Secure and Compliant Infrastructure
- Data centers in Madrid and Amsterdam, ensuring compliance with European data sovereignty laws.
- Redundant network architecture and a 99.90% uptime SLA, ensuring reliability for critical workloads.
- Support for Intel and AMD processors, as well as NVMe and SSD storage for fast data access.
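Before installing anything, it is worth confirming that the server's NVIDIA driver and GPU are visible; a quick check on any NVIDIA-equipped machine:

```bash
# Show detected GPUs, driver version, and current VRAM usage
nvidia-smi
# Or print just the GPU model and total memory
nvidia-smi --query-gpu=name,memory.total --format=csv
```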
Installing DeepSeek-R1 with Ollama
- Download Ollama:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

- Pull the DeepSeek-R1 model:

```bash
ollama pull deepseek-r1:8b
```

- Run the model:

```bash
ollama run deepseek-r1:8b
```

  Replace `8b` with the desired model version:
  - 1.5B parameters: `ollama run deepseek-r1:1.5b`
  - 7B parameters: `ollama run deepseek-r1`
  - 70B parameters (requires 24 GB+ VRAM): `ollama run deepseek-r1:70b`
  - Full-scale 671B model: `ollama run deepseek-r1:671b`
- Optimize for GPU acceleration:
  - Ensure that the NVIDIA CUDA drivers are installed.
  - Use `ollama list` to verify installed models.
  - Start the service: `ollama serve`
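Once the service is up, you can check that inference is actually hitting the GPU rather than falling back to CPU. In recent Ollama releases, `ollama ps` reports where a loaded model is running; watching `nvidia-smi` during a prompt shows VRAM and utilization (the 1-second refresh interval below is just an example).

```bash
# Show loaded models and whether they are running on GPU or CPU
ollama ps
# Watch GPU memory and utilization while a prompt is being processed
watch -n 1 nvidia-smi
```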
Enhancing Performance with Open WebUI
For ease of use, Open WebUI provides a browser-based interface to interact with AI models running on Ollama. It offers features such as:
- Model switching via `@` commands.
- Conversation tagging and management.
- Easy download and removal of models.
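One common way to deploy Open WebUI alongside Ollama is the Docker image published by the project; the command below follows the project's documented quick start at the time of writing (exposing port 3000 on the host is an arbitrary choice).

```bash
# Run Open WebUI in Docker, pointing it at the Ollama instance on the host.
# host.docker.internal lets the container reach Ollama's default port 11434.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

The interface is then available at http://localhost:3000, from which you can select any model already pulled into Ollama.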
Final Thoughts
Deploying DeepSeek-R1 on Stackscale’s private cloud or dedicated servers provides a scalable, cost-effective, and secure solution for businesses running AI models on-premise. With high-performance NVIDIA GPUs, low-latency networking, and European data compliance, Stackscale ensures optimal AI deployment without relying on third-party cloud providers.
To learn more about GPU-powered AI solutions for your enterprise, contact Stackscale (Grupo Aire) today!