Building Your First AI Workload

In this tutorial, you’ll learn how to deploy a machine learning workload on Airon’s bare-metal infrastructure. We’ll walk through creating a GPU machine, setting up your environment, and running a sample AI model.

What You’ll Learn

  • How to provision GPU machines for AI workloads
  • Setting up machine learning frameworks
  • Optimizing performance on bare-metal hardware
  • Monitoring and scaling your workloads

Prerequisites

  • Completed the Getting Started Guide
  • Basic knowledge of Python and machine learning
  • Docker installed locally (for testing)

Step 1: Create a GPU Machine

First, let’s create a powerful GPU machine for our AI workload:
airon machines create \
  --type gpu \
  --architecture x86 \
  --brand nvidia \
  --number 4 \
  --region us-west-2 \
  --image ubuntu-22.04-ml

This creates a machine with:
  • 4x NVIDIA GPUs
  • Ubuntu 22.04 with ML frameworks pre-installed
  • Optimized for AI workloads

Step 2: Set Up Your Environment

Connect to your machine and set up the environment:
# Connect to your machine
airon machines ssh YOUR_MACHINE_ID

# Update system packages
sudo apt update && sudo apt upgrade -y

# Verify GPU availability
nvidia-smi
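
The ubuntu-22.04-ml image ships with ML frameworks pre-installed. Assuming PyTorch is part of that stack (it is the framework used in the next step), you can also confirm from inside Python which framework build and GPUs you have:
import torch

# Report the framework and CUDA build versions
# (assumes PyTorch is among the pre-installed ML frameworks)
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA build: {torch.version.cuda}")

# List every visible GPU with its name and total memory
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")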

Step 3: Deploy Your Model

Let’s deploy a sample computer vision model:
import torch
import torchvision
from torch import nn

# Verify CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")

# Load a pre-trained model (the pretrained= flag is deprecated in newer
# torchvision releases; use the weights argument instead)
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
model = model.cuda()

# Your model training code here...
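
Before writing a full training loop, it is worth running a single dummy batch through the model as a sanity check that the GPU path works end to end. This is an illustrative sketch; the batch size and input resolution are arbitrary:
import torch
import torchvision

# Load the model and run one dummy batch to confirm GPU execution end to end
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT).cuda()
model.eval()
with torch.no_grad():
    dummy_batch = torch.randn(8, 3, 224, 224, device="cuda")  # example batch size/resolution
    logits = model(dummy_batch)

print(f"Output shape: {tuple(logits.shape)}")  # (8, 1000) ImageNet classes for ResNet-50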

Step 4: Monitor Performance

Monitor your workload performance:
# Monitor GPU usage
watch nvidia-smi

# Monitor system resources
htop

# Check machine status via CLI
airon machines status YOUR_MACHINE_ID
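
You can also track GPU memory from inside your training script with PyTorch's built-in counters, which is handy for spotting leaks between iterations. A minimal sketch:
import torch

# Report per-device memory use from inside the training process
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    peak = torch.cuda.max_memory_allocated(i) / 1024**3
    print(f"GPU {i}: allocated {allocated:.2f} GiB, "
          f"reserved {reserved:.2f} GiB, peak {peak:.2f} GiB")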

Step 5: Scale Your Workload

When you need more compute power:
# Create additional machines
airon machines create --type gpu --count 3

# Use with orchestration tools
# (Kubernetes, Docker Swarm, etc.)
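
When you scale out across machines, the training script itself usually needs to be distribution-aware. Here is a minimal sketch using PyTorch's DistributedDataParallel, assuming one process per GPU launched with torchrun (the script name train.py is just an example):
import os
import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for every process it launches
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Wrap the model so gradients are synchronized across all processes/machines
model = torchvision.models.resnet50(weights=None).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])

# ... training loop as usual ...

dist.destroy_process_group()

On each machine you would launch this with something like torchrun --nproc_per_node=4 train.py, leaving your orchestration layer to start the processes and supply the rendezvous/network configuration.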

Best Practices

Performance Optimization

  • Use NVMe storage for fast data access
  • Optimize batch sizes for your specific GPU configuration (see the sketch after this list)
  • Consider multi-GPU training strategies
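
One practical way to pick a batch size is to probe upward until the GPU runs out of memory. The sketch below is illustrative only; real training needs extra headroom for optimizer state and your actual model:
import torch
import torchvision

# Probe increasing batch sizes until the GPU runs out of memory
model = torchvision.models.resnet50(weights=None).cuda()
largest_ok = 0
for batch_size in (16, 32, 64, 128, 256, 512):
    try:
        batch = torch.randn(batch_size, 3, 224, 224, device="cuda")
        model(batch).sum().backward()
        largest_ok = batch_size
    except RuntimeError as err:  # CUDA OOM surfaces as a RuntimeError
        if "out of memory" not in str(err):
            raise
        break
    finally:
        model.zero_grad(set_to_none=True)
        torch.cuda.empty_cache()

print(f"Largest batch size that fit: {largest_ok}")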

Cost Management

  • Destroy machines when not in use
  • Use spot instances for development
  • Monitor usage with Airon’s billing dashboard

Security

  • Use SSH keys instead of passwords
  • Configure firewall rules appropriately
  • Keep systems updated

Troubleshooting

Common Issues

GPU not detected
# Restart the machine
airon machines restart YOUR_MACHINE_ID

# Check driver installation
nvidia-smi

Out of memory errors
# Monitor memory usage
nvidia-smi -l 1

# Reduce batch size or model complexity
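
If shrinking the batch hurts convergence, gradient accumulation keeps the effective batch size while lowering per-step memory. A minimal sketch with dummy data (all sizes are illustrative):
import torch
import torchvision

# Gradient accumulation: process small micro-batches but update weights as if
# a larger batch had been used (dummy data stands in for a real dataset)
model = torchvision.models.resnet50(weights=None).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
accum_steps = 4  # 4 micro-batches of 8 ~ effective batch size of 32

optimizer.zero_grad(set_to_none=True)
for step in range(accum_steps):
    images = torch.randn(8, 3, 224, 224, device="cuda")
    labels = torch.randint(0, 1000, (8,), device="cuda")
    loss = loss_fn(model(images), labels) / accum_steps  # scale so gradients average
    loss.backward()
optimizer.step()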

Slow training
  • Verify that the data loading pipeline keeps the GPUs busy (see the sketch after this list)
  • Check if using all available GPUs
  • Monitor network I/O for data-intensive workloads
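
Data loading is a common bottleneck: if the loader cannot keep up, the GPUs sit idle. The sketch below uses torchvision's FakeData so it runs without a real dataset; tune num_workers and batch_size to your machine:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Parallel data loading: worker processes prepare batches while the GPU trains
# (FakeData stands in for a real dataset; num_workers=8 is an example value)
dataset = datasets.FakeData(size=1000, image_size=(3, 224, 224),
                            transform=transforms.ToTensor())
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    num_workers=8, pin_memory=True)

for images, labels in loader:
    images = images.cuda(non_blocking=True)  # overlaps the copy with compute
    labels = labels.cuda(non_blocking=True)
    # ... forward/backward pass here ...
    break  # single batch, just to demonstrate the pipeline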

Getting Help

If you need assistance: