Building Your First AI Workload

In this tutorial, you’ll learn how to deploy a machine learning workload on Airon’s bare-metal infrastructure. We’ll walk through creating a GPU machine, setting up your environment, and running a sample AI model.

What You’ll Learn

  • How to provision GPU machines for AI workloads
  • Setting up machine learning frameworks
  • Optimizing performance on bare-metal hardware
  • Monitoring and scaling your workloads

Prerequisites

  • Completed the Getting Started Guide
  • Basic knowledge of Python and machine learning
  • Docker installed locally (for testing)

Step 1: Create a GPU Machine

First, let’s create a powerful GPU machine for our AI workload:
airon machines create \
  --type gpu \
  --architecture x86 \
  --brand nvidia \
  --number 4 \
  --region us-west-2 \
  --image ubuntu-22.04-ml

This creates a machine with:
  • 4x NVIDIA GPUs
  • Ubuntu 22.04 with ML frameworks pre-installed
  • Optimized for AI workloads

Step 2: Set Up Your Environment

Connect to your machine and set up the environment:
# Connect to your machine
airon machines ssh YOUR_MACHINE_ID

# Update system packages
sudo apt update && sudo apt upgrade -y

# Verify GPU availability
nvidia-smi
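
The ubuntu-22.04-ml image ships with ML frameworks pre-installed. Assuming PyTorch is part of that stack (it is the framework used in the next step), you can also confirm from inside Python which framework build and GPUs you have:
import torch

# Report the framework and CUDA build versions
# (assumes PyTorch is among the pre-installed ML frameworks)
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA build: {torch.version.cuda}")

# List every visible GPU with its name and total memory
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")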

Step 3: Deploy Your Model

Let’s deploy a sample computer vision model:
import torch
import torchvision
from torch import nn

# Verify CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")

# Load a pre-trained model (the pretrained= flag is deprecated in newer
# torchvision releases; use the weights argument instead)
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
model = model.cuda()

# Your model training code here...
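
Before writing a full training loop, it is worth running a single dummy batch through the model as a sanity check that the GPU path works end to end. This is an illustrative sketch; the batch size and input resolution are arbitrary:
import torch
import torchvision

# Load the model and run one dummy batch to confirm GPU execution end to end
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT).cuda()
model.eval()
with torch.no_grad():
    dummy_batch = torch.randn(8, 3, 224, 224, device="cuda")  # example batch size/resolution
    logits = model(dummy_batch)

print(f"Output shape: {tuple(logits.shape)}")  # (8, 1000) ImageNet classes for ResNet-50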

Step 4: Monitor Performance

Monitor your workload performance:
# Monitor GPU usage
watch nvidia-smi

# Monitor system resources
htop

# Check machine status via CLI
airon machines status YOUR_MACHINE_ID
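
You can also track GPU memory from inside your training script with PyTorch's built-in counters, which is handy for spotting leaks between iterations. A minimal sketch:
import torch

# Report per-device memory use from inside the training process
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    peak = torch.cuda.max_memory_allocated(i) / 1024**3
    print(f"GPU {i}: allocated {allocated:.2f} GiB, "
          f"reserved {reserved:.2f} GiB, peak {peak:.2f} GiB")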

Step 5: Scale Your Workload

When you need more compute power:
# Create additional machines
airon machines create --type gpu --count 3

# Use with orchestration tools
# (Kubernetes, Docker Swarm, etc.)
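
When you scale out across machines, the training script itself usually needs to be distribution-aware. Here is a minimal sketch using PyTorch's DistributedDataParallel, assuming one process per GPU launched with torchrun (the script name train.py is just an example):
import os
import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for every process it launches
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Wrap the model so gradients are synchronized across all processes/machines
model = torchvision.models.resnet50(weights=None).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])

# ... training loop as usual ...

dist.destroy_process_group()

On each machine you would launch this with something like torchrun --nproc_per_node=4 train.py, leaving your orchestration layer to start the processes and supply the rendezvous/network configuration.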

Best Practices

Performance Optimization

  • Use NVMe storage for fast data access
  • Optimize batch sizes for your specific GPU configuration (see the sketch after this list)
  • Consider multi-GPU training strategies
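
One practical way to pick a batch size is to probe upward until the GPU runs out of memory. The sketch below is illustrative only; real training needs extra headroom for optimizer state and your actual model:
import torch
import torchvision

# Probe increasing batch sizes until the GPU runs out of memory
model = torchvision.models.resnet50(weights=None).cuda()
largest_ok = 0
for batch_size in (16, 32, 64, 128, 256, 512):
    try:
        batch = torch.randn(batch_size, 3, 224, 224, device="cuda")
        model(batch).sum().backward()
        largest_ok = batch_size
    except RuntimeError as err:  # CUDA OOM surfaces as a RuntimeError
        if "out of memory" not in str(err):
            raise
        break
    finally:
        model.zero_grad(set_to_none=True)
        torch.cuda.empty_cache()

print(f"Largest batch size that fit: {largest_ok}")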

Cost Management

  • Destroy machines when not in use
  • Use spot instances for development
  • Monitor usage with Airon’s billing dashboard

Security

  • Use SSH keys instead of passwords
  • Configure firewall rules appropriately
  • Keep systems updated

Troubleshooting

Common Issues

GPU not detected
# Restart the machine
airon machines restart YOUR_MACHINE_ID

# Check driver installation
nvidia-smi

Out of memory errors
# Monitor memory usage
nvidia-smi -l 1

# Reduce batch size or model complexity
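
If shrinking the batch hurts convergence, gradient accumulation keeps the effective batch size while lowering per-step memory. A minimal sketch with dummy data (all sizes are illustrative):
import torch
import torchvision

# Gradient accumulation: process small micro-batches but update weights as if
# a larger batch had been used (dummy data stands in for a real dataset)
model = torchvision.models.resnet50(weights=None).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
accum_steps = 4  # 4 micro-batches of 8 ~ effective batch size of 32

optimizer.zero_grad(set_to_none=True)
for step in range(accum_steps):
    images = torch.randn(8, 3, 224, 224, device="cuda")
    labels = torch.randint(0, 1000, (8,), device="cuda")
    loss = loss_fn(model(images), labels) / accum_steps  # scale so gradients average
    loss.backward()
optimizer.step()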

Slow training
  • Verify that the data loading pipeline keeps the GPUs busy (see the sketch after this list)
  • Check if using all available GPUs
  • Monitor network I/O for data-intensive workloads
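
Data loading is a common bottleneck: if the loader cannot keep up, the GPUs sit idle. The sketch below uses torchvision's FakeData so it runs without a real dataset; tune num_workers and batch_size to your machine:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Parallel data loading: worker processes prepare batches while the GPU trains
# (FakeData stands in for a real dataset; num_workers=8 is an example value)
dataset = datasets.FakeData(size=1000, image_size=(3, 224, 224),
                            transform=transforms.ToTensor())
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    num_workers=8, pin_memory=True)

for images, labels in loader:
    images = images.cuda(non_blocking=True)  # overlaps the copy with compute
    labels = labels.cuda(non_blocking=True)
    # ... forward/backward pass here ...
    break  # single batch, just to demonstrate the pipeline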

Getting Help

If you need assistance: