Research Kits
These patterns and procedures let you run AI and LLM experiments in a way that suits your research governance requirements.
Local
Best for data privacy and security.
Benefits:
- The research kit runs on your laptop or desktop, even without internet access.
- No external API calls; no data is sent to commercial AI companies for training.
- Highly sensitive data can be used, because it never leaves your machine.
- Cost is near zero after downloading the tools and open-source models.
- Full control over your environment and data processing pipeline.
- Ideal for research with strict data governance and compliance requirements.
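As a concrete sketch of the "no external API calls" point, a local model served by Ollama (one popular open-source runtime; the model name used below is an assumption) can be queried entirely over localhost using only the Python standard library:

```python
import json
import urllib.request

# Default local Ollama endpoint; requests target localhost only,
# so no traffic leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Non-streaming generate call; the payload shape follows
    # Ollama's /api/generate endpoint.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

def ask_local(model: str, prompt: str) -> str:
    # Requires a running `ollama serve` with the model already pulled,
    # e.g. `ollama pull llama3` (model name is illustrative).
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Because the endpoint is localhost-only, this pattern works even on an air-gapped machine once the model weights have been downloaded.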
Virtual Private Server
Best for running long-duration experiments in a sandboxed environment.
Benefits:
- Always‑on experiments: You can run long jobs (overnight, multi‑day) without worrying about laptops sleeping, rebooting for updates, overheating, or being taken home.
- Better hardware for the money: A VPS can give more CPU, RAM, and SSD IOPS than a typical student laptop for a similar or lower monthly cost, especially with price‑aggressive providers like Hetzner.
- Isolation and consistency: VPS resources are reserved for you, so you avoid "Zoom + Chrome killed my run" issues and get stable performance.
- Customizable environment: You control OS, Python/R versions, CUDA, system packages, and can script environment setup so others can reproduce it exactly on their own VPS.
- Remote access and collaboration: Multiple people can SSH in, share tmux/screen sessions, use Jupyter over an SSH tunnel, and manage it via APIs (Hetzner Cloud API, Terraform, etc.).
- Easier scaling: If a project outgrows one VM, you can scale up (more vCPUs/RAM), add more nodes (e.g., a small Kubernetes or batch cluster on Hetzner), or move heavy storage/DBs to separate servers.
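The Hetzner Cloud API mentioned above can be scripted directly. Here is a minimal sketch that builds the create-server request; the token, server name, server type, and image values are placeholder assumptions you would replace with your own:

```python
import json
import urllib.request

# Hetzner Cloud API endpoint for creating servers (POST /v1/servers).
HCLOUD_API = "https://api.hetzner.cloud/v1/servers"

def build_create_server(token: str, name: str,
                        server_type: str = "cx22",
                        image: str = "ubuntu-24.04") -> urllib.request.Request:
    # Minimal request body: name, server type, and OS image.
    # server_type and image are illustrative defaults.
    body = json.dumps(
        {"name": name, "server_type": server_type, "image": image}
    ).encode()
    return urllib.request.Request(
        HCLOUD_API,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending this request (e.g. with `urllib.request.urlopen`) returns the new server's details, after which you can SSH in, start a tmux session, and leave a multi-day job running without tying up your laptop.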
Public Cloud
Best for experiments demanding high-performance inference.
Benefits:
- Running private LLM experiments on AWS or Google Cloud lets you keep sensitive research data inside your own security perimeter while scaling up to powerful GPU hardware and specialized inference stacks.
- High‑end accelerators on demand: You can choose EC2 GPU instances (G5/G6 for inference, P4/P5 for very large models) or Inferentia2 for cheaper large‑scale inference, which handle multi‑billion parameter models that would overwhelm a VPS.
- Vertical and horizontal scaling: You can scale a single instance (more GPUs, RAM, bandwidth) or run many nodes behind an autoscaling group or Kubernetes/EKS cluster to handle bursty or high‑QPS workloads.
- Optimized LLM runtimes: inference servers with continuous batching, KV-cache optimizations, and GPU-saturation tuning deliver high-throughput, low-latency inference.
- VPC‑isolated environments: You keep all traffic inside a private VPC, restrict access with security groups, and avoid exposing LLM endpoints to the public internet; everything stays in your AWS or Google Cloud account boundary.
- Encryption by default: object storage, virtual machine volumes, and model artifacts can be encrypted at rest, and all internal service traffic can be forced over TLS.
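One way to act on the VPC-isolation point above is a pre-flight check that an inference endpoint's ingress rules never open it to the public internet. A minimal sketch, using a simplified rule format (plain dicts, not the actual AWS security-group schema) and an assumed VPC CIDR:

```python
import ipaddress

def endpoint_is_private(ingress_rules: list[dict],
                        vpc_cidr: str = "10.0.0.0/16") -> bool:
    # True only if every ingress source CIDR lies inside the VPC's
    # own block, which rules out world-open sources like 0.0.0.0/0.
    vpc = ipaddress.ip_network(vpc_cidr)
    return all(
        ipaddress.ip_network(rule["cidr"]).subnet_of(vpc)
        for rule in ingress_rules
    )
```

Running such a check in CI before deploying an LLM endpoint catches accidental `0.0.0.0/0` rules before they expose the service outside the account boundary.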
© 2026 Last Myle LLC.