How Arquimea Research Center Doubled GPU Utilization and Avoided €180K-270K in Hardware Costs
ARC doubled GPU utilization and eliminated 80-90% of knowledge transfer overhead—turning the same hardware into a research factory
Key Results
75-85%
GPU utilization across all machines, up from ~50% idle—with zero new hardware
2-10 hrs → minutes
Knowledge transfer reduced to "here's the Valohai link, copy and run"
30-50% more
Compute capacity absorbed without expansion—one project's YoY growth handled internally
Zero
Infrastructure complaints—down from weekly GPU battles
Company
Arquimea Research Center (ARC) is the innovation and research hub of Arquimea Group, a global technology company. ARC focuses on 3D reconstruction and computer vision, with workflows that start with hundreds of images + metadata and produce 3D models and derived outputs.
They run tens of thousands of experiments per year across ~20 researchers in four teams, all on fully on-premise GPU infrastructure.
The AI research group of Arquimea Research Center has a dedicated fleet of 22 GPUs across 7 machines: five servers with dual 3090s, one with four 3090s, and one NVIDIA DGX with 8× A100 80GB GPUs (two of them split via MIG for flexible workloads).
Before Valohai: The "Convenient Machine" Monopoly
ARC had multiple GPU servers across different networks and VPNs. On paper, plenty of compute. In practice, everyone fought for one "big" machine with 80GB GPUs.
Why? That machine had the easiest data access. Moving data to other servers meant manual sync, path reconfiguration, and ad-hoc setup differences. Result: the big box was perpetually contested while other GPUs sat 50% idle.
The root cause wasn't scheduling—it was that infrastructure friction created a compute monopoly.
Traditional HPC schedulers wouldn't fix this. The team needed something that eliminated the friction entirely: unified data access, zero-setup job submission, and built-in experiment tracking. Not another tool to learn, but infrastructure that gets out of the way.
Meanwhile, Mario Alfonso Arsuaga, ARC's Principal Researcher for AI, spent 1-4 hours/week negotiating GPU access manually. During deadline crunches, coordination overhead exploded into endless Slack threads and priority battles.

We had to spend so much time just figuring out who could run where, and when. It became a bottleneck every time we hit shared deadlines.
Mario Alfonso Arsuaga – Principal Researcher, AI, Arquimea Research Center
Knowledge Transfer Tax
When someone built a good experiment, sharing it triggered a 2-10 hour process:
- Author documents environment, paths, Docker details, common errors
- Live tutorial session (1-2 hours) walking teammates through execution
- Follow-up debugging for "doesn't work on my setup" issues
At 4-8 sessions per week, the organization burned 8-80 hours weekly on what's now a copy-paste operation. Senior researchers spent more time teaching Docker mounts than doing research.

People don't need to learn all the details, they just wanted to get the result. But to get the result, they had to understand Docker images, mounts, inputs, outputs… It was a very disgusting process for everyone involved.
Mario Alfonso Arsuaga – Principal Researcher, AI, Arquimea Research Center
Why Valohai: One Control Plane, Zero Infrastructure Friction
ARC deployed Valohai fully on-premise, connecting all GPU servers into one unified system:
- Researchers see one environment, not scattered machines across networks
- Jobs land wherever capacity exists—inputs/outputs handled through data abstraction, no manual transfers
- Provenance is automatic—every run captures code, Docker image, data versions, parameters, hardware requirements
- Web UI + CLI—researchers choose their comfort level, from browser-based submission to full programmatic control
This positioned them for future cloud bursting: same workflows, same Valohai interface, just more compute plugged in during peak periods.
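To make this concrete, here is a minimal, hypothetical valohai.yaml step of the kind ARC's researchers would define. The step name, image, input, and parameter are illustrative rather than taken from ARC's actual projects, but they show what gets captured automatically on every run:

```yaml
- step:
    name: preprocess-images
    image: python:3.10                      # Docker image recorded with the run
    command: python preprocess.py {parameters}
    inputs:
      - name: raw-images                    # resolved by Valohai's data layer, no manual copying
        default: datum://raw-images-latest  # illustrative data alias
    parameters:
      - name: resolution
        type: integer
        default: 1024
```

Because the image, inputs, and parameters live in this definition, every execution created from it carries its full provenance by default.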
Seamless Cloud Integration When Needed
Connecting AWS required minimal effort, and it feels seamless to researchers: they just select a different environment from a dropdown or change a value in their YAML. Everything runs inside ARC's own cloud environment, with all code and data staying under their control.
This gives ARC the ability to burst to both CPU and GPU machines in the cloud when needed. It functions as a natural extension of their on-prem infrastructure, making it easy for researchers to scale up without changing how they work.
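In practice, bursting can be as small a change as pointing a step at a different environment. The sketch below assumes Valohai's step-level environment property and uses made-up environment slugs; the real names depend on how ARC's environments are registered:

```yaml
- step:
    name: train-reconstruction
    image: nvcr.io/nvidia/pytorch:24.05-py3
    command: python train.py
    # Runs on-prem by default; swap the slug (or pick another environment
    # from the UI dropdown) to burst to a cloud machine during peak periods.
    environment: arc-onprem-dgx             # e.g. arc-aws-g5-xlarge for cloud bursts
```

The workflow, data handling, and tracking stay identical; only where the job runs changes.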
Impact: Same Hardware, 2× the Research Output
All GPUs Are Equal Now (And They're Actually Busy)
With fair queuing and automatic data handling:
- The "big box" now runs only workloads that truly need 80GB—at high utilization
- Smaller machines actively used instead of forgotten
- Idle time collapsed: instead of ~50% of GPU capacity sitting unused, most GPUs are now busy or have jobs queued
Behavior changed: Researchers started queuing overnight and weekend jobs without coordination. And because data transfer friction disappeared, teams experimented with configurations they'd never tried before.
One project alone demanded 30-50% more compute vs. last year. Without Valohai, this would've triggered a 2-3 server purchase cycle. Instead, they absorbed it by unlocking existing idle capacity.

Last year, someone complained about GPUs almost every week. This year, with more demanding projects and more runs, nobody complains. Everyone feels like they have more GPUs but it's the same hardware.
Mario Alfonso Arsuaga – Principal Researcher, AI, Arquimea Research Center
Knowledge Transfer Collapsed to "Here's the Link"
Valohai's "copy execution" feature eliminated the tutorial process:
- Share a Valohai run link (contains everything: code, image, data, params)
- Recipient clicks "copy," tweaks parameters, hits run
- Maybe a few chat questions, no sessions or debugging marathons
~30% of all organizational runs are now copied executions launched from the web UI. For some teams, copies outnumber fresh VS Code launches by 5-10×.
Mario's personal workflow: "Most days I don't even open VS Code. I go to Valohai, find a run, copy it, tweak params, launch. About half my work starts this way."
The onboarding impact is dramatic: before Valohai, a new researcher needed 1-3 weeks to learn the Docker/VS Code/SSH stack before running anything real. Now, new team members contribute from day one—senior researchers prepare a Valohai step with parameters defined, and newcomers start by copying executions with different inputs.
Organic Code Promotion from Usage Patterns
Before: Fast experiments lived in long-running Docker containers. 3-4 times/year, code was lost to bad git hygiene or crashes, requiring days to recreate.
Now: Researchers run quick tests as ad-hoc Valohai executions. If results are interesting, teammates discover and copy that execution. Popular runs become templates. Monthly, engineering-focused members review most-copied executions and merge the best patterns into official pipelines.
The safety net proved its value multiple times this year. In one case, a researcher developed a script for splitting 3DGS (3D Gaussian Splatting) models using a segmentation algorithm—iterating through Valohai to test results. When they accidentally deleted the local folder with no git commit, the code was gone. Recovery from Valohai's ad-hoc execution took less than an hour of searching.
Pipelines and Reuse: Designing for the Research Factory
ARC teams lean on Valohai Pipelines, especially Mario's team:
- Most of their workloads are multi-step 3D reconstruction workflows
- They use Valohai's pipeline reuse features to avoid rerunning expensive steps
Over time, this changed how they design projects:
- Reusable steps (e.g., heavy preprocessing) live in well-defined places
- Later steps can reuse outputs from previous runs
- Change only the last step → reuse all earlier steps from existing pipeline runs (see the sketch below)
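As an illustration of that design, here is a hedged sketch of a Valohai pipeline definition. The node and step names are invented for the example, and the edge shorthand assumes Valohai's [source.output, target.input] notation:

```yaml
- pipeline:
    name: 3d-reconstruction
    nodes:
      - name: preprocess
        type: execution
        step: preprocess-images      # expensive, rarely changes
      - name: reconstruct
        type: execution
        step: reconstruct-model
      - name: export
        type: execution
        step: export-outputs         # cheap, iterated on often
    edges:
      - [preprocess.output.*, reconstruct.input.preprocessed]
      - [reconstruct.output.*, export.input.model]
```

With this shape, tweaking only the export step lets a new pipeline run reuse the preprocess and reconstruct outputs from earlier runs instead of recomputing them.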
Tens of Thousands of Experiments, Minimal Friction
Mario's team alone ran tens of thousands of experiments this year—many small inference jobs launched directly from the UI.
"I'm pretty sure we wouldn't have run this many experiments without Valohai. For my own work, I simply wouldn't have done it—I do most of it from the UI."
The organization ran 20,000-50,000 experiments this year through Valohai—and that's across just 20 researchers on 22 GPUs.

When I look at our team chat, every day there are Valohai links flying around. That's how we share work now: not documents, not tutorials, just runs you can copy and reuse.
Mario Alfonso Arsuaga – Principal Researcher, AI, Arquimea Research Center
What Changed (And What Didn't)
ARC didn't set out to "industrialize" ML research. They just wanted researchers to stop fighting over GPUs and wasting time on tutorials.
What emerged was a research factory by accident:
- One control plane for scattered on-prem GPUs (soon extended to cloud bursts)
- Fair queuing replaced weekly negotiation meetings
- Copy-and-run became the default way to share and reproduce work
- Ad-hoc executions served triple duty: collaboration mechanism, audit trail, and safety net for lost code
The result: researchers stay focused on science while infrastructure, GPU allocation, and environment management "just work" in the background.
What didn't change matters too: No codebase rewrites. No new DevOps headcount. No GPU purchases. They made the existing infrastructure work harder, not bigger.
Impact Summary
Annual ROI
2-3×
return on investment
| Metric | Before Valohai | After Valohai | Business Impact |
|---|---|---|---|
| GPU Utilization | ~50% idle | 75-85% utilized | €180K-270K avoided hardware cost |
| Knowledge Transfer | 2-10 hrs/session, 4-8×/week | Link → copy → run | €40K-150K annual savings |
| Onboarding | 1-3 weeks to first experiment | Day 1 productivity | New researchers contribute immediately |
| Reproducibility | Manual, often failed | 100% rerunnable | Zero "can't reproduce" failures |
| Sysadmin Overhead | 1-4 hrs/week negotiating | ~1 hr/month | 150 hrs/year freed |
| Quota Escalations | Weekly battles | 2-3 times/year | Coordination bottleneck eliminated |
| Experiment Volume | Baseline | 20K-50K runs/year | Research factory output |
| Code Recovery | 5-30 hrs to recreate lost work | <1 hr to recover | €2K-5K rework avoided |

Valohai is one of the best improvements we did this year. People focus on the research they care about, and infrastructure just happens behind the scenes.
Mario Alfonso Arsuaga – Principal Researcher, AI, Arquimea Research Center