Robert Craig

The ask

By 2022, Eseye had spent a decade running its global IoT connectivity fleet on Cisco's Jasper Control Center. Jasper was the industry default at the time and it worked, but "worked" and "ours to shape" are different things. The business needed to move its customers' fleets, 1.2 million cellular IoT devices in all, onto Eseye's own platform, Integra. Integra was the connectivity management side of what became the white-label offering behind TELUS Global Connect, Eseye's partnership with Canadian carrier TELUS.

The migration was the thing standing between an ambitious partnership announcement and a product that could actually ship. Every device that did not re-attach on Integra was a device that did not generate revenue, and for some customers that revenue was tied to fleet-wide billing thresholds that did not forgive gaps.

The constraints

1.2 million devices across roughly 800 networks in 190 countries. Devices deployed in vehicles, in meters buried in car parks, in field equipment, in remote solar installations. None of them were within physical reach. Everything had to happen over the air, cold, with the device having no idea a migration was underway.

The constraints that mattered:

Per-device downtime budget of under 30 minutes, measured from last-seen-on-Jasper to first-confirmed-on-Integra.
No big-bang cutover window. Customers could not accept a coordinated global outage, so the migration had to be rolling and recoverable at any point.
Heterogeneous device estate. Different radio stacks, different firmware revisions, different customer APNs, different authentication mechanisms. A script that worked for a fleet in Germany would silently fail for a fleet in Australia.
Carrier differences. Each underlying mobile network operator had its own quirks around IMSI switching, roaming, and provisioning latency. The migration had to accommodate all of them without special-casing every carrier in the orchestration layer.
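
As an illustration of how the downtime budget above could be checked per device, here is a minimal Python sketch (the production tooling was Perl, and none of this is taken from it). The function and constant names are hypothetical; only the 30-minute figure and the two measurement points come from the constraint itself.

```python
from datetime import datetime, timedelta, timezone

# The 30-minute figure is from the constraint above; the name is an assumption.
DOWNTIME_BUDGET = timedelta(minutes=30)


def downtime(last_seen_on_jasper: datetime, first_confirmed_on_integra: datetime) -> timedelta:
    """Downtime as defined above: last seen on Jasper to first confirmed on Integra."""
    return first_confirmed_on_integra - last_seen_on_jasper


def within_budget(last_seen_on_jasper: datetime,
                  first_confirmed_on_integra: datetime,
                  budget: timedelta = DOWNTIME_BUDGET) -> bool:
    """True if the device re-attached in strictly less than the budget."""
    return downtime(last_seen_on_jasper, first_confirmed_on_integra) < budget


# Usage: a device last seen at 02:00 UTC and confirmed at 02:12 UTC is inside the budget.
last_seen = datetime(2022, 6, 1, 2, 0, tzinfo=timezone.utc)
confirmed = last_seen + timedelta(minutes=12)
print(within_budget(last_seen, confirmed))
```

Using timezone-aware timestamps on both sides avoids mixing device-local and server-local clocks in the subtraction.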

The approach

I built the migration tooling as a set of multi-threaded Perl scripts sitting on top of Eseye's existing internal SIM-management platform. Perl was not glamorous in 2022, but it was the language every backend service at Eseye already spoke, and it let me move fast against MySQL, the carrier APIs, and the internal authentication layer without adding a new runtime to the production stack.

[Figure: three-panel architecture diagram of the 1.2M device migration. Before: the fleet on Cisco Jasper. During: the multi-threaded orchestration layer shuttling IMSIs across, with a monitoring dashboard. After: the fleet on Eseye Integra with TELUS Global Connect on top.]

The shape of it:

Discovery and mapping. Before the first device moved, I ran an audit pass that snapshotted the full estate from Jasper against Eseye's own records. Every device had to have a known IMSI, a known APN, a known authentication profile, and a known carrier context. Anything that did not reconcile got kicked out to an exceptions queue and handled by hand.
Batched rollout with per-carrier throttling. The orchestration layer decided which devices were safe to move next, based on carrier load, time of day in the device's local timezone, and how many devices from the same customer had moved in the preceding hour. One customer's fleet was not allowed to migrate faster than the customer's support rota could respond to it.
Multi-threaded execution. Each worker thread picked a batch, performed the IMSI switch, waited for first-attach confirmation on Integra, and wrote the outcome back into the source of truth. Workers held no global state, which meant I could add or remove them mid-rollout without coordinating across the estate.
Rollback at the device level. If a device failed to re-attach on Integra within the downtime window, the orchestration layer flipped its IMSI back to the Jasper-side provisioning automatically. No engineer got woken up for a single-device failure; only aggregated patterns paged out.
Monitoring over orchestration. The dashboard that ran alongside the scripts showed per-customer migration progress, per-carrier success rates, and a tail of the exceptions queue. Any engineer on the rotation could see, at a glance, whether the migration was healthy or about to become a problem.
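
The throttling check and the per-device switch-then-rollback step described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the Perl that ran in production: the `Device` fields, the threshold defaults, and the injected `switch_imsi` and `wait_for_attach` callables are all hypothetical stand-ins for the carrier APIs and the attach confirmation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Device:
    iccid: str          # hypothetical identifier field
    customer: str
    carrier: str
    state: str = "on_jasper"


def is_safe_to_move(device: Device, *, carrier_load: Dict[str, float], local_hour: int,
                    moved_last_hour: Dict[str, int], max_carrier_load: float = 0.8,
                    allowed_hours: Iterable[int] = range(1, 5),
                    customer_hourly_cap: int = 500) -> bool:
    """Apply the three checks named in the text; every threshold is an assumed value."""
    return (carrier_load[device.carrier] <= max_carrier_load
            and local_hour in allowed_hours
            and moved_last_hour.get(device.customer, 0) < customer_hourly_cap)


def migrate_device(device: Device,
                   switch_imsi: Callable[[Device, str], None],
                   wait_for_attach: Callable[[Device], bool]) -> str:
    """Switch one device to Integra; flip it back to Jasper if it never attaches."""
    switch_imsi(device, "integra")
    if wait_for_attach(device):           # True once first attach is confirmed in time
        device.state = "on_integra"
    else:
        switch_imsi(device, "jasper")     # device-level rollback, no human involved
        device.state = "rolled_back"
    return device.state


def run_batch(devices: List[Device],
              switch_imsi: Callable[[Device, str], None],
              wait_for_attach: Callable[[Device], bool],
              workers: int = 4) -> List[str]:
    """Workers share no state beyond the devices they are handed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda d: migrate_device(d, switch_imsi, wait_for_attach), devices))
```

Passing the carrier calls in as functions keeps the worker free of carrier-specific logic, which is one way to avoid special-casing every operator in the orchestration layer; it also makes the rollback path easy to exercise with fakes.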

I wrote the scripts; the operational side of the rollout (orchestration across teams, customer communication, and escalation lines) was run jointly with the CTO and the heads of the support, customer success, and operations teams.

The result

1.2 million devices migrated. Under 30 minutes of downtime per device. No customer revenue lost to the migration itself. The TELUS partnership launched on top of the platform the migration had built.

The CTO's line afterwards was that it was "the smoothest migration he'd ever experienced".

More useful than any quote: the operations team did not have to firefight during the migration window. The exceptions queue stayed small, the rollbacks did what they were supposed to, and the customer-facing signal was steady availability rather than a scheduled-maintenance banner.

What I'd do differently

This part matters more than the numbers.

Everything was done at breakneck speed against an external deadline. With hindsight there are three things I would build in earlier:

More aggressive self-recovery in the scripts, so they needed less babying from a human operator. The orchestration layer already handled rollbacks, but the scripts themselves still paged out in edge cases that a more defensive design would have absorbed.
Better re-run tooling, so I could retry the whole estate against a fresh checkpoint and prove, programmatically, that no device was stuck partway through the migration. I trusted the reconciliation pass; I would rather not have had to.
More deliberate use of MySQL transactions around the per-device state machine. I am still not fully sure, in hindsight, how I would have framed those transactions to work across the async carrier callbacks, but I know the write patterns I shipped had more room for drift than I would accept in a 2026 design.
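
One possible shape for the last two points, offered as a sketch rather than as what shipped or as a settled answer to the async-callback question: make each state change a compare-and-set inside a transaction, so a late or duplicate callback cannot move a device out of a state it has already left, and make "stuck partway" a query rather than a matter of trust. The example uses Python's built-in `sqlite3` as a stand-in for MySQL; the table name, state names, and allowed transitions are all assumptions.

```python
import sqlite3

# Assumed per-device state machine; the real one is not described in detail.
ALLOWED = {
    ("on_jasper", "switching"),
    ("switching", "on_integra"),
    ("switching", "rolled_back"),
}


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the MySQL source of truth."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device_state (iccid TEXT PRIMARY KEY, state TEXT NOT NULL)")
    return conn


def transition(conn: sqlite3.Connection, iccid: str, expected: str, new: str) -> bool:
    """Move a device from `expected` to `new`; return False if it was not in `expected`."""
    if (expected, new) not in ALLOWED:
        raise ValueError(f"illegal transition {expected} -> {new}")
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE device_state SET state = ? WHERE iccid = ? AND state = ?",
            (new, iccid, expected))
        return cur.rowcount == 1


def stuck_devices(conn: sqlite3.Connection) -> list:
    """Devices left mid-migration: the programmatic proof the re-run tooling would need."""
    rows = conn.execute(
        "SELECT iccid FROM device_state WHERE state = 'switching' ORDER BY iccid")
    return [row[0] for row in rows]
```

With this shape, a callback handler calls `transition(conn, iccid, "switching", "on_integra")` and treats a `False` return as "already handled", and an empty `stuck_devices` result after a run is the evidence that nothing is stranded between platforms.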

None of these broke the migration. They are all "the thing I would write up as a lessons-learned doc if I were running this again." Which is the real point: the migration worked, and it worked cleanly, but shipping at that pace leaves behind technical debt that a well-timed second pass would retire.

One-line takeaway

A million-device migration is not a heroics problem; it is a visibility-and-rollback problem. Build the tooling that tells you when to stop, and the tooling that lets you stop cleanly, and the rest is just throughput.

By the numbers

1.2M devices migrated
Under 30 min downtime per device
1 developer on the scripts; orchestration run jointly with the CTO and the heads of support, customer success, and operations
0 customer revenue lost to the migration itself

Migrating 1.2 Million IoT Devices
