Upgrading NetBox
Cecilia Lau
March 26, 2025

NetBox
So we had started an update to NetBox, a solution for modelling and documenting networks. Unfortunately, we never finished the update, leaving the application in a broken state for several months. This article describes the troubleshooting done and actual steps taken to get NetBox back up and running. While we plan to move to NetBox for our IP management and resource registration, we have several more steps to take before we fully migrate. At the moment, we use STARRS, an in-house solution for IP management and resource registration. STARRS has served us well thus far, but it’s long overdue for us to move to a more complete solution.
What is NetBox? What does it do?
NetBox is a Django application that models and documents networks. We define systems, the interfaces attached to the systems, and assign them IP addresses. It also has the capability to visualize what and where things are in our server racks and even documents cables, helping tremendously for when we have to do maintenance. However, by default, it doesn’t have the ability to write entries to our DNS and DHCP servers, which is one of the primary features we need to replace STARRS, our current IPAM and DNS solution. Luckily, NetBox’s community has created plugins that allow for this and even for syncing switch and router configuration. Using and configuring these plugins would let use NetBox as a single source of truth, simplifying how we configure our network.
Troubleshooting and Fixing NetBox
What we had was an incompletely upgraded NetBox, and internal server errors with strange errors in the logs. Initially, we thought that we had tried to migrate the database directly from 3.6.X to 4.X, but NetBox requires migrating to the latest minor release of the major version (3.7.X) before updating to the next major version (4.X). If that was the case, the best solution would have been to restore the database from a backup (which we didn’t have). Luckily for us, this wasn’t the problem, so we moved on.
We had several plugins (netbox-topology-views
, social-auth-app-django
, and netbox-plugin-dns
) installed and enabled. So, first things first - let’s find out if the problem is one of the plugins by disabling them all. And… voila! We are no longer presented with an error page. Our prime suspect was netbox-topology-views
, since it’d caused problems in the past. And after simply bumping the version to its latest, everything seemed to work fine.
However, a while ago, we’d commented out a few lines of our OIDC groups handler which would sync NetBox user groups with our groups in LDAP. This means that while the page was running, users wouldn’t get their groups synced, so our fix would be mildly incomplete. Since I was the one who wrote that handler, I looked into a bit more. I was getting errors like Field id expected number but got Group
, but the documentation said that everything should’ve been fine, since the argument I’m passing into user.groups.set()
was correct. After a little bit of poking around in the Django shell, I find that my import (from django.contrib.auth.models import Group
) fails, with an error basically saying that this “Group” model is disabled in favor of something else. Turns out that NetBox had updated their system to use a custom Group
instead of Django’s built-in version. A simple change to the import fixed group syncing.
The largest issues seemed to be fixed, but a few clicks around would again present internal server errors. The messages we got looked to be the database columns not being quite right, so I went ahead and ran the database migrations, and like magic, all the errors are gone!
The actual changes needed were remarkably simple, but let this be a lesson to take database backups and document what was done if you don’t complete a migration.
For NetBox to actually replace STARRS, we still need to work out the plugins that [do the thing that starrs does] and also work towards having NetBox manage our switch and router configurations. For now though, having a working version at all is a major improvement from three months ago and will allow us to actually take these next steps.
More blog posts