When One Problem Turns Into Two

The calm before...

This post contains Amazon affiliate links. If you purchase through them, I may earn a small commission at no extra cost to you. Every product mentioned here is something I personally own and use.

I'd enjoyed almost a month of absolute peace. My OPNsense firewall was working flawlessly, without a single outage. It was performing so well that, to be honest, I'd fallen into that dangerous state of complacency where you don't even check the logs or the connection. You simply assume everything will be fine forever.

Until this morning.

At 7:01 AM, I sat down at my computer ready to start the workday and was met with the worst possible screen: No connection.

At first, nothing made sense. The infrastructure had been stable for weeks. The OptiPlex running OPNsense was powered on. The NUC was powered on. Physical links? At first glance, all good. Everything looked normal. But the network clearly wasn't.

I couldn't reach the OPNsense dashboard. Nothing.

The problem...

Since the business relies heavily on network connectivity, I couldn't allow extended downtime, so I started troubleshooting fast.

First step: connect directly to the ISP, bypassing everything. That worked — I had internet. So the ISP itself was fine. The problem was somewhere in my own infrastructure.

Then came the first rabbit hole.

My main PC, connected directly to the ISP over Wi-Fi, showed as connected but nothing would load. I dug into it and found the cause: its DNS was still pointing to Pi-hole, on a network segment that didn't exist in this configuration. I switched everything to automatic DHCP and DNS, and that PC came back online.

Good. One thing fixed. But the real problem — OPNsense unreachable — was still there.

So I activated the bypass plan: an old Linksys E900, pre-configured to take over DHCP for the whole office in exactly this kind of scenario. I connected it between the ISP and the internal network.

Except it had the same DNS problem. Configured to hand out Pi-hole's address as DNS to every client. Same fix: switched it to public DNS. Bypass online. Core systems back up.
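This kind of stale DNS setting is easy to check for before an outage forces the issue. Below is a minimal Python sketch under stated assumptions: the function name `is_stranded_dns` and every address in it are hypothetical, not my real subnets. It flags a DNS server that has a private address but sits outside every network the client is directly attached to, which is exactly what Pi-hole's address looked like from behind the bypass. A private DNS server on another subnet can still be reachable through a router, so a `True` here is a prompt to look closer, not a verdict.

```python
import ipaddress


def is_stranded_dns(dns_server: str, local_networks: list[str]) -> bool:
    """Return True if dns_server is a private address that is not inside
    any of the directly attached networks."""
    addr = ipaddress.ip_address(dns_server)
    if not addr.is_private:
        # Public resolvers are reached through the default gateway.
        return False
    return not any(
        addr in ipaddress.ip_network(net, strict=False)
        for net in local_networks
    )


# Hypothetical addresses, for illustration only.
PIHOLE = "192.168.10.5"

# Normal operation: the client shares a subnet with Pi-hole.
print(is_stranded_dns(PIHOLE, ["192.168.10.0/24"]))  # False

# Bypass mode: the client is on the bypass router's subnet instead.
print(is_stranded_dns(PIHOLE, ["192.168.1.0/24"]))   # True

# A public resolver is never flagged.
print(is_stranded_dns("1.1.1.1", ["192.168.1.0/24"]))  # False
```

Running a check like this against the bypass router's DHCP settings during a calm week would surface the Pi-hole dependency before the bypass is ever needed.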

The proof

And here's the part that actually matters most to me: it worked. Not "worked in theory" — worked under real pressure, on a real morning, with a real firewall down. I'd built this bypass months ago for exactly this scenario, and today was the first time it actually had to do its job. It did. That's not a small thing. That's the difference between a stressful morning and a lost day.

Relief — for about ten minutes.

The partial failure

Because the network was still only partially healthy.

Maria's PC, on the second switch, still had no connection through the LAN, though it kept working because it was also connected directly to the ISP over Wi-Fi as a fallback.

The access points weren't responding either.

That uncertainty was stressful. When everything is down, the problem is clear. When half the network works and half doesn't, every assumption becomes questionable.

And the whole time, one thought kept nagging at me: what happens if a customer walks in right now? Odoo runs on this network, and without it, the store can't operate the way it needs to.

Oddly, luck was on my side. Nobody walked in until 11 AM. That gave me a window to work without a customer standing there, but it was also its own kind of pressure: a clock running in the background the whole time. Peace and pressure, at the same time.

I checked Pi-hole on the NUC directly, worried it had crashed and was the common thread. It was fine — active, healthy, full history of queries. Not the culprit. Crossed that off the list.

The fix

With the bypass holding the network up, I had a window — and the OptiPlex was already down for diagnosis anyway. So I decided to act on something I'd suspected for a while.

The OptiPlex's LAN interface was running on a USB Ethernet adapter. This wasn't the first time a USB NIC had given me trouble: when I first set up OPNsense on the NUC, before moving it to the OptiPlex, a USB NIC was exactly what failed. Apparently, stability under sustained load is a known weak point for USB network adapters. Fine for temporary setups. Not for a firewall running 24/7.

I had a dual-port Intel PCIe NIC sitting ready. Took the OptiPlex apart, installed it, reassigned the interfaces from the console — WAN stayed on the onboard port, LAN and a future OPT/DMZ port moved to the new card. Brought OPNsense back up.

LAN came back clean. WAN picked up a real IP from the ISP again, not the bypass's.

I felt relief. Found the root cause. Or so I thought.

The second failure

Because the access points were still offline. Maria's PC still couldn't reach anything through the LAN.

That made no sense. I'd just fixed the actual hardware problem on the firewall. So why was half the network still acting like nothing happened?

That's when the frustration really kicked in.

Back to troubleshooting. DHCP leases. IP configurations. Pings that came back as "destination unreachable" from addresses that shouldn't exist. Manual IPs versus automatic ones. Everything seemed to check out individually, and none of it explained the access points being completely dark.

Mental fatigue sets in fast on mornings like this. Your brain starts looking for a complicated answer, because surely something this stubborn has to be complicated.

The resolution

I stepped back, took a breath, and decided to forget the theories. Check every physical connection manually. One cable at a time.

That's when I found it.

While moving the OptiPlex earlier to install the new NIC, I had accidentally knocked loose the UTP cable connecting the first switch to the second switch — the one feeding the access points, the cameras, and Maria's desk.

I had unintentionally created a second problem while fixing the first one.

I just stared at the cable. Then I laughed. Because that's IT sometimes. You fix one issue, and in the process of fixing it, you cause another.

Reconnected the cable. Instantly, everything came back. Access points online. Maria's PC online. Full network restored.

After a long, stressful morning, the network was finally healthy again.

What I Learned Today

1. Bypass plans are essential — and proving they work matters

Having a physical bypass kept the business running while I troubleshot the real problem. Without it, the entire office would have gone down for hours.

But more than that: this was the first time that bypass had to perform for real, outside of a test. It worked. There's a particular kind of relief in seeing a contingency plan you built months ago actually do its job under pressure — it's no longer theoretical, it's proven.

2. Avoid USB NICs for critical systems

This is the second time a USB Ethernet adapter has failed me on a firewall — once on the NUC, now on the OptiPlex. Fine for temporary use. Not for production infrastructure. Dedicated PCIe NICs, especially Intel-based ones, are the right call.

3. Troubleshooting can create new problems

This one stung. Not every issue you see belongs to the original failure. Sometimes, during diagnosis — or while physically moving hardware to fix one thing — you unintentionally introduce a second problem. That can completely distort your understanding of what's actually happening.
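One way to keep a partial outage from distorting the picture is to ask which link every failing device shares and no working device uses. The Python sketch below does that for a hand-written map of the office. It is an illustration under assumptions: the device names, link labels, and the `suspect_links` function are all hypothetical, and the reachability values are typed in by hand rather than measured.

```python
def suspect_links(paths: dict[str, list[str]],
                  reachable: dict[str, bool]) -> list[str]:
    """Return the links used by every failing host and by no working host."""
    failing = [set(paths[host]) for host, ok in reachable.items() if not ok]
    working = [set(paths[host]) for host, ok in reachable.items() if ok]
    if not failing:
        return []
    shared_by_failing = set.intersection(*failing)
    used_by_working = set().union(*working)
    return sorted(shared_by_failing - used_by_working)


# Hypothetical topology: each host lists the links between it and the firewall.
paths = {
    "main-pc": ["firewall-to-switch1"],
    "maria-pc": ["firewall-to-switch1", "switch1-to-switch2"],
    "access-point": ["firewall-to-switch1", "switch1-to-switch2"],
}

# Illustrative results, as if gathered from a round of pings.
reachable = {"main-pc": True, "maria-pc": False, "access-point": False}

print(suspect_links(paths, reachable))  # ['switch1-to-switch2']
```

With the failures grouped this way, the cable between the two switches stands out as the first thing to check by hand, before any time goes into DHCP leases or IP settings.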

Final thought

Today's outage taught me something simple:

The first problem was hardware.

The second problem was a cable, knocked loose by my own hands while fixing the first.

Both mattered. Both taught me something.
