This isn’t a story where at the end I will say “yes, it was DNS.”
Two days ago, I started noticing sporadic failures from servers at one client’s office: they could not be pinged for a few seconds, then operated normally for five to ten minutes. One might start by checking the switch and physically inspecting the servers and cables, which I did. What made the issue stand out was that it wasn’t just one server; it was multiple servers on multiple switches.
The investigation jumped to the routers. Nope, they were fine.
At this time, desktops began to lose connectivity to the affected servers. Not all servers, and not even all servers on the same switch! From the affected servers, I found I could ping other machines only sporadically, but could ping the working servers without interruption.
Magically, it all stopped (well, started working again, the problem stopped) at five o’clock on the dot. So this was user-related: not the servers, not the switches, and not the routers.
I decided to set up a sting for the following day. I started watching users as they logged into the domain while running non-stop pings against the servers when, suddenly, it began again! I raced across the office to the offending user’s office to find, of all things, an ancient HP printer sitting on the desk. The culprit had been discovered.
It seems the HP printer’s LaserJet card had thrown a wrench into the works. It was auto-assigning the gateway address as its local IP address, complete with no netmask and no gateway address (how could it? It was its own gateway). By statically assigning the printer an IP from the printer pool, all went back to working well.
In the end, the HP printer had assigned itself the IP address of the network’s default gateway. Servers picked up on this from ARP updates and began sporadically sending packets destined for the routers to the printer instead.
I’m not sure how to prevent this from happening again other than to constantly monitor the MAC of the router and alert if the perceived gateway MAC suddenly changes.
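As a rough sketch of that monitoring idea, assuming a Linux host where the kernel exposes its ARP cache as text in /proc/net/arp, something like the following could compare the MAC currently recorded for the gateway against a known-good value. The function names, the gateway IP, and the MAC addresses are all hypothetical placeholders, not anything from the client’s network.

```python
from typing import Optional


def lookup_mac(arp_table: str, ip: str) -> Optional[str]:
    """Return the MAC recorded for `ip` in /proc/net/arp-style text, or None."""
    for line in arp_table.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 4 and fields[0] == ip:
            return fields[3].lower()
    return None


def gateway_changed(arp_table: str, gateway_ip: str, expected_mac: str) -> bool:
    """True if the ARP cache holds a MAC for the gateway that is not the expected one."""
    seen = lookup_mac(arp_table, gateway_ip)
    # A missing entry is not treated as a change; the cache may simply have expired.
    return seen is not None and seen != expected_mac.lower()


def check_once(gateway_ip: str, expected_mac: str, path: str = "/proc/net/arp") -> bool:
    """Read the live ARP cache once (Linux only) and report whether the gateway MAC moved."""
    with open(path) as handle:
        return gateway_changed(handle.read(), gateway_ip, expected_mac)
```

Calling `check_once("192.168.1.1", "00:11:22:33:44:55")` from a scheduled job on each server and raising an alert whenever it returns `True` would have flagged the printer the moment it started answering for the gateway. It only detects the problem, though; it does nothing to stop a device from claiming the address in the first place.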
It certainly was an entertaining hunt.