This particular problem is close to my heart. It’s the first one I solved using Wireshark and I learnt a lot from it. Reading this, it sounds like an easy fix, but in reality it took several days and lots of staring at log files. I don’t detail all the blind alleys I went up or all the head scratching I did.
You might think that the amount of effort was not commensurate with the problem’s severity, but I learnt a lot from it and subsequent issues were a lot quicker to resolve. The first few times you use these tools it will take a lot of time, but you are investing in skills that will stand you (and your organisation) in good stead in the future.
The Set Up
Managers at one of our remote offices had paid for a subscription to a genealogy website, for reasons best known to themselves. The website was excruciatingly slow, with pages taking minutes to load. I was told this only affected this website from this location.
A fair amount of troubleshooting had already been done before I got involved. The server team told me that if you deleted the pac file the website would work until you restarted IE, despite the website not making an appearance in the file. The network team couldn’t see any fault on the line and the utilization was fine. They had visited the remote office and the network tech’s laptop worked fine, but if he logged onto the desktop there he’d experience the slow loads.
This made them certain that it was the desktop build at the office. I was not so sure. Luckily for me it was an easily replicable fault, so I installed Wireshark on one of the desktops and took a trace…
I captured a trace of browsing to the website and stopped the trace when the slow page finished opening.
Because this was a browsing issue I started by looking at the conversation between the desktop and the proxy server. I typed in the following in the filter box
The IP in the filter is the address of the proxy, so only packets exchanged between the desktop and the proxy would show.
As I slowly scrolled down the trace I could clearly see the cause of the slow loads. There were multiple pauses of around 21 seconds.
The desktop appeared to be sending packets to the proxy server and waiting for a reply. To me this was proof that there was nothing wrong with the desktop; the fault lay further down the line. When you see multiple pauses of roughly the same size you can bet your bottom dollar that something somewhere is timing out. I just needed to find out what.
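The "multiple pauses of roughly the same size" check can be roughed out in a few lines of Python. This is only a sketch: the timestamps are made up for illustration, and `find_pauses`, `looks_like_timeout`, the 5 second threshold and the 1 second tolerance are all my own hypothetical choices rather than anything Wireshark provides.

```python
def find_pauses(timestamps, min_gap=5.0):
    """Return (time_of_last_packet, gap) for every gap of at least min_gap seconds."""
    pauses = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap >= min_gap:
            pauses.append((earlier, gap))
    return pauses


def looks_like_timeout(pauses, tolerance=1.0):
    """True when there are at least two pauses and they are all about the same length."""
    gaps = [gap for _, gap in pauses]
    return len(gaps) >= 2 and max(gaps) - min(gaps) <= tolerance


# Made-up packet times in seconds, standing in for the times in a real trace.
sample = [0.00, 0.05, 0.11, 21.10, 21.15, 42.20, 42.31, 63.25]

pauses = find_pauses(sample)
for start, gap in pauses:
    print(f"pause of {gap:.2f}s after packet at {start:.2f}s")
print("same-sized pauses:", looks_like_timeout(pauses))
```

A run of near-identical gaps is the hint that a timer is expiring somewhere, as opposed to a link that is simply slow, where the gaps would be all over the place.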
21 seconds to go…….
What I didn’t know at the time, but soon discovered, was that 21 seconds is how long the TCP stack on these desktops takes to give up on a connection. If TCP sends a SYN and doesn’t get a reply it waits 3 seconds and sends another, waits 6 seconds and sends a third, then waits a final 12 seconds before giving up (3+6+12=21).
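The arithmetic is easy to check for yourself. Here is a minimal sketch, assuming the wait simply doubles after each unanswered SYN; the function name and its default values are mine, taken from the 3, 6, 12 pattern above rather than read from any operating system setting.

```python
def syn_connect_timeout(initial_wait=3.0, retransmissions=2):
    """Return (waits, total) for a SYN that never gets an answer.

    Assumes the wait doubles after every attempt. The defaults are
    assumptions chosen to match the 3 s, 6 s, 12 s pattern in the trace.
    """
    waits = [initial_wait * (2 ** attempt) for attempt in range(retransmissions + 1)]
    return waits, sum(waits)


waits, total = syn_connect_timeout()
print(waits)   # [3.0, 6.0, 12.0]
print(total)   # 21.0
```

If the pauses in your own trace come out at a different but consistent figure, try other starting values and retry counts; other systems may use different numbers.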
If you are seeing multiple pauses of 21 seconds then the chances are TCP is trying to communicate with something but not getting any replies. If you look at the above picture, though, there’s no sign of the errant SYNs.
I decided to widen the filter to look at all the TCP communication and, lo and behold, the missing SYNs appeared.
I’ve colourised the TCP streams so you can see the SYNs, in pink, that are timing out. If you look at the destination, the SYNs are not going to the proxy; they’re trying to contact a Google Analytics site directly and not getting any reply, which causes the timeout.
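What I was doing by eye here, spotting SYNs that never get a SYN-ACK back, can also be sketched in Python. Everything in this example is hypothetical: the packet tuples are a simplified stand-in for a real trace, the addresses are documentation-range placeholders rather than the real ones, and the function ignores port numbers for brevity.

```python
from collections import Counter


def unanswered_syns(packets):
    """Count SYNs per (client, server) pair that never saw a SYN,ACK come back.

    packets is an iterable of (source, destination, flags) tuples, where
    flags is a plain string such as "SYN", "SYN,ACK" or "ACK".
    """
    syn_counts = Counter()
    answered = set()
    for src, dst, flags in packets:
        if flags == "SYN":
            syn_counts[(src, dst)] += 1
        elif flags == "SYN,ACK":
            # The reply travels server -> client, so flip it to match the SYN.
            answered.add((dst, src))
    return {pair: count for pair, count in syn_counts.items() if pair not in answered}


# Placeholder addresses: a desktop, a proxy and an external site.
DESKTOP, PROXY, EXTERNAL = "192.0.2.10", "192.0.2.20", "203.0.113.5"

trace = [
    (DESKTOP, PROXY, "SYN"),
    (PROXY, DESKTOP, "SYN,ACK"),
    (DESKTOP, PROXY, "ACK"),
    (DESKTOP, EXTERNAL, "SYN"),  # first attempt
    (DESKTOP, EXTERNAL, "SYN"),  # retry after the first wait
    (DESKTOP, EXTERNAL, "SYN"),  # retry after the second wait
]

print(unanswered_syns(trace))
```

The point to take away is the one from the trace: filtering on the proxy’s address alone hides exactly the conversation that is failing, because the failing SYNs are addressed to somewhere else entirely.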
Once I discovered it was not an issue with the desktop or the genealogy website, the solution was easy to find. The Google sites were named in the pac file, so they were not meant to go via the proxy. An investigation of the firewall showed that it was receiving the SYNs from the client but that the return address for that site was misconfigured. It turned out that the site had recently had its subnet changed and nobody had updated the firewall.
The Wireshark trace unequivocally showed that the issue was with contacting websites named in the pac file. Without the trace I don’t think we would ever have discovered that.
The reason the network technician’s laptop worked was that you only need to contact the Google site once to get a cookie. Because he had successfully browsed the website from another location that morning, it didn’t try to access the Google site again. What seemed to be a piece of damning evidence was in fact quite misleading.