All Tooled Up – Wireshark

Wireshark is a packet analysis tool. It captures and displays the data packets going in and out of your network interface card (NIC). You can download it for free from wireshark.org.

For those of you who think packet analysis is only for network administrators, think again. Packet analysis is one of the most powerful troubleshooting tools you can have at your disposal. It can be used to troubleshoot all manner of application and configuration errors.

Here’s a list of the things I wish I’d known before I started using it.

TCP Communication

When I started using Wireshark I didn’t know my SYN, SYN-ACK, ACK from my FIN, ACK, FIN, ACK; it didn’t end well. If you think a TCP three-way handshake is an antiseptic greeting between a trio of friends, then you need to read up on the basics of TCP communication before going any further.
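To make those flag sequences concrete, here is a small Python sketch that labels TCP flag combinations in roughly the style of Wireshark’s Info column. The bit values are the standard TCP header flags; `flag_label` and `describe` are hypothetical helper names for illustration. The teardown shown is a typical orderly close, where each side sends a FIN and the other side acknowledges it (in practice the FIN segments usually carry the ACK flag as well).

```python
# Standard TCP header flag bits (only the ones used here).
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10

# Order in which names are printed, e.g. "[SYN, ACK]" or "[FIN, PSH, ACK]".
FLAG_ORDER = [(FIN, "FIN"), (SYN, "SYN"), (RST, "RST"), (PSH, "PSH"), (ACK, "ACK")]


def flag_label(flags: int) -> str:
    """Turn a TCP flags value into a bracketed list of flag names."""
    return "[" + ", ".join(name for bit, name in FLAG_ORDER if flags & bit) + "]"


def describe(sequence: list[int]) -> str:
    """Label each segment in a sequence and join the labels with arrows."""
    return " -> ".join(flag_label(flags) for flags in sequence)


# Connection set-up: the three-way handshake.
handshake = [SYN, SYN | ACK, ACK]
# Orderly close: each side sends a FIN (with ACK set) and the other side ACKs it.
teardown = [FIN | ACK, ACK, FIN | ACK, ACK]

print(describe(handshake))  # [SYN] -> [SYN, ACK] -> [ACK]
print(describe(teardown))   # [FIN, ACK] -> [ACK] -> [FIN, ACK] -> [ACK]
```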

Personally, I find most comms documentation and forums quite difficult to read. They all seem to have been written on a 1980s word processor with an RFC aesthetic. Saving the day, however, is Laura Chappell’s excellent book Wireshark Network Analysis, a great resource for anyone starting out with Wireshark.

Placement of Capture

In an ideal world you would have a good understanding of the faulting system and have packet analysers set up at strategic points, tracking the flow of data. However, in practice I’ve found it difficult to convince network and system administrators to let me install packet sniffers on their infrastructure.

Consequently, a lot of my captures are from end devices, where users don’t seem to mind you installing capture software. Don’t worry if you can only get captures from the end device; you’ll be amazed at what they reveal.

Careful What You Capture

Just like ProcMon, Wireshark produces masses of data. The idea when using these tools is to capture for as short a time as possible: try to capture just the seconds around the performance issue or error.
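One way to keep a capture short is to let the capture engine stop by itself. The sketch below builds a command line for `dumpcap`, the capture tool that ships with Wireshark, using its `-i` (interface), `-a duration:` (autostop after N seconds), `-w` (output file) and `-f` (capture filter) options. The interface name `Ethernet` and the file name are placeholders, and `dumpcap_command` is a hypothetical helper; `dumpcap -D` lists the interface names on your machine.

```python
import shlex
from typing import List, Optional


def dumpcap_command(interface: str, seconds: int, outfile: str,
                    capture_filter: Optional[str] = None) -> List[str]:
    """Build a dumpcap command that stops on its own after `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cmd = ["dumpcap", "-i", interface, "-a", f"duration:{seconds}", "-w", outfile]
    if capture_filter:
        # Capture filters use BPF syntax, not display-filter syntax.
        cmd += ["-f", capture_filter]
    return cmd


cmd = dumpcap_command("Ethernet", 30, "issue.pcapng")
print(shlex.join(cmd))  # dumpcap -i Ethernet -a duration:30 -w issue.pcapng
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` just before reproducing the problem and let it stop on its own; building a list rather than a single string avoids shell-quoting problems with filters that contain spaces.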

Look at the Right Layer

Although I passed my Network+ exam many years ago, I never really understood the significance of the OSI model. Put simply, each layer relies on the layer below it to function. Most network troubleshooting takes place from TCP (the transport layer) downwards; this is where the comms guys really earn their crust.

When you use Wireshark to troubleshoot non-networking issues, you need to concentrate on TCP and upwards. Most of the clues are found in the TCP connection itself or in the control commands the application layer sends.

Don’t Cross the Streams

TCP streams are one of the basic building blocks of a packet trace, especially when looking at performance issues. Fortunately, in Wireshark the “Follow TCP Stream” option is only a right-click away. Long pauses within a stream can indicate an issue, but with multiple streams running at the same time they can be difficult to spot unless you isolate individual streams.
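Wireshark numbers each TCP conversation (the `tcp.stream` field), and Follow TCP Stream works by filtering on that number. The Python sketch below applies the same idea to a list of `(timestamp, stream_index)` pairs: it groups packets by stream and reports the longest pause inside each one. The input format and the `longest_pause_per_stream` name are assumptions for illustration, and the sample data is made up to show how a busy second stream can hide a long pause in the first.

```python
from collections import defaultdict


def longest_pause_per_stream(packets):
    """packets: iterable of (timestamp_seconds, stream_index) pairs.

    Returns {stream_index: (pause_seconds, timestamp_when_pause_ended)}.
    """
    by_stream = defaultdict(list)
    for ts, stream in packets:
        by_stream[stream].append(ts)

    result = {}
    for stream, times in by_stream.items():
        times.sort()
        best = (0.0, times[0])
        for prev, cur in zip(times, times[1:]):
            if cur - prev > best[0]:
                best = (cur - prev, cur)
        result[stream] = best
    return result


# Stream 1 keeps the wire busy while stream 0 sits idle for 6 seconds.
trace = [(0.0, 0), (0.5, 0), (1.0, 1), (2.0, 1), (3.0, 1),
         (4.0, 1), (5.0, 1), (6.5, 0), (6.75, 0)]
print(longest_pause_per_stream(trace))
# {0: (6.0, 6.5), 1: (1.0, 2.0)}
```

Looking at the mixed trace as a whole, the biggest gap between consecutive packets is only 1.5 seconds (5.0 to 6.5), which is why the 6-second stall in stream 0 only shows up once the streams are separated.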

A complementary method is colourising the TCP streams, which lets you see how the streams interact. If only a single stream is running at any given time, or there are long pauses after streams finish, then that’s probably where your problem is.
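Colourising shows at a glance how streams overlap. As a rough programmatic equivalent, the sketch below reduces each stream to the span between its first and last packet, then reports the peak number of streams open at once and any idle periods where no stream was open. It uses the same assumed `(timestamp, stream_index)` input; `stream_spans`, `max_concurrency` and `idle_gaps` are hypothetical helpers, and treating a stream as open from its first packet to its last is a simplification.

```python
def stream_spans(packets):
    """Map stream_index -> (first_timestamp, last_timestamp)."""
    spans = {}
    for ts, stream in packets:
        start, end = spans.get(stream, (ts, ts))
        spans[stream] = (min(start, ts), max(end, ts))
    return spans


def max_concurrency(spans):
    """Peak number of streams whose spans overlap at the same moment."""
    events = []
    for start, end in spans.values():
        events.append((start, 1))   # stream opens
        events.append((end, -1))    # stream closes
    # At equal timestamps, process opens before closes (touching spans overlap).
    events.sort(key=lambda e: (e[0], -e[1]))
    current = peak = 0
    for _, change in events:
        current += change
        peak = max(peak, current)
    return peak


def idle_gaps(spans, min_gap):
    """(start, end) periods of at least min_gap seconds with no stream open."""
    intervals = sorted(spans.values())
    if not intervals:
        return []
    gaps = []
    covered_until = intervals[0][1]
    for start, end in intervals[1:]:
        if start - covered_until >= min_gap:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


# Three streams that run strictly one after another.
trace = [(0.0, 0), (2.0, 0), (4.0, 1), (6.0, 1), (6.5, 2), (8.0, 2)]
spans = stream_spans(trace)
print(max_concurrency(spans))         # 1: never more than one stream at a time
print(idle_gaps(spans, min_gap=1.0))  # [(2.0, 4.0)]
```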

Filtering

Although this is definitely not a how-to guide, I want to touch on filtering because it took me a while to get to grips with it and, much like with ProcMon, filtering is at the root of good analysis.

To keep things simple, there are two sorts of filter, capture and display, and each has a different syntax. Capture filters discard packets before they are saved, so anything they drop is gone for good. Although purists will tell me I shouldn’t, I always use the “No Broadcast and no Multicast” capture filter because it removes a lot of the noise.
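For reference, the preset in Wireshark’s default capture filter list is, to my knowledge, `not broadcast and not multicast`, written in BPF (libpcap) syntax rather than display-filter syntax. At the Ethernet level, a frame is multicast when the lowest bit of the first destination-MAC octet (the group bit) is set, and broadcast frames (`ff:ff:ff:ff:ff:ff`) have that bit set too. The Python sketch below mirrors that test on MAC address strings; the helper names are hypothetical and this is not how libpcap itself is implemented.

```python
def is_broadcast(mac: str) -> bool:
    """True for the all-ones Ethernet broadcast address."""
    return mac.lower() == "ff:ff:ff:ff:ff:ff"


def is_multicast(mac: str) -> bool:
    """True if the group bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return (first_octet & 0x01) == 1


def kept_by_filter(dst_mac: str) -> bool:
    """Roughly what 'not broadcast and not multicast' lets through."""
    return not is_broadcast(dst_mac) and not is_multicast(dst_mac)


for mac in ["00:1a:2b:3c:4d:5e",   # ordinary unicast frame
            "ff:ff:ff:ff:ff:ff",   # broadcast, e.g. ARP requests
            "01:00:5e:00:00:fb"]:  # IPv4 multicast, e.g. mDNS
    print(mac, kept_by_filter(mac))
```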

The single most useful filter for me can’t be created from a right-click and seems weirdly under-documented. To display all the packets with a given IP address (for instance 192.168.0.3) as either the source or the destination, use this display filter:

ip.addr == 192.168.0.3
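The reason `ip.addr` is so handy is that it matches either end of the conversation, so it behaves like `ip.src == 192.168.0.3 || ip.dst == 192.168.0.3`. A tiny Python sketch of that logic, using a hypothetical `Packet` record rather than any real Wireshark API:

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str


def ip_addr_matches(pkt: Packet, addr: str) -> bool:
    """Same idea as the display filter `ip.addr == addr`."""
    return pkt.src == addr or pkt.dst == addr


packets = [Packet("192.168.0.3", "10.0.0.5"),   # outbound from the host
           Packet("10.0.0.5", "192.168.0.3"),   # the reply
           Packet("10.0.0.7", "10.0.0.5")]      # unrelated traffic
shown = [p for p in packets if ip_addr_matches(p, "192.168.0.3")]
print(len(shown))  # 2
```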

Also, if you want to filter on a given protocol, just type its name, in lower case, into the filter bar. Be aware that this displays every packet that carries that protocol at any layer. So typing tcp will also display HTTP and SMB packets, because those two protocols run over TCP connections.
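Typing a bare protocol name works as a presence test: a packet matches if that protocol appears anywhere in its dissected stack. The sketch below models each packet by a colon-separated string of layer names, similar to the “Protocols in frame” line Wireshark shows in the packet details; the sample stacks are simplified and assumed, and `has_protocol` is a hypothetical helper.

```python
def has_protocol(protocols_in_frame: str, proto: str) -> bool:
    """True if `proto` is one of the colon-separated layers."""
    # Splitting avoids false matches such as "ip" inside "ipv6".
    return proto in protocols_in_frame.split(":")


frames = ["eth:ethertype:ip:tcp:http",
          "eth:ethertype:ip:tcp:nbss:smb2",
          "eth:ethertype:ip:udp:dns",
          "eth:ethertype:ip:tcp"]

tcp_frames = [f for f in frames if has_protocol(f, "tcp")]
print(len(tcp_frames))  # 3: the HTTP and SMB2 frames match as well
```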

Delta Time

Delta time is a useful column to add: it shows the time since the previous packet. Sort on delta time and look for pauses of similar length. Several pauses of a similar length suggest a timeout of some description somewhere in your system.

Remember that there will be a lot of SYNs with high delta times. These usually just reflect natural pauses in activity before a new connection is opened.
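Sorting on delta time and eyeballing similar values can also be sketched in code. The example below takes `(timestamp, is_syn)` pairs, an assumed input format, computes each packet’s delta from the previous packet, skips SYNs for the reason above, and counts pauses that round to the same length. The threshold and one-second bucket are arbitrary choices, `recurring_pauses` is a hypothetical helper, and the sample timestamps are invented.

```python
from collections import Counter


def recurring_pauses(packets, threshold, bucket=1.0, skip_syn=True):
    """packets: list of (timestamp_seconds, is_syn) in capture order.

    Returns {pause_length: count} for pauses of at least `threshold`
    seconds that occur more than once, after rounding to `bucket`.
    """
    counts = Counter()
    for (prev_ts, _), (ts, is_syn) in zip(packets, packets[1:]):
        delta = ts - prev_ts  # the "Delta time" for this packet
        if delta < threshold or (skip_syn and is_syn):
            continue
        counts[round(delta / bucket) * bucket] += 1
    return {pause: n for pause, n in counts.items() if n > 1}


trace = [(0.0, False), (0.25, False), (30.25, False), (30.5, False),
         (60.75, False), (61.0, False), (91.0, True), (91.25, False),
         (121.0, False)]
print(recurring_pauses(trace, threshold=5))                  # {30.0: 3}
print(recurring_pauses(trace, threshold=5, skip_syn=False))  # {30.0: 4}
```

Three pauses of roughly 30 seconds is the kind of pattern that points at a timeout; the SYN that also arrives after 30 seconds is left out by default because it may simply be a new connection after a quiet spell.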

Well, so much for the basics. The next post will put some light packet analysis into action.

The Trouble with My Documents

By pure coincidence, a couple of days after my last post, “The Trouble with Opening Files”, I came across a very similar issue. When I say similar, I mean it had similar symptoms and I used the same technique to resolve it. The solution, however, was completely different. It just goes to show that in this day and age, knowledge is cheap and technique is king.

So here’s the set-up. When Gary from Compliance (Compliance staff in this office were something akin to the secret police, so we liked to keep them happy) opened My Documents from the Start menu, it took about 30 seconds to show the contents. My Documents was mapped to the SAN and was fine once it displayed. Anyone logging on to his device had the same trouble, and it also happened to an undisclosed number of his colleagues. The desktop techs had recreated his profile and run NET USE, as in my previous post. The server guys couldn’t see anything wrong with his share on the SAN either.

That seems like a lot of good information. A lot of people might be tempted to get Gary to log on to various devices and see whether it worked on any of them. If it did, you’d just compare the differences and that would be your answer. However, there is a better, quicker and easier way.

The issue was easy to reproduce so I copied ProcMon.exe over to the offending device, started a capture and opened My Documents from the start menu. Sure enough it took about 30 seconds to display the My Documents folder. I stopped and saved the capture once I could see the contents.

I filtered on Explorer.exe, the process doing the work here, but there were still over 18,000 events.
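The same filtering can be done outside ProcMon if you save the capture as CSV. The sketch below assumes the default CSV columns (`Time of Day`, `Process Name`, `PID`, `Operation`, `Path`, `Result`, `Detail`); check the header row of your own export, as the columns may differ. The sample rows are invented for illustration, `events_for_process` is a hypothetical helper, and the comparison ignores case because ProcMon typically shows the process as `Explorer.EXE`.

```python
import csv
import io

# Invented sample in the assumed ProcMon CSV layout.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:20:53.1000000","Explorer.EXE","2140","RegOpenKey","HKCU\Software","SUCCESS",""
"10:20:53.2000000","svchost.exe","812","ReadFile","C:\Windows\System32\config","SUCCESS",""
"10:21:14.3000000","Explorer.EXE","2140","RegEnumKey","HKLM\SOFTWARE","SUCCESS",""
'''


def events_for_process(csv_text: str, process_name: str) -> list[dict]:
    """Keep only the rows whose Process Name matches, ignoring case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = process_name.lower()
    return [row for row in reader if row["Process Name"].lower() == wanted]


explorer = events_for_process(SAMPLE, "explorer.exe")
print(len(explorer))  # 2
```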


This is where the technique comes in. I used the scroll bar to scroll quickly down the trace, concentrating on the seconds in the Time of Day column.


When I saw the seconds jump from 53 to 14, I knew Explorer had been waiting for something to happen.
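Scrolling for jumps in the seconds is the manual form of a simple check: compare each event’s timestamp with the previous one and flag any difference above a threshold. The sketch below assumes 24-hour `HH:MM:SS.fffffff` Time of Day strings (your ProcMon export may use a different, locale-dependent format), and `find_gaps` is a hypothetical helper. The sample times echo the 53-to-14 jump described above but are otherwise invented.

```python
def seconds_since_midnight(time_of_day: str) -> float:
    """Convert 'HH:MM:SS.fffffff' to seconds since midnight."""
    hours, minutes, seconds = time_of_day.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def find_gaps(times: list[str], min_gap: float) -> list[tuple[int, float]]:
    """Return (index_of_event_after_gap, gap_seconds) for each large jump."""
    secs = [seconds_since_midnight(t) for t in times]
    gaps = []
    for i in range(1, len(secs)):
        gap = secs[i] - secs[i - 1]
        if gap < 0:  # the trace ran past midnight
            gap += 24 * 3600
        if gap >= min_gap:
            gaps.append((i, gap))
    return gaps


times = ["10:20:53.0000000", "10:20:53.5000000",
         "10:21:14.5000000", "10:21:14.7500000"]
print(find_gaps(times, min_gap=5))  # [(2, 21.0)]
```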


Immediately after the pause, Explorer was trying to enumerate the following key:

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\ShellExecuteHooks

I right-clicked on the key and selected Jump To.

There were two values under this key:

{AEB6717E-7E19-11d0-97EE-00C04FD91972} and {B5A7F190-DDA6-4420-B3BA-52453494E6CD}

The former was blank, but the latter had a value of “Groove GFS Stub Execution Hook”.

A quick Google (yes, I do use it for some stuff) revealed that this was part of Office SharePoint Workspace (Office 2010) or, in 2007, the painfully monikered, dad-dance of a name, Office Groove. The feature wasn’t used in the organisation, so I uninstalled it from Office and everything worked fine.

Despite having little knowledge of SharePoint and its components, good technique enabled me to quickly identify the culprit. Even though the symptoms were similar to the “slow opening files” issue, the solution was completely different. The technique for finding the solution, however, was exactly the same. It was quick and effective, and you get to be a troubleshooting hero for an afternoon.

The Trouble with Opening Files

I was asked to look into an issue where a team of 20 users were having to wait 20 to 30 seconds for files to open from Office applications. The usual solutions of rebuilding profiles and PCs had been tried, and the network had been checked for errors, with no results.

Although on the surface this appeared to be a network issue, I thought I’d start with Process Monitor. Fortunately for me, the issue was easy to replicate. On one half of the user’s screen I had Excel open, poised to open a troublesome file, and on the other half I had Process Monitor ready so I could start and stop it quickly.


Despite being able to start and stop ProcMon quickly, I still captured nearly 60,000 events. This is far too much for analysis, so I filtered on the Excel process because that was the one doing (or not doing) all the work.


This led to a more manageable 11,290 events. That still sounds like a lot, and it is if you have to read through them all. However, there is a simpler, quicker approach when dealing with performance issues.

Once you’ve filtered to your faulting process, simply scroll down and look for the gaps. If you concentrate on the seconds, you can scroll down quite fast and the jumps in the numbers will be obvious.


In this case the issue was at event 10,018. There was a 26-second gap between events 10,017 and 10,018 and, unsurprisingly, event 10,018, with a result of BAD NETWORK PATH, was the culprit.

When I looked in Windows Explorer there wasn’t a G: drive mapped, but when I ran NET USE from a command prompt it listed a G: drive. I ran NET USE G: /DELETE and the problem went away. It turned out that this particular team had a G: drive mapped to a server that had recently been decommissioned. Excel was trying to enumerate a drive that no longer existed, so it timed out.

What I found interesting was that the problematic event came after the 26-second pause; I had expected it to be before the gap. It goes to show that when you’re using this technique, you need to look at both sides of the gap.
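Because the culprit can sit on either side of the pause, it helps to pull out a few events before and after each gap and look for anything that didn’t succeed. The sketch below works on ProcMon-style event dictionaries with a `Result` key (an assumed shape, like a row from a CSV export); `around_gap` and `not_successful` are hypothetical helpers, and the events are invented to mirror the G: drive case.

```python
def around_gap(events: list[dict], gap_index: int, window: int = 2):
    """Return (events_before, events_after) around the gap that precedes
    events[gap_index], taking up to `window` events on each side."""
    before = events[max(0, gap_index - window):gap_index]
    after = events[gap_index:gap_index + window]
    return before, after


def not_successful(events: list[dict]) -> list[dict]:
    """Events whose Result is anything other than SUCCESS."""
    return [e for e in events if e["Result"] != "SUCCESS"]


events = [
    {"Operation": "QueryDirectory", "Path": r"C:\Users", "Result": "SUCCESS"},
    {"Operation": "CreateFile", "Path": r"C:\Users\Public", "Result": "SUCCESS"},
    # A long pause falls here, before event index 2.
    {"Operation": "CreateFile", "Path": "G:\\", "Result": "BAD NETWORK PATH"},
    {"Operation": "CreateFile", "Path": r"C:\Temp", "Result": "SUCCESS"},
]

before, after = around_gap(events, gap_index=2)
print(not_successful(before))                     # []
print([e["Path"] for e in not_successful(after)])  # ['G:\\']
```

In a real trace many non-SUCCESS results, such as NAME NOT FOUND, are routine, so treat the output as a shortlist to read rather than an answer.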

From start to finish it took me about 20 minutes to find the root cause of the problem. Before the issue reached me, it had been open for over a week and four different technicians with different skill sets had taken a look at it. It just goes to show that with a little technique and the right tools at your disposal, you can make yourself look quite clever.