Process Monitor (or ProcMon to those in the know) is one of the most useful troubleshooting tools you’ll ever come across. Mastery of it will put you head and shoulders above your colleagues. It was developed by those clever people at Sysinternals, and you can download it from the Microsoft Sysinternals website.
ProcMon records all registry, file, thread and network activity on the Windows device it’s running on, allowing you to see what’s really happening under the hood. As you can imagine, this makes it incredibly powerful as a troubleshooting tool.
There is a problem, though: Windows is a pretty noisy beast. I’ve had ProcMon running while I’ve typed this, and in the 5 minutes or so it’s taken to write, ProcMon has logged 1,116,637 events. If you’re troubleshooting, any one of those events could be the cause. Quite frankly, finding a needle in a haystack can be a doddle compared to finding an error in ProcMon.
It took me a year of using ProcMon before I solved any problems with it. I’m not going to go through what all the buttons do (I’m sure you can work that out for yourselves), but here is a list of all the things I wish I’d known when I started using it for troubleshooting.
ProcMon produces a lot of data, so the shorter your capture, the easier your life will be. Ideally you just want to capture and analyse the couple of seconds before the fault occurs. This is only possible if you can reproduce the fault. If you have to leave ProcMon running and wait for the fault to recur, you need a way of recording, down to the second, when it happened. If you can’t pin down when the fault occurred, it will be almost impossible to make use of the trace.
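ProcMon’s own filter can narrow a trace to a time range. If you’d rather script it, you can save the trace as CSV (File > Save, CSV format) and cut it down to the seconds around the noted fault time. The sketch below is a minimal example of that. It makes some assumptions: the export uses ProcMon’s default column headers (Time of Day, Process Name, PID, Operation, Path, Result, Detail), and Time of Day looks like `3:45:12.1234567 PM`. Both depend on your ProcMon version and regional settings, so check the header row and a few values in your own export. The sample rows are made up.

```python
import csv
import io
from datetime import time


def parse_time_of_day(text):
    """Parse a Time of Day value such as '3:45:12.1234567 PM'.

    Assumption: either a 12-hour clock with an AM/PM suffix or a 24-hour
    clock with no suffix; the real format follows your regional settings.
    """
    clock, _, suffix = text.strip().partition(" ")
    hours, minutes, seconds = clock.split(":")
    whole, _, fraction = seconds.partition(".")
    hour = int(hours)
    suffix = suffix.upper()
    if suffix == "PM" and hour != 12:
        hour += 12
    elif suffix == "AM" and hour == 12:
        hour = 0
    # ProcMon shows 7 fractional digits; datetime.time only stores microseconds.
    micros = int((fraction + "000000")[:6])
    return time(hour, int(minutes), int(whole), micros)


def rows_in_window(rows, start, end):
    """Keep only events whose timestamp falls within [start, end] on one day."""
    return [r for r in rows if start <= parse_time_of_day(r["Time of Day"]) <= end]


# Made-up export with assumed column names; the fault was noted at about 3:45:12 PM.
SAMPLE = r"""Time of Day,Process Name,PID,Operation,Path,Result,Detail
3:45:09.1000000 PM,app.exe,100,RegOpenKey,HKLM\Software\Example,SUCCESS,
3:45:11.2500000 PM,app.exe,100,CreateFile,C:\Temp\a.dat,ACCESS DENIED,
3:45:13.0000000 PM,app.exe,100,WriteFile,C:\Temp\b.dat,SUCCESS,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
window = rows_in_window(rows, time(15, 45, 10), time(15, 45, 12, 500000))
for r in window:
    print(r["Time of Day"], r["Operation"], r["Path"], r["Result"])
```

To use a real export, replace `io.StringIO(SAMPLE)` with `open("your-trace.csv", newline="", encoding="utf-8-sig")`. The file name is a placeholder, and the encoding is a guess that simply tolerates a byte-order mark. The sketch does not handle a window that spans midnight.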
Although it’s the couple of seconds around the fault you want to concentrate on, it is also worth taking a trace of the application startup. A lot of settings will be loaded into memory at this point and they might be the root of your issue.
If I can, I’ll take a long trace from application startup to the fault, and a short trace that covers only the couple of seconds around the fault.
Filter, Filter and then Filter again
One of the most powerful features of ProcMon is the filtering. It’s easy and intuitive and you need to take full advantage of it. My advice is to initially filter as hard as possible to reduce the number of lines you need to look at. You can always widen your search later if you don’t come up trumps.
90% of the time when an application goes wrong it will be an issue with one of the processes that application is directly running. As a first step I bring up the Process Tree under the Tools menu and select the subtree of the process I’m interested in. This will massively reduce the number of events you need to examine.
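The Process Tree does the subtree selection for you. The idea behind it is simply to collect a process together with every process it started, directly or indirectly. Below is a minimal sketch of that walk. It assumes you already have child-to-parent PID pairs, for example from a Parent PID column added to your view before exporting. The PIDs are made up.

```python
from collections import defaultdict


def subtree_pids(root_pid, parent_of):
    """Return root_pid plus every descendant PID.

    parent_of maps child PID -> parent PID (assumed to come from your trace).
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    result, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in result:  # guard against loops caused by PID reuse
            continue
        result.add(pid)
        stack.extend(children[pid])
    return result


# Hypothetical tree: 100 started 200 and 300, and 300 started 400.
# 500 belongs to an unrelated parent (4), so it is not in 100's subtree.
parent_of = {200: 100, 300: 100, 400: 300, 500: 4}
print(sorted(subtree_pids(100, parent_of)))
```

Once you have the set, keep only rows whose PID is in it. That gives the same narrowing as including the subtree in ProcMon.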
I also tend to filter out anything under HKEY_CLASSES_ROOT (HKCR), because it creates a lot of noise and I’ve never encountered a problem caused by a classes entry.
If you’ve looked at the main process and can’t see a fault, you’ll have to start including other processes. The Svchost services are always a good next step.
Virus checkers tend to be very noisy, so I try to rule them out as a cause early on. If you can reproduce the fault, switch off the virus checker to see if you still have the problem. Although less technically satisfying, this is a much easier way to rule the virus checker in or out than using ProcMon. If you still get the error with it switched off, you can safely keep the virus checker filtered out of your ProcMon trace.
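The filtering steps above can also be applied to exported CSV rows: keep the processes you care about, drop the classes root, and drop a virus checker you’ve already ruled out. This is a minimal sketch. `app.exe` is a hypothetical stand-in for your application, and `msmpeng.exe` (Microsoft Defender) stands in for whatever virus checker you run. The sketch also assumes that ProcMon writes HKEY_CLASSES_ROOT paths with its `HKCR` abbreviation. Keys an application opens directly under `...\Software\Classes` would not be caught by this check.

```python
# Rows are assumed to be shaped like a ProcMon CSV export (column names assumed).
# Hypothetical filter choices: app.exe is the process under investigation,
# svchost.exe is added once you widen the search, msmpeng.exe is excluded.
INCLUDE_PROCESSES = {"app.exe", "svchost.exe"}
EXCLUDE_PROCESSES = {"msmpeng.exe"}


def under_hkcr(path):
    """True for HKEY_CLASSES_ROOT paths, assuming the 'HKCR' abbreviation."""
    p = path.upper()
    return p == "HKCR" or p.startswith("HKCR\\")


def keep(row):
    name = row["Process Name"].lower()  # process names compared case-insensitively
    if name in EXCLUDE_PROCESSES:  # exclusions are checked first
        return False
    return name in INCLUDE_PROCESSES and not under_hkcr(row["Path"])


rows = [
    {"Process Name": "app.exe", "PID": "100", "Path": r"HKCR\CLSID\{0000}"},
    {"Process Name": "app.exe", "PID": "100", "Path": r"HKLM\Software\Example"},
    {"Process Name": "MsMpEng.exe", "PID": "900", "Path": r"C:\Temp\a.dat"},
    {"Process Name": "svchost.exe", "PID": "800", "Path": r"HKLM\System\Example"},
    {"Process Name": "explorer.exe", "PID": "700", "Path": r"C:\Users\x.txt"},
]
kept = [r for r in rows if keep(r)]
print([(r["Process Name"], r["Path"]) for r in kept])
```

Start with only your own process in `INCLUDE_PROCESSES` and add others one at a time. That mirrors the advice to filter hard first and widen later.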
ProcMon logs multiple lines for simple tasks
To say ProcMon is verbose is putting it mildly. Even if you manage to capture just the couple of seconds around the fault, filter on the process you’re interested in and exclude all the classes references, you’re still likely to be left with hundreds or even thousands of lines.
It’s important to understand that for even the simplest of tasks ProcMon will log multiple lines.
For example, the YCMMirage.exe process takes 4 lines to discover that the HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\Drivers32\msvideo value does not exist.
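One way to make the verbosity easier to read is to collapse consecutive lines from the same process that act on the same registry key or file into one logical step. This minimal sketch does that with `itertools.groupby`. It assumes that for value operations (`RegQueryValue`, `RegSetValue`, `RegDeleteValue`) the Path ends in the value name, so the parent key is used as the grouping target. The rows are a hypothetical example, not the YCMMirage trace.

```python
from itertools import groupby

# Rows are assumed to be shaped like a ProcMon CSV export (column names assumed).
# Registry operations whose Path is assumed to end in a value name, not a key.
VALUE_OPERATIONS = {"RegQueryValue", "RegSetValue", "RegDeleteValue"}


def target(row):
    """The key or file a row acts on (assumes value names contain no backslash)."""
    if row["Operation"] in VALUE_OPERATIONS:
        return row["Path"].rsplit("\\", 1)[0]
    return row["Path"]


def step_key(row):
    return (row["Process Name"], row["PID"], target(row))


def collapse_runs(rows):
    """Merge consecutive rows from one process acting on the same key or file."""
    steps = []
    for (process, _pid, path), group in groupby(rows, key=step_key):
        steps.append((process, path, [(r["Operation"], r["Result"]) for r in group]))
    return steps


def event(operation, path, result, process="app.exe", pid="100"):
    """Hypothetical helper to build a sample row."""
    return {"Process Name": process, "PID": pid, "Operation": operation,
            "Path": path, "Result": result}


rows = [
    event("RegOpenKey", r"HKLM\Software\Example", "SUCCESS"),
    event("RegQueryValue", r"HKLM\Software\Example\Setting", "BUFFER OVERFLOW"),
    event("RegQueryValue", r"HKLM\Software\Example\Setting", "SUCCESS"),
    event("RegCloseKey", r"HKLM\Software\Example", "SUCCESS"),
    event("CreateFile", r"C:\Temp\app.log", "SUCCESS"),
]
for process, path, actions in collapse_runs(rows):
    print(process, path, ", ".join(f"{op}={res}" for op, res in actions))
```

Here five lines become two steps. The open/query/query/close sequence on one key reads as a single "read a setting" action.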
Once you realise quite how much ProcMon logs for simple tasks it becomes a little easier to read and a bit less intimidating.
Don’t Look for Faults
I realise that this sounds like stupid advice; after all, the whole point of the exercise is to find faults. However, if you’ve filtered well and aren’t left with a huge number of lines, you are better off trying to establish what the application was trying to do when it went wrong, rather than hunting for the fault itself.
The Results Column
The Result column records the outcome of the action detailed on that line. It’s not always obvious what a given result implies. Here’s a rough guide to the main outcomes; there are others, but these are the ones I’ve encountered most.
Success – this means the operation completed successfully. I’ve read some troubleshooting guides that suggest filtering out all the successful results to concentrate on faults. I don’t like this approach, because without the successes you can’t understand what is happening at any given time.
Access Denied – this means exactly what you’d expect: the user doesn’t have permission to access that registry key or file. Access Denied results are always worth investigating.
Buffer Overflow – this sounds really scary, and the first time I saw it I thought there was a virus. In most cases it’s an application deliberately reading a registry value into a buffer that’s too small, so it can find out how big the value is before reading it properly. I wouldn’t worry too much about these results.
Name Not Found – again, quite a common result; it means that a file or registry setting Windows was looking for does not exist. Most of the time this isn’t a problem, but sometimes it is. Unfortunately, that means you need to work out which ones matter.
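With a filtered CSV export, a quick tally of the Result column shows what you’re dealing with. It can also pull out the rows the guide above says deserve attention: Access Denied always, Name Not Found sometimes. This is a minimal sketch. It assumes the export uses the uppercase labels ProcMon displays (SUCCESS, ACCESS DENIED, NAME NOT FOUND, BUFFER OVERFLOW), and the rows are made up.

```python
from collections import Counter

# Rows are assumed to be shaped like a ProcMon CSV export (labels assumed uppercase).
# BUFFER OVERFLOW is deliberately left out because it is usually harmless.
SUSPICIOUS = {"ACCESS DENIED", "NAME NOT FOUND"}


def summarise_results(rows):
    """Count each Result value and list suspicious rows, Access Denied first."""
    counts = Counter(r["Result"] for r in rows)
    flagged = [r for r in rows if r["Result"] in SUSPICIOUS]
    # False sorts before True, so ACCESS DENIED rows come first; sort() is stable.
    flagged.sort(key=lambda r: r["Result"] != "ACCESS DENIED")
    return counts, flagged


rows = [
    {"Path": r"HKLM\Software\Example", "Result": "SUCCESS"},
    {"Path": r"HKLM\Software\Example\Setting", "Result": "BUFFER OVERFLOW"},
    {"Path": r"C:\Temp\missing.dll", "Result": "NAME NOT FOUND"},
    {"Path": r"C:\Data\settings.ini", "Result": "ACCESS DENIED"},
    {"Path": r"HKLM\Software\Example\Setting", "Result": "SUCCESS"},
]
counts, flagged = summarise_results(rows)
print(counts.most_common())
for r in flagged:
    print(r["Result"], r["Path"])
```

The successes stay in `rows` for context; only the review list is narrowed.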
Double-click a line in ProcMon and it will bring up its properties. One of the tabs is the call stack. When I first started using ProcMon I spent a lot of time trying to get symbols working from behind a proxy and looking at the call stack. My advice: don’t bother unless you’re an OS developer. You’ll drive yourself crazy, and there are better ways to spend your time. I know Mark Russinovich solves a couple of issues using the call stack in his Cases of the Unexplained, but it is not for mere mortals.
Look out for log files
When an application crashes it often creates a log file but doesn’t bother to tell you about it. When you’re looking through your trace, search for .txt and .log files to see whether the application is logging the issue. This can give you an error code and description to Google, and an accurate time to concentrate your search around.
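The same search can be run over an exported CSV. This minimal sketch lists writes to .log and .txt files together with their timestamps. Restricting it to `WriteFile` operations is my own assumption, meant to catch files being written rather than just read. Drop that check if you want the broader search described above. Column names follow ProcMon’s default CSV headers as I understand them, and the rows are made up.

```python
# Rows are assumed to be shaped like a ProcMon CSV export (column names assumed).
LOG_EXTENSIONS = (".log", ".txt")


def log_file_writes(rows):
    """Return (time, process, path) for every write to a .log or .txt file."""
    return [
        (r["Time of Day"], r["Process Name"], r["Path"])
        for r in rows
        if r["Operation"] == "WriteFile" and r["Path"].lower().endswith(LOG_EXTENSIONS)
    ]


rows = [
    {"Time of Day": "3:45:11.9000000 PM", "Process Name": "app.exe",
     "Operation": "ReadFile", "Path": r"C:\Program Files\App\readme.txt"},
    {"Time of Day": "3:45:12.0100000 PM", "Process Name": "app.exe",
     "Operation": "WriteFile", "Path": r"C:\Users\me\AppData\Local\App\Error.LOG"},
    {"Time of Day": "3:45:12.0200000 PM", "Process Name": "app.exe",
     "Operation": "WriteFile", "Path": r"C:\Users\me\AppData\Local\App\cache.dat"},
]
for when, process, path in log_file_writes(rows):
    print(when, process, path)
```

The timestamp of the last write to a log file is a good anchor for the time window you then examine in detail.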
It’s useful if you can capture two traces: one of the application performing a task correctly and one of it faulting. That way you can compare the traces to help locate the issue. Look out for differences in the Result column between the two, especially Name Not Found and Access Denied. Be warned, though: traces of a process doing the same thing twice are rarely exactly the same. I tried exporting two traces into two columns in Excel and comparing them with the EXACT function. It wasn’t helpful.
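A line-by-line comparison breaks down because the order and number of events differ between runs. One alternative, which is my own suggestion and not something from the post, is to ignore time and PID entirely. For each (process, operation, path), record which results occurred in the good and the bad trace. Then report the results that appear only in the bad trace. This is a minimal sketch under those assumptions. Column names are assumed from ProcMon’s default CSV export, and the rows are made up.

```python
from collections import defaultdict


# Rows are assumed to be shaped like a ProcMon CSV export (column names assumed).
def results_by_action(rows):
    """Map (process, operation, path) -> set of results, ignoring time and PID."""
    table = defaultdict(set)
    for r in rows:
        key = (r["Process Name"].lower(), r["Operation"], r["Path"].lower())
        table[key].add(r["Result"])
    return table


def result_differences(good_rows, bad_rows):
    """List actions in the bad trace whose results never occurred in the good one."""
    good = results_by_action(good_rows)
    bad = results_by_action(bad_rows)
    diffs = []
    for key in sorted(bad):
        seen_in_good = good.get(key, set())
        new_results = bad[key] - seen_in_good
        if new_results:
            diffs.append((key, sorted(seen_in_good), sorted(new_results)))
    return diffs


good_rows = [
    {"Process Name": "app.exe", "Operation": "CreateFile",
     "Path": r"C:\Data\settings.ini", "Result": "SUCCESS"},
    {"Process Name": "app.exe", "Operation": "RegQueryValue",
     "Path": r"HKLM\Software\Example\Mode", "Result": "SUCCESS"},
]
bad_rows = [
    {"Process Name": "app.exe", "Operation": "RegQueryValue",
     "Path": r"HKLM\Software\Example\Mode", "Result": "SUCCESS"},
    {"Process Name": "app.exe", "Operation": "CreateFile",
     "Path": r"C:\Data\settings.ini", "Result": "ACCESS DENIED"},
]
for key, before, after in result_differences(good_rows, bad_rows):
    print(key, before, "->", after)
```

The two sample traces list the same actions in a different order, which would trip up a row-by-row EXACT check. The set comparison still finds only the one real difference.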
I hope this helps you start using ProcMon. In the next post I’ll show a real-world example of using ProcMon to resolve an issue.