Identifying crashes with the Windows Event Log

By , February 12, 2020 4:43 pm

It’s an unfortunate and inevitable fact that while developing software sometimes your software will crash. This also happens, sometimes, hopefully very infrequently, in production code. Each time this happens Windows stores some information about each crash in the Windows Event Log, along with a multitude of other event information it logs.

In this article I’m going to explain two event log entry types which encode crashes, and how to read them. Then I’ll also introduce some tools that take the drudgery out of converting this information into symbol, filename and line number.

The Windows Event Log

The Windows event log can be viewed using Microsoft’s Event Viewer. Just type “Event Viewer” in the start menu search box and press return. That should start it. Crash information is stored in the sub category “Application” under “Windows Logs”. The two event sources that describe crashes are Windows Error Reporting and Application Error.


The image above shows a Windows Error Reporting event has been selected. The human readable form is shown below in the General tab. Although I say human readable, it really is unintelligible gibberish. None of the fields are identified and you have nothing to work with. The details tab isn’t any better – the raw data is present in text or XML form. Here’s the XML for the crash shown above.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Windows Error Reporting" /> 
    <EventID Qualifiers="0">1001</EventID> 
    <Level>4</Level> 
    <Task>0</Task> 
    <Keywords>0x80000000000000</Keywords> 
    <TimeCreated SystemTime="2020-02-12T10:09:34.000000000Z" /> 
    <EventRecordID>260507</EventRecordID> 
    <Channel>Application</Channel> 
    <Computer>hydra</Computer> 
    <Security /> 
  </System>
  <EventData>
    <Data>2023787729086567941</Data> 
    <Data>1</Data> 
    <Data>APPCRASH</Data> 
    <Data>Not available</Data> 
    <Data>0</Data> 
    <Data>testDeliberateCrash.exe</Data> 
    <Data>1.0.0.1</Data> 
    <Data>5e419525</Data> 
    <Data>testDeliberateCrash.exe</Data> 
    <Data>1.0.0.1</Data> 
    <Data>5e419525</Data> 
    <Data>c0000005</Data> 
    <Data>000017b2</Data> 
    <Data /> 
    <Data /> 
    <Data>C:\Users\stephen\AppData\Local\Temp\WERC24C.tmp.WERInternalMetadata.xml</Data> 
  <Data>C:\Users\stephen\AppData\Local\Microsoft\Windows\WER\ReportArchive\AppCrash_testDeliberateCr_c31b903842d94a84d4621dceaac377462674f7a_eb589596_139ec4bd</Data> 
    <Data /> 
    <Data>0</Data> 
    <Data>c3d360b2-4d7f-11ea-83d3-001e4fdb3956</Data> 
    <Data>0</Data> 
    <Data>54756af49aec84f97c15f03794ffd605</Data> 
  </EventData>
</Event>

There’s quite a bit of data in here, the purpose of each field implied, not stated. Towards the end is some information related to minidumps, but if you go searching for it, the minidump will no longer be present.

The format for Application Error crashes is different.

Windows Error Reporting

The event log data for a Windows Error Reporting event contains many fields that we don’t need if we’re just investigating a crash address. Each event starts with an <Event> tag and ends with an </Event> tag.

We need to correctly identify the event. Inside the event is a <System> tag which contains a tag with an attribute “Provider Name” set to “Windows Error Reporting”.

Once the event is identified we need to find the <EventData> tag inside the event. The <EventData> contains 14 <Data> tags. These tags are present:


  • 1. Timestamp.

  • 2. Number of data items.

  • 3. Information Type.

  • 4. Information Status.

  • 5. Unknown.

  • 6. Crashing executable.

  • 7. Executable version.

  • 8. Executable timestamp.

  • 9. Crashing DLL. This will be the same as 6 if the crash is in the .exe.

  • 10. DLL version. This will be the same as 7 if the crash is in the .exe.

  • 11. DLL timestamp. This will be the same as 8 if the crash is in the .exe.

  • 12. Exception code.

  • 13. Fault offset.

  • 14. Class. This may or may not be present

Information Type is normally “APPCRASH”. In this case we’re interested in tags 9, 12 and 13.

If Information Type is “BEX”, the data is different:


  • 1. Timestamp.

  • 2. Number of data items.

  • 3. Information Type.

  • 4. Information Status.

  • 5. Unknown.

  • 6. Crashing executable.

  • 7. Executable version.

  • 8. Executable timestamp.

  • 9. Crashing DLL. This will be the same as 6 if the crash is in the .exe.

  • 10. DLL version. This will be the same as 7 if the crash is in the .exe.

  • 11. DLL timestamp. This will be the same as 8 if the crash is in the .exe.

  • 12. Fault offset.

  • 13. Exception code.

  • 14. Class. This may or may not be present

Note that the order of the fault offset and exception code has been reversed compared to APPCRASH.

Of these tags we’re interested in tags 9, 12 and 13.

If we want to version the crashing DLL we also need tags 10 and 11.

Application Error

The event log data for an Application Error event contains many fields that we don’t need if we’re just investigating a crash address. Each event starts with an <Event> tag and ends with an </Event> tag.

We need to correctly identify the event. Inside the event is a <System> tag which contains a tag with an attribute “Provider Name” set to “Application Error”.

Once the event is identified we need to find the <EventData> tag inside the event. The <EventData> contains at least 12 <Data> tags, some of which may not be present, or which may be empty. These tags are present:


  • 1. Crashing executable.

  • 2. Executable version.

  • 3. Executable timestamp.

  • 4. Crashing DLL. This will be the same as 1 if the crash is in the .exe.

  • 5. DLL version. This will be the same as 2 if the crash is in the .exe.

  • 6. DLL timestamp. This will be the same as 3 if the crash is in the .exe.

  • 7. Exception code.

  • 8. Fault offset.

  • 9. Process id.

  • 10. Application start timestamp.

  • 11. Application path.

  • 12. Module path.

Of these tags we’re interested in tags 7, 8 and 12.

If we want to version the crashing DLL we also need tags 5 and 6. If 12 isn’t available, use 4.

Removing the drudgery

The previous two sections have described which fields to extract data from. If you’re doing this manually this is tedious and error prone. You have to select the correct values from the correct fields and then use another application to turn them into a symbol, filename and line number. Our tools DbgHelpBrowser and MapFileBrowser are designed to take a crash offset inside a DLL and turn it into a human readable symbol, filename and line number. But that still requires you to do the hard work of fishing the correct data out of the XML dump.

Now there is a better way, we’ve added an extra option to these tools that allows you to paste the entire XML data from a crash event and the tool then extracts the data it needs to show you the symbol, filename and line number.

DbgHelpBrowser

Load the crashing exe (or DLL) into DbgHelpBrowser. This will cause the symbols to be loaded for the DLL (assuming symbols have been created and can be found). We’re not covering versioning the DLL as most likely you will have your own methods for this.

Choose the option Find Symbol from Event Viewer XML crash log… on the Query menu. The Event Viewer Crash Data dialog is displayed.


Paste the XML data into the dialog and click OK.


The main display will select the appropriate symbol in the main grid and display the relevant symbol, filename, line number and source code in the source code viewer below.


MapFileBrowser

Load the MAP file for the crashing exe (or DLL) into MapFileBrowser. We’re not covering versioning the DLL as most likely you will have your own methods for this.

Choose the option Find Symbol from Event Viewer XML crash log… on the Query menu. The Event Viewer Crash Data dialog is displayed.


Paste the XML data into the dialog and click OK.


The main display will select the appropriate symbol in the main grid and display the relevant symbol, filename, line number and source code in the source code viewer below.


Conclusion

Windows Event Logs can be hard to read and error prone to use. However when paired with suitable tools you can quickly and easily turn event log crashes into useful symbol, filename and line number information to inform your debugging efforts.

Monitoring a service with the NT Service API

By , February 11, 2020 5:21 pm

Debugging services is a pain. There is a lot that can go wrong and very little you can do to find out what went wrong. Perfect! Just what you need for an easy day at work. Services run in a restricted environment, these days you also need to be Administrator to do anything with them, and getting your favourite software tool which isn’t a debugger working with them is hard. I remember years ago seeing the list of things you needed to do to get NuMega’s BoundsChecker to work with services. It was a couple of web pages of instructions, each line containing a detailed step. You had to do all of the actions correctly in order to set things up to work with services.

These days Microsoft have changed the security landscape and it’s no longer possible to launch your data monitoring software tool from a service as that ability is correctly regarded as a security vulnerability. It’s also pretty much impossible to inject into a service from a GUI application. As a result the correct way to work with services is to add a few lines of glue code, in the form of calls to an API that setup communications with an already running user interface.

We’ve described our updated NT Service API in a previous article, so in this article I’m going to talk about the using the API to track errors in the service code calling the API and also describe how you use the user interface to work with services. This article will focus on C++ Memory Validator, but the techniques described here will also work for C++ Coverage Validator, C++ Performance Validator and C++ Thread Validator. If you’re using a .Net service, or a mixed mode service with a .Net entry point you don’t need to use the API, but the GUI parts of this article will still apply to you. If you using a native service or a mixed mode service with a native entry point all of this article applies to you.

Monitoring a Service

Before we get into the error codes and error handling in the GUI, let’s first take a tour of how things should work if everything goes to plan. This will provide some context for the errors I’m going to describe later. I’m going to assume you’ve built both the example service and the example service client, and that you’ve installed the service (serviceMV.exe -install in an Administrator mode command prompt). The service client passes a string to the service, which reverses it and passes it back to the client. The service also deliberately leaks some memory for testing purposes.

Here’s a video of the process.

From the Launch menu, choose Monitor a Service.


The Monitor a Service dialog is displayed.


Enter the full path to your service and click OK to start monitoring. The Validator will now setup some environment variables and some data in the registry that will be used by the service API. After a few seconds the Start your Service dialog appears.


Click OK, then start your service (you’ll need to do this from an Administrator command prompt).

serviceMV -start

The Validator attaches to the service and after a few moments various status information in the Validator title bar and the Validator status bar updated.

It is possible that you may get a debug information informational dialog displayed. You can dismiss this (it can be viewed from the Validator Tools menu). To change how symbols are found you’ll need to look at the Symbol Server and File Locations parts of the Validator settings dialog.


Next a dialog is displayed informing you that Administrator Privileges may be required.


For some services you may find that the Validator gets better data, or sends data to the GUI faster if the Validator is run in Administrator mode. If that is the case you’ll need to restart the Validator with Administrator privileges (and also stop and restart the service, etc).

For this particular example service, we don’t need Administrator privileges so we’ll continue without them.

Now we can interact with the service from the service client by sending a string to the service. The service reverses it and sends it back.

serviceClient "Hello World"


Once we’re done working with the service we can stop it (you’ll need to do this from an Administrator command prompt).

serviceMV -stop

The Validator disconnects from the service and displays all the data it has collected from the service.


That’s how it looks when everything goes according to plan.

What happens when things go wrong? That’s what the next section is about.

Tracking errors in the service

The various API functions return a SVL_SERVICE_ERROR error code. We’ve extended this code so that you can detect when the user has forgotten to do something prior to starting the service, or you can detect if various other error conditions have occurred. Some of these error codes are internal error codes and should never be seen by a customer, but we’re documenting them here for completeness.


  • SVL_FAIL_PATHS_DO_NOT_MATCH. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_INCORRECT_PRODUCT_PREFIX. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_X86_VALIDATOR_FOUND_EXPECTED_X64_VALIDATOR. Looks like you’re monitoring a 64 bit service with a 32 bit Validator. You need to use a 64 bit Validator.

  • SVL_FAIL_X64_VALIDATOR_FOUND_EXPECTED_X86_VALIDATOR. Looks like you’re monitoring a 32 bit service with a 64 bit Validator with the svl*VStubService.lib library. You need to use a 64 bit Validator with the svl*VStubService6432.lib.

  • SVL_FAIL_DID_YOU_MONITOR_A_SERVICE_FROM_VALIDATOR. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_ENV_VAR_NOT_FOUND. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_ENV_VAR_NOT_FOUND. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_ID_NOT_SPECIFIED. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_ID_NOT_A_PROCESS. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_NOT_FOUND. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

To aid in debugging we strongly recommend that you log all error codes (successful or failure) from Software Verify API calls. This will allow you to track down errors rapidly rather a series of trial error coding mistakes or a back and forth with support but with no information to help support. We added all of the above error codes after 3 customers all reported similar, but different problems with using the service API. All of their problems would be have been solved if these error codes had been available.

Error codes can be logged with this call.

void writeToLogFile(const wchar_t     *fileName,
                    SVL_SERVICE_ERROR errCode);

Helpful messages can be logged with this call.

void writeToLogFile(const wchar_t *fileName,
                    const wchar_t *text);

Error codes can be turned into human readable messages with this call.

const wchar_t *getTextForErrorCode(SVL_SERVICE_ERROR	errorCode);

And if you need to log Windows error codes, use this call.

void writeToLogFileLastError(const wchar_t *fileName,
                             DWORD         errCode);

See the help documentation for all the available API calls.

Tracking errors in the GUI

There are a couple of mistakes that can be made in the user interface. These are related to monitoring the wrong type of service, and the location of the service. Where it is possible to identify this error in the GUI, we will do so. Where it is not, the error codes described above will help you understand the mistake that has been made.

64 bit Service, 32 bit GUI

If you try to monitor a 64 bit service with a 32 bit GUI that will fail. We can detect this and prevent this. When this error happens you will be shown an error dialog similar to this.


Note that monitoring a 32 bit service with a 64 bit GUI is OK, but you need to use the svl*VStubService6432.lib not the svl*VStubService.lib. We can’t detect this from the GUI, which is why the SVL_FAIL_X64_VALIDATOR_FOUND_EXPECTED_X86_VALIDATOR error code exists – you will get this if you are linked to svl*VStubService.lib when you should be linked to svl*VStubService6432.lib.

Service on a network share

Windows won’t let you start a service on a network share. And yet I’ve lost count of the number of times I’ve tried to do this. This is typically because I have the solution working on machine X (where I wrote it) and wish to test on machine Y, and I just use a network share to map it across. This works for applications and fails for services. This can be a real time waster and Windows isn’t exactly helpful about this, and of course it’s in a service’s startup code, so fun debugging that.

To make this failure easier to detect we check the path of the service you specify in the Monitor a Service dialog and determine if the service is on a network share. If it is we tell you we can’t work with it. This then alerts you to the fact you’ll need to copy that service locally to run tests on it. Probably and hour or two of your time saved, right there.


Conclusion

Working with services can be fraught with problems, but if you log your error codes you can easily and quickly identify any errors made configuring your use of the NT Service API that we were unable to catch with the Validator user interface.

Panorama Theme by Themocracy