Detecting Abandoned Critical Sections

By , March 6, 2020 4:24 pm

Multithreading is a powerful way to improve the processing throughput and responsiveness of your software. We use it to great effect at Software Verify. In order to manage multithreading successfully it’s necessary to use some form of synchronization between each thread that wishes to read/write data. Deadlocks can result. The main cause of deadlocks is two or more locks (critical section being an example) accessed in different orders on each thread. This has been the subject of much writing, so for now I won’t repeat that topic here.

There is another cause of deadlock which is less well known. The abandoned critical section.

In this article I’m going to describe how to detect abandoned critical sections. But first I need to describe them to you and explain how abandoned critical sections get created.

What is an Abandoned Critical Section?

An abandoned critical section is a critical section that has been locked but then the thread that owns the lock ends without unlocking the critical section. This creates a critical section that cannot be unlocked, and is thus permanently locked. If any other thread attempts to enter the critical section it will wait forever, in a deadlock caused by an infinite wait.

How does this happen?

There are several ways that a critical section can become abandoned.

  • Incorrect code.
  • Incorrect exception handling.
  • Terminate Thread.

Incorrect code

This is where the thread code enters a critical section to do some work and forgets to unlock the critical section. Then the thread exits. If you use object oriented code (CSingleLock for example) to manage the lifetime of critical section ownership then this problem should never happen. But if you manually control the locking, using say, CCriticalSection::Lock() and CCriticalSection::Unlock(), or EnterCriticalSection(&cs) and LeaveCriticalSection(&cs) then it’s possible for you to forget to leave a locked CS, or for a logic failure to result in a critical section not being locked.

If you’re using object oriented synchronization locking methods you might want to look at Thread Lock Checker to automate checking for some simple and common errors that can happen.

DWORD doThread(void	*param)
{
	EnterCriticalSection(&dataCS);
	
	doWork(data);
	
	return 0;	// forgot to call LeaveCriticalSection(&dataCS);
}

Incorrect exception handling

This is where some code in a thread is protected by an exception handler (you’re calling a 3rd party library, or working with data of unknown integrity) and a critical section is locked when an exception is thrown. In an ideal world the exception handler will leave that locked critical section. Unfortunately the writer of the exception handler may not known about the critical section, or they may have forgotten about it – either way the locked critical section doesn’t get unlocked. As with the previous case, if you use object oriented access to critical sections (CCriticalSection, CSingleLock) the process of unwinding the stack during the exception handling should automatically unlock these locks. This won’t happen if you’re using CRITICAL_SECTIONs with the Win32 API.

DWORD doThread(void	*param)
{
	__try
	{
		EnterCriticalSection(&dataCS);
	
		doWork(data);	// something inside here throws an exception. 
	
		LeaveCriticalSection(&dataCS);
	}
	__except(EXCEPTION_EXECUTE_HANDLER)
	{
		// forgot to call LeaveCriticalSection(&dataCS);
	}
	
	return 0;	
}

Terminate Thread

This is where a thread that is doing some work that has accessed some critical sections is killed by another thread calling TerminateThread(). There are occasions where TerminateThread() can be useful, but this is a last ditch method for dealing with threads. If your code is using TerminateThread() to manage your own threads why not spend some time to work out how not to use TerminateThread and to make your threads end normally (by exiting the thread or calling ExitThread()).

// correctly written thread

DWORD doThread(void	*param)
{
	EnterCriticalSection(&dataCS);
	
	doWork(data);
	
	LeaveCriticalSection(&dataCS);
	
	return 0;
}

void mainThread()
{
	HANDLE	hThread;
	DWORD	threadId;
	
	hThread = CreateThread(NULL, 0, doThread, NULL, 0, &threadId);
	if (hThread != NULL)
	{
		doSomeWork();
		
		TerminateThread(hThread, 0); // this is a bit brutal
		CloseHandle(hThread);
	}
}

How to detect Abandoned Critical Sections?

We have two ways to detect Abandoned Critical Sections.

  • Thread Wait Chain Inspector
  • Thread Validator

Thread Wait Chain Inspector

Thread Wait Chain Inspector is a free software tool that we wrote that uses the Win32 Wait Chain API to identify various wait chain states of the locks and waits in a given application. Just select the application in question and look at the results.


This tool tells you process ids and thread ids, but it can’t give you symbols, filenames and line numbers. It will provide thread names if you’re working on Windows 10 and you’ve named your threads using the SetThreadDescription() API.

Thread Validator

Thread Validator is our thread analysis software tool for analysing thread synchronization problems, deadlocks, busy locks, slow locks, contended locks and recursing locks. We’ve recently added some reporting options to Thread Validator will help you identify the location of abandoned critical sections.

I’ve used the tvExample demonstration application that ships with Thread Validator (you’ll need to build) to deliberately create two abandoned critical sections. From the test menu choose “Exit thread with a locked critical section” and “Terminate thread with a locked critical section”.

The summary display will show an abandoned count of 2 in the Errors panel.


The various locks displays will colour the abandoned thread dark purple and list the Lock status as Abandoned


If you click the Abandoned bar in the Errors panel, the display will move to the Analysis tab and the callstacks for the abandoned critical sections will be displayed.


Expanding each entry reveals the callstacks so that you can see where see where each critical section is abandoned. Note that each entry shows two callstacks. The first is where the critical section was created. The second is where the critical section was abandoned. You can expand any entry on any callstack to see the source code.

Abandoned because of thread exit


Abandoned because of TerminateThread()


Expanding the callstack entries to reveal the source code…


Conclusion

Abandoned Critical Sections are bad news. They cause deadlocks. But they don’t need to be hard to track down when you’ve got the right tools to put to work.

Turbo Debugger Symbols Viewer

By , March 6, 2020 12:48 pm

If you’re using Delphi or 32 bit C++ Builder your compiler/linker produces symbols in TDS format. TDS means Turbo Debugger Symbols – it’s an old naming convention from the days of the Turbo compilers.

There are occasions when it’s useful to know what’s inside the symbol file. Is the symbol name as I expected? Or is it mangled to something else? Is the symbol name in the debug info? If not, then maybe the compiler optimised that function out of existence (it does happen).

We’d already written a symbol file viewer for Visual Studio PDB files. It seemed logical to write a similar tool for TDS symbols. That tool is TDS Browser.

TDS symbols can be stored in the executable to which they relate, or in a separate TDS file. The 64 bit version of Delphi doesn’t provide an option for symbols in an external TDS file, which is odd, as no one wants to ship symbols with their executable. But hey, Embarcadero must have their reasons, right? As a result to view symbols with TDS Browser you can specify the TDS file or the executable file, either way we’ll load the symbols if they exist.

Symbol name mangling is different for 64 bit symbols, so we’ve provided an option to see the raw, unmangled symbol name for the occasions when you need that.

You can sort the symbols by any column, and reverse the sort by clicking the same column again. This is useful for finding symbols with source files – sort by filename. Select a symbol and we’ll show you the source code and line numbers if we can find the source code (not easy for source files that don’t have complete paths).

We provided options for resolving addresses into symbols for the 4 main use cases:

  • Crash at absolute address in DLL.
  • Crash at relative address in DLL.
  • Crash at symbol relative address.
  • Crash data from Windows Event Log in XML format.

The above options are on the Query menu.


If the crash address can be resolved into a symbol, the symbol is selected, and the relevant source code and line number information is displayed. A context menu also provides additional options for highlighing symbols and copying symbol related information to the clipboard.


Here’s a short video showing how to use TDS Browser.

An easier way to view crashes in the Windows Event Log

By , March 6, 2020 11:45 am

In a previous article I wrote about how to identify crashes in the Windows Event Log.

You need to use the Windows Event Viewer, inspect each entry looking for some keywords then decode the XML data to get the information you want. All a bit slow, tedious and error prone.

So we wrote a tool to do that for you: Event Log Crash Browser.

Event Log Crash Browser scans your event log looking for crash events, then picks out only the information that is useful:

  • The executable.
  • The DLL that crashed (if it did crash in a DLL, rather than non-DLL memory).
  • The exception code.
  • The offset into the DLL of the crash location (or the location in memory for non-DLL crashes).
  • We also read the version information from the DLL so that we can identify the company responsible for the DLL that crashed.

You can sort on any column, and filter by exception type, executable and DLL.

It’s a really easy way to see what failures are happening on your machine. A lot more convenient than Windows Event Viewer. Looking at this machine I can see that the Visual Studio compiler, linker and IDE crash from time to time. I can also see that the WMI provider service dies quite often from a heap corruption – this is a core bit of Microsoft technology that has problems. Most of the other failures are related to the software under development and test on this machine.

Identifying crashes with the Windows Event Log

By , February 12, 2020 4:43 pm

It’s an unfortunate and inevitable fact that while developing software sometimes your software will crash. This also happens, sometimes, hopefully very infrequently, in production code. Each time this happens Windows stores some information about each crash in the Windows Event Log, along with a multitude of other event information it logs.

In this article I’m going to explain two event log entry types which encode crashes, and how to read them. Then I’ll also introduce some tools that take the drudgery out of converting this information into symbol, filename and line number.

The Windows Event Log

The Windows event log can be viewed using Microsoft’s Event Viewer. Just type “Event Viewer” in the start menu search box and press return. That should start it. Crash information is stored in the sub category “Application” under “Windows Logs”. The two event sources that describe crashes are Windows Error Reporting and Application Error.


The image above shows a Windows Error Reporting event has been selected. The human readable form is shown below in the General tab. Although I say human readable, it really is unintelligible gibberish. None of the fields are identified and you have nothing to work with. The details tab isn’t any better – the raw data is present in text or XML form. Here’s the XML for the crash shown above.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Windows Error Reporting" /> 
    <EventID Qualifiers="0">1001</EventID> 
    <Level>4</Level> 
    <Task>0</Task> 
    <Keywords>0x80000000000000</Keywords> 
    <TimeCreated SystemTime="2020-02-12T10:09:34.000000000Z" /> 
    <EventRecordID>260507</EventRecordID> 
    <Channel>Application</Channel> 
    <Computer>hydra</Computer> 
    <Security /> 
  </System>
  <EventData>
    <Data>2023787729086567941</Data> 
    <Data>1</Data> 
    <Data>APPCRASH</Data> 
    <Data>Not available</Data> 
    <Data>0</Data> 
    <Data>testDeliberateCrash.exe</Data> 
    <Data>1.0.0.1</Data> 
    <Data>5e419525</Data> 
    <Data>testDeliberateCrash.exe</Data> 
    <Data>1.0.0.1</Data> 
    <Data>5e419525</Data> 
    <Data>c0000005</Data> 
    <Data>000017b2</Data> 
    <Data /> 
    <Data /> 
    <Data>C:\Users\stephen\AppData\Local\Temp\WERC24C.tmp.WERInternalMetadata.xml</Data> 
  <Data>C:\Users\stephen\AppData\Local\Microsoft\Windows\WER\ReportArchive\AppCrash_testDeliberateCr_c31b903842d94a84d4621dceaac377462674f7a_eb589596_139ec4bd</Data> 
    <Data /> 
    <Data>0</Data> 
    <Data>c3d360b2-4d7f-11ea-83d3-001e4fdb3956</Data> 
    <Data>0</Data> 
    <Data>54756af49aec84f97c15f03794ffd605</Data> 
  </EventData>
</Event>

There’s quite a bit of data in here, the purpose of each field implied, not stated. Towards the end is some information related to minidumps, but if you go searching for it, the minidump will no longer be present.

The format for Application Error crashes is different.

Windows Error Reporting

The event log data for a Windows Error Reporting event contains many fields that we don’t need if we’re just investigating a crash address. Each event starts with an <Event> tag and ends with an </Event> tag.

We need to correctly identify the event. Inside the event is a <System> tag which contains a tag with an attribute “Provider Name” set to “Windows Error Reporting”.

Once the event is identified we need to find the <EventData> tag inside the event. The <EventData> contains 14 <Data> tags. These tags are present:


  • 1. Timestamp.

  • 2. Number of data items.

  • 3. Information Type.

  • 4. Information Status.

  • 5. Unknown.

  • 6. Crashing executable.

  • 7. Executable version.

  • 8. Executable timestamp.

  • 9. Crashing DLL. This will be the same as 6 if the crash is in the .exe.

  • 10. DLL version. This will be the same as 7 if the crash is in the .exe.

  • 11. DLL timestamp. This will be the same as 8 if the crash is in the .exe.

  • 12. Exception code.

  • 13. Fault offset.

  • 14. Class. This may or may not be present

Information Type is normally “APPCRASH”. In this case we’re interested in tags 9, 12 and 13.

If Information Type is “BEX”, the data is different:


  • 1. Timestamp.

  • 2. Number of data items.

  • 3. Information Type.

  • 4. Information Status.

  • 5. Unknown.

  • 6. Crashing executable.

  • 7. Executable version.

  • 8. Executable timestamp.

  • 9. Crashing DLL. This will be the same as 6 if the crash is in the .exe.

  • 10. DLL version. This will be the same as 7 if the crash is in the .exe.

  • 11. DLL timestamp. This will be the same as 8 if the crash is in the .exe.

  • 12. Fault offset.

  • 13. Exception code.

  • 14. Class. This may or may not be present

Note that the order of the fault offset and exception code has been reversed compared to APPCRASH.

Of these tags we’re interested in tags 9, 12 and 13.

If we want to version the crashing DLL we also need tags 10 and 11.

Application Error

The event log data for an Application Error event contains many fields that we don’t need if we’re just investigating a crash address. Each event starts with an <Event> tag and ends with an </Event> tag.

We need to correctly identify the event. Inside the event is a <System> tag which contains a tag with an attribute “Provider Name” set to “Application Error”.

Once the event is identified we need to find the <EventData> tag inside the event. The <EventData> contains at least 12 <Data> tags, some of which may not be present, or which may be empty. These tags are present:


  • 1. Crashing executable.

  • 2. Executable version.

  • 3. Executable timestamp.

  • 4. Crashing DLL. This will be the same as 1 if the crash is in the .exe.

  • 5. DLL version. This will be the same as 2 if the crash is in the .exe.

  • 6. DLL timestamp. This will be the same as 3 if the crash is in the .exe.

  • 7. Exception code.

  • 8. Fault offset.

  • 9. Process id.

  • 10. Application start timestamp.

  • 11. Application path.

  • 12. Module path.

Of these tags we’re interested in tags 7, 8 and 12.

If we want to version the crashing DLL we also need tags 5 and 6. If 12 isn’t available, use 4.

Removing the drudgery

The previous two sections have described which fields to extract data from. If you’re doing this manually this is tedious and error prone. You have to select the correct values from the correct fields and then use another application to turn them into a symbol, filename and line number. Our tools DbgHelpBrowser and MapFileBrowser are designed to take a crash offset inside a DLL and turn it into a human readable symbol, filename and line number. But that still requires you to do the hard work of fishing the correct data out of the XML dump.

Now there is a better way, we’ve added an extra option to these tools that allows you to paste the entire XML data from a crash event and the tool then extracts the data it needs to show you the symbol, filename and line number.

DbgHelpBrowser

Load the crashing exe (or DLL) into DbgHelpBrowser. This will cause the symbols to be loaded for the DLL (assuming symbols have been created and can be found). We’re not covering versioning the DLL as most likely you will have your own methods for this.

Choose the option Find Symbol from Event Viewer XML crash log… on the Query menu. The Event Viewer Crash Data dialog is displayed.


Paste the XML data into the dialog and click OK.


The main display will select the appropriate symbol in the main grid and display the relevant symbol, filename, line number and source code in the source code viewer below.


MapFileBrowser

Load the MAP file for the crashing exe (or DLL) into MapFileBrowser. We’re not covering versioning the DLL as most likely you will have your own methods for this.

Choose the option Find Symbol from Event Viewer XML crash log… on the Query menu. The Event Viewer Crash Data dialog is displayed.


Paste the XML data into the dialog and click OK.


The main display will select the appropriate symbol in the main grid and display the relevant symbol, filename, line number and source code in the source code viewer below.


Conclusion

Windows Event Logs can be hard to read and error prone to use. However when paired with suitable tools you can quickly and easily turn event log crashes into useful symbol, filename and line number information to inform your debugging efforts.

Monitoring a service with the NT Service API

By , February 11, 2020 5:21 pm

Debugging services is a pain. There is a lot that can go wrong and very little you can do to find out what went wrong. Perfect! Just what you need for an easy day at work. Services run in a restricted environment, these days you also need to be Administrator to do anything with them, and getting your favourite software tool which isn’t a debugger working with them is hard. I remember years ago seeing the list of things you needed to do to get NuMega’s BoundsChecker to work with services. It was a couple of web pages of instructions, each line containing a detailed step. You had to do all of the actions correctly in order to set things up to work with services.

These days Microsoft have changed the security landscape and it’s no longer possible to launch your data monitoring software tool from a service as that ability is correctly regarded as a security vulnerability. It’s also pretty much impossible to inject into a service from a GUI application. As a result the correct way to work with services is to add a few lines of glue code, in the form of calls to an API that setup communications with an already running user interface.

We’ve described our updated NT Service API in a previous article, so in this article I’m going to talk about the using the API to track errors in the service code calling the API and also describe how you use the user interface to work with services. This article will focus on C++ Memory Validator, but the techniques described here will also work for C++ Coverage Validator, C++ Performance Validator and C++ Thread Validator. If you’re using a .Net service, or a mixed mode service with a .Net entry point you don’t need to use the API, but the GUI parts of this article will still apply to you. If you using a native service or a mixed mode service with a native entry point all of this article applies to you.

Monitoring a Service

Before we get into the error codes and error handling in the GUI, let’s first take a tour of how things should work if everything goes to plan. This will provide some context for the errors I’m going to describe later. I’m going to assume you’ve built both the example service and the example service client, and that you’ve installed the service (serviceMV.exe -install in an Administrator mode command prompt). The service client passes a string to the service, which reverses it and passes it back to the client. The service also deliberately leaks some memory for testing purposes.

Here’s a video of the process.

From the Launch menu, choose Monitor a Service.


The Monitor a Service dialog is displayed.


Enter the full path to your service and click OK to start monitoring. The Validator will now setup some environment variables and some data in the registry that will be used by the service API. After a few seconds the Start your Service dialog appears.


Click OK, then start your service (you’ll need to do this from an Administrator command prompt).

serviceMV -start

The Validator attaches to the service and after a few moments various status information in the Validator title bar and the Validator status bar updated.

It is possible that you may get a debug information informational dialog displayed. You can dismiss this (it can be viewed from the Validator Tools menu). To change how symbols are found you’ll need to look at the Symbol Server and File Locations parts of the Validator settings dialog.


Next a dialog is displayed informing you that Administrator Privileges may be required.


For some services you may find that the Validator gets better data, or sends data to the GUI faster if the Validator is run in Administrator mode. If that is the case you’ll need to restart the Validator with Administrator privileges (and also stop and restart the service, etc).

For this particular example service, we don’t need Administrator privileges so we’ll continue without them.

Now we can interact with the service from the service client by sending a string to the service. The service reverses it and sends it back.

serviceClient "Hello World"


Once we’re done working with the service we can stop it (you’ll need to do this from an Administrator command prompt).

serviceMV -stop

The Validator disconnects from the service and displays all the data it has collected from the service.


That’s how it looks when everything goes according to plan.

What happens when things go wrong? That’s what the next section is about.

Tracking errors in the service

The various API functions return a SVL_SERVICE_ERROR error code. We’ve extended this code so that you can detect when the user has forgotten to do something prior to starting the service, or you can detect if various other error conditions have occurred. Some of these error codes are internal error codes and should never be seen by a customer, but we’re documenting them here for completeness.


  • SVL_FAIL_PATHS_DO_NOT_MATCH. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_INCORRECT_PRODUCT_PREFIX. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_X86_VALIDATOR_FOUND_EXPECTED_X64_VALIDATOR. Looks like you’re monitoring a 64 bit service with a 32 bit Validator. You need to use a 64 bit Validator.

  • SVL_FAIL_X64_VALIDATOR_FOUND_EXPECTED_X86_VALIDATOR. Looks like you’re monitoring a 32 bit service with a 64 bit Validator with the svl*VStubService.lib library. You need to use a 64 bit Validator with the svl*VStubService6432.lib.

  • SVL_FAIL_DID_YOU_MONITOR_A_SERVICE_FROM_VALIDATOR. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_ENV_VAR_NOT_FOUND. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_ENV_VAR_NOT_FOUND. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_ID_NOT_SPECIFIED. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_ID_NOT_A_PROCESS. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

  • SVL_FAIL_VALIDATOR_NOT_FOUND. Internal error. Looks like you forgot to Monitor a Service from the Launch Menu before starting the service.

To aid in debugging we strongly recommend that you log all error codes (successful or failure) from Software Verify API calls. This will allow you to track down errors rapidly rather a series of trial error coding mistakes or a back and forth with support but with no information to help support. We added all of the above error codes after 3 customers all reported similar, but different problems with using the service API. All of their problems would be have been solved if these error codes had been available.

Error codes can be logged with this call.

void writeToLogFile(const wchar_t     *fileName,
                    SVL_SERVICE_ERROR errCode);

Helpful messages can be logged with this call.

void writeToLogFile(const wchar_t *fileName,
                    const wchar_t *text);

Error codes can be turned into human readable messages with this call.

const wchar_t *getTextForErrorCode(SVL_SERVICE_ERROR	errorCode);

And if you need to log Windows error codes, use this call.

void writeToLogFileLastError(const wchar_t *fileName,
                             DWORD         errCode);

See the help documentation for all the available API calls.

Tracking errors in the GUI

There are a couple of mistakes that can be made in the user interface. These are related to monitoring the wrong type of service, and the location of the service. Where it is possible to identify this error in the GUI, we will do so. Where it is not, the error codes described above will help you understand the mistake that has been made.

64 bit Service, 32 bit GUI

If you try to monitor a 64 bit service with a 32 bit GUI that will fail. We can detect this and prevent this. When this error happens you will be shown an error dialog similar to this.


Note that monitoring a 32 bit service with a 64 bit GUI is OK, but you need to use the svl*VStubService6432.lib not the svl*VStubService.lib. We can’t detect this from the GUI, which is why the SVL_FAIL_X64_VALIDATOR_FOUND_EXPECTED_X86_VALIDATOR error code exists – you will get this if you are linked to svl*VStubService.lib when you should be linked to svl*VStubService6432.lib.

Service on a network share

Windows won’t let you start a service on a network share. And yet I’ve lost count of the number of times I’ve tried to do this. This is typically because I have the solution working on machine X (where I wrote it) and wish to test on machine Y, and I just use a network share to map it across. This works for applications and fails for services. This can be a real time waster and Windows isn’t exactly helpful about this, and of course it’s in a service’s startup code, so fun debugging that.

To make this failure easier to detect we check the path of the service you specify in the Monitor a Service dialog and determine if the service is on a network share. If it is we tell you we can’t work with it. This then alerts you to the fact you’ll need to copy that service locally to run tests on it. Probably and hour or two of your time saved, right there.


Conclusion

Working with services can be fraught with problems, but if you log your error codes you can easily and quickly identify any errors made configuring your use of the NT Service API that we were unable to catch with the Validator user interface.

Exception Tracer

By , August 24, 2019 8:43 am

We’ve just released another of our in-house tools – Exception Tracer.

Exception Tracer started off life as an experiment and then through a series of needing to capture debugging data for various problems with customers, morphed into the exception tracing tool that it is today. Exception Tracer logs debugging events that are sent to debuggers. Most debuggers respond to these events in an interactive manner, breaking the code on exceptions (such as access violations and breakpoints), stepping into and out of functions, inspecting variables. Exception Tracer doesn’t do any of those things. Exception Tracer simply logs every event and stores callstacks associated with each event. You can save the entire trace and inspect it later (or on a different machine if you wish).

Exception Tracer is great for understanding what exceptions are thrown by applications that throw a lot of exceptions, whether that is by design, or because something is going wrong and the exception handling mechanism is being triggered a lot.

We’ve provided filtering so that you only collect the events you’re interested in – perhaps all you’re interested in is what DLLs load and unload and the order they load in, or maybe you only care about a custom exception that your program throws.

We’ve also provided the ability to create minidumps when exceptions are thrown – minidump for any exception, or just the exceptions you care about.

Lastly, we’ve also provided automatic single stepping support (if you want it, turned off by default) with some intelligent options to reduce the amount of redundant single stepping events that are collected. Because you can turn single stepping on and off during a trace you can run at full speed to where the problem area is, turn on single stepping and collect just the area you need in detail.

We used single stepping to great effect to understand the cause of a stack overflow when one of our tools was shutting down on a customer machine. Turns out the culprit was an anti-virus product on the customer machine that was triggering an unexpected sequence of events that would never happen outside of the shutdown phase. We couldn’t get near this bug with a traditional debugger like Visual Studio, but Exception Tracer got us there (that’s where the intelligent filtering came in – the traces contained so many events we had to reduce the data size just to make it manageable when you were inspecting the results).


Select an item in the top window to view the event data and callstack in the lower windows.

Threads names are taken from GetThreadDescription() API (Windows 10), thread naming exceptions, and manual naming of threads (context menu).

Specific threads can be highlighted so that you can pick out related events on the same thread.

We’d love to hear about problems you’ve solved using Exception Tracer. Please let us know.

Thread Wait Chain Inspector

By , August 22, 2019 10:53 am

Since Windows Vista the Windows operating system has included functionality to iterate across the waiting objects that form a chain between threads. I’m waiting for thread A, which is waiting for thread B, which is waiting for process Y. That sort of thing. Waits come in the form of EnterCriticalSection, WaitForSingleObject, WaitForMultipleObjects, etc. All documented in Microsoft’s Synchronization API. If you get these waits wrong, you can get deadlocks, or waits that wait forever. Either way, it’s game over for your program if that happens.

The Wait Chain Traversal API was added with windows Vista, but only made public recently. Prior to the API, access to the Wait Chain API was only via Resource Monitor, and more recently via Task Manager. A detailed article by a Microsoft field engineer, faik hakan bilgen, documents the history of the Wait Chain user interfaces and then provides a console program (with source code on github) to provide a wait chain dump to a text file. Unfortunately this isn’t very easy to use as it relies on decoding thread ids and process ids to understand what is happening. Also because it’s a text file, to get an update you need to run the tool again.

We decided to take inspiration from our Thread Status Monitor tool and create a version specifically for wait chains – Thread Wait Chain Inspector.


Select the process you are interested in in the upper window. Wait chains for each thread are shown in the window below. Select a thread and it and any related threads (in the same wait chain) will be highlighted in yellow. Deadlocked threads are shown in red. Process names are displayed and thread names are taken from the GetThreadDescription() API (Windows 10 only).

If you wish to debug a process you can create a minidump for any process in a wait chain. Right click on the process of interest in the wait chain and Create a Minidump….

Thread Wait Chain Inspector is a free tool, complementing our other threading tools, Thread Validator, Thread Lock Checker and Thread Status Monitor.

Decompression and blue days

By , July 11, 2019 1:53 pm

It’s not uncommon for the founders of startup businesses to experience problems with motivation and problems with productivity as their business grows. I’m going to write about two issues I’ve run into over the years. They recur. You can’t stop them recurring. So the best thing to do is to understand them and accept them. They’re what I call decompression and blue days.

Decompression

Decompression is the word I use to identify the following pattern. You complete a major software release to the public. Then you find yourself unable to commit to any “serious” work for a period of time. For me, it’s typically one day. For you it could be an afternoon, a day, a week, maybe more.

So what is happening with decompression? I think it’s the process of your mind unwinding all the many layers of logic, dependencies, commitments and anxiety of %^&(ing up the release (it does happen!). During this decompression period I’ve found I can work on things tangentially related to the business, but not directly related to the business. As such I can work on side projects, read technical books, non-technical books, go for a walk, play musical instruments, provide mentorship, whatever. I just can’t work on the software or on marketing for the software during this period.

I’ve also found that I can’t do anything about this. Decompression needs to happen. Once it’s done I can get back to work with no distractions about any of the issues related to that previous software release. If I try to force it, by trying to work during a decompression period I just end up doing nothing, but getting frustrated that I’m doing nothing. That isn’t healthy, so I’ve come to the conclusion that the best thing to do is to accept that this happens and work with it. Do something else that is good for your mental health during one of these periods.

If you do this for yourself, cut your team some slack too. They’re probably going through their own version of the same thing.

Blue days

Blue days are different. These don’t come after any specific event. They just appear at random. You could be having a troubling business time or you could be having a great time, building the product, or you’ve already built it and have revenue pouring in, but then one day you’re thinking “I’m wasting my time. Why are we doing this? This will never succeed. Should I stop and spend my energy on (shiny! shiny!).” Typically this is accompanied by a very bleak outlook on life. Often this can be triggered by slow sales (which might mean you get this at certain times of year).

This is like a mini-depression, a very, very short duration depression. Emotionally it’s horrible. But they go away. After you’ve had this happen to you repeatedly you realise this is just hidden emotions bubbling to the surface and needing to be released. Being aware of this then makes the next time more bearable. Depending on your disposition what you do on a blue day will vary. You may bury yourself in work, or may need to leave all that behind and head off up a hill. Do what’s best for you. Mental health first.

Conclusion

Decompression and blue days both affect productivity and motivation. You can’t do much about them. But you can learn to recognise them, accept them for what they are, and that they will pass, and take action to make them bearable while they happen.

Hopefully if you’re reading this you recognise these two states and are now thinking “Someone else experiences this too. It’s normal!” and that’s a relief 🙂

There is also a small chance you ended up here because you’re seeking out articles on depression. If that’s the case you may find this wonderful talk at Business of Software by Greg Baugues talk about depression some help.

Stdout redirection and capture

By , July 11, 2019 10:34 am

We were recently asked if Memory Validator could handle monitoring a program that took it’s input from a file and wrote its’ output to a file. As shown in the following example.

redirect.exe < input.txt > output.txt

Our tools could handle this, but it wasn’t obvious that they could. Also for interactive use there was no easy way to do this via the launch dialog, unless you were using Coverage Validator. We had to make changes to most of our tools so that they could do what Coverage Validator could do. All tools had a new diagnostic sub-tab added so that stdout data that was captured (an option) could be viewed.

In this article I’m going to explain how to launch a program that reads from stdin and/or writes to stdout using our tools. Although I’m talking about Memory Validator, these techniques apply to all our Validator tools.

There are four ways of doing this.

  1. Start your program from a batch file, putting the redirect of stdin and stdout in the batch file.
  2. Start your program from the launch dialog/wizard, specifying the input and output files on the launch wizard.
  3. Start your program from the command line, specifying the input and output files on the command line.
  4. Use the Memory Validator API to start Memory Validator from inside the target program.

Batch File

Using a batch file to do this is easy. Simply create your batch file in the form shown in the example below.

e:\redirect\release\redirect.exe < e:\test\input.txt > e:\test\output.txt

Save the batch file with a known name. In this example I’ll save the batch file in e:\test\redirectTest.bat. Then launch the batch file from Memory Validator (or another Validator tool). The first program launched by the batch file will be monitored.


Launch Dialog/Wizard

We modified the launch wizard and the launch dialog to include fields for an optional input file and an optional output file. We also added an option that would capture stdout so that you could view the output on the diagnostic tab.

This example shows the program testAppTheReadsFromStdinAndWritesToStdout.exe being launched, reading from e:\test\input.txt, and writing to reading from e:\test\output.txt.


The stdout capture checkbox has been selected. This will mean a copy of stdout will be captured and displayed on the diagnostic sub-tab stdout.


Command Line

For command line operation we need two new options -stdin and -stdout, each of which takes a filename for an argument.

There are two additional arguments that you can supply to tell Memory validator to ignore missing input files and missing output files: -ignoreMissingStdin and -ignoreMissingStdout.

memoryValidator.exe -program e:\redirect\release\redirect.exe 
                    -directory e:\redirect\release 
                    -stdin e:\test\input.txt 
                    -stdout e:\test\output.txt
                    -showErrorsWithMessageBox

API

You can use the Memory Validator API to start Memory Validator each time the target program is run. In that case just running the following command on the command prompt would cause Memory Validator monitor the target program redirect.exe.

e:\redirect\release\redirect.exe < e:\test\input.txt > e:\test\output.txt

To use the Memory Validator API with a particular application, the following steps outline the minimum steps required.

  1. Link to svlMemoryValidatorStub.lib (_x64.lib for x64)
  2. Link to svlMemoryValidatorStubLib.lib (_x64.lib for x64)
  3. #include “stublib.h”
  4. call startProfiler(); at the start of your program
  5. See documentation in the help file for more details.

For other Validator tools the library names will change. See the documentation (topic API) for the particular tool for details.

Conclusion

There are four ways you can work with stdin and stdout with our Validator tools.

You can work with batch files, launch interactively using the launch dialog/wizard, work from the command line, and use the Validator API.

Thread naming

By , June 19, 2019 7:21 pm

Multi-threading is becoming quite common these days. It’s a useful way to provide a responsive user interface while performing work at the same time. Our tools report data per thread where that is warranted (per-thread code coverage doesn’t seem to be a thing – no one has requested it in 17 years). In this article I’m going to discuss thread naming, OS support for thread naming, and additional support for thread naming that our tools automatically provide.

The Default Thread Display

Threads are represented by a thread handle and a thread id. Typically the thread id is what will be used to represent the thread in a user interface reporting thread related data. Thread ids are numeric. For example: 341. For trivial programs you can often infer which thread is which by looking at the data allocated on each thread. However that doesn’t scale very well to more complex applications. You can end up with thread displays like this:


Microsoft Thread Naming Exception

The WIN32 API does not provide any functions to allow you to name a thread. To handle this oversight, Microsoft use a convention that allows a program to communicate a thread name with it’s debugger. This is done via means of an exception that the program throws (and then catches to prevent it’s propagation terminating the application). The debugger also catches this exception and with the help of ReadProcessMemory() can retrieve the exception name from the program. Here’s how that works.

In your program

The exception code that identifies a thread naming exception is 0x406D1388. To pass the thread name to the debugger you need to create a struct of type THREADNAME_INFO (definition shown int the code sample), populate it with the appropriate data then raise an exception with the specified exception code. You’ll need to use SEH __try/__except to surround the RaiseException() call to prevent crashing the program.

typedef struct tagTHREADNAME_INFO
{
	DWORD	dwType;		// must be 0x1000
	LPCSTR	szName;		// pointer to name (in user addr space) buffer must be 8 chars + 1 terminator
	DWORD	dwThreadID;	// thread ID (-1 == caller thread)
	DWORD	dwFlags;	// reserved for future use, must be zero
} THREADNAME_INFO;

#define MS_VC_EXCEPTION 0x406D1388

void nameThread(const DWORD	threadId,
                const char	*name)
{
	// You can name your threads by using the following code. 
	// Thread Validator will intercept the exception and pass it along (so if you are also running
	// under a debugger the debugger will also see the exception and read the thread name

	// NOTE: this is for 'unmanaged' C++ ONLY!

	#define BUFFER_LEN		16

	THREADNAME_INFO	ThreadInfo;
	char		szSafeThreadName[BUFFER_LEN];	// buffer can be any size, just make sure it is large enough!
	
	memset(szSafeThreadName, 0, sizeof(szSafeThreadName));	// ensure all characters are NULL before
	strncpy(szSafeThreadName, name, BUFFER_LEN - 1);	// copying name
	//szSafeThreadName[BUFFER_LEN - 1] = '\0';

	ThreadInfo.dwType = 0x1000;
	ThreadInfo.szName = szSafeThreadName;
	ThreadInfo.dwThreadID = threadId;
	ThreadInfo.dwFlags = 0;

	__try
	{
		RaiseException(MS_VC_EXCEPTION, 0, 
                               sizeof(ThreadInfo) / sizeof(DWORD_PTR), 
                               (DWORD_PTR *)&ThreadInfo); 
	}
	__except(EXCEPTION_EXECUTE_HANDLER)
	{
		// do nothing, just catch the exception so that you don't terminate the application
	}
}

In the debugger

The THREADNAME_INFO struct address is communicated to the debugger in event.u.Exception.ExceptionRecord.ExceptionInformation[0]. Cast this pointer to a THREADNAME_INFO *, check the number of parameters are correct (4 on x86, 3 on x64), check the dwType and dwFlags are correct, then use the pointer in the szName field. This is a pointer inside the other process, which means you can’t read it directly you need to use ReadProcessMemory(). Once the debugger continues and this event goes out of scope, the ability to read the thread name no longer exists. If you want to read the thread name you must read it immediately, storing it for future use.

	if ((de->dwDebugEventCode == EXCEPTION_DEBUG_EVENT) &&
	    (de->u.Exception.ExceptionRecord.ExceptionCode == SVL_THREAD_NAMING_EXCEPTION))
	{
#ifdef _WIN64
		if (de->u.Exception.ExceptionRecord.NumberParameters == 3)
#else	//#ifdef _WIN64
		if (de->u.Exception.ExceptionRecord.NumberParameters == 4)
#endif	//#ifdef _WIN64
		{
			THREADNAME_INFO	*tni;

			tni = (THREADNAME_INFO *)&de->u.Exception.ExceptionRecord.ExceptionInformation[0];
			if (tni->dwType == 4096 &&
			    tni->dwFlags == 0)
			{
				void	*ptr = (void *)tni->szName;
				DWORD	threadId = tni->dwThreadID;

				if (ptr != NULL)
				{
					char	buffer[1000];
					int		bRet;
					SIZE_T	dwRead = 0;

					memset(buffer, 0, 1000);
					bRet = ReadProcessMemory(hProcess,
								 ptr,
								 buffer,
								 1000 - 1,	// remove one so there is always a terminating zero
								 &dwRead);

					if (buffer[0] != '\0')
						setThreadName(threadId, buffer);
				}
			}
		}
	}

Our tools support intercepting this mechanism so that we can name your threads if you use this mechanism.

The problem with this mechanism is that it puts the onus for naming the threads on the creator of the application. If that involves 3rd party components that spawn their own threads, and OS threads then almost certainly not all threads will be named, which results in some threads having a name and other threads being represented by a thread id.


CreateThread() And Symbols

In an attempt to automatically name threads to provide useful naming for threads without having to rely on the author of a program using thread naming exceptions we started monitoring CreateThread() to get the thread start address, then resolve that address into a symbol name (assuming debugging symbols are available) and use that symbol name to represent the thread. After some tests (done with our own tools – which are heavily multithreaded – dog-fooding) we concluded this was a useful way to name threads.

However this doesn’t solve the problem as many threads are now named _threadstartex().


_beginthread(), _beginthreadex() And Symbols

The problem with the CreateThread() approach is that any threads created by calls to the C runtime functions _beginthread() or _beginthreadex() will result in the thread being called _threadstart() or _threadstartex(). This is because these functions pass their own thread function to CreateThread(). The way to solve this problem is to also monitor _beginthread() and _beginthreadex() functions to get the thread start address, then resolve that address into a symbol name.


SetThreadDescription

Starting with Windows 10, a name can be associated with a thread handle via the API SetThreadDescription(). The name associated with a thread handle can be retrieved using GetThreadDescription(). We use these functions to provide names for your threads if you have used these functions.

Un-named Threads

Some threads still end up without names – typically these are transient threads created by the OS to do a short amount of work. If the call to create the thread was internal to Kernel32.dll then CreateThread() will not be called by an IAT call and will not be monitored, resulting in the thread not getting named automatically. This isn’t ideal, but ultimately you don’t control these threads, which means you can’t affect the data reported by these threads, so it’s not that important.

Thread Naming Priority

We’ve made these changes to all our Validator tools and updated the user interfaces to represent threads by both thread id and thread name.

If a thread is named by a thread naming exception, SetThreadDescription(), or via a Software Verify API call that name will take precedence over any automatically named thread. This is useful in cases where the same function is used by many threads – you can if you wish give each thread a unique name to prevent multiple threads having the same name.

Here are some of the displays that have benefited from named threads.

Bug Validator


Memory Validator


Performance Validator


Thread Validator



Panorama Theme by Themocracy