We’ve just changed how we launch executables and attach to them at launch time. We’ve also changed how we inject into running executables. This blog post outlines the changes and the reasoning behind the changes.
The injected DLL
Microsoft documentation for DllMain() states that only certain functions can be called from DllMain() and you need to be careful about what you call from DllMain(). The details and reasons for this have never been made very explicit, but the executive summary is that you risk getting into a deadlock with the DLL loader lock (a lock over which you, as a programmer, have no control).
Even though this dire warning existed from the first alpha versions of our tools in 1999 until July 2015 we’ve been starting our profilers by making call from DllMain when it receives the DLL_PROCESS_ATTACH notification. We’ve never had any major problems with this, but that’s probably because we just tried to keep things simple. There are some interesting side-benefits of starting your profiler this way – the profiler stub just starts when you load the profiler DLL. You don’t need to call a specific function to start the profiler DLL. This is also the downside, you can’t control if the profiler stub starts profiling the application or not, it always starts.
Up until now the ability for the profiler to auto-start has been a useful attribute. But we needed to change this so that we could control when the profiler stub starts profiling the application into which it is injected. These changes involved the following:
- Removing the call to start the profiler from DllMain().
- Adding additional code to the launch profiler CreateProcess() injector to allow a named function to be looked up by GetProcAddress() then called.
- Changing all calls to the launch process so that the correct named function is identified.
- Finding all places that load the profiler DLL and modifying them so that they know to call the appropriate named function.
Injecting into running executables
The above mentioned changes also meant that we had to change the code that injects DLLs into running executables. We use the well documented technique of using CreateRemoteThread() to inject our DLLs into the target application. We now needed to add the ability to specify a DLL name, a function name, LoadLibrary() function address and GetProcAddress() function address and error handling into our dynamically injected code that can be injected into a 32 bit or 64 bit application from our tools.
A useful side effect of this change from DllMain() auto-start to the start function being called after the DLL has loaded is that thread creation happens differently.
When the profiler stub starts via a call to the start profiler from DllMain() any threads created with CreateThread()/beginthread()/beginthreadex() all wait until the calling DllMain() call terminates before these threads start running. You can create the threads and get valid thread handles, etc but they don’t start working until the DllMain() returns. This is to part of Microsoft’s own don’t-cause-a-DLL-loader-lock-deadlock protection scheme. This means our threads that communicate data to the profiler GUI don’t run until the instrumentation process of the profiler is complete (because it all happens from a call inside DllMain()).
But now we’ve changed to calling the start profiler function from after the LoadLibrary() call all the threads start pretty much when we create them. This means that all data comms with the GUI get up to speed immediately and start sending data even as the instrumentation proceeds. This means that the new arrangement gets data to the GUI faster and the user of the software starts receiving actionable data in quicker than the old arrangement.
In doing this work we noticed some inconsistencies with some of our tools (Coverage Validator and Thread Validator for instance) when working with an elevated Validator and non-elevated target application if we were injecting into the running target application. The shared memory used by these tools to communicate important profiling data wasn’t accessible due to permissions conflicts between the two processes. This was a problem because we were insisting the the Validator should be elevated prior to performing an injection into any process.
A bit of experimentation showed that under the new injection regime described above that we didn’t need to elevate the Validator to succeed at injecting into the target non-elevated application. It seems that you get the best results when the target application and the Validator require the same elevation level. This is also important for working with services as they tend to run with elevated permissions these days – but injecting into services is always problematic due to different security regimes for services.
This insight allowed us to remove the previously mandatory "Restart with administrator privileges" dialog and move any potential request to elevate privileges into the inject wizard and inject dialog. In this article I will describe the inject dialog, the changes to the inject wizard are similar with minor changes to accommodate the difference between dialog and wizard.
Depending upon Operating System and the version of the software there are two columns that may or may not be displayed on the inject dialog. The display can be sorted by all columns.
When running on Windows Vista or any more recent operating system the inject dialog will display an additional column titled Admin. If a process is running with elevated permissons this column will display an administrator permissions badge indicating the elevation may be required to inject into this process.
When running 64 bit versions of our tools an additional column, titled Arch will be added to the inject dialog. This column indicates if the process is a 32 bit process or 64 bit process. We could have added a control to allow only 32 bit or 64 bit processes to be displayed but our thinking is that examining this column will be something is only done for confirmation if the user of our tools is working on both 32 bit and 64 bit versions of their tools. As such having to find the process selector and select that you are interested in 32 bit tools is overhead the user probably doesn’t need.