Rss Feed
Tweeter button
Facebook button
Technorati button
Reddit button
Myspace button
Linkedin button
Webonews button
Delicious button
Digg button
Flickr button
Stumbleupon button
Newsvine button

Posts tagged: Memory

We’ve been quiet for a while, sorry about that.

By , November 28, 2016 3:29 pm

Hi,

It’s been a while since we posted anything on the blog. If you weren’t a customer, regularly receiving our software update emails you might think we weren’t going anything.

That’s an oversight on our part. We’re hoping to rectify this over the next few months, posting more useful information both here and in the library.

_tempnam and friends

Our most recent update has been to update C++ Memory Validator provide memory tracking for the _tempnam group of functions. These are _tempnam, _tempnam_dbg, _wtempnam, _wtempnam_dbg.

This support is for all supported compilers, from Visual Studio 2015, Visual Studio 2013, Visual Studio 2012, Visual Studio 2010, Visual Studio 2008, Visual Studio 2005, Visual Studio 2003, Visual Studio 2002, Visual Studio 6. Delphi and C++ Builder, Metrowerks compiler, MingW compiler.

.Net support, for the future

Internal versions of C++ Coverage Validator can provide code coverage statistics for .Net (C#, VB.Net, J#, F#, etc) as well as native languages (C++, C, Delphi, Fortran 95, etc).

Internal versions of C++ Performance Validator can provide performance profiling statistics for .Net (C#, VB.Net, J#, F#, etc) as well as native languages (C++, C, Delphi, Fortran 95, etc).

UX improvements

All tools, free and paid, have had the UX for filename and directory editing improved so that if a filename doesn’t exist it is displayed in red and if it does exist it is displayed in it’s normal colour (typically black). See screenshots (from Windows 8.1).

Non existent filename:
filenamebad

Existing filename:
filenamegood

Share

Marmalade game SDK support

By , December 16, 2015 12:02 pm

We’ve recently added support for the Marmalade game SDK to C++ Memory Validator.

This article will show you how to configure a Marmalade project for use with C++ Memory Validator, how to setup C++ Memory Validator for use with Marmalade and how to launch a Marmalade game from C++ Memory Validator.

Configuring your project settings

To work with C++ Memory Validator you need to build the x86 Debug configuration and/or the x86 Release configuration of your Marmalade project using Visual Studio.

These configurations need to be built so that they create debug information and so that a PDB file containing debug information is created. The example projects that ship with Marmalade do not do this – you will need to edit them to make the linker stage create debug information.

Editing the compile stage debug information

marmalade-vs-compile


Editing the link stage debug information

marmalade-vs-linker


You must ensure that both compile and link stages have the correct settings set. If only compile or only link is set you will not get debugging symbols.

Debugging symbols are important for two reasons:

  • Without symbols C++ Memory Validator cannot find the Marmalade memory allocators and will not be able to track the Marmalade memory allocations your game makes.
  • Without symbols C++ Memory Validator will not be able to turn callstack addresses into class names, function names, filenames and line numbers.

Configuring C++ Memory Validator

In order to work correctly with Marmalade we need to make sure we’re only going to track the memory allocation your game makes with Marmalade and not any of the work that the Marmalade game engine is doing itself. We need to make a few simple changes to C++ Memory Validator.

  • Open the settings dialog. Click Reset. This will reset all C++ Memory Validator settings to the default.
  • Go to the Collect tab, disable the check boxes in the top two groups of check boxes, then enable the single Marmalade check box. The settings should look like shown below.
  • Click OK.

settings-collect-marmalade


Launching the game

To launch a Marmalade game with C++ Memory Validator we launch the Marmalade simulator and specify the game to run using the Marmalade -via command line argument.

If Marmalade is installed in c:\marmalade then the path to the simulator is

c:/marmalade/8.0/s3e\win32\s3e_simulator_release.exe

If an example game (shipped with Marmalade) is found at this location

c:\Marmalade\8.0\examples\GameTutorial\Lua\Stage9\build_temp\build_stage9_vc14\Stage9_vc14_release.via

then the command line is

-via:"c:\Marmalade\8.0\examples\GameTutorial\Lua\Stage9\build_temp\build_stage9_vc14\Stage9_vc14_release.via"

and the startup directory is

c:\Marmalade\8.0\examples\GameTutorial\Lua\Stage9\build_temp\build_stage9_vc14\

We leave the Application to monitor unchanged. It should have the same value as Application to launch.

This is how the launch dialog looks when you are launching this game.

launch-marmalade


Click Go! to launch the Marmalade game. C++ Memory Validator will monitor all memory allocations, reallocations and deallocations made via the s3eMalloc, s3eRealloc, s3eFree functions. Run your game as normal, then close your game. Wait for C++ Memory Validator to show you any leaks on the Memory tab. Additional statistics can be views on the Objects, Sizes, Timeline and Hotspots tabs.

Share

Improving MFC memory performance

By , January 2, 2012 12:16 pm

If you are using MFC arrays it is possible in quite a few cases to improve the speed and memory performance of these arrays. This applies to the standard CStringArray, CUIntArray and similar classes and also to template classes based upon the CArray template.

If you are not using MFC but using another framework or set of classes that provide similar functionality you can often find similar functions in those classes that will allow you to get a similar benefit to what I will describe in this article.

The Problem

The problem is the typical use of populating the array is by calling the Add() method to add something to the array. No actual problem with that, its simple and straightforward enough. However, under the hood, each time you call Add() the array class has to reallocate more memory of the data stored in the class.

This reallocation has a cost. The cost is increased CPU usage as suitable memory space is searched for by the memory allocator and memory is copied and reassigned inside the class. For small arrays this is not really a problem.

However for larger arrays this becomes quite a noticeable issue. In addition you also run into potential memory fragmentation issues, where memory "holes" of an unusuable size are left in the memory managed by the memory allocator. Should enough of these holes occur you can suffer out of memory conditions even when Task Manager tells you you have available memory. Frustrating! I’ll cover Memory Fragmentation in a different article.

Here is a (simplified) example of the type of problem:

// read data from the serialization archive and store in an array

DWORD i, n;

ar >> n; 
for(i = 0; i < n; i++)
{
    someClass *sc;

    sc = new someClass();
    if (sc != NULL)
    {
        sc->load(ar);
        array.Add(sc);
    }
}

The Solution

In the case shown above we know how many objects we require storage for beforehand. This means we can tell the array how many objects to store and only perform one memory allocation to set aside storage for the array. This has CPU benefits and also because there are no repeated calls to reallocate the memory the likelihood of fragmentation occurring diminishes dramatically. In many cases, completely removed from the scenario.

To set the size beforehand we need to call SetSize(size); and to place data in the array we no longer use Add();, but use SetAt(index, data); instead.

Here is the reworked example:

// read data from the serialization archive and store in an array

DWORD i, n;

ar >> n; 
array.SetSize(n);
for(i = 0; i < n; i++)
{
    someClass *sc;

    sc = new someClass();
    if (sc != NULL)
    {
        sc->load(ar);
        array.SetAt(i, sc);
    }
}

For large volumes of data the above implementation can be noticeably faster.

Caveats

When you preallocate memory like this you must be aware that if you don’t fill all locations in the array using SetAt() you may get errors when you call GetSize() to get the array size and GetAt(i) to retrieve data.

// read data from the serialization archive and store in an array
// we won't store all data, leaving some unused memory at the end of
// the array

DWORD i, n, c;

ar >> n; 
c = 0;
array.SetSize(n);
for(i = 0; i < n; i++)
{
    someClass *sc;

    sc = new someClass();
    if (sc != NULL)
    {
        sc->load(ar);
        if (sc->IsEmpty())
        {
            // discard
 
            delete sc;
        }
        else
        {
            array.SetAt(c, sc);
            c++;
        }
    }
}

GetSize() will return the size of the array you set when you called SetSize(), this is not necessarily the number of items in the array – this will come as a surprise to people used to adding data by calling Add().

To fix this, use FreeExtra() to remove any unused items from the end of the array. You can also use GetUpperBound() to find the largest index that is used by the array. The example below shows this.

// read data from the serialization archive and store in an array
// we won't store all data, leaving some unused memory at the end of
// the array

DWORD i, n, c;

ar >> n; 
c = 0;
array.SetSize(n);
for(i = 0; i < n; i++)
{
    someClass *sc;

    sc = new someClass();
    if (sc != NULL)
    {
        sc->load(ar);
        if (sc->IsEmpty())
        {
            // discard
 
            delete sc;
        }
        else
        {
            array.SetAt(c, sc);
            c++;
        }
    }
}

// make sure array.GetSize() returns the max number of items used

array.FreeExtra();
Share

Doing good work can make you feel a bit stupid

By , July 19, 2010 6:08 pm

Doing good work can make you feel a bit stupid, well thats my mixed bag of feelings for this weekend. Here is why…

Last week was a rollercoaster of a week for software development at Software Verification.

Off by one, again?

First off we found a nasty off-by-one bug in our nifty memory mapped performance tools, specifically the Performance Validator. The off-by-one didn’t cause any crashes or errors or bad data or anything like that. But it did cause us to eat memory like nobodies business. But for various reasons it hadn’t been found as it didn’t trigger any of our tests.

Then along comes a customer with his huge monolithic executable which won’t profile properly. He had already thrown us a curve balled by supplying it as a mixed mode app – half native C++, half C#. That in itself causes problems with profiling – the native profiler has to identify and ignore any functions that are managed (.Net). He was pleased with that turnaround but then surprised we couldn’t handle his app, as we had handled previous (smaller) versions of his app. The main reason he was using our profiler is that he had tried others and they couldn’t handle his app – and now neither could we! Unacceptable – well that was my first thought – I was half resigned to the fact that maybe there wasn’t a bug and this was just a goliath of an app that couldn’t be profiled.

I spent a day adding logging to every place, no matter how insignificant, in our function tree mapping code. This code uses shared memory mapped space exclusively, so you can’t refer to other nodes by addresses as the address in one process won’t be valid in the other processes reading the data. We had previously reorganised this code to give us a significant improvement in handling large data volumes and thus were surprised at the failure presented to us. Then came a long series of tests, each which was very slow (the logging writes to files and its a large executable to process). The logging data was huge. Some of the log files were GBs in size. Its amazing what notepad can open if you give it a chance!

Finally about 10 hours in I found the first failure. Shortly after that I found the root cause. We were using one of our memory mapped APIs for double duty. And as such the second use was incorrect – it was multiplying our correctly specified size by a prefixed size offset by one. This behaviour is correct for a different usage. Main cause of the problem – in my opinion, incorrectly named methods. A quick edit later and we have two more sensibly named methods and a much improved memory performance. A few tests later and a lot of logging disabled and we are back to sensible performance with this huge customer application (and a happy customer).

So chalk up one “how the hell did that happen?” followed by feelings of elation and pleasure as we fixed it so quickly.
I’m always amazed by off-by-one bugs. It doesn’t seem to matter how experienced you are – it does seem that they do reappear from time to time. Maybe that is one of the persils of logic for you, or tiredness.

I guess there is a Ph.D. for someone in studying CVS commits, file modification timestamps and off-by-one bugs and trying to map them to time-of-day/tiredness attributes.

That did eat my Wednesday and Thursday evenings, but it was worth it.

Not to be outdone…

I had always thought .Net Coverage Validator was a bit slow. It was good in GUI interaction tests (which is part of what .Net Coverage Validator is about – realtime code coverage feedback to aid testing) but not good on long running loops (a qsort() for example). I wanted to fix that. So following on from the success with the C++ profiling I went exploring an idea that had been rattling around in my head for some time. The Expert .Net 2.0 IL Assembler book (Serge Lidin, Microsoft Press) was an invaluable aid in this.

What were we doing that was so slow?

The previous (pre V3.00) .Net Coverage Validator implementation calls a method for each line that is visited in a .Net assembly. That method is in a unique DLL and has a unique ID. We were tracing application execution and when we found our specific method we’d walk up the callstack one item and that would be the location of a coverage line visit. This technique works, but it has a high overhead:

  1. ICorProfiler / ICorProfiler2 callback overhead.
  2. Callstack walking overhead.

The result is that for GUI operations, code coverage is fast enough that you don’t notice any problems. But for long running functions, or loops code coverage is very slow.

This needed replacing.

What are we doing now that is so fast?

The new implementation doesn’t trace methods or call a method of our choosing. For each line we modify a counter. The location of the counter and modification of it are placed directly into the ilAsm code for each C#./VB.Net method. Our first implementation of .Net Coverage Validator could not do this because our shared memory mapped coverage data architecture did not allow it – the shared memory may have moved during the execution run and thus the embedded counter location would be invalidated. The new architecture allows the pointer to the counter to be fixed.

The implementation and testing for this only took a few hours. Amazing. I thought it was going to fraught with trouble, not having done much serious ilAsm for a year or so.

Result?

The new architecture is so lightweight that you barely notice the performance overhead. Less than 1%. Your code runs just about at full speed even with code coverage in place.

As you can imagine, getting that implemented, working and tested in less than a day is an incredible feeling. Especially compared to the previous performance level we had.

So why feel stupid?

Having acheived such good performance (and naturally feeling quite good about yourself for a while afterwards) its hard not to look back on the previous implementation and think “Why did we accept that?, We could have done so much better”. And that is where the feeling stupid comes in. You’ve got to be self critical to improve. Pat yourself on the back for the good times and reflect on the past to try to recognise where you could have done better so that you don’t make the same mistake in the future.

And now for our next trick…

The inspiration for our first .Net Coverage Validator implementation came from our Java Coverage Validator tool. Java opcodes don’t allow you to modify memory directly like .Net ilAsm does, so we had to use the method calling technique for Java. However given our success with .Net we’ve gone back to the JVMTI header files (which didn’t exist when we first wrote the Java tools) and have found there may be a way to improve things. We’ll be looking at that soon.

Share

Monitoring memory use in a JNI DLL called from Java

By , May 8, 2010 1:35 pm

Java is a garbage collected language that allows you to extend the language with code written in C and C++.

Given that the extensions are written in C or C++ the normal tools for monitoring Java memory usage will not report the C/C++ memory usage. So how do you monitor the memory usage of these JNI extensions when called from Java? This article is going to explain how to do this task.

Building an example JNI test

The first thing to do is to create an example to work with. The post Creating a JNI example for testing describes how to create a JNI example (both Java code and JNI C/C++ code, with downloadable source and project files). Please download the example code and build the example.

Monitoring JNI Memory

Monitoring memory in JNI extensions is straightforward. Java programs are executed by running a Java Virtual Machine (JVM). These are typically named java.exe, javaw.exe, jre.exe, jrew.exe. We can just launch java.exe with a C/C++ memory leak detection software tool and monitor the results. For this example we are going to monitor java with C++ Memory Validator. The image below shows the launch dialog (note if you are using Memory Validator for the first time you will be using the launch wizard, which is slightly different)

Memory Validator launch dialog showing launch of Java application

Items to note:

  • Application is set to the path to java.exe. C:\Program Files (x86)\Java\jdk1.5.0_07\bin\java.exe
  • Arguments is set to the name of the class to execute. Main
  • Startup directory is set to the directory containing the class to execute (and the native DLL to monitor). E:\om\c\memory32\testJavaJNI
  • If you wish to set the CLASSPATH you can set it in the Environment Variables part of the launch dialog. If you do not set the CLASSPATH here (as in this example) the CLASSPATH will be taken from the inherited environment variables of Memory Validator’s environment. For this example CLASSPATH is set in the global environment variables and thus does not need to be set on the launch dialog.

Click Go! to start the Java program. Java is started, Memory Validator monitors the application and records all memory allocations and deallocations. Any memory not deallocated by the end of the program is a leak.

JNI Leaks with no filtering

As you can see from the screenshot above, there is a quite a bit of memory left over after a simple run of this example Java program which does very little except load a native extension that prints Hello World! twice and deliberately leaks one 20 byte chunk of memory. The example image indicates there are 857 items, some of which are handles, the remainder C/C++ memory allocations. There are 3493 events. The memory leak we are interested in occurs at event 2985.

Clearly this is inefficient. To find memory leaks in your code you are going to have to wade through all the noise of the memory allocations made by Java and the JVM. There must be a better way!

There is. We’ll focus only on the native DLL that prints the Hello World! messages. Open the settings dialog and go to the Hooked DLLs section.

Setting up DLL filters to focus on the JNI DLL

  • Select the second radio box to indicate that only the DLLs we list will be monitored.
  • Now click Add Module… and select the HelloWorldImp.dll.
  • Click OK.

Memory Validator is now configured to monitor only HelloWorldImp.dll for memory and handle allocations.
Relauch the java application with these new settings.

JNI Leaks with DLL filtering

As you can see from the picture above much less data is collected. A total of 54 events compared to the previous session’s 3493 events. This is much more managable.

The list of items Memory Validator reports for this run contains only 11 events. This reduced amount makes it very easy to identify errors in the DLL.

  • 8 events are DLL loads (MV always reports DLL loads regardless of settings)
  • A one time, per-thread, buffer allocation inside printf
  • A memory leak that is deliberately present in the example JNI code
  • The final event is the status information for the application at the end of its run

If you don’t wish to see the DLL loads you can filter them out with a global, session or local filter.

Detail view of source code of memory leak in JNI code

The image above shows the source code of the location of the leaking memory in the JNI extension.

Conclusion

Monitoring memory allocations in JNI Dlls called from Java is a straightforward task.

Things to remember:

  • Ensure your JNI DLL is compiled with debugging information and linked with debugging information.
  • Ensure your JNI DLL debug information is present (put the PDB file in the same directory as the DLL).
  • Ensure your CLASSPATH is set correctly so that when Memory Validator starts your Java application the correct CLASSPATH is used.
Share

What is the difference between a page and a paragraph?

By , March 8, 2010 11:40 am

In the real world we all know that pages contain paragraphs and that paragraphs are full of sentences created from words.

In the world of Microsoft Windows Operating Systems it is somewhat different – paragraphs contain pages!

In this article I’m to going to explain what a page is and what a paragraph is and how they relate to each other and why this information can be useful helping to identify and resolve certain memory related bugs.

Virtual Memory Pages
A virtual memory page is the smallest unit of memory that can be mapped by the CPU. In the case of 32 bit x86 processors such as the Intel Pentium and AMD Athlon, a page is 4Kb. When you make a call to VirtualProtect() or VirtualQuery() you will be setting or querying the memory protection for sizes that are multiples of a page.

The size of a page may vary from CPU type to CPU type. For example a 64 bit x86 CPU will have a page size of 8Kb.

You can determine the size of a page by calling GetSystemInfo() and reading the SYSTEM_INFO.dwPageSize value.

Virtual Memory Paragraphs
A virtual memory paragraph is the minimum amount of memory that can be commited/reserved using the VirtualAlloc() call. On 32 bit x86 CPUs this value is 64Kb (0x00010000). If you have ever used the debugger and looked at the load addresses of DLLs in the Modules list you may have noticed that DLLs always load on 64Kb boundaries. This is the reason – the area a DLL is loaded into is initialised by a call to VirtualAlloc to reserve the memory prior to the DLL being loaded.

Loaded Modules

You can determine the size of a paragraph by calling GetSystemInfo() and reading the SYSTEM_INFO.dwAllocationGranularity value.

Given these values, you can see that (on 32 bit 86 systems) a virtual memory paragraph is composed of 16 virtual memory pages.

How can I use this information?
If you are using VirtualAlloc() it is important to know the granularity at which the allocations will be returned. This is the size of a paragraph. This information is fundamental in deciding how you would implement a custom heap. You know there are fixed boundaries at which your data can exist. You can enumerate the list of possible paragraph locations very quickly (there are 32,768 possible locations in a 2GB space, as opposed to 2 billion locations if the paragraph could start anywhere).

Custom heaps
If you are writing a custom heap, a key indicator to keep track of is memory fragmentation and memory utilisation. Knowing your paragraph and page sizes you can inspect how each page and each paragraph of memory are used by the application and the custom heap to determine if there is wastage, what wastage there is and what form the wastage takes. This information could lead you to modify your heap algorithm to use pages differently to reduce fragmentation. See Delete memory 5 times faster for one simple technique, using HeapAlloc, the same principles apply here.

Loading large data files
Another use for this information is finding out why a certain large file will not load into memory despite Task Manager saying that you have 2GB of free memory. It is not uncommon to find a forum posting somewhere from someone that has a large image file (a satellite phote, MRI scan, etc) that is about 1GB in size. They wish to load it into memory, do in-memory processing on it, save the results, discard the memory then repeat the process, often for numerous images.
Typically on the third attempt to load a large file, the file will not load and the forum poster is left very confused.

The typical implementation is to allocate space for the large file using a call such as malloc() or operator new(). Both of which use the C runtime heap to allocate the memory.

The principle seems fine, but the problem is caused by memory fragmentation which results in a less accessible, totally usable, free space because the remaining free space blocks are separated into a many smaller regions, most of which are smaller than any forthcoming large allocation required by the application. Without the information about where pages and paragraphs are situated, how big they are and what their status is, identifying the cause of this failure could be very time consuming. Once you know the cause, you can think about allocating and managing your memory differently and prevent the bug from happening in the first place.

For situations like these, using HeapAlloc() with a dedicate heap (created using HeapCreate()) or even just directly using VirtualAlloc() will most likely lead to superior results than using the C runtime heap.

Tools
A first step in understanding such bugs is to be able to visualize the memory and to also inspect the various page and paragraph information.

To aid in these tasks we have just added a VM Pages view and VM Paragraphs view to VM Validator to make identifying such issues easier. VM Validator is a free download.

Memory Validator will also be updated with a VM Paragraphs view in the next release (Memory Validator already has a more detailed VM Pages view).

Thank you to Blake Miller of Invensys for suggesting an alternative wording for one paragraph of this article.

Share

Delete memory 5 times faster

By , March 2, 2010 12:56 pm

Memory management in C and C++ is typically done using either the malloc/realloc/free C runtime functions or the C++ operators new and delete. Typically the C++ operators call down to the underlying malloc/free implementation to do the actual memory allocations.

This is great, its useful, it works, BUT it puts all the allocations in the same heap. So when you come around to deallocating the heap manager will need to take into account all the other objects and allocations that are also in the heap but unrelated to the data you are deallocating. That adds overhead to the heap manager, causing memory fragmentation and slower heap management.

There is a different way you can handle this situation – you can use your own heap for a given group of allocations.

	HANDLE	hHeap;

	hHeap = HeapCreate(0, 0x00010000, 0); // 64K growable heap

The downside to this is that you have to remember to use the correct allocators and deallocators for these objects and not to use malloc/free etc. You can mitigate this in C++ by overriding new/delete to use your own heap.

void *myClass::operator new(size_t size)
{
	return HeapAlloc(hHeap, 0, size);
}

void myClass::operator delete(void *ptr)
{
	HeapFree(hHeap, 0, ptr);
}

The upside to this technique is that you can delete all your deallocations in one call by deleting the heap and not bothering with deleting individual allocations. This is also about 5 times faster.

Old style:

	for(i = 0; i < count; i++)
	{
		HeapFree(hHeap, 0, ptrs[i]);
	}

New style:

	HeapDestroy(hHeap);
	hHeap = NULL;

There is another benefit to this technique: By deleting the heap and then re-creating a heap for new allocations you remove all fragmentation from the heap and start the new heap with 0% fragmentation.

HeapDestroy timing demonstration

Download the source of the demonstration application. Project and solution files for Visual Studio 6.0 and Visual Studio 2008 are provided.

We use this technique for some of our tools where we want a high performance heap and zero fragmentation.

You will also want to ensure that whatever software tool you are using to monitor memory allocations will mark all entries in a heap that is destroyed as deallocated.

Share

Panorama Theme by Themocracy