Rss Feed
Tweeter button
Facebook button
Technorati button
Reddit button
Myspace button
Linkedin button
Webonews button
Delicious button
Digg button
Flickr button
Stumbleupon button
Newsvine button

How to commit suicide – outsource your software development

By , July 20, 2010 10:05 am

Is that headline a bit too strong? I don’t think so. Allow me to explain.

This article applies to any company where your main product, or a substantial part of your main product is software and the software is a key part of your product. For example if you made top end oscilloscopes with an embedded PC inside them, the software to run the oscilloscope on top of the embedded OS would be important. This is unlike say if you are AOL, where the dialup software is important but can be easily outsourced without impacting your business at all.

We receive approaches via email, via Linked-In, etc, from outsourcing firms on a regular basis. Their message extoll the virtues of their multi-talented legion of software developers that can be hired from as little as $5.00 / hour. On the face of it getting staff at $5.00 sounds great, if you assume the staff know how to do their job. The list of technologies, computer languages and computing platforms they support is vast. You almost wonder how they do it. And at $5.00/hour. Whats the catch?

Cost

$5/hour. 10$/hour. Either of these rates is really attractive. No surprise, that is pretty much their unique selling point.

In pure financial terms this looks like a win. But the true cost of using outsourced software development is not the expense of paying for it. The cost is elsewhere. I’ll come back to this later.

Location and Language

Typically these operations are setup in low cost parts of the world, usually developing countries, ex-eastern bloc countries, India, China. Often with a nice shiny headquarters in the US or the UK.

Communication

Everyone (well, almost) has access to the Internet these days. Distributed teams are not new. Lots of people do it. We do it. The most famous web product team on the planet does it. It is quite possible your outsourcing company will work this way. If that is important to you, you should find out.

You will need a core spoken/written language for communications. That will probably be English, just because that is the way the world works and software tends to be written that way.

You’ll want to make sure any source code comments, version control commits and design specs/documents/presentations are also in English. I’ve seen comments in French in source code bases that were substantially English. No problem with French except that practically no one in the building knew what the comments said, except for the mathematician that wrote them. Is that useful? I don’t think so. Neither did the management when they found out 🙁

You probably have to assume that most of the people that will end up developing software for you will have English as a second or third language. This generally isn’t a problem, but I’ve found a few exceptions. Different languages have different rules on sentence construction – where the verb goes, how adjectives and nouns are used and so on. I’ve found over the years, from corresponding with our customer base, the I struggle with the sentence verb/noun/adjective formation rules used by some parts of the world which are not the same as English rules. When translated from their native language to English you get a strange beast which looks like English but which reads very differently. Everyone is different, you may not have that problem, but I struggle with it.

Staff

Is it reasonable to assume these people have good staff? I think it is. I think its also reasonable to assume they probably have some useless staff too, most companies do, so I don’t see why these companies would be any different.

Sounds OK so far, so what is the problem?

The main problem with outsourced software development is that as the development proceeds the following happens:

  • You are paying the outsourcing firm to learn the technologies required to build your product.
  • You are building up a reservoir of knowledge in the brains of their employees, not your employees.
  • At some point the outsourcing company will know more about the internals of your product that you do. At that point you become dependent upon the outsourcing company. A perfect time for them to raise their costs. I don’t know if this happens, but logic dictates that it should.
  • Once your product is complete, you have a lot of software that works and has no bugs (well, you can live in hope), but you don’t know anything about it. Is that a good position to be in? Sure you’ve got documentation on it and the full version control history. But fundamentally the knowledge is in the brains of people that don’t work for your company. How are you going to handle bug reports? Are you going to let your staff lose on a codebase they don’t know or are you going to hire the outsourcing company again? At what point can you break free from the outsourcing company?
  • The staff at the outsourcing company do know how your product works. If they want to go and write their own product in the same marketspace they can, and they will do it better as they will have all the knowledge about the pitfalls and mistakes made creating your product. Of course you can mitigate against this will a clause in your contract forbidding such competition.
  • Fundamentally, the software development business is about developing intellectual property. You want as much of that IP in the heads of the people that work for you, not in the heads of people that do not work for you. Outsourcing this work puts this knowledge outside of your company and is ultimately commercial suicide. Of course, staff can leave your company and move on to forge their careers. But that happens over time, you can manage that. Not the same as all the knowledge in the brains of the a separate company’s employees.

All the above reasons put you at a substantial disadvantage compared to your competitors that do not outsource their software development.

Exceptions

I can think of one exception to the above, but it is not really outsourcing. The exception is when you have one small discrete component of your software that you would like implemented and all you want is a functioning piece of code that can live in a DLL and get called to do its job. Examples would be a GUI widget library, a data compression library, a disassembler for a specific microprocessor etc. Typically you can find companies or lone developers that do things like this. They often have working examples you can try with very reasonable terms for use.

This is outsourcing in that you didn’t do the work, but it falls into the “not core activity” classification I mentioned at the start of the article, so I don’t think this really counts.

Conclusion

I always reply to people that approach us asking us to outsource our work to them. I explain that I regard their offer as asking us to commit commercial suicide. Normally I don’t get a reply. But I did get a reply from one gentleman, who shall remain nameless. Here is the edited exchange. Edited just to get to the nitty gritty. I have not edited any words or punctuation in the sentences.

Software Verification:
"We are not interested in outsourcing. Software development is a core activity here – outsourcing it is equivalent to outsourcing the core intellectual property. For our business that is suicide. For other businesses it makes sense."

Outsourcing reply:
"Honestly, I totally agree with you. But such is the reality nowadays.
…"

He then explained that he thought there were valid reasons for us to use his services. Good salesman, I guess.

Share

Problems building Firefox

By , July 20, 2010 8:35 am

I’ve been building Firefox on Windows recently. Without problems. But then I wanted a build with symbols and that is when the problems started. It is meant to be simple, and it is, so long as you don’t do a

make -f client.mk clean

If you do that, then a subsequent build will fail when it comes to updater.exe.manifest.

Setup

Assuming you have downloaded and installed Mozilla build and have also installed a suitable version of Visual Studio and the latest Microsoft Web SDK, you can start the build environment by running c:\mozilla-build\start_msvc9.bat (change the number for the version of Visual Studio you are using).

Read these instructions first.

Download the appropriate source tree with a mercurial command like this:

hg clone http://hg.mozilla.org/releases/mozilla-1.9.2/ src192

Basic Build

I’m building on drive M: and have downloaded the source into folder src192.

cd M:/src192

Configure the build with:

echo '. $topsrcdir/browser/config/mozconfig' > mozconfig

Build the build with

make -f client.mk build

This gives you a firefox release build. Strangely the build is created without debug symbols. I don’t understand this as debug symbols are really useful even with release builds. The symbols don’t have to ship with the executable code, so there is no good reason for not creating them.

Given that you’ve downloaded the source to build yourself the image its fair to assume that you are probably curious about the internal works or that you may be interested in modifying the source and/or writing an extension for it. Either way, you are going to want the symbols so that you can see what is happening in a debugger. Thus I don’t understand the lack of symbols by default.

Creating symbols

You can create symbols in several ways, by adding the following lines to your mozconfig.

echo 'ac_add_options --enable-debug' >> mozconfig
echo 'export MOZ_DEBUG_SYMBOLS=1' >> mozconfig
echo 'ac_add_options --enable-debugger-info-modules=yes' >> mozconfig

Getting a working build

At this point you’ve already done a build and have created executables. Thus it seems the appropriate thing to do is a make -f client.mk clean followed by a make -f client.mk build. This won’t work. The problem is that the clean deletes the manifest files that are present in the original mercurial source download.

In the end I wiped the whole source tree, downloaded from the source control again, modified mozconfig with the debug symbol options and did a build. That works.

This took me several attempts to understand – I thought that my various attempts at configuring the debug symbol options were breaking something, so I would change the options, clean then build. Each time taking several hours. I had this working in a virtual machine while I worked on other tasks. I probably spent about a day, including the build times before I decided to try again by starting from scratch.

Solution

I don’t know if this is true for non-Windows firefox builds, but if you want your builds to succeed don’t do a make -f client.mk clean prior to make -f client.mk build as it will prevent you from successfully building firefox.

I hope this information helps prevent you from wasting your time with this particular build problem.

Share

Doing good work can make you feel a bit stupid

By , July 19, 2010 6:08 pm

Doing good work can make you feel a bit stupid, well thats my mixed bag of feelings for this weekend. Here is why…

Last week was a rollercoaster of a week for software development at Software Verification.

Off by one, again?

First off we found a nasty off-by-one bug in our nifty memory mapped performance tools, specifically the Performance Validator. The off-by-one didn’t cause any crashes or errors or bad data or anything like that. But it did cause us to eat memory like nobodies business. But for various reasons it hadn’t been found as it didn’t trigger any of our tests.

Then along comes a customer with his huge monolithic executable which won’t profile properly. He had already thrown us a curve balled by supplying it as a mixed mode app – half native C++, half C#. That in itself causes problems with profiling – the native profiler has to identify and ignore any functions that are managed (.Net). He was pleased with that turnaround but then surprised we couldn’t handle his app, as we had handled previous (smaller) versions of his app. The main reason he was using our profiler is that he had tried others and they couldn’t handle his app – and now neither could we! Unacceptable – well that was my first thought – I was half resigned to the fact that maybe there wasn’t a bug and this was just a goliath of an app that couldn’t be profiled.

I spent a day adding logging to every place, no matter how insignificant, in our function tree mapping code. This code uses shared memory mapped space exclusively, so you can’t refer to other nodes by addresses as the address in one process won’t be valid in the other processes reading the data. We had previously reorganised this code to give us a significant improvement in handling large data volumes and thus were surprised at the failure presented to us. Then came a long series of tests, each which was very slow (the logging writes to files and its a large executable to process). The logging data was huge. Some of the log files were GBs in size. Its amazing what notepad can open if you give it a chance!

Finally about 10 hours in I found the first failure. Shortly after that I found the root cause. We were using one of our memory mapped APIs for double duty. And as such the second use was incorrect – it was multiplying our correctly specified size by a prefixed size offset by one. This behaviour is correct for a different usage. Main cause of the problem – in my opinion, incorrectly named methods. A quick edit later and we have two more sensibly named methods and a much improved memory performance. A few tests later and a lot of logging disabled and we are back to sensible performance with this huge customer application (and a happy customer).

So chalk up one “how the hell did that happen?” followed by feelings of elation and pleasure as we fixed it so quickly.
I’m always amazed by off-by-one bugs. It doesn’t seem to matter how experienced you are – it does seem that they do reappear from time to time. Maybe that is one of the persils of logic for you, or tiredness.

I guess there is a Ph.D. for someone in studying CVS commits, file modification timestamps and off-by-one bugs and trying to map them to time-of-day/tiredness attributes.

That did eat my Wednesday and Thursday evenings, but it was worth it.

Not to be outdone…

I had always thought .Net Coverage Validator was a bit slow. It was good in GUI interaction tests (which is part of what .Net Coverage Validator is about – realtime code coverage feedback to aid testing) but not good on long running loops (a qsort() for example). I wanted to fix that. So following on from the success with the C++ profiling I went exploring an idea that had been rattling around in my head for some time. The Expert .Net 2.0 IL Assembler book (Serge Lidin, Microsoft Press) was an invaluable aid in this.

What were we doing that was so slow?

The previous (pre V3.00) .Net Coverage Validator implementation calls a method for each line that is visited in a .Net assembly. That method is in a unique DLL and has a unique ID. We were tracing application execution and when we found our specific method we’d walk up the callstack one item and that would be the location of a coverage line visit. This technique works, but it has a high overhead:

  1. ICorProfiler / ICorProfiler2 callback overhead.
  2. Callstack walking overhead.

The result is that for GUI operations, code coverage is fast enough that you don’t notice any problems. But for long running functions, or loops code coverage is very slow.

This needed replacing.

What are we doing now that is so fast?

The new implementation doesn’t trace methods or call a method of our choosing. For each line we modify a counter. The location of the counter and modification of it are placed directly into the ilAsm code for each C#./VB.Net method. Our first implementation of .Net Coverage Validator could not do this because our shared memory mapped coverage data architecture did not allow it – the shared memory may have moved during the execution run and thus the embedded counter location would be invalidated. The new architecture allows the pointer to the counter to be fixed.

The implementation and testing for this only took a few hours. Amazing. I thought it was going to fraught with trouble, not having done much serious ilAsm for a year or so.

Result?

The new architecture is so lightweight that you barely notice the performance overhead. Less than 1%. Your code runs just about at full speed even with code coverage in place.

As you can imagine, getting that implemented, working and tested in less than a day is an incredible feeling. Especially compared to the previous performance level we had.

So why feel stupid?

Having acheived such good performance (and naturally feeling quite good about yourself for a while afterwards) its hard not to look back on the previous implementation and think “Why did we accept that?, We could have done so much better”. And that is where the feeling stupid comes in. You’ve got to be self critical to improve. Pat yourself on the back for the good times and reflect on the past to try to recognise where you could have done better so that you don’t make the same mistake in the future.

And now for our next trick…

The inspiration for our first .Net Coverage Validator implementation came from our Java Coverage Validator tool. Java opcodes don’t allow you to modify memory directly like .Net ilAsm does, so we had to use the method calling technique for Java. However given our success with .Net we’ve gone back to the JVMTI header files (which didn’t exist when we first wrote the Java tools) and have found there may be a way to improve things. We’ll be looking at that soon.

Share

How to prevent a memory tool from monitoring your C/C++ allocations

By , July 10, 2010 10:53 am

A little known fact is that the Microsoft C Runtime (CRT) has a feature which allows some allocations (in the debug runtime) to be tagged with flags that causes these allocations to be ignored by the built in memory tracing routines. A good memory allocation tool will also use these flags to determine when to ignore memory allocations – thus not reporting any allocations that Microsoft think should remain hidden.

A customer problem

The inspiration for this article was a customer reporting that Memory Validator was not reporting any allocations in a particular DLL of his mixed mode .Net/native application. The application was interesting in that it was a combination of C#, C++ written with one version of Visual Studio and some other DLLs also written in C++ with another version of Visual Studio. Only the memory for one of the DLLs was not being reported by Memory Validator and the customer wanted to know why and could we please fix the problem?

After some investigation we found the problem was a not with Memory Validator but with the DLL in question making a call to _CrtSetDbgFlag(0); which turned off all memory tracking for that DLL. Memory Validator honours the memory tracking flags built into Visual Studio and thus did not report these memory allocations. Armed with this information the customer did some digging into their code base and found that someone had deliberately added this call into their code. Removing the call fixed the problem.

The rest of this article explains how Microsoft tags data to be ignored and what flags are used to control this process.

Why does Microsoft mark these allocation as ignore?

The reason for this is that these allocations are for internal housekeeping and sometimes also for one-off allocations that will exist until the end of the application lifetime. Such allocations could show up as memory leaks at the end of the application – that would be misleading as they were intended to persist. Better to mark them as “ignore” and not report them during a memory leak report.

Microsoft debug CRT header block

Microsoft’s debug CRT prefixes each allocation with a header block. That header block looks like this:

#define nNoMansLandSize 4

typedef struct _CrtMemBlockHeader
{
    struct _CrtMemBlockHeader * pBlockHeaderNext;
    struct _CrtMemBlockHeader * pBlockHeaderPrev;
    char *                      szFileName;
    int                         nLine;
#ifdef _WIN64
    /* These items are reversed on Win64 to eliminate gaps in the struct
     * and ensure that sizeof(struct)%16 == 0, so 16-byte alignment is
     * maintained in the debug heap.
     */
    int                         nBlockUse;
    size_t                      nDataSize;
#else  /* _WIN64 */
    size_t                      nDataSize;
    int                         nBlockUse;
#endif  /* _WIN64 */
    long                        lRequest;
    unsigned char               gap[nNoMansLandSize];
    /* followed by:
     *  unsigned char           data[nDataSize];
     *  unsigned char           anotherGap[nNoMansLandSize];
     */
} _CrtMemBlockHeader;

How does Microsoft tag an allocation as ignore?

When the CRT wishes an allocation to be ignored for memory tracking purposes, six values in the debug memory allocation header for each allocation are set to specific values.

Member Value #define
nLine 0xFEDCBABC IGNORE_LINE
nBlockUse 0x3 IGNORE_BLOCK
lRequest 0x0 IGNORE_REQ
szFileName NULL
pBlockHeaderNext NULL
pBlockHeaderPrev NULL

The Microsoft code goes out of its way to ensure no useful information can be gained from the header block for these ignored items.

When we first created MV we noticed that items marked as ignored should be ignored, otherwise you can end up with FALSE positive noise reported at the end of a memory debugging session due to the internal housekeeping of MFC/CRT.

How can you use this information in your application?

Microsoft also provides some flags which you can control which allows you to influence if any memory is reported as leaked. This is in addition to the CRT marking its own allocations as “ignore”. You can set these flags using the _CrtSetDbgFlag(int); function.

The following flags can be passed to _CrtSetDbgFlag() in any combination.

Flag Default Meaning
_CRTDBG_ALLOC_MEM_DF On On: Enable debug heap allocations and use of memory block type identifiers.
_CRTDBG_CHECK_ALWAYS_DF Off On: Call _CrtCheckMemory at every allocation and deallocation request. (Very slow!)
_CRTDBG_CHECK_CRT_DF Off On: Include _CRT_BLOCK types in leak detection and memory state difference operations.
_CRTDBG_DELAY_FREE_MEM_DF Off Keep freed memory blocks in the heap’s linked list, assign them the _FREE_BLOCK type, and fill them with the byte value 0xDD. CAUTION! Using this option will use lots of memory.
_CRTDBG_LEAK_CHECK_DF Off ON: Perform automatic leak checking at program exit via a call to _CrtDumpMemoryLeaks and generate an error report if the application failed to free all the memory it allocated.

How do I disable memory tracking for the CRT?

If you call _CrtSetDbgFlag(0); any memory allocated after that point will not be tracked.

With the above settings, all blocks are marked as ignore. You can see the code for this in the Microsoft C runtime.

The code that marks the block as “ignore” is at line 404 in dbgheap.c in the Microsoft C runtime (also used by MFC). When your code arrives here, nBLockUse == 1 and _crtDbgFlag == 0.

dbgheap.c line 404 (line number will vary with Visual Studio version)
                if (_BLOCK_TYPE(nBlockUse) != _CRT_BLOCK &&
                     !(_crtDbgFlag & _CRTDBG_ALLOC_MEM_DF))
                     fIgnore = TRUE;

This sets fIgnore to TRUE. From this point onwards the memory tracking code ignores the memory and sets the values mentioned above in the memory block header.

Default values

The default value for _crtDbgFlag is set elsewhere in the Microsoft code with this line:

extern "C"
int _crtDbgFlag = _CRTDBG_ALLOC_MEM_DF | _CRTDBG_CHECK_DEFAULT_DF;
Share

Don’t use srand(clock()), use srand((unsigned)time(NULL)) instead

By , July 9, 2010 9:42 am

Typically you use srand() when you need to start the random number generator in a random place. You may do this because you are going to generate some keys or coupons and want them to start in an unpredictable place.

From time to time we provide special offers to customers in the form of a unique coupon code that can be used at purchase to get a specific discount. These coupons are also used to provide discounts to customers upgrading from say Performance Validator to C++ Developer Suite so that they do not pay for Performance Validator twice.

When the coupon management system was written, we used srand(clock()) thinking that would be an acceptable random value for generating coupons. The thinking was the management system would be running all the time and thus clock() would return a value that was unlikely to be hit twice for the number of valid coupons at any one time. However, the way the system is used is that users close the coupon management system when not in use and thus clock() will return values close to the starting time (start the app, navigate to the appropriate place, generate a coupon).

Result: Sooner or later a duplicate coupon is created. And that is when we noticed this problem.

This resulted in a confused customer (“My coupon has already been used”), a confused member of customer support (“That shouldn’t be possible!”) followed by some checking of the coupon files and then the code to see how it happened. Easy to fix, but better selection of the seed in the first place would have prevented the problem.

So if you want better random numbers don’t use clock() to seed srand().

Better seeds

  • Use time(NULL) to get the time of day and cast the result to seed srand().
    time(NULL) returns the number of seconds elapsed since midnight January 1st, 1970.
  • Use rdtsc() to get the CPU timestamp and cast the result to seed srand().
    rdtsc() is unlikely to return duplicate values as it returns the number of instructions executed by the processor since startup.
Share

Panorama Theme by Themocracy