November 5, 2009

How to Crash every WPF application

Now that we’ve released Logos 4, our users are really helping us stress-test WPF. (It’s still a little remarkable that after three years, ours is the first (and only!) WPF application installed on many of our users’ systems.)

On one system, the application was crashing at startup, with the following exception:

System.TypeInitializationException: The type initializer for
  'System.Windows.Media.FontFamily' threw an exception. --->
  System.ArgumentException: Illegal characters in path. 
    at System.IO.Path.CheckInvalidPathChars(String path) 
    at System.IO.Path.GetFileName(String path) 
    at MS.Internal.FontCache.FontSourceCollection.SetFontSources() 
    at MS.Internal.FontCache.FontSourceCollection.GetEnumerator() 
    at MS.Internal.FontCache.FamilyCollection.BuildFamilyList(List`1& familyList,
      SortedDictionary`2& familyNameList, SortedList`2& frequentStrings) 
    at MS.Internal.FontCache.FamilyCollection.MS.Internal.FontCache.
      IFontCacheElement.AddToCache(CheckedPointer newPointer, ElementCacher cacher) 
    at MS.Internal.FontCache.HashTable.Lookup(IFontCacheElement e, Boolean add) 
    at MS.Internal.FontCache.CacheManager.Lookup(IFontCacheElement e) 
    at System.Windows.Media.FontFamily.PreCreateDefaultFamilyCollection() 
    at System.Windows.Media.FontFamily..cctor()

We traced this to an illegal path character in one of the font values listed under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts registry key. To reproduce the problem, simply edit one of those values on your own machine and add a colon, pipe, or any other illegal path character to it. From then on, every WPF application on your system will crash as soon as it attempts to display its first UI.

If the user of your application discovers this issue, the only thing to do is to examine each of the Fonts registry values and correct/delete any that contain invalid characters. (Or write a program to do this for you.)
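If you do write such a program, its core check might look like the following sketch. It assumes the registry values have already been read into a map (for example with RegEnumValueW; that code is omitted), and it flags the characters that .NET Framework's Path.GetInvalidPathChars() reports, which is an assumption about exactly what the WPF font cache trips over. ContainsInvalidPathChars and FindSuspectFontValues are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumption: the same characters .NET Framework's Path.GetInvalidPathChars() returns,
// i.e. '"', '<', '>', '|' and the control characters 0x00-0x1F.
bool ContainsInvalidPathChars(const std::string& path) {
    for (unsigned char c : path) {
        if (c == '"' || c == '<' || c == '>' || c == '|' || c < 0x20) {
            return true;
        }
    }
    return false;
}

// `fontValues` maps each value name under the Fonts registry key to its data
// (usually a font file name); reading the registry is left out of this sketch.
// Returns the names of the values whose data needs correcting or deleting.
std::vector<std::string> FindSuspectFontValues(const std::map<std::string, std::string>& fontValues) {
    std::vector<std::string> suspects;
    for (const auto& entry : fontValues) {
        if (ContainsInvalidPathChars(entry.second)) {
            suspects.push_back(entry.first);
        }
    }
    return suspects;
}
```

Reporting the value names, rather than deleting them outright, leaves the decision to correct or remove each entry with the user.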

I’ve filed this as Connect issue 508419; we’ve also noted that others have encountered the same problem (and fixed it in a similar way).

Posted by Bradley Grainger at 10:24 AM

September 12, 2009

"File not found" CryptographicException

The Data Protection API (in Windows 2000 or later) provides methods to securely store secret information (e.g., a cached password) on a local computer. With DataProtectionScope.CurrentUser, only the logged-in user can decrypt the protected data (and, if optional entropy was supplied when the data was protected, only by supplying that same entropy again). In .NET, these APIs are exposed through the ProtectedData class.

On one of our XP test systems, a call to ProtectedData.Protect unexpectedly failed with a CryptographicException that had a very puzzling exception message:

System.Security.Cryptography.CryptographicException: The system cannot
    find the file specified. 
  at System.Security.Cryptography.ProtectedData.Protect(Byte[] userData,
    Byte[] optionalEntropy, DataProtectionScope scope)

There is no information about which file is required or why it's missing, or even why protecting data (in memory) requires file system access in the first place. Searching the internet for the error message turned up just a few other programmers who were experiencing the same problem, but no solutions.

Since the ProtectedData class is just a thin wrapper around the Win32 CryptProtectData function, I searched for the underlying Win32 error code (ERROR_FILE_NOT_FOUND) and found the answer: the crypto methods read the HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders registry key, and if values are missing from it, the methods fail. (This makes me think that the methods are incorrectly reading the "User Shell Folders" key directly instead of calling SHGetFolderPath, as Raymond Chen recommends, but I haven't been able to test that yet.)
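One way to confirm the diagnosis on an affected machine is to compare the value names under User Shell Folders with those on a working machine of the same Windows version. The sketch below does only that comparison. Reading the registry is left out, and which values DPAPI actually depends on is not established here, so the expected list is simply whatever the known-good machine has. MissingShellFolderValues is a hypothetical name.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the value names from `expected` that have no entry in `actual`.
// `actual` holds the User Shell Folders values (name -> data) read on the failing
// machine; `expected` is the list of value names copied from a working machine.
std::vector<std::string> MissingShellFolderValues(
        const std::map<std::string, std::string>& actual,
        const std::vector<std::string>& expected) {
    std::vector<std::string> missing;
    for (const std::string& name : expected) {
        if (actual.find(name) == actual.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}
```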

Posted by Bradley Grainger at 1:56 PM

June 25, 2009

Patching a Crash in Kensington MouseWorks

A co-worker had trouble installing the latest version of an application I've been working on. The log file showed that the problem was an access violation when our setup program called MsiInstallProduct. This seemed very unusual, since Windows APIs return error codes instead of crashing. I decided to see where the crash was happening by running the installer under WinDbg.

I set WinDbg to immediately break on exceptions and launched the app. It stopped almost immediately with the following call stack. The crash location is in kmw_dll.dll, which is a hook DLL installed by Kensington MouseWorks.

kmw_dll!CallWndProcFunc+0xa8
USER32!DispatchHookA+0x101
USER32!fnHkINLPCWPSTRUCTA+0x4f
USER32!__fnDWORD+0x24
ntdll!KiUserCallbackDispatcher+0x13
USER32!NtUserSetFocus+0xc
USER32!CreateDialogIndirectParamAorW+0x33
USER32!CreateDialogParamW+0x49
msi!CBasicUI::CreateProgressDialog+0x35
msi!CBasicUI::CheckDialog+0x47
msi!CBasicUI::SetProgressData+0x58
msi!CBasicUI::Initialize+0x11c
msi!MsiUIMessageContext::Initialize+0x230
msi!MsiUIMessageContext::RunInstall+0x22
msi!RunEngine+0xe0
msi!MsiInstallProductW+0xa1
BatchUpd!Application::RunCommandInstall+0x1f5
BatchUpd!Application::Run+0x11e3
BatchUpd!wWinMain+0x102

Disassembling the crash location showed the following. The fault occurs on the first instruction after the call that dereferences ebp (mov eax,dword ptr [ebp+10h]), because ebp no longer contains a valid address.

mov     eax,dword ptr [ebp-4]
mov     ecx,dword ptr [eax]
push    ecx
mov     edx,dword ptr [ebp-4]
mov     eax,dword ptr [edx+4]
push    eax
mov     ecx,dword ptr [ebp-4]
mov     edx,dword ptr [ecx+0Ch]
push    edx
call    kmw_dll!CallWndProcFunc+0x12f0 (10004210)
add     esp,0Ch
mov     eax,dword ptr [ebp+10h]
push    eax
mov     ecx,dword ptr [ebp+0Ch]
push    ecx
mov     edx,dword ptr [ebp+8]
push    edx
mov     eax,dword ptr [kmw_dll!ShowOptsProc+0x6b94 (1000e0a4)]
push    eax
call    dword ptr [kmw_dll!ShowOptsProc+0x7c80 (1000f190)]
mov     esp,ebp
pop     ebp
ret     0Ch

One interesting thing about the code is that it looks like a Debug build (or a Release build with no optimizations): the disassembly is straightforward and seems like it has a one-to-one correspondence with the putative source code. The other thing of note (not shown above) is that the base address of the DLL is set to the default 0x10000000, which is a poor choice for a hook DLL that will be loaded into every process on the system.

ebp should be preserved across the function call, so I looked at the function that was just called (at address 0x10004210). I've added a few explanatory comments based on my understanding of what it's doing.

push    ebp				; save caller's value of ebp
mov     ebp,esp				; standard function prologue
sub     esp,offset +0x87 (00000088)	; BOOL bLocal0; char szLocal1[132];
cmp     dword ptr [ebp+0Ch],0		; if (param2 == 0)
je      kmw_dll!CallWndProcFunc+0x130b (1000422b) ; goto label0;
mov     dword ptr [ebp-88h],0		; bLocal0 = FALSE;
jmp     kmw_dll!CallWndProcFunc+0x1315 (10004235) ; goto label1;
label0:
mov     dword ptr [ebp-88h],offset  (00000001) ; bLocal0 = TRUE;
label1:
mov     eax,dword ptr [ebp-88h]		; push bLocal0
push    eax
lea     ecx,[ebp-84h]			; push &szLocal1[0]
push    ecx
mov     edx,dword ptr [ebp+10h]		; push param3
push    edx
mov     eax,dword ptr [ebp+8]		; push param1
push    eax
call    kmw_dll!ShowOptsProc+0x16e0 (10008bf0)	; fn(param1, param3, szLocal1, bLocal0)
add     esp,10h				; clean up parameters (C calling convention)
mov     esp,ebp				; "free" locals
pop     ebp				; restore caller's value of ebp
ret                                                                                                  

This function is allocating 0x88 (i.e., 136) bytes for local variable storage: enough for an int (or BOOL) and a 132-byte buffer. The buffer ends immediately before the saved copy of the caller's ebp, so if it were overflowed, that saved value would be overwritten and ebp would be corrupted when the function returns. Some internet searching turns up posts that discuss a similar issue, stating that "Kensington MouseWorks ... crashes ... if the executable path is longer than 128 characters"; this seems to match our situation. Indeed, dumping the bytes at the old value of ebp-84h shows the full path of our setup application, which is too long for the buffer.
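The frame layout can be modelled in a few lines to see how long a path has to be before it does damage. This sketch assumes that the callee writes a NUL-terminated single-byte (ANSI) string into szLocal1 and that nothing sits between the buffer and the saved ebp, as the disassembly suggests. ClassifyOverflow and its Damage enum are hypothetical.

```cpp
#include <cstddef>

// Stack frame of the function at 0x10004210, as read from the disassembly
// (addresses increase downwards):
//   [ebp-88h]            BOOL bLocal0          4 bytes
//   [ebp-84h .. ebp-1]   char szLocal1[132]    132 bytes
//   [ebp]                caller's saved ebp    4 bytes
//   [ebp+4]              return address        4 bytes
constexpr std::size_t kOriginalBufferSize = 0x84;  // 132
constexpr std::size_t kPatchedBufferSize = 296;    // after growing the frame to 300 bytes
constexpr std::size_t kSlotSize = 4;               // saved ebp and return address: 4 bytes each

enum class Damage { None, SavedEbp, SavedEbpAndReturnAddress };

// Classifies what is overwritten when a NUL-terminated string of `length`
// single-byte characters is copied into a buffer of `bufferSize` bytes laid out as above.
Damage ClassifyOverflow(std::size_t length, std::size_t bufferSize = kOriginalBufferSize) {
    const std::size_t bytes = length + 1;  // include the terminating NUL
    if (bytes <= bufferSize) return Damage::None;
    if (bytes <= bufferSize + kSlotSize) return Damage::SavedEbp;
    return Damage::SavedEbpAndReturnAddress;
}
```

Under these assumptions, paths of 132 to 135 characters corrupt only the saved ebp, which would be consistent with a crash in the caller on its next use of ebp rather than at the ret. Longer paths would also overwrite the return address. With a 296-byte buffer, a path of MAX_PATH bytes (259 characters plus the NUL) fits.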

Since the buffer is stack allocated, it would be trivial to change its size by editing the instructions that create and reference the local variables. At a minimum, the buffer should be capable of storing MAX_PATH characters. Because this function doesn't pass the buffer length to the function it calls, we can make the buffer as large as we (reasonably) want. I decided to increase the size of this function's local storage to 300 bytes. In version 6.3.2.4 of kmw_dll.dll (which seems like it may be newer than the latest available version, published in February 2006), this can be accomplished by editing the following bytes in the file. These edits change the three constants used in the code above, 136, -136, and -132, to 300, -300, and -296. Each constant is stored as a little-endian 32-bit value, so only its low two bytes change.

Offset    New Bytes
0x4215    2C 01
0x4221    D4 FE
0x422D    D4 FE
0x4237    D4 FE
0x423E    D8 FE
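The byte values follow mechanically from the instruction encodings. sub esp,88h carries a 32-bit immediate, and the [ebp-88h] and [ebp-84h] operands carry 32-bit displacements (136 and -132 don't fit in a signed byte). All of them are stored little-endian. The sketch below derives the old and new bytes and applies the edits to an in-memory copy of the DLL. It does so only if every location still holds the expected original bytes, which guards against patching a different build. Reading and writing the file is omitted. LittleEndian, Patch, kPatches and ApplyPatches are hypothetical names. The offsets 0x4215, 0x4221, 0x422D, 0x4237 and 0x423E are assumed to be the file positions of the five fields in version 6.3.2.4; they line up with the instruction addresses from 0x10004210 onwards in the listing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes a 32-bit value the way x86 stores immediates and displacements: little-endian.
std::array<std::uint8_t, 4> LittleEndian(std::int32_t value) {
    const std::uint32_t u = static_cast<std::uint32_t>(value);
    return {static_cast<std::uint8_t>(u), static_cast<std::uint8_t>(u >> 8),
            static_cast<std::uint8_t>(u >> 16), static_cast<std::uint8_t>(u >> 24)};
}

// One edit: the 32-bit field at `offset` changes from `oldValue` to `newValue`.
struct Patch {
    std::size_t offset;
    std::int32_t oldValue;
    std::int32_t newValue;
};

const Patch kPatches[] = {
    {0x4215, 136, 300},    // sub esp,88h           -> sub esp,12Ch
    {0x4221, -136, -300},  // mov [ebp-88h],0       -> mov [ebp-12Ch],0
    {0x422D, -136, -300},  // mov [ebp-88h],1       -> mov [ebp-12Ch],1
    {0x4237, -136, -300},  // mov eax,[ebp-88h]     -> mov eax,[ebp-12Ch]
    {0x423E, -132, -296},  // lea ecx,[ebp-84h]     -> lea ecx,[ebp-128h]
};

// Patches `image` (the DLL's bytes) in place. Returns false, leaving the image
// untouched, if any location is out of range or doesn't hold the original bytes.
bool ApplyPatches(std::vector<std::uint8_t>& image) {
    for (const Patch& p : kPatches) {
        if (p.offset + 4 > image.size()) return false;
        const auto expected = LittleEndian(p.oldValue);
        for (std::size_t i = 0; i < 4; ++i) {
            if (image[p.offset + i] != expected[i]) return false;
        }
    }
    for (const Patch& p : kPatches) {
        const auto bytes = LittleEndian(p.newValue);
        for (std::size_t i = 0; i < 4; ++i) image[p.offset + i] = bytes[i];
    }
    return true;
}
```

Checking every location before writing any of them keeps a mismatched DLL from ending up half-patched.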

The buffer should now be large enough to hold a file name up to MAX_PATH bytes long. With this new DLL installed in the C:\Windows\System32 folder, the setup program is able to launch the MSI and installation completes successfully.

Posted by Bradley Grainger at 7:45 PM

April 28, 2009

How to use UMDH to find native memory leaks

The Debugging Tools for Windows packages come with a tool, UMDH.exe, that makes finding memory leaks in native code pretty easy. There’s a Microsoft Knowledge Base article that gives an overview of how to use the tool, but it’s a little out of date. (It’s also very detailed, and I usually just want a summary.)

All these steps assume you have the latest Debugging Tools package installed and in your path. (Note that you need to run the tools for the same platform (x86 vs x64) as the target executable.)

Start Collecting Data

At an Administrator command prompt, run gflags.exe to start collecting stack traces for user-mode allocations:

gflags -i Program.exe +ust

Collect Snapshots

Start Program.exe running, and collect a baseline snapshot (this can be done from a regular command prompt):

umdh -pn:Program.exe -f:Dump1.txt

Perform the action that leaks memory, and collect a second snapshot:

umdh -pn:Program.exe -f:Dump2.txt

(If “Program.exe” is not a unique process name, the “-p:” command line argument can select a process by ID.)

Compare Snapshots

umdh -d Dump1.txt Dump2.txt > Diff.txt

Open Diff.txt in your favourite text editor (one that can handle large files!). The memory leaks are listed in descending order of bytes leaked, and each should be followed by the complete stack trace of the allocation call. Depending on the cause, this may either pinpoint the bug or at least show a good place to set a breakpoint for debugging.
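To rehearse these steps on a known leak, a throwaway Program.exe could call a helper like the hypothetical LeakBlocks below between the two snapshots (for example, from main after waiting for the user to press Enter). If symbols for the program are available, the diff should then be dominated by allocations whose stack traces pass through LeakBlocks.

```cpp
#include <cstddef>

// Hypothetical helper for rehearsing the UMDH workflow on a known leak.
// Each block's pointer is stored in a volatile global and then overwritten, so
// every block except the last becomes unreachable; the volatile store also keeps
// the optimizer from removing the allocations.
char* volatile g_lastLeakedBlock = nullptr;

std::size_t LeakBlocks(std::size_t count, std::size_t size) {
    for (std::size_t i = 0; i < count; ++i) {
        g_lastLeakedBlock = new char[size];  // deliberately never deleted
    }
    return count * size;  // bytes requested, not counting heap overhead
}
```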

Stop Data Collection

The most important step in the whole process is to turn off data collection for your application (once the memory leak is fixed). Otherwise your program will keep running slowly while the heap manager records a stack trace for every memory allocation:

gflags -i Program.exe -ust

Posted by Bradley Grainger at 4:05 PM

August 19, 2008

Image Format Error when Loading from a Stream

The Microsoft Windows Imaging Component (WIC) is “an extensible framework for encoding, decoding, and manipulating images”. It’s also the core of WPF’s System.Windows.Media.Imaging classes, which is how a curious exception I got when using BitmapSource eventually led me to a possible bug in IWICImagingFactory::CreateDecoderFromStream.

My code was loading a large number of images from disk. The files contained a header with some image metadata, followed immediately by a regular Windows bitmap (in the ubiquitous BMP file format). The code would read the header from the stream, then load the bitmap from the rest of the stream, as follows:

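// requires: using System.IO; using System.Windows.Media.Imaging;
using (Stream stream = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
    // read header; Read can return fewer bytes than requested, so check the count
    int bytesRead = stream.Read(header, 0, header.Length);
    if (bytesRead != header.Length)
        throw new EndOfStreamException("The file is too short to contain the header.");
    // etc.

    // the stream is now positioned at the start of the BMP data, not at offset 0
    BitmapImage bitmap = new BitmapImage();
    bitmap.BeginInit();
    // OnLoad decodes the image during EndInit, so the stream can be disposed afterwards
    bitmap.CacheOption = BitmapCacheOption.OnLoad;
    bitmap.StreamSource = stream;
    bitmap.EndInit();
    return bitmap;
}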

Some images would fail to load, with EndInit throwing a mysterious System.IO.FileFormatException: “The image format is unrecognized”. The InnerException was System.Runtime.InteropServices.COMException (0x88982F07), with an HRESULT of a WIC error code: WINCODEC_ERR_UNKNOWNIMAGEFORMAT.

My first thought was that the images were somehow corrupted, but further investigation showed that the files loaded without errors if the header preceding the bitmap was removed from the file, or if the bitmap data following the header was first copied to a new MemoryStream before being loaded. I observed the same behaviour with IWICImagingFactory::CreateDecoderFromStream when I rewrote the test harness as a C++ COM application: if the IStream containing the image contained any data preceding the bitmap data, an error HRESULT would sometimes be returned.

It appears that, in certain circumstances, CreateDecoderFromStream assumes that the bitmap data begins at the stream's origin, and absolute offsets within the stream are used when seeking; thus, the image data must begin at offset 0 within the stream. As a workaround, you can copy the image data to a new MemoryStream (but note that this may increase memory usage). The solution I chose was to write a thin Stream wrapper class that handles calls to Position, Seek, Length, etc. and adjusts the offsets so that the image now appears to start at offset 0; all other calls are passed straight through to the underlying FileStream. This allows WIC and WPF to load all the images without having to make an unnecessary copy of the bitmap, or having to change the legacy file format.
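The wrapper idea doesn't depend on .NET, so here is a minimal C++ sketch of it. SeekableStream is a hypothetical interface loosely modelled on .NET's Stream (it is not System.IO.Stream or the COM IStream). MemoryStream stands in for the FileStream, and OffsetStream is the wrapper: it shifts Position, Length and absolute seeks by the header size and passes reads straight through.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

enum class SeekOrigin { Begin, Current, End };

// Hypothetical minimal seekable-stream interface.
class SeekableStream {
public:
    virtual ~SeekableStream() = default;
    virtual std::int64_t Length() const = 0;
    virtual std::int64_t Position() const = 0;
    virtual void SetPosition(std::int64_t position) = 0;
    virtual std::size_t Read(std::uint8_t* buffer, std::size_t count) = 0;

    // Seek is built on Position/Length/SetPosition, so a wrapper that
    // translates those three translates Seek as well.
    std::int64_t Seek(std::int64_t offset, SeekOrigin origin) {
        const std::int64_t base = origin == SeekOrigin::Begin ? 0
                                : origin == SeekOrigin::Current ? Position()
                                : Length();
        SetPosition(base + offset);
        return Position();
    }
};

// In-memory stream standing in for the FileStream.
class MemoryStream : public SeekableStream {
public:
    explicit MemoryStream(std::vector<std::uint8_t> data) : data_(std::move(data)) {}
    std::int64_t Length() const override { return static_cast<std::int64_t>(data_.size()); }
    std::int64_t Position() const override { return position_; }
    void SetPosition(std::int64_t position) override {
        if (position < 0) throw std::out_of_range("negative position");
        position_ = position;
    }
    std::size_t Read(std::uint8_t* buffer, std::size_t count) override {
        if (position_ >= Length()) return 0;
        const std::size_t n = std::min(count, static_cast<std::size_t>(Length() - position_));
        std::copy_n(data_.begin() + position_, n, buffer);
        position_ += static_cast<std::int64_t>(n);
        return n;
    }

private:
    std::vector<std::uint8_t> data_;
    std::int64_t position_ = 0;
};

// Makes the bytes of `inner` from `origin` onwards look like a stream starting at offset 0.
class OffsetStream : public SeekableStream {
public:
    OffsetStream(SeekableStream& inner, std::int64_t origin) : inner_(inner), origin_(origin) {
        if (origin < 0 || origin > inner.Length()) throw std::out_of_range("origin outside stream");
    }
    std::int64_t Length() const override { return inner_.Length() - origin_; }
    std::int64_t Position() const override { return inner_.Position() - origin_; }
    void SetPosition(std::int64_t position) override {
        if (position < 0) throw std::out_of_range("cannot seek before the start of the image");
        inner_.SetPosition(origin_ + position);
    }
    std::size_t Read(std::uint8_t* buffer, std::size_t count) override {
        return inner_.Read(buffer, count);  // reads need no translation
    }

private:
    SeekableStream& inner_;
    std::int64_t origin_;
};
```

In a .NET version of the wrapper, seeks relative to SeekOrigin.Current or SeekOrigin.End would need no adjustment to the offset, although the position they return still has the header size subtracted.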

Posted by Bradley Grainger at 6:33 PM

April 9, 2008

Exception 0xc0020001 in C++/CLI assembly

After reorganising some code in a C++/CLI assembly, I started getting exception "0xc0020001: The string binding is invalid" when shutting down the C# application that loaded that assembly.

When the program was run under the debugger, it would throw the exception from a function in crtdll.c that was processing the DLL_PROCESS_DETACH notification sent to DllMain. The error occurred when attempting to call the function pointer function_to_call (on line 444).

  437 /* cache the function to call. */
  438 function_to_call = (_PVFV)_decode_pointer(*onexitend);
  439
  440 /* mark the function pointer as visited. */
  441 *onexitend = (_PVFV)_encoded_null();
  442
  443 /* call the function, which can eventually change __onexitbegin and __onexitend */
  444 (*function_to_call)();
  445
  446 onexitbegin_new = (_PVFV *)_decode_pointer(__onexitbegin);
  447 onexitend_new = (_PVFV *)_decode_pointer(__onexitend);

Here's where a feature of the Visual Studio debugger that I hadn't seen before came in very handy. If I set a breakpoint on line 444 and simply hovered my mouse over function_to_call, the debugger tooltip showed the full decorated name of the function, in this case, "_t2m@???__FstaticNativeObject@?1??NativeMethod@@YAHH@Z@YAXXZ@?A0x754dd9c9@@YAXXZ". 

Chris Brumme explains error 0xC0020001 and identifies one of the causes as "trying to call into managed code … after the runtime has started shutting down". According to a forum post (about this same error), "t2m" stands for "transition to managed". The information in the decorated function name ("staticNativeObject" and "NativeMethod") was enough to piece together the rest of the puzzle. I had written code much like the following:

#pragma unmanaged

class NativeClass
{
public:
    NativeClass() { }
    ~NativeClass() { }
};

bool NativeMethod()
{
    static NativeClass staticNativeObject;
    return true;
}

Even though NativeMethod is emitted as native code, the disassembly showed that it registers a managed entry point for the NativeClass destructor (for staticNativeObject) with the atexit function. But by the time the CRT ran this destructor (from DllMain, when the C++/CLI assembly was unloaded), the CLR had already started shutting down, and the call failed.

This problem can be solved by removing the function-level static variable: either make it an ordinary local, or move it to class or file (or global!) scope. (Slightly more complex workarounds may, of course, be necessary depending on the expense or difficulty of initialising the object.)
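For the simple NativeClass example, the two workarounds might look like the following. It's plain C++, so it compiles anywhere. #pragma unmanaged is applied only when compiling with /clr (when _M_CEE is defined), and the function names are hypothetical. This exact code has not been checked under /clr; the file-scope variant relies on the report that moving the object out of the function avoids the problem.

```cpp
#if defined(_M_CEE)
#pragma unmanaged  // only meaningful when compiling with /clr
#endif

class NativeClass
{
public:
    NativeClass() { }
    ~NativeClass() { }
};

// Workaround 1: the object lives at file scope instead of inside the function.
// The unnamed namespace keeps it private to this source file.
namespace
{
    NativeClass fileScopeNativeObject;
}

bool NativeMethodUsingFileScopeObject()
{
    (void)fileScopeNativeObject;  // the function would use the file-scope object here
    return true;
}

// Workaround 2: an ordinary local, constructed and destroyed on each call,
// so no destructor is registered with atexit for it.
bool NativeMethodUsingLocalObject()
{
    NativeClass localNativeObject;
    (void)localNativeObject;
    return true;
}
```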

It seems like the compiler is emitting incorrect code here. It should register a native entry point for the destructor instead (or call the managed version from the AppDomain.DomainUnload event), so I filed a bug report with Microsoft Connect on this problem.

Posted by Bradley Grainger at 12:08 PM

April 3, 2008

Managed Debugging Assistant Configuration Files

MDAs (Managed Debugging Assistants) are a feature of the CLR that help track down difficult-to-diagnose bugs in complex areas such as interop and threading. Basic information about MDAs can be found in the MSDN Library topic, Stephen Toub's MSDN magazine article, and Mike Stall's blog post. For fine-grained control over the MDAs that are enabled for an application (or to use MDAs without running the application under the Visual Studio debugger), you need to use an application-specific configuration file (as well as setting the MDA registry key or COMPLUS_MDA environment variable, as detailed in those articles). Unfortunately, advanced MDA usage (with an application-specific config file) can itself be a complex and difficult-to-diagnose process.

For starters, the config file must be named properly. Like a regular application config file, the name of this file needs to include the full name of the program (including the “.exe” suffix), followed by “.mda.config”, e.g., MdaTest.exe.mda.config.

I don't know of any way to check that the MDA configuration is being read, apart from deliberately introducing an error (see below) and seeing if the runtime raises the “InvalidConfigFile” MDA when the application is run.

If there is an error in the MDA config file, the following message will be displayed when the application is run:

An unhandled exception ('InvalidConfigFile Managed Debugging Assistant') occurred

If you're running the application under WinDbg, you'll see the following message instead:

<mda:msg xmlns:mda="http://schemas.microsoft.com/CLR/2004/10/mda">
  <!--
       The 'mdaConfig' configuration file is invalid.
   -->
  <mda:invalidConfigFileMsg break="true" configFile="mdaConfig"/>
</mda:msg>

There are two main causes I've found for this error.

1. Incorrect XML element names. Double-check every XML element and attribute name to ensure it's spelt correctly. The basic structure of the config file is:

<mdaConfig>
    <assistants>
        <!-- MDA elements go here -->
    </assistants>
</mdaConfig>

If any of the element names are not recognised, the InvalidConfigFile error will occur. Note that the online MSDN documentation for .NET 2.0 and .NET 3.0 has incorrect examples; be sure you're using the .NET 3.5 version of the documentation. (For example, this 2.0 page says to use “<fatalExecutionError />”, but the equivalent 3.5 page correctly gives “<fatalExecutionEngineError />” as the configuration element name.)

The MDA parser doesn't seem to log any information about why the config file is invalid; to determine the element causing the problem, remove all the elements from the config file and add them back in one-by-one (running the application after each addition) until you find the one that causes the problem.

2. Incorrectly ordered XML elements. The MDA configuration elements must be in alphabetical order, or the file will not load properly. So, this will not work:

<assistants>
    <bindingFailure/>
    <asynchronousThreadAbort/>
</assistants>

but this will:

<assistants>
    <asynchronousThreadAbort/>
    <bindingFailure/>
</assistants>
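Because the parser gives no hint about which element is out of place, a small check can save some of the add-one-element-at-a-time search. The sketch below takes the names of the elements inside <assistants> in document order; extracting them from the XML is left to whatever parser is at hand. It assumes a plain case-insensitive alphabetical order, which is an assumption because the exact collation the CLR uses isn't documented here. ElementNameLess and FirstOutOfOrderAssistant are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Case-insensitive "a sorts before b" for ASCII element names.
bool ElementNameLess(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

// Returns the index of the first child of <assistants> that breaks alphabetical
// order, or names.size() if the order is fine.
std::size_t FirstOutOfOrderAssistant(const std::vector<std::string>& names) {
    auto it = std::is_sorted_until(names.begin(), names.end(), ElementNameLess);
    return static_cast<std::size_t>(it - names.begin());
}
```

Sorting the list with std::sort and ElementNameLess then gives the order in which to write the elements back.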

One final thing to note is that Visual Studio uses the settings in the Debug > Exceptions dialog and ignores the MDA config file when the application is started under the VS debugger. To debug with the MDA config file settings, use WinDbg or attach the Visual Studio debugger to the process after it has started.

Posted by Bradley Grainger at 1:23 PM