04 January 2012

WebException: "The message length limit was exceeded"

An uncommon WebException thrown by HttpWebRequest.GetResponse() has the message "The underlying connection was closed: The message length limit was exceeded."

By default, HttpWebRequest accepts at most 64 KB of HTTP response headers; if the server sends more, GetResponse() throws this exception.

To resolve the issue, set the MaximumResponseHeadersLength property on the request to a larger value, or to -1 for no limit. (Note that the value is expressed in kilobytes, so the default value 64 allows 65,536 bytes of headers.)
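As a minimal sketch of that fix: the helper name `CreateRequest` and the 128 KB figure in the usage note are illustrative assumptions, not part of the original post. The only API relied on is the `HttpWebRequest.MaximumResponseHeadersLength` property described above.

```csharp
using System;
using System.Net;

static class HeaderLimitSketch
{
    // Creates a request whose response-header limit is raised from the default.
    // The value is in kilobytes; -1 removes the limit entirely.
    public static HttpWebRequest CreateRequest(string url, int maxHeaderKilobytes)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.MaximumResponseHeadersLength = maxHeaderKilobytes;
        return request;
    }
}
```

For example, `HeaderLimitSketch.CreateRequest("http://example.com/", 128)` would allow up to 131,072 bytes of headers on that one request. Setting the limit per request (rather than removing it with -1) keeps some protection against a misbehaving server.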

Posted by Bradley Grainger at 01:04 PM

02 January 2012

Avoid System.Windows.Rect.ToString()

System.Windows.Rect.ToString() is documented as returning a string in “the following form: "X,Y,Width,Height"”.

It seems like this method is the complement to the Parse method, which accepts the “string representation of the rectangle, in the form "x, y, width, height"”.

Unfortunately, while Parse is culture-invariant (as documented), ToString follows the .NET convention of returning locale-sensitive results; you need to call the ToString(IFormatProvider) overload to produce a string in the "x,y,width,height" format (that can be accepted by Parse).

    Rect rect = new Rect(1.5, 2, 3.5, 4);
    string s;

    s = rect.ToString(); // "1.5,2,3.5,4"
    Rect.Parse(s); // success

    Thread.CurrentThread.CurrentCulture = new CultureInfo("de"); // or "es", "fr", etc.

    s = rect.ToString(); // "1,5;2;3,5;4"
    Rect.Parse(s); // throws FormatException

    s = rect.ToString(CultureInfo.InvariantCulture);
    Rect.Parse(s); // success

I filed a bug report, even though this is arguably an error in the documentation, not in the .NET Framework itself.

Posted by Bradley Grainger at 12:03 PM

30 December 2011

Printing from .NET 3.5 in Windows 7

Our users discovered a curious bug that appears to be caused by:

  1. Printing an XpsDocument
  2. that uses a font embedded in the application's resources
  3. from a .NET 3.5 application
  4. running on Windows 7

The printed output looks like the following image; various glyphs are substituted with larger sans-serif versions of themselves, causing a ransom-note-like appearance.

[Image: corrupted print output from .NET 3.5]

We found that changing any one of the conditions above fixes the problem, but unfortunately we need to print XPS using an embedded font, we're still using .NET 3.5, and we have to run on Windows 7.

Strangely enough, printing to the XPS Document Writer print driver, and then printing that document with the XPS Viewer built into Windows 7 doesn't reproduce the problem; it only happens when our app prints directly to an actual printer.

Workaround

I noticed that the XPS Viewer is a native application; this led me to discover the Windows 7 XPS Print API. In conjunction with the (also new in Windows 7) XPS Document API, this lets you print XPS documents from native code.

We were able to solve the problem by automating the workaround described above: our code writes its output to an XpsDocumentWriter backed by a temporary file (instead of a PrintQueue); we then use the native APIs to print the temporary XPS file to the currently-selected printer.

The first part is to define the native methods and COM interfaces we will need. (And, as noted in this StackOverflow question, the IXpsPrintJobStream interface is either declared or implemented incorrectly, so we have to call the Close method as if it existed on ISequentialStream.)

With those declared, the printing code can be written. This method (which should be called on a background thread) prints an XPS file to a specific printer, returning true if printing succeeded. (If it fails, the application should fall back to the .NET printing APIs.)

All the code below is also available in a gist.

    internal static class NativeMethods
    {
        [DllImport("XpsPrint.dll", ExactSpelling = true, CharSet = CharSet.Unicode)]
        public static extern int StartXpsPrintJob(string printerName, string jobName, string outputFileName, IntPtr progressEvent,
        SafeWaitHandle completionEvent, [MarshalAs(UnmanagedType.LPArray)] byte[] printablePagesOn, int printablePagesOnCount,
        out IXpsPrintJob xpsPrintJob, out IXpsPrintJobStream documentStream, out IXpsPrintJobStream printTicketStream);
    }

    [ComImport, Guid("E974D26D-3D9B-4D47-88CC-3872F2DC3585"), ClassInterface(ClassInterfaceType.None)]
    internal class XpsOMObjectFactory
    {
    }

    [ComImport, Guid("F9B2A685-A50D-4FC2-B764-B56E093EA0CA"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    internal interface IXpsOMObjectFactory
    {
        void CreatePackage();

        [return: MarshalAs(UnmanagedType.Interface)]
        IXpsOMPackage CreatePackageFromFile([MarshalAs(UnmanagedType.LPWStr)] string filename, bool reuseObjects);

        void CreatePackageFromStream();
        void CreateStoryFragmentsResource();
        void CreateDocumentStructureResource();
        void CreateSignatureBlockResource();
        void CreateRemoteDictionaryResource();
        void CreateRemoteDictionaryResourceFromStream();
        void CreatePartResources();
        void CreateDocumentSequence();
        void CreateDocument();
        void CreatePageReference();
        void CreatePage();
        void CreatePageFromStream();
        void CreateCanvas();
        void CreateGlyphs();
        void CreatePath();
        void CreateGeometry();
        void CreateGeometryFigure();
        void CreateMatrixTransform();
        void CreateSolidColorBrush();
        void CreateColorProfileResource();
        void CreateImageBrush();
        void CreateVisualBrush();
        void CreateImageResource();
        void CreatePrintTicketResource();
        void CreateFontResource();
        void CreateGradientStop();
        void CreateLinearGradientBrush();
        void CreateRadialGradientBrush();
        void CreateCoreProperties();
        void CreateDictionary();
        void CreatePartUriCollection();
        void CreatePackageWriterOnFile();
        void CreatePackageWriterOnStream();
        void CreatePartUri();
        void CreateReadOnlyStreamOnFile();
    }

    [ComImport, Guid("18C3DF65-81E1-4674-91DC-FC452F5A416F"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    internal interface IXpsOMPackage
    {
        void GetDocumentSequence();
        void SetDocumentSequence();
        void GetCoreProperties();
        void SetCoreProperties();
        void GetDiscardControlPartName();
        void SetDiscardControlPartName();
        void GetThumbnailResource();
        void SetThumbnailResource();
        void WriteToFile();

        void WriteToStream(IXpsPrintJobStream stream, bool optimizeMarkupSize);
    };

    // NOTE: It appears that the IID for IXpsPrintJobStream specified in XpsPrint.h --  
    // MIDL_INTERFACE("7a77dc5f-45d6-4dff-9307-d8cb846347ca") -- is not correct, or the object
    // doesn't implement QueryInterface correctly. However, we can QI for ISequentialStream and
    // successfully (at least in Windows 7 SP1 x86) call the Close method as if it existed on that
    // interface.
    // That is, we obtain the ISequentialStream interface, but work with it as the IXpsPrintJobStream interface.
    // Thanks to http://stackoverflow.com/questions/6123507/xps-printing-from-windows-service for this tip.
    [ComImport, Guid("0C733A30-2A1C-11CE-ADE5-00AA0044773D"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    internal interface IXpsPrintJobStream
    {
        // ISequentialStream methods
        void Read([MarshalAs(UnmanagedType.LPArray)] byte[] pv, uint cb, out uint pcbRead);
        void Write([MarshalAs(UnmanagedType.LPArray)] byte[] pv, uint cb, out uint pcbWritten);

        // IXpsPrintJobStream methods
        void Close();
    }

    [ComImport, Guid("5AB89B06-8194-425F-AB3B-D7A96E350161"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    internal interface IXpsPrintJob
    {
        void Cancel();
        IntPtr GetJobStatus();
    };

    /// <summary>
    /// Prints the specified XPS document to a printer using the native XPS Print API.
    /// </summary>
    /// <param name="xpsFilePath">The path to the XPS document.</param>
    /// <param name="printerName">The printer name.</param>
    /// <param name="printTicket">A PrintTicket with settings for this print job.</param>
    /// <returns><c>true</c> if the document was successfully printed; otherwise, <c>false</c>.</returns>
    /// <remarks>This method should be called from a background thread.</remarks>
    public bool Print(string xpsFilePath, string printerName, PrintTicket printTicket)
    {
        // try to create the XPS Object Model factory (only available on Windows 7 and Vista with the Platform Update)
        IXpsOMObjectFactory xpsFactory = null;
        try
        {
            xpsFactory = (IXpsOMObjectFactory)new XpsOMObjectFactory();
        }
        catch (COMException)
        {
            // OS doesn't support the XPS Document API
            return false;
        }

        bool success = false;
        IXpsOMPackage package = null;

        try
        {
            // load the saved document as a native XpsOMPackage
            package = xpsFactory.CreatePackageFromFile(xpsFilePath, false);

            using (ManualResetEvent handle = new ManualResetEvent(false))
            {
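                // attempt to start the print job
                // (jobTitle is not declared in this listing; it is assumed to be a string member of the containing class)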
                IXpsPrintJob printJob;
                IXpsPrintJobStream docStream, ticketStream;
                int hresult = NativeMethods.StartXpsPrintJob(printerName, jobTitle, null, IntPtr.Zero, handle.SafeWaitHandle,
                    null, 0, out printJob, out docStream, out ticketStream);

                // check for success (NOTE: checking HRESULT value directly instead of calling Marshal.ThrowExceptionForHR
                //   to avoid proliferation of 'catch' blocks)
                if (hresult >= 0)
                {
                    // write the current printer settings to the print ticket stream
                    byte[] ticketData = printTicket.GetXmlStream().ToArray();
                    uint bytesWritten;
                    ticketStream.Write(ticketData, (uint)ticketData.Length, out bytesWritten);
                    ticketStream.Close();

                    // write the XPS package to the document stream
                    package.WriteToStream(docStream, false);
                    docStream.Close();

                    // wait for printing to finish
                    handle.WaitOne();
                    success = true;
                }
            }
        }
        catch (COMException)
        {
            // printing failed
        }
        catch (DllNotFoundException)
        {
            // OS doesn't support XPS Print API
        }
        catch (EntryPointNotFoundException)
        {
            // OS doesn't support XPS Print API
        }

        // force the XPS package to be released, so that the temporary file can be deleted
        if (package != null)
            Marshal.FinalReleaseComObject(package);

        return success;
    }

Posted by Bradley Grainger at 04:01 PM

09 June 2011

Git Bash in Console2

I started using Console2 after reading Scott Hanselman’s recommendation. Not only can it display a regular command prompt and PowerShell in tabs (as he describes), but any console process can be added. msysgit users can add Git Bash by opening the Settings dialog and adding a tab with the following properties:

  • Title: Git Bash
  • Icon: C:\Program Files (x86)\Git\etc\git.ico
  • Shell: C:\Windows\SysWOW64\cmd.exe /c ""C:\Program Files (x86)\Git\bin\sh.exe" --login -i"
  • Startup dir: C:\YourCode

Posted by Bradley Grainger at 10:26 AM

13 April 2011

Generating a deterministic GUID

Although a new GUID is typically created in order to provide a unique ID, there are occasions when it’s useful for two different systems to generate the same GUID independently. RFC 4122 provides an algorithm for deterministic creation of a GUID based on a namespace ID (itself a GUID) and a name within that namespace. These name-based GUIDs will never collide with GUIDs from other sources (e.g., Guid.NewGuid), and have a very (very) small chance of colliding with other name-based GUIDs. As per section 4.3:

  • The UUIDs generated at different times from the same name in the same namespace MUST be equal.
  • The UUIDs generated from two different names in the same namespace should be different (with very high probability).
  • The UUIDs generated from the same name in two different namespaces should be different (with very high probability).
  • If two UUIDs that were generated from names are equal, then they were generated from the same name in the same namespace (with very high probability).

Because the .NET Framework doesn’t provide a way to create these GUIDs, it’s tempting to create a custom solution (e.g., using an MD5 hash as a GUID, because it has the same number of bytes), but because that doesn’t follow the rules of GUID creation, it’s not guaranteed to be unique with respect to other GUIDs.

The algorithm for generating these GUIDs is fairly straightforward; the most complicated part may be converting the GUID to network byte order as specified in the RFC. (It’s complicated enough that the RFC authors got it wrong; the example given in Appendix B is incorrect.)

There are libraries to do this for Python and C++; I didn’t find one for .NET, so I wrote GuidUtility.Create which implements the RFC 4122 rules. Using it is simple (once you’ve decided on the namespace ID to use):

    Guid guid = GuidUtility.Create(GuidUtility.DnsNamespace, "code.logos.com");
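To make the steps concrete, here is a rough sketch of the RFC 4122 version 5 (SHA-1) procedure. It is not the GuidUtility source; the class name `NameBasedGuidSketch` is hypothetical, and encoding the name as UTF-8 is an assumption (the RFC leaves the byte representation of the name to the namespace). The DNS namespace value is the one listed in RFC 4122 Appendix C.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class NameBasedGuidSketch
{
    // Namespace ID for fully-qualified domain names (RFC 4122, Appendix C).
    public static readonly Guid DnsNamespace = new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    public static Guid Create(Guid namespaceId, string name)
    {
        // Assumption: the name is hashed as UTF-8 bytes.
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        // Guid.ToByteArray() is little-endian in its first three fields;
        // the RFC hashes the namespace ID in network (big-endian) order.
        byte[] namespaceBytes = namespaceId.ToByteArray();
        SwapByteOrder(namespaceBytes);

        // Hash the namespace ID followed by the name.
        byte[] input = new byte[namespaceBytes.Length + nameBytes.Length];
        Buffer.BlockCopy(namespaceBytes, 0, input, 0, namespaceBytes.Length);
        Buffer.BlockCopy(nameBytes, 0, input, namespaceBytes.Length, nameBytes.Length);
        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
            hash = sha1.ComputeHash(input);

        // Keep the first 16 bytes, then stamp the version (5) and variant (10xx) bits.
        byte[] result = new byte[16];
        Array.Copy(hash, 0, result, 0, 16);
        result[6] = (byte)((result[6] & 0x0F) | 0x50);
        result[8] = (byte)((result[8] & 0x3F) | 0x80);

        // Convert back to the layout expected by the Guid(byte[]) constructor.
        SwapByteOrder(result);
        return new Guid(result);
    }

    // Reverses the three little-endian fields (4, 2 and 2 bytes) in place.
    static void SwapByteOrder(byte[] guid)
    {
        Array.Reverse(guid, 0, 4);
        Array.Reverse(guid, 4, 2);
        Array.Reverse(guid, 6, 2);
    }
}
```

Calling `NameBasedGuidSketch.Create(NameBasedGuidSketch.DnsNamespace, "code.logos.com")` twice yields the same value, and the third group of the result always starts with 5, marking it as a name-based SHA-1 GUID.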

Posted by Bradley Grainger at 06:31 PM

29 December 2010

Binary Patching with bsdiff

bsdiff and bspatch are tools for building and applying patches to binary files. If the files are reasonably similar (e.g., built from the same source code), a patch that can transform v1 of a file into v2 is often significantly smaller than downloading the entire v2 file. According to Naïve Differences of Executable Code (the paper that introduced the algorithm), compression ratios of 10x or more are typically achieved.

The reference implementation is written in C and uses bzip2 to compress chunks in the patch file. I ported the algorithm to C# and used #ziplib for bzip2 support.

The primary file is BinaryPatchUtility.cs; the entire project (with bsdiff and bspatch front ends) is available on GitHub. (Note that if you just want to run bsdiff from the command prompt, a Windows port is already available. This project is designed for reuse within another program, under the same license as the original bsdiff code.)

Posted by Bradley Grainger at 04:00 PM

16 December 2010

DirectoryInfo.GetFiles improved in .NET 4

A nice improvement in .NET 4 is that the following code runs significantly faster than it did under .NET 3.5:

    new DirectoryInfo(folderPath)
        .GetFiles()
        .Select<FileInfo, long>(fi => fi.Length)
        .ToList();

In .NET 3.5, the Length property (as well as most of the other properties on FileInfo) is fetched lazily; accessing the property hits the filesystem again for each file returned by GetFiles, incurring a substantial speed penalty. In .NET 4, GetFiles uses the information already returned by the OS when enumerating the contents of the directory to pre-populate the values of the FileInfo properties, avoiding subsequent filesystem accesses.

For a folder containing several thousand files, the .NET 4 implementation can be from three times (on a local drive) to 50 times (on a network path) faster.
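To reproduce a comparison like this on your own folders, something along the following lines could be compiled against each framework version. It is only a sketch: the class and method names are hypothetical, and a single Stopwatch run is a crude measurement (file-system caching will skew repeated runs).

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class GetFilesTimingSketch
{
    // Touches FileInfo.Length for every file, which is the access pattern discussed above.
    public static long SumFileLengths(string folderPath)
    {
        return new DirectoryInfo(folderPath).GetFiles().Sum(fi => fi.Length);
    }

    // Returns the elapsed time for one pass over the folder.
    public static TimeSpan Time(string folderPath, out long totalBytes)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        totalBytes = SumFileLengths(folderPath);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Running `GetFilesTimingSketch.Time(folderPath, out total)` against a large network share is where the difference between the two framework versions should be most visible.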

For more information on this and other changes, see Justin Van Patten's MSDN magazine article, What's New in the Base Class Libraries in .NET Framework 4.

Posted by Bradley Grainger at 08:30 PM

06 December 2010

Fixing C++ projects that always rebuild

After upgrading to Visual Studio 2010, one of our C++ projects would always be built, even if none of its files changed. With MSBuild output verbosity set to "Normal", the Build output window would show:

    1>------ Build started: Project: SomeLib, Configuration: Release Win32 ------
    1>InitializeBuildStatus:
    1>  Creating ".\Release\SomeLib.unsuccessfulbuild" because "AlwaysCreate" was specified.
    ...
    1>Build succeeded.
    1>
    1>Time Elapsed 00:00:00.47
    ========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

This occurred even though none of the files in that project had changed.

This forum thread and Connect issue describe the problem and state that it's usually caused by referencing a .h file in the project that doesn't exist on disk; the build system in VS2010 then assumes the project is out-of-date and builds it. The forum thread references a blog post about enabling C++ project system logging in order to diagnose the problem, but it didn't help me identify the missing file. Instead, I wrote the following script to parse the project file and check for missing files:

    // requires: using System; using System.IO; using System.Xml.Linq;
    // set this to your project
    string fileName = @"C:\Path\To\MyProject.vcxproj";

    XDocument project = XDocument.Load(fileName);
    XNamespace msbuild = XNamespace.Get("http://schemas.microsoft.com/developer/msbuild/2003");

    int present = 0, missing = 0;
    string folder = Path.GetDirectoryName(fileName);
    foreach (XElement elem in project.Root.Elements(msbuild + "ItemGroup").Elements(msbuild + "ClInclude"))
    {
        string name = (string)elem.Attribute("Include");
        string itemPath = Path.Combine(folder, name);
        if (!File.Exists(itemPath))
        {
            Console.WriteLine(name);
            missing++;
        }
        else
        {
            present++;
        }
    }

    Console.WriteLine("{0} files present, {1} missing.", present, missing);

You could change "ClInclude" to "ClCompile" to look for missing C++ files (but this should already be a build error) or to "None" to find missing non-source files.

Posted by Bradley Grainger at 03:53 PM

30 October 2010

Coroutines with C# 5's await

C# 5's new "await" keyword is not just for orchestrating and controlling concurrency on multiple threads; it can also be used to introduce "exotic" control flow, such as coroutines (see also here), to a single-threaded C# program.

I was trying to find a really good sample coroutine problem to solve with "await", but all the ones I found given as standard examples (e.g., odd word problem, same fringe problem) could be easily (and I think more suitably) solved with C# 2.0 iterators (because the data only flows one way between the sub/coroutines needed to solve those problems).

The idea I finally came up with was of merging two ordered sequences of numbers. In C# 2.0, you could write a Merge method that creates an enumerator for each input sequence, tracks whether each sequence still has items, and compares the heads of each sequence to determine which is smaller. This is a lot of code, and the scaffolding obscures the core of the method, which is the comparison of the two items:

    void MergeWithEnumerators(IEnumerable<int> leftSequence, IEnumerable<int> rightSequence)
    {
        // enumerate the sequences
        using (IEnumerator<int> left = leftSequence.GetEnumerator())
        using (IEnumerator<int> right = rightSequence.GetEnumerator())
        {
            // start the enumeration
            bool leftHasItems = left.MoveNext();
            bool rightHasItems = right.MoveNext();

            // continue until out of items
            while (leftHasItems || rightHasItems)
            {
                // determine whether to output from left or right
                bool outputLeft;
                if (leftHasItems && rightHasItems)
                    outputLeft = left.Current < right.Current;
                else
                    outputLeft = leftHasItems;

                // output correct element
                if (outputLeft)
                {
                    Console.WriteLine(left.Current);
                    leftHasItems = left.MoveNext();
                }
                else
                {
                    Console.WriteLine(right.Current);
                    rightHasItems = right.MoveNext();
                }
            }
        }
    }

(I've used Console.WriteLine for simplicity; in the real world, you'd probably want this method to return IEnumerable and yield return the merged elements.)
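A sketch of that iterator-based variant follows. The class name `MergeIteratorSketch` is hypothetical; the comparison logic is the same as in MergeWithEnumerators, with the two-branch "which side" decision folded into one expression.

```csharp
using System.Collections.Generic;

static class MergeIteratorSketch
{
    // Lazily merges two ascending sequences into one ascending sequence.
    public static IEnumerable<int> Merge(IEnumerable<int> leftSequence, IEnumerable<int> rightSequence)
    {
        using (IEnumerator<int> left = leftSequence.GetEnumerator())
        using (IEnumerator<int> right = rightSequence.GetEnumerator())
        {
            bool leftHasItems = left.MoveNext();
            bool rightHasItems = right.MoveNext();

            while (leftHasItems || rightHasItems)
            {
                // take from the left if it is the only side with items, or has the smaller head
                bool outputLeft = leftHasItems && (!rightHasItems || left.Current < right.Current);

                if (outputLeft)
                {
                    yield return left.Current;
                    leftHasItems = left.MoveNext();
                }
                else
                {
                    yield return right.Current;
                    rightHasItems = right.MoveNext();
                }
            }
        }
    }
}
```

Because the method is an iterator, nothing is enumerated until the caller starts a foreach over the result, and the using blocks dispose both enumerators even if the caller stops early.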

In C# 5, the merge could be written with coroutines. Imagine two Merge methods running in an interleaved fashion (not simultaneously), each processing one input sequence. If the methods can know what the smallest current item in the other sequence is, then their implementation is trivial: for each of the values in their own sequence, compare it to the smallest value from the other sequence. If it's smaller, print it; if not, switch to the other method and let it process its sequence:

    // The Merge coroutine is run on each of the two input sequences.
    async void Merge(IEnumerable<int> seq, int? smallestInOther)
    {
        // process our entire sequence
        foreach (int value in seq)
        {
            // switch to the other if it has smaller values than we do
            if (value > smallestInOther)
                smallestInOther = await SwitchToOther(value);

            // our value is smaller, print it
            Console.WriteLine(value);
        }
    }

Of course, this code can't stand alone; it needs a scheduler that will start both the methods, and also an implementation of SwitchToOther, which will save this method's "resume" action, and invoke it later when the second method switches back to this one. There's a lot of code here, but much of it is just the boilerplate that's required to support the "await" keyword (GetAwaiter, BeginAwait, EndAwait).

    // continuations (queued up by "await SwitchToOther") that are ready to be resumed
    Queue<SwitchAwaiter> awaiters = new Queue<SwitchAwaiter>();

    // the smallest value in each input sequence; this needs to be passed to the other Merge method
    Queue<int> smallestValues = new Queue<int>();

    // Merges two sequences using coroutines.
    void Merge(IEnumerable<int> left, IEnumerable<int> right)
    {
        // start each of the methods running
        Merge(left, int.MinValue);
        Merge(right, smallestValues.Dequeue());

        // keep switching between them as requested
        while (awaiters.Count > 0)
        {
            SwitchAwaiter switchAwaiter = awaiters.Dequeue();

            // the last switch will have no smallest value (because the Merge method finished its
            // enumeration, and exited); we pass 'null' as a sentinel
            switchAwaiter.Resume(smallestValues.Count == 0 ? default(int?) : smallestValues.Dequeue());
        }
    }

    // Allows a method to switch to another method by 'awaiting' the returned value.
    SwitchAwaiter SwitchToOther(int smallest)
    {
        // this method is called by 'await', so we need to return a type that implements 'GetAwaiter'
        SwitchAwaiter switchAwaiter = new SwitchAwaiter();

        // queue this as the method to switch to in the future
        awaiters.Enqueue(switchAwaiter);
        smallestValues.Enqueue(smallest);

        return switchAwaiter;
    }

    class SwitchAwaiter
    {
        public SwitchAwaiter GetAwaiter()
        {
            // 'GetAwaiter' must return an object that implements 'BeginAwait' and 'EndAwait'
            // for simplicity, it can be this same object
            return this;
        }

        // 'BeginAwait' is called to "suspend" the method executing the 'await'.
        public bool BeginAwait(Action resume)
        {
            // save the action that will be used to resume the method
            m_resume = resume;

            // 'true' means that the awaiting method needs to return control to its caller and
            // that it will be resumed later on
            return true;
        }

        public int? EndAwait()
        {
            // once we resume the method, 'EndAwait' supplies a value to it (from the 'await' keyword)
            // in this case, we give it the smallest value from the other Merge method
            return m_smallest;
        }

        // 'Resume' returns control to the method that executed 'await SwitchToOther'. 'smallest' is the
        // smallest value in the sequence being enumerated by the other method.
        public void Resume(int? smallestInOther)
        {
            m_smallest = smallestInOther;
            m_resume();
        }

        Action m_resume;
        int? m_smallest;
    }
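The listing above targets the preview compiler's GetAwaiter/BeginAwait/EndAwait pattern. The released C# 5 compiler instead looks for an awaiter with IsCompleted, OnCompleted (via INotifyCompletion) and GetResult. As a sketch only, here is the same scheduler adapted to that shape; the class name `CoroutineMerger`, the `MergeOne` name and the choice to collect output in a list instead of printing are assumptions made for illustration. Like the original, it assumes the left sequence is non-empty.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

sealed class CoroutineMerger
{
    readonly Queue<SwitchAwaiter> awaiters = new Queue<SwitchAwaiter>();
    readonly Queue<int> smallestValues = new Queue<int>();
    readonly List<int> output = new List<int>();

    public static List<int> Merge(IEnumerable<int> left, IEnumerable<int> right)
    {
        CoroutineMerger merger = new CoroutineMerger();
        merger.Run(left, right);
        return merger.output;
    }

    void Run(IEnumerable<int> left, IEnumerable<int> right)
    {
        // start both coroutines; each runs until its first 'await'
        MergeOne(left, int.MinValue);
        MergeOne(right, smallestValues.Dequeue());

        // keep switching between them; 'null' tells the last one that its peer has finished
        while (awaiters.Count > 0)
        {
            SwitchAwaiter awaiter = awaiters.Dequeue();
            awaiter.Resume(smallestValues.Count == 0 ? default(int?) : smallestValues.Dequeue());
        }
    }

    async void MergeOne(IEnumerable<int> seq, int? smallestInOther)
    {
        foreach (int value in seq)
        {
            // a comparison with null is false, so once the peer is done we never switch again
            if (value > smallestInOther)
                smallestInOther = await SwitchToOther(value);
            output.Add(value);
        }
    }

    SwitchAwaiter SwitchToOther(int smallest)
    {
        SwitchAwaiter awaiter = new SwitchAwaiter();
        awaiters.Enqueue(awaiter);
        smallestValues.Enqueue(smallest);
        return awaiter;
    }

    sealed class SwitchAwaiter : INotifyCompletion
    {
        Action resume;
        int? smallest;

        public SwitchAwaiter GetAwaiter() { return this; }

        // always 'false', so the awaiting method suspends and returns to its caller
        public bool IsCompleted { get { return false; } }

        // replaces BeginAwait: stores the continuation that resumes the method
        public void OnCompleted(Action continuation) { resume = continuation; }

        // replaces EndAwait: supplies the result of the 'await' expression
        public int? GetResult() { return smallest; }

        public void Resume(int? smallestInOther)
        {
            smallest = smallestInOther;
            resume();
        }
    }
}
```

Everything runs on the calling thread: Resume invokes the stored continuation directly, so by the time Merge returns both coroutines have finished and the list is complete.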

In the end, was it worth it? Yes, the Merge method itself is shorter and concisely expresses its purpose, but there's a lot of supporting code required, which may or may not be reusable to solve different problems. This method is also not scalable: if we wanted to merge three (or more) sequences, it's clearly wrong to have each coroutine pick which of its peers it's going to switch to; in that case, we'd definitely want a single consumer that's picking the overall next smallest value (perhaps using a min heap for efficiency if there are lots of sequences being merged).

So while it was an interesting challenge to interleave the execution of two methods on the same thread using "await", it isn't the best solution to this particular problem. (Maybe there is a similar problem for which "await" really is the killer solution; if you know of one, I'd love to hear about it.)

That's not to say that "await" is useless; far from it! There are many other problems (that we've solved more or less well in the Logos code today) that could be solved far more elegantly and concisely with the "await" keyword; I'm definitely looking forward to using it.

P.S. As an exercise for the reader, change "Console.WriteLine(value);" (in the first Merge method) to "await Output(value);" and rewrite the second Merge method to "yield return" the merged values, so that the merge can happen lazily.

Posted by Bradley Grainger at 10:30 AM

04 October 2010

Overriding GetHashCode for value types

I’ve written before that value types should override Object.Equals for performance reasons. GetHashCode should be overridden at the same time (for symmetry), but I’ve just learnt another good reason to override GetHashCode: the default implementation has a bug when your struct contains a nested value type for which value equality is not the same as bitwise equality.

As per Hans Passant’s detailed comment, the default ValueType.GetHashCode implementation (on the Microsoft CLR) XORs the bits of the struct together if it contains no reference types and has no padding.

Unfortunately, if the struct contains a member that can have two different bit patterns that represent the same logical value, this will generate different hash codes for values that should be considered equal. (The default Equals implementation will call the Equals methods of the nested types, so it is not subject to this bug.) The most common example is decimal: 1m and 1.0m have different bit representations in memory, so the following program produces rather unexpected output:

    static void Main()
    {
        Test t1 = new Test { Value = 1m };
        Test t2 = new Test { Value = 1.0m };
        Console.WriteLine(t1.Equals(t2)); // true
        Console.WriteLine(t1.GetHashCode() == t2.GetHashCode()); // false
    }

    struct Test
    {
        public decimal Value;
    }

If you’re implementing a value type that will ever be used as a key in a hashtable, always override Object.GetHashCode.
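One possible shape for such an override is sketched below. The struct name `TestWithOverrides` is hypothetical, and delegating to the field's own Equals and GetHashCode is just one reasonable choice for a single-field struct.

```csharp
using System;

struct TestWithOverrides : IEquatable<TestWithOverrides>
{
    public decimal Value;

    // Strongly-typed equality avoids boxing and compares by value, so 1m equals 1.0m.
    public bool Equals(TestWithOverrides other)
    {
        return Value == other.Value;
    }

    public override bool Equals(object obj)
    {
        return obj is TestWithOverrides && Equals((TestWithOverrides)obj);
    }

    // Delegates to decimal.GetHashCode, which returns equal hash codes for equal values
    // regardless of their bit representation.
    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }
}
```

With these overrides, two instances holding 1m and 1.0m compare equal and hash identically, so either can be used to look up the other in a Dictionary or HashSet.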

Posted by Bradley Grainger at 03:36 PM