Mixed-Mode Library Assembly Bug and Managed C++


In the May edition of Windows Developer, John Dorsey talked about a recent bug discovered in the Microsoft C++ compiler ("From the Editor," WDN, May 2003). This month, I will take a deeper look at this issue and explain some of the background.

The mixed-mode assembly bug occurs when managed code runs in a DLL entrypoint (typically, DllMain). The entrypoint is intended only for simple initialization, and the MSDN documentation lists the functions that are considered unsafe to call from it. Even the most trivial .NET initialization code is dangerous there, because using the framework is likely to entail one of these unsafe actions. The bug typically manifests itself as the DLL routine hanging; if you are lucky, the operating system will detect the hang and shut down the errant process.

This bug shows itself in specific, nontypical situations, but it is a potentially serious bug. The Visual C++ team should be credited for bringing the bug to light and for providing a workaround to prevent the problem from occurring in most code. It is important to note that the remedial work carried out by the Visual C++ team did not appear in the betas for Visual C++.NET 2003, but does appear in the released version. Consequently, texts written about Visual C++.NET 2003 during the Visual Studio beta cycle (including my own book from Microsoft Press) do not mention the issue. As you'll see later, if you do not take steps to prevent the bug, the linker will issue a warning but the build will still complete. I welcome the chance to explain what causes this warning and to describe what to do about the problem.

The Mixed-Mode Compiler Bug

The mixed-mode loader bug appears only in code written in Managed C++. This is a direct consequence of the powerful nature of the language. To extend the familiar maxim: Managed C++ gives you a fully loaded magnum, whereas the other .NET languages merely hand you a half empty water pistol; both will allow you to shoot yourself in the foot, but only Managed C++ allows you to do serious damage.

One of the great features of Managed C++ is that it allows you to mix managed and unmanaged code in the same source file using a technology called It Just Works! (IJW, so named because it just does). IJW is applied automatically when you use unmanaged code in your Managed C++ source; you do not have to instruct the compiler to apply it. When the compiler sees that your code will access unmanaged code, it adds the thunks needed to perform the transitions between the managed and unmanaged worlds and it embeds appropriate native code in your assembly. IJW is a wonderful facility because it means that you can continue to use the C++ static libraries, template libraries, and header files that you have developed previously (and more importantly, code that you have gone through the expense of testing), as well as DLLs and COM objects developed in C++ or other languages.

When you access unmanaged code through IJW, your assembly becomes mixed mode, so-called because the assembly contains both x86 code and Microsoft Intermediate Language (MSIL). Assemblies created with other .NET languages that use unmanaged code are not mixed mode because they use Platform Invoke or COM Interop to access the unmanaged code, and these technologies do not embed unmanaged code within the assembly.

Converting Mixed-Mode Assemblies to Pure Mode

A pure-mode assembly is one that contains only MSIL. You can generate such an assembly with Managed C++ if you decide not to use IJW. However, it is not as simple as replacing IJW calls to static import libraries with calls to Platform Invoke, because the compiler does extra work that you have to undo. This extra work is not immediately obvious because the compiler is trying to satisfy several conflicting goals.

Mixed-mode library assemblies must have a DllMain entrypoint to perform initialization for the unmanaged code. Typically, such code is required to initialize the C runtime library (CRT) so that you can use the CRT functions or global objects (the CRT is used to call constructors on global objects). Such actions are typical for native C++ projects, so the compiler automatically assumes that you will need the CRT (there is another good reason for assuming this, as you'll see in a moment). To do this, the compiler adds an unmanaged function called _DllMainCRTStartup and indicates that this is an entrypoint for the DLL. This function does the appropriate initialization and then calls your DllMain, if it exists. Thus, even if you do not provide a DllMain, you'll still get an unmanaged entrypoint.

If you provide a DllMain for your library assembly and compile it as managed code (in other words, you do not place #pragma unmanaged before the function and #pragma managed after it), you are putting your library in a dangerous position. The code will be compiled to MSIL, which means that the .NET runtime must be running before your DllMain can be called. If your DLL client is unmanaged, then the runtime will have to be started. If the assembly is loaded on a Windows XP system, the OS will identify that the DLL it is loading is an assembly and will start the .NET runtime automatically. On other operating systems, the address defined as the entrypoint for the DLL will be called (you can see this address by passing the name of the DLL to dumpbin with the /headers switch). This entrypoint simply calls the _CorDllMain function exported by mscoree.dll, which initializes the .NET runtime and then calls the code that is defined as the .NET entrypoint in the DLL (typically _DllMainCRTStartup). In both cases, .NET code will be executed in the context of the initialization of the DLL, a situation that is potentially dangerous.

Recognizing that mixed-mode assemblies are a potential problem, the Visual C++ team added an extra check when the code is built. If the assembly is mixed mode and you do not take the steps to remove the entrypoint, the linker will issue warning LNK4243:

LINK : warning LNK4243: DLL containing objects compiled with /clr is not linked with /NOENTRY; image may not run correctly

If you do not use unmanaged code in your library assembly, then you do not need an entrypoint, so the solution is simply to tell the linker not to produce one. To do this, you use the /noentry linker switch. However, if you do this with a Managed C++ DLL project, you'll get a linker error LNK2019 complaining that there is an unresolved external symbol _main. Seasoned ATL developers will recognize this error: It occurs when ATL code uses the CRT but is compiled with _ATL_MIN_CRT. This ATL symbol indicates to the compiler that the CRT is not used; hence, the linker complains about the inconsistency. The solution with ATL is to remove the symbol or to remove the CRT code. Since your library assembly has no CRT code, removing it from your library seems impossible. Or is it?

Application Domains and Unmanaged Calls

To understand this error, you have to delve into another issue of the .NET runtime highlighted by unmanaged calls through Managed C++. .NET code runs in an application domain in a process, and a process can have multiple application domains. With the first version of the runtime, a bug occurred when a call was made to unmanaged code. The problem was that the runtime failed to make note of the application domain where the call occurred. Consequently, when the call returned to the managed world, the runtime did not know which application domain to return to. Most processes did not show this problem because most .NET applications have only a single application domain. However, ASP.NET and Internet Explorer (when it hosts .NET code) both create application domains to isolate code.

To get around this problem, the first version of the runtime always returned to the first application domain created in the process. Version 1.1 of the runtime addressed this issue by storing an identifier of the source application domain. However, since this behavior differed from Version 1.0, the C++ team offered a command-line switch (/clr:initialAppDomain) to get the old behavior. In addition, the compiler could not guarantee that a user would not attempt to run a Version 1.1 library on Version 1.0 of the runtime. If this happened, it could cause severe problems because the MSIL to store the application domain ID simply does not exist in the old version of the runtime. To prevent this, the compiler automatically adds a call to a function called _check_commonlanguageruntime_version, which is implemented by the CRT (more importantly, it is unmanaged code) and is called when the DLL is loaded.
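The difference between the two runtime behaviors can be shown with a toy model. The sketch below is plain C++ and does not touch the .NET runtime: ToyRuntime, RuntimeVersion, callUnmanagedAndReturn, and demoAppDomainReturn are hypothetical names invented for illustration, and it assumes that application domains can be represented by small integer IDs.

```cpp
#include <cassert>

// Toy model only: none of these names belong to the .NET runtime or the CRT.
enum class RuntimeVersion { V1_0, V1_1 };

struct ToyRuntime {
    RuntimeVersion version;
    int firstDomain;    // ID of the first application domain created in the process
    int currentDomain;  // ID of the domain the calling code is running in

    // Models a managed-to-unmanaged call followed by the return transition,
    // and reports which application domain execution resumes in.
    int callUnmanagedAndReturn() {
        const int sourceDomain = currentDomain;  // only Version 1.1 keeps this
        // ... the unmanaged code would run here ...
        currentDomain = (version == RuntimeVersion::V1_0)
                            ? firstDomain    // 1.0: always the first domain
                            : sourceDomain;  // 1.1: the domain that made the call
        return currentDomain;
    }
};

// A call made from domain 2 in a process whose first domain is 0.
inline void demoAppDomainReturn() {
    ToyRuntime v10{RuntimeVersion::V1_0, 0, 2};
    ToyRuntime v11{RuntimeVersion::V1_1, 0, 2};
    assert(v10.callUnmanagedAndReturn() == 0);  // resumes in the wrong domain
    assert(v11.callUnmanagedAndReturn() == 2);  // resumes where it started
}
```

When a process has only one application domain, the two branches give the same answer, which is why most applications never showed the problem.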

If you know that your code will run on Version 1.0 of the runtime, or if you want to get the Version 1.0 behavior, you can use /clr:initialAppDomain and the runtime check will not be made; hence, /noentry will link without an unresolved symbol error. If you use /clr to get the Version 1.1 behavior, then you'll have to take extra steps to remove the dependence on the unmanaged code that checks the version of the runtime. Microsoft provides a stub version of _check_commonlanguageruntime_version in the object file nochkclr.obj. However, although this is an empty version of the function, it is still unmanaged code, and so you will get a mixed-mode library assembly. The solution here is to remove any references to nochkclr.obj in your build tool (Visual Studio.NET 2003 will add it to the Additional Dependencies of the Linker Input Property page) and to provide your own, nonnative, empty version of the runtime checker function:

// global, managed function
void _cdecl _check_commonlanguageruntime_version()
{}

Mixed-Mode Library Assemblies

If you decide that you need to generate a mixed-mode library assembly (for example, you want to call a static native library or template library), then you will have to take steps to reduce your exposure to the DLL loader bug. To do this, you need to use /noentry to remove the entrypoint, and remove nochkclr.obj from the linker input so that you get the runtime version checking. Removing the entrypoint means that you will have to use the /include switch to force the linker to keep the __DllMainCRTStartup@12 symbol in its symbol table. To summarize, the switches that you will need are:

cl command line: /clr /LD

linker command line: /noentry /include:__DllMainCRTStartup@12

However, there is still more work to do. The CRT is still not initialized and it is your responsibility to do this explicitly, in the context of your DLL. Microsoft has provided a header (_vcclrit.h) that has initialization and termination routines that you can use to call your DLL initialization code (_DllMainCRTStartup) in a threadsafe manner. These routines will be compiled as managed code, but they will always be called after the DLL has loaded.

The functions are __crt_dll_initialize(), which calls _DllMainCRTStartup with DLL_PROCESS_ATTACH, and __crt_dll_terminate(), which calls _DllMainCRTStartup with DLL_PROCESS_DETACH. How you give access to these routines depends on the DLL's client. If the process calling the DLL is managed, then the DLL should export a class with public static methods that call these routines. If the process is unmanaged, your DLL should export functions with __declspec(dllexport) that call these routines (remember that Managed C++ global functions are compiled as MSIL).

In both cases, the initialization function should be called before any call is made to other classes in the DLL, and the terminate function should be called after the last call to the DLL. This way, the initialization code in _DllMainCRTStartup (and any code that you have provided in a DllMain) will be called after the DLL has been loaded and initialized (hence, the loader deadlock issue will not occur) and after the .NET runtime has been initialized.
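The calling contract described above can be sketched in portable C++. This is a model, not the contents of _vcclrit.h: EnsureInit, ForceTerm, demoExplicitInit, and the counters are hypothetical names, and the counters stand in for the work that _DllMainCRTStartup does on DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH. It also assumes that repeated calls should be harmless, which the text above does not state.

```cpp
#include <cassert>
#include <mutex>

namespace sketch {

std::mutex g_lock;           // serializes initialization and termination
bool g_initialized = false;  // true between EnsureInit and ForceTerm
int g_attachRuns = 0;        // times the simulated DLL_PROCESS_ATTACH work ran
int g_detachRuns = 0;        // times the simulated DLL_PROCESS_DETACH work ran

// Call before anything else in the library is used.
inline void EnsureInit() {
    std::lock_guard<std::mutex> guard(g_lock);
    if (!g_initialized) {
        ++g_attachRuns;      // a real mixed-mode DLL would call __crt_dll_initialize() here
        g_initialized = true;
    }
}

// Call after the last use of the library.
inline void ForceTerm() {
    std::lock_guard<std::mutex> guard(g_lock);
    if (g_initialized) {
        ++g_detachRuns;      // a real mixed-mode DLL would call __crt_dll_terminate() here
        g_initialized = false;
    }
}

}  // namespace sketch

// The client brackets all use of the library with the two calls.
inline void demoExplicitInit() {
    sketch::EnsureInit();
    sketch::EnsureInit();    // a repeated call does no extra work in this model
    assert(sketch::g_attachRuns == 1);
    // ... calls into the rest of the library would go here ...
    sketch::ForceTerm();
    sketch::ForceTerm();
    assert(sketch::g_detachRuns == 1);
}
```

In a real mixed-mode DLL, a managed client would reach the two routines through public static methods on an exported class, and an unmanaged client through functions marked __declspec(dllexport).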

The mixed-mode library assembly bug will not be eliminated until the next version of the .NET runtime. However, I am sure that you will agree that digging deeper into the causes of the bug gives fascinating insight into how DLLs are loaded and initialized, and into the inner workings of the .NET runtime.

References

A description of the mixed-mode loader bug: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vcconMixedDLLLoadingProblem.asp.

Instructions for creating mixed-mode library assemblies without an entrypoint: http://msdn.microsoft.com/library/en-us/vcmex/html/vcconConvertingManagedExtensionsForCProjectsFromPureIntermediateLanguageToMixedMode.asp.

A description of the new /clr:initialAppDomain switch: http://www.windevnet.com/documents/win1039544152348/.


Richard Grimes is an author and speaker on .NET. His latest book, Programming with Managed Extensions for Microsoft Visual C++ .NET, updated for Visual C++.NET 2003, is available now from Microsoft Press. He can be contacted at [email protected].

