Showing posts with label Threading. Show all posts

Monday, June 23, 2008

On libcurl, OpenSSL, and thread-safety

The cURL project, with its libcurl, is a frequent choice for developers who need a feature-rich HTTP client. Indeed, libcurl is a good choice: it supports HTTP/1.1, several authentication mechanisms (including Kerberos / SPNEGO authentication), and HTTPS, to name a few important aspects. It can also be used in multi-threaded applications - well, it can, but you have to be aware of some fundamental facts in order to avoid random segmentation faults or aborts. Linux is assumed throughout this post.

First, to avoid pitfall #1, be sure to disable the use of signals of the library by adding the following line of code to the initialization phase of your application:
curl_easy_setopt(handle, CURLOPT_NOSIGNAL, 1L);

This option deactivates code that works around the fact that DNS lookups initiated via gethostbyname cannot be interrupted: there is no timeout facility. The libcurl developers, of course, knew about the one generic way to interrupt the current thread of execution and run code in that context - signals.

In order to interrupt gethostbyname, libcurl saves the current state of execution via setjmp and arms a timeout of n seconds by calling alarm, which results in signal delivery at that point in time. If gethostbyname does not return in time, the signal handler associated with SIGALRM, the delivered signal, is called. The handler restores the original state using longjmp to rewind execution and indicates a timeout via the return value of setjmp.
static jmp_buf state;

static void signalhandler(int sig)
{
    longjmp(state, 1);
}

static void init(void)
{
    signal(SIGALRM, signalhandler);
}

static void lookup(const char *host)
{
    if (setjmp(state) == 0) {
        alarm(30);               /* deliver SIGALRM in 30 seconds */
        gethostbyname(host);
        alarm(0);                /* lookup finished, cancel the alarm */
        /* snip */
    }
    else {
        /* timeout */
    }
}

In a single-threaded application, that works fine: alarm affects the whole process, and the main thread is identical to the process itself. In a multi-threaded application, however, the signal is delivered to the process and not to a particular thread, so the workaround can fail because an unrelated thread may end up handling SIGALRM. While this strategy may work in the case of LinuxThreads, it does not with NPTL, the state-of-the-art Linux thread library that implements POSIX semantics. The relevant part of the specification is as follows:

"There were two possible choices for alarm generation in multi-threaded applications: generation for the calling thread or generation for the process. The first option would not have been particularly useful since the alarm state is maintained on a per-process basis and the alarm that is established by the last invocation of alarm() is the only one that would be active.

Furthermore, allowing generation of an asynchronous signal for a thread would have introduced an exception to the overall signal model. This requires a compelling reason in order to be justified."


So, by setting the cURL option, you lose DNS timeouts but avoid the related segmentation faults in multi-threaded applications. If you don't plan to invest more time into fixing this issue, that's an acceptable trade-off.

Pitfall #2 is only indirectly related to libcurl, but it does cause random "crashes" if you intend to use libcurl for HTTPS requests in multi-threaded applications. The component of interest is OpenSSL, the primary backend for libcurl and HTTPS. By default, using libcurl and HTTPS from multiple threads can lead to crashes, even if you do not share libcurl-specific handles or, more generally, memory. The reason is OpenSSL, which is not thread-safe by default.

Rather than providing a thread-safe library out of the box, the OpenSSL team decided to leave this as an exercise for the (documentation) reader and / or user. OpenSSL provides callbacks through which an application supplies functions for serializing access to shared resources. Two sets of callbacks exist: one provides access to a static set of locks that can be locked and unlocked; the other allows for allocation, deallocation, and locking / unlocking of dynamic lock objects.
By default, these callbacks are not implemented. In other words, code using OpenSSL with threads can fail unless the developer has read the relevant parts of the documentation and implemented the required callbacks correctly for all supported platforms.
Hmmm. Don't get me wrong: I'm the first to vote for reading the documentation / specification before writing code, and I'm a big fan of generic and flexible interfaces, but, IMHO, the OpenSSL team took the easy path here. What I'd have done is implement locking for supported platforms right in the library, to cover all direct and indirect (e.g. libcurl) users.

So, should libcurl define these callbacks (it doesn't)? That's a tough one. The main issue is that these callbacks are process-global and thus concern every single library or module that makes use of OpenSSL. Depending on how these modules and OpenSSL are linked (dynamically or statically), there is no truly correct implementation strategy for these callbacks - modules could overwrite each other's callbacks and, as a result of an implementation mismatch at an arbitrary point in time, cause memory leaks, hangs, and segmentation faults.

Consider the following scenario: a process loads library A, which depends on libcurl. Consequently, library A implements the callbacks and assigns them at load time.
A library used by library A, THIRDPARTY, uses libpq, the official PostgreSQL driver. The driver also uses OpenSSL and initializes the callbacks in the function PQinitSSL.
Given the following sequence of events, this scenario can cause a hang, because the unlock implementation does not match the lock implementation:

1) Initialization of library A
2) Library A installs its OpenSSL locking callbacks
3) Library A receives libcurl requests from multiple threads, which result in repeated lock and unlock callback invocations
4) Thread 1 invokes the lock callback in OpenSSL
5) A new request handled by thread 2 results in the initialization of THIRDPARTY, which loads libpq and calls PQinitSSL. That initialization overrides the locking callbacks
6) Thread 1 invokes the unlock callback, which has no effect on the previously locked lock object because the implementation changed between lock and unlock. The result is an application hang or undefined behavior

Of course, there are other scenarios that can lead to problems, all of them caused by the global locking callbacks that must be implemented for thread-safe operation. My point is that whenever global resources and multiple libraries are involved, chances are that they cannot coexist.

I recommend one of the following two approaches to follow the OpenSSL contract while also minimizing collisions with other libraries or modules:

Solution A) Beware of other libraries in your locking callback implementation

Implement and install your callbacks knowing that other libraries might already have installed callbacks of their own. In particular, do not install callbacks if callbacks are already in place (and consistent), to avoid causing hangs or undefined behavior. Uninstall your callbacks on unload to avoid crashes on subsequent callback invocations, but only if they are the ones installed by your library - previously installed callbacks must not be affected. Likewise, callbacks installed by other modules after the initialization of your library should not be affected (it's better to leave them in place than to have no callbacks installed at all).

The following pseudo-code implements these recommendations, using OpenSSL's accessor CRYPTO_get_locking_callback to detect existing callbacks:
init()
{
    if (CRYPTO_get_locking_callback() == NULL) {
        install_callbacks();
    }
    else {
        /* Do not interfere with existing callbacks. */
    }
}

destroy()
{
    /* Uninstall callbacks to avoid segmentation faults after unload, */
    /* but only if they are the callbacks owned by this library. */
    if (CRYPTO_get_locking_callback() == library_locking_callback) {
        uninstall_callbacks();
    }
}

Solution B) Use a private OpenSSL library (less desirable)

If dependencies allow, link OpenSSL statically so that the callbacks are not shared and conflicts are avoided. The major drawback is that OpenSSL can no longer be updated independently of your implementation, which is critical when security updates must be applied. Libraries you use that depend on OpenSSL must be linked statically as well, which might not be an option in the case of proprietary libraries or libraries not available as an archive. If you go down that road, make sure not to export the OpenSSL symbols (GCC: compile with -fvisibility=hidden) to prevent other libraries from accessing your private (and possibly incompatible) copy. This approach has its issues, but sometimes there's no other way.

Whatever the reasoning against implementing the locking code directly in the library was, it unnecessarily complicates the task of writing stable multi-threaded code. Fortunately, if some thought goes into the callback implementation, the facility provided by OpenSSL is good enough to enable thread-safe operation of OpenSSL, libcurl, and other OpenSSL dependents.


With these two issues addressed, libcurl should integrate just fine into your multi-threaded code.

Monday, May 14, 2007

Nobody's in need of a re-entrant Kerberos 5, why?

IMHO, the Kerberos protocol is the solution for implementing Single Sign-On (SSO), at least in the context of corporate intranets. Kerberos is practically the only solution available for SSO in a heterogeneous computing environment comprised of different operating systems, including UNIX variants and Windows. The only time a user ever has to provide a password is during the initial logon on a workstation. The credentials obtained can be passed to specific services and, more importantly, Kerberos allows delegation, so that user credentials can be passed from the workstation via service A to service B, for example. Sure, several other SSO implementations exist but, to my knowledge, none of them can offer the level of transparency and platform coverage Kerberos can.

There's a catch, however: Kerberos 5 implementations available for Linux / UNIX are not re-entrant and thus cannot be used in multi-threaded applications without serializing all code paths that access the Kerberos API. The implementation I primarily refer to here is MIT Kerberos, the original one; Heimdal and friends shouldn't make any difference here, however, when it comes to a thread-safe implementation. Thinking about it, I haven't reviewed the Java GSS-API source yet, but my guess is that calls are serialized in the JNI part of the library to make sure that Java clients can safely use it across threads.

It might be hard to believe but, for once, Microsoft has managed to create something that is superior to its Linux / UNIX equivalents (to be fair, there are other exceptions as well): the Microsoft Windows Kerberos implementation is fully re-entrant. That doesn't come as a surprise to me, because Microsoft simply couldn't afford to provide an API that isn't - after all, Windows itself and practically all applications and services are inherently multi-threaded. In my opinion, that is exactly the reason why there's no re-entrant MIT Kerberos available on Linux / UNIX: apart from most GUI applications, common daemons are single-threaded and use a process model instead, often fork(2)-driven:
  • sendmail
  • pop3d / imapd variants
  • sshd
  • ntpd
  • Samba
  • PostgreSQL
  • Apache 1.3 and earlier
  • ...
So, in most cases, the lack of a re-entrant Kerberos implementation is not an issue at all, and the resulting lack of demand keeps the issue off the project's TODO list. More modern service implementations like Apache 2.0 and later (MPM = {worker|event}) and MySQL are fully multi-threaded and, once Kerberos support comes into play, would have a problem, or at least would need more time to implement Kerberos support (because of the locking business).

Granted, the number of installations dealing with both multi-threaded applications (a small subset already) and Kerberos (in most cases, it's only an issue for medium to large-scale installations) is not exactly substantial - not substantial enough to support refactoring the existing implementations towards re-entrance. This is a strong case of a minority being discriminated against - please spread the word to make a difference ;)

Seriously, in the Linux / UNIX world, there's currently no satisfactory solution available, because even those saying that serialization solves it all are mistaken. Consider a large-scale Kerberos-enabled application using Apache as an interface to application users. In each application module, Kerberos may be used, so calls must be synchronized. However, that's easier said than done, because the synchronization must include all modules - it must span the whole process to solve this issue. If modules originate from more than one project or vendor, you've lost already: process-wide synchronization would require an agreement across all modules on the means of synchronization. If all modules are controlled by a single project or vendor, and/or the source is available for those controlled by a different group, re-entrance can be established using a consistent locking protocol (e.g. by providing access to a single, process-wide mutex that is owned by one module all other modules have access to). Otherwise, there's practically no way to work around the lack of re-entrance (except maybe for preloading a library that wraps all Kerberos externals and serializes access, or rolling your own release based on an open-source Kerberos implementation).
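The consistent-locking-protocol idea can be sketched as follows: every module funnels its calls through a wrapper that takes a single process-wide mutex. In this sketch, non_reentrant_call merely stands in for a real MIT Kerberos function such as krb5_init_context; the function names are illustrative and not part of any Kerberos API:

```c
#include <pthread.h>

/* One process-wide lock that every module agrees to take before
   touching the non-reentrant library. */
static pthread_mutex_t api_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the library's unprotected global state. */
static int shared_counter;

/* Stand-in for a non-reentrant call, e.g. krb5_init_context. */
static void non_reentrant_call(void)
{
    shared_counter++;    /* unsynchronized access to global state */
}

/* The wrapper all modules must use instead of the raw call. */
void serialized_call(void)
{
    pthread_mutex_lock(&api_lock);
    non_reentrant_call();
    pthread_mutex_unlock(&api_lock);
}
```

In a real deployment, the mutex would live in one module and be exported to all others - which is exactly the agreement that is so hard to reach across independent vendors.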

You see, this dilemma is nerve-wracking because there's no satisfactory solution. Most likely, several more years will have to pass (in which more multi-threaded daemons can emerge and demand multi-threaded Kerberos implementations) before this problem is solved by the availability of a re-entrant implementation...

How do you cope with Kerberos and thread-safety on Linux / UNIX platforms?