From 88101146f2ec7d53ffb793e365f05097ffd35fd3 Mon Sep 17 00:00:00 2001
From: marha
Date: Mon, 18 Jul 2011 10:33:05 +0200
Subject: cvs version of pthreads

---
 pthreads/README.NONPORTABLE | 487 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 485 insertions(+), 2 deletions(-)

(limited to 'pthreads/README.NONPORTABLE')

diff --git a/pthreads/README.NONPORTABLE b/pthreads/README.NONPORTABLE
index e2d69a72b..0821104d8 100644
--- a/pthreads/README.NONPORTABLE
+++ b/pthreads/README.NONPORTABLE
@@ -15,6 +15,11 @@ pthread_win32_test_features_np(int mask)
         PTW32_SYSTEM_INTERLOCKED_COMPARE_EXCHANGE
                 Return TRUE if the native version of
                 InterlockedCompareExchange() is being used.
+                This feature is not meaningful in recent
+                library versions as MSVC builds only support
+                system-implemented ICE. Note that all MinGW
+                builds use inlined asm versions of all the
+                Interlocked routines.
         PTW32_ALERTABLE_ASYNC_CANCEL
                 Return TRUE is the QueueUserAPCEx package
                 QUSEREX.DLL is available and the AlertDrv.sys
@@ -75,11 +80,11 @@ pthread_getw32threadhandle_np(pthread_t thread);
 DWORD
 pthread_getw32threadid_np (pthread_t thread)
 
-    Returns the win32 thread ID that the POSIX
+    Returns the Windows native thread ID that the POSIX
     thread "thread" is running as.
 
     Only valid when the library is built where
-    ! defined (__MINGW32__) || defined (__MSVCRT__) || defined (__DMC__)
+    ! (defined(__MINGW64__) || defined(__MINGW32__)) || defined (__MSVCRT__) || defined (__DMC__)
     and otherwise returns 0.
 
 
@@ -298,3 +303,481 @@ Thread priority
     If it wishes, a Win32 application using pthreads-win32 can use the Win32
     defined priority macros THREAD_PRIORITY_IDLE through
     THREAD_PRIORITY_TIME_CRITICAL.
+
+
+The opacity of the pthread_t datatype
+-------------------------------------
+and possible solutions for portable null/compare/hash, etc
+----------------------------------------------------------
+
+Because pthread_t is an opaque datatype, an implementation is permitted to define
+pthread_t in any way it wishes. That includes defining some bits, if it is
+scalar, or members, if it is an aggregate, to store information that may be
+extra to the unique identifying value of the ID. As a result, pthread_t values
+may not be directly comparable.
+
+If you want your code to be portable you must adhere to the following constraints:
+
+1) Don't assume it is a scalar data type, e.g. an integer or pointer value. There
+are several other implementations where pthread_t is also a struct. See our FAQ
+Question 11 for our reasons for defining pthread_t as a struct.
+
+2) You must not compare pthread_t values using relational or equality operators.
+You must use the API function pthread_equal() to test for equality.
+
+3) Never attempt to reference individual members.
+
+
+The problem
+
+Certain applications would like to be able to access only the 'pure' pthread_t
+id values, primarily to use as keys into data structures to manage threads or
+thread-related data, but this is not possible in a maximally portable and
+standards-compliant way for current POSIX threads implementations.
+
+For implementations that define pthread_t as a scalar, programmers often employ
+direct relational and equality operators on pthread_t. This code will break when
+ported to an implementation that defines pthread_t as an aggregate type.
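+
+As a minimal sketch of that constraint (assuming only standard pthread.h and a
+hypothetical worker routine), the following contrasts the non-portable scalar
+comparison with the portable pthread_equal() test:
+
+#include <pthread.h>
+#include <stdio.h>
+
+static void * worker(void * arg)
+{
+  (void) arg;
+  return NULL;
+}
+
+int main(void)
+{
+  pthread_t t;
+  pthread_create(&t, NULL, worker, NULL);
+
+  /* Non-portable: compiles only where pthread_t happens to be scalar.
+   * if (t == pthread_self()) ...
+   */
+
+  /* Portable: pthread_equal() works for scalar and aggregate pthread_t. */
+  if (pthread_equal(t, pthread_self()))
+    printf("main is the worker thread (never true here)\n");
+
+  pthread_join(t, NULL);
+  return 0;
+}
+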
+For implementations that define pthread_t as an aggregate, e.g. a struct,
+programmers can use memcmp etc., but then face the prospect that the struct may
+include alignment padding bytes or bits as well as extra implementation-specific
+members that are not part of the unique identifying value.
+
+[While this is not currently the case for pthreads-win32, opacity also
+means that an implementation is free to change the definition, which should
+generally only require that applications be recompiled and relinked, not
+rewritten.]
+
+
+Doesn't the compiler take care of padding?
+
+The C89 and later standards only effectively guarantee element-by-element
+equivalence following an assignment or pass by value of a struct or union.
+Therefore undefined areas of any two otherwise equivalent pthread_t instances
+can still compare differently: attempting to compare two such pthread_t
+variables byte by byte, e.g. memcmp(&t1, &t2, sizeof(pthread_t)), may give an
+incorrect result. In practice I'm reasonably confident that compilers routinely
+also copy the padding bytes, mainly because assignment of unions would be far
+too complicated otherwise. But it just isn't guaranteed by the standard.
+
+Illustration:
+
+We have two thread IDs t1 and t2
+
+pthread_t t1, t2;
+
+In an application we create the threads and intend to store the thread IDs in an
+ordered data structure (linked list, tree, etc) so we need to be able to compare
+them in order to insert them initially and also to traverse.
+
+Suppose pthread_t contains undefined padding bits and our compiler copies our
+pthread_t [struct] element-by-element, then for the assignment:
+
+pthread_t temp = t1;
+
+temp and t1 will be equivalent and correct but a byte-for-byte comparison such as
+memcmp(&temp, &t1, sizeof(pthread_t)) == 0 may not return true as we expect because
+the undefined bits may not have the same values in the two variable instances.
+
+Similarly if passing by value under the same conditions.
+
+If, on the other hand, the undefined bits are at least constant through every
+assignment and pass-by-value then the byte-for-byte comparison
+memcmp(&temp, &t1, sizeof(pthread_t)) == 0 will always return the expected result.
+How can we force the behaviour we need?
+
+
+Solutions
+
+Adding new functions to the standard API or as non-portable extensions is
+the only reliable and portable way to provide the necessary operations.
+Remember also that POSIX is not tied to the C language. The most common
+functions that have been suggested are:
+
+pthread_null()
+pthread_compare()
+pthread_hash()
+
+A single more general-purpose function could also be defined as a
+basis for at least the last two of the above functions.
+
+First we need to list the freedoms and constraints with respect
+to pthread_t so that we can be sure our solution is compatible with the
+standard.
+
+What is known or may be deduced from the standard:
+1) pthread_t must be able to be passed by value, so it must be a single object.
+2) From (1) it must be copyable, so it cannot embed thread-state information,
+locks or other volatile objects required to manage the thread it associates with.
+3) pthread_t may carry additional information, e.g. for debugging or to manage
+itself.
+4) There is an implicit requirement that the size of pthread_t is determinable
+at compile-time and size-invariant, because the compiler must be able to copy the
+object (i.e. through assignment and pass-by-value). Such copies must be genuine
+duplicates, not merely a copy of a pointer to a common instance such as
+would be the case if pthread_t were defined as an array.
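+
+As an illustration of points (1)-(4), a conforming implementation could define
+pthread_t along the following lines (a hypothetical sketch only, not the
+pthreads-win32 definition):
+
+/* A conceivable aggregate pthread_t: a pointer to the library's internal
+ * thread bookkeeping plus a reuse counter that keeps the ID unique even
+ * when the internal block is recycled.
+ */
+typedef struct {
+  void * p;        /* points at implementation-private thread state */
+  unsigned int x;  /* sequence/reuse counter, part of the identity */
+                   /* on typical 64-bit systems 4 bytes of trailing
+                    * padding would follow this member */
+} pthread_t_sketch;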
+
+
+Suppose we define the following function:
+
+/* This function shall return its argument */
+pthread_t* pthread_normalize(pthread_t* thread);
+
+For scalar or aggregate pthread_t types this function would simply zero any bits
+within the pthread_t that don't uniquely identify the thread, including padding,
+such that client code can return consistent results from operations done on the
+result. If the additional bits are a pointer to an associated structure then
+this function would ensure that the memory used to store that associated
+structure does not leak. After normalization the following compare would be
+valid and repeatable:
+
+memcmp(pthread_normalize(&t1),pthread_normalize(&t2),sizeof(pthread_t))
+
+Note 1: such comparisons are intended merely to order and sort pthread_t values
+and allow them to index various data structures. They are not intended to reveal
+anything about the relationships between threads, like startup order.
+
+Note 2: the normalized pthread_t is also a valid pthread_t that uniquely
+identifies the same thread.
+
+Advantages:
+1) In most existing implementations this function would reduce to a no-op that
+emits no additional instructions, i.e. after in-lining or optimisation, or if
+defined as a macro:
+#define pthread_normalize(tptr) (tptr)
+
+2) This single function allows an application to portably derive
+application-level versions of any of the other required functions.
+
+3) It is a generic function that could enable unanticipated uses.
+
+Disadvantages:
+1) Less efficient than dedicated compare or hash functions for implementations
+that include significant extra non-id elements in pthread_t.
+
+2) Still need to be concerned about padding if copying a normalized pthread_t.
+See the later section on defining pthread_t to neutralise padding issues.
+
+Generally a pthread_t may need to be normalized every time it is used,
+which could have a significant impact. However, this is a design decision
+for the implementor in a competitive environment. An implementation is free
+to define a pthread_t in a way that minimises or eliminates padding or
+renders this function a no-op.
+
+Hazards:
+1) Pass-by-reference directly modifies 'thread' so the application must
+synchronise access or ensure that the pointer refers to a copy. The alternative
+of pass-by-value/return-by-value was considered but then this requires two copy
+operations, disadvantaging implementations where this function is not a no-op
+in terms of speed of execution. This function is intended to be used in high
+frequency situations and needs to be efficient, or at least not unnecessarily
+inefficient. The alternative also sits awkwardly with functions like memcmp.
+
+2) [Non-compliant] code that uses relational and equality operators on
+arithmetic or pointer style pthread_t types would need to be rewritten, but it
+should be rewritten anyway.
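+
+To make the proposal concrete, here is one possible shape such a function could
+take for a hypothetical two-member pthread_t (a pointer to internal state plus a
+reuse counter) whose only non-identifying bits are trailing padding. This is an
+implementation-side sketch under those assumptions, not part of any published API:
+
+#include <string.h>
+
+typedef struct {
+  void * p;        /* identifying: pointer to internal thread state */
+  unsigned int x;  /* identifying: reuse counter */
+} pthread_t_sketch;
+
+pthread_t_sketch * pthread_normalize_sketch(pthread_t_sketch * thread)
+{
+  /* Zero the whole object, then rewrite the identifying members so that
+   * any padding bytes end up in a known (zero) state.
+   */
+  pthread_t_sketch tmp = *thread;
+  memset(thread, 0, sizeof(*thread));
+  thread->p = tmp.p;
+  thread->x = tmp.x;
+  return thread;
+}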
+
+
+C implementation of null/compare/hash functions using pthread_normalize():
+
+/* In pthread.h */
+pthread_t* pthread_normalize(pthread_t* thread);
+
+/* In user code */
+
+/* User-level bitclear function - clear bits in loc corresponding to mask */
+void* bitclear(void* loc, void* mask, size_t count);
+
+typedef unsigned int hash_t;
+
+/* User-level hash function */
+hash_t hash(void* ptr, size_t count);
+
+/*
+ * User-level pthr_null function - modifies the origin thread handle.
+ * The concept of a null pthread_t is highly implementation dependent
+ * and this design may be far from the mark. For example, in an
+ * implementation "null" may mean setting a special value inside one
+ * element of pthread_t to mean "INVALID". However, if that value was zero and
+ * formed part of the id component then we may get away with this design.
+ */
+pthread_t* pthr_null(pthread_t* tp)
+{
+  /*
+   * This should have the same effect as memset(tp, 0, sizeof(pthread_t)).
+   * We're just showing that we can do it.
+   */
+  void* p = (void*) pthread_normalize(tp);
+  return (pthread_t*) bitclear(p, p, sizeof(pthread_t));
+}
+
+/*
+ * Safe user-level pthr_compare function - modifies temporary thread handle copies
+ */
+int pthr_compare_safe(pthread_t thread1, pthread_t thread2)
+{
+  return memcmp(pthread_normalize(&thread1), pthread_normalize(&thread2), sizeof(pthread_t));
+}
+
+/*
+ * Fast user-level pthr_compare function - modifies the origin thread handles
+ */
+int pthr_compare_fast(pthread_t* thread1, pthread_t* thread2)
+{
+  return memcmp(pthread_normalize(thread1), pthread_normalize(thread2), sizeof(pthread_t));
+}
+
+/*
+ * Safe user-level pthr_hash function - modifies a temporary thread handle copy
+ */
+hash_t pthr_hash_safe(pthread_t thread)
+{
+  return hash((void *) pthread_normalize(&thread), sizeof(pthread_t));
+}
+
+/*
+ * Fast user-level pthr_hash function - modifies the origin thread handle
+ */
+hash_t pthr_hash_fast(pthread_t* thread)
+{
+  return hash((void *) pthread_normalize(thread), sizeof(pthread_t));
+}
+
+/* User-level bitclear function - modifies the origin array */
+void* bitclear(void* loc, void* mask, size_t count)
+{
+  unsigned char* lp = (unsigned char*) loc;
+  unsigned char* mp = (unsigned char*) mask;
+  size_t i;
+
+  for (i = 0; i < count; i++) {
+    lp[i] &= (unsigned char) ~mp[i];
+  }
+  return loc;
+}
+
+/* Donald Knuth hash */
+hash_t hash(void* ptr, size_t count)
+{
+  unsigned char* p = (unsigned char*) ptr;
+  hash_t h = (hash_t) count;
+  size_t i;
+
+  for (i = 0; i < count; i++)
+  {
+    h = ((h << 5) ^ (h >> 27)) ^ p[i];
+  }
+  return h;
+}
+
+/* Example of advantage point (3) - split a thread handle into its id and non-id values */
+pthread_t id = thread, non_id = thread;
+bitclear((void*) &non_id, (void*) pthread_normalize(&id), sizeof(pthread_t));
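+
+As a usage sketch (assuming the declarations above are visible and that the
+thread handles are valid), the derived functions might be exercised like this:
+
+void example_usage(pthread_t a, pthread_t b)
+{
+  /* Impose an arbitrary but repeatable ordering, e.g. for a sorted list. */
+  if (pthr_compare_safe(a, b) < 0)
+  {
+    /* 'a' sorts before 'b' */
+  }
+
+  /* Derive a bucket index for a hypothetical hash table of 64 buckets. */
+  {
+    hash_t bucket = pthr_hash_safe(a) % 64;
+    (void) bucket;
+  }
+}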
+
+
+A pthread_t type change proposal to neutralise the effects of padding
+
+Even if pthread_normalize() is available, padding is still a problem because
+the standard only guarantees element-by-element equivalence through
+copy operations (assignment and pass-by-value). So padding bit values can
+still change randomly after calls to pthread_normalize().
+
+[I suspect that most compilers take the easy path and always byte-copy anyway,
+partly because element-by-element copying becomes too complex (e.g. for unions
+that contain sub-aggregates) but also because programmers can easily design their
+aggregates to minimise and often eliminate padding.]
+
+How can we eliminate the problem of padding bytes in structs? Could
+defining pthread_t as a union rather than a struct provide a solution?
+
+In fact, the Linux pthread.h defines most of its pthread_*_t objects (but not
+pthread_t itself) as unions, possibly for this and/or other reasons. We'll
+borrow some element naming from there but the ideas themselves are well known
+- the __align element used to force alignment of the union comes from K&R's
+storage allocator example.
+
+/* Essentially our current pthread_t renamed */
+typedef struct {
+  struct thread_state_t * __p;
+  long __x; /* sequence counter */
+} thread_id_t;
+
+Ensuring that the last element in the above struct is a long ensures that the
+overall struct size is a multiple of sizeof(long), so there should be no trailing
+padding in this struct or the union we define below, provided no member requires
+stricter alignment than a long (on LLP64 systems such as 64-bit Windows, where
+pointers are wider than long, a wider trailing member would be needed).
+(Later we'll see that we can handle internal but not trailing padding.)
+
+/* New pthread_t */
+typedef union {
+  char __size[sizeof(thread_id_t)]; /* array as the first element */
+  thread_id_t __tid;
+  long __align;                     /* Ensure that the union starts on a long boundary */
+} pthread_t;
+
+This guarantees that, during an assignment or pass-by-value, the compiler copies
+every byte in our thread_id_t because the compiler guarantees that the __size
+array, which we have ensured is the equal-largest element in the union, retains
+equivalence.
+
+This means that pthread_t values stored, assigned and passed by value will at least
+carry the value of any undefined padding bytes along and therefore ensure that
+those values remain consistent. Our comparisons will return consistent results and
+our hashes of [zero initialised] pthread_t values will also return consistent
+results.
+
+We have also removed the need for a pthread_null() function; we can initialise
+at declaration time or easily create our own const pthread_t to use in assignments
+later:
+
+const pthread_t null_tid = {0}; /* braces are required */
+
+pthread_t t;
+...
+t = null_tid;
+
+
+Note that we don't have to explicitly make use of the __size array at all. It's
+there just to force the compiler behaviour we want.
+
+
+Partial solutions without a pthread_normalize function
+
+
+An application-level pthread_null and pthread_compare proposal
+(and pthread_hash proposal by extension)
+
+In order to deal with the problem of scalar/aggregate pthread_t type disparity in
+portable code I suggest using an old-fashioned union, as in the two examples below.
+
+Constraints:
+- there is no padding, or padding values are preserved through assignment and
+  pass-by-value (see above);
+- there are no extra non-id values in the pthread_t.
+
+
+Example 1: A null initialiser for pthread_t variables...
+
+typedef union {
+  unsigned char b[sizeof(pthread_t)];
+  pthread_t t;
+} init_t;
+
+const init_t initial = {0};
+
+pthread_t tid = initial.t; /* init tid to all zeroes */
+
+
+Example 2: A comparison function for pthread_t values
+
+typedef union {
+  unsigned char b[sizeof(pthread_t)];
+  pthread_t t;
+} pthcmp_t;
+
+int pthcmp(pthread_t left, pthread_t right)
+{
+  /*
+   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
+   * ordering on them.
+   * I.e. given the same set of pthread_t handles the ordering should be the same
+   * each time but the order has no particular meaning other than that. E.g.
+   * the ordering does not imply the thread start sequence, or any other
+   * relationship between threads.
+   *
+   * Return values are:
+   *  1 : left is greater than right
+   *  0 : left is equal to right
+   * -1 : left is less than right
+   */
+  int i;
+  pthcmp_t L, R;
+  L.t = left;
+  R.t = right;
+  for (i = 0; i < sizeof(pthread_t); i++)
+  {
+    if (L.b[i] > R.b[i])
+      return 1;
+    else if (L.b[i] < R.b[i])
+      return -1;
+  }
+  return 0;
+}
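+
+As a usage sketch (assuming the pthcmp() above is visible), sorting an array of
+thread handles only needs a small adapter to match qsort()'s comparator signature:
+
+#include <stdlib.h>
+
+static int pthcmp_qsort(const void * lv, const void * rv)
+{
+  /* Adapt pthcmp() to the (const void *, const void *) form qsort() expects. */
+  return pthcmp(*(const pthread_t *) lv, *(const pthread_t *) rv);
+}
+
+void sort_handles(pthread_t * handles, size_t count)
+{
+  qsort(handles, count, sizeof(pthread_t), pthcmp_qsort);
+}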
+
+It has been pointed out that the C99 standard allows for the possibility that
+integer types also may include padding bits, which could invalidate the above
+method. This addition to C99 was specifically included after it was pointed
+out that there was one, presumably not particularly well known, architecture
+that included a padding bit in its 32-bit integer type. See section 6.2.6.2
+of both the standard and the rationale, specifically the paragraph starting at
+line 16 on page 43 of the rationale.
+
+
+An aside
+
+Certain compilers, e.g. gcc and one of the IBM compilers, include a feature
+extension: provided the union contains a member of the same type as the
+object then the object may be cast to the union itself.
+
+We could use this feature to speed up the pthcmp() function from Example 2
+above by casting rather than assigning the pthread_t arguments to the union, e.g.:
+
+int pthcmp(pthread_t left, pthread_t right)
+{
+  /*
+   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
+   * ordering on them.
+   * I.e. given the same set of pthread_t handles the ordering should be the same
+   * each time but the order has no particular meaning other than that. E.g.
+   * the ordering does not imply the thread start sequence, or any other
+   * relationship between threads.
+   *
+   * Return values are:
+   *  1 : left is greater than right
+   *  0 : left is equal to right
+   * -1 : left is less than right
+   */
+  int i;
+  for (i = 0; i < sizeof(pthread_t); i++)
+  {
+    if (((pthcmp_t)left).b[i] > ((pthcmp_t)right).b[i])
+      return 1;
+    else if (((pthcmp_t)left).b[i] < ((pthcmp_t)right).b[i])
+      return -1;
+  }
+  return 0;
+}
+
+
+Result thus far
+
+We can't remove undefined bits if they are there in pthread_t already, but we have
+attempted to render them inert for comparison and hashing functions by making them
+consistent through assignment, copy and pass-by-value.
+
+Note: Hashing pthread_t values requires that all pthread_t variables be initialised
+to the same value (usually all zeros) before being assigned a proper thread ID, i.e.
+to ensure that any padding bits are zero, or at least the same value for all
+pthread_t. Since all pthread_t values are generated by the library in the first
+instance this need not be an application-level operation.
+
+
+Conclusion
+
+I've attempted to resolve the multiple issues of type opacity and the possible
+presence of undefined bits and bytes in pthread_t values, which prevent
+applications from comparing or hashing pthread handles.
+
+Two complementary partial solutions have been proposed: an application-level
+scheme to handle both scalar and aggregate pthread_t types equally, plus a
+definition of pthread_t itself that coerces copy semantics out of the compiler
+to eliminate variations in the values of padding bits and bytes.
+
+I have not provided any solution to the problem of handling extra values embedded
+in pthread_t, e.g. debugging or trap information that an implementation is entitled
+to include, so none of this replaces the portability and flexibility of API
+functions. But what functions are needed? The threads standard is unlikely to
+include functions that can be implemented by a combination of existing features
+and more generic functions (several references in the threads rationale suggest
+this). Therefore I propose that the following function could replace the several
+functions that have been suggested in conversations:
+
+pthread_t * pthread_normalize(pthread_t * handle);
+
+For most existing pthreads implementations this function, or macro, would reduce to
+a no-op with zero call overhead.
--
cgit v1.2.3