Diffstat (limited to 'pthreads/README.NONPORTABLE')
-rw-r--r--  pthreads/README.NONPORTABLE  487
1 file changed, 485 insertions(+), 2 deletions(-)
diff --git a/pthreads/README.NONPORTABLE b/pthreads/README.NONPORTABLE
index e2d69a72b..0821104d8 100644
--- a/pthreads/README.NONPORTABLE
+++ b/pthreads/README.NONPORTABLE
@@ -15,6 +15,11 @@ pthread_win32_test_features_np(int mask)
PTW32_SYSTEM_INTERLOCKED_COMPARE_EXCHANGE
Return TRUE if the native version of
InterlockedCompareExchange() is being used.
+ This feature is not meaningful in recent
+ library versions as MSVC builds only support
+ system implemented ICE. Note that all Mingw
+ builds use inlined asm versions of all the
+ Interlocked routines.
PTW32_ALERTABLE_ASYNC_CANCEL
Return TRUE if the QueueUserAPCEx package
QUSEREX.DLL is available and the AlertDrv.sys
@@ -75,11 +80,11 @@ pthread_getw32threadhandle_np(pthread_t thread);
DWORD
pthread_getw32threadid_np (pthread_t thread)
- Returns the win32 thread ID that the POSIX
+ Returns the Windows native thread ID that the POSIX
thread "thread" is running as.
Only valid when the library is built where
- ! defined (__MINGW32__) || defined (__MSVCRT__) || defined (__DMC__)
+ ! (defined(__MINGW64__) || defined(__MINGW32__)) || defined (__MSVCRT__) || defined (__DMC__)
and otherwise returns 0.
@@ -298,3 +303,481 @@ Thread priority
If it wishes, a Win32 application using pthreads-win32 can use the Win32
defined priority macros THREAD_PRIORITY_IDLE through
THREAD_PRIORITY_TIME_CRITICAL.
+
+
+The opacity of the pthread_t datatype
+-------------------------------------
+and possible solutions for portable null/compare/hash, etc
+----------------------------------------------------------
+
+Because pthread_t is an opaque datatype, an implementation is permitted to define
+pthread_t in any way it wishes. That includes defining some bits, if it is
+scalar, or members, if it is an aggregate, to store information that may be
+extra to the unique identifying value of the ID. As a result, pthread_t values
+may not be directly comparable.
+
+If you want your code to be portable you must adhere to the following constraints:
+
+1) Don't assume it is a scalar data type, e.g. an integer or pointer value. There
+are several other implementations where pthread_t is also a struct. See our FAQ
+Question 11 for our reasons for defining pthread_t as a struct.
+
+2) You must not compare them using relational or equality operators. You must use
+the API function pthread_equal() to test for equality (see the sketch below).
+
+3) Never attempt to reference individual members.
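+
+As a minimal sketch of constraint (2) above, the following test works no matter
+how pthread_t is defined. The direct '==' comparison mentioned in the comment
+compiles only where pthread_t happens to be scalar, and is exactly the kind of
+code that breaks when ported:
+
+#include <pthread.h>
+
+int same_thread(pthread_t a, pthread_t b)
+{
+  /* Portable: pthread_equal() returns non-zero if a and b refer to the
+   * same thread, zero otherwise. */
+  return pthread_equal(a, b);
+  /* Non-portable: "return a == b;" will not even compile where pthread_t
+   * is an aggregate, as in pthreads-win32. */
+}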
+
+
+The problem
+
+Certain applications would like to be able to access only the 'pure' pthread_t
+id values, primarily to use as keys into data structures to manage threads or
+thread-related data, but this is not possible in a maximally portable and
+standards compliant way for current POSIX threads implementations.
+
+For implementations that define pthread_t as a scalar, programmers often employ
+direct relational and equality operators on pthread_t. This code will break when
+ported to an implementation that defines pthread_t as an aggregate type.
+
+For implementations that define pthread_t as an aggregate, e.g. a struct,
+programmers can use memcmp etc., but then face the prospect that the struct may
+include alignment padding bytes or bits as well as extra implementation-specific
+members that are not part of the unique identifying value.
+
+[While this is not currently the case for pthreads-win32, opacity also
+means that an implementation is free to change the definition, which should
+generally only require that applications be recompiled and relinked, not
+rewritten.]
+
+
+Doesn't the compiler take care of padding?
+
+The C89 and later standards only effectively guarantee element-by-element
+equivalence following an assignment or pass by value of a struct or union,
+therefore undefined areas of any two otherwise equivalent pthread_t instances
+can still compare differently: attempting to compare two such pthread_t
+variables byte-by-byte, e.g. memcmp(&t1, &t2, sizeof(pthread_t)), may give an
+incorrect result. In practice I'm reasonably confident that compilers routinely
+also copy the padding bytes, mainly because assignment of unions would be far
+too complicated otherwise. But it just isn't guaranteed by the standard.
+
+Illustration:
+
+We have two thread IDs t1 and t2
+
+pthread_t t1, t2;
+
+In an application we create the threads and intend to store the thread IDs in an
+ordered data structure (linked list, tree, etc) so we need to be able to compare
+them in order to insert them initially and also to traverse.
+
+Suppose pthread_t contains undefined padding bits and our compiler copies our
+pthread_t [struct] element-by-element, then for the assignment:
+
+pthread_t temp = t1;
+
+temp and t1 will be equivalent and correct but a byte-for-byte comparison such as
+memcmp(&temp, &t1, sizeof(pthread_t)) == 0 may not return true as we expect because
+the undefined bits may not have the same values in the two variable instances.
+
+Similarly if passing by value under the same conditions.
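+
+To make the illustration concrete, here is a purely hypothetical pthread_t
+layout (it is not the pthreads-win32 definition). On most targets the compiler
+inserts padding after the short member to keep the pointer naturally aligned;
+those padding bytes are exactly what memcmp() sees but what assignment is not
+required to copy:
+
+/* Hypothetical layout, for illustration only */
+typedef struct {
+  unsigned short reuse;      /* part of the unique id                    */
+  /* [padding bytes are typically inserted here by the compiler]         */
+  struct thread_block * tb;  /* part of the unique id                    */
+} hypothetical_pthread_t;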
+
+If, on the other hand, the undefined bits are at least constant through every
+assignment and pass-by-value then the byte-for-byte comparison
+memcmp(&temp, &t1, sizeof(pthread_t)) == 0 will always return the expected result.
+How can we force the behaviour we need?
+
+
+Solutions
+
+Adding new functions to the standard API or as non-portable extensions is
+the only reliable and portable way to provide the necessary operations.
+Remember also that POSIX is not tied to the C language. The most common
+functions that have been suggested are:
+
+pthread_null()
+pthread_compare()
+pthread_hash()
+
+A single more general purpose function could also be defined as a
+basis for at least the last two of the above functions.
+
+First we need to list the freedoms and constraints with respect
+to pthread_t so that we can be sure our solution is compatible with the
+standard.
+
+What is known or may be deduced from the standard:
+1) pthread_t must be able to be passed by value, so it must be a single object.
+2) from (1) it must be copyable so cannot embed thread-state information, locks
+or other volatile objects required to manage the thread it associates with.
+3) pthread_t may carry additional information, e.g. for debugging or to manage
+itself.
+4) there is an implicit requirement that the size of pthread_t is determinable
+at compile-time and size-invariant, because the compiler must be able to copy
+the object (i.e. through assignment and pass-by-value). Such copies must be genuine
+duplicates, not merely a copy of a pointer to a common instance such as
+would be the case if pthread_t were defined as an array.
+
+
+Suppose we define the following function:
+
+/* This function shall return its argument */
+pthread_t* pthread_normalize(pthread_t* thread);
+
+For scalar or aggregate pthread_t types this function would simply zero any bits
+within the pthread_t that don't uniquely identify the thread, including padding,
+such that client code can return consistent results from operations done on the
+result. If the additional bits are a pointer to an associated structure then
+this function would ensure that the memory used to store that associated
+structure does not leak. After normalization the following compare would be
+valid and repeatable:
+
+memcmp(pthread_normalize(&t1),pthread_normalize(&t2),sizeof(pthread_t))
+
+Note 1: such comparisons are intended merely to order and sort pthread_t values
+and allow them to index various data structures. They are not intended to reveal
+anything about the relationships between threads, like startup order.
+
+Note 2: the normalized pthread_t is also a valid pthread_t that uniquely
+identifies the same thread.
+
+Advantages:
+1) In most existing implementations this function would reduce to a no-op that
+emits no additional instructions, i.e. after in-lining or optimisation, or if
+defined as a macro:
+#define pthread_normalize(tptr) (tptr)
+
+2) This single function allows an application to portably derive
+application-level versions of any of the other required functions.
+
+3) It is a generic function that could enable unanticipated uses.
+
+Disadvantages:
+1) Less efficient than dedicated compare or hash functions for implementations
+that include significant extra non-id elements in pthread_t.
+
+2) Still need to be concerned about padding if copying normalized pthread_t.
+See the later section on defining pthread_t to neutralise padding issues.
+
+Generally a pthread_t may need to be normalized every time it is used,
+which could have a significant impact. However, this is a design decision
+for the implementor in a competitive environment. An implementation is free
+to define a pthread_t in a way that minimises or eliminates padding or
+renders this function a no-op.
+
+Hazards:
+1) Pass-by-reference directly modifies 'thread' so the application must
+synchronise access or ensure that the pointer refers to a copy (see the sketch
+after this list). The alternative
+of pass-by-value/return-by-value was considered but then this requires two copy
+operations, disadvantaging implementations where this function is not a no-op
+in terms of speed of execution. This function is intended to be used in high
+frequency situations and needs to be efficient, or at least not unnecessarily
+inefficient. The alternative also sits awkwardly with functions like memcmp.
+
+2) [Non-compliant] code that uses relational and equality operators on
+arithmetic or pointer style pthread_t types would need to be rewritten, but it
+should be rewritten anyway.
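+
+As referenced in hazard (1), a minimal usage sketch of the copy approach: take
+a private copy of any shared handle before normalizing, so the shared handle
+itself is never written. Here 'registry_tid' is a hypothetical handle shared
+between threads and pthread_normalize() is the function proposed above:
+
+pthread_t copy = registry_tid;  /* copy taken under whatever lock guards registry_tid */
+pthread_t self = pthread_self();
+
+if (memcmp(pthread_normalize(&copy), pthread_normalize(&self), sizeof(pthread_t)) == 0)
+  {
+    /* the shared handle refers to the calling thread */
+  }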
+
+
+C implementation of null/compare/hash functions using pthread_normalize():
+
+/* In pthread.h */
+pthread_t* pthread_normalize(pthread_t* thread);
+
+/* In user code */
+/* User-level bitclear function - clear bits in loc corresponding to mask */
+void* bitclear (void* loc, void* mask, size_t count);
+
+typedef unsigned int hash_t;
+
+/* User-level hash function */
+hash_t hash(void* ptr, size_t count);
+
+/*
+ * User-level pthr_null function - modifies the original thread handle.
+ * The concept of a null pthread_t is highly implementation dependent
+ * and this design may be far from the mark. For example, in an
+ * implementation "null" may mean setting a special value inside one
+ * element of pthread_t to mean "INVALID". However, if that value was zero and
+ * formed part of the id component then we may get away with this design.
+ */
+pthread_t* pthr_null(pthread_t* tp)
+{
+ /*
+ * This should have the same effect as memset(tp, 0, sizeof(pthread_t))
+ * We're just showing that we can do it.
+ */
+ void* p = (void*) pthread_normalize(tp);
+ return (pthread_t*) bitclear(p, p, sizeof(pthread_t));
+}
+
+/*
+ * Safe user-level pthr_compare function - modifies temporary thread handle copies
+ */
+int pthr_compare_safe(pthread_t thread1, pthread_t thread2)
+{
+ return memcmp(pthread_normalize(&thread1), pthread_normalize(&thread2), sizeof(pthread_t));
+}
+
+/*
+ * Fast user-level pthr_compare function - modifies the original thread handles
+ */
+int pthr_compare_fast(pthread_t* thread1, pthread_t* thread2)
+{
+  return memcmp(pthread_normalize(thread1), pthread_normalize(thread2), sizeof(pthread_t));
+}
+
+/*
+ * Safe user-level pthr_hash function - modifies temporary thread handle copy
+ */
+hash_t pthr_hash_safe(pthread_t thread)
+{
+ return hash((void *) pthread_normalize(&thread), sizeof(pthread_t));
+}
+
+/*
+ * Fast user-level pthr_hash function - modifies the original thread handle
+ */
+hash_t pthr_hash_fast(pthread_t* thread)
+{
+  return hash((void *) pthread_normalize(thread), sizeof(pthread_t));
+}
+
+/* User-level bitclear function - modifies and returns the original array */
+void* bitclear(void* loc, void* mask, size_t count)
+{
+  unsigned char* l = (unsigned char*) loc;
+  unsigned char* m = (unsigned char*) mask;
+  size_t i;
+  for (i = 0; i < count; i++) {
+    l[i] &= (unsigned char) ~m[i];
+  }
+  return loc;
+}
+
+/* Donald Knuth hash */
+hash_t hash(void* ptr, size_t count)
+{
+  const unsigned char* p = (const unsigned char*) ptr;
+  hash_t h = (hash_t) count;
+  size_t i;
+
+  for (i = 0; i < count; i++)
+  {
+    h = ((h << 5) ^ (h >> 27)) ^ p[i];
+  }
+  return h;
+}
+
+/* Example of advantage point (3) - split a thread handle into its id and non-id values */
+pthread_t id = thread, non_id = thread;
+bitclear((void*) &non_id, (void*) pthread_normalize(&id), sizeof(pthread_t));
+
+
+A pthread_t type change proposal to neutralise the effects of padding
+
+Even if pthread_normalize() is available, padding is still a problem because
+the standard only guarantees element-by-element equivalence through
+copy operations (assignment and pass-by-value). So padding bit values can
+still change randomly after calls to pthread_normalize().
+
+[I suspect that most compilers take the easy path and always byte-copy anyway,
+partly because element-by-element copying becomes too complex (e.g. for unions
+that contain sub-aggregates) but also because programmers can easily design their
+aggregates to minimise and often eliminate padding].
+
+How can we eliminate the problem of padding bytes in structs? Could
+defining pthread_t as a union rather than a struct provide a solution?
+
+In fact, the Linux pthread.h defines most of its pthread_*_t objects (but not
+pthread_t itself) as unions, possibly for this and/or other reasons. We'll
+borrow some element naming from there but the ideas themselves are well known
+- the __align element used to force alignment of the union comes from K&R's
+storage allocator example.
+
+/* Essentially our current pthread_t renamed */
+typedef struct {
+ struct thread_state_t * __p;
+ long __x; /* sequence counter */
+} thread_id_t;
+
+Ensuring that the last element in the above struct is a long ensures that the
+overall struct size is a multiple of sizeof(long), so there should be no trailing
+padding in this struct or the union we define below.
+(Later we'll see that we can handle internal but not trailing padding.)
+
+/* New pthread_t */
+typedef union {
+ char __size[sizeof(thread_id_t)]; /* array as the first element */
+ thread_id_t __tid;
+ long __align; /* Ensure that the union starts on long boundary */
+} pthread_t;
+
+This guarantees that, during an assignment or pass-by-value, the compiler copies
+every byte in our thread_id_t because the compiler guarantees that the __size
+array, which we have ensured is the equal-largest element in the union, retains
+equivalence.
+
+This means that pthread_t values stored, assigned and passed by value will at least
+carry the value of any undefined padding bytes along and therefore ensure that
+those values remain consistent. Our comparisons will return consistent results and
+our hashes of [zero initialised] pthread_t values will also return consistent
+results.
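+
+If we want the "no trailing padding" property checked rather than assumed, a
+compile-time assertion along the following lines (a sketch using the old
+negative-array-size trick, since C89/C99 have no _Static_assert) will refuse
+to compile if the union is ever larger than thread_id_t:
+
+/* Fails to compile if the union picks up size beyond thread_id_t */
+typedef char pthread_t_size_check[(sizeof(pthread_t) == sizeof(thread_id_t)) ? 1 : -1];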
+
+We have also removed the need for a pthread_null() function; we can initialise
+at declaration time or easily create our own const pthread_t to use in assignments
+later:
+
+const pthread_t null_tid = {0}; /* braces are required */
+
+pthread_t t;
+...
+t = null_tid;
+
+
+Note that we don't have to explicitly make use of the __size array at all. It's
+there just to force the compiler behaviour we want.
+
+
+Partial solutions without a pthread_normalize function
+
+
+An application-level pthread_null and pthread_compare proposal
+(and pthread_hash proposal by extension)
+
+In order to deal with the problem of scalar/aggregate pthread_t type disparity in
+portable code I suggest using an old-fashioned union, as in the examples below.
+
+Constraints:
+- there is no padding, or padding values are preserved through assignment and
+ pass-by-value (see above);
+- there are no extra non-id values in the pthread_t.
+
+
+Example 1: A null initialiser for pthread_t variables...
+
+typedef union {
+ unsigned char b[sizeof(pthread_t)];
+ pthread_t t;
+} init_t;
+
+const init_t initial = {0};
+
+pthread_t tid = initial.t; /* init tid to all zeroes */
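+
+Under the same constraints the union also gives us a null test; pthr_is_null
+is our own name here, not a standard or pthreads-win32 function:
+
+#include <string.h>
+
+int pthr_is_null(pthread_t t)
+{
+  init_t cmp;
+  cmp.t = t;
+  /* equal to the all-zero pattern established by 'initial' above? */
+  return memcmp(cmp.b, initial.b, sizeof(pthread_t)) == 0;
+}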
+
+
+Example 2: A comparison function for pthread_t values
+
+typedef union {
+ unsigned char b[sizeof(pthread_t)];
+ pthread_t t;
+} pthcmp_t;
+
+int pthcmp(pthread_t left, pthread_t right)
+{
+ /*
+ * Compare two pthread handles in a way that imposes a repeatable but arbitrary
+ * ordering on them.
+ * I.e. given the same set of pthread_t handles the ordering should be the same
+ * each time but the order has no particular meaning other than that. E.g.
+ * the ordering does not imply the thread start sequence, or any other
+ * relationship between threads.
+ *
+ * Return values are:
+ * 1 : left is greater than right
+ * 0 : left is equal to right
+ * -1 : left is less than right
+ */
+ int i;
+ pthcmp_t L, R;
+ L.t = left;
+ R.t = right;
+ for (i = 0; i < sizeof(pthread_t); i++)
+ {
+ if (L.b[i] > R.b[i])
+ return 1;
+ else if (L.b[i] < R.b[i])
+ return -1;
+ }
+ return 0;
+}
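+
+As a usage sketch (the wrapper name qsort_pthcmp is ours), pthcmp() can drive
+standard ordered operations such as qsort() over an array of thread IDs:
+
+#include <stdlib.h>
+
+static int qsort_pthcmp(const void* lp, const void* rp)
+{
+  /* qsort() hands us pointers to the array elements */
+  return pthcmp(*(const pthread_t*) lp, *(const pthread_t*) rp);
+}
+
+void sort_tids(pthread_t* tids, size_t count)
+{
+  qsort(tids, count, sizeof(pthread_t), qsort_pthcmp);
+}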
+
+It has been pointed out that the C99 standard allows for the possibility that
+integer types also may include padding bits, which could invalidate the above
+method. This addition to C99 was specifically included after it was pointed
+out that there was one, presumably not particularly well known, architecture
+that included a padding bit in its 32-bit integer type. See section 6.2.6.2
+of both the standard and the rationale, specifically the paragraph starting at
+line 16 on page 43 of the rationale.
+
+
+An aside
+
+Certain compilers, e.g. gcc and one of the IBM compilers, include a feature
+extension: provided the union contains a member of the same type as the
+object then the object may be cast to the union itself.
+
+We could use this feature to speed up the pthcmp() function from example 2
+above by casting rather than assigning the pthread_t arguments to the union, e.g.:
+
+int pthcmp(pthread_t left, pthread_t right)
+{
+ /*
+ * Compare two pthread handles in a way that imposes a repeatable but arbitrary
+ * ordering on them.
+ * I.e. given the same set of pthread_t handles the ordering should be the same
+ * each time but the order has no particular meaning other than that. E.g.
+ * the ordering does not imply the thread start sequence, or any other
+ * relationship between threads.
+ *
+ * Return values are:
+ * 1 : left is greater than right
+ * 0 : left is equal to right
+ * -1 : left is less than right
+ */
+ int i;
+ for (i = 0; i < sizeof(pthread_t); i++)
+ {
+ if (((pthcmp_t)left).b[i] > ((pthcmp_t)right).b[i])
+ return 1;
+ else if (((pthcmp_t)left).b[i] < ((pthcmp_t)right).b[i])
+ return -1;
+ }
+ return 0;
+}
+
+
+Result thus far
+
+We can't remove undefined bits if they are there in pthread_t already, but we have
+attempted to render them inert for comparison and hashing functions by making them
+consistent through assignment, copy and pass-by-value.
+
+Note: Hashing pthread_t values requires that all pthread_t variables be initialised
+to the same value (usually all zeros) before being assigned a proper thread ID, i.e.
+to ensure that any padding bits are zero, or at least the same value for all
+pthread_t. Since all pthread_t values are generated by the library in the first
+instance this need not be an application-level operation.
+
+
+Conclusion
+
+I've attempted to resolve the multiple issues of type opacity and the possible
+presence of undefined bits and bytes in pthread_t values, which prevent
+applications from comparing or hashing pthread handles.
+
+Two complementary partial solutions have been proposed, one an application-level
+scheme to handle both scalar and aggregate pthread_t types equally, plus a
+definition of pthread_t itself that neutralises padding bits and bytes by
+coercing semantics out of the compiler to eliminate variations in the values of
+padding bits.
+
+I have not provided any solution to the problem of handling extra values embedded
+in pthread_t, e.g. debugging or trap information that an implementation is entitled
+to include. Therefore none of this replaces the portability and flexibility of API
+functions, but which functions are needed? The threads standard is unlikely to
+include functions that can be implemented by a combination of existing features and
+more generic functions (several references in the threads rationale suggest this).
+Therefore I propose that the following single function could replace the several
+functions that have been suggested in conversations:
+
+pthread_t * pthread_normalize(pthread_t * handle);
+
+For most existing pthreads implementations this function, or macro, would reduce to
+a no-op with zero call overhead.