diff options
-rw-r--r-- | mesalib/docs/news.html | 7 | ||||
-rw-r--r-- | mesalib/docs/relnotes-7.10.2.html | 206 | ||||
-rw-r--r-- | mesalib/docs/relnotes.html | 1 | ||||
-rw-r--r-- | mesalib/src/mesa/main/arrayobj.c | 1128 | ||||
-rw-r--r-- | mesalib/src/mesa/main/extensions.c | 2 | ||||
-rw-r--r-- | mesalib/src/mesa/main/state.c | 33 | ||||
-rw-r--r-- | mesalib/src/mesa/main/varray.c | 2 | ||||
-rw-r--r-- | mesalib/src/mesa/main/varray.h | 527 | ||||
-rw-r--r-- | pixman/Makefile.am | 7 | ||||
-rw-r--r-- | pixman/pixman/pixman-arm-neon-asm.S | 5435 | ||||
-rw-r--r-- | xorg-server/Xext/geext.c | 9 | ||||
-rw-r--r-- | xorg-server/dix/eventconvert.c | 14 | ||||
-rw-r--r-- | xorg-server/dix/getevents.c | 2632 | ||||
-rw-r--r-- | xorg-server/xkeyboard-config/rules/base.xml.in | 2 |
14 files changed, 5095 insertions, 4910 deletions
diff --git a/mesalib/docs/news.html b/mesalib/docs/news.html index 6a706ab3c..a6a658aef 100644 --- a/mesalib/docs/news.html +++ b/mesalib/docs/news.html @@ -11,6 +11,13 @@ <H1>News</H1> +<h2>April 6, 2011</h2> + +<p> +<a href="relnotes-7.10.2.html">Mesa 7.10.2</a> is released. This is a bug +fix release release. +</p> + <h2>March 2, 2011</h2> <p> diff --git a/mesalib/docs/relnotes-7.10.2.html b/mesalib/docs/relnotes-7.10.2.html new file mode 100644 index 000000000..55b6794a1 --- /dev/null +++ b/mesalib/docs/relnotes-7.10.2.html @@ -0,0 +1,206 @@ +<HTML> + +<head> +<TITLE>Mesa Release Notes</TITLE> +<link rel="stylesheet" type="text/css" href="mesa.css"> +<meta http-equiv="content-type" content="text/html; charset=utf-8" /> +</head> + +<BODY> + +<body bgcolor="#eeeeee"> + +<H1>Mesa 7.10.2 Release Notes / April 6, 2011</H1> + +<p> +Mesa 7.10.2 is a bug fix release which fixes bugs found since the 7.10 release. +</p> +<p> +Mesa 7.10.2 implements the OpenGL 2.1 API, but the version reported by +glGetString(GL_VERSION) depends on the particular driver being used. +Some drivers don't support all the features required in OpenGL 2.1. +</p> +<p> +See the <a href="install.html">Compiling/Installing page</a> for prerequisites +for DRI hardware acceleration. +</p> + + +<h2>MD5 checksums</h2> +<pre> +2f9f444265534a2cfd9a99d1a8291089 MesaLib-7.10.2.tar.gz +f5de82852f1243f42cc004039e10b771 MesaLib-7.10.2.tar.bz2 +47836e37bab6fcafe3ac90c9544ba0e9 MesaLib-7.10.2.zip +175120325828f313621cc5bc6c504803 MesaGLUT-7.10.2.tar.gz +8c71d273f5f8d6c5eda4ffc39e0fe03e MesaGLUT-7.10.2.tar.bz2 +03036c8efe7b791a90fa0f2c41b43f43 MesaGLUT-7.10.2.zip +</pre> + + +<h2>New features</h2> +<p>None.</p> + +<h2>Bug fixes</h2> +<p>This list is likely incomplete.</p> +<ul> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=29172">Bug 29172</a> - Arrandale - Pill Popper Pops Pills</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31159">Bug 31159</a> - shadow problem in 0ad game</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32688">Bug 32688</a> - [RADEON:KMS:R300G] some games have a wireframe or outline visible</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32949">Bug 32949</a> - [glsl wine] Need for Speed renders incorrectly with GLSL enabled</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=34203">Bug 34203</a> - [GLSL] fail to call long chains across shaders</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=34376">Bug 34376</a> - [GLSL] allowing assignment to unsized array + <ul> + <li>The commit message incorrectly + lists <a href="https://bugs.freedesktop.org/show_bug.cgi?id=34367">bug + 34367</a>.</li> + </ul> +</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=34370">Bug 34370</a> - [GLSL] "i<5 && i<4" in for loop fails</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=34374">Bug 34374</a> - [GLSL] fail to redeclare an array using initializer</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=35073">Bug 35073</a> - [GM45] Alpha test is broken when rendering to FBO with no color attachment</li> + +<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=35483">Bug 35483</a> - util_blit_pixels_writemask: crash in line 322 of src/gallium/auxiliary/util/u_blit.c</li> + +<!-- <li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=">Bug </a> - </li> --> + +</ul> + + +<h2>Changes</h2> +<p>The full set of changes can be viewed by using the following GIT command:</p> + +<pre> + git log mesa-7.10.1..mesa-7.10.2 +</pre> + +<p><em>Note:</em> Reverted commits and the reverts are not included in this list.</p> + +<p>Alex Deucher (2): +<ul> + <li>r600c: add new ontario pci ids</li> + <li>r600g: add some additional ontario pci ids</li> +</ul></p> + +<p>Benjamin Franzke (1): +<ul> + <li>st/dri: Fix surfaceless gl using contexts with previous bound surfaces</li> +</ul></p> + +<p>Brian Paul (9): +<ul> + <li>docs: pull 7.9.2 release notes into 7.10 branch</li> + <li>docs: update news.html with 7.10.1 and 7.9.2 releases</li> + <li>docs: fill in 7.10.1 release data</li> + <li>docs: add, fix release notes links</li> + <li>docs: update info about Mesa packaging/contents</li> + <li>docs: update prerequisites, remove old demo info</li> + <li>mesa: Guard against null pointer deref in fbo validation</li> + <li>st/mesa: Apply LOD bias from correct texture unit</li> + <li>glsl: silence warning in printf() with a cast</li> +</ul></p> + +<p>Chad Versace (1): +<ul> + <li>i965: Fix tex_swizzle when depth mode is GL_RED</li> +</ul></p> + +<p>Dave Airlie (1): +<ul> + <li>r600: don't close fd on failed load</li> +</ul></p> + +<p>Eric Anholt (2): +<ul> + <li>i965: Apply a workaround for the Ironlake "vertex flashing".</li> + <li>i965: Fix alpha testing when there is no color buffer in the FBO.</li> +</ul></p> + +<p>Fabian Bieler (1): +<ul> + <li>st/mesa: Apply LOD from texture object</li> +</ul></p> + +<p>Henri Verbeet (1): +<ul> + <li>st/mesa: Validate state before doing blits.</li> +</ul></p> + +<p>Ian Romanick (13): +<ul> + <li>docs: Add 7.10.1 md5sums</li> + <li>glsl: Refactor AST-to-HIR code handling variable initializers</li> + <li>glsl: Refactor AST-to-HIR code handling variable redeclarations</li> + <li>glsl: Process redeclarations before initializers</li> + <li>glsl: Function signatures cannot have NULL return type</li> + <li>glsl: Add several function / call related validations</li> + <li>linker: Add imported functions to the linked IR</li> + <li>glsl: Use insert_before for lists instead of open coding it</li> + <li>glsl: Only allow unsized array assignment in an initializer</li> + <li>glcpp: Refresh autogenerated lexer files</li> + <li>docs: Initial bits of 7.10.2 release notes</li> + <li>mesa: set version string to 7.10.2</li> + <li>mesa: Remove nonexistant files from _FILES lists</li> +</ul></p> + +<p>Jerome Glisse (1): +<ul> + <li>r600g: move user fence into base radeon structure</li> +</ul></p> + +<p>José Fonseca (2): +<ul> + <li>mesa: Fix typo glGet*v(GL_TEXTURE_COORD_ARRAY_*).</li> + <li>mesa: More glGet* fixes.</li> +</ul></p> + +<p>Kenneth Graunke (4): +<ul> + <li>glcpp: Rework lexer to use a SKIP state rather than REJECT.</li> + <li>glcpp: Remove trailing contexts from #if rules.</li> + <li>i965/fs: Fix linear gl_Color interpolation on pre-gen6 hardware.</li> + <li>glsl: Accept precision qualifiers on sampler types, but only in ES.</li> +</ul></p> + +<p>Marek Olšák (15): +<ul> + <li>st/mesa: fix crash when DrawBuffer->_ColorDrawBuffers[0] is NULL</li> + <li>st/mesa: fail to alloc a renderbuffer if st_choose_renderbuffer_format fails</li> + <li>r300/compiler: fix the saturate modifier when applied to TEX instructions</li> + <li>r300/compiler: fix translating the src negate bits in pair_translate</li> + <li>r300/compiler: Abs doesn't cancel Negate (in the conversion to native swizzles)</li> + <li>r300/compiler: TEX instructions don't support negation on source arguments</li> + <li>r300/compiler: do not set TEX_IGNORE_UNCOVERED on r500</li> + <li>r300/compiler: saturate Z before the shadow comparison</li> + <li>r300/compiler: fix equal and notequal shadow compare functions</li> + <li>r300/compiler: remove unused variables</li> + <li>st/mesa: fix crash when using both user and vbo buffers with the same stride</li> + <li>r300g: fix alpha-test with no colorbuffer</li> + <li>r300g: tell the GLSL compiler to lower the continue opcode</li> + <li>r300/compiler: propagate SaturateMode down to the result of shadow comparison</li> + <li>r300/compiler: apply the texture swizzle to shadow pass and fail values too</li> +</ul></p> + +<p>Michel Dänzer (1): +<ul> + <li>Use proper source row stride when getting depth/stencil texels.</li> +</ul></p> + +<p>Tom Stellard (4): +<ul> + <li>r300/compiler: Use a 4-bit writemask in pair instructions</li> + <li>prog_optimize: Fix reallocating registers for shaders with loops</li> + <li>r300/compiler: Fix vertex shader MAD instructions with constant swizzles</li> + <li>r300/compiler: Don't try to convert RGB to Alpha in full instructions</li> +</ul></p> + +</body> +</html> diff --git a/mesalib/docs/relnotes.html b/mesalib/docs/relnotes.html index b0ca3ef43..c1a7ab78d 100644 --- a/mesalib/docs/relnotes.html +++ b/mesalib/docs/relnotes.html @@ -14,6 +14,7 @@ The release notes summarize what's new or changed in each Mesa release. <UL> <LI><A HREF="relnotes-7.11.html">7.11 release notes</A> +<LI><A HREF="relnotes-7.10.2.html">7.10.2 release notes</A> <LI><A HREF="relnotes-7.10.1.html">7.10.1 release notes</A> <LI><A HREF="relnotes-7.10.html">7.10 release notes</A> <LI><A HREF="relnotes-7.9.2.html">7.9.2 release notes</A> diff --git a/mesalib/src/mesa/main/arrayobj.c b/mesalib/src/mesa/main/arrayobj.c index 1033ce639..85a8e0e56 100644 --- a/mesalib/src/mesa/main/arrayobj.c +++ b/mesalib/src/mesa/main/arrayobj.c @@ -1,579 +1,549 @@ -/*
- * Mesa 3-D graphics library
- * Version: 7.6
- *
- * Copyright (C) 1999-2008 Brian Paul All Rights Reserved.
- * (C) Copyright IBM Corporation 2006
- * Copyright (C) 2009 VMware, Inc. All Rights Reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included
- * in all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
- * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * BRIAN PAUL OR IBM BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
- * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
- * OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-
-/**
- * \file arrayobj.c
- * Functions for the GL_APPLE_vertex_array_object extension.
- *
- * \todo
- * The code in this file borrows a lot from bufferobj.c. There's a certain
- * amount of cruft left over from that origin that may be unnecessary.
- *
- * \author Ian Romanick <idr@us.ibm.com>
- * \author Brian Paul
- */
-
-
-#include "glheader.h"
-#include "hash.h"
-#include "imports.h"
-#include "context.h"
-#include "mfeatures.h"
-#if FEATURE_ARB_vertex_buffer_object
-#include "bufferobj.h"
-#endif
-#include "arrayobj.h"
-#include "macros.h"
-#include "mtypes.h"
-#include "main/dispatch.h"
-
-
-/**
- * Look up the array object for the given ID.
- *
- * \returns
- * Either a pointer to the array object with the specified ID or \c NULL for
- * a non-existent ID. The spec defines ID 0 as being technically
- * non-existent.
- */
-
-static INLINE struct gl_array_object *
-lookup_arrayobj(struct gl_context *ctx, GLuint id)
-{
- if (id == 0)
- return NULL;
- else
- return (struct gl_array_object *)
- _mesa_HashLookup(ctx->Array.Objects, id);
-}
-
-
-/**
- * For all the vertex arrays in the array object, unbind any pointers
- * to any buffer objects (VBOs).
- * This is done just prior to array object destruction.
- */
-static void
-unbind_array_object_vbos(struct gl_context *ctx, struct gl_array_object *obj)
-{
- GLuint i;
-
- _mesa_reference_buffer_object(ctx, &obj->Vertex.BufferObj, NULL);
- _mesa_reference_buffer_object(ctx, &obj->Weight.BufferObj, NULL);
- _mesa_reference_buffer_object(ctx, &obj->Normal.BufferObj, NULL);
- _mesa_reference_buffer_object(ctx, &obj->Color.BufferObj, NULL);
- _mesa_reference_buffer_object(ctx, &obj->SecondaryColor.BufferObj, NULL);
- _mesa_reference_buffer_object(ctx, &obj->FogCoord.BufferObj, NULL);
- _mesa_reference_buffer_object(ctx, &obj->Index.BufferObj, NULL);
- _mesa_reference_buffer_object(ctx, &obj->EdgeFlag.BufferObj, NULL);
-
- for (i = 0; i < Elements(obj->TexCoord); i++)
- _mesa_reference_buffer_object(ctx, &obj->TexCoord[i].BufferObj, NULL);
-
- for (i = 0; i < Elements(obj->VertexAttrib); i++)
- _mesa_reference_buffer_object(ctx, &obj->VertexAttrib[i].BufferObj,NULL);
-
-#if FEATURE_point_size_array
- _mesa_reference_buffer_object(ctx, &obj->PointSize.BufferObj, NULL);
-#endif
-}
-
-
-/**
- * Allocate and initialize a new vertex array object.
- *
- * This function is intended to be called via
- * \c dd_function_table::NewArrayObject.
- */
-struct gl_array_object *
-_mesa_new_array_object( struct gl_context *ctx, GLuint name )
-{
- struct gl_array_object *obj = CALLOC_STRUCT(gl_array_object);
- if (obj)
- _mesa_initialize_array_object(ctx, obj, name);
- return obj;
-}
-
-
-/**
- * Delete an array object.
- *
- * This function is intended to be called via
- * \c dd_function_table::DeleteArrayObject.
- */
-void
-_mesa_delete_array_object( struct gl_context *ctx, struct gl_array_object *obj )
-{
- (void) ctx;
- unbind_array_object_vbos(ctx, obj);
- _glthread_DESTROY_MUTEX(obj->Mutex);
- free(obj);
-}
-
-
-/**
- * Set ptr to arrayObj w/ reference counting.
- */
-void
-_mesa_reference_array_object(struct gl_context *ctx,
- struct gl_array_object **ptr,
- struct gl_array_object *arrayObj)
-{
- if (*ptr == arrayObj)
- return;
-
- if (*ptr) {
- /* Unreference the old array object */
- GLboolean deleteFlag = GL_FALSE;
- struct gl_array_object *oldObj = *ptr;
-
- _glthread_LOCK_MUTEX(oldObj->Mutex);
- ASSERT(oldObj->RefCount > 0);
- oldObj->RefCount--;
-#if 0
- printf("ArrayObj %p %d DECR to %d\n",
- (void *) oldObj, oldObj->Name, oldObj->RefCount);
-#endif
- deleteFlag = (oldObj->RefCount == 0);
- _glthread_UNLOCK_MUTEX(oldObj->Mutex);
-
- if (deleteFlag) {
- ASSERT(ctx->Driver.DeleteArrayObject);
- ctx->Driver.DeleteArrayObject(ctx, oldObj);
- }
-
- *ptr = NULL;
- }
- ASSERT(!*ptr);
-
- if (arrayObj) {
- /* reference new array object */
- _glthread_LOCK_MUTEX(arrayObj->Mutex);
- if (arrayObj->RefCount == 0) {
- /* this array's being deleted (look just above) */
- /* Not sure this can every really happen. Warn if it does. */
- _mesa_problem(NULL, "referencing deleted array object");
- *ptr = NULL;
- }
- else {
- arrayObj->RefCount++;
-#if 0
- printf("ArrayObj %p %d INCR to %d\n",
- (void *) arrayObj, arrayObj->Name, arrayObj->RefCount);
-#endif
- *ptr = arrayObj;
- }
- _glthread_UNLOCK_MUTEX(arrayObj->Mutex);
- }
-}
-
-
-
-static void
-init_array(struct gl_context *ctx,
- struct gl_client_array *array, GLint size, GLint type)
-{
- array->Size = size;
- array->Type = type;
- array->Format = GL_RGBA; /* only significant for GL_EXT_vertex_array_bgra */
- array->Stride = 0;
- array->StrideB = 0;
- array->Ptr = NULL;
- array->Enabled = GL_FALSE;
- array->Normalized = GL_FALSE;
-#if FEATURE_ARB_vertex_buffer_object
- /* Vertex array buffers */
- _mesa_reference_buffer_object(ctx, &array->BufferObj,
- ctx->Shared->NullBufferObj);
-#endif
-}
-
-
-/**
- * Initialize a gl_array_object's arrays.
- */
-void
-_mesa_initialize_array_object( struct gl_context *ctx,
- struct gl_array_object *obj,
- GLuint name )
-{
- GLuint i;
-
- obj->Name = name;
-
- _glthread_INIT_MUTEX(obj->Mutex);
- obj->RefCount = 1;
-
- /* Init the individual arrays */
- init_array(ctx, &obj->Vertex, 4, GL_FLOAT);
- init_array(ctx, &obj->Weight, 1, GL_FLOAT);
- init_array(ctx, &obj->Normal, 3, GL_FLOAT);
- init_array(ctx, &obj->Color, 4, GL_FLOAT);
- init_array(ctx, &obj->SecondaryColor, 3, GL_FLOAT);
- init_array(ctx, &obj->FogCoord, 1, GL_FLOAT);
- init_array(ctx, &obj->Index, 1, GL_FLOAT);
- for (i = 0; i < Elements(obj->TexCoord); i++) {
- init_array(ctx, &obj->TexCoord[i], 4, GL_FLOAT);
- }
- init_array(ctx, &obj->EdgeFlag, 1, GL_BOOL);
- for (i = 0; i < Elements(obj->VertexAttrib); i++) {
- init_array(ctx, &obj->VertexAttrib[i], 4, GL_FLOAT);
- }
-
-#if FEATURE_point_size_array
- init_array(ctx, &obj->PointSize, 1, GL_FLOAT);
-#endif
-}
-
-
-/**
- * Add the given array object to the array object pool.
- */
-static void
-save_array_object( struct gl_context *ctx, struct gl_array_object *obj )
-{
- if (obj->Name > 0) {
- /* insert into hash table */
- _mesa_HashInsert(ctx->Array.Objects, obj->Name, obj);
- }
-}
-
-
-/**
- * Remove the given array object from the array object pool.
- * Do not deallocate the array object though.
- */
-static void
-remove_array_object( struct gl_context *ctx, struct gl_array_object *obj )
-{
- if (obj->Name > 0) {
- /* remove from hash table */
- _mesa_HashRemove(ctx->Array.Objects, obj->Name);
- }
-}
-
-
-
-/**
- * Compute the index of the last array element that can be safely accessed
- * in a vertex array. We can really only do this when the array lives in
- * a VBO.
- * The array->_MaxElement field will be updated.
- * Later in glDrawArrays/Elements/etc we can do some bounds checking.
- */
-static void
-compute_max_element(struct gl_client_array *array)
-{
- if (array->BufferObj->Name) {
- /* Compute the max element we can access in the VBO without going
- * out of bounds.
- */
- array->_MaxElement = ((GLsizeiptrARB) array->BufferObj->Size
- - (GLsizeiptrARB) array->Ptr + array->StrideB
- - array->_ElementSize) / array->StrideB;
- if (0)
- printf("%s Object %u Size %u MaxElement %u\n",
- __FUNCTION__,
- array->BufferObj->Name,
- (GLuint) array->BufferObj->Size,
- array->_MaxElement);
- }
- else {
- /* user-space array, no idea how big it is */
- array->_MaxElement = 2 * 1000 * 1000 * 1000; /* just a big number */
- }
-}
-
-
-/**
- * Helper for update_arrays().
- * \return min(current min, array->_MaxElement).
- */
-static GLuint
-update_min(GLuint min, struct gl_client_array *array)
-{
- compute_max_element(array);
- if (array->Enabled)
- return MIN2(min, array->_MaxElement);
- else
- return min;
-}
-
-
-/**
- * Examine vertex arrays to update the gl_array_object::_MaxElement field.
- */
-void
-_mesa_update_array_object_max_element(struct gl_context *ctx,
- struct gl_array_object *arrayObj)
-{
- GLuint i, min = ~0;
-
- min = update_min(min, &arrayObj->Vertex);
- min = update_min(min, &arrayObj->Weight);
- min = update_min(min, &arrayObj->Normal);
- min = update_min(min, &arrayObj->Color);
- min = update_min(min, &arrayObj->SecondaryColor);
- min = update_min(min, &arrayObj->FogCoord);
- min = update_min(min, &arrayObj->Index);
- min = update_min(min, &arrayObj->EdgeFlag);
-#if FEATURE_point_size_array
- min = update_min(min, &arrayObj->PointSize);
-#endif
- for (i = 0; i < ctx->Const.MaxTextureCoordUnits; i++)
- min = update_min(min, &arrayObj->TexCoord[i]);
- for (i = 0; i < Elements(arrayObj->VertexAttrib); i++)
- min = update_min(min, &arrayObj->VertexAttrib[i]);
-
- /* _MaxElement is one past the last legal array element */
- arrayObj->_MaxElement = min;
-}
-
-
-/**********************************************************************/
-/* API Functions */
-/**********************************************************************/
-
-
-/**
- * Helper for _mesa_BindVertexArray() and _mesa_BindVertexArrayAPPLE().
- * \param genRequired specifies behavour when id was not generated with
- * glGenVertexArrays().
- */
-static void
-bind_vertex_array(struct gl_context *ctx, GLuint id, GLboolean genRequired)
-{
- struct gl_array_object * const oldObj = ctx->Array.ArrayObj;
- struct gl_array_object *newObj = NULL;
- ASSERT_OUTSIDE_BEGIN_END(ctx);
-
- ASSERT(oldObj != NULL);
-
- if ( oldObj->Name == id )
- return; /* rebinding the same array object- no change */
-
- /*
- * Get pointer to new array object (newObj)
- */
- if (id == 0) {
- /* The spec says there is no array object named 0, but we use
- * one internally because it simplifies things.
- */
- newObj = ctx->Array.DefaultArrayObj;
- }
- else {
- /* non-default array object */
- newObj = lookup_arrayobj(ctx, id);
- if (!newObj) {
- if (genRequired) {
- _mesa_error(ctx, GL_INVALID_OPERATION, "glBindVertexArray(id)");
- return;
- }
-
- /* For APPLE version, generate a new array object now */
- newObj = (*ctx->Driver.NewArrayObject)(ctx, id);
- if (!newObj) {
- _mesa_error(ctx, GL_OUT_OF_MEMORY, "glBindVertexArrayAPPLE");
- return;
- }
- save_array_object(ctx, newObj);
- }
- }
-
- ctx->NewState |= _NEW_ARRAY;
- ctx->Array.NewState |= _NEW_ARRAY_ALL;
- _mesa_reference_array_object(ctx, &ctx->Array.ArrayObj, newObj);
-
- /* Pass BindVertexArray call to device driver */
- if (ctx->Driver.BindArrayObject && newObj)
- ctx->Driver.BindArrayObject(ctx, newObj);
-}
-
-
-/**
- * ARB version of glBindVertexArray()
- * This function behaves differently from glBindVertexArrayAPPLE() in
- * that this function requires all ids to have been previously generated
- * by glGenVertexArrays[APPLE]().
- */
-void GLAPIENTRY
-_mesa_BindVertexArray( GLuint id )
-{
- GET_CURRENT_CONTEXT(ctx);
- bind_vertex_array(ctx, id, GL_TRUE);
-}
-
-
-/**
- * Bind a new array.
- *
- * \todo
- * The binding could be done more efficiently by comparing the non-NULL
- * pointers in the old and new objects. The only arrays that are "dirty" are
- * the ones that are non-NULL in either object.
- */
-void GLAPIENTRY
-_mesa_BindVertexArrayAPPLE( GLuint id )
-{
- GET_CURRENT_CONTEXT(ctx);
- bind_vertex_array(ctx, id, GL_FALSE);
-}
-
-
-/**
- * Delete a set of array objects.
- *
- * \param n Number of array objects to delete.
- * \param ids Array of \c n array object IDs.
- */
-void GLAPIENTRY
-_mesa_DeleteVertexArraysAPPLE(GLsizei n, const GLuint *ids)
-{
- GET_CURRENT_CONTEXT(ctx);
- GLsizei i;
- ASSERT_OUTSIDE_BEGIN_END(ctx);
-
- if (n < 0) {
- _mesa_error(ctx, GL_INVALID_VALUE, "glDeleteVertexArrayAPPLE(n)");
- return;
- }
-
- for (i = 0; i < n; i++) {
- struct gl_array_object *obj = lookup_arrayobj(ctx, ids[i]);
-
- if ( obj != NULL ) {
- ASSERT( obj->Name == ids[i] );
-
- /* If the array object is currently bound, the spec says "the binding
- * for that object reverts to zero and the default vertex array
- * becomes current."
- */
- if ( obj == ctx->Array.ArrayObj ) {
- CALL_BindVertexArrayAPPLE( ctx->Exec, (0) );
- }
-
- /* The ID is immediately freed for re-use */
- remove_array_object(ctx, obj);
-
- /* Unreference the array object.
- * If refcount hits zero, the object will be deleted.
- */
- _mesa_reference_array_object(ctx, &obj, NULL);
- }
- }
-}
-
-
-/**
- * Generate a set of unique array object IDs and store them in \c arrays.
- * Helper for _mesa_GenVertexArrays[APPLE]() functions below.
- * \param n Number of IDs to generate.
- * \param arrays Array of \c n locations to store the IDs.
- * \param vboOnly Will arrays have to reside in VBOs?
- */
-static void
-gen_vertex_arrays(struct gl_context *ctx, GLsizei n, GLuint *arrays,
- GLboolean vboOnly)
-{
- GLuint first;
- GLint i;
- ASSERT_OUTSIDE_BEGIN_END(ctx);
-
- if (n < 0) {
- _mesa_error(ctx, GL_INVALID_VALUE, "glGenVertexArraysAPPLE");
- return;
- }
-
- if (!arrays) {
- return;
- }
-
- first = _mesa_HashFindFreeKeyBlock(ctx->Array.Objects, n);
-
- /* Allocate new, empty array objects and return identifiers */
- for (i = 0; i < n; i++) {
- struct gl_array_object *obj;
- GLuint name = first + i;
-
- obj = (*ctx->Driver.NewArrayObject)( ctx, name );
- if (!obj) {
- _mesa_error(ctx, GL_OUT_OF_MEMORY, "glGenVertexArraysAPPLE");
- return;
- }
- obj->VBOonly = vboOnly;
- save_array_object(ctx, obj);
- arrays[i] = first + i;
- }
-}
-
-
-/**
- * ARB version of glGenVertexArrays()
- * All arrays will be required to live in VBOs.
- */
-void GLAPIENTRY
-_mesa_GenVertexArrays(GLsizei n, GLuint *arrays)
-{
- GET_CURRENT_CONTEXT(ctx);
- gen_vertex_arrays(ctx, n, arrays, GL_TRUE);
-}
-
-
-/**
- * APPLE version of glGenVertexArraysAPPLE()
- * Arrays may live in VBOs or ordinary memory.
- */
-void GLAPIENTRY
-_mesa_GenVertexArraysAPPLE(GLsizei n, GLuint *arrays)
-{
- GET_CURRENT_CONTEXT(ctx);
- gen_vertex_arrays(ctx, n, arrays, GL_FALSE);
-}
-
-
-/**
- * Determine if ID is the name of an array object.
- *
- * \param id ID of the potential array object.
- * \return \c GL_TRUE if \c id is the name of a array object,
- * \c GL_FALSE otherwise.
- */
-GLboolean GLAPIENTRY
-_mesa_IsVertexArrayAPPLE( GLuint id )
-{
- struct gl_array_object * obj;
- GET_CURRENT_CONTEXT(ctx);
- ASSERT_OUTSIDE_BEGIN_END_WITH_RETVAL(ctx, GL_FALSE);
-
- if (id == 0)
- return GL_FALSE;
-
- obj = lookup_arrayobj(ctx, id);
-
- return (obj != NULL) ? GL_TRUE : GL_FALSE;
-}
+/* + * Mesa 3-D graphics library + * Version: 7.6 + * + * Copyright (C) 1999-2008 Brian Paul All Rights Reserved. + * (C) Copyright IBM Corporation 2006 + * Copyright (C) 2009 VMware, Inc. All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included + * in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * BRIAN PAUL OR IBM BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, + * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF + * OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + + +/** + * \file arrayobj.c + * Functions for the GL_APPLE_vertex_array_object extension. + * + * \todo + * The code in this file borrows a lot from bufferobj.c. There's a certain + * amount of cruft left over from that origin that may be unnecessary. + * + * \author Ian Romanick <idr@us.ibm.com> + * \author Brian Paul + */ + + +#include "glheader.h" +#include "hash.h" +#include "imports.h" +#include "context.h" +#include "mfeatures.h" +#if FEATURE_ARB_vertex_buffer_object +#include "bufferobj.h" +#endif +#include "arrayobj.h" +#include "macros.h" +#include "mtypes.h" +#include "varray.h" +#include "main/dispatch.h" + + +/** + * Look up the array object for the given ID. + * + * \returns + * Either a pointer to the array object with the specified ID or \c NULL for + * a non-existent ID. The spec defines ID 0 as being technically + * non-existent. + */ + +static INLINE struct gl_array_object * +lookup_arrayobj(struct gl_context *ctx, GLuint id) +{ + if (id == 0) + return NULL; + else + return (struct gl_array_object *) + _mesa_HashLookup(ctx->Array.Objects, id); +} + + +/** + * For all the vertex arrays in the array object, unbind any pointers + * to any buffer objects (VBOs). + * This is done just prior to array object destruction. + */ +static void +unbind_array_object_vbos(struct gl_context *ctx, struct gl_array_object *obj) +{ + GLuint i; + + _mesa_reference_buffer_object(ctx, &obj->Vertex.BufferObj, NULL); + _mesa_reference_buffer_object(ctx, &obj->Weight.BufferObj, NULL); + _mesa_reference_buffer_object(ctx, &obj->Normal.BufferObj, NULL); + _mesa_reference_buffer_object(ctx, &obj->Color.BufferObj, NULL); + _mesa_reference_buffer_object(ctx, &obj->SecondaryColor.BufferObj, NULL); + _mesa_reference_buffer_object(ctx, &obj->FogCoord.BufferObj, NULL); + _mesa_reference_buffer_object(ctx, &obj->Index.BufferObj, NULL); + _mesa_reference_buffer_object(ctx, &obj->EdgeFlag.BufferObj, NULL); + + for (i = 0; i < Elements(obj->TexCoord); i++) + _mesa_reference_buffer_object(ctx, &obj->TexCoord[i].BufferObj, NULL); + + for (i = 0; i < Elements(obj->VertexAttrib); i++) + _mesa_reference_buffer_object(ctx, &obj->VertexAttrib[i].BufferObj,NULL); + +#if FEATURE_point_size_array + _mesa_reference_buffer_object(ctx, &obj->PointSize.BufferObj, NULL); +#endif +} + + +/** + * Allocate and initialize a new vertex array object. + * + * This function is intended to be called via + * \c dd_function_table::NewArrayObject. + */ +struct gl_array_object * +_mesa_new_array_object( struct gl_context *ctx, GLuint name ) +{ + struct gl_array_object *obj = CALLOC_STRUCT(gl_array_object); + if (obj) + _mesa_initialize_array_object(ctx, obj, name); + return obj; +} + + +/** + * Delete an array object. + * + * This function is intended to be called via + * \c dd_function_table::DeleteArrayObject. + */ +void +_mesa_delete_array_object( struct gl_context *ctx, struct gl_array_object *obj ) +{ + (void) ctx; + unbind_array_object_vbos(ctx, obj); + _glthread_DESTROY_MUTEX(obj->Mutex); + free(obj); +} + + +/** + * Set ptr to arrayObj w/ reference counting. + */ +void +_mesa_reference_array_object(struct gl_context *ctx, + struct gl_array_object **ptr, + struct gl_array_object *arrayObj) +{ + if (*ptr == arrayObj) + return; + + if (*ptr) { + /* Unreference the old array object */ + GLboolean deleteFlag = GL_FALSE; + struct gl_array_object *oldObj = *ptr; + + _glthread_LOCK_MUTEX(oldObj->Mutex); + ASSERT(oldObj->RefCount > 0); + oldObj->RefCount--; +#if 0 + printf("ArrayObj %p %d DECR to %d\n", + (void *) oldObj, oldObj->Name, oldObj->RefCount); +#endif + deleteFlag = (oldObj->RefCount == 0); + _glthread_UNLOCK_MUTEX(oldObj->Mutex); + + if (deleteFlag) { + ASSERT(ctx->Driver.DeleteArrayObject); + ctx->Driver.DeleteArrayObject(ctx, oldObj); + } + + *ptr = NULL; + } + ASSERT(!*ptr); + + if (arrayObj) { + /* reference new array object */ + _glthread_LOCK_MUTEX(arrayObj->Mutex); + if (arrayObj->RefCount == 0) { + /* this array's being deleted (look just above) */ + /* Not sure this can every really happen. Warn if it does. */ + _mesa_problem(NULL, "referencing deleted array object"); + *ptr = NULL; + } + else { + arrayObj->RefCount++; +#if 0 + printf("ArrayObj %p %d INCR to %d\n", + (void *) arrayObj, arrayObj->Name, arrayObj->RefCount); +#endif + *ptr = arrayObj; + } + _glthread_UNLOCK_MUTEX(arrayObj->Mutex); + } +} + + + +static void +init_array(struct gl_context *ctx, + struct gl_client_array *array, GLint size, GLint type) +{ + array->Size = size; + array->Type = type; + array->Format = GL_RGBA; /* only significant for GL_EXT_vertex_array_bgra */ + array->Stride = 0; + array->StrideB = 0; + array->Ptr = NULL; + array->Enabled = GL_FALSE; + array->Normalized = GL_FALSE; +#if FEATURE_ARB_vertex_buffer_object + /* Vertex array buffers */ + _mesa_reference_buffer_object(ctx, &array->BufferObj, + ctx->Shared->NullBufferObj); +#endif +} + + +/** + * Initialize a gl_array_object's arrays. + */ +void +_mesa_initialize_array_object( struct gl_context *ctx, + struct gl_array_object *obj, + GLuint name ) +{ + GLuint i; + + obj->Name = name; + + _glthread_INIT_MUTEX(obj->Mutex); + obj->RefCount = 1; + + /* Init the individual arrays */ + init_array(ctx, &obj->Vertex, 4, GL_FLOAT); + init_array(ctx, &obj->Weight, 1, GL_FLOAT); + init_array(ctx, &obj->Normal, 3, GL_FLOAT); + init_array(ctx, &obj->Color, 4, GL_FLOAT); + init_array(ctx, &obj->SecondaryColor, 3, GL_FLOAT); + init_array(ctx, &obj->FogCoord, 1, GL_FLOAT); + init_array(ctx, &obj->Index, 1, GL_FLOAT); + for (i = 0; i < Elements(obj->TexCoord); i++) { + init_array(ctx, &obj->TexCoord[i], 4, GL_FLOAT); + } + init_array(ctx, &obj->EdgeFlag, 1, GL_BOOL); + for (i = 0; i < Elements(obj->VertexAttrib); i++) { + init_array(ctx, &obj->VertexAttrib[i], 4, GL_FLOAT); + } + +#if FEATURE_point_size_array + init_array(ctx, &obj->PointSize, 1, GL_FLOAT); +#endif +} + + +/** + * Add the given array object to the array object pool. + */ +static void +save_array_object( struct gl_context *ctx, struct gl_array_object *obj ) +{ + if (obj->Name > 0) { + /* insert into hash table */ + _mesa_HashInsert(ctx->Array.Objects, obj->Name, obj); + } +} + + +/** + * Remove the given array object from the array object pool. + * Do not deallocate the array object though. + */ +static void +remove_array_object( struct gl_context *ctx, struct gl_array_object *obj ) +{ + if (obj->Name > 0) { + /* remove from hash table */ + _mesa_HashRemove(ctx->Array.Objects, obj->Name); + } +} + + + +/** + * Helper for update_arrays(). + * \return min(current min, array->_MaxElement). + */ +static GLuint +update_min(GLuint min, struct gl_client_array *array) +{ + _mesa_update_array_max_element(array); + if (array->Enabled) + return MIN2(min, array->_MaxElement); + else + return min; +} + + +/** + * Examine vertex arrays to update the gl_array_object::_MaxElement field. + */ +void +_mesa_update_array_object_max_element(struct gl_context *ctx, + struct gl_array_object *arrayObj) +{ + GLuint i, min = ~0; + + min = update_min(min, &arrayObj->Vertex); + min = update_min(min, &arrayObj->Weight); + min = update_min(min, &arrayObj->Normal); + min = update_min(min, &arrayObj->Color); + min = update_min(min, &arrayObj->SecondaryColor); + min = update_min(min, &arrayObj->FogCoord); + min = update_min(min, &arrayObj->Index); + min = update_min(min, &arrayObj->EdgeFlag); +#if FEATURE_point_size_array + min = update_min(min, &arrayObj->PointSize); +#endif + for (i = 0; i < ctx->Const.MaxTextureCoordUnits; i++) + min = update_min(min, &arrayObj->TexCoord[i]); + for (i = 0; i < Elements(arrayObj->VertexAttrib); i++) + min = update_min(min, &arrayObj->VertexAttrib[i]); + + /* _MaxElement is one past the last legal array element */ + arrayObj->_MaxElement = min; +} + + +/**********************************************************************/ +/* API Functions */ +/**********************************************************************/ + + +/** + * Helper for _mesa_BindVertexArray() and _mesa_BindVertexArrayAPPLE(). + * \param genRequired specifies behavour when id was not generated with + * glGenVertexArrays(). + */ +static void +bind_vertex_array(struct gl_context *ctx, GLuint id, GLboolean genRequired) +{ + struct gl_array_object * const oldObj = ctx->Array.ArrayObj; + struct gl_array_object *newObj = NULL; + ASSERT_OUTSIDE_BEGIN_END(ctx); + + ASSERT(oldObj != NULL); + + if ( oldObj->Name == id ) + return; /* rebinding the same array object- no change */ + + /* + * Get pointer to new array object (newObj) + */ + if (id == 0) { + /* The spec says there is no array object named 0, but we use + * one internally because it simplifies things. + */ + newObj = ctx->Array.DefaultArrayObj; + } + else { + /* non-default array object */ + newObj = lookup_arrayobj(ctx, id); + if (!newObj) { + if (genRequired) { + _mesa_error(ctx, GL_INVALID_OPERATION, "glBindVertexArray(id)"); + return; + } + + /* For APPLE version, generate a new array object now */ + newObj = (*ctx->Driver.NewArrayObject)(ctx, id); + if (!newObj) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glBindVertexArrayAPPLE"); + return; + } + save_array_object(ctx, newObj); + } + } + + ctx->NewState |= _NEW_ARRAY; + ctx->Array.NewState |= _NEW_ARRAY_ALL; + _mesa_reference_array_object(ctx, &ctx->Array.ArrayObj, newObj); + + /* Pass BindVertexArray call to device driver */ + if (ctx->Driver.BindArrayObject && newObj) + ctx->Driver.BindArrayObject(ctx, newObj); +} + + +/** + * ARB version of glBindVertexArray() + * This function behaves differently from glBindVertexArrayAPPLE() in + * that this function requires all ids to have been previously generated + * by glGenVertexArrays[APPLE](). + */ +void GLAPIENTRY +_mesa_BindVertexArray( GLuint id ) +{ + GET_CURRENT_CONTEXT(ctx); + bind_vertex_array(ctx, id, GL_TRUE); +} + + +/** + * Bind a new array. + * + * \todo + * The binding could be done more efficiently by comparing the non-NULL + * pointers in the old and new objects. The only arrays that are "dirty" are + * the ones that are non-NULL in either object. + */ +void GLAPIENTRY +_mesa_BindVertexArrayAPPLE( GLuint id ) +{ + GET_CURRENT_CONTEXT(ctx); + bind_vertex_array(ctx, id, GL_FALSE); +} + + +/** + * Delete a set of array objects. + * + * \param n Number of array objects to delete. + * \param ids Array of \c n array object IDs. + */ +void GLAPIENTRY +_mesa_DeleteVertexArraysAPPLE(GLsizei n, const GLuint *ids) +{ + GET_CURRENT_CONTEXT(ctx); + GLsizei i; + ASSERT_OUTSIDE_BEGIN_END(ctx); + + if (n < 0) { + _mesa_error(ctx, GL_INVALID_VALUE, "glDeleteVertexArrayAPPLE(n)"); + return; + } + + for (i = 0; i < n; i++) { + struct gl_array_object *obj = lookup_arrayobj(ctx, ids[i]); + + if ( obj != NULL ) { + ASSERT( obj->Name == ids[i] ); + + /* If the array object is currently bound, the spec says "the binding + * for that object reverts to zero and the default vertex array + * becomes current." + */ + if ( obj == ctx->Array.ArrayObj ) { + CALL_BindVertexArrayAPPLE( ctx->Exec, (0) ); + } + + /* The ID is immediately freed for re-use */ + remove_array_object(ctx, obj); + + /* Unreference the array object. + * If refcount hits zero, the object will be deleted. + */ + _mesa_reference_array_object(ctx, &obj, NULL); + } + } +} + + +/** + * Generate a set of unique array object IDs and store them in \c arrays. + * Helper for _mesa_GenVertexArrays[APPLE]() functions below. + * \param n Number of IDs to generate. + * \param arrays Array of \c n locations to store the IDs. + * \param vboOnly Will arrays have to reside in VBOs? + */ +static void +gen_vertex_arrays(struct gl_context *ctx, GLsizei n, GLuint *arrays, + GLboolean vboOnly) +{ + GLuint first; + GLint i; + ASSERT_OUTSIDE_BEGIN_END(ctx); + + if (n < 0) { + _mesa_error(ctx, GL_INVALID_VALUE, "glGenVertexArraysAPPLE"); + return; + } + + if (!arrays) { + return; + } + + first = _mesa_HashFindFreeKeyBlock(ctx->Array.Objects, n); + + /* Allocate new, empty array objects and return identifiers */ + for (i = 0; i < n; i++) { + struct gl_array_object *obj; + GLuint name = first + i; + + obj = (*ctx->Driver.NewArrayObject)( ctx, name ); + if (!obj) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glGenVertexArraysAPPLE"); + return; + } + obj->VBOonly = vboOnly; + save_array_object(ctx, obj); + arrays[i] = first + i; + } +} + + +/** + * ARB version of glGenVertexArrays() + * All arrays will be required to live in VBOs. + */ +void GLAPIENTRY +_mesa_GenVertexArrays(GLsizei n, GLuint *arrays) +{ + GET_CURRENT_CONTEXT(ctx); + gen_vertex_arrays(ctx, n, arrays, GL_TRUE); +} + + +/** + * APPLE version of glGenVertexArraysAPPLE() + * Arrays may live in VBOs or ordinary memory. + */ +void GLAPIENTRY +_mesa_GenVertexArraysAPPLE(GLsizei n, GLuint *arrays) +{ + GET_CURRENT_CONTEXT(ctx); + gen_vertex_arrays(ctx, n, arrays, GL_FALSE); +} + + +/** + * Determine if ID is the name of an array object. + * + * \param id ID of the potential array object. + * \return \c GL_TRUE if \c id is the name of a array object, + * \c GL_FALSE otherwise. + */ +GLboolean GLAPIENTRY +_mesa_IsVertexArrayAPPLE( GLuint id ) +{ + struct gl_array_object * obj; + GET_CURRENT_CONTEXT(ctx); + ASSERT_OUTSIDE_BEGIN_END_WITH_RETVAL(ctx, GL_FALSE); + + if (id == 0) + return GL_FALSE; + + obj = lookup_arrayobj(ctx, id); + + return (obj != NULL) ? GL_TRUE : GL_FALSE; +} diff --git a/mesalib/src/mesa/main/extensions.c b/mesalib/src/mesa/main/extensions.c index 285e08d75..e5711f21a 100644 --- a/mesalib/src/mesa/main/extensions.c +++ b/mesalib/src/mesa/main/extensions.c @@ -908,7 +908,7 @@ _mesa_make_extension_string(struct gl_context *ctx) return NULL; } - extension_indices = malloc(count * sizeof extension_indices); + extension_indices = malloc(count * sizeof(extension_index)); if (extension_indices == NULL) { free(exts); free(extra_extensions); diff --git a/mesalib/src/mesa/main/state.c b/mesalib/src/mesa/main/state.c index 1d1ae4737..4696dbb52 100644 --- a/mesalib/src/mesa/main/state.c +++ b/mesalib/src/mesa/main/state.c @@ -48,6 +48,7 @@ #include "texenvprogram.h" #include "texobj.h" #include "texstate.h" +#include "varray.h" static void @@ -61,43 +62,13 @@ update_separate_specular(struct gl_context *ctx) /** - * Compute the index of the last array element that can be safely accessed - * in a vertex array. We can really only do this when the array lives in - * a VBO. - * The array->_MaxElement field will be updated. - * Later in glDrawArrays/Elements/etc we can do some bounds checking. - */ -static void -compute_max_element(struct gl_client_array *array) -{ - assert(array->Enabled); - if (array->BufferObj->Name) { - GLsizeiptrARB offset = (GLsizeiptrARB) array->Ptr; - GLsizeiptrARB obj_size = (GLsizeiptrARB) array->BufferObj->Size; - - if (offset < obj_size) { - array->_MaxElement = (obj_size - offset + - array->StrideB - - array->_ElementSize) / array->StrideB; - } else { - array->_MaxElement = 0; - } - } - else { - /* user-space array, no idea how big it is */ - array->_MaxElement = 2 * 1000 * 1000 * 1000; /* just a big number */ - } -} - - -/** * Helper for update_arrays(). * \return min(current min, array->_MaxElement). */ static GLuint update_min(GLuint min, struct gl_client_array *array) { - compute_max_element(array); + _mesa_update_array_max_element(array); return MIN2(min, array->_MaxElement); } diff --git a/mesalib/src/mesa/main/varray.c b/mesalib/src/mesa/main/varray.c index cfed4b506..d20e2c719 100644 --- a/mesalib/src/mesa/main/varray.c +++ b/mesalib/src/mesa/main/varray.c @@ -484,7 +484,7 @@ _mesa_DisableVertexAttribArrayARB(GLuint index) if (index >= ctx->Const.VertexProgram.MaxAttribs) { _mesa_error(ctx, GL_INVALID_VALUE, - "glEnableVertexAttribArrayARB(index)"); + "glDisableVertexAttribArrayARB(index)"); return; } diff --git a/mesalib/src/mesa/main/varray.h b/mesalib/src/mesa/main/varray.h index 1e3ab10c9..1c423200f 100644 --- a/mesalib/src/mesa/main/varray.h +++ b/mesalib/src/mesa/main/varray.h @@ -1,248 +1,279 @@ -/*
- * Mesa 3-D graphics library
- * Version: 7.6
- *
- * Copyright (C) 1999-2008 Brian Paul All Rights Reserved.
- * Copyright (C) 2009 VMware, Inc. All Rights Reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included
- * in all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
- * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
- * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- */
-
-
-#ifndef VARRAY_H
-#define VARRAY_H
-
-
-#include "glheader.h"
-#include "mfeatures.h"
-
-struct gl_client_array;
-struct gl_context;
-
-#if _HAVE_FULL_GL
-
-extern void GLAPIENTRY
-_mesa_VertexPointer(GLint size, GLenum type, GLsizei stride,
- const GLvoid *ptr);
-
-extern void GLAPIENTRY
-_mesa_UnlockArraysEXT( void );
-
-extern void GLAPIENTRY
-_mesa_LockArraysEXT(GLint first, GLsizei count);
-
-
-extern void GLAPIENTRY
-_mesa_NormalPointer(GLenum type, GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_ColorPointer(GLint size, GLenum type, GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_IndexPointer(GLenum type, GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_TexCoordPointer(GLint size, GLenum type, GLsizei stride,
- const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_EdgeFlagPointer(GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_VertexPointerEXT(GLint size, GLenum type, GLsizei stride,
- GLsizei count, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_NormalPointerEXT(GLenum type, GLsizei stride, GLsizei count,
- const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_ColorPointerEXT(GLint size, GLenum type, GLsizei stride, GLsizei count,
- const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_IndexPointerEXT(GLenum type, GLsizei stride, GLsizei count,
- const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_TexCoordPointerEXT(GLint size, GLenum type, GLsizei stride,
- GLsizei count, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_EdgeFlagPointerEXT(GLsizei stride, GLsizei count, const GLboolean *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_FogCoordPointerEXT(GLenum type, GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_SecondaryColorPointerEXT(GLint size, GLenum type,
- GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_PointSizePointer(GLenum type, GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_VertexAttribPointerNV(GLuint index, GLint size, GLenum type,
- GLsizei stride, const GLvoid *pointer);
-
-
-extern void GLAPIENTRY
-_mesa_VertexAttribPointerARB(GLuint index, GLint size, GLenum type,
- GLboolean normalized, GLsizei stride,
- const GLvoid *pointer);
-
-void GLAPIENTRY
-_mesa_VertexAttribIPointer(GLuint index, GLint size, GLenum type,
- GLsizei stride, const GLvoid *ptr);
-
-
-extern void GLAPIENTRY
-_mesa_EnableVertexAttribArrayARB(GLuint index);
-
-
-extern void GLAPIENTRY
-_mesa_DisableVertexAttribArrayARB(GLuint index);
-
-
-extern void GLAPIENTRY
-_mesa_GetVertexAttribdvARB(GLuint index, GLenum pname, GLdouble *params);
-
-
-extern void GLAPIENTRY
-_mesa_GetVertexAttribfvARB(GLuint index, GLenum pname, GLfloat *params);
-
-
-extern void GLAPIENTRY
-_mesa_GetVertexAttribivARB(GLuint index, GLenum pname, GLint *params);
-
-
-extern void GLAPIENTRY
-_mesa_GetVertexAttribIiv(GLuint index, GLenum pname, GLint *params);
-
-
-extern void GLAPIENTRY
-_mesa_GetVertexAttribIuiv(GLuint index, GLenum pname, GLuint *params);
-
-
-extern void GLAPIENTRY
-_mesa_GetVertexAttribPointervARB(GLuint index, GLenum pname, GLvoid **pointer);
-
-
-extern void GLAPIENTRY
-_mesa_InterleavedArrays(GLenum format, GLsizei stride, const GLvoid *pointer);
-
-
-extern void GLAPIENTRY
-_mesa_MultiDrawArraysEXT( GLenum mode, const GLint *first,
- const GLsizei *count, GLsizei primcount );
-
-extern void GLAPIENTRY
-_mesa_MultiDrawElementsEXT( GLenum mode, const GLsizei *count, GLenum type,
- const GLvoid **indices, GLsizei primcount );
-
-extern void GLAPIENTRY
-_mesa_MultiDrawElementsBaseVertex( GLenum mode,
- const GLsizei *count, GLenum type,
- const GLvoid **indices, GLsizei primcount,
- const GLint *basevertex);
-
-extern void GLAPIENTRY
-_mesa_MultiModeDrawArraysIBM( const GLenum * mode, const GLint * first,
- const GLsizei * count,
- GLsizei primcount, GLint modestride );
-
-
-extern void GLAPIENTRY
-_mesa_MultiModeDrawElementsIBM( const GLenum * mode, const GLsizei * count,
- GLenum type, const GLvoid * const * indices,
- GLsizei primcount, GLint modestride );
-
-extern void GLAPIENTRY
-_mesa_LockArraysEXT(GLint first, GLsizei count);
-
-extern void GLAPIENTRY
-_mesa_UnlockArraysEXT( void );
-
-
-extern void GLAPIENTRY
-_mesa_DrawArrays(GLenum mode, GLint first, GLsizei count);
-
-extern void GLAPIENTRY
-_mesa_DrawElements(GLenum mode, GLsizei count, GLenum type,
- const GLvoid *indices);
-
-extern void GLAPIENTRY
-_mesa_DrawRangeElements(GLenum mode, GLuint start, GLuint end, GLsizei count,
- GLenum type, const GLvoid *indices);
-
-extern void GLAPIENTRY
-_mesa_DrawElementsBaseVertex(GLenum mode, GLsizei count, GLenum type,
- const GLvoid *indices, GLint basevertex);
-
-extern void GLAPIENTRY
-_mesa_DrawRangeElementsBaseVertex(GLenum mode, GLuint start, GLuint end,
- GLsizei count, GLenum type,
- const GLvoid *indices,
- GLint basevertex);
-
-extern void GLAPIENTRY
-_mesa_PrimitiveRestartIndex(GLuint index);
-
-
-extern void GLAPIENTRY
-_mesa_VertexAttribDivisor(GLuint index, GLuint divisor);
-
-
-extern void
-_mesa_copy_client_array(struct gl_context *ctx,
- struct gl_client_array *dst,
- struct gl_client_array *src);
-
-
-extern void
-_mesa_print_arrays(struct gl_context *ctx);
-
-extern void
-_mesa_init_varray( struct gl_context * ctx );
-
-extern void
-_mesa_free_varray_data(struct gl_context *ctx);
-
-#else
-
-/** No-op */
-#define _mesa_init_varray( c ) ((void)0)
-#define _mesa_free_varray_data( c ) ((void)0)
-
-#endif
-
-#endif
+/* + * Mesa 3-D graphics library + * Version: 7.6 + * + * Copyright (C) 1999-2008 Brian Paul All Rights Reserved. + * Copyright (C) 2009 VMware, Inc. All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included + * in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN + * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ + + +#ifndef VARRAY_H +#define VARRAY_H + + +#include "glheader.h" +#include "mfeatures.h" + +struct gl_client_array; +struct gl_context; + + +/** + * Compute the index of the last array element that can be safely accessed in + * a vertex array. We can really only do this when the array lives in a VBO. + * The array->_MaxElement field will be updated. + * Later in glDrawArrays/Elements/etc we can do some bounds checking. + */ +static INLINE void +_mesa_update_array_max_element(struct gl_client_array *array) +{ + assert(array->Enabled); + + if (array->BufferObj->Name) { + GLsizeiptrARB offset = (GLsizeiptrARB) array->Ptr; + GLsizeiptrARB bufSize = (GLsizeiptrARB) array->BufferObj->Size; + + if (offset < bufSize) { + array->_MaxElement = (bufSize - offset + array->StrideB + - array->_ElementSize) / array->StrideB; + } + else { + array->_MaxElement = 0; + } + } + else { + /* user-space array, no idea how big it is */ + array->_MaxElement = 2 * 1000 * 1000 * 1000; /* just a big number */ + } +} + + +#if _HAVE_FULL_GL + +extern void GLAPIENTRY +_mesa_VertexPointer(GLint size, GLenum type, GLsizei stride, + const GLvoid *ptr); + +extern void GLAPIENTRY +_mesa_UnlockArraysEXT( void ); + +extern void GLAPIENTRY +_mesa_LockArraysEXT(GLint first, GLsizei count); + + +extern void GLAPIENTRY +_mesa_NormalPointer(GLenum type, GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_ColorPointer(GLint size, GLenum type, GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_IndexPointer(GLenum type, GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_TexCoordPointer(GLint size, GLenum type, GLsizei stride, + const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_EdgeFlagPointer(GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_VertexPointerEXT(GLint size, GLenum type, GLsizei stride, + GLsizei count, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_NormalPointerEXT(GLenum type, GLsizei stride, GLsizei count, + const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_ColorPointerEXT(GLint size, GLenum type, GLsizei stride, GLsizei count, + const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_IndexPointerEXT(GLenum type, GLsizei stride, GLsizei count, + const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_TexCoordPointerEXT(GLint size, GLenum type, GLsizei stride, + GLsizei count, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_EdgeFlagPointerEXT(GLsizei stride, GLsizei count, const GLboolean *ptr); + + +extern void GLAPIENTRY +_mesa_FogCoordPointerEXT(GLenum type, GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_SecondaryColorPointerEXT(GLint size, GLenum type, + GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_PointSizePointer(GLenum type, GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_VertexAttribPointerNV(GLuint index, GLint size, GLenum type, + GLsizei stride, const GLvoid *pointer); + + +extern void GLAPIENTRY +_mesa_VertexAttribPointerARB(GLuint index, GLint size, GLenum type, + GLboolean normalized, GLsizei stride, + const GLvoid *pointer); + +void GLAPIENTRY +_mesa_VertexAttribIPointer(GLuint index, GLint size, GLenum type, + GLsizei stride, const GLvoid *ptr); + + +extern void GLAPIENTRY +_mesa_EnableVertexAttribArrayARB(GLuint index); + + +extern void GLAPIENTRY +_mesa_DisableVertexAttribArrayARB(GLuint index); + + +extern void GLAPIENTRY +_mesa_GetVertexAttribdvARB(GLuint index, GLenum pname, GLdouble *params); + + +extern void GLAPIENTRY +_mesa_GetVertexAttribfvARB(GLuint index, GLenum pname, GLfloat *params); + + +extern void GLAPIENTRY +_mesa_GetVertexAttribivARB(GLuint index, GLenum pname, GLint *params); + + +extern void GLAPIENTRY +_mesa_GetVertexAttribIiv(GLuint index, GLenum pname, GLint *params); + + +extern void GLAPIENTRY +_mesa_GetVertexAttribIuiv(GLuint index, GLenum pname, GLuint *params); + + +extern void GLAPIENTRY +_mesa_GetVertexAttribPointervARB(GLuint index, GLenum pname, GLvoid **pointer); + + +extern void GLAPIENTRY +_mesa_InterleavedArrays(GLenum format, GLsizei stride, const GLvoid *pointer); + + +extern void GLAPIENTRY +_mesa_MultiDrawArraysEXT( GLenum mode, const GLint *first, + const GLsizei *count, GLsizei primcount ); + +extern void GLAPIENTRY +_mesa_MultiDrawElementsEXT( GLenum mode, const GLsizei *count, GLenum type, + const GLvoid **indices, GLsizei primcount ); + +extern void GLAPIENTRY +_mesa_MultiDrawElementsBaseVertex( GLenum mode, + const GLsizei *count, GLenum type, + const GLvoid **indices, GLsizei primcount, + const GLint *basevertex); + +extern void GLAPIENTRY +_mesa_MultiModeDrawArraysIBM( const GLenum * mode, const GLint * first, + const GLsizei * count, + GLsizei primcount, GLint modestride ); + + +extern void GLAPIENTRY +_mesa_MultiModeDrawElementsIBM( const GLenum * mode, const GLsizei * count, + GLenum type, const GLvoid * const * indices, + GLsizei primcount, GLint modestride ); + +extern void GLAPIENTRY +_mesa_LockArraysEXT(GLint first, GLsizei count); + +extern void GLAPIENTRY +_mesa_UnlockArraysEXT( void ); + + +extern void GLAPIENTRY +_mesa_DrawArrays(GLenum mode, GLint first, GLsizei count); + +extern void GLAPIENTRY +_mesa_DrawElements(GLenum mode, GLsizei count, GLenum type, + const GLvoid *indices); + +extern void GLAPIENTRY +_mesa_DrawRangeElements(GLenum mode, GLuint start, GLuint end, GLsizei count, + GLenum type, const GLvoid *indices); + +extern void GLAPIENTRY +_mesa_DrawElementsBaseVertex(GLenum mode, GLsizei count, GLenum type, + const GLvoid *indices, GLint basevertex); + +extern void GLAPIENTRY +_mesa_DrawRangeElementsBaseVertex(GLenum mode, GLuint start, GLuint end, + GLsizei count, GLenum type, + const GLvoid *indices, + GLint basevertex); + +extern void GLAPIENTRY +_mesa_PrimitiveRestartIndex(GLuint index); + + +extern void GLAPIENTRY +_mesa_VertexAttribDivisor(GLuint index, GLuint divisor); + + +extern void +_mesa_copy_client_array(struct gl_context *ctx, + struct gl_client_array *dst, + struct gl_client_array *src); + + +extern void +_mesa_print_arrays(struct gl_context *ctx); + +extern void +_mesa_init_varray( struct gl_context * ctx ); + +extern void +_mesa_free_varray_data(struct gl_context *ctx); + +#else + +/** No-op */ +#define _mesa_init_varray( c ) ((void)0) +#define _mesa_free_varray_data( c ) ((void)0) + +#endif + +#endif diff --git a/pixman/Makefile.am b/pixman/Makefile.am index f479a6632..658a375c3 100644 --- a/pixman/Makefile.am +++ b/pixman/Makefile.am @@ -12,10 +12,10 @@ snapshot: GPGKEY=6FF7C1A8 USERNAME=$$USER -RELEASE_OR_SNAPSHOT = $$(if test "x$(CAIRO_VERSION_MINOR)" = "x$$(echo "$(CAIRO_VERSION_MINOR)/2*2" | bc)" ; then echo release; else echo snapshot; fi) +RELEASE_OR_SNAPSHOT = $$(if test "x$(PIXMAN_VERSION_MINOR)" = "x$$(echo "$(PIXMAN_VERSION_MINOR)/2*2" | bc)" ; then echo release; else echo snapshot; fi) RELEASE_CAIRO_HOST = $(USERNAME)@cairographics.org -RELEASE_CAIRO_DIR = /srv/cairo.freedesktop.org/www/releases -RELEASE_CAIRO_URL = http://cairographics.org/releases +RELEASE_CAIRO_DIR = /srv/cairo.freedesktop.org/www/$(RELEASE_OR_SNAPSHOT)s +RELEASE_CAIRO_URL = http://cairographics.org/$(RELEASE_OR_SNAPSHOT)s RELEASE_XORG_URL = http://xorg.freedesktop.org/archive/individual/lib RELEASE_XORG_HOST = $(USERNAME)@xorg.freedesktop.org RELEASE_XORG_DIR = /srv/xorg.freedesktop.org/archive/individual/lib @@ -83,7 +83,6 @@ release-tag: git tag -u $(GPGKEY) -m "$(PACKAGE) $(VERSION) release" $(PACKAGE)-$(VERSION) release-upload: release-check $(tar_gz) $(tar_bz2) $(sha1_tgz) $(sha1_tbz2) $(md5_tgz) $(gpg_file) - mkdir -p releases scp $(tar_gz) $(sha1_tgz) $(gpg_file) $(RELEASE_CAIRO_HOST):$(RELEASE_CAIRO_DIR) scp $(tar_gz) $(tar_bz2) $(RELEASE_XORG_HOST):$(RELEASE_XORG_DIR) ssh $(RELEASE_CAIRO_HOST) "rm -f $(RELEASE_CAIRO_DIR)/LATEST-$(PACKAGE)-[0-9]* && ln -s $(tar_gz) $(RELEASE_CAIRO_DIR)/LATEST-$(PACKAGE)-$(VERSION)" diff --git a/pixman/pixman/pixman-arm-neon-asm.S b/pixman/pixman/pixman-arm-neon-asm.S index 6cca0f2cc..1d3e64e1b 100644 --- a/pixman/pixman/pixman-arm-neon-asm.S +++ b/pixman/pixman/pixman-arm-neon-asm.S @@ -1,2718 +1,2717 @@ -/*
- * Copyright © 2009 Nokia Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- *
- * Author: Siarhei Siamashka (siarhei.siamashka@nokia.com)
- */
-
-/*
- * This file contains implementations of NEON optimized pixel processing
- * functions. There is no full and detailed tutorial, but some functions
- * (those which are exposing some new or interesting features) are
- * extensively commented and can be used as examples.
- *
- * You may want to have a look at the comments for following functions:
- * - pixman_composite_over_8888_0565_asm_neon
- * - pixman_composite_over_n_8_0565_asm_neon
- */
-
-/* Prevent the stack from becoming executable for no reason... */
-#if defined(__linux__) && defined(__ELF__)
-.section .note.GNU-stack,"",%progbits
-#endif
-
- .text
- .fpu neon
- .arch armv7a
- .object_arch armv4
- .eabi_attribute 10, 0 /* suppress Tag_FP_arch */
- .eabi_attribute 12, 0 /* suppress Tag_Advanced_SIMD_arch */
- .arm
- .altmacro
-
-#include "pixman-arm-neon-asm.h"
-
-/* Global configuration options and preferences */
-
-/*
- * The code can optionally make use of unaligned memory accesses to improve
- * performance of handling leading/trailing pixels for each scanline.
- * Configuration variable RESPECT_STRICT_ALIGNMENT can be set to 0 for
- * example in linux if unaligned memory accesses are not configured to
- * generate.exceptions.
- */
-.set RESPECT_STRICT_ALIGNMENT, 1
-
-/*
- * Set default prefetch type. There is a choice between the following options:
- *
- * PREFETCH_TYPE_NONE (may be useful for the ARM cores where PLD is set to work
- * as NOP to workaround some HW bugs or for whatever other reason)
- *
- * PREFETCH_TYPE_SIMPLE (may be useful for simple single-issue ARM cores where
- * advanced prefetch intruduces heavy overhead)
- *
- * PREFETCH_TYPE_ADVANCED (useful for superscalar cores such as ARM Cortex-A8
- * which can run ARM and NEON instructions simultaneously so that extra ARM
- * instructions do not add (many) extra cycles, but improve prefetch efficiency)
- *
- * Note: some types of function can't support advanced prefetch and fallback
- * to simple one (those which handle 24bpp pixels)
- */
-.set PREFETCH_TYPE_DEFAULT, PREFETCH_TYPE_ADVANCED
-
-/* Prefetch distance in pixels for simple prefetch */
-.set PREFETCH_DISTANCE_SIMPLE, 64
-
-/*
- * Implementation of pixman_composite_over_8888_0565_asm_neon
- *
- * This function takes a8r8g8b8 source buffer, r5g6b5 destination buffer and
- * performs OVER compositing operation. Function fast_composite_over_8888_0565
- * from pixman-fast-path.c does the same in C and can be used as a reference.
- *
- * First we need to have some NEON assembly code which can do the actual
- * operation on the pixels and provide it to the template macro.
- *
- * Template macro quite conveniently takes care of emitting all the necessary
- * code for memory reading and writing (including quite tricky cases of
- * handling unaligned leading/trailing pixels), so we only need to deal with
- * the data in NEON registers.
- *
- * NEON registers allocation in general is recommented to be the following:
- * d0, d1, d2, d3 - contain loaded source pixel data
- * d4, d5, d6, d7 - contain loaded destination pixels (if they are needed)
- * d24, d25, d26, d27 - contain loading mask pixel data (if mask is used)
- * d28, d29, d30, d31 - place for storing the result (destination pixels)
- *
- * As can be seen above, four 64-bit NEON registers are used for keeping
- * intermediate pixel data and up to 8 pixels can be processed in one step
- * for 32bpp formats (16 pixels for 16bpp, 32 pixels for 8bpp).
- *
- * This particular function uses the following registers allocation:
- * d0, d1, d2, d3 - contain loaded source pixel data
- * d4, d5 - contain loaded destination pixels (they are needed)
- * d28, d29 - place for storing the result (destination pixels)
- */
-
-/*
- * Step one. We need to have some code to do some arithmetics on pixel data.
- * This is implemented as a pair of macros: '*_head' and '*_tail'. When used
- * back-to-back, they take pixel data from {d0, d1, d2, d3} and {d4, d5},
- * perform all the needed calculations and write the result to {d28, d29}.
- * The rationale for having two macros and not just one will be explained
- * later. In practice, any single monolitic function which does the work can
- * be split into two parts in any arbitrary way without affecting correctness.
- *
- * There is one special trick here too. Common template macro can optionally
- * make our life a bit easier by doing R, G, B, A color components
- * deinterleaving for 32bpp pixel formats (and this feature is used in
- * 'pixman_composite_over_8888_0565_asm_neon' function). So it means that
- * instead of having 8 packed pixels in {d0, d1, d2, d3} registers, we
- * actually use d0 register for blue channel (a vector of eight 8-bit
- * values), d1 register for green, d2 for red and d3 for alpha. This
- * simple conversion can be also done with a few NEON instructions:
- *
- * Packed to planar conversion:
- * vuzp.8 d0, d1
- * vuzp.8 d2, d3
- * vuzp.8 d1, d3
- * vuzp.8 d0, d2
- *
- * Planar to packed conversion:
- * vzip.8 d0, d2
- * vzip.8 d1, d3
- * vzip.8 d2, d3
- * vzip.8 d0, d1
- *
- * But pixel can be loaded directly in planar format using VLD4.8 NEON
- * instruction. It is 1 cycle slower than VLD1.32, so this is not always
- * desirable, that's why deinterleaving is optional.
- *
- * But anyway, here is the code:
- */
-.macro pixman_composite_over_8888_0565_process_pixblock_head
- /* convert 8 r5g6b5 pixel data from {d4, d5} to planar 8-bit format
- and put data into d6 - red, d7 - green, d30 - blue */
- vshrn.u16 d6, q2, #8
- vshrn.u16 d7, q2, #3
- vsli.u16 q2, q2, #5
- vsri.u8 d6, d6, #5
- vmvn.8 d3, d3 /* invert source alpha */
- vsri.u8 d7, d7, #6
- vshrn.u16 d30, q2, #2
- /* now do alpha blending, storing results in 8-bit planar format
- into d16 - red, d19 - green, d18 - blue */
- vmull.u8 q10, d3, d6
- vmull.u8 q11, d3, d7
- vmull.u8 q12, d3, d30
- vrshr.u16 q13, q10, #8
- vrshr.u16 q3, q11, #8
- vrshr.u16 q15, q12, #8
- vraddhn.u16 d20, q10, q13
- vraddhn.u16 d23, q11, q3
- vraddhn.u16 d22, q12, q15
-.endm
-
-.macro pixman_composite_over_8888_0565_process_pixblock_tail
- /* ... continue alpha blending */
- vqadd.u8 d16, d2, d20
- vqadd.u8 q9, q0, q11
- /* convert the result to r5g6b5 and store it into {d28, d29} */
- vshll.u8 q14, d16, #8
- vshll.u8 q8, d19, #8
- vshll.u8 q9, d18, #8
- vsri.u16 q14, q8, #5
- vsri.u16 q14, q9, #11
-.endm
-
-/*
- * OK, now we got almost everything that we need. Using the above two
- * macros, the work can be done right. But now we want to optimize
- * it a bit. ARM Cortex-A8 is an in-order core, and benefits really
- * a lot from good code scheduling and software pipelining.
- *
- * Let's construct some code, which will run in the core main loop.
- * Some pseudo-code of the main loop will look like this:
- * head
- * while (...) {
- * tail
- * head
- * }
- * tail
- *
- * It may look a bit weird, but this setup allows to hide instruction
- * latencies better and also utilize dual-issue capability more
- * efficiently (make pairs of load-store and ALU instructions).
- *
- * So what we need now is a '*_tail_head' macro, which will be used
- * in the core main loop. A trivial straightforward implementation
- * of this macro would look like this:
- *
- * pixman_composite_over_8888_0565_process_pixblock_tail
- * vst1.16 {d28, d29}, [DST_W, :128]!
- * vld1.16 {d4, d5}, [DST_R, :128]!
- * vld4.32 {d0, d1, d2, d3}, [SRC]!
- * pixman_composite_over_8888_0565_process_pixblock_head
- * cache_preload 8, 8
- *
- * Now it also got some VLD/VST instructions. We simply can't move from
- * processing one block of pixels to the other one with just arithmetics.
- * The previously processed data needs to be written to memory and new
- * data needs to be fetched. Fortunately, this main loop does not deal
- * with partial leading/trailing pixels and can load/store a full block
- * of pixels in a bulk. Additionally, destination buffer is already
- * 16 bytes aligned here (which is good for performance).
- *
- * New things here are DST_R, DST_W, SRC and MASK identifiers. These
- * are the aliases for ARM registers which are used as pointers for
- * accessing data. We maintain separate pointers for reading and writing
- * destination buffer (DST_R and DST_W).
- *
- * Another new thing is 'cache_preload' macro. It is used for prefetching
- * data into CPU L2 cache and improve performance when dealing with large
- * images which are far larger than cache size. It uses one argument
- * (actually two, but they need to be the same here) - number of pixels
- * in a block. Looking into 'pixman-arm-neon-asm.h' can provide some
- * details about this macro. Moreover, if good performance is needed
- * the code from this macro needs to be copied into '*_tail_head' macro
- * and mixed with the rest of code for optimal instructions scheduling.
- * We are actually doing it below.
- *
- * Now after all the explanations, here is the optimized code.
- * Different instruction streams (originaling from '*_head', '*_tail'
- * and 'cache_preload' macro) use different indentation levels for
- * better readability. Actually taking the code from one of these
- * indentation levels and ignoring a few VLD/VST instructions would
- * result in exactly the code from '*_head', '*_tail' or 'cache_preload'
- * macro!
- */
-
-#if 1
-
-.macro pixman_composite_over_8888_0565_process_pixblock_tail_head
- vqadd.u8 d16, d2, d20
- vld1.16 {d4, d5}, [DST_R, :128]!
- vqadd.u8 q9, q0, q11
- vshrn.u16 d6, q2, #8
- fetch_src_pixblock
- vshrn.u16 d7, q2, #3
- vsli.u16 q2, q2, #5
- vshll.u8 q14, d16, #8
- PF add PF_X, PF_X, #8
- vshll.u8 q8, d19, #8
- PF tst PF_CTL, #0xF
- vsri.u8 d6, d6, #5
- PF addne PF_X, PF_X, #8
- vmvn.8 d3, d3
- PF subne PF_CTL, PF_CTL, #1
- vsri.u8 d7, d7, #6
- vshrn.u16 d30, q2, #2
- vmull.u8 q10, d3, d6
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- vmull.u8 q11, d3, d7
- vmull.u8 q12, d3, d30
- PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift]
- vsri.u16 q14, q8, #5
- PF cmp PF_X, ORIG_W
- vshll.u8 q9, d18, #8
- vrshr.u16 q13, q10, #8
- PF subge PF_X, PF_X, ORIG_W
- vrshr.u16 q3, q11, #8
- vrshr.u16 q15, q12, #8
- PF subges PF_CTL, PF_CTL, #0x10
- vsri.u16 q14, q9, #11
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
- vraddhn.u16 d20, q10, q13
- vraddhn.u16 d23, q11, q3
- PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]!
- vraddhn.u16 d22, q12, q15
- vst1.16 {d28, d29}, [DST_W, :128]!
-.endm
-
-#else
-
-/* If we did not care much about the performance, we would just use this... */
-.macro pixman_composite_over_8888_0565_process_pixblock_tail_head
- pixman_composite_over_8888_0565_process_pixblock_tail
- vst1.16 {d28, d29}, [DST_W, :128]!
- vld1.16 {d4, d5}, [DST_R, :128]!
- fetch_src_pixblock
- pixman_composite_over_8888_0565_process_pixblock_head
- cache_preload 8, 8
-.endm
-
-#endif
-
-/*
- * And now the final part. We are using 'generate_composite_function' macro
- * to put all the stuff together. We are specifying the name of the function
- * which we want to get, number of bits per pixel for the source, mask and
- * destination (0 if unused, like mask in this case). Next come some bit
- * flags:
- * FLAG_DST_READWRITE - tells that the destination buffer is both read
- * and written, for write-only buffer we would use
- * FLAG_DST_WRITEONLY flag instead
- * FLAG_DEINTERLEAVE_32BPP - tells that we prefer to work with planar data
- * and separate color channels for 32bpp format.
- * The next things are:
- * - the number of pixels processed per iteration (8 in this case, because
- * that's the maximum what can fit into four 64-bit NEON registers).
- * - prefetch distance, measured in pixel blocks. In this case it is 5 times
- * by 8 pixels. That would be 40 pixels, or up to 160 bytes. Optimal
- * prefetch distance can be selected by running some benchmarks.
- *
- * After that we specify some macros, these are 'default_init',
- * 'default_cleanup' here which are empty (but it is possible to have custom
- * init/cleanup macros to be able to save/restore some extra NEON registers
- * like d8-d15 or do anything else) followed by
- * 'pixman_composite_over_8888_0565_process_pixblock_head',
- * 'pixman_composite_over_8888_0565_process_pixblock_tail' and
- * 'pixman_composite_over_8888_0565_process_pixblock_tail_head'
- * which we got implemented above.
- *
- * The last part is the NEON registers allocation scheme.
- */
-generate_composite_function \
- pixman_composite_over_8888_0565_asm_neon, 32, 0, 16, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_over_8888_0565_process_pixblock_head, \
- pixman_composite_over_8888_0565_process_pixblock_tail, \
- pixman_composite_over_8888_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 24 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_over_n_0565_process_pixblock_head
- /* convert 8 r5g6b5 pixel data from {d4, d5} to planar 8-bit format
- and put data into d6 - red, d7 - green, d30 - blue */
- vshrn.u16 d6, q2, #8
- vshrn.u16 d7, q2, #3
- vsli.u16 q2, q2, #5
- vsri.u8 d6, d6, #5
- vsri.u8 d7, d7, #6
- vshrn.u16 d30, q2, #2
- /* now do alpha blending, storing results in 8-bit planar format
- into d16 - red, d19 - green, d18 - blue */
- vmull.u8 q10, d3, d6
- vmull.u8 q11, d3, d7
- vmull.u8 q12, d3, d30
- vrshr.u16 q13, q10, #8
- vrshr.u16 q3, q11, #8
- vrshr.u16 q15, q12, #8
- vraddhn.u16 d20, q10, q13
- vraddhn.u16 d23, q11, q3
- vraddhn.u16 d22, q12, q15
-.endm
-
-.macro pixman_composite_over_n_0565_process_pixblock_tail
- /* ... continue alpha blending */
- vqadd.u8 d16, d2, d20
- vqadd.u8 q9, q0, q11
- /* convert the result to r5g6b5 and store it into {d28, d29} */
- vshll.u8 q14, d16, #8
- vshll.u8 q8, d19, #8
- vshll.u8 q9, d18, #8
- vsri.u16 q14, q8, #5
- vsri.u16 q14, q9, #11
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_n_0565_process_pixblock_tail_head
- pixman_composite_over_n_0565_process_pixblock_tail
- vld1.16 {d4, d5}, [DST_R, :128]!
- vst1.16 {d28, d29}, [DST_W, :128]!
- pixman_composite_over_n_0565_process_pixblock_head
- cache_preload 8, 8
-.endm
-
-.macro pixman_composite_over_n_0565_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d3[0]}, [DUMMY]
- vdup.8 d0, d3[0]
- vdup.8 d1, d3[1]
- vdup.8 d2, d3[2]
- vdup.8 d3, d3[3]
- vmvn.8 d3, d3 /* invert source alpha */
-.endm
-
-generate_composite_function \
- pixman_composite_over_n_0565_asm_neon, 0, 0, 16, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_n_0565_init, \
- default_cleanup, \
- pixman_composite_over_n_0565_process_pixblock_head, \
- pixman_composite_over_n_0565_process_pixblock_tail, \
- pixman_composite_over_n_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 24 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_8888_0565_process_pixblock_head
- vshll.u8 q8, d1, #8
- vshll.u8 q14, d2, #8
- vshll.u8 q9, d0, #8
-.endm
-
-.macro pixman_composite_src_8888_0565_process_pixblock_tail
- vsri.u16 q14, q8, #5
- vsri.u16 q14, q9, #11
-.endm
-
-.macro pixman_composite_src_8888_0565_process_pixblock_tail_head
- vsri.u16 q14, q8, #5
- PF add PF_X, PF_X, #8
- PF tst PF_CTL, #0xF
- fetch_src_pixblock
- PF addne PF_X, PF_X, #8
- PF subne PF_CTL, PF_CTL, #1
- vsri.u16 q14, q9, #11
- PF cmp PF_X, ORIG_W
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- vshll.u8 q8, d1, #8
- vst1.16 {d28, d29}, [DST_W, :128]!
- PF subge PF_X, PF_X, ORIG_W
- PF subges PF_CTL, PF_CTL, #0x10
- vshll.u8 q14, d2, #8
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
- vshll.u8 q9, d0, #8
-.endm
-
-generate_composite_function \
- pixman_composite_src_8888_0565_asm_neon, 32, 0, 16, \
- FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_8888_0565_process_pixblock_head, \
- pixman_composite_src_8888_0565_process_pixblock_tail, \
- pixman_composite_src_8888_0565_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_src_0565_8888_process_pixblock_head
- vshrn.u16 d30, q0, #8
- vshrn.u16 d29, q0, #3
- vsli.u16 q0, q0, #5
- vmov.u8 d31, #255
- vsri.u8 d30, d30, #5
- vsri.u8 d29, d29, #6
- vshrn.u16 d28, q0, #2
-.endm
-
-.macro pixman_composite_src_0565_8888_process_pixblock_tail
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_src_0565_8888_process_pixblock_tail_head
- pixman_composite_src_0565_8888_process_pixblock_tail
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- fetch_src_pixblock
- pixman_composite_src_0565_8888_process_pixblock_head
- cache_preload 8, 8
-.endm
-
-generate_composite_function \
- pixman_composite_src_0565_8888_asm_neon, 16, 0, 32, \
- FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_0565_8888_process_pixblock_head, \
- pixman_composite_src_0565_8888_process_pixblock_tail, \
- pixman_composite_src_0565_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_add_8_8_process_pixblock_head
- vqadd.u8 q14, q0, q2
- vqadd.u8 q15, q1, q3
-.endm
-
-.macro pixman_composite_add_8_8_process_pixblock_tail
-.endm
-
-.macro pixman_composite_add_8_8_process_pixblock_tail_head
- fetch_src_pixblock
- PF add PF_X, PF_X, #32
- PF tst PF_CTL, #0xF
- vld1.8 {d4, d5, d6, d7}, [DST_R, :128]!
- PF addne PF_X, PF_X, #32
- PF subne PF_CTL, PF_CTL, #1
- vst1.8 {d28, d29, d30, d31}, [DST_W, :128]!
- PF cmp PF_X, ORIG_W
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift]
- PF subge PF_X, PF_X, ORIG_W
- PF subges PF_CTL, PF_CTL, #0x10
- vqadd.u8 q14, q0, q2
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
- PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]!
- vqadd.u8 q15, q1, q3
-.endm
-
-generate_composite_function \
- pixman_composite_add_8_8_asm_neon, 8, 0, 8, \
- FLAG_DST_READWRITE, \
- 32, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_add_8_8_process_pixblock_head, \
- pixman_composite_add_8_8_process_pixblock_tail, \
- pixman_composite_add_8_8_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_add_8888_8888_process_pixblock_tail_head
- fetch_src_pixblock
- PF add PF_X, PF_X, #8
- PF tst PF_CTL, #0xF
- vld1.32 {d4, d5, d6, d7}, [DST_R, :128]!
- PF addne PF_X, PF_X, #8
- PF subne PF_CTL, PF_CTL, #1
- vst1.32 {d28, d29, d30, d31}, [DST_W, :128]!
- PF cmp PF_X, ORIG_W
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift]
- PF subge PF_X, PF_X, ORIG_W
- PF subges PF_CTL, PF_CTL, #0x10
- vqadd.u8 q14, q0, q2
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
- PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]!
- vqadd.u8 q15, q1, q3
-.endm
-
-generate_composite_function \
- pixman_composite_add_8888_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_add_8_8_process_pixblock_head, \
- pixman_composite_add_8_8_process_pixblock_tail, \
- pixman_composite_add_8888_8888_process_pixblock_tail_head
-
-generate_composite_function_single_scanline \
- pixman_composite_scanline_add_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_add_8_8_process_pixblock_head, \
- pixman_composite_add_8_8_process_pixblock_tail, \
- pixman_composite_add_8888_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_out_reverse_8888_8888_process_pixblock_head
- vmvn.8 d24, d3 /* get inverted alpha */
- /* do alpha blending */
- vmull.u8 q8, d24, d4
- vmull.u8 q9, d24, d5
- vmull.u8 q10, d24, d6
- vmull.u8 q11, d24, d7
-.endm
-
-.macro pixman_composite_out_reverse_8888_8888_process_pixblock_tail
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- vraddhn.u16 d30, q12, q10
- vraddhn.u16 d31, q13, q11
-.endm
-
-.macro pixman_composite_out_reverse_8888_8888_process_pixblock_tail_head
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- vrshr.u16 q14, q8, #8
- PF add PF_X, PF_X, #8
- PF tst PF_CTL, #0xF
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- PF addne PF_X, PF_X, #8
- PF subne PF_CTL, PF_CTL, #1
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- PF cmp PF_X, ORIG_W
- vraddhn.u16 d30, q12, q10
- vraddhn.u16 d31, q13, q11
- fetch_src_pixblock
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- vmvn.8 d22, d3
- PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift]
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- PF subge PF_X, PF_X, ORIG_W
- vmull.u8 q8, d22, d4
- PF subges PF_CTL, PF_CTL, #0x10
- vmull.u8 q9, d22, d5
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
- vmull.u8 q10, d22, d6
- PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]!
- vmull.u8 q11, d22, d7
-.endm
-
-generate_composite_function_single_scanline \
- pixman_composite_scanline_out_reverse_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_out_reverse_8888_8888_process_pixblock_head, \
- pixman_composite_out_reverse_8888_8888_process_pixblock_tail, \
- pixman_composite_out_reverse_8888_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_over_8888_8888_process_pixblock_head
- pixman_composite_out_reverse_8888_8888_process_pixblock_head
-.endm
-
-.macro pixman_composite_over_8888_8888_process_pixblock_tail
- pixman_composite_out_reverse_8888_8888_process_pixblock_tail
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
-.endm
-
-.macro pixman_composite_over_8888_8888_process_pixblock_tail_head
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- vrshr.u16 q14, q8, #8
- PF add PF_X, PF_X, #8
- PF tst PF_CTL, #0xF
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- PF addne PF_X, PF_X, #8
- PF subne PF_CTL, PF_CTL, #1
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- PF cmp PF_X, ORIG_W
- vraddhn.u16 d30, q12, q10
- vraddhn.u16 d31, q13, q11
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
- fetch_src_pixblock
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- vmvn.8 d22, d3
- PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift]
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- PF subge PF_X, PF_X, ORIG_W
- vmull.u8 q8, d22, d4
- PF subges PF_CTL, PF_CTL, #0x10
- vmull.u8 q9, d22, d5
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
- vmull.u8 q10, d22, d6
- PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]!
- vmull.u8 q11, d22, d7
-.endm
-
-generate_composite_function \
- pixman_composite_over_8888_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_over_8888_8888_process_pixblock_head, \
- pixman_composite_over_8888_8888_process_pixblock_tail, \
- pixman_composite_over_8888_8888_process_pixblock_tail_head
-
-generate_composite_function_single_scanline \
- pixman_composite_scanline_over_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_over_8888_8888_process_pixblock_head, \
- pixman_composite_over_8888_8888_process_pixblock_tail, \
- pixman_composite_over_8888_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_n_8888_process_pixblock_tail_head
- pixman_composite_over_8888_8888_process_pixblock_tail
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- pixman_composite_over_8888_8888_process_pixblock_head
- cache_preload 8, 8
-.endm
-
-.macro pixman_composite_over_n_8888_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d3[0]}, [DUMMY]
- vdup.8 d0, d3[0]
- vdup.8 d1, d3[1]
- vdup.8 d2, d3[2]
- vdup.8 d3, d3[3]
-.endm
-
-generate_composite_function \
- pixman_composite_over_n_8888_asm_neon, 0, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_n_8888_init, \
- default_cleanup, \
- pixman_composite_over_8888_8888_process_pixblock_head, \
- pixman_composite_over_8888_8888_process_pixblock_tail, \
- pixman_composite_over_n_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_over_reverse_n_8888_process_pixblock_tail_head
- vrshr.u16 q14, q8, #8
- PF add PF_X, PF_X, #8
- PF tst PF_CTL, #0xF
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- PF addne PF_X, PF_X, #8
- PF subne PF_CTL, PF_CTL, #1
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- PF cmp PF_X, ORIG_W
- vraddhn.u16 d30, q12, q10
- vraddhn.u16 d31, q13, q11
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
- vld4.8 {d0, d1, d2, d3}, [DST_R, :128]!
- vmvn.8 d22, d3
- PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift]
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- PF subge PF_X, PF_X, ORIG_W
- vmull.u8 q8, d22, d4
- PF subges PF_CTL, PF_CTL, #0x10
- vmull.u8 q9, d22, d5
- vmull.u8 q10, d22, d6
- PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]!
- vmull.u8 q11, d22, d7
-.endm
-
-.macro pixman_composite_over_reverse_n_8888_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d7[0]}, [DUMMY]
- vdup.8 d4, d7[0]
- vdup.8 d5, d7[1]
- vdup.8 d6, d7[2]
- vdup.8 d7, d7[3]
-.endm
-
-generate_composite_function \
- pixman_composite_over_reverse_n_8888_asm_neon, 0, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_reverse_n_8888_init, \
- default_cleanup, \
- pixman_composite_over_8888_8888_process_pixblock_head, \
- pixman_composite_over_8888_8888_process_pixblock_tail, \
- pixman_composite_over_reverse_n_8888_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 4, /* src_basereg */ \
- 24 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_over_8888_8_0565_process_pixblock_head
- vmull.u8 q0, d24, d8 /* IN for SRC pixels (part1) */
- vmull.u8 q1, d24, d9
- vmull.u8 q6, d24, d10
- vmull.u8 q7, d24, d11
- vshrn.u16 d6, q2, #8 /* convert DST_R data to 32-bpp (part1) */
- vshrn.u16 d7, q2, #3
- vsli.u16 q2, q2, #5
- vrshr.u16 q8, q0, #8 /* IN for SRC pixels (part2) */
- vrshr.u16 q9, q1, #8
- vrshr.u16 q10, q6, #8
- vrshr.u16 q11, q7, #8
- vraddhn.u16 d0, q0, q8
- vraddhn.u16 d1, q1, q9
- vraddhn.u16 d2, q6, q10
- vraddhn.u16 d3, q7, q11
- vsri.u8 d6, d6, #5 /* convert DST_R data to 32-bpp (part2) */
- vsri.u8 d7, d7, #6
- vmvn.8 d3, d3
- vshrn.u16 d30, q2, #2
- vmull.u8 q8, d3, d6 /* now do alpha blending */
- vmull.u8 q9, d3, d7
- vmull.u8 q10, d3, d30
-.endm
-
-.macro pixman_composite_over_8888_8_0565_process_pixblock_tail
- /* 3 cycle bubble (after vmull.u8) */
- vrshr.u16 q13, q8, #8
- vrshr.u16 q11, q9, #8
- vrshr.u16 q15, q10, #8
- vraddhn.u16 d16, q8, q13
- vraddhn.u16 d27, q9, q11
- vraddhn.u16 d26, q10, q15
- vqadd.u8 d16, d2, d16
- /* 1 cycle bubble */
- vqadd.u8 q9, q0, q13
- vshll.u8 q14, d16, #8 /* convert to 16bpp */
- vshll.u8 q8, d19, #8
- vshll.u8 q9, d18, #8
- vsri.u16 q14, q8, #5
- /* 1 cycle bubble */
- vsri.u16 q14, q9, #11
-.endm
-
-.macro pixman_composite_over_8888_8_0565_process_pixblock_tail_head
- vld1.16 {d4, d5}, [DST_R, :128]!
- vshrn.u16 d6, q2, #8
- fetch_mask_pixblock
- vshrn.u16 d7, q2, #3
- fetch_src_pixblock
- vmull.u8 q6, d24, d10
- vrshr.u16 q13, q8, #8
- vrshr.u16 q11, q9, #8
- vrshr.u16 q15, q10, #8
- vraddhn.u16 d16, q8, q13
- vraddhn.u16 d27, q9, q11
- vraddhn.u16 d26, q10, q15
- vqadd.u8 d16, d2, d16
- vmull.u8 q1, d24, d9
- vqadd.u8 q9, q0, q13
- vshll.u8 q14, d16, #8
- vmull.u8 q0, d24, d8
- vshll.u8 q8, d19, #8
- vshll.u8 q9, d18, #8
- vsri.u16 q14, q8, #5
- vmull.u8 q7, d24, d11
- vsri.u16 q14, q9, #11
-
- cache_preload 8, 8
-
- vsli.u16 q2, q2, #5
- vrshr.u16 q8, q0, #8
- vrshr.u16 q9, q1, #8
- vrshr.u16 q10, q6, #8
- vrshr.u16 q11, q7, #8
- vraddhn.u16 d0, q0, q8
- vraddhn.u16 d1, q1, q9
- vraddhn.u16 d2, q6, q10
- vraddhn.u16 d3, q7, q11
- vsri.u8 d6, d6, #5
- vsri.u8 d7, d7, #6
- vmvn.8 d3, d3
- vshrn.u16 d30, q2, #2
- vst1.16 {d28, d29}, [DST_W, :128]!
- vmull.u8 q8, d3, d6
- vmull.u8 q9, d3, d7
- vmull.u8 q10, d3, d30
-.endm
-
-generate_composite_function \
- pixman_composite_over_8888_8_0565_asm_neon, 32, 8, 16, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_over_8888_8_0565_process_pixblock_head, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 8, /* src_basereg */ \
- 24 /* mask_basereg */
-
-/******************************************************************************/
-
-/*
- * This function needs a special initialization of solid mask.
- * Solid source pixel data is fetched from stack at ARGS_STACK_OFFSET
- * offset, split into color components and replicated in d8-d11
- * registers. Additionally, this function needs all the NEON registers,
- * so it has to save d8-d15 registers which are callee saved according
- * to ABI. These registers are restored from 'cleanup' macro. All the
- * other NEON registers are caller saved, so can be clobbered freely
- * without introducing any problems.
- */
-.macro pixman_composite_over_n_8_0565_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vpush {d8-d15}
- vld1.32 {d11[0]}, [DUMMY]
- vdup.8 d8, d11[0]
- vdup.8 d9, d11[1]
- vdup.8 d10, d11[2]
- vdup.8 d11, d11[3]
-.endm
-
-.macro pixman_composite_over_n_8_0565_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_over_n_8_0565_asm_neon, 0, 8, 16, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_n_8_0565_init, \
- pixman_composite_over_n_8_0565_cleanup, \
- pixman_composite_over_8888_8_0565_process_pixblock_head, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_over_8888_n_0565_init
- add DUMMY, sp, #(ARGS_STACK_OFFSET + 8)
- vpush {d8-d15}
- vld1.32 {d24[0]}, [DUMMY]
- vdup.8 d24, d24[3]
-.endm
-
-.macro pixman_composite_over_8888_n_0565_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_over_8888_n_0565_asm_neon, 32, 0, 16, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_8888_n_0565_init, \
- pixman_composite_over_8888_n_0565_cleanup, \
- pixman_composite_over_8888_8_0565_process_pixblock_head, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 8, /* src_basereg */ \
- 24 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_0565_0565_process_pixblock_head
-.endm
-
-.macro pixman_composite_src_0565_0565_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_0565_0565_process_pixblock_tail_head
- vst1.16 {d0, d1, d2, d3}, [DST_W, :128]!
- fetch_src_pixblock
- cache_preload 16, 16
-.endm
-
-generate_composite_function \
- pixman_composite_src_0565_0565_asm_neon, 16, 0, 16, \
- FLAG_DST_WRITEONLY, \
- 16, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_0565_0565_process_pixblock_head, \
- pixman_composite_src_0565_0565_process_pixblock_tail, \
- pixman_composite_src_0565_0565_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_n_8_process_pixblock_head
-.endm
-
-.macro pixman_composite_src_n_8_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_n_8_process_pixblock_tail_head
- vst1.8 {d0, d1, d2, d3}, [DST_W, :128]!
-.endm
-
-.macro pixman_composite_src_n_8_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d0[0]}, [DUMMY]
- vsli.u64 d0, d0, #8
- vsli.u64 d0, d0, #16
- vsli.u64 d0, d0, #32
- vorr d1, d0, d0
- vorr q1, q0, q0
-.endm
-
-.macro pixman_composite_src_n_8_cleanup
-.endm
-
-generate_composite_function \
- pixman_composite_src_n_8_asm_neon, 0, 0, 8, \
- FLAG_DST_WRITEONLY, \
- 32, /* number of pixels, processed in a single block */ \
- 0, /* prefetch distance */ \
- pixman_composite_src_n_8_init, \
- pixman_composite_src_n_8_cleanup, \
- pixman_composite_src_n_8_process_pixblock_head, \
- pixman_composite_src_n_8_process_pixblock_tail, \
- pixman_composite_src_n_8_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_n_0565_process_pixblock_head
-.endm
-
-.macro pixman_composite_src_n_0565_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_n_0565_process_pixblock_tail_head
- vst1.16 {d0, d1, d2, d3}, [DST_W, :128]!
-.endm
-
-.macro pixman_composite_src_n_0565_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d0[0]}, [DUMMY]
- vsli.u64 d0, d0, #16
- vsli.u64 d0, d0, #32
- vorr d1, d0, d0
- vorr q1, q0, q0
-.endm
-
-.macro pixman_composite_src_n_0565_cleanup
-.endm
-
-generate_composite_function \
- pixman_composite_src_n_0565_asm_neon, 0, 0, 16, \
- FLAG_DST_WRITEONLY, \
- 16, /* number of pixels, processed in a single block */ \
- 0, /* prefetch distance */ \
- pixman_composite_src_n_0565_init, \
- pixman_composite_src_n_0565_cleanup, \
- pixman_composite_src_n_0565_process_pixblock_head, \
- pixman_composite_src_n_0565_process_pixblock_tail, \
- pixman_composite_src_n_0565_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_n_8888_process_pixblock_head
-.endm
-
-.macro pixman_composite_src_n_8888_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_n_8888_process_pixblock_tail_head
- vst1.32 {d0, d1, d2, d3}, [DST_W, :128]!
-.endm
-
-.macro pixman_composite_src_n_8888_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d0[0]}, [DUMMY]
- vsli.u64 d0, d0, #32
- vorr d1, d0, d0
- vorr q1, q0, q0
-.endm
-
-.macro pixman_composite_src_n_8888_cleanup
-.endm
-
-generate_composite_function \
- pixman_composite_src_n_8888_asm_neon, 0, 0, 32, \
- FLAG_DST_WRITEONLY, \
- 8, /* number of pixels, processed in a single block */ \
- 0, /* prefetch distance */ \
- pixman_composite_src_n_8888_init, \
- pixman_composite_src_n_8888_cleanup, \
- pixman_composite_src_n_8888_process_pixblock_head, \
- pixman_composite_src_n_8888_process_pixblock_tail, \
- pixman_composite_src_n_8888_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_8888_8888_process_pixblock_head
-.endm
-
-.macro pixman_composite_src_8888_8888_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_8888_8888_process_pixblock_tail_head
- vst1.32 {d0, d1, d2, d3}, [DST_W, :128]!
- fetch_src_pixblock
- cache_preload 8, 8
-.endm
-
-generate_composite_function \
- pixman_composite_src_8888_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_WRITEONLY, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_8888_8888_process_pixblock_head, \
- pixman_composite_src_8888_8888_process_pixblock_tail, \
- pixman_composite_src_8888_8888_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_x888_8888_process_pixblock_head
- vorr q0, q0, q2
- vorr q1, q1, q2
-.endm
-
-.macro pixman_composite_src_x888_8888_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_x888_8888_process_pixblock_tail_head
- vst1.32 {d0, d1, d2, d3}, [DST_W, :128]!
- fetch_src_pixblock
- vorr q0, q0, q2
- vorr q1, q1, q2
- cache_preload 8, 8
-.endm
-
-.macro pixman_composite_src_x888_8888_init
- vmov.u8 q2, #0xFF
- vshl.u32 q2, q2, #24
-.endm
-
-generate_composite_function \
- pixman_composite_src_x888_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_WRITEONLY, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- pixman_composite_src_x888_8888_init, \
- default_cleanup, \
- pixman_composite_src_x888_8888_process_pixblock_head, \
- pixman_composite_src_x888_8888_process_pixblock_tail, \
- pixman_composite_src_x888_8888_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_over_n_8_8888_process_pixblock_head
- /* expecting deinterleaved source data in {d8, d9, d10, d11} */
- /* d8 - blue, d9 - green, d10 - red, d11 - alpha */
- /* and destination data in {d4, d5, d6, d7} */
- /* mask is in d24 (d25, d26, d27 are unused) */
-
- /* in */
- vmull.u8 q0, d24, d8
- vmull.u8 q1, d24, d9
- vmull.u8 q6, d24, d10
- vmull.u8 q7, d24, d11
- vrshr.u16 q10, q0, #8
- vrshr.u16 q11, q1, #8
- vrshr.u16 q12, q6, #8
- vrshr.u16 q13, q7, #8
- vraddhn.u16 d0, q0, q10
- vraddhn.u16 d1, q1, q11
- vraddhn.u16 d2, q6, q12
- vraddhn.u16 d3, q7, q13
- vmvn.8 d24, d3 /* get inverted alpha */
- /* source: d0 - blue, d1 - green, d2 - red, d3 - alpha */
- /* destination: d4 - blue, d5 - green, d6 - red, d7 - alpha */
- /* now do alpha blending */
- vmull.u8 q8, d24, d4
- vmull.u8 q9, d24, d5
- vmull.u8 q10, d24, d6
- vmull.u8 q11, d24, d7
-.endm
-
-.macro pixman_composite_over_n_8_8888_process_pixblock_tail
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- vraddhn.u16 d30, q12, q10
- vraddhn.u16 d31, q13, q11
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_n_8_8888_process_pixblock_tail_head
- pixman_composite_over_n_8_8888_process_pixblock_tail
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- fetch_mask_pixblock
- cache_preload 8, 8
- pixman_composite_over_n_8_8888_process_pixblock_head
-.endm
-
-.macro pixman_composite_over_n_8_8888_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vpush {d8-d15}
- vld1.32 {d11[0]}, [DUMMY]
- vdup.8 d8, d11[0]
- vdup.8 d9, d11[1]
- vdup.8 d10, d11[2]
- vdup.8 d11, d11[3]
-.endm
-
-.macro pixman_composite_over_n_8_8888_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_over_n_8_8888_asm_neon, 0, 8, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_n_8_8888_init, \
- pixman_composite_over_n_8_8888_cleanup, \
- pixman_composite_over_n_8_8888_process_pixblock_head, \
- pixman_composite_over_n_8_8888_process_pixblock_tail, \
- pixman_composite_over_n_8_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_over_n_8_8_process_pixblock_head
- vmull.u8 q0, d24, d8
- vmull.u8 q1, d25, d8
- vmull.u8 q6, d26, d8
- vmull.u8 q7, d27, d8
- vrshr.u16 q10, q0, #8
- vrshr.u16 q11, q1, #8
- vrshr.u16 q12, q6, #8
- vrshr.u16 q13, q7, #8
- vraddhn.u16 d0, q0, q10
- vraddhn.u16 d1, q1, q11
- vraddhn.u16 d2, q6, q12
- vraddhn.u16 d3, q7, q13
- vmvn.8 q12, q0
- vmvn.8 q13, q1
- vmull.u8 q8, d24, d4
- vmull.u8 q9, d25, d5
- vmull.u8 q10, d26, d6
- vmull.u8 q11, d27, d7
-.endm
-
-.macro pixman_composite_over_n_8_8_process_pixblock_tail
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- vraddhn.u16 d30, q12, q10
- vraddhn.u16 d31, q13, q11
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_n_8_8_process_pixblock_tail_head
- vld1.8 {d4, d5, d6, d7}, [DST_R, :128]!
- pixman_composite_over_n_8_8_process_pixblock_tail
- fetch_mask_pixblock
- cache_preload 32, 32
- vst1.8 {d28, d29, d30, d31}, [DST_W, :128]!
- pixman_composite_over_n_8_8_process_pixblock_head
-.endm
-
-.macro pixman_composite_over_n_8_8_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vpush {d8-d15}
- vld1.32 {d8[0]}, [DUMMY]
- vdup.8 d8, d8[3]
-.endm
-
-.macro pixman_composite_over_n_8_8_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_over_n_8_8_asm_neon, 0, 8, 8, \
- FLAG_DST_READWRITE, \
- 32, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_n_8_8_init, \
- pixman_composite_over_n_8_8_cleanup, \
- pixman_composite_over_n_8_8_process_pixblock_head, \
- pixman_composite_over_n_8_8_process_pixblock_tail, \
- pixman_composite_over_n_8_8_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_over_n_8888_8888_ca_process_pixblock_head
- /*
- * 'combine_mask_ca' replacement
- *
- * input: solid src (n) in {d8, d9, d10, d11}
- * dest in {d4, d5, d6, d7 }
- * mask in {d24, d25, d26, d27}
- * output: updated src in {d0, d1, d2, d3 }
- * updated mask in {d24, d25, d26, d3 }
- */
- vmull.u8 q0, d24, d8
- vmull.u8 q1, d25, d9
- vmull.u8 q6, d26, d10
- vmull.u8 q7, d27, d11
- vmull.u8 q9, d11, d25
- vmull.u8 q12, d11, d24
- vmull.u8 q13, d11, d26
- vrshr.u16 q8, q0, #8
- vrshr.u16 q10, q1, #8
- vrshr.u16 q11, q6, #8
- vraddhn.u16 d0, q0, q8
- vraddhn.u16 d1, q1, q10
- vraddhn.u16 d2, q6, q11
- vrshr.u16 q11, q12, #8
- vrshr.u16 q8, q9, #8
- vrshr.u16 q6, q13, #8
- vrshr.u16 q10, q7, #8
- vraddhn.u16 d24, q12, q11
- vraddhn.u16 d25, q9, q8
- vraddhn.u16 d26, q13, q6
- vraddhn.u16 d3, q7, q10
- /*
- * 'combine_over_ca' replacement
- *
- * output: updated dest in {d28, d29, d30, d31}
- */
- vmvn.8 d24, d24
- vmvn.8 d25, d25
- vmull.u8 q8, d24, d4
- vmull.u8 q9, d25, d5
- vmvn.8 d26, d26
- vmvn.8 d27, d3
- vmull.u8 q10, d26, d6
- vmull.u8 q11, d27, d7
-.endm
-
-.macro pixman_composite_over_n_8888_8888_ca_process_pixblock_tail
- /* ... continue 'combine_over_ca' replacement */
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q6, q10, #8
- vrshr.u16 q7, q11, #8
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- vraddhn.u16 d30, q6, q10
- vraddhn.u16 d31, q7, q11
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
-.endm
-
-.macro pixman_composite_over_n_8888_8888_ca_process_pixblock_tail_head
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- vrshr.u16 q6, q10, #8
- vrshr.u16 q7, q11, #8
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- vraddhn.u16 d30, q6, q10
- vraddhn.u16 d31, q7, q11
- fetch_mask_pixblock
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
- cache_preload 8, 8
- pixman_composite_over_n_8888_8888_ca_process_pixblock_head
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
-.endm
-
-.macro pixman_composite_over_n_8888_8888_ca_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vpush {d8-d15}
- vld1.32 {d11[0]}, [DUMMY]
- vdup.8 d8, d11[0]
- vdup.8 d9, d11[1]
- vdup.8 d10, d11[2]
- vdup.8 d11, d11[3]
-.endm
-
-.macro pixman_composite_over_n_8888_8888_ca_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_over_n_8888_8888_ca_asm_neon, 0, 32, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_n_8888_8888_ca_init, \
- pixman_composite_over_n_8888_8888_ca_cleanup, \
- pixman_composite_over_n_8888_8888_ca_process_pixblock_head, \
- pixman_composite_over_n_8888_8888_ca_process_pixblock_tail, \
- pixman_composite_over_n_8888_8888_ca_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_in_n_8_process_pixblock_head
- /* expecting source data in {d0, d1, d2, d3} */
- /* and destination data in {d4, d5, d6, d7} */
- vmull.u8 q8, d4, d3
- vmull.u8 q9, d5, d3
- vmull.u8 q10, d6, d3
- vmull.u8 q11, d7, d3
-.endm
-
-.macro pixman_composite_in_n_8_process_pixblock_tail
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- vraddhn.u16 d28, q8, q14
- vraddhn.u16 d29, q9, q15
- vraddhn.u16 d30, q10, q12
- vraddhn.u16 d31, q11, q13
-.endm
-
-.macro pixman_composite_in_n_8_process_pixblock_tail_head
- pixman_composite_in_n_8_process_pixblock_tail
- vld1.8 {d4, d5, d6, d7}, [DST_R, :128]!
- cache_preload 32, 32
- pixman_composite_in_n_8_process_pixblock_head
- vst1.8 {d28, d29, d30, d31}, [DST_W, :128]!
-.endm
-
-.macro pixman_composite_in_n_8_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d3[0]}, [DUMMY]
- vdup.8 d3, d3[3]
-.endm
-
-.macro pixman_composite_in_n_8_cleanup
-.endm
-
-generate_composite_function \
- pixman_composite_in_n_8_asm_neon, 0, 0, 8, \
- FLAG_DST_READWRITE, \
- 32, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_in_n_8_init, \
- pixman_composite_in_n_8_cleanup, \
- pixman_composite_in_n_8_process_pixblock_head, \
- pixman_composite_in_n_8_process_pixblock_tail, \
- pixman_composite_in_n_8_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 24 /* mask_basereg */
-
-.macro pixman_composite_add_n_8_8_process_pixblock_head
- /* expecting source data in {d8, d9, d10, d11} */
- /* d8 - blue, d9 - green, d10 - red, d11 - alpha */
- /* and destination data in {d4, d5, d6, d7} */
- /* mask is in d24, d25, d26, d27 */
- vmull.u8 q0, d24, d11
- vmull.u8 q1, d25, d11
- vmull.u8 q6, d26, d11
- vmull.u8 q7, d27, d11
- vrshr.u16 q10, q0, #8
- vrshr.u16 q11, q1, #8
- vrshr.u16 q12, q6, #8
- vrshr.u16 q13, q7, #8
- vraddhn.u16 d0, q0, q10
- vraddhn.u16 d1, q1, q11
- vraddhn.u16 d2, q6, q12
- vraddhn.u16 d3, q7, q13
- vqadd.u8 q14, q0, q2
- vqadd.u8 q15, q1, q3
-.endm
-
-.macro pixman_composite_add_n_8_8_process_pixblock_tail
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_add_n_8_8_process_pixblock_tail_head
- pixman_composite_add_n_8_8_process_pixblock_tail
- vst1.8 {d28, d29, d30, d31}, [DST_W, :128]!
- vld1.8 {d4, d5, d6, d7}, [DST_R, :128]!
- fetch_mask_pixblock
- cache_preload 32, 32
- pixman_composite_add_n_8_8_process_pixblock_head
-.endm
-
-.macro pixman_composite_add_n_8_8_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vpush {d8-d15}
- vld1.32 {d11[0]}, [DUMMY]
- vdup.8 d11, d11[3]
-.endm
-
-.macro pixman_composite_add_n_8_8_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_add_n_8_8_asm_neon, 0, 8, 8, \
- FLAG_DST_READWRITE, \
- 32, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_add_n_8_8_init, \
- pixman_composite_add_n_8_8_cleanup, \
- pixman_composite_add_n_8_8_process_pixblock_head, \
- pixman_composite_add_n_8_8_process_pixblock_tail, \
- pixman_composite_add_n_8_8_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_add_8_8_8_process_pixblock_head
- /* expecting source data in {d0, d1, d2, d3} */
- /* destination data in {d4, d5, d6, d7} */
- /* mask in {d24, d25, d26, d27} */
- vmull.u8 q8, d24, d0
- vmull.u8 q9, d25, d1
- vmull.u8 q10, d26, d2
- vmull.u8 q11, d27, d3
- vrshr.u16 q0, q8, #8
- vrshr.u16 q1, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- vraddhn.u16 d0, q0, q8
- vraddhn.u16 d1, q1, q9
- vraddhn.u16 d2, q12, q10
- vraddhn.u16 d3, q13, q11
- vqadd.u8 q14, q0, q2
- vqadd.u8 q15, q1, q3
-.endm
-
-.macro pixman_composite_add_8_8_8_process_pixblock_tail
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_add_8_8_8_process_pixblock_tail_head
- pixman_composite_add_8_8_8_process_pixblock_tail
- vst1.8 {d28, d29, d30, d31}, [DST_W, :128]!
- vld1.8 {d4, d5, d6, d7}, [DST_R, :128]!
- fetch_mask_pixblock
- fetch_src_pixblock
- cache_preload 32, 32
- pixman_composite_add_8_8_8_process_pixblock_head
-.endm
-
-.macro pixman_composite_add_8_8_8_init
-.endm
-
-.macro pixman_composite_add_8_8_8_cleanup
-.endm
-
-generate_composite_function \
- pixman_composite_add_8_8_8_asm_neon, 8, 8, 8, \
- FLAG_DST_READWRITE, \
- 32, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_add_8_8_8_init, \
- pixman_composite_add_8_8_8_cleanup, \
- pixman_composite_add_8_8_8_process_pixblock_head, \
- pixman_composite_add_8_8_8_process_pixblock_tail, \
- pixman_composite_add_8_8_8_process_pixblock_tail_head
-
-/******************************************************************************/
-
-.macro pixman_composite_add_8888_8888_8888_process_pixblock_head
- /* expecting source data in {d0, d1, d2, d3} */
- /* destination data in {d4, d5, d6, d7} */
- /* mask in {d24, d25, d26, d27} */
- vmull.u8 q8, d27, d0
- vmull.u8 q9, d27, d1
- vmull.u8 q10, d27, d2
- vmull.u8 q11, d27, d3
- /* 1 cycle bubble */
- vrsra.u16 q8, q8, #8
- vrsra.u16 q9, q9, #8
- vrsra.u16 q10, q10, #8
- vrsra.u16 q11, q11, #8
-.endm
-
-.macro pixman_composite_add_8888_8888_8888_process_pixblock_tail
- /* 2 cycle bubble */
- vrshrn.u16 d28, q8, #8
- vrshrn.u16 d29, q9, #8
- vrshrn.u16 d30, q10, #8
- vrshrn.u16 d31, q11, #8
- vqadd.u8 q14, q2, q14
- /* 1 cycle bubble */
- vqadd.u8 q15, q3, q15
-.endm
-
-.macro pixman_composite_add_8888_8888_8888_process_pixblock_tail_head
- fetch_src_pixblock
- vrshrn.u16 d28, q8, #8
- fetch_mask_pixblock
- vrshrn.u16 d29, q9, #8
- vmull.u8 q8, d27, d0
- vrshrn.u16 d30, q10, #8
- vmull.u8 q9, d27, d1
- vrshrn.u16 d31, q11, #8
- vmull.u8 q10, d27, d2
- vqadd.u8 q14, q2, q14
- vmull.u8 q11, d27, d3
- vqadd.u8 q15, q3, q15
- vrsra.u16 q8, q8, #8
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- vrsra.u16 q9, q9, #8
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- vrsra.u16 q10, q10, #8
-
- cache_preload 8, 8
-
- vrsra.u16 q11, q11, #8
-.endm
-
-generate_composite_function \
- pixman_composite_add_8888_8888_8888_asm_neon, 32, 32, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_add_8888_8888_8888_process_pixblock_head, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail_head
-
-generate_composite_function_single_scanline \
- pixman_composite_scanline_add_mask_asm_neon, 32, 32, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_add_8888_8888_8888_process_pixblock_head, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-generate_composite_function \
- pixman_composite_add_8888_8_8888_asm_neon, 32, 8, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_add_8888_8888_8888_process_pixblock_head, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 27 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_add_n_8_8888_init
- add DUMMY, sp, #ARGS_STACK_OFFSET
- vld1.32 {d3[0]}, [DUMMY]
- vdup.8 d0, d3[0]
- vdup.8 d1, d3[1]
- vdup.8 d2, d3[2]
- vdup.8 d3, d3[3]
-.endm
-
-.macro pixman_composite_add_n_8_8888_cleanup
-.endm
-
-generate_composite_function \
- pixman_composite_add_n_8_8888_asm_neon, 0, 8, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_add_n_8_8888_init, \
- pixman_composite_add_n_8_8888_cleanup, \
- pixman_composite_add_8888_8888_8888_process_pixblock_head, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 27 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_add_8888_n_8888_init
- add DUMMY, sp, #(ARGS_STACK_OFFSET + 8)
- vld1.32 {d27[0]}, [DUMMY]
- vdup.8 d27, d27[3]
-.endm
-
-.macro pixman_composite_add_8888_n_8888_cleanup
-.endm
-
-generate_composite_function \
- pixman_composite_add_8888_n_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_add_8888_n_8888_init, \
- pixman_composite_add_8888_n_8888_cleanup, \
- pixman_composite_add_8888_8888_8888_process_pixblock_head, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail, \
- pixman_composite_add_8888_8888_8888_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 27 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_out_reverse_8888_n_8888_process_pixblock_head
- /* expecting source data in {d0, d1, d2, d3} */
- /* destination data in {d4, d5, d6, d7} */
- /* solid mask is in d15 */
-
- /* 'in' */
- vmull.u8 q8, d15, d3
- vmull.u8 q6, d15, d2
- vmull.u8 q5, d15, d1
- vmull.u8 q4, d15, d0
- vrshr.u16 q13, q8, #8
- vrshr.u16 q12, q6, #8
- vrshr.u16 q11, q5, #8
- vrshr.u16 q10, q4, #8
- vraddhn.u16 d3, q8, q13
- vraddhn.u16 d2, q6, q12
- vraddhn.u16 d1, q5, q11
- vraddhn.u16 d0, q4, q10
- vmvn.8 d24, d3 /* get inverted alpha */
- /* now do alpha blending */
- vmull.u8 q8, d24, d4
- vmull.u8 q9, d24, d5
- vmull.u8 q10, d24, d6
- vmull.u8 q11, d24, d7
-.endm
-
-.macro pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vrshr.u16 q13, q11, #8
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- vraddhn.u16 d30, q12, q10
- vraddhn.u16 d31, q13, q11
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_out_reverse_8888_8888_8888_process_pixblock_tail_head
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail
- fetch_src_pixblock
- cache_preload 8, 8
- fetch_mask_pixblock
- pixman_composite_out_reverse_8888_n_8888_process_pixblock_head
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
-.endm
-
-generate_composite_function_single_scanline \
- pixman_composite_scanline_out_reverse_mask_asm_neon, 32, 32, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_out_reverse_8888_n_8888_process_pixblock_head, \
- pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail, \
- pixman_composite_out_reverse_8888_8888_8888_process_pixblock_tail_head \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 12 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_over_8888_n_8888_process_pixblock_head
- pixman_composite_out_reverse_8888_n_8888_process_pixblock_head
-.endm
-
-.macro pixman_composite_over_8888_n_8888_process_pixblock_tail
- pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail
- vqadd.u8 q14, q0, q14
- vqadd.u8 q15, q1, q15
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_8888_n_8888_process_pixblock_tail_head
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- pixman_composite_over_8888_n_8888_process_pixblock_tail
- fetch_src_pixblock
- cache_preload 8, 8
- pixman_composite_over_8888_n_8888_process_pixblock_head
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
-.endm
-
-.macro pixman_composite_over_8888_n_8888_init
- add DUMMY, sp, #48
- vpush {d8-d15}
- vld1.32 {d15[0]}, [DUMMY]
- vdup.8 d15, d15[3]
-.endm
-
-.macro pixman_composite_over_8888_n_8888_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_over_8888_n_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_8888_n_8888_init, \
- pixman_composite_over_8888_n_8888_cleanup, \
- pixman_composite_over_8888_n_8888_process_pixblock_head, \
- pixman_composite_over_8888_n_8888_process_pixblock_tail, \
- pixman_composite_over_8888_n_8888_process_pixblock_tail_head
-
-/******************************************************************************/
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_8888_8888_8888_process_pixblock_tail_head
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- pixman_composite_over_8888_n_8888_process_pixblock_tail
- fetch_src_pixblock
- cache_preload 8, 8
- fetch_mask_pixblock
- pixman_composite_over_8888_n_8888_process_pixblock_head
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
-.endm
-
-generate_composite_function \
- pixman_composite_over_8888_8888_8888_asm_neon, 32, 32, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_over_8888_n_8888_process_pixblock_head, \
- pixman_composite_over_8888_n_8888_process_pixblock_tail, \
- pixman_composite_over_8888_8888_8888_process_pixblock_tail_head \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 12 /* mask_basereg */
-
-generate_composite_function_single_scanline \
- pixman_composite_scanline_over_mask_asm_neon, 32, 32, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_over_8888_n_8888_process_pixblock_head, \
- pixman_composite_over_8888_n_8888_process_pixblock_tail, \
- pixman_composite_over_8888_8888_8888_process_pixblock_tail_head \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 12 /* mask_basereg */
-
-/******************************************************************************/
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_8888_8_8888_process_pixblock_tail_head
- vld4.8 {d4, d5, d6, d7}, [DST_R, :128]!
- pixman_composite_over_8888_n_8888_process_pixblock_tail
- fetch_src_pixblock
- cache_preload 8, 8
- fetch_mask_pixblock
- pixman_composite_over_8888_n_8888_process_pixblock_head
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
-.endm
-
-generate_composite_function \
- pixman_composite_over_8888_8_8888_asm_neon, 32, 8, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_over_8888_n_8888_process_pixblock_head, \
- pixman_composite_over_8888_n_8888_process_pixblock_tail, \
- pixman_composite_over_8888_8_8888_process_pixblock_tail_head \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 15 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_0888_0888_process_pixblock_head
-.endm
-
-.macro pixman_composite_src_0888_0888_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_0888_0888_process_pixblock_tail_head
- vst3.8 {d0, d1, d2}, [DST_W]!
- fetch_src_pixblock
- cache_preload 8, 8
-.endm
-
-generate_composite_function \
- pixman_composite_src_0888_0888_asm_neon, 24, 0, 24, \
- FLAG_DST_WRITEONLY, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_0888_0888_process_pixblock_head, \
- pixman_composite_src_0888_0888_process_pixblock_tail, \
- pixman_composite_src_0888_0888_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_0888_8888_rev_process_pixblock_head
- vswp d0, d2
-.endm
-
-.macro pixman_composite_src_0888_8888_rev_process_pixblock_tail
-.endm
-
-.macro pixman_composite_src_0888_8888_rev_process_pixblock_tail_head
- vst4.8 {d0, d1, d2, d3}, [DST_W]!
- fetch_src_pixblock
- vswp d0, d2
- cache_preload 8, 8
-.endm
-
-.macro pixman_composite_src_0888_8888_rev_init
- veor d3, d3, d3
-.endm
-
-generate_composite_function \
- pixman_composite_src_0888_8888_rev_asm_neon, 24, 0, 32, \
- FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- pixman_composite_src_0888_8888_rev_init, \
- default_cleanup, \
- pixman_composite_src_0888_8888_rev_process_pixblock_head, \
- pixman_composite_src_0888_8888_rev_process_pixblock_tail, \
- pixman_composite_src_0888_8888_rev_process_pixblock_tail_head, \
- 0, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_0888_0565_rev_process_pixblock_head
- vshll.u8 q8, d1, #8
- vshll.u8 q9, d2, #8
-.endm
-
-.macro pixman_composite_src_0888_0565_rev_process_pixblock_tail
- vshll.u8 q14, d0, #8
- vsri.u16 q14, q8, #5
- vsri.u16 q14, q9, #11
-.endm
-
-.macro pixman_composite_src_0888_0565_rev_process_pixblock_tail_head
- vshll.u8 q14, d0, #8
- fetch_src_pixblock
- vsri.u16 q14, q8, #5
- vsri.u16 q14, q9, #11
- vshll.u8 q8, d1, #8
- vst1.16 {d28, d29}, [DST_W, :128]!
- vshll.u8 q9, d2, #8
-.endm
-
-generate_composite_function \
- pixman_composite_src_0888_0565_rev_asm_neon, 24, 0, 16, \
- FLAG_DST_WRITEONLY, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_0888_0565_rev_process_pixblock_head, \
- pixman_composite_src_0888_0565_rev_process_pixblock_tail, \
- pixman_composite_src_0888_0565_rev_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_pixbuf_8888_process_pixblock_head
- vmull.u8 q8, d3, d0
- vmull.u8 q9, d3, d1
- vmull.u8 q10, d3, d2
-.endm
-
-.macro pixman_composite_src_pixbuf_8888_process_pixblock_tail
- vrshr.u16 q11, q8, #8
- vswp d3, d31
- vrshr.u16 q12, q9, #8
- vrshr.u16 q13, q10, #8
- vraddhn.u16 d30, q11, q8
- vraddhn.u16 d29, q12, q9
- vraddhn.u16 d28, q13, q10
-.endm
-
-.macro pixman_composite_src_pixbuf_8888_process_pixblock_tail_head
- vrshr.u16 q11, q8, #8
- vswp d3, d31
- vrshr.u16 q12, q9, #8
- vrshr.u16 q13, q10, #8
- fetch_src_pixblock
- vraddhn.u16 d30, q11, q8
- PF add PF_X, PF_X, #8
- PF tst PF_CTL, #0xF
- PF addne PF_X, PF_X, #8
- PF subne PF_CTL, PF_CTL, #1
- vraddhn.u16 d29, q12, q9
- vraddhn.u16 d28, q13, q10
- vmull.u8 q8, d3, d0
- vmull.u8 q9, d3, d1
- vmull.u8 q10, d3, d2
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- PF cmp PF_X, ORIG_W
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- PF subge PF_X, PF_X, ORIG_W
- PF subges PF_CTL, PF_CTL, #0x10
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
-.endm
-
-generate_composite_function \
- pixman_composite_src_pixbuf_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_pixbuf_8888_process_pixblock_head, \
- pixman_composite_src_pixbuf_8888_process_pixblock_tail, \
- pixman_composite_src_pixbuf_8888_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_src_rpixbuf_8888_process_pixblock_head
- vmull.u8 q8, d3, d0
- vmull.u8 q9, d3, d1
- vmull.u8 q10, d3, d2
-.endm
-
-.macro pixman_composite_src_rpixbuf_8888_process_pixblock_tail
- vrshr.u16 q11, q8, #8
- vswp d3, d31
- vrshr.u16 q12, q9, #8
- vrshr.u16 q13, q10, #8
- vraddhn.u16 d28, q11, q8
- vraddhn.u16 d29, q12, q9
- vraddhn.u16 d30, q13, q10
-.endm
-
-.macro pixman_composite_src_rpixbuf_8888_process_pixblock_tail_head
- vrshr.u16 q11, q8, #8
- vswp d3, d31
- vrshr.u16 q12, q9, #8
- vrshr.u16 q13, q10, #8
- fetch_src_pixblock
- vraddhn.u16 d28, q11, q8
- PF add PF_X, PF_X, #8
- PF tst PF_CTL, #0xF
- PF addne PF_X, PF_X, #8
- PF subne PF_CTL, PF_CTL, #1
- vraddhn.u16 d29, q12, q9
- vraddhn.u16 d30, q13, q10
- vmull.u8 q8, d3, d0
- vmull.u8 q9, d3, d1
- vmull.u8 q10, d3, d2
- vst4.8 {d28, d29, d30, d31}, [DST_W, :128]!
- PF cmp PF_X, ORIG_W
- PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
- PF subge PF_X, PF_X, ORIG_W
- PF subges PF_CTL, PF_CTL, #0x10
- PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]!
-.endm
-
-generate_composite_function \
- pixman_composite_src_rpixbuf_8888_asm_neon, 32, 0, 32, \
- FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- 10, /* prefetch distance */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_rpixbuf_8888_process_pixblock_head, \
- pixman_composite_src_rpixbuf_8888_process_pixblock_tail, \
- pixman_composite_src_rpixbuf_8888_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 0, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_over_0565_8_0565_process_pixblock_head
- /* mask is in d15 */
- convert_0565_to_x888 q4, d2, d1, d0
- convert_0565_to_x888 q5, d6, d5, d4
- /* source pixel data is in {d0, d1, d2, XX} */
- /* destination pixel data is in {d4, d5, d6, XX} */
- vmvn.8 d7, d15
- vmull.u8 q6, d15, d2
- vmull.u8 q5, d15, d1
- vmull.u8 q4, d15, d0
- vmull.u8 q8, d7, d4
- vmull.u8 q9, d7, d5
- vmull.u8 q13, d7, d6
- vrshr.u16 q12, q6, #8
- vrshr.u16 q11, q5, #8
- vrshr.u16 q10, q4, #8
- vraddhn.u16 d2, q6, q12
- vraddhn.u16 d1, q5, q11
- vraddhn.u16 d0, q4, q10
-.endm
-
-.macro pixman_composite_over_0565_8_0565_process_pixblock_tail
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q13, #8
- vraddhn.u16 d28, q14, q8
- vraddhn.u16 d29, q15, q9
- vraddhn.u16 d30, q12, q13
- vqadd.u8 q0, q0, q14
- vqadd.u8 q1, q1, q15
- /* 32bpp result is in {d0, d1, d2, XX} */
- convert_8888_to_0565 d2, d1, d0, q14, q15, q3
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_over_0565_8_0565_process_pixblock_tail_head
- fetch_mask_pixblock
- pixman_composite_over_0565_8_0565_process_pixblock_tail
- fetch_src_pixblock
- vld1.16 {d10, d11}, [DST_R, :128]!
- cache_preload 8, 8
- pixman_composite_over_0565_8_0565_process_pixblock_head
- vst1.16 {d28, d29}, [DST_W, :128]!
-.endm
-
-generate_composite_function \
- pixman_composite_over_0565_8_0565_asm_neon, 16, 8, 16, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_over_0565_8_0565_process_pixblock_head, \
- pixman_composite_over_0565_8_0565_process_pixblock_tail, \
- pixman_composite_over_0565_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 10, /* dst_r_basereg */ \
- 8, /* src_basereg */ \
- 15 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_over_0565_n_0565_init
- add DUMMY, sp, #(ARGS_STACK_OFFSET + 8)
- vpush {d8-d15}
- vld1.32 {d15[0]}, [DUMMY]
- vdup.8 d15, d15[3]
-.endm
-
-.macro pixman_composite_over_0565_n_0565_cleanup
- vpop {d8-d15}
-.endm
-
-generate_composite_function \
- pixman_composite_over_0565_n_0565_asm_neon, 16, 0, 16, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- pixman_composite_over_0565_n_0565_init, \
- pixman_composite_over_0565_n_0565_cleanup, \
- pixman_composite_over_0565_8_0565_process_pixblock_head, \
- pixman_composite_over_0565_8_0565_process_pixblock_tail, \
- pixman_composite_over_0565_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 10, /* dst_r_basereg */ \
- 8, /* src_basereg */ \
- 15 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_add_0565_8_0565_process_pixblock_head
- /* mask is in d15 */
- convert_0565_to_x888 q4, d2, d1, d0
- convert_0565_to_x888 q5, d6, d5, d4
- /* source pixel data is in {d0, d1, d2, XX} */
- /* destination pixel data is in {d4, d5, d6, XX} */
- vmull.u8 q6, d15, d2
- vmull.u8 q5, d15, d1
- vmull.u8 q4, d15, d0
- vrshr.u16 q12, q6, #8
- vrshr.u16 q11, q5, #8
- vrshr.u16 q10, q4, #8
- vraddhn.u16 d2, q6, q12
- vraddhn.u16 d1, q5, q11
- vraddhn.u16 d0, q4, q10
-.endm
-
-.macro pixman_composite_add_0565_8_0565_process_pixblock_tail
- vqadd.u8 q0, q0, q2
- vqadd.u8 q1, q1, q3
- /* 32bpp result is in {d0, d1, d2, XX} */
- convert_8888_to_0565 d2, d1, d0, q14, q15, q3
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_add_0565_8_0565_process_pixblock_tail_head
- fetch_mask_pixblock
- pixman_composite_add_0565_8_0565_process_pixblock_tail
- fetch_src_pixblock
- vld1.16 {d10, d11}, [DST_R, :128]!
- cache_preload 8, 8
- pixman_composite_add_0565_8_0565_process_pixblock_head
- vst1.16 {d28, d29}, [DST_W, :128]!
-.endm
-
-generate_composite_function \
- pixman_composite_add_0565_8_0565_asm_neon, 16, 8, 16, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_add_0565_8_0565_process_pixblock_head, \
- pixman_composite_add_0565_8_0565_process_pixblock_tail, \
- pixman_composite_add_0565_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 10, /* dst_r_basereg */ \
- 8, /* src_basereg */ \
- 15 /* mask_basereg */
-
-/******************************************************************************/
-
-.macro pixman_composite_out_reverse_8_0565_process_pixblock_head
- /* mask is in d15 */
- convert_0565_to_x888 q5, d6, d5, d4
- /* destination pixel data is in {d4, d5, d6, xx} */
- vmvn.8 d24, d15 /* get inverted alpha */
- /* now do alpha blending */
- vmull.u8 q8, d24, d4
- vmull.u8 q9, d24, d5
- vmull.u8 q10, d24, d6
-.endm
-
-.macro pixman_composite_out_reverse_8_0565_process_pixblock_tail
- vrshr.u16 q14, q8, #8
- vrshr.u16 q15, q9, #8
- vrshr.u16 q12, q10, #8
- vraddhn.u16 d0, q14, q8
- vraddhn.u16 d1, q15, q9
- vraddhn.u16 d2, q12, q10
- /* 32bpp result is in {d0, d1, d2, XX} */
- convert_8888_to_0565 d2, d1, d0, q14, q15, q3
-.endm
-
-/* TODO: expand macros and do better instructions scheduling */
-.macro pixman_composite_out_reverse_8_0565_process_pixblock_tail_head
- fetch_src_pixblock
- pixman_composite_out_reverse_8_0565_process_pixblock_tail
- vld1.16 {d10, d11}, [DST_R, :128]!
- cache_preload 8, 8
- pixman_composite_out_reverse_8_0565_process_pixblock_head
- vst1.16 {d28, d29}, [DST_W, :128]!
-.endm
-
-generate_composite_function \
- pixman_composite_out_reverse_8_0565_asm_neon, 8, 0, 16, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- 5, /* prefetch distance */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_out_reverse_8_0565_process_pixblock_head, \
- pixman_composite_out_reverse_8_0565_process_pixblock_tail, \
- pixman_composite_out_reverse_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 10, /* dst_r_basereg */ \
- 15, /* src_basereg */ \
- 0 /* mask_basereg */
-
-/******************************************************************************/
-
-generate_composite_function_nearest_scanline \
- pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon, 32, 0, 32, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_over_8888_8888_process_pixblock_head, \
- pixman_composite_over_8888_8888_process_pixblock_tail, \
- pixman_composite_over_8888_8888_process_pixblock_tail_head
-
-generate_composite_function_nearest_scanline \
- pixman_scaled_nearest_scanline_8888_0565_OVER_asm_neon, 32, 0, 16, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_over_8888_0565_process_pixblock_head, \
- pixman_composite_over_8888_0565_process_pixblock_tail, \
- pixman_composite_over_8888_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 0, /* src_basereg */ \
- 24 /* mask_basereg */
-
-generate_composite_function_nearest_scanline \
- pixman_scaled_nearest_scanline_8888_0565_SRC_asm_neon, 32, 0, 16, \
- FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_8888_0565_process_pixblock_head, \
- pixman_composite_src_8888_0565_process_pixblock_tail, \
- pixman_composite_src_8888_0565_process_pixblock_tail_head
-
-generate_composite_function_nearest_scanline \
- pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon, 16, 0, 32, \
- FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init, \
- default_cleanup, \
- pixman_composite_src_0565_8888_process_pixblock_head, \
- pixman_composite_src_0565_8888_process_pixblock_tail, \
- pixman_composite_src_0565_8888_process_pixblock_tail_head
-
-generate_composite_function_nearest_scanline \
- pixman_scaled_nearest_scanline_8888_8_0565_OVER_asm_neon, 32, 8, 16, \
- FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
- 8, /* number of pixels, processed in a single block */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_over_8888_8_0565_process_pixblock_head, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail, \
- pixman_composite_over_8888_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 4, /* dst_r_basereg */ \
- 8, /* src_basereg */ \
- 24 /* mask_basereg */
-
-generate_composite_function_nearest_scanline \
- pixman_scaled_nearest_scanline_0565_8_0565_OVER_asm_neon, 16, 8, 16, \
- FLAG_DST_READWRITE, \
- 8, /* number of pixels, processed in a single block */ \
- default_init_need_all_regs, \
- default_cleanup_need_all_regs, \
- pixman_composite_over_0565_8_0565_process_pixblock_head, \
- pixman_composite_over_0565_8_0565_process_pixblock_tail, \
- pixman_composite_over_0565_8_0565_process_pixblock_tail_head, \
- 28, /* dst_w_basereg */ \
- 10, /* dst_r_basereg */ \
- 8, /* src_basereg */ \
- 15 /* mask_basereg */
-
-/******************************************************************************/
-
-/* Supplementary macro for setting function attributes */
-.macro pixman_asm_function fname
- .func fname
- .global fname
-#ifdef __ELF__
- .hidden fname
- .type fname, %function
-#endif
-fname:
-.endm
-
-/*
- * Bilinear scaling support code which tries to provide pixel fetching, color
- * format conversion, and interpolation as separate macros which can be used
- * as the basic building blocks for constructing bilinear scanline functions.
- */
-
-.macro bilinear_load_8888 reg1, reg2, tmp
- mov TMP2, X, asr #16
- add X, X, UX
- add TMP1, TOP, TMP2, asl #2
- add TMP2, BOTTOM, TMP2, asl #2
- vld1.32 {reg1}, [TMP1]
- vld1.32 {reg2}, [TMP2]
-.endm
-
-.macro bilinear_load_0565 reg1, reg2, tmp
- mov TMP2, X, asr #16
- add X, X, UX
- add TMP1, TOP, TMP2, asl #1
- add TMP2, BOTTOM, TMP2, asl #1
- vld1.32 {reg2[0]}, [TMP1]
- vld1.32 {reg2[1]}, [TMP2]
- convert_four_0565_to_x888_packed reg2, reg1, reg2, tmp
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_two_8888 \
- acc1, acc2, reg1, reg2, reg3, reg4, tmp1, tmp2
-
- bilinear_load_8888 reg1, reg2, tmp1
- vmull.u8 acc1, reg1, d28
- vmlal.u8 acc1, reg2, d29
- bilinear_load_8888 reg3, reg4, tmp2
- vmull.u8 acc2, reg3, d28
- vmlal.u8 acc2, reg4, d29
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_four_8888 \
- xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi \
- yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi
-
- bilinear_load_and_vertical_interpolate_two_8888 \
- xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi
- bilinear_load_and_vertical_interpolate_two_8888 \
- yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_two_0565 \
- acc1, acc2, reg1, reg2, reg3, reg4, acc2lo, acc2hi
-
- mov TMP2, X, asr #16
- add X, X, UX
- mov TMP4, X, asr #16
- add X, X, UX
- add TMP1, TOP, TMP2, asl #1
- add TMP2, BOTTOM, TMP2, asl #1
- add TMP3, TOP, TMP4, asl #1
- add TMP4, BOTTOM, TMP4, asl #1
- vld1.32 {acc2lo[0]}, [TMP1]
- vld1.32 {acc2hi[0]}, [TMP3]
- vld1.32 {acc2lo[1]}, [TMP2]
- vld1.32 {acc2hi[1]}, [TMP4]
- convert_0565_to_x888 acc2, reg3, reg2, reg1
- vzip.u8 reg1, reg3
- vzip.u8 reg2, reg4
- vzip.u8 reg3, reg4
- vzip.u8 reg1, reg2
- vmull.u8 acc1, reg1, d28
- vmlal.u8 acc1, reg2, d29
- vmull.u8 acc2, reg3, d28
- vmlal.u8 acc2, reg4, d29
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_four_0565 \
- xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi \
- yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi
-
- mov TMP2, X, asr #16
- add X, X, UX
- mov TMP4, X, asr #16
- add X, X, UX
- add TMP1, TOP, TMP2, asl #1
- add TMP2, BOTTOM, TMP2, asl #1
- add TMP3, TOP, TMP4, asl #1
- add TMP4, BOTTOM, TMP4, asl #1
- vld1.32 {xacc2lo[0]}, [TMP1]
- vld1.32 {xacc2hi[0]}, [TMP3]
- vld1.32 {xacc2lo[1]}, [TMP2]
- vld1.32 {xacc2hi[1]}, [TMP4]
- convert_0565_to_x888 xacc2, xreg3, xreg2, xreg1
- mov TMP2, X, asr #16
- add X, X, UX
- mov TMP4, X, asr #16
- add X, X, UX
- add TMP1, TOP, TMP2, asl #1
- add TMP2, BOTTOM, TMP2, asl #1
- add TMP3, TOP, TMP4, asl #1
- add TMP4, BOTTOM, TMP4, asl #1
- vld1.32 {yacc2lo[0]}, [TMP1]
- vzip.u8 xreg1, xreg3
- vld1.32 {yacc2hi[0]}, [TMP3]
- vzip.u8 xreg2, xreg4
- vld1.32 {yacc2lo[1]}, [TMP2]
- vzip.u8 xreg3, xreg4
- vld1.32 {yacc2hi[1]}, [TMP4]
- vzip.u8 xreg1, xreg2
- convert_0565_to_x888 yacc2, yreg3, yreg2, yreg1
- vmull.u8 xacc1, xreg1, d28
- vzip.u8 yreg1, yreg3
- vmlal.u8 xacc1, xreg2, d29
- vzip.u8 yreg2, yreg4
- vmull.u8 xacc2, xreg3, d28
- vzip.u8 yreg3, yreg4
- vmlal.u8 xacc2, xreg4, d29
- vzip.u8 yreg1, yreg2
- vmull.u8 yacc1, yreg1, d28
- vmlal.u8 yacc1, yreg2, d29
- vmull.u8 yacc2, yreg3, d28
- vmlal.u8 yacc2, yreg4, d29
-.endm
-
-.macro bilinear_store_8888 numpix, tmp1, tmp2
-.if numpix == 4
- vst1.32 {d0, d1}, [OUT]!
-.elseif numpix == 2
- vst1.32 {d0}, [OUT]!
-.elseif numpix == 1
- vst1.32 {d0[0]}, [OUT, :32]!
-.else
- .error bilinear_store_8888 numpix is unsupported
-.endif
-.endm
-
-.macro bilinear_store_0565 numpix, tmp1, tmp2
- vuzp.u8 d0, d1
- vuzp.u8 d2, d3
- vuzp.u8 d1, d3
- vuzp.u8 d0, d2
- convert_8888_to_0565 d2, d1, d0, q1, tmp1, tmp2
-.if numpix == 4
- vst1.16 {d2}, [OUT]!
-.elseif numpix == 2
- vst1.32 {d2[0]}, [OUT]!
-.elseif numpix == 1
- vst1.16 {d2[0]}, [OUT]!
-.else
- .error bilinear_store_0565 numpix is unsupported
-.endif
-.endm
-
-.macro bilinear_interpolate_last_pixel src_fmt, dst_fmt
- bilinear_load_&src_fmt d0, d1, d2
- vmull.u8 q1, d0, d28
- vmlal.u8 q1, d1, d29
- vshr.u16 d30, d24, #8
- /* 4 cycles bubble */
- vshll.u16 q0, d2, #8
- vmlsl.u16 q0, d2, d30
- vmlal.u16 q0, d3, d30
- /* 5 cycles bubble */
- vshrn.u32 d0, q0, #16
- /* 3 cycles bubble */
- vmovn.u16 d0, q0
- /* 1 cycle bubble */
- bilinear_store_&dst_fmt 1, q2, q3
-.endm
-
-.macro bilinear_interpolate_two_pixels src_fmt, dst_fmt
- bilinear_load_and_vertical_interpolate_two_&src_fmt \
- q1, q11, d0, d1, d20, d21, d22, d23
- vshr.u16 q15, q12, #8
- vadd.u16 q12, q12, q13
- vshll.u16 q0, d2, #8
- vmlsl.u16 q0, d2, d30
- vmlal.u16 q0, d3, d30
- vshll.u16 q10, d22, #8
- vmlsl.u16 q10, d22, d31
- vmlal.u16 q10, d23, d31
- vshrn.u32 d30, q0, #16
- vshrn.u32 d31, q10, #16
- vmovn.u16 d0, q15
- bilinear_store_&dst_fmt 2, q2, q3
-.endm
-
-.macro bilinear_interpolate_four_pixels src_fmt, dst_fmt
- bilinear_load_and_vertical_interpolate_four_&src_fmt \
- q1, q11, d0, d1, d20, d21, d22, d23 \
- q3, q9, d4, d5, d16, d17, d18, d19
- pld [TMP1, PF_OFFS]
- vshr.u16 q15, q12, #8
- vadd.u16 q12, q12, q13
- vshll.u16 q0, d2, #8
- vmlsl.u16 q0, d2, d30
- vmlal.u16 q0, d3, d30
- vshll.u16 q10, d22, #8
- vmlsl.u16 q10, d22, d31
- vmlal.u16 q10, d23, d31
- vshr.u16 q15, q12, #8
- vshll.u16 q2, d6, #8
- vmlsl.u16 q2, d6, d30
- vmlal.u16 q2, d7, d30
- vshll.u16 q8, d18, #8
- pld [TMP2, PF_OFFS]
- vmlsl.u16 q8, d18, d31
- vmlal.u16 q8, d19, d31
- vadd.u16 q12, q12, q13
- vshrn.u32 d0, q0, #16
- vshrn.u32 d1, q10, #16
- vshrn.u32 d4, q2, #16
- vshrn.u32 d5, q8, #16
- vmovn.u16 d0, q0
- vmovn.u16 d1, q2
- bilinear_store_&dst_fmt 4, q2, q3
-.endm
-
-/*
- * Main template macro for generating NEON optimized bilinear scanline
- * functions.
- *
- * TODO: use software pipelining and aligned writes to the destination buffer
- * in order to improve performance
- *
- * Bilinear scanline scaler macro template uses the following arguments:
- * fname - name of the function to generate
- * src_fmt - source color format (8888 or 0565)
- * dst_fmt - destination color format (8888 or 0565)
- * bpp_shift - (1 << bpp_shift) is the size of source pixel in bytes
- * prefetch_distance - prefetch in the source image by that many
- * pixels ahead
- */
-
-.macro generate_bilinear_scanline_func fname, src_fmt, dst_fmt, \
- bpp_shift, prefetch_distance
-
-pixman_asm_function fname
- OUT .req r0
- TOP .req r1
- BOTTOM .req r2
- WT .req r3
- WB .req r4
- X .req r5
- UX .req r6
- WIDTH .req ip
- TMP1 .req r3
- TMP2 .req r4
- PF_OFFS .req r7
- TMP3 .req r8
- TMP4 .req r9
-
- mov ip, sp
- push {r4, r5, r6, r7, r8, r9}
- mov PF_OFFS, #prefetch_distance
- ldmia ip, {WB, X, UX, WIDTH}
- mul PF_OFFS, PF_OFFS, UX
-
- cmp WIDTH, #0
- ble 3f
-
- vdup.u16 q12, X
- vdup.u16 q13, UX
- vdup.u8 d28, WT
- vdup.u8 d29, WB
- vadd.u16 d25, d25, d26
- vadd.u16 q13, q13, q13
-
- subs WIDTH, WIDTH, #4
- blt 1f
- mov PF_OFFS, PF_OFFS, asr #(16 - bpp_shift)
-0:
- bilinear_interpolate_four_pixels src_fmt, dst_fmt
- subs WIDTH, WIDTH, #4
- bge 0b
-1:
- tst WIDTH, #2
- beq 2f
- bilinear_interpolate_two_pixels src_fmt, dst_fmt
-2:
- tst WIDTH, #1
- beq 3f
- bilinear_interpolate_last_pixel src_fmt, dst_fmt
-3:
- pop {r4, r5, r6, r7, r8, r9}
- bx lr
-
- .unreq OUT
- .unreq TOP
- .unreq BOTTOM
- .unreq WT
- .unreq WB
- .unreq X
- .unreq UX
- .unreq WIDTH
- .unreq TMP1
- .unreq TMP2
- .unreq PF_OFFS
- .unreq TMP3
- .unreq TMP4
-.endfunc
-
-.endm
-
-generate_bilinear_scanline_func \
- pixman_scaled_bilinear_scanline_8888_8888_SRC_asm_neon, 8888, 8888, 2, 28
-
-generate_bilinear_scanline_func \
- pixman_scaled_bilinear_scanline_8888_0565_SRC_asm_neon, 8888, 0565, 2, 28
-
-generate_bilinear_scanline_func \
- pixman_scaled_bilinear_scanline_0565_x888_SRC_asm_neon, 0565, 8888, 1, 28
-
-generate_bilinear_scanline_func \
- pixman_scaled_bilinear_scanline_0565_0565_SRC_asm_neon, 0565, 0565, 1, 28
+/* + * Copyright © 2009 Nokia Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Author: Siarhei Siamashka (siarhei.siamashka@nokia.com) + */ + +/* + * This file contains implementations of NEON optimized pixel processing + * functions. There is no full and detailed tutorial, but some functions + * (those which are exposing some new or interesting features) are + * extensively commented and can be used as examples. + * + * You may want to have a look at the comments for following functions: + * - pixman_composite_over_8888_0565_asm_neon + * - pixman_composite_over_n_8_0565_asm_neon + */ + +/* Prevent the stack from becoming executable for no reason... */ +#if defined(__linux__) && defined(__ELF__) +.section .note.GNU-stack,"",%progbits +#endif + + .text + .fpu neon + .arch armv7a + .object_arch armv4 + .eabi_attribute 10, 0 /* suppress Tag_FP_arch */ + .eabi_attribute 12, 0 /* suppress Tag_Advanced_SIMD_arch */ + .arm + .altmacro + +#include "pixman-arm-neon-asm.h" + +/* Global configuration options and preferences */ + +/* + * The code can optionally make use of unaligned memory accesses to improve + * performance of handling leading/trailing pixels for each scanline. + * Configuration variable RESPECT_STRICT_ALIGNMENT can be set to 0 for + * example in linux if unaligned memory accesses are not configured to + * generate.exceptions. + */ +.set RESPECT_STRICT_ALIGNMENT, 1 + +/* + * Set default prefetch type. There is a choice between the following options: + * + * PREFETCH_TYPE_NONE (may be useful for the ARM cores where PLD is set to work + * as NOP to workaround some HW bugs or for whatever other reason) + * + * PREFETCH_TYPE_SIMPLE (may be useful for simple single-issue ARM cores where + * advanced prefetch intruduces heavy overhead) + * + * PREFETCH_TYPE_ADVANCED (useful for superscalar cores such as ARM Cortex-A8 + * which can run ARM and NEON instructions simultaneously so that extra ARM + * instructions do not add (many) extra cycles, but improve prefetch efficiency) + * + * Note: some types of function can't support advanced prefetch and fallback + * to simple one (those which handle 24bpp pixels) + */ +.set PREFETCH_TYPE_DEFAULT, PREFETCH_TYPE_ADVANCED + +/* Prefetch distance in pixels for simple prefetch */ +.set PREFETCH_DISTANCE_SIMPLE, 64 + +/* + * Implementation of pixman_composite_over_8888_0565_asm_neon + * + * This function takes a8r8g8b8 source buffer, r5g6b5 destination buffer and + * performs OVER compositing operation. Function fast_composite_over_8888_0565 + * from pixman-fast-path.c does the same in C and can be used as a reference. + * + * First we need to have some NEON assembly code which can do the actual + * operation on the pixels and provide it to the template macro. + * + * Template macro quite conveniently takes care of emitting all the necessary + * code for memory reading and writing (including quite tricky cases of + * handling unaligned leading/trailing pixels), so we only need to deal with + * the data in NEON registers. + * + * NEON registers allocation in general is recommented to be the following: + * d0, d1, d2, d3 - contain loaded source pixel data + * d4, d5, d6, d7 - contain loaded destination pixels (if they are needed) + * d24, d25, d26, d27 - contain loading mask pixel data (if mask is used) + * d28, d29, d30, d31 - place for storing the result (destination pixels) + * + * As can be seen above, four 64-bit NEON registers are used for keeping + * intermediate pixel data and up to 8 pixels can be processed in one step + * for 32bpp formats (16 pixels for 16bpp, 32 pixels for 8bpp). + * + * This particular function uses the following registers allocation: + * d0, d1, d2, d3 - contain loaded source pixel data + * d4, d5 - contain loaded destination pixels (they are needed) + * d28, d29 - place for storing the result (destination pixels) + */ + +/* + * Step one. We need to have some code to do some arithmetics on pixel data. + * This is implemented as a pair of macros: '*_head' and '*_tail'. When used + * back-to-back, they take pixel data from {d0, d1, d2, d3} and {d4, d5}, + * perform all the needed calculations and write the result to {d28, d29}. + * The rationale for having two macros and not just one will be explained + * later. In practice, any single monolitic function which does the work can + * be split into two parts in any arbitrary way without affecting correctness. + * + * There is one special trick here too. Common template macro can optionally + * make our life a bit easier by doing R, G, B, A color components + * deinterleaving for 32bpp pixel formats (and this feature is used in + * 'pixman_composite_over_8888_0565_asm_neon' function). So it means that + * instead of having 8 packed pixels in {d0, d1, d2, d3} registers, we + * actually use d0 register for blue channel (a vector of eight 8-bit + * values), d1 register for green, d2 for red and d3 for alpha. This + * simple conversion can be also done with a few NEON instructions: + * + * Packed to planar conversion: + * vuzp.8 d0, d1 + * vuzp.8 d2, d3 + * vuzp.8 d1, d3 + * vuzp.8 d0, d2 + * + * Planar to packed conversion: + * vzip.8 d0, d2 + * vzip.8 d1, d3 + * vzip.8 d2, d3 + * vzip.8 d0, d1 + * + * But pixel can be loaded directly in planar format using VLD4.8 NEON + * instruction. It is 1 cycle slower than VLD1.32, so this is not always + * desirable, that's why deinterleaving is optional. + * + * But anyway, here is the code: + */ +.macro pixman_composite_over_8888_0565_process_pixblock_head + /* convert 8 r5g6b5 pixel data from {d4, d5} to planar 8-bit format + and put data into d6 - red, d7 - green, d30 - blue */ + vshrn.u16 d6, q2, #8 + vshrn.u16 d7, q2, #3 + vsli.u16 q2, q2, #5 + vsri.u8 d6, d6, #5 + vmvn.8 d3, d3 /* invert source alpha */ + vsri.u8 d7, d7, #6 + vshrn.u16 d30, q2, #2 + /* now do alpha blending, storing results in 8-bit planar format + into d16 - red, d19 - green, d18 - blue */ + vmull.u8 q10, d3, d6 + vmull.u8 q11, d3, d7 + vmull.u8 q12, d3, d30 + vrshr.u16 q13, q10, #8 + vrshr.u16 q3, q11, #8 + vrshr.u16 q15, q12, #8 + vraddhn.u16 d20, q10, q13 + vraddhn.u16 d23, q11, q3 + vraddhn.u16 d22, q12, q15 +.endm + +.macro pixman_composite_over_8888_0565_process_pixblock_tail + /* ... continue alpha blending */ + vqadd.u8 d16, d2, d20 + vqadd.u8 q9, q0, q11 + /* convert the result to r5g6b5 and store it into {d28, d29} */ + vshll.u8 q14, d16, #8 + vshll.u8 q8, d19, #8 + vshll.u8 q9, d18, #8 + vsri.u16 q14, q8, #5 + vsri.u16 q14, q9, #11 +.endm + +/* + * OK, now we got almost everything that we need. Using the above two + * macros, the work can be done right. But now we want to optimize + * it a bit. ARM Cortex-A8 is an in-order core, and benefits really + * a lot from good code scheduling and software pipelining. + * + * Let's construct some code, which will run in the core main loop. + * Some pseudo-code of the main loop will look like this: + * head + * while (...) { + * tail + * head + * } + * tail + * + * It may look a bit weird, but this setup allows to hide instruction + * latencies better and also utilize dual-issue capability more + * efficiently (make pairs of load-store and ALU instructions). + * + * So what we need now is a '*_tail_head' macro, which will be used + * in the core main loop. A trivial straightforward implementation + * of this macro would look like this: + * + * pixman_composite_over_8888_0565_process_pixblock_tail + * vst1.16 {d28, d29}, [DST_W, :128]! + * vld1.16 {d4, d5}, [DST_R, :128]! + * vld4.32 {d0, d1, d2, d3}, [SRC]! + * pixman_composite_over_8888_0565_process_pixblock_head + * cache_preload 8, 8 + * + * Now it also got some VLD/VST instructions. We simply can't move from + * processing one block of pixels to the other one with just arithmetics. + * The previously processed data needs to be written to memory and new + * data needs to be fetched. Fortunately, this main loop does not deal + * with partial leading/trailing pixels and can load/store a full block + * of pixels in a bulk. Additionally, destination buffer is already + * 16 bytes aligned here (which is good for performance). + * + * New things here are DST_R, DST_W, SRC and MASK identifiers. These + * are the aliases for ARM registers which are used as pointers for + * accessing data. We maintain separate pointers for reading and writing + * destination buffer (DST_R and DST_W). + * + * Another new thing is 'cache_preload' macro. It is used for prefetching + * data into CPU L2 cache and improve performance when dealing with large + * images which are far larger than cache size. It uses one argument + * (actually two, but they need to be the same here) - number of pixels + * in a block. Looking into 'pixman-arm-neon-asm.h' can provide some + * details about this macro. Moreover, if good performance is needed + * the code from this macro needs to be copied into '*_tail_head' macro + * and mixed with the rest of code for optimal instructions scheduling. + * We are actually doing it below. + * + * Now after all the explanations, here is the optimized code. + * Different instruction streams (originaling from '*_head', '*_tail' + * and 'cache_preload' macro) use different indentation levels for + * better readability. Actually taking the code from one of these + * indentation levels and ignoring a few VLD/VST instructions would + * result in exactly the code from '*_head', '*_tail' or 'cache_preload' + * macro! + */ + +#if 1 + +.macro pixman_composite_over_8888_0565_process_pixblock_tail_head + vqadd.u8 d16, d2, d20 + vld1.16 {d4, d5}, [DST_R, :128]! + vqadd.u8 q9, q0, q11 + vshrn.u16 d6, q2, #8 + fetch_src_pixblock + vshrn.u16 d7, q2, #3 + vsli.u16 q2, q2, #5 + vshll.u8 q14, d16, #8 + PF add PF_X, PF_X, #8 + vshll.u8 q8, d19, #8 + PF tst PF_CTL, #0xF + vsri.u8 d6, d6, #5 + PF addne PF_X, PF_X, #8 + vmvn.8 d3, d3 + PF subne PF_CTL, PF_CTL, #1 + vsri.u8 d7, d7, #6 + vshrn.u16 d30, q2, #2 + vmull.u8 q10, d3, d6 + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + vmull.u8 q11, d3, d7 + vmull.u8 q12, d3, d30 + PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift] + vsri.u16 q14, q8, #5 + PF cmp PF_X, ORIG_W + vshll.u8 q9, d18, #8 + vrshr.u16 q13, q10, #8 + PF subge PF_X, PF_X, ORIG_W + vrshr.u16 q3, q11, #8 + vrshr.u16 q15, q12, #8 + PF subges PF_CTL, PF_CTL, #0x10 + vsri.u16 q14, q9, #11 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! + vraddhn.u16 d20, q10, q13 + vraddhn.u16 d23, q11, q3 + PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]! + vraddhn.u16 d22, q12, q15 + vst1.16 {d28, d29}, [DST_W, :128]! +.endm + +#else + +/* If we did not care much about the performance, we would just use this... */ +.macro pixman_composite_over_8888_0565_process_pixblock_tail_head + pixman_composite_over_8888_0565_process_pixblock_tail + vst1.16 {d28, d29}, [DST_W, :128]! + vld1.16 {d4, d5}, [DST_R, :128]! + fetch_src_pixblock + pixman_composite_over_8888_0565_process_pixblock_head + cache_preload 8, 8 +.endm + +#endif + +/* + * And now the final part. We are using 'generate_composite_function' macro + * to put all the stuff together. We are specifying the name of the function + * which we want to get, number of bits per pixel for the source, mask and + * destination (0 if unused, like mask in this case). Next come some bit + * flags: + * FLAG_DST_READWRITE - tells that the destination buffer is both read + * and written, for write-only buffer we would use + * FLAG_DST_WRITEONLY flag instead + * FLAG_DEINTERLEAVE_32BPP - tells that we prefer to work with planar data + * and separate color channels for 32bpp format. + * The next things are: + * - the number of pixels processed per iteration (8 in this case, because + * that's the maximum what can fit into four 64-bit NEON registers). + * - prefetch distance, measured in pixel blocks. In this case it is 5 times + * by 8 pixels. That would be 40 pixels, or up to 160 bytes. Optimal + * prefetch distance can be selected by running some benchmarks. + * + * After that we specify some macros, these are 'default_init', + * 'default_cleanup' here which are empty (but it is possible to have custom + * init/cleanup macros to be able to save/restore some extra NEON registers + * like d8-d15 or do anything else) followed by + * 'pixman_composite_over_8888_0565_process_pixblock_head', + * 'pixman_composite_over_8888_0565_process_pixblock_tail' and + * 'pixman_composite_over_8888_0565_process_pixblock_tail_head' + * which we got implemented above. + * + * The last part is the NEON registers allocation scheme. + */ +generate_composite_function \ + pixman_composite_over_8888_0565_asm_neon, 32, 0, 16, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_over_8888_0565_process_pixblock_head, \ + pixman_composite_over_8888_0565_process_pixblock_tail, \ + pixman_composite_over_8888_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 24 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_over_n_0565_process_pixblock_head + /* convert 8 r5g6b5 pixel data from {d4, d5} to planar 8-bit format + and put data into d6 - red, d7 - green, d30 - blue */ + vshrn.u16 d6, q2, #8 + vshrn.u16 d7, q2, #3 + vsli.u16 q2, q2, #5 + vsri.u8 d6, d6, #5 + vsri.u8 d7, d7, #6 + vshrn.u16 d30, q2, #2 + /* now do alpha blending, storing results in 8-bit planar format + into d16 - red, d19 - green, d18 - blue */ + vmull.u8 q10, d3, d6 + vmull.u8 q11, d3, d7 + vmull.u8 q12, d3, d30 + vrshr.u16 q13, q10, #8 + vrshr.u16 q3, q11, #8 + vrshr.u16 q15, q12, #8 + vraddhn.u16 d20, q10, q13 + vraddhn.u16 d23, q11, q3 + vraddhn.u16 d22, q12, q15 +.endm + +.macro pixman_composite_over_n_0565_process_pixblock_tail + /* ... continue alpha blending */ + vqadd.u8 d16, d2, d20 + vqadd.u8 q9, q0, q11 + /* convert the result to r5g6b5 and store it into {d28, d29} */ + vshll.u8 q14, d16, #8 + vshll.u8 q8, d19, #8 + vshll.u8 q9, d18, #8 + vsri.u16 q14, q8, #5 + vsri.u16 q14, q9, #11 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_n_0565_process_pixblock_tail_head + pixman_composite_over_n_0565_process_pixblock_tail + vld1.16 {d4, d5}, [DST_R, :128]! + vst1.16 {d28, d29}, [DST_W, :128]! + pixman_composite_over_n_0565_process_pixblock_head + cache_preload 8, 8 +.endm + +.macro pixman_composite_over_n_0565_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d3[0]}, [DUMMY] + vdup.8 d0, d3[0] + vdup.8 d1, d3[1] + vdup.8 d2, d3[2] + vdup.8 d3, d3[3] + vmvn.8 d3, d3 /* invert source alpha */ +.endm + +generate_composite_function \ + pixman_composite_over_n_0565_asm_neon, 0, 0, 16, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_n_0565_init, \ + default_cleanup, \ + pixman_composite_over_n_0565_process_pixblock_head, \ + pixman_composite_over_n_0565_process_pixblock_tail, \ + pixman_composite_over_n_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 24 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_8888_0565_process_pixblock_head + vshll.u8 q8, d1, #8 + vshll.u8 q14, d2, #8 + vshll.u8 q9, d0, #8 +.endm + +.macro pixman_composite_src_8888_0565_process_pixblock_tail + vsri.u16 q14, q8, #5 + vsri.u16 q14, q9, #11 +.endm + +.macro pixman_composite_src_8888_0565_process_pixblock_tail_head + vsri.u16 q14, q8, #5 + PF add PF_X, PF_X, #8 + PF tst PF_CTL, #0xF + fetch_src_pixblock + PF addne PF_X, PF_X, #8 + PF subne PF_CTL, PF_CTL, #1 + vsri.u16 q14, q9, #11 + PF cmp PF_X, ORIG_W + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + vshll.u8 q8, d1, #8 + vst1.16 {d28, d29}, [DST_W, :128]! + PF subge PF_X, PF_X, ORIG_W + PF subges PF_CTL, PF_CTL, #0x10 + vshll.u8 q14, d2, #8 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! + vshll.u8 q9, d0, #8 +.endm + +generate_composite_function \ + pixman_composite_src_8888_0565_asm_neon, 32, 0, 16, \ + FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_8888_0565_process_pixblock_head, \ + pixman_composite_src_8888_0565_process_pixblock_tail, \ + pixman_composite_src_8888_0565_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_src_0565_8888_process_pixblock_head + vshrn.u16 d30, q0, #8 + vshrn.u16 d29, q0, #3 + vsli.u16 q0, q0, #5 + vmov.u8 d31, #255 + vsri.u8 d30, d30, #5 + vsri.u8 d29, d29, #6 + vshrn.u16 d28, q0, #2 +.endm + +.macro pixman_composite_src_0565_8888_process_pixblock_tail +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_src_0565_8888_process_pixblock_tail_head + pixman_composite_src_0565_8888_process_pixblock_tail + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + fetch_src_pixblock + pixman_composite_src_0565_8888_process_pixblock_head + cache_preload 8, 8 +.endm + +generate_composite_function \ + pixman_composite_src_0565_8888_asm_neon, 16, 0, 32, \ + FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_0565_8888_process_pixblock_head, \ + pixman_composite_src_0565_8888_process_pixblock_tail, \ + pixman_composite_src_0565_8888_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_add_8_8_process_pixblock_head + vqadd.u8 q14, q0, q2 + vqadd.u8 q15, q1, q3 +.endm + +.macro pixman_composite_add_8_8_process_pixblock_tail +.endm + +.macro pixman_composite_add_8_8_process_pixblock_tail_head + fetch_src_pixblock + PF add PF_X, PF_X, #32 + PF tst PF_CTL, #0xF + vld1.8 {d4, d5, d6, d7}, [DST_R, :128]! + PF addne PF_X, PF_X, #32 + PF subne PF_CTL, PF_CTL, #1 + vst1.8 {d28, d29, d30, d31}, [DST_W, :128]! + PF cmp PF_X, ORIG_W + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift] + PF subge PF_X, PF_X, ORIG_W + PF subges PF_CTL, PF_CTL, #0x10 + vqadd.u8 q14, q0, q2 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! + PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]! + vqadd.u8 q15, q1, q3 +.endm + +generate_composite_function \ + pixman_composite_add_8_8_asm_neon, 8, 0, 8, \ + FLAG_DST_READWRITE, \ + 32, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_add_8_8_process_pixblock_head, \ + pixman_composite_add_8_8_process_pixblock_tail, \ + pixman_composite_add_8_8_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_add_8888_8888_process_pixblock_tail_head + fetch_src_pixblock + PF add PF_X, PF_X, #8 + PF tst PF_CTL, #0xF + vld1.32 {d4, d5, d6, d7}, [DST_R, :128]! + PF addne PF_X, PF_X, #8 + PF subne PF_CTL, PF_CTL, #1 + vst1.32 {d28, d29, d30, d31}, [DST_W, :128]! + PF cmp PF_X, ORIG_W + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift] + PF subge PF_X, PF_X, ORIG_W + PF subges PF_CTL, PF_CTL, #0x10 + vqadd.u8 q14, q0, q2 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! + PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]! + vqadd.u8 q15, q1, q3 +.endm + +generate_composite_function \ + pixman_composite_add_8888_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_add_8_8_process_pixblock_head, \ + pixman_composite_add_8_8_process_pixblock_tail, \ + pixman_composite_add_8888_8888_process_pixblock_tail_head + +generate_composite_function_single_scanline \ + pixman_composite_scanline_add_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_add_8_8_process_pixblock_head, \ + pixman_composite_add_8_8_process_pixblock_tail, \ + pixman_composite_add_8888_8888_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_out_reverse_8888_8888_process_pixblock_head + vmvn.8 d24, d3 /* get inverted alpha */ + /* do alpha blending */ + vmull.u8 q8, d24, d4 + vmull.u8 q9, d24, d5 + vmull.u8 q10, d24, d6 + vmull.u8 q11, d24, d7 +.endm + +.macro pixman_composite_out_reverse_8888_8888_process_pixblock_tail + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + vraddhn.u16 d30, q12, q10 + vraddhn.u16 d31, q13, q11 +.endm + +.macro pixman_composite_out_reverse_8888_8888_process_pixblock_tail_head + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + vrshr.u16 q14, q8, #8 + PF add PF_X, PF_X, #8 + PF tst PF_CTL, #0xF + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + PF addne PF_X, PF_X, #8 + PF subne PF_CTL, PF_CTL, #1 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + PF cmp PF_X, ORIG_W + vraddhn.u16 d30, q12, q10 + vraddhn.u16 d31, q13, q11 + fetch_src_pixblock + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + vmvn.8 d22, d3 + PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift] + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + PF subge PF_X, PF_X, ORIG_W + vmull.u8 q8, d22, d4 + PF subges PF_CTL, PF_CTL, #0x10 + vmull.u8 q9, d22, d5 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! + vmull.u8 q10, d22, d6 + PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]! + vmull.u8 q11, d22, d7 +.endm + +generate_composite_function_single_scanline \ + pixman_composite_scanline_out_reverse_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_out_reverse_8888_8888_process_pixblock_head, \ + pixman_composite_out_reverse_8888_8888_process_pixblock_tail, \ + pixman_composite_out_reverse_8888_8888_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_over_8888_8888_process_pixblock_head + pixman_composite_out_reverse_8888_8888_process_pixblock_head +.endm + +.macro pixman_composite_over_8888_8888_process_pixblock_tail + pixman_composite_out_reverse_8888_8888_process_pixblock_tail + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 +.endm + +.macro pixman_composite_over_8888_8888_process_pixblock_tail_head + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + vrshr.u16 q14, q8, #8 + PF add PF_X, PF_X, #8 + PF tst PF_CTL, #0xF + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + PF addne PF_X, PF_X, #8 + PF subne PF_CTL, PF_CTL, #1 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + PF cmp PF_X, ORIG_W + vraddhn.u16 d30, q12, q10 + vraddhn.u16 d31, q13, q11 + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 + fetch_src_pixblock + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + vmvn.8 d22, d3 + PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift] + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + PF subge PF_X, PF_X, ORIG_W + vmull.u8 q8, d22, d4 + PF subges PF_CTL, PF_CTL, #0x10 + vmull.u8 q9, d22, d5 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! + vmull.u8 q10, d22, d6 + PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]! + vmull.u8 q11, d22, d7 +.endm + +generate_composite_function \ + pixman_composite_over_8888_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_over_8888_8888_process_pixblock_head, \ + pixman_composite_over_8888_8888_process_pixblock_tail, \ + pixman_composite_over_8888_8888_process_pixblock_tail_head + +generate_composite_function_single_scanline \ + pixman_composite_scanline_over_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_over_8888_8888_process_pixblock_head, \ + pixman_composite_over_8888_8888_process_pixblock_tail, \ + pixman_composite_over_8888_8888_process_pixblock_tail_head + +/******************************************************************************/ + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_n_8888_process_pixblock_tail_head + pixman_composite_over_8888_8888_process_pixblock_tail + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + pixman_composite_over_8888_8888_process_pixblock_head + cache_preload 8, 8 +.endm + +.macro pixman_composite_over_n_8888_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d3[0]}, [DUMMY] + vdup.8 d0, d3[0] + vdup.8 d1, d3[1] + vdup.8 d2, d3[2] + vdup.8 d3, d3[3] +.endm + +generate_composite_function \ + pixman_composite_over_n_8888_asm_neon, 0, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_n_8888_init, \ + default_cleanup, \ + pixman_composite_over_8888_8888_process_pixblock_head, \ + pixman_composite_over_8888_8888_process_pixblock_tail, \ + pixman_composite_over_n_8888_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_over_reverse_n_8888_process_pixblock_tail_head + vrshr.u16 q14, q8, #8 + PF add PF_X, PF_X, #8 + PF tst PF_CTL, #0xF + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + PF addne PF_X, PF_X, #8 + PF subne PF_CTL, PF_CTL, #1 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + PF cmp PF_X, ORIG_W + vraddhn.u16 d30, q12, q10 + vraddhn.u16 d31, q13, q11 + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 + vld4.8 {d0, d1, d2, d3}, [DST_R, :128]! + vmvn.8 d22, d3 + PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift] + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + PF subge PF_X, PF_X, ORIG_W + vmull.u8 q8, d22, d4 + PF subges PF_CTL, PF_CTL, #0x10 + vmull.u8 q9, d22, d5 + vmull.u8 q10, d22, d6 + PF ldrgeb DUMMY, [PF_DST, DST_STRIDE, lsl #dst_bpp_shift]! + vmull.u8 q11, d22, d7 +.endm + +.macro pixman_composite_over_reverse_n_8888_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d7[0]}, [DUMMY] + vdup.8 d4, d7[0] + vdup.8 d5, d7[1] + vdup.8 d6, d7[2] + vdup.8 d7, d7[3] +.endm + +generate_composite_function \ + pixman_composite_over_reverse_n_8888_asm_neon, 0, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_reverse_n_8888_init, \ + default_cleanup, \ + pixman_composite_over_8888_8888_process_pixblock_head, \ + pixman_composite_over_8888_8888_process_pixblock_tail, \ + pixman_composite_over_reverse_n_8888_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 4, /* src_basereg */ \ + 24 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_over_8888_8_0565_process_pixblock_head + vmull.u8 q0, d24, d8 /* IN for SRC pixels (part1) */ + vmull.u8 q1, d24, d9 + vmull.u8 q6, d24, d10 + vmull.u8 q7, d24, d11 + vshrn.u16 d6, q2, #8 /* convert DST_R data to 32-bpp (part1) */ + vshrn.u16 d7, q2, #3 + vsli.u16 q2, q2, #5 + vrshr.u16 q8, q0, #8 /* IN for SRC pixels (part2) */ + vrshr.u16 q9, q1, #8 + vrshr.u16 q10, q6, #8 + vrshr.u16 q11, q7, #8 + vraddhn.u16 d0, q0, q8 + vraddhn.u16 d1, q1, q9 + vraddhn.u16 d2, q6, q10 + vraddhn.u16 d3, q7, q11 + vsri.u8 d6, d6, #5 /* convert DST_R data to 32-bpp (part2) */ + vsri.u8 d7, d7, #6 + vmvn.8 d3, d3 + vshrn.u16 d30, q2, #2 + vmull.u8 q8, d3, d6 /* now do alpha blending */ + vmull.u8 q9, d3, d7 + vmull.u8 q10, d3, d30 +.endm + +.macro pixman_composite_over_8888_8_0565_process_pixblock_tail + /* 3 cycle bubble (after vmull.u8) */ + vrshr.u16 q13, q8, #8 + vrshr.u16 q11, q9, #8 + vrshr.u16 q15, q10, #8 + vraddhn.u16 d16, q8, q13 + vraddhn.u16 d27, q9, q11 + vraddhn.u16 d26, q10, q15 + vqadd.u8 d16, d2, d16 + /* 1 cycle bubble */ + vqadd.u8 q9, q0, q13 + vshll.u8 q14, d16, #8 /* convert to 16bpp */ + vshll.u8 q8, d19, #8 + vshll.u8 q9, d18, #8 + vsri.u16 q14, q8, #5 + /* 1 cycle bubble */ + vsri.u16 q14, q9, #11 +.endm + +.macro pixman_composite_over_8888_8_0565_process_pixblock_tail_head + vld1.16 {d4, d5}, [DST_R, :128]! + vshrn.u16 d6, q2, #8 + fetch_mask_pixblock + vshrn.u16 d7, q2, #3 + fetch_src_pixblock + vmull.u8 q6, d24, d10 + vrshr.u16 q13, q8, #8 + vrshr.u16 q11, q9, #8 + vrshr.u16 q15, q10, #8 + vraddhn.u16 d16, q8, q13 + vraddhn.u16 d27, q9, q11 + vraddhn.u16 d26, q10, q15 + vqadd.u8 d16, d2, d16 + vmull.u8 q1, d24, d9 + vqadd.u8 q9, q0, q13 + vshll.u8 q14, d16, #8 + vmull.u8 q0, d24, d8 + vshll.u8 q8, d19, #8 + vshll.u8 q9, d18, #8 + vsri.u16 q14, q8, #5 + vmull.u8 q7, d24, d11 + vsri.u16 q14, q9, #11 + + cache_preload 8, 8 + + vsli.u16 q2, q2, #5 + vrshr.u16 q8, q0, #8 + vrshr.u16 q9, q1, #8 + vrshr.u16 q10, q6, #8 + vrshr.u16 q11, q7, #8 + vraddhn.u16 d0, q0, q8 + vraddhn.u16 d1, q1, q9 + vraddhn.u16 d2, q6, q10 + vraddhn.u16 d3, q7, q11 + vsri.u8 d6, d6, #5 + vsri.u8 d7, d7, #6 + vmvn.8 d3, d3 + vshrn.u16 d30, q2, #2 + vst1.16 {d28, d29}, [DST_W, :128]! + vmull.u8 q8, d3, d6 + vmull.u8 q9, d3, d7 + vmull.u8 q10, d3, d30 +.endm + +generate_composite_function \ + pixman_composite_over_8888_8_0565_asm_neon, 32, 8, 16, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_over_8888_8_0565_process_pixblock_head, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 8, /* src_basereg */ \ + 24 /* mask_basereg */ + +/******************************************************************************/ + +/* + * This function needs a special initialization of solid mask. + * Solid source pixel data is fetched from stack at ARGS_STACK_OFFSET + * offset, split into color components and replicated in d8-d11 + * registers. Additionally, this function needs all the NEON registers, + * so it has to save d8-d15 registers which are callee saved according + * to ABI. These registers are restored from 'cleanup' macro. All the + * other NEON registers are caller saved, so can be clobbered freely + * without introducing any problems. + */ +.macro pixman_composite_over_n_8_0565_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vpush {d8-d15} + vld1.32 {d11[0]}, [DUMMY] + vdup.8 d8, d11[0] + vdup.8 d9, d11[1] + vdup.8 d10, d11[2] + vdup.8 d11, d11[3] +.endm + +.macro pixman_composite_over_n_8_0565_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_over_n_8_0565_asm_neon, 0, 8, 16, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_n_8_0565_init, \ + pixman_composite_over_n_8_0565_cleanup, \ + pixman_composite_over_8888_8_0565_process_pixblock_head, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_over_8888_n_0565_init + add DUMMY, sp, #(ARGS_STACK_OFFSET + 8) + vpush {d8-d15} + vld1.32 {d24[0]}, [DUMMY] + vdup.8 d24, d24[3] +.endm + +.macro pixman_composite_over_8888_n_0565_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_over_8888_n_0565_asm_neon, 32, 0, 16, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_8888_n_0565_init, \ + pixman_composite_over_8888_n_0565_cleanup, \ + pixman_composite_over_8888_8_0565_process_pixblock_head, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 8, /* src_basereg */ \ + 24 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_0565_0565_process_pixblock_head +.endm + +.macro pixman_composite_src_0565_0565_process_pixblock_tail +.endm + +.macro pixman_composite_src_0565_0565_process_pixblock_tail_head + vst1.16 {d0, d1, d2, d3}, [DST_W, :128]! + fetch_src_pixblock + cache_preload 16, 16 +.endm + +generate_composite_function \ + pixman_composite_src_0565_0565_asm_neon, 16, 0, 16, \ + FLAG_DST_WRITEONLY, \ + 16, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_0565_0565_process_pixblock_head, \ + pixman_composite_src_0565_0565_process_pixblock_tail, \ + pixman_composite_src_0565_0565_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_n_8_process_pixblock_head +.endm + +.macro pixman_composite_src_n_8_process_pixblock_tail +.endm + +.macro pixman_composite_src_n_8_process_pixblock_tail_head + vst1.8 {d0, d1, d2, d3}, [DST_W, :128]! +.endm + +.macro pixman_composite_src_n_8_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d0[0]}, [DUMMY] + vsli.u64 d0, d0, #8 + vsli.u64 d0, d0, #16 + vsli.u64 d0, d0, #32 + vorr d1, d0, d0 + vorr q1, q0, q0 +.endm + +.macro pixman_composite_src_n_8_cleanup +.endm + +generate_composite_function \ + pixman_composite_src_n_8_asm_neon, 0, 0, 8, \ + FLAG_DST_WRITEONLY, \ + 32, /* number of pixels, processed in a single block */ \ + 0, /* prefetch distance */ \ + pixman_composite_src_n_8_init, \ + pixman_composite_src_n_8_cleanup, \ + pixman_composite_src_n_8_process_pixblock_head, \ + pixman_composite_src_n_8_process_pixblock_tail, \ + pixman_composite_src_n_8_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_n_0565_process_pixblock_head +.endm + +.macro pixman_composite_src_n_0565_process_pixblock_tail +.endm + +.macro pixman_composite_src_n_0565_process_pixblock_tail_head + vst1.16 {d0, d1, d2, d3}, [DST_W, :128]! +.endm + +.macro pixman_composite_src_n_0565_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d0[0]}, [DUMMY] + vsli.u64 d0, d0, #16 + vsli.u64 d0, d0, #32 + vorr d1, d0, d0 + vorr q1, q0, q0 +.endm + +.macro pixman_composite_src_n_0565_cleanup +.endm + +generate_composite_function \ + pixman_composite_src_n_0565_asm_neon, 0, 0, 16, \ + FLAG_DST_WRITEONLY, \ + 16, /* number of pixels, processed in a single block */ \ + 0, /* prefetch distance */ \ + pixman_composite_src_n_0565_init, \ + pixman_composite_src_n_0565_cleanup, \ + pixman_composite_src_n_0565_process_pixblock_head, \ + pixman_composite_src_n_0565_process_pixblock_tail, \ + pixman_composite_src_n_0565_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_n_8888_process_pixblock_head +.endm + +.macro pixman_composite_src_n_8888_process_pixblock_tail +.endm + +.macro pixman_composite_src_n_8888_process_pixblock_tail_head + vst1.32 {d0, d1, d2, d3}, [DST_W, :128]! +.endm + +.macro pixman_composite_src_n_8888_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d0[0]}, [DUMMY] + vsli.u64 d0, d0, #32 + vorr d1, d0, d0 + vorr q1, q0, q0 +.endm + +.macro pixman_composite_src_n_8888_cleanup +.endm + +generate_composite_function \ + pixman_composite_src_n_8888_asm_neon, 0, 0, 32, \ + FLAG_DST_WRITEONLY, \ + 8, /* number of pixels, processed in a single block */ \ + 0, /* prefetch distance */ \ + pixman_composite_src_n_8888_init, \ + pixman_composite_src_n_8888_cleanup, \ + pixman_composite_src_n_8888_process_pixblock_head, \ + pixman_composite_src_n_8888_process_pixblock_tail, \ + pixman_composite_src_n_8888_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_8888_8888_process_pixblock_head +.endm + +.macro pixman_composite_src_8888_8888_process_pixblock_tail +.endm + +.macro pixman_composite_src_8888_8888_process_pixblock_tail_head + vst1.32 {d0, d1, d2, d3}, [DST_W, :128]! + fetch_src_pixblock + cache_preload 8, 8 +.endm + +generate_composite_function \ + pixman_composite_src_8888_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_WRITEONLY, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_8888_8888_process_pixblock_head, \ + pixman_composite_src_8888_8888_process_pixblock_tail, \ + pixman_composite_src_8888_8888_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_x888_8888_process_pixblock_head + vorr q0, q0, q2 + vorr q1, q1, q2 +.endm + +.macro pixman_composite_src_x888_8888_process_pixblock_tail +.endm + +.macro pixman_composite_src_x888_8888_process_pixblock_tail_head + vst1.32 {d0, d1, d2, d3}, [DST_W, :128]! + fetch_src_pixblock + vorr q0, q0, q2 + vorr q1, q1, q2 + cache_preload 8, 8 +.endm + +.macro pixman_composite_src_x888_8888_init + vmov.u8 q2, #0xFF + vshl.u32 q2, q2, #24 +.endm + +generate_composite_function \ + pixman_composite_src_x888_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_WRITEONLY, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + pixman_composite_src_x888_8888_init, \ + default_cleanup, \ + pixman_composite_src_x888_8888_process_pixblock_head, \ + pixman_composite_src_x888_8888_process_pixblock_tail, \ + pixman_composite_src_x888_8888_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_over_n_8_8888_process_pixblock_head + /* expecting deinterleaved source data in {d8, d9, d10, d11} */ + /* d8 - blue, d9 - green, d10 - red, d11 - alpha */ + /* and destination data in {d4, d5, d6, d7} */ + /* mask is in d24 (d25, d26, d27 are unused) */ + + /* in */ + vmull.u8 q0, d24, d8 + vmull.u8 q1, d24, d9 + vmull.u8 q6, d24, d10 + vmull.u8 q7, d24, d11 + vrshr.u16 q10, q0, #8 + vrshr.u16 q11, q1, #8 + vrshr.u16 q12, q6, #8 + vrshr.u16 q13, q7, #8 + vraddhn.u16 d0, q0, q10 + vraddhn.u16 d1, q1, q11 + vraddhn.u16 d2, q6, q12 + vraddhn.u16 d3, q7, q13 + vmvn.8 d24, d3 /* get inverted alpha */ + /* source: d0 - blue, d1 - green, d2 - red, d3 - alpha */ + /* destination: d4 - blue, d5 - green, d6 - red, d7 - alpha */ + /* now do alpha blending */ + vmull.u8 q8, d24, d4 + vmull.u8 q9, d24, d5 + vmull.u8 q10, d24, d6 + vmull.u8 q11, d24, d7 +.endm + +.macro pixman_composite_over_n_8_8888_process_pixblock_tail + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + vraddhn.u16 d30, q12, q10 + vraddhn.u16 d31, q13, q11 + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_n_8_8888_process_pixblock_tail_head + pixman_composite_over_n_8_8888_process_pixblock_tail + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + fetch_mask_pixblock + cache_preload 8, 8 + pixman_composite_over_n_8_8888_process_pixblock_head +.endm + +.macro pixman_composite_over_n_8_8888_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vpush {d8-d15} + vld1.32 {d11[0]}, [DUMMY] + vdup.8 d8, d11[0] + vdup.8 d9, d11[1] + vdup.8 d10, d11[2] + vdup.8 d11, d11[3] +.endm + +.macro pixman_composite_over_n_8_8888_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_over_n_8_8888_asm_neon, 0, 8, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_n_8_8888_init, \ + pixman_composite_over_n_8_8888_cleanup, \ + pixman_composite_over_n_8_8888_process_pixblock_head, \ + pixman_composite_over_n_8_8888_process_pixblock_tail, \ + pixman_composite_over_n_8_8888_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_over_n_8_8_process_pixblock_head + vmull.u8 q0, d24, d8 + vmull.u8 q1, d25, d8 + vmull.u8 q6, d26, d8 + vmull.u8 q7, d27, d8 + vrshr.u16 q10, q0, #8 + vrshr.u16 q11, q1, #8 + vrshr.u16 q12, q6, #8 + vrshr.u16 q13, q7, #8 + vraddhn.u16 d0, q0, q10 + vraddhn.u16 d1, q1, q11 + vraddhn.u16 d2, q6, q12 + vraddhn.u16 d3, q7, q13 + vmvn.8 q12, q0 + vmvn.8 q13, q1 + vmull.u8 q8, d24, d4 + vmull.u8 q9, d25, d5 + vmull.u8 q10, d26, d6 + vmull.u8 q11, d27, d7 +.endm + +.macro pixman_composite_over_n_8_8_process_pixblock_tail + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + vraddhn.u16 d30, q12, q10 + vraddhn.u16 d31, q13, q11 + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_n_8_8_process_pixblock_tail_head + vld1.8 {d4, d5, d6, d7}, [DST_R, :128]! + pixman_composite_over_n_8_8_process_pixblock_tail + fetch_mask_pixblock + cache_preload 32, 32 + vst1.8 {d28, d29, d30, d31}, [DST_W, :128]! + pixman_composite_over_n_8_8_process_pixblock_head +.endm + +.macro pixman_composite_over_n_8_8_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vpush {d8-d15} + vld1.32 {d8[0]}, [DUMMY] + vdup.8 d8, d8[3] +.endm + +.macro pixman_composite_over_n_8_8_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_over_n_8_8_asm_neon, 0, 8, 8, \ + FLAG_DST_READWRITE, \ + 32, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_n_8_8_init, \ + pixman_composite_over_n_8_8_cleanup, \ + pixman_composite_over_n_8_8_process_pixblock_head, \ + pixman_composite_over_n_8_8_process_pixblock_tail, \ + pixman_composite_over_n_8_8_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_over_n_8888_8888_ca_process_pixblock_head + /* + * 'combine_mask_ca' replacement + * + * input: solid src (n) in {d8, d9, d10, d11} + * dest in {d4, d5, d6, d7 } + * mask in {d24, d25, d26, d27} + * output: updated src in {d0, d1, d2, d3 } + * updated mask in {d24, d25, d26, d3 } + */ + vmull.u8 q0, d24, d8 + vmull.u8 q1, d25, d9 + vmull.u8 q6, d26, d10 + vmull.u8 q7, d27, d11 + vmull.u8 q9, d11, d25 + vmull.u8 q12, d11, d24 + vmull.u8 q13, d11, d26 + vrshr.u16 q8, q0, #8 + vrshr.u16 q10, q1, #8 + vrshr.u16 q11, q6, #8 + vraddhn.u16 d0, q0, q8 + vraddhn.u16 d1, q1, q10 + vraddhn.u16 d2, q6, q11 + vrshr.u16 q11, q12, #8 + vrshr.u16 q8, q9, #8 + vrshr.u16 q6, q13, #8 + vrshr.u16 q10, q7, #8 + vraddhn.u16 d24, q12, q11 + vraddhn.u16 d25, q9, q8 + vraddhn.u16 d26, q13, q6 + vraddhn.u16 d3, q7, q10 + /* + * 'combine_over_ca' replacement + * + * output: updated dest in {d28, d29, d30, d31} + */ + vmvn.8 q12, q12 + vmvn.8 d26, d26 + vmull.u8 q8, d24, d4 + vmull.u8 q9, d25, d5 + vmvn.8 d27, d3 + vmull.u8 q10, d26, d6 + vmull.u8 q11, d27, d7 +.endm + +.macro pixman_composite_over_n_8888_8888_ca_process_pixblock_tail + /* ... continue 'combine_over_ca' replacement */ + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q6, q10, #8 + vrshr.u16 q7, q11, #8 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + vraddhn.u16 d30, q6, q10 + vraddhn.u16 d31, q7, q11 + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 +.endm + +.macro pixman_composite_over_n_8888_8888_ca_process_pixblock_tail_head + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + vrshr.u16 q6, q10, #8 + vrshr.u16 q7, q11, #8 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + vraddhn.u16 d30, q6, q10 + vraddhn.u16 d31, q7, q11 + fetch_mask_pixblock + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 + cache_preload 8, 8 + pixman_composite_over_n_8888_8888_ca_process_pixblock_head + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! +.endm + +.macro pixman_composite_over_n_8888_8888_ca_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vpush {d8-d15} + vld1.32 {d11[0]}, [DUMMY] + vdup.8 d8, d11[0] + vdup.8 d9, d11[1] + vdup.8 d10, d11[2] + vdup.8 d11, d11[3] +.endm + +.macro pixman_composite_over_n_8888_8888_ca_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_over_n_8888_8888_ca_asm_neon, 0, 32, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_n_8888_8888_ca_init, \ + pixman_composite_over_n_8888_8888_ca_cleanup, \ + pixman_composite_over_n_8888_8888_ca_process_pixblock_head, \ + pixman_composite_over_n_8888_8888_ca_process_pixblock_tail, \ + pixman_composite_over_n_8888_8888_ca_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_in_n_8_process_pixblock_head + /* expecting source data in {d0, d1, d2, d3} */ + /* and destination data in {d4, d5, d6, d7} */ + vmull.u8 q8, d4, d3 + vmull.u8 q9, d5, d3 + vmull.u8 q10, d6, d3 + vmull.u8 q11, d7, d3 +.endm + +.macro pixman_composite_in_n_8_process_pixblock_tail + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + vraddhn.u16 d28, q8, q14 + vraddhn.u16 d29, q9, q15 + vraddhn.u16 d30, q10, q12 + vraddhn.u16 d31, q11, q13 +.endm + +.macro pixman_composite_in_n_8_process_pixblock_tail_head + pixman_composite_in_n_8_process_pixblock_tail + vld1.8 {d4, d5, d6, d7}, [DST_R, :128]! + cache_preload 32, 32 + pixman_composite_in_n_8_process_pixblock_head + vst1.8 {d28, d29, d30, d31}, [DST_W, :128]! +.endm + +.macro pixman_composite_in_n_8_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d3[0]}, [DUMMY] + vdup.8 d3, d3[3] +.endm + +.macro pixman_composite_in_n_8_cleanup +.endm + +generate_composite_function \ + pixman_composite_in_n_8_asm_neon, 0, 0, 8, \ + FLAG_DST_READWRITE, \ + 32, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_in_n_8_init, \ + pixman_composite_in_n_8_cleanup, \ + pixman_composite_in_n_8_process_pixblock_head, \ + pixman_composite_in_n_8_process_pixblock_tail, \ + pixman_composite_in_n_8_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 24 /* mask_basereg */ + +.macro pixman_composite_add_n_8_8_process_pixblock_head + /* expecting source data in {d8, d9, d10, d11} */ + /* d8 - blue, d9 - green, d10 - red, d11 - alpha */ + /* and destination data in {d4, d5, d6, d7} */ + /* mask is in d24, d25, d26, d27 */ + vmull.u8 q0, d24, d11 + vmull.u8 q1, d25, d11 + vmull.u8 q6, d26, d11 + vmull.u8 q7, d27, d11 + vrshr.u16 q10, q0, #8 + vrshr.u16 q11, q1, #8 + vrshr.u16 q12, q6, #8 + vrshr.u16 q13, q7, #8 + vraddhn.u16 d0, q0, q10 + vraddhn.u16 d1, q1, q11 + vraddhn.u16 d2, q6, q12 + vraddhn.u16 d3, q7, q13 + vqadd.u8 q14, q0, q2 + vqadd.u8 q15, q1, q3 +.endm + +.macro pixman_composite_add_n_8_8_process_pixblock_tail +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_add_n_8_8_process_pixblock_tail_head + pixman_composite_add_n_8_8_process_pixblock_tail + vst1.8 {d28, d29, d30, d31}, [DST_W, :128]! + vld1.8 {d4, d5, d6, d7}, [DST_R, :128]! + fetch_mask_pixblock + cache_preload 32, 32 + pixman_composite_add_n_8_8_process_pixblock_head +.endm + +.macro pixman_composite_add_n_8_8_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vpush {d8-d15} + vld1.32 {d11[0]}, [DUMMY] + vdup.8 d11, d11[3] +.endm + +.macro pixman_composite_add_n_8_8_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_add_n_8_8_asm_neon, 0, 8, 8, \ + FLAG_DST_READWRITE, \ + 32, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_add_n_8_8_init, \ + pixman_composite_add_n_8_8_cleanup, \ + pixman_composite_add_n_8_8_process_pixblock_head, \ + pixman_composite_add_n_8_8_process_pixblock_tail, \ + pixman_composite_add_n_8_8_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_add_8_8_8_process_pixblock_head + /* expecting source data in {d0, d1, d2, d3} */ + /* destination data in {d4, d5, d6, d7} */ + /* mask in {d24, d25, d26, d27} */ + vmull.u8 q8, d24, d0 + vmull.u8 q9, d25, d1 + vmull.u8 q10, d26, d2 + vmull.u8 q11, d27, d3 + vrshr.u16 q0, q8, #8 + vrshr.u16 q1, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + vraddhn.u16 d0, q0, q8 + vraddhn.u16 d1, q1, q9 + vraddhn.u16 d2, q12, q10 + vraddhn.u16 d3, q13, q11 + vqadd.u8 q14, q0, q2 + vqadd.u8 q15, q1, q3 +.endm + +.macro pixman_composite_add_8_8_8_process_pixblock_tail +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_add_8_8_8_process_pixblock_tail_head + pixman_composite_add_8_8_8_process_pixblock_tail + vst1.8 {d28, d29, d30, d31}, [DST_W, :128]! + vld1.8 {d4, d5, d6, d7}, [DST_R, :128]! + fetch_mask_pixblock + fetch_src_pixblock + cache_preload 32, 32 + pixman_composite_add_8_8_8_process_pixblock_head +.endm + +.macro pixman_composite_add_8_8_8_init +.endm + +.macro pixman_composite_add_8_8_8_cleanup +.endm + +generate_composite_function \ + pixman_composite_add_8_8_8_asm_neon, 8, 8, 8, \ + FLAG_DST_READWRITE, \ + 32, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_add_8_8_8_init, \ + pixman_composite_add_8_8_8_cleanup, \ + pixman_composite_add_8_8_8_process_pixblock_head, \ + pixman_composite_add_8_8_8_process_pixblock_tail, \ + pixman_composite_add_8_8_8_process_pixblock_tail_head + +/******************************************************************************/ + +.macro pixman_composite_add_8888_8888_8888_process_pixblock_head + /* expecting source data in {d0, d1, d2, d3} */ + /* destination data in {d4, d5, d6, d7} */ + /* mask in {d24, d25, d26, d27} */ + vmull.u8 q8, d27, d0 + vmull.u8 q9, d27, d1 + vmull.u8 q10, d27, d2 + vmull.u8 q11, d27, d3 + /* 1 cycle bubble */ + vrsra.u16 q8, q8, #8 + vrsra.u16 q9, q9, #8 + vrsra.u16 q10, q10, #8 + vrsra.u16 q11, q11, #8 +.endm + +.macro pixman_composite_add_8888_8888_8888_process_pixblock_tail + /* 2 cycle bubble */ + vrshrn.u16 d28, q8, #8 + vrshrn.u16 d29, q9, #8 + vrshrn.u16 d30, q10, #8 + vrshrn.u16 d31, q11, #8 + vqadd.u8 q14, q2, q14 + /* 1 cycle bubble */ + vqadd.u8 q15, q3, q15 +.endm + +.macro pixman_composite_add_8888_8888_8888_process_pixblock_tail_head + fetch_src_pixblock + vrshrn.u16 d28, q8, #8 + fetch_mask_pixblock + vrshrn.u16 d29, q9, #8 + vmull.u8 q8, d27, d0 + vrshrn.u16 d30, q10, #8 + vmull.u8 q9, d27, d1 + vrshrn.u16 d31, q11, #8 + vmull.u8 q10, d27, d2 + vqadd.u8 q14, q2, q14 + vmull.u8 q11, d27, d3 + vqadd.u8 q15, q3, q15 + vrsra.u16 q8, q8, #8 + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + vrsra.u16 q9, q9, #8 + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + vrsra.u16 q10, q10, #8 + + cache_preload 8, 8 + + vrsra.u16 q11, q11, #8 +.endm + +generate_composite_function \ + pixman_composite_add_8888_8888_8888_asm_neon, 32, 32, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_add_8888_8888_8888_process_pixblock_head, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail_head + +generate_composite_function_single_scanline \ + pixman_composite_scanline_add_mask_asm_neon, 32, 32, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_add_8888_8888_8888_process_pixblock_head, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail_head + +/******************************************************************************/ + +generate_composite_function \ + pixman_composite_add_8888_8_8888_asm_neon, 32, 8, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_add_8888_8888_8888_process_pixblock_head, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 27 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_add_n_8_8888_init + add DUMMY, sp, #ARGS_STACK_OFFSET + vld1.32 {d3[0]}, [DUMMY] + vdup.8 d0, d3[0] + vdup.8 d1, d3[1] + vdup.8 d2, d3[2] + vdup.8 d3, d3[3] +.endm + +.macro pixman_composite_add_n_8_8888_cleanup +.endm + +generate_composite_function \ + pixman_composite_add_n_8_8888_asm_neon, 0, 8, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_add_n_8_8888_init, \ + pixman_composite_add_n_8_8888_cleanup, \ + pixman_composite_add_8888_8888_8888_process_pixblock_head, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 27 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_add_8888_n_8888_init + add DUMMY, sp, #(ARGS_STACK_OFFSET + 8) + vld1.32 {d27[0]}, [DUMMY] + vdup.8 d27, d27[3] +.endm + +.macro pixman_composite_add_8888_n_8888_cleanup +.endm + +generate_composite_function \ + pixman_composite_add_8888_n_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_add_8888_n_8888_init, \ + pixman_composite_add_8888_n_8888_cleanup, \ + pixman_composite_add_8888_8888_8888_process_pixblock_head, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail, \ + pixman_composite_add_8888_8888_8888_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 27 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_out_reverse_8888_n_8888_process_pixblock_head + /* expecting source data in {d0, d1, d2, d3} */ + /* destination data in {d4, d5, d6, d7} */ + /* solid mask is in d15 */ + + /* 'in' */ + vmull.u8 q8, d15, d3 + vmull.u8 q6, d15, d2 + vmull.u8 q5, d15, d1 + vmull.u8 q4, d15, d0 + vrshr.u16 q13, q8, #8 + vrshr.u16 q12, q6, #8 + vrshr.u16 q11, q5, #8 + vrshr.u16 q10, q4, #8 + vraddhn.u16 d3, q8, q13 + vraddhn.u16 d2, q6, q12 + vraddhn.u16 d1, q5, q11 + vraddhn.u16 d0, q4, q10 + vmvn.8 d24, d3 /* get inverted alpha */ + /* now do alpha blending */ + vmull.u8 q8, d24, d4 + vmull.u8 q9, d24, d5 + vmull.u8 q10, d24, d6 + vmull.u8 q11, d24, d7 +.endm + +.macro pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vrshr.u16 q13, q11, #8 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + vraddhn.u16 d30, q12, q10 + vraddhn.u16 d31, q13, q11 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_out_reverse_8888_8888_8888_process_pixblock_tail_head + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail + fetch_src_pixblock + cache_preload 8, 8 + fetch_mask_pixblock + pixman_composite_out_reverse_8888_n_8888_process_pixblock_head + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! +.endm + +generate_composite_function_single_scanline \ + pixman_composite_scanline_out_reverse_mask_asm_neon, 32, 32, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_out_reverse_8888_n_8888_process_pixblock_head, \ + pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail, \ + pixman_composite_out_reverse_8888_8888_8888_process_pixblock_tail_head \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 12 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_over_8888_n_8888_process_pixblock_head + pixman_composite_out_reverse_8888_n_8888_process_pixblock_head +.endm + +.macro pixman_composite_over_8888_n_8888_process_pixblock_tail + pixman_composite_out_reverse_8888_n_8888_process_pixblock_tail + vqadd.u8 q14, q0, q14 + vqadd.u8 q15, q1, q15 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_8888_n_8888_process_pixblock_tail_head + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + pixman_composite_over_8888_n_8888_process_pixblock_tail + fetch_src_pixblock + cache_preload 8, 8 + pixman_composite_over_8888_n_8888_process_pixblock_head + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! +.endm + +.macro pixman_composite_over_8888_n_8888_init + add DUMMY, sp, #48 + vpush {d8-d15} + vld1.32 {d15[0]}, [DUMMY] + vdup.8 d15, d15[3] +.endm + +.macro pixman_composite_over_8888_n_8888_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_over_8888_n_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_8888_n_8888_init, \ + pixman_composite_over_8888_n_8888_cleanup, \ + pixman_composite_over_8888_n_8888_process_pixblock_head, \ + pixman_composite_over_8888_n_8888_process_pixblock_tail, \ + pixman_composite_over_8888_n_8888_process_pixblock_tail_head + +/******************************************************************************/ + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_8888_8888_8888_process_pixblock_tail_head + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + pixman_composite_over_8888_n_8888_process_pixblock_tail + fetch_src_pixblock + cache_preload 8, 8 + fetch_mask_pixblock + pixman_composite_over_8888_n_8888_process_pixblock_head + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! +.endm + +generate_composite_function \ + pixman_composite_over_8888_8888_8888_asm_neon, 32, 32, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_over_8888_n_8888_process_pixblock_head, \ + pixman_composite_over_8888_n_8888_process_pixblock_tail, \ + pixman_composite_over_8888_8888_8888_process_pixblock_tail_head \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 12 /* mask_basereg */ + +generate_composite_function_single_scanline \ + pixman_composite_scanline_over_mask_asm_neon, 32, 32, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_over_8888_n_8888_process_pixblock_head, \ + pixman_composite_over_8888_n_8888_process_pixblock_tail, \ + pixman_composite_over_8888_8888_8888_process_pixblock_tail_head \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 12 /* mask_basereg */ + +/******************************************************************************/ + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_8888_8_8888_process_pixblock_tail_head + vld4.8 {d4, d5, d6, d7}, [DST_R, :128]! + pixman_composite_over_8888_n_8888_process_pixblock_tail + fetch_src_pixblock + cache_preload 8, 8 + fetch_mask_pixblock + pixman_composite_over_8888_n_8888_process_pixblock_head + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! +.endm + +generate_composite_function \ + pixman_composite_over_8888_8_8888_asm_neon, 32, 8, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_over_8888_n_8888_process_pixblock_head, \ + pixman_composite_over_8888_n_8888_process_pixblock_tail, \ + pixman_composite_over_8888_8_8888_process_pixblock_tail_head \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 15 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_0888_0888_process_pixblock_head +.endm + +.macro pixman_composite_src_0888_0888_process_pixblock_tail +.endm + +.macro pixman_composite_src_0888_0888_process_pixblock_tail_head + vst3.8 {d0, d1, d2}, [DST_W]! + fetch_src_pixblock + cache_preload 8, 8 +.endm + +generate_composite_function \ + pixman_composite_src_0888_0888_asm_neon, 24, 0, 24, \ + FLAG_DST_WRITEONLY, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_0888_0888_process_pixblock_head, \ + pixman_composite_src_0888_0888_process_pixblock_tail, \ + pixman_composite_src_0888_0888_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_0888_8888_rev_process_pixblock_head + vswp d0, d2 +.endm + +.macro pixman_composite_src_0888_8888_rev_process_pixblock_tail +.endm + +.macro pixman_composite_src_0888_8888_rev_process_pixblock_tail_head + vst4.8 {d0, d1, d2, d3}, [DST_W]! + fetch_src_pixblock + vswp d0, d2 + cache_preload 8, 8 +.endm + +.macro pixman_composite_src_0888_8888_rev_init + veor d3, d3, d3 +.endm + +generate_composite_function \ + pixman_composite_src_0888_8888_rev_asm_neon, 24, 0, 32, \ + FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + pixman_composite_src_0888_8888_rev_init, \ + default_cleanup, \ + pixman_composite_src_0888_8888_rev_process_pixblock_head, \ + pixman_composite_src_0888_8888_rev_process_pixblock_tail, \ + pixman_composite_src_0888_8888_rev_process_pixblock_tail_head, \ + 0, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_0888_0565_rev_process_pixblock_head + vshll.u8 q8, d1, #8 + vshll.u8 q9, d2, #8 +.endm + +.macro pixman_composite_src_0888_0565_rev_process_pixblock_tail + vshll.u8 q14, d0, #8 + vsri.u16 q14, q8, #5 + vsri.u16 q14, q9, #11 +.endm + +.macro pixman_composite_src_0888_0565_rev_process_pixblock_tail_head + vshll.u8 q14, d0, #8 + fetch_src_pixblock + vsri.u16 q14, q8, #5 + vsri.u16 q14, q9, #11 + vshll.u8 q8, d1, #8 + vst1.16 {d28, d29}, [DST_W, :128]! + vshll.u8 q9, d2, #8 +.endm + +generate_composite_function \ + pixman_composite_src_0888_0565_rev_asm_neon, 24, 0, 16, \ + FLAG_DST_WRITEONLY, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_0888_0565_rev_process_pixblock_head, \ + pixman_composite_src_0888_0565_rev_process_pixblock_tail, \ + pixman_composite_src_0888_0565_rev_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_pixbuf_8888_process_pixblock_head + vmull.u8 q8, d3, d0 + vmull.u8 q9, d3, d1 + vmull.u8 q10, d3, d2 +.endm + +.macro pixman_composite_src_pixbuf_8888_process_pixblock_tail + vrshr.u16 q11, q8, #8 + vswp d3, d31 + vrshr.u16 q12, q9, #8 + vrshr.u16 q13, q10, #8 + vraddhn.u16 d30, q11, q8 + vraddhn.u16 d29, q12, q9 + vraddhn.u16 d28, q13, q10 +.endm + +.macro pixman_composite_src_pixbuf_8888_process_pixblock_tail_head + vrshr.u16 q11, q8, #8 + vswp d3, d31 + vrshr.u16 q12, q9, #8 + vrshr.u16 q13, q10, #8 + fetch_src_pixblock + vraddhn.u16 d30, q11, q8 + PF add PF_X, PF_X, #8 + PF tst PF_CTL, #0xF + PF addne PF_X, PF_X, #8 + PF subne PF_CTL, PF_CTL, #1 + vraddhn.u16 d29, q12, q9 + vraddhn.u16 d28, q13, q10 + vmull.u8 q8, d3, d0 + vmull.u8 q9, d3, d1 + vmull.u8 q10, d3, d2 + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + PF cmp PF_X, ORIG_W + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + PF subge PF_X, PF_X, ORIG_W + PF subges PF_CTL, PF_CTL, #0x10 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! +.endm + +generate_composite_function \ + pixman_composite_src_pixbuf_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_pixbuf_8888_process_pixblock_head, \ + pixman_composite_src_pixbuf_8888_process_pixblock_tail, \ + pixman_composite_src_pixbuf_8888_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_src_rpixbuf_8888_process_pixblock_head + vmull.u8 q8, d3, d0 + vmull.u8 q9, d3, d1 + vmull.u8 q10, d3, d2 +.endm + +.macro pixman_composite_src_rpixbuf_8888_process_pixblock_tail + vrshr.u16 q11, q8, #8 + vswp d3, d31 + vrshr.u16 q12, q9, #8 + vrshr.u16 q13, q10, #8 + vraddhn.u16 d28, q11, q8 + vraddhn.u16 d29, q12, q9 + vraddhn.u16 d30, q13, q10 +.endm + +.macro pixman_composite_src_rpixbuf_8888_process_pixblock_tail_head + vrshr.u16 q11, q8, #8 + vswp d3, d31 + vrshr.u16 q12, q9, #8 + vrshr.u16 q13, q10, #8 + fetch_src_pixblock + vraddhn.u16 d28, q11, q8 + PF add PF_X, PF_X, #8 + PF tst PF_CTL, #0xF + PF addne PF_X, PF_X, #8 + PF subne PF_CTL, PF_CTL, #1 + vraddhn.u16 d29, q12, q9 + vraddhn.u16 d30, q13, q10 + vmull.u8 q8, d3, d0 + vmull.u8 q9, d3, d1 + vmull.u8 q10, d3, d2 + vst4.8 {d28, d29, d30, d31}, [DST_W, :128]! + PF cmp PF_X, ORIG_W + PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift] + PF subge PF_X, PF_X, ORIG_W + PF subges PF_CTL, PF_CTL, #0x10 + PF ldrgeb DUMMY, [PF_SRC, SRC_STRIDE, lsl #src_bpp_shift]! +.endm + +generate_composite_function \ + pixman_composite_src_rpixbuf_8888_asm_neon, 32, 0, 32, \ + FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + 10, /* prefetch distance */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_rpixbuf_8888_process_pixblock_head, \ + pixman_composite_src_rpixbuf_8888_process_pixblock_tail, \ + pixman_composite_src_rpixbuf_8888_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 0, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_over_0565_8_0565_process_pixblock_head + /* mask is in d15 */ + convert_0565_to_x888 q4, d2, d1, d0 + convert_0565_to_x888 q5, d6, d5, d4 + /* source pixel data is in {d0, d1, d2, XX} */ + /* destination pixel data is in {d4, d5, d6, XX} */ + vmvn.8 d7, d15 + vmull.u8 q6, d15, d2 + vmull.u8 q5, d15, d1 + vmull.u8 q4, d15, d0 + vmull.u8 q8, d7, d4 + vmull.u8 q9, d7, d5 + vmull.u8 q13, d7, d6 + vrshr.u16 q12, q6, #8 + vrshr.u16 q11, q5, #8 + vrshr.u16 q10, q4, #8 + vraddhn.u16 d2, q6, q12 + vraddhn.u16 d1, q5, q11 + vraddhn.u16 d0, q4, q10 +.endm + +.macro pixman_composite_over_0565_8_0565_process_pixblock_tail + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q13, #8 + vraddhn.u16 d28, q14, q8 + vraddhn.u16 d29, q15, q9 + vraddhn.u16 d30, q12, q13 + vqadd.u8 q0, q0, q14 + vqadd.u8 q1, q1, q15 + /* 32bpp result is in {d0, d1, d2, XX} */ + convert_8888_to_0565 d2, d1, d0, q14, q15, q3 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_over_0565_8_0565_process_pixblock_tail_head + fetch_mask_pixblock + pixman_composite_over_0565_8_0565_process_pixblock_tail + fetch_src_pixblock + vld1.16 {d10, d11}, [DST_R, :128]! + cache_preload 8, 8 + pixman_composite_over_0565_8_0565_process_pixblock_head + vst1.16 {d28, d29}, [DST_W, :128]! +.endm + +generate_composite_function \ + pixman_composite_over_0565_8_0565_asm_neon, 16, 8, 16, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_over_0565_8_0565_process_pixblock_head, \ + pixman_composite_over_0565_8_0565_process_pixblock_tail, \ + pixman_composite_over_0565_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 10, /* dst_r_basereg */ \ + 8, /* src_basereg */ \ + 15 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_over_0565_n_0565_init + add DUMMY, sp, #(ARGS_STACK_OFFSET + 8) + vpush {d8-d15} + vld1.32 {d15[0]}, [DUMMY] + vdup.8 d15, d15[3] +.endm + +.macro pixman_composite_over_0565_n_0565_cleanup + vpop {d8-d15} +.endm + +generate_composite_function \ + pixman_composite_over_0565_n_0565_asm_neon, 16, 0, 16, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + pixman_composite_over_0565_n_0565_init, \ + pixman_composite_over_0565_n_0565_cleanup, \ + pixman_composite_over_0565_8_0565_process_pixblock_head, \ + pixman_composite_over_0565_8_0565_process_pixblock_tail, \ + pixman_composite_over_0565_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 10, /* dst_r_basereg */ \ + 8, /* src_basereg */ \ + 15 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_add_0565_8_0565_process_pixblock_head + /* mask is in d15 */ + convert_0565_to_x888 q4, d2, d1, d0 + convert_0565_to_x888 q5, d6, d5, d4 + /* source pixel data is in {d0, d1, d2, XX} */ + /* destination pixel data is in {d4, d5, d6, XX} */ + vmull.u8 q6, d15, d2 + vmull.u8 q5, d15, d1 + vmull.u8 q4, d15, d0 + vrshr.u16 q12, q6, #8 + vrshr.u16 q11, q5, #8 + vrshr.u16 q10, q4, #8 + vraddhn.u16 d2, q6, q12 + vraddhn.u16 d1, q5, q11 + vraddhn.u16 d0, q4, q10 +.endm + +.macro pixman_composite_add_0565_8_0565_process_pixblock_tail + vqadd.u8 q0, q0, q2 + vqadd.u8 q1, q1, q3 + /* 32bpp result is in {d0, d1, d2, XX} */ + convert_8888_to_0565 d2, d1, d0, q14, q15, q3 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_add_0565_8_0565_process_pixblock_tail_head + fetch_mask_pixblock + pixman_composite_add_0565_8_0565_process_pixblock_tail + fetch_src_pixblock + vld1.16 {d10, d11}, [DST_R, :128]! + cache_preload 8, 8 + pixman_composite_add_0565_8_0565_process_pixblock_head + vst1.16 {d28, d29}, [DST_W, :128]! +.endm + +generate_composite_function \ + pixman_composite_add_0565_8_0565_asm_neon, 16, 8, 16, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_add_0565_8_0565_process_pixblock_head, \ + pixman_composite_add_0565_8_0565_process_pixblock_tail, \ + pixman_composite_add_0565_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 10, /* dst_r_basereg */ \ + 8, /* src_basereg */ \ + 15 /* mask_basereg */ + +/******************************************************************************/ + +.macro pixman_composite_out_reverse_8_0565_process_pixblock_head + /* mask is in d15 */ + convert_0565_to_x888 q5, d6, d5, d4 + /* destination pixel data is in {d4, d5, d6, xx} */ + vmvn.8 d24, d15 /* get inverted alpha */ + /* now do alpha blending */ + vmull.u8 q8, d24, d4 + vmull.u8 q9, d24, d5 + vmull.u8 q10, d24, d6 +.endm + +.macro pixman_composite_out_reverse_8_0565_process_pixblock_tail + vrshr.u16 q14, q8, #8 + vrshr.u16 q15, q9, #8 + vrshr.u16 q12, q10, #8 + vraddhn.u16 d0, q14, q8 + vraddhn.u16 d1, q15, q9 + vraddhn.u16 d2, q12, q10 + /* 32bpp result is in {d0, d1, d2, XX} */ + convert_8888_to_0565 d2, d1, d0, q14, q15, q3 +.endm + +/* TODO: expand macros and do better instructions scheduling */ +.macro pixman_composite_out_reverse_8_0565_process_pixblock_tail_head + fetch_src_pixblock + pixman_composite_out_reverse_8_0565_process_pixblock_tail + vld1.16 {d10, d11}, [DST_R, :128]! + cache_preload 8, 8 + pixman_composite_out_reverse_8_0565_process_pixblock_head + vst1.16 {d28, d29}, [DST_W, :128]! +.endm + +generate_composite_function \ + pixman_composite_out_reverse_8_0565_asm_neon, 8, 0, 16, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + 5, /* prefetch distance */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_out_reverse_8_0565_process_pixblock_head, \ + pixman_composite_out_reverse_8_0565_process_pixblock_tail, \ + pixman_composite_out_reverse_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 10, /* dst_r_basereg */ \ + 15, /* src_basereg */ \ + 0 /* mask_basereg */ + +/******************************************************************************/ + +generate_composite_function_nearest_scanline \ + pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon, 32, 0, 32, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_over_8888_8888_process_pixblock_head, \ + pixman_composite_over_8888_8888_process_pixblock_tail, \ + pixman_composite_over_8888_8888_process_pixblock_tail_head + +generate_composite_function_nearest_scanline \ + pixman_scaled_nearest_scanline_8888_0565_OVER_asm_neon, 32, 0, 16, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_over_8888_0565_process_pixblock_head, \ + pixman_composite_over_8888_0565_process_pixblock_tail, \ + pixman_composite_over_8888_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 0, /* src_basereg */ \ + 24 /* mask_basereg */ + +generate_composite_function_nearest_scanline \ + pixman_scaled_nearest_scanline_8888_0565_SRC_asm_neon, 32, 0, 16, \ + FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_8888_0565_process_pixblock_head, \ + pixman_composite_src_8888_0565_process_pixblock_tail, \ + pixman_composite_src_8888_0565_process_pixblock_tail_head + +generate_composite_function_nearest_scanline \ + pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon, 16, 0, 32, \ + FLAG_DST_WRITEONLY | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init, \ + default_cleanup, \ + pixman_composite_src_0565_8888_process_pixblock_head, \ + pixman_composite_src_0565_8888_process_pixblock_tail, \ + pixman_composite_src_0565_8888_process_pixblock_tail_head + +generate_composite_function_nearest_scanline \ + pixman_scaled_nearest_scanline_8888_8_0565_OVER_asm_neon, 32, 8, 16, \ + FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \ + 8, /* number of pixels, processed in a single block */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_over_8888_8_0565_process_pixblock_head, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail, \ + pixman_composite_over_8888_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 4, /* dst_r_basereg */ \ + 8, /* src_basereg */ \ + 24 /* mask_basereg */ + +generate_composite_function_nearest_scanline \ + pixman_scaled_nearest_scanline_0565_8_0565_OVER_asm_neon, 16, 8, 16, \ + FLAG_DST_READWRITE, \ + 8, /* number of pixels, processed in a single block */ \ + default_init_need_all_regs, \ + default_cleanup_need_all_regs, \ + pixman_composite_over_0565_8_0565_process_pixblock_head, \ + pixman_composite_over_0565_8_0565_process_pixblock_tail, \ + pixman_composite_over_0565_8_0565_process_pixblock_tail_head, \ + 28, /* dst_w_basereg */ \ + 10, /* dst_r_basereg */ \ + 8, /* src_basereg */ \ + 15 /* mask_basereg */ + +/******************************************************************************/ + +/* Supplementary macro for setting function attributes */ +.macro pixman_asm_function fname + .func fname + .global fname +#ifdef __ELF__ + .hidden fname + .type fname, %function +#endif +fname: +.endm + +/* + * Bilinear scaling support code which tries to provide pixel fetching, color + * format conversion, and interpolation as separate macros which can be used + * as the basic building blocks for constructing bilinear scanline functions. + */ + +.macro bilinear_load_8888 reg1, reg2, tmp + mov TMP2, X, asr #16 + add X, X, UX + add TMP1, TOP, TMP2, asl #2 + add TMP2, BOTTOM, TMP2, asl #2 + vld1.32 {reg1}, [TMP1] + vld1.32 {reg2}, [TMP2] +.endm + +.macro bilinear_load_0565 reg1, reg2, tmp + mov TMP2, X, asr #16 + add X, X, UX + add TMP1, TOP, TMP2, asl #1 + add TMP2, BOTTOM, TMP2, asl #1 + vld1.32 {reg2[0]}, [TMP1] + vld1.32 {reg2[1]}, [TMP2] + convert_four_0565_to_x888_packed reg2, reg1, reg2, tmp +.endm + +.macro bilinear_load_and_vertical_interpolate_two_8888 \ + acc1, acc2, reg1, reg2, reg3, reg4, tmp1, tmp2 + + bilinear_load_8888 reg1, reg2, tmp1 + vmull.u8 acc1, reg1, d28 + vmlal.u8 acc1, reg2, d29 + bilinear_load_8888 reg3, reg4, tmp2 + vmull.u8 acc2, reg3, d28 + vmlal.u8 acc2, reg4, d29 +.endm + +.macro bilinear_load_and_vertical_interpolate_four_8888 \ + xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi \ + yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi + + bilinear_load_and_vertical_interpolate_two_8888 \ + xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi + bilinear_load_and_vertical_interpolate_two_8888 \ + yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi +.endm + +.macro bilinear_load_and_vertical_interpolate_two_0565 \ + acc1, acc2, reg1, reg2, reg3, reg4, acc2lo, acc2hi + + mov TMP2, X, asr #16 + add X, X, UX + mov TMP4, X, asr #16 + add X, X, UX + add TMP1, TOP, TMP2, asl #1 + add TMP2, BOTTOM, TMP2, asl #1 + add TMP3, TOP, TMP4, asl #1 + add TMP4, BOTTOM, TMP4, asl #1 + vld1.32 {acc2lo[0]}, [TMP1] + vld1.32 {acc2hi[0]}, [TMP3] + vld1.32 {acc2lo[1]}, [TMP2] + vld1.32 {acc2hi[1]}, [TMP4] + convert_0565_to_x888 acc2, reg3, reg2, reg1 + vzip.u8 reg1, reg3 + vzip.u8 reg2, reg4 + vzip.u8 reg3, reg4 + vzip.u8 reg1, reg2 + vmull.u8 acc1, reg1, d28 + vmlal.u8 acc1, reg2, d29 + vmull.u8 acc2, reg3, d28 + vmlal.u8 acc2, reg4, d29 +.endm + +.macro bilinear_load_and_vertical_interpolate_four_0565 \ + xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi \ + yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi + + mov TMP2, X, asr #16 + add X, X, UX + mov TMP4, X, asr #16 + add X, X, UX + add TMP1, TOP, TMP2, asl #1 + add TMP2, BOTTOM, TMP2, asl #1 + add TMP3, TOP, TMP4, asl #1 + add TMP4, BOTTOM, TMP4, asl #1 + vld1.32 {xacc2lo[0]}, [TMP1] + vld1.32 {xacc2hi[0]}, [TMP3] + vld1.32 {xacc2lo[1]}, [TMP2] + vld1.32 {xacc2hi[1]}, [TMP4] + convert_0565_to_x888 xacc2, xreg3, xreg2, xreg1 + mov TMP2, X, asr #16 + add X, X, UX + mov TMP4, X, asr #16 + add X, X, UX + add TMP1, TOP, TMP2, asl #1 + add TMP2, BOTTOM, TMP2, asl #1 + add TMP3, TOP, TMP4, asl #1 + add TMP4, BOTTOM, TMP4, asl #1 + vld1.32 {yacc2lo[0]}, [TMP1] + vzip.u8 xreg1, xreg3 + vld1.32 {yacc2hi[0]}, [TMP3] + vzip.u8 xreg2, xreg4 + vld1.32 {yacc2lo[1]}, [TMP2] + vzip.u8 xreg3, xreg4 + vld1.32 {yacc2hi[1]}, [TMP4] + vzip.u8 xreg1, xreg2 + convert_0565_to_x888 yacc2, yreg3, yreg2, yreg1 + vmull.u8 xacc1, xreg1, d28 + vzip.u8 yreg1, yreg3 + vmlal.u8 xacc1, xreg2, d29 + vzip.u8 yreg2, yreg4 + vmull.u8 xacc2, xreg3, d28 + vzip.u8 yreg3, yreg4 + vmlal.u8 xacc2, xreg4, d29 + vzip.u8 yreg1, yreg2 + vmull.u8 yacc1, yreg1, d28 + vmlal.u8 yacc1, yreg2, d29 + vmull.u8 yacc2, yreg3, d28 + vmlal.u8 yacc2, yreg4, d29 +.endm + +.macro bilinear_store_8888 numpix, tmp1, tmp2 +.if numpix == 4 + vst1.32 {d0, d1}, [OUT]! +.elseif numpix == 2 + vst1.32 {d0}, [OUT]! +.elseif numpix == 1 + vst1.32 {d0[0]}, [OUT, :32]! +.else + .error bilinear_store_8888 numpix is unsupported +.endif +.endm + +.macro bilinear_store_0565 numpix, tmp1, tmp2 + vuzp.u8 d0, d1 + vuzp.u8 d2, d3 + vuzp.u8 d1, d3 + vuzp.u8 d0, d2 + convert_8888_to_0565 d2, d1, d0, q1, tmp1, tmp2 +.if numpix == 4 + vst1.16 {d2}, [OUT]! +.elseif numpix == 2 + vst1.32 {d2[0]}, [OUT]! +.elseif numpix == 1 + vst1.16 {d2[0]}, [OUT]! +.else + .error bilinear_store_0565 numpix is unsupported +.endif +.endm + +.macro bilinear_interpolate_last_pixel src_fmt, dst_fmt + bilinear_load_&src_fmt d0, d1, d2 + vmull.u8 q1, d0, d28 + vmlal.u8 q1, d1, d29 + vshr.u16 d30, d24, #8 + /* 4 cycles bubble */ + vshll.u16 q0, d2, #8 + vmlsl.u16 q0, d2, d30 + vmlal.u16 q0, d3, d30 + /* 5 cycles bubble */ + vshrn.u32 d0, q0, #16 + /* 3 cycles bubble */ + vmovn.u16 d0, q0 + /* 1 cycle bubble */ + bilinear_store_&dst_fmt 1, q2, q3 +.endm + +.macro bilinear_interpolate_two_pixels src_fmt, dst_fmt + bilinear_load_and_vertical_interpolate_two_&src_fmt \ + q1, q11, d0, d1, d20, d21, d22, d23 + vshr.u16 q15, q12, #8 + vadd.u16 q12, q12, q13 + vshll.u16 q0, d2, #8 + vmlsl.u16 q0, d2, d30 + vmlal.u16 q0, d3, d30 + vshll.u16 q10, d22, #8 + vmlsl.u16 q10, d22, d31 + vmlal.u16 q10, d23, d31 + vshrn.u32 d30, q0, #16 + vshrn.u32 d31, q10, #16 + vmovn.u16 d0, q15 + bilinear_store_&dst_fmt 2, q2, q3 +.endm + +.macro bilinear_interpolate_four_pixels src_fmt, dst_fmt + bilinear_load_and_vertical_interpolate_four_&src_fmt \ + q1, q11, d0, d1, d20, d21, d22, d23 \ + q3, q9, d4, d5, d16, d17, d18, d19 + pld [TMP1, PF_OFFS] + vshr.u16 q15, q12, #8 + vadd.u16 q12, q12, q13 + vshll.u16 q0, d2, #8 + vmlsl.u16 q0, d2, d30 + vmlal.u16 q0, d3, d30 + vshll.u16 q10, d22, #8 + vmlsl.u16 q10, d22, d31 + vmlal.u16 q10, d23, d31 + vshr.u16 q15, q12, #8 + vshll.u16 q2, d6, #8 + vmlsl.u16 q2, d6, d30 + vmlal.u16 q2, d7, d30 + vshll.u16 q8, d18, #8 + pld [TMP2, PF_OFFS] + vmlsl.u16 q8, d18, d31 + vmlal.u16 q8, d19, d31 + vadd.u16 q12, q12, q13 + vshrn.u32 d0, q0, #16 + vshrn.u32 d1, q10, #16 + vshrn.u32 d4, q2, #16 + vshrn.u32 d5, q8, #16 + vmovn.u16 d0, q0 + vmovn.u16 d1, q2 + bilinear_store_&dst_fmt 4, q2, q3 +.endm + +/* + * Main template macro for generating NEON optimized bilinear scanline + * functions. + * + * TODO: use software pipelining and aligned writes to the destination buffer + * in order to improve performance + * + * Bilinear scanline scaler macro template uses the following arguments: + * fname - name of the function to generate + * src_fmt - source color format (8888 or 0565) + * dst_fmt - destination color format (8888 or 0565) + * bpp_shift - (1 << bpp_shift) is the size of source pixel in bytes + * prefetch_distance - prefetch in the source image by that many + * pixels ahead + */ + +.macro generate_bilinear_scanline_func fname, src_fmt, dst_fmt, \ + bpp_shift, prefetch_distance + +pixman_asm_function fname + OUT .req r0 + TOP .req r1 + BOTTOM .req r2 + WT .req r3 + WB .req r4 + X .req r5 + UX .req r6 + WIDTH .req ip + TMP1 .req r3 + TMP2 .req r4 + PF_OFFS .req r7 + TMP3 .req r8 + TMP4 .req r9 + + mov ip, sp + push {r4, r5, r6, r7, r8, r9} + mov PF_OFFS, #prefetch_distance + ldmia ip, {WB, X, UX, WIDTH} + mul PF_OFFS, PF_OFFS, UX + + cmp WIDTH, #0 + ble 3f + + vdup.u16 q12, X + vdup.u16 q13, UX + vdup.u8 d28, WT + vdup.u8 d29, WB + vadd.u16 d25, d25, d26 + vadd.u16 q13, q13, q13 + + subs WIDTH, WIDTH, #4 + blt 1f + mov PF_OFFS, PF_OFFS, asr #(16 - bpp_shift) +0: + bilinear_interpolate_four_pixels src_fmt, dst_fmt + subs WIDTH, WIDTH, #4 + bge 0b +1: + tst WIDTH, #2 + beq 2f + bilinear_interpolate_two_pixels src_fmt, dst_fmt +2: + tst WIDTH, #1 + beq 3f + bilinear_interpolate_last_pixel src_fmt, dst_fmt +3: + pop {r4, r5, r6, r7, r8, r9} + bx lr + + .unreq OUT + .unreq TOP + .unreq BOTTOM + .unreq WT + .unreq WB + .unreq X + .unreq UX + .unreq WIDTH + .unreq TMP1 + .unreq TMP2 + .unreq PF_OFFS + .unreq TMP3 + .unreq TMP4 +.endfunc + +.endm + +generate_bilinear_scanline_func \ + pixman_scaled_bilinear_scanline_8888_8888_SRC_asm_neon, 8888, 8888, 2, 28 + +generate_bilinear_scanline_func \ + pixman_scaled_bilinear_scanline_8888_0565_SRC_asm_neon, 8888, 0565, 2, 28 + +generate_bilinear_scanline_func \ + pixman_scaled_bilinear_scanline_0565_x888_SRC_asm_neon, 0565, 8888, 1, 28 + +generate_bilinear_scanline_func \ + pixman_scaled_bilinear_scanline_0565_0565_SRC_asm_neon, 0565, 0565, 1, 28 diff --git a/xorg-server/Xext/geext.c b/xorg-server/Xext/geext.c index a6fbb0947..18f8ffeac 100644 --- a/xorg-server/Xext/geext.c +++ b/xorg-server/Xext/geext.c @@ -49,6 +49,7 @@ static const int version_requests[] = { static void SGEGenericEvent(xEvent* from, xEvent* to); #define NUM_VERSION_REQUESTS (sizeof (version_requests) / sizeof (version_requests[0])) +#define EXT_MASK(ext) ((ext) & 0x7F) /************************************************************/ /* request handlers */ @@ -191,8 +192,8 @@ SGEGenericEvent(xEvent* from, xEvent* to) return; } - if (GEExtensions[gefrom->extension & 0x7F].evswap) - GEExtensions[gefrom->extension & 0x7F].evswap(gefrom, geto); + if (GEExtensions[EXT_MASK(gefrom->extension)].evswap) + GEExtensions[EXT_MASK(gefrom->extension)].evswap(gefrom, geto); } /* Init extension, register at server. @@ -241,11 +242,11 @@ void GERegisterExtension(int extension, void (*ev_swap)(xGenericEvent* from, xGenericEvent* to)) { - if ((extension & 0x7F) >= MAXEXTENSIONS) + if (EXT_MASK(extension) >= MAXEXTENSIONS) FatalError("GE: extension > MAXEXTENSIONS. This should not happen.\n"); /* extension opcodes are > 128, might as well save some space here */ - GEExtensions[extension & 0x7f].evswap = ev_swap; + GEExtensions[EXT_MASK(extension)].evswap = ev_swap; } diff --git a/xorg-server/dix/eventconvert.c b/xorg-server/dix/eventconvert.c index a5fe0a9cc..14731f4f6 100644 --- a/xorg-server/dix/eventconvert.c +++ b/xorg-server/dix/eventconvert.c @@ -383,12 +383,12 @@ getValuatorEvents(DeviceEvent *ev, deviceValuator *xv) int i; int state = 0; int first_valuator, num_valuators; - DeviceIntPtr dev = NULL; num_valuators = countValuators(ev, &first_valuator); if (num_valuators > 0) { + DeviceIntPtr dev = NULL; dixLookupDevice(&dev, ev->deviceid, serverClient, DixUseAccess); /* State needs to be assembled BEFORE the device is updated. */ state = (dev && dev->key) ? XkbStateFieldFromRec(&dev->key->xkbInfo->state) : 0; @@ -405,14 +405,10 @@ getValuatorEvents(DeviceEvent *ev, deviceValuator *xv) xv->deviceid = ev->deviceid; xv->device_state = state; - for (j = 0; j < xv->num_valuators; j++) { - if (BitIsOn(ev->valuators.mask, xv->first_valuator + j)) - valuators[j] = ev->valuators.data[xv->first_valuator + j]; - else if (dev->valuator->axes[xv->first_valuator + j].mode == Absolute) - valuators[j] = dev->valuator->axisVal[xv->first_valuator + j]; - else - valuators[j] = 0; - } + /* Unset valuators in masked valuator events have the proper data values + * in the case of an absolute axis in between two set valuators. */ + for (j = 0; j < xv->num_valuators; j++) + valuators[j] = ev->valuators.data[xv->first_valuator + j]; if (i + 6 < num_valuators) xv->deviceid |= MORE_EVENTS; diff --git a/xorg-server/dix/getevents.c b/xorg-server/dix/getevents.c index 2361810a0..644b3887e 100644 --- a/xorg-server/dix/getevents.c +++ b/xorg-server/dix/getevents.c @@ -1,1314 +1,1318 @@ -/*
- * Copyright © 2006 Nokia Corporation
- * Copyright © 2006-2007 Daniel Stone
- * Copyright © 2008 Red Hat, Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- *
- * Authors: Daniel Stone <daniel@fooishbar.org>
- * Peter Hutterer <peter.hutterer@who-t.net>
- */
-
-#ifdef HAVE_DIX_CONFIG_H
-#include <dix-config.h>
-#endif
-
-#include <X11/X.h>
-#include <X11/keysym.h>
-#include <X11/Xproto.h>
-#include <math.h>
-
-#include "misc.h"
-#include "resource.h"
-#include "inputstr.h"
-#include "scrnintstr.h"
-#include "cursorstr.h"
-#include "dixstruct.h"
-#include "globals.h"
-#include "dixevents.h"
-#include "mipointer.h"
-#include "eventstr.h"
-#include "eventconvert.h"
-#include "inpututils.h"
-
-#include <X11/extensions/XKBproto.h>
-#include "xkbsrv.h"
-
-#ifdef PANORAMIX
-#include "panoramiX.h"
-#include "panoramiXsrv.h"
-#endif
-
-#include <X11/extensions/XI.h>
-#include <X11/extensions/XIproto.h>
-#include <pixman.h>
-#include "exglobals.h"
-#include "exevents.h"
-#include "exglobals.h"
-#include "extnsionst.h"
-#include "listdev.h" /* for sizing up DeviceClassesChangedEvent */
-
-/* Number of motion history events to store. */
-#define MOTION_HISTORY_SIZE 256
-
-/* InputEventList is the container list for all input events generated by the
- * DDX. The DDX is expected to call GetEventList() and then pass the list into
- * Get{Pointer|Keyboard}Events.
- */
-EventListPtr InputEventList = NULL;
-int InputEventListLen = 0;
-
-int
-GetEventList(EventListPtr* list)
-{
- *list = InputEventList;
- return InputEventListLen;
-}
-
-/**
- * Pick some arbitrary size for Xi motion history.
- */
-int
-GetMotionHistorySize(void)
-{
- return MOTION_HISTORY_SIZE;
-}
-
-void
-set_button_down(DeviceIntPtr pDev, int button, int type)
-{
- if (type == BUTTON_PROCESSED)
- SetBit(pDev->button->down, button);
- else
- SetBit(pDev->button->postdown, button);
-}
-
-void
-set_button_up(DeviceIntPtr pDev, int button, int type)
-{
- if (type == BUTTON_PROCESSED)
- ClearBit(pDev->button->down, button);
- else
- ClearBit(pDev->button->postdown, button);
-}
-
-Bool
-button_is_down(DeviceIntPtr pDev, int button, int type)
-{
- Bool ret = FALSE;
-
- if (type & BUTTON_PROCESSED)
- ret = ret || BitIsOn(pDev->button->down, button);
- if (type & BUTTON_POSTED)
- ret = ret || BitIsOn(pDev->button->postdown, button);
-
- return ret;
-}
-
-void
-set_key_down(DeviceIntPtr pDev, int key_code, int type)
-{
- if (type == KEY_PROCESSED)
- SetBit(pDev->key->down, key_code);
- else
- SetBit(pDev->key->postdown, key_code);
-}
-
-void
-set_key_up(DeviceIntPtr pDev, int key_code, int type)
-{
- if (type == KEY_PROCESSED)
- ClearBit(pDev->key->down, key_code);
- else
- ClearBit(pDev->key->postdown, key_code);
-}
-
-Bool
-key_is_down(DeviceIntPtr pDev, int key_code, int type)
-{
- Bool ret = FALSE;
-
- if (type & KEY_PROCESSED)
- ret = ret || BitIsOn(pDev->key->down, key_code);
- if (type & KEY_POSTED)
- ret = ret || BitIsOn(pDev->key->postdown, key_code);
-
- return ret;
-}
-
-static Bool
-key_autorepeats(DeviceIntPtr pDev, int key_code)
-{
- return !!(pDev->kbdfeed->ctrl.autoRepeats[key_code >> 3] &
- (1 << (key_code & 7)));
-}
-
-static void
-init_event(DeviceIntPtr dev, DeviceEvent* event, Time ms)
-{
- memset(event, 0, sizeof(DeviceEvent));
- event->header = ET_Internal;
- event->length = sizeof(DeviceEvent);
- event->time = ms;
- event->deviceid = dev->id;
- event->sourceid = dev->id;
-}
-
-static void
-init_raw(DeviceIntPtr dev, RawDeviceEvent *event, Time ms, int type, int detail)
-{
- memset(event, 0, sizeof(RawDeviceEvent));
- event->header = ET_Internal;
- event->length = sizeof(RawDeviceEvent);
- event->type = ET_RawKeyPress - ET_KeyPress + type;
- event->time = ms;
- event->deviceid = dev->id;
- event->sourceid = dev->id;
- event->detail.button = detail;
-}
-
-static void
-set_raw_valuators(RawDeviceEvent *event, ValuatorMask *mask, int32_t* data)
-{
- int i;
-
- for (i = 0; i < valuator_mask_size(mask); i++)
- {
- if (valuator_mask_isset(mask, i))
- {
- SetBit(event->valuators.mask, i);
- data[i] = valuator_mask_get(mask, i);
- }
- }
-}
-
-
-static void
-set_valuators(DeviceIntPtr dev, DeviceEvent* event, ValuatorMask *mask)
-{
- int i;
-
- for (i = 0; i < valuator_mask_size(mask); i++)
- {
- if (valuator_mask_isset(mask, i))
- {
- SetBit(event->valuators.mask, i);
- if (valuator_get_mode(dev, i) == Absolute)
- SetBit(event->valuators.mode, i);
- event->valuators.data[i] = valuator_mask_get(mask, i);
- event->valuators.data_frac[i] =
- dev->last.remainder[i] * (1 << 16) * (1 << 16);
- }
- }
-}
-
-void
-CreateClassesChangedEvent(EventList* event,
- DeviceIntPtr master,
- DeviceIntPtr slave,
- int type)
-{
- int i;
- DeviceChangedEvent *dce;
- CARD32 ms = GetTimeInMillis();
-
- dce = (DeviceChangedEvent*)event->event;
- memset(dce, 0, sizeof(DeviceChangedEvent));
- dce->deviceid = slave->id;
- dce->masterid = master->id;
- dce->header = ET_Internal;
- dce->length = sizeof(DeviceChangedEvent);
- dce->type = ET_DeviceChanged;
- dce->time = ms;
- dce->flags = type;
- dce->flags |= DEVCHANGE_SLAVE_SWITCH;
- dce->sourceid = slave->id;
-
- if (slave->button)
- {
- dce->buttons.num_buttons = slave->button->numButtons;
- for (i = 0; i < dce->buttons.num_buttons; i++)
- dce->buttons.names[i] = slave->button->labels[i];
- }
- if (slave->valuator)
- {
- dce->num_valuators = slave->valuator->numAxes;
- for (i = 0; i < dce->num_valuators; i++)
- {
- dce->valuators[i].min = slave->valuator->axes[i].min_value;
- dce->valuators[i].max = slave->valuator->axes[i].max_value;
- dce->valuators[i].resolution = slave->valuator->axes[i].resolution;
- dce->valuators[i].mode = slave->valuator->axes[i].mode;
- dce->valuators[i].name = slave->valuator->axes[i].label;
- }
- }
- if (slave->key)
- {
- dce->keys.min_keycode = slave->key->xkbInfo->desc->min_key_code;
- dce->keys.max_keycode = slave->key->xkbInfo->desc->max_key_code;
- }
-}
-
-/**
- * Rescale the coord between the two axis ranges.
- */
-static int
-rescaleValuatorAxis(int coord, float remainder, float *remainder_return, AxisInfoPtr from, AxisInfoPtr to,
- int defmax)
-{
- int fmin = 0, tmin = 0, fmax = defmax, tmax = defmax, coord_return;
- float value;
-
- if(from && from->min_value < from->max_value) {
- fmin = from->min_value;
- fmax = from->max_value;
- }
- if(to && to->min_value < to->max_value) {
- tmin = to->min_value;
- tmax = to->max_value;
- }
-
- if(fmin == tmin && fmax == tmax) {
- if (remainder_return)
- *remainder_return = remainder;
- return coord;
- }
-
- if(fmax == fmin) { /* avoid division by 0 */
- if (remainder_return)
- *remainder_return = 0.0;
- return 0;
- }
-
- value = (coord + remainder - fmin) * (tmax - tmin) / (fmax - fmin) + tmin;
- coord_return = lroundf(value);
- if (remainder_return)
- *remainder_return = value - coord_return;
- return coord_return;
-}
-
-/**
- * Update all coordinates when changing to a different SD
- * to ensure that relative reporting will work as expected
- * without loss of precision.
- *
- * pDev->last.valuators will be in absolute device coordinates after this
- * function.
- */
-static void
-updateSlaveDeviceCoords(DeviceIntPtr master, DeviceIntPtr pDev)
-{
- ScreenPtr scr = miPointerGetScreen(pDev);
- int i;
- DeviceIntPtr lastSlave;
-
- /* master->last.valuators[0]/[1] is in screen coords and the actual
- * position of the pointer */
- pDev->last.valuators[0] = master->last.valuators[0];
- pDev->last.valuators[1] = master->last.valuators[1];
-
- if (!pDev->valuator)
- return;
-
- /* scale back to device coordinates */
- if(pDev->valuator->numAxes > 0)
- pDev->last.valuators[0] = rescaleValuatorAxis(pDev->last.valuators[0], pDev->last.remainder[0],
- &pDev->last.remainder[0], NULL, pDev->valuator->axes + 0, scr->width);
- if(pDev->valuator->numAxes > 1)
- pDev->last.valuators[1] = rescaleValuatorAxis(pDev->last.valuators[1], pDev->last.remainder[1],
- &pDev->last.remainder[1], NULL, pDev->valuator->axes + 1, scr->height);
-
- /* calculate the other axis as well based on info from the old
- * slave-device. If the old slave had less axes than this one,
- * last.valuators is reset to 0.
- */
- if ((lastSlave = master->last.slave) && lastSlave->valuator) {
- for (i = 2; i < pDev->valuator->numAxes; i++) {
- if (i >= lastSlave->valuator->numAxes)
- pDev->last.valuators[i] = 0;
- else
- pDev->last.valuators[i] =
- rescaleValuatorAxis(pDev->last.valuators[i],
- pDev->last.remainder[i],
- &pDev->last.remainder[i],
- lastSlave->valuator->axes + i,
- pDev->valuator->axes + i, 0);
- }
- }
-
-}
-
-/**
- * Allocate the motion history buffer.
- */
-void
-AllocateMotionHistory(DeviceIntPtr pDev)
-{
- int size;
- free(pDev->valuator->motion);
-
- if (pDev->valuator->numMotionEvents < 1)
- return;
-
- /* An MD must have a motion history size large enough to keep all
- * potential valuators, plus the respective range of the valuators.
- * 3 * INT32 for (min_val, max_val, curr_val))
- */
- if (IsMaster(pDev))
- size = sizeof(INT32) * 3 * MAX_VALUATORS;
- else {
- ValuatorClassPtr v = pDev->valuator;
- int numAxes;
- /* XI1 doesn't understand mixed mode devices */
- for (numAxes = 0; numAxes < v->numAxes; numAxes++)
- if (valuator_get_mode(pDev, numAxes) != valuator_get_mode(pDev, 0))
- break;
- size = sizeof(INT32) * numAxes;
- }
-
- size += sizeof(Time);
-
- pDev->valuator->motion = calloc(pDev->valuator->numMotionEvents, size);
- pDev->valuator->first_motion = 0;
- pDev->valuator->last_motion = 0;
- if (!pDev->valuator->motion)
- ErrorF("[dix] %s: Failed to alloc motion history (%d bytes).\n",
- pDev->name, size * pDev->valuator->numMotionEvents);
-}
-
-/**
- * Dump the motion history between start and stop into the supplied buffer.
- * Only records the event for a given screen in theory, but in practice, we
- * sort of ignore this.
- *
- * If core is set, we only generate x/y, in INT16, scaled to screen coords.
- */
-int
-GetMotionHistory(DeviceIntPtr pDev, xTimecoord **buff, unsigned long start,
- unsigned long stop, ScreenPtr pScreen, BOOL core)
-{
- char *ibuff = NULL, *obuff;
- int i = 0, ret = 0;
- int j, coord;
- Time current;
- /* The size of a single motion event. */
- int size;
- int dflt;
- AxisInfo from, *to; /* for scaling */
- INT32 *ocbuf, *icbuf; /* pointer to coordinates for copying */
- INT16 *corebuf;
- AxisInfo core_axis = {0};
-
- if (!pDev->valuator || !pDev->valuator->numMotionEvents)
- return 0;
-
- if (core && !pScreen)
- return 0;
-
- if (IsMaster(pDev))
- size = (sizeof(INT32) * 3 * MAX_VALUATORS) + sizeof(Time);
- else
- size = (sizeof(INT32) * pDev->valuator->numAxes) + sizeof(Time);
-
- *buff = malloc(size * pDev->valuator->numMotionEvents);
- if (!(*buff))
- return 0;
- obuff = (char *)*buff;
-
- for (i = pDev->valuator->first_motion;
- i != pDev->valuator->last_motion;
- i = (i + 1) % pDev->valuator->numMotionEvents) {
- /* We index the input buffer by which element we're accessing, which
- * is not monotonic, and the output buffer by how many events we've
- * written so far. */
- ibuff = (char *) pDev->valuator->motion + (i * size);
- memcpy(¤t, ibuff, sizeof(Time));
-
- if (current > stop) {
- return ret;
- }
- else if (current >= start) {
- if (core)
- {
- memcpy(obuff, ibuff, sizeof(Time)); /* copy timestamp */
-
- icbuf = (INT32*)(ibuff + sizeof(Time));
- corebuf = (INT16*)(obuff + sizeof(Time));
-
- /* fetch x coordinate + range */
- memcpy(&from.min_value, icbuf++, sizeof(INT32));
- memcpy(&from.max_value, icbuf++, sizeof(INT32));
- memcpy(&coord, icbuf++, sizeof(INT32));
-
- /* scale to screen coords */
- to = &core_axis;
- to->max_value = pScreen->width;
- coord = rescaleValuatorAxis(coord, 0.0, NULL, &from, to, pScreen->width);
-
- memcpy(corebuf, &coord, sizeof(INT16));
- corebuf++;
-
- /* fetch y coordinate + range */
- memcpy(&from.min_value, icbuf++, sizeof(INT32));
- memcpy(&from.max_value, icbuf++, sizeof(INT32));
- memcpy(&coord, icbuf++, sizeof(INT32));
-
- to->max_value = pScreen->height;
- coord = rescaleValuatorAxis(coord, 0.0, NULL, &from, to, pScreen->height);
- memcpy(corebuf, &coord, sizeof(INT16));
-
- } else if (IsMaster(pDev))
- {
- memcpy(obuff, ibuff, sizeof(Time)); /* copy timestamp */
-
- ocbuf = (INT32*)(obuff + sizeof(Time));
- icbuf = (INT32*)(ibuff + sizeof(Time));
- for (j = 0; j < MAX_VALUATORS; j++)
- {
- if (j >= pDev->valuator->numAxes)
- break;
-
- /* fetch min/max/coordinate */
- memcpy(&from.min_value, icbuf++, sizeof(INT32));
- memcpy(&from.max_value, icbuf++, sizeof(INT32));
- memcpy(&coord, icbuf++, sizeof(INT32));
-
- to = (j < pDev->valuator->numAxes) ? &pDev->valuator->axes[j] : NULL;
-
- /* x/y scaled to screen if no range is present */
- if (j == 0 && (from.max_value < from.min_value))
- from.max_value = pScreen->width;
- else if (j == 1 && (from.max_value < from.min_value))
- from.max_value = pScreen->height;
-
- if (j == 0 && (to->max_value < to->min_value))
- dflt = pScreen->width;
- else if (j == 1 && (to->max_value < to->min_value))
- dflt = pScreen->height;
- else
- dflt = 0;
-
- /* scale from stored range into current range */
- coord = rescaleValuatorAxis(coord, 0.0, NULL, &from, to, 0);
- memcpy(ocbuf, &coord, sizeof(INT32));
- ocbuf++;
- }
- } else
- memcpy(obuff, ibuff, size);
-
- /* don't advance by size here. size may be different to the
- * actually written size if the MD has less valuators than MAX */
- if (core)
- obuff += sizeof(INT32) + sizeof(Time);
- else
- obuff += (sizeof(INT32) * pDev->valuator->numAxes) + sizeof(Time);
- ret++;
- }
- }
-
- return ret;
-}
-
-
-/**
- * Update the motion history for a specific device, with the list of
- * valuators.
- *
- * Layout of the history buffer:
- * for SDs: [time] [val0] [val1] ... [valn]
- * for MDs: [time] [min_val0] [max_val0] [val0] [min_val1] ... [valn]
- *
- * For events that have some valuators unset:
- * min_val == max_val == val == 0.
- */
-static void
-updateMotionHistory(DeviceIntPtr pDev, CARD32 ms, ValuatorMask *mask,
- int *valuators)
-{
- char *buff = (char *) pDev->valuator->motion;
- ValuatorClassPtr v;
- int i;
-
- if (!pDev->valuator->numMotionEvents)
- return;
-
- v = pDev->valuator;
- if (IsMaster(pDev))
- {
- buff += ((sizeof(INT32) * 3 * MAX_VALUATORS) + sizeof(CARD32)) *
- v->last_motion;
-
- memcpy(buff, &ms, sizeof(Time));
- buff += sizeof(Time);
-
- memset(buff, 0, sizeof(INT32) * 3 * MAX_VALUATORS);
-
- for (i = 0; i < v->numAxes; i++)
- {
- /* XI1 doesn't support mixed mode devices */
- if (valuator_get_mode(pDev, i) != valuator_get_mode(pDev, 0))
- break;
- if (valuator_mask_size(mask) <= i || !valuator_mask_isset(mask, i))
- {
- buff += 3 * sizeof(INT32);
- continue;
- }
- memcpy(buff, &v->axes[i].min_value, sizeof(INT32));
- buff += sizeof(INT32);
- memcpy(buff, &v->axes[i].max_value, sizeof(INT32));
- buff += sizeof(INT32);
- memcpy(buff, &valuators[i], sizeof(INT32));
- buff += sizeof(INT32);
- }
- } else
- {
-
- buff += ((sizeof(INT32) * pDev->valuator->numAxes) + sizeof(CARD32)) *
- pDev->valuator->last_motion;
-
- memcpy(buff, &ms, sizeof(Time));
- buff += sizeof(Time);
-
- memset(buff, 0, sizeof(INT32) * pDev->valuator->numAxes);
-
- for (i = 0; i < MAX_VALUATORS; i++)
- {
- if (valuator_mask_size(mask) <= i || !valuator_mask_isset(mask, i))
- {
- buff += sizeof(INT32);
- continue;
- }
- memcpy(buff, &valuators[i], sizeof(INT32));
- buff += sizeof(INT32);
- }
- }
-
- pDev->valuator->last_motion = (pDev->valuator->last_motion + 1) %
- pDev->valuator->numMotionEvents;
- /* If we're wrapping around, just keep the circular buffer going. */
- if (pDev->valuator->first_motion == pDev->valuator->last_motion)
- pDev->valuator->first_motion = (pDev->valuator->first_motion + 1) %
- pDev->valuator->numMotionEvents;
-
- return;
-}
-
-
-/**
- * Returns the maximum number of events GetKeyboardEvents,
- * GetKeyboardValuatorEvents, and GetPointerEvents will ever return.
- *
- * This MUST be absolutely constant, from init until exit.
- */
-int
-GetMaximumEventsNum(void) {
- /* One raw event
- * One device event
- * One possible device changed event
- */
- return 3;
-}
-
-
-/**
- * Clip an axis to its bounds, which are declared in the call to
- * InitValuatorAxisClassStruct.
- */
-static void
-clipAxis(DeviceIntPtr pDev, int axisNum, int *val)
-{
- AxisInfoPtr axis;
-
- if (axisNum >= pDev->valuator->numAxes)
- return;
-
- axis = pDev->valuator->axes + axisNum;
-
- /* If a value range is defined, clip. If not, do nothing */
- if (axis->max_value <= axis->min_value)
- return;
-
- if (*val < axis->min_value)
- *val = axis->min_value;
- if (*val > axis->max_value)
- *val = axis->max_value;
-}
-
-/**
- * Clip every axis in the list of valuators to its bounds.
- */
-static void
-clipValuators(DeviceIntPtr pDev, ValuatorMask *mask)
-{
- int i;
-
- for (i = 0; i < valuator_mask_size(mask); i++)
- if (valuator_mask_isset(mask, i))
- {
- int val = valuator_mask_get(mask, i);
- clipAxis(pDev, i, &val);
- valuator_mask_set(mask, i, val);
- }
-}
-
-/**
- * Create the DCCE event (does not update the master's device state yet, this
- * is done in the event processing).
- * Pull in the coordinates from the MD if necessary.
- *
- * @param events Pointer to a pre-allocated event list.
- * @param dev The slave device that generated an event.
- * @param type Either DEVCHANGE_POINTER_EVENT and/or DEVCHANGE_KEYBOARD_EVENT
- * @param num_events The current number of events, returns the number of
- * events if a DCCE was generated.
- * @return The updated @events pointer.
- */
-EventListPtr
-UpdateFromMaster(EventListPtr events, DeviceIntPtr dev, int type, int *num_events)
-{
- DeviceIntPtr master;
-
- master = GetMaster(dev, (type & DEVCHANGE_POINTER_EVENT) ? MASTER_POINTER : MASTER_KEYBOARD);
-
- if (master && master->last.slave != dev)
- {
- CreateClassesChangedEvent(events, master, dev, type);
- if (IsPointerDevice(master))
- {
- updateSlaveDeviceCoords(master, dev);
- master->last.numValuators = dev->last.numValuators;
- }
- master->last.slave = dev;
- (*num_events)++;
- events++;
- }
- return events;
-}
-
-/**
- * Move the device's pointer to the position given in the valuators.
- *
- * @param dev The device which's pointer is to be moved.
- * @param x Returns the x position of the pointer after the move.
- * @param y Returns the y position of the pointer after the move.
- * @param mask Bit mask of valid valuators.
- * @param valuators Valuator data for each axis between @first and
- * @first+@num.
- */
-static void
-moveAbsolute(DeviceIntPtr dev, int *x, int *y, ValuatorMask *mask)
-{
- int i;
-
- if (valuator_mask_isset(mask, 0))
- *x = valuator_mask_get(mask, 0);
- else
- *x = dev->last.valuators[0];
-
- if (valuator_mask_isset(mask, 1))
- *y = valuator_mask_get(mask, 1);
- else
- *y = dev->last.valuators[1];
-
- clipAxis(dev, 0, x);
- clipAxis(dev, 1, y);
-
- for (i = 2; i < valuator_mask_size(mask); i++)
- {
- if (valuator_mask_isset(mask, i))
- {
- dev->last.valuators[i] = valuator_mask_get(mask, i);
- clipAxis(dev, i, &dev->last.valuators[i]);
- }
- }
-}
-
-/**
- * Move the device's pointer by the values given in @valuators.
- *
- * @param dev The device which's pointer is to be moved.
- * @param x Returns the x position of the pointer after the move.
- * @param y Returns the y position of the pointer after the move.
- * @param mask Bit mask of valid valuators.
- * @param valuators Valuator data for each axis between @first and
- * @first+@num.
- */
-static void
-moveRelative(DeviceIntPtr dev, int *x, int *y, ValuatorMask *mask)
-{
- int i;
-
- *x = dev->last.valuators[0];
- *y = dev->last.valuators[1];
-
- if (valuator_mask_isset(mask, 0))
- *x += valuator_mask_get(mask, 0);
-
- if (valuator_mask_isset(mask, 1))
- *y += valuator_mask_get(mask, 1);
-
- /* if attached, clip both x and y to the defined limits (usually
- * co-ord space limit). If it is attached, we need x/y to go over the
- * limits to be able to change screens. */
- if (dev->valuator && (IsMaster(dev) || !IsFloating(dev))) {
- if (valuator_get_mode(dev, 0) == Absolute)
- clipAxis(dev, 0, x);
- if (valuator_get_mode(dev, 1) == Absolute)
- clipAxis(dev, 1, y);
- }
-
- /* calc other axes, clip, drop back into valuators */
- for (i = 2; i < valuator_mask_size(mask); i++)
- {
- if (valuator_mask_isset(mask, i))
- {
- dev->last.valuators[i] += valuator_mask_get(mask, i);
- if (valuator_get_mode(dev, i) == Absolute)
- clipAxis(dev, i, &dev->last.valuators[i]);
- valuator_mask_set(mask, i, dev->last.valuators[i]);
- }
- }
-}
-
-/**
- * Accelerate the data in valuators based on the device's acceleration scheme.
- *
- * @param dev The device which's pointer is to be moved.
- * @param valuators Valuator mask
- * @param ms Current time.
- */
-static void
-accelPointer(DeviceIntPtr dev, ValuatorMask* valuators, CARD32 ms)
-{
- if (dev->valuator->accelScheme.AccelSchemeProc)
- dev->valuator->accelScheme.AccelSchemeProc(dev, valuators, ms);
-}
-
-/**
- * If we have HW cursors, this actually moves the visible sprite. If not, we
- * just do all the screen crossing, etc.
- *
- * We scale from device to screen coordinates here, call
- * miPointerSetPosition() and then scale back into device coordinates (if
- * needed). miPSP will change x/y if the screen was crossed.
- *
- * The coordinates provided are always absolute. The parameter mode whether
- * it was relative or absolute movement that landed us at those coordinates.
- *
- * @param dev The device to be moved.
- * @param mode Movement mode (Absolute or Relative)
- * @param x Pointer to current x-axis value, may be modified.
- * @param y Pointer to current y-axis value, may be modified.
- * @param x_frac Fractional part of current x-axis value, may be modified.
- * @param y_frac Fractional part of current y-axis value, may be modified.
- * @param scr Screen the device's sprite is currently on.
- * @param screenx Screen x coordinate the sprite is on after the update.
- * @param screeny Screen y coordinate the sprite is on after the update.
- * @param screenx_frac Fractional part of screen x coordinate, as above.
- * @param screeny_frac Fractional part of screen y coordinate, as above.
- */
-static void
-positionSprite(DeviceIntPtr dev, int mode,
- int *x, int *y, float x_frac, float y_frac,
- ScreenPtr scr, int *screenx, int *screeny, float *screenx_frac, float *screeny_frac)
-{
- int old_screenx, old_screeny;
-
- /* scale x&y to screen */
- if (dev->valuator && dev->valuator->numAxes > 0) {
- *screenx = rescaleValuatorAxis(*x, x_frac, screenx_frac,
- dev->valuator->axes + 0, NULL, scr->width);
- } else {
- *screenx = dev->last.valuators[0];
- *screenx_frac = dev->last.remainder[0];
- }
-
- if (dev->valuator && dev->valuator->numAxes > 1) {
- *screeny = rescaleValuatorAxis(*y, y_frac, screeny_frac,
- dev->valuator->axes + 1, NULL, scr->height);
- } else {
- *screeny = dev->last.valuators[1];
- *screeny_frac = dev->last.remainder[1];
- }
-
- /* Hit the left screen edge? */
- if (*screenx <= 0 && *screenx_frac < 0.0f)
- {
- *screenx_frac = 0.0f;
- x_frac = 0.0f;
- }
- if (*screeny <= 0 && *screeny_frac < 0.0f)
- {
- *screeny_frac = 0.0f;
- y_frac = 0.0f;
- }
-
-
- old_screenx = *screenx;
- old_screeny = *screeny;
- /* This takes care of crossing screens for us, as well as clipping
- * to the current screen. */
- miPointerSetPosition(dev, mode, screenx, screeny);
-
- if(!IsMaster(dev) || !IsFloating(dev)) {
- DeviceIntPtr master = GetMaster(dev, MASTER_POINTER);
- master->last.valuators[0] = *screenx;
- master->last.valuators[1] = *screeny;
- master->last.remainder[0] = *screenx_frac;
- master->last.remainder[1] = *screeny_frac;
- }
-
- if (dev->valuator)
- {
- /* Crossed screen? Scale back to device coordiantes */
- if(*screenx != old_screenx)
- {
- scr = miPointerGetScreen(dev);
- *x = rescaleValuatorAxis(*screenx, *screenx_frac, &x_frac, NULL,
- dev->valuator->axes + 0, scr->width);
- }
- if(*screeny != old_screeny)
- {
- scr = miPointerGetScreen(dev);
- *y = rescaleValuatorAxis(*screeny, *screeny_frac, &y_frac, NULL,
- dev->valuator->axes + 1, scr->height);
- }
- }
-
- /* dropy x/y (device coordinates) back into valuators for next event */
- dev->last.valuators[0] = *x;
- dev->last.valuators[1] = *y;
- dev->last.remainder[0] = x_frac;
- dev->last.remainder[1] = y_frac;
-}
-
-/**
- * Update the motion history for the device and (if appropriate) for its
- * master device.
- * @param dev Slave device to update.
- * @param mask Bit mask of valid valuators to append to history.
- * @param num Total number of valuators to append to history.
- * @param ms Current time
- */
-static void
-updateHistory(DeviceIntPtr dev, ValuatorMask *mask, CARD32 ms)
-{
- if (!dev->valuator)
- return;
-
- updateMotionHistory(dev, ms, mask, dev->last.valuators);
- if(!IsMaster(dev) || !IsFloating(dev))
- {
- DeviceIntPtr master = GetMaster(dev, MASTER_POINTER);
- updateMotionHistory(master, ms, mask, dev->last.valuators);
- }
-}
-
-/**
- * Convenience wrapper around GetKeyboardValuatorEvents, that takes no
- * valuators.
- */
-int
-GetKeyboardEvents(EventList *events, DeviceIntPtr pDev, int type, int key_code) {
- ValuatorMask mask;
-
- valuator_mask_zero(&mask);
- return GetKeyboardValuatorEvents(events, pDev, type, key_code, &mask);
-}
-
-
-/**
- * Returns a set of InternalEvents for KeyPress/KeyRelease, optionally
- * also with valuator events.
- *
- * events is not NULL-terminated; the return value is the number of events.
- * The DDX is responsible for allocating the event structure in the first
- * place via GetMaximumEventsNum(), and for freeing it.
- */
-int
-GetKeyboardValuatorEvents(EventList *events, DeviceIntPtr pDev, int type,
- int key_code, const ValuatorMask *mask_in) {
- int num_events = 0;
- CARD32 ms = 0;
- DeviceEvent *event;
- RawDeviceEvent *raw;
- ValuatorMask mask;
-
- /* refuse events from disabled devices */
- if (!pDev->enabled)
- return 0;
-
- if (!events ||!pDev->key || !pDev->focus || !pDev->kbdfeed ||
- (type != KeyPress && type != KeyRelease) ||
- (key_code < 8 || key_code > 255))
- return 0;
-
- num_events = 1;
-
- events = UpdateFromMaster(events, pDev, DEVCHANGE_KEYBOARD_EVENT, &num_events);
-
- /* Handle core repeating, via press/release/press/release. */
- if (type == KeyPress && key_is_down(pDev, key_code, KEY_POSTED)) {
- /* If autorepeating is disabled either globally or just for that key,
- * or we have a modifier, don't generate a repeat event. */
- if (!pDev->kbdfeed->ctrl.autoRepeat ||
- !key_autorepeats(pDev, key_code) ||
- pDev->key->xkbInfo->desc->map->modmap[key_code])
- return 0;
- }
-
- ms = GetTimeInMillis();
-
- raw = (RawDeviceEvent*)events->event;
- events++;
- num_events++;
-
- valuator_mask_copy(&mask, mask_in);
-
- init_raw(pDev, raw, ms, type, key_code);
- set_raw_valuators(raw, &mask, raw->valuators.data_raw);
-
- clipValuators(pDev, &mask);
-
- set_raw_valuators(raw, &mask, raw->valuators.data);
-
- event = (DeviceEvent*) events->event;
- init_event(pDev, event, ms);
- event->detail.key = key_code;
-
- if (type == KeyPress) {
- event->type = ET_KeyPress;
- set_key_down(pDev, key_code, KEY_POSTED);
- }
- else if (type == KeyRelease) {
- event->type = ET_KeyRelease;
- set_key_up(pDev, key_code, KEY_POSTED);
- }
-
- clipValuators(pDev, &mask);
-
- set_valuators(pDev, event, &mask);
-
- return num_events;
-}
-
-/**
- * Initialize an event list and fill with 32 byte sized events.
- * This event list is to be passed into GetPointerEvents() and
- * GetKeyboardEvents().
- *
- * @param num_events Number of elements in list.
- */
-EventListPtr
-InitEventList(int num_events)
-{
- EventListPtr events;
- int i;
-
- events = (EventListPtr)calloc(num_events, sizeof(EventList));
- if (!events)
- return NULL;
-
- for (i = 0; i < num_events; i++)
- {
- events[i].evlen = sizeof(InternalEvent);
- events[i].event = calloc(1, sizeof(InternalEvent));
- if (!events[i].event)
- {
- /* rollback */
- while(i--)
- free(events[i].event);
- free(events);
- events = NULL;
- break;
- }
- }
-
- return events;
-}
-
-/**
- * Free an event list.
- *
- * @param list The list to be freed.
- * @param num_events Number of elements in list.
- */
-void
-FreeEventList(EventListPtr list, int num_events)
-{
- if (!list)
- return;
- while(num_events--)
- free(list[num_events].event);
- free(list);
-}
-
-static void
-transformAbsolute(DeviceIntPtr dev, ValuatorMask *mask)
-{
- struct pixman_f_vector p;
-
- /* p' = M * p in homogeneous coordinates */
- p.v[0] = (valuator_mask_isset(mask, 0) ? valuator_mask_get(mask, 0) :
- dev->last.valuators[0]);
- p.v[1] = (valuator_mask_isset(mask, 1) ? valuator_mask_get(mask, 1) :
- dev->last.valuators[1]);
- p.v[2] = 1.0;
-
- pixman_f_transform_point(&dev->transform, &p);
-
- if (lround(p.v[0]) != dev->last.valuators[0])
- valuator_mask_set(mask, 0, lround(p.v[0]));
- if (lround(p.v[1]) != dev->last.valuators[1])
- valuator_mask_set(mask, 1, lround(p.v[1]));
-}
-
-/**
- * Generate a series of InternalEvents (filled into the EventList)
- * representing pointer motion, or button presses.
- *
- * events is not NULL-terminated; the return value is the number of events.
- * The DDX is responsible for allocating the event structure in the first
- * place via InitEventList() and GetMaximumEventsNum(), and for freeing it.
- *
- * In the generated events rootX/Y will be in absolute screen coords and
- * the valuator information in the absolute or relative device coords.
- *
- * last.valuators[x] of the device is always in absolute device coords.
- * last.valuators[x] of the master device is in absolute screen coords.
- *
- * master->last.valuators[x] for x > 2 is undefined.
- */
-int
-GetPointerEvents(EventList *events, DeviceIntPtr pDev, int type, int buttons,
- int flags, const ValuatorMask *mask_in) {
- int num_events = 1;
- CARD32 ms;
- DeviceEvent *event;
- RawDeviceEvent *raw;
- int x = 0, y = 0, /* device coords */
- cx, cy; /* only screen coordinates */
- float x_frac = 0.0, y_frac = 0.0, cx_frac, cy_frac;
- ScreenPtr scr = miPointerGetScreen(pDev);
- ValuatorMask mask;
-
- /* refuse events from disabled devices */
- if (!pDev->enabled)
- return 0;
-
- if (!scr)
- return 0;
-
- switch (type)
- {
- case MotionNotify:
- if (!mask_in || valuator_mask_num_valuators(mask_in) <= 0)
- return 0;
- break;
- case ButtonPress:
- case ButtonRelease:
- if (!pDev->button || !buttons)
- return 0;
- break;
- default:
- return 0;
- }
-
- ms = GetTimeInMillis(); /* before pointer update to help precision */
-
- events = UpdateFromMaster(events, pDev, DEVCHANGE_POINTER_EVENT, &num_events);
-
- raw = (RawDeviceEvent*)events->event;
- events++;
- num_events++;
-
- valuator_mask_copy(&mask, mask_in);
-
- init_raw(pDev, raw, ms, type, buttons);
- set_raw_valuators(raw, &mask, raw->valuators.data_raw);
-
- if (flags & POINTER_ABSOLUTE)
- {
- if (flags & POINTER_SCREEN) /* valuators are in screen coords */
- {
- int scaled;
-
- if (valuator_mask_isset(&mask, 0))
- {
- scaled = rescaleValuatorAxis(valuator_mask_get(&mask, 0),
- 0.0, &x_frac, NULL,
- pDev->valuator->axes + 0,
- scr->width);
- valuator_mask_set(&mask, 0, scaled);
- }
- if (valuator_mask_isset(&mask, 1))
- {
- scaled = rescaleValuatorAxis(valuator_mask_get(&mask, 1),
- 0.0, &y_frac, NULL,
- pDev->valuator->axes + 1,
- scr->height);
- valuator_mask_set(&mask, 1, scaled);
- }
- }
-
- transformAbsolute(pDev, &mask);
- moveAbsolute(pDev, &x, &y, &mask);
- } else {
- if (flags & POINTER_ACCELERATE) {
- accelPointer(pDev, &mask, ms);
- /* The pointer acceleration code modifies the fractional part
- * in-place, so we need to extract this information first */
- x_frac = pDev->last.remainder[0];
- y_frac = pDev->last.remainder[1];
- }
- moveRelative(pDev, &x, &y, &mask);
- }
-
- set_raw_valuators(raw, &mask, raw->valuators.data);
-
- positionSprite(pDev, (flags & POINTER_ABSOLUTE) ? Absolute : Relative,
- &x, &y, x_frac, y_frac, scr, &cx, &cy, &cx_frac, &cy_frac);
- updateHistory(pDev, &mask, ms);
-
- /* Update the valuators with the true value sent to the client*/
- if (valuator_mask_isset(&mask, 0))
- valuator_mask_set(&mask, 0, x);
- if (valuator_mask_isset(&mask, 1))
- valuator_mask_set(&mask, 1, y);
-
- clipValuators(pDev, &mask);
-
- event = (DeviceEvent*) events->event;
- init_event(pDev, event, ms);
-
- if (type == MotionNotify) {
- event->type = ET_Motion;
- event->detail.button = 0;
- }
- else {
- if (type == ButtonPress) {
- event->type = ET_ButtonPress;
- set_button_down(pDev, buttons, BUTTON_POSTED);
- }
- else if (type == ButtonRelease) {
- event->type = ET_ButtonRelease;
- set_button_up(pDev, buttons, BUTTON_POSTED);
- }
- event->detail.button = buttons;
- }
-
- event->root_x = cx; /* root_x/y always in screen coords */
- event->root_y = cy;
- event->root_x_frac = cx_frac;
- event->root_y_frac = cy_frac;
-
- set_valuators(pDev, event, &mask);
-
- return num_events;
-}
-
-
-/**
- * Generate ProximityIn/ProximityOut InternalEvents, accompanied by
- * valuators.
- *
- * events is not NULL-terminated; the return value is the number of events.
- * The DDX is responsible for allocating the event structure in the first
- * place via GetMaximumEventsNum(), and for freeing it.
- */
-int
-GetProximityEvents(EventList *events, DeviceIntPtr pDev, int type, const ValuatorMask *mask_in)
-{
- int num_events = 1, i;
- DeviceEvent *event;
- ValuatorMask mask;
-
- /* refuse events from disabled devices */
- if (!pDev->enabled)
- return 0;
-
- /* Sanity checks. */
- if ((type != ProximityIn && type != ProximityOut) || !mask_in)
- return 0;
- if (!pDev->valuator)
- return 0;
-
- valuator_mask_copy(&mask, mask_in);
-
- /* ignore relative axes for proximity. */
- for (i = 0; i < valuator_mask_size(&mask); i++)
- {
- if (valuator_mask_isset(&mask, i) &&
- valuator_get_mode(pDev, i) == Relative)
- valuator_mask_unset(&mask, i);
- }
-
- /* FIXME: posting proximity events with relative valuators only results
- * in an empty event, EventToXI() will fail to convert → no event sent
- * to client. */
-
- events = UpdateFromMaster(events, pDev, DEVCHANGE_POINTER_EVENT, &num_events);
-
- event = (DeviceEvent *) events->event;
- init_event(pDev, event, GetTimeInMillis());
- event->type = (type == ProximityIn) ? ET_ProximityIn : ET_ProximityOut;
-
- clipValuators(pDev, &mask);
-
- set_valuators(pDev, event, &mask);
-
- return num_events;
-}
-
-/**
- * Synthesize a single motion event for the core pointer.
- *
- * Used in cursor functions, e.g. when cursor confinement changes, and we need
- * to shift the pointer to get it inside the new bounds.
- */
-void
-PostSyntheticMotion(DeviceIntPtr pDev,
- int x,
- int y,
- int screen,
- unsigned long time)
-{
- DeviceEvent ev;
-
-#ifdef PANORAMIX
- /* Translate back to the sprite screen since processInputProc
- will translate from sprite screen to screen 0 upon reentry
- to the DIX layer. */
- if (!noPanoramiXExtension) {
- x += screenInfo.screens[0]->x - screenInfo.screens[screen]->x;
- y += screenInfo.screens[0]->y - screenInfo.screens[screen]->y;
- }
-#endif
-
- memset(&ev, 0, sizeof(DeviceEvent));
- init_event(pDev, &ev, time);
- ev.root_x = x;
- ev.root_y = y;
- ev.type = ET_Motion;
- ev.time = time;
-
- /* FIXME: MD/SD considerations? */
- (*pDev->public.processInputProc)((InternalEvent*)&ev, pDev);
-}
+/* + * Copyright © 2006 Nokia Corporation + * Copyright © 2006-2007 Daniel Stone + * Copyright © 2008 Red Hat, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Authors: Daniel Stone <daniel@fooishbar.org> + * Peter Hutterer <peter.hutterer@who-t.net> + */ + +#ifdef HAVE_DIX_CONFIG_H +#include <dix-config.h> +#endif + +#include <X11/X.h> +#include <X11/keysym.h> +#include <X11/Xproto.h> +#include <math.h> + +#include "misc.h" +#include "resource.h" +#include "inputstr.h" +#include "scrnintstr.h" +#include "cursorstr.h" +#include "dixstruct.h" +#include "globals.h" +#include "dixevents.h" +#include "mipointer.h" +#include "eventstr.h" +#include "eventconvert.h" +#include "inpututils.h" + +#include <X11/extensions/XKBproto.h> +#include "xkbsrv.h" + +#ifdef PANORAMIX +#include "panoramiX.h" +#include "panoramiXsrv.h" +#endif + +#include <X11/extensions/XI.h> +#include <X11/extensions/XIproto.h> +#include <pixman.h> +#include "exglobals.h" +#include "exevents.h" +#include "exglobals.h" +#include "extnsionst.h" +#include "listdev.h" /* for sizing up DeviceClassesChangedEvent */ + +/* Number of motion history events to store. */ +#define MOTION_HISTORY_SIZE 256 + +/* InputEventList is the container list for all input events generated by the + * DDX. The DDX is expected to call GetEventList() and then pass the list into + * Get{Pointer|Keyboard}Events. + */ +EventListPtr InputEventList = NULL; +int InputEventListLen = 0; + +int +GetEventList(EventListPtr* list) +{ + *list = InputEventList; + return InputEventListLen; +} + +/** + * Pick some arbitrary size for Xi motion history. + */ +int +GetMotionHistorySize(void) +{ + return MOTION_HISTORY_SIZE; +} + +void +set_button_down(DeviceIntPtr pDev, int button, int type) +{ + if (type == BUTTON_PROCESSED) + SetBit(pDev->button->down, button); + else + SetBit(pDev->button->postdown, button); +} + +void +set_button_up(DeviceIntPtr pDev, int button, int type) +{ + if (type == BUTTON_PROCESSED) + ClearBit(pDev->button->down, button); + else + ClearBit(pDev->button->postdown, button); +} + +Bool +button_is_down(DeviceIntPtr pDev, int button, int type) +{ + Bool ret = FALSE; + + if (type & BUTTON_PROCESSED) + ret = ret || BitIsOn(pDev->button->down, button); + if (type & BUTTON_POSTED) + ret = ret || BitIsOn(pDev->button->postdown, button); + + return ret; +} + +void +set_key_down(DeviceIntPtr pDev, int key_code, int type) +{ + if (type == KEY_PROCESSED) + SetBit(pDev->key->down, key_code); + else + SetBit(pDev->key->postdown, key_code); +} + +void +set_key_up(DeviceIntPtr pDev, int key_code, int type) +{ + if (type == KEY_PROCESSED) + ClearBit(pDev->key->down, key_code); + else + ClearBit(pDev->key->postdown, key_code); +} + +Bool +key_is_down(DeviceIntPtr pDev, int key_code, int type) +{ + Bool ret = FALSE; + + if (type & KEY_PROCESSED) + ret = ret || BitIsOn(pDev->key->down, key_code); + if (type & KEY_POSTED) + ret = ret || BitIsOn(pDev->key->postdown, key_code); + + return ret; +} + +static Bool +key_autorepeats(DeviceIntPtr pDev, int key_code) +{ + return !!(pDev->kbdfeed->ctrl.autoRepeats[key_code >> 3] & + (1 << (key_code & 7))); +} + +static void +init_event(DeviceIntPtr dev, DeviceEvent* event, Time ms) +{ + memset(event, 0, sizeof(DeviceEvent)); + event->header = ET_Internal; + event->length = sizeof(DeviceEvent); + event->time = ms; + event->deviceid = dev->id; + event->sourceid = dev->id; +} + +static void +init_raw(DeviceIntPtr dev, RawDeviceEvent *event, Time ms, int type, int detail) +{ + memset(event, 0, sizeof(RawDeviceEvent)); + event->header = ET_Internal; + event->length = sizeof(RawDeviceEvent); + event->type = ET_RawKeyPress - ET_KeyPress + type; + event->time = ms; + event->deviceid = dev->id; + event->sourceid = dev->id; + event->detail.button = detail; +} + +static void +set_raw_valuators(RawDeviceEvent *event, ValuatorMask *mask, int32_t* data) +{ + int i; + + for (i = 0; i < valuator_mask_size(mask); i++) + { + if (valuator_mask_isset(mask, i)) + { + SetBit(event->valuators.mask, i); + data[i] = valuator_mask_get(mask, i); + } + } +} + + +static void +set_valuators(DeviceIntPtr dev, DeviceEvent* event, ValuatorMask *mask) +{ + int i; + + /* Set the data to the previous value for unset absolute axes. The values + * may be used when sent as part of an XI 1.x valuator event. */ + for (i = 0; i < valuator_mask_size(mask); i++) + { + if (valuator_mask_isset(mask, i)) + { + SetBit(event->valuators.mask, i); + if (valuator_get_mode(dev, i) == Absolute) + SetBit(event->valuators.mode, i); + event->valuators.data[i] = valuator_mask_get(mask, i); + event->valuators.data_frac[i] = + dev->last.remainder[i] * (1 << 16) * (1 << 16); + } + else if (valuator_get_mode(dev, i) == Absolute) + event->valuators.data[i] = dev->valuator->axisVal[i]; + } +} + +void +CreateClassesChangedEvent(EventList* event, + DeviceIntPtr master, + DeviceIntPtr slave, + int type) +{ + int i; + DeviceChangedEvent *dce; + CARD32 ms = GetTimeInMillis(); + + dce = (DeviceChangedEvent*)event->event; + memset(dce, 0, sizeof(DeviceChangedEvent)); + dce->deviceid = slave->id; + dce->masterid = master->id; + dce->header = ET_Internal; + dce->length = sizeof(DeviceChangedEvent); + dce->type = ET_DeviceChanged; + dce->time = ms; + dce->flags = type; + dce->flags |= DEVCHANGE_SLAVE_SWITCH; + dce->sourceid = slave->id; + + if (slave->button) + { + dce->buttons.num_buttons = slave->button->numButtons; + for (i = 0; i < dce->buttons.num_buttons; i++) + dce->buttons.names[i] = slave->button->labels[i]; + } + if (slave->valuator) + { + dce->num_valuators = slave->valuator->numAxes; + for (i = 0; i < dce->num_valuators; i++) + { + dce->valuators[i].min = slave->valuator->axes[i].min_value; + dce->valuators[i].max = slave->valuator->axes[i].max_value; + dce->valuators[i].resolution = slave->valuator->axes[i].resolution; + dce->valuators[i].mode = slave->valuator->axes[i].mode; + dce->valuators[i].name = slave->valuator->axes[i].label; + } + } + if (slave->key) + { + dce->keys.min_keycode = slave->key->xkbInfo->desc->min_key_code; + dce->keys.max_keycode = slave->key->xkbInfo->desc->max_key_code; + } +} + +/** + * Rescale the coord between the two axis ranges. + */ +static int +rescaleValuatorAxis(int coord, float remainder, float *remainder_return, AxisInfoPtr from, AxisInfoPtr to, + int defmax) +{ + int fmin = 0, tmin = 0, fmax = defmax, tmax = defmax, coord_return; + float value; + + if(from && from->min_value < from->max_value) { + fmin = from->min_value; + fmax = from->max_value; + } + if(to && to->min_value < to->max_value) { + tmin = to->min_value; + tmax = to->max_value; + } + + if(fmin == tmin && fmax == tmax) { + if (remainder_return) + *remainder_return = remainder; + return coord; + } + + if(fmax == fmin) { /* avoid division by 0 */ + if (remainder_return) + *remainder_return = 0.0; + return 0; + } + + value = (coord + remainder - fmin) * (tmax - tmin) / (fmax - fmin) + tmin; + coord_return = lroundf(value); + if (remainder_return) + *remainder_return = value - coord_return; + return coord_return; +} + +/** + * Update all coordinates when changing to a different SD + * to ensure that relative reporting will work as expected + * without loss of precision. + * + * pDev->last.valuators will be in absolute device coordinates after this + * function. + */ +static void +updateSlaveDeviceCoords(DeviceIntPtr master, DeviceIntPtr pDev) +{ + ScreenPtr scr = miPointerGetScreen(pDev); + int i; + DeviceIntPtr lastSlave; + + /* master->last.valuators[0]/[1] is in screen coords and the actual + * position of the pointer */ + pDev->last.valuators[0] = master->last.valuators[0]; + pDev->last.valuators[1] = master->last.valuators[1]; + + if (!pDev->valuator) + return; + + /* scale back to device coordinates */ + if(pDev->valuator->numAxes > 0) + pDev->last.valuators[0] = rescaleValuatorAxis(pDev->last.valuators[0], pDev->last.remainder[0], + &pDev->last.remainder[0], NULL, pDev->valuator->axes + 0, scr->width); + if(pDev->valuator->numAxes > 1) + pDev->last.valuators[1] = rescaleValuatorAxis(pDev->last.valuators[1], pDev->last.remainder[1], + &pDev->last.remainder[1], NULL, pDev->valuator->axes + 1, scr->height); + + /* calculate the other axis as well based on info from the old + * slave-device. If the old slave had less axes than this one, + * last.valuators is reset to 0. + */ + if ((lastSlave = master->last.slave) && lastSlave->valuator) { + for (i = 2; i < pDev->valuator->numAxes; i++) { + if (i >= lastSlave->valuator->numAxes) + pDev->last.valuators[i] = 0; + else + pDev->last.valuators[i] = + rescaleValuatorAxis(pDev->last.valuators[i], + pDev->last.remainder[i], + &pDev->last.remainder[i], + lastSlave->valuator->axes + i, + pDev->valuator->axes + i, 0); + } + } + +} + +/** + * Allocate the motion history buffer. + */ +void +AllocateMotionHistory(DeviceIntPtr pDev) +{ + int size; + free(pDev->valuator->motion); + + if (pDev->valuator->numMotionEvents < 1) + return; + + /* An MD must have a motion history size large enough to keep all + * potential valuators, plus the respective range of the valuators. + * 3 * INT32 for (min_val, max_val, curr_val)) + */ + if (IsMaster(pDev)) + size = sizeof(INT32) * 3 * MAX_VALUATORS; + else { + ValuatorClassPtr v = pDev->valuator; + int numAxes; + /* XI1 doesn't understand mixed mode devices */ + for (numAxes = 0; numAxes < v->numAxes; numAxes++) + if (valuator_get_mode(pDev, numAxes) != valuator_get_mode(pDev, 0)) + break; + size = sizeof(INT32) * numAxes; + } + + size += sizeof(Time); + + pDev->valuator->motion = calloc(pDev->valuator->numMotionEvents, size); + pDev->valuator->first_motion = 0; + pDev->valuator->last_motion = 0; + if (!pDev->valuator->motion) + ErrorF("[dix] %s: Failed to alloc motion history (%d bytes).\n", + pDev->name, size * pDev->valuator->numMotionEvents); +} + +/** + * Dump the motion history between start and stop into the supplied buffer. + * Only records the event for a given screen in theory, but in practice, we + * sort of ignore this. + * + * If core is set, we only generate x/y, in INT16, scaled to screen coords. + */ +int +GetMotionHistory(DeviceIntPtr pDev, xTimecoord **buff, unsigned long start, + unsigned long stop, ScreenPtr pScreen, BOOL core) +{ + char *ibuff = NULL, *obuff; + int i = 0, ret = 0; + int j, coord; + Time current; + /* The size of a single motion event. */ + int size; + int dflt; + AxisInfo from, *to; /* for scaling */ + INT32 *ocbuf, *icbuf; /* pointer to coordinates for copying */ + INT16 *corebuf; + AxisInfo core_axis = {0}; + + if (!pDev->valuator || !pDev->valuator->numMotionEvents) + return 0; + + if (core && !pScreen) + return 0; + + if (IsMaster(pDev)) + size = (sizeof(INT32) * 3 * MAX_VALUATORS) + sizeof(Time); + else + size = (sizeof(INT32) * pDev->valuator->numAxes) + sizeof(Time); + + *buff = malloc(size * pDev->valuator->numMotionEvents); + if (!(*buff)) + return 0; + obuff = (char *)*buff; + + for (i = pDev->valuator->first_motion; + i != pDev->valuator->last_motion; + i = (i + 1) % pDev->valuator->numMotionEvents) { + /* We index the input buffer by which element we're accessing, which + * is not monotonic, and the output buffer by how many events we've + * written so far. */ + ibuff = (char *) pDev->valuator->motion + (i * size); + memcpy(¤t, ibuff, sizeof(Time)); + + if (current > stop) { + return ret; + } + else if (current >= start) { + if (core) + { + memcpy(obuff, ibuff, sizeof(Time)); /* copy timestamp */ + + icbuf = (INT32*)(ibuff + sizeof(Time)); + corebuf = (INT16*)(obuff + sizeof(Time)); + + /* fetch x coordinate + range */ + memcpy(&from.min_value, icbuf++, sizeof(INT32)); + memcpy(&from.max_value, icbuf++, sizeof(INT32)); + memcpy(&coord, icbuf++, sizeof(INT32)); + + /* scale to screen coords */ + to = &core_axis; + to->max_value = pScreen->width; + coord = rescaleValuatorAxis(coord, 0.0, NULL, &from, to, pScreen->width); + + memcpy(corebuf, &coord, sizeof(INT16)); + corebuf++; + + /* fetch y coordinate + range */ + memcpy(&from.min_value, icbuf++, sizeof(INT32)); + memcpy(&from.max_value, icbuf++, sizeof(INT32)); + memcpy(&coord, icbuf++, sizeof(INT32)); + + to->max_value = pScreen->height; + coord = rescaleValuatorAxis(coord, 0.0, NULL, &from, to, pScreen->height); + memcpy(corebuf, &coord, sizeof(INT16)); + + } else if (IsMaster(pDev)) + { + memcpy(obuff, ibuff, sizeof(Time)); /* copy timestamp */ + + ocbuf = (INT32*)(obuff + sizeof(Time)); + icbuf = (INT32*)(ibuff + sizeof(Time)); + for (j = 0; j < MAX_VALUATORS; j++) + { + if (j >= pDev->valuator->numAxes) + break; + + /* fetch min/max/coordinate */ + memcpy(&from.min_value, icbuf++, sizeof(INT32)); + memcpy(&from.max_value, icbuf++, sizeof(INT32)); + memcpy(&coord, icbuf++, sizeof(INT32)); + + to = (j < pDev->valuator->numAxes) ? &pDev->valuator->axes[j] : NULL; + + /* x/y scaled to screen if no range is present */ + if (j == 0 && (from.max_value < from.min_value)) + from.max_value = pScreen->width; + else if (j == 1 && (from.max_value < from.min_value)) + from.max_value = pScreen->height; + + if (j == 0 && (to->max_value < to->min_value)) + dflt = pScreen->width; + else if (j == 1 && (to->max_value < to->min_value)) + dflt = pScreen->height; + else + dflt = 0; + + /* scale from stored range into current range */ + coord = rescaleValuatorAxis(coord, 0.0, NULL, &from, to, 0); + memcpy(ocbuf, &coord, sizeof(INT32)); + ocbuf++; + } + } else + memcpy(obuff, ibuff, size); + + /* don't advance by size here. size may be different to the + * actually written size if the MD has less valuators than MAX */ + if (core) + obuff += sizeof(INT32) + sizeof(Time); + else + obuff += (sizeof(INT32) * pDev->valuator->numAxes) + sizeof(Time); + ret++; + } + } + + return ret; +} + + +/** + * Update the motion history for a specific device, with the list of + * valuators. + * + * Layout of the history buffer: + * for SDs: [time] [val0] [val1] ... [valn] + * for MDs: [time] [min_val0] [max_val0] [val0] [min_val1] ... [valn] + * + * For events that have some valuators unset: + * min_val == max_val == val == 0. + */ +static void +updateMotionHistory(DeviceIntPtr pDev, CARD32 ms, ValuatorMask *mask, + int *valuators) +{ + char *buff = (char *) pDev->valuator->motion; + ValuatorClassPtr v; + int i; + + if (!pDev->valuator->numMotionEvents) + return; + + v = pDev->valuator; + if (IsMaster(pDev)) + { + buff += ((sizeof(INT32) * 3 * MAX_VALUATORS) + sizeof(CARD32)) * + v->last_motion; + + memcpy(buff, &ms, sizeof(Time)); + buff += sizeof(Time); + + memset(buff, 0, sizeof(INT32) * 3 * MAX_VALUATORS); + + for (i = 0; i < v->numAxes; i++) + { + /* XI1 doesn't support mixed mode devices */ + if (valuator_get_mode(pDev, i) != valuator_get_mode(pDev, 0)) + break; + if (valuator_mask_size(mask) <= i || !valuator_mask_isset(mask, i)) + { + buff += 3 * sizeof(INT32); + continue; + } + memcpy(buff, &v->axes[i].min_value, sizeof(INT32)); + buff += sizeof(INT32); + memcpy(buff, &v->axes[i].max_value, sizeof(INT32)); + buff += sizeof(INT32); + memcpy(buff, &valuators[i], sizeof(INT32)); + buff += sizeof(INT32); + } + } else + { + + buff += ((sizeof(INT32) * pDev->valuator->numAxes) + sizeof(CARD32)) * + pDev->valuator->last_motion; + + memcpy(buff, &ms, sizeof(Time)); + buff += sizeof(Time); + + memset(buff, 0, sizeof(INT32) * pDev->valuator->numAxes); + + for (i = 0; i < MAX_VALUATORS; i++) + { + if (valuator_mask_size(mask) <= i || !valuator_mask_isset(mask, i)) + { + buff += sizeof(INT32); + continue; + } + memcpy(buff, &valuators[i], sizeof(INT32)); + buff += sizeof(INT32); + } + } + + pDev->valuator->last_motion = (pDev->valuator->last_motion + 1) % + pDev->valuator->numMotionEvents; + /* If we're wrapping around, just keep the circular buffer going. */ + if (pDev->valuator->first_motion == pDev->valuator->last_motion) + pDev->valuator->first_motion = (pDev->valuator->first_motion + 1) % + pDev->valuator->numMotionEvents; + + return; +} + + +/** + * Returns the maximum number of events GetKeyboardEvents, + * GetKeyboardValuatorEvents, and GetPointerEvents will ever return. + * + * This MUST be absolutely constant, from init until exit. + */ +int +GetMaximumEventsNum(void) { + /* One raw event + * One device event + * One possible device changed event + */ + return 3; +} + + +/** + * Clip an axis to its bounds, which are declared in the call to + * InitValuatorAxisClassStruct. + */ +static void +clipAxis(DeviceIntPtr pDev, int axisNum, int *val) +{ + AxisInfoPtr axis; + + if (axisNum >= pDev->valuator->numAxes) + return; + + axis = pDev->valuator->axes + axisNum; + + /* If a value range is defined, clip. If not, do nothing */ + if (axis->max_value <= axis->min_value) + return; + + if (*val < axis->min_value) + *val = axis->min_value; + if (*val > axis->max_value) + *val = axis->max_value; +} + +/** + * Clip every axis in the list of valuators to its bounds. + */ +static void +clipValuators(DeviceIntPtr pDev, ValuatorMask *mask) +{ + int i; + + for (i = 0; i < valuator_mask_size(mask); i++) + if (valuator_mask_isset(mask, i)) + { + int val = valuator_mask_get(mask, i); + clipAxis(pDev, i, &val); + valuator_mask_set(mask, i, val); + } +} + +/** + * Create the DCCE event (does not update the master's device state yet, this + * is done in the event processing). + * Pull in the coordinates from the MD if necessary. + * + * @param events Pointer to a pre-allocated event list. + * @param dev The slave device that generated an event. + * @param type Either DEVCHANGE_POINTER_EVENT and/or DEVCHANGE_KEYBOARD_EVENT + * @param num_events The current number of events, returns the number of + * events if a DCCE was generated. + * @return The updated @events pointer. + */ +EventListPtr +UpdateFromMaster(EventListPtr events, DeviceIntPtr dev, int type, int *num_events) +{ + DeviceIntPtr master; + + master = GetMaster(dev, (type & DEVCHANGE_POINTER_EVENT) ? MASTER_POINTER : MASTER_KEYBOARD); + + if (master && master->last.slave != dev) + { + CreateClassesChangedEvent(events, master, dev, type); + if (IsPointerDevice(master)) + { + updateSlaveDeviceCoords(master, dev); + master->last.numValuators = dev->last.numValuators; + } + master->last.slave = dev; + (*num_events)++; + events++; + } + return events; +} + +/** + * Move the device's pointer to the position given in the valuators. + * + * @param dev The device which's pointer is to be moved. + * @param x Returns the x position of the pointer after the move. + * @param y Returns the y position of the pointer after the move. + * @param mask Bit mask of valid valuators. + * @param valuators Valuator data for each axis between @first and + * @first+@num. + */ +static void +moveAbsolute(DeviceIntPtr dev, int *x, int *y, ValuatorMask *mask) +{ + int i; + + if (valuator_mask_isset(mask, 0)) + *x = valuator_mask_get(mask, 0); + else + *x = dev->last.valuators[0]; + + if (valuator_mask_isset(mask, 1)) + *y = valuator_mask_get(mask, 1); + else + *y = dev->last.valuators[1]; + + clipAxis(dev, 0, x); + clipAxis(dev, 1, y); + + for (i = 2; i < valuator_mask_size(mask); i++) + { + if (valuator_mask_isset(mask, i)) + { + dev->last.valuators[i] = valuator_mask_get(mask, i); + clipAxis(dev, i, &dev->last.valuators[i]); + } + } +} + +/** + * Move the device's pointer by the values given in @valuators. + * + * @param dev The device which's pointer is to be moved. + * @param x Returns the x position of the pointer after the move. + * @param y Returns the y position of the pointer after the move. + * @param mask Bit mask of valid valuators. + * @param valuators Valuator data for each axis between @first and + * @first+@num. + */ +static void +moveRelative(DeviceIntPtr dev, int *x, int *y, ValuatorMask *mask) +{ + int i; + + *x = dev->last.valuators[0]; + *y = dev->last.valuators[1]; + + if (valuator_mask_isset(mask, 0)) + *x += valuator_mask_get(mask, 0); + + if (valuator_mask_isset(mask, 1)) + *y += valuator_mask_get(mask, 1); + + /* if attached, clip both x and y to the defined limits (usually + * co-ord space limit). If it is attached, we need x/y to go over the + * limits to be able to change screens. */ + if (dev->valuator && (IsMaster(dev) || !IsFloating(dev))) { + if (valuator_get_mode(dev, 0) == Absolute) + clipAxis(dev, 0, x); + if (valuator_get_mode(dev, 1) == Absolute) + clipAxis(dev, 1, y); + } + + /* calc other axes, clip, drop back into valuators */ + for (i = 2; i < valuator_mask_size(mask); i++) + { + if (valuator_mask_isset(mask, i)) + { + dev->last.valuators[i] += valuator_mask_get(mask, i); + if (valuator_get_mode(dev, i) == Absolute) + clipAxis(dev, i, &dev->last.valuators[i]); + valuator_mask_set(mask, i, dev->last.valuators[i]); + } + } +} + +/** + * Accelerate the data in valuators based on the device's acceleration scheme. + * + * @param dev The device which's pointer is to be moved. + * @param valuators Valuator mask + * @param ms Current time. + */ +static void +accelPointer(DeviceIntPtr dev, ValuatorMask* valuators, CARD32 ms) +{ + if (dev->valuator->accelScheme.AccelSchemeProc) + dev->valuator->accelScheme.AccelSchemeProc(dev, valuators, ms); +} + +/** + * If we have HW cursors, this actually moves the visible sprite. If not, we + * just do all the screen crossing, etc. + * + * We scale from device to screen coordinates here, call + * miPointerSetPosition() and then scale back into device coordinates (if + * needed). miPSP will change x/y if the screen was crossed. + * + * The coordinates provided are always absolute. The parameter mode whether + * it was relative or absolute movement that landed us at those coordinates. + * + * @param dev The device to be moved. + * @param mode Movement mode (Absolute or Relative) + * @param x Pointer to current x-axis value, may be modified. + * @param y Pointer to current y-axis value, may be modified. + * @param x_frac Fractional part of current x-axis value, may be modified. + * @param y_frac Fractional part of current y-axis value, may be modified. + * @param scr Screen the device's sprite is currently on. + * @param screenx Screen x coordinate the sprite is on after the update. + * @param screeny Screen y coordinate the sprite is on after the update. + * @param screenx_frac Fractional part of screen x coordinate, as above. + * @param screeny_frac Fractional part of screen y coordinate, as above. + */ +static void +positionSprite(DeviceIntPtr dev, int mode, + int *x, int *y, float x_frac, float y_frac, + ScreenPtr scr, int *screenx, int *screeny, float *screenx_frac, float *screeny_frac) +{ + int old_screenx, old_screeny; + + /* scale x&y to screen */ + if (dev->valuator && dev->valuator->numAxes > 0) { + *screenx = rescaleValuatorAxis(*x, x_frac, screenx_frac, + dev->valuator->axes + 0, NULL, scr->width); + } else { + *screenx = dev->last.valuators[0]; + *screenx_frac = dev->last.remainder[0]; + } + + if (dev->valuator && dev->valuator->numAxes > 1) { + *screeny = rescaleValuatorAxis(*y, y_frac, screeny_frac, + dev->valuator->axes + 1, NULL, scr->height); + } else { + *screeny = dev->last.valuators[1]; + *screeny_frac = dev->last.remainder[1]; + } + + /* Hit the left screen edge? */ + if (*screenx <= 0 && *screenx_frac < 0.0f) + { + *screenx_frac = 0.0f; + x_frac = 0.0f; + } + if (*screeny <= 0 && *screeny_frac < 0.0f) + { + *screeny_frac = 0.0f; + y_frac = 0.0f; + } + + + old_screenx = *screenx; + old_screeny = *screeny; + /* This takes care of crossing screens for us, as well as clipping + * to the current screen. */ + miPointerSetPosition(dev, mode, screenx, screeny); + + if(!IsMaster(dev) || !IsFloating(dev)) { + DeviceIntPtr master = GetMaster(dev, MASTER_POINTER); + master->last.valuators[0] = *screenx; + master->last.valuators[1] = *screeny; + master->last.remainder[0] = *screenx_frac; + master->last.remainder[1] = *screeny_frac; + } + + if (dev->valuator) + { + /* Crossed screen? Scale back to device coordiantes */ + if(*screenx != old_screenx) + { + scr = miPointerGetScreen(dev); + *x = rescaleValuatorAxis(*screenx, *screenx_frac, &x_frac, NULL, + dev->valuator->axes + 0, scr->width); + } + if(*screeny != old_screeny) + { + scr = miPointerGetScreen(dev); + *y = rescaleValuatorAxis(*screeny, *screeny_frac, &y_frac, NULL, + dev->valuator->axes + 1, scr->height); + } + } + + /* dropy x/y (device coordinates) back into valuators for next event */ + dev->last.valuators[0] = *x; + dev->last.valuators[1] = *y; + dev->last.remainder[0] = x_frac; + dev->last.remainder[1] = y_frac; +} + +/** + * Update the motion history for the device and (if appropriate) for its + * master device. + * @param dev Slave device to update. + * @param mask Bit mask of valid valuators to append to history. + * @param num Total number of valuators to append to history. + * @param ms Current time + */ +static void +updateHistory(DeviceIntPtr dev, ValuatorMask *mask, CARD32 ms) +{ + if (!dev->valuator) + return; + + updateMotionHistory(dev, ms, mask, dev->last.valuators); + if(!IsMaster(dev) || !IsFloating(dev)) + { + DeviceIntPtr master = GetMaster(dev, MASTER_POINTER); + updateMotionHistory(master, ms, mask, dev->last.valuators); + } +} + +/** + * Convenience wrapper around GetKeyboardValuatorEvents, that takes no + * valuators. + */ +int +GetKeyboardEvents(EventList *events, DeviceIntPtr pDev, int type, int key_code) { + ValuatorMask mask; + + valuator_mask_zero(&mask); + return GetKeyboardValuatorEvents(events, pDev, type, key_code, &mask); +} + + +/** + * Returns a set of InternalEvents for KeyPress/KeyRelease, optionally + * also with valuator events. + * + * events is not NULL-terminated; the return value is the number of events. + * The DDX is responsible for allocating the event structure in the first + * place via GetMaximumEventsNum(), and for freeing it. + */ +int +GetKeyboardValuatorEvents(EventList *events, DeviceIntPtr pDev, int type, + int key_code, const ValuatorMask *mask_in) { + int num_events = 0; + CARD32 ms = 0; + DeviceEvent *event; + RawDeviceEvent *raw; + ValuatorMask mask; + + /* refuse events from disabled devices */ + if (!pDev->enabled) + return 0; + + if (!events ||!pDev->key || !pDev->focus || !pDev->kbdfeed || + (type != KeyPress && type != KeyRelease) || + (key_code < 8 || key_code > 255)) + return 0; + + num_events = 1; + + events = UpdateFromMaster(events, pDev, DEVCHANGE_KEYBOARD_EVENT, &num_events); + + /* Handle core repeating, via press/release/press/release. */ + if (type == KeyPress && key_is_down(pDev, key_code, KEY_POSTED)) { + /* If autorepeating is disabled either globally or just for that key, + * or we have a modifier, don't generate a repeat event. */ + if (!pDev->kbdfeed->ctrl.autoRepeat || + !key_autorepeats(pDev, key_code) || + pDev->key->xkbInfo->desc->map->modmap[key_code]) + return 0; + } + + ms = GetTimeInMillis(); + + raw = (RawDeviceEvent*)events->event; + events++; + num_events++; + + valuator_mask_copy(&mask, mask_in); + + init_raw(pDev, raw, ms, type, key_code); + set_raw_valuators(raw, &mask, raw->valuators.data_raw); + + clipValuators(pDev, &mask); + + set_raw_valuators(raw, &mask, raw->valuators.data); + + event = (DeviceEvent*) events->event; + init_event(pDev, event, ms); + event->detail.key = key_code; + + if (type == KeyPress) { + event->type = ET_KeyPress; + set_key_down(pDev, key_code, KEY_POSTED); + } + else if (type == KeyRelease) { + event->type = ET_KeyRelease; + set_key_up(pDev, key_code, KEY_POSTED); + } + + clipValuators(pDev, &mask); + + set_valuators(pDev, event, &mask); + + return num_events; +} + +/** + * Initialize an event list and fill with 32 byte sized events. + * This event list is to be passed into GetPointerEvents() and + * GetKeyboardEvents(). + * + * @param num_events Number of elements in list. + */ +EventListPtr +InitEventList(int num_events) +{ + EventListPtr events; + int i; + + events = (EventListPtr)calloc(num_events, sizeof(EventList)); + if (!events) + return NULL; + + for (i = 0; i < num_events; i++) + { + events[i].evlen = sizeof(InternalEvent); + events[i].event = calloc(1, sizeof(InternalEvent)); + if (!events[i].event) + { + /* rollback */ + while(i--) + free(events[i].event); + free(events); + events = NULL; + break; + } + } + + return events; +} + +/** + * Free an event list. + * + * @param list The list to be freed. + * @param num_events Number of elements in list. + */ +void +FreeEventList(EventListPtr list, int num_events) +{ + if (!list) + return; + while(num_events--) + free(list[num_events].event); + free(list); +} + +static void +transformAbsolute(DeviceIntPtr dev, ValuatorMask *mask) +{ + struct pixman_f_vector p; + + /* p' = M * p in homogeneous coordinates */ + p.v[0] = (valuator_mask_isset(mask, 0) ? valuator_mask_get(mask, 0) : + dev->last.valuators[0]); + p.v[1] = (valuator_mask_isset(mask, 1) ? valuator_mask_get(mask, 1) : + dev->last.valuators[1]); + p.v[2] = 1.0; + + pixman_f_transform_point(&dev->transform, &p); + + if (lround(p.v[0]) != dev->last.valuators[0]) + valuator_mask_set(mask, 0, lround(p.v[0])); + if (lround(p.v[1]) != dev->last.valuators[1]) + valuator_mask_set(mask, 1, lround(p.v[1])); +} + +/** + * Generate a series of InternalEvents (filled into the EventList) + * representing pointer motion, or button presses. + * + * events is not NULL-terminated; the return value is the number of events. + * The DDX is responsible for allocating the event structure in the first + * place via InitEventList() and GetMaximumEventsNum(), and for freeing it. + * + * In the generated events rootX/Y will be in absolute screen coords and + * the valuator information in the absolute or relative device coords. + * + * last.valuators[x] of the device is always in absolute device coords. + * last.valuators[x] of the master device is in absolute screen coords. + * + * master->last.valuators[x] for x > 2 is undefined. + */ +int +GetPointerEvents(EventList *events, DeviceIntPtr pDev, int type, int buttons, + int flags, const ValuatorMask *mask_in) { + int num_events = 1; + CARD32 ms; + DeviceEvent *event; + RawDeviceEvent *raw; + int x = 0, y = 0, /* device coords */ + cx, cy; /* only screen coordinates */ + float x_frac = 0.0, y_frac = 0.0, cx_frac, cy_frac; + ScreenPtr scr = miPointerGetScreen(pDev); + ValuatorMask mask; + + /* refuse events from disabled devices */ + if (!pDev->enabled) + return 0; + + if (!scr) + return 0; + + switch (type) + { + case MotionNotify: + if (!mask_in || valuator_mask_num_valuators(mask_in) <= 0) + return 0; + break; + case ButtonPress: + case ButtonRelease: + if (!pDev->button || !buttons) + return 0; + break; + default: + return 0; + } + + ms = GetTimeInMillis(); /* before pointer update to help precision */ + + events = UpdateFromMaster(events, pDev, DEVCHANGE_POINTER_EVENT, &num_events); + + raw = (RawDeviceEvent*)events->event; + events++; + num_events++; + + valuator_mask_copy(&mask, mask_in); + + init_raw(pDev, raw, ms, type, buttons); + set_raw_valuators(raw, &mask, raw->valuators.data_raw); + + if (flags & POINTER_ABSOLUTE) + { + if (flags & POINTER_SCREEN) /* valuators are in screen coords */ + { + int scaled; + + if (valuator_mask_isset(&mask, 0)) + { + scaled = rescaleValuatorAxis(valuator_mask_get(&mask, 0), + 0.0, &x_frac, NULL, + pDev->valuator->axes + 0, + scr->width); + valuator_mask_set(&mask, 0, scaled); + } + if (valuator_mask_isset(&mask, 1)) + { + scaled = rescaleValuatorAxis(valuator_mask_get(&mask, 1), + 0.0, &y_frac, NULL, + pDev->valuator->axes + 1, + scr->height); + valuator_mask_set(&mask, 1, scaled); + } + } + + transformAbsolute(pDev, &mask); + moveAbsolute(pDev, &x, &y, &mask); + } else { + if (flags & POINTER_ACCELERATE) { + accelPointer(pDev, &mask, ms); + /* The pointer acceleration code modifies the fractional part + * in-place, so we need to extract this information first */ + x_frac = pDev->last.remainder[0]; + y_frac = pDev->last.remainder[1]; + } + moveRelative(pDev, &x, &y, &mask); + } + + set_raw_valuators(raw, &mask, raw->valuators.data); + + positionSprite(pDev, (flags & POINTER_ABSOLUTE) ? Absolute : Relative, + &x, &y, x_frac, y_frac, scr, &cx, &cy, &cx_frac, &cy_frac); + updateHistory(pDev, &mask, ms); + + /* Update the valuators with the true value sent to the client*/ + if (valuator_mask_isset(&mask, 0)) + valuator_mask_set(&mask, 0, x); + if (valuator_mask_isset(&mask, 1)) + valuator_mask_set(&mask, 1, y); + + clipValuators(pDev, &mask); + + event = (DeviceEvent*) events->event; + init_event(pDev, event, ms); + + if (type == MotionNotify) { + event->type = ET_Motion; + event->detail.button = 0; + } + else { + if (type == ButtonPress) { + event->type = ET_ButtonPress; + set_button_down(pDev, buttons, BUTTON_POSTED); + } + else if (type == ButtonRelease) { + event->type = ET_ButtonRelease; + set_button_up(pDev, buttons, BUTTON_POSTED); + } + event->detail.button = buttons; + } + + event->root_x = cx; /* root_x/y always in screen coords */ + event->root_y = cy; + event->root_x_frac = cx_frac; + event->root_y_frac = cy_frac; + + set_valuators(pDev, event, &mask); + + return num_events; +} + + +/** + * Generate ProximityIn/ProximityOut InternalEvents, accompanied by + * valuators. + * + * events is not NULL-terminated; the return value is the number of events. + * The DDX is responsible for allocating the event structure in the first + * place via GetMaximumEventsNum(), and for freeing it. + */ +int +GetProximityEvents(EventList *events, DeviceIntPtr pDev, int type, const ValuatorMask *mask_in) +{ + int num_events = 1, i; + DeviceEvent *event; + ValuatorMask mask; + + /* refuse events from disabled devices */ + if (!pDev->enabled) + return 0; + + /* Sanity checks. */ + if ((type != ProximityIn && type != ProximityOut) || !mask_in) + return 0; + if (!pDev->valuator) + return 0; + + valuator_mask_copy(&mask, mask_in); + + /* ignore relative axes for proximity. */ + for (i = 0; i < valuator_mask_size(&mask); i++) + { + if (valuator_mask_isset(&mask, i) && + valuator_get_mode(pDev, i) == Relative) + valuator_mask_unset(&mask, i); + } + + /* FIXME: posting proximity events with relative valuators only results + * in an empty event, EventToXI() will fail to convert → no event sent + * to client. */ + + events = UpdateFromMaster(events, pDev, DEVCHANGE_POINTER_EVENT, &num_events); + + event = (DeviceEvent *) events->event; + init_event(pDev, event, GetTimeInMillis()); + event->type = (type == ProximityIn) ? ET_ProximityIn : ET_ProximityOut; + + clipValuators(pDev, &mask); + + set_valuators(pDev, event, &mask); + + return num_events; +} + +/** + * Synthesize a single motion event for the core pointer. + * + * Used in cursor functions, e.g. when cursor confinement changes, and we need + * to shift the pointer to get it inside the new bounds. + */ +void +PostSyntheticMotion(DeviceIntPtr pDev, + int x, + int y, + int screen, + unsigned long time) +{ + DeviceEvent ev; + +#ifdef PANORAMIX + /* Translate back to the sprite screen since processInputProc + will translate from sprite screen to screen 0 upon reentry + to the DIX layer. */ + if (!noPanoramiXExtension) { + x += screenInfo.screens[0]->x - screenInfo.screens[screen]->x; + y += screenInfo.screens[0]->y - screenInfo.screens[screen]->y; + } +#endif + + memset(&ev, 0, sizeof(DeviceEvent)); + init_event(pDev, &ev, time); + ev.root_x = x; + ev.root_y = y; + ev.type = ET_Motion; + ev.time = time; + + /* FIXME: MD/SD considerations? */ + (*pDev->public.processInputProc)((InternalEvent*)&ev, pDev); +} diff --git a/xorg-server/xkeyboard-config/rules/base.xml.in b/xorg-server/xkeyboard-config/rules/base.xml.in index 941183efb..90cec4a32 100644 --- a/xorg-server/xkeyboard-config/rules/base.xml.in +++ b/xorg-server/xkeyboard-config/rules/base.xml.in @@ -5459,7 +5459,7 @@ <option> <configItem> <name>lv3:bksl_switch_latch</name> - <_description>Backslash chooses 3rd level, latches when pressed together with another 3rd-level-chooser)</_description> + <_description>Backslash (chooses 3rd level, latches when pressed together with another 3rd-level-chooser)</_description> </configItem> </option> <option> |