From f3a2bb9a6aa842b162001774e21e9021e244a08d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Wed, 25 Oct 2023 08:34:30 +0300 Subject: [PATCH 01/18] Sketch something for cl_khr_tensor --- ext/cl_khr_tensor.asciidoc | 547 ++++++++++++++++ ext/cl_khr_tensor.html | 1228 ++++++++++++++++++++++++++++++++++++ 2 files changed, 1775 insertions(+) create mode 100644 ext/cl_khr_tensor.asciidoc create mode 100644 ext/cl_khr_tensor.html diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc new file mode 100644 index 000000000..cd17a42bb --- /dev/null +++ b/ext/cl_khr_tensor.asciidoc @@ -0,0 +1,547 @@ +// Copyright 2023 The Khronos Group. This work is licensed under a +// Creative Commons Attribution 4.0 International License; see +// http://creativecommons.org/licenses/by/4.0/ += cl_khr_tensor + +:source-highlighter: coreray + +[[cl_khr_tensor]] +== Tensor Data Type + +Purpose of this extension is to provide ... + +=== General information + +==== Name Strings + +`cl_khr_tensor` + +==== Version history + +[cols="1,1,3",options="header",] +|==== +| *Date* | *Version* | *Description* +| 2023-10-XX | 0.1.0 | First assigned version. +|==== + +==== Dependencies + +This extension is written against the OpenCL Specification version 3.0.14. + +This extension requires OpenCL 1.2 or later. + +This extension requires cl_khr_command_buffer. + +==== Contributors + +Henry Linjamäki, Intel. + + +=== Overview + + +=== Modifications to OpenCL + +==== New OpenCL Functions + +To create a tensor use: + +[source,c] +---- +cl_tensor clCreateTensor( + cl_context context, + const cl_tensor_peoperties *properties, + size_t rank, + size_t shape, + cl_tensor_type dtype, + cl_int *errcode_ret); +---- + +* _context_ is a valid OpenCL context used to create the tensor object. + +* _properties_ is an optional list of properties for the tensor object + and their corresponding values. The list is terminated with the + special property 0. If no properties are required, properties may be + NULL. + +* _rank_ is the number of dimensions. Zero value creates a "scalar" + tensor which has no dimensions but has storage for one element. + +* _shape_ is a list of sizes of the dimensions. The length of the list + must be _rank_ elements. _shape_ can be NULL if _rank_ value is + zero. All the first _rank_ values in the list must be non-zero. + +* _dtype_ is the element type of _tensor_. Refer to the + <> table for the types. + +* _errcode_ret_ may return an appropriate error code. If errcode_ret + is NULL, no error code is returned. + +clCreateTensor function creates a `rank`-dimensional tensor with +`shape[0] * shape[1] * ... * shape[rank-1]` elements of _dtype_ +type. At the creation time of the tensor, it does not have +storage. The storage is assigned to the tensor either by: + +* calling clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR or + +* automatically by command buffers - possibly on-demand basis - if the + tensor is created with CL_TENSOR_COMMAND_BUFFER_TEMPORARY property + set on. + +A command that refers to a tensor must be bound to a valid buffer +object before enqueuing the command into a command queue unless the +command is recorded in a command buffer and +CL_TENSOR_COMMAND_BUFFER_TEMPORARY is set to true. + +*clCreateTensor* returns a valid non-zero tensor object and errcode_ret +is set to CL_SUCCESS if the tensor object is created +successfully. Otherwise, they return a NULL value with one of the +following error values returned in errcode_ret: + +* CL_INVALID_CONTEXT if context is not a valid context. + +* CL_INVALID_PROPERTY if a property name in properties is not a + supported property name, if the value specified for a supported + property name is not valid, or if the same property name is + specified more than once. + +* CL_INVALID_VALUE if a value specified in dtype is invalid. + +* CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources + required by the OpenCL implementation on the host. + +.Tensor element types +[cols="1,2",stripes=odd] +[#TensorDtypes] +|=== +| *Tensor element data type* | *Description* + +| CL_TENSOR_BOOL | 1-bit signedless integer. +| CL_TENSOR_INT8 | 8-bit signed integer. +| CL_TENSOR_INT16 | 16-bit signed integer. +| CL_TENSOR_INT32 | 32-bit signed integer. +| CL_TENSOR_INT64 | 64-bit signed integer. +| CL_TENSOR_UINT8 | 8-bit signed integer. +| CL_TENSOR_UINT16 | 16-bit signed integer. +| CL_TENSOR_UINT32 | 32-bit signed integer. +| CL_TENSOR_UINT64 | 64-bit signed integer. +| CL_TENSOR_HALF | Half precision floating-point value. +| CL_TENSOR_BFLOAT16 | 16-bit brain floating-point value. +| CL_TENSOR_FLOAT | Single precision floating-point value. +| CL_TENSOR_DOUBLE | Double precision floating-point value. +| CL_TENSOR_COMPLEX64 | 64-bit complex floating point value with + 32-bit real and imaginary part. +| CL_TENSOR_COMPLEX128 | 128-bit complex floating point value with + 64-bit real and imaginary part. +|=== + +.Tensor properties +[cols="2,1,2",stripes=odd] +|=== +| *Tensor Property* | *Property Value* | *Description* + +| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool + +a| If the value is true, create a "temporary" tensor that only can be +used on commands recorded in command buffers. The storage of the +temporary tensors are managed by command buffers. When a temporary +tensor is used by multiple command buffer, the tensor receive separate +storage for each command buffer. + +// IOW, Data may not be exchanged between command buffers through +// temporary tensors. + +Temporary tensors may not be bound to buffer objects. + +Data stored in temporary tensors are not preserved across command +buffer executions. +|=== + +To retain a tensor object, call the function + +[source,c] +---- +cl_int clRetainTensorObject( + cl_tensor tensor); +---- + +* _tensor_ is the tensor object to be retained. + +The _tensor_ reference count is incremented. + +*clRetainTensor* returns CL_SUCCESS if the function is executed +successfully. Otherwise, it returns one of the following errors: + +* CL_INVALID_TENSOR if tensor is not a valid tensor object. + +To release a tensor object, call the function + +[source,c] +---- +cl_int clReleaseTensorObject( + cl_tensor tensor); +---- + +* _tensor_ is the tensor object to be released. + +The _tensor_ reference count is decremented. + +The tensor object is deleted once the number of instances that are +retained to tensor become zero and the tensor object is no longer +needed by any enqueued or recorded commands that use _tensor_. Using +this function to release a reference that was not obtained by creating +the object or by calling *clRetainTensor* causes undefined behavior. + +*clReleaseTensor* returns CL_SUCCESS if the function is executed +successfully. Otherwise, it returns one of the following errors: + +* CL_INVALID_TENSOR if tensor is not a valid tensor object. + +// TODO: add clSetTensorObjectDestructorCallback? + +To return information about a tensor object, call the function + +[source,c] +---- +cl_int clGetTensorInfo( + cl_tensor tensor, + cl_tensor_info param_name, + size_t param_value_size, + void* param_value, + size_t* param_value_size_ret); +---- + +* _tensor_ specifies the tensor object being queried. + +* _param_name_ specifies the information to query. The list of + supported param_name types and the information returned in + _param_value_ by clGetTensorInfo is described in the <> table. + +* _param_value_ is a pointer to memory where the appropriate result + being queried is returned. If _param_value_ is NULL, it is ignored. + +* _param_value_size_ is used to specify the size in bytes of memory + pointed to by _param_value_. This size must be ≥ size of return type + as described in the <> table. + +* _param_value_size_ret_ returns the actual size in bytes of data + being queried by _param_name_. If _param_value_size_ret_ is NULL, it is + ignored. + +*clGetTensorInfo* returns CL_SUCCESS if the function is executed + succesfully. Otherwise, it returns one of the following errors: + +* CL_INVALID_TENSOR if _tensor_ is not a valid tensor object. + +[#Tensor Object Quaries] +.List of supported param_names by clGetTensorInfo +[cols="2,1,2",stripes=odd] +|=== +| CL_TENSOR_RANK | size_t | Return the tensor rank. +| CL_TENSOR_SHAPE | size_t[] | Return the tensor shape. +| CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type. + +| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool | Return true if the +tensor is temporary tensor for command buffers. + +| CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is +bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then +CL_TENSOR_BOUND_TO_BUFFER must return false. + +| CL_TENSOR_BUFFER | cl_mem a| If CL_TENSOR_BOUND_TO_BUFFER is true, +return the buffer object the tensor is bound to. Otherwise, +clGetTensorInfo call returns: + +* CL_INVALID_MEM_OBJECT if the tensor is not bound to a buffer object. + +* CL_INVALID_PROPERTY otherwise. + +| CL_TENSOR_CONTEXT | cl_context | Return the context specified when + the tensor object is created. + +| CL_TENSOR_REFERENCE_COUNT | cl_uint | Return the tensor reference +count. +|=== + +To read from a tensor to host memory / buffer object or to write to a +tensor object from host memory / buffer object call one of the functions. + +[source,c] +---- +cl_int clEnqueueReadTensor( + cl_command_queue command_queue, + cl_tensor tensor, + cl_bool blocking_command, + cl_mem buffer, + void* host_ptr, + cl_uint num_events_in_wait_list, + const cl_event* event_wait_list, + cl_event* event); +---- + +[source,c] +---- +cl_int clEnqueueWriteTensor( + cl_command_queue command_queue, + cl_tensor tensor, + cl_bool blocking_command, + cl_mem buffer, + void* host_ptr, + cl_uint num_events_in_wait_list, + const cl_event* event_wait_list, + cl_event* event); +---- + +* _command_queue_ is a valid host command-queue in which the read / + write command will be queued. _command_queue_ and _tensor_ must be + created with the same OpenCL context. + +* _tensor_ refers to a valid tensor object which is bound to a buffer. + +* _blocking_command_ indicate if the read and write operations are + blocking or non-blocking (see below). + +* _buffer_ refers to a valid buffer object where data is to be + read into or to be written from when the value of _host_ptr_ is + NULL. If _host_ptr_ is non-NULL then value of _buffer_ is ignored. + +* _host_ptr_ is the pointer to buffer in host memory where data is to + be read into or to be written from when the value is non-NULL. + +* _event_wait_list_ and _num_events_in_wait_list_ specify events that + need to complete before this particular command can be executed. If + _event_wait_list_ is NULL, then this particular command does not + wait on any event to complete. If _event_wait_list_ is NULL, + _num_events_in_wait_list_ must be 0. If _event_wait_list_ is not + NULL, the list of events pointed to by _event_wait_list_ must be + valid and _num_events_in_wait_list_ must be greater than 0. The + events specified in _event_wait_list_ act as synchronization + points. The context associated with events in _event_wait_list_ and + _command_queue_ must be the same. The memory associated with + _event_wait_list_ can be reused or freed after the function returns. + +* _event_ returns an event object that identifies this read / write + command and can be used to query or queue a wait for this command to + complete. If _event_ is NULL or the enqueue is unsuccessful, no + event will be created and therefore it will not be possible to query + the status of this command or to wait for this command to + complete. If _event_wait_list_ and _event_ are not NULL, _event_ + must not refer to an element of the _event_wait_list_ array. + +For a read and write operation, the elements of N-dimensional tensor are +related to host memory / buffer object as followed: + +---- +tensor.element(i0, i1, ..., i, i)) == (tensor.dtype)buffer_or_host_ptr[ + i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] + + i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] + + ... + + i * tensor.shape[i(N-1)] + + i] +---- + +Where `iX` is a tensor coordinate index with inclusive range of `0..`. + +// TODO: add clEnqueueCopyTensor + +// TODO: add clEnqueueFillTensor? + +// TODO: add command buffer variants for clEnqueue{copy,read,write}Tensor. + + +==== Add New Buffer Property in Section 5.2.1 + +[cols="2,1,2",stripes=odd] +|=== +| CL_MEM_BIND_TO_TENSOR | cl_tensor a| Use the created buffer as +storage for the given valid tensor. To succeed creating the buffer, +the target tensor may not have storage already, must not have +CL_TENSOR_COMMAND_BUFFER_TEMPORARY property set on and _size_ argument +of the clCreateBufferWithProperties() must be zero. + +Size of the memory buffer is implementation-defined and it can be +queried with clGetTensorInfo(). + +Memory layout of the tensor in the created memory buffer is +implementation-defined and opaque to the applications and it may +change at unspecified points. Implementation may store auxiliary data +in the memory buffer for the tensor. Therefore, writing data into the +memory buffer directly using the cl_mem handle leads to undefined +behavior. + +If the tensor is already bound to a buffer object, +clCreateBufferWithProperties call returns CL_TENSOR_BOUND_TO_BUFFER +error code. +|=== + +=== Sample Codes + +Helper functions used in the follow up tensor code samples: + +[source,c] +---- +cl_kernel create_matmul_kernel( + cl_context ctx, std::span device_span, + cl_tensor lhs, cl_tensor rhs, cl_tensor out) { + // A hypothetical matmul kernel signature in pseudo OpenCL C for + // illustrative purposes: + // + // kernel void matmul( + // global read_only tensor_t, + // global read_only tensor_t, + // global write_only tensor_t); + + cl_kernel matmul_kernel = /* Omitted. */; + clSetKernelArg(matmul_kernel, 0, sizeof(cl_tensor), &lhs); + clSetKernelArg(matmul_kernel, 1, sizeof(cl_tensor), &rhs); + clSetKernelArg(matmul_kernel, 2, sizeof(cl_tensor), &out); + return matmul_kernel; +} + +cl_kernel create_matmul_kernel( + cl_context ctx, std::span device_span, + cl_tensor lhs, cl_tensor rhs, cl_tensor out) { + // A hypothetical add kernel signature in pseudo OpenCL C for illustrative + // purposes: + // + // kernel void add( + // global read_only tensor_t, + // global read_only tensor_t, + // global write_only tensor_t); + + cl_tensor add_kernel = /* Omitted. */; + clSetKernelArg(add_kernel, 0, sizeof(cl_tensor), &lhs); + clSetKernelArg(add_kernel, 1, sizeof(cl_tensor), &rhs); + clSetKernelArg(add_kernel, 2, sizeof(cl_tensor), &out); + return add_kernel; +} +---- +An example usage of tensors on a command queue: + +[source,c] +---- +constexpr size_t b = 64, m = 100, n = 200, k = 50; + +cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, err); +cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, err); +cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err); +cl_tensor t0 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err); +cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err); + +cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0); +cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out); + +// Allocate storage for the tensors. The buffer size must be set to zero +// when the buffer is bound to a tensor. OpenCL implementation may +// determine optimal data layout and the storage needed for it, based +// on the tensor's uses (matmul kernel in this sample) so far. +cl_int err; +cl_mem in0_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_BIND_TO_TENSOR, in0, 0}, CL_MEM_READ_ONLY, + 0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, nullptr, &err); +cl_mem in1_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_BIND_TO_TENSOR, in1, 0}, CL_MEM_READ_ONLY, + 0, nullptr, &err); +cl_mem in2_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_BIND_TO_TENSOR, in2, 0}, CL_MEM_READ_ONLY, + 0, nullptr, &err); +cl_mem t0_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, + 0, nullptr, &err); +cl_mem out_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_BIND_TO_TENSOR, out, 0}, CL_MEM_WRITE_ONLY, + 0, nullptr, &err); + +std::vector in0_data = ...; +std::vector in1_data = ...; +std::vector out_data(b * m * n); + +// Copies data into in0 tensor while possibly rearranging the data to the +// optimal data layout. +clEnqueueWriteTensor( + cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr, in0_data.data(), + 0, nullptr, nullptr); + +clEnqueueWriteTensor( + cmd_q, in1, false, nullptr, nullptr, {b, k, n}, nullptr, in1_data.data(), + 0, nullptr, nullptr); +clEnqueueNDRangeKernel( + cmd_q, matmul_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr); +clEnqueueNDRangeKernel( + cmd_q, add_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr); +clEnqueueReadTensor( + cmd_q, out, false, nullptr, nullptr, {b, m, n}, nullptr, out_data.data(), + 0, nullptr, nullptr); +---- + +An example use of tensors in a command buffer when cl_khr_command_buffer +extension is supported: + +[source,c] +---- +constexpr size_t b = 64, m = 100, n = 200, k = 50; + +cl_int err; +// Create tensors which are used as temporaries in a command buffer. +// Command buffers allocate space for them as needed. +// +// NOTE: same temporary tensor handle used in multiple command buffers +// will have separate storage. IOW, command buffers may not exchange +// data via temporary buffers between them. +cl_tensor in0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, + 3, {b, m, k}, CL_TENSOR_FLOAT, err); +cl_tensor in1 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, + 3, {b, k, n}, CL_TENSOR_FLOAT, err); +cl_tensor in2 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, + 3, {b, m, n}, CL_TENSOR_FLOAT, err); +cl_tensor t0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, + 3, {b, m, n}, CL_TENSOR_FLOAT, err); +cl_tensor out = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, + 3, {b, m, n}, CL_TENSOR_FLOAT, err); + +cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0); +cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out); + +// Binding a buffer to temporary tensor is not allowed. +auto ignored = clCreateBufferWithProperties( + ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &err); +assert(err == CL_TENSOR_IS_TEMPORARY) + +std::vector in0_data = ...; +std::vector in1_data = ...; +std::vector out_data(b * m * n); + +cl_command_buffer_khr cb = + clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &err); + +cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp; +clCommandWriteTensorKHR( + cmd_b, cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr, + in0_data.data(), 0, nullptr, &in0_syncp); +clCommandWriteTensorKHR( + cmd_b, cmd_q, in1, false, nullptr, nullptr, {b, k, m}, nullptr, + in1_data.data(), 0, nullptr, &in1_syncp); +clCommandNDRangeKernelKHR( + cmd_b, cmd_q, nullptr, matmul_kernel, 0, nullptr, nullptr, nullptr, + 2, {in0_syncp, in2_syncp}, &matmul_syncp, nullptr); +clCommandNDRangeKernelKHR( + cmd_b, cmd_q, nullptr, add_kernel, 0, nullptr, nullptr, nullptr, + 1, {matmul_syncp}, &add_syncp, nullptr); +clCommandReadTensorKHR( + cmd_b, cmd_q, out, false, nullptr, nullptr, {b, k, m}, nullptr, + out_data.data(), 1, {add_syncp}, nullptr); + +// Finalize the command buffer. At this point the OpenCL +// implementation may reserve enough storage for all the tensor +// temporaries. Temporary tensors might be eliminated - for example, +// OpenCL implementation could use 'out' tensor to store result of +// matmul_kernel , thus, eliminating the need of 't0' tensor. +clFinalizeCommandBufferKHR(cmd_b); + +// Temporary tensors used in a command buffer can't be read or written +// into. A hypothetical reason is that the finalized command buffer +// might not use some of the tensor. +assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION); +---- + +=== Open Questions === diff --git a/ext/cl_khr_tensor.html b/ext/cl_khr_tensor.html new file mode 100644 index 000000000..878925489 --- /dev/null +++ b/ext/cl_khr_tensor.html @@ -0,0 +1,1228 @@ + + + + + + + +cl_khr_tensor + + + + + +
+
+

Tensor Data Type

+
+
+

Purpose of this extension is to provide …​

+
+
+

General information

+
+

Name Strings

+
+

cl_khr_tensor

+
+
+
+

Version history

+ +++++ + + + + + + + + + + + + + + +
DateVersionDescription

2023-10-XX

0.1.0

First assigned version.

+
+
+

Dependencies

+
+

This extension is written against the OpenCL Specification version 3.0.14.

+
+
+

This extension requires OpenCL 1.2 or later.

+
+
+

This extension requires cl_khr_command_buffer.

+
+
+
+

Contributors

+
+

Henry Linjamäki, Intel.

+
+
+
+
+

Overview

+ +
+
+

Modifications to OpenCL

+
+

New OpenCL Functions

+
+

To create a tensor use:

+
+
+
+
cl_tensor clCreateTensor(
+    cl_context context,
+    const cl_tensor_peoperties *properties,
+    size_t rank,
+    size_t shape,
+    cl_tensor_type dtype,
+    cl_int *errcode_ret);
+
+
+
+
    +
  • +

    context is a valid OpenCL context used to create the tensor object.

    +
  • +
  • +

    properties is an optional list of properties for the tensor object +and their corresponding values. The list is terminated with the +special property 0. If no properties are required, properties may be +NULL.

    +
  • +
  • +

    rank is the number of dimensions. Zero value creates a "scalar" +tensor which has no dimensions but has storage for one element.

    +
  • +
  • +

    shape is a list of sizes of the dimensions. The length of the list +must be rank elements. shape can be NULL if rank value is +zero. All the first rank values in the list must be non-zero.

    +
  • +
  • +

    dtype is the element type of tensor. Refer to the +Tensor element types table for the types.

    +
  • +
  • +

    errcode_ret may return an appropriate error code. If errcode_ret +is NULL, no error code is returned.

    +
  • +
+
+
+

clCreateTensor function creates a rank-dimensional tensor with +shape[0] * shape[1] * …​ * shape[rank-1] elements of dtype +type. At the creation time of the tensor, it does not have +storage. The storage is assigned to the tensor either by:

+
+
+
    +
  • +

    calling clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR or

    +
  • +
  • +

    automatically by command buffers - possibly on-demand basis - if the +tensor is created with CL_TENSOR_COMMAND_BUFFER_TEMPORARY property +set on.

    +
  • +
+
+
+

A command that refers to a tensor must be bound to a valid buffer +object before enqueuing the command into a command queue unless the +command is recorded in a command buffer and +CL_TENSOR_COMMAND_BUFFER_TEMPORARY is set to true.

+
+
+

clCreateTensor returns a valid non-zero tensor object and errcode_ret +is set to CL_SUCCESS if the tensor object is created +successfully. Otherwise, they return a NULL value with one of the +following error values returned in errcode_ret:

+
+
+
    +
  • +

    CL_INVALID_CONTEXT if context is not a valid context.

    +
  • +
  • +

    CL_INVALID_PROPERTY if a property name in properties is not a +supported property name, if the value specified for a supported +property name is not valid, or if the same property name is +specified more than once.

    +
  • +
  • +

    CL_INVALID_VALUE if a value specified in dtype is invalid.

    +
  • +
  • +

    CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources +required by the OpenCL implementation on the host.

    +
  • +
+
+ + ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Table 1. Tensor element types
Tensor element data typeDescription

CL_TENSOR_BOOL

1-bit signedless integer.

CL_TENSOR_INT8

8-bit signed integer.

CL_TENSOR_INT16

16-bit signed integer.

CL_TENSOR_INT32

32-bit signed integer.

CL_TENSOR_INT64

64-bit signed integer.

CL_TENSOR_UINT8

8-bit signed integer.

CL_TENSOR_UINT16

16-bit signed integer.

CL_TENSOR_UINT32

32-bit signed integer.

CL_TENSOR_UINT64

64-bit signed integer.

CL_TENSOR_HALF

Half precision floating-point value.

CL_TENSOR_BFLOAT16

16-bit brain floating-point value.

CL_TENSOR_FLOAT

Single precision floating-point value.

CL_TENSOR_DOUBLE

Double precision floating-point value.

CL_TENSOR_COMPLEX64

64-bit complex floating point value with + 32-bit real and imaginary part.

CL_TENSOR_COMPLEX128

128-bit complex floating point value with + 64-bit real and imaginary part.

+ + +++++ + + + + + + + + + + + + + + +
Table 2. Tensor properties
Tensor PropertyProperty ValueDescription

CL_TENSOR_COMMAND_BUFFER_TEMPORARY

cl_bool

+

If the value is true, create a "temporary" tensor that only can be +used on commands recorded in command buffers. The storage of the +temporary tensors are managed by command buffers. When a temporary +tensor is used by multiple command buffer, the tensor receive separate +storage for each command buffer.

+
+
+

Temporary tensors may not be bound to buffer objects.

+
+
+

Data stored in temporary tensors are not preserved across command +buffer executions.

+
+
+

To retain a tensor object, call the function

+
+
+
+
cl_int clRetainTensorObject(
+  cl_tensor tensor);
+
+
+
+
    +
  • +

    tensor is the tensor object to be retained.

    +
  • +
+
+
+

The tensor reference count is incremented.

+
+
+

clRetainTensor returns CL_SUCCESS if the function is executed +successfully. Otherwise, it returns one of the following errors:

+
+
+
    +
  • +

    CL_INVALID_TENSOR if tensor is not a valid tensor object.

    +
  • +
+
+
+

To release a tensor object, call the function

+
+
+
+
cl_int clReleaseTensorObject(
+  cl_tensor tensor);
+
+
+
+
    +
  • +

    tensor is the tensor object to be released.

    +
  • +
+
+
+

The tensor reference count is decremented.

+
+
+

The tensor object is deleted once the number of instances that are +retained to tensor become zero and the tensor object is no longer +needed by any enqueued or recorded commands that use tensor. Using +this function to release a reference that was not obtained by creating +the object or by calling clRetainTensor causes undefined behavior.

+
+
+

clReleaseTensor returns CL_SUCCESS if the function is executed +successfully. Otherwise, it returns one of the following errors:

+
+
+
    +
  • +

    CL_INVALID_TENSOR if tensor is not a valid tensor object.

    +
  • +
+
+
+

To return information about a tensor object, call the function

+
+
+
+
cl_int clGetTensorInfo(
+  cl_tensor tensor,
+  cl_tensor_info param_name,
+  size_t param_value_size,
+  void* param_value,
+  size_t* param_value_size_ret);
+
+
+
+
    +
  • +

    tensor specifies the tensor object being queried.

    +
  • +
  • +

    param_name specifies the information to query. The list of +supported param_name types and the information returned in +param_value by clGetTensorInfo is described in the [Tensor Object +Queries] table.

    +
  • +
  • +

    param_value is a pointer to memory where the appropriate result +being queried is returned. If param_value is NULL, it is ignored.

    +
  • +
  • +

    param_value_size is used to specify the size in bytes of memory +pointed to by param_value. This size must be ≥ size of return type +as described in the [Tensor Object Queries] table.

    +
  • +
  • +

    param_value_size_ret returns the actual size in bytes of data +being queried by param_name. If param_value_size_ret is NULL, it is +ignored.

    +
  • +
+
+
+

clGetTensorInfo returns CL_SUCCESS if the function is executed + succesfully. Otherwise, it returns one of the following errors:

+
+
+
    +
  • +

    CL_INVALID_TENSOR if tensor is not a valid tensor object.

    +
  • +
+
+ + +++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Table 3. List of supported param_names by clGetTensorInfo

CL_TENSOR_RANK

size_t

Return the tensor rank.

CL_TENSOR_SHAPE

size_t[]

Return the tensor shape.

CL_TENSOR_DTYPE

cl_tensor_type

Return the tensor data type.

CL_TENSOR_COMMAND_BUFFER_TEMPORARY

cl_bool

Return true if the +tensor is temporary tensor for command buffers.

CL_TENSOR_BOUND_TO_BUFFER

cl_bool

Return true if the tensor is +bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then +CL_TENSOR_BOUND_TO_BUFFER must return false.

CL_TENSOR_BUFFER

cl_mem

+

If CL_TENSOR_BOUND_TO_BUFFER is true, +return the buffer object the tensor is bound to. Otherwise, +clGetTensorInfo call returns:

+
+
+
    +
  • +

    CL_INVALID_MEM_OBJECT if the tensor is not bound to a buffer object.

    +
  • +
  • +

    CL_INVALID_PROPERTY otherwise.

    +
  • +
+

CL_TENSOR_CONTEXT

cl_context

Return the context specified when + the tensor object is created.

CL_TENSOR_REFERENCE_COUNT

cl_uint

Return the tensor reference +count.

+
+

To read from a tensor to host memory / buffer object or to write to a +tensor object from host memory / buffer object call one of the functions.

+
+
+
+
cl_int clEnqueueReadTensor(
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  cl_bool blocking_command,
+  cl_mem buffer,
+  void* host_ptr,
+  cl_uint num_events_in_wait_list,
+  const cl_event* event_wait_list,
+  cl_event* event);
+
+
+
+
+
cl_int clEnqueueWriteTensor(
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  cl_bool blocking_command,
+  cl_mem buffer,
+  void* host_ptr,
+  cl_uint num_events_in_wait_list,
+  const cl_event* event_wait_list,
+  cl_event* event);
+
+
+
+
    +
  • +

    command_queue is a valid host command-queue in which the read / +write command will be queued. command_queue and tensor must be +created with the same OpenCL context.

    +
  • +
  • +

    tensor refers to a valid tensor object which is bound to a buffer.

    +
  • +
  • +

    blocking_command indicate if the read and write operations are +blocking or non-blocking (see below).

    +
  • +
  • +

    buffer refers to a valid buffer object where data is to be +read into or to be written from when the value of host_ptr is +NULL. If host_ptr is non-NULL then value of buffer is ignored.

    +
  • +
  • +

    host_ptr is the pointer to buffer in host memory where data is to +be read into or to be written from when the value is non-NULL.

    +
  • +
  • +

    event_wait_list and num_events_in_wait_list specify events that +need to complete before this particular command can be executed. If +event_wait_list is NULL, then this particular command does not +wait on any event to complete. If event_wait_list is NULL, +num_events_in_wait_list must be 0. If event_wait_list is not +NULL, the list of events pointed to by event_wait_list must be +valid and num_events_in_wait_list must be greater than 0. The +events specified in event_wait_list act as synchronization +points. The context associated with events in event_wait_list and +command_queue must be the same. The memory associated with +event_wait_list can be reused or freed after the function returns.

    +
  • +
  • +

    event returns an event object that identifies this read / write +command and can be used to query or queue a wait for this command to +complete. If event is NULL or the enqueue is unsuccessful, no +event will be created and therefore it will not be possible to query +the status of this command or to wait for this command to +complete. If event_wait_list and event are not NULL, event +must not refer to an element of the event_wait_list array.

    +
  • +
+
+
+

For a read and write operation, the elements of N-dimensional tensor are +related to host memory / buffer object as followed:

+
+
+
+
tensor.element(i0, i1, ..., i<N-2>, i<N-1>)) == (tensor.dtype)buffer_or_host_ptr[
+  i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] +
+  i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] +
+  ... +
+  i<N-2> * tensor.shape[i(N-1)] +
+  i<N-1>]
+
+
+
+

Where iX is a tensor coordinate index with inclusive range of 0..<shape[X]>.

+
+
+
+

Add New Buffer Property in Section 5.2.1

+ +++++ + + + + + + + +

CL_MEM_BIND_TO_TENSOR

cl_tensor

+

Use the created buffer as +storage for the given valid tensor. To succeed creating the buffer, +the target tensor may not have storage already, must not have +CL_TENSOR_COMMAND_BUFFER_TEMPORARY property set on and size argument +of the clCreateBufferWithProperties() must be zero.

+
+
+

Size of the memory buffer is implementation-defined and it can be +queried with clGetTensorInfo().

+
+
+

Memory layout of the tensor in the created memory buffer is +implementation-defined and opaque to the applications and it may +change at unspecified points. Implementation may store auxiliary data +in the memory buffer for the tensor. Therefore, writing data into the +memory buffer directly using the cl_mem handle leads to undefined +behavior.

+
+
+

If the tensor is already bound to a buffer object, +clCreateBufferWithProperties call returns CL_TENSOR_BOUND_TO_BUFFER +error code.

+
+
+
+
+

Sample Codes

+
+

Helper functions used in the follow up tensor code samples:

+
+
+
+
cl_kernel create_matmul_kernel(
+  cl_context ctx, std::span<cl_device_id> device_span,
+  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
+  // A hypothetical matmul kernel signature in pseudo OpenCL C for
+  // illustrative purposes:
+  //
+  //   kernel void matmul(
+  //     global read_only tensor_t,
+  //     global read_only tensor_t,
+  //     global write_only tensor_t);
+
+  cl_kernel matmul_kernel = /* Omitted. */;
+  clSetKernelArg(matmul_kernel, 0, sizeof(cl_tensor), &lhs);
+  clSetKernelArg(matmul_kernel, 1, sizeof(cl_tensor), &rhs);
+  clSetKernelArg(matmul_kernel, 2, sizeof(cl_tensor), &out);
+  return matmul_kernel;
+}
+
+cl_kernel create_matmul_kernel(
+  cl_context ctx, std::span<cl_device_id> device_span,
+  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
+  // A hypothetical add kernel signature in pseudo OpenCL C for illustrative
+  // purposes:
+  //
+  // kernel void add(
+  //     global read_only tensor_t,
+  //     global read_only tensor_t,
+  //     global write_only tensor_t);
+
+  cl_tensor add_kernel = /* Omitted. */;
+  clSetKernelArg(add_kernel, 0, sizeof(cl_tensor), &lhs);
+  clSetKernelArg(add_kernel, 1, sizeof(cl_tensor), &rhs);
+  clSetKernelArg(add_kernel, 2, sizeof(cl_tensor), &out);
+  return add_kernel;
+}
+
+
+
+

An example usage of tensors on a command queue:

+
+
+
+
constexpr size_t b = 64, m = 100, n = 200, k = 50;
+
+cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, err);
+cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, err);
+cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor t0  = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
+
+cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
+cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
+
+// Allocate storage for the tensors. The buffer size must be set to zero
+// when the buffer is bound to a tensor. OpenCL implementation may
+// determine optimal data layout and the storage needed for it, based
+// on the tensor's uses (matmul kernel in this sample) so far.
+cl_int err;
+cl_mem in0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in0, 0}, CL_MEM_READ_ONLY,
+  0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, nullptr, &err);
+cl_mem in1_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in1, 0}, CL_MEM_READ_ONLY,
+  0, nullptr, &err);
+cl_mem in2_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in2, 0}, CL_MEM_READ_ONLY,
+  0, nullptr, &err);
+cl_mem t0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE,
+  0, nullptr, &err);
+cl_mem out_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, out, 0}, CL_MEM_WRITE_ONLY,
+  0, nullptr, &err);
+
+std::vector<float> in0_data = ...;
+std::vector<float> in1_data = ...;
+std::vector<float> out_data(b * m * n);
+
+// Copies data into in0 tensor while possibly rearranging the data to the
+// optimal data layout.
+clEnqueueWriteTensor(
+  cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr, in0_data.data(),
+  0, nullptr, nullptr);
+
+clEnqueueWriteTensor(
+  cmd_q, in1, false, nullptr, nullptr, {b, k, n}, nullptr, in1_data.data(),
+  0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, matmul_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, add_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueReadTensor(
+  cmd_q, out, false, nullptr, nullptr, {b, m, n}, nullptr, out_data.data(),
+  0, nullptr, nullptr);
+
+
+
+

An example use of tensors in a command buffer when cl_khr_command_buffer +extension is supported:

+
+
+
+
constexpr size_t b = 64, m = 100, n = 200, k = 50;
+
+cl_int err;
+// Create tensors which are used as temporaries in a command buffer.
+// Command buffers allocate space for them as needed.
+//
+// NOTE: same temporary tensor handle used in multiple command buffers
+//       will have separate storage. IOW, command buffers may not exchange
+//       data via temporary buffers between them.
+cl_tensor in0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, k}, CL_TENSOR_FLOAT, err);
+cl_tensor in1 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, k, n}, CL_TENSOR_FLOAT, err);
+cl_tensor in2 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor t0  = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor out = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, err);
+
+cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
+cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
+
+// Binding a buffer to temporary tensor is not allowed.
+auto ignored = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &err);
+assert(err == CL_TENSOR_IS_TEMPORARY)
+
+std::vector<float> in0_data = ...;
+std::vector<float> in1_data = ...;
+std::vector<float> out_data(b * m * n);
+
+cl_command_buffer_khr cb =
+  clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &err);
+
+cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
+clCommandWriteTensorKHR(
+  cmd_b, cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr,
+  in0_data.data(), 0, nullptr, &in0_syncp);
+clCommandWriteTensorKHR(
+  cmd_b, cmd_q, in1, false, nullptr, nullptr, {b, k, m}, nullptr,
+  in1_data.data(), 0, nullptr, &in1_syncp);
+clCommandNDRangeKernelKHR(
+  cmd_b, cmd_q, nullptr, matmul_kernel, 0, nullptr, nullptr, nullptr,
+  2, {in0_syncp, in2_syncp}, &matmul_syncp, nullptr);
+clCommandNDRangeKernelKHR(
+  cmd_b, cmd_q, nullptr, add_kernel, 0, nullptr, nullptr, nullptr,
+  1, {matmul_syncp}, &add_syncp, nullptr);
+clCommandReadTensorKHR(
+  cmd_b, cmd_q, out,  false, nullptr, nullptr, {b, k, m}, nullptr,
+  out_data.data(), 1, {add_syncp}, nullptr);
+
+// Finalize the command buffer. At this point the OpenCL
+// implementation may reserve enough storage for all the tensor
+// temporaries. Temporary tensors might be eliminated - for example,
+// OpenCL implementation could use 'out' tensor to store result of
+// matmul_kernel , thus, eliminating the need of 't0' tensor.
+clFinalizeCommandBufferKHR(cmd_b);
+
+// Temporary tensors used in a command buffer can't be read or written
+// into. A hypothetical reason is that the finalized command buffer
+// might not use some of the tensor.
+assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION);
+
+
+
+
+

Open Questions

+ +
+
+
+
+ + + \ No newline at end of file From bf94321d718fb7da01ff79baf4c6ea81905df563 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 14:16:43 +0200 Subject: [PATCH 02/18] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Ben Ashbaugh Co-authored-by: Pekka Jääskeläinen --- ext/cl_khr_tensor.asciidoc | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index cd17a42bb..1df37e9e4 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -51,7 +51,7 @@ cl_tensor clCreateTensor( cl_context context, const cl_tensor_peoperties *properties, size_t rank, - size_t shape, + const size_t* shape, cl_tensor_type dtype, cl_int *errcode_ret); ---- @@ -88,7 +88,7 @@ storage. The storage is assigned to the tensor either by: set on. A command that refers to a tensor must be bound to a valid buffer -object before enqueuing the command into a command queue unless the +object before enqueuing the command that refers to the tensor into a command queue unless the command is recorded in a command buffer and CL_TENSOR_COMMAND_BUFFER_TEMPORARY is set to true. @@ -124,7 +124,7 @@ following error values returned in errcode_ret: | CL_TENSOR_UINT16 | 16-bit signed integer. | CL_TENSOR_UINT32 | 32-bit signed integer. | CL_TENSOR_UINT64 | 64-bit signed integer. -| CL_TENSOR_HALF | Half precision floating-point value. +| CL_TENSOR_HALF | Half precision floating-point. | CL_TENSOR_BFLOAT16 | 16-bit brain floating-point value. | CL_TENSOR_FLOAT | Single precision floating-point value. | CL_TENSOR_DOUBLE | Double precision floating-point value. @@ -144,7 +144,7 @@ following error values returned in errcode_ret: a| If the value is true, create a "temporary" tensor that only can be used on commands recorded in command buffers. The storage of the temporary tensors are managed by command buffers. When a temporary -tensor is used by multiple command buffer, the tensor receive separate +tensor is used by multiple command buffers, the tensor receives separate storage for each command buffer. // IOW, Data may not be exchanged between command buffers through @@ -171,7 +171,7 @@ The _tensor_ reference count is incremented. *clRetainTensor* returns CL_SUCCESS if the function is executed successfully. Otherwise, it returns one of the following errors: -* CL_INVALID_TENSOR if tensor is not a valid tensor object. +* CL_INVALID_TENSOR if the tensor is not a valid tensor object. To release a tensor object, call the function @@ -242,7 +242,7 @@ cl_int clGetTensorInfo( | CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type. | CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool | Return true if the -tensor is temporary tensor for command buffers. +tensor is a temporary tensor for command buffers. | CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then @@ -263,8 +263,8 @@ clGetTensorInfo call returns: count. |=== -To read from a tensor to host memory / buffer object or to write to a -tensor object from host memory / buffer object call one of the functions. +The following functions are for reading from a tensor to host memory / buffer object or to write to a +tensor object from host memory / buffer object. [source,c] ---- @@ -286,7 +286,7 @@ cl_int clEnqueueWriteTensor( cl_tensor tensor, cl_bool blocking_command, cl_mem buffer, - void* host_ptr, + const void* host_ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); @@ -329,10 +329,10 @@ cl_int clEnqueueWriteTensor( must not refer to an element of the _event_wait_list_ array. For a read and write operation, the elements of N-dimensional tensor are -related to host memory / buffer object as followed: +related to host memory / buffer object as follows: ---- -tensor.element(i0, i1, ..., i, i)) == (tensor.dtype)buffer_or_host_ptr[ +tensor.element(i0, i1, ..., i, i) == (tensor.dtype)buffer_or_host_ptr[ i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] + i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] + ... + @@ -505,7 +505,7 @@ cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out); // Binding a buffer to temporary tensor is not allowed. auto ignored = clCreateBufferWithProperties( ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &err); -assert(err == CL_TENSOR_IS_TEMPORARY) +assert(err == CL_TENSOR_IS_TEMPORARY); std::vector in0_data = ...; std::vector in1_data = ...; From 36db4b6d9d3ec7caacb6849d16a119ce005a59a7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 09:15:20 +0200 Subject: [PATCH 03/18] * Add brief introduction. * cl_khr_tensor -> cl_exp_tensor. * Remove cl_khr_command_buffer requirement. --- ext/cl_khr_tensor.asciidoc | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 1df37e9e4..05c7ad521 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -1,20 +1,25 @@ // Copyright 2023 The Khronos Group. This work is licensed under a // Creative Commons Attribution 4.0 International License; see // http://creativecommons.org/licenses/by/4.0/ -= cl_khr_tensor += cl_exp_tensor :source-highlighter: coreray -[[cl_khr_tensor]] +[[cl_exp_tensor]] == Tensor Data Type -Purpose of this extension is to provide ... +This extension provides a new opaque OpenCL datatype called +`cl_tensor`. It is used for storing N-dimensional tensor data in +implementation-defined memory layout which may be optimized based on +tensor's use cases. The datatype is designed to be efficiently used +within the `cl_khr_command_buffers` extension to capture task graphs +which can utilize tensors as input, output and temporary storage. === General information ==== Name Strings -`cl_khr_tensor` +`cl_exp_tensor` ==== Version history @@ -30,8 +35,6 @@ This extension is written against the OpenCL Specification version 3.0.14. This extension requires OpenCL 1.2 or later. -This extension requires cl_khr_command_buffer. - ==== Contributors Henry Linjamäki, Intel. + From baa768882dbad2d71d667e2282859098e69e4c23 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 09:18:38 +0200 Subject: [PATCH 04/18] Add contributors --- ext/cl_khr_tensor.asciidoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 05c7ad521..5cba054ca 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -38,6 +38,8 @@ This extension requires OpenCL 1.2 or later. ==== Contributors Henry Linjamäki, Intel. + +Pekka Jääslkeläinen, Intel and Tampere University. + +Ben Ashbaugh, Intel. + === Overview From 9db1e6543d68b5a986aa760f228d098ebc4ff0c4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 09:19:16 +0200 Subject: [PATCH 05/18] * Fix name for add kernel creator --- ext/cl_khr_tensor.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 5cba054ca..0115b054e 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -403,7 +403,7 @@ cl_kernel create_matmul_kernel( return matmul_kernel; } -cl_kernel create_matmul_kernel( +cl_kernel create_add_kernel( cl_context ctx, std::span device_span, cl_tensor lhs, cl_tensor rhs, cl_tensor out) { // A hypothetical add kernel signature in pseudo OpenCL C for illustrative From 141643dc4eacc15c99d6d527889cf55b239c60ac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 09:21:51 +0200 Subject: [PATCH 06/18] * cl_tensor_type -> cl_tensor _datatype. * Fix signed -> unsigned. * Single line cl{Retain,Release}TensorObject declaration. --- ext/cl_khr_tensor.asciidoc | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 0115b054e..bed45d976 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -56,8 +56,8 @@ cl_tensor clCreateTensor( cl_context context, const cl_tensor_peoperties *properties, size_t rank, - const size_t* shape, - cl_tensor_type dtype, + const size_t shape, + cl_tensor_datatype dtype, cl_int *errcode_ret); ---- @@ -125,17 +125,17 @@ following error values returned in errcode_ret: | CL_TENSOR_INT16 | 16-bit signed integer. | CL_TENSOR_INT32 | 32-bit signed integer. | CL_TENSOR_INT64 | 64-bit signed integer. -| CL_TENSOR_UINT8 | 8-bit signed integer. -| CL_TENSOR_UINT16 | 16-bit signed integer. -| CL_TENSOR_UINT32 | 32-bit signed integer. -| CL_TENSOR_UINT64 | 64-bit signed integer. +| CL_TENSOR_UINT8 | 8-bit unsigned integer. +| CL_TENSOR_UINT16 | 16-bit unsigned integer. +| CL_TENSOR_UINT32 | 32-bit unsigned integer. +| CL_TENSOR_UINT64 | 64-bit unsigned integer. | CL_TENSOR_HALF | Half precision floating-point. -| CL_TENSOR_BFLOAT16 | 16-bit brain floating-point value. -| CL_TENSOR_FLOAT | Single precision floating-point value. -| CL_TENSOR_DOUBLE | Double precision floating-point value. -| CL_TENSOR_COMPLEX64 | 64-bit complex floating point value with +| CL_TENSOR_BFLOAT16 | 16-bit brain floating-point. +| CL_TENSOR_FLOAT | Single precision floating-point. +| CL_TENSOR_DOUBLE | Double precision floating-point. +| CL_TENSOR_COMPLEX64 | 64-bit complex floating point with 32-bit real and imaginary part. -| CL_TENSOR_COMPLEX128 | 128-bit complex floating point value with +| CL_TENSOR_COMPLEX128 | 128-bit complex floating point with 64-bit real and imaginary part. |=== @@ -165,8 +165,7 @@ To retain a tensor object, call the function [source,c] ---- -cl_int clRetainTensorObject( - cl_tensor tensor); +cl_int clRetainTensorObject(cl_tensor tensor); ---- * _tensor_ is the tensor object to be retained. @@ -182,8 +181,7 @@ To release a tensor object, call the function [source,c] ---- -cl_int clReleaseTensorObject( - cl_tensor tensor); +cl_int clReleaseTensorObject(cl_tensor tensor); ---- * _tensor_ is the tensor object to be released. From db91aee8a971fc9bf3d2d4daacfa197e3ff46929 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 09:59:18 +0200 Subject: [PATCH 07/18] * clEnqueue(Read,Write)Tensor -> clEnqueue(TranslateFrom,TranslateTo)Tensor. * Clarify in clEnqueue{TranslateFrom,TranslateTo}Tensor that data read from / written to the tensor in opaque manner. --- ext/cl_khr_tensor.asciidoc | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index bed45d976..99e653706 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -271,7 +271,7 @@ tensor object from host memory / buffer object. [source,c] ---- -cl_int clEnqueueReadTensor( +cl_int clEnqueueTranslateFromTensor( cl_command_queue command_queue, cl_tensor tensor, cl_bool blocking_command, @@ -284,7 +284,7 @@ cl_int clEnqueueReadTensor( [source,c] ---- -cl_int clEnqueueWriteTensor( +cl_int clEnqueueTranslateToTensor( cl_command_queue command_queue, cl_tensor tensor, cl_bool blocking_command, @@ -331,8 +331,14 @@ cl_int clEnqueueWriteTensor( complete. If _event_wait_list_ and _event_ are not NULL, _event_ must not refer to an element of the _event_wait_list_ array. -For a read and write operation, the elements of N-dimensional tensor are -related to host memory / buffer object as follows: +The *clEnqueueTranslateToTensor* function copies contents of the buffer +object / host allocation to tensor's storage in +implementation-defined, opaque memory layout. The +*clEnqueueTranslateFromTensor* function copies data from tensor's +storage to buffer object / host allocation. + +The elements of buffer object / host allocation are mapped to tensor +coordinates as follows: ---- tensor.element(i0, i1, ..., i, i) == (tensor.dtype)buffer_or_host_ptr[ @@ -343,7 +349,11 @@ tensor.element(i0, i1, ..., i, i) == (tensor.dtype)buffer_or_host_ptr[ i] ---- -Where `iX` is a tensor coordinate index with inclusive range of `0..`. +Where `iX` is a tensor coordinate index with inclusive range of +`0..`. The `tensor.element()` represents an abstract +function that accesses a tensor element in its storage at given +coordinate. The method how the coordinates translate to tensor storage +addresses is unspecified. // TODO: add clEnqueueCopyTensor From 6fecc4e7a50b1cd1f1146ad43d58a73d7aaf1479 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 13:38:29 +0200 Subject: [PATCH 08/18] Refactor command buffer temporary property out of tensor --- ext/cl_khr_tensor.asciidoc | 139 +++++++++++++++++++++---------------- 1 file changed, 78 insertions(+), 61 deletions(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 99e653706..0de088c70 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -66,7 +66,8 @@ cl_tensor clCreateTensor( * _properties_ is an optional list of properties for the tensor object and their corresponding values. The list is terminated with the special property 0. If no properties are required, properties may be - NULL. + NULL. This extension does not define any optional properties for + tensors. * _rank_ is the number of dimensions. Zero value creates a "scalar" tensor which has no dimensions but has storage for one element. @@ -84,18 +85,11 @@ cl_tensor clCreateTensor( clCreateTensor function creates a `rank`-dimensional tensor with `shape[0] * shape[1] * ... * shape[rank-1]` elements of _dtype_ type. At the creation time of the tensor, it does not have -storage. The storage is assigned to the tensor either by: - -* calling clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR or - -* automatically by command buffers - possibly on-demand basis - if the - tensor is created with CL_TENSOR_COMMAND_BUFFER_TEMPORARY property - set on. +storage. The storage is assigned to the tensor by calling +clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR. A command that refers to a tensor must be bound to a valid buffer -object before enqueuing the command that refers to the tensor into a command queue unless the -command is recorded in a command buffer and -CL_TENSOR_COMMAND_BUFFER_TEMPORARY is set to true. +object before enqueuing or recording the command. *clCreateTensor* returns a valid non-zero tensor object and errcode_ret is set to CL_SUCCESS if the tensor object is created @@ -139,28 +133,6 @@ following error values returned in errcode_ret: 64-bit real and imaginary part. |=== -.Tensor properties -[cols="2,1,2",stripes=odd] -|=== -| *Tensor Property* | *Property Value* | *Description* - -| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool - -a| If the value is true, create a "temporary" tensor that only can be -used on commands recorded in command buffers. The storage of the -temporary tensors are managed by command buffers. When a temporary -tensor is used by multiple command buffers, the tensor receives separate -storage for each command buffer. - -// IOW, Data may not be exchanged between command buffers through -// temporary tensors. - -Temporary tensors may not be bound to buffer objects. - -Data stored in temporary tensors are not preserved across command -buffer executions. -|=== - To retain a tensor object, call the function [source,c] @@ -244,12 +216,8 @@ cl_int clGetTensorInfo( | CL_TENSOR_SHAPE | size_t[] | Return the tensor shape. | CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type. -| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool | Return true if the -tensor is a temporary tensor for command buffers. - | CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is -bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then -CL_TENSOR_BOUND_TO_BUFFER must return false. +bound to a buffer. | CL_TENSOR_BUFFER | cl_mem a| If CL_TENSOR_BOUND_TO_BUFFER is true, return the buffer object the tensor is bound to. Otherwise, @@ -366,11 +334,34 @@ addresses is unspecified. [cols="2,1,2",stripes=odd] |=== +| CL_MEM_COMMAND_BUFFER_TEMPORARY | cl_bool + +a| This property can be set if *cl_khr_command_buffer* extension is +supported. + +If the value is true, create a "temporary" buffer object that only can +be used on commands recorded in command buffers. Non-recording +command enqueue functions must return CL_INVALID_OPERATION if the +command refers to a temporary buffer object. + +The temporary buffer objects are managed by command buffers. When a +temporary buffer object is used by multiple command buffer, the object +receives disjoint storage for each command buffer. + +// Consequently, Data may not be exchanged between command buffers through +// temporary buffers. + +Storage of the temporary buffer objects may be allocated on-demand +basis. At the times the buffer is not needed, OpenCL implementations +may reuse storage for other tasks within the command buffer. + +Contents of the temporary buffers are not guaranteed to be preserved +across command buffer executions. + | CL_MEM_BIND_TO_TENSOR | cl_tensor a| Use the created buffer as storage for the given valid tensor. To succeed creating the buffer, -the target tensor may not have storage already, must not have -CL_TENSOR_COMMAND_BUFFER_TEMPORARY property set on and _size_ argument -of the clCreateBufferWithProperties() must be zero. +the target tensor may not have storage already and _size_ +argument of the clCreateBufferWithProperties() must be zero. Size of the memory buffer is implementation-defined and it can be queried with clGetTensorInfo(). @@ -387,6 +378,26 @@ clCreateBufferWithProperties call returns CL_TENSOR_BOUND_TO_BUFFER error code. |=== +==== Add New Memory Object Query in Section 5.5.5 + +[cols="2,1,2",stripes=odd] +|=== +| CL_MEM_COMMAND_BUFFER_TEMPORARY | cl_bool | This property can be +queried if *cl_khr_command_buffer* extension is supported. + +Return true if the _memobj_ is temporary buffer object for command +buffers. +|=== + +==== Add New Error Codes in Appendix F + +[cols="2,3", stripes=odd] +|=== +| CL_TENSOR_BOUND_TO_BUFFER | Returned when attempting to bind a + buffer object to a tensor which already has been bound to the same + or another. +|=== + === Sample Codes Helper functions used in the follow up tensor code samples: @@ -495,30 +506,36 @@ extension is supported: constexpr size_t b = 64, m = 100, n = 200, k = 50; cl_int err; -// Create tensors which are used as temporaries in a command buffer. -// Command buffers allocate space for them as needed. -// -// NOTE: same temporary tensor handle used in multiple command buffers -// will have separate storage. IOW, command buffers may not exchange -// data via temporary buffers between them. -cl_tensor in0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, - 3, {b, m, k}, CL_TENSOR_FLOAT, err); -cl_tensor in1 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, - 3, {b, k, n}, CL_TENSOR_FLOAT, err); -cl_tensor in2 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, - 3, {b, m, n}, CL_TENSOR_FLOAT, err); -cl_tensor t0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, - 3, {b, m, n}, CL_TENSOR_FLOAT, err); -cl_tensor out = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0}, - 3, {b, m, n}, CL_TENSOR_FLOAT, err); +cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, err); +cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, err); +cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err); +cl_tensor t0 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err); +cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err); cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0); cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out); -// Binding a buffer to temporary tensor is not allowed. -auto ignored = clCreateBufferWithProperties( - ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &err); -assert(err == CL_TENSOR_IS_TEMPORARY); +// Bind command buffer managed storage to tensors. +// +// NOTE: same temporary tensor handle used in multiple command buffers +// will have separate storage. IOW, command buffers may not exchange +// data via temporary buffers between them. +cl_mem in0_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in0, 0}, + CL_MEM_READ_ONLY, 0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, + nullptr, &err); +cl_mem in1_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in1, 0}, + CL_MEM_READ_ONLY, 0, nullptr, &err); +cl_mem in2_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in2, 0}, + CL_MEM_READ_ONLY, 0, nullptr, &err); +cl_mem t0_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, t0, 0}, + CL_MEM_READ_WRITE, 0, nullptr, &err); +cl_mem out_mem = clCreateBufferWithProperties( + ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, out, 0}, + CL_MEM_WRITE_ONLY, 0, nullptr, &err); std::vector in0_data = ...; std::vector in1_data = ...; From f55a9045552f1a4b2aeff613256ec0a02764e4d0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 13:41:39 +0200 Subject: [PATCH 09/18] Fix cl_tensor_type -> cl_tensor_datatype --- ext/cl_khr_tensor.asciidoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 0de088c70..22a6cd007 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -56,7 +56,7 @@ cl_tensor clCreateTensor( cl_context context, const cl_tensor_peoperties *properties, size_t rank, - const size_t shape, + const size_t* shape, cl_tensor_datatype dtype, cl_int *errcode_ret); ---- @@ -212,9 +212,9 @@ cl_int clGetTensorInfo( .List of supported param_names by clGetTensorInfo [cols="2,1,2",stripes=odd] |=== -| CL_TENSOR_RANK | size_t | Return the tensor rank. -| CL_TENSOR_SHAPE | size_t[] | Return the tensor shape. -| CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type. +| CL_TENSOR_RANK | size_t | Return the tensor rank. +| CL_TENSOR_SHAPE | size_t[] | Return the tensor shape. +| CL_TENSOR_DTYPE | cl_tensor_datatype | Return the tensor data type. | CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is bound to a buffer. From 6d1c26ff5c86591b9a08bc287d921eaa49968b0b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 13:42:16 +0200 Subject: [PATCH 10/18] Add an open question --- ext/cl_khr_tensor.asciidoc | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 22a6cd007..e91f81dff 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -575,3 +575,10 @@ assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION); ---- === Open Questions === + +. Should we have support for tensors with undefined shape and tensors + with unknown / symbolic dimension sizes like in ONNX? + +// https://onnx.ai/onnx/repo-docs/ShapeInference.html + +*UNRESOLVED* From 52d8bb3514900c4ec271ebfa87870419427d3f63 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 13:46:58 +0200 Subject: [PATCH 11/18] Add CL_INVALID_TENSOR error code --- ext/cl_khr_tensor.asciidoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index e91f81dff..1b2a9686e 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -396,6 +396,8 @@ buffers. | CL_TENSOR_BOUND_TO_BUFFER | Returned when attempting to bind a buffer object to a tensor which already has been bound to the same or another. +| CL_INVALID_TENSOR | Returned then the specified tensor is not a + valid tensor object. |=== === Sample Codes From 534bcef9c29a6250437927ed2facab163811ff27 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 13:59:25 +0200 Subject: [PATCH 12/18] Require either buffer or host_ptr to be non-NULL --- ext/cl_khr_tensor.asciidoc | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc index 1b2a9686e..f1437dd31 100644 --- a/ext/cl_khr_tensor.asciidoc +++ b/ext/cl_khr_tensor.asciidoc @@ -272,12 +272,10 @@ cl_int clEnqueueTranslateToTensor( * _blocking_command_ indicate if the read and write operations are blocking or non-blocking (see below). -* _buffer_ refers to a valid buffer object where data is to be - read into or to be written from when the value of _host_ptr_ is - NULL. If _host_ptr_ is non-NULL then value of _buffer_ is ignored. - -* _host_ptr_ is the pointer to buffer in host memory where data is to - be read into or to be written from when the value is non-NULL. +* _buffer_ and _host_ptr_ refer to a valid buffer object / host + allocation where data is to be read into or to be written from. + Either the _buffer_ or _host_ptr_ can be non-NULL in which case the + non-NULL argument is used as the operand for the operation. * _event_wait_list_ and _num_events_in_wait_list_ specify events that need to complete before this particular command can be executed. If From 7447be25b605168073e5d73c639fd99d9d8767fa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= Date: Thu, 2 Nov 2023 14:27:14 +0200 Subject: [PATCH 13/18] Regenerate html for cl_exp_tensor --- ext/cl_khr_tensor.html | 298 +++++++++++++++++++++++------------------ 1 file changed, 168 insertions(+), 130 deletions(-) diff --git a/ext/cl_khr_tensor.html b/ext/cl_khr_tensor.html index 878925489..c232ddea7 100644 --- a/ext/cl_khr_tensor.html +++ b/ext/cl_khr_tensor.html @@ -5,7 +5,7 @@ -cl_khr_tensor +cl_exp_tensor