From f3a2bb9a6aa842b162001774e21e9021e244a08d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Wed, 25 Oct 2023 08:34:30 +0300
Subject: [PATCH 01/18] Sketch something for cl_khr_tensor

---
 ext/cl_khr_tensor.asciidoc |  547 ++++++++++++++++
 ext/cl_khr_tensor.html     | 1228 ++++++++++++++++++++++++++++++++++++
 2 files changed, 1775 insertions(+)
 create mode 100644 ext/cl_khr_tensor.asciidoc
 create mode 100644 ext/cl_khr_tensor.html
diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
new file mode 100644
index 000000000..cd17a42bb
--- /dev/null
+++ b/ext/cl_khr_tensor.asciidoc
@@ -0,0 +1,547 @@
+// Copyright 2023 The Khronos Group. This work is licensed under a
+// Creative Commons Attribution 4.0 International License; see
+// http://creativecommons.org/licenses/by/4.0/
+= cl_khr_tensor
+
+:source-highlighter: coderay
+
+[[cl_khr_tensor]]
+== Tensor Data Type
+
+The purpose of this extension is to provide ...
+
+=== General information
+
+==== Name Strings
+
+`cl_khr_tensor`
+
+==== Version history
+
+[cols="1,1,3",options="header",]
+|====
+| *Date*     | *Version* | *Description*
+| 2023-10-XX | 0.1.0     | First assigned version.
+|====
+
+==== Dependencies
+
+This extension is written against the OpenCL Specification version 3.0.14.
+
+This extension requires OpenCL 1.2 or later.
+
+This extension requires cl_khr_command_buffer.
+
+==== Contributors
+
+Henry Linjamäki, Intel. +
+
+=== Overview
+
+
+=== Modifications to OpenCL
+
+==== New OpenCL Functions
+
+To create a tensor use:
+
+[source,c]
+----
+cl_tensor clCreateTensor(
+    cl_context context,
+    const cl_tensor_properties *properties,
+    size_t rank,
+    const size_t *shape,
+    cl_tensor_type dtype,
+    cl_int *errcode_ret);
+----
+
+* _context_ is a valid OpenCL context used to create the tensor object.
+
+* _properties_ is an optional list of properties for the tensor object
+  and their corresponding values. The list is terminated with the
+  special property 0. If no properties are required, properties may be
+  NULL.
+
+* _rank_ is the number of dimensions. Zero value creates a "scalar"
+  tensor which has no dimensions but has storage for one element.
+
+* _shape_ is a list of sizes of the dimensions. The list must contain
+  at least _rank_ elements, and the first _rank_ values in the list
+  must be non-zero. _shape_ can be NULL if _rank_ is zero.
+
+* _dtype_ is the element type of _tensor_. Refer to the
+  <<TensorDtypes>> table for the types.
+
+* _errcode_ret_ may return an appropriate error code. If errcode_ret
+  is NULL, no error code is returned.
+
+The *clCreateTensor* function creates a `rank`-dimensional tensor with
+`shape[0] * shape[1] * ... * shape[rank-1]` elements of _dtype_
+type. At creation time the tensor does not have storage. Storage is
+assigned to the tensor either:
+
+* by calling clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR, or
+
+* automatically by command buffers, possibly on an on-demand basis, if
+  the tensor is created with the CL_TENSOR_COMMAND_BUFFER_TEMPORARY
+  property set to true.
+
+A tensor referred to by a command must be bound to a valid buffer
+object before the command is enqueued into a command queue, unless the
+command is recorded in a command buffer and the tensor was created with
+CL_TENSOR_COMMAND_BUFFER_TEMPORARY set to true.
+
+*clCreateTensor* returns a valid non-zero tensor object and errcode_ret
+is set to CL_SUCCESS if the tensor object is created
+successfully. Otherwise, it returns a NULL value with one of the
+following error values returned in errcode_ret:
+
+* CL_INVALID_CONTEXT if context is not a valid context.
+
+* CL_INVALID_PROPERTY if a property name in properties is not a
+  supported property name, if the value specified for a supported
+  property name is not valid, or if the same property name is
+  specified more than once.
+
+* CL_INVALID_VALUE if a value specified in dtype is invalid.
+
+* CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+  required by the OpenCL implementation on the host.
+
+.Tensor element types
+[cols="1,2",stripes=odd]
+[#TensorDtypes]
+|===
+| *Tensor element data type* | *Description*
+
+| CL_TENSOR_BOOL       | 1-bit signedless integer.
+| CL_TENSOR_INT8       | 8-bit signed integer.
+| CL_TENSOR_INT16      | 16-bit signed integer.
+| CL_TENSOR_INT32      | 32-bit signed integer.
+| CL_TENSOR_INT64      | 64-bit signed integer.
+| CL_TENSOR_UINT8      | 8-bit unsigned integer.
+| CL_TENSOR_UINT16     | 16-bit unsigned integer.
+| CL_TENSOR_UINT32     | 32-bit unsigned integer.
+| CL_TENSOR_UINT64     | 64-bit unsigned integer.
+| CL_TENSOR_HALF       | Half precision floating-point value.
+| CL_TENSOR_BFLOAT16   | 16-bit brain floating-point value.
+| CL_TENSOR_FLOAT      | Single precision floating-point value.
+| CL_TENSOR_DOUBLE     | Double precision floating-point value.
+| CL_TENSOR_COMPLEX64  | 64-bit complex floating-point value with
+  32-bit real and imaginary parts.
+| CL_TENSOR_COMPLEX128 | 128-bit complex floating-point value with
+  64-bit real and imaginary parts.
+|===
+
+.Tensor properties
+[cols="2,1,2",stripes=odd]
+|===
+| *Tensor Property* | *Property Value* | *Description*
+
+| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool
+
+a| If the value is true, create a "temporary" tensor that can only be
+used in commands recorded in command buffers. The storage of
+temporary tensors is managed by command buffers. When a temporary
+tensor is used by multiple command buffers, the tensor receives separate
+storage for each command buffer.
+
+// IOW, Data may not be exchanged between command buffers through
+// temporary tensors.
+
+Temporary tensors may not be bound to buffer objects.
+
+Data stored in temporary tensors is not preserved across command
+buffer executions.
+|===
+
+To retain a tensor object, call the function
+
+[source,c]
+----
+cl_int clRetainTensorObject(
+  cl_tensor tensor);
+----
+
+* _tensor_ is the tensor object to be retained.
+
+The _tensor_ reference count is incremented.
+
+*clRetainTensorObject* returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:
+
+* CL_INVALID_TENSOR if tensor is not a valid tensor object.
+
+To release a tensor object, call the function
+
+[source,c]
+----
+cl_int clReleaseTensorObject(
+  cl_tensor tensor);
+----
+
+* _tensor_ is the tensor object to be released.
+
+The _tensor_ reference count is decremented.
+
+The tensor object is deleted once its reference count becomes zero
+and the tensor object is no longer needed by any enqueued or
+recorded commands that use _tensor_. Using this function to release
+a reference that was not obtained by creating the object or by
+calling *clRetainTensorObject* causes undefined behavior.
+
+*clReleaseTensorObject* returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:
+
+* CL_INVALID_TENSOR if tensor is not a valid tensor object.
+
+// TODO: add clSetTensorObjectDestructorCallback?
+
+To return information about a tensor object, call the function
+
+[source,c]
+----
+cl_int clGetTensorInfo(
+  cl_tensor tensor,
+  cl_tensor_info param_name,
+  size_t param_value_size,
+  void* param_value,
+  size_t* param_value_size_ret);
+----
+
+* _tensor_ specifies the tensor object being queried.
+
+* _param_name_ specifies the information to query. The list of
+  supported param_name types and the information returned in
+  _param_value_ by clGetTensorInfo is described in the <<Tensor Object
+  Queries>> table.
+
+* _param_value_ is a pointer to memory where the appropriate result
+  being queried is returned. If _param_value_ is NULL, it is ignored.
+
+* _param_value_size_ is used to specify the size in bytes of memory
+  pointed to by _param_value_. This size must be ≥ size of return type
+  as described in the <<Tensor Object Queries>> table.
+
+* _param_value_size_ret_ returns the actual size in bytes of data
+  being queried by _param_name_. If _param_value_size_ret_ is NULL, it is
+  ignored.
+
+*clGetTensorInfo* returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:
+
+* CL_INVALID_TENSOR if _tensor_ is not a valid tensor object.
+
+[#Tensor Object Queries]
+.List of supported param_names by clGetTensorInfo
+[cols="2,1,2",stripes=odd]
+|===
+| CL_TENSOR_RANK  | size_t         | Return the tensor rank.
+| CL_TENSOR_SHAPE | size_t[]       | Return the tensor shape.
+| CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type.
+
+| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool | Return true if the
+tensor is a temporary tensor for command buffers.
+
+| CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is
+bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then
+CL_TENSOR_BOUND_TO_BUFFER must return false.
+
+| CL_TENSOR_BUFFER | cl_mem a| If CL_TENSOR_BOUND_TO_BUFFER is true,
+return the buffer object the tensor is bound to. Otherwise,
+clGetTensorInfo call returns:
+
+* CL_INVALID_MEM_OBJECT if the tensor is not bound to a buffer object.
+
+* CL_INVALID_PROPERTY otherwise.
+
+| CL_TENSOR_CONTEXT | cl_context | Return the context specified when
+  the tensor object is created.
+
+| CL_TENSOR_REFERENCE_COUNT | cl_uint | Return the tensor reference
+count.
+|===
+
+To read from a tensor into host memory or a buffer object, or to write
+to a tensor from host memory or a buffer object, call one of the
+following functions:
+
+[source,c]
+----
+cl_int clEnqueueReadTensor(
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  cl_bool blocking_command,
+  cl_mem buffer,
+  void* host_ptr,
+  cl_uint num_events_in_wait_list,
+  const cl_event* event_wait_list,
+  cl_event* event);
+----
+
+[source,c]
+----
+cl_int clEnqueueWriteTensor(
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  cl_bool blocking_command,
+  cl_mem buffer,
+  void* host_ptr,
+  cl_uint num_events_in_wait_list,
+  const cl_event* event_wait_list,
+  cl_event* event);
+----
+
+* _command_queue_ is a valid host command-queue in which the read /
+  write command will be queued. _command_queue_ and _tensor_ must be
+  created with the same OpenCL context.
+
+* _tensor_ refers to a valid tensor object which is bound to a buffer.
+
+* _blocking_command_ indicates whether the read and write operations
+  are blocking or non-blocking (see below).
+
+* _buffer_ refers to a valid buffer object where data is to be
+  read into or to be written from when the value of _host_ptr_ is
+  NULL. If _host_ptr_ is non-NULL then value of _buffer_ is ignored.
+
+* _host_ptr_ is a pointer to a buffer in host memory where data is to
+  be read into or to be written from when the value is non-NULL.
+
+* _event_wait_list_ and _num_events_in_wait_list_ specify events that
+  need to complete before this particular command can be executed. If
+  _event_wait_list_ is NULL, then this particular command does not
+  wait on any event to complete. If _event_wait_list_ is NULL,
+  _num_events_in_wait_list_ must be 0. If _event_wait_list_ is not
+  NULL, the list of events pointed to by _event_wait_list_ must be
+  valid and _num_events_in_wait_list_ must be greater than 0. The
+  events specified in _event_wait_list_ act as synchronization
+  points. The context associated with events in _event_wait_list_ and
+  _command_queue_ must be the same. The memory associated with
+  _event_wait_list_ can be reused or freed after the function returns.
+
+* _event_ returns an event object that identifies this read / write
+  command and can be used to query or queue a wait for this command to
+  complete. If _event_ is NULL or the enqueue is unsuccessful, no
+  event will be created and therefore it will not be possible to query
+  the status of this command or to wait for this command to
+  complete. If _event_wait_list_ and _event_ are not NULL, _event_
+  must not refer to an element of the _event_wait_list_ array.
+
+For a read or write operation, the elements of an N-dimensional tensor
+are related to the host memory / buffer object as follows:
+
+----
+tensor.element(i0, i1, ..., i<N-2>, i<N-1>) == (tensor.dtype)buffer_or_host_ptr[
+  i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] +
+  i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] +
+  ... +
+  i<N-2> * tensor.shape[N-1] +
+  i<N-1>]
+----
+
+Where `iX` is a tensor coordinate index in the inclusive range `0..shape[X]-1`.
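+
+As a worked instance of the mapping above, assume a hypothetical
+rank-3 tensor with shape `{2, 3, 4}` (chosen only for illustration).
+Element `(1, 2, 3)` then corresponds to the linear element index:
+
+----
+1 * (3 * 4) + 2 * 4 + 3 == 12 + 8 + 3 == 23
+----
+
+This is the last of the `2 * 3 * 4 == 24` elements, which is consistent
+with a dense row-major order in which the last dimension varies fastest.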
+
+// TODO: add clEnqueueCopyTensor
+
+// TODO: add clEnqueueFillTensor?
+
+// TODO: add command buffer variants for clEnqueue{copy,read,write}Tensor.
+
+
+==== Add New Buffer Property in Section 5.2.1
+
+[cols="2,1,2",stripes=odd]
+|===
+| CL_MEM_BIND_TO_TENSOR | cl_tensor a| Use the created buffer as
+storage for the given valid tensor. For buffer creation to succeed,
+the target tensor must not already have storage, must not have the
+CL_TENSOR_COMMAND_BUFFER_TEMPORARY property set to true, and the _size_
+argument of clCreateBufferWithProperties() must be zero.
+
+The size of the memory buffer is implementation-defined and it can be
+queried with clGetTensorInfo().
+
+The memory layout of the tensor in the created memory buffer is
+implementation-defined and opaque to applications, and it may
+change at unspecified points. An implementation may store auxiliary
+data in the memory buffer for the tensor. Therefore, writing data into
+the memory buffer directly using the cl_mem handle leads to undefined
+behavior.
+
+If the tensor is already bound to a buffer object,
+clCreateBufferWithProperties call returns CL_TENSOR_BOUND_TO_BUFFER
+error code.
+|===
+
+=== Sample Code
+
+Helper functions used in the following tensor code samples:
+
+[source,c]
+----
+cl_kernel create_matmul_kernel(
+  cl_context ctx, std::span<cl_device_id> device_span,
+  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
+  // A hypothetical matmul kernel signature in pseudo OpenCL C for
+  // illustrative purposes:
+  //
+  //   kernel void matmul(
+  //     global read_only tensor_t,
+  //     global read_only tensor_t,
+  //     global write_only tensor_t);
+
+  cl_kernel matmul_kernel = /* Omitted. */;
+  clSetKernelArg(matmul_kernel, 0, sizeof(cl_tensor), &lhs);
+  clSetKernelArg(matmul_kernel, 1, sizeof(cl_tensor), &rhs);
+  clSetKernelArg(matmul_kernel, 2, sizeof(cl_tensor), &out);
+  return matmul_kernel;
+}
+
+cl_kernel create_add_kernel(
+  cl_context ctx, std::span<cl_device_id> device_span,
+  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
+  // A hypothetical add kernel signature in pseudo OpenCL C for illustrative
+  // purposes:
+  //
+  // kernel void add(
+  //     global read_only tensor_t,
+  //     global read_only tensor_t,
+  //     global write_only tensor_t);
+
+  cl_kernel add_kernel = /* Omitted. */;
+  clSetKernelArg(add_kernel, 0, sizeof(cl_tensor), &lhs);
+  clSetKernelArg(add_kernel, 1, sizeof(cl_tensor), &rhs);
+  clSetKernelArg(add_kernel, 2, sizeof(cl_tensor), &out);
+  return add_kernel;
+}
+----
+
+An example usage of tensors on a command queue:
+
+[source,c]
+----
+constexpr size_t b = 64, m = 100, n = 200, k = 50;
+
+cl_int err;
+cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, &err);
+cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor t0  = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+
+cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
+cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
+
+// Allocate storage for the tensors. The buffer size must be set to zero
+// when the buffer is bound to a tensor. The OpenCL implementation may
+// determine the optimal data layout and the storage needed for it, based
+// on the tensor's uses (matmul kernel in this sample) so far.
+cl_mem in0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in0, 0}, CL_MEM_READ_ONLY,
+  0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, nullptr, &err);
+cl_mem in1_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in1, 0}, CL_MEM_READ_ONLY,
+  0, nullptr, &err);
+cl_mem in2_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in2, 0}, CL_MEM_READ_ONLY,
+  0, nullptr, &err);
+cl_mem t0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE,
+  0, nullptr, &err);
+cl_mem out_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, out, 0}, CL_MEM_WRITE_ONLY,
+  0, nullptr, &err);
+
+std::vector<float> in0_data = ...;
+std::vector<float> in1_data = ...;
+std::vector<float> out_data(b * m * n);
+
+// Copies data into in0 tensor while possibly rearranging the data to the
+// optimal data layout. The buffer argument is ignored because a
+// non-NULL host pointer is given.
+clEnqueueWriteTensor(
+  cmd_q, in0, false, nullptr, in0_data.data(),
+  0, nullptr, nullptr);
+
+clEnqueueWriteTensor(
+  cmd_q, in1, false, nullptr, in1_data.data(),
+  0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, matmul_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, add_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueReadTensor(
+  cmd_q, out, false, nullptr, out_data.data(),
+  0, nullptr, nullptr);
+----
+
+An example use of tensors in a command buffer when cl_khr_command_buffer
+extension is supported:
+
+[source,c]
+----
+constexpr size_t b = 64, m = 100, n = 200, k = 50;
+
+cl_int err;
+// Create tensors which are used as temporaries in a command buffer.
+// Command buffers allocate space for them as needed.
+//
+// NOTE: the same temporary tensor handle used in multiple command buffers
+//       will have separate storage. IOW, command buffers may not exchange
+//       data via temporary tensors between them.
+cl_tensor in0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, k}, CL_TENSOR_FLOAT, &err);
+cl_tensor in1 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, k, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor in2 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor t0  = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor out = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+
+cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
+cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
+
+// Binding a buffer to temporary tensor is not allowed.
+auto ignored = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &err);
+assert(err == CL_TENSOR_IS_TEMPORARY);
+
+std::vector<float> in0_data = ...;
+std::vector<float> in1_data = ...;
+std::vector<float> out_data(b * m * n);
+
+cl_command_buffer_khr cmd_b =
+  clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &err);
+
+cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
+clCommandWriteTensorKHR(
+  cmd_b, cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr,
+  in0_data.data(), 0, nullptr, &in0_syncp);
+clCommandWriteTensorKHR(
+  cmd_b, cmd_q, in1, false, nullptr, nullptr, {b, k, n}, nullptr,
+  in1_data.data(), 0, nullptr, &in1_syncp);
+clCommandNDRangeKernelKHR(
+  cmd_b, cmd_q, nullptr, matmul_kernel, 0, nullptr, nullptr, nullptr,
+  2, {in0_syncp, in1_syncp}, &matmul_syncp, nullptr);
+clCommandNDRangeKernelKHR(
+  cmd_b, cmd_q, nullptr, add_kernel, 0, nullptr, nullptr, nullptr,
+  1, {matmul_syncp}, &add_syncp, nullptr);
+clCommandReadTensorKHR(
+  cmd_b, cmd_q, out, false, nullptr, nullptr, {b, m, n}, nullptr,
+  out_data.data(), 1, {add_syncp}, nullptr);
+
+// Finalize the command buffer. At this point the OpenCL
+// implementation may reserve enough storage for all the tensor
+// temporaries. Temporary tensors might be eliminated. For example, the
+// OpenCL implementation could use the 'out' tensor to store the result of
+// matmul_kernel, thus eliminating the need for the 't0' tensor.
+clFinalizeCommandBufferKHR(cmd_b);
+
+// Temporary tensors used in a command buffer can't be read from or
+// written to. A hypothetical reason is that the finalized command buffer
+// might not use some of the tensors.
+assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION);
+----
+
+=== Open Questions ===
diff --git a/ext/cl_khr_tensor.html b/ext/cl_khr_tensor.html
new file mode 100644
index 000000000..878925489
--- /dev/null
+++ b/ext/cl_khr_tensor.html
@@ -0,0 +1,1228 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<meta name="generator" content="Asciidoctor 2.0.16">
+<title>cl_khr_tensor</title>
+<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700">
+<style>
+/*! Asciidoctor default stylesheet | MIT License | https://asciidoctor.org */
+/* Uncomment the following line when using as a custom stylesheet */
+/* @import "https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700"; */
+html{font-family:sans-serif;-webkit-text-size-adjust:100%}
+a{background:none}
+a:focus{outline:thin dotted}
+a:active,a:hover{outline:0}
+h1{font-size:2em;margin:.67em 0}
+b,strong{font-weight:bold}
+abbr{font-size:.9em}
+abbr[title]{cursor:help;border-bottom:1px dotted #dddddf;text-decoration:none}
+dfn{font-style:italic}
+hr{height:0}
+mark{background:#ff0;color:#000}
+code,kbd,pre,samp{font-family:monospace;font-size:1em}
+pre{white-space:pre-wrap}
+q{quotes:"\201C" "\201D" "\2018" "\2019"}
+small{font-size:80%}
+sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}
+sup{top:-.5em}
+sub{bottom:-.25em}
+img{border:0}
+svg:not(:root){overflow:hidden}
+figure{margin:0}
+audio,video{display:inline-block}
+audio:not([controls]){display:none;height:0}
+fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}
+legend{border:0;padding:0}
+button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}
+button,input{line-height:normal}
+button,select{text-transform:none}
+button,html input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer}
+button[disabled],html input[disabled]{cursor:default}
+input[type=checkbox],input[type=radio]{padding:0}
+button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}
+textarea{overflow:auto;vertical-align:top}
+table{border-collapse:collapse;border-spacing:0}
+*,::before,::after{box-sizing:border-box}
+html,body{font-size:100%}
+body{background:#fff;color:rgba(0,0,0,.8);padding:0;margin:0;font-family:"Noto Serif","DejaVu Serif",serif;line-height:1;position:relative;cursor:auto;-moz-tab-size:4;-o-tab-size:4;tab-size:4;word-wrap:anywhere;-moz-osx-font-smoothing:grayscale;-webkit-font-smoothing:antialiased}
+a:hover{cursor:pointer}
+img,object,embed{max-width:100%;height:auto}
+object,embed{height:100%}
+img{-ms-interpolation-mode:bicubic}
+.left{float:left!important}
+.right{float:right!important}
+.text-left{text-align:left!important}
+.text-right{text-align:right!important}
+.text-center{text-align:center!important}
+.text-justify{text-align:justify!important}
+.hide{display:none}
+img,object,svg{display:inline-block;vertical-align:middle}
+textarea{height:auto;min-height:50px}
+select{width:100%}
+.subheader,.admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{line-height:1.45;color:#7a2518;font-weight:400;margin-top:0;margin-bottom:.25em}
+div,dl,dt,dd,ul,ol,li,h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6,pre,form,p,blockquote,th,td{margin:0;padding:0}
+a{color:#2156a5;text-decoration:underline;line-height:inherit}
+a:hover,a:focus{color:#1d4b8f}
+a img{border:0}
+p{line-height:1.6;margin-bottom:1.25em;text-rendering:optimizeLegibility}
+p aside{font-size:.875em;line-height:1.35;font-style:italic}
+h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{font-family:"Open Sans","DejaVu Sans",sans-serif;font-weight:300;font-style:normal;color:#ba3925;text-rendering:optimizeLegibility;margin-top:1em;margin-bottom:.5em;line-height:1.0125em}
+h1 small,h2 small,h3 small,#toctitle small,.sidebarblock>.content>.title small,h4 small,h5 small,h6 small{font-size:60%;color:#e99b8f;line-height:0}
+h1{font-size:2.125em}
+h2{font-size:1.6875em}
+h3,#toctitle,.sidebarblock>.content>.title{font-size:1.375em}
+h4,h5{font-size:1.125em}
+h6{font-size:1em}
+hr{border:solid #dddddf;border-width:1px 0 0;clear:both;margin:1.25em 0 1.1875em}
+em,i{font-style:italic;line-height:inherit}
+strong,b{font-weight:bold;line-height:inherit}
+small{font-size:60%;line-height:inherit}
+code{font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:400;color:rgba(0,0,0,.9)}
+ul,ol,dl{line-height:1.6;margin-bottom:1.25em;list-style-position:outside;font-family:inherit}
+ul,ol{margin-left:1.5em}
+ul li ul,ul li ol{margin-left:1.25em;margin-bottom:0}
+ul.square li ul,ul.circle li ul,ul.disc li ul{list-style:inherit}
+ul.square{list-style-type:square}
+ul.circle{list-style-type:circle}
+ul.disc{list-style-type:disc}
+ol li ul,ol li ol{margin-left:1.25em;margin-bottom:0}
+dl dt{margin-bottom:.3125em;font-weight:bold}
+dl dd{margin-bottom:1.25em}
+blockquote{margin:0 0 1.25em;padding:.5625em 1.25em 0 1.1875em;border-left:1px solid #ddd}
+blockquote,blockquote p{line-height:1.6;color:rgba(0,0,0,.85)}
+@media screen and (min-width:768px){h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2}
+h1{font-size:2.75em}
+h2{font-size:2.3125em}
+h3,#toctitle,.sidebarblock>.content>.title{font-size:1.6875em}
+h4{font-size:1.4375em}}
+table{background:#fff;margin-bottom:1.25em;border:1px solid #dedede;word-wrap:normal}
+table thead,table tfoot{background:#f7f8f7}
+table thead tr th,table thead tr td,table tfoot tr th,table tfoot tr td{padding:.5em .625em .625em;font-size:inherit;color:rgba(0,0,0,.8);text-align:left}
+table tr th,table tr td{padding:.5625em .625em;font-size:inherit;color:rgba(0,0,0,.8)}
+table tr.even,table tr.alt{background:#f8f8f7}
+table thead tr th,table tfoot tr th,table tbody tr td,table tr td,table tfoot tr td{line-height:1.6}
+h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2;word-spacing:-.05em}
+h1 strong,h2 strong,h3 strong,#toctitle strong,.sidebarblock>.content>.title strong,h4 strong,h5 strong,h6 strong{font-weight:400}
+.center{margin-left:auto;margin-right:auto}
+.stretch{width:100%}
+.clearfix::before,.clearfix::after,.float-group::before,.float-group::after{content:" ";display:table}
+.clearfix::after,.float-group::after{clear:both}
+:not(pre).nobreak{word-wrap:normal}
+:not(pre).nowrap{white-space:nowrap}
+:not(pre).pre-wrap{white-space:pre-wrap}
+:not(pre):not([class^=L])>code{font-size:.9375em;font-style:normal!important;letter-spacing:0;padding:.1em .5ex;word-spacing:-.15em;background:#f7f7f8;border-radius:4px;line-height:1.45;text-rendering:optimizeSpeed}
+pre{color:rgba(0,0,0,.9);font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;line-height:1.45;text-rendering:optimizeSpeed}
+pre code,pre pre{color:inherit;font-size:inherit;line-height:inherit}
+pre>code{display:block}
+pre.nowrap,pre.nowrap pre{white-space:pre;word-wrap:normal}
+em em{font-style:normal}
+strong strong{font-weight:400}
+.keyseq{color:rgba(51,51,51,.8)}
+kbd{font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;display:inline-block;color:rgba(0,0,0,.8);font-size:.65em;line-height:1.45;background:#f7f7f7;border:1px solid #ccc;border-radius:3px;box-shadow:0 1px 0 rgba(0,0,0,.2),inset 0 0 0 .1em #fff;margin:0 .15em;padding:.2em .5em;vertical-align:middle;position:relative;top:-.1em;white-space:nowrap}
+.keyseq kbd:first-child{margin-left:0}
+.keyseq kbd:last-child{margin-right:0}
+.menuseq,.menuref{color:#000}
+.menuseq b:not(.caret),.menuref{font-weight:inherit}
+.menuseq{word-spacing:-.02em}
+.menuseq b.caret{font-size:1.25em;line-height:.8}
+.menuseq i.caret{font-weight:bold;text-align:center;width:.45em}
+b.button::before,b.button::after{position:relative;top:-1px;font-weight:400}
+b.button::before{content:"[";padding:0 3px 0 2px}
+b.button::after{content:"]";padding:0 2px 0 3px}
+p a>code:hover{color:rgba(0,0,0,.9)}
+#header,#content,#footnotes,#footer{width:100%;margin:0 auto;max-width:62.5em;*zoom:1;position:relative;padding-left:.9375em;padding-right:.9375em}
+#header::before,#header::after,#content::before,#content::after,#footnotes::before,#footnotes::after,#footer::before,#footer::after{content:" ";display:table}
+#header::after,#content::after,#footnotes::after,#footer::after{clear:both}
+#content{margin-top:1.25em}
+#content::before{content:none}
+#header>h1:first-child{color:rgba(0,0,0,.85);margin-top:2.25rem;margin-bottom:0}
+#header>h1:first-child+#toc{margin-top:8px;border-top:1px solid #dddddf}
+#header>h1:only-child,body.toc2 #header>h1:nth-last-child(2){border-bottom:1px solid #dddddf;padding-bottom:8px}
+#header .details{border-bottom:1px solid #dddddf;line-height:1.45;padding-top:.25em;padding-bottom:.25em;padding-left:.25em;color:rgba(0,0,0,.6);display:flex;flex-flow:row wrap}
+#header .details span:first-child{margin-left:-.125em}
+#header .details span.email a{color:rgba(0,0,0,.85)}
+#header .details br{display:none}
+#header .details br+span::before{content:"\00a0\2013\00a0"}
+#header .details br+span.author::before{content:"\00a0\22c5\00a0";color:rgba(0,0,0,.85)}
+#header .details br+span#revremark::before{content:"\00a0|\00a0"}
+#header #revnumber{text-transform:capitalize}
+#header #revnumber::after{content:"\00a0"}
+#content>h1:first-child:not([class]){color:rgba(0,0,0,.85);border-bottom:1px solid #dddddf;padding-bottom:8px;margin-top:0;padding-top:1rem;margin-bottom:1.25rem}
+#toc{border-bottom:1px solid #e7e7e9;padding-bottom:.5em}
+#toc>ul{margin-left:.125em}
+#toc ul.sectlevel0>li>a{font-style:italic}
+#toc ul.sectlevel0 ul.sectlevel1{margin:.5em 0}
+#toc ul{font-family:"Open Sans","DejaVu Sans",sans-serif;list-style-type:none}
+#toc li{line-height:1.3334;margin-top:.3334em}
+#toc a{text-decoration:none}
+#toc a:active{text-decoration:underline}
+#toctitle{color:#7a2518;font-size:1.2em}
+@media screen and (min-width:768px){#toctitle{font-size:1.375em}
+body.toc2{padding-left:15em;padding-right:0}
+#toc.toc2{margin-top:0!important;background:#f8f8f7;position:fixed;width:15em;left:0;top:0;border-right:1px solid #e7e7e9;border-top-width:0!important;border-bottom-width:0!important;z-index:1000;padding:1.25em 1em;height:100%;overflow:auto}
+#toc.toc2 #toctitle{margin-top:0;margin-bottom:.8rem;font-size:1.2em}
+#toc.toc2>ul{font-size:.9em;margin-bottom:0}
+#toc.toc2 ul ul{margin-left:0;padding-left:1em}
+#toc.toc2 ul.sectlevel0 ul.sectlevel1{padding-left:0;margin-top:.5em;margin-bottom:.5em}
+body.toc2.toc-right{padding-left:0;padding-right:15em}
+body.toc2.toc-right #toc.toc2{border-right-width:0;border-left:1px solid #e7e7e9;left:auto;right:0}}
+@media screen and (min-width:1280px){body.toc2{padding-left:20em;padding-right:0}
+#toc.toc2{width:20em}
+#toc.toc2 #toctitle{font-size:1.375em}
+#toc.toc2>ul{font-size:.95em}
+#toc.toc2 ul ul{padding-left:1.25em}
+body.toc2.toc-right{padding-left:0;padding-right:20em}}
+#content #toc{border:1px solid #e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;border-radius:4px}
+#content #toc>:first-child{margin-top:0}
+#content #toc>:last-child{margin-bottom:0}
+#footer{max-width:none;background:rgba(0,0,0,.8);padding:1.25em}
+#footer-text{color:hsla(0,0%,100%,.8);line-height:1.44}
+#content{margin-bottom:.625em}
+.sect1{padding-bottom:.625em}
+@media screen and (min-width:768px){#content{margin-bottom:1.25em}
+.sect1{padding-bottom:1.25em}}
+.sect1:last-child{padding-bottom:0}
+.sect1+.sect1{border-top:1px solid #e7e7e9}
+#content h1>a.anchor,h2>a.anchor,h3>a.anchor,#toctitle>a.anchor,.sidebarblock>.content>.title>a.anchor,h4>a.anchor,h5>a.anchor,h6>a.anchor{position:absolute;z-index:1001;width:1.5ex;margin-left:-1.5ex;display:block;text-decoration:none!important;visibility:hidden;text-align:center;font-weight:400}
+#content h1>a.anchor::before,h2>a.anchor::before,h3>a.anchor::before,#toctitle>a.anchor::before,.sidebarblock>.content>.title>a.anchor::before,h4>a.anchor::before,h5>a.anchor::before,h6>a.anchor::before{content:"\00A7";font-size:.85em;display:block;padding-top:.1em}
+#content h1:hover>a.anchor,#content h1>a.anchor:hover,h2:hover>a.anchor,h2>a.anchor:hover,h3:hover>a.anchor,#toctitle:hover>a.anchor,.sidebarblock>.content>.title:hover>a.anchor,h3>a.anchor:hover,#toctitle>a.anchor:hover,.sidebarblock>.content>.title>a.anchor:hover,h4:hover>a.anchor,h4>a.anchor:hover,h5:hover>a.anchor,h5>a.anchor:hover,h6:hover>a.anchor,h6>a.anchor:hover{visibility:visible}
+#content h1>a.link,h2>a.link,h3>a.link,#toctitle>a.link,.sidebarblock>.content>.title>a.link,h4>a.link,h5>a.link,h6>a.link{color:#ba3925;text-decoration:none}
+#content h1>a.link:hover,h2>a.link:hover,h3>a.link:hover,#toctitle>a.link:hover,.sidebarblock>.content>.title>a.link:hover,h4>a.link:hover,h5>a.link:hover,h6>a.link:hover{color:#a53221}
+details,.audioblock,.imageblock,.literalblock,.listingblock,.stemblock,.videoblock{margin-bottom:1.25em}
+details{margin-left:1.25rem}
+details>summary{cursor:pointer;display:block;position:relative;line-height:1.6;margin-bottom:.625rem;-webkit-tap-highlight-color:transparent}
+details>summary::before{content:"";border:solid transparent;border-left:solid;border-width:.3em 0 .3em .5em;position:absolute;top:.5em;left:-1.25rem;transform:translateX(15%)}
+details[open]>summary::before{border:solid transparent;border-top:solid;border-width:.5em .3em 0;transform:translateY(15%)}
+details>summary::after{content:"";width:1.25rem;height:1em;position:absolute;top:.3em;left:-1.25rem}
+.admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{text-rendering:optimizeLegibility;text-align:left;font-family:"Noto Serif","DejaVu Serif",serif;font-size:1rem;font-style:italic}
+table.tableblock.fit-content>caption.title{white-space:nowrap;width:0}
+.paragraph.lead>p,#preamble>.sectionbody>[class=paragraph]:first-of-type p{font-size:1.21875em;line-height:1.6;color:rgba(0,0,0,.85)}
+.admonitionblock>table{border-collapse:separate;border:0;background:none;width:100%}
+.admonitionblock>table td.icon{text-align:center;width:80px}
+.admonitionblock>table td.icon img{max-width:none}
+.admonitionblock>table td.icon .title{font-weight:bold;font-family:"Open Sans","DejaVu Sans",sans-serif;text-transform:uppercase}
+.admonitionblock>table td.content{padding-left:1.125em;padding-right:1.25em;border-left:1px solid #dddddf;color:rgba(0,0,0,.6);word-wrap:anywhere}
+.admonitionblock>table td.content>:last-child>:last-child{margin-bottom:0}
+.exampleblock>.content{border:1px solid #e6e6e6;margin-bottom:1.25em;padding:1.25em;background:#fff;border-radius:4px}
+.exampleblock>.content>:first-child{margin-top:0}
+.exampleblock>.content>:last-child{margin-bottom:0}
+.sidebarblock{border:1px solid #dbdbd6;margin-bottom:1.25em;padding:1.25em;background:#f3f3f2;border-radius:4px}
+.sidebarblock>:first-child{margin-top:0}
+.sidebarblock>:last-child{margin-bottom:0}
+.sidebarblock>.content>.title{color:#7a2518;margin-top:0;text-align:center}
+.exampleblock>.content>:last-child>:last-child,.exampleblock>.content .olist>ol>li:last-child>:last-child,.exampleblock>.content .ulist>ul>li:last-child>:last-child,.exampleblock>.content .qlist>ol>li:last-child>:last-child,.sidebarblock>.content>:last-child>:last-child,.sidebarblock>.content .olist>ol>li:last-child>:last-child,.sidebarblock>.content .ulist>ul>li:last-child>:last-child,.sidebarblock>.content .qlist>ol>li:last-child>:last-child{margin-bottom:0}
+.literalblock pre,.listingblock>.content>pre{border-radius:4px;overflow-x:auto;padding:1em;font-size:.8125em}
+@media screen and (min-width:768px){.literalblock pre,.listingblock>.content>pre{font-size:.90625em}}
+@media screen and (min-width:1280px){.literalblock pre,.listingblock>.content>pre{font-size:1em}}
+.literalblock pre,.listingblock>.content>pre:not(.highlight),.listingblock>.content>pre[class=highlight],.listingblock>.content>pre[class^="highlight "]{background:#f7f7f8}
+.literalblock.output pre{color:#f7f7f8;background:rgba(0,0,0,.9)}
+.listingblock>.content{position:relative}
+.listingblock code[data-lang]::before{display:none;content:attr(data-lang);position:absolute;font-size:.75em;top:.425rem;right:.5rem;line-height:1;text-transform:uppercase;color:inherit;opacity:.5}
+.listingblock:hover code[data-lang]::before{display:block}
+.listingblock.terminal pre .command::before{content:attr(data-prompt);padding-right:.5em;color:inherit;opacity:.5}
+.listingblock.terminal pre .command:not([data-prompt])::before{content:"$"}
+.listingblock pre.highlightjs{padding:0}
+.listingblock pre.highlightjs>code{padding:1em;border-radius:4px}
+.listingblock pre.prettyprint{border-width:0}
+.prettyprint{background:#f7f7f8}
+pre.prettyprint .linenums{line-height:1.45;margin-left:2em}
+pre.prettyprint li{background:none;list-style-type:inherit;padding-left:0}
+pre.prettyprint li code[data-lang]::before{opacity:1}
+pre.prettyprint li:not(:first-child) code[data-lang]::before{display:none}
+table.linenotable{border-collapse:separate;border:0;margin-bottom:0;background:none}
+table.linenotable td[class]{color:inherit;vertical-align:top;padding:0;line-height:inherit;white-space:normal}
+table.linenotable td.code{padding-left:.75em}
+table.linenotable td.linenos{border-right:1px solid;opacity:.35;padding-right:.5em}
+pre.pygments .lineno{border-right:1px solid;opacity:.35;display:inline-block;margin-right:.75em}
+pre.pygments .lineno::before{content:"";margin-right:-.125em}
+.quoteblock{margin:0 1em 1.25em 1.5em;display:table}
+.quoteblock:not(.excerpt)>.title{margin-left:-1.5em;margin-bottom:.75em}
+.quoteblock blockquote,.quoteblock p{color:rgba(0,0,0,.85);font-size:1.15rem;line-height:1.75;word-spacing:.1em;letter-spacing:0;font-style:italic;text-align:justify}
+.quoteblock blockquote{margin:0;padding:0;border:0}
+.quoteblock blockquote::before{content:"\201c";float:left;font-size:2.75em;font-weight:bold;line-height:.6em;margin-left:-.6em;color:#7a2518;text-shadow:0 1px 2px rgba(0,0,0,.1)}
+.quoteblock blockquote>.paragraph:last-child p{margin-bottom:0}
+.quoteblock .attribution{margin-top:.75em;margin-right:.5ex;text-align:right}
+.verseblock{margin:0 1em 1.25em}
+.verseblock pre{font-family:"Open Sans","DejaVu Sans",sans-serif;font-size:1.15rem;color:rgba(0,0,0,.85);font-weight:300;text-rendering:optimizeLegibility}
+.verseblock pre strong{font-weight:400}
+.verseblock .attribution{margin-top:1.25rem;margin-left:.5ex}
+.quoteblock .attribution,.verseblock .attribution{font-size:.9375em;line-height:1.45;font-style:italic}
+.quoteblock .attribution br,.verseblock .attribution br{display:none}
+.quoteblock .attribution cite,.verseblock .attribution cite{display:block;letter-spacing:-.025em;color:rgba(0,0,0,.6)}
+.quoteblock.abstract blockquote::before,.quoteblock.excerpt blockquote::before,.quoteblock .quoteblock blockquote::before{display:none}
+.quoteblock.abstract blockquote,.quoteblock.abstract p,.quoteblock.excerpt blockquote,.quoteblock.excerpt p,.quoteblock .quoteblock blockquote,.quoteblock .quoteblock p{line-height:1.6;word-spacing:0}
+.quoteblock.abstract{margin:0 1em 1.25em;display:block}
+.quoteblock.abstract>.title{margin:0 0 .375em;font-size:1.15em;text-align:center}
+.quoteblock.excerpt>blockquote,.quoteblock .quoteblock{padding:0 0 .25em 1em;border-left:.25em solid #dddddf}
+.quoteblock.excerpt,.quoteblock .quoteblock{margin-left:0}
+.quoteblock.excerpt blockquote,.quoteblock.excerpt p,.quoteblock .quoteblock blockquote,.quoteblock .quoteblock p{color:inherit;font-size:1.0625rem}
+.quoteblock.excerpt .attribution,.quoteblock .quoteblock .attribution{color:inherit;font-size:.85rem;text-align:left;margin-right:0}
+p.tableblock:last-child{margin-bottom:0}
+td.tableblock>.content{margin-bottom:1.25em;word-wrap:anywhere}
+td.tableblock>.content>:last-child{margin-bottom:-1.25em}
+table.tableblock,th.tableblock,td.tableblock{border:0 solid #dedede}
+table.grid-all>*>tr>*{border-width:1px}
+table.grid-cols>*>tr>*{border-width:0 1px}
+table.grid-rows>*>tr>*{border-width:1px 0}
+table.frame-all{border-width:1px}
+table.frame-ends{border-width:1px 0}
+table.frame-sides{border-width:0 1px}
+table.frame-none>colgroup+*>:first-child>*,table.frame-sides>colgroup+*>:first-child>*{border-top-width:0}
+table.frame-none>:last-child>:last-child>*,table.frame-sides>:last-child>:last-child>*{border-bottom-width:0}
+table.frame-none>*>tr>:first-child,table.frame-ends>*>tr>:first-child{border-left-width:0}
+table.frame-none>*>tr>:last-child,table.frame-ends>*>tr>:last-child{border-right-width:0}
+table.stripes-all tr,table.stripes-odd tr:nth-of-type(odd),table.stripes-even tr:nth-of-type(even),table.stripes-hover tr:hover{background:#f8f8f7}
+th.halign-left,td.halign-left{text-align:left}
+th.halign-right,td.halign-right{text-align:right}
+th.halign-center,td.halign-center{text-align:center}
+th.valign-top,td.valign-top{vertical-align:top}
+th.valign-bottom,td.valign-bottom{vertical-align:bottom}
+th.valign-middle,td.valign-middle{vertical-align:middle}
+table thead th,table tfoot th{font-weight:bold}
+tbody tr th{background:#f7f8f7}
+tbody tr th,tbody tr th p,tfoot tr th,tfoot tr th p{color:rgba(0,0,0,.8);font-weight:bold}
+p.tableblock>code:only-child{background:none;padding:0}
+p.tableblock{font-size:1em}
+ol{margin-left:1.75em}
+ul li ol{margin-left:1.5em}
+dl dd{margin-left:1.125em}
+dl dd:last-child,dl dd:last-child>:last-child{margin-bottom:0}
+ol>li p,ul>li p,ul dd,ol dd,.olist .olist,.ulist .ulist,.ulist .olist,.olist .ulist{margin-bottom:.625em}
+ul.checklist,ul.none,ol.none,ul.no-bullet,ol.no-bullet,ol.unnumbered,ul.unstyled,ol.unstyled{list-style-type:none}
+ul.no-bullet,ol.no-bullet,ol.unnumbered{margin-left:.625em}
+ul.unstyled,ol.unstyled{margin-left:0}
+ul.checklist>li>p:first-child{margin-left:-1em}
+ul.checklist>li>p:first-child>.fa-square-o:first-child,ul.checklist>li>p:first-child>.fa-check-square-o:first-child{width:1.25em;font-size:.8em;position:relative;bottom:.125em}
+ul.checklist>li>p:first-child>input[type=checkbox]:first-child{margin-right:.25em}
+ul.inline{display:flex;flex-flow:row wrap;list-style:none;margin:0 0 .625em -1.25em}
+ul.inline>li{margin-left:1.25em}
+.unstyled dl dt{font-weight:400;font-style:normal}
+ol.arabic{list-style-type:decimal}
+ol.decimal{list-style-type:decimal-leading-zero}
+ol.loweralpha{list-style-type:lower-alpha}
+ol.upperalpha{list-style-type:upper-alpha}
+ol.lowerroman{list-style-type:lower-roman}
+ol.upperroman{list-style-type:upper-roman}
+ol.lowergreek{list-style-type:lower-greek}
+.hdlist>table,.colist>table{border:0;background:none}
+.hdlist>table>tbody>tr,.colist>table>tbody>tr{background:none}
+td.hdlist1,td.hdlist2{vertical-align:top;padding:0 .625em}
+td.hdlist1{font-weight:bold;padding-bottom:1.25em}
+td.hdlist2{word-wrap:anywhere}
+.literalblock+.colist,.listingblock+.colist{margin-top:-.5em}
+.colist td:not([class]):first-child{padding:.4em .75em 0;line-height:1;vertical-align:top}
+.colist td:not([class]):first-child img{max-width:none}
+.colist td:not([class]):last-child{padding:.25em 0}
+.thumb,.th{line-height:0;display:inline-block;border:4px solid #fff;box-shadow:0 0 0 1px #ddd}
+.imageblock.left{margin:.25em .625em 1.25em 0}
+.imageblock.right{margin:.25em 0 1.25em .625em}
+.imageblock>.title{margin-bottom:0}
+.imageblock.thumb,.imageblock.th{border-width:6px}
+.imageblock.thumb>.title,.imageblock.th>.title{padding:0 .125em}
+.image.left,.image.right{margin-top:.25em;margin-bottom:.25em;display:inline-block;line-height:0}
+.image.left{margin-right:.625em}
+.image.right{margin-left:.625em}
+a.image{text-decoration:none;display:inline-block}
+a.image object{pointer-events:none}
+sup.footnote,sup.footnoteref{font-size:.875em;position:static;vertical-align:super}
+sup.footnote a,sup.footnoteref a{text-decoration:none}
+sup.footnote a:active,sup.footnoteref a:active{text-decoration:underline}
+#footnotes{padding-top:.75em;padding-bottom:.75em;margin-bottom:.625em}
+#footnotes hr{width:20%;min-width:6.25em;margin:-.25em 0 .75em;border-width:1px 0 0}
+#footnotes .footnote{padding:0 .375em 0 .225em;line-height:1.3334;font-size:.875em;margin-left:1.2em;margin-bottom:.2em}
+#footnotes .footnote a:first-of-type{font-weight:bold;text-decoration:none;margin-left:-1.05em}
+#footnotes .footnote:last-of-type{margin-bottom:0}
+#content #footnotes{margin-top:-.625em;margin-bottom:0;padding:.75em 0}
+.gist .file-data>table{border:0;background:#fff;width:100%;margin-bottom:0}
+.gist .file-data>table td.line-data{width:99%}
+div.unbreakable{page-break-inside:avoid}
+.big{font-size:larger}
+.small{font-size:smaller}
+.underline{text-decoration:underline}
+.overline{text-decoration:overline}
+.line-through{text-decoration:line-through}
+.aqua{color:#00bfbf}
+.aqua-background{background:#00fafa}
+.black{color:#000}
+.black-background{background:#000}
+.blue{color:#0000bf}
+.blue-background{background:#0000fa}
+.fuchsia{color:#bf00bf}
+.fuchsia-background{background:#fa00fa}
+.gray{color:#606060}
+.gray-background{background:#7d7d7d}
+.green{color:#006000}
+.green-background{background:#007d00}
+.lime{color:#00bf00}
+.lime-background{background:#00fa00}
+.maroon{color:#600000}
+.maroon-background{background:#7d0000}
+.navy{color:#000060}
+.navy-background{background:#00007d}
+.olive{color:#606000}
+.olive-background{background:#7d7d00}
+.purple{color:#600060}
+.purple-background{background:#7d007d}
+.red{color:#bf0000}
+.red-background{background:#fa0000}
+.silver{color:#909090}
+.silver-background{background:#bcbcbc}
+.teal{color:#006060}
+.teal-background{background:#007d7d}
+.white{color:#bfbfbf}
+.white-background{background:#fafafa}
+.yellow{color:#bfbf00}
+.yellow-background{background:#fafa00}
+span.icon>.fa{cursor:default}
+a span.icon>.fa{cursor:inherit}
+.admonitionblock td.icon [class^="fa icon-"]{font-size:2.5em;text-shadow:1px 1px 2px rgba(0,0,0,.5);cursor:default}
+.admonitionblock td.icon .icon-note::before{content:"\f05a";color:#19407c}
+.admonitionblock td.icon .icon-tip::before{content:"\f0eb";text-shadow:1px 1px 2px rgba(155,155,0,.8);color:#111}
+.admonitionblock td.icon .icon-warning::before{content:"\f071";color:#bf6900}
+.admonitionblock td.icon .icon-caution::before{content:"\f06d";color:#bf3400}
+.admonitionblock td.icon .icon-important::before{content:"\f06a";color:#bf0000}
+.conum[data-value]{display:inline-block;color:#fff!important;background:rgba(0,0,0,.8);border-radius:50%;text-align:center;font-size:.75em;width:1.67em;height:1.67em;line-height:1.67em;font-family:"Open Sans","DejaVu Sans",sans-serif;font-style:normal;font-weight:bold}
+.conum[data-value] *{color:#fff!important}
+.conum[data-value]+b{display:none}
+.conum[data-value]::after{content:attr(data-value)}
+pre .conum[data-value]{position:relative;top:-.125em}
+b.conum *{color:inherit!important}
+.conum:not([data-value]):empty{display:none}
+dt,th.tableblock,td.content,div.footnote{text-rendering:optimizeLegibility}
+h1,h2,p,td.content,span.alt,summary{letter-spacing:-.01em}
+p strong,td.content strong,div.footnote strong{letter-spacing:-.005em}
+p,blockquote,dt,td.content,span.alt,summary{font-size:1.0625rem}
+p{margin-bottom:1.25rem}
+.sidebarblock p,.sidebarblock dt,.sidebarblock td.content,p.tableblock{font-size:1em}
+.exampleblock>.content{background:#fffef7;border-color:#e0e0dc;box-shadow:0 1px 4px #e0e0dc}
+.print-only{display:none!important}
+@page{margin:1.25cm .75cm}
+@media print{*{box-shadow:none!important;text-shadow:none!important}
+html{font-size:80%}
+a{color:inherit!important;text-decoration:underline!important}
+a.bare,a[href^="#"],a[href^="mailto:"]{text-decoration:none!important}
+a[href^="http:"]:not(.bare)::after,a[href^="https:"]:not(.bare)::after{content:"(" attr(href) ")";display:inline-block;font-size:.875em;padding-left:.25em}
+abbr[title]{border-bottom:1px dotted}
+abbr[title]::after{content:" (" attr(title) ")"}
+pre,blockquote,tr,img,object,svg{page-break-inside:avoid}
+thead{display:table-header-group}
+svg{max-width:100%}
+p,blockquote,dt,td.content{font-size:1em;orphans:3;widows:3}
+h2,h3,#toctitle,.sidebarblock>.content>.title{page-break-after:avoid}
+#header,#content,#footnotes,#footer{max-width:none}
+#toc,.sidebarblock,.exampleblock>.content{background:none!important}
+#toc{border-bottom:1px solid #dddddf!important;padding-bottom:0!important}
+body.book #header{text-align:center}
+body.book #header>h1:first-child{border:0!important;margin:2.5em 0 1em}
+body.book #header .details{border:0!important;display:block;padding:0!important}
+body.book #header .details span:first-child{margin-left:0!important}
+body.book #header .details br{display:block}
+body.book #header .details br+span::before{content:none!important}
+body.book #toc{border:0!important;text-align:left!important;padding:0!important;margin:0!important}
+body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-break-before:always}
+.listingblock code[data-lang]::before{display:block}
+#footer{padding:0 .9375em}
+.hide-on-print{display:none!important}
+.print-only{display:block!important}
+.hide-for-print{display:none!important}
+.show-for-print{display:inherit!important}}
+@media amzn-kf8,print{#header>h1:first-child{margin-top:1.25rem}
+.sect1{padding:0!important}
+.sect1+.sect1{border:0}
+#footer{background:none}
+#footer-text{color:rgba(0,0,0,.6);font-size:.9em}}
+@media amzn-kf8{#header,#content,#footnotes,#footer{padding:0}}
+</style>
+</head>
+<body class="article">
+<div id="header">
+<h1>cl_khr_tensor</h1>
+</div>
+<div id="content">
+<div class="sect1">
+<h2 id="cl_khr_tensor">Tensor Data Type</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The purpose of this extension is to provide &#8230;&#8203;</p>
+</div>
+<div class="sect2">
+<h3 id="_general_information">General information</h3>
+<div class="sect3">
+<h4 id="_name_strings">Name Strings</h4>
+<div class="paragraph">
+<p><code>cl_khr_tensor</code></p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_version_history">Version history</h4>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 60%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Date</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Version</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2023-10-XX</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">0.1.0</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">First assigned version.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<div class="sect3">
+<h4 id="_dependencies">Dependencies</h4>
+<div class="paragraph">
+<p>This extension is written against the OpenCL Specification version 3.0.14.</p>
+</div>
+<div class="paragraph">
+<p>This extension requires OpenCL 1.2 or later.</p>
+</div>
+<div class="paragraph">
+<p>This extension requires cl_khr_command_buffer.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_contributors">Contributors</h4>
+<div class="paragraph">
+<p>Henry Linjamäki, Intel.<br></p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_overview">Overview</h3>
+
+</div>
+<div class="sect2">
+<h3 id="_modifications_to_opencl">Modifications to OpenCL</h3>
+<div class="sect3">
+<h4 id="_new_opencl_functions">New OpenCL Functions</h4>
+<div class="paragraph">
+<p>To create a tensor use:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_tensor clCreateTensor(
+    cl_context context,
+    const cl_tensor_properties *properties,
+    size_t rank,
+    const size_t *shape,
+    cl_tensor_type dtype,
+    cl_int *errcode_ret);</code></pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>context</em> is a valid OpenCL context used to create the tensor object.</p>
+</li>
+<li>
+<p><em>properties</em> is an optional list of properties for the tensor object
+and their corresponding values. The list is terminated with the
+special property 0. If no properties are required, properties may be
+NULL.</p>
+</li>
+<li>
+<p><em>rank</em> is the number of dimensions. A value of zero creates a "scalar"
+tensor, which has no dimensions but has storage for one element.</p>
+</li>
+<li>
+<p><em>shape</em> is a list of the sizes of the dimensions. The list must
+have <em>rank</em> elements, all of which must be non-zero. <em>shape</em> can
+be NULL if <em>rank</em> is zero.</p>
+</li>
+<li>
+<p><em>dtype</em> is the element type of the tensor. Refer to the
+<a href="#TensorDtypes">Tensor element types</a> table for the types.</p>
+</li>
+<li>
+<p><em>errcode_ret</em> may return an appropriate error code. If errcode_ret
+is NULL, no error code is returned.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The clCreateTensor function creates a <code>rank</code>-dimensional tensor with
+<code>shape[0] * shape[1] * &#8230;&#8203; * shape[rank-1]</code> elements of <em>dtype</em>
+type. A newly created tensor has no storage. Storage is assigned to
+the tensor either by:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>calling clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR or</p>
+</li>
+<li>
+<p>automatically by command buffers, possibly on demand, if the
+tensor is created with the CL_TENSOR_COMMAND_BUFFER_TEMPORARY property
+set to true.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>A tensor referred to by a command must be bound to a valid buffer
+object before the command is enqueued into a command queue, unless the
+command is recorded in a command buffer and the tensor was created with
+CL_TENSOR_COMMAND_BUFFER_TEMPORARY set to true.</p>
+</div>
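+<div class="paragraph">
+<p>The element-count rule for clCreateTensor can be sketched in plain C as
+shown here. This is a minimal illustration, not part of the extension:
+the helper name <code>tensor_element_count</code>, its use of 0 as an error result, and
+the overflow check are all assumptions made for the example.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not defined by the extension): returns the number of
 * elements described by rank and shape, or 0 if the input is invalid. */
size_t tensor_element_count(size_t rank, const size_t *shape)
{
    size_t count = 1; /* rank 0 is a "scalar" tensor with one element */

    if (rank > 0 && shape == NULL)
        return 0; /* shape may only be NULL when rank is zero */

    for (size_t i = 0; i < rank; ++i) {
        if (shape[i] == 0)
            return 0; /* every dimension size must be non-zero */
        if (count > SIZE_MAX / shape[i])
            return 0; /* the product would overflow size_t */
        count *= shape[i];
    }
    return count;
}
```

+<div class="paragraph">
+<p>A valid tensor always has at least one element, so 0 is free to signal
+invalid input in this sketch.</p>
+</div>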
+<div class="paragraph">
+<p><strong>clCreateTensor</strong> returns a valid non-zero tensor object and errcode_ret
+is set to CL_SUCCESS if the tensor object is created
+successfully. Otherwise, it returns a NULL value with one of the
+following error values returned in errcode_ret:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_CONTEXT if context is not a valid context.</p>
+</li>
+<li>
+<p>CL_INVALID_PROPERTY if a property name in properties is not a
+supported property name, if the value specified for a supported
+property name is not valid, or if the same property name is
+specified more than once.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if a value specified in dtype is invalid.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<table id="TensorDtypes" class="tableblock frame-all grid-all stripes-odd stretch">
+<caption class="title">Table 1. Tensor element types</caption>
+<colgroup>
+<col style="width: 33.3333%;">
+<col style="width: 66.6667%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Tensor element data type</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BOOL</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1-bit signless integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT8</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">8-bit signed integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT16</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">16-bit signed integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT32</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">32-bit signed integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT64</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit signed integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT8</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">8-bit unsigned integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT16</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">16-bit unsigned integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT32</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">32-bit unsigned integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT64</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit unsigned integer.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_HALF</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Half precision floating-point value.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BFLOAT16</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">16-bit brain floating-point value.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_FLOAT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Single precision floating-point value.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_DOUBLE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Double precision floating-point value.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMPLEX64</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit complex floating-point value with
+  32-bit real and imaginary parts.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMPLEX128</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">128-bit complex floating-point value with
+  64-bit real and imaginary parts.</p></td>
+</tr>
+</tbody>
+</table>
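+<div class="paragraph">
+<p>The widths listed in the Tensor element types table can be summarized
+in a small lookup, sketched here. The <code>EX_</code> enumerators and the function
+<code>ex_tensor_element_bits</code> are hypothetical stand-ins: the draft names the
+element types but does not assign enumerator values or define an in-memory
+storage size, so the sketch reports logical bit widths only.</p>
+</div>

```c
#include <stddef.h>

/* Hypothetical enumerators mirroring the table; the values are assumptions. */
typedef enum {
    EX_TENSOR_BOOL,
    EX_TENSOR_INT8,
    EX_TENSOR_INT16,
    EX_TENSOR_INT32,
    EX_TENSOR_INT64,
    EX_TENSOR_UINT8,
    EX_TENSOR_UINT16,
    EX_TENSOR_UINT32,
    EX_TENSOR_UINT64,
    EX_TENSOR_HALF,
    EX_TENSOR_BFLOAT16,
    EX_TENSOR_FLOAT,
    EX_TENSOR_DOUBLE,
    EX_TENSOR_COMPLEX64,
    EX_TENSOR_COMPLEX128
} ex_tensor_type;

/* Returns the logical width of one element in bits, or 0 if dtype is unknown. */
size_t ex_tensor_element_bits(ex_tensor_type dtype)
{
    switch (dtype) {
    case EX_TENSOR_BOOL:
        return 1;
    case EX_TENSOR_INT8:
    case EX_TENSOR_UINT8:
        return 8;
    case EX_TENSOR_INT16:
    case EX_TENSOR_UINT16:
    case EX_TENSOR_HALF:
    case EX_TENSOR_BFLOAT16:
        return 16;
    case EX_TENSOR_INT32:
    case EX_TENSOR_UINT32:
    case EX_TENSOR_FLOAT:
        return 32;
    case EX_TENSOR_INT64:
    case EX_TENSOR_UINT64:
    case EX_TENSOR_DOUBLE:
    case EX_TENSOR_COMPLEX64: /* two 32-bit parts */
        return 64;
    case EX_TENSOR_COMPLEX128: /* two 64-bit parts */
        return 128;
    }
    return 0;
}
```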
+<table class="tableblock frame-all grid-all stripes-odd stretch">
+<caption class="title">Table 2. Tensor properties</caption>
+<colgroup>
+<col style="width: 40%;">
+<col style="width: 20%;">
+<col style="width: 40%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Tensor Property</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Property Value</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMMAND_BUFFER_TEMPORARY</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
+<p>If the value is true, create a "temporary" tensor that can only be
+used by commands recorded in command buffers. The storage of
+temporary tensors is managed by command buffers. When a temporary
+tensor is used by multiple command buffers, the tensor receives separate
+storage for each command buffer.</p>
+</div>
+<div class="paragraph">
+<p>Temporary tensors may not be bound to buffer objects.</p>
+</div>
+<div class="paragraph">
+<p>Data stored in temporary tensors is not preserved across command
+buffer executions.</p>
+</div></div></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>To retain a tensor object, call the function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clRetainTensorObject(
+  cl_tensor tensor);</code></pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>tensor</em> is the tensor object to be retained.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The <em>tensor</em> reference count is incremented.</p>
+</div>
+<div class="paragraph">
+<p><strong>clRetainTensorObject</strong> returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_TENSOR if tensor is not a valid tensor object.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>To release a tensor object, call the function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clReleaseTensorObject(
+  cl_tensor tensor);</code></pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>tensor</em> is the tensor object to be released.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The <em>tensor</em> reference count is decremented.</p>
+</div>
+<div class="paragraph">
+<p>The tensor object is deleted once its reference count becomes zero
+and the tensor object is no longer needed by any enqueued or recorded
+commands that use <em>tensor</em>. Using this function to release a
+reference that was not obtained by creating the object or by calling
+<strong>clRetainTensorObject</strong> causes undefined behavior.</p>
+</div>
+<div class="paragraph">
+<p><strong>clReleaseTensorObject</strong> returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_TENSOR if tensor is not a valid tensor object.</p>
+</li>
+</ul>
+</div>
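+<div class="paragraph">
+<p>The retain and release semantics described above follow the usual
+reference-counting pattern, sketched here with a stand-in object. The type
+<code>ex_refcounted_tensor</code> and the <code>ex_</code> functions are hypothetical and are not
+the extension's API. The sketch also ignores the additional condition that
+enqueued or recorded commands may keep the object alive.</p>
+</div>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tensor object; only the count is modeled. */
typedef struct {
    unsigned ref_count;
} ex_refcounted_tensor;

/* Creation hands the caller one reference. Returns NULL on allocation failure. */
ex_refcounted_tensor *ex_tensor_create(void)
{
    ex_refcounted_tensor *t = malloc(sizeof *t);
    if (t != NULL)
        t->ref_count = 1;
    return t;
}

/* Increments the reference count. */
void ex_tensor_retain(ex_refcounted_tensor *t)
{
    t->ref_count++;
}

/* Decrements the reference count and deletes the object when it reaches
 * zero. Returns 1 if the object was deleted, 0 otherwise. */
int ex_tensor_release(ex_refcounted_tensor *t)
{
    assert(t->ref_count > 0); /* releasing an unowned reference is a bug */
    if (--t->ref_count == 0) {
        free(t);
        return 1;
    }
    return 0;
}
```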
+<div class="paragraph">
+<p>To return information about a tensor object, call the function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clGetTensorInfo(
+  cl_tensor tensor,
+  cl_tensor_info param_name,
+  size_t param_value_size,
+  void* param_value,
+  size_t* param_value_size_ret);</code></pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>tensor</em> specifies the tensor object being queried.</p>
+</li>
+<li>
+<p><em>param_name</em> specifies the information to query. The list of
+supported param_name types and the information returned in
+<em>param_value</em> by clGetTensorInfo is described in the
+<a href="#Tensor Object Queries">[Tensor Object Queries]</a> table.</p>
+</li>
+<li>
+<p><em>param_value</em> is a pointer to memory where the appropriate result
+being queried is returned. If <em>param_value</em> is NULL, it is ignored.</p>
+</li>
+<li>
+<p><em>param_value_size</em> specifies the size in bytes of the memory
+pointed to by <em>param_value</em>. This size must be ≥ the size of the return
+type as described in the <a href="#Tensor Object Queries">[Tensor Object Queries]</a> table.</p>
+</li>
+<li>
+<p><em>param_value_size_ret</em> returns the actual size in bytes of data
+being queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
+ignored.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p><strong>clGetTensorInfo</strong> returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_TENSOR if <em>tensor</em> is not a valid tensor object.</p>
+</li>
+</ul>
+</div>
+<table class="tableblock frame-all grid-all stripes-odd stretch">
+<caption class="title">Table 3. List of supported param_names by clGetTensorInfo</caption>
+<colgroup>
+<col style="width: 40%;">
+<col style="width: 20%;">
+<col style="width: 40%;">
+</colgroup>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_RANK</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the tensor rank.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_SHAPE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the tensor shape.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_DTYPE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_tensor_type</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the tensor data type.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMMAND_BUFFER_TEMPORARY</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return true if the
+tensor is a temporary tensor for command buffers.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BOUND_TO_BUFFER</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return true if the tensor is
+bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then
+CL_TENSOR_BOUND_TO_BUFFER must return false.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BUFFER</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_mem</p></td>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
+<p>If CL_TENSOR_BOUND_TO_BUFFER is true,
+return the buffer object the tensor is bound to. Otherwise,
+clGetTensorInfo call returns:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_MEM_OBJECT if the tensor is not bound to a buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_PROPERTY otherwise.</p>
+</li>
+</ul>
+</div></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_CONTEXT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_context</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the context specified when
+  the tensor object was created.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_REFERENCE_COUNT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the tensor reference
+count.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>To read from a tensor into host memory / a buffer object, or to write to a
+tensor object from host memory / a buffer object, call one of the following functions:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueReadTensor(
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  cl_bool blocking_command,
+  cl_mem buffer,
+  void* host_ptr,
+  cl_uint num_events_in_wait_list,
+  const cl_event* event_wait_list,
+  cl_event* event);</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueWriteTensor(
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  cl_bool blocking_command,
+  cl_mem buffer,
+  const void* host_ptr,
+  cl_uint num_events_in_wait_list,
+  const cl_event* event_wait_list,
+  cl_event* event);</code></pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>command_queue</em> is a valid host command-queue in which the read /
+write command will be queued. <em>command_queue</em> and <em>tensor</em> must be
+created with the same OpenCL context.</p>
+</li>
+<li>
+<p><em>tensor</em> refers to a valid tensor object which is bound to a buffer.</p>
+</li>
+<li>
+<p><em>blocking_command</em> indicates whether the read and write operations are
+blocking or non-blocking (see below).</p>
+</li>
+<li>
+<p><em>buffer</em> refers to a valid buffer object that data is read
+into or written from when <em>host_ptr</em> is NULL. If <em>host_ptr</em>
+is non-NULL, the value of <em>buffer</em> is ignored.</p>
+</li>
+<li>
+<p><em>host_ptr</em> is a pointer to a buffer in host memory that data is
+read into or written from when the pointer is non-NULL.</p>
+</li>
+<li>
+<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that
+need to complete before this particular command can be executed. If
+<em>event_wait_list</em> is NULL, then this particular command does not
+wait on any event to complete. If <em>event_wait_list</em> is NULL,
+<em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not
+NULL, the list of events pointed to by <em>event_wait_list</em> must be
+valid and <em>num_events_in_wait_list</em> must be greater than 0. The
+events specified in <em>event_wait_list</em> act as synchronization
+points. The context associated with events in <em>event_wait_list</em> and
+<em>command_queue</em> must be the same. The memory associated with
+<em>event_wait_list</em> can be reused or freed after the function returns.</p>
+</li>
+<li>
+<p><em>event</em> returns an event object that identifies this read / write
+command and can be used to query or queue a wait for this command to
+complete. If <em>event</em> is NULL or the enqueue is unsuccessful, no
+event will be created and therefore it will not be possible to query
+the status of this command or to wait for this command to
+complete. If <em>event_wait_list</em> and <em>event</em> are not NULL, <em>event</em>
+must not refer to an element of the <em>event_wait_list</em> array.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For a read and write operation, the elements of an N-dimensional tensor are
+related to host memory / buffer object as follows:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>tensor.element(i0, i1, ..., i&lt;N-2&gt;, i&lt;N-1&gt;) == (tensor.dtype)buffer_or_host_ptr[
+  i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] +
+  i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] +
+  ... +
+  i&lt;N-2&gt; * tensor.shape[N-1] +
+  i&lt;N-1&gt;]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Where <code>iX</code> is a tensor coordinate index with inclusive range of <code>0..&lt;shape[X]-1&gt;</code>.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_add_new_buffer_property_in_section_5_2_1">Add New Buffer Property in Section 5.2.1</h4>
+<table class="tableblock frame-all grid-all stripes-odd stretch">
+<colgroup>
+<col style="width: 40%;">
+<col style="width: 20%;">
+<col style="width: 40%;">
+</colgroup>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_BIND_TO_TENSOR</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_tensor</p></td>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
+<p>Use the created buffer as
+storage for the given valid tensor. For buffer creation to succeed,
+the target tensor must not already have storage, must not have the
+CL_TENSOR_COMMAND_BUFFER_TEMPORARY property set, and the <em>size</em>
+argument of clCreateBufferWithProperties() must be zero.</p>
+</div>
+<div class="paragraph">
+<p>The size of the memory buffer is implementation-defined and can be
+queried with clGetTensorInfo().</p>
+</div>
+<div class="paragraph">
+<p>The memory layout of the tensor in the created memory buffer is
+implementation-defined, opaque to applications, and may change at
+unspecified points. The implementation may store auxiliary data
+in the memory buffer for the tensor. Therefore, writing data into the
+memory buffer directly using the cl_mem handle leads to undefined
+behavior.</p>
+</div>
+<div class="paragraph">
+<p>If the tensor is already bound to a buffer object,
+clCreateBufferWithProperties call returns CL_TENSOR_BOUND_TO_BUFFER
+error code.</p>
+</div></div></td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_sample_codes">Sample Codes</h3>
+<div class="paragraph">
+<p>Helper functions used in the following tensor code samples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_kernel create_matmul_kernel(
+  cl_context ctx, std::span&lt;cl_device_id&gt; device_span,
+  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
+  // A hypothetical matmul kernel signature in pseudo OpenCL C for
+  // illustrative purposes:
+  //
+  //   kernel void matmul(
+  //     global read_only tensor_t,
+  //     global read_only tensor_t,
+  //     global write_only tensor_t);
+
+  cl_kernel matmul_kernel = /* Omitted. */;
+  clSetKernelArg(matmul_kernel, 0, sizeof(cl_tensor), &amp;lhs);
+  clSetKernelArg(matmul_kernel, 1, sizeof(cl_tensor), &amp;rhs);
+  clSetKernelArg(matmul_kernel, 2, sizeof(cl_tensor), &amp;out);
+  return matmul_kernel;
+}
+
+cl_kernel create_add_kernel(
+  cl_context ctx, std::span&lt;cl_device_id&gt; device_span,
+  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
+  // A hypothetical add kernel signature in pseudo OpenCL C for illustrative
+  // purposes:
+  //
+  // kernel void add(
+  //     global read_only tensor_t,
+  //     global read_only tensor_t,
+  //     global write_only tensor_t);
+
+  cl_kernel add_kernel = /* Omitted. */;
+  clSetKernelArg(add_kernel, 0, sizeof(cl_tensor), &amp;lhs);
+  clSetKernelArg(add_kernel, 1, sizeof(cl_tensor), &amp;rhs);
+  clSetKernelArg(add_kernel, 2, sizeof(cl_tensor), &amp;out);
+  return add_kernel;
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>An example usage of tensors on a command queue:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">constexpr size_t b = 64, m = 100, n = 200, k = 50;
+
+cl_int err;
+cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor t0  = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &amp;err);
+
+cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
+cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
+
+// Allocate storage for the tensors. The buffer size must be set to zero
+// when the buffer is bound to a tensor. The OpenCL implementation may
+// determine the optimal data layout and the storage needed for it, based
+// on the tensor's uses (matmul kernel in this sample) so far.
+cl_mem in0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in0, 0}, CL_MEM_READ_ONLY,
+  0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, nullptr, &amp;err);
+cl_mem in1_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in1, 0}, CL_MEM_READ_ONLY,
+  0, nullptr, &amp;err);
+cl_mem in2_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, in2, 0}, CL_MEM_READ_ONLY,
+  0, nullptr, &amp;err);
+cl_mem t0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE,
+  0, nullptr, &amp;err);
+cl_mem out_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, out, 0}, CL_MEM_WRITE_ONLY,
+  0, nullptr, &amp;err);
+
+std::vector&lt;float&gt; in0_data = ...;
+std::vector&lt;float&gt; in1_data = ...;
+std::vector&lt;float&gt; out_data(b * m * n);
+
+// Copies data into the in0 tensor while possibly rearranging the data to
+// the optimal data layout. The buffer argument is nullptr because the
+// data comes from host memory.
+clEnqueueWriteTensor(
+  cmd_q, in0, false, nullptr, in0_data.data(),
+  0, nullptr, nullptr);
+
+clEnqueueWriteTensor(
+  cmd_q, in1, false, nullptr, in1_data.data(),
+  0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, matmul_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, add_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueReadTensor(
+  cmd_q, out, false, nullptr, out_data.data(),
+  0, nullptr, nullptr);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>An example use of tensors in a command buffer when cl_khr_command_buffer
+extension is supported:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">constexpr size_t b = 64, m = 100, n = 200, k = 50;
+
+cl_int err;
+// Create tensors which are used as temporaries in a command buffer.
+// Command buffers allocate space for them as needed.
+//
+// NOTE: the same temporary tensor handle used in multiple command buffers
+//       will have separate storage. IOW, command buffers may not exchange
+//       data via temporary tensors between them.
+cl_tensor in0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, k}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor in1 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, k, n}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor in2 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor t0  = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, &amp;err);
+cl_tensor out = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
+  3, {b, m, n}, CL_TENSOR_FLOAT, &amp;err);
+
+cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
+cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
+
+// Binding a buffer to temporary tensor is not allowed.
+auto ignored = clCreateBufferWithProperties(
+  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &amp;err);
+assert(err == CL_TENSOR_IS_TEMPORARY);
+
+std::vector&lt;float&gt; in0_data = ...;
+std::vector&lt;float&gt; in1_data = ...;
+std::vector&lt;float&gt; out_data(b * m * n);
+
+cl_command_buffer_khr cmd_b =
+  clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &amp;err);
+
+cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
+clCommandWriteTensorKHR(
+  cmd_b, cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr,
+  in0_data.data(), 0, nullptr, &amp;in0_syncp);
+clCommandWriteTensorKHR(
+  cmd_b, cmd_q, in1, false, nullptr, nullptr, {b, k, n}, nullptr,
+  in1_data.data(), 0, nullptr, &amp;in1_syncp);
+clCommandNDRangeKernelKHR(
+  cmd_b, cmd_q, nullptr, matmul_kernel, 0, nullptr, nullptr, nullptr,
+  2, {in0_syncp, in1_syncp}, &amp;matmul_syncp, nullptr);
+clCommandNDRangeKernelKHR(
+  cmd_b, cmd_q, nullptr, add_kernel, 0, nullptr, nullptr, nullptr,
+  1, {matmul_syncp}, &amp;add_syncp, nullptr);
+clCommandReadTensorKHR(
+  cmd_b, cmd_q, out, false, nullptr, nullptr, {b, m, n}, nullptr,
+  out_data.data(), 1, {add_syncp}, nullptr);
+
+// Finalize the command buffer. At this point the OpenCL
+// implementation may reserve enough storage for all the tensor
+// temporaries. Temporary tensors might be eliminated. For example,
+// the OpenCL implementation could use the 'out' tensor to store the
+// result of matmul_kernel, thus eliminating the need for the 't0' tensor.
+clFinalizeCommandBufferKHR(cmd_b);
+
+// Temporary tensors used in a command buffer can't be read from or
+// written to. A hypothetical reason is that the finalized command buffer
+// might not use some of the tensors.
+assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION);</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_open_questions">Open Questions</h3>
+
+</div>
+</div>
+</div>
+</div>
+<div id="footer">
+<div id="footer-text">
+Last updated 2023-10-30 16:51:10 +0200
+</div>
+</div>
+</body>
+</html>
\ No newline at end of file
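
Note on the element mapping in the clEnqueueReadTensor / clEnqueueWriteTensor
description above: the listing relates tensor coordinates to a dense,
row-major host or buffer array. The following is a minimal sketch of that
offset computation in plain C, with no OpenCL dependency. The helper name
tensor_linear_offset is hypothetical and is not part of the proposed
extension; it only illustrates the host-side layout, not the tensor's
internal (implementation-defined) storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: returns the linear element
 * offset of coords[0..rank-1] in a dense row-major array with the given
 * shape. Each coords[d] must lie in the range 0..shape[d]-1. */
static size_t tensor_linear_offset(size_t rank, const size_t *shape,
                                   const size_t *coords) {
  size_t offset = 0;
  for (size_t d = 0; d < rank; ++d) {
    assert(coords[d] < shape[d]);
    /* Horner form of the sum over d of
     * coords[d] * shape[d+1] * ... * shape[rank-1]. */
    offset = offset * shape[d] + coords[d];
  }
  return offset;
}
```

For a tensor of shape {64, 100, 50}, the coordinate (2, 3, 4) maps to
2 * 100 * 50 + 3 * 50 + 4 = 10154. A rank-0 ("scalar") tensor always maps to
offset 0, which matches the single element of storage it is described as having.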

From bf94321d718fb7da01ff79baf4c6ea81905df563 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <linehill@users.noreply.github.com>
Date: Thu, 2 Nov 2023 14:16:43 +0200
Subject: [PATCH 02/18] Apply suggestions from code review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
Co-authored-by: Pekka Jääskeläinen <pekka.jaaskelainen@tuni.fi>
---
 ext/cl_khr_tensor.asciidoc | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index cd17a42bb..1df37e9e4 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -51,7 +51,7 @@ cl_tensor clCreateTensor(
     cl_context context,
     const cl_tensor_peoperties *properties,
     size_t rank,
-    size_t shape,
+    const size_t* shape,
     cl_tensor_type dtype,
     cl_int *errcode_ret);
 ----
@@ -88,7 +88,7 @@ storage. The storage is assigned to the tensor either by:
   set on.
 
 A command that refers to a tensor must be bound to a valid buffer
-object before enqueuing the command into a command queue unless the
+object before enqueuing the command that refers to the tensor into a command queue unless the
 command is recorded in a command buffer and
 CL_TENSOR_COMMAND_BUFFER_TEMPORARY is set to true.
 
@@ -124,7 +124,7 @@ following error values returned in errcode_ret:
 | CL_TENSOR_UINT16     | 16-bit signed integer.
 | CL_TENSOR_UINT32     | 32-bit signed integer.
 | CL_TENSOR_UINT64     | 64-bit signed integer.
-| CL_TENSOR_HALF       | Half precision floating-point value.
+| CL_TENSOR_HALF       | Half precision floating-point.
 | CL_TENSOR_BFLOAT16   | 16-bit brain floating-point value.
 | CL_TENSOR_FLOAT      | Single precision floating-point value.
 | CL_TENSOR_DOUBLE     | Double precision floating-point value.
@@ -144,7 +144,7 @@ following error values returned in errcode_ret:
 a| If the value is true, create a "temporary" tensor that only can be
 used on commands recorded in command buffers. The storage of the
 temporary tensors are managed by command buffers. When a temporary
-tensor is used by multiple command buffer, the tensor receive separate
+tensor is used by multiple command buffers, the tensor receives separate
 storage for each command buffer.
 
 // IOW, Data may not be exchanged between command buffers through
@@ -171,7 +171,7 @@ The _tensor_ reference count is incremented.
 *clRetainTensor* returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:
 
-* CL_INVALID_TENSOR if tensor is not a valid tensor object.
+* CL_INVALID_TENSOR if the tensor is not a valid tensor object.
 
 To release a tensor object, call the function
 
@@ -242,7 +242,7 @@ cl_int clGetTensorInfo(
 | CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type.
 
 | CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool | Return true if the
-tensor is temporary tensor for command buffers.
+tensor is a temporary tensor for command buffers.
 
 | CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is
 bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then
@@ -263,8 +263,8 @@ clGetTensorInfo call returns:
 count.
 |===
 
-To read from a tensor to host memory / buffer object or to write to a
-tensor object from host memory / buffer object call one of the functions.
+The following functions are for reading from a tensor to host memory / buffer object or to write to a
+tensor object from host memory / buffer object.
 
 [source,c]
 ----
@@ -286,7 +286,7 @@ cl_int clEnqueueWriteTensor(
   cl_tensor tensor,
   cl_bool blocking_command,
   cl_mem buffer,
-  void* host_ptr,
+  const void* host_ptr,
   cl_uint num_events_in_wait_list,
   const cl_event* event_wait_list,
   cl_event* event);
@@ -329,10 +329,10 @@ cl_int clEnqueueWriteTensor(
   must not refer to an element of the _event_wait_list_ array.
 
 For a read and write operation, the elements of N-dimensional tensor are
-related to host memory / buffer object as followed:
+related to host memory / buffer object as follows:
 
 ----
-tensor.element(i0, i1, ..., i<N-2>, i<N-1>)) == (tensor.dtype)buffer_or_host_ptr[
+tensor.element(i0, i1, ..., i<N-2>, i<N-1>) == (tensor.dtype)buffer_or_host_ptr[
   i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] +
   i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] +
   ... +
@@ -505,7 +505,7 @@ cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
 // Binding a buffer to temporary tensor is not allowed.
 auto ignored = clCreateBufferWithProperties(
   ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &err);
-assert(err == CL_TENSOR_IS_TEMPORARY)
+assert(err == CL_TENSOR_IS_TEMPORARY);
 
 std::vector<float> in0_data = ...;
 std::vector<float> in1_data = ...;

From 36db4b6d9d3ec7caacb6849d16a119ce005a59a7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 09:15:20 +0200
Subject: [PATCH 03/18] * Add brief introduction.

* cl_khr_tensor -> cl_exp_tensor.

* Remove cl_khr_command_buffer requirement.
---
 ext/cl_khr_tensor.asciidoc | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 1df37e9e4..05c7ad521 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -1,20 +1,25 @@
 // Copyright 2023 The Khronos Group. This work is licensed under a
 // Creative Commons Attribution 4.0 International License; see
 // http://creativecommons.org/licenses/by/4.0/
-= cl_khr_tensor
+= cl_exp_tensor
 
 :source-highlighter: coreray
 
-[[cl_khr_tensor]]
+[[cl_exp_tensor]]
 == Tensor Data Type
 
-Purpose of this extension is to provide ...
+This extension provides a new opaque OpenCL datatype called
+`cl_tensor`. It is used for storing N-dimensional tensor data in
+implementation-defined memory layout which may be optimized based on
+tensor's use cases. The datatype is designed to be efficiently used
+within the `cl_khr_command_buffers` extension to capture task graphs
+which can utilize tensors as input, output and temporary storage.
 
 === General information
 
 ==== Name Strings
 
-`cl_khr_tensor`
+`cl_exp_tensor`
 
 ==== Version history
 
@@ -30,8 +35,6 @@ This extension is written against the OpenCL Specification version 3.0.14.
 
 This extension requires OpenCL 1.2 or later.
 
-This extension requires cl_khr_command_buffer.
-
 ==== Contributors
 
 Henry Linjamäki, Intel. +

From baa768882dbad2d71d667e2282859098e69e4c23 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 09:18:38 +0200
Subject: [PATCH 04/18] Add contributors

---
 ext/cl_khr_tensor.asciidoc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 05c7ad521..5cba054ca 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -38,6 +38,8 @@ This extension requires OpenCL 1.2 or later.
 ==== Contributors
 
 Henry Linjamäki, Intel. +
+Pekka Jääslkeläinen, Intel and Tampere University. +
+Ben Ashbaugh, Intel. +
 
 === Overview
 

From 9db1e6543d68b5a986aa760f228d098ebc4ff0c4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 09:19:16 +0200
Subject: [PATCH 05/18] * Fix name for add kernel creator

---
 ext/cl_khr_tensor.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 5cba054ca..0115b054e 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -403,7 +403,7 @@ cl_kernel create_matmul_kernel(
   return matmul_kernel;
 }
 
-cl_kernel create_matmul_kernel(
+cl_kernel create_add_kernel(
   cl_context ctx, std::span<cl_device_id> device_span,
   cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
   // A hypothetical add kernel signature in pseudo OpenCL C for illustrative

From 141643dc4eacc15c99d6d527889cf55b239c60ac Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 09:21:51 +0200
Subject: [PATCH 06/18] * cl_tensor_type -> cl_tensor_datatype.

* Fix signed -> unsigned.

* Single line cl{Retain,Release}TensorObject declaration.
---
 ext/cl_khr_tensor.asciidoc | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 0115b054e..bed45d976 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -56,8 +56,8 @@ cl_tensor clCreateTensor(
     cl_context context,
     const cl_tensor_peoperties *properties,
     size_t rank,
-    const size_t* shape,
-    cl_tensor_type dtype,
+    const size_t shape,
+    cl_tensor_datatype dtype,
     cl_int *errcode_ret);
 ----
 
@@ -125,17 +125,17 @@ following error values returned in errcode_ret:
 | CL_TENSOR_INT16      | 16-bit signed integer.
 | CL_TENSOR_INT32      | 32-bit signed integer.
 | CL_TENSOR_INT64      | 64-bit signed integer.
-| CL_TENSOR_UINT8      | 8-bit signed integer.
-| CL_TENSOR_UINT16     | 16-bit signed integer.
-| CL_TENSOR_UINT32     | 32-bit signed integer.
-| CL_TENSOR_UINT64     | 64-bit signed integer.
+| CL_TENSOR_UINT8      | 8-bit unsigned integer.
+| CL_TENSOR_UINT16     | 16-bit unsigned integer.
+| CL_TENSOR_UINT32     | 32-bit unsigned integer.
+| CL_TENSOR_UINT64     | 64-bit unsigned integer.
 | CL_TENSOR_HALF       | Half precision floating-point.
-| CL_TENSOR_BFLOAT16   | 16-bit brain floating-point value.
-| CL_TENSOR_FLOAT      | Single precision floating-point value.
-| CL_TENSOR_DOUBLE     | Double precision floating-point value.
-| CL_TENSOR_COMPLEX64  | 64-bit complex floating point value with
+| CL_TENSOR_BFLOAT16   | 16-bit brain floating-point.
+| CL_TENSOR_FLOAT      | Single precision floating-point.
+| CL_TENSOR_DOUBLE     | Double precision floating-point.
+| CL_TENSOR_COMPLEX64  | 64-bit complex floating point with
   32-bit real and imaginary part.
-| CL_TENSOR_COMPLEX128 | 128-bit complex floating point value with
+| CL_TENSOR_COMPLEX128 | 128-bit complex floating point with
   64-bit real and imaginary part.
 |===
 
@@ -165,8 +165,7 @@ To retain a tensor object, call the function
 
 [source,c]
 ----
-cl_int clRetainTensorObject(
-  cl_tensor tensor);
+cl_int clRetainTensorObject(cl_tensor tensor);
 ----
 
 * _tensor_ is the tensor object to be retained.
@@ -182,8 +181,7 @@ To release a tensor object, call the function
 
 [source,c]
 ----
-cl_int clReleaseTensorObject(
-  cl_tensor tensor);
+cl_int clReleaseTensorObject(cl_tensor tensor);
 ----
 
 * _tensor_ is the tensor object to be released.

From db91aee8a971fc9bf3d2d4daacfa197e3ff46929 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 09:59:18 +0200
Subject: [PATCH 07/18] * clEnqueue(Read,Write)Tensor ->
 clEnqueue(TranslateFrom,TranslateTo)Tensor.

* Clarify in clEnqueue{TranslateFrom,TranslateTo}Tensor that data is read
  from / written to the tensor in an opaque manner.
---
 ext/cl_khr_tensor.asciidoc | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index bed45d976..99e653706 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -271,7 +271,7 @@ tensor object from host memory / buffer object.
 
 [source,c]
 ----
-cl_int clEnqueueReadTensor(
+cl_int clEnqueueTranslateFromTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
@@ -284,7 +284,7 @@ cl_int clEnqueueReadTensor(
 
 [source,c]
 ----
-cl_int clEnqueueWriteTensor(
+cl_int clEnqueueTranslateToTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
@@ -331,8 +331,14 @@ cl_int clEnqueueWriteTensor(
   complete. If _event_wait_list_ and _event_ are not NULL, _event_
   must not refer to an element of the _event_wait_list_ array.
 
-For a read and write operation, the elements of N-dimensional tensor are
-related to host memory / buffer object as follows:
+The *clEnqueueTranslateToTensor* function copies contents of the buffer
+object / host allocation to tensor's storage in
+implementation-defined, opaque memory layout. The
+*clEnqueueTranslateFromTensor* function copies data from tensor's
+storage to buffer object / host allocation.
+
+The elements of buffer object / host allocation are mapped to tensor
+coordinates as follows:
 
 ----
 tensor.element(i0, i1, ..., i<N-2>, i<N-1>) == (tensor.dtype)buffer_or_host_ptr[
@@ -343,7 +349,11 @@ tensor.element(i0, i1, ..., i<N-2>, i<N-1>) == (tensor.dtype)buffer_or_host_ptr[
   i<N-1>]
 ----
 
-Where `iX` is a tensor coordinate index with inclusive range of `0..<shape[X]>`.
+Where `iX` is a tensor coordinate index with inclusive range of
+`0..<shape[X]-1>`. The `tensor.element()` represents an abstract
+function that accesses a tensor element in its storage at given
+coordinate. The method how the coordinates translate to tensor storage
+addresses is unspecified.
 
 // TODO: add clEnqueueCopyTensor
 

From 6fecc4e7a50b1cd1f1146ad43d58a73d7aaf1479 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 13:38:29 +0200
Subject: [PATCH 08/18] Refactor command buffer temporary property out of
 tensor

---
 ext/cl_khr_tensor.asciidoc | 139 +++++++++++++++++++++----------------
 1 file changed, 78 insertions(+), 61 deletions(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 99e653706..0de088c70 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -66,7 +66,8 @@ cl_tensor clCreateTensor(
 * _properties_ is an optional list of properties for the tensor object
   and their corresponding values. The list is terminated with the
   special property 0. If no properties are required, properties may be
-  NULL.
+  NULL. This extension does not define any optional properties for
+  tensors.
 
 * _rank_ is the number of dimensions. Zero value creates a "scalar"
   tensor which has no dimensions but has storage for one element.
@@ -84,18 +85,11 @@ cl_tensor clCreateTensor(
 clCreateTensor function creates a `rank`-dimensional tensor with
 `shape[0] * shape[1] * ... * shape[rank-1]` elements of _dtype_
 type. At the creation time of the tensor, it does not have
-storage. The storage is assigned to the tensor either by:
-
-* calling clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR or
-
-* automatically by command buffers - possibly on-demand basis - if the
-  tensor is created with CL_TENSOR_COMMAND_BUFFER_TEMPORARY property
-  set on.
+storage. The storage is assigned to the tensor by calling
+clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR.
 
 A command that refers to a tensor must be bound to a valid buffer
-object before enqueuing the command that refers to the tensor into a command queue unless the
-command is recorded in a command buffer and
-CL_TENSOR_COMMAND_BUFFER_TEMPORARY is set to true.
+object before enqueuing or recording the command.
 
 *clCreateTensor* returns a valid non-zero tensor object and errcode_ret
 is set to CL_SUCCESS if the tensor object is created
@@ -139,28 +133,6 @@ following error values returned in errcode_ret:
   64-bit real and imaginary part.
 |===
 
-.Tensor properties
-[cols="2,1,2",stripes=odd]
-|===
-| *Tensor Property* | *Property Value* | *Description*
-
-| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool
-
-a| If the value is true, create a "temporary" tensor that only can be
-used on commands recorded in command buffers. The storage of the
-temporary tensors are managed by command buffers. When a temporary
-tensor is used by multiple command buffers, the tensor receives separate
-storage for each command buffer.
-
-// IOW, Data may not be exchanged between command buffers through
-// temporary tensors.
-
-Temporary tensors may not be bound to buffer objects.
-
-Data stored in temporary tensors are not preserved across command
-buffer executions.
-|===
-
 To retain a tensor object, call the function
 
 [source,c]
@@ -244,12 +216,8 @@ cl_int clGetTensorInfo(
 | CL_TENSOR_SHAPE | size_t[]       | Return the tensor shape.
 | CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type.
 
-| CL_TENSOR_COMMAND_BUFFER_TEMPORARY | cl_bool | Return true if the
-tensor is a temporary tensor for command buffers.
-
 | CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is
-bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then
-CL_TENSOR_BOUND_TO_BUFFER must return false.
+bound to a buffer.
 
 | CL_TENSOR_BUFFER | cl_mem a| If CL_TENSOR_BOUND_TO_BUFFER is true,
 return the buffer object the tensor is bound to. Otherwise,
@@ -366,11 +334,34 @@ addresses is unspecified.
 
 [cols="2,1,2",stripes=odd]
 |===
+| CL_MEM_COMMAND_BUFFER_TEMPORARY | cl_bool
+
+a| This property can be set if the *cl_khr_command_buffer* extension is
+supported.
+
+If the value is true, create a "temporary" buffer object that can only
+be used in commands recorded in command buffers. Non-recording
+command enqueue functions must return CL_INVALID_OPERATION if the
+command refers to a temporary buffer object.
+
+Temporary buffer objects are managed by command buffers. When a
+temporary buffer object is used by multiple command buffers, the object
+receives disjoint storage for each command buffer.
+
+// Consequently, Data may not be exchanged between command buffers through
+// temporary buffers.
+
+Storage of the temporary buffer objects may be allocated on an
+on-demand basis. While the buffer is not needed, OpenCL implementations
+may reuse its storage for other tasks within the command buffer.
+
+Contents of the temporary buffers are not guaranteed to be preserved
+across command buffer executions.
+
 | CL_MEM_BIND_TO_TENSOR | cl_tensor a| Use the created buffer as
 storage for the given valid tensor. To succeed creating the buffer,
-the target tensor may not have storage already, must not have
-CL_TENSOR_COMMAND_BUFFER_TEMPORARY property set on and _size_ argument
-of the clCreateBufferWithProperties() must be zero.
+the target tensor must not already have storage and the _size_
+argument of clCreateBufferWithProperties() must be zero.
 
 Size of the memory buffer is implementation-defined and it can be
 queried with clGetTensorInfo().
@@ -387,6 +378,26 @@ clCreateBufferWithProperties call returns CL_TENSOR_BOUND_TO_BUFFER
 error code.
 |===
 
+==== Add New Memory Object Query in Section 5.5.5
+
+[cols="2,1,2",stripes=odd]
+|===
+| CL_MEM_COMMAND_BUFFER_TEMPORARY | cl_bool | This property can be
+queried if the *cl_khr_command_buffer* extension is supported.
+
+Return true if _memobj_ is a temporary buffer object for command
+buffers.
+|===
+
+==== Add New Error Codes in Appendix F
+
+[cols="2,3", stripes=odd]
+|===
+| CL_TENSOR_BOUND_TO_BUFFER | Returned when attempting to bind a
+  buffer object to a tensor which already has been bound to the same
+  or another.
+|===
+
 === Sample Codes
 
 Helper functions used in the follow up tensor code samples:
@@ -495,30 +506,36 @@ extension is supported:
 constexpr size_t b = 64, m = 100, n = 200, k = 50;
 
 cl_int err;
-// Create tensors which are used as temporaries in a command buffer.
-// Command buffers allocate space for them as needed.
-//
-// NOTE: same temporary tensor handle used in multiple command buffers
-//       will have separate storage. IOW, command buffers may not exchange
-//       data via temporary buffers between them.
-cl_tensor in0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, k}, CL_TENSOR_FLOAT, err);
-cl_tensor in1 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, k, n}, CL_TENSOR_FLOAT, err);
-cl_tensor in2 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, n}, CL_TENSOR_FLOAT, err);
-cl_tensor t0  = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, n}, CL_TENSOR_FLOAT, err);
-cl_tensor out = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, &err);
+cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor t0  = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
+cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
 
 cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
 cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
 
-// Binding a buffer to temporary tensor is not allowed.
-auto ignored = clCreateBufferWithProperties(
-  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &err);
-assert(err == CL_TENSOR_IS_TEMPORARY);
+// Bind command-buffer-managed storage to the tensors.
+//
+// NOTE: the same temporary buffer used in multiple command buffers
+//       will have separate storage in each. IOW, command buffers may not
+//       exchange data between them via temporary buffers.
+cl_mem in0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in0, 0},
+  CL_MEM_READ_ONLY, 0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */,
+  nullptr, &err);
+cl_mem in1_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in1, 0},
+  CL_MEM_READ_ONLY, 0, nullptr, &err);
+cl_mem in2_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in2, 0},
+  CL_MEM_READ_ONLY, 0, nullptr, &err);
+cl_mem t0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, t0, 0},
+  CL_MEM_READ_WRITE, 0, nullptr, &err);
+cl_mem out_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, out, 0},
+  CL_MEM_WRITE_ONLY, 0, nullptr, &err);
 
 std::vector<float> in0_data = ...;
 std::vector<float> in1_data = ...;

From f55a9045552f1a4b2aeff613256ec0a02764e4d0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 13:41:39 +0200
Subject: [PATCH 09/18] Fix cl_tensor_type -> cl_tensor_datatype

---
 ext/cl_khr_tensor.asciidoc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 0de088c70..22a6cd007 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -56,7 +56,7 @@ cl_tensor clCreateTensor(
     cl_context context,
     const cl_tensor_peoperties *properties,
     size_t rank,
-    const size_t shape,
+    const size_t* shape,
     cl_tensor_datatype dtype,
     cl_int *errcode_ret);
 ----
@@ -212,9 +212,9 @@ cl_int clGetTensorInfo(
 .List of supported param_names by clGetTensorInfo
 [cols="2,1,2",stripes=odd]
 |===
-| CL_TENSOR_RANK  | size_t         | Return the tensor rank.
-| CL_TENSOR_SHAPE | size_t[]       | Return the tensor shape.
-| CL_TENSOR_DTYPE | cl_tensor_type | Return the tensor data type.
+| CL_TENSOR_RANK  | size_t             | Return the tensor rank.
+| CL_TENSOR_SHAPE | size_t[]           | Return the tensor shape.
+| CL_TENSOR_DTYPE | cl_tensor_datatype | Return the tensor data type.
 
 | CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is
 bound to a buffer.

From 6d1c26ff5c86591b9a08bc287d921eaa49968b0b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 13:42:16 +0200
Subject: [PATCH 10/18] Add an open question

---
 ext/cl_khr_tensor.asciidoc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 22a6cd007..e91f81dff 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -575,3 +575,10 @@ assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION);
 ----
 
 === Open Questions ===
+
+. Should we have support for tensors with undefined shape and tensors
+  with unknown / symbolic dimension sizes like in ONNX?
+
+// https://onnx.ai/onnx/repo-docs/ShapeInference.html
+
+*UNRESOLVED*

From 52d8bb3514900c4ec271ebfa87870419427d3f63 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 13:46:58 +0200
Subject: [PATCH 11/18] Add CL_INVALID_TENSOR error code

---
 ext/cl_khr_tensor.asciidoc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index e91f81dff..1b2a9686e 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -396,6 +396,8 @@ buffers.
 | CL_TENSOR_BOUND_TO_BUFFER | Returned when attempting to bind a
   buffer object to a tensor which already has been bound to the same
   or another.
+| CL_INVALID_TENSOR | Returned when the specified tensor is not a
+  valid tensor object.
 |===
 
 === Sample Codes

From 534bcef9c29a6250437927ed2facab163811ff27 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 13:59:25 +0200
Subject: [PATCH 12/18] Require either buffer or host_ptr to be non-NULL

---
 ext/cl_khr_tensor.asciidoc | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_khr_tensor.asciidoc
index 1b2a9686e..f1437dd31 100644
--- a/ext/cl_khr_tensor.asciidoc
+++ b/ext/cl_khr_tensor.asciidoc
@@ -272,12 +272,10 @@ cl_int clEnqueueTranslateToTensor(
 * _blocking_command_ indicate if the read and write operations are
   blocking or non-blocking (see below).
 
-* _buffer_ refers to a valid buffer object where data is to be
-  read into or to be written from when the value of _host_ptr_ is
-  NULL. If _host_ptr_ is non-NULL then value of _buffer_ is ignored.
-
-* _host_ptr_ is the pointer to buffer in host memory where data is to
-  be read into or to be written from when the value is non-NULL.
+* _buffer_ and _host_ptr_ refer to a valid buffer object / host
+  allocation where data is to be read into or written from.
+  One of _buffer_ or _host_ptr_ must be non-NULL, in which case the
+  non-NULL argument is used as the operand for the operation.
 
 * _event_wait_list_ and _num_events_in_wait_list_ specify events that
   need to complete before this particular command can be executed. If

From 7447be25b605168073e5d73c639fd99d9d8767fa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 2 Nov 2023 14:27:14 +0200
Subject: [PATCH 13/18] Regenerate html for cl_exp_tensor

---
 ext/cl_khr_tensor.html | 298 +++++++++++++++++++++++------------------
 1 file changed, 168 insertions(+), 130 deletions(-)

diff --git a/ext/cl_khr_tensor.html b/ext/cl_khr_tensor.html
index 878925489..c232ddea7 100644
--- a/ext/cl_khr_tensor.html
+++ b/ext/cl_khr_tensor.html
@@ -5,7 +5,7 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <meta name="generator" content="Asciidoctor 2.0.16">
-<title>cl_khr_tensor</title>
+<title>cl_exp_tensor</title>
 <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700">
 <style>
 /*! Asciidoctor default stylesheet | MIT License | https://asciidoctor.org */
@@ -439,21 +439,26 @@
 </head>
 <body class="article">
 <div id="header">
-<h1>cl_khr_tensor</h1>
+<h1>cl_exp_tensor</h1>
 </div>
 <div id="content">
 <div class="sect1">
-<h2 id="cl_khr_tensor">Tensor Data Type</h2>
+<h2 id="cl_exp_tensor">Tensor Data Type</h2>
 <div class="sectionbody">
 <div class="paragraph">
-<p>Purpose of this extension is to provide &#8230;&#8203;</p>
+<p>This extension provides a new opaque OpenCL datatype called
+<code>cl_tensor</code>. It is used for storing N-dimensional tensor data in an
+implementation-defined memory layout which may be optimized based on the
+tensor&#8217;s use cases. The datatype is designed to be efficiently used
+with the <code>cl_khr_command_buffer</code> extension to capture task graphs
+which can utilize tensors as input, output and temporary storage.</p>
 </div>
 <div class="sect2">
 <h3 id="_general_information">General information</h3>
 <div class="sect3">
 <h4 id="_name_strings">Name Strings</h4>
 <div class="paragraph">
-<p><code>cl_khr_tensor</code></p>
+<p><code>cl_exp_tensor</code></p>
 </div>
 </div>
 <div class="sect3">
@@ -488,14 +493,13 @@ <h4 id="_dependencies">Dependencies</h4>
 <div class="paragraph">
 <p>This extension requires OpenCL 1.2 or later.</p>
 </div>
-<div class="paragraph">
-<p>This extension requires cl_khr_command_buffer.</p>
-</div>
 </div>
 <div class="sect3">
 <h4 id="_contributors">Contributors</h4>
 <div class="paragraph">
-<p>Henry Linjamäki, Intel.<br></p>
+<p>Henry Linjamäki, Intel.<br>
+Pekka Jääskeläinen, Intel and Tampere University.<br>
+Ben Ashbaugh, Intel.<br></p>
 </div>
 </div>
 </div>
@@ -516,8 +520,8 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
     cl_context context,
     const cl_tensor_peoperties *properties,
     size_t rank,
-    size_t shape,
-    cl_tensor_type dtype,
+    const size_t* shape,
+    cl_tensor_datatype dtype,
     cl_int *errcode_ret);</code></pre>
 </div>
 </div>
@@ -530,7 +534,8 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 <p><em>properties</em> is an optional list of properties for the tensor object
 and their corresponding values. The list is terminated with the
 special property 0. If no properties are required, properties may be
-NULL.</p>
+NULL. This extension does not define any optional properties for
+tensors.</p>
 </li>
 <li>
 <p><em>rank</em> is the number of dimensions. Zero value creates a "scalar"
@@ -555,25 +560,12 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 <p>clCreateTensor function creates a <code>rank</code>-dimensional tensor with
 <code>shape[0] * shape[1] * &#8230;&#8203; * shape[rank-1]</code> elements of <em>dtype</em>
 type. At the creation time of the tensor, it does not have
-storage. The storage is assigned to the tensor either by:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>calling clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR or</p>
-</li>
-<li>
-<p>automatically by command buffers - possibly on-demand basis - if the
-tensor is created with CL_TENSOR_COMMAND_BUFFER_TEMPORARY property
-set on.</p>
-</li>
-</ul>
+storage. The storage is assigned to the tensor by calling
+clCreateBufferWithProperties() with CL_MEM_BIND_TO_TENSOR.</p>
 </div>
 <div class="paragraph">
 <p>A command that refers to a tensor must be bound to a valid buffer
-object before enqueuing the command into a command queue unless the
-command is recorded in a command buffer and
-CL_TENSOR_COMMAND_BUFFER_TEMPORARY is set to true.</p>
+object before enqueuing or recording the command.</p>
 </div>
 <div class="paragraph">
 <p><strong>clCreateTensor</strong> returns a valid non-zero tensor object and errcode_ret
@@ -636,90 +628,54 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT8</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">8-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">8-bit unsigned integer.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT16</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">16-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">16-bit unsigned integer.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT32</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">32-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">32-bit unsigned integer.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT64</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit unsigned integer.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_HALF</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Half precision floating-point value.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Half precision floating-point.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BFLOAT16</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">16-bit brain floating-point value.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">16-bit brain floating-point.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_FLOAT</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Single precision floating-point value.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Single precision floating-point.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_DOUBLE</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Double precision floating-point value.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Double precision floating-point.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMPLEX64</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit complex floating point value with
+<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit complex floating point with
   32-bit real and imaginary part.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMPLEX128</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">128-bit complex floating point value with
+<td class="tableblock halign-left valign-top"><p class="tableblock">128-bit complex floating point with
   64-bit real and imaginary part.</p></td>
 </tr>
 </tbody>
 </table>
-<table class="tableblock frame-all grid-all stripes-odd stretch">
-<caption class="title">Table 2. Tensor properties</caption>
-<colgroup>
-<col style="width: 40%;">
-<col style="width: 20%;">
-<col style="width: 40%;">
-</colgroup>
-<thead>
-<tr>
-<th class="tableblock halign-left valign-top"><strong>Tensor Property</strong></th>
-<th class="tableblock halign-left valign-top"><strong>Property Value</strong></th>
-<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMMAND_BUFFER_TEMPORARY</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
-<td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
-<p>If the value is true, create a "temporary" tensor that only can be
-used on commands recorded in command buffers. The storage of the
-temporary tensors are managed by command buffers. When a temporary
-tensor is used by multiple command buffer, the tensor receive separate
-storage for each command buffer.</p>
-</div>
-<div class="paragraph">
-<p>Temporary tensors may not be bound to buffer objects.</p>
-</div>
-<div class="paragraph">
-<p>Data stored in temporary tensors are not preserved across command
-buffer executions.</p>
-</div></div></td>
-</tr>
-</tbody>
-</table>
 <div class="paragraph">
 <p>To retain a tensor object, call the function</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clRetainTensorObject(
-  cl_tensor tensor);</code></pre>
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clRetainTensorObject(cl_tensor tensor);</code></pre>
 </div>
 </div>
 <div class="ulist">
@@ -739,7 +695,7 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 <div class="ulist">
 <ul>
 <li>
-<p>CL_INVALID_TENSOR if tensor is not a valid tensor object.</p>
+<p>CL_INVALID_TENSOR if the tensor is not a valid tensor object.</p>
 </li>
 </ul>
 </div>
@@ -748,8 +704,7 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clReleaseTensorObject(
-  cl_tensor tensor);</code></pre>
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clReleaseTensorObject(cl_tensor tensor);</code></pre>
 </div>
 </div>
 <div class="ulist">
@@ -833,7 +788,7 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </ul>
 </div>
 <table class="tableblock frame-all grid-all stripes-odd stretch">
-<caption class="title">Table 3. List of supported param_names by clGetTensorInfo</caption>
+<caption class="title">Table 2. List of supported param_names by clGetTensorInfo</caption>
 <colgroup>
 <col style="width: 40%;">
 <col style="width: 20%;">
@@ -852,21 +807,14 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_DTYPE</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">cl_tensor_type</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_tensor_datatype</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Return the tensor data type.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMMAND_BUFFER_TEMPORARY</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Return true if the
-tensor is temporary tensor for command buffers.</p></td>
-</tr>
-<tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BOUND_TO_BUFFER</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Return true if the tensor is
-bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then
-CL_TENSOR_BOUND_TO_BUFFER must return false.</p></td>
+bound to a buffer.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BUFFER</p></td>
@@ -902,12 +850,12 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </tbody>
 </table>
 <div class="paragraph">
-<p>To read from a tensor to host memory / buffer object or to write to a
-tensor object from host memory / buffer object call one of the functions.</p>
+<p>The following functions read from a tensor to host memory / buffer object or write to a
+tensor object from host memory / buffer object.</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueReadTensor(
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueTranslateFromTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
@@ -920,12 +868,12 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueWriteTensor(
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueTranslateToTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
   cl_mem buffer,
-  void* host_ptr,
+  const void* host_ptr,
   cl_uint num_events_in_wait_list,
   const cl_event* event_wait_list,
   cl_event* event);</code></pre>
@@ -946,13 +894,10 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 blocking or non-blocking (see below).</p>
 </li>
 <li>
-<p><em>buffer</em> refers to a valid buffer object where data is to be
-read into or to be written from when the value of <em>host_ptr</em> is
-NULL. If <em>host_ptr</em> is non-NULL then value of <em>buffer</em> is ignored.</p>
-</li>
-<li>
-<p><em>host_ptr</em> is the pointer to buffer in host memory where data is to
-be read into or to be written from when the value is non-NULL.</p>
+<p><em>buffer</em> and <em>host_ptr</em> refer to a valid buffer object / host
+allocation where data is to be read into or written from.
+One of <em>buffer</em> or <em>host_ptr</em> must be non-NULL, in which case the
+non-NULL argument is used as the operand for the operation.</p>
 </li>
 <li>
 <p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that
@@ -979,12 +924,19 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </ul>
 </div>
 <div class="paragraph">
-<p>For a read and write operation, the elements of N-dimensional tensor are
-related to host memory / buffer object as followed:</p>
+<p>The <strong>clEnqueueTranslateToTensor</strong> function copies the contents of the
+buffer object / host allocation to the tensor&#8217;s storage in an
+implementation-defined, opaque memory layout. The
+<strong>clEnqueueTranslateFromTensor</strong> function copies data from the tensor&#8217;s
+storage to the buffer object / host allocation.</p>
+</div>
+<div class="paragraph">
+<p>The elements of buffer object / host allocation are mapped to tensor
+coordinates as follows:</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre>tensor.element(i0, i1, ..., i&lt;N-2&gt;, i&lt;N-1&gt;)) == (tensor.dtype)buffer_or_host_ptr[
+<pre>tensor.element(i0, i1, ..., i&lt;N-2&gt;, i&lt;N-1&gt;) == (tensor.dtype)buffer_or_host_ptr[
   i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] +
   i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] +
   ... +
@@ -993,7 +945,11 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </div>
 </div>
 <div class="paragraph">
-<p>Where <code>iX</code> is a tensor coordinate index with inclusive range of <code>0..&lt;shape[X]&gt;</code>.</p>
+<p>Where <code>iX</code> is a tensor coordinate index with an inclusive range of
+<code>0..&lt;shape[X]-1&gt;</code>. The <code>tensor.element()</code> represents an abstract
+function that accesses a tensor element in its storage at the given
+coordinate. How the coordinates translate to tensor storage
+addresses is unspecified.</p>
 </div>
 </div>
 <div class="sect3">
@@ -1004,6 +960,31 @@ <h4 id="_add_new_buffer_property_in_section_5_2_1">Add New Buffer Property in Se
 <col style="width: 20%;">
 <col style="width: 40%;">
 </colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">CL_MEM_COMMAND_BUFFER_TEMPORARY</th>
+<th class="tableblock halign-left valign-top">cl_bool</th>
+<th class="tableblock halign-left valign-top">This property can be set if the <strong>cl_khr_command_buffer</strong> extension is
+supported.
+
+If the value is true, create a "temporary" buffer object that can only
+be used in commands recorded in command buffers. Non-recording
+command enqueue functions must return CL_INVALID_OPERATION if the
+command refers to a temporary buffer object.
+
+Temporary buffer objects are managed by command buffers. When a
+temporary buffer object is used by multiple command buffers, the object
+receives disjoint storage for each command buffer.
+
+
+Storage of the temporary buffer objects may be allocated on an
+on-demand basis. While the buffer is not needed, OpenCL implementations
+may reuse its storage for other tasks within the command buffer.
+
+Contents of the temporary buffers are not guaranteed to be preserved
+across command buffer executions.</th>
+</tr>
+</thead>
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_BIND_TO_TENSOR</p></td>
@@ -1011,9 +992,8 @@ <h4 id="_add_new_buffer_property_in_section_5_2_1">Add New Buffer Property in Se
 <td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
 <p>Use the created buffer as
 storage for the given valid tensor. To succeed creating the buffer,
-the target tensor may not have storage already, must not have
-CL_TENSOR_COMMAND_BUFFER_TEMPORARY property set on and <em>size</em> argument
-of the clCreateBufferWithProperties() must be zero.</p>
+the target tensor must not already have storage and the <em>size</em>
+argument of clCreateBufferWithProperties() must be zero.</p>
 </div>
 <div class="paragraph">
 <p>Size of the memory buffer is implementation-defined and it can be
@@ -1036,6 +1016,48 @@ <h4 id="_add_new_buffer_property_in_section_5_2_1">Add New Buffer Property in Se
 </tbody>
 </table>
 </div>
+<div class="sect3">
+<h4 id="_add_new_memory_object_query_in_section_5_5_5">Add New Memory Object Query in Section 5.5.5</h4>
+<table class="tableblock frame-all grid-all stripes-odd stretch">
+<colgroup>
+<col style="width: 40%;">
+<col style="width: 20%;">
+<col style="width: 40%;">
+</colgroup>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_COMMAND_BUFFER_TEMPORARY</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This property can be
+queried if the <strong>cl_khr_command_buffer</strong> extension is supported.</p>
+<p class="tableblock">Return true if <em>memobj</em> is a temporary buffer object for command
+buffers.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<div class="sect3">
+<h4 id="_add_new_error_codes_in_appendix_f">Add New Error Codes in Appendix F</h4>
+<table class="tableblock frame-all grid-all stripes-odd stretch">
+<colgroup>
+<col style="width: 40%;">
+<col style="width: 60%;">
+</colgroup>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BOUND_TO_BUFFER</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returned when attempting to bind a
+  buffer object to a tensor that has already been bound to the same
+  or another buffer object.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_INVALID_TENSOR</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returned when the specified tensor is not a
+  valid tensor object.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
 </div>
 <div class="sect2">
 <h3 id="_sample_codes">Sample Codes</h3>
@@ -1062,7 +1084,7 @@ <h3 id="_sample_codes">Sample Codes</h3>
   return matmul_kernel;
 }
 
-cl_kernel create_matmul_kernel(
+cl_kernel create_add_kernel(
   cl_context ctx, std::span&lt;cl_device_id&gt; device_span,
   cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
   // A hypothetical add kernel signature in pseudo OpenCL C for illustrative
@@ -1149,30 +1171,36 @@ <h3 id="_sample_codes">Sample Codes</h3>
 <pre class="highlight"><code class="language-c" data-lang="c">constexpr size_t b = 64, m = 100, n = 200, k = 50;
 
 cl_int err;
-// Create tensors which are used as temporaries in a command buffer.
-// Command buffers allocate space for them as needed.
-//
-// NOTE: same temporary tensor handle used in multiple command buffers
-//       will have separate storage. IOW, command buffers may not exchange
-//       data via temporary buffers between them.
-cl_tensor in0 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, k}, CL_TENSOR_FLOAT, err);
-cl_tensor in1 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, k, n}, CL_TENSOR_FLOAT, err);
-cl_tensor in2 = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, n}, CL_TENSOR_FLOAT, err);
-cl_tensor t0  = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, n}, CL_TENSOR_FLOAT, err);
-cl_tensor out = clCreateTensor(ctx, {CL_TENSOR_COMMAND_BUFFER_TEMPORARY, true, 0},
-  3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, err);
+cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, err);
+cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor t0  = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
+cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
 
 cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
 cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
 
-// Binding a buffer to temporary tensor is not allowed.
-auto ignored = clCreateBufferWithProperties(
-  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE, 0, nullptr, &amp;err);
-assert(err == CL_TENSOR_IS_TEMPORARY)
+// Bind command buffer managed storage to tensors.
+//
+// NOTE: the same temporary tensor handle used in multiple command
+//       buffers will have separate storage in each. In other words,
+//       command buffers cannot exchange data via temporary buffers.
+cl_mem in0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in0, 0},
+  CL_MEM_READ_ONLY, 0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */,
+  nullptr, &amp;err);
+cl_mem in1_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in1, 0},
+  CL_MEM_READ_ONLY, 0, nullptr, &amp;err);
+cl_mem in2_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in2, 0},
+  CL_MEM_READ_ONLY, 0, nullptr, &amp;err);
+cl_mem t0_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, t0, 0},
+  CL_MEM_READ_WRITE, 0, nullptr, &amp;err);
+cl_mem out_mem = clCreateBufferWithProperties(
+  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, out, 0},
+  CL_MEM_WRITE_ONLY, 0, nullptr, &amp;err);
 
 std::vector&lt;float&gt; in0_data = ...;
 std::vector&lt;float&gt; in1_data = ...;
@@ -1214,14 +1242,24 @@ <h3 id="_sample_codes">Sample Codes</h3>
 </div>
 <div class="sect2">
 <h3 id="_open_questions">Open Questions</h3>
-
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Should we have support for tensors with undefined shape and tensors
+with unknown / symbolic dimension sizes like in ONNX?</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p><strong>UNRESOLVED</strong></p>
+</div>
 </div>
 </div>
 </div>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2023-10-30 16:51:10 +0200
+Last updated 2023-11-02 14:25:56 +0200
 </div>
 </div>
 </body>

From 85edd05d7b6b35ebbec37f64d100f39ba93c708b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 9 Nov 2023 09:39:18 +0200
Subject: [PATCH 14/18] cl_khr_tensor.* -> cl_exp_tensor.*

---
 ext/{cl_khr_tensor.asciidoc => cl_exp_tensor.asciidoc} | 0
 ext/{cl_khr_tensor.html => cl_exp_tensor.html}         | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename ext/{cl_khr_tensor.asciidoc => cl_exp_tensor.asciidoc} (100%)
 rename ext/{cl_khr_tensor.html => cl_exp_tensor.html} (100%)

diff --git a/ext/cl_khr_tensor.asciidoc b/ext/cl_exp_tensor.asciidoc
similarity index 100%
rename from ext/cl_khr_tensor.asciidoc
rename to ext/cl_exp_tensor.asciidoc
diff --git a/ext/cl_khr_tensor.html b/ext/cl_exp_tensor.html
similarity index 100%
rename from ext/cl_khr_tensor.html
rename to ext/cl_exp_tensor.html

From 8fe9046c1dc7500887751d0eaa8d0c16751363bc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Wed, 15 Nov 2023 11:20:49 +0200
Subject: [PATCH 15/18] * Add overview

* new section for tensor data type

* add origin, region and pitch parameters for clEnqueueTranslate*Tensor.

* Update code samples.

* Add take on accessing tensors in OpenCL C.
---
 ext/cl_exp_tensor.asciidoc | 268 ++++++++++++++++++++---------
 ext/cl_exp_tensor.html     | 344 +++++++++++++++++++++++++++----------
 2 files changed, 444 insertions(+), 168 deletions(-)
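
Note on the origin, region and pitch parameters: the sketch below is one
possible reading of the pseudo C `pitch()` function this patch adds to the
asciidoc, written as plain C so the index arithmetic can be checked by hand.
The helper name `linear_offset` and the explicit `rank` parameter are
illustrative assumptions and are not part of the extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the pseudo C pitch() in this patch: the
 * number of elements skipped per unit step along dimension `dim` of the
 * host / buffer memory region. */
size_t pitch(size_t dim, size_t rank,
             const size_t *mem_pitch, const size_t *region) {
  /* This sketch needs one of the two arrays to derive dimension lengths. */
  assert(mem_pitch != NULL || region != NULL);
  size_t p = 1;
  /* `i + 1 < rank` is `i < rank - 1` without unsigned wrap-around. */
  for (size_t i = dim; i + 1 < rank; i++)
    p *= mem_pitch != NULL ? mem_pitch[i] : region[i + 1];
  return p;
}

/* Hypothetical helper: linear element offset in the host / buffer memory
 * for the region-relative coordinate i[0..rank-1]. */
size_t linear_offset(size_t rank, const size_t *mem_origin, const size_t *i,
                     const size_t *mem_pitch, const size_t *region) {
  size_t offset = 0;
  for (size_t d = 0; d < rank; d++)
    offset += (mem_origin[d] + i[d]) * pitch(d, rank, mem_pitch, region);
  return offset;
}
```

For a densely packed 2x3x4 region with zero origins and `mem_pitch == NULL`,
`linear_offset` maps the coordinate (1, 2, 3) to element 23, the last element
of the region.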

diff --git a/ext/cl_exp_tensor.asciidoc b/ext/cl_exp_tensor.asciidoc
index f1437dd31..b7fd8429c 100644
--- a/ext/cl_exp_tensor.asciidoc
+++ b/ext/cl_exp_tensor.asciidoc
@@ -26,7 +26,7 @@ which can utilize tensors as input, output and temporary storage.
 [cols="1,1,3",options="header",]
 |====
 | *Date*     | *Version* | *Description*
-| 2023-10-XX | 0.1.0     | First assigned version.
+| 2023-11-XX | 0.1.0     | First assigned version.
 |====
 
 ==== Dependencies
@@ -43,10 +43,57 @@ Ben Ashbaugh, Intel. +
 
 === Overview
 
+The new tensor object enables applications to describe N-dimensional
+arrays whose memory layout is opaque to applications. The goals of
+this extension are to:
+
+* give implementations freedom in placing tensor data so that they
+  can improve the performance of the kernels that use it. The
+  extension is designed to let implementations determine optimal
+  memory layouts for tensors based on how they are used: for
+  example, by analyzing kernels’ access patterns or, in the case of
+  built-in kernels, by inspecting the tensor arguments they operate
+  on.
+
+* reduce the detail and boilerplate needed to port performant
+  applications by making them less dependent on platform- or
+  device-specific memory layouts and data arrangements that matter
+  for performance. Such specifics may include:
+
+** alignment of data (e.g. for avoiding misaligned memory accesses)
+
+** arrangement of data required by kernels (column-major vs row-major
+   for matrix multiplication, NHWC vs NCHW for neural network
+   convolution)
+
+** arrangement of the data into tiles (or “packing”) for improving
+   cache and TLB hits
+
+** arrangement of data into specific tiles in order to exploit complex
+   HW operations such as matrix multiplications (Intel AMX, AMD matrix
+   cores).
+
+** arrangement of data into rows separated by a stride in order to
+   avoid bank conflicts in GPUs.
+
+The tensor data type is expected to be effective with command buffers
+and built-in kernels, including kernels to be provided by the in-progress
+defined built-in kernels extension (cl_khr_defined_builtin_kernels).
 
 === Modifications to OpenCL
 
-==== New OpenCL Functions
+==== New Section: 5.x Tensor Objects
+
+A tensor object stores an N-dimensional array of elements. The memory
+layout of the tensor is opaque to the application. A newly created
+tensor object has no storage for its elements. Storage is bound to a
+tensor by creating a memory buffer with the CL_MEM_BIND_TO_TENSOR
+property. Tensor objects without storage can be set as kernel
+arguments for kernels that accept them. However, every tensor
+argument of a kernel must have storage bound to it before the kernel
+is enqueued for execution.
+
+==== New OpenCL Functions added to Tensor Objects section
 
 To create a tensor use:
 
@@ -108,29 +155,32 @@ following error values returned in errcode_ret:
 * CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
   required by the OpenCL implementation on the host.
 
-.Tensor element types
-[cols="1,2",stripes=odd]
+.Tensor element types. The API type indicates the corresponding type for copying elements from a host allocation / buffer object to tensor or vice versa.
+[cols="1,1,1",stripes=even]
 [#TensorDtypes]
 |===
-| *Tensor element data type* | *Description*
-
-| CL_TENSOR_BOOL       | 1-bit signedless integer.
-| CL_TENSOR_INT8       | 8-bit signed integer.
-| CL_TENSOR_INT16      | 16-bit signed integer.
-| CL_TENSOR_INT32      | 32-bit signed integer.
-| CL_TENSOR_INT64      | 64-bit signed integer.
-| CL_TENSOR_UINT8      | 8-bit unsigned integer.
-| CL_TENSOR_UINT16     | 16-bit unsigned integer.
-| CL_TENSOR_UINT32     | 32-bit unsigned integer.
-| CL_TENSOR_UINT64     | 64-bit unsigned integer.
-| CL_TENSOR_HALF       | Half precision floating-point.
-| CL_TENSOR_BFLOAT16   | 16-bit brain floating-point.
-| CL_TENSOR_FLOAT      | Single precision floating-point.
-| CL_TENSOR_DOUBLE     | Double precision floating-point.
-| CL_TENSOR_COMPLEX64  | 64-bit complex floating point with
-  32-bit real and imaginary part.
-| CL_TENSOR_COMPLEX128 | 128-bit complex floating point with
-  64-bit real and imaginary part.
+| *Tensor element data type* | *Description* | *API type*
+
+| CL_TENSOR_BOOL | 1-bit signedless integer.  |
+cl_uchar. footnote:[Only the least significant bit is considered when
+writing data to a tensor. When reading data from a tensor, each boolean
+value is written as 0 or 1. Boolean values in the tensor may be densely packed.]
+| CL_TENSOR_INT8       | 8-bit signed integer.            | cl_char.
+| CL_TENSOR_INT16      | 16-bit signed integer.           | cl_short.
+| CL_TENSOR_INT32      | 32-bit signed integer.           | cl_int.
+| CL_TENSOR_INT64      | 64-bit signed integer.           | cl_long.
+| CL_TENSOR_UINT8      | 8-bit unsigned integer.          | cl_uchar.
+| CL_TENSOR_UINT16     | 16-bit unsigned integer.         | cl_ushort.
+| CL_TENSOR_UINT32     | 32-bit unsigned integer.         | cl_uint.
+| CL_TENSOR_UINT64     | 64-bit unsigned integer.         | cl_ulong.
+| CL_TENSOR_HALF       | Half precision floating-point.   | cl_half.
+| CL_TENSOR_BFLOAT16   | 16-bit brain floating-point.     | cl_ushort
+| CL_TENSOR_FLOAT      | Single precision floating-point. | cl_float.
+| CL_TENSOR_DOUBLE     | Double precision floating-point. | cl_double.
+| CL_TENSOR_COMPLEX64  | 64-bit complex floating-point with
+  32-bit real and imaginary part. | cl_float2
+| CL_TENSOR_COMPLEX128 | 128-bit complex floating-point with
+  64-bit real and imaginary part. | cl_double2
 |===
 
 To retain a tensor object, call the function
@@ -234,8 +284,9 @@ clGetTensorInfo call returns:
 count.
 |===
 
-The following functions are for reading from a tensor to host memory / buffer object or to write to a
-tensor object from host memory / buffer object.
+The following functions read from a tensor into host memory or a
+buffer object, or write to a tensor from host memory or a buffer
+object.
 
 [source,c]
 ----
@@ -243,6 +294,10 @@ cl_int clEnqueueTranslateFromTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
   cl_mem buffer,
   void* host_ptr,
   cl_uint num_events_in_wait_list,
@@ -256,6 +311,10 @@ cl_int clEnqueueTranslateToTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
   cl_mem buffer,
   const void* host_ptr,
   cl_uint num_events_in_wait_list,
@@ -272,6 +331,24 @@ cl_int clEnqueueTranslateToTensor(
 * _blocking_command_ indicate if the read and write operations are
   blocking or non-blocking (see below).
 
+* _tensor_origin_ defines the offset coordinates in _tensor_ at which
+  the region to read or write starts. The length of the array must be
+  at least the rank of _tensor_.
+
+* _mem_origin_ defines the offset coordinates in the memory region
+  pointed to by _buffer_ or _host_ptr_, expressed in elements of the
+  _tensor_ data type. The length of the array must be at least the
+  rank of _tensor_.
+
+* _region_ defines the region being read or written, expressed in
+  elements of the _tensor_ data type. The length of the array must be
+  at least the rank of _tensor_. If _region_ is NULL, the shape of
+  _tensor_ is used as the region.
+
+* _mem_pitch_ defines the length of each dimension in elements to be
+  used for the memory region of _buffer_ or _host_ptr_. The length of
+  the array must be at least the rank of _tensor_ minus one.
+
 * _buffer_ and _host_ptr_ refer to a valid buffer object / host
   allocation where data is to be read into or to be written from.
   Either the _buffer_ or _host_ptr_ can be non-NULL in which case the
@@ -304,39 +381,60 @@ implementation-defined, opaque memory layout. The
 storage to buffer object / host allocation.
 
 The elements of buffer object / host allocation are mapped to tensor
-coordinates as follows:
+coordinates and vice versa as follows in pseudo C code:
 
+[source,c]
 ----
-tensor.element(i0, i1, ..., i<N-2>, i<N-1>) == (tensor.dtype)buffer_or_host_ptr[
-  i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] +
-  i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] +
+tensor_element(
+  tensor_origin[0] + i[0],
+  tensor_origin[1] + i[1],
+  ...,
+  tensor_origin[N-2] + i[N-2],
+  tensor_origin[N-1] + i[N-1]) ==
+((TENSOR_DATATYPE *)buffer_or_host_ptr)[
+  (mem_origin[0] + i[0]) * pitch(0) +
+  (mem_origin[1] + i[1]) * pitch(1) +
   ... +
-  i<N-2> * tensor.shape[i(N-1)] +
-  i<N-1>]
+  (mem_origin[N-2] + i[N-2]) * pitch(N-2) +
+  (mem_origin[N-1] + i[N-1])];
 ----
 
-Where `iX` is a tensor coordinate index with inclusive range of
-`0..<shape[X]-1>`. The `tensor.element()` represents an abstract
-function that accesses a tensor element in its storage at given
-coordinate. The method how the coordinates translate to tensor storage
-addresses is unspecified.
+Where `N` is the tensor rank, `i[X]` is a tensor coordinate in the
+inclusive range `0..<region[X]-1>`, and `pitch` is computed as
+follows in pseudo C code:
+
+[source,c]
+----
+size_t pitch(size_t dim) {
+  size_t pitch = 1;
+  for (size_t i = dim; i < tensor_rank - 1; i++)
+    pitch *= mem_pitch != NULL ? mem_pitch[i] : region[i + 1];
+  return pitch;
+}
+----
+
+Here `dim` ranges over `0..(tensor_rank-1)`. `tensor_element()`
+represents an abstract function that accesses a tensor element in its
+storage at the given coordinates. How the coordinates translate to
+tensor storage addresses is unspecified.
 
 // TODO: add clEnqueueCopyTensor
 
 // TODO: add clEnqueueFillTensor?
 
-// TODO: add command buffer variants for clEnqueue{copy,read,write}Tensor.
-
+TODO: add command buffer variants for clEnqueue*Tensor.
 
 ==== Add New Buffer Property in Section 5.2.1
 
 [cols="2,1,2",stripes=odd]
 |===
 | CL_MEM_COMMAND_BUFFER_TEMPORARY | cl_bool
-
 a| This property can be set if *cl_khr_command_buffer* extension is
 supported.
 
+NOTE: This property temporarily lives here and will be moved to
+a separate extension proposal.
+
 If the value is true, create a "temporary" buffer object that only can
 be used on commands recorded in command buffers. Non-recording
 command enqueue functions must return CL_INVALID_OPERATION if the
@@ -366,10 +464,11 @@ queried with clGetTensorInfo().
 
 Memory layout of the tensor in the created memory buffer is
 implementation-defined and opaque to the applications and it may
-change at unspecified points. Implementation may store auxiliary data
-in the memory buffer for the tensor. Therefore, writing data into the
-memory buffer directly using the cl_mem handle leads to undefined
-behavior.
+change at unspecified points. An implementation may use non-contiguous
+allocations to store the tensor data and may store auxiliary data
+within the allocations. Therefore, reading from or writing to the
+memory buffer directly using the cl_mem handle leads to undefined
+behavior.
 
 If the tensor is already bound to a buffer object,
 clCreateBufferWithProperties call returns CL_TENSOR_BOUND_TO_BUFFER
@@ -410,10 +509,8 @@ cl_kernel create_matmul_kernel(
   // A hypothetical matmul kernel signature in pseudo OpenCL C for
   // illustrative purposes:
   //
-  //   kernel void matmul(
-  //     global read_only tensor_t,
-  //     global read_only tensor_t,
-  //     global write_only tensor_t);
+  //   kernel void matmul(global read_only tensor_t, global read_only tensor_t,
+  //                      global write_only tensor_t);
 
   cl_kernel matmul_kernel = /* Omitted. */;
   clSetKernelArg(matmul_kernel, 0, sizeof(cl_tensor), &lhs);
@@ -428,10 +525,8 @@ cl_kernel create_add_kernel(
   // A hypothetical add kernel signature in pseudo OpenCL C for illustrative
   // purposes:
   //
-  // kernel void add(
-  //     global read_only tensor_t,
-  //     global read_only tensor_t,
-  //     global write_only tensor_t);
+  // kernel void add(global read_only tensor_t, global read_only tensor_t,
+  //                 global write_only tensor_t);
 
   cl_tensor add_kernel = /* Omitted. */;
   clSetKernelArg(add_kernel, 0, sizeof(cl_tensor), &lhs);
@@ -446,6 +541,7 @@ An example usage of tensors on a command queue:
 ----
 constexpr size_t b = 64, m = 100, n = 200, k = 50;
 
+cl_int err;
 cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, err);
 cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, err);
 cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
@@ -455,11 +551,11 @@ cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err)
 cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
 cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
 
-// Allocate storage for the tensors. The buffer size must be set to zero
-// when the buffer is bound to a tensor. OpenCL implementation may
-// determine optimal data layout and the storage needed for it, based
-// on the tensor's uses (matmul kernel in this sample) so far.
-cl_int err;
+// Allocate storage for the tensors. The buffer size must be set to
+// zero when the buffer is bound to a tensor. The OpenCL implementation
+// may determine the optimal data layout, and the storage needed for
+// it, based on the tensors' uses so far (the 'matmul' and 'add'
+// kernels in this sample).
 cl_mem in0_mem = clCreateBufferWithProperties(
   ctx, {CL_MEM_BIND_TO_TENSOR, in0, 0}, CL_MEM_READ_ONLY,
   0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, nullptr, &err);
@@ -482,20 +578,19 @@ std::vector<float> out_data(b * m * n);
 
 // Copies data into in0 tensor while possibly rearranging the data to the
 // optimal data layout.
-clEnqueueWriteTensor(
-  cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr, in0_data.data(),
-  0, nullptr, nullptr);
-
-clEnqueueWriteTensor(
-  cmd_q, in1, false, nullptr, nullptr, {b, k, n}, nullptr, in1_data.data(),
-  0, nullptr, nullptr);
+clEnqueueTranslateToTensor(
+  cmd_q, in0, false, {0, 0, 0}, {0, 0, 0}, {b, m, k},
+  /* mem_pitch */ nullptr, /* buffer */ nullptr, in0_data.data(), 0, nullptr, nullptr);
+clEnqueueTranslateToTensor(
+  cmd_q, in1, false, {0, 0, 0}, {0, 0, 0}, {b, k, n},
+  /* mem_pitch */ nullptr, /* buffer */ nullptr, in1_data.data(), 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
-  cmd_q, matmul_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+  cmd_q, matmul_kernel, 3, nullptr, matmul_grid, nullptr, 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
-  cmd_q, add_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
-clEnqueueReadTensor(
-  cmd_q, out, false, nullptr, nullptr, {b, m, n}, nullptr, out_data.data(),
-  0, nullptr, nullptr);
+  cmd_q, add_kernel, 3, nullptr, add_grid, nullptr, 0, nullptr, nullptr);
+clEnqueueTranslateFromTensor(
+  cmd_q, out, false, {0, 0, 0}, {0, 0, 0}, {b, m, n},
+  /* mem_pitch */ nullptr, /* buffer */ nullptr, out_data.data(), 0, nullptr, nullptr);
 ----
 
 An example use of tensors in a command buffer when cl_khr_command_buffer
@@ -545,21 +640,21 @@ cl_command_buffer_khr cb =
   clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &err);
 
 cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
-clCommandWriteTensorKHR(
-  cmd_b, cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr,
-  in0_data.data(), 0, nullptr, &in0_syncp);
-clCommandWriteTensorKHR(
-  cmd_b, cmd_q, in1, false, nullptr, nullptr, {b, k, m}, nullptr,
-  in1_data.data(), 0, nullptr, &in1_syncp);
+clCommandTranslateToTensorKHR(
+  cmd_b, cmd_q, in0, {0, 0, 0}, {0, 0, 0}, {b, m, k},
+  nullptr, nullptr, nullptr, in0_data.data(), 0, nullptr, &in0_syncp);
+clCommandTranslateToTensorKHR(
+  cmd_b, cmd_q, in1, {0, 0, 0}, {0, 0, 0}, {b, k, n},
+  nullptr, nullptr, nullptr, in1_data.data(), 0, nullptr, &in1_syncp);
 clCommandNDRangeKernelKHR(
-  cmd_b, cmd_q, nullptr, matmul_kernel, 0, nullptr, nullptr, nullptr,
+  cmd_b, cmd_q, nullptr, matmul_kernel, 3, nullptr, matmul_grid, nullptr,
   2, {in0_syncp, in2_syncp}, &matmul_syncp, nullptr);
 clCommandNDRangeKernelKHR(
-  cmd_b, cmd_q, nullptr, add_kernel, 0, nullptr, nullptr, nullptr,
+  cmd_b, cmd_q, nullptr, add_kernel, 3, nullptr, add_grid, nullptr,
   1, {matmul_syncp}, &add_syncp, nullptr);
-clCommandReadTensorKHR(
-  cmd_b, cmd_q, out,  false, nullptr, nullptr, {b, k, m}, nullptr,
-  out_data.data(), 1, {add_syncp}, nullptr);
+clCommandTranslateFromTensorKHR(
+  cmd_b, cmd_q, out, {0, 0, 0}, {0, 0, 0}, {b, m, n},
+  nullptr, nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
 
 // Finalize the command buffer. At this point the OpenCL
 // implementation may reserve enough storage for all the tensor
@@ -571,14 +666,23 @@ clFinalizeCommandBufferKHR(cmd_b);
 // Temporary tensors used in a command buffer can't be read or written
 // into. A hypothetical reason is that the finalized command buffer
 // might not use some of the tensor.
-assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION);
+assert(clEnqueueTranslateFromTensor(..., t0, ...) == CL_INVALID_OPERATION);
 ----
 
 === Open Questions ===
 
 . Should we have support for tensors with undefined shape and tensors
   with unknown / symbolic dimension sizes like in ONNX?
-
++
+--
 // https://onnx.ai/onnx/repo-docs/ShapeInference.html
-
 *UNRESOLVED*
+--
+
+. Should we define OpenCL C language features for accessing tensors?
++
+--
+*RESOLVED*: OpenCL C support for tensors can be introduced later in a
+            separate extension. Built-in kernels may benefit from this
+            extension.
+--
diff --git a/ext/cl_exp_tensor.html b/ext/cl_exp_tensor.html
index c232ddea7..e86b703cd 100644
--- a/ext/cl_exp_tensor.html
+++ b/ext/cl_exp_tensor.html
@@ -478,7 +478,7 @@ <h4 id="_version_history">Version history</h4>
 </thead>
 <tbody>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">2023-10-XX</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2023-11-XX</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">0.1.0</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">First assigned version.</p></td>
 </tr>
@@ -505,12 +505,78 @@ <h4 id="_contributors">Contributors</h4>
 </div>
 <div class="sect2">
 <h3 id="_overview">Overview</h3>
-
+<div class="paragraph">
+<p>The new tensor object enables applications to describe N-dimensional
+arrays whose memory layout is opaque to applications. The goals of
+this extension are to:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>give implementations freedom in placing tensor data so that they
+can improve the performance of the kernels that use it. The
+extension is designed to let implementations determine optimal
+memory layouts for tensors based on how they are used: for
+example, by analyzing kernels’ access patterns or, in the case of
+built-in kernels, by inspecting the tensor arguments they operate
+on.</p>
+</li>
+<li>
+<p>reduce the detail and boilerplate needed to port performant
+applications by making them less dependent on platform- or
+device-specific memory layouts and data arrangements that matter
+for performance. Such specifics may include:</p>
+<div class="ulist">
+<ul>
+<li>
+<p>alignment of data (e.g. for avoiding misaligned memory accesses)</p>
+</li>
+<li>
+<p>arrangement of data required by kernels (column-major vs row-major
+for matrix multiplication, NHWC vs NCHW for neural network
+convolution)</p>
+</li>
+<li>
+<p>arrangement of the data into tiles (or “packing”) for improving
+cache and TLB hits</p>
+</li>
+<li>
+<p>arrangement of data into specific tiles in order to exploit complex
+HW operations such as matrix multiplications (Intel AMX, AMD matrix
+cores).</p>
+</li>
+<li>
+<p>arrangement of data into rows separated by a stride in order to
+avoid bank conflicts in GPUs.</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The tensor data type is expected to be effective with command buffers
+and built-in kernels, including kernels to be provided by the in-progress
+defined built-in kernels extension (cl_khr_defined_builtin_kernels).</p>
+</div>
 </div>
 <div class="sect2">
 <h3 id="_modifications_to_opencl">Modifications to OpenCL</h3>
 <div class="sect3">
-<h4 id="_new_opencl_functions">New OpenCL Functions</h4>
+<h4 id="_new_section_5_x_tensor_objects">New Section: 5.x Tensor Objects</h4>
+<div class="paragraph">
+<p>A tensor object stores an N-dimensional array of elements. The memory
+layout of the tensor is opaque to the application. A newly created
+tensor object has no storage for its elements. Storage is bound to a
+tensor by creating a memory buffer with the CL_MEM_BIND_TO_TENSOR
+property. Tensor objects without storage can be set as kernel
+arguments for kernels that accept them. However, every tensor
+argument of a kernel must have storage bound to it before the kernel
+is enqueued for execution.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functions added to Tensor Objects section</h4>
 <div class="paragraph">
 <p>To create a tensor use:</p>
 </div>
@@ -548,7 +614,7 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </li>
 <li>
 <p><em>dtype</em> is the element type of <em>tensor</em>. Refer to the
-<a href="#TensorDtypes">Tensor element types</a> table for the types.</p>
+<a href="#TensorDtypes">Tensor element types. The API type indicates the corresponding type for copying elements from a host allocation / buffer object to tensor or vice versa.</a> table for the types.</p>
 </li>
 <li>
 <p><em>errcode_ret</em> may return an appropriate error code. If errcode_ret
@@ -593,80 +659,97 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </li>
 </ul>
 </div>
-<table id="TensorDtypes" class="tableblock frame-all grid-all stripes-odd stretch">
-<caption class="title">Table 1. Tensor element types</caption>
+<table id="TensorDtypes" class="tableblock frame-all grid-all stripes-even stretch">
+<caption class="title">Table 1. Tensor element types. The API type indicates the corresponding type for copying elements from a host allocation / buffer object to tensor or vice versa.</caption>
 <colgroup>
 <col style="width: 33.3333%;">
-<col style="width: 66.6667%;">
+<col style="width: 33.3333%;">
+<col style="width: 33.3334%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top"><strong>Tensor element data type</strong></th>
 <th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+<th class="tableblock halign-left valign-top"><strong>API type</strong></th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BOOL</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">1-bit signedless integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uchar. <sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnotedef_1" title="View footnote.">1</a>]</sup></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT8</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">8-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_char.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT16</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">16-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_short.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT32</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">32-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_int.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_INT64</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">64-bit signed integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_long.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT8</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">8-bit unsigned integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uchar.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT16</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">16-bit unsigned integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ushort.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT32</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">32-bit unsigned integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_UINT64</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">64-bit unsigned integer.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_HALF</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Half precision floating-point.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_half.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_BFLOAT16</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">16-bit brain floating-point.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ushort</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_FLOAT</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Single precision floating-point.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_float.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_DOUBLE</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Double precision floating-point.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_double.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMPLEX64</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit complex floating point with
+<td class="tableblock halign-left valign-top"><p class="tableblock">64-bit complex floating-point with
   32-bit real and imaginary part.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_float2</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_TENSOR_COMPLEX128</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">128-bit complex floating point with
+<td class="tableblock halign-left valign-top"><p class="tableblock">128-bit complex floating-point with
   64-bit real and imaginary part.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_double2</p></td>
 </tr>
 </tbody>
 </table>
@@ -850,8 +933,9 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </tbody>
 </table>
 <div class="paragraph">
-<p>The following functions are for reading from a tensor to host memory / buffer object or to write to a
-tensor object from host memory / buffer object.</p>
+<p>The following functions are for reading from a tensor to host memory /
+buffer object or to write to a tensor object from host memory / buffer
+object.</p>
 </div>
 <div class="listingblock">
 <div class="content">
@@ -859,6 +943,10 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
   cl_mem buffer,
   void* host_ptr,
   cl_uint num_events_in_wait_list,
@@ -872,6 +960,10 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
   cl_mem buffer,
   const void* host_ptr,
   cl_uint num_events_in_wait_list,
@@ -894,6 +986,28 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 blocking or non-blocking (see below).</p>
 </li>
 <li>
+<p><em>tensor_origin</em> defines the offset coordinates in <em>tensor</em> for the start of
+the region to read / write tensor data. The length of the array
+must be at least the rank of the <em>tensor</em>.</p>
+</li>
+<li>
+<p><em>mem_origin</em> defines the offset coordinates in the memory region
+pointed to by <em>buffer</em> or <em>host_ptr</em>, expressed in elements of the <em>tensor</em>
+data type. The length of the array must be at least the rank of the
+<em>tensor</em>.</p>
+</li>
+<li>
+<p><em>region</em> defines the region being read or written, expressed in
+elements of the <em>tensor</em> data type. The length of the array must be at
+least the rank of the <em>tensor</em>. If <em>region</em> is NULL then the <em>tensor</em>'s
+shape will be used as the region.</p>
+</li>
+<li>
+<p><em>mem_pitch</em> defines the length of each dimension in elements to be
+used for the memory region of <em>buffer</em> or <em>host_ptr</em>. The length of
+the array must be at least the rank of <em>tensor</em> minus one.</p>
+</li>
+<li>
 <p><em>buffer</em> and <em>host_ptr</em> refer to a valid buffer object / host
 allocation where data is to be read into or to be written from.
 Either the <em>buffer</em> or <em>host_ptr</em> can be non-NULL in which case the
@@ -932,24 +1046,47 @@ <h4 id="_new_opencl_functions">New OpenCL Functions</h4>
 </div>
 <div class="paragraph">
 <p>The elements of buffer object / host allocation are mapped to tensor
-coordinates as follows:</p>
+coordinates and vice versa as follows in pseudo C code:</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre>tensor.element(i0, i1, ..., i&lt;N-2&gt;, i&lt;N-1&gt;) == (tensor.dtype)buffer_or_host_ptr[
-  i0 * tensor.shape[1] * tensor.shape[2] * ... * tensor.shape[N-1] +
-  i1 * tensor.shape[2] * tensor.shape[3] * ... * tensor.shape[N-1] +
+<pre class="highlight"><code class="language-c" data-lang="c">tensor_element(
+  tensor_origin[0] + i[0],
+  tensor_origin[1] + i[1],
+  ...,
+  tensor_origin[N-2] + i[N-2],
+  tensor_origin[N-1] + i[N-1]) ==
+((TENSOR_DATATYPE *)buffer_or_host_ptr)[
+  (mem_origin[0] + i[0]) * pitch(0) +
+  (mem_origin[1] + i[1]) * pitch(1) +
   ... +
-  i&lt;N-2&gt; * tensor.shape[i(N-1)] +
-  i&lt;N-1&gt;]</pre>
+  (mem_origin[N-2] + i[N-2]) * pitch(N-2) +
+  (mem_origin[N-1] + i[N-1])];</code></pre>
 </div>
 </div>
 <div class="paragraph">
-<p>Where <code>iX</code> is a tensor coordinate index with inclusive range of
-<code>0..&lt;shape[X]-1&gt;</code>. The <code>tensor.element()</code> represents an abstract
-function that accesses a tensor element in its storage at given
-coordinate. The method how the coordinates translate to tensor storage
-addresses is unspecified.</p>
+<p>Where the <code>N</code> is tensor rank, the <code>i[X]</code> is a tensor coordinate with
+inclusive range of <code>0..&lt;region[X]-1&gt;</code> and the <code>pitch</code> is computed as
+follows in pseudo C code:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">size_t pitch(size_t dim) {
+  size_t pitch = 1;
+  for (size_t i = dim; i &lt; tensor_rank - 1; i++)
+    pitch *= mem_pitch != NULL ? mem_pitch[i] : region[i + 1];
+  return pitch;
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>For <code>dim</code> in <code>0..(tensor_rank()-1)</code>. The <code>tensor_element()</code> represents
+an abstract function that accesses a tensor element in its storage at
+given coordinate. The method how the coordinates translate to tensor
+storage addresses is unspecified.</p>
+</div>
+<div class="paragraph">
+<p>TODO: add command buffer variants for clEnqueue*Tensor.</p>
 </div>
 </div>
 <div class="sect3">
@@ -960,32 +1097,48 @@ <h4 id="_add_new_buffer_property_in_section_5_2_1">Add New Buffer Property in Se
 <col style="width: 20%;">
 <col style="width: 40%;">
 </colgroup>
-<thead>
+<tbody>
 <tr>
-<th class="tableblock halign-left valign-top">CL_MEM_COMMAND_BUFFER_TEMPORARY</th>
-<th class="tableblock halign-left valign-top">cl_bool</th>
-<th class="tableblock halign-left valign-top">This property can be set if <strong>cl_khr_command_buffer</strong> extension is
-supported.
-
-If the value is true, create a "temporary" buffer object that only can
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_COMMAND_BUFFER_TEMPORARY</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
+<p>This property can be set if <strong>cl_khr_command_buffer</strong> extension is
+supported.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<div class="title">Note</div>
+</td>
+<td class="content">
+This property temporarily lives here and will be moved to
+a separate extension proposal.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>If the value is true, create a "temporary" buffer object that only can
 be used on commands recorded in command buffers. Non-recording
 command enqueue functions must return CL_INVALID_OPERATION if the
-command refers to a temporary buffer object.
-
-The temporary buffer objects are managed by command buffers. When a
+command refers to a temporary buffer object.</p>
+</div>
+<div class="paragraph">
+<p>The temporary buffer objects are managed by command buffers. When a
 temporary buffer object is used by multiple command buffer, the object
-receives disjoint storage for each command buffer.
-
-
-Storage of the temporary buffer objects may be allocated on-demand
+receives disjoint storage for each command buffer.</p>
+</div>
+<div class="paragraph">
+<p>Storage of the temporary buffer objects may be allocated on-demand
 basis. At the times the buffer is not needed, OpenCL implementations
-may reuse storage for other tasks within the command buffer.
-
-Contents of the temporary buffers are not guaranteed to be preserved
-across command buffer executions.</th>
+may reuse storage for other tasks within the command buffer.</p>
+</div>
+<div class="paragraph">
+<p>Contents of the temporary buffers are not guaranteed to be preserved
+across command buffer executions.</p>
+</div></div></td>
 </tr>
-</thead>
-<tbody>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_BIND_TO_TENSOR</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">cl_tensor</p></td>
@@ -1002,10 +1155,11 @@ <h4 id="_add_new_buffer_property_in_section_5_2_1">Add New Buffer Property in Se
 <div class="paragraph">
 <p>Memory layout of the tensor in the created memory buffer is
 implementation-defined and opaque to the applications and it may
-change at unspecified points. Implementation may store auxiliary data
-in the memory buffer for the tensor. Therefore, writing data into the
-memory buffer directly using the cl_mem handle leads to undefined
-behavior.</p>
+change at unspecified points.  Implementation may use non-contiguous
+allocations to store the tensor data and implementation may store
+auxiliary data within the allocations.  Therefore, reading from or
+writing to the memory buffer directly using the cl_mem handle leads to
+undefined behavior.</p>
 </div>
 <div class="paragraph">
 <p>If the tensor is already bound to a buffer object,
@@ -1072,10 +1226,8 @@ <h3 id="_sample_codes">Sample Codes</h3>
   // A hypothetical matmul kernel signature in pseudo OpenCL C for
   // illustrative purposes:
   //
-  //   kernel void matmul(
-  //     global read_only tensor_t,
-  //     global read_only tensor_t,
-  //     global write_only tensor_t);
+  //   kernel void matmul(global read_only tensor_t, global read_only tensor_t,
+  //                      global write_only tensor_t);
 
   cl_kernel matmul_kernel = /* Omitted. */;
   clSetKernelArg(matmul_kernel, 0, sizeof(cl_tensor), &amp;lhs);
@@ -1090,10 +1242,8 @@ <h3 id="_sample_codes">Sample Codes</h3>
   // A hypothetical add kernel signature in pseudo OpenCL C for illustrative
   // purposes:
   //
-  // kernel void add(
-  //     global read_only tensor_t,
-  //     global read_only tensor_t,
-  //     global write_only tensor_t);
+  // kernel void add(global read_only tensor_t, global read_only tensor_t,
+  //                 global write_only tensor_t);
 
   cl_tensor add_kernel = /* Omitted. */;
   clSetKernelArg(add_kernel, 0, sizeof(cl_tensor), &amp;lhs);
@@ -1110,6 +1260,7 @@ <h3 id="_sample_codes">Sample Codes</h3>
 <div class="content">
 <pre class="highlight"><code class="language-c" data-lang="c">constexpr size_t b = 64, m = 100, n = 200, k = 50;
 
+cl_int err;
 cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, err);
 cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, err);
 cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, err);
@@ -1119,11 +1270,11 @@ <h3 id="_sample_codes">Sample Codes</h3>
 cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
 cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);
 
-// Allocate storage for the tensors. The buffer size must be set to zero
-// when the buffer is bound to a tensor. OpenCL implementation may
-// determine optimal data layout and the storage needed for it, based
-// on the tensor's uses (matmul kernel in this sample) so far.
-cl_int err;
+// Allocate storage for the tensors. The buffer size must be set to
+// zero when the buffer is bound to a tensor. OpenCL implementation
+// may determine optimal data layout and the storage needed for it,
+// based on the tensor's uses (the 'matmul' and 'add' kernels in this
+// sample) so far.
 cl_mem in0_mem = clCreateBufferWithProperties(
   ctx, {CL_MEM_BIND_TO_TENSOR, in0, 0}, CL_MEM_READ_ONLY,
   0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, nullptr, &amp;err);
@@ -1146,20 +1297,19 @@ <h3 id="_sample_codes">Sample Codes</h3>
 
 // Copies data into in0 tensor while possibly rearranging the data to the
 // optimal data layout.
-clEnqueueWriteTensor(
-  cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr, in0_data.data(),
-  0, nullptr, nullptr);
-
-clEnqueueWriteTensor(
-  cmd_q, in1, false, nullptr, nullptr, {b, k, n}, nullptr, in1_data.data(),
-  0, nullptr, nullptr);
+clEnqueueTranslateToTensor(
+  cmd_q, in0, false, {0, 0, 0}, {0, 0, 0}, {b, m, k},
+  nullptr, nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
+clEnqueueTranslateToTensor(
+  cmd_q, in1, false, {0, 0, 0}, {0, 0, 0}, {b, k, n},
+  nullptr, nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
-  cmd_q, matmul_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+  cmd_q, matmul_kernel, 3, matmul_grid, nullptr, nullptr, 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
-  cmd_q, add_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
-clEnqueueReadTensor(
-  cmd_q, out, false, nullptr, nullptr, {b, m, n}, nullptr, out_data.data(),
-  0, nullptr, nullptr);</code></pre>
+  cmd_q, add_kernel, 3, add_grid, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueTranslateFromTensor(
+  cmd_q, out, false,  {0, 0, 0}, {0, 0, 0}, {b, m, n},
+  nullptr, nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);</code></pre>
 </div>
 </div>
 <div class="paragraph">
@@ -1210,21 +1360,21 @@ <h3 id="_sample_codes">Sample Codes</h3>
   clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &amp;err);
 
 cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
-clCommandWriteTensorKHR(
-  cmd_b, cmd_q, in0, false, nullptr, nullptr, {b, m, k}, nullptr,
-  in0_data.data(), 0, nullptr, &amp;in0_syncp);
-clCommandWriteTensorKHR(
-  cmd_b, cmd_q, in1, false, nullptr, nullptr, {b, k, m}, nullptr,
-  in1_data.data(), 0, nullptr, &amp;in1_syncp);
+clCommandTranslateToTensorKHR(
+  cmd_b, cmd_q, in0, {0, 0, 0}, {0, 0, 0}, {b, m, k},
+  nullptr, nullptr, nullptr, in0_data.data(), 0, nullptr, &amp;in0_syncp);
+clCommandTranslateToTensorKHR(
+  cmd_b, cmd_q, in1, {0, 0, 0}, {0, 0, 0}, {b, k, m},
+  nullptr, nullptr, nullptr, in1_data.data(), 0, nullptr, &amp;in1_syncp);
 clCommandNDRangeKernelKHR(
-  cmd_b, cmd_q, nullptr, matmul_kernel, 0, nullptr, nullptr, nullptr,
+  cmd_b, cmd_q, nullptr, matmul_kernel, 3, matmul_grid, nullptr, nullptr,
   2, {in0_syncp, in2_syncp}, &amp;matmul_syncp, nullptr);
 clCommandNDRangeKernelKHR(
-  cmd_b, cmd_q, nullptr, add_kernel, 0, nullptr, nullptr, nullptr,
+  cmd_b, cmd_q, nullptr, add_kernel, 3, add_grid, nullptr, nullptr,
   1, {matmul_syncp}, &amp;add_syncp, nullptr);
-clCommandReadTensorKHR(
-  cmd_b, cmd_q, out,  false, nullptr, nullptr, {b, k, m}, nullptr,
-  out_data.data(), 1, {add_syncp}, nullptr);
+clCommandTranslateFromTensorKHR(
+  cmd_b, cmd_q, out, {0, 0, 0}, {0, 0, 0}, {b, k, m},
+  nullptr, nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
 
 // Finalize the command buffer. At this point the OpenCL
 // implementation may reserve enough storage for all the tensor
@@ -1236,7 +1386,7 @@ <h3 id="_sample_codes">Sample Codes</h3>
 // Temporary tensors used in a command buffer can't be read or written
 // into. A hypothetical reason is that the finalized command buffer
 // might not use some of the tensor.
-assert(clEnqueueReadTensor(..., t0, ...) == CL_INVALID_OPERATION);</code></pre>
+assert(clEnqueueTranslateFromTensor(..., t0, ...) == CL_INVALID_OPERATION);</code></pre>
 </div>
 </div>
 </div>
@@ -1247,19 +1397,41 @@ <h3 id="_open_questions">Open Questions</h3>
 <li>
 <p>Should we have support for tensors with undefined shape and tensors
 with unknown / symbolic dimension sizes like in ONNX?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>UNRESOLVED</strong></p>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p>Should we define OpenCL C language features for accessing tensors?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: OpenCL C support for tensors can be introduced later in a
+            separate extension. Built-in kernels may benefit from this
+            extension.</p>
+</div>
+</div>
+</div>
 </li>
 </ol>
 </div>
-<div class="paragraph">
-<p><strong>UNRESOLVED</strong></p>
 </div>
 </div>
 </div>
 </div>
+<div id="footnotes">
+<hr>
+<div class="footnote" id="_footnotedef_1">
+<a href="#_footnoteref_1">1</a>. Only the least significant bit is considered when writing data to a tensor. When reading data from a tensor, the boolean value will be written as 0 or 1. The boolean values in the tensor may be packed densely.
+</div>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2023-11-02 14:25:56 +0200
+Last updated 2023-11-15 11:19:22 +0200
 </div>
 </div>
 </body>

From 363c69b52e01d83c735ea89f045ec9cff6e812be Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <linehill@users.noreply.github.com>
Date: Wed, 15 Nov 2023 14:00:42 +0200
Subject: [PATCH 16/18] Apply suggestions from code review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Pekka Jääskeläinen <pekka.jaaskelainen@tuni.fi>
---
 ext/cl_exp_tensor.asciidoc | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/ext/cl_exp_tensor.asciidoc b/ext/cl_exp_tensor.asciidoc
index b7fd8429c..5cd8feda1 100644
--- a/ext/cl_exp_tensor.asciidoc
+++ b/ext/cl_exp_tensor.asciidoc
@@ -44,18 +44,18 @@ Ben Ashbaugh, Intel. +
 === Overview
 
 The new tensor object enables applications to describe N-dimensional
-arrays whose memory layout is abstract to applications. The goal and
-intent of this extension is to give leverage for:
+arrays whose memory layout is opaque to applications. The goals
+of this extension are the following:
 
-* implementations to have freedom of placement data of the tensors for
+* Give implementations freedom in the placement of tensor data for
   improving performance of the kernels which use them. This extension
-  should be designed so it allows implementations to determine optimal
+  is designed such that it allows implementations to determine optimal
   memory layouts for the tensors based on their use cases for
-  increasing performance - for example, by analyzing kernels’ access
-  patterns - or, in case of built-in kernels, by inspecting tensor
+  increased performance, by means of, for example, analyzing kernels’ access
+  patterns or, in case of built-in kernels, by inspecting the tensor
   arguments they operate on.
 
-* reduce details and boilerplate needed for porting performant
+* Reduce the details and boilerplate needed for performance-portable implementations of
   applications by being less dependent on platform or device specifics
   on the memory layout / data arrangements which matters for
   performance. Such specifics may include:
@@ -74,23 +74,23 @@ intent of this extension is to give leverage for:
    cores).
 
 ** arrangement of data into rows separated by a stride in order to
-   avoid back conflicts in GPUs.
+   avoid bank conflicts in GPUs.
 
-The tensor data type is deemed to be effective with command buffers
-and built-in kernels - including kernels to be provided by defined
-built-in kernel (cl_khr_defined_builtin_kernels) extension under work.
+The tensor data type is designed to be used efficiently together with command buffers (cl_khr_command_buffer)
+and built-in kernels, including kernels to be provided by the Defined
+Built-in Kernels (cl_khr_defined_builtin_kernels) extension that is being prepared together with this extension.
 
 === Modifications to OpenCL
 
 ==== New Section: 5.x Tensor Objects
 
-A tensor object stores a N-dimensional array of elements. The memory
+A tensor object stores an N-dimensional array of elements. The memory
 layout of the tensor is opaque to the application. When a tensor
-object is created it initially does not have storage where the
-elements of the tensor are stored into. A storage is bind to a tensor
+object is created it is initially not associated with any storage for the tensor elements.
+Storage is bound to a tensor
 by creating a memory buffer with CL_MEM_BIND_TO_BUFFER. Tensor objects
 without storage can be set as kernel arguments for kernels which
-accepts them. Kernels which have tensor arguments must have a storage
+accept them. Kernels which have tensor arguments must have storage
 assigned to them prior enqueuing the kernels for execution.
 
 ==== New OpenCL Functions added to Tensor Objects section
@@ -684,5 +684,5 @@ assert(clEnqueueTranslateFromTensor(..., t0, ...) == CL_INVALID_OPERATION);
 --
 *RESOLVED*: OpenCL C support for tensors can be introduced later in a
             separate extension. Built-in kernels may benefit from this
-            extension.
+            extension as it is.
 --

From 6402f59a047a4714787b3c2e57328cc8c382cf39 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Thu, 16 Nov 2023 14:23:40 +0200
Subject: [PATCH 17/18] * Add command buffer counterparts for tensor
 translation commands

* Add error codes for tensor translation commands.

* Tweaked mem_pitch semantics.
---
 ext/cl_exp_tensor.asciidoc | 141 ++++++++++++++++++++++--
 ext/cl_exp_tensor.html     | 213 ++++++++++++++++++++++++++++++++-----
 2 files changed, 319 insertions(+), 35 deletions(-)
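
Reviewer note (not part of the patch): a minimal sketch in C of how the
tweaked mem_pitch semantics could map a region-relative coordinate to a
linear element index in the buffer / host allocation. It assumes that a
NULL mem_pitch array, or a zero mem_pitch[i], falls back to region[i + 1],
as the updated parameter description states. The helper names
tensor_mem_pitch and tensor_mem_index are hypothetical and do not appear
in the extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements covered by one step along
 * dimension `dim` of the linear memory region. A NULL mem_pitch, or a
 * zero entry, falls back to region[i + 1] (assumed semantics). */
static size_t tensor_mem_pitch(size_t dim, size_t rank,
                               const size_t *mem_pitch,
                               const size_t *region) {
  assert(rank > 0 && dim < rank);
  size_t pitch = 1;
  for (size_t i = dim; i + 1 < rank; i++)
    pitch *= (mem_pitch != NULL && mem_pitch[i] != 0) ? mem_pitch[i]
                                                       : region[i + 1];
  return pitch;
}

/* Hypothetical helper: linear element index, in buffer / host memory,
 * of the region-relative coordinate `coord`. */
static size_t tensor_mem_index(size_t rank, const size_t *mem_origin,
                               const size_t *coord,
                               const size_t *mem_pitch,
                               const size_t *region) {
  size_t index = 0;
  for (size_t d = 0; d < rank; d++)
    index += (mem_origin[d] + coord[d]) *
             tensor_mem_pitch(d, rank, mem_pitch, region);
  return index;
}
```

For a rank-3 region of {2, 3, 4} with a NULL mem_pitch, the pitches are
12, 4 and 1, so the memory region is treated as densely packed. Passing
mem_pitch = {0, 8} pads only the innermost dimension to 8 elements.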

diff --git a/ext/cl_exp_tensor.asciidoc b/ext/cl_exp_tensor.asciidoc
index 5cd8feda1..ddf9cf03a 100644
--- a/ext/cl_exp_tensor.asciidoc
+++ b/ext/cl_exp_tensor.asciidoc
@@ -347,7 +347,9 @@ cl_int clEnqueueTranslateToTensor(
 
 * _mem_pitch_ defines the length of each dimension in elements to be
   used for the memory region of _buffer_ or _host_ptr_. The length of
-  the array must be at least the rank of _tensor_ minus one.
+  the array must be at least the rank of _tensor_ minus one. If
+  _mem_pitch_ is NULL or _mem_pitch_[i] is zero, _mem_pitch_[i] is
+  computed as _region_[i + 1].
 
 * _buffer_ and _host_ptr_ refer to a valid buffer object / host
   allocation where data is to be read into or to be written from.
@@ -408,7 +410,8 @@ follows in pseudo C code:
 size_t pitch(size_t dim) {
   size_t pitch = 1;
   for (size_t i = dim; i < tensor_rank - 1; i++)
-    pitch *= mem_pitch != NULL ? mem_pitch[i] : region[i + 1];
+    pitch *=
+      (mem_pitch != NULL && mem_pitch[i] != 0) ? mem_pitch[i] : region[i + 1];
   return pitch;
 }
 ----
@@ -418,11 +421,131 @@ an abstract function that accesses a tensor element in its storage at
 given coordinate. The method how the coordinates translate to tensor
 storage addresses is unspecified.
 
+*clEnqueueTranslateFromTensor* and *clEnqueueTranslateToTensor*
+return CL_SUCCESS if the function is executed
+successfully. Otherwise, they return one of the following errors:
+
+* CL_INVALID_COMMAND_QUEUE if _command_queue_ is not a valid host
+  command-queue.
+
+* CL_INVALID_CONTEXT if the context associated with _command_queue_
+  and _buffer_ are not the same or if the context associated with
+  _command_queue_ and events in _event_wait_list_ are not the same.
+
+* CL_INVALID_MEM_OBJECT if _buffer_ is not a valid buffer object.
+
+* CL_INVALID_VALUE if _tensor_origin_ or _mem_origin_ is NULL.
+
+* CL_INVALID_VALUE if the region being read or written specified by
+  (_mem_origin_, _region_, _mem_pitch_) is out of bounds.
+
+* CL_INVALID_VALUE if any _region_ array element is 0.
+
+* CL_INVALID_VALUE if _mem_pitch_ is not NULL and _mem_pitch_[i] is
+  not 0 and _mem_pitch_[i] is less than _region_[i + 1].
+
+* CL_INVALID_VALUE if _buffer_ and _host_ptr_ are both NULL or both non-NULL.
+
+* CL_INVALID_EVENT_WAIT_LIST if _event_wait_list_ is NULL and
+  _num_events_in_wait_list_ > 0, or _event_wait_list_ is not NULL and
+  _num_events_in_wait_list_ is 0, or if event objects in
+  _event_wait_list_ are not valid events.
+
+* CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
+  operations are blocking and the execution status of any of the
+  events in _event_wait_list_ is a negative integer value.
+
+* CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+  memory for the data store associated with the memory object the _tensor_ is
+  bound to.
+
+* CL_OUT_OF_RESOURCES if there is a failure to allocate resources
+  required by the OpenCL implementation on the device.
+
+* CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+  required by the OpenCL implementation on the host.
+
 // TODO: add clEnqueueCopyTensor
 
 // TODO: add clEnqueueFillTensor?
 
-TODO: add command buffer variants for clEnqueue*Tensor.
+If *cl_khr_command_buffer* is supported, then the following command
+buffer counterparts of the *clEnqueueTranslateFromTensor* and
+*clEnqueueTranslateToTensor* commands are available.
+
+[source,c]
+----
+cl_int clCommandTranslateFromTensorKHR(
+  cl_command_buffer_khr command_buffer,
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
+  cl_mem buffer,
+  void* host_ptr,
+  cl_uint num_sync_points_in_wait_list,
+  const cl_sync_point_khr* sync_point_wait_list,
+  cl_sync_point_khr* sync_point,
+  cl_mutable_command_khr* mutable_handle);
+----
+
+[source,c]
+----
+cl_int clCommandTranslateToTensorKHR(
+  cl_command_buffer_khr command_buffer,
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
+  cl_mem buffer,
+  const void* host_ptr,
+  cl_uint num_sync_points_in_wait_list,
+  const cl_sync_point_khr* sync_point_wait_list,
+  cl_sync_point_khr* sync_point,
+  cl_mutable_command_khr* mutable_handle);
+----
+
+* _command_buffer_ refers to a valid command-buffer object.
+
+* For the _command_queue_, _tensor_, _tensor_origin_, _mem_origin_,
+  _region_, _mem_pitch_, _buffer_ and _host_ptr_ parameters, refer to
+  *clEnqueueTranslateFromTensor*.
+
+* For the _num_sync_points_in_wait_list_, _sync_point_wait_list_,
+  _sync_point_ and _mutable_handle_ parameters, refer to
+  *clCommandCopyBufferKHR*.
+
+*clCommandTranslateFromTensorKHR* and *clCommandTranslateToTensorKHR*
+return CL_SUCCESS if the function is executed
+successfully. Otherwise, they return one of the following errors:
+
+* CL_INVALID_COMMAND_QUEUE if _command_queue_ is not NULL.
+
+* CL_INVALID_COMMAND_BUFFER_KHR if _command_buffer_ is not a valid
+  command-buffer.
+
+* CL_INVALID_CONTEXT if the context associated with _command_queue_
+  and _command_buffer_ is not the same.
+
+* CL_INVALID_OPERATION if _command_buffer_ has been finalized.
+
+* CL_INVALID_VALUE if _mutable_handle_ is not NULL.
+
+* CL_INVALID_SYNC_POINT_WAIT_LIST_KHR if _sync_point_wait_list_ is
+  NULL and _num_sync_points_in_wait_list_ is > 0, or
+  _sync_point_wait_list_ is not NULL and _num_sync_points_in_wait_list_ is
+  0, or if synchronization-point objects in _sync_point_wait_list_ are
+  not valid synchronization-points.
+
+* CL_OUT_OF_RESOURCES if there is a failure to allocate resources
+  required by the OpenCL implementation on the device.
+
+* CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+  required by the OpenCL implementation on the host.
 
 ==== Add New Buffer Property in Section 5.2.1
 
@@ -580,17 +703,17 @@ std::vector<float> out_data(b * m * n);
 // optimal data layout.
 clEnqueueTranslateToTensor(
   cmd_q, in0, false, {0, 0, 0}, {0, 0, 0}, {b, m, k},
-  nullptr, nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
+  nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
 clEnqueueTranslateToTensor(
   cmd_q, in1, false, {0, 0, 0}, {0, 0, 0}, {b, k, n},
-  nullptr, nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
+  nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, matmul_kernel, 3, matmul_grid, nullptr, nullptr, 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, add_kernel, 3, add_grid, nullptr, nullptr, 0, nullptr, nullptr);
 clEnqueueTranslateFromTensor(
   cmd_q, out, false,  {0, 0, 0}, {0, 0, 0}, {b, m, n},
-  nullptr, nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);
+  nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);
 ----
 
 An example use of tensors in a command buffer when cl_khr_command_buffer
@@ -642,10 +765,10 @@ cl_command_buffer_khr cb =
 cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
 clCommandTranslateToTensorKHR(
   cmd_b, cmd_q, in0, {0, 0, 0}, {0, 0, 0}, {b, m, k},
-  nullptr, nullptr, nullptr, in0_data.data(), 0, nullptr, &in0_syncp);
+  nullptr, nullptr, in0_data.data(), 0, nullptr, &in0_syncp);
 clCommandTranslateToTensorKHR(
   cmd_b, cmd_q, in1, {0, 0, 0}, {0, 0, 0}, {b, k, m},
-  nullptr, nullptr, nullptr, in1_data.data(), 0, nullptr, &in1_syncp);
+  nullptr, nullptr, in1_data.data(), 0, nullptr, &in1_syncp);
 clCommandNDRangeKernelKHR(
   cmd_b, cmd_q, nullptr, matmul_kernel, 3, matmul_grid, nullptr, nullptr,
   2, {in0_syncp, in2_syncp}, &matmul_syncp, nullptr);
@@ -654,7 +777,7 @@ clCommandNDRangeKernelKHR(
   1, {matmul_syncp}, &add_syncp, nullptr);
 clCommandTranslateFromTensorKHR(
   cmd_b, cmd_q, out, {0, 0, 0}, {0, 0, 0}, {b, k, m},
-  nullptr, nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
+  nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
 
 // Finalize the command buffer. At this point the OpenCL
 // implementation may reserve enough storage for all the tensor
diff --git a/ext/cl_exp_tensor.html b/ext/cl_exp_tensor.html
index e86b703cd..7303a3729 100644
--- a/ext/cl_exp_tensor.html
+++ b/ext/cl_exp_tensor.html
@@ -507,22 +507,22 @@ <h4 id="_contributors">Contributors</h4>
 <h3 id="_overview">Overview</h3>
 <div class="paragraph">
 <p>The new tensor object enables applications to describe N-dimensional
-arrays whose memory layout is abstract to applications. The goal and
-intent of this extension is to give leverage for:</p>
+arrays whose memory layout is opaque to applications. The goals
+of this extension are the following:</p>
 </div>
 <div class="ulist">
 <ul>
 <li>
-<p>implementations to have freedom of placement data of the tensors for
+<p>Give implementations freedom in the placement of tensor data for
 improving performance of the kernels which use them. This extension
-should be designed so it allows implementations to determine optimal
+is designed so that it allows implementations to determine optimal
 memory layouts for the tensors based on their use cases for
-increasing performance - for example, by analyzing kernels’ access
-patterns - or, in case of built-in kernels, by inspecting tensor
+increased performance, by means of, for example, analyzing kernels’ access
+patterns or, in case of built-in kernels, by inspecting the tensor
 arguments they operate on.</p>
 </li>
 <li>
-<p>reduce details and boilerplate needed for porting performant
+<p>Reduce details and boilerplate needed for performance-portable implementation of
 applications by being less dependent on platform or device specifics
 on the memory layout / data arrangements which matters for
 performance. Such specifics may include:</p>
@@ -547,7 +547,7 @@ <h3 id="_overview">Overview</h3>
 </li>
 <li>
 <p>arrangement of data into rows separated by a stride in order to
-avoid back conflicts in GPUs.</p>
+avoid bank conflicts in GPUs.</p>
 </li>
 </ul>
 </div>
@@ -555,9 +555,9 @@ <h3 id="_overview">Overview</h3>
 </ul>
 </div>
 <div class="paragraph">
-<p>The tensor data type is deemed to be effective with command buffers
-and built-in kernels - including kernels to be provided by defined
-built-in kernel (cl_khr_defined_builtin_kernels) extension under work.</p>
+<p>The tensor data type is designed to be efficiently used together with command buffers (cl_khr_command_buffer)
+and built-in kernels, including kernels to be provided by the Defined
+Built-in Kernels (cl_khr_defined_builtin_kernels) extension that is being prepared together with this extension.</p>
 </div>
 </div>
 <div class="sect2">
@@ -565,13 +565,13 @@ <h3 id="_modifications_to_opencl">Modifications to OpenCL</h3>
 <div class="sect3">
 <h4 id="_new_section_5_x_tensor_objects">New Section: 5.x Tensor Objects</h4>
 <div class="paragraph">
-<p>A tensor object stores a N-dimensional array of elements. The memory
+<p>A tensor object stores an N-dimensional array of elements. The memory
 layout of the tensor is opaque to the application. When a tensor
-object is created it initially does not have storage where the
-elements of the tensor are stored into. A storage is bind to a tensor
+object is created it is initially not associated with any storage for the tensor elements.
+Storage is bound to a tensor
 by creating a memory buffer with CL_MEM_BIND_TO_BUFFER. Tensor objects
 without storage can be set as kernel arguments for kernels which
-accepts them. Kernels which have tensor arguments must have a storage
+accepts them. Kernels which have tensor arguments must have storage
 assigned to them prior enqueuing the kernels for execution.</p>
 </div>
 </div>
@@ -1005,7 +1005,9 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 <li>
 <p><em>mem_pitch</em> defines the length of each dimension in elements to be
 used for the memory region of <em>buffer</em> or <em>host_ptr</em>. The length of
-the array must be at least the rank of <em>tensor</em> minus one.</p>
+the array must be at least the rank of <em>tensor</em> minus one. If
+<em>mem_pitch</em> is NULL or <em>mem_pitch</em>[i] is zero, <em>mem_pitch</em>[i] is
+computed as <em>region</em>[i + 1].</p>
 </li>
 <li>
 <p><em>buffer</em> and <em>host_ptr</em> refer to a valid buffer object / host
@@ -1074,7 +1076,8 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 <pre class="highlight"><code class="language-c" data-lang="c">size_t pitch(size_t dim) {
   size_t pitch = 1;
   for (size_t i = dim; i &lt; tensor_rank - 1; i++)
-    pitch *= mem_pitch != NULL ? mem_pitch[i] : region[i + 1];
+    pitch *=
+      (mem_pitch != NULL &amp;&amp; mem_pitch[i] != 0) ? mem_pitch[i] : region[i + 1];
   return pitch;
 }</code></pre>
 </div>
@@ -1086,7 +1089,165 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 storage addresses is unspecified.</p>
 </div>
 <div class="paragraph">
-<p>TODO: add command buffer variants for clEnqueue*Tensor.</p>
+<p><strong>clEnqueueTranslateFsomTensor</strong> and <strong>clEnqueueTranslateToTensor</strong>
+returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>
+and buffer are not the same or if the context associated with
+<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
+</li>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>tensor_origin</em> or <em>mem_origin</em> is NULL.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if the region being read or written specified by
+(<em>mem_origin</em>, <em>region</em>, <em>mem_pitch</em>) is out of bounds.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if any <em>region</em> array element is 0.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>mem_pitch</em> is not NULL and <em>mem_pitch</em>[i] is
+not 0 and <em>mem_pitch</em>[i] is less than <em>region</em>[i].</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>buffer</em> and <em>host_ptr</em> both are NULL or non-NULL.</p>
+</li>
+<li>
+<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is NULL and
+<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not NULL and
+<em>num_events_in_wait_list</em> is 0, or if event objects in
+<em>event_wait_list</em> are not valid events.</p>
+</li>
+<li>
+<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
+operations are blocking and the execution status of any of the
+events in <em>event_wait_list</em> is a negative integer value.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for data store associated with memory object the <em>tensor</em> is
+bound to.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources
+required by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>If <strong>cl_khr_command_buffer</strong> is is supported, then the following command
+buffer counterparts of the <strong>clEnqueueTranslateFromTensor</strong> and
+<strong>clEnqueueTranslateToTensor</strong> commands are available.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clCommandTranslateFromTensorKHR(
+  cl_command_buffer_khr command_buffer,
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
+  cl_mem buffer,
+  void* host_ptr,
+  cl_uint num_sync_points_in_wait_list,
+  const cl_sync_point_khr* sync_point_wait_list,
+  cl_sync_point_khr* sync_point,
+  cl_mutable_command_khr* mutable_handle);</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clCommandTranslateToTensorKHR(
+  cl_command_buffer_khr command_buffer,
+  cl_command_queue command_queue,
+  cl_tensor tensor,
+  const size_t* tensor_origin,
+  const size_t* mem_origin,
+  const size_t* region,
+  const size_t* mem_pitch,
+  cl_mem buffer,
+  const void* host_ptr,
+  cl_uint num_sync_points_in_wait_list,
+  const cl_sync_point_khr* sync_point_wait_list,
+  cl_sync_point_khr* sync_point,
+  cl_mutable_command_khr* mutable_handle);</code></pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>command_buffer</em> refers to a valid command-buffer object.</p>
+</li>
+<li>
+<p>For <em>command_queue</em>, <em>tensor</em>, <em>tensor_origin</em>, <em>mem_origin</em>,
+<em>region</em>, <em>mem_pitch</em>, <em>buffer</em> and <em>host_ptr</em> parameters refer to
+<strong>clEnqueueTranslateFromTensor</strong>.</p>
+</li>
+<li>
+<p>For <em>num_sync_points_in_wait_list</em>, <em>sync_point_wait_list</em>,
+<em>sync_point</em>, <em>mutable_handle</em> parameters refer to
+<strong>clCommandCopyBufferKHR</strong>.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p><strong>clCommandTranslateFromTensorKHR</strong> and <strong>clCommandTranslateFromTensorKHR</strong>
+returns CL_SUCCESS if the function is executed
+successfully. Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not NULL.</p>
+</li>
+<li>
+<p>CL_INVALID_COMMAND_BUFFER_KHR if <em>command_buffer</em> is not a valid
+command-buffer.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>
+and <em>command_buffer</em> is not the same.</p>
+</li>
+<li>
+<p>CL_INVALID_OPERATION if <em>command_buffer</em> has been finalized.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>mutable_handle</em> is not NULL.</p>
+</li>
+<li>
+<p>CL_INVALID_SYNC_POINT_WAIT_LIST_KHR if <em>sync_point_wait_list</em> is
+NULL and <em>num_sync_points_in_wait_list</em> is &gt; 0, or
+<em>sync_point_wait_list</em> is not NULL and <em>num_sync_points_in_wait_list</em> is
+0, or if synchronization-point objects in <em>sync_point_wait_list</em> are
+not valid synchronization-points.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources
+required by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
 </div>
 </div>
 <div class="sect3">
@@ -1299,17 +1460,17 @@ <h3 id="_sample_codes">Sample Codes</h3>
 // optimal data layout.
 clEnqueueTranslateToTensor(
   cmd_q, in0, false, {0, 0, 0}, {0, 0, 0}, {b, m, k},
-  nullptr, nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
+  nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
 clEnqueueTranslateToTensor(
   cmd_q, in1, false, {0, 0, 0}, {0, 0, 0}, {b, k, n},
-  nullptr, nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
+  nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, matmul_kernel, 3, matmul_grid, nullptr, nullptr, 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, add_kernel, 3, add_grid, nullptr, nullptr, 0, nullptr, nullptr);
 clEnqueueTranslateFromTensor(
   cmd_q, out, false,  {0, 0, 0}, {0, 0, 0}, {b, m, n},
-  nullptr, nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);</code></pre>
+  nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);</code></pre>
 </div>
 </div>
 <div class="paragraph">
@@ -1362,10 +1523,10 @@ <h3 id="_sample_codes">Sample Codes</h3>
 cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
 clCommandTranslateToTensorKHR(
   cmd_b, cmd_q, in0, {0, 0, 0}, {0, 0, 0}, {b, m, k},
-  nullptr, nullptr, nullptr, in0_data.data(), 0, nullptr, &amp;in0_syncp);
+  nullptr, nullptr, in0_data.data(), 0, nullptr, &amp;in0_syncp);
 clCommandTranslateToTensorKHR(
   cmd_b, cmd_q, in1, {0, 0, 0}, {0, 0, 0}, {b, k, m},
-  nullptr, nullptr, nullptr, in1_data.data(), 0, nullptr, &amp;in1_syncp);
+  nullptr, nullptr, in1_data.data(), 0, nullptr, &amp;in1_syncp);
 clCommandNDRangeKernelKHR(
   cmd_b, cmd_q, nullptr, matmul_kernel, 3, matmul_grid, nullptr, nullptr,
   2, {in0_syncp, in2_syncp}, &amp;matmul_syncp, nullptr);
@@ -1374,7 +1535,7 @@ <h3 id="_sample_codes">Sample Codes</h3>
   1, {matmul_syncp}, &amp;add_syncp, nullptr);
 clCommandTranslateFromTensorKHR(
   cmd_b, cmd_q, out, {0, 0, 0}, {0, 0, 0}, {b, k, m},
-  nullptr, nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
+  nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
 
 // Finalize the command buffer. At this point the OpenCL
 // implementation may reserve enough storage for all the tensor
@@ -1412,7 +1573,7 @@ <h3 id="_open_questions">Open Questions</h3>
 <div class="paragraph">
 <p><strong>RESOLVED</strong>: OpenCL C support for tensors can be introduced later in a
             separate extension. Built-in kernels may benefit from this
-            extension.</p>
+            extension as it is.</p>
 </div>
 </div>
 </div>
@@ -1431,7 +1592,7 @@ <h3 id="_open_questions">Open Questions</h3>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2023-11-15 11:19:22 +0200
+Last updated 2023-11-16 17:25:21 +0200
 </div>
 </div>
 </body>

From 57fda1a6ccdd2ebe186f974d3edda2d6b38d159f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.mikael.linjamaki@intel.com>
Date: Fri, 17 Nov 2023 12:21:20 +0200
Subject: [PATCH 18/18] * translate -> import/export

* Fix typos.
---
 ext/cl_exp_tensor.asciidoc | 38 ++++++++++++++++++------------------
 ext/cl_exp_tensor.html     | 40 +++++++++++++++++++-------------------
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/ext/cl_exp_tensor.asciidoc b/ext/cl_exp_tensor.asciidoc
index ddf9cf03a..5f8ac60b3 100644
--- a/ext/cl_exp_tensor.asciidoc
+++ b/ext/cl_exp_tensor.asciidoc
@@ -290,7 +290,7 @@ object.
 
 [source,c]
 ----
-cl_int clEnqueueTranslateFromTensor(
+cl_int clEnqueueImportFromTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
@@ -307,7 +307,7 @@ cl_int clEnqueueTranslateFromTensor(
 
 [source,c]
 ----
-cl_int clEnqueueTranslateToTensor(
+cl_int clEnqueueExportToTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
@@ -376,10 +376,10 @@ cl_int clEnqueueTranslateToTensor(
   complete. If _event_wait_list_ and _event_ are not NULL, _event_
   must not refer to an element of the _event_wait_list_ array.
 
-The *clEnqueueTranslateToTensor* function copies contents of the buffer
+The *clEnqueueExportToTensor* function copies contents of the buffer
 object / host allocation to tensor's storage in
 implementation-defined, opaque memory layout. The
-*clEnqueueTranslateFromTensor* function copies data from tensor's
+*clEnqueueImportFromTensor* function copies data from tensor's
 storage to buffer object / host allocation.
 
 The elements of buffer object / host allocation are mapped to tensor
@@ -421,7 +421,7 @@ an abstract function that accesses a tensor element in its storage at
 given coordinate. The method how the coordinates translate to tensor
 storage addresses is unspecified.
 
-*clEnqueueTranslateFsomTensor* and *clEnqueueTranslateToTensor*
+*clEnqueueImportFromTensor* and *clEnqueueExportToTensor*
 returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:
 
@@ -469,13 +469,13 @@ successfully. Otherwise, it returns one of the following errors:
 
 // TODO: add clEnqueueFillTensor?
 
-If *cl_khr_command_buffer* is is supported, then the following command
-buffer counterparts of the *clEnqueueTranslateFromTensor* and
-*clEnqueueTranslateToTensor* commands are available.
+If *cl_khr_command_buffer* is supported, then the following command
+buffer counterparts of the *clEnqueueImportFromTensor* and
+*clEnqueueExportToTensor* commands are available.
 
 [source,c]
 ----
-cl_int clCommandTranslateFromTensorKHR(
+cl_int clCommandImportFromTensorKHR(
   cl_command_buffer_khr command_buffer,
   cl_command_queue command_queue,
   cl_tensor tensor,
@@ -493,7 +493,7 @@ cl_int clCommandTranslateFromTensorKHR(
 
 [source,c]
 ----
-cl_int clCommandTranslateToTensorKHR(
+cl_int clCommandExportToTensorKHR(
   cl_command_buffer_khr command_buffer,
   cl_command_queue command_queue,
   cl_tensor tensor,
@@ -513,13 +513,13 @@ cl_int clCommandTranslateToTensorKHR(
 
 * For _command_queue_, _tensor_, _tensor_origin_, _mem_origin_,
   _region_, _mem_pitch_, _buffer_ and _host_ptr_ parameters refer to
-  *clEnqueueTranslateFromTensor*.
+  *clEnqueueImportFromTensor*.
 
 * For _num_sync_points_in_wait_list_, _sync_point_wait_list_,
   _sync_point_, _mutable_handle_ parameters refer to
   *clCommandCopyBufferKHR*.
 
-*clCommandTranslateFromTensorKHR* and *clCommandTranslateFromTensorKHR*
+*clCommandImportFromTensorKHR* and *clCommandExportToTensorKHR*
 returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:
 
@@ -701,17 +701,17 @@ std::vector<float> out_data(b * m * n);
 
 // Copies data into in0 tensor while possibly rearranging the data to the
 // optimal data layout.
-clEnqueueTranslateToTensor(
+clEnqueueExportToTensor(
   cmd_q, in0, false, {0, 0, 0}, {0, 0, 0}, {b, m, k},
   nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
-clEnqueueTranslateToTensor(
+clEnqueueExportToTensor(
   cmd_q, in1, false, {0, 0, 0}, {0, 0, 0}, {b, k, n},
   nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, matmul_kernel, 3, matmul_grid, nullptr, nullptr, 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, add_kernel, 3, add_grid, nullptr, nullptr, 0, nullptr, nullptr);
-clEnqueueTranslateFromTensor(
+clEnqueueImportFromTensor(
   cmd_q, out, false,  {0, 0, 0}, {0, 0, 0}, {b, m, n},
   nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);
 ----
@@ -763,10 +763,10 @@ cl_command_buffer_khr cb =
   clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &err);
 
 cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
-clCommandTranslateToTensorKHR(
+clCommandExportToTensorKHR(
   cmd_b, cmd_q, in0, {0, 0, 0}, {0, 0, 0}, {b, m, k},
   nullptr, nullptr, in0_data.data(), 0, nullptr, &in0_syncp);
-clCommandTranslateToTensorKHR(
+clCommandExportToTensorKHR(
   cmd_b, cmd_q, in1, {0, 0, 0}, {0, 0, 0}, {b, k, m},
   nullptr, nullptr, in1_data.data(), 0, nullptr, &in1_syncp);
 clCommandNDRangeKernelKHR(
@@ -775,7 +775,7 @@ clCommandNDRangeKernelKHR(
 clCommandNDRangeKernelKHR(
   cmd_b, cmd_q, nullptr, add_kernel, 3, add_grid, nullptr, nullptr,
   1, {matmul_syncp}, &add_syncp, nullptr);
-clCommandTranslateFromTensorKHR(
+clCommandImportFromTensorKHR(
   cmd_b, cmd_q, out, {0, 0, 0}, {0, 0, 0}, {b, k, m},
   nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
 
@@ -789,7 +789,7 @@ clFinalizeCommandBufferKHR(cmd_b);
 // Temporary tensors used in a command buffer can't be read or written
 // into. A hypothetical reason is that the finalized command buffer
 // might not use some of the tensor.
-assert(clEnqueueTranslateFromTensor(..., t0, ...) == CL_INVALID_OPERATION);
+assert(clEnqueueImportFromTensor(..., t0, ...) == CL_INVALID_OPERATION);
 ----
 
 === Open Questions ===
diff --git a/ext/cl_exp_tensor.html b/ext/cl_exp_tensor.html
index 7303a3729..ad5d348eb 100644
--- a/ext/cl_exp_tensor.html
+++ b/ext/cl_exp_tensor.html
@@ -939,7 +939,7 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueTranslateFromTensor(
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueImportFromTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
@@ -956,7 +956,7 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueTranslateToTensor(
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clEnqueueExportToTensor(
   cl_command_queue command_queue,
   cl_tensor tensor,
   cl_bool blocking_command,
@@ -1040,10 +1040,10 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 </ul>
 </div>
 <div class="paragraph">
-<p>The <strong>clEnqueueTranslateToTensor</strong> function copies contents of the buffer
+<p>The <strong>clEnqueueExportToTensor</strong> function copies contents of the buffer
 object / host allocation to tensor&#8217;s storage in
 implementation-defined, opaque memory layout. The
-<strong>clEnqueueTranslateFromTensor</strong> function copies data from tensor&#8217;s
+<strong>clEnqueueImportFromTensor</strong> function copies data from tensor&#8217;s
 storage to buffer object / host allocation.</p>
 </div>
 <div class="paragraph">
@@ -1089,7 +1089,7 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 storage addresses is unspecified.</p>
 </div>
 <div class="paragraph">
-<p><strong>clEnqueueTranslateFsomTensor</strong> and <strong>clEnqueueTranslateToTensor</strong>
+<p><strong>clEnqueueImportFromTensor</strong> and <strong>clEnqueueExportToTensor</strong>
 returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p>
 </div>
@@ -1151,13 +1151,13 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 </ul>
 </div>
 <div class="paragraph">
-<p>If <strong>cl_khr_command_buffer</strong> is is supported, then the following command
-buffer counterparts of the <strong>clEnqueueTranslateFromTensor</strong> and
-<strong>clEnqueueTranslateToTensor</strong> commands are available.</p>
+<p>If <strong>cl_khr_command_buffer</strong> is supported, then the following command
+buffer counterparts of the <strong>clEnqueueImportFromTensor</strong> and
+<strong>clEnqueueExportToTensor</strong> commands are available.</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clCommandTranslateFromTensorKHR(
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clCommandImportFromTensorKHR(
   cl_command_buffer_khr command_buffer,
   cl_command_queue command_queue,
   cl_tensor tensor,
@@ -1175,7 +1175,7 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="highlight"><code class="language-c" data-lang="c">cl_int clCommandTranslateToTensorKHR(
+<pre class="highlight"><code class="language-c" data-lang="c">cl_int clCommandExportToTensorKHR(
   cl_command_buffer_khr command_buffer,
   cl_command_queue command_queue,
   cl_tensor tensor,
@@ -1199,7 +1199,7 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 <li>
 <p>For <em>command_queue</em>, <em>tensor</em>, <em>tensor_origin</em>, <em>mem_origin</em>,
 <em>region</em>, <em>mem_pitch</em>, <em>buffer</em> and <em>host_ptr</em> parameters refer to
-<strong>clEnqueueTranslateFromTensor</strong>.</p>
+<strong>clEnqueueImportFromTensor</strong>.</p>
 </li>
 <li>
 <p>For <em>num_sync_points_in_wait_list</em>, <em>sync_point_wait_list</em>,
@@ -1209,7 +1209,7 @@ <h4 id="_new_opencl_functions_added_to_tensor_objects_section">New OpenCL Functi
 </ul>
 </div>
 <div class="paragraph">
-<p><strong>clCommandTranslateFromTensorKHR</strong> and <strong>clCommandTranslateFromTensorKHR</strong>
+<p><strong>clCommandImportFromTensorKHR</strong> and <strong>clCommandExportToTensorKHR</strong>
 returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p>
 </div>
@@ -1458,17 +1458,17 @@ <h3 id="_sample_codes">Sample Codes</h3>
 
 // Copies data into in0 tensor while possibly rearranging the data to the
 // optimal data layout.
-clEnqueueTranslateToTensor(
+clEnqueueExportToTensor(
   cmd_q, in0, false, {0, 0, 0}, {0, 0, 0}, {b, m, k},
   nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
-clEnqueueTranslateToTensor(
+clEnqueueExportToTensor(
   cmd_q, in1, false, {0, 0, 0}, {0, 0, 0}, {b, k, n},
   nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, matmul_kernel, 3, matmul_grid, nullptr, nullptr, 0, nullptr, nullptr);
 clEnqueueNDRangeKernel(
   cmd_q, add_kernel, 3, add_grid, nullptr, nullptr, 0, nullptr, nullptr);
-clEnqueueTranslateFromTensor(
+clEnqueueImportFromTensor(
   cmd_q, out, false,  {0, 0, 0}, {0, 0, 0}, {b, m, n},
   nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);</code></pre>
 </div>
@@ -1521,10 +1521,10 @@ <h3 id="_sample_codes">Sample Codes</h3>
   clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &amp;err);
 
 cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
-clCommandTranslateToTensorKHR(
+clCommandExportToTensorKHR(
   cmd_b, cmd_q, in0, {0, 0, 0}, {0, 0, 0}, {b, m, k},
   nullptr, nullptr, in0_data.data(), 0, nullptr, &amp;in0_syncp);
-clCommandTranslateToTensorKHR(
+clCommandExportToTensorKHR(
   cmd_b, cmd_q, in1, {0, 0, 0}, {0, 0, 0}, {b, k, m},
   nullptr, nullptr, in1_data.data(), 0, nullptr, &amp;in1_syncp);
 clCommandNDRangeKernelKHR(
@@ -1533,7 +1533,7 @@ <h3 id="_sample_codes">Sample Codes</h3>
 clCommandNDRangeKernelKHR(
   cmd_b, cmd_q, nullptr, add_kernel, 3, add_grid, nullptr, nullptr,
   1, {matmul_syncp}, &amp;add_syncp, nullptr);
-clCommandTranslateFromTensorKHR(
+clCommandImportFromTensorKHR(
   cmd_b, cmd_q, out, {0, 0, 0}, {0, 0, 0}, {b, k, m},
   nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr);
 
@@ -1547,7 +1547,7 @@ <h3 id="_sample_codes">Sample Codes</h3>
 // Temporary tensors used in a command buffer can't be read or written
 // into. A hypothetical reason is that the finalized command buffer
 // might not use some of the tensor.
-assert(clEnqueueTranslateFromTensor(..., t0, ...) == CL_INVALID_OPERATION);</code></pre>
+assert(clEnqueueImportFromTensor(..., t0, ...) == CL_INVALID_OPERATION);</code></pre>
 </div>
 </div>
 </div>
@@ -1592,7 +1592,7 @@ <h3 id="_open_questions">Open Questions</h3>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2023-11-16 17:25:21 +0200
+Last updated 2023-11-17 12:20:18 +0200
 </div>
 </div>
 </body>

Tensor element data type	Description
CL_TENSOR_BOOL	1-bit signedless integer.
CL_TENSOR_INT8	8-bit signed integer.
CL_TENSOR_INT16	16-bit signed integer.
CL_TENSOR_INT32	32-bit signed integer.
CL_TENSOR_INT64	64-bit signed integer.
CL_TENSOR_UINT8	8-bit unsigned integer.
CL_TENSOR_UINT16	16-bit unsigned integer.
CL_TENSOR_UINT32	32-bit unsigned integer.
CL_TENSOR_UINT64	64-bit unsigned integer.
CL_TENSOR_HALF	Half precision floating-point value.
CL_TENSOR_BFLOAT16	16-bit brain floating-point value.
CL_TENSOR_FLOAT	Single precision floating-point value.
CL_TENSOR_DOUBLE	Double precision floating-point value.
CL_TENSOR_COMPLEX64	64-bit complex floating-point value with 32-bit real and imaginary parts.
CL_TENSOR_COMPLEX128	128-bit complex floating-point value with 64-bit real and imaginary parts.
CL_TENSOR_RANK	size_t	Return the tensor rank.
CL_TENSOR_SHAPE	size_t[]	Return the tensor shape.
CL_TENSOR_DTYPE	cl_tensor_type	Return the tensor data type.
CL_TENSOR_COMMAND_BUFFER_TEMPORARY	cl_bool	Return true if the tensor is a temporary tensor for command buffers.
CL_TENSOR_BOUND_TO_BUFFER	cl_bool	Return true if the tensor is bound to a buffer. If CL_TENSOR_COMMAND_BUFFER_TEMPORARY is true, then CL_TENSOR_BOUND_TO_BUFFER must return false.
CL_TENSOR_BUFFER	cl_mem	If CL_TENSOR_BOUND_TO_BUFFER is true, return the buffer object the tensor is bound to. Otherwise, the clGetTensorInfo call returns CL_INVALID_MEM_OBJECT if the tensor is not bound to a buffer object, or CL_INVALID_PROPERTY otherwise.
CL_TENSOR_CONTEXT	cl_context	Return the context specified when the tensor object is created.
CL_TENSOR_REFERENCE_COUNT	cl_uint	Return the tensor reference count.
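
The _mem_pitch_ defaulting rule described in the patch above (a NULL _mem_pitch_ or a zero entry falls back to _region_[i + 1]) may be easier to follow as standalone host code. The following is only a sketch under that reading of the text: `tensor_mem_pitch` and its explicit `rank` parameter are hypothetical names introduced for illustration and are not part of the extension API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the extension): number of elements to
 * skip in the buffer / host allocation when the coordinate in dimension
 * `dim` grows by one. A NULL mem_pitch, or a zero entry in it, falls back
 * to region[i + 1], following the mem_pitch description. */
static size_t tensor_mem_pitch(size_t dim, size_t rank,
                               const size_t *region,
                               const size_t *mem_pitch)
{
    assert(rank > 0 && dim < rank);
    size_t pitch = 1;
    for (size_t i = dim; i + 1 < rank; i++) {
        /* Check for NULL first so mem_pitch is never dereferenced
         * when the caller did not supply it. */
        int has_pitch = (mem_pitch != NULL && mem_pitch[i] != 0);
        pitch *= has_pitch ? mem_pitch[i] : region[i + 1];
    }
    return pitch;
}
```

With a rank-3 region of {2, 3, 4} and no _mem_pitch_, the data is treated as tightly packed: the outermost dimension advances by 3 * 4 elements and the middle one by 4. Supplying a non-zero entry only for the innermost pitch pads each row while leaving the other dimension at its default.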