Broadcasting of functions with keyword arguments doesn't work with Julia v1.11 #693

@giordano

Description

julia> using CUDA

julia> v = CUDA.ones(Float32, 2)
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 1.0
 1.0

julia> all(isapprox.(v, 1.01; rtol=0.01))
ERROR: GPU compilation of MethodInstance for (::GPUArrays.var"#gpu_broadcast_kernel_linear#47")(::KernelAbstractions.CompilerMetadata{…}, ::CuDeviceVector{…}, ::Base.Broadcast.Broadcasted{…}) failed
KernelError: passing non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Base.Broadcast.var"#29#30"{Base.Pairs{Symbol, Float64, Tuple{Symbol}, @NamedTuple{rtol::Float64}}, typeof(isapprox)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Float64}}, which is not a bitstype:
  .f is of type Base.Broadcast.var"#29#30"{Base.Pairs{Symbol, Float64, Tuple{Symbol}, @NamedTuple{rtol::Float64}}, typeof(isapprox)} which is not isbits.
    .kwargs is of type Base.Pairs{Symbol, Float64, Tuple{Symbol}, @NamedTuple{rtol::Float64}} which is not isbits.
      .itr is of type Tuple{Symbol} which is not isbits.
        .1 is of type Symbol which is not isbits.


Only bitstypes, which are "plain data" types that are immutable
and contain no references to other values, can be used in GPU kernels.
For more information, see the `Base.isbitstype` function.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/validation.jl:108
  [2] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:87
  [3] compile_unhooked
    @ ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:80 [inlined]
  [4] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:67
  [5] compile
    @ ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:55 [inlined]
  [6] #1188
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/compilation.jl:250 [inlined]
  [7] JuliaContext(f::CUDA.var"#1188#1191"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:34
  [8] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:25
  [9] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/compilation.jl:249
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:245
 [11] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:159
 [12] macro expansion
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:373 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] cufunction(f::GPUArrays.var"#gpu_broadcast_kernel_linear#47", tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, CuDeviceVector{…}, Base.Broadcast.Broadcasted{…}}}; kwargs::@Kwargs{always_inline::Bool, maxthreads::Nothing})
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:368
 [15] macro expansion
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:112 [inlined]
 [16] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/724Sm/src/CUDAKernels.jl:129
 [17] Kernel
    @ ~/.julia/packages/CUDA/724Sm/src/CUDAKernels.jl:115 [inlined]
 [18] _copyto!
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/broadcast.jl:71 [inlined]
 [19] copyto!
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/broadcast.jl:44 [inlined]
 [20] copy
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/broadcast.jl:29 [inlined]
 [21] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Nothing, Base.Broadcast.var"#29#30"{@Kwargs{rtol::Float64}, typeof(isapprox)}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Float64}})
    @ Base.Broadcast ./broadcast.jl:872
 [22] top-level scope
    @ REPL[15]:1
Some type information was truncated. Use `show(err)` to see complete types.

(@v1.11) pkg> st -m CUDA GPUArrays GPUCompiler
Status `~/.julia/environments/v1.11/Manifest.toml`
⌃ [052768ef] CUDA v5.9.7
  [0c68f7d7] GPUArrays v11.4.1
  [61eb1bfa] GPUCompiler v1.8.2
Info Packages marked with ⌃ have new versions available and may be upgradable.

julia> versioninfo()
Julia Version 1.11.9
Commit 53a02c0720c (2026-02-06 00:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 22 × Intel(R) Core(TM) Ultra 7 155H
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 22 virtual cores)

This works with Julia v1.12. It appears to be a generic issue with broadcasting functions that take keyword arguments, not something specific to isapprox. Another reproducer:

julia> using CUDA

julia> v = CUDA.ones(Float32, 2)
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 1.0
 1.0

julia> f(x; a=1) = true
f (generic function with 1 method)

julia> all(f.(v; a=0.01))
ERROR: GPU compilation of MethodInstance for (::GPUArrays.var"#gpu_broadcast_kernel_linear#47")(::KernelAbstractions.CompilerMetadata{…}, ::CuDeviceVector{…}, ::Base.Broadcast.Broadcasted{…}) failed
KernelError: passing non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Base.Broadcast.var"#29#30"{Base.Pairs{Symbol, Float64, Tuple{Symbol}, @NamedTuple{a::Float64}}, typeof(f)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, which is not a bitstype:
  .f is of type Base.Broadcast.var"#29#30"{Base.Pairs{Symbol, Float64, Tuple{Symbol}, @NamedTuple{a::Float64}}, typeof(f)} which is not isbits.
    .kwargs is of type Base.Pairs{Symbol, Float64, Tuple{Symbol}, @NamedTuple{a::Float64}} which is not isbits.
      .itr is of type Tuple{Symbol} which is not isbits.
        .1 is of type Symbol which is not isbits.


Only bitstypes, which are "plain data" types that are immutable
and contain no references to other values, can be used in GPU kernels.
For more information, see the `Base.isbitstype` function.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/validation.jl:108
  [2] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:87
  [3] compile_unhooked
    @ ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:80 [inlined]
  [4] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:67
  [5] compile
    @ ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:55 [inlined]
  [6] #1188
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/compilation.jl:250 [inlined]
  [7] JuliaContext(f::CUDA.var"#1188#1191"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:34
  [8] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/driver.jl:25
  [9] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/compilation.jl:249
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:245
 [11] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:159
 [12] macro expansion
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:373 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] cufunction(f::GPUArrays.var"#gpu_broadcast_kernel_linear#47", tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, CuDeviceVector{…}, Base.Broadcast.Broadcasted{…}}}; kwargs::@Kwargs{always_inline::Bool, maxthreads::Nothing})
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:368
 [15] macro expansion
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:112 [inlined]
 [16] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/724Sm/src/CUDAKernels.jl:129
 [17] Kernel
    @ ~/.julia/packages/CUDA/724Sm/src/CUDAKernels.jl:115 [inlined]
 [18] _copyto!
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/broadcast.jl:71 [inlined]
 [19] copyto!
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/broadcast.jl:44 [inlined]
 [20] copy
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/broadcast.jl:29 [inlined]
 [21] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Nothing, Base.Broadcast.var"#29#30"{@Kwargs{a::Float64}, typeof(f)}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}}})
    @ Base.Broadcast ./broadcast.jl:872
 [22] top-level scope
    @ REPL[21]:1
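
For anyone hitting this on v1.11, here is a minimal sketch of what the error is pointing at, plus a possible workaround. It runs on the CPU only, so it does not need CUDA. The name `g` is hypothetical, and the assumption (not verified on a GPU here) is that a closure with the keyword values written as literals captures nothing and so stays isbits, which should let the broadcast kernel compile when `v` is a CuArray.

```julia
# The error reports that the broadcast wrapper stores the keyword names
# as a Tuple{Symbol}, and Symbol is not an isbits type.
@show isbitstype(Symbol)
@show isbitstype(Tuple{Symbol})

# Hypothetical workaround: bake the keyword argument into an anonymous
# function so that no keyword container is passed through the broadcast.
g = x -> isapprox(x, 1.01f0; rtol=0.01f0)

# A closure that captures no variables is a singleton type, hence isbits.
@show isbitstype(typeof(g))

# CPU stand-in for the CuArray from the report above.
v = ones(Float32, 2)
result = all(g.(v))
@show result
```

With a CuArray the call would presumably be the same `all(g.(v))`; `all(map(g, v))` is another spelling that avoids the keyword-argument broadcast path.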
