
Tests fail on GTX 1080 Ti with CUDA_ERROR_OUT_OF_MEMORY #83

@Jiede1

Description


The environment is CentOS 7.4 with CUDA 9.0 and a single GeForce GTX 1080 Ti. Running the test suite produces the following output:

- Run map + reduce on datasets with 100,000,000 elements - multiple partitions
- Run map + map + reduce on datasets - multiple partitions
- Run map + map + map + collect on datasets
- Run map + map + map + reduce on datasets - multiple partitions
- Run map on dataset with a single primitive array column
- Run map with free variables on dataset with a single primitive array column
- Run reduce on dataset with a single primitive array column
- Run map & reduce on a single primitive array in a structure *** FAILED ***
  jcuda.CudaException: CUDA_ERROR_OUT_OF_MEMORY
  at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
  at jcuda.driver.JCudaDriver.cuCtxCreate(JCudaDriver.java:1444)
  at com.ibm.gpuenabler.GPUSparkEnv$.get(GPUSparkEnv.scala:143)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply$mcV$sp(CUDADSFunctionSuite.scala:743)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply(CUDADSFunctionSuite.scala:740)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply(CUDADSFunctionSuite.scala:740)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...
- Run logistic regression *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 1 times, most recent failure: Lost task 5.0 in stage 1.0 (TID 13, localhost, executor driver): jcuda.CudaException: CUDA_ERROR_INVALID_CONTEXT
        at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
        at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
        at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:102)
        at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:87)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:87)
        at com.ibm.gpuenabler.CUDAManager.getModule(CUDAManager.scala:62)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$JCUDAIteratorImpl.processGPU(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$JCUDAIteratorImpl.hasNext(Unknown Source)
        at com.ibm.gpuenabler.MAPGPUExec$$anonfun$doExecute$1.apply(CUDADSUtils.scala:152)
        at com.ibm.gpuenabler.MAPGPUExec$$anonfun$doExecute$1.apply(CUDADSUtils.scala:73)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      ...
