An EMR cluster launch example that:

  • Associates the EMR cluster with a VPC Elastic IP address.
  • Ensures that the cluster can run Spark programs, via steps.json.
  • Enables the settings required for Amazon Athena interactions, via configurations.json.

  aws emr create-cluster \
  --applications Name=Hadoop Name=Hive Name=Pig Name=Spark Name=Presto \
  --ec2-attributes file://ec2attributes.json  \
  --steps file://steps.json \
  --release-label emr-6.3.0 \
  --log-uri 's3n://.../logs/emr/' \
  --bootstrap-actions file://bootstrap.json \
  --instance-groups file://instancegroups.json \
  --configurations file://configurations.json \
  --no-visible-to-all-users \
  --service-role EMR_DefaultRole \
  --security-configuration 'Secure EMR S3 Access' \
  --enable-debugging \
  --name 'Prototype' \
  --region eu-west-1 \
  --ebs-root-volume-size 65 \
  --tags 'ClusterFunction=Signals'
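
The `--steps file://steps.json` argument above expects a JSON list of step definitions. The repository's own steps.json is not reproduced here; what follows is only a minimal sketch of what such a file might look like for a single Spark step. The step name, the scratch directory and the `s3://your-bucket/...` program path are hypothetical placeholders.

```shell
# Write a hypothetical steps.json into a scratch directory.
# The step name and the S3 path are placeholders, not values from this repository.
workdir="$(mktemp -d)"
cat > "$workdir/steps.json" <<'EOF'
[
  {
    "Type": "Spark",
    "Name": "Example Spark step",
    "ActionOnFailure": "CONTINUE",
    "Args": [
      "--deploy-mode", "cluster",
      "s3://your-bucket/path/to/program.py"
    ]
  }
]
EOF

# Show where the file was written.
echo "$workdir/steps.json"
```

For a step of type Spark, the entries in `Args` are handed to spark-submit, so the last entry is the program to run.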



The Spark programs are outlined within steps.json. Whilst testing programs/packages, setting

"ActionOnFailure": "CONTINUE"

within steps.json lets the cluster carry on after a failed step instead of shutting down, which leaves time to log in to the cluster's master node via

ssh -i KeyNameString.pem hadoop@ec2-xx-xx-xx-xx.xx-xx-x.compute.amazonaws.com

and hence investigate and address program/package problems.
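
As a rough sketch of the login step, the helpers below assemble the ssh command shown above and look up the master node's host name. The function names `emr_ssh_command` and `emr_master_dns` are hypothetical. The lookup assumes a configured AWS CLI with permission to describe the cluster, and that the master node's security group allows inbound SSH.

```shell
# Hypothetical helper: print the ssh command for a key file ($1)
# and a master node host name ($2).
emr_ssh_command() {
  printf 'ssh -i %s hadoop@%s\n' "$1" "$2"
}

# Hypothetical helper: look up the master node's public DNS name
# for a cluster ID ($1). Needs a configured AWS CLI; it is only
# defined here, not called.
emr_master_dns() {
  aws emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.MasterPublicDnsName' --output text
}

# Example with the placeholder key and host names used in the text.
emr_ssh_command KeyNameString.pem ec2-xx-xx-xx-xx.xx-xx-x.compute.amazonaws.com
```

With a real cluster ID in place of the placeholder, the two can be combined as `emr_ssh_command KeyNameString.pem "$(emr_master_dns j-XXXXXXXXXXXXX)"`. The lookup may also need `--region eu-west-1` if that is not the CLI's default region.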