From 42366d215ef02878c61d5ce57229feea83da7041 Mon Sep 17 00:00:00 2001
From: mweber-inventa
Date: Sun, 14 May 2023 23:42:52 +0000
Subject: [PATCH] week 3 results

---
 week3/bbuy_products.json |  2 +
 week3/result notes.txt   | 99 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)
 create mode 100644 week3/result notes.txt

diff --git a/week3/bbuy_products.json b/week3/bbuy_products.json
index 3ed22c5..4de1666 100644
--- a/week3/bbuy_products.json
+++ b/week3/bbuy_products.json
@@ -1,5 +1,7 @@
 {
   "settings": {
+    "number_of_shards": 3,
+    "number_of_replicas": 0,
     "analysis": {
       "analyzer": {
         "smarter_hyphens": {
diff --git a/week3/result notes.txt b/week3/result notes.txt
new file mode 100644
index 0000000..3a07586
--- /dev/null
+++ b/week3/result notes.txt
@@ -0,0 +1,99 @@
+Week 3
+
+Q1: Which node was elected as cluster manager?
+The three nodes have the cluster_manager role:
+"roles": [
+  "cluster_manager",
+  "data",
+  "ingest",
+  "remote_cluster_client"
+]
+And for the cluster:
+"cluster": {
+  "initial_master_nodes": "opensearch-node1,opensearch-node2,opensearch-node3",
+  "name": "opensearch-cluster"
+}
+
+But Node 2 is elected master:
+
+GET _cat/nodes
+172.18.0.8 23 93 5 2.33 4.20 3.54 dimr cluster_manager,data,ingest,remote_cluster_client - opensearch-node3
+172.18.0.6 45 93 4 2.33 4.20 3.54 dimr cluster_manager,data,ingest,remote_cluster_client * opensearch-node2
+172.18.0.7 28 93 10 2.33 4.20 3.54 dimr cluster_manager,data,ingest,remote_cluster_client - opensearch-node1
+
+Elastic docs:
+master, m
+(Default) Indicates whether the node is the elected master node. Returned values include * (elected master) and - (not elected master).
+
+Stop and force a master
+docker stop opensearch-node2
+
+Q2: After stopping the previous cluster manager, which node was elected the new cluster manager?
+opensearch-node3 was elected as the new cluster manager.
+
+GET _cat/nodes
+172.18.0.8 44 90 14 1.29 2.56 3.00 dimr cluster_manager,data,ingest,remote_cluster_client * opensearch-node3
+172.18.0.7 49 90 13 1.29 2.56 3.00 dimr cluster_manager,data,ingest,remote_cluster_client - opensearch-node1
+
+Q3: Did the cluster manager node change again? (was a different node elected as cluster manager when you started the node back up?)
+
+No, it's still the same: opensearch-node3
+
+GET _cat/nodes
+172.18.0.6 25 91 6 2.04 2.57 2.93 dimr cluster_manager,data,ingest,remote_cluster_client - opensearch-node2
+172.18.0.8 13 91 4 2.04 2.57 2.93 dimr cluster_manager,data,ingest,remote_cluster_client * opensearch-node3
+172.18.0.7 11 91 4 2.04 2.57 2.93 dimr cluster_manager,data,ingest,remote_cluster_client - opensearch-node1
+
+
+Level 2
+
+curl -k -X PUT -u admin:admin "https://localhost:9200/bbuy_products" -H 'Content-Type: application/json' -d @bbuy_products.json
+
+python index.py -s /workspace/datasets/product_data/products -w 8 -b 500
+INFO:Indexing /workspace/datasets/product_data/products to bbuy_products with 8 workers, refresh_interval of -1 to host localhost with a maximum number of docs sent per file per worker of 200000 and 500 per batch.
+INFO:Done. 1275077 were indexed in 21.183495100050155 minutes. Total accumulated time spent in `bulk` indexing: 57.94000969593083 minutes
+
+
+GET /_cat/shards/bbuy_products?v&s=shard,prirep
+index         shard prirep state   docs   store   ip         node
+bbuy_products 0     p      STARTED 417730 527.5mb 172.18.0.7 opensearch-node1
+bbuy_products 0     r      STARTED 409512 502.8mb 172.18.0.6 opensearch-node2
+bbuy_products 0     r      STARTED 402138 578.8mb 172.18.0.8 opensearch-node3
+bbuy_products 1     p      STARTED 420736 565.4mb 172.18.0.6 opensearch-node2
+bbuy_products 1     r      STARTED 410257 444.5mb 172.18.0.7 opensearch-node1
+bbuy_products 1     r      STARTED 419590 683.6mb 172.18.0.8 opensearch-node3
+bbuy_products 2     p      STARTED 410388 494.5mb 172.18.0.8 opensearch-node3
+bbuy_products 2     r      STARTED 400496 466.8mb 172.18.0.7 opensearch-node1
+bbuy_products 2     r      STARTED 401508 421.6mb 172.18.0.6 opensearch-node2
+
+
+Q4: How much faster was it to index the dataset with 0 replicas versus the previous time with 2 replica shards?
+About 21.2 min with 2 replicas vs 12.0 min with 0 replicas, so roughly 1.8x faster.
+
+INFO:Indexing /workspace/datasets/product_data/products to bbuy_products with 8 workers, refresh_interval of -1 to host localhost with a maximum number of docs sent per file per worker of 200000 and 500 per batch.
+INFO:Done. 1275077 were indexed in 12.016417148399826 minutes. Total accumulated time spent in `bulk` indexing: 19.57734590168014 minutes
+
+Q5: Why was it faster?
+Because each write did not have to be copied from the primary shard to the two replica shards.
+
+curl -k -XPUT -u admin:admin 'https://localhost:9200/bbuy_products/_settings' -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 2}}'
+
+Q6: How long did it take to create the new replica shards? This will be the difference in time between those two log messages.
+
+It took 2 minutes.
+
+Q7: Those two messages were both logged by the cluster_manager. Why do you think the cluster manager is the node that logs these actions (versus non-manager nodes)?
+
+Because the cluster manager's function is to orchestrate shard allocation within the cluster. It acts as the source of truth for what happens in the cluster.
+
+
+Level 3
+
+python query.py -q /workspace/datasets/train.csv -w 4 -m 25000
+
+Q8: Looking at the metrics dashboard, what queries/sec rate are you getting?
+
+152 queries/sec
+
+Q9: How does that compare to the max queries/sec rate you saw in week 2?
+Week 2 was around 80 queries/sec. Almost double!
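
Reviewer note (not part of the patch): the ratios behind Q4 and Q9 can be checked with a few lines of Python. This is only a sketch. The constant names and the `ratio` helper are made up for illustration, the timings are copied from the two index.py log lines, and the week 2 rate is the approximate "around 80" figure from the notes.

```python
# Figures copied from the notes above; all names here are hypothetical.
WALL_MIN_2_REPLICAS = 21.183495100050155  # first run, 2 replicas
WALL_MIN_0_REPLICAS = 12.016417148399826  # second run, 0 replicas
BULK_MIN_2_REPLICAS = 57.94000969593083   # accumulated `bulk` time, 2 replicas
BULK_MIN_0_REPLICAS = 19.57734590168014   # accumulated `bulk` time, 0 replicas
QPS_WEEK3 = 152
QPS_WEEK2 = 80  # "around 80" in the notes, so only approximate


def ratio(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    if b <= 0:
        raise ValueError("denominator must be positive")
    return a / b


if __name__ == "__main__":
    wall = ratio(WALL_MIN_2_REPLICAS, WALL_MIN_0_REPLICAS)
    bulk = ratio(BULK_MIN_2_REPLICAS, BULK_MIN_0_REPLICAS)
    qps = ratio(QPS_WEEK3, QPS_WEEK2)
    print(f"wall-clock speedup with 0 replicas: {wall:.2f}x")
    print(f"accumulated bulk-time speedup:      {bulk:.2f}x")
    print(f"week 3 vs week 2 query rate:        {qps:.2f}x")
```

The accumulated `bulk` time shrinks by a larger factor than the wall-clock time, which fits the Q5 explanation that the saving comes from the bulk requests themselves.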