CodeBook

This codebook describes how the data provided are being used and processed.

Introduction

The script run_analysis.R contains one function called run_analysis. This function is intended to be used with the set of datasets collected from the accelerometers of the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

The data used for the project are available here:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

The script was designed with the given dataset in mind but might also be used with a different one, as long as it has the same structure and is placed in the same set of files in the same directories (note that this has never been tested).

function run_analysis (path="UCI HAR Dataset")

This function reads the files from the "UCI HAR Dataset", and performs the merging, reshaping and aggregation.

Args:

path: a character value that points to the directory where the extracted dataset can be found. This can be either an absolute or a relative path. If a relative path is used, make sure that the current working directory is the one you expect. The default is the relative path "UCI HAR Dataset".

Returns:

A data frame based on the original training and test data sets, reduced to the "std()" and "mean()" variables, grouped by activity label and subject, and aggregated by the mean of each variable (excluding the grouping variables).
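A minimal usage sketch, assuming run_analysis.R sits in the current working directory and the .zip has been extracted next to it. The variable names dataset.path, ready and tidy are made up for this example; the guard simply skips the call when the files are not present.

```r
# Assumed location of the extracted archive (this is the function's default).
dataset.path <- "UCI HAR Dataset"

# Only call the function when both the script and the data are present.
ready <- file.exists("run_analysis.R") && dir.exists(dataset.path)

if (ready) {
  source("run_analysis.R")                 # defines run_analysis()
  tidy <- run_analysis(path = dataset.path)
  str(tidy)                                # inspect the returned data frame
}
```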

Files being accessed

The .zip archive provides many data files, which contain:

the raw data observed
a preprocessed version
several explanation files
some "master data" files (e.g. for the coding of the activities)

The function makes use of the following files (relative to the extracted .zip archive):

activity_labels.txt
features.txt
train/X_train.txt
train/y_train.txt
train/subject_train.txt
test/X_test.txt
test/y_test.txt
test/subject_test.txt

For a detailed description of the data format and content, refer to the documentation at http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones. In all cases the data are read by read.table with a blank separator.
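The reading step can be sketched on inline text instead of the real files. This assumes that "blank separator" refers to read.table's default sep = "", which splits on any run of white space; the sample lines and the name df.sample are made up for illustration.

```r
# Hypothetical sample in the style of features.txt: an index and a name per line.
sample.lines <- c("1 tBodyAcc-mean()-X",
                  "2 tBodyAcc-mean()-Y",
                  "3 tBodyAcc-std()-X")

# sep = "" (the read.table default) splits on any run of white space;
# header = FALSE because the sample carries no header row.
df.sample <- read.table(text = sample.lines, sep = "", header = FALSE,
                        stringsAsFactors = FALSE)

# Without a header the columns are named V1, V2, ... automatically.
print(df.sample)
```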

Step 1: Merges the training and the test sets to create one data set.

The observed data from the training data set X_train.txt and the test data set X_test.txt are read into the variables df.train and df.test respectively. The union of both data frames is generated by rbind and stored in the variable df.all. Note that this assumes that the two datasets have the same structure and that the variables / columns are in the same order.
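A small sketch of this merge, using toy data frames in place of the real files. The values are invented, and the explicit check on the column names is an addition for illustration, not something the script is known to do.

```r
# Toy stand-ins for the contents of X_train.txt and X_test.txt.
df.train <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.0, 2.0))
df.test  <- data.frame(V1 = 0.3, V2 = 3.0)

# Guard the assumption that both inputs have the same columns in the same order.
stopifnot(identical(names(df.train), names(df.test)))

# Stack the rows: training observations first, then test observations.
df.all <- rbind(df.train, df.test)
print(dim(df.all))
```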

Step 2: Extracts only the measurements on the mean and standard deviation for each measurement.

The features, which represent the measurement names, are read as a data frame from features.txt and stored in the variable df.features. This data frame is filtered to the features (second variable in the data frame) that end with -std() or -mean(), and the result is stored in df.features.stdAndMean. It contains the names and positions of the features that represent standard deviations and mean values.

The first variable of df.features.stdAndMean, which holds the required variable indexes in df.all, is used to remove the unneeded variables from df.all. The resulting data frame is stored in df.meanAndStd.
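The filtering and column selection described in this step might look roughly like the following. The regular expression is an assumption about how "ends with -std() or -mean()" could be expressed, and the toy data stand in for features.txt and the merged measurements.

```r
# Toy stand-in for features.txt: V1 is the column index, V2 the feature name.
df.features <- data.frame(
  V1 = 1:4,
  V2 = c("tBodyAccMag-mean()", "tBodyAccMag-std()",
         "tBodyAccMag-mad()", "fBodyAccMag-meanFreq()"),
  stringsAsFactors = FALSE
)

# Keep names that end with -mean() or -std(); the pattern is an assumption.
keep <- grepl("-(mean|std)\\(\\)$", df.features$V2)
df.features.stdAndMean <- df.features[keep, ]

# Toy merged measurements with one column per feature.
df.all <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4),
                     V3 = c(0.5, 0.6), V4 = c(0.7, 0.8))

# Select the measurement columns by the retained feature indexes.
df.meanAndStd <- df.all[, df.features.stdAndMean$V1, drop = FALSE]
print(names(df.meanAndStd))
```

Anchoring the pattern with $ keeps meanFreq() out of the selection, since that name does not end with -mean().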

Step 3: Uses descriptive activity names to name the activities in the data set

The activities for the training and test data sets are read from y_train.txt and y_test.txt and stored in df.train_activity and df.test_activity. The union of both is generated by rbind and stored in df.activity.

The master data labels for the activities are read from activity_labels.txt and stored in df.activity_lables. The df.activity data frame (which has just one variable) represents the activities for all observations. The vector all.activities, holding the mapped label for each observation, is generated by using the activity variable as an index into df.activity_lables. Note that the code assumes the data in df.activity_lables are ordered by activity (i.e. the numeric id). No explicit ordering is performed.
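The index-based mapping, and why it depends on the row order of the label table, can be illustrated with toy data. The name by.match and the match-based lookup are additions for comparison only; the script itself is described as using plain positional indexing.

```r
# Toy stand-in for activity_labels.txt: V1 is the numeric id, V2 the label.
df.activity_lables <- data.frame(
  V1 = 1:3,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the combined activity ids of all observations.
df.activity <- data.frame(V1 = c(3, 1, 1, 2))

# Positional lookup: id k picks row k, which is only correct
# when the label rows are sorted by id, starting at 1.
all.activities <- df.activity_lables$V2[df.activity$V1]

# Alternative (not from the script): look up by id value, independent of row order.
by.match <- df.activity_lables$V2[match(df.activity$V1, df.activity_lables$V1)]

print(all.activities)
```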

Step 4: Appropriately labels the data set with descriptive variable names.

The automatically generated column names in df.meanAndStd are replaced by the (filtered) feature names in df.features.stdAndMean. This assumes that the order in the filtered data frame is still the same as the column order in df.meanAndStd.
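A short sketch of the renaming on toy data. The length check is an extra safeguard added here for illustration and is not described for the script.

```r
# Toy filtered feature table: V1 is the original column index, V2 the name.
df.features.stdAndMean <- data.frame(
  V1 = c(1, 2),
  V2 = c("tBodyAccMag-mean()", "tBodyAccMag-std()"),
  stringsAsFactors = FALSE
)

# Toy filtered measurements, still carrying the automatic names V1, V2.
df.meanAndStd <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4))

# One name per column is required for the positional renaming to make sense.
stopifnot(ncol(df.meanAndStd) == nrow(df.features.stdAndMean))

# Assign the feature names in order; names<- stores the strings unchanged.
names(df.meanAndStd) <- df.features.stdAndMean$V2
print(names(df.meanAndStd))
```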

Step 5: From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

The subjects of the training and test data sets are read from subject_train.txt and subject_test.txt and stored in df.train_subject and df.test_subject. The union of both is generated by rbind, and the first (and only) column is stored in all.subject as a vector.

The filtered data set in df.meanAndStd is grouped by the activity and subject vectors using the aggregate function. The columns are aggregated with the mean function and the result is stored in the data frame df.result. The automatically added columns for the group-by steps are renamed to reasonable names.
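The grouping and averaging can be sketched with aggregate on toy data. The measurement column names m and s, the toy values, and the final column names "activity" and "subject" are assumptions; the text only states that the grouping columns receive reasonable names.

```r
# Toy filtered measurements with two hypothetical columns.
df.meanAndStd <- data.frame(m = c(1, 3, 5, 7), s = c(2, 4, 6, 8))

# One activity label and one subject id per observation.
all.activities <- c("WALKING", "WALKING", "LAYING", "LAYING")
all.subject    <- c(1, 1, 1, 2)

# Average every column within each (activity, subject) combination.
df.result <- aggregate(df.meanAndStd,
                       by = list(all.activities, all.subject),
                       FUN = mean)

# An unnamed 'by' list yields the columns Group.1 and Group.2; rename them.
names(df.result)[1:2] <- c("activity", "subject")
print(df.result)
```

Only combinations that actually occur in the data appear in the result, so the toy input yields three rows.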

Finally, the df.result data frame is returned by the function.
