---
dataMaid: yes
title: test
subtitle: "Autogenerated data summary from dataMaid"
date: 2018-05-18 12:11:11
output: pdf_document
documentclass: report
header-includes:
- \renewcommand{\chaptername}{Part}
- \newcommand{\fullline}{\noindent\makebox[\linewidth]{\rule{\textwidth}{0.4pt}}}
- \newcommand{\bminione}{\begin{minipage}{0.75 \textwidth}}
- \newcommand{\bminitwo}{\begin{minipage}{0.25 \textwidth}}
- \newcommand{\emini}{\end{minipage}}
---
```{r , echo=FALSE, include=FALSE, warning=FALSE, message=FALSE, error=FALSE}
library(ggplot2)
library(pander)
```
```{r visualFunctions, echo=FALSE, include=FALSE, warning=FALSE, message=FALSE, error=FALSE}
ggAggHist <- getFromNamespace("ggAggHist", "dataMaid")
ggAggBarplot <- getFromNamespace("ggAggBarplot", "dataMaid")
```

# Data report overview

The dataset examined has the following dimensions:

---------------------------------
Feature                    Result
------------------------ --------
Number of observations        500

Number of variables             1
---------------------------------

### Checks performed

The following variable checks were performed, depending on the data type of each variable:

----------------------------------------------------------------------------------------------------------------------------------
                       &nbsp;                          character    factor    labelled   numeric    integer    logical     Date
----------------------------------------------------- ----------- ---------- ---------- ---------- ---------- --------- ----------
          Identify miscoded missing values             $\times$    $\times$   $\times$   $\times$   $\times$             $\times$

      Identify prefixed and suffixed whitespace        $\times$    $\times$   $\times$

            Identify levels with < 6 obs.              $\times$    $\times$   $\times$

                Identify case issues                   $\times$    $\times$   $\times$

 Identify misclassified numeric or integer variables   $\times$    $\times$   $\times$

       Identify outliers (Tukey boxplot style)                                           $\times$   $\times$

                  Identify outliers                                                                                      $\times$
----------------------------------------------------------------------------------------------------------------------------------

Please note that all numerical values in the following have been rounded to 2 decimals.
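
As a rough illustration of what a boxplot-style outlier check does, the sketch below applies the classic 1.5 × IQR fences in base R. `tukey_outliers` and the `toy` vector are hypothetical names and made-up values, and dataMaid's own `identifyOutliersTBStyle` may differ in its details (for example in how it treats infinite values).

```r
# Classic boxplot fences: flag values more than k * IQR outside the quartiles.
# `tukey_outliers` is a hypothetical helper, not a dataMaid function.
tukey_outliers <- function(v, k = 1.5) {
  v <- v[is.finite(v)]  # drop NA, NaN, Inf and -Inf before computing quartiles
  q <- stats::quantile(v, c(0.25, 0.75), names = FALSE)
  fence <- k * (q[2] - q[1])
  sort(unique(v[v < q[1] - fence | v > q[2] + fence]))
}

# Made-up values, only loosely resembling the variable in this report.
toy <- c(9.9, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 3.17, 14, -Inf)
tukey_outliers(toy)  # flags 3.17 and 14
```

Dropping non-finite values first is a design choice of this sketch: it keeps `-Inf` out of the quartile calculation, so it would have to be reported by a separate check.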

# Summary table

----------------------------------------------------------------------------------
 &nbsp;   Variable class   # unique values   Missing observations   Any problems?
-------- ---------------- ----------------- ---------------------- ---------------
  [x]        numeric             415                0.20 %            $\times$
----------------------------------------------------------------------------------

# Variable list

## x

\bminione

---------------------------------------
Feature                          Result
------------------------- -------------
Variable type                   numeric

Number of missing obs.        1 (0.2 %)

Number of unique values             414

Median                            10.22

1st and 3rd quartiles       9.91; 10.47

Min. and max.                  -Inf; 14
---------------------------------------

\emini
\bminitwo
```{r Var-1-x, echo=FALSE, fig.width=4, fig.height=3, message=FALSE, warning=FALSE}
ggAggHist(data = structure(list(factorV = structure(1:20, .Label = c("[3.17,3.71]",
"(3.71,4.25]", "(4.25,4.79]", "(4.79,5.34]", "(5.34,5.88]", "(5.88,6.42]",
"(6.42,6.96]", "(6.96,7.5]", "(7.5,8.04]", "(8.04,8.58]", "(8.58,9.12]",
"(9.12,9.67]", "(9.67,10.2]", "(10.2,10.7]", "(10.7,11.3]", "(11.3,11.8]",
"(11.8,12.4]", "(12.4,12.9]", "(12.9,13.5]", "(13.5,14]"), class = "factor"),
Freq = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 58L,
184L, 190L, 35L, 16L, 7L, 2L, 0L, 1L), xmin = c(3.16992500144231,
3.71127890213383, 4.25263280282536, 4.79398670351688, 5.33534060420841,
5.87669450489993, 6.41804840559146, 6.95940230628298, 7.50075620697451,
8.04211010766603, 8.58346400835755, 9.12481790904908, 9.6661718097406,
10.2075257104321, 10.7488796111237, 11.2902335118152, 11.8315874125067,
12.3729413131982, 12.9142952138897, 13.4556491145813), xmax = c(3.71127890213383,
4.25263280282536, 4.79398670351688, 5.33534060420841, 5.87669450489993,
6.41804840559146, 6.95940230628298, 7.50075620697451, 8.04211010766603,
8.58346400835755, 9.12481790904908, 9.6661718097406, 10.2075257104321,
10.7488796111237, 11.2902335118152, 11.8315874125067, 12.3729413131982,
12.9142952138897, 13.4556491145813, 13.9970030152728), ymin = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
ymax = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 58L,
184L, 190L, 35L, 16L, 7L, 2L, 0L, 1L)), .Names = c("factorV",
"Freq", "xmin", "xmax", "ymin", "ymax"), row.names = c(NA, -20L
), class = "data.frame"), vnam = "x")
```
\emini
- The following suspected missing value codes enter as regular values: \"-Inf\".
- Note that a check function found the following problematic values: \"-Inf\", \"3.17\", \"8.84\", \"8.94\", \"9.02\", \"11.35\", \"11.4\", \"11.41\", \"11.41\", \"11.47\" (21 additional values omitted).
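
The `-Inf` flagged above behaves like a regular number in base R summaries, which is why it appears as the minimum in the table for `x`. A minimal sketch, assuming you want to treat infinite values as missing (the `toy` vector is made up, not the data behind this report):

```r
# Made-up vector with one -Inf and one NA; not the data from the report.
toy <- c(9.8, 10.1, -Inf, 10.4, NA, 10.9)

# -Inf is not NA, so it survives na.rm = TRUE and becomes the minimum.
range(toy, na.rm = TRUE)

# Recode infinite values as NA so they are counted as missing.
cleaned <- replace(toy, is.infinite(toy), NA)

sum(is.na(cleaned))           # 2 missing observations after recoding
range(cleaned, na.rm = TRUE)  # 9.8 and 10.9
```

Whether recoding is appropriate depends on how the `-Inf` arose, for instance from `log(0)`, so this is a starting point rather than a recommendation.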
\fullline

Report generation information:

* Created by Anne Helby Petersen (username: `zms499`).
* Report creation time: Fri May 18 2018 12:11:15
* Report was run from directory: `P:/PCADSC/R`
* dataMaid v1.1.2 [Pkg: 2018-05-03 from CRAN (R 3.4.4)]
* R version 3.4.2 (2017-09-28).
* Platform: x86_64-w64-mingw32/x64 (64-bit)(Windows 7 x64 (build 7601) Service Pack 1).
* Function call: `makeDataReport(data = test, checks = setChecks(numeric = defaultNumericChecks(add = "identifyOutliersTBStyle",
remove = "identifyOutliers"), integer = defaultIntegerChecks(add = "identifyOutliersTBStyle",
remove = "identifyOutliers")))`
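
For readability, the function call above can be split into steps. This is only a sketch: it assumes the dataMaid package is installed and that `test` is the data frame the report was built from. The intermediate names `numeric_checks`, `integer_checks` and `checks` are hypothetical.

```r
# Runs only if dataMaid is installed; otherwise the block is skipped.
if (requireNamespace("dataMaid", quietly = TRUE)) {
  # Swap the default outlier check for the boxplot-style one.
  numeric_checks <- dataMaid::defaultNumericChecks(
    add = "identifyOutliersTBStyle",
    remove = "identifyOutliers"
  )
  integer_checks <- dataMaid::defaultIntegerChecks(
    add = "identifyOutliersTBStyle",
    remove = "identifyOutliers"
  )

  # Combine the per-class choices; other classes keep their defaults.
  checks <- dataMaid::setChecks(
    numeric = numeric_checks,
    integer = integer_checks
  )

  # Uncomment to build the report; this writes files to the working directory
  # and assumes a data frame named `test` exists.
  # dataMaid::makeDataReport(data = test, checks = checks)
}
```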