This document specifies the format of an Apache Accumulo AccessExpression. An AccessExpression is an encoding of a boolean expression that defines the attributes an entity requires to access specific data.
- AccessExpression - A boolean expression detailing the attributes needed to access an object (e.g. Key/Value pair in Accumulo).
- Authorizations - A set of attributes, typically attributed to the entity trying to access an object.
- AccessEvaluator - An object that determines if an entity can access an object based on the entity's Authorizations and the object's AccessExpression.
The formal definition of the AccessExpression UTF-8 string representation is provided by the following ABNF:
access-expression = [expression] ; empty string is a valid access expression
expression = (access-token / paren-expression) [and-expression / or-expression]
paren-expression = "(" expression ")"
and-expression = "&" (access-token / paren-expression) [and-expression]
or-expression = "|" (access-token / paren-expression) [or-expression]
access-token = 1*( ALPHA / DIGIT / "_" / "-" / "." / ":" / slash )
access-token =/ DQUOTE 1*(unicode-subset / escaped) DQUOTE
unicode-subset = %x00-21 / %x23-5B / %x5D-7F / unicode-beyond-ascii ; unicode minus '"' and '\'
escaped = "\" DQUOTE / "\\"
slash = "/"
Authorizations must consist of Unicode characters. Not all Unicode characters are human readable or even visible (see Unicode control characters), so implementations should provide a way to limit valid authorizations to a subset of Unicode characters (such as human-readable characters).
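One possible way to restrict authorizations to a subset of Unicode is to reject characters from selected Unicode general categories. The following is a minimal sketch only: the function name is_acceptable_authorization, the choice of rejected categories, and the rule that an authorization must be non-empty are illustrative assumptions, not requirements of this specification.

```python
import unicodedata

# Assumed policy (not mandated by the specification): reject control (Cc),
# format (Cf), surrogate (Cs), private-use (Co), and unassigned (Cn) characters.
_REJECTED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn"}


def is_acceptable_authorization(auth: str) -> bool:
    """Return True if auth is non-empty and has no character in a rejected category."""
    return bool(auth) and all(
        unicodedata.category(ch) not in _REJECTED_CATEGORIES for ch in auth
    )
```

An implementation might run such a check when Authorizations are created, so that invisible characters never reach evaluation.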
Examples of valid access expressions:
- BLUE
- RED&BLUE
- RED&BLUE&GREEN
- (RED&BLUE)|(GREEN&(PINK|PURPLE))
Examples of invalid access expressions:
- &BLUE: Must start with an access token or a paren expression.
- (RED&BLUE)|: An access token or paren expression must follow a |.
- RED&BLUE|GREEN: Once a & is seen, only & may follow, not |, unless parentheses are used.
- RED|BLUE&GREEN: Once a | is seen, only | may follow, not &, unless parentheses are used.
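The grammar can be checked with a small recursive-descent recognizer. The following is a sketch under the assumption that the input is already a decoded Unicode string; the names is_valid_access_expression, _expression, _operand, and _access_token are hypothetical helpers invented for this illustration and do not refer to any existing API.

```python
import string

# Characters permitted in an unquoted access-token by the ABNF.
_ALLOWED = set(string.ascii_letters + string.digits + "_-.:/")


def is_valid_access_expression(text: str) -> bool:
    """Return True if text matches the access-expression rule."""
    if text == "":
        return True  # the empty string is a valid access expression
    try:
        return _expression(text, 0) == len(text)
    except ValueError:
        return False


def _expression(s: str, i: int) -> int:
    """Consume one expression starting at i and return the index after it."""
    i = _operand(s, i)
    if i < len(s) and s[i] in ("&", "|"):
        op = s[i]
        # Only the first operator seen may be chained at this nesting level.
        while i < len(s) and s[i] == op:
            i = _operand(s, i + 1)
    return i


def _operand(s: str, i: int) -> int:
    """Consume an access-token or a paren-expression."""
    if i < len(s) and s[i] == "(":
        i = _expression(s, i + 1)
        if i >= len(s) or s[i] != ")":
            raise ValueError("missing ')'")
        return i + 1
    return _access_token(s, i)


def _access_token(s: str, i: int) -> int:
    """Consume a quoted or unquoted access-token."""
    if i < len(s) and s[i] == '"':
        j = i + 1
        while j < len(s) and s[j] != '"':
            if s[j] == "\\":
                # A backslash must be followed by '"' or another backslash.
                if j + 1 >= len(s) or s[j + 1] not in ('"', "\\"):
                    raise ValueError("invalid escape")
                j += 2
            else:
                j += 1
        if j >= len(s) or j == i + 1:
            raise ValueError("unterminated or empty quoted token")
        return j + 1
    j = i
    while j < len(s) and s[j] in _ALLOWED:
        j += 1
    if j == i:
        raise ValueError("expected access token")
    return j
```

With this sketch, is_valid_access_expression accepts RED&BLUE&GREEN and rejects RED&BLUE|GREEN, because the | is left unconsumed once & has been chosen at that level.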
An access expression or authorization must be a Unicode string. Serialization of an access expression or authorization should use UTF-8.
The evaluation process combines set existence checks with boolean algebra. Specifically, AccessExpressions use:
- The symbol & for logical conjunction (∧ in boolean algebra).
- The symbol | for logical disjunction (∨ in boolean algebra).
When evaluating an AccessExpression, existence checks are done against an entity's Authorizations. The following is the algorithm for evaluating an AccessExpression.
- For each access-token in an AccessExpression, check if it exists in the entity's Authorizations. Replace the access-token with true if it exists in the set and false otherwise.
- Evaluate the resulting expression using boolean algebra. If the result is true, the entity can access the data associated with the AccessExpression.
The following is an example of evaluating the AccessExpression
RED&(BLUE|GREEN) using boolean algebra for an entity with the Authorizations
{RED,GREEN}. In the example below, RED ∈ {RED,GREEN} asks whether RED exists
in the set {RED,GREEN}; it does, so the term is true.
- RED ∈ {RED,GREEN} ∧ ( BLUE ∈ {RED,GREEN} ∨ GREEN ∈ {RED,GREEN} )
- true ∧ ( false ∨ true )
Since true ∧ ( false ∨ true ) is true, the entity with Authorizations of
{RED,GREEN} can access data labeled with the AccessExpression
RED&(BLUE|GREEN). The AccessExpression (RED&BLUE)|(GREEN&PINK) is an
example of an AccessExpression that is false for an entity with Authorizations of
{RED,GREEN}; in boolean algebra it looks like the following.
- ( RED ∈ {RED,GREEN} ∧ BLUE ∈ {RED,GREEN} ) ∨ ( GREEN ∈ {RED,GREEN} ∧ PINK ∈ {RED,GREEN} )
- ( true ∧ false ) ∨ ( true ∧ false )
An entity with empty Authorizations can only access data associated with an empty access expression. This is because an empty AccessExpression always evaluates to true.
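The two-step evaluation algorithm can be sketched in a few lines. This illustration assumes the expression has already been validated against the grammar and contains only unquoted access tokens; the names evaluate, _eval, and _operand are hypothetical and chosen for this example only.

```python
import re

# Matches an unquoted access-token or a single operator/parenthesis.
_SYMBOL = re.compile(r"[A-Za-z0-9_\-.:/]+|[&|()]")
_PUNCTUATION = {"&", "|", "(", ")"}


def evaluate(expression: str, authorizations: set) -> bool:
    """Evaluate a validated, unquoted-token expression against authorizations."""
    if expression == "":
        return True  # an empty AccessExpression always evaluates to true
    # Step 1: replace each access-token with True or False.
    symbols = [
        s if s in _PUNCTUATION else (s in authorizations)
        for s in _SYMBOL.findall(expression)
    ]
    # Step 2: evaluate the resulting boolean expression.
    value, _ = _eval(symbols, 0)
    return value


def _eval(symbols, pos):
    """Evaluate one expression; return (value, position after it)."""
    value, pos = _operand(symbols, pos)
    while pos < len(symbols) and symbols[pos] in ("&", "|"):
        op = symbols[pos]
        right, pos = _operand(symbols, pos + 1)
        value = (value and right) if op == "&" else (value or right)
    return value, pos


def _operand(symbols, pos):
    """Evaluate a boolean literal or a parenthesized expression."""
    if symbols[pos] == "(":
        value, pos = _eval(symbols, pos + 1)
        return value, pos + 1  # skip the closing ")"
    return symbols[pos], pos + 1
```

Because the grammar forbids mixing & and | at the same nesting level, evaluating operators left to right is sufficient for valid input. With Authorizations {RED,GREEN}, this sketch yields true for RED&(BLUE|GREEN) and false for (RED&BLUE)|(GREEN&PINK), matching the worked examples.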
Access tokens can only contain alphanumeric characters or the characters
_, -, ., :, or / unless quoted using ". Within quotes, the characters
" and \ must be escaped by prefixing them with \. For example, to use abc\xyz as
an access-token, it would need to be quoted and escaped like "abc\\xyz". When
checking if an access-token exists in the entity's Authorizations, it must
be unquoted and unescaped.
Evaluating "abc!12"&"abc\\xyz"&GHI for an entity with Authorizations of
{abc\xyz,abc!12} looks like the following in boolean algebra, which evaluates
to false.
- abc!12 ∈ {abc\xyz,abc!12} ∧ abc\xyz ∈ {abc\xyz,abc!12} ∧ GHI ∈ {abc\xyz,abc!12}
- true ∧ true ∧ false
It's important to note that when verifying the existence of "abc\\xyz" in the entity's Authorizations,
the token is unquoted and the \ character is unescaped, so the value looked up is abc\xyz.
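The quoting rules can be illustrated with a pair of helper functions. This is a sketch, not a reference implementation: unquote and quote are hypothetical names, unquote assumes its input is a syntactically valid access-token, and rejecting the empty string in quote is an assumption based on the grammar requiring at least one character inside quotes.

```python
import re

# An authorization made only of these characters needs no quoting.
_UNQUOTED = re.compile(r"[A-Za-z0-9_\-.:/]+")


def unquote(token: str) -> str:
    """Convert an access-token as written in an expression to the authorization it names."""
    if not token.startswith('"'):
        return token
    body = token[1:-1]  # strip the surrounding double quotes
    # Drop the backslash that escapes each '"' or '\'.
    return re.sub(r'\\(["\\])', r"\1", body)


def quote(authorization: str) -> str:
    """Convert an authorization to a form usable as an access-token."""
    if authorization == "":
        raise ValueError("an access-token must contain at least one character")
    if _UNQUOTED.fullmatch(authorization):
        return authorization
    escaped = authorization.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# The example above: each token is unquoted before the set lookup.
authorizations = {r"abc\xyz", "abc!12"}
tokens = ['"abc!12"', r'"abc\\xyz"', "GHI"]
checks = [unquote(t) in authorizations for t in tokens]
```

Here checks holds the three membership results from the example (true, true, false), so the conjunction is false.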