Authorization in microservice architecture, P3: The Policy Language

The DSL for a new Authorization service

Jul 18, 2024

The policy language

Fine-grained authorization systems often utilize an expression language or Domain-Specific Language (DSL) to help customers and engineers manage policies more easily. This also provides a common language for all stakeholders, including project managers, developers, InfoSec, and audit teams, ensuring everyone has the same understanding.

Unlike general-purpose languages like Python or Java, DSLs are designed for a specific field or area of expertise. These languages are tailored to the concepts and rules of that domain, making them easier to use and understand for people working in this field. In the authorization domain, this DSL is known as a policy language.

The policy language uses dynamic boolean functions to decide whether an action is allowed, based on the organization's security rules. These rules express how a user can perform an action on a given resource, but they don't handle retrieving the attribute data of the user and the resource. This design keeps all authorization services stateless, which makes them easy to unit test and reduces the risk of security leaks.

Several implementations for modeling access control were available at that time, such as OPA Rego, XACML, ORY Ladon, Casbin, and various other open-source options. Let's try writing a policy with some of these tools, using another example of the Evidence View permission.

In this example, I'll introduce another of Axon's products, the Record Management System (RMS), which provides dynamic report-writing for officers. RMS can attach evidence when writing reports, so it shares some common policies with EDC. In this scenario, the example requirement is:

Evidence E1 is used in Report R1. User U1 has permission to view E1 when:

- The Report Linking feature is enabled in the feature flag system, AND
- The subscriber (or user) is from the same agency as the resource's creator, AND
- EITHER:
  - The Evidence is not restricted and the user has access to the Report system, OR
  - Report R1 is shared with User U1
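Before looking at real policy languages, it can help to see the requirement as the plain boolean function it ultimately is. The following is only a sketch in Python, not Axon's implementation: the class names, field names (`agency_id`, `has_report_access`, `shared_with`), and the `can_view_evidence` function are all hypothetical and chosen for illustration. Note that the function receives every attribute as input and fetches nothing itself, which is what keeps it stateless and easy to unit test.

```python
from dataclasses import dataclass


# All types and field names below are hypothetical, for illustration only.
@dataclass(frozen=True)
class User:
    id: str
    agency_id: str
    has_report_access: bool


@dataclass(frozen=True)
class Evidence:
    id: str
    creator_agency_id: str
    restricted: bool


@dataclass(frozen=True)
class Report:
    id: str
    shared_with: frozenset = frozenset()


def can_view_evidence(user: User, evidence: Evidence, report: Report,
                      report_linking_enabled: bool) -> bool:
    """Pure boolean rule: no I/O, all attributes are passed in."""
    # Condition 1: the Report Linking feature flag must be on.
    if not report_linking_enabled:
        return False
    # Condition 2: the user belongs to the agency of the resource's creator.
    if user.agency_id != evidence.creator_agency_id:
        return False
    # Condition 3: either branch is enough.
    unrestricted_path = (not evidence.restricted) and user.has_report_access
    shared_path = user.id in report.shared_with
    return unrestricted_path or shared_path
```

Because the rule is a pure function, a unit test only needs to construct the inputs and compare the result, with no service or database involved.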

These requirements can be expressed in OPA Rego and XACML as follows:

OPA Rego

XACML

These policy files could be embedded in the Policy Decision Point (PDP) service.

Depending on the use case, these rules can be stored in static files or in a database for later user editing. In Axon's case, customers continue working with Roles and ACLs, while engineers manage policy rules. As a result, the policy can be stored in static files. This approach, known as policy-as-code, allows storing policy content in GitHub repositories. Each file represents a specific action type for a resource, and all changes are reviewed via pull requests by all stakeholders. The above example is just a small part of the Evidence View policy at Axon.

The problems of the existing policy languages

The above implementation seems straightforward. However, enforcing it requires engineers to manually define attributes, assign them to each entity, and map these attributes to specific rules and policies within the system. For large organizations with complex business requirements, this mapping process can be time-consuming and error-prone. Additionally, organizations often have large datasets, which can cause performance issues in the policy engine. Therefore, when choosing a policy language for our situation, we had the following criteria:

Support for lazy loading. Policy engines enforce authorization requests based on predefined policies. Many policy languages require preparing all attribute information before enforcement. As systems grow, objects accumulate numerous attributes, some of which may be expensive to retrieve. For instance, the Evidence metadata object has over 50 attributes used in the authorization process, some of which are relationships with other products in different clusters, like the Report object mentioned earlier. Additionally, many authorization requests can be short-circuited during decision-making. For example, if a user is a super admin, they have permission to access all data without checking any additional conditions. Therefore, preparing all attributes upfront is inefficient and wastes resources.
Support for handling large collections efficiently. The example mentioned earlier verifies whether a subscriber is in the evidence.rmsAccessList. Problems arise if this list contains hundreds of thousands of items. The policy engine must fetch these items from other services or database tables and populate them for every policy check. Handling large datasets this way is inefficient and could cause the Policy Decision Point (PDP) service to run out of memory (OOM).

Easy to use and compatible with Axon’s technical stack. Ideally, the language would support Scala or other JVM languages popular at Axon. It should also be user-friendly for engineers of all skill levels.

Most policy languages did not meet our requirements, as they were designed to handle small objects or simple tasks within an embedded environment. OPA was a promising candidate due to the concise and flexible nature of Rego. Rego's custom built-in functions allow engineers to manage large collections efficiently and can even support lazy loading. However, this capability is not a first-class feature of the language: engineers must manually bind custom functions in Go. This process is not native to the language and becomes cumbersome when dealing with numerous attributes.

To be honest, I also missed that feature when learning Rego. However, I think this absence fortunately led us to design a better solution for our use cases. We decided to develop a new Domain-Specific Language (DSL) based on an existing language specification. This new DSL includes improvements aimed at enhancing performance and effectiveness.

Axon Policy Language

The new DSL aims to combine the expressive power of a language with solutions for the performance issues discussed earlier. We opted to follow the Extensible Access Control Markup Language (XACML), an open OASIS specification for policy definition that uses boolean logic to address a wide range of authorization use cases. We chose it for the following reasons:

Comprehensive Specification: XACML provides a complete language specification for writing conditions, rules, and policies governing access decisions and ABAC architectures. This makes it a valuable resource for all stakeholders involved in managing an authorization system.
Simple Parser: Compared with other policy languages, XACML has a simple structure that is straightforward to parse. Because its language components are clearly defined, we can develop an interpreter in Scala, a widely used language at Axon. This approach saves time by bypassing the need to learn and modify existing language parsers, such as Rego in Go.

Building upon the XACML specification, we plan to implement several improvements in our DSL to enhance usability for engineers:

Simplified Language: Enhance readability for human understanding.
Lazy Evaluation: Support lazy evaluation to reduce unnecessary data loading.
Lambda Support: Introduce lambda types to handle large collections effectively.

Simplify the language

We've retained the core components of the XACML specification, including Policy, PolicySet, and Rule. To simplify the language, we've removed unnecessary concepts. For improved readability and ease of policy writing, we've opted to use YAML format instead of XML, eliminating redundant information to enhance clarity. Additionally, we've implemented and enhanced several features from XACML:

Support for Basic Types: Our DSL supports basic types such as String, Int, Bool, List, Lambda, and CustomType, facilitating attribute comparisons in ABAC.
Extensibility: We've incorporated extensibility for complex boolean logic using custom Policy Combining Algorithms and Operator Conditions (AND, OR) derived from the XACML specification.
Asynchronous Execution: The DSL supports asynchronous execution using Scala Future, optimizing performance by enabling parallel retrieval of attribute information.
ReferencePolicy Support: We've introduced ReferencePolicy so that policy definitions can be reused, which makes the DSL more flexible and maintainable.
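To illustrate the asynchronous execution point, here is a minimal sketch of parallel attribute retrieval. It uses Python's asyncio as a stand-in for Scala Future and is not the DSL's actual implementation. The lookup tables, the `fetch_user_role` and `fetch_channel_owner` helpers, and the rule itself (an admin who also owns the channel) are assumptions made for the example.

```python
import asyncio

# Hypothetical in-memory data standing in for remote attribute sources.
USER_ROLES = {"u1": "admin", "u2": "member"}
CHANNEL_OWNERS = {"c1": "u1"}


async def fetch_user_role(user_id: str) -> str:
    await asyncio.sleep(0.01)  # simulate a network round trip
    return USER_ROLES[user_id]


async def fetch_channel_owner(channel_id: str) -> str:
    await asyncio.sleep(0.01)  # simulate a network round trip
    return CHANNEL_OWNERS[channel_id]


async def can_archive(user_id: str, channel_id: str) -> bool:
    # Both lookups are started together, so the total wait is roughly
    # one round trip instead of two sequential ones.
    role, owner = await asyncio.gather(
        fetch_user_role(user_id),
        fetch_channel_owner(channel_id),
    )
    return role == "admin" and owner == user_id
```

Calling `asyncio.run(can_archive("u1", "c1"))` evaluates the rule for one request. Fetching in parallel pays off when a condition genuinely needs several attributes at once.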

As an example, let's define policies for a messaging app. Below are the policies for a user and their permissions in messaging channels and groups.

Define resource for a messaging app

Define the policies for the messaging app

Of course, we also need to define the AttributeFinder to instruct the policy engine on where to retrieve the necessary data. The following code snippet has been simplified by removing some Scala features for clarity.

The channelService above could be a service or a database wrapper that calls the Policy Information Point (PIP).

Lazy evaluation

With lazy evaluation, the policy engine traverses the Abstract Syntax Tree (AST) and determines at runtime which values are needed for the evaluation. The engine doesn't require engineers to prepare all the values of a channel or user upfront. For example, in the channel.archive policy, if the user.role is not “admin,” the evaluation returns False/Deny immediately and doesn't try to pull the channel.owner information.

This feature is especially effective for long policies whose logic is intricate and split into multiple branches. It shortens the evaluation process and reduces the number of queries to the Policy Information Point (PIP) system.
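The idea can be sketched with a tiny AST evaluator. This is a Python illustration, not the Scala interpreter described in this post: the node types (`Attr`, `Const`, `Eq`, `And`), the `LazyContext` class, and the shape of the archive rule (user.role equals "admin" AND channel.owner equals user.id) are assumptions made for the example. Attributes are fetched only when the traversal actually reaches them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Attr:
    name: str  # e.g. "user.role"; resolved through the context on demand


@dataclass(frozen=True)
class Const:
    value: Any


@dataclass(frozen=True)
class Eq:
    left: Any
    right: Any


@dataclass(frozen=True)
class And:
    terms: Tuple[Any, ...]


class LazyContext:
    """Resolves attributes on first use and remembers what was fetched."""

    def __init__(self, finders: Dict[str, Callable[[], Any]]):
        self._finders = finders
        self._cache: Dict[str, Any] = {}
        self.fetched: List[str] = []

    def get(self, name: str) -> Any:
        if name not in self._cache:
            self.fetched.append(name)  # record each real lookup
            self._cache[name] = self._finders[name]()
        return self._cache[name]


def evaluate(node: Any, ctx: LazyContext) -> Any:
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Attr):
        return ctx.get(node.name)
    if isinstance(node, Eq):
        return evaluate(node.left, ctx) == evaluate(node.right, ctx)
    if isinstance(node, And):
        # all() over a generator stops at the first False term,
        # so later terms never trigger their attribute lookups.
        return all(evaluate(term, ctx) for term in node.terms)
    raise TypeError(f"unknown node: {node!r}")


# Assumed shape of the channel.archive rule, for illustration only.
archive_policy = And((
    Eq(Attr("user.role"), Const("admin")),
    Eq(Attr("channel.owner"), Attr("user.id")),
))
```

Evaluating `archive_policy` for a non-admin user stops after the first term, so the context's `fetched` list contains only `user.role` and the finder for `channel.owner` is never called.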

Lambda type

The Lambda type is used for working with large collections. In the above example, channel.members has a Lambda type because the members of a channel could be numerous (for instance, a public channel could have hundreds of thousands of members).

Instead of loading the entire members list into memory and checking for array containment, the Lambda function takes two parameters, contains and user.id, which are passed to the channel.members function. In the Channel AttributeFinder, we implement the query channelService.checkUserExist(tid, userId) to check whether the user exists in the channel. This approach performs better when the member list is long.
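A rough sketch of this pattern follows, written in Python rather than Scala. The `ChannelService` class is a hypothetical stand-in for the real service, `check_user_exist` mirrors the checkUserExist call mentioned above, and `channel_members_finder` is an invented name for the AttributeFinder logic. The attribute resolves to a function instead of a list, and the `contains` operation is pushed down to a single point query.

```python
from typing import Callable, Set, Tuple


class ChannelService:
    """Hypothetical stand-in for the service or database behind the PIP."""

    def __init__(self, memberships: Set[Tuple[str, str]]):
        self._memberships = memberships  # (channel_id, user_id) pairs
        self.point_queries = 0
        self.full_list_loads = 0

    def list_members(self, channel_id: str) -> list:
        # The expensive path the Lambda type is meant to avoid.
        self.full_list_loads += 1
        return [u for (c, u) in self._memberships if c == channel_id]

    def check_user_exist(self, channel_id: str, user_id: str) -> bool:
        # Cheap existence check, e.g. an indexed lookup in a real database.
        self.point_queries += 1
        return (channel_id, user_id) in self._memberships


def channel_members_finder(service: ChannelService,
                           channel_id: str) -> Callable[[str, str], bool]:
    """Returns channel.members as a function rather than a materialized list."""

    def members(operation: str, argument: str) -> bool:
        if operation == "contains":
            return service.check_user_exist(channel_id, argument)
        raise ValueError(f"unsupported operation: {operation}")

    return members
```

With this shape, a condition such as "channel.members contains user.id" becomes one call to `members("contains", user_id)`, and the full member list is never loaded into the policy engine's memory.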

A better language

With these enhancements, we have developed a more robust language for engineers to write policies and implement authorization services, addressing performance concerns and ensuring standardization across teams company-wide. Using YAML also improves the clarity of policies, enabling non-engineers to engage in and review authorization logic effectively. Moreover, the introduction of testing tools streamlines the testing process. In the next section, we will delve into how we deployed this language to protect our products.

Part 0: Introduction

Part 1: The Motivation to Change

Part 2: The Approaches

Part 3: The Policy Language

Part 4: Production Deployment

Hung's Notes
