Authorization in microservice architecture, P1: The motivation to change

A coarse-grained access control system

Jul 18, 2024

To illustrate the problem and the solution more effectively, let's take a quick look at some key concepts.

The concepts

Evidence.com (EDC), Axon's flagship cloud product, is a platform for digital evidence management. It allows users to upload, manage, and share various types of digital evidence, such as body-worn camera footage, photos, and documents. EDC also provides a comprehensive Identity and Access Management system, including License tier, Agency, User, Group, Role, and Permission management. Advanced concepts like Nested Groups, Command Hierarchy, Group Monitoring, Evidence Group, Access Class, etc. enable users to quickly configure any workflow. In more detail:

Command Hierarchy organizes an agency’s structure, serving as the source of truth for all workflow setups related to that agency’s hierarchy.
Agency’s hierarchy structure
Nested Group allows the creation of groups within groups. Unlike the Command Hierarchy, it gives users the flexibility to place any group within another. For example, Group G3 is nested in Group G2 below:
Nested Group, G3 is nested in G2.
Group Monitoring enables users to oversee and manage activities and evidence within a group, ensuring accountability. For example, suppose User M1 is assigned to “monitor” Group G1, Group G1 contains Users U1 and U2 and Group G2, and Group G2 contains Users U3 and U4. Then User M1 can view all evidence belonging to Users U1, U2, U3, and U4 (all users in Groups G1 and G2).

User M1 is the monitor of Group G1 and has permission to view all evidence from Group G1
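To make the Group Monitoring example concrete, here is a minimal sketch of how the set of monitored users could be resolved through nested groups. The dictionaries and function names are hypothetical and only mirror the M1/G1/G2 example above; this is not how EDC implements it.

```python
from typing import Dict, List, Set

# Hypothetical in-memory data mirroring the example above (illustrative only).
GROUP_USERS: Dict[str, List[str]] = {"G1": ["U1", "U2"], "G2": ["U3", "U4"]}
GROUP_CHILDREN: Dict[str, List[str]] = {"G1": ["G2"], "G2": []}
MONITORS: Dict[str, List[str]] = {"M1": ["G1"]}  # monitor -> monitored groups


def users_in_group(group: str) -> Set[str]:
    """Collect the users of a group and of every group nested inside it."""
    seen_groups: Set[str] = set()
    users: Set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in seen_groups:  # guard against cyclic nesting
            continue
        seen_groups.add(current)
        users.update(GROUP_USERS.get(current, []))
        stack.extend(GROUP_CHILDREN.get(current, []))
    return users


def monitored_users(monitor: str) -> Set[str]:
    """All users whose evidence the monitor may view."""
    result: Set[str] = set()
    for group in MONITORS.get(monitor, []):
        result |= users_in_group(group)
    return result
```

With this data, `monitored_users("M1")` resolves to U1, U2, U3, and U4, matching the example.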

Many more concepts for access control are available in the public Reference Guides of EDC.

Authorization checks are required for almost all resources in EDC, making authorization one of the most demanding tasks in the system. The most fundamental, and also the most complex, objects that users frequently interact with are Evidence, Cases, and their derivative files. In short:

Evidence refers to digital files and data, such as body-worn camera footage, photos, and documents, that are collected and uploaded to the platform. Each piece of evidence is cataloged with metadata, such as timestamps and location data, for context and secure management.
A Case is a collection of related evidence organized for an investigation or legal proceedings. It helps group and manage multiple types of evidence, ensuring all relevant materials are easily accessible and systematically reviewed.

The access control models

EDC started with Role-Based Access Control (RBAC), Discretionary Access Control (DAC), and additional concepts for all principals (e.g., users, APIs, devices).

Role-Based Access Control (RBAC) is a security model that manages access to resources by assigning permissions to users based on their roles within an organization or system. EDC features dynamic role settings tailored to meet varying user needs. Smaller agencies might use a few default roles, while larger agencies can define numerous roles, potentially in the hundreds. Users can create new roles from an extensive list of permissions. Examples include roles such as Admin, User, and Investigator.
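As a rough illustration of the RBAC model, a permission check reduces to looking up the permissions attached to a user's roles. The role names come from the text; the permission strings, user names, and function name are assumptions made for this sketch.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical role definitions; the permission names are invented for illustration.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "Admin": frozenset({"evidence.view", "evidence.edit", "evidence.share"}),
    "Investigator": frozenset({"evidence.view", "evidence.edit"}),
    "User": frozenset({"evidence.view"}),
}

# Hypothetical role assignments.
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"Investigator"},
    "bob": {"User"},
}


def has_permission(user: str, permission: str) -> bool:
    """True if any role assigned to the user grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, frozenset())
        for role in USER_ROLES.get(user, set())
    )
```

The check is a pure lookup: it knows nothing about which evidence is involved or how the user relates to it, which is part of why this model alone becomes coarse as rules grow.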

Group Monitoring, mentioned above, is a special concept in EDC. It has a predefined set of permissions and is implemented separately. It is typically used in law enforcement and related fields. Group Monitoring enables a hierarchical permission structure in which the Monitor can have broad access across multiple groups, facilitating efficient management and oversight without duplicating permissions or roles.

For example, suppose Group G1 contains Groups G2 and G3 and User U1. If User U2 is assigned to monitor Group G1, U2 has access to all the data of Groups G1, G2, and G3.

Discretionary Access Control (DAC) is another security model that allows the owner (creator or administrator) of a resource to decide who can access it and at what level (read, write, execute, etc.). For example:

Share by scope: User U1 could share Evidence E1 with User U2 with read permission.
Share by role: User U1 could share Evidence E1 with User U3 with all the permissions of U3’s role. For example, if U3 has the Admin role, U3 can view and edit E1.
Share to a nested group: User U1 could share Evidence E1 with view permission to Group G2. If Group G2 contains Group G1 and G1 contains User U2, then User U2 can view Evidence E1.

Manage Access, source: Evidence Reference Guide
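The "share to a nested group" case above can be sketched as an ACL lookup that first expands the user into every group containing them, directly or through nesting. The `AclEntry` type, the data, and the function names are hypothetical; the sketch only reproduces the E1/G2/G1/U2 example.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class AclEntry:
    resource: str
    grantee: str      # a user id or a group id
    permission: str


# Hypothetical data: G2 contains G1, and G1 contains U2.
GROUP_MEMBERS: Dict[str, List[str]] = {"G2": ["G1"], "G1": ["U2"]}

# U1 shared Evidence E1 with Group G2 with view permission.
ACL: List[AclEntry] = [AclEntry("E1", "G2", "view")]


def principals_of(user: str) -> Set[str]:
    """The user plus every group containing the user, directly or via nesting."""
    principals = {user}
    changed = True
    while changed:  # repeat until no new enclosing group is found
        changed = False
        for group, members in GROUP_MEMBERS.items():
            if group not in principals and any(m in principals for m in members):
                principals.add(group)
                changed = True
    return principals


def can(user: str, permission: str, resource: str) -> bool:
    """True if an ACL entry grants the permission to the user or an enclosing group."""
    principals = principals_of(user)
    return any(
        entry.resource == resource
        and entry.permission == permission
        and entry.grantee in principals
        for entry in ACL
    )
```

Even in this small form, the ACL check already depends on group relationships rather than on the ACL alone.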

Additionally, EDC incorporates a Feature Entitlement and Feature Flag system. These features, combined with other environmental factors, are frequently used in the logic of the authorization service.

A coarse-grained access control system

After a decade of development, the features of these objects have expanded, incorporating complex business logic. For example, the evidence object handles a very large number of requests per day, with over 30 action types requiring protection (e.g., View, Listing, Downloading, Editing, Sharing…) and over 50 attributes requiring permission enforcement across the extensive system. In fact, it is difficult to enumerate all the possible ways a user can gain access to a piece of evidence, given the numerous rules and attributes. With the increasing level of detail needed, the existing access control models could no longer handle permission management. In other words, they were too "coarse-grained".

The advantage of RBAC and DAC is that they are straightforward for engineers to implement and intuitive for users managing permissions via the UI. This includes tasks like creating roles, assigning them to specific groups or users, or defining Access Control Lists (ACLs) for granting resource permissions.

However, complexity arises as access rules increase, incorporating hierarchical structures and intricate object relationships. Determining a permission goes beyond simply checking scopes in a JWT or entries in an ACL. It requires evaluating relationships from the predefined policies, such as Nested Groups, Command Hierarchy, and Group Monitoring.

Let's look at an example. Consider User U1, who belongs to Group G1, which monitors Group G2. Group G2 contains Groups G3 and G4. Evidence E1, E2, E3, and E4 belong to Case C1. If Case C1 is shared with Group G3 with view permission, then User U1 can view Evidence E1, E2, E3, and E4: the share on C1 extends to the evidence in the case, G3 is nested in G2, and U1 monitors G2 through G1.

It is difficult to express the above logic using RBAC or DAC alone. Engineers had to implement ad-hoc custom code whenever the system needed to check permissions, leading to a messy and unmanageable codebase for a large number of action types and resources.
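To show what such ad-hoc code tends to look like, here is a sketch that hard-codes the example above as a chain of relationship lookups. All data structures and names are hypothetical, and the sketch deliberately covers only the paths used in the example (own groups, monitored groups, and their nested groups); it is not EDC's code.

```python
from typing import Dict, Set, Tuple

# Hypothetical relationship data for the example above.
USER_GROUPS: Dict[str, Set[str]] = {"U1": {"G1"}}
GROUP_MONITORS: Dict[str, Set[str]] = {"G1": {"G2"}}        # group -> groups it monitors
GROUP_CHILDREN: Dict[str, Set[str]] = {"G2": {"G3", "G4"}}  # group -> nested groups
CASE_EVIDENCE: Dict[str, Set[str]] = {"C1": {"E1", "E2", "E3", "E4"}}
CASE_SHARES: Dict[str, Set[Tuple[str, str]]] = {"C1": {("G3", "view")}}


def descendants(group: str) -> Set[str]:
    """The group itself plus every group nested below it."""
    result: Set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in result:
            continue
        result.add(current)
        stack.extend(GROUP_CHILDREN.get(current, ()))
    return result


def reachable_groups(user: str) -> Set[str]:
    """The user's own groups plus monitored groups and their nested groups."""
    groups = set(USER_GROUPS.get(user, ()))
    for own in list(groups):
        for monitored in GROUP_MONITORS.get(own, ()):
            groups |= descendants(monitored)
    return groups


def can_view_evidence(user: str, evidence: str) -> bool:
    """True if a case containing the evidence is shared for view with a reachable group."""
    groups = reachable_groups(user)
    for case, items in CASE_EVIDENCE.items():
        if evidence not in items:
            continue
        shares = CASE_SHARES.get(case, ())
        if any(g in groups and perm == "view" for g, perm in shares):
            return True
    return False
```

Every new relationship type (direct evidence shares, roles, command hierarchy, feature flags) would add another branch to `can_view_evidence`, and a similar function would be needed for each of the other action types.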

The problem of an evolving technical stack

The next problem was technical debt combined with a changing architecture.

Initially, EDC started with a monolithic .NET codebase. While authorization scenarios were simple, authorization code was added ad hoc within existing code, and authorization was not treated as a separate component. Custom code scattered across multiple files and mixed with business logic made performance optimization difficult. The system became extremely slow when working with large collections, e.g. a Case with a huge number of evidence items or a Group with thousands of users. Some policy rules had to iterate through these large collections to check a permission, which caused the service to run out of memory (OOM).
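The memory problem can be illustrated in miniature. The sketch below pages through a large collection lazily and stops at the first match, instead of loading every item before checking. It is only an illustration of the general issue, not the fix EDC adopted; the pager, page size, and predicate are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple

PAGE_SIZE = 1000  # hypothetical page size


def make_pager(total: int) -> Tuple[Callable[[int], List[int]], Dict[str, int]]:
    """Simulate a paged data source of `total` evidence ids and count page fetches."""
    calls = {"pages": 0}

    def fetch_page(offset: int) -> List[int]:
        calls["pages"] += 1
        return list(range(offset, min(offset + PAGE_SIZE, total)))

    return fetch_page, calls


def iter_evidence(fetch_page: Callable[[int], List[int]]) -> Iterator[int]:
    """Yield evidence ids one page at a time, holding only one page in memory."""
    offset = 0
    while True:
        page = fetch_page(offset)
        if not page:
            return
        yield from page
        offset += len(page)


def any_viewable(
    fetch_page: Callable[[int], List[int]],
    is_viewable: Callable[[int], bool],
) -> bool:
    """Stop at the first viewable item instead of materializing the collection."""
    return any(is_viewable(e) for e in iter_evidence(fetch_page))
```

For a simulated case with one million items where item 1500 is viewable, only two pages are fetched. When this kind of logic is scattered across business code, each call site has to get the streaming and early exit right on its own.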

Another problem was data location. In a simple authorization system, policy logic did not need to consider where data lived, because all necessary information was assumed to be available or easily retrievable from a single database. Authorization checks were also straightforward to place within common functions, modules, or libraries. However, as the company expanded tenfold and transitioned to a service-domain implementation to enhance self-service capabilities, each product pillar began managing its own services, products, infrastructure, and release cadence. Consequently, gathering data for enforcement and organizing the code became more complex.

For example, checking the Evidence View permission in the Evidence Management cluster requires information from various sources: user details (user, group, nested group...) from several services in the IAM cluster, evidence metadata and its additional attributes, case information if the evidence is in a restricted case, record information from the Record Management cluster if the evidence is attached to a police report, and so on.

Service-domain (multi-cluster) architecture: each product pillar has a dedicated cluster for its workload
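The Evidence View example can be sketched as a function that assembles the attributes for one decision from several sources. Each `fetch_*` callable stands in for a call to a different service or cluster; the context type, field names, and fake fetchers are assumptions made for this sketch, not EDC's actual interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class EvidenceViewContext:
    groups: List[str]                   # from IAM services
    evidence: Dict[str, Any]            # from Evidence Management
    case_restricted: bool               # from the case service, if in a case
    record: Optional[Dict[str, Any]]    # from Record Management, if attached


def build_context(
    user_id: str,
    evidence_id: str,
    fetch_groups: Callable[[str], List[str]],
    fetch_evidence: Callable[[str], Dict[str, Any]],
    fetch_case: Callable[[str], Dict[str, Any]],
    fetch_record: Callable[[str], Dict[str, Any]],
) -> EvidenceViewContext:
    """Gather every attribute the view decision needs; each fetch is one remote hop."""
    evidence = fetch_evidence(evidence_id)
    case_id = evidence.get("case_id")
    record_id = evidence.get("record_id")
    return EvidenceViewContext(
        groups=fetch_groups(user_id),
        evidence=evidence,
        case_restricted=bool(case_id) and bool(fetch_case(case_id)["restricted"]),
        record=fetch_record(record_id) if record_id else None,
    )


# Fake fetchers that log which source was contacted (illustrative only).
CALLS: List[str] = []


def fake(source: str, payload: Any) -> Callable[[str], Any]:
    def fetch(_id: str) -> Any:
        CALLS.append(source)
        return payload
    return fetch


context = build_context(
    "U1",
    "E1",
    fetch_groups=fake("iam", ["G1"]),
    fetch_evidence=fake("evidence", {"case_id": "C1", "record_id": "R1"}),
    fetch_case=fake("case", {"restricted": True}),
    fetch_record=fake("records", {"type": "police_report"}),
)
```

A single view check here touches four sources, and in a multi-cluster deployment several of those hops cross cluster boundaries.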

This raises questions about code organization and deployment. Should we centralize all logic and deployment within the IAM cluster, with one team managing the authorization logic? Or should we decentralize it across resource servers?

Authorization is a core component of the IAM domain. However, centralizing all logic within the IAM cluster raises several concerns:

Is the IAM team best positioned to understand all product concepts from other pillars in order to write authorization logic effectively?
Does this approach violate domain boundaries and team topology principles?
Is routing all permission-checking requests for all products through the IAM cluster considered best practice?

On the other hand, a decentralized approach also presents its own challenges:

How can we ensure consistent coding practices across the company and align each team's policy implementation with the UI provided by the IAM team (e.g., Role and Configuration page)?
Is replicating the authorization service in each cluster the right approach, given that it introduces additional issues? The authorization service needs to connect to all relevant data sources to gather attribute information while remaining accessible to multiple microservices across clusters for permission checks. This approach can create complex dependencies and increase network traffic due to inter-cluster routing.
How can we effectively test and maintain the authorization logic when the code is spread across different parts of the system?

Answering all of the above questions about deployment topology at once was not easy. Of the two problems, fixing the access control model took priority, because the model defines the theoretical framework for how customers and engineers work. From that, we could tailor the deployment to fit the organization's demands. In the next part, we will explore how 'fine-grained' access control models and their associated architectures can help us address these challenges.

Part 0: Introduction

Part 1: The Motivation to Change

Part 2: The Approaches

Part 3: The Policy Language

Part 4: Production Deployment

Hung's Notes
