Agentic AI: Understanding "Degree of Agency" & Its Impact on the Agent Framework
When setting up agentic systems in an enterprise, it is important to understand the level of autonomy the agent will have, as this dictates the approach to training and managing it. The extent of an agent's autonomy is formally referred to as the "degree of agency" of the AI agent. Autonomy spans a spectrum, from fully rules-based (deterministic) behavior to fully learned (probabilistic) behavior with little or no prior knowledge.
Traditional RPA (Robotic Process Automation) and BRMS (Business Rules Management Systems) enable a rules-based approach to building agents, usually deploying some type of forward-chaining logic to accord agency to these agents. They fall in the area of AI often referred to as Expert Systems.
The key advantages are:
1) Relatively easy to set up, as this is an established area where approaches to knowledge discovery and codification are well defined. A case in point is KDD – Knowledge Discovery in Databases.
2) Rules management is mature technology.
3) Once the rules hierarchy and paths are defined, the outcome is consistent and reliable.
The key downside of the rules-based approach is that it is only as good as the completeness and accuracy of the knowledge that drives it. Such systems usually get flustered when they encounter a scenario not supported by the embedded knowledge. This often leads to a painstaking track-and-trace effort to identify the shortcoming and either fix it or find an alternate traversal path.
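As a rough illustration of the forward-chaining style described above, the sketch below repeatedly fires rules against a working memory of facts and falls back to human escalation when no rule covers the scenario. The rule names, facts, and fallback action are illustrative assumptions, not a particular RPA or BRMS product's API.

```python
# Minimal forward-chaining sketch: rules fire against a working memory of
# facts until nothing new can be derived. Illustrative only.

RULES = [
    # (name, condition over facts, facts to assert when the rule fires)
    ("high_value_order", lambda f: f.get("order_value", 0) > 10_000, {"requires_approval": True}),
    ("trusted_customer", lambda f: f.get("customer_tier") == "gold", {"requires_approval": False}),
    ("auto_release", lambda f: f.get("requires_approval") is False, {"action": "release_order"}),
]

def forward_chain(facts: dict) -> dict:
    """Fire matching rules until no new facts are added, then check coverage."""
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, condition, consequent in RULES:
            if name not in fired and condition(facts):
                facts.update(consequent)
                fired.add(name)
                changed = True
    # The failure mode noted above: no rule covers the scenario,
    # so the only option is to escalate to a human.
    if "action" not in facts:
        facts["action"] = "escalate_to_human"
    return facts

print(forward_chain({"order_value": 500, "customer_tier": "gold"}))
```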
On the other hand, fully autonomous agents come with no or limited prior knowledge. They are assigned a role, objective(s), and a policy framework to abide by. They exhibit goal-seeking behavior that tries to maximize reward. While they are easier to set up (a series of prompts chosen from a prompt library or, even better, a pre-defined library of agent templates capturing the most common tasks), their behavior is constrained by the knowledge they are trained on. Equally important is the clarity of their goals/objectives, especially in relation to other agents.
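A minimal sketch of such an agent template is shown below, assuming a simple structure of role, objectives, and policy constraints rendered into a system prompt. The field names, example agent, and prompt wording are assumptions, not the interface of any specific agent framework.

```python
from dataclasses import dataclass, field

# Illustrative agent template: the agent is given a role, objective(s), and a
# policy framework rather than hand-coded rules. Names are hypothetical.

@dataclass
class AgentTemplate:
    role: str
    objectives: list[str]
    policies: list[str] = field(default_factory=list)

    def to_system_prompt(self) -> str:
        """Render the template into a system prompt for the underlying model."""
        return (
            f"You are acting as: {self.role}.\n"
            "Objectives (maximize attainment of these):\n"
            + "\n".join(f"- {o}" for o in self.objectives)
            + "\nPolicy constraints you must not violate:\n"
            + "\n".join(f"- {p}" for p in self.policies)
        )

invoice_agent = AgentTemplate(
    role="Accounts-payable triage agent",
    objectives=["Match invoices to purchase orders", "Flag discrepancies above tolerance"],
    policies=["Never approve payments; only recommend", "Escalate vendors not in the master list"],
)
print(invoice_agent.to_system_prompt())
```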
Given that in any organization there will be multiple such autonomous agents, both within and across functions, there is a need to modulate their learned behavior in a multi-agent setting. Technically, modulation is a sense-and-respond mechanism: the agent senses the environment and adjusts its traversal path, or in some cases its policy or objective (more on this in the Technical Note at the bottom). This modulation is a complex undertaking, where complexity is directly proportional to the ambiguity or lack of standardization of the task or overarching process.
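The sketch below shows one possible shape of this sense-and-respond modulation, assuming hypothetical signals (a conflict with a peer agent, a drift measure against the global objective) and adjustment actions. None of these names come from a standard interface; they only make the sense/respond split concrete.

```python
# Sense-and-respond modulation sketch for a learned agent in a multi-agent
# setting. Signal names, thresholds, and adjustments are assumptions.

def sense(environment: dict) -> dict:
    """Collect the signals the agent reacts to, e.g. conflicts with peer agents."""
    return {
        "peer_conflict": environment.get("peer_conflict", False),
        "objective_drift": environment.get("objective_drift", 0.0),
    }

def respond(agent_state: dict, signals: dict) -> dict:
    """Adjust the traversal path, or in the stronger case the policy/objective."""
    if signals["peer_conflict"]:
        # Mild adjustment: re-plan around the conflicting agent.
        agent_state["plan"] = "replan_avoiding_shared_resource"
    if signals["objective_drift"] > 0.3:
        # Stronger adjustment: defer to the overarching global objective.
        agent_state["objective"] = agent_state["global_objective"]
    return agent_state

state = {"plan": "default_path", "objective": "local_fill_rate", "global_objective": "total_margin"}
state = respond(state, sense({"peer_conflict": True, "objective_drift": 0.4}))
print(state)
```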
For example, writing code from scratch given requirements, then testing it, deploying it, and setting up DevOps tasks involves a well-defined set of steps that is well documented and has plenty of best-in-class examples available both within and outside organizations. So, it may be possible (not trivial; it would still require significant effort and testing) to set up an agent-based code factory that goes through the full development lifecycle autonomously with minimal human supervision.
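One way to picture such a code factory is as a fixed pipeline of stages with occasional human sign-off, as in the sketch below. The stage names and the run_agent stub are placeholders for real agent invocations, not a working implementation.

```python
# Sketch of an agent-based "code factory" as a fixed pipeline of stages with
# light human checkpoints. Stage names and run_agent are placeholders.

def run_agent(stage: str, artifact: str) -> str:
    # Stand-in for invoking the stage's agent (e.g. an LLM call); here it just
    # annotates the artifact so the pipeline runs end to end.
    return f"{artifact} -> [{stage} output]"

PIPELINE = [
    # (stage, requires human sign-off before proceeding)
    ("generate_code_from_requirements", False),
    ("write_and_run_tests", False),
    ("provision_devops_pipeline", True),
    ("deploy_to_staging", True),
]

def run_pipeline(requirements: str) -> str:
    artifact = requirements
    for stage, needs_signoff in PIPELINE:
        artifact = run_agent(stage, artifact)
        if needs_signoff:
            print(f"[checkpoint] human review requested after: {stage}")
    return artifact

print(run_pipeline("REQ-001: expose a REST endpoint for order status"))
```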
In contrast, a complex business process like S&OP (Sales & Operations Planning) involves multiple functions with often conflicting objectives and sub-optimal trade-offs. It would be exceedingly difficult to set up autonomous agents to run S&OP. Even a hybrid approach (using a mix of rules-based and learned agents) may be difficult, as outcomes may be context-dependent and it may not be possible to imbue the agent with knowledge of all the variations and parameters of contextual complexity.
A simple rule of thumb on where to use autonomous AI agents is:
1) A well-defined, consistent process
2) Multiple objectives are acceptable, but an overarching global objective is required to define the leeway allowed to subordinate objectives
3) Well-documented examples and supporting data capturing the universe of possibilities and outcomes
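As a toy encoding of this rule of thumb, the sketch below treats the three conditions as a simple all-or-nothing check. The field names and the strict all-three-required logic are illustrative assumptions, not a formal assessment method.

```python
# Toy suitability check for the three rule-of-thumb conditions above.

def suits_autonomous_agent(process: dict) -> bool:
    well_defined = process.get("well_defined_consistent", False)
    has_global_objective = process.get("global_objective_defined", False)
    well_documented = process.get("examples_and_data_cover_outcomes", False)
    return well_defined and has_global_objective and well_documented

print(suits_autonomous_agent({
    "well_defined_consistent": True,
    "global_objective_defined": True,
    "examples_and_data_cover_outcomes": False,  # e.g. S&OP-style context dependence
}))  # False -> prefer a rules-based or human-in-the-loop approach
```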
Technical Note: Even when these conditions are satisfied, training autonomous agents is a technically difficult task and requires a unique combination of skills (ML + DP – Dynamic Programming + MDP – Markov Decision Processes). One usually has to utilize some kind of Reinforcement Learning technique that tries to maximize reward within an assigned policy framework that may itself be dynamic based on the environment it operates in. Although there are established methods like SFT (Supervised Fine-Tuning), contrastive learning through DPO (Direct Preference Optimization), or GRPO (Group Relative Policy Optimization), this process takes time and resources, as all of these are heavily reliant on high-quality data curation and some type of human/system feedback for response refinement.
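For concreteness, the sketch below shows the standard DPO loss computed from per-response log-probabilities of a policy model and a frozen reference model. The tensor names are assumptions, and the step of obtaining the log-probabilities from the two models is omitted.

```python
import torch
import torch.nn.functional as F

# DPO loss sketch: a contrastive objective that pushes the policy to prefer
# the chosen response over the rejected one, relative to a reference model.

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """-log sigmoid(beta * (chosen reward margin - rejected reward margin))."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -9.2]))
print(loss.item())
```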