Rules aren't what they seem in AI. While we may try to clearly state, "I want you to only do X" or "Make sure you don't do Y", the reality is far more complex.
Let's use a simple example that has an implied rule:
User: Create a compelling marketing message that doesn't exaggerate but still makes the product highly attractive.
Let's decompose this. It seems pretty straightforward: the User is asking the AI to create some marketing material for a product. No problem, right? Well, it's not quite as simple as it seems, and that goes for both humans and AI.
- How can the AI know what is compelling? To whom?
- How can the AI know what is not exaggerated? What is the norm, the threshold that determines if something is exaggerated?
Let's analyze an example prompt in coding:
User: Create a simple login system, but don't add any additional fields or attributes.
Here we have a rather ambiguous but simple request, at least on the surface. From the User's perspective, they want a simple login system, but they don't specify what that means. The "don't" section seems straightforward: don't make anything extra or more complex than necessary. From the AI's perspective, nearly everything is unclear:
- What fields should be included in the login system? username/password? email/confirmation code?
- If it picks some fields based on probability, is everything else additional?
- If the training data shows that login systems commonly include a "remember me" field, does that mean the AI should include it or not?
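To make that ambiguity concrete, here is a minimal sketch of two equally "simple" readings of the request. The field choices are hypothetical; nothing in the prompt specifies them:

    # Two equally plausible readings of "a simple login system".
    # The field choices below are hypothetical; the prompt never says
    # which fields count as "additional".
    from dataclasses import dataclass

    @dataclass
    class LoginRequestA:
        """Reading 1: classic username/password pair."""
        username: str
        password: str

    @dataclass
    class LoginRequestB:
        """Reading 2: email plus a one-time confirmation code."""
        email: str
        confirmation_code: str
        remember_me: bool = False  # common in training data; is this "additional"?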
While it might seem straightforward to tell an AI "don't do X" or "always do Y," the reality is far more nuanced.
An AI response is built from probabilities for the next word-token as the reply is formed, not from a set of principles the model will follow. "Rules" are alterations of semantic weighting, and whether or not it followed a rule? The AI honestly has no idea while it's generating the response. Specifically, it cannot evaluate whether each word-token adhered to a rule.
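Here is a rough sketch of that idea (illustrative numbers only, not the internals of any particular model): an instruction doesn't install a check, it just shifts the scores that feed the next-token probabilities.

    import math

    def softmax(scores):
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [round(e / total, 3) for e in exps]

    # Hypothetical next-token candidates at one step of a coding response.
    candidates = ["return", "def", "#", "pass"]
    scores = [2.0, 2.6, 0.5, 1.0]          # training patterns favor starting a function

    print(dict(zip(candidates, softmax(scores))))

    # An instruction like "don't create new functions" acts as a nudge on the
    # scores, not as a hard rule the model can verify token by token.
    scores[candidates.index("def")] -= 1.5
    print(dict(zip(candidates, softmax(scores))))  # "def" is now less likely, not impossible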
This is a difficult concept to grasp because AI resembles human learning and thinking in many ways: it seems to follow how we think and generates responses that feel remarkably human, yet it also differs in key ways.
Humans Don't Have Absolute Rules Either
Interpretation, like inference, drives human rule-following as well. For us there are no absolute rules either; instead we have "conditioning" that raises the probability of following a rule, yet we bend and break rules all the time, to the chagrin of every parent.
Take weight-loss methods, eating a healthy diet, not talking when the teacher is talking, raising your hand, not taking your sister's stuff without asking, making payments on time, returning library books, ensuring all expense-report items are legitimate, or honoring marriage vows. We can set rules, but they are not absolute. Moral and ethical quandaries demonstrate that even with clearly articulated rules and punishments, humans violate them all the time. The classic quandary: is it unethical to steal bread to keep your children from starving when the person you are stealing from has plenty, and much of it will go to waste if not sold?
In many ways, attention mechanisms in AI are quite similar -- inference and interpretation go hand in hand.
So what can we do?
I spent months developing systems to create cognitive processes that simulate and encourage rule-following. Along the way I also learned some very interesting things about how AI functions in the semantic space you are creating: attention mechanisms need conditioning "around" what you are telling the model to Do or Not Do, so that attention vectors are effectively "bounded" and rules, in essence, get followed most of the time.
First, AIs appreciate the Why. It is the interpretation of the Why that actually reins in the semantic boundaries of the task.
Single Prompt, and Not Clear:
User: When you are refactoring this code to fix the errors, don't create any new functionality.
Multi-Turn, Better:
User: Veteran programmers always identify root causes, dependencies, and the impacts of changes before refactoring; that way they don't introduce new problems while trying to fix an existing one. Can you tell me why?
Assistant: [response]
User: Using this idea, and the reasons you just illustrated, let's analyze the root cause of this issue and develop a plan to fix it that adheres to these principles.
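In API terms, this multi-turn pattern is just an ordered message list in which the "why" lands in the context before the task does. Here is a minimal sketch using the common role/content chat format; the actual client call is omitted and the assistant text is a placeholder:

    # Sketch of the multi-turn "why first" structure as chat messages.
    # Only the ordering matters here: the reasoning precedes the task.
    messages = [
        {"role": "user", "content": (
            "Veteran programmers always identify root causes, dependencies, and the "
            "impacts of changes before refactoring. Can you tell me why?"
        )},
        # The assistant's own explanation of the "why" now sits in the context window.
        {"role": "assistant", "content": "<model's explanation of those principles>"},
        {"role": "user", "content": (
            "Using this idea and the reasons you just illustrated, let's analyze the "
            "root cause of this issue and develop a plan to fix it."
        )},
    ]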
While this is a direct example to illustrate the idea, ambiguous prompting happens all the time and can seriously lower productivity. When I say ambiguous, I mean ambiguous to the AI, based on the context window and everything included in it.
I have managed hours of sustained productivity with AI without a single overgeneration or mistake. Part of the credit goes to the models, but the other part is understanding exactly how AI generates responses in the first place.
There are ways to kick off an AI conversation that structure the attention mechanisms from the outset, raising the right probabilities and altering the semantic landscape. The payoff is consistent long-term performance gains on coding or process tasks, and a deeper, better-shared understanding on conceptual or creative tasks.
This is in essence the nature and topic of this whole research site: Cognitive Framework Engineering.
The Power of "Why" in AI Instructions
When I say "AIs appreciate the Why," I'm referring to something fundamental about how these systems interpret instructions. Rules without context are simply constraints without meaning—they're harder to integrate into the broader semantic landscape the AI is navigating.
Think about how you might instruct a new employee:
Without Why: "Never issue refunds over $50 without approval."
With Why: "We require manager approval for refunds over $50 because we've had cases of fraud that significantly impacted our bottom line last quarter. This protects both the business and ensures legitimate customers still get fair treatment."
The second approach doesn't just convey the rule—it anchors it within a meaningful framework. The "why" creates semantic connections that strengthen the rule's position in the overall context.
AI works similarly. When you explain why a rule exists, you're not just providing a constraint; you're creating a rich network of semantic associations that the AI can use to better interpret your intent. These associations act as guardrails during token generation, raising the probability that the AI will adhere to your guidelines.
For example, instead of:
User: Don't add comments to this code.
Try:
User: Focus solely on fixing the performance issue without adding comments, as this codebase follows a strict no-comment policy where documentation is maintained in separate files.
The second prompt doesn't just state the rule—it establishes the reason for the rule and embeds it within a broader conceptual framework. This gives the AI multiple semantic anchors to reinforce the constraint during token generation.
From Rule-Following to Cognitive Frameworks
The challenge of rule-following in AI reveals a deeper truth: effective communication with AI isn't about imposing rigid constraints; it's about shaping the semantic landscape. This is where traditional prompt engineering falls short and why a more sophisticated approach is needed.
When we understand that AI doesn't truly "follow rules" but rather generates tokens based on probabilistic patterns, we can move beyond simple instructions to designing comprehensive cognitive environments that naturally guide the AI toward desired outcomes.
This shift in perspective—from rules to frameworks—is the foundation of Cognitive Framework Engineering. Instead of seeing AI interactions as a series of instructions to follow, we approach them as the co-creation of a shared cognitive space where certain patterns of thought become naturally more probable than others.
A Technical Dive: How Token Generation Really Works
For those curious about what's happening "under the hood" when AI seemingly breaks rules despite clear instructions, let's explore the technical mechanics at play.
When an AI generates a response, it's engaged in a mathematical process of predicting the next token (word or subword) based on all previous tokens. Each potential token has a probability score, and typically, the model selects high-probability tokens that fit the established pattern.
Here's what's really happening:
The Probabilistic Battle
Imagine you've clearly instructed the AI: "Don't create new functions in this code refactoring."
At each step of generating the response, the AI calculates probabilities for the next token. When it reaches a point where it might generate a function, two competing forces come into play:
- Your explicit instruction (don't create functions) slightly lowers the probability of tokens that would start a function declaration.
- Training patterns from millions of coding scenarios, where functions were commonly created in similar contexts, create a strong probabilistic pull toward generating function-related tokens.
There's no absolute "rule check" happening—just this probabilistic tug-of-war where your instruction is one influence among many.
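Here is a toy version of that tug-of-war (made-up numbers, not real model internals): even after the instruction penalizes the "start a new function" option, sampling still picks it some of the time.

    import math
    import random

    labels = ["continue_existing_code", "start_new_function"]
    prior = [1.0, 1.8]            # training data pulls toward writing a new function
    instruction_penalty = 1.2     # "don't create new functions" pushes the other way

    scores = [prior[0], prior[1] - instruction_penalty]
    weights = [math.exp(s) for s in scores]

    random.seed(0)
    draws = random.choices(labels, weights=weights, k=1000)
    # Still well above zero: the "rule" is a bias on probabilities, not a check.
    print(draws.count("start_new_function") / 1000)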
Context Window Dilution
Another technical challenge: as the AI generates more tokens, your initial instruction gets pushed further back in the context window. The attention weight given to those early tokens naturally diminishes, reducing their influence on subsequent token selection.
It's like telling someone a rule at the beginning of a long conversation—the further you get from that initial statement, the less it actively influences the discussion.
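A simplified way to see the dilution (toy numbers; real attention is learned and content-dependent, not uniform): if attention mass spreads across everything in the window, the share available to the original instruction shrinks as the conversation grows.

    # Toy illustration of context dilution: assume the instruction occupies 20 tokens
    # and, for simplicity, attention is spread roughly evenly across the window.
    # Real attention is learned and content-dependent; this only shows the trend.
    instruction_tokens = 20

    for window_tokens in [100, 500, 2000, 8000]:
        share = instruction_tokens / window_tokens
        print(f"{window_tokens:>5} tokens in context -> ~{share:.1%} of attention on the instruction")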
No Semantic Validation Layer
Perhaps most importantly, transformers lack a symbolic validation layer. When generating tokens, there's no mechanism that asks, "Does this token violate the rule stated earlier?" There's only the statistical likelihood of each token following the previous ones.
The model doesn't "understand" rules in the way humans do—as conscious, symbolic constraints. It only knows statistical patterns derived from training data, modified slightly by your prompts.
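Put differently, the generation loop looks roughly like the sketch below (pseudocode-level Python; model_scores is a hypothetical stand-in for the real network): there is simply no step at which a rule gets checked.

    import math
    import random

    def model_scores(context):
        """Stand-in for the transformer: returns a score for each vocabulary token."""
        vocab = ["def", "return", "x", "#", "\n"]
        return vocab, [random.uniform(0.0, 2.0) for _ in vocab]

    def generate(prompt, steps=10):
        context = prompt.split()
        for _ in range(steps):
            vocab, scores = model_scores(context)
            weights = [math.exp(s) for s in scores]
            next_token = random.choices(vocab, weights=weights, k=1)[0]
            # Note what is missing here: no "does this token violate the rule?" check.
            context.append(next_token)
        return " ".join(context)

    random.seed(1)
    print(generate("refactor this code but do not add functions"))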
As one technical visualization:
    Token₁ (highly probable) → Token₂ (high probability)
                                ↘ "function" token (very probable in coding contexts)
                                  → probabilistically chosen without rule validation
This is why techniques that repeatedly reinforce conceptual boundaries and create strong semantic anchors (like my Cognitive Framework Engineering approach) can dramatically improve adherence to desired constraints. They don't create "rules" so much as reshape the probability landscape to make certain patterns of token generation naturally more likely than others.