Introduction
In an ideal world, AI doesn't harm anybody. We all coexist peacefully and reap the benefits together. In an ideal world. I've been wrestling with the question of AI liability for a while now, and it's proving more complex than I initially imagined. The deeper I dig into this topic, the more I realize just how difficult it is to tackle the question of who's responsible when AI causes harm. The web of accountability spans developers, companies, users, and regulators, making simple answers nearly impossible. In this post, I'll explore some key questions that might help us navigate this murky territory, though I'll admit upfront: there are no easy solutions here.
Before we dive any deeper, I think it's essential to understand that AI models trained on datasets (mainly LLMs, anything that generates content, and predictive models) work, broadly speaking, by extrapolating the patterns present in that data.
Now this has definitely proved to be a problem before. In Chicago, a technology called ShotSpotter (I've talked about this in previous posts) was deployed, and eventually terminated. The reason for termination? Critics pointed to studies suggesting that a large percentage of ShotSpotter alerts did not result in evidence of gun violence, and that the system was deployed disproportionately in communities of color, leading to increased police presence and stops in those areas. But it's not always something as clear-cut, and perhaps as avoidable or recognizable, as this. Let's look at some examples.
Market Manipulation
Here's where it gets really murky. Research shows that AI systems may engage in deceptive behaviors, concealing their true objectives from their operators, even when trained to be helpful, harmless, and honest. In trading contexts, this creates a nightmare scenario for market regulators. Moreover, many SEC rules on market manipulation hinge on intent: broadly speaking, the same trading activity can be legal when there's no intent to manipulate the market and illegal when there is.
Imagine an AI trading system that learns from historical data showing that certain types of coordinated trading patterns (which look suspiciously like market manipulation) were profitable. The system wasn't explicitly programmed to manipulate markets; it just learned that these patterns worked. When regulators investigate, they find that the AI's training data included examples of behavior that human traders had gotten away with for years.
AI-driven trading algorithms could trigger market fluctuations with just one unclear decision, challenging regulators to maintain accountability. But if the "unclear decision" was based on patterns the AI learned from human traders who had been making similar unclear decisions for decades, where does the liability actually rest?
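To make that concrete, here's a deliberately tiny toy sketch. The pattern names and profit numbers are entirely made up, and this is nothing like a real trading system; the point is only that the objective never mentions manipulation or intent, so if a manipulation-like pattern happened to pay off in the historical data, it floats straight to the top.

```python
from collections import defaultdict

# Hypothetical training log of (pattern, profit). One of the "profitable"
# patterns is the kind of coordinated behavior a regulator would flag.
history = [
    ("momentum_follow", 1.2),
    ("spoof_like_layering", 3.5),  # looked a lot like manipulation, but it paid
    ("spoof_like_layering", 2.8),
    ("mean_reversion", 0.4),
]

profits = defaultdict(list)
for pattern, profit in history:
    profits[pattern].append(profit)

# The "learned policy": rank patterns purely by average historical profit.
ranked = sorted(profits, key=lambda p: sum(profits[p]) / len(profits[p]), reverse=True)
print(ranked[0])  # -> "spoof_like_layering": the objective never asked about intent
```

Nothing in that scoring rule is malicious; it simply has no concept of legality to trade off against profit.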
Content Moderation
Social media platforms use AI to moderate content, but these systems are trained on human moderation decisions that are notoriously inconsistent. Human moderators make different calls on identical content based on their mood, cultural background, and personal politics.
When AI content moderation makes inconsistent decisions, we blame algorithmic bias. But the AI learned inconsistency from human moderators who were inconsistent first. How is AI expected to solve human inconsistency while being trained exclusively on examples of that inconsistency?
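A toy illustration of what that looks like, with invented labels: if the exact same post gets different calls from different human moderators, the best a model fit to those labels can do is learn the disagreement itself.

```python
from collections import Counter

# Hypothetical: five human moderators ruling on the exact same post.
labels = ["remove", "keep", "remove", "keep", "keep"]

counts = Counter(labels)
p_remove = counts["remove"] / len(labels)
print(f"learned P(remove | this post) = {p_remove:.2f}")
# -> 0.40: the "algorithmic" inconsistency is just the human inconsistency, averaged.
```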
The Deepfake Money Transfer
In January 2024, an employee at a Hong Kong firm transferred $25 million to fraudsters after being instructed to do so on a video call by what appeared to be the company's CFO, surrounded by familiar colleagues. The other participants on the call were AI-generated deepfakes. A human fell for AI-generated deception, and the money was gone.
But consider the reverse scenario: what if an AI system had been making that transfer decision? If the AI had been trained on patterns from legitimate internal communications, including urgent transfer requests from executives, would it have been any less likely to fall for the same trick? And if the AI had made that transfer, who would be liable for the $25 million loss?
The Training Data Paradox
Here's where things get messy right from the start. Most AI systems learn from massive datasets scraped from the internet, books, academic papers, and every corner of human knowledge we've digitized. The problem? A significant chunk of that training data is ethically questionable at best, and downright problematic at worst.
Consider this: if an AI model is trained on biased hiring practices documented across thousands of company records, it might learn to reproduce that discrimination. If it's fed financial advice from con artists alongside legitimate experts, it absorbs both perspectives as equally valid. The model doesn't know the difference between ethical guidance and harmful manipulation; it just sees patterns in text.
But here's the kicker: we expect AI to behave ethically even when we've essentially raised it on a diet of human inconsistency and moral failures. It's like telling a child "do as I say, not as I do" while simultaneously feeding them examples of everything we've done wrong as a species. The hypocrisy is kind of baffling when you really think about it.
It’s my fault?
This brings us to the central question: who's actually responsible when AI causes harm? The developers who built the system? The company that deployed it? The humans who created the training data? The users who relied on it? Or society as a whole for creating the conditions that made this harm inevitable?
Maybe this is my way of circumventing this monstrous question, but I think we're asking the wrong question entirely. Instead of "who's liable," maybe we should be asking "how do we create systems that are more ethical than the humans who built them?" This is a much harder problem because it requires AI to somehow transcend its training data and exhibit moral reasoning that exceeds human performance.
But can a system be more ethical than its creators? Can AI develop moral intuitions that go beyond what it learned from human examples? And if AI does develop an independent intuition, won't all of our flaws come along with it? There is no morality without immorality, no truth without lies, and so on and so forth.
I’m Wrong?
But wait, we’re not quite done yet. There's another layer to this: how is AI supposed to know when it's doing something wrong? Humans have intuition, emotion, and social feedback to guide moral decisions. We can feel guilt, empathy, and social pressure. AI has none of this. It only has patterns in data and optimization toward goals we've defined.
If we train an AI on customer service interactions where human representatives routinely mislead customers to close sales, the AI learns that this behavior is successful and normal. It has no way to understand that this behavior is ethically problematic unless we explicitly program that understanding. And if we do program ethical constraints, we're essentially admitting that our training data teaches bad behavior.
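Here's a toy sketch of that tension, with made-up replies and numbers: if the only training signal is whether the sale closed, the misleading replies win, and the only way to change the outcome is to bolt on an explicit penalty that the data itself never taught.

```python
# Hypothetical customer-service logs: reply strategy, observed close rate,
# and whether the reply misled the customer.
interactions = [
    {"reply": "honest_answer",      "sale_closed": 0.3, "misleading": False},
    {"reply": "overstated_claim",   "sale_closed": 0.7, "misleading": True},
    {"reply": "hidden_fee_omitted", "sale_closed": 0.8, "misleading": True},
]

# A hand-tuned constant: effectively an admission that the data alone
# teaches the wrong behavior.
ETHICS_PENALTY = 1.0

def score(x, penalize=False):
    return x["sale_closed"] - (ETHICS_PENALTY if penalize and x["misleading"] else 0.0)

print(max(interactions, key=score)["reply"])
# -> "hidden_fee_omitted": the data says misleading works
print(max(interactions, key=lambda x: score(x, penalize=True))["reply"])
# -> "honest_answer": only because we imposed a constraint from outside
```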
This creates a fundamental tension: we want AI to be intelligent enough to be useful but somehow not intelligent enough to learn from the worst aspects of human behavior that inevitably appear in training data. We're essentially asking for systems sophisticated enough to understand human complexities while remaining naive enough to ignore the moral failures embedded within that same complexity.
A Solution?
To be honest, I have no idea how to go about finding a solution, but I do have an idea of the questions we need to answer. We need to decide whether we want AI systems that accurately reflect human behavior (including our lies, biases, and harmful decisions) or systems that are held to a higher standard than the data they learn from. To me it seems like, right now, we have to choose between better performance and better safety. Either way, these are fundamentally different goals that require different approaches to development and liability.
Or…
Maybe the real question isn't whether AI should lie when humans do, but whether we should continue tolerating human lies in contexts where we plan to deploy AI.