Colombia's top criminal court cited AI detectors to reject a lawyer's appeal. An attorney then ran the court's ruling through the same software and got a 93% match.
The Supreme Court of Colombia denied a cassation appeal, arguing that it was generated by AI. But the same tool the court used to determine the appeal's purported AI origins said that the court's own ruling also received generative help.
Is it a double standard by the court, or faulty tools at play?
“Faced with a well-founded suspicion that the brief submitted by the attorney had not been drafted by the legal professional himself, the court submitted the text to the Winston AI tool,” the court argued. “Its analysis indicated that the document contained only 7% human content, evidencing a marked influence of automated writing and leading to the conclusion that it had been produced using artificial intelligence.”
After running the analysis with other tools that provided similar results, the court ruled that “since the filing cannot be regarded as a duly submitted pleading, its dismissal as inadmissible is required.” But when the court’s ruling faced similar scrutiny from legal experts, it showed similar results.
Within hours of the court posting a thread about the decision on X, lawyers began running their own tests. Velasquez's post went viral in legal circles, accumulating tens of thousands of views.
When GPTZero scanned only the opening words of the court text, it returned a 100% AI result.

When the same tool processed a longer version including the factual background section, it reversed course entirely: 100% human.
The tool is simply not reliable enough to be trusted in court or in situations that would require a high degree of certainty.

Colombian attorneys reacted quickly with their own experiments. Criminal defense lawyer and lecturer Andres F. Arango G submitted a court filing from 2019, years before the large language models these tools were trained to detect even existed, and it came back claiming 95% AI generation.
The technical reasons for these failures are well-documented. AI detectors measure statistical patterns: sentence length, vocabulary predictability, and a quality that researchers call "burstiness," which refers to the natural rhythm variation humans introduce in their writing.
The problem is that formal legal prose, academic writing, and texts produced by people who write in a second language share many of those same statistical signatures.
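To make the idea concrete, here is a minimal Python sketch of two of those signals. It is a toy illustration only, not how Winston AI, GPTZero, or any commercial detector works: those products generally rely on language-model scoring that is not reproduced here. The function names, the two sample texts, and the choice of the coefficient of variation as a stand-in for "burstiness" are all assumptions made for this example.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split on ., ! and ? and return the word count of each non-empty sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Assumed proxy: standard deviation of sentence length divided by the mean.

    0.0 means every sentence has the same length; larger values mean more variation.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def type_token_ratio(text: str) -> float:
    """Share of distinct words among all words, a crude measure of repetitiveness."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


# Hypothetical samples written for this sketch, not taken from any real filing.
formal = (
    "The court finds that the appeal is inadmissible. "
    "The court finds that the brief is deficient. "
    "The court finds that the filing is rejected."
)
casual = (
    "No. "
    "I read the whole brief twice last night and honestly could not find the argument. "
    "Weird, right?"
)

for label, sample in (("formal", formal), ("casual", casual)):
    print(label, round(burstiness(sample), 2), round(type_token_ratio(sample), 2))
```

In this sketch, the formulaic legal-style sample scores 0.0 on the burstiness proxy because all three sentences are the same length, and it reuses far more of its vocabulary than the casual sample. A detector leaning on signals like these would treat the human-written formal text as the more suspicious of the two.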
Studies on AI detection
A 2023 study published in Patterns found that more than 61% of Test of English as a Foreign Language (TOEFL) essays by non-native English speakers were incorrectly flagged as AI-generated.
Even OpenAI withdrew its own AI detection tool in 2023, citing its low rate of accuracy.
Universities have been grappling with this for years. Vanderbilt disabled Turnitin's AI detector in 2023 after estimating it would generate around 3,000 false positives annually.
The University of Arizona dropped AI-detection features from its plagiarism software after a student lost 20% of a grade on a false positive. A 2024 case at UC Davis saw 17 linguistics students flagged, 15 of them non-native English speakers.
The pattern is consistent. The tools penalize the people who write most formally, most repetitively, or most carefully, exactly the profile that lawyers, academics, and second-language speakers fit.
The cultural fallout has bordered on absurdity. Across writing and journalism circles, people have started avoiding em dashes in their work, not because of any style guide, but because AI language models use them frequently and detection tools (and people) have taken notice. Writers are self-editing natural punctuation out of fear of algorithmic suspicion. Beyond the written world, artists have suffered the wrath of moderators and colleagues for making art pieces that look AI-generated.
Colombia’s judicial branch adopted formal guidelines in December 2024 that regulate how judges and court staff can use artificial intelligence.
The rules allow AI to be used freely for administrative and support tasks, such as drafting emails, organizing agendas, translating documents, or summarizing texts, while permitting more sensitive uses, like legal research or drafting procedural documents, only with careful human review.
The guidelines explicitly prohibit relying on AI to evaluate evidence, interpret the law, or make judicial decisions, emphasizing that human judges remain fully responsible for all rulings and must disclose when AI tools were used in preparing judicial materials.
These guidelines could be used to contest such a decision. The Supreme Court has not yet issued any additional statement in response to the backlash over its choice of detection tools. The ruling didn’t have em dashes, either.







