Chardet dispute shows how AI will kill software licensing, argues Bruce Perens

Earlier this week, Dan Blanchard, maintainer of a Python character encoding detection library called chardet, released a new version of the library under a new software license. In doing so, he may have killed "copyleft."

Version 7.0 employs an MIT license in place of the previous GNU Lesser General Public License (LGPL). Developers who have an eye on commercial use of their open source works tend to prefer permissive licenses like MIT's because they impose fewer obligations than copyleft licenses like the GPL and LGPL, which require derivative works to be distributed under the same terms.

Blanchard says he was in the clear to change licenses because he used AI – Anthropic's Claude is now listed as a project contributor – to make what amounts to a clean room implementation of chardet. That's essentially a rewrite done without copying the original code – though it's unclear whether Claude ingested chardet's code during training and, if it did, whether Claude's output reproduced that training data.

An individual claiming to be Mark Pilgrim, the original creator of the library, opened an issue in the project's GitHub repo arguing that Blanchard had no right to change the software license, citing the LGPL requirement that the license remain unchanged.

"Licensed code, when modified, must be released under the same LGPL license," said the individual posting under Pilgrim's name. "[The maintainers'] claim that it is a 'complete rewrite' is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a 'clean room' implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights."

Blanchard disagreed, citing how versions 7.0.0 and 6.0.0 compare when subjected to JPlag, a tool for detecting plagiarism. "Version 7.0.0 is qualitatively different," he wrote. "Its max similarity against every prior version is under 1.3 percent.
No file in the 7.0.0 codebase structurally resembles any file from any prior release. The matched tokens are common Python patterns that appear in any project: argparse boilerplate, dict literals, import blocks. This is not a case of 'rewrote most of it but carried some files forward.' Nothing was carried forward."

Blanchard told The Register he had wanted to get chardet added to the Python standard library for more than a decade, since it's a core dependency of many Python projects. "After years of maintaining chardet primarily by myself, it became clear to me that there were a few obstacles standing in my way: its license, its speed, and its accuracy," he said in an email. "The primary thing that had prevented me from accomplishing these goals in the past was time. Claude gave me the ability to accomplish what I wanted in roughly five days.

"And the result is a 48x increase in detection speed for a project that lives in the hot loops of many projects. That will lead to noticeable performance increases for literally millions of users (the package gets ~130M downloads per month). And it paves a path towards inclusion in the standard library (assuming they don't institute policies against using AI tools)."

Blanchard said his primary goal has always been to improve the lives of as many Python developers and users as possible, not to start a turf war over licensing. "I was just trying to accomplish those goals with the tools and time I had available. I have never been paid to work on chardet, and have a full-time job on top of all this," he said. "Software licensing and the laws around it haven't been tested a lot in this new world of AI-assisted development, and as a longtime open source developer, I am also curious about how this is all going to shake out."

Who writes software now?

Armin Ronacher, creator of Flask and a long-time open source developer, said in a blog post that he welcomed the license change, as he had wanted it for years.
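For context on Blanchard's JPlag numbers: tools like JPlag compare token sequences rather than raw text, which is why common Python patterns such as argparse boilerplate still register as small matches. The toy sketch below (using only Python's standard library, and not JPlag's actual algorithm) shows the principle: token-level comparison ignores identifier names entirely and measures structure.

```python
import io
import tokenize
from difflib import SequenceMatcher

def token_types(source: str) -> list:
    """Reduce Python source to its sequence of token types,
    discarding identifier names, literal values, and comments."""
    readline = io.StringIO(source).readline
    return [tok.type for tok in tokenize.generate_tokens(readline)
            if tok.type not in (tokenize.COMMENT, tokenize.NL)]

def similarity(a: str, b: str) -> float:
    """Fraction of the two token-type sequences that match (0.0 to 1.0)."""
    return SequenceMatcher(None, token_types(a), token_types(b)).ratio()

# Renaming every identifier leaves the token structure identical...
original = "def detect(data):\n    return scan(data)\n"
renamed  = "def guess(raw):\n    return probe(raw)\n"
print(similarity(original, renamed))   # 1.0

# ...while structurally different code scores much lower.
unrelated = "x = [1, 2, 3]\nprint(sum(x))\n"
print(similarity(original, unrelated) < 1.0)   # True
```

JPlag itself uses more sophisticated matching over normalized token streams, but the point is the same: what it measures is structure, not names, so a sub-1.3 percent maximum similarity is a claim about code shape, not just renamed variables.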
"What I think is more interesting about this question is the consequences of where we are," Ronacher said. "Copyleft code like the GPL heavily depends on copyrights and friction to enforce it. But because it's fundamentally in the open, with or without tests, you can trivially rewrite it these days."

The use of AI raises questions about what level of human involvement is required to copyright AI-assisted code. The US Supreme Court recently refused to reconsider Thaler v. Perlmutter, in which the plaintiff sought to overturn a lower court decision that he could not copyright an AI-generated image. This is an area of ongoing concern among defenders of copyleft because many open source projects incorporate some level of AI assistance. It's unclear how much AI involvement in coding would dilute the human contribution to the extent that a court would disallow a copyright claim.

Zoë Kooyman, executive director of the Free Software Foundation, told The Register in an email, "We can't comment on the specifics or legality of this particular project without doing additional research or consulting lawyers, but there is nothing 'clean' about a Large Language Model (LLM) which has ingested the code it is being asked to reimplement.

"As far as the intention of the GPL goes, a permissive license is still technically a free software license, but undermining copyleft is a serious act. Refusing to grant others the rights you yourself received as a user is highly [antisocial], no matter what method you use. Now more than ever, with people exploring new ways of circumventing copyright through machine learning, we need to protect the code that preserves user freedom. Free software relies on user and development communities who strongly support copyleft. Experience has shown that it's our strongest defense against similar efforts to undermine user freedom."

Ronacher argues that there will be huge consequences now that AI models have become so capable with code.
"When the cost of generating code goes down that much, and we can re-implement it from test suites alone, what does that mean for the future of software?" he mused.

'The entire economics of software development are dead'

Bruce Perens, who wrote the original Open Source Definition, has broader concerns about the entire software industry. "I'm breaking the glass and pulling the fire alarm!" he told The Register in an email. "The entire economics of software development are dead, gone, over, kaput!

"In a different world, the issue of software and AI would be dealt with by legislators and courts that understand that all AI training is copying and all AI output is copying. That's the world I might like, but not the world we got. The horse is out of the barn and can't be put back. So, what do we do with the world we got?"

Perens said that this week he wanted to create a version of an existing site-reliability-engineering (SRE) platform on a different platform, in a different language, under a different license. He'd spent about ten days building a platform to assist the AI in the development, and paired that with software to check the AI's output for errors.

"I then told the AI to look at the promotional materials and documentation of existing SRE platforms, and make one up," he said. "A moment later, it was there. I felt like a magician saying 'abracadabra' and having it really work! I am the Harry Potter of software!"

Perens said the economic consequences of this are startling. "Proprietary software and Open Source both get their entire paradigms changed," he explained. "Many proprietary companies will not survive, if they depend on holding close something that can be cloned as easily as I did. The ones that are not so easily repeated would probably go on."

Perens expects that niche software that was previously not economical to produce will proliferate. "Robots will be real, and there finally will be one that picks up your room for you," he said.
"Their hardware will drive profitable companies, but we might eventually see their reasoning platforms go open. What happens to software licensing? Proprietary licensing seems almost irrelevant. Open Source licensing may well be so too, since every Open Source program is AI training.

"I wonder if knowledge got to a critical mass, and this is the inflection point where all of the processes around it change. We have been there before, for example when the printing press happened and resulted in copyright law, and when the scientific method proliferated and suddenly there was a logical structure for the accumulation of knowledge. I think this one is just as large.

"The social changes coming are just as big, and as we can see already, even more scary." ®