Emerging Legal Challenges for Open Source in the Age of AI
Defensibility and AI-Related Legal Issues for Open Source
Open-source software has been called “the most positive and transformative force in the history of IT,” having led to a global sharing of developer knowledge that has improved countless developers’ day-to-day work and their careers over time. It is also an increasingly fraught space as open-source leaders struggle to properly monetize their projects–particularly as well-heeled corporate teams leverage their (and their communities’) work into billion-dollar proprietary commercial projects. Extremely well-known open-source projects have changed their licensing structure, including MongoDB (2018), Redis (2019), Elastic (2021), Grafana (2021), and HashiCorp (2023).
In addition, a potential new challenge to the integrity of open-source communities is generative AI. GenAI is anecdotally proving to be a boon for day-to-day software development via coding assistants, noticeably increasing developer productivity in early studies. However, experts also point out potential threats from GenAI for open-source projects, such as legal liability resulting from the injection of copyrighted, proprietary code (unwittingly or otherwise) via community contributions that use AI trained on potentially leaked materials. Some major AI vendors themselves are pivoting to alternative licenses for their AI models, including Meta’s LLaMa 2, which is “free for research and commercial use,” but whose licensing model confusingly is not open source.
For more perspective on the ongoing legal and IP-related challenges open-source projects face from being co-opted by massive corporations and from developments in GenAI, we spoke with legal expert and CEO of the OSS advocacy group OpenUK, Amanda Brock. Learn more about the issues that software developers face in the age of AI at the DevGuild: Artificial Intelligence event.
Developments in Patent, Trademark, and IP Protections for OSS
Brock, a former professional solicitor, points out potential headwinds for OSS projects in the form of stifling patent protections, which the US Patent and Trademark Office may be looking to entrench further. “[The USPTO is] putting a proposal in place to remove various things–and one of them was the right to object to patents.”
Stronger patents, she argues, could have a chilling effect on innovation as first-movers can argue ownership not simply for projects that developers code up, but for the ideas behind them. “Patents are probably the strongest kind of IP because they give a monopoly on an idea–not the way the idea is iterated. Compared to copyrighted code–you’re not allowed to copy exact code, but you can take that idea and go and build it yourself. A patent is much broader, which is why people have such big concerns about them and whether they really belong in this world of software, full stop.”
Case Study: GNOME Foundation vs. RPI Patent Trolling
Brock cites the case of the Linux organization GNOME Foundation vs. Rothschild Patent Imaging (RPI), in which RPI filed an apparent patent troll case–a lawsuit specifically designed to use patents as cudgels rather than to defend creative property–against GNOME Foundation’s open-source digital photo management tool Shotwell. During the proceedings, RPI perhaps realized that GNOME would not be an easy mark and attempted to withdraw, only to have GNOME countersue and apply to have the patent set aside.
Brock suggests that an important alternative to a contentious patent landscape is developers rallying around defensive patent orgs such as the Open Invention Network, which collectively oversees a huge patent pool with a tacit agreement among its community to not constantly sue each other, and instead offer licenses among themselves to continue supporting innovations.
Software Supply Chain Threats and Liability
Risk may be increasing around software supply chain regulation, due to the increased adoption of open-source software despite a continued lack of understanding of the OSS model on the part of government agencies. Brock observes, “now what we see is governments trying to work out how to manage [software supply chain security], but they're not deep experts...Because of that increased concern, what we are seeing is a need for clarification of where risk is going to sit.”
“And currently risk sits with the end user–the person who chooses to use the software. If they were buying proprietary software they would have to have a license, so they'd have to have a contractual relationship.” For proprietary purchases, end-users can therefore delegate or disclaim some responsibility to the vendors–who presumably have some kind of in-house expertise on the legal nuances of software licenses.
On the other hand, it’s possible to integrate OSS into a business without a formal contract. A startup’s engineering team can, depending on the license, presumably use code from an OSS project more or less freely in compliance with their own company policies. However, in such situations, responsibility and liability end up sitting with end users, who are likely not thinking about those implications. “Open-source licenses are very clear that they are providing the code without liability. They disclaim liability. And governments are looking at that in all sorts of different ways.”
The Current Shape of Regulation
Brock is mildly optimistic that UK and US regulators appear to be tacking in the direction of disclaiming liability for OSS developers, but the picture hasn’t resolved itself yet. “It appears that generally there is an understanding that open-source developers ought not to have that liability. And if it were to shift from the end user making the decision to use the code with full knowledge of how the code has been used and the sort of legal obligations in their sector, it doesn't look like the intention at least is to pass it to the developer, which is great.”
“And in the US, what we're seeing is one instance of this with the White House's consultation around security and shifting responsibility to any commercial entity–any entity taking money which is profiting from the code or taking money for the code.” Brock suggests that legislators’ attention may be focused on how various open-source business models generate revenue. “The Red Hat subscription model [of paid professional services to support working with open-source software] would be subject to [scrutiny] because money changes hands. A support model that other organizations sell–which supply support services for multiple different products because they're leveraging the open source products to make money–would become responsible under such a proposal–though this is just a proposal, and not a new law at this stage.”
The question of where liability sits will likely lead to lively discussions and a significant amount of pushback. Brock explains, “Good lawyers will always have advised both the open-source companies and the communities never to accept any liability in their code. So, even if you are selling services around the code, the code is still likely to be made up of community contributions that you'll have limited control over. Good open source is collaborative–but suddenly, maintainers having to carry all the liability becomes very problematic. So I expect the White House to receive a huge amount of pushback should they try to shift liability that way.”
Regulatory Confusion: Commercial vs. OSS and the Cyber Resilience Act
In Europe, the proposed Cyber Resilience Act (CRA) is already causing European developers concern, due to the way it attempts to separate ‘commercial’ from ‘non-commercial’ OSS projects and contributions (which can come from either individuals acting for themselves or individuals representing a company). “What’s more concerning–they're trying to apply product liability to software, potentially being put onto open-source software. Even if you wanted to do that, I cannot imagine how you could possibly manage the sort of certifications and audits around product liability you would need for open-source projects. It's just impossible at the scale they're talking about.” Brock is concerned that a fundamental lack of understanding of the open source model may be to blame for misplaced policy proposals. “They also suggested that code could be ‘trialed’ and then ‘withdrawn.’ Of course, once you put Open Source out there, you can't ‘take it back.’ It's there in perpetuity.”
A potentially frustrating aspect of OSS regulation is out-of-step legislators who seem to think that regulation should simply be a matter of placing liability on companies that make money, a view that overlooks the complexity of open-source software projects that are attempting, or have attempted, to go a hybrid commercial route. However, Brock doesn’t feel that potentially misplaced liability will necessarily halt or even slow down OSS development. “I think [regulation that incorrectly places all liability on developers] will just force splits, and potentially people going ‘underground,’ but I don't think you're going to stop people from coding, and I don't think you're going to stop people using open-source code. Also, there’s no sense in starting to build code from scratch every time. The reason open source is so successful is that its methodology is the natural way to collaborate in code. Collaboration is the base. Standing on the shoulders of giants is natural.”
Part of the confusion on the part of regulators may be intrinsic to how open-source software, in a way, flies in the face of conventional trademark and IP law. Brock confides, “When I first started working with open source, it took me months to get my head round it. It was the opposite of everything that I had been taught as a lawyer. But now, the world has shifted a bit. So I don't think it's quite so hard now. But the idea of collaboration, of sharing your assets–lawyers are taught to do the opposite. They're taught to protect the company, to keep everything closed down. You have to unlearn your way of thinking to accept collaborative development and innovation–to accept open innovation and open-source principles. So there's that mental shift that needs to happen with public-sector policymakers. But some of those folk are quite old-fashioned.”
Where Open Source and Generative AI Intersect
Brock sets up the issue of generative AI meeting open-source software as a huge issue that requires a great deal of deliberation. “I think it's absolutely fascinating, but I've always said that this issue is perhaps ‘too big’–and it may require significant ethical decisions. One of the things I like about open source is that we don't make ethical decisions, but it also is something that requires global, cross-border regulation, in my view. In an ideal world, we’d have this global community where everyone is ‘caught up.’”
“But I don’t think governments have quite caught up on the topic of AI. We’re seeing UK Prime Minister Rishi Sunak hosting an AI summit this year, which I believe is intended to look at risk in AI, and regulation in AI on a global basis. But I think that with AI and open source, suddenly we are right in the middle of the agenda because we have the code concerns we’ve been discussing here. It’s a complicated situation with many viewpoints. We’ve all seen, for instance, the ‘leaked’ memo from Google about the company having ‘no moat.’ A memo attributed to a Google employee, which obviously wasn’t representative of the whole company. But the picture changed significantly with the ‘leak’ of Meta’s LLM LLaMa, which made its way into the research community, but technically was not open source because it didn’t carry a full-fledged OSS license. And later this year, we saw the launch of LLaMa-2, which was also, as I have argued, not open-source.”
“The net effect was that the open-source community got what it had been missing–an LLM that had cost hundreds of millions to train was suddenly available to people. And from there, we saw a scale and pace of development that was unprecedented. And that same leaked memo talked about how things that take months for an individual corporation to work on get fixed in a week among a larger community.” Brock suggests that communities may make all the difference in the GenAI arms race. “The real change is the development activities that the open-source communities are bringing to AI. Which may be why we don’t hear any more commentary from our mystery Google employee–because if open-source communities now have LLMs and other tools in hand which let them create fixes at a pace that Google and the like are struggling to keep up with, why would people pay those large firms for the commercial version, when they can get a slightly more rough-and-ready one for ‘free?’”
Arms Race: Proprietary AI Foundation Models vs. OSS Models
“But then we have the more-difficult question: How are we going to monetize what we have? And again, you see the open collaborative development of OSS challenging the single-minded corporate development cycle. I’ve heard it described as an ‘arms race.’” Brock points out that traditionally, in conflicts between massive corporate entities and open-source communities, corporations will often resort to restricting access to ideas and technology via patents and the like.
“The reality is that that tool, that piece of code, that data–or in this case, that LLM–has already shifted everything. In this case, it has probably shifted everything in a way that will be remembered in the overall history of technology. Speaking personally, I would prefer to see the major players not try to block and control AI tools, and instead discuss the benefits that transparency can bring.”
Brock points out that LLMs themselves aren’t the only consideration here. “One of the key things to be aware of is that we're looking not just at software–we have to look at the data used to train AI models. So we’re going to need a better understanding of what data is being used, and where. So transparency is really going to matter here. We need to open up that data, but we also have to respect privacy and confidentiality. And it’ll likely be worth looking at where companies are building these things, and which data they're using–whether they’re training on their own data or using data from other sources, public or otherwise.”
How OSS Licenses May Need to Evolve in the Age of Generative AI
Turning to the topic of licensing, Brock notes that while many community organizations are convening throughout the year to consult on the topic, open-source licenses on the whole may need to change to accommodate the effects of AI. “I think that we will see some shift in licensing, but the critical bit to understand is that open-source licenses themselves aren't universal.”
“But the reason we’re looking at a new definition isn't because they don't work for the software, but because there is a lot of nuance here. We need to look at the interaction of the software with the data and more.” Brock suggests that evolution is likely to take place asynchronously as companies continue to move at the speed of their own business needs. “Whilst the consultation is going on, I think we will see licenses being modified by companies and used in a way that's appropriate for them to be able to share some of their assets. I suspect that we will also see more around data cards and how those interface with the licenses.”
How International Regulation May Affect GenAI and Open Source
AI regulation is still in extremely early stages worldwide, and there are notable divergences in policy across different countries and territories. For example, EU regulatory bodies have issued multiple policies such as the Digital Services Act and the AI Act, whereas other nations, such as the US, have been slower to respond, and Japan appears to be taking a more lenient approach. However, Brock is hopeful that at some point, the international regulatory community will find common ground. “I think governments have no choice but to get together on AI. And I'm extraordinarily hopeful that if the idea of open standards ends up in the middle of it, we might see some cohesive approach–not just to AI but also to open source and to that security piece, because they're going to be so intrinsically linked.”
“Let’s look at the Copilot lawsuit–which doesn’t seem to be going anywhere due to a technicality. I think that the copyright should be passed through, though it's very unlikely that regulators are going to give the AI itself copyright. They're going to give the copyright to the person who has asked the question, I think, not even the creator of the algorithm.” Brock suggests that in the new era of AIs that return outputs when prompted, the prompters–those who plug the initial inputs, the questions, into the models–may be the ones to win the copyright in most places.
The Potential Impact on Developers (From a Solicitor’s Perspective)
Brock suggests that some of the doom-and-gloom in AI discourse may be acting as an unhelpful diversion more than anything else. “‘The singularity,’ or whatever you’d like to call it, is a long way off. When we talk about generative AI, what we're really talking about is a calculator. A natural-language calculator that can take information, pass it through a process, and give us an output we may or may not be able to use. And as we think about the many ways we can use such a thing well, there are still many problems with this type of technology.”
“I was chatting with Linux pioneer Greg KH earlier, who suggested that if there are any jobs that AI will take away from coders...those would be the jobs that no one wants to do anyway. As he put it, ‘show me a coder who says they've got nothing to do, or don't have a massive backlog.’ So hopefully, what generative AI does is free up the engineers, the software developers, the coders, to do those other, higher-order jobs their brains are better suited for.”
“And as a lawyer, I should mention that lawyers have all the same concerns. They're all concerned that they're going to lose their income. But for years, we've actually been trying to do the same sort of thing that open-source developers do in terms of reusing code. Lawyers have been trying to reuse templates–to modularize and build document creation systems. My only real worry is: How do new entrants learn the basics to get to the stage where they have the knowhow to properly check the outputs of an AI and discern where the problems lie? There's that piece where you really have to ‘learn by doing.’ And Greg's response was along the lines that ‘there have been other tools that have taken away other tasks, but as usual, it’ll all probably just end out to be the same.’”
“But I do still worry a bit about that, all the same.”