Quartz sync: Jan 19, 2024, 10:50 PM

bfahrenfort 2024-01-19 22:50:29 -06:00
parent 7f9311cd6f
commit eb2751c54b
5 changed files with 40 additions and 16 deletions

@ -1,5 +1,5 @@
---
title: Why Generative AI is Just Copyright Infringement In a Trench Coat
title: Generative AI is Copyright Infringement In a Trench Coat
tags:
- essay
- seedling
@ -9,37 +9,52 @@ tags:
date: 2023-11-04
draft: true
---
One ticket to the original, authorized, or in the alternative, properly licensed audiovisual work, please!
*A film roll clatters to the ground from underneath a suspiciously camera-shaped bulge in the figure's oversized trench coat.*
> [!info] I'm looking for discourse!
> Critique my points and make your own arguments. That's what the comments section is for.
Quick reiteration: **This site contains my own opinion in a personal capacity, and is not legal advice, nor is it representative of anyone else's opinion.**
- Also a reminder that I won't permit inputting my work in whole or part into an LLM.
I've seen a few news articles and opinion pieces recently that support training generative AI and LLMs on the broader internet as well as more traditional copyrighted works, without respect to the copyright holders for all of the above. There are some common themes I'd like to address right now, but I'll add more in future.
## Prerequisite: why these arguments are popping up
## Prerequisite: existing precedent
Fair warning, this section is going to be the most law-heavy. The field is notoriously paywalled, but I'll try to link to publicly available versions of my sources whenever possible.
Don't criticize my sources in this section unless a case has been overruled or a statute has been repealed (i.e., I **can't** rely on it). This is my interpretation of what's here (also again not legal advice or a professional opinion).
I've seen a few news articles and opinion pieces recently that support training generative AI and LLMs on the broader internet as well as more traditional copyrighted works, without respect to the copyright holders for all of the above. For now, this will be less of a response to any one article and more of a collection of points of consideration that tie together common threads in public perception. I intend for this to become comprehensive.
My opinion here boils down to three main points:
- Training a generative AI model on copyrightable subject matter without authorization is copyright infringement (and the proprietors of the model should be responsible);
- Generating output with a generative AI model whose weights were derived from copyrightable subject matter trained on without authorization is copyright infringement (and the proprietors and users of the model should be jointly responsible); and
- Fair use is not a defense to either of the above infringements.
## Prologue: why these arguments are popping up
WIP
## The Legal Argument
Fair warning, this section is going to be the most law-heavy, and probably pretty tech-heavy too. Feel free to skip [[#The First Amendment and the "Right to Read"|-> straight to the policy debates.]] The field is notoriously paywalled, but I'll try to link to publicly available versions of my sources whenever possible.
Please don't criticize my sources in this section unless a case has been overruled or a statute has been repealed (i.e., I **can't** rely on it). This is my interpretation of what's here (also again not legal advice or a professional opinion). Whether a case is binding on you personally doesn't weigh in on whether its holding is the nationally accepted view.
For all of the below analysis, assume that the hypothetical model in question has been trained on some work which has a US copyright registered with the original author.
### Training
Everything AI starts with a dataset.
The core tenet of copyright is that the doctrine protects original expression, meaning you can't copyright facts. One common legal argument against training as infringement is that the AI extracts facts, not the author's creativity, from a work. But that position assumes that the AI is capable of first differentiating facts and art, and further separating them in a way analogous to the human mind's. First, let's talk about The Chinese Room.
[The Chinese Room](https://plato.stanford.edu/entries/chinese-room/) is a philosophical exercise authored by John Searle where the (in context, American) subject is locked in a room and receives symbols in Chinese slipped under the door. A computer program tells the subject what Chinese outputs to send back out under the door based on patterns and combinations of the input. The subject does not understand Chinese. Yet, to an observer of Searle's room, it **appears** as if whoever is inside has a firm understanding of the language.
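For the tech-minded, here's a minimal sketch of the room as a program. Everything in it is an assumption for illustration: the two-entry `RULEBOOK`, the `FALLBACK` reply, and the `room_operator` name are hypothetical, and a real system would be far larger. The point it shows is only that the operator matches symbols against a table and never consults their meaning.

```python
# Hypothetical rulebook: input symbol strings mapped to output symbol strings.
# The operator never needs to know what any of these symbols mean.
RULEBOOK = {
    "你好": "你好！",
    "你会说中文吗？": "会，我说得很好。",
}

# Hypothetical default reply when no rule matches the input.
FALLBACK = "请再说一遍。"


def room_operator(symbols: str) -> str:
    """Return whatever the rulebook dictates for the given symbols."""
    return RULEBOOK.get(symbols, FALLBACK)


if __name__ == "__main__":
    # From outside the door, the reply looks fluent.
    print(room_operator("你好"))
```

Swapping the table for a statistical model changes how the rules are stored, not who (if anyone) understands them, which is the intuition the exercise is after.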
Searle's exercise was at the time an extension of the Turing test designed to refute the theory of "Strong AI." At the time that theory was well-named, but today the AI it was talking about is not even considered AI by most. Strong AI was the theory that a computer could be programmed to However, it can be easily applied to many other programming fields—notably compiler design—with the most pertinent here being natural language processing. To distinguish
- Note that some computer science sources like [IBM](https://www.ibm.com/topics/strong-ai) have taken to using Strong AI to denote AGI, which was only a sufficient, not necessary, quality of a philosophical "intelligent" intelligence.
- The idea and expression being the same may give rise to some claims of merger doctrine; that is, the idea merges with the expression, so it is not copyrightable. That would not be a correct reading of merger doctrine. [Ets-Hokin v. Skyy Spirits, Inc.](https://casetext.com/case/ets-hokin-v-skyy-spirits-inc) makes it clear that the doctrine is more about disregarding the types of works that are low-expressivity by default, and that this "merge" is just a nice name to remember the actual test by. Confusing name, easy doctrine.
### Generation
### Fair Use
#### Detour: actual harm caused by specific uses of AI models
My bet for a strong factor when courts start applying fair use tests to AI output is whether the use in the instant case causes harm. Here's a quick list of uses that probably do cause harm.
This section invalidates a lot of different arguments, but some have other nuances that are more reflective of potential shifts in the law and what enforcing law on the internet really means. Policy debates are always good, so I'll still go into those.
## The First Amendment and the "Right to Read"
WIP
## Putting your work "out there" on the internet
WIP
Artist's will, don't exploit
### Detour: plagiarism
There's also the problem of correctly sourcing information used in forming an opinion.
@ -48,4 +63,7 @@ One proposed "solution" to AI use of copyrighted works is interestingly to attri
WIP
## Mini-arguments
A list of little statements that would cast doubt on the
A list of little statements that would cast doubt on the general legitimacy of the AI boom that I found compelling. Most are spread across the fediverse; others are blog posts/articles.
- [Cartoonist Dorothy's emotional story re: midjourney and exploitation against author intent](https://socel.net/@catandgirl/111766715711043428)
- [Misinformation worries](https://mas.to/@gminks/111768883732550499)

@ -24,6 +24,8 @@ And when using my work, I request:
- That you **never** use my work to train an LLM because its output is then plagiarizing my work (and post-hoc attribution will not remedy that use in this specific case)
However, note that many of the entries here are my personal perspective on the subject matter, so much of this website falls under "my source is I made it up" and needs no attribution.
Others have much more elaborate views on creative or entertaining works, so I will touch on the artist's perspective, but that side of plagiarism is outside of my personal knowledge.
## Where My Views Originate
There are two institutions in my life that have most certainly contributed to my broader position on attribution.
@ -31,15 +33,17 @@ Coming from academia, my culture is very much "show your work." It was always be
The legal field is even more source-mandatory due to the system of precedent, that judges *must* follow the rules set by prior decisions binding on their court. The onus is almost entirely on the advocates to inform the judge what they must do (while arguing what they *should* do). The profession places very little value on original statements as a result. The expression inherent in how you arrange the statements of others, combined with your ability to find favorable statements, is what determines your skill level as an attorney.
There's definitely a gap in my knowledge/views that broadens the more creative or traditionally-considered-artistic the subject matter gets. Copyright absolutely extends to the arts, but what place does attribution have when the purpose is to entertain? One can hardly document one's creative experience when working on a novel, a script, a painting, in the same way a legal brief can be. I do believe in the necessity of personal attribution for those who directly contributed to an artistic work (think the credits section of a movie) for professional reasons, but beyond that, I'm uncertain.
As mentioned above, there's definitely a gap in my knowledge/views that broadens the more creative or traditionally-considered-artistic the subject matter gets. Copyright absolutely extends to the arts, but what place does attribution have when the purpose is to entertain? One can hardly document one's creative experience when working on a novel, a script, a painting, in the same way a legal brief can be. I do believe in the necessity of personal attribution for those who directly contributed to an artistic work (think the credits section of a movie) for professional reasons, but beyond that, I'm uncertain.
## Digital Gardening and Plagiarism
For digital gardening in particular, attribution is integral to the concept. [[Misc/what-is-a-garden|A digital garden]] is a network, and the culture of the digital garden is to provide paths out of the current webpage to others on the same site or even to other websites. These associations between webpages make up a comprehensive experience that differs from modern web use (Google search, click, close the tab) and looks more like Wikipedia spelunking.
Thus, the true value of attribution in a digital garden is mostly in the link itself rather than the substance of the current page or the linked page.
Thus, the true value of attribution in a digital garden is mostly in the link itself rather than the substance of the current page or the linked page. This does not discount the importance of linking to those resources, though.
## To-be-written
I want to address piece-by-piece [an argument by Brian Frye](https://www.techdirt.com/2024/01/09/plagiarism-is-fine/) supporting plagiarism in general. He's a prolific IP scholar, so I'll probably look through his academic works as well (*Against Creativity*, 11 N.Y.U. J.L. & Liberty 426 (2017), looks pretty interesting). To be clear, I don't want to get into the absolute witch hunt that inspired the linked article, but in the article he reiterates his greater conclusions about attribution to say that ALL plagiarism accusations are silly, which is what I want to respond to.
- Planned topics: granularity, necessity, nature of the work/merit, nature of the work/type of content.
I also want to discuss disrespect of creators' intent for their works and what to label that practice. This applies both to my AI essay and plagiarism talking points.
Anyone who identifies as a "proud plagiarist," this is your notice that I may respond to your opinions, and I will properly attribute you when doing so.
- Readers: **Don't harass anyone I cite, please**. We disagree on the topic, and since all it really bears on is respect and authoritative nature until it goes into copyright infringement territory, there aren't any high stakes.

@ -15,5 +15,5 @@ Not much progress on the more lengthy articles since I'm currently in term paper
- Added page on [[Programs I Like/functional-programming|functional programming]], I'll rant about Monads and Arrows some other time
- Updated the [[Programs I Like/hundred-rabbits|Hundred Rabbits Ecosystem]] (here's their [Nov Update](https://100r.co/site/log.html#nov2023))
## Status Updates
- Finally created a Mastodon account! [@be_far@treehouse.systems](https://social.treehouse.systems/@be_far), go follow me.
- Finally created a Mastodon account! <a rel="me" href="https://social.treehouse.systems/@be_far">@be_far@treehouse.systems</a>, go follow me.
- Got email working for comments!! Go check it out and use the fancy new sign in.

@ -3,6 +3,8 @@ title: About Me
date: 2023-08-23
---
I'm an enthusiast for all things DIY. Hardware or software, if there's a project to be had I will travel far down the rabbit hole to see it completed.
I can be reached in the comments here or on Mastodon <a rel="me" href="https://social.treehouse.systems/@be_far">@be_far@treehouse.systems</a>.
## By Day
I'm a law student aiming to practice in intellectual property litigation. At a high level, this sort of work primarily involves pointing a lot of fingers and trying to force money to change hands. I enjoy the lower levels the most, where attorneys can really sink their teeth into the kind of technical issues that fascinate me.
## By Night

@ -16,7 +16,7 @@ On my little corner of the internet, I document my adventures in tech and compla
# Welcome!
You're on a site called a [[Misc/what-is-a-garden|Digital Garden]]. Here's some info on [[Essays/why-i-garden|Why I Garden]].
This site changes often. Feel free to subscribe to [the RSS feed](/index.xml) for a ping every time I make a new entry. You can also check [Updates](/Updates) for a monthly list of changes.
This site changes often. Feel free to subscribe to [the RSS feed](/index.xml) for a ping every time I make a new entry. You can also check [Updates](/Updates) for a monthly list of changes. I can also be found on <a rel="me" href="https://social.treehouse.systems/@be_far">Mastodon</a>.
> [!question] What can I see here?
> I [[about-me|(me, myself)]] write about:
> - Projects I've undertaken and programs that I've used