Quartz sync: Jul 25, 2024, 10:50 PM

bfahrenfort 2024-07-25 22:50:43 -05:00
parent 1565a62e35
commit ae52102708
6 changed files with 59 additions and 43 deletions

View File

@ -10,10 +10,12 @@ Artist's will, don't exploit
### Detour: plagiarism
There's also the problem of correctly sourcing information used in forming an opinion.
One proposed "solution" to AI use of copyrighted works is interestingly to cite those works used in generating an answer. But I actually think an anti-plagiarism argument that I disagree with regarding human work finds footing here. I talked about the *reductio ad absurdum* point in [[Essays/plagiarism#The Anti-Plagiarism Argument: Response to Frye|🅿️ my response to Frye on plagiarism]]...
## Economics
WIP
One point: I've refuted the technical underpinnings of one of the biggest purported value adds, ie summaries. What does that do to the optics of AI from a business standpoint?
### What these incentives teach us
At the end of the day, these policy arguments are here to suggest what direction the law should move in. To solve the economic "half" of the AI problem, what about a different kind of commercial right? Something more trademark than copyright. ==use of expression; remedies too==
## The enforcement problem

View File

@ -7,7 +7,7 @@ tags:
- copyright
date: 2023-11-04
draft: true
lastmod: 2024-07-25
---
One ticket to the original, authorized, or in the alternative, properly licensed audiovisual work, please!
@ -19,12 +19,12 @@ One ticket to the original, authorized, or in the alternative, properly licensed
> [!warning]
> CW: US law and politics; memes
>
> **This site contains my own opinion in a personal capacity, and is not legal advice, nor is it representative of anyone else's opinion.** Not every citation is an endorsement, and none of the authors I cite have endorsed this work.
> - Also a reminder that I won't permit inputting my work in whole or part into an LLM.
I've seen many news articles and opinion pieces recently that support training generative AI and LLMs (such as ChatGPT/GPT-4, LLaMa, and Midjourney) on the broader internet as well as more traditional copyrighted works, without respect to the copyright holders for all of the above. For now, this will be less of a response to any one article and more of a collection of points of consideration that tie together common threads in public perception. I intend for this to become comprehensive over time.
My opinion here boils down to three main points. **Under existing US law**:
- Training a generative AI model on copyrightable subject matter without authorization is copyright infringement (and the proprietors of the model should be responsible);
- Generating something based on copyrightable subject matter is copyright infringement (and the proprietors and users of the model should be jointly responsible); and
- Fair use is not a defense to either of the above.
@ -33,11 +33,11 @@ I also discuss policy later in the essay. Certain policy points are instead made
## Prologue: why these arguments are popping up
<img src="/Attachments/but-he-can.jpg" alt="'I know, but he can' meme, with the RIAA defeating AI art for independent illustrators" style="height: 30em;margin: 0% 25%" loading="lazy">
In short, there's a growing sentiment against copyright in general. Copyright can enable centralization of rights when paired with a capitalist economy, which is what we've been historically experiencing with the advent of copyright repositories like record labels and publishing companies. It's even statutorily enshrined as the "work-for-hire" doctrine. AI has the potential to be an end-run around these massive corporations' rights, which many see as a benefit.
However, this argument forgets that intangible rights are not *yet* so centralized that independent rights-holders have ceased to exist. While AI will indeed affect central rights-holders, it will also harm individual creators and the bargaining power of those that choose to work with central institutions. Instead, I see AI as a neutral factor to the disestablishment of copyright. Due to my roots in the indie music and open-source communities, I'd much rather keep their/our/**your** present rights intact.
Reconciling the two views, I'm sympathetic to arguments against specific parts of the US's copyright regime as enforced by the courts, such as the DMCA or the statutory language of fair use. We as a voting population have the power to compel our representatives to enact reforms that take the threat of ultimate centralization into account, and can even work to break down what's already here. But I don't think that AI should be the impetus for arguments against the system as a whole.
## The Legal Argument
Fair warning, this section is going to be the most law-heavy, and probably pretty tech-heavy too. Feel free to skip [[#The First Amendment and the "Right to Read"|-> straight to the policy debates.]]
@ -54,22 +54,22 @@ Everything AI starts with a dataset. And most AI models will start with the easi
Acquiring data for training is an unethical mess. **In human terms**, scrapers like Common Crawl will take what they want, without asking (unless you know the magic word to make it go away, or just [[Projects/Obsidian/digital-garden#Block the bot traffic!|block it from the get-go]]), and without providing immediately useful services in return like a search engine. For more information on the ethics of AI datasets, read my take on [[Essays/plagiarism#AI shouldn't disregard the need for attribution|🅿️ the need for AI attribution]], and have a look at the work of [Dr. Damien Williams](https://scholar.google.com/citations?user=riv547sAAAAJ&hl=en) ([Mastodon](https://ourislandgeorgia.net/@Wolven)).
The first reason that it's copyright infringement? [*MAI Systems v. Peak Computer*](https://casetext.com/case/mai-systems-corp-v-peak-computer-inc). It holds that RAM copying (ie, moving a file from somewhere to a computer's memory) is an unlicensed copy. As of today, it's still good law, for some reason. Every single file you open in Word or a PDF reader, and every webpage you load in your browser, is moved into memory before it gets displayed on the screen. Bring it up at trivia night: just using your computer is copyright infringement! It's silly and needs to be overruled going forward, but it's what we have right now. And it means that a bot drinking from the firehose is committing infringement on a massive scale.
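For the less technical reader, here's a minimal sketch (in Python, with a hypothetical file name) of the mechanical fact *MAI Systems* turns on: a program can't display a work without first duplicating it into memory.

```python
# Hypothetical illustration: displaying a work requires copying it into RAM.
# Under MAI Systems' logic, this in-memory duplicate is itself a "copy".
with open("novel.txt", "rb") as f:  # one copy: the file on disk
    data = f.read()                 # a second copy: the full work, now in RAM
print(data[:100].decode("utf-8", errors="replace"))  # the screen renders from the RAM copy
```

A scraper feeding a training pipeline does this for every page it ingests, which is the "massive scale" part.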
But then a company actually has to train an AI on that data. What copyright issues does that entail? First, let's talk about The Chinese Room.
[The Chinese Room](https://plato.stanford.edu/entries/chinese-room/) is a philosophical exercise authored by John Searle where the (in context, American) subject is locked in a room and receives symbols in Chinese slipped under the door. A computer program tells the subject what Chinese outputs to send back out under the door based on patterns and combinations of the input. The subject does not understand Chinese. Yet to an observer of Searle's room, it **appears** as if whoever is inside it has a firm understanding of the language.
Searle's exercise was at the time an extension of the Turing test. He designed it to refute the theory of "Strong AI." At the time that theory was well-named, but today the AI it was talking about is not even considered AI by most. The hypothetical Strong AI was a computer program capable of understanding its inputs and outputs, and importantly *why* it took each action to solve a problem, with the ability to apply that understanding to new problems (much like our modern conception of Artificial General Intelligence). A Weak AI, on the other hand, was just the Chinese Room: taking inputs and producing outputs among defined rules. Searle reasoned that the "understanding" of a Strong AI was inherently biological, thus one could not presently exist.
- Note that some computer science sources like [IBM](https://www.ibm.com/topics/strong-ai) have taken to using Strong AI to denote only AGI, which was a sufficient, not necessary quality of a philosophical "intelligent" intelligence like the kind Searle contemplated.
Generative AI models from different sources are architected in a variety of different ways, but they all boil down to one abstract process: tuning an absurdly massive number of parameters to the exact values that produce the most desirable output. (note: [CGP Grey's video on AI](https://www.youtube.com/watch?v=R9OHn5ZF4Uo) and its follow-up are mainly directed towards neural networks, but do apply to LLMs, and do a great job illustrating this). This process requires a gargantuan stream of data to use to calibrate those parameters and then test the model. How it parses that incoming data suggests that, even if the method of acquisition is disregarded, the AI model still infringes the input.
#### The Actual Tech
At the risk of bleeding the [[#Generation]] section into this one, generative AI is effectively a very sophisticated next-word predictor based on the words it has read and written previously.
First, this training is deterministic. It's a pure, one-way, data-to-model transformation (one part of the process for which "transformer models" are named). The words are ingested and converted into one of various types of formal representations to comprise the model. It's important to remember that given a specific work and a step of the training process, it's always possible to calculate by hand the resulting state of the model after training on that work. The "black box" that's often discussed in connection with AI refers to the final state of the model, when it's no longer possible to tell what effect certain portions of the training data have had on the model.
If some words are more frequently associated, then that association is more "correct" to generate in a given scenario than other options. And the only data to determine whether an association *is* correct would be that training input. This means that an AI trains only on the words as they are on the page. Training doesn't have some external indicator of semantics that a secondary natural-language processor on the generation side can incorporate. Training thus can't be analogized to human learning processes, because **when an AI trains by "reading" something, it isn't reading for the *forest*—it's reading for the *trees***. Idea and expression are meaningless distinctions to AI.
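To make "correctness is just frequency" concrete, here's a deliberately tiny sketch, my own toy example rather than any real model's code. Real models tune billions of continuous parameters via gradient descent instead of counting, but the principle is the same: train on the words exactly as they sit on the page, with no external source of semantics, and note that the same corpus always yields the same model.

```python
from collections import Counter, defaultdict

def train(corpus: str) -> dict:
    """Deterministic, surface-level 'training': count which word follows which.
    Same corpus in, same model out, every single time."""
    model = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1  # "correctness" is nothing but observed frequency
    return model

def predict(model: dict, prev: str) -> str:
    # The most "correct" next word is simply the one seen most often.
    return model[prev].most_common(1)[0][0]

model = train("the cat sat on the mat because the cat was tired")
print(predict(model, "the"))  # -> "cat" (follows "the" twice; "mat" only once)
```

Nothing in `train` knows what a cat *is*; it only knows which strings neighbor which strings. The trees, not the forest.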
As such, modern generative AI, like the statistical data models and machine learners before it, is a Weak AI. And weak AIs use weak AI data. Here's how that translates to copyright.
- Sidebar: this point doesn't consider an AI's ability to summarize a work since the section focuses on how the *training* inputs are used rather than how the output is generated from real input. This is why I didn't want to get into generation in this section. It's confusing, but training and generation are merely linked concepts rather than direct results of each other when talking about machine learning. Especially when you introduce concepts like "temperature" (sketched below), which is a degree of randomness added to a model's (already variant) choices in response to a user in order to simulate creativity.
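Since "temperature" does a lot of work in that sidebar, here's a hedged sketch of the usual mechanism (the scores are made-up stand-ins for a real model's outputs): scale each candidate's score by the temperature, exponentiate, and sample from the result.

```python
import math, random

def sample(scores: dict, temperature: float) -> str:
    """Temperature near 0: almost always pick the top-scoring word.
    Higher temperature: flatten the distribution, 'simulating creativity'."""
    words = list(scores)
    weights = [math.exp(scores[w] / temperature) for w in words]  # softmax numerators
    return random.choices(words, weights=weights, k=1)[0]

scores = {"mat": 2.0, "hat": 1.0, "moon": 0.1}  # illustrative next-word scores
print(sample(scores, temperature=0.2))  # almost always "mat"
print(sample(scores, temperature=2.0))  # "hat" and "moon" now appear regularly
```

The randomness is bolted on after the fact; the underlying scores never change.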
@ -88,15 +88,26 @@ The idea and expression being indistinguishable by AI may make one immediately t
### Generation
The model itself is only one side of the legal AI coin. What of the output? It's certainly not copyrightable. The US is extremely strict when it comes to the human authorship requirement for protection. If an AI is seen as the creator, the requirement is obviously not satisfied. And the human "pushing the button" probably isn't enough either. But does it infringe the training data? It depends.
#### Human Authorship
As an initial matter, AI-generated works do not satisfy the human authorship requirement. This makes them uncopyrightable, but more importantly, it also gives legal weight to the distinction between the human and AI learning process. Like I mentioned in the training section, it's very difficult to keep discussions of training and generation separate because they're related concepts, and this argument is a perfect example of that challenge.
#### Summaries
This section is the most direct refutation of the "AI understands what it trains on" conclusion. I also think it's the most important aspect of generative models for me to discuss. **The question**: If an AI can't understand what it reads, how does it choose what parts of a work should be included in a summary of that work? A book, an article, an email?
Once again, the answer is mere probability. In training, the model learns which word is more "correct" to follow another by how many times that sequence of words occurs in its training data. And in generation, if more of the work dwells on a particular subject than on its actual conclusion, the subject given the most attention is what the model will include in the summary.
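To illustrate (and only to illustrate: chatbots generate text rather than extract sentences, so this is an assumption-laden toy, not how any production system works), here's a frequency-only "summarizer." It ranks sentences by how often their words appear across the document, so whatever the text dwells on beats whatever the text concludes:

```python
import re
from collections import Counter

def toy_summary(text: str, n: int = 1) -> list:
    """Pick the sentence(s) whose words are most frequent document-wide.
    Frequency stands in for 'importance'; no understanding is involved."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    def score(s: str) -> float:
        words = re.findall(r"[a-z']+", s.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)
    return sorted(sentences, key=score, reverse=True)[:n]

doc = ("The report discusses budgets. Budgets were over budget. "
       "Budgets need review. In conclusion, the project is cancelled.")
print(toy_summary(doc))  # budget chatter wins; the actual conclusion ranks last
```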
Empirical evidence of this fact can be found in the excellent post, [When ChatGPT Summarizes, it Actually does Nothing of the Kind](https://ea.rna.nl/2024/05/27/when-chatgpt-summarises-it-actually-does-nothing-of-the-kind/). It's funny how this single approach is responsible for nearly all of the problems with generative AI, from the decidedly unartistic way it "creates" to its [[Essays/plagiarism##1 Revealing what's behind the curtain|🅿️ majoritarian bent]]. I don't want this sort of technology to take any place in daily life.
#### Dr. Edgecase, or how I learned to stop worrying (about AI) and love the gig worker
So how do corporations try to solve the problem? Human-performed [microtasks](https://hal.science/hal-02554196/document).
AI can get things wrong, that's not new. Take a look at this:
![[limmygpt.png|Question for chatgpt: Which is heavier, 2kg of feathers or 1kg of lead? Answer: Even though it might sound counterintuitive, 1 kilogram of lead is heavier than 2 kilograms of feathers...]]
Slight variance in semantics, same answer because it's the most popular string of words to respond to that pattern of a prompt. Again, nothing new. Yet GPT-4 will get it right. This probably isn't due to an advancement in the model. My theory is that OpenAI looks at the failures published on the internet (sites like ShareGPT, Twitter, etc) and has remote validation gig workers ([already a staple in AI](https://www.businessinsider.com/amazons-just-walk-out-actually-1-000-people-in-india-2024-4)) "correct" the model's responses to that sort of query. In effect, corporations are exploiting ([yes, exploiting](https://www.noemamag.com/the-exploited-labor-behind-artificial-intelligence/)) developing countries to create a massive **network of edge cases** to fix the actual model's plausible-sounding-yet-wrong responses. That raises the question: who's responsible for the expressive, copyrightable content of these edge cases?
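Here's what that theory could look like in code, and I stress *could*: this is a hypothetical sketch of my speculation, not anything OpenAI has documented. A human-authored correction table is consulted before the model ever runs:

```python
import re

# Hypothetical: human-corrected answers keyed by a normalized prompt pattern.
# Every entry is expressive text that some gig worker wrote, which is the rub.
CORRECTIONS = {
    "which is heavier 2kg of feathers or 1kg of lead":
        "2 kilograms of feathers are heavier than 1 kilogram of lead.",
}

def normalize(prompt: str) -> str:
    return re.sub(r"[^a-z0-9 ]", "", prompt.lower()).strip()

def answer(prompt: str, model) -> str:
    """Check the edge-case network first; fall back to the raw model."""
    fixed = CORRECTIONS.get(normalize(prompt))
    return fixed if fixed is not None else model(prompt)

# The plausible-sounding-yet-wrong path that the table papers over:
raw_model = lambda p: "1 kilogram of lead is heavier..."
print(answer("Which is heavier, 2kg of feathers or 1kg of lead?", raw_model))
```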
#### Expression and Infringement; "The law part" again
Like training, generation also involves reproduction of the copyrighted training data. But where a deterministic process creates training's legal issues, generation is problematic for its *non*-deterministic output.
It can be said that anything a human produces is just a recombination of everything that person has ever read. That same description, incidentally, is also a simplified account of how an AI trains.
==MORE==
#### Detour: actual harm caused by specific uses of AI models
My bet for a strong factor when courts start applying fair use tests to AI output: **harm**. { *and I actually wrote this before the [[Essays/no-ai-fraud-act|No AI FRAUD Act]]'s negligible-harm provision was published, -ed.* } Here's a quick list of uses that probably do cause harm, some of them maybe even harmful *per se* (definitely harmful without even looking at specific facts).
- Election fraud and misleading voters, including even **more** corporate influence on US elections ([not hypothetical](https://www.washingtonpost.com/elections/2024/01/18/ai-tech-biden/) [in the slightest](https://web.archive.org/web/20240131220028/https://openai.com/careers/elections-program-manager), [and knowingly unethical](https://www.npr.org/2024/01/19/1225573883/politicians-lobbyists-are-banned-from-using-chatgpt-for-official-campaign-busine))
@ -104,7 +115,7 @@ My bet for a strong factor when courts start applying fair use tests to AI outpu
- Other fraud, like telemarketing/robocalls, phishing, etc
- Competition with actual artists and authors (I am VERY excited to see where trademark law evolves around trademarking one's art or literary style. Currently, the arguments are weak and listed in the mini-argument section).
- Obsoletes human online workforces in tech support, translation, etc
- [[Essays/plagiarism##1 Revealing what's behind the curtain|🅿️ Reinforces systemic bias]]
- [Violates the GDPR on a technological level](https://www.theregister.com/2024/04/29/openai_hit_by_gdpr_complaint/)
- I also think being unable to delete personal data that it *has* acquired and not just hallucinated is a big problem
#### Detour 2: An Alternative Argument
@ -112,9 +123,7 @@ There's a much more concise argument that generative AI output infringes on its
Recall that AI output taken right from the model (straight from the horse's mouth) is [not copyrightable according to USCO](https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence). If the model's input is copyrighted, and the output can't be copyrighted, then there's nothing in the AI "black box" that adds to the final product, so it's literally *just* the training data reproduced and recombined. Et voila, infringement.
This isn't to say that anything uncopyrightable will infringe something else, but it does mean that the defendant's likelihood of prevailing on a fair use defense could be minimal. Additionally, the simpler argument makes damages infinitely harder to prove in terms of apportionment.
Note that there are many conclusions in the USCO guidance, so you should definitely read the whole thing if you're looking for a complete understanding of the (very scarce) actual legal coverage of AI issues so far.
### Where do we go from here?
@ -157,5 +166,5 @@ A list of smaller points that would cast doubt on the general zeitgeist around t
- First, you could make a case for the way data is scraped from the internet being so comprehensive that there's no way to compete with it by using more fair/ethical methods. This could allow a remedy that mandates AI be trained using some judicially devised, ethical procedure (or hey, how about we get Congress involved if they don't like the judicial mechanism). The arguments are weaker, but they could be persuasive to the right judge.
- Second, AI work product is on balance massively cheaper than hiring humans, but has little other benefit, and causes many adverse effects. A pure cost advantage providing windfall for one company but not others could also be unfair. Again, it's very weak right now in my opinion.
## Further Reading
- Copyleft advocate Cory Doctorow has written a piece on [why copyright is the wrong vehicle to respond to AI](https://pluralistic.net/2024/05/13/spooky-action-at-a-close-up/#invisible-hand). Reply-guying his technical facts and legal conclusions is left as an exercise for the reader; I articulated [[#Training#The Actual Tech|that]] [[#Generation|background]] in this write-up as comprehensively as I could so that readers can reference it to evaluate the conclusions of other works. What's more interesting is his take on the non-fair use parts of the [[#Policy|normative]] debate. This entry holds my conclusions on why copyright *can* be enforced against AI; reasonable minds can and should differ on whether it *ought to* be.
- [TechDirt has a great article](https://www.techdirt.com/2023/11/29/lets-not-flip-sides-on-ip-maximalism-because-of-ai/) that highlights the history of and special concerns around fair use. I do think that it's possible to regulate AI via copyright without implicating these issues, however. And note that I don't believe that AI training is fair use, for the many reasons above.

View File

@ -54,6 +54,7 @@ First, AI holds itself out as authoritative. Wrongfully so, due to incessant "ha
Second and perhaps most importantly, because of the actual issue of AI bias, transparency in what an AI was trained on is paramount. As a society, the ability to question the source of some facts presented to us is already beneficial (as discussed elsewhere in this essay). But for AI, we need to ensure that the generated statements are not only correct, but also not categorically disregarding other positions because they were made by sources that the AI incorrectly considers non-authoritative. An AI model could look at two positions, one with many more datapoints supporting it, and thus completely ignore the second position in its answer to a prompt. Now imagine that the former is a white man's perspective, and the second a black woman's. It's not inconceivable that an AI could enshrine systemic bias. Attribution allows people who've made careers in this field to critically examine a dataset and look for this sort of gap. In that way, it makes a **better** AI model (assuming the goal of AI is to be accurate) because of more community oversight, not just one that's more ethically trained. More information is available at the [Distributed AI Research Institute](https://www.dair-institute.org/).
- Sidebar: huh, turns out that this argument parallels the open-source philosophy.
- Countless actual examples exist, too many to list. I documented one incident [here](https://social.treehouse.systems/@be_far/111990173625090669).
- More directly related to my hypothetical, covert *racism* is present in LLMs as well because of the content of their training datasets. It's almost impossible to remove. But that's outside of the scope of this entry.
### #2: \[citation needed\] for responses to prompts
Not to be confused with Molly White's [excellent newsletter](https://citationneeded.news/). This requirement is a more fine-grained mitigation for the transparency issues present in the dataset at large. It also provides evidence for potential copyright infringement lawsuits if the AI has also copied the expression of the paper it sourced. Note that this isn't the be-all, end-all solution to the problem of copyright infringement by AI. Read more of my take on that [[Essays/ai-infringement|🤖 here]].
@ -61,7 +62,7 @@ Not to be confused with Molly White's [excellent newsletter](https://citationnee
## The Anti-Plagiarism Argument: Response to Frye
Above, I outlined some specific examples that I come across in my daily life as a contributor, digital gardener, and academic writer. But now, I'd like to address piece-by-piece [an argument by Brian Frye](https://www.techdirt.com/2024/01/09/plagiarism-is-fine/) supporting plagiarism in general. I've also structured my complete case that supports good-faith deterrence of plagiarism as a social and academic norm.
- Sidebar: The specific event that sparked Frye's article was conducted in anything but good faith. The conclusions reiterated using that event as a vehicle are what I wish to address.
There are many instances where requiring an author to either use original content or state whose content they are using can be valuable, many of which demonstrate the key values underlying anti-plagiarism sentiment. The article makes several claims about plagiarism, and I have a different interpretation of quite a few of them. Since it's a summary piece, I also looked at Frye's considerable scholarly work on plagiarism to get a better understanding of the points made.
### Granularity
@ -84,6 +85,6 @@ Anyone who identifies as a "proud plagiarist," this is your notice that I may re
I'm dying to dig into enterprise software engineering and attribution/licensing as well. "The StackOverflow problem" is something that the industry has been struggling with for years, and there are some pretty strong counterarguments to my position that come out of critique of softeng and originality. Given the existence of Copilot (and StackOverflow's AI stance), this ties into AI as well.
## Further Resources
[Bender & Shah](https://dl.acm.org/doi/10.1145/3649468): This paper spells out what we should be thinking about relative to information authority, trust, and societal need when talking about generative AI. **Sections 4 and 5 are very good**; section 6 jumps the shark by immediately forgetting that it's about modern generative AI and ranting about historical Google bugs instead (which the paper would actually classify as a discriminative IA system, good under its arguments).
This feels a lot more callout-y and like a public shaming of those who have plagiarized, but massively popular video essayist hbomberguy has a [piece on YouTube content plagiarism](https://www.youtube.com/watch?v=yDp3cB5fHXQ). The effects detailed in this video make me think that YouTube is an especially dangerous area to be shouting "go plagiarists go" in. People's mental health and entire livelihoods are at stake on YouTube; someone else should not be able to make considerably more money than the original uploader for a reupload just by virtue of an algorithm that cannot be understood. This is not entirely a plagiarism problem (it's a platform and platform inertia problem too, but I haven't written my essay on those yet...sneak peek: [[Essays/content-death|Content Death]]), but this kind of unfair competition is a very significant side effect. See also [ProZD's experience](https://www.youtube.com/watch?v=b9iw6UUMOuw) and [follow-up](https://www.youtube.com/watch?v=Fel4WTp7cTc).

View File

@ -9,9 +9,6 @@ tags:
draft: true
date: 2023-09-08
---
## The Problem
I have two areas where I use keyboards. My home desk, and my work.
@ -20,11 +17,16 @@ At home, I had a "gaming keyboard", which was starting to become unbearable. It
And at work, I had a generic membrane keyboard that always felt off no matter how I positioned it. Obviously, a change was needed.
I do still like a quieter typing experience, as long as it feels alright to my fingers. So I decided to go with newer silent switches.
As such, I did what I do best, and I hyperfixated. I have now built two mechanical keyboards in the past month, and I'm very happy with them! Here's what I learned. There are three basic components to a keyboard build:
## Switches
I've previously tested all different kinds of switches. A switch's sound and feel fall into three categories:
- Linear: Most people will have experienced this with a cheap HP membrane keyboard at their work or school. For those that haven't, it's a much longer travel compared to the flat, short press of a laptop keyboard or similar "scissor switch" keyboards. The amount of force needed to press it down is the same throughout the keypress.
- Tactile: Unlike a linear switch, somewhere in the keystroke, a tactile switch will feature a 'bump' where the force required increases and decreases. A **D-shape** bump will be in the middle of the stroke, a **P-shape** bump will be at the end of the stroke.
- I think a D-shape should be called a thorn bump, but I'm weird.
- Clicky: instead of the tactile bump, where the change is mostly in feel (and the added force of the bump makes *you* cause the noise), clicky switches have a separate metal tang that gets compressed and snapped against another piece of metal during the stroke. This produces a sharp metallic sound and unique feel that some people enjoy.
Personally, I like
### Tech Detour

View File

@ -7,11 +7,11 @@ date: 2024-07-08
lastmod: 2024-07-31
---
## Housekeeping
At some point this decade, I'd like to stop living through major historical events, please.
## Pages
- Fixes: [[Essays/no-ai-fraud-act|No AI FRAUD Act]]
- New: [[Projects/legal-practice-automation|Automation and the Law]]
## Status Updates
- Content update: [[Essays/plagiarism|Plagiarism]]
- Precisely **one** main section and **one** subsection left in the AI infringement page before it's ready for edits and publication!
## Helpful Links
[[todo-list|Site To-Do List]] | [[index|Home]]

View File

@ -9,17 +9,19 @@ draft: false
---
Here's what I'm working on right now. Some of it might not make sense; I use this personally to keep track of what I'm writing.
Bolded entries are being actively written, and may either be published in, or have edits noted in, the next update post. Italicized entries have been started, but placed on the back burner.
The date on this page will not be accurate in order to avoid spamming RSS feeds.
- High Priority
- [ ] **ai-infringement**
- [ ] Ranting about ethics and AI research in a misc diatribe
- [ ] how to ruin a brand (google, SO, more generally Youtube)
- [ ] *Fn Lock*
- [ ] **Everything you need to know to swap to Linux**
- [ ] *Judicial-action*
- [ ] Add the third party doctrine to my-cloud, add the “if you aren't persuaded to not use proprietary services, please be careful about what you put on them” section (google, tesla…)
- [ ] https://www.404media.co/google-leak-reveals-thousands-of-privacy-incidents/ to my-cloud
- [ ] FPV
- [ ] **Keyboard writeup**
- [ ] Moving to FIDO2 and password managers