From 661673e9c695f8988c5aa551f2c90f1c9a8f1b4a Mon Sep 17 00:00:00 2001 From: bfahrenfort Date: Sat, 2 Nov 2024 23:36:40 +1100 Subject: [PATCH] Quartz sync: Nov 2, 2024, 11:36 PM --- content/{Dict => Atomic}/BSD.md | 0 content/{Dict => Atomic}/friction.md | 0 content/Atomic/gen-ai.md | 65 ++++++++++++++++ content/{Dict => Atomic}/index.md | 2 +- content/{Dict => Atomic}/integrity.md | 0 content/{Dict => Atomic}/linux-isms.md | 0 content/{Dict => Atomic}/lsat.md | 0 content/{Dict => Atomic}/resistance.md | 0 content/{Dict => Atomic}/shell.md | 0 content/{Dict => Atomic}/symlink.md | 0 content/{Dict => Atomic}/what-is-a-garden.md | 0 content/Essays/ai-infringement.md | 2 +- content/Essays/law-school.md | 4 +- content/Essays/normative-ai.md | 81 ++++++++++++++++++++ content/Essays/plagiarism.md | 2 +- content/Misc/ai-integrity.md | 2 +- content/Misc/ai-prologue.md | 23 ++++++ content/Misc/generation-copyright.md | 56 ++++++++++++++ content/Misc/training-copyright.md | 39 ++++++++++ content/Programs I Like/code-editors.md | 4 +- content/Programs I Like/terminals.md | 2 +- content/Projects/Obsidian/home.md | 2 +- content/Projects/my-computer.md | 2 +- content/Projects/nvidia-linux.md | 63 ++++++++------- content/Resources/copyright.md | 20 +++++ content/Resources/learning-linux.md | 4 +- content/Updates/2024/nov.md | 26 +++++++ content/Updates/2024/sept.md | 2 +- content/bookmarks.md | 3 + content/index.md | 2 +- content/todo-list.md | 1 - 31 files changed, 363 insertions(+), 44 deletions(-) rename content/{Dict => Atomic}/BSD.md (100%) rename content/{Dict => Atomic}/friction.md (100%) create mode 100644 content/Atomic/gen-ai.md rename content/{Dict => Atomic}/index.md (94%) rename content/{Dict => Atomic}/integrity.md (100%) rename content/{Dict => Atomic}/linux-isms.md (100%) rename content/{Dict => Atomic}/lsat.md (100%) rename content/{Dict => Atomic}/resistance.md (100%) rename content/{Dict => Atomic}/shell.md (100%) rename content/{Dict => Atomic}/symlink.md 
(100%) rename content/{Dict => Atomic}/what-is-a-garden.md (100%) create mode 100644 content/Essays/normative-ai.md create mode 100644 content/Misc/ai-prologue.md create mode 100644 content/Misc/generation-copyright.md create mode 100644 content/Misc/training-copyright.md create mode 100644 content/Resources/copyright.md create mode 100644 content/Updates/2024/nov.md diff --git a/content/Dict/BSD.md b/content/Atomic/BSD.md similarity index 100% rename from content/Dict/BSD.md rename to content/Atomic/BSD.md diff --git a/content/Dict/friction.md b/content/Atomic/friction.md similarity index 100% rename from content/Dict/friction.md rename to content/Atomic/friction.md diff --git a/content/Atomic/gen-ai.md b/content/Atomic/gen-ai.md new file mode 100644 index 000000000..f518e8702 --- /dev/null +++ b/content/Atomic/gen-ai.md @@ -0,0 +1,65 @@ +--- +title: Generative AI +tags: + - ai + - seedling + - glossary + - essay + - legal + - programming + - toc +date: 2024-11-02 +lastmod: 2024-11-02 +draft: true +--- +Generative AI models from different sources are architected in a variety of different ways, but they all boil down to one abstract process: tuning an absurdly massive number of parameters to values that produce the most desirable output. (note: [CGP Grey's video on AI](https://www.youtube.com/watch?v=R9OHn5ZF4Uo) and its follow-up are mainly directed towards neural networks, but do apply to LLMs, and do a great job illustrating this). This process requires a gargantuan stream of data to calibrate those parameters and then test the model. +- Sidebar: you're nearly guaranteed not to find the optimal combination of several billion parameters, each tunable to several decimal places. When I say "desirable," I really mean "good enough." + +Generative AI resembles a Chinese Room.
[The Chinese Room](https://plato.stanford.edu/entries/chinese-room/) is a philosophical exercise authored by John Searle where the (in context, American) subject is locked in a room and receives symbols in Chinese slipped under the door. A computer program tells the subject what Chinese outputs to send back out under the door based on patterns and combinations of the input. The subject does not understand Chinese. Yet to an observer of Searle's room, it **appears** as if whoever is inside it has a firm understanding of the language. + +Searle's exercise was at the time an extension of the Turing test. He designed it to refute the theory of "Strong AI." At the time that theory was well-named, but today the AI it was talking about is not even considered AI by most. The hypothetical Strong AI was a computer program capable of understanding its inputs and outputs, and importantly *why* it took each action to solve a problem, with the ability to apply that understanding to new problems (much like our modern conception of Artificial General Intelligence). A Weak AI, on the other hand, is just the Chinese Room: taking inputs and producing outputs according to defined rules. Searle reasoned that the "understanding" of a Strong AI was inherently biological, and thus one could not presently exist.
+- Note that some computer science sources like [IBM](https://www.ibm.com/topics/strong-ai) have taken to using Strong AI to denote only AGI, which was a sufficient, not necessary quality +### Causes for concern +Here are some of the many actualized and potential misuses of AI: +- Election fraud and misleading voters, including even **more** corporate influence on US elections ([not hypothetical](https://www.washingtonpost.com/elections/2024/01/18/ai-tech-biden/) [in the slightest](https://web.archive.org/web/20240131220028/https://openai.com/careers/elections-program-manager), [and knowingly unethical](https://www.npr.org/2024/01/19/1225573883/politicians-lobbyists-are-banned-from-using-chatgpt-for-official-campaign-busine)) + - [Claiming](https://www.washingtonpost.com/politics/2024/03/13/trump-video-ai-truth-social/) misleading voters? + - { *ed.: although this gives us the glorious schadenfreude of "I HATE TAYLOR SWIFT!"* } +- Other fraud, like telemarketing/robocalls, phishing, etc +- Competition with actual artists and authors (I am VERY excited to see where trademark law evolves around trademarking one's art or literary style. Currently, the arguments are weak) +- Obsoletes human online workforces in tech support, translation, etc +- [[Essays/plagiarism##1 Revealing what's behind the curtain|🅿️ Reinforces systemic bias]] +- [Violates the GDPR on a technological level](https://www.theregister.com/2024/04/29/openai_hit_by_gdpr_complaint/) + - I also think being unable to delete personal data that it *has* acquired and not just hallucinated is a big problem generally + +## Training +Training is a deterministic process. It's a pure, one-way, data-to-model transformation (one part of the process for which "transformer models" are named). The words are ingested and converted into one of various types of formal representations to comprise the model. 
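To make the abstraction concrete, here is a deliberately tiny sketch in Python (my own toy, nothing like a real transformer: the "model" is just bigram counts, and all function names are invented for illustration). It shows the two properties described here: training as a pure, reproducible data-to-model transformation, and a sampling "temperature" that controls how far generation strays from the most frequent next word.

```python
from collections import Counter, defaultdict
import random

def train(corpus: str):
    """Deterministic 'training': count which word follows which.
    The same corpus always produces exactly the same model."""
    words = corpus.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def next_word(model, prev: str, temperature: float = 1.0) -> str:
    """Sample a successor of `prev`. Low temperature sticks to the most
    frequent ('most correct') next word; high temperature strays further."""
    counts = model[prev]
    weights = [c ** (1.0 / temperature) for c in counts.values()]
    return random.choices(list(counts), weights=weights)[0]

model = train("the cat sat on the mat and the cat slept")
# Reproducible: retraining on identical data yields identical parameters.
assert train("the cat sat on the mat and the cat slept") == model
# "the" is followed by "cat" twice and "mat" once; at near-zero
# temperature the sampler all but always picks "cat".
print(next_word(model, "the", temperature=0.01))
```

The point of the toy is the asymmetry it shares with the real thing: each training step is hand-computable arithmetic, while the aggregate counts carry no record of which document contributed what.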
It's important to remember that, given a specific work and a step of the training process, it's always possible to calculate by hand the resulting state of the model after training on that work. The "black box" that's often discussed in connection with AI refers to the final state of the model, when it's no longer possible to tell what effects the data ingested at earlier steps had on the model. + +Training can't be analogized to human learning processes, because when an AI trains by "reading" something, it isn't reading for the *forest*; it's reading for the *trees*. In the model, if some words are more frequently associated together, then that association is more "correct" to generate in a given scenario than other options. A parameter sometimes called "temperature" determines how far the model will stray from the correct next word. And the only data to determine whether an association *is* correct would be that training input. This means that an AI trains only on the words as they are on the page. Training can't have some external indicator of semantics that a secondary natural-language processor on the generation side could have. If it could, it would need some encoding—some expression—that it turns the facts into. Instead, it just incorporates the word as it read it in, and the data about the body of text it was contained in. + +As such, idea and expression are meaningless distinctions to AI. + +[[Misc/training-copyright|Training AI may be copyright infringement]]. If it is, perhaps the biggest legal question surrounding AI is: [[Essays/normative-ai#Fair Use|does AI training count as fair use?]] +### Detour: Garbage In, Garbage Out + +Common Crawl logo edited to say 'common crap' instead + +A very big middle finger to the Common Crawl dataset, which still tries to scrape this website. [[Projects/Obsidian/digital-garden#Block the bot traffic!|Block the bot traffic]].
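A note on mechanics: robots.txt can only ask nicely. Common Crawl documents that its crawler identifies itself with the CCBot user agent and says it honors rules like these, so a minimal polite-refusal sketch would be:

```text
# robots.txt — ask Common Crawl's crawler (and only it) to go away
User-agent: CCBot
Disallow: /
```

Since compliance is voluntary, actually stopping a rude scraper means matching the user agent or its IP ranges at the web server or CDN level instead.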
If I had the time or motivation, I would find a way to redirect these bots, instead of blocking them, to an AI-generated fanfiction featuring characters from The Bee Movie, including poisoned codewords. +## Generation +Generative AI training creates a sophisticated next-word predictor that generates text based on the words it has read and written previously. + +In the case of image models, it creates an interpolator that starts from a noise pattern and moves values until they resemble portions of its training data. Specifically, portions which it has been told have orthogonal expression to the prompt given to it by the user. + +[[Misc/generation-copyright|Generated output may infringe the training data]]. +## Other/emerging terminology +"Retrieval-augmented generation" (RAG) partitions off a specific set of a model's training data as the "knowledge body", which the model will attempt to copy-paste from when responding to your questions. It's implemented by skewing the weights of the training data, and searching the output back in the knowledge body to find the source of the output. + +"Deep document understanding" is the name of a tool to classify regions of a file. It's a misnomer; this is not in and of itself an 'understanding' any more than drawing circles around your tax return boxes would be. + +"Large reasoning models" (LRMs) are LLMs that use "repeated sampling" to generate multiple responses to one query. They then use a reinforcement learner to decide which of these responses is more...responsive. Then, they generate a slew of steps which could be used to reach that response, and a learner picks which procedure looks the most correct. This isn't reasoning. +## Further Reading +- Read about [[Essays/normative-ai|why]] copyright law should be enforced against AI in its dedicated essay.
+- If you're *really* interested in the math behind an LLM (like I am, haha), [here's a great introduction to the plumbing of a transformer model](https://santhoshkolloju.github.io/transformers/). +- [Pivot to AI](https://pivot-to-ai.com/) is a hilariously snarky newsletter (and RSS feed!) that lampoons AI and particularly AI hype for what it is. +- Read about the problems that generative AI is causing at the [Distributed AI Research Institute](https://www.dair-institute.org/). + +What if we invert GenAI to make a /gen AI lmao \ No newline at end of file diff --git a/content/Dict/index.md b/content/Atomic/index.md similarity index 94% rename from content/Dict/index.md rename to content/Atomic/index.md index 4a2ee484d..a60a89f12 100644 --- a/content/Dict/index.md +++ b/content/Atomic/index.md @@ -1,5 +1,5 @@ --- -title: Dict +title: Atomic tags: - toc - glossary diff --git a/content/Dict/integrity.md b/content/Atomic/integrity.md similarity index 100% rename from content/Dict/integrity.md rename to content/Atomic/integrity.md diff --git a/content/Dict/linux-isms.md b/content/Atomic/linux-isms.md similarity index 100% rename from content/Dict/linux-isms.md rename to content/Atomic/linux-isms.md diff --git a/content/Dict/lsat.md b/content/Atomic/lsat.md similarity index 100% rename from content/Dict/lsat.md rename to content/Atomic/lsat.md diff --git a/content/Dict/resistance.md b/content/Atomic/resistance.md similarity index 100% rename from content/Dict/resistance.md rename to content/Atomic/resistance.md diff --git a/content/Dict/shell.md b/content/Atomic/shell.md similarity index 100% rename from content/Dict/shell.md rename to content/Atomic/shell.md diff --git a/content/Dict/symlink.md b/content/Atomic/symlink.md similarity index 100% rename from content/Dict/symlink.md rename to content/Atomic/symlink.md diff --git a/content/Dict/what-is-a-garden.md b/content/Atomic/what-is-a-garden.md similarity index 100% rename from content/Dict/what-is-a-garden.md rename to 
content/Atomic/what-is-a-garden.md diff --git a/content/Essays/ai-infringement.md b/content/Essays/ai-infringement.md index f3467066a..37cfed385 100755 --- a/content/Essays/ai-infringement.md +++ b/content/Essays/ai-infringement.md @@ -1,5 +1,5 @@ --- -title: "Generative AI: Bad Faith Copyright Infringement" +title: "[ARCHIVED] Generative AI: Bad Faith Copyright Infringement" tags: - essay - ai diff --git a/content/Essays/law-school.md b/content/Essays/law-school.md index adf198c8d..52294c79b 100755 --- a/content/Essays/law-school.md +++ b/content/Essays/law-school.md @@ -16,14 +16,14 @@ Law school is a concept that deserves scrutiny, both as an institution and for t I don't have a central thesis for this entry, and there isn't really anything profound about the content. I just want to point out what law school does wrong and suggest some alternatives that do or should improve the experience for students. > [!hint] Law school as a process *usually* looks like this: -> Take the [[Dict/lsat|entrance exam]] -> apply -> first semester -> 1L job offer -> Second semester -> 1L summer job -> 2L job offer -> second year -> 2L summer job -> career offer -> third year -> career. +> Take the [[Atomic/lsat|entrance exam]] -> apply -> first semester -> 1L job offer -> Second semester -> 1L summer job -> 2L job offer -> second year -> 2L summer job -> career offer -> third year -> career. > > Sometimes job offers will be delayed, as it depends on the type of employment pursued. I talk about this more in the [[#Job Prospects]] section. ## Applying I was one of the lucky ones that knew I wanted to be a lawyer right out of the gate. -With law school, a substantial minority of applicants are on their second career ("nontraditional students"). Quite a few also view law school as a backup plan after job prospects from their recent degree didn't pan out. Teachers and former aspiring history professors are plentiful in this degree. 
Others will go to law school because it feels like a logical step from their previous degree, rather than out of an actual desire to be an attorney. Unfortunately, the lack of easily accessible or common knowledge about law school harms both of these groups. For those not fully committed to being a lawyer, there is nothing in place to inform prospective students that their attitude would be detrimental to their performance. And for nontraditional students, the system is outright hostile, as it requires considerable time to be carved out of a working adult's day to navigate the steps for starting an application, *on top of* the time spent studying for the [[Dict/lsat|LSAT]] entrance exam. +With law school, a substantial minority of applicants are on their second career ("nontraditional students"). Quite a few also view law school as a backup plan after job prospects from their recent degree didn't pan out. Teachers and former aspiring history professors are plentiful in this degree. Others will go to law school because it feels like a logical step from their previous degree, rather than out of an actual desire to be an attorney. Unfortunately, the lack of easily accessible or common knowledge about law school harms both of these groups. For those not fully committed to being a lawyer, there is nothing in place to inform prospective students that their attitude would be detrimental to their performance. And for nontraditional students, the system is outright hostile, as it requires considerable time to be carved out of a working adult's day to navigate the steps for starting an application, *on top of* the time spent studying for the [[lsat|LSAT]] entrance exam. When looking at the LSAT, it first appears to be a type of aptitude test where you either "have it" or you don't. It's designed to be an indicator of success in law school classes, so this would make sense. Unfortunately, that's not the case. 
It's absolutely an exam that can be studied for, and one on which you can score substantially higher than on your first attempt. As such, many different "prep courses" exist which will walk participants through previous question solutions or provide general strategies for question types. **Those who pay for a more expensive prep course will almost always do better than those who do not.** This makes the test, hailed as an equalizer, really just another secret indicator of financial ability that hampers the fairness of the process. diff --git a/content/Essays/normative-ai.md b/content/Essays/normative-ai.md new file mode 100644 index 000000000..6de25301f --- /dev/null +++ b/content/Essays/normative-ai.md @@ -0,0 +1,81 @@ +--- +title: Why Copyright Should Apply to Generative AI +tags: + - essay + - seedling + - ai + - legal +date: 2024-11-02 +lastmod: 2024-11-02 +draft: true +--- +Reasonable minds can and should differ on whether copyright ought to be enforced against [[Atomic/gen-ai|GenAI]]. I think it should be. + +The most important debate is up first, but the others are not particularly ordered. + +> [!info] Under Construction +> More topics under this section forthcoming! I work and edit in an alternate document and copy over sections as I finish them. +> Brief teasers: +> - online artists' assumption of the risk +> - economic incentives +> - roadblocks to enforcement +> - the effect on truth + +## Fair Use +In modern copyright practice, this defense seems to be the pivotal question. It's probably going to be the exact same in AI. + +Whenever a legal doctrine has strong roots in collective consciousness and policy, there's an epistemological question about how to approach the issue. The debate asks: in the abstract, should the courts protect what *descriptively is* considered within the bounds of protection, or what *ought to be* recognized by society as deserving protection? +- Nerd sidebar: This debate is common in criminal law.
For example, examine the reasonable expectation of privacy. *Are* members of the public actually concerned with police access to the data on their phone or do they think they have nothing to hide? *Should* they be? Recent cases on searches and third party access trend towards analysis under the latter, more paternalistic position. + +In fair use, the first ("empirical") perspective teaches that fair use should only extend to concepts analogous to prior enforcement which has been accepted in the collective consciousness. In contrast, the second ("normative") perspective would disregard comparison with enforcement in favor of comparison with societal values. + +Because it's such an alien technology to the law, I'd argue that generative AI's fair use should be analyzed in view of the normative approach. But even under that approach, I don't think AI training or generation should be considered fair use. + +US fair use doctrine has four factors, of which three can speak to whether it ought to be enforced. +### Purpose and character of the use +Training is conducted at a massive scale. Earlier, I mentioned the firehose. + +But for generated output, this factor gets messier. Criticism or comment? Of/on who/what? I can think of one use that would be fair use, but only to defend the person using the model to generate text: criticism of the model itself, or demonstration that it can reproduce copyrighted works. Not to mention if a publisher actually sued a person for *using* a generative AI, that would Streisand Effect the hell out of whatever was generated. +### Nature of training data + +### Market value; competition +And most importantly (especially in recent years), let's talk about the competitive position of an AI model. This is directly linked to the notion that AI harms independent artists, and is the strongest reason for enforcement of copyright against AI in my opinion. 
+ +Interestingly, I think the USCO Guidance [[#Detour 2 An Alternative Argument|talked about in the Generation section]] is instructive. It analogizes prompting a model to commissioning art, which applies well to a discussion of competition. AI, in effect, lets me find an artist and say to them, "I want a Warhol, but I don't want to pay Warhol prices"; or "I want to read Harry Potter, but I don't want to give J.K. Rowling my money \[for good reason\]." The purpose of AI's "work product" is solely to compete with human output. + +A problem I have not researched in detail is the level of competency an alternative needs in order to prove that an infringing use does compete with the underlying work. Today, many people see AI as the intermediate step on the scale between the average proficiency of an individual at any given task (painting, photography, poetry, *shudder* legal matters) and that of an expert in that field. Does AI need to be "on the level" of that expert in order to be considered a competitor? It certainly makes a stronger argument for infringement if it is, as with creative mediums. But does this hold up with legal advice, where it will produce output but (in my opinion) sane professionals should tell you that AI doesn't know the first thing about the field? + +Note that there are very valid criticisms of being resistant to a technology solely because of the "AI is gonna take our jobs" sentiment. I think there are real parallels between that worry and a merits analysis of the competition factor. So if you find those criticisms persuasive, that would probably mean that you disagree with my evaluation of this factor. +## Who's holding the bag? +WIP https://www.wsj.com/tech/ai/the-ai-industry-is-steaming-toward-a-legal-iceberg-5d9a6ac1?St=5rjze6ic54rocro&reflink=desktopwebshare_permalink +### Detour: Section 230 (*again*) +Well, here it is once more. I think that you can identify a strangely inverse relationship between fair use and § 230 immunity.
If the content is directly what was put in (and is not fair use), then it's user content, and Section 230 immunity applies. If the content by an AI is *not* just the user's content and is in fact transformative fair use, then it's the website's content, not user content, and the website can be sued for the effects of their AI. Someone makes an investment decision based on the recommendation of ChatGPT? Maybe it's financial advice. I won't bother with engaging the effects further here. I have written about § 230 and AI [[no-ai-fraud-act#00230: Incentive to Kill|elsewhere]], albeit in reference to AI-generated user content *hosted* by the platform. +## The First Amendment and the "Right to Read" +This argument favors allowing GAI to train on the entire corpus of the internet, copyright- and attribution-free, and bootstraps GAI output into being lawful as well. The position most commonly taken is that the First Amendment protects a citizen's right to information, and that there should be an analogous right for generative AI. + +The right to read, at least in spirit, is still being enforced today. Even the 5th Circuit (!!!) believes that this particular flavor of First Amendment claim will be likely to succeed on appeal after prevailing at the trial level. [*Book People v. Wong*](https://law.justia.com/cases/federal/appellate-courts/ca5/23-50668/23-50668-2024-01-17.html), No. 23-50668 (5th Cir. 2024) (not an AI case). It also incorporates principles from intellectual property law. Notably, this argument states that one can read the content of a work without diminishing the value of the author's expression (*i.e.*, ideas aren't copyrightable). As such, the output of an AI is not taking anything from an author that a human wouldn't take when writing something based on their knowledge. + +I take issue with the argument on two points that stem from the same technological foundation. + +First, as a policy point, the argument incorrectly humanizes current generative AI. 
There are no characteristics of current GAI that would warrant the analogy between a human reading a webpage and an AI training on that webpage. Even emerging tools like the improperly named [Deep Document Understanding](https://github.com/infiniflow/ragflow/blob/main/deepdoc/README.md)—which claim to ingest documents "as \[a\] human being"—are just classifiers on stochastic data at the technical level, and are not actual "understanding." + +Second, and more technically, [[#Training|the training section]] above is my case for why an AI does not learn in the same way that a human does in the eyes of copyright law. ==more== + +But for both of these points, I can see where the confusion comes from. The previous leap in machine learning was called "neural networks", which definitely evokes a feeling that it has something to do with the human brain. Even more so when the techniques from neural network learners are used extensively in transformer models (that's those absurd numbers of parameters mentioned earlier). +## Points of concern, or "watch this space" +These are smaller points, which I found compelling, that would cast doubt on the general zeitgeist around the AI boom. Each may be someone else's undeveloped opinion, or a point that I don't think I could contribute to in a valuable way. Many are spread across the fediverse; others are blog posts or articles. Others still would be better placed in a Further Reading section, ~~but I don't like to tack on more than one post-script-style heading.~~ { *ed.: [[#Further Reading|so that was a fucking lie]]* }. If any become more temporally relevant, I may expand on them.
+- [Cartoonist Dorothy’s emotional story re: midjourney and exploitation against author intent](https://socel.net/@catandgirl/111766715711043428) +- [Misinformation worries](https://mas.to/@gminks/111768883732550499) +- [Large Language Monkeys](https://arxiv.org/abs/2407.21787): another very new innovation in generative AI is called "repeated sampling." It literally just has the AI generate output multiple times and decide among those which is the most correct. This is more stochastic nonsense, and again not how a human learns, despite OpenAI marketing GPT-o1 (which uses the technique) as being capable of reason. +- Stronger over time + - One of the lauded features of bleeding-edge AI is its increasingly perfect recall from a dataset. So you're saying that as AI gets more advanced, it'll be easier for it to exactly reproduce what it was trained on? Sounds like an even better case for copyright infringement. +- Inevitable harm + - Temperature and the very fact that word generation is used mean that there's no way to completely eliminate hallucination, so truth in AI is unobtainable. [Xu, et al.](https://arxiv.org/abs/2401.11817) +- Unfair competition + - This doctrine is a catch-all for claims that don't fit neatly into any of the IP categories, but where someone is still being wronged by a competitor. I see two potential arguments here. + - First, you could make a case for the way data is scraped from the internet being so comprehensive that there's no way to compete with it by using more fair/ethical methods. This could allow a remedy that mandates AI be trained using some judicially devised (or hey, how about we get Congress involved if they don't like the judicial mechanism), ethical procedure. The arguments are weaker, but they could be persuasive to the right judge. + - Second, AI work product is on balance massively cheaper than hiring humans, but has little other benefit, and causes many adverse effects. 
A pure cost advantage providing a windfall for one company but not others could also be unfair. Again, it's very weak right now in my opinion. + - A further barrier to unfair competition is the doctrine of **copyright preemption**, which procedurally prevents many extensions of state or federal unfair competition law. +## Further Reading +- Copyleft advocate Cory Doctorow has written a piece on [why copyright is the wrong vehicle to respond to AI](https://pluralistic.net/2024/05/13/spooky-action-at-a-close-up/#invisible-hand). Reply-guying his technical facts and legal conclusions is left as an exercise for the reader; if you do, feel free to reference my [[Atomic/gen-ai|technical explanation]]. What's most interesting is his take on the non-fair use parts of the normative debate. +- [TechDirt has a great article](https://www.techdirt.com/2023/11/29/lets-not-flip-sides-on-ip-maximalism-because-of-ai/) that highlights the history of and special concerns around fair use. I do think that it's possible to regulate AI via copyright without implicating these issues, however. And note that I don't believe that AI training is fair use, for the many reasons above. \ No newline at end of file diff --git a/content/Essays/plagiarism.md b/content/Essays/plagiarism.md index 822ad20c2..b9055594d 100755 --- a/content/Essays/plagiarism.md +++ b/content/Essays/plagiarism.md @@ -42,7 +42,7 @@ The legal field is even more source-mandatory due to the system of precedent, th As mentioned above, there's definitely a gap in my knowledge/views that broadens the more creative or traditionally-considered-artistic the subject matter gets. Copyright absolutely extends to the arts, but what place does attribution have when the purpose is to entertain? One can hardly document one's creative experience when working on a novel, a script, a painting, in the same way a legal brief can be.
I do believe in the necessity of personal attribution for those who directly contributed to an artistic work (think the credits section of a movie) for professional reasons, but beyond that, I'm uncertain. ## Digital Gardening and Plagiarism -For digital gardening in particular, attribution is integral to the concept. [[Dict/what-is-a-garden|A digital garden]] is a network, and the culture of the digital garden is to provide paths out of the current webpage to others on the same site or even to other websites. These associations between webpages make up a comprehensive experience that differs from modern web use (Google search, click, close the tab) and looks more like Wikipedia spelunking. +For digital gardening in particular, attribution is integral to the concept. [[Atomic/what-is-a-garden|A digital garden]] is a network, and the culture of the digital garden is to provide paths out of the current webpage to others on the same site or even to other websites. These associations between webpages make up a comprehensive experience that differs from modern web use (Google search, click, close the tab) and looks more like Wikipedia spelunking. Thus, the true value of attribution in a digital garden is mostly in the link itself rather than the substance of the current page or the linked page. This does not discount the importance of linking to those resources, though. diff --git a/content/Misc/ai-integrity.md b/content/Misc/ai-integrity.md index c112ea4c7..dc50d68c9 100644 --- a/content/Misc/ai-integrity.md +++ b/content/Misc/ai-integrity.md @@ -8,7 +8,7 @@ date: 2024-09-14 lastmod: 2024-10-23 draft: false --- -Recent studies reveal that the use of AI is becoming increasingly common in academic writings. 
On [Google Scholar](https://misinforeview.hks.harvard.edu/article/gpt-fabricated-scientific-papers-on-google-scholar-key-features-spread-and-implications-for-preempting-evidence-manipulation/), and on [arXiv](https://arxiv.org/abs/2403.13812); but most shockingly, on platforms like Elsevier's [Science Direct](http://web.archive.org/web/20240315011933/https://www.sciencedirect.com/science/article/abs/pii/S2468023024002402) (check the Introduction). Elsevier supposedly prides itself on its comprehensive review process, which ensures that its publications are of the highest quality. More generally, the academic *profession* insists that it possesses what I call [[Dict/integrity|integrity]]: rigor, attention to detail, and authority or credibility. But AI is casting light on a greater issue: **does it**? +Recent studies reveal that the use of AI is becoming increasingly common in academic writings. On [Google Scholar](https://misinforeview.hks.harvard.edu/article/gpt-fabricated-scientific-papers-on-google-scholar-key-features-spread-and-implications-for-preempting-evidence-manipulation/), and on [arXiv](https://arxiv.org/abs/2403.13812); but most shockingly, on platforms like Elsevier's [Science Direct](http://web.archive.org/web/20240315011933/https://www.sciencedirect.com/science/article/abs/pii/S2468023024002402) (check the Introduction). Elsevier supposedly prides itself on its comprehensive review process, which ensures that its publications are of the highest quality. More generally, the academic *profession* insists that it possesses what I call [[Atomic/integrity|integrity]]: rigor, attention to detail, and authority or credibility. But AI is casting light on a greater issue: **does it**? ## Competing framings I think there are two ways of framing the emergence of the problem. 
### 1: Statistical (not dataset) bias and sample size diff --git a/content/Misc/ai-prologue.md b/content/Misc/ai-prologue.md new file mode 100644 index 000000000..315489763 --- /dev/null +++ b/content/Misc/ai-prologue.md @@ -0,0 +1,23 @@ +--- +title: Why I wanted to write about AI +tags: + - seedling + - essay + - ai + - legal + - copyright +date: 2024-11-02 +lastmod: 2024-11-02 +draft: false +--- +I've seen many news articles and opinion pieces recently that support training generative AI, most particularly LLMs (such as ChatGPT/GPT-4 and LLaMA) and image generators (such as Midjourney), on the broader internet, as well as on more traditional copyrighted works. The general sentiment from the industry and some critics is that training should not consider the copyright holders for all of the above. + +'I know, but he can' meme, with the RIAA defeating AI art for independent illustrators + +This is likely because there's a growing sentiment against copyright in general. Copyright can enable centralization of rights when paired with a capitalist economy, which is what we've historically experienced with the advent of copyright repositories like record labels and publishing companies. It's even statutorily enshrined as the "work-for-hire" doctrine. AI has the potential to be an end-run around these massive corporations' rights, which many see as a benefit. + +However, this argument forgets that intangible rights are not *yet* so centralized that independent rights-holders have ceased to exist. While AI will indeed affect central rights-holders, it will also harm individual creators and diminish the bargaining power of those that choose to work with central institutions. I see AI as a neutral factor in the disestablishment of copyright. Due to my roots in the indie music and open-source communities, I'd much rather keep their/our/**your** present rights intact.
+ +Unfortunately, because US copyright law is so easily abused, I think the most likely outcome is that publishers/centralized rights holders get their due, and individual creators get the shaft. This makes me sympathetic to arguments against specific parts of the US's copyright regime as enforced by the courts, such as the DMCA or the statutory language of fair use. We as a voting population have the power to compel our representatives to enact reforms that take the threat of ultimate centralization into account. We can even work to break down what's already here. But I don't think that AI should be the impetus for arguments against the system as a whole. + +Finally, remember that perfect is the enemy of good enough. While we're having these discussions about how to regulate GenAI, ==unregulated use== is causing real economic and personal harm to creators and ==underrepresented minorities.== \ No newline at end of file diff --git a/content/Misc/generation-copyright.md b/content/Misc/generation-copyright.md new file mode 100644 index 000000000..5f9f36e23 --- /dev/null +++ b/content/Misc/generation-copyright.md @@ -0,0 +1,56 @@ +--- +title: "Theories of Copyright: AI Output" +tags: + - ai + - legal + - copyright + - essay + - misc +date: 2024-11-02 +lastmod: 2024-11-02 +draft: true +--- +Generated output may infringe the training data. + +First, generated output is certainly not copyrightable. The US is extremely strict when it comes to the human authorship requirement for protection. If an AI is seen as the creator, the requirement is obviously not satisfied. And the human "pushing the button" probably isn't enough either. But does the output infringe the training data? It depends. +## Human Authorship +According to the US Copyright Office, AI-generated works do not satisfy the human authorship requirement. This makes them uncopyrightable, but more importantly, it also gives legal weight to the distinction between the human and AI learning process. 
+ +## Summaries +This is probably the most direct non-technical refutation of the "AI understands what it trains on" argument possible. I also think it's the most important aspect of current generative models for me to highlight. **The question**: If an AI can't understand what it reads, how does it choose what parts of a work should be included in a summary of that work? A book, an article, an email? + +Once again, the answer is mere probability. In training, the model learns which word should come after another word: a continuation is more "correct" the more times that sequence of words occurs in its training data. And in generation, if more of the work mentions a particular subject than the actual conclusion of the work, the subject given most attention will be what the model includes in a summary. + +Empirical evidence of this fact can be found in the excellent post, [When ChatGPT Summarizes, it Actually does Nothing of the Kind](https://ea.rna.nl/2024/05/27/when-chatgpt-summarises-it-actually-does-nothing-of-the-kind/). It's funny how this single approach is responsible for nearly all of the problems with generative AI, from the decidedly unartistic way it "creates" to its [[Essays/plagiarism##1 Revealing what's behind the curtain|🅿️ majoritarian bent]]. +## Dr. Edgecase, or how I learned to stop worrying (about AI) and love the gig worker +So how do corporations try to solve the problem? Human-performed [microtasks](https://hal.science/hal-02554196/document). + +AI can get things wrong; that's not new. Take a look at this: + +![[limmygpt.png|Question for chatgpt: Which is heavier, 2kg of feathers or 1kg of lead? Answer: Even though it might sound counterintuitive, 1 kilogram of lead is heavier than 2 kilograms of feathers...]] +Slight variance in semantics, same answer because it's the most popular string of words to respond to that pattern of a prompt. Again, nothing new. Yet GPT-4 will get it right. This probably isn't due to an advancement in the model.
My theory is that OpenAI looks at the failures published on the internet (sites like ShareGPT, Twitter, etc.) and has remote validation gig workers ([already a staple in AI](https://www.businessinsider.com/amazons-just-walk-out-actually-1-000-people-in-india-2024-4)) "correct" the model's responses to that sort of query. In effect, corporations could be exploiting ([yes, exploiting](https://www.noemamag.com/the-exploited-labor-behind-artificial-intelligence/)) developing countries to create a massive **network of edge cases** to fix the actual model's plausible-sounding-yet-wrong responses. +- This paragraph does border on conspiracy theory. However, which is more likely: + - Company in the competitive business of *wow*ing financial backers leverages existing business contacts to massively boost user-facing performance of their product as a whole at little added cost; or + - Said company finds a needle of improvement over their last haystack in an even *bigger* haystack that enables the most expensive facet of their product to do more of the work. + +> [!question] +> I won't analyze this today, but who owns the human-authored content of these edge cases? They're *probably* expressive and copyrightable. + +## Expression and Infringement +It can be said that anything a human produces is just a recombination of everything that person's ever read. Similarly, that process is a simplified understanding of how an AI trains. + +However, everything a *person* has ever read is stored as concepts, floating around in their brain. My brain doesn't have a specific person's explanation of a transformer model architecture prepped, or even particular phrases from that explanation. It has a "visual" and emotional linkage of **ideas** that other regions of my brain put to paper, leveraging vocabulary, when I explain it. An AI stores words that occurred in its corpus that can be considered responsive to the prompt.
It may also store the words that followed the prompt text in a work containing both the prompt and the output. N-grams, not neurons. + +The key difference: talking about a human brain making a work by recombining its input is **metaphor**; talking about an AI recombining a work is **technologically accurate**. A chatbot goes to look at the secret code and shows you the photograph it corresponds to when you ask it to. + +Naturally, there are occurrences where a human and an AI would reach approximately the same factual response if you asked them the same question. So what makes some AI output infringement? The same thing that makes some human responses copyright infringement: reproduction of a copyrighted work. But the difference is that some human responses would be copyrightable in themselves because they don't reproduce enough of a specific work or multiple works to be considered either an ordinary derivative or a compilation derivative. ==ugh this is hardddd== +## Detour: An Alternative Argument +There's a more concise and less squishy argument that generative AI output infringes on its training dataset. + +Recall that AI output taken right from the model (straight from the horse's mouth) is [not copyrightable according to USCO](https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence). If the model's input is copyrighted, and the output can't be copyrighted, then there's nothing in the AI "black box" that adds to the final product, so it's literally *just* the training data reproduced and recombined. Et voilà, infringement. + +This isn't to say that anything uncopyrightable will infringe something else, but it does mean that the defendant's likelihood of prevailing on a fair use defense could be minimal. Additionally, the simpler argument makes damages infinitely harder to prove in terms of apportionment.
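The frequency-driven word selection described under Summaries above can be sketched as a toy bigram counter. This is my own illustration, not any production architecture: real models use learned weights over far longer contexts, but the principle of "most popular continuation wins" is the same.

```python
# Toy illustration (my own construction, not any production model): a
# bigram "model" that picks each next word purely by how often it
# followed the previous word in the training text. No understanding,
# just frequency counts.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# "Training": count every observed word-to-next-word transition.
follows = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    follows[prev][nxt] += 1

def next_word(word):
    # The "most correct" continuation is simply the most frequent one.
    return follows[word].most_common(1)[0][0]

# "the" is followed by "cat" twice and by "mat"/"fish"/"dog"/"rug" once
# each, so the majority pattern wins every time:
print(next_word("the"))  # -> cat
```

A real LLM replaces the raw counts with learned weights over much longer contexts, but the selection behavior sketched here (the most-attested continuation wins) is the majoritarian bent the essay describes.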
+ +Note that there are many conclusions in the USCO guidance, so you should definitely read the whole thing if you're looking for a complete understanding of the (very scarce) actual legal coverage of AI issues so far. +## Further Reading +- Sibling entry on [[Misc/training-copyright|training]] +- Who should be responsible for the harm caused by a generated work? [[Essays/normative-ai#Who's holding the bag?]] \ No newline at end of file diff --git a/content/Misc/training-copyright.md b/content/Misc/training-copyright.md new file mode 100644 index 000000000..5b6ab624f --- /dev/null +++ b/content/Misc/training-copyright.md @@ -0,0 +1,39 @@ +--- +title: "Theories of Copyright: AI Training" +tags: + - ai + - legal + - copyright + - essay + - misc +date: 2024-11-02 +lastmod: 2024-11-02 +draft: true +--- +AI training may be [[Resources/copyright|copyright]] infringement. + +> [!info] *mea culpa* +> It's very difficult to keep discussions of training and generation separate because they're related concepts. They do not directly flow from one another though, so I've done my best to divide the subject. + +I think that reasoning about the implications of how an AI stores data requires a complete understanding of the technical foundation, which [[Atomic/gen-ai#Training|Generative AI#Training]] tries to lay out. Every legal hypothesis about training except fair use is in this section. + +First, I think a very convoluted analogy is helpful. Let's say I publish a book. Every page of this book is a different photograph. Some of the photos are public domain, but the vast majority are copyrighted, and I don't have authorization to publish those ones. Now, I don't just put the photos on the page directly; that would be copyright infringement! Instead, each page is a secret code that I derive from the photo (and all other photos already in the book) that I can decipher to show you the photo (if you ask me to, after you've bought the book). Is my book still copyright infringement?
+- Alternatively, I let you download the instructions on how to access a photo from the secret codes in the book onto your computer. Now, if an artist uses these instructions and gets their own photo, and they sue me, did I injure them or did they injure themselves? + - This analogy relates to the standing argument in *Doe v. GitHub*. +- Related but ludicrous: suppose I'm not selling the book. I bought prints of all these photographs for myself, and if you ask me to, I'll show you a photograph that I bought. But since I only bought one print of each photograph, if I'm showing you the print I bought, I can't be showing it to someone else at the same time. This *is* considered copyright infringement?!?! At least, that's what *Hachette v. Internet Archive* tells us. + +In copyright, reproduction of expression is infringement. And I believe that inputting a work into a generative AI creates an infringing derivative of the work, because it reproduces both the facts and expression of that work in a way that you could do by hand. Eventually, the model is effectively a compilation of all works passed in. Finally—on a related topic—there is nothing copyrightable in how the model has arranged the works in that compilation, even if every work trained on is authorized. + +Recall that training on a work incorporates its facts and the way the author expressed those facts into the model. When the training process takes a work and extracts weights on the words within, it's first reproducing copyrightable expression, and then creating something directly from the expression. You can analogize the model at this point to a translation (a [specifically recognized](https://www.law.cornell.edu/uscode/text/17/101#:~:text=preexisting%20works%2C%20such%20as%20a%20translation) type of derivative) into a language the AI can understand.
But where a normal translation would be copyrightable (if authorized) because the human translating a work has to make expressive choices and no two translations are exactly equal, an AI's model would not be. A given AI will always produce the same translation for a work it's been given; it's not a creative process. Even if every work trained on were expressly authorized, I don't think the resulting AI model would be copyrightable. And absent authorization, it's infringement. +- Nerdy sidebar: I desperately want Adobe to sue someone for appropriating their new model now so I can see if this theory holds up. The fight might turn on an anti-circumvention question, because if it's not a copyrightable work, there's no claim from circumventing protections on that work. + +As the AI training scales and amasses even more works, it starts to look like a compilation, another type of derivative work. Normally, the expressive component of an authorized compilation is in the arrangement of the works. Here, the specific process of arrangement is predetermined, and encompasses only uncopyrightable material. I wasn't able to find precedent on whether a deterministically-assembled compilation of uncopyrightable derivatives passes the bar for protection, but that just doesn't sound good. Maybe there's some creativity in the process of creating the algorithms for layering the model (related: is code art?). + +The common thread running through this and many [[Essays/normative-ai|normative]] points is that the gargantuan scale of the iteration obscures the fact that you could (over a period of years) theoretically recreate the exact compilation by hand following the AI's steps, and that the arrangement is completely fungible in that way. This is one facet of how GenAI is well suited to helping a person avoid liability. +### Detour: point for the observant +The idea and expression being indistinguishable to an AI may make one immediately think of merger doctrine.
That argument looks like: the idea inherent in the work trained on merges with its expression, so that segment of the training data must not be copyrightable. However, that argument would not be a correct reading of the doctrine. [*Ets-Hokin v. Skyy Spirits, Inc.*](https://casetext.com/case/ets-hokin-v-skyy-spirits-inc) suggests that the doctrine is more about disregarding the types of works that are low-expressivity by default, and that this "merger" is just a nice name to remember the actual test by. Confusing name, easy doctrine. +- Yet somehow this doctrine doesn't extend to RGB colors. I'll die on the hill that you shouldn't be able to copyright a hex code the same way you can't copyright an executable binary. I know, small specific part of US copyright doctrine that I'm sympathetic to arguments against, moving on. +### Real-world exposure +The Northern District of California has actually considered the above infringing-derivative argument in *Kadrey v. Meta*. They called it "nonsensical", and based on how it was presented in that case, I don't blame them. I'd have some serious difficulty compressing this entry into something a judge could read (even ignoring court rule word limits) or that I could orate concisely to a jury. I'm open to suggestions on a more digestible way to persuade people of this point. +## Further Reading +[[Misc/generation-copyright|Sibling entry on AI generation]] \ No newline at end of file diff --git a/content/Programs I Like/code-editors.md b/content/Programs I Like/code-editors.md index 4551d8a11..11e023a4a 100755 --- a/content/Programs I Like/code-editors.md +++ b/content/Programs I Like/code-editors.md @@ -27,7 +27,7 @@ And It's not a perfect solution by any means. For example, you're faced with a c ## Neovim Sometimes, the [[Misc/keys|most efficient solution]] only arises because it was once technically necessary. In this scenario, iterations or new paths don't seem to measure up to how good the original workflow was.
Let's say you just want to bang out a few lines of code, hit save, and go back to whatever you were doing before. This is [Neovim](https://neovim.io/). -Based on the older `vim` text editor (which was in turn based on `vi`, the [[Dict/BSD|BSD]] Unix program), Neovim is designed to be as minimally intrusive as possible while remaining responsive to the needs of a developer. +Based on the older `vim` text editor (which was in turn based on `vi`, the [[Atomic/BSD|BSD]] Unix program), Neovim is designed to be as minimally intrusive as possible while remaining responsive to the needs of a developer. This does come with a high learning curve, as Neovim is a *modal text editor*. `vi` was created in the days that a computer was simply a circuit board, a keyboard, and a CRT monitor; no fancy peripherals like a "mouse" or a "touch screen". As such, it needed to be usable in such a non-user-friendly environment. @@ -36,7 +36,7 @@ Neovim has three commonly used modes (among others): - *Insert mode*: This one is most familiar to those that use Notepad on Windows, or any of the similar Linux/Mac programs. It's just a normal text editor, type letters/numbers/punctuation and navigate with the arrow keys. - *Visual mode*: For selecting blocks of text and doing things with a selected block like cutting it to paste somewhere else. -In Normal mode, you can tell Neovim what to do by giving it commands. By default, you start a command with the colon. I shouldn't tell you this, but typing `:q` from Normal mode and pressing Enter will exit the program, because `q` is the Quit command. [[Dict/linux-isms#On Acronyms|Unix loves their acronyms]]. +In Normal mode, you can tell Neovim what to do by giving it commands. By default, you start a command with the colon. I shouldn't tell you this, but typing `:q` from Normal mode and pressing Enter will exit the program, because `q` is the Quit command. [[Atomic/linux-isms#On Acronyms|Unix loves their acronyms]]. 
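A few more Normal-mode staples beyond `:q` (these are standard Vim/Neovim defaults, noted here as a quick reference):

```vim
:w          " Write (save) the current file
:wq         " Write, then quit
:q!         " Quit without saving changes
i           " Enter Insert mode at the cursor; Esc returns to Normal mode
v           " Enter Visual mode to select text
dd          " Delete (cut) the current line
yy          " Yank (copy) the current line
p           " Paste the last deleted or yanked text
u           " Undo; Ctrl+r redoes
/pattern    " Search forward for 'pattern'; n jumps to the next match
```

All of these (and much more) are documented in Neovim's built-in `:help`.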
I'm a believer in the principle that your computer should adapt to you, so I often find myself writing tiny little files around [[Projects/my-computer|my computer]] that I don't want to open VSCode to edit. I just open a terminal (if I'm not already working in one), pull up the path, type the file name, make my changes, and done. It's quick, it's easy, and (my favorite) it's free. - To speed the process of opening a terminal, I recommend a dropdown terminal (also called a "quake-style" terminal). The aim is that when you press a keyboard shortcut (Alt+backtick for me), it opens a terminal. I've used both [Guake](http://guake-project.org/) and a docked [tabby-terminal](https://tabby.sh/) for the same end. Still on the fence over which I like more. diff --git a/content/Programs I Like/terminals.md b/content/Programs I Like/terminals.md index e917f009f..1901404e9 100644 --- a/content/Programs I Like/terminals.md +++ b/content/Programs I Like/terminals.md @@ -12,7 +12,7 @@ I...have a problem. ![[Attachments/terminal-illnesses.png|A folder of applications on my computer containing nine different terminal and shell programs.]] -Because of my desire for [[Dict/friction|low-friction]] software, I'm always looking for a terminal that I can pop in and out of for its specific purpose. All of the above are worth touching on when I get time, but two have emerged as perfect for my use case. +Because of my desire for [[Atomic/friction|low-friction]] software, I'm always looking for a terminal that I can pop in and out of for its specific purpose. All of the above are worth touching on when I get time, but two have emerged as perfect for my use case. ## Run-And-Done: ddterm [ddterm](https://extensions.gnome.org/extension/3780/ddterm/) is a GNOME shell extension for a "Quake-style terminal." This means that when you press a keybind, the already-in-the-background terminal drops down from the top of the screen above all other windows, ready to go to work. 
It mimics the behavior of the in-game console of the video game Quake, which is where it gets its name. You've seen similar behavior if you've ever pressed the grave (\`) key in Counter-Strike or Team Fortress 2. My keybind is Alt+grave—although the common one is F12—and pressing it pops up: diff --git a/content/Projects/Obsidian/home.md b/content/Projects/Obsidian/home.md index 64076562c..30fc58339 100755 --- a/content/Projects/Obsidian/home.md +++ b/content/Projects/Obsidian/home.md @@ -18,4 +18,4 @@ I think my use is divided into three easily separable parts: - These are accessible from any device that can run an Obsidian client. - I just use it currently on my laptop and phone. - [[digital-garden|Digital Garden]]: This website. - - It's secretly just run out of a folder in my ^ LiveSync'ed notes repository that I have [[Dict/symlink|symlinked]] into my laptop's local repository for Quartz. + - It's secretly just run out of a folder in my ^ LiveSync'ed notes repository that I have [[Atomic/symlink|symlinked]] into my laptop's local repository for Quartz. diff --git a/content/Projects/my-computer.md b/content/Projects/my-computer.md index b4a0747f5..99cd08943 100755 --- a/content/Projects/my-computer.md +++ b/content/Projects/my-computer.md @@ -52,7 +52,7 @@ Upgrades are inevitable with any piece of hardware. Now that my GPU is up to a 3 ## Software Any specific software that I like using can be found in [[Programs I Like/home|Programs I Like]]. Here, I'll just go over some tenets I've noticed when dealing with my computer as a tool for my work, my projects, and my personal life. -I value low-[[Dict/resistance|resistance]], low-[[Dict/friction|friction]] software. It's what led me to pursue linux, Obsidian, and this website in general. If something is fast to use, I'll use it more often. +I value low-[[Atomic/resistance|resistance]], low-[[Atomic/friction|friction]] software. It's what led me to pursue linux, Obsidian, and this website in general. 
If something is fast to use, I'll use it more often. #### Immutable Distros Something that's gaining popularity is the immutable operating system, where the underlying filesystem is intentionally resistant to change. I don't see this as overly resistant in my sense, mainly because providers like VanillaOS and Fedora Silverblue recognize that this resistance is present and provide alternative routes to install software. It's more of a compromise. diff --git a/content/Projects/nvidia-linux.md b/content/Projects/nvidia-linux.md index 34425e4f6..f537baf02 100755 --- a/content/Projects/nvidia-linux.md +++ b/content/Projects/nvidia-linux.md @@ -5,7 +5,7 @@ tags: - difficulty-easy - foss date: 2024-03-26 -lastmod: 2024-10-06 +lastmod: 2024-11-02 draft: false --- The year is 2024. NVIDIA on linux is in a usable state! Of course, there are still many pitfalls and options required for a good experience. This page documents every configuration trick I've used and has all the resources that you need to use it yourself. @@ -24,39 +24,22 @@ Start by installing the nvidia driver that your distro bundles (or a community p **If your workflow requires the NVENC codec**: opt for the package containing all proprietary blobs rather than the package with the open source kernel driver. -I recommend adding `nvidia.NVreg_OpenRmEnableUnsupportedGpus=1 nvidia.NVreg_PreserveVideoMemoryAllocations=1 nvidia_drm.modeset=1` to your kernel parameters. These help with hardware detection, sleep, and display configuration, respectively. +I recommend adding `nvidia.NVreg_OpenRmEnableUnsupportedGpus=1 nvidia.NVreg_PreserveVideoMemoryAllocations=1 nvidia_drm.modeset=1 nvidia_drm.fbdev=1` to your kernel parameters. These help with hardware detection, sleep, display configuration, and framebuffer console support, respectively. Also consider `nvidia.NVreg_UsePageAttributeTable=1` for performance and `nvidia.NVreg_EnableResizableBar=1` for potential hotplug benefits.
- If you do add the third option, you will only be able to set the first two by kernel parameters. This is because **for modesetting drivers, options set in modprobe .conf files have no effect.** You should also blacklist the Nouveau video driver. You can do this with kernel parameters through `modprobe.blacklist=nouveau` (effective next boot), or in your module config files (effective after rebuilding the initramfs). -## X11 -In my opinion (and with my hardware), X11 is more usable right now with nvidia cards. - -This config recipe will set the same options for every device using the nvidia drivers: - -```xorg -# File: /etc/X11/xorg.conf.d/10-nvidia.conf -Section "OutputClass" - Identifier "nvidia" - MatchDriver "nvidia-drm" - Driver "nvidia" - Option "AllowEmptyInitialConfiguration" # Prevent crashes on startup - Option "SLI" "Auto" # Configure system based on no. of gpus present - Option "BaseMosaic" "on" # Optimize multi-display rendering - Option "TripleBuffer" "off" # Unnecessary performance overhead - Option "ForceFullCompositionPipeline" "on" # Fixes screen tearing. - # Option "ForceCompositionPipeline" "on" # If you still experience tearing with ForceFullCompositionPipeline, turn that setting off and turn this one on - # Option "CoolBits" "28" # Only necessary for overclocking/undervolting. - # If the GPU is too old, use the value 20 instead. - # If you don't want to overclock, you don't need to touch this line! -EndSection -``` - -The options for the nvidia driver are documented [here](https://download.nvidia.com/XFree86/Linux-x86_64/396.51/README/xconfigoptions.html). ## Wayland > [!info] Full disclosure -> Wayland is not yet usable on NVIDIA in my opinion, but it's so close now! +> Wayland is *almost* usable on NVIDIA!
-On both Gnome and Plasma, I've managed to get the display working simultaneously with an Intel-driven internal display on 6.x kernels and 5xx drivers as long as I've enabled `all-ways-egpu` and kernel modesetting.all-ways-egpu is less necessary nowadays, and in my modern setup I elect to go without it. +On both Gnome and Plasma, I've managed to get the display working simultaneously with an Intel-driven internal display on 6.x kernels and 5xx drivers as long as I've enabled `all-ways-egpu` and kernel modesetting. `all-ways-egpu` is less necessary nowadays, and in my modern setup I elect to go without it. + +I've tested Wayland in its first daily-usable state for over a month now, and it's to the point where there are so few dealbreakers that I can almost recommend it! + +Current caveats/niceties: +- There's about a 10% performance hit on NVIDIA cards compared to X11. +- If there's nothing (including a display server) running on the eGPU, hotplug works! + - My process is to disable the monitor in settings before unplugging, but even then sometimes it won't be detected on replug and requires a restart. YMMV. For more stable logins, ensure that your display manager (GDM for GNOME, SDDM by default on Plasma) is using Wayland. @@ -78,6 +61,7 @@ DisplayServer=wayland XWayland will have degraded performance on NVIDIA cards. On Arch specifically, some people have found success mitigating this with [wayland-protocols](https://archlinux.org/packages/extra/any/wayland-protocols/), { *merged -ed.* } ~~mutter-vrr on GNOME~~, and [xorg-xwayland-git](https://aur.archlinux.org/packages/xorg-xwayland-git). That combination didn't work for me when I tried it in April 2024, and with a few other wayland issues compounding the poor performance, I swapped back to X11. I do periodically check on Wayland though, so expect updates. August 2024 did not yield any new results.
However, **September 2024**: Explicit sync is supported across Wayland, XWayland, Mutter, KWin, AND the graphics card drivers. The performance problems with NVIDIA are mostly gone. I was able to run games at X11 fidelity with maybe 10 fewer FPS, and it's no longer choppy or flickery. Input latency is the final issue, and I experienced it even while using LatencyFleX. I'm hopeful that once Mutter gets fullscreen tearing support in Wayland, I can finally make the switch. I haven't tested in Plasma again, but it's definitely possible that Plasma is now usable as a Wayland gaming DE. +- On Arch, you can test this by installing `mutter-dynamic-buffering` from the AUR. ### GTK apps not opening GTK 4.16 (in conjunction with the release of GNOME 47) swapped to the Vulkan renderer by default. Vulkan has issues creating surfaces across display devices on Wayland (cross-GPU rendering, known as PRIME in the X11 world). You may experience crashes in GTK apps for this reason. **Fix:** @@ -87,6 +71,29 @@ GTK 4.16 (in conjunction with the release of GNOME 47) swapped to Vulkan rendere GSK_RENDERER=ngl
+ # If the GPU is too old, use the value 20 instead. + # If you don't want to overclock, you don't need to touch this line! +EndSection +``` + +The options for the nvidia driver are documented [here](https://download.nvidia.com/XFree86/Linux-x86_64/396.51/README/xconfigoptions.html). ## More Resources Allow me to dump every other page that I've needed to comb through for a working nvidia card. - [Archwiki - NVIDIA](https://wiki.archlinux.org/title/NVIDIA) (useful on more distros than Arch!) \ No newline at end of file diff --git a/content/Resources/copyright.md b/content/Resources/copyright.md new file mode 100644 index 000000000..5b2411159 --- /dev/null +++ b/content/Resources/copyright.md @@ -0,0 +1,20 @@ +--- +title: Basic Principles of Copyright Law +tags: + - legal + - copyright + - ai + - resources +date: 2024-11-02 +lastmod: 2024-11-02 +draft: true +--- +> [!important] Note +> **Seek legal counsel before acting/refraining from action re: copyright**. + +The field is notoriously paywalled, but I'll try to link to publicly available versions of my sources whenever possible. The content of this entry is my interpretation, and is not legal advice or a professional opinion. Whether a case is binding on you personally is a separate question from whether its holding reflects the nationally accepted view. + +The core tenet of copyright is that it protects original expression, which the Constitution authorizes regulation of as "works of authorship." This means **you can't copyright facts**. It also results in two logical ends of the spectrum of arguments made by plaintiffs (seeking protection) and defendants (arguing that enforcement is unnecessary in their case). For example, you can't be sued for using the formula you read in a math textbook, but if you scan that math textbook into a PDF, you might be found liable for infringement because your reproduction contains the way the author wrote and arranged the words and formulas on the page.
+
+## Further Reading
+You can apply these principles to [[gen-ai|generative AI]], both to [[Misc/training-copyright|training]] and [[Misc/generation-copyright|generation]].
\ No newline at end of file
diff --git a/content/Resources/learning-linux.md b/content/Resources/learning-linux.md
index a17d7925f..caa99c3e9 100755
--- a/content/Resources/learning-linux.md
+++ b/content/Resources/learning-linux.md
@@ -22,7 +22,7 @@ Traditionally, the Linux community is known for being hostile to newcomers. But
 > [!info] Need [[digital-garden#Using this Site|help navigating]] my site?
 ## Scope
-I've been daily driving Linux for a combined total of 1.5 years, chronicled [[Essays/on-linux|here]]. I want this entry to serve as a starting point that explains Linux from zero, but I'll try to avoid reinventing the wheel. Many people have written or produced content on . It'll be updated over time. If anything is confusing or if I miss an important topic, please let me know! A [[Dict/what-is-a-garden|digital garden]] is an iterative process.
+I've been daily driving Linux for a combined total of 1.5 years, chronicled [[Essays/on-linux|here]]. I want this entry to serve as a starting point that explains Linux from zero, but I'll try to avoid reinventing the wheel. Many people have written or produced content on . It'll be updated over time. If anything is confusing or if I miss an important topic, please let me know! A [[Atomic/what-is-a-garden|digital garden]] is an iterative process.
 ## Basic knowledge
 Linux is designed for someone already familiar with one variant to be able to make certain assumptions about any other Linux system. This is more of a guideline to modern design choices than an actual rule.
 ### What the operating system is
@@ -41,7 +41,7 @@ Jokes aside, there's a grain of truth in that statement. Linux—the operating s
 Many of these are compartmentalized and can only interact with each other in well-defined ways.
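As a toy illustration of those cross-distro assumptions, here's a minimal shell sketch (not from the original guide — just a hedged example; the commands are standard utilities, though exact output varies by system):

```shell
#!/bin/sh
# Utilities you can reasonably assume exist on virtually any Linux system.
uname -s                      # prints the kernel name ("Linux" on a Linux box)
printf 'a\nb\nc\n' | wc -l    # standard pipes and text tools: counts 3 lines
command -v sh                 # a POSIX-compatible shell is always reachable as "sh"
```

Distros differ wildly in package managers and desktop environments, but this shared baseline is a big part of why a guide written for one distro mostly carries over to another.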
 This document is going to focus on the parts you'll touch the most as an everyday user: primarily userspace.
-Linux grew out of a collection of operating system standards called POSIX. Most of those standards pertain to how the system behaves when you interact with it through a [[Dict/shell#The Terminal|terminal]]. But when the open source community got involved with its development, its design had to evolve in a way that could satisfy group "consensus," and could handle many groups developing all its different facets asynchronously.
+Linux grew out of a collection of operating system standards called POSIX. Most of those standards pertain to how the system behaves when you interact with it through a [[Atomic/shell#The Terminal|terminal]]. But when the open source community got involved with its development, its design had to evolve in a way that could satisfy group "consensus," and could handle many groups developing all its different facets asynchronously.
 ### Installing programs
diff --git a/content/Updates/2024/nov.md b/content/Updates/2024/nov.md
new file mode 100644
index 000000000..f6e30b086
--- /dev/null
+++ b/content/Updates/2024/nov.md
@@ -0,0 +1,26 @@
+---
+title: 11/24 - Summary of Changes
+draft: true
+tags:
+  - "#update"
+date: 2024-11-02
+lastmod: ""
+---
+## Housekeeping
+Mariah Carey is thawing. May God have mercy, for she has none.
+
+I've made the difficult decision to divide my massive AI essay, approaching 10,000 words of content, into a more digestible atomic format. You can pick and choose the rabbit holes you go down in my new garden-like structure. Start at [[Atomic/gen-ai|Generative AI]].
+## Pages
+==they're all DRAFTS RN UNDRAFT BEFORE PUB==
+- New: **The AI Essay**
+  - [[Atomic/gen-ai|Atomic/Generative AI]]
+  - [[Resources/copyright|Basic Principles of Copyright]]
+  - [[Essays/normative-ai|Why Copyright Should Apply to AI]]
+  - [[Misc/training-copyright|Theories of Copyright: AI Training]]
+  - [[Misc/generation-copyright|Theories of Copyright: AI Output]]
+- Content Update (Wayland is now discussed first in light of new testing!): [[Projects/nvidia-linux|Nvidia on Linux]]
+## Status Updates
+- `Dict/` has been renamed to [[Atomic]].
+- Nice little cosmetic changes!
+## Helpful Links
+[[todo-list|Site To-Do List]] | [[Garden/index|Home]]
diff --git a/content/Updates/2024/sept.md b/content/Updates/2024/sept.md
index 9dbc50765..34d9bd26d 100644
--- a/content/Updates/2024/sept.md
+++ b/content/Updates/2024/sept.md
@@ -10,7 +10,7 @@ lastmod: 2024-10-02
 - New: [[Programs I Like/terminals|My Terminal Roundup]]
 - New: [[Misc/a-font|CG Times License Violation]]
 - New: [[Projects/keyboards|A Mechanical Keyboard Journey]]
-- Content update: [[Dict/shell|Dict/Terminal]]
+- Content update: [[Atomic/shell|Dict/Terminal]]
 - Content update: [[Essays/on-linux|The Linux Experience]]
 - Content update: [[Programs I Like/code-editors|Code Editors]]
 - Content update (**exciting**!): [[Projects/nvidia-linux|NVIDIA on Linux]]
diff --git a/content/bookmarks.md b/content/bookmarks.md
index 4ed6ff6b8..a06cf37b3 100755
--- a/content/bookmarks.md
+++ b/content/bookmarks.md
@@ -12,6 +12,9 @@ One of the core philosophies of digital gardening is that one should document th
 - [YouTube: Tech for Tea - The Mess that is Application Theming](https://youtube.com/watch?v=HqlwUjogMSM)
 - [Nyxt Browser](https://nyxt.atlas.engineer/)
 - [The Heat Death of the Internet](https://www.takahe.org.nz/heat-death-of-the-internet/)
+- [Pivot to AI - Strawberry and chain-of-thought](https://pivot-to-ai.com/2024/09/13/strawberry-fields-forever-openais-new-o1-model/)
+  - [Strawberry Contd](https://pivot-to-ai.com/2024/09/17/openai-does-not-want-you-delving-into-o1-strawberrys-alleged-chain-of-thought/)
+- [The AI Copyright Hype: Legal Claims that Didn't Hold Up](https://www.techdirt.com/2024/09/05/the-ai-copyright-hype-legal-claims-that-didnt-hold-up/)
 ## Historical
 - [Nix Flakes: An Introduction](https://xeiaso.net/blog/nix-flakes-1-2022-02-21/)
diff --git a/content/index.md b/content/index.md
index 695e19615..62402c6b6 100755
--- a/content/index.md
+++ b/content/index.md
@@ -15,7 +15,7 @@ date: 2023-08-23
 On my little corner of the internet, I document my adventures in tech and complain about the internet of shit. This is **Projects & Privacy**.
 # Welcome!
-You're on a [[Dict/what-is-a-garden|Digital Garden]] dedicated to open-source use and contribution, legal issues in tech, and more.
+You're on a [[Atomic/what-is-a-garden|Digital Garden]] dedicated to open-source use and contribution, legal issues in tech, and more.
 For a monthly list of what's new on the site, subscribe to the [Updates RSS feed](/Updates.xml).
 ## Important Links
diff --git a/content/todo-list.md b/content/todo-list.md
index ca3e416e3..7eb127e6f 100644
--- a/content/todo-list.md
+++ b/content/todo-list.md
@@ -24,6 +24,5 @@ The date on this page will not be accurate in order to avoid spamming RSS feeds.
 - [ ] Add the third party doctrine to my-cloud, add the “if you aren’t persuaded to not use proprietary services, please be careful about what you put on them” section (google, tesla…)
 - [ ] https://www.404media.co/google-leak-reveals-thousands-of-privacy-incidents/ to my-cloud
 - [ ] FPV
-- [ ] **Keyboard writeup**
 - [ ] **Moving to FIDO2 and password managers**
 - [ ] In the interest of transparency and reducing barriers, put together and periodically update an entry with the tips in the legal profession that are typically institutional knowledge. Learning in Public: A Window into Private Law
\ No newline at end of file