Quartz sync: Feb 28, 2024, 12:10 PM

bfahrenfort 2024-02-28 12:10:33 -06:00
parent 6424d28d05
commit 4568d28361
13 changed files with 1917 additions and 14 deletions

View File

@ -35,7 +35,7 @@ I also discuss policy later in the essay. Certain policy points are instead made
In short, there's a growing sentiment against copyright in general. Copyright can enable centralization of rights when paired with a capitalist economy, which is what we've been historically experiencing with the advent of record labels/publishing companies. It's even statutorily enshrined as the "work-for-hire" doctrine. AI has the potential to be an end-run around these massive copyright repositories' rights, which many people see as beneficial.
However, this argument forgets that intangible rights are not *yet* so centralized that independent rights-holders have ceased to exist. While AI will indeed harm central rights-holders, it will also affect individual creators and the bargaining power of creators that choose to work with the central institutions. For those against copyright as a whole, this is a neutral factor to the disestablishment of copyright. Due to my roots in the indie and open-source communities, I'd much rather keep their/our/**your** rights intact.
However, this argument forgets that intangible rights are not *yet* so centralized that independent rights-holders have ceased to exist. While AI will indeed affect central rights-holders, it will also harm individual creators and the bargaining power of those that choose to work with the central institutions. For those against copyright as a whole, this is a neutral factor to the disestablishment of copyright. Due to my roots in the indie music and open-source communities, I'd much rather keep their/our/**your** rights intact.
Reconciling the two views, I'm sympathetic to arguments against specific parts of the US's copyright regime as enforced by the courts, such as the way fair use is statutorily worded. We as a voting population have the power to compel our representatives to enact reforms that take the threat of ultimate centralization into account, and can even work to break down what's already here. But I don't think that AI should be the impetus for arguments against the system as a whole.
## The Legal Argument
@ -55,18 +55,23 @@ One common legal argument against training as infringement is that the AI extrac
Everything AI starts with a dataset. And most AI models will start with the easiest, most freely available resource: the internet. Hundreds of different scrapers exist with the goal of collecting as much of the internet as possible to train modern AI (or previously, machine learners, neural networks, or even just classifiers/cluster models).
Acquiring data for training is an unethical mess. **In human terms**, scrapers like Common Crawl will take what they want, without asking (unless you know the magic word to make it go away, or just [[Projects/Obsidian/digital-garden#Block the bot traffic!|block it from the get-go]]), and without providing immediately useful service in return like a search engine. For more information on the ethics of AI datasets, read my tidbit on [[Essays/plagiarism#AI shouldn't disregard the need for attribution|🅿️ the need for AI attribution]], and have a look at the work of [Dr. Damien Williams](https://scholar.google.com/citations?user=riv547sAAAAJ&hl=en) ([Mastodon](https://ourislandgeorgia.net/@Wolven)).
- Sidebar: and acquiring this data is copyright infringement too, as unlicensed copying. The case is tremendously stupid: [*MAI Systems v. Peak Computer*](https://casetext.com/case/mai-systems-corp-v-peak-computer-inc) holds that RAM copying (ie, moving a file from somewhere to a computer's memory) is an unlicensed copy. As of today, it's still good law, for some reason. Note that every single file you open in Word, a PDF reader, or your browser is moved to your memory before it gets displayed on the screen. Bring it up at trivia night, just using your computer is copyright infringement!
- Sidebar: and acquiring this data is copyright infringement too, as unlicensed copying. The case is tremendously stupid: [*MAI Systems v. Peak Computer*](https://casetext.com/case/mai-systems-corp-v-peak-computer-inc) holds that RAM copying (ie, moving a file from somewhere to a computer's memory) is an unlicensed copy. As of today, it's still good law, for some reason. Note that every single file you open in Word, a PDF reader, or your browser is moved to your memory before it gets displayed on the screen. Bring it up at trivia night: just using your computer is copyright infringement!
But then a company actually has to train an AI on that data. What copyright issues does that entail? First, let's talk about The Chinese Room.
[The Chinese Room](https://plato.stanford.edu/entries/chinese-room/) is a philosophical exercise authored by John Searle where the (in context, American) subject is locked in a room and receives symbols in Chinese slipped under the door. A computer program tells the subject what Chinese outputs to send back out under the door based on patterns and combinations of the input. The subject does not understand Chinese. Yet, to an observer of Searle's room, it **appears** as if whoever is inside has a firm understanding of the language.
Searle's exercise was at the time an extension of the Turing test designed to refute the theory of "Strong AI." At the time that theory was well-named, but today the AI it was talking about is not even considered AI by most. Strong AI was the theory that a computer could be programmed to However, it can be easily applied to many other programming fields—notably compiler design—with the most pertinent here being natural language processing. The hypothetical Strong AI was a computer program capable of understanding its inputs and outputs, and importantly *why* it took each action. A Weak AI, on the other hand, was just the Chinese Room. Searle reasoned that the "understanding" of a Strong AI was inherently biological, thus one could not presently exist.
- Note that some computer science sources like [IBM](https://www.ibm.com/topics/strong-ai) have taken to using Strong AI to denote AGI, which was only a sufficient, not necessary, quality of a philosophical "intelligent" intelligence.
Searle's exercise was at the time an extension of the Turing test designed to refute the theory of "Strong AI." At the time that theory was well-named, but today the AI it was talking about is not even considered AI by most. The hypothetical Strong AI was a computer program capable of understanding its inputs and outputs, and importantly *why* it took each action to solve a problem, with the ability to apply that understanding to new problems (much like our modern conception of Artificial General Intelligence). A Weak AI, on the other hand, was just the Chinese Room: taking inputs and producing outputs according to defined rules. Searle reasoned that the "understanding" of a Strong AI was inherently biological, thus one could not presently exist.
- Note that some computer science sources like [IBM](https://www.ibm.com/topics/strong-ai) have taken to using Strong AI to denote only AGI, which was a sufficient, not necessary quality of a philosophical "intelligent" intelligence.
Generative AI models from different sources are architected in a variety of different ways, but they all boil down to one abstract process, where an absurdly massive number of parameters are tuned to the exact values that produce the most desirable output. (note: [CGP Grey's video on AI](https://www.youtube.com/watch?v=R9OHn5ZF4Uo) and its follow-up are mainly directed towards neural networks, but do apply to LLMs, and do a great job illustrating this). ==more==
Modern generative AI, like the statistical data models and machine learners before it, is a Weak AI. And weak AIs use weak AI data.
- Sidebar: this point doesn't consider an AI's ability to summarize a work since the section focuses on how the *training* inputs are used rather than how the output is generated from real input. It's confusing, but these are two linked concepts when talking about machine learning rather than direct results of each other. Especially when you introduce concepts like "temperature", which is a degree of randomness added to a model's (already variant) choices in response to an input to simulate creativity.
- ...I'll talk about that in the next section.
- The idea and expression being the same may give rise to some claims of merger doctrine; that is, the idea merges with the expression, so it is not copyrightable. That would not be a correct reading of merger doctrine. [*Ets-Hokin v. Skyy Spirits, Inc.*](https://casetext.com/case/ets-hokin-v-skyy-spirits-inc) makes it clear that the doctrine is more about disregarding the types of works that are low-expressivity by default, and that this "merge" is just a nice name to remember the actual test by. Confusing name, easy doctrine.
#### Detour: point for the observant
The idea and expression being indistinguishable by AI may make one immediately think of merger doctrine; that is, the idea inherent in the work trained on merges with its expression, so it is not copyrightable. That would not be a correct reading of merger doctrine. [*Ets-Hokin v. Skyy Spirits, Inc.*](https://casetext.com/case/ets-hokin-v-skyy-spirits-inc) makes it clear that the doctrine is more about disregarding the types of works that are low-expressivity by default, and that this "merge" is just a nice name to remember the actual test by. Confusing name, easy doctrine.
### Generation
### Fair Use

View File

@ -0,0 +1,10 @@
---
title: Content Death
tags:
- essay
- seedling
- meta
date: 2024-02-21
draft: true
---
A

View File

@ -0,0 +1,68 @@
---
title: "🦀 Rust Macros: Enough to be Dangerous"
tags:
- "#programming"
- misc
- seedling
date: 2024-02-28
lastmod: 2024-02-28
---
Rust's [[Programs I Like/functional-programming|functional patterns]] are great, but sometimes you need to get weird. What if you want to construct a struct type, but you (the programmer) don't know what types the fields will be while you're writing this? Rust has you covered in situations just like this one.
It's important to note that **Rust does not have runtime dynamic typing**. All of this must be done at compile time. That's where the macro system comes in. Unlike C-style macros, it's not pure substitution. It's much more powerful: Rust inserts your code into the AST-manipulation step of the compiler. Rather than `rustc`, *you* parse the tokens and make your own types from them to then generate new tokens to pass to the compiler.
## Prerequisites
See the [Rust Book on procedural macros](https://doc.rust-lang.org/reference/procedural-macros.html). The syntax there is much more complicated because it uses `macro_rules!()`, but pay attention to what a crate has to have to use the macro features and the various types of macros.
## Cardinal syntax
Now, let's ignore the builtin `proc_macro` crate in favor of `proc_quote`. This crate's `quote` macro is the meat of a procedural macro, as it returns what becomes actual code at compile time (a TokenStream). Its expansions are limited but very powerful. Here's a simple example with boilerplate stripped out:
```rust
let name = &input.ident;
let output = quote! {
    impl #name {
        pub fn hello_world() -> String {
            "Hello World".to_string()
        }
    }
};
```
This macro generates, at compile time, a `hello_world` function inside an `impl` block for the struct in `input`. The function returns a `String` built from the string slice "Hello World". The `#` operator expands `name` into the name of the input struct.
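To make the expansion concrete, here is roughly what the generated code amounts to if written by hand. The struct name `Greeter` is a hypothetical stand-in for whatever type the macro receives; it does not come from any real crate.

```rust
// `Greeter` is a hypothetical stand-in for the struct passed to the macro as `input`.
pub struct Greeter;

// What the `quote!` body becomes once `#name` is replaced with `Greeter`.
impl Greeter {
    pub fn hello_world() -> String {
        "Hello World".to_string()
    }
}
```

Calling `Greeter::hello_world()` then works like any other associated function, even though nobody typed that `impl` block into the source file.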
There's also a way to iterate `Vec<>` inside macros with the `*` repetition operator. This operator has two parts, a body and a separator, but I couldn't find a satisfactory tutorial online. Here's my attempt:
```rust
quote!{
#(let #some_vec = 5);*
}
```
Here, everything inside the `#()` parenthetical will be repeatedly generated for each element of `some_vec`, with `#some_vec` expanding to the element at the current index. Presumably `some_vec` contains the `Ident`s of some `i32` variables that we want to declare and assign 5 to. An expansion might look like:
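```rust
let x = 5; let y = 5; let z = 5
```
The `;` separator is only emitted *between* repetitions, so there is none after the final `let`; follow the repetition with a `;` of your own if the last statement needs one. It's okay that it's not pretty, because the compiler only cares about the tokens and not the formatting.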
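The same repetition syntax exists in declarative `macro_rules!` macros, with `$` in place of `#`, which makes it easy to experiment without setting up a proc-macro crate. This is a sketch of the analogous declarative form, not `quote!` itself; the macro name `fives!` and the function `three_fives` are made up for illustration.

```rust
// Declarative-macro analogue of the `quote!` repetition discussed above.
// `fives!` and the identifiers passed to it are hypothetical.
macro_rules! fives {
    ($($name:ident),*) => {{
        // Body only, no separator: one complete `let` statement per identifier.
        $(let $name = 5;)*
        // Body plus `,` separator: the comma appears between elements only.
        [$($name),*]
    }};
}

pub fn three_fives() -> [i32; 3] {
    // Expands to: { let x = 5; let y = 5; let z = 5; [x, y, z] }
    fives!(x, y, z)
}
```

The first repetition puts the `;` inside the body so every statement is terminated, while the second uses `,` as a true separator, which is why the array literal has no trailing comma.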
## \#\[proc_macro_derive()\]
[Rust traits](https://doc.rust-lang.org/book/ch10-02-traits.html) are powerful inheritance-like features that let the compiler know it can expect the "deriving" types to behave in the same way. What if you could generate trait implementations with a macro on the deriving type?
Note that `#` interpolation inside a `quote!` only applies to a bare variable name. That means you can write something like `#newtype_field_name.0`: only `newtype_field_name` is expanded, and the `.0` will remain in the generated code. Let's look at a more complicated example that uses that property along with the repetition operator:
```rust
#[proc_macro_derive()]
//...some boilerplate and parsing of the input struct
// stmts: Vec<Ident> containing the name of every field of the deriving (input) type that is also present in SomeType
let name = &input.ident;
let output = quote! {
    impl #name {
        pub fn from(f: SomeType) -> #name {
            #name {
                #(#stmts: f.#stmts), *
            }
        }
    }
};
```
This `from` method assumes that every field of the input type is also present in `SomeType`, since the struct literal must initialize all of the input type's fields. It implements automatic conversion without needing to know either type's full definition, just because the input type asked for a `From`-style derive.
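As a sketch of what that derive would generate, here is the hand-written equivalent for two made-up types. `Summary` (the deriving type) and the fields of `SomeType` are assumptions chosen for illustration only.

```rust
// Hypothetical source type; `internal_note` has no counterpart in `Summary`.
pub struct SomeType {
    pub id: u32,
    pub name: String,
    pub internal_note: String,
}

// Hypothetical deriving type: every one of its fields also exists on `SomeType`.
pub struct Summary {
    pub id: u32,
    pub name: String,
}

// Roughly what the macro emits when `stmts` holds the identifiers `id` and `name`.
impl Summary {
    pub fn from(f: SomeType) -> Summary {
        Summary {
            id: f.id,
            name: f.name,
        }
    }
}
```

Extra fields on `SomeType` are simply dropped, but a field on `Summary` that `SomeType` lacks would make the generated code fail to compile.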
### Further Reading
A good case study on deriving proc macros is my project `rsgistry`, which exports several of them with full boilerplate using `syn` and `quote!`. The source is available for viewing [here](https://github.com/bfahrenfort/rsgistry/tree/main/macros), with details in the [[Projects/rsgistry|garden entry]].

View File

@ -0,0 +1,27 @@
---
title: r/[es]/gistry
tags:
- foss
- "#rust"
- programming
- project
- difficulty-easy
- seedling
date: 2024-02-28
lastmod: 2024-02-28
---
[Repository](https://github.com/bfahrenfort/rsgistry)
I have a vision that all it should take to write a customized, full-stack, ready-to-deploy registry web app for your packages or community extensions is editing a single type. More info to come soon.
This entry will be a technical overview of my implementation choices and program design. Documentation on actually using the codebase will be hosted in the repository. Enjoy!
- Sidebar: this was advanced for me but it will be extremely easy for someone with limited coding knowledge to fork and deploy in a way that supports their use case.
## Background
I've run into the same ecosystem problem in about three different spaces now: there's a really robust system for **community extensions, but no real way to share them**. Either they're too trivial for individual GitHub repositories, too non-tech-oriented, or still need some additional metadata hosted online in order to have a good API consumer UX. Thus, I'm adapting a test project into a batteries-included codebase for hosting a registry. The API is in Axum and set up to be hosted for free on Shuttle; I'm unsure about the frontend as of yet but looking at Leptos.
...The name stylization is just a regex joke.
## Implementation Details
### Macros
I'm cultivating a tidbit on [[Programs I Like/rust-macros|Rust Macros]], so feel free to read for a practical introduction to the topic.
This program works by generating multiple model types, their helper functions, and SQL queries, all from a single type at compile time.

View File

@ -0,0 +1,13 @@
---
title: The Future of RSS
tags:
- foss
- project
- seedling
date: 2024-02-14
---
RSS is the best and most private way to subscribe to a website, social media account, or more. No site analytics, no page loading, no JavaScript, no ads. All the website can see is that you pulled one plaintext file from it.
## Vision
RSS shouldn't just be a one-feed thing. Granularity is key.
More to come when I get time haha

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,15 @@
---
title: 🏠 Law School Outlines - Home
tags:
- toc
- legal
date: 2024-02-19
draft: true
---
In the interest of public access to knowledge, I will be uploading all of my outlines from law school classes I've taken to this website in downloadable file and webpage format. An outline is the condensed sum content of a legal course's casebook readings and class discussion in the format that will be most useful for either studying for an exam or for quick reference during that exam.
Feel free to use these for your own understanding of a doctrine or as a starting point for further research.
## Caveats
These are **not legal advice**, and **not a professional opinion**; nor are they the complete picture of any area of the law. If the professor thought it was important, it's in here; thus it's not even my own complete understanding of the law.
If you're a law student, making your own outline will make studying exponentially easier and you *will* score higher. Your professor cares about different things than mine.

View File

@ -9,6 +9,6 @@ lastmod: 2024-02-28
## Housekeeping
## Pages
-
- New RSS feed at [Updates.xml](/Updates.xml) that only changes once a month when I push updates like this one.
## Status Updates
- Updated the engine, which was terrifying given how many custom tweaks I have to it.

View File

@ -15,9 +15,11 @@ date: 2023-08-23
On my little corner of the internet, I document my adventures in tech and complain about the internet of shit. This is **Projects & Privacy**.
# Welcome!
You're on a site called a [[Misc/what-is-a-garden|Digital Garden]]. I write about open-source software, my tech projects, legal issues, and more.
You're on a [[Misc/what-is-a-garden|Digital Garden]] dedicated to open-source use and contribution, legal issues in tech, and more.
For a monthly list of what's new on the site, subscribe to the [Updates RSS feed](/Updates.xml).
## Important Links
[[about-me|About Me]] | [[curated|Recommended Reading]] | [[Misc/disclaimers|Disclaimers/Terms of Use]] | [Monthly Changelog](/Updates) | <a rel="me" href="https://social.treehouse.systems/@be_far">Mastodon</a>
[[about-me|About Me]] | [[curated|Recommended Reading]] | [[Misc/disclaimers|Disclaimers/Terms of Use]] | [[/Updates|Monthly Changelog]] | <a rel="me" href="https://social.treehouse.systems/@be_far">Mastodon</a>
<br/><br/>
not legal advice 🤟

View File

@ -1,5 +1,6 @@
---
title: New Note
tags: []
tags:
date: <% tp.date.now("yyyy-MM-DD") %>
lastmod: <% tp.date.now("yyyy-MM-DD") %>
---

View File

@ -29,13 +29,18 @@ export const defaultContentPageLayout: PageLayout = {
Component.Darkmode(),
Component.DesktopOnly(Component.Explorer({
sortFn: (a, b) => {
const emojis = /([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g
const a_name = a.name.replace(emojis, '').trim()
const a_dname = a.displayName.replace(emojis, '').trim()
const b_name = b.name.replace(emojis, '').trim()
const b_dname = b.displayName.replace(emojis, '').trim()
// Sort order: folders first, then files. Sort folders and files alphabetically
if (a.name.match(/Home$/)) { return -1 }
if (b.name.match(/Home$/)) { return 1 }
if (/^.*Home$/.test(a_dname)) { return -1 }
if (/^.*Home$/.test(b_dname)) { return 1 }
if ((!a.file && !b.file) || (a.file && b.file)) {
// numeric: true: Whether numeric collation should be used, such that "1" < "2" < "10"
// sensitivity: "base": Only strings that differ in base letters compare as unequal. Examples: a ≠ b, a = á, a = A
return a.displayName.localeCompare(b.displayName, undefined, {
return a_dname.localeCompare(b_dname, undefined, {
numeric: true,
sensitivity: "base",
})

View File

@ -51,7 +51,7 @@ export default ((opts?: Partial<FolderContentOptions>) => {
<p>{content}</p>
</article>
<div class="page-listing">
{options.showFolderCount && (
{options.showFolderCount && allPagesInFolder.length != 0 && (
<p>
{i18n(cfg.locale).pages.folderContent.itemsUnderFolder({
count: allPagesInFolder.length,

View File

@ -90,7 +90,13 @@ function TagContent(props: QuartzComponentProps) {
<div class={classes}>
<article>{content}</article>
<div class="page-listing">
<p>{i18n(cfg.locale).pages.tagContent.itemsUnderTag({ count: pages.length })}</p>
{pages.length != 0 && (
<p>
{i18n(cfg.locale).pages.tagContent.itemsUnderTag({
count: pages.length
})}
</p>
)}
<div>
<PageList {...listProps} />
</div>