Merge pull request #95 from plastic-labs/vince/link-updates

fix: add more links
2026-03-22 05:55:42 -05:00 · 2025-03-06 16:20:46 -05:00 · 2025-03-06 16:20:46 -05:00 · 783108953f
commit 783108953f
parent 919a794362 635add3458
1 changed files with 4 additions and 0 deletions
--- a/content/careers/Founding
+++ b/content/careers/Founding
@@ -52,8 +52,11 @@ We're building systems that haven't been built before, solving problems that hav
 [Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm](https://arxiv.org/pdf/2102.07350)  
 [Theory of Mind May Have Spontaneously Emerged in Large Language Models](https://arxiv.org/pdf/2302.02083v3)  
 [Think Twice: Perspective-Taking Improved Large Language Models' Theory-of-Mind Capabilities](https://arxiv.org/pdf/2311.10227)  
+[Refusal in Language Models is Mediated by a Single Direction](https://arxiv.org/abs/2406.11717)  
 [Representation Engineering: A Top-Down Approach to AI Transparency](https://arxiv.org/abs/2310.01405)  
 [Theia Vogel's post on Representation Engineering Mistral 7B an Acid Trip](https://vgel.me/posts/representation-engineering/)  
+[Cognitive Behaviors that Enable Self-Improving Reasoners](https://arxiv.org/abs/2503.01307)  
+[All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning](https://arxiv.org/abs/2503.01067)  
 [A Roadmap to Pluralistic Alignment](https://arxiv.org/abs/2402.05070)  
 [Open-Endedness is Essential for Artificial Superhuman Intelligence](https://arxiv.org/pdf/2406.04268)  
 [Simulators](https://generative.ink/posts/simulators/)  
@@ -64,6 +67,7 @@ We're building systems that haven't been built before, solving problems that hav
 [Language Models Represent Space and Time](https://arxiv.org/pdf/2310.02207)  
 [Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/abs/2304.03442)  
 [Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge](https://arxiv.org/abs/2407.19594) 
+[Synthetic Sentience: Joscha Bach](https://www.youtube.com/watch?v=cs9Ls0m5QVE)  
 [Cyborgism](https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism)  
 [Spontaneous Reward Hacking in Iterative Self-Refinement](https://arxiv.org/abs/2407.04549)  
 [... accompanying twitter thread](https://x.com/JanePan_/status/1813208688343052639)