🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites
うろちょろ ec26ebcc9e
feat: improve search tokenization for CJK languages (#2231)
* feat: improve search tokenization for CJK languages

Enhance the encoder function to properly tokenize CJK (Chinese, Japanese,
Korean) characters while maintaining English word tokenization. This fixes
search issues where CJK text was not searchable due to whitespace-only
splitting.

Changes:
- Tokenize CJK characters (Hiragana, Katakana, Kanji, Hangul) individually
- Preserve whitespace-based tokenization for non-CJK text
- Support mixed CJK/English content in search queries

This addresses the CJK search issues reported in #2109 where Japanese text
like "て以来" was not searchable because the encoder only split on whitespace.

Tested with Japanese, Chinese, and Korean content to verify character-level
tokenization works correctly while maintaining English search functionality.
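
The behaviour described above can be sketched with a single regular expression. This is a minimal illustration, not the code from the commit: the name `encodeSimple`, the lowercasing step, and the exact character ranges are assumptions chosen to match the scripts listed in the change notes (Hiragana, Katakana, Kanji, Hangul).

```typescript
// Hypothetical first-pass encoder (name and ranges are assumptions).
// A token is either one CJK character or a run of characters that are
// neither whitespace nor CJK.
const TOKEN =
  /[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]|[^\s\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+/g

function encodeSimple(text: string): string[] {
  // Lowercasing is an assumption; it keeps English matching case-insensitive.
  return text.toLowerCase().match(TOKEN) ?? []
}

console.log(encodeSimple("て以来")) // ["て", "以", "来"]
console.log(encodeSimple("Hello 世界 world")) // ["hello", "世", "界", "world"]
```

Because every CJK character becomes its own token, a query such as "て以来" can match text that contains no whitespace around it, while English words are still split on whitespace as before.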

* perf: optimize CJK search encoder with manual buffer tracking

Replace regex-based tokenization with index-based buffer management.
This improves performance by ~2.93x according to benchmark results.

- Use explicit buffer start/end indices instead of string concatenation
- Replace split(/\s+/) with direct whitespace code point checks
- Remove redundant filter() operations
- Add CJK Extension B support (U+20000-U+2A6DF)

Performance: ~878ms → ~300ms (100 iterations, mixed CJK/English text)
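
A rough sketch of the index-based approach, under stated assumptions: the helper names (`isCJK`, `isWhitespace`, `encodeIndexed`) and the whitespace set are hypothetical, and lowercasing is left out for brevity. The idea is to remember only where the pending non-CJK token starts and slice it out once, instead of building it up by string concatenation or splitting with a regex.

```typescript
// Hypothetical helpers; the ranges follow the scripts named in the commit message.
function isCJK(code: number): boolean {
  return (
    (code >= 0x3040 && code <= 0x309f) || // Hiragana
    (code >= 0x30a0 && code <= 0x30ff) || // Katakana
    (code >= 0x4e00 && code <= 0x9fff) || // CJK Unified Ideographs
    (code >= 0xac00 && code <= 0xd7af) || // Hangul Syllables
    (code >= 0x20000 && code <= 0x2a6df) // supplementary-plane ideographs
  )
}

function isWhitespace(code: number): boolean {
  // Assumed whitespace set: space, tab, line feed, carriage return.
  return code === 0x20 || code === 0x09 || code === 0x0a || code === 0x0d
}

function encodeIndexed(text: string): string[] {
  const tokens: string[] = []
  let start = -1 // start index of the pending non-CJK token, -1 when none
  let i = 0
  while (i < text.length) {
    const code = text.codePointAt(i)!
    const width = code > 0xffff ? 2 : 1 // astral code points use two UTF-16 units
    if (isCJK(code) || isWhitespace(code)) {
      // Either kind of character ends the pending non-CJK token.
      if (start !== -1) {
        tokens.push(text.slice(start, i))
        start = -1
      }
      // A CJK character is emitted as its own token; whitespace is dropped.
      if (isCJK(code)) tokens.push(text.slice(i, i + width))
    } else if (start === -1) {
      start = i
    }
    i += width
  }
  if (start !== -1) tokens.push(text.slice(start))
  return tokens
}

console.log(encodeIndexed("hello 世界 world")) // ["hello", "世", "界", "world"]
```

Reading code points with `codePointAt` rather than `charCodeAt` matters for the U+20000-U+2A6DF range, since those characters occupy two UTF-16 code units and would otherwise be seen as unrelated surrogate halves.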

* test: add comprehensive unit tests for CJK search encoder

Add 21 unit tests covering:
- English word tokenization
- CJK character-level tokenization (Japanese, Korean, Chinese)
- Mixed CJK/English content
- Edge cases

All tests pass, confirming the encoder correctly handles CJK text.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 10:04:38 -08:00
.github chore(deps): bump the ci-dependencies group across 1 directory with 2 updates (#2234) 2025-12-01 17:32:52 -08:00
content re-add gitkeep to content 2023-12-11 15:34:21 -08:00
docs Fix optional chaining in tag explorer exclude example (#2200) 2025-11-11 09:14:04 -08:00
quartz feat: improve search tokenization for CJK languages (#2231) 2025-12-02 10:04:38 -08:00
.gitattributes add gitattributes for windows 2023-08-02 20:59:56 -07:00
.gitignore feat: support configurable ws port and remote development (#429) 2023-08-27 17:39:42 -07:00
.node-version Node 22 (#1997) 2025-05-28 16:20:59 -07:00
.npmrc add engines field 2023-08-20 08:57:56 -07:00
.prettierignore fix notes 2023-08-07 23:57:24 -07:00
.prettierrc Use semi: false for prettier config 2022-05-02 08:57:25 -07:00
CODE_OF_CONDUCT.md run prettier 2023-07-22 17:27:41 -07:00
Dockerfile fix(docker): instructions + bump deps + bind mount (#1809) 2025-03-06 10:01:25 -08:00
globals.d.ts refactor(comments): move script to files (#1308) 2024-08-05 15:17:11 -04:00
index.d.ts feat: reader mode 2025-04-17 19:45:17 -07:00
LICENSE.txt add base structure 2021-07-18 09:35:42 -04:00
package-lock.json chore(deps): bump the production-dependencies group across 1 directory with 7 updates (#2233) 2025-12-01 17:33:43 -08:00
package.json chore(deps): bump the production-dependencies group across 1 directory with 7 updates (#2233) 2025-12-01 17:33:43 -08:00
quartz.config.ts feat(favicon): add plugin to expose favicon from icon.png (#1942) 2025-04-26 11:06:59 -07:00
quartz.layout.ts feat: reader mode 2025-04-17 19:45:17 -07:00
README.md fix: remove quartz 3 references, update font style in popovers 2024-01-21 12:39:20 -08:00
tsconfig.json perf: incremental rebuild (--fastRebuild v2 but default) (#1841) 2025-03-16 14:17:31 -07:00

Quartz v4

“[One] who works with the door open gets all kinds of interruptions, but [they] also occasionally gets clues as to what the world is and what might be important.” — Richard Hamming

Quartz is a set of tools that helps you publish your digital garden and notes as a website for free. Quartz v4 features a from-the-ground-up rewrite focusing on end-user extensibility and ease of use.

🔗 Read the documentation and get started: https://quartz.jzhao.xyz/

Join the Discord Community
