quartz/content/notes/characters.md
2022-11-02 13:17:25 +13:00

47 lines
1.8 KiB
Markdown

---
title: "characters"
aliases:
tags:
- cosc204
---
# Characters
- A written symbol.
- In english are represented as a single byte, (other languages use 2 bytes or more)
- e.g., [different types of characters](https://i.imgur.com/DBLVhw8.png)
- characters are joined together to make human readable numbers and words
- `char ch`
- ch is a variable name (identifier)
- used to label a location in the computer's memory where a byte is stored
- when the code is compile, the name is assigned an address, in memory. The meaning of that data depends on how a human interprets it. it might be small integer, or a character, or a color etc.d
- each byte (or group of bytes) represents a number which maps to a character using a mapping like [Unicode](notes/characters.md#Unicode) or [ASCII](notes/characters.md#ASCII)
# ASCII
![ascii code|300](https://i.imgur.com/NbBtm1v.png)
A char is a 7-bit number (usually stored as a byte with the 8th bit set to zero) used as an index into a table of characters. The font describes how to draw these characters
- ASCII (american standard code for information) describes what should be drawn for Roman (english like) alphabets
- ASCII characters are stored using 7-bits
- so there are 128 (2^7) possible characters
- stored as a byte with the 8th bit set to zero
- For sorting purposes characters are compared on their numeric value (called the *collating sequence*)
- 'A' is before 'Z' but 'a' is after 'Z'!
# Unicode
![unicode|300](https://i.imgur.com/GEtVItW.png)
A 21-bit code with 144,697 characters from 159 scripts
- Other non roman languages
- greek, arabic, chinese, hebrew, japanese, thai etc.
- atrology symbols
- emoji etc
- Unicode
- developed by the Unicode Consortium
- coordinated with ISO/IEC 10646
- unicode maps from character numbers (code points) into glyphs (graphical representations)
- Some(many) are reserved