quartz/content/notes/characters.md

---
title: "characters"
aliases:
tags:
- cosc204
---

# Characters
- A written symbol.
- In english are represented as a single byte, (other languages use 2 bytes or more)
- e.g., [different types of characters](https://i.imgur.com/DBLVhw8.png)

- characters are joined together to make human readable numbers and words

- `char ch`
- ch is a variable name (identifier)
- used to label a location in the computer's memory where a byte is stored
- when the code is compile, the name is assigned an address, in memory. The meaning of that data depends on how a human interprets it. it might be small integer, or a character, or a color etc.d

- each byte (or group of bytes) represents a number which maps to a character using a mapping like [Unicode](notes/characters.md#Unicode) or [ASCII](notes/characters.md#ASCII)

# ASCII
![ascii code|300](https://i.imgur.com/NbBtm1v.png)

A char is a 7-bit number (usually stored as a byte with the 8th bit set to zero) used as an index into a table of characters. The font describes how to draw these characters

- ASCII (american standard code for information) describes what should be drawn for Roman (english like) alphabets
- ASCII characters are stored using 7-bits
	- so there are 128 (2^7) possible characters
	- stored as a byte with the 8th bit set to zero
	- For sorting purposes characters are compared on their numeric value (called the *collating sequence*)
	- 'A' is before 'Z' but 'a' is after 'Z'!

# Unicode
![unicode|300](https://i.imgur.com/GEtVItW.png)

A 21-bit code with 144,697 characters from 159 scripts

- Other non roman languages
	- greek, arabic, chinese, hebrew, japanese, thai etc.
	- atrology symbols
	- emoji etc
- Unicode
	- developed by the Unicode Consortium
	- coordinated with ISO/IEC 10646
	- unicode maps from character numbers (code points) into glyphs (graphical representations)
	- Some(many) are reserved