quartz/content/notes/01-bits-and-bytes.md
2022-07-14 20:21:10 +12:00

---
title: "01-bits-and-bytes"
aliases:
tags:
- cosc204
- lecture
sr-due: 2022-07-21
sr-interval: 7
sr-ease: 250
---
[memory](notes/memory.md)
[unicode](notes/unicode.md)
[ASCII](notes/ASCII.md)
[digital-data](notes/digital-data.md)
# What is data
- A fact; a piece of information
- Corresponds to discrete facts about phenomena from which we gain information about the world
- The concept of a *value* is fundamental to data
- e.g., 25, $356.00, April, "this is a sentence", colours etc
- Values are abstract; they are interpretations of data
- There are many ways of storing the same data
- e.g., 12, twelve, XII, 1100, · ··, ·----··---
# How computers represent data
- In *Binary*
- Stored in one of two states, true/false, 1/0, on/off, voltage/no voltage
- Each instance of a state is called a *bit* (binary digit).
- *Values* are represented as a sequence of bits.
- e.g., 1000001
- The computer doesn't "know" what any given sequence means, **you** know.
- It could be 65, A, or anything **you** want it to mean
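
The point that one bit sequence can carry several meanings can be sketched in C. This is a minimal illustration, not part of the lecture: `bits_to_value` is a hypothetical helper name, and reading the result as a letter assumes an ASCII system.

```c
#include <assert.h>
#include <stddef.h>

/* Interpret a string of '0' and '1' characters as an unsigned binary number.
   bits_to_value is a hypothetical helper, named here for illustration only. */
unsigned int bits_to_value(const char *bits)
{
    unsigned int value = 0;
    assert(bits != NULL);
    for (size_t i = 0; bits[i] != '\0'; i++) {
        assert(bits[i] == '0' || bits[i] == '1');
        /* Shift what we have one place left, then add the new bit. */
        value = value * 2 + (unsigned int)(bits[i] - '0');
    }
    return value;
}
```

Calling `bits_to_value("1000001")` gives 65. Printing that same value with `%c` instead of `%d` shows `A` on an ASCII system: the bits are identical, only the interpretation changes.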
# Computer memory
- Similar to the switchboard in your home
- Each switch has a number
- they are all always there
- you can switch the state by flipping the switch
- Each switch has:
- An address/location (switch number)
- A value (on/off)
- Computer languages allow us to name some of the locations (variables); a name is easier to remember than a number
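
The switchboard analogy can be sketched as a small array, where the index plays the role of the address and the element holds the on/off value. The names `switchboard`, `flip_switch`, `read_switch`, and the board size of 8 are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SWITCH_COUNT 8  /* assumed board size, for illustration only */

/* A toy switchboard: every switch always exists and starts off (0).
   The array index is the switch's address; the element is its value. */
static int switchboard[SWITCH_COUNT];

/* Change the state of the switch at the given address. */
void flip_switch(size_t address)
{
    assert(address < SWITCH_COUNT);
    switchboard[address] = !switchboard[address];
}

/* Read the current state (0 = off, 1 = on) of the switch at an address. */
int read_switch(size_t address)
{
    assert(address < SWITCH_COUNT);
    return switchboard[address];
}
```

A variable is then just a friendlier label for one of these addresses, so the programmer can refer to a location by name rather than by its number.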
# Bits, Nibbles, Bytes
- The smallest unit of storage is a bit (0 or 1)
- For convenience, bits are grouped into larger units.
- a nibble is 4 bits
- a byte is 8 bits
- For convenience, bytes are given addresses, not nibbles or bits (these are too small to work with most of the time).
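
Because only whole bytes have addresses, nibbles and individual bits are reached by shifting and masking a byte. A minimal sketch, with hypothetical helper names `high_nibble`, `low_nibble`, and `get_bit`:

```c
#include <assert.h>

/* Upper four bits of a byte, moved down into the range 0..15. */
unsigned char high_nibble(unsigned char byte)
{
    return (unsigned char)(byte >> 4);
}

/* Lower four bits of a byte, in the range 0..15. */
unsigned char low_nibble(unsigned char byte)
{
    return (unsigned char)(byte & 0x0F);
}

/* Value (0 or 1) of one bit; position 0 is the rightmost bit. */
int get_bit(unsigned char byte, int position)
{
    assert(position >= 0 && position < 8);
    return (byte >> position) & 1;
}
```

For the byte `0x41` (binary 0100 0001), `high_nibble` gives 4, `low_nibble` gives 1, and bits 0 and 6 are the only ones set.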
# A Word of memory
- The word is the number of bits the CPU uses internally; it varies between manufacturers and CPUs.
- Now it's usually 64 bits
- [amount of bits for different devices](https://i.imgur.com/nHrz1zX.png)
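
C has no direct "word size" query, but the width of a pointer is a rough stand-in on many machines. The sketch below rests on that assumption; `bits_in` and `pointer_bits` are hypothetical names, and the result depends on the compiler and target.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits in byte_count bytes. CHAR_BIT is the number of bits
   in one byte on this platform (8 on mainstream systems). */
size_t bits_in(size_t byte_count)
{
    assert(byte_count <= SIZE_MAX / CHAR_BIT);  /* guard against overflow */
    return byte_count * CHAR_BIT;
}

/* Width of a pointer in bits: an approximation of the word size,
   not a guaranteed match for it. */
size_t pointer_bits(void)
{
    return bits_in(sizeof(void *));
}
```

On a typical 64-bit desktop build `pointer_bits()` would be expected to return 64, but this is a property of the platform rather than of the language.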
# Characters
- A written symbol.
- In English, each character is represented as a single byte (other languages use 2 bytes or more)
- e.g., [different types of characters](https://i.imgur.com/DBLVhw8.png)
- characters are joined together to make human readable numbers and words
- `char ch`
- ch is a variable name (identifier)
- used to label a location in the computer's memory where a byte is stored
- When the code is compiled, the name is assigned an address in memory. The meaning of the data stored there depends on how a human interprets it: it might be a small integer, a character, a colour, etc.
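
A short sketch of the `char ch` idea: the same stored byte is formatted once as a small integer and once as a character. `describe_byte` is a hypothetical helper, and the character shown assumes an ASCII system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the byte held in ch twice: as a number, then as a character.
   Returns the number of characters written (as snprintf does). */
int describe_byte(char ch, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    /* %d reads the byte as an integer; %c reads the same byte as a symbol. */
    return snprintf(out, out_size, "%d %c", ch, ch);
}
```

With `char ch = 65;`, the buffer ends up holding `65 A`: nothing in memory changed between the two readings, only the format used to interpret the byte.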
## ASCII Character Code
![ascii code](https://i.imgur.com/NbBtm1v.png)
1. The computer uses ch as an integer index into a pre-existing table
2. The computer screen is made up of thousands of little dots called pixels, arranged in a rectangular grid like a table.
- [ascii code example](https://i.imgur.com/9uvKRVo.png)
- There are several tables that describe what to draw
- fonts describe how to draw them
- ASCII (American Standard Code for Information Interchange) describes what should be drawn for Roman (English-like) alphabets
- e.g.,
- A 1000001 (65)
- a 1100001 (97)
- 9 0111001 (57)
- There are only a few letters, numbers, and punctuation marks. The remaining ASCII codes are non-printing and have other meanings (line feed, form feed, tab, etc.)
- ASCII characters are stored using 7-bits
- so there are 128 (2^7) possible characters
- stored as a byte with the 8th bit set to zero
- For sorting purposes characters are compared on their numeric value (called the *collating sequence*)
- 'A' is before 'Z' but 'a' is after 'Z'!
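
The 7-bit rule and the collating sequence can both be checked with plain integer operations. A minimal sketch assuming an ASCII system; `is_ascii` and `compare_chars` are hypothetical helper names.

```c
#include <assert.h>

/* Nonzero when the byte is a 7-bit ASCII code, i.e. its 8th bit is zero. */
int is_ascii(unsigned char c)
{
    return (c & 0x80) == 0;
}

/* Compare two ASCII characters by numeric code.
   Negative: a sorts before b. Zero: equal. Positive: a sorts after b. */
int compare_chars(char a, char b)
{
    assert(is_ascii((unsigned char)a) && is_ascii((unsigned char)b));
    return (int)(unsigned char)a - (int)(unsigned char)b;
}
```

`compare_chars('A', 'Z')` is negative (65 vs 90), while `compare_chars('a', 'Z')` is positive (97 vs 90), which is why every uppercase letter sorts before every lowercase letter.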
## Unicode
![unicode](https://i.imgur.com/GEtVItW.png)
- Other non-Roman languages
- Greek, Arabic, Chinese, Hebrew, Japanese, Thai, etc.
- astrology symbols
- emoji etc
- Unicode
- developed by the Unicode Consortium
- coordinated with ISO/IEC 10646
- a 21-bit code with 144,697 characters from 159 scripts (as of Unicode 14.0)
- Unicode maps character numbers (code points) to characters; fonts turn these into glyphs (graphical representations)
- Some (many) code points are reserved
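
The "21-bit" figure follows from the highest code point, U+10FFFF, which needs 21 binary digits to write down. A small sketch that counts the bits; `code_point_bits` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CODE_POINT 0x10FFFFu  /* highest Unicode code point, U+10FFFF */

/* Number of binary digits needed to write a code point (0 needs none). */
int code_point_bits(uint32_t code_point)
{
    int bits = 0;
    assert(code_point <= MAX_CODE_POINT);
    while (code_point != 0) {
        bits++;
        code_point >>= 1;  /* drop the lowest bit and count again */
    }
    return bits;
}
```

`code_point_bits(0x10FFFF)` is 21, while `code_point_bits(0x41)` (the letter A) is 7, matching the 7-bit ASCII range.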
# Homework
- How are character strings (e.g. “hello world”) stored in a computer?
- Is this different between programming languages (for example, C and Java)?