quartz/content/notes/18-ML-in-IA-2.md
2022-10-17 13:28:03 +13:00

121 lines
3.4 KiB
Markdown

---
title: "18-ML-in-IA-2"
aliases:
tags:
- lecture
- comp210
sr-due: 2022-11-08
sr-interval: 22
sr-ease: 270
---
# nefarious uses of ml
## password guessing
- normally based on heuristics that are designed by humans
- biased may not match true distributions of passwords
- leaked data can be used to "learn" what to guess
- gain insight into what users use as passwords
## alternative - PassGan
- use statistical distribution of passwords then use this to generate guesses
![passGAN](https://i.imgur.com/439hrXq.png)
![passGAN sample data](https://i.imgur.com/Y7A4ygP.png)
- can generate passwords that are likely to be used
- also based off previous passwords
- passwords can be guessed in less attempts
- need to update our rules - e.g., how many guesses makes an attempt likely to be suspicious
-
- new password generate, which also provides real world indicator of password strength
- faster password guessing
- hackers will get in faster
- need to be a step ahead of this
- insight into strong but unused passwords
- passwords get close and closer to those typically used
## password "guessing"
- gets faster as machines get faster (Moore's law)
- machine learning reduces number of trials further by learning distributions of passwords
- useful for us
- even if we didn't do this research the hackers would
- use passgan to detect guesses which may have come from passgan
- can analyse the source of guesses for suspicous stuff e.g., ip, location etc
- can analyse data from antivirus programs
- useful for hackers
- hackers can conquer our strategies
## steganography
- hiding secret messages in a medium that is not meant to be secret (e.g., image, audio, video)
- used to hide content and reduce suspicion e.g., in forensic investigation
- hidden message usually encryted but not in the sense of cryptography
- goal is to decieve
![](https://i.imgur.com/JWOIBw1.png)
- embed noise into images
### signal to noise
- most signals contain noise e.g., static
- noise carries info as the least significant bits in value
- hiding data in an image in the least significant bits will be visually percieved as noise
### e.g., derek uphams JSteg
![slide](https://i.imgur.com/nGEGhPA.png)
### stegnalysis
- detecting hidden content
- usually visually undetectable
how
- analyse DCT distributions
- ![example DCT distribution](https://i.imgur.com/iki90fH.png)
F5 steganographic algorithm
- developed to fool analysis of dct distributions
- seeded with key to create pseudorandom sequence for embedding
- can preserve statistical properties of DCT distributions
can use ML to find hidden images
- then hackers will try to fool this
- some will always get through
# bigger issues
- deepfakes to to shape political views of the day
- pixel replacement with segmentation and inpainting
- ![examples](https://i.imgur.com/zGOtqZa.png)
## is ML good or bad
- being used everywhere
- should we care
- data and modelling cannot always be 100% perfect
- e.g., killer drones
- privacy concerns
- linked data
- pipelins - information seepage
nx integrated data infrastructure
ethics
- what considerations need to be made
- ML being used to automate decision making
- ML sentencing of criminals
theft
- theft of data
- data is more valuable
- transfer learning
- reverse engineering a ML model
![ml extraction attack](https://i.imgur.com/jiinX6m.png)
# where to from here
- good and bad are human constructs
- how will laws work
- can we use ML to make laws
- Do we need to stop it?