mirror of
https://github.com/jackyzha0/quartz.git
synced 2025-12-28 07:14:05 -06:00
121 lines
3.4 KiB
Markdown
---
title: "18-ML-in-IA-2"
aliases:
tags:
- lecture
- comp210
sr-due: 2022-10-13
sr-interval: 3
sr-ease: 250
---

# nefarious uses of ml

## password guessing
- normally based on heuristics that are designed by humans
- these heuristics are biased and may not match the true distribution of passwords
- leaked data can be used to "learn" what to guess
- gives insight into what users actually use as passwords

## alternative - PassGAN
- learn the statistical distribution of passwords, then use it to generate guesses

- can generate passwords that are likely to be used
- also based on previous passwords
- passwords can be guessed in fewer attempts
- need to update our rules, e.g., how many guesses make an attempt look suspicious

- new password generator, which also provides a real-world indicator of password strength
- faster password guessing
- hackers will get in faster
- need to be a step ahead of this
- insight into strong but unused passwords
- generated passwords get closer and closer to those typically used

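The idea of learning a password distribution and using it as a real-world indicator of strength can be sketched with a far simpler model than PassGAN. The sketch below swaps the GAN for a character bigram model with add-one smoothing. The `leaked` list is a made-up stand-in for a leaked dataset, and `train_bigrams`, `log_likelihood` and `alphabet_size=96` are hypothetical choices for illustration only.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # boundary markers (assumed not to occur in passwords)

def train_bigrams(passwords):
    """Count how often each character follows another in the training passwords."""
    counts = defaultdict(lambda: defaultdict(int))
    for pw in passwords:
        chars = [START] + list(pw) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def log_likelihood(counts, password, alphabet_size=96):
    """Log-probability of a password under the bigram model (add-one smoothing)."""
    chars = [START] + list(password) + [END]
    total = 0.0
    for prev, nxt in zip(chars, chars[1:]):
        row = counts.get(prev, {})          # .get avoids adding new rows to the model
        seen = row.get(nxt, 0)
        row_total = sum(row.values())
        total += math.log((seen + 1) / (row_total + alphabet_size))
    return total

# Made-up training data standing in for a leaked password list.
leaked = ["password", "password1", "passw0rd", "pass1234", "letmein"]
model = train_bigrams(leaked)
common = log_likelihood(model, "password12")
unusual = log_likelihood(model, "xq7#Vz!kPd")
print(f"password12: {common:.1f}   xq7#Vz!kPd: {unusual:.1f}")
```

A higher (less negative) score means the password resembles what the model has seen, so a guesser trained on similar data would probably reach it sooner. A strength meter built this way is only as good as its training data.
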
## password "guessing"
- gets faster as machines get faster (Moore's law)
- machine learning reduces the number of trials further by learning the distribution of passwords

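A rough back-of-the-envelope calculation shows how hardware speed alone shrinks the time for an exhaustive search. All numbers below (a 95-character alphabet, 8-character passwords, 10^9 guesses per second, speed doubling every 2 years) are illustrative assumptions, not measurements, and treating Moore's law as a doubling of guessing speed is a simplification.

```python
def exhaustive_search_seconds(alphabet_size, length, guesses_per_second):
    """Worst-case time to try every password of exactly the given length."""
    return alphabet_size ** length / guesses_per_second

def speedup_after(years, doubling_period_years=2.0):
    """Speed-up factor if guessing hardware doubles in speed every doubling period."""
    return 2 ** (years / doubling_period_years)

SECONDS_PER_DAY = 86_400
today = exhaustive_search_seconds(95, 8, 1e9)   # assumed alphabet, length and rate
in_ten_years = today / speedup_after(10)        # assumed 2-year doubling period
print(f"today: {today / SECONDS_PER_DAY:.1f} days")
print(f"after 10 years of doubling: {in_ten_years / SECONDS_PER_DAY:.1f} days")
```

A learned guesser attacks the other factor: instead of trying the whole keyspace, it tries likely candidates first, so the number of trials before a hit can be far smaller than `alphabet_size ** length`.
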
- useful for us
- even if we didn't do this research, the hackers would
- use PassGAN to detect guesses which may have come from PassGAN
- can analyse the source of guesses for suspicious signs, e.g., IP address, location
- can analyse data from antivirus programs

- useful for hackers
- hackers can adapt to and overcome our strategies

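Analysing the source of guesses can be as simple as counting recent failures per source. The following is a minimal sketch of a sliding-window counter, assuming failed logins arrive with a source IP and a timestamp. `GuessMonitor`, its thresholds and the example address are hypothetical.

```python
from collections import defaultdict, deque

class GuessMonitor:
    """Flags a source once it exceeds max_failures failed logins within window_seconds."""

    def __init__(self, max_failures=5, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # source IP -> timestamps of recent failures

    def record_failure(self, source_ip, timestamp):
        """Record a failed login; return True if the source now looks suspicious."""
        recent = self._failures[source_ip]
        recent.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_failures

# Hypothetical thresholds and a documentation-range IP address.
monitor = GuessMonitor(max_failures=3, window_seconds=10.0)
flags = [monitor.record_failure("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 30.0)]
print(flags)
```

Because a learned guesser needs fewer attempts, a threshold tuned for brute-force attacks may be too generous, which is one reason the rules need updating.
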
## steganography
- hiding secret messages in a medium that is not meant to be secret (e.g., image, audio, video)
- used to hide content and reduce suspicion, e.g., in a forensic investigation
- hidden message is usually encrypted, but not in the cryptographic sense
- goal is to deceive

- embed the message in an image so that it looks like noise

### signal to noise
- most signals contain noise, e.g., static
- noise shows up in the least significant bits of each value
- data hidden in the least significant bits of an image will be visually perceived as noise

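Least-significant-bit hiding can be sketched in a few lines. The example below assumes the image is a flat list of 8-bit grayscale values. `embed_lsb`, `extract_lsb` and the `cover` data are made up for illustration.

```python
def embed_lsb(pixels, message):
    """Hide message bytes in the least significant bit of each pixel value (0-255)."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the lowest bit, then set it to the message bit
    return stego

def extract_lsb(pixels, n_bytes):
    """Read n_bytes back out of the least significant bits."""
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for value in pixels[8 * b : 8 * b + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)

cover = [(37 * i + 11) % 256 for i in range(64)]  # made-up 8x8 grayscale image, flattened
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))
```

Each pixel changes by at most 1 out of 255, which is why the change looks like ordinary noise to the eye.
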
### e.g., Derek Upham's JSteg

### steganalysis
- detecting hidden content
- hidden content is usually visually undetectable

how
- analyse DCT distributions

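One well-known way to analyse a value distribution for hidden content is the pairs-of-values chi-square test. Whether the lecture had this specific test in mind is an assumption. The sketch below works on plain integers rather than real quantised DCT coefficients from a JPEG, and the `cover` data is made up.

```python
import random
from collections import Counter

def pairs_of_values_statistic(values):
    """Chi-square statistic comparing each even value's count with its pair average.

    LSB embedding tends to equalise the counts of 2k and 2k+1, so a LOW
    statistic hints at embedding and a HIGH one at an untouched histogram.
    """
    hist = Counter(values)
    stat = 0.0
    for even in {v - (v % 2) for v in hist}:
        expected = (hist[even] + hist[even + 1]) / 2
        if expected > 0:
            stat += (hist[even] - expected) ** 2 / expected
    return stat

rng = random.Random(0)
cover = [rng.choice([2, 2, 2, 3, 4, 4, 4, 5]) for _ in range(2000)]  # made-up skewed histogram
stego = [(v & ~1) | rng.getrandbits(1) for v in cover]               # random bit in every LSB
cover_stat = pairs_of_values_statistic(cover)
stego_stat = pairs_of_values_statistic(stego)
print(f"cover: {cover_stat:.1f}   stego: {stego_stat:.1f}")
```

The embedded data leaves no visible trace, but it flattens the histogram within each pair of values, and that statistical fingerprint is what the detector looks for.
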
F5 steganographic algorithm
- developed to fool analysis of DCT distributions
- seeded with a key to create a pseudorandom sequence for embedding
- can preserve statistical properties of the DCT distributions

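The key-seeded pseudorandom sequence can be illustrated on its own. This sketch only shows how a shared key fixes the order in which coefficients are visited. For the embedding step it uses plain LSB replacement, which is NOT how F5 embeds, and Python's `random` module is not cryptographically secure. The function names and sample data are hypothetical.

```python
import random

def embedding_order(n_coefficients, key):
    """Pseudorandom visiting order of coefficient indices, derived from a shared key."""
    order = list(range(n_coefficients))
    random.Random(key).shuffle(order)  # same key -> same permutation
    return order

def embed(coefficients, bits, key):
    """Write bits into the LSBs of the coefficients, visited in key order (not real F5)."""
    stego = list(coefficients)
    for idx, bit in zip(embedding_order(len(stego), key), bits):
        stego[idx] = (stego[idx] & ~1) | bit
    return stego

def extract(coefficients, n_bits, key):
    """Read n_bits back by visiting the coefficients in the same key order."""
    order = embedding_order(len(coefficients), key)
    return [coefficients[idx] & 1 for idx in order[:n_bits]]

coefficients = list(range(10, 42))          # made-up stand-in for DCT coefficients
message_bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(coefficients, message_bits, "shared secret")
print(extract(stego, len(message_bits), "shared secret"))
```

Scattering the message across the whole image, instead of filling the first coefficients in order, spreads the changes evenly and makes them harder to localise without the key.
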
can use ML to find images with hidden content
- then hackers will try to fool this
- some will always get through

# bigger issues
- deepfakes used to shape the political views of the day
- pixel replacement with segmentation and inpainting

## is ML good or bad
- being used everywhere

- should we care

- data and modelling cannot always be 100% perfect
- e.g., killer drones

- privacy concerns
- linked data
- pipelines - information seepage

NZ Integrated Data Infrastructure

ethics
- what considerations need to be made
- ML being used to automate decision making
- ML sentencing of criminals

theft
- theft of data
- data is more valuable
- transfer learning

- reverse engineering an ML model

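Reverse engineering a model can be illustrated on the simplest possible case. The sketch below assumes a hypothetical black-box `secret_model` that happens to be linear and can be queried freely. Real models are rarely this simple, so this only shows that query access alone can leak parameters.

```python
def make_secret_model():
    """Hypothetical black box: callers can query it but cannot read its weights."""
    weights, bias = [0.5, -2.0, 3.0], 1.0
    def predict(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return predict

def extract_linear_model(predict, n_features):
    """Recover the bias and weights of a linear model using n_features + 1 queries."""
    bias = predict([0.0] * n_features)   # all-zero input isolates the bias
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0                    # unit vector isolates weight i
        weights.append(predict(unit) - bias)
    return weights, bias

secret_model = make_secret_model()
stolen_weights, stolen_bias = extract_linear_model(secret_model, 3)
print(stolen_weights, stolen_bias)
```

Four queries are enough here because the model has four parameters. The attacker ends up with a working copy without ever seeing the training data.
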
# where to from here
- good and bad are human constructs
- how will laws work
- can we use ML to make laws
- Do we need to stop it?