quartz/content/notes/18-ML-in-IA-2.md
2022-11-14 10:03:32 +13:00


title: 18-ML-in-IA-2
aliases:
tags: lecture, comp210
sr-due: 2023-01-21
sr-interval: 68
sr-ease: 270

nefarious uses of ML

password guessing

  • normally based on heuristics designed by humans
    • biased: may not match the true distribution of passwords
  • leaked data can be used to "learn" what to guess
  • gain insight into what users use as passwords

alternative: PassGAN

  • learn the statistical distribution of passwords from leaked data, then use it to generate guesses
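PassGAN itself trains a generative adversarial network on leaked passwords. The sketch below uses a much simpler stand-in for the same idea: learn the distribution, then sample guesses from it. It is a character-level Markov chain in Python. The function names and any training list are hypothetical, and a real guessing tool would draw and rank far more samples.

```python
import random
from collections import Counter, defaultdict

# Multi-character tokens, so they cannot collide with single password characters.
START, END = "<s>", "</s>"


def train(passwords: list[str]) -> defaultdict:
    """Count character-to-character transitions, including password start and end."""
    model: defaultdict = defaultdict(Counter)
    for pw in passwords:
        tokens = [START, *pw, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model


def sample_guess(model: defaultdict, rng: random.Random, max_len: int = 16) -> str:
    """Walk the chain from START, picking each next symbol in proportion to its count."""
    out: list[str] = []
    state = START
    while len(out) < max_len:
        successors = model[state]
        symbols = list(successors)
        ch = rng.choices(symbols, weights=[successors[s] for s in symbols])[0]
        if ch == END:
            break
        out.append(ch)
        state = ch
    return "".join(out)


def guesses(model: defaultdict, n: int, seed: int = 0) -> list[str]:
    """Draw n samples and keep the distinct ones in first-seen order."""
    rng = random.Random(seed)
    seen: dict[str, None] = {}
    for _ in range(n):
        seen.setdefault(sample_guess(model, rng), None)
    return list(seen)
```

For example, `guesses(train(["password", "password1", "123456"]), 20)` returns distinct guesses built only from character transitions seen in that list. Trained on a large leak, common patterns would tend to dominate the samples.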

(figures: PassGAN; PassGAN sample data)

  • can generate passwords that are likely to be used

  • also based on previous passwords

  • passwords can be guessed in fewer attempts

    • need to update our rules, e.g., how many failed guesses should mark a login attempt as suspicious
  • new password generator, which also provides a real-world indicator of password strength

  • faster password guessing

    • hackers will get in faster
    • need to be a step ahead of this
  • insight into strong but unused passwords

  • generated passwords get closer and closer to those typically used
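This is a minimal sliding-window monitor that shows what a "how many guesses is suspicious" rule could look like: it flags a source once it exceeds a set number of failed logins within a time window. The class name, keying by source (e.g., an IP address) and the threshold values are assumptions for illustration. If learned guessers succeed in fewer attempts, thresholds like these would need to be tighter.

```python
from collections import defaultdict, deque


class FailedGuessMonitor:
    """Hypothetical monitor: flags a source after too many failed logins in a sliding window."""

    def __init__(self, max_failures: int, window_s: float) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        # source -> timestamps (seconds) of its recent failed logins, oldest first
        self._events = defaultdict(deque)

    def record_failure(self, source: str, t: float) -> bool:
        """Record a failed login at time t; return True if the source is now suspicious."""
        events = self._events[source]
        events.append(t)
        # Drop failures that have fallen out of the window.
        while events and events[0] <= t - self.window_s:
            events.popleft()
        return len(events) > self.max_failures
```

With `FailedGuessMonitor(max_failures=3, window_s=60)`, the fourth failure from one source within a minute is flagged. Other sources are tracked independently.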

password "guessing"

  • guessing gets faster as machines get faster (Moore's law)

  • machine learning reduces the number of trials further by learning the distribution of passwords

  • useful for us

    • even if we didn't do this research, the hackers would
    • use PassGAN to detect guesses that may have come from PassGAN
    • can analyse the source of guesses for suspicious patterns, e.g., IP address, location
    • can analyse data from antivirus programs
  • useful for hackers

    • hackers can use it to defeat our strategies
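A rough worked example shows why both faster hardware and fewer trials matter. It assumes a brute-force attack on 8-character lowercase passwords and a purely hypothetical rate of $r = 10^{10}$ guesses per second:

$$
T_{\text{worst}} = \frac{|\mathcal{K}|}{r} = \frac{26^{8}}{10^{10}\ \text{guesses/s}} = \frac{208\,827\,064\,576}{10^{10}}\ \text{s} \approx 20.9\ \text{s}
$$

Doubling $r$ (faster hardware) halves $T$. A learned model that places the target password within its first $N$ guesses replaces $|\mathcal{K}|$ with $N$, so $T = N / r$.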

steganography

  • hiding secret messages in a medium that is not meant to be secret (e.g., image, audio, video)
  • used to hide content and reduce suspicion e.g., in forensic investigation
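  • the hidden message is usually encrypted, but the hiding itself is not cryptography: it conceals that a message exists rather than making it unreadable
  • goal is to deceive

  • e.g., embed the message as noise in images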

signal-to-noise

  • most signals contain noise, e.g., static
  • noise is carried in the least significant bits of each value
  • data hidden in the least significant bits of an image is visually perceived as noise

e.g., Derek Upham's JSteg (hides data in the least significant bits of JPEG DCT coefficients)
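The least-significant-bit idea can be sketched on a flat list of 8-bit grayscale pixel values. This is plain spatial-domain LSB replacement, written for illustration only. JSteg applies the same bit replacement to quantized JPEG DCT coefficients instead, and the function names here are hypothetical.

```python
def to_bits(message: bytes) -> list[int]:
    """Message bytes as a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]


def embed_lsb(pixels: list[int], message: bytes) -> list[int]:
    """Write one message bit into the LSB of each pixel, starting at the first pixel."""
    bits = to_bits(message)
    if len(bits) > len(pixels):
        raise ValueError("cover too small for message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(pixels: list[int], n_bytes: int) -> bytes:
    """Reassemble n_bytes from the LSBs of the first 8 * n_bytes pixels."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for pixel in pixels[b * 8:(b + 1) * 8]:
            value = (value << 1) | (pixel & 1)
        out.append(value)
    return bytes(out)
```

Each pixel changes by at most 1, which is why the hidden data is perceived as noise.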


steganalysis

  • detecting hidden content
  • hidden content is usually visually undetectable

how

  • analyse the distributions (histograms) of DCT coefficients
  • (figure: example DCT distribution)
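One classic way to analyse a DCT histogram for LSB replacement is the pairs-of-values chi-square test. Overwriting LSBs with message bits pushes the counts of each value pair (2k, 2k+1) towards equality, which natural coefficient histograms do not show. This sketch makes three simplifications. It only computes the statistic, with no p-value. It uses a synthetic Laplacian-shaped histogram as a stand-in for real DCT coefficients. It embeds into every coefficient, which simplifies how JSteg selects coefficients. Function names are hypothetical.

```python
import random
from collections import Counter


def pov_chi_square(coeffs: list[int]) -> float:
    """Pairs-of-values statistic: compares count(2k) with the mean of count(2k) and count(2k+1)."""
    pairs: dict[int, list[int]] = {}
    for value, count in Counter(coeffs).items():
        # value >> 1 groups 2k with 2k+1 (also for negatives, e.g. -4 and -3); value & 1 picks the slot.
        pairs.setdefault(value >> 1, [0, 0])[value & 1] += count
    stat = 0.0
    for even, odd in pairs.values():
        expected = (even + odd) / 2
        stat += (even - expected) ** 2 / expected
    return stat


def laplacian_like_cover() -> list[int]:
    """Synthetic stand-in for DCT coefficients: counts halve as |value| grows."""
    return [v for v in range(-8, 9) for _ in range(int(1000 * 0.5 ** abs(v)))]


def lsb_replace_all(coeffs: list[int], rng: random.Random) -> list[int]:
    """Overwrite every coefficient's LSB with a random message bit."""
    return [(c & ~1) | rng.getrandbits(1) for c in coeffs]
```

On this data the statistic for `lsb_replace_all(cover, rng)` should come out far lower than for the cover. That drop is the signal a steganalyst looks for.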

F5 steganographic algorithm

  • developed to fool analysis of DCT distributions
  • seeded with a key to create a pseudorandom sequence of embedding positions
  • can preserve the statistical properties of DCT distributions
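This is a heavily simplified, F5-inspired sketch of two of the ideas above. First, a key-seeded pseudorandom order of coefficients. Second, embedding by decrementing a coefficient's magnitude rather than overwriting its LSB, which avoids the pair-equalising effect the chi-square test detects. Real F5 also uses matrix encoding, skips DC coefficients and has its own parity convention for negative values. Here a coefficient's bit is simply |c| mod 2, and Python's `random` stands in for a cryptographic PRNG. All names are hypothetical.

```python
import random


def key_order(n: int, key: int) -> list[int]:
    """Pseudorandom visiting order of coefficient positions, reproducible from the key."""
    order = list(range(n))
    random.Random(key).shuffle(order)  # not cryptographically secure; illustration only
    return order


def embed(coeffs: list[int], bits: list[int], key: int) -> list[int]:
    """Embed bits as |c| mod 2 of nonzero coefficients, fixing mismatches by shrinking |c| by 1."""
    stego = list(coeffs)
    i = 0  # index of the next message bit
    for pos in key_order(len(stego), key):
        if i == len(bits):
            break
        c = stego[pos]
        if c == 0:
            continue  # zero coefficients carry no data
        if abs(c) % 2 != bits[i]:
            c = c - 1 if c > 0 else c + 1  # move towards zero instead of flipping the LSB
            stego[pos] = c
            if c == 0:
                continue  # shrinkage: the bit was lost, so retry it on the next coefficient
        i += 1
    if i < len(bits):
        raise ValueError("cover has too few usable coefficients")
    return stego


def extract(coeffs: list[int], n_bits: int, key: int) -> list[int]:
    """Read |c| mod 2 from nonzero coefficients in the same key-derived order."""
    bits: list[int] = []
    for pos in key_order(len(coeffs), key):
        if len(bits) == n_bits:
            break
        if coeffs[pos] != 0:
            bits.append(abs(coeffs[pos]) % 2)
    return bits
```

Because the decoder also skips zeros, a coefficient that shrinks to zero is ignored on extraction. This is why the lost bit is re-embedded at the next position.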

ML can be used to detect images with hidden content

  • then hackers will try to fool the detector
  • some will always get through

bigger issues

  • deepfakes used to shape political views of the day
  • pixel replacement with segmentation and inpainting
    • examples
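Deepfake tools typically fill a segmented region using learned generative models. This sketch is a much simpler, classical stand-in that shows what "inpainting" means: it fills masked pixels by repeatedly averaging their neighbours (diffusion-style inpainting with Jacobi iterations). The function name and the iteration count are hypothetical choices.

```python
def inpaint(image: list[list[float]], mask: list[list[bool]], iterations: int = 200) -> list[list[float]]:
    """Fill pixels where mask is True by repeatedly averaging their 4-neighbours (Jacobi iteration)."""
    h, w = len(image), len(image[0])
    out = [[0.0 if mask[r][c] else float(image[r][c]) for c in range(w)] for r in range(h)]
    for _ in range(iterations):
        prev = [row[:] for row in out]
        for r in range(h):
            for c in range(w):
                if not mask[r][c]:
                    continue  # known pixels are kept as-is
                neighbours = [
                    prev[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < h and 0 <= cc < w
                ]
                out[r][c] = sum(neighbours) / len(neighbours)
    return out
```

Masked pixels start at 0 and converge to a smooth fill determined by the surrounding known pixels. Learned inpainting instead synthesises plausible texture and content.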

is ML good or bad?

  • being used everywhere

  • should we care?

  • data and models are not always 100% accurate

    • e.g., killer drones
  • privacy concerns

  • linked data

  • pipelines: information seepage
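This small example illustrates the linked-data risk. Joining a "de-identified" dataset to a public one on shared quasi-identifiers (here postcode, birth date and sex) can re-identify people. All records and field names below are made up, and the join logic is a minimal sketch.

```python
# Made-up records for illustration only.
deidentified_health = [
    {"postcode": "9016", "birth": "1985-03-02", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "9010", "birth": "1990-11-20", "sex": "M", "diagnosis": "diabetes"},
]
public_roll = [
    {"name": "A. Example", "postcode": "9016", "birth": "1985-03-02", "sex": "F"},
    {"name": "B. Example", "postcode": "9011", "birth": "1990-11-20", "sex": "M"},
]
QUASI_IDENTIFIERS = ("postcode", "birth", "sex")


def link(private: list[dict], public: list[dict], keys: tuple[str, ...] = QUASI_IDENTIFIERS) -> list[dict]:
    """Join on quasi-identifiers; a unique match re-identifies the private record."""
    index: dict[tuple, list[dict]] = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in private:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append({**candidates[0], **row})
    return matches
```

Here only the first health record matches a single public entry, so its diagnosis becomes attached to a name.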

NZ Integrated Data Infrastructure (IDI)

ethics

  • what considerations need to be made
  • ML being used to automate decision making
  • ML in the sentencing of criminals

theft

  • theft of data

  • data is increasingly valuable

  • transfer learning

  • reverse engineering an ML model (model extraction attack)
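In the simplest case, a model extraction attack can be sketched as equation solving. If a black-box linear model returns its raw score f(x) = w·x + b, one query at the zero vector gives b, and one query per unit vector gives each weight. This assumes the API exposes real-valued outputs; models that only return class labels need many more queries. The names are hypothetical.

```python
from typing import Callable


def make_black_box(weights: list[float], bias: float) -> Callable[[list[float]], float]:
    """Stand-in for a deployed model: callers only see the returned score."""
    def predict(x: list[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return predict


def extract_linear(query: Callable[[list[float]], float], n_features: int) -> tuple[list[float], float]:
    """Recover w and b with n_features + 1 queries: f(0) = b and f(e_i) = w_i + b."""
    bias = query([0.0] * n_features)
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0
        weights.append(query(unit) - bias)
    return weights, bias
```

The stolen copy then answers queries like the original without further access, which is what makes a model itself a target for theft.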

where to from here

  • good and bad are human constructs
  • how will laws work?
  • can we use ML to make laws?
  • do we need to stop it?