Adversarial Machine Learning – CF060
This week on Cyber Frontiers Christian and Jim dive into the 'Jedi mind tricks' of AI, better known as adversarial machine learning. We tell stories of fooling the common systems humans have come to rely on today, from maps to self-driving cars to airport scanners and identification systems. We discuss the growing surface area for cyber-physical attacks, and the lack of general-purpose solutions and proofs needed to tackle securing machine learning algorithms. We tie together IoT, AI, and ML technologies on the cyber frontier with a new security perspective not analyzed before on the show. Grab your speakers and hone your Jedi-like concentration skills; this is a show you won't want to miss!
Cyber Frontiers is all about Exploring Cyber security, Big Data, and the Technologies Shaping the Future! Christian Johnson will bring fresh and relevant topics to the show based on the current work he does.
Support the Average Guy: https://www.patreon.com/theaverageguy
WANT TO SUBSCRIBE? We now have Video Large / Small and Video iTunes options at http://theAverageGuy.tv/subscribe
You can contact us via email at jim@theaverageguy.tv
Full show notes and video at http://theAverageGuy.tv/cf060
Podcast, Cyber Frontiers,
An Artist Used 99 Phones to Fake a Google Maps Traffic Jam
https://www.wired.com/story/99-phones-fake-google-maps-traffic-jam/
https://www.youtube.com/watch?v=k5eL_al_m7Q&feature=emb_title
Tainted Data Can Teach Algorithms the Wrong Lessons
https://www.wired.com/story/tainted-data-teach-algorithms-wrong-lessons/
“A recent survey of executives by Accenture found that 75 percent believe their business would be threatened within five years if they don’t deploy AI. Amid this urgency, security is rarely a consideration.”
People Keep Coming Up With Ways to Fool Tesla’s Autopilot
https://www.thedrive.com/sheetmetal/18168/people-keep-coming-up-with-ways-to-fool-teslas-autopilot
- The orange sensor hack
- The DUI driving defense.
Experimental Security Research of Tesla Autopilot
https://keenlab.tencent.com/en/whitepapers/Experimental_Security_Research_of_Tesla_Autopilot.pdf
Researchers use $5 speaker to hack IoT devices, smartphones, automobiles
“If you look through the lens of computer science, you won’t see this security problem,” Fu said in the release. “If you look through the lens of materials science, you won’t see this security problem. Only when looking through both lenses at the same time can one see these vulnerabilities.”
To cripple AI, hackers are turning data against itself
https://www.wired.co.uk/article/artificial-intelligence-hacking-machine-learning-adversarial
Researchers Fooled a Google AI Into Thinking a Rifle Was a Helicopter
https://www.wired.com/story/researcher-fooled-a-google-ai-into-thinking-a-rifle-was-a-helicopter/
- Synthetic imagery generation. Independent optimization of color values. Indirect encoding with more regular patterns.
Academic Papers:
Jim Collison [0:00]
This is the Average Guy Network and you have found Cyber Frontiers show number 60 recorded on February 11, 2020.
Jim Collison [0:19]
Here on Cyber Frontiers we explore cybersecurity, big data, and the technologies that are shaping the future. If you have questions, comments or contributions, you can always send us an email. Send that to me, jim@theaverageguy.tv. You can contact Christian, he's really the brainchild behind this, at christian@theaverageguy.tv. Find me on Twitter at @jcollison, and Christian is at board whisper. Of course, theaverageguy.tv is powered by Maple Grove Partners. Get secure, reliable, high-speed hosting from people that you know and you trust; plans start at as little as $10 a month for all kinds of great stuff, maplegrovepartners.com. I think Wieger took you up on your email offer too, I think it was $5 for email hosting, was that it, maybe? Oh, yeah. Yeah. And he loves it
Christian Johnson [1:04]
We'll price match any other email plan if you're bringing something new over, so he's, he's digging it as well.
Jim Collison [1:12]
Christian we’ve titled The show adversary machine learning I don’t think I’ve ever had a show title actually ready to go at the very beginning of a show.
Christian Johnson [1:21]
Amazing right?
Christian Johnson [1:22]
No, it just took some planning.
Jim Collison [1:25]
And actually these are the best show notes I've ever seen. So make sure you head out to theaverageguy.tv/cf060, I think is what will get you there, and you'll want to see the show notes as well. What do you mean by adversarial machine learning?
Christian Johnson [1:41]
Yeah, so I kind of wanted to do some storytelling tonight, but really, adversarial machine learning, I'm increasingly convinced, is the largest undefined area of the cybersecurity industry. So when we say that cybersecurity at large has hugely gained visibility in the C-suite, etc., we're talking about your bread-and-butter issues, what people understand as common enterprise-grade cybersecurity today, or protecting consumers with consumer-grade security. Adversarial learning is really what I'd call on the frontiers of cyber, period, because it covers an area where we are rushing into the future of the technology and security is not the primary consideration. And when I think about the technologies we now find commonplace today that we take for granted, we did the same thing 20 years ago: we rushed into those technologies, and then we had to bake in layers of security after the fact, when the reality was that the protocol or the service or the technical standard was never designed for security. It just wasn't baked in from the get-go. And so we've talked on the show ad nauseam about where we are with data science, where we are with ML, where we are with AI, and we've talked about the capabilities, and we continue to run forward, you know, 200 miles in a given direction. And I saw an article this week that just really made me think this is something we definitely need to talk about on Cyber Frontiers. Without a doubt, it's an area that is going to be defined after the fact, because we have committed ourselves so greatly to the technology without committing ourselves to securing that technology. So the article came out February 3, 2020, and it was entitled 'An Artist Used 99 Phones to Fake a Google Maps Traffic Jam.' And I just thought that was the most fascinating thing I'd heard in a while. Not because it required some amazing genius in a science lab to do it, but because I just thought, what a cool and weird and unusual little experiment that the average guy could go and do, and have this type of broad-scale impact. I mean, this article got coverage everywhere in tech blogs and articles, and I thought, it's kind of interesting to me that that many people care about this. And what the guy basically did was he got 99 cell phones, he rented them or bought them, whatever he did. They each had unique SIM cards, unique IDs, unique operating systems, etc. He loaded up Google Maps onto all of them. He placed all 99 phones in a little red wagon, like you would imagine pulling a toddler down the street with, and he starts walking around the block of the Google Berlin campus in Germany. And what you start to see in the YouTube video he's chronicled is that as he's walking down that street, Google is showing congestion. You know how everyone turns on their Google Maps when you live in a big area like here in DC? Man, if you don't have Google Maps or Waze on and you're a rush-hour commuter, what are you doing? So he's got all these map apps open and he's walking around the city, and, like you do when you're driving in the car looking at Google Maps, you're looking to avoid those double-bar red lines when you're cruising down the street.
And if the Google algorithm is smart enough to realize there's as much distress as the traffic shows, chances are it's going to automatically recommend, hey, I'm going to reroute you because you're
Christian Johnson [5:33]
you’re you’re about to run up against the double red line. And so here, this guy has basically recreated a DC like traffic jam when there’s not a single car on the road. And so he’s walking around with his little, you know, toddler red tote in the back, and he’s like circling around the Google campus to see if anyone at Google notices the strange phenomenon of this giant traffic jam reported. Google Maps. And you know, humorously, he talks about how roads were virtually empty. cars were getting redirected around his traffic jam and there wasn’t a single car on the road while he was walking by. And so I just thought to myself, this is such a, like, I love painting complex problems that we’re going to face with dirt, stupid, simple stuff like this. And this is brilliant, right? Like anyone could go and set up these devices and do this. I think it takes a certain level of creativity to do it, and a certain amount of resources and fund money. I think this guy had all of that going for him in his favor. And this just made my week in terms of an article. But it brought up kind of this bigger theme of our consumers get alone the enterprise really prepared to handle this world where we are becoming increasingly reliant on artificial intelligence to power our daily lives. You know, whether it’s how we’re getting to and from work, whether it’s what types of financial transactions where, you know, day trading, whether it’s our, our cars or self driving cars getting us from point A to B, we are just becoming increasingly reliant on these, you know, algorithms that operate things that we think we’ve automated ourselves out of a job, right? The dream of any software engineer is to be able to automate yourself out of the job because it means you’ve reached some kind of, you know, core code complexity where either your services a value greater than you maintaining it, you know, or you’ve, you’ve cracked the holy grail of artificial intelligence where it can become self writing and self describing. So, I just I wanted to open it up because once I peel this onion a little bit deeper, to talk about the kind of relationship to our average daily lives. There is just example after example of this kind of thing going on. And it’s evident to me that There is very little in the way of a security model. I mean, it’s virtually untested, you will find so many academic papers that talk about the issue as it stands, which it’s been getting increasingly published about in the last couple of years, you won’t find a single paper that proposes much in the way of a solution to how do you stop basically, what I call adversarial machine learning. It’s what academic calls it. And it’s really kind of this fundamental notion that with very small, well, it’s two things. It’s two things I want to be clear about that number one, it’s this use case where I just talked about where you introduce sensors into an environment, or you mutate or alter sensors into an environment, such that you’ve taken control of an algorithm to do something that wasn’t designed for its intended purpose. Or you aren’t my With the sensors or the capability of analyzing that data, but you’re introducing something into the environment with a slight variance, such that the algorithm is absolutely convinced that that thing with a slight deviation is the same or correct thing. 
And that second one is the much scarier adversarial use case that I want to talk about a little bit more, because to the human eye, for example, it might look completely normal, but to an algorithm it might look completely different. And so we are also getting to this place with what I call synthetic adversarial machine learning, which is when we scope the problem to the visualization aspects of AI. Producing synthetic data examples is a classic way of fooling a deep neural network into thinking it's seeing something that in reality isn't physically in the environment from what a human would assess, but the algorithm assesses it to be there because of some slight variant that was introduced that in reality isn't real. And so these things that we're introducing as topics have a lot of cultural footprint, they're being talked about culturally, but the solution map is totally nonexistent right now.
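To make the 99-phones experiment concrete, here is a minimal sketch, assuming a naive crowdsourced congestion estimator that just averages reported speeds per road segment; the function name, thresholds, and speeds are illustrative assumptions, not Google's actual pipeline.

```python
# Minimal sketch (not Google's real system): a hypothetical congestion
# estimator that averages crowd-reported speeds for one road segment.
from statistics import mean

def congestion_level(reported_speeds_kmh, free_flow_kmh=50.0):
    """Classify a road segment from crowd-sourced speed reports."""
    if not reported_speeds_kmh:
        return "no data"
    ratio = mean(reported_speeds_kmh) / free_flow_kmh
    if ratio > 0.8:
        return "clear"
    if ratio > 0.4:
        return "slow"
    return "jammed"  # the dreaded double-bar red line

# An empty street with a handful of real cars cruising at ~45 km/h:
real_cars = [44.0, 47.5, 46.0]
print(congestion_level(real_cars))                  # -> "clear"

# The same street plus 99 phones in a wagon walking at ~4 km/h:
wagon_phones = [4.0] * 99
print(congestion_level(real_cars + wagon_phones))   # -> "jammed"
```

The point is only that a fleet of slow-moving phones can dominate the average when real traffic is sparse, which is all the wagon needed to do.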
Jim Collison [10:18]
Christian, before we dive into that, let me ask you this question, because I feel like sometimes the machines don't get the benefit of the doubt. A human gets fooled all the time. I mean, something tricks us; that's just magic, right? In some regards, a sleight of hand, we weren't looking, we weren't seeing, and the human gets fooled. And it's the, you know, fool me once, shame on you; fool me twice, shame on me, right? But that first time, the human doesn't get blamed. They're like, oh, it was the trickster, whatever. I don't feel like the machines get the same benefit of the doubt sometimes. In other words, they have to be perfect out of the gate, or people are like, ah, this is never going to work, or, look how bad this is. I'm not sure, as we begin to get into these complex neural nets and some of these other complex algorithms that have to work, are we going to expect that machines get it perfectly every time, especially when machine learning is used against them? You know, where you have one algorithm pitted against another. Maybe in that case, you know, I just don't feel like we have the same tolerance for the machine as we do for the human. But there's a reason for that.
Christian Johnson [11:27]
And think about it: humans don't scale. So we give ourselves the benefit of the doubt that humans are allowed to be faulty, because usually when a human is at fault, their blast radius is limited to their circle. When a machine is at fault, its blast radius is potentially infinite and can span the entire human population. So when you assert, or claim to assert, that a machine can reasonably operate at or better than a human, you should be damn well sure that is a correct assertion, because that's going to impact how you end up with scalable disasters and how quickly you have to respond to them. Yeah. And I think that any litmus test or operational bar for introducing an AI algorithm, a machine learning implementation or otherwise at scale, should be held to the measure of: what would be the worst thing that happened if this thing went sideways? Or what is the testing confidence that we can give, based on running X millions of simulations, that tells us the probability of catastrophic scenario X is 0.09 percent? And if you can't get it down to that, if you're dealing with a 1 to 10 percent error rate on something that has broad-scale impact, and on top of that you have no security algorithm to define a model of defense in that disaster scenario, we're setting ourselves up for undoing a lot of the progress that we've made in the cybersecurity industry, predominantly because people will stop caring as much about going after the soft targets that were there five years ago. Right? When we talk about what the soft targets are today, they've changed quite a bit from what they were five or 10 years ago. And why? To me, it's not that the technology drastically changed in this time frame so much as it is that people's ability to handle incidents, to better defend systems, to better secure and design them from the onset, improved dramatically in that time frame. But the underlying technologies didn't have a fundamental gravity shift. What we're talking about here is quite different. We're saying that the technology is zooming off into a direction of fundamental sea change, especially with respect to how humans will adopt and use the technology, and we're saying that cybersecurity is not running in parallel as aggressively after that new technology domain. So yeah, it's great that everyone has cybersecurity awareness. It's great that they've gone and defended what we talked about today better. But people are just going to stop paying attention to exploiting that domain; they're going to start focusing more and more on high-value-impact things that have this large surface area. And we talk again and again on the show about how surface area is one of the number one measurements by which you should assess whether you are dealing with a large-scale security consideration or a small-scale one, right? Because for your typical average hacker, unless we're talking about something where you're a high-value target, a specific target of interest, and they're going to go after you no matter what, it's like any other crime on the street: it's a crime of opportunity. And so when you ask yourself, as an attacker, what's the low-hanging fruit here, it's going to be fooling that AI algorithm, which for very cheap can return a lot of fun and profit for me as the attacker. And I don't even necessarily need to know much about computers, which is the most bizarre thing to say.
And yet it seems like it could increasingly be the case where I don't have to write code to attack these systems, I don't have to be an internet wizard, I don't have to do any of that. I just have to interact with physical materials in my environment, influence something that is a virtual decision maker, and have an outcome that is tilted in my favor as a result. And that's insane when you think about it. I mean, for the first time we can really say there's active interest in people using physical, what I would call material science, with some very high-level understanding of computer science, without having to, you know, have a university degree in it by any means. And now I can influence a virtual environment or a virtual outcome using my own physical world. I mean, it's insane.
Jim Collison [15:58]
Does that get better over time, get more secure over time? I mean, are we doomed? It seems like we've been saying this for a long time. Are we doomed to repeat that cycle, where the innovation stays just far enough ahead that the cybersecurity side of it can't keep up, and it's always trailing five or 10 years? Is that always going to be reality, you think?
Christian Johnson [16:19]
Yeah, I mean, I think it's an untested battlefield. We talked about how AI itself is improving cybersecurity by having, you know, machine learning based capabilities to detect adversarial threats, and that's, you know, great news. But in this new area, where we're talking about what I call cyber-physical interactions, how the cyber domain interacts with physical entities, we're much less developed and mature. And it seems to me that it's because our technology capabilities are so strong, but our focus is not. Like, everyone feels like they're the business of yesterday if they're not somehow using the buzzword AI in their bottom line, right? Think about that: all C-suites are probably having daily meetings saying, wait a second, guys, where are we bringing in the next level of predictive capability or automation or AI for either customers or the business? And if they don't have a good answer to that question, people say, well, go find an answer to this question. These technologies are already available, people are integrating them, they are becoming very cheap and inexpensive. And so we get to this point where we have to ask ourselves, is it as expedient for them to invest in securing the artificial intelligence right now as it is for them to just invest in the AI? And the answer is, well, of course not, because there's no real strong commonplace solution available today for 'this is how I'm going to address the fundamental issue.' Right? If I talk about how I secure your average home consumer, I can immediately rattle off three very low-cost, easy things: go buy a firewall, go buy yourself some antivirus, go buy some cute little Internet of Things device monitor, and, you know, make sure everything is patched and the firmware is good. And these are cheap things, right? Low cost, I can produce them at mass scale, and the barrier to entry in the education for the average person is very low. None of those things hold true right now in this particular domain.
Jim Collison [18:39]
Christian, what about when we think about, you know, this machine learning has to be taught, and what happens when it gets taught? Like, we think we're teaching it the right things, but we're actually teaching it the wrong things. I think we've seen some pretty interesting examples of that.
Christian Johnson [18:55]
Yeah. So this hits on the exact definition of what it means to be adversarial in machine learning. And so the intention is, you know, with a lot of these models, let's just scope it to the world of neural networks, right, where we've tried to make deep neural networks that are self-learning, that learn from past examples and past mistakes. They have their own method of error correction, of what the algorithm predicts is error correction, or optimizing to an ideal case, based off what it's being trained to detect or understand, and its sample literature. And over time, the sample literature grows and grows and grows. But remember, at the end of the day, those things come down to math. And so this is like a forest-for-the-trees problem, where if an algorithm over time starts to cluster these types of trees to all be this, you know, species of tree, then to an algorithm, if I make a slight mutation on the color of each of those leaves, the algorithm's probably going to say that's still, you know, deciduous tree X. But a human's going to say, hmm, I've never seen a tree with blue leaves in my life before; that can't be any species I know of. But from the algorithm's perspective, hey, my color pixel values are, you know, statistically 99.9 percent similar, or the gradients are similar, or the contrast is similar, or the brightness is similar, the size and shape of the tree is similar, the height is similar, the roots are similar, the type of soil it's growing in is similar. And so all these values start to get added up over time, where, let's say an algorithm now takes into consideration over 100 different features that it measures, and, you know, maybe those distill down into 20 kind of special-focus features that the algorithm decides on: hey, yes, I can measure these 100 things, but really it's only these 20 things that matter, right? And I as an adversary pick up on one of those features and tweak it ever so slightly. A human's not going to care or notice, but an algorithm is going to completely change its meaning. And so a classic example of this that you can find in the show notes is, you know, researchers that fooled a Google AI into thinking a rifle was a helicopter. And as a human, you're immediately baffled by that statement, right? No human that you know of is going to look at
Christian Johnson [21:36]
a rifle and say that's a helicopter, right? But then we start to unravel the adversarial example, right? So what's really happening? Well, with all these image-based algorithms, I have an image that I present to it, okay? And you know how image error correction works, right? You've all maybe at one point or another lost or found some photo that you wanted restored to its former glory or whatever. And what's one of the simplest algorithms you can think of for image error correction? Well, if you want to increase the quality of the picture when you don't have a higher resolution available, or you want to correct regions that are, quote unquote, statistically incorrect, one of the simplest algorithms you could apply is to look at nearest pixel values. So if it's gray here, chances are it would be gray one pixel to the right, one pixel above, one pixel below. If I take a picture of, I don't know, a dog running across a field, and I'm missing 40 percent of that image, and I have the dog but I don't have the rest of the image, well, I can probably fill in pretty quickly what the rest of the grass looks like in that image, and what the rest of the sky looks like. So even with, you know, 40 percent of the data missing, I could apply this very simple algorithm to restore that picture. Now, that's a trivial use case, but let's talk about why that matters to adversarial machine learning. Right? So let's say I have a picture of two people skiing. And this is one of the, you know, examples that's covered in this article from Wired: two people standing on a mountain slope, they have their skis on, they have their hats on, you can imagine a mountain range in the back. It's a nice snowy day, there's a good coat of snow on the ground. And I just ever so slowly introduce the picture of a border collie, right? One percent translucence, two percent translucence, three percent translucence. And all of a sudden the AI algorithm starts seeing features in the data you're presenting that say, oh, this is a picture of a border collie that we have here. But to a human eye, five percent translucence underlaid over a picture of two people standing on top of a snowy mountain, you're not going to notice it; you'd be lucky if at maybe around 30 to 40 percent of that translucence value you would actually see some hint of the border collie behind the image. And yet, all of a sudden, the classification algorithm will go from two people chilling out on a mountaintop to border collie. That's a little bit less of a trivial example, but it's the same concept. How do I fool a classification algorithm into thinking a gun is a helicopter? I just tweak those features ever so slightly, or I tweak that sensor ever so slightly, to record values that aren't really there in the environment, or to make slight mutations that a human trying to validate it would not be able to see or correct.
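As a rough illustration of the translucence example, here is a minimal sketch in Python, assuming NumPy arrays standing in for the two photos and a placeholder classifier; the 5 percent blend ratio mirrors the anecdote, but nothing here reproduces the specific setup from the Wired article.

```python
# Minimal sketch of the low-translucence overlay idea: blend a "border collie"
# image into a "skiers" photo at a small alpha and feed it to some classifier.
# The classifier is a stand-in; the point is only that the blended input
# differs from the original by a few percent per pixel.
import numpy as np

def blend(base_img, overlay_img, alpha=0.05):
    """Return the base image with the overlay mixed in at the given translucence."""
    blended = (1.0 - alpha) * base_img + alpha * overlay_img
    return np.clip(blended, 0.0, 1.0)

# Placeholder images (float arrays in [0, 1]); in practice these would be
# the skiers photo and the border collie photo loaded from disk.
skiers = np.random.rand(224, 224, 3)
border_collie = np.random.rand(224, 224, 3)

adversarial_input = blend(skiers, border_collie, alpha=0.05)

# A human sees essentially the skiers photo: the per-pixel change is tiny.
print(np.abs(adversarial_input - skiers).max())

# ...but a classifier sees a shifted point in feature space:
# label = some_pretrained_model.predict(adversarial_input)  # hypothetical call
```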
Jim Collison [25:04]
Do we get to the point where we need to... you know, sometimes the perception is too sharp, right? Right in this example, where you're saying, hey, look, it's five percent, a human would not make that determination, shouldn't we, we being the algorithm, not make it either, right? I mean, it's got to have a little bit of tolerance. Can that get smarter? I mean, will it, the more examples we have of that, the more mistakes we make? It's almost like this machine learning is our children, really young children right now, who don't really understand. You know, I think about my granddaughter and some of the things she says, and we think it's cute; it's wrong, but it's cute right now, and surely she'll get better because we say, oh, you don't say it that way, you say it this way. Much like a child who is given misinformation from their parents, taught one way versus another, maybe taught to hate, begins to take on some hate tendencies, right? So as we approach this, yes, there are the vulnerabilities, but do we also need to set some expectations on the learning cycles, so that these bots that get racist... like, we just turn them on and say, well, let's see what happens. Do we need to do a little more training or take a little more time?
Christian Johnson [26:35]
Well, so it’s interesting. So the thing that
Christian Johnson [26:39]
I think makes this such an unquantifiable problem space yet is, how do you train an algorithm for all types of pollution? Right? So right now what we're really good at doing in AI is training to a set of use cases, right? What we're suggesting in order to defend that algorithm is one of two things. Either the features that the algorithm picks up on cannot be tampered with, and then you have to assert, well, what does it mean to tamper with this use case? Are we talking about tampering with sensors? Are we talking about tampering with, like, serialized data going into the algorithm? What is it that's going to tamper with that feature's natural evolution and learning what it's supposed to be learning? And then what we're really talking about is, there's a huge difference between tampering with an algorithm at the point where you're training it, right, like offline training, where you get it kind of caught up and then you set it out into the free world and it's learning from there, versus tampering with it dynamically after it's already deployed in some kind of production-like environment. The second question we have to ask is: either we can solve the problem by asserting with confidence that the features the algorithm cares about can't be tampered with, which seems unlikely, or we have to assert that we have come up with some type of algorithm that can handle all inverse cases from the use cases the algorithm is trained to detect. So if I'm trained to detect, I don't know, we'll take the previous example,
Christian Johnson [28:40]
a rifle going through a security belt, right? You want to, you know, scan people's bags and have AI maybe replace some of the security functions that a TSA inspector would do, for example. What will you show that algorithm to show it all other things in the universe that are not the thing it's looking for? And that is the type of proof that you would have to go figure out if you couldn't solve the first proof, which is a positive-based proof: prove to me that the features this algorithm relies on for positive identification cannot be altered in its environment, like the delta would not be significant enough to change a training outcome. Or prove to me that you can show all the potential things that would otherwise throw this thing off and add them to, like, a negative data set, where the algorithm is going to know that that's a decoy, right? So if I were not to use the words adversarial machine learning, there's a term that was invented, like, 40 years ago, 30 or 40 years ago, Star Wars fans are going to, you know, get me on my number here, but the term Jedi mind trick and adversarial machine learning? Absolutely, it is the same idea, right? Where it's like a slight alteration to your environment, and suddenly you're doing something completely different; in the Star Wars universe they're slightly altering the Force and your brain cells aren't working the way you thought they were. But that's literally what we're talking about here in a machine learning sense, right? We're slightly altering the brain cells, or the neural nets, of these deep algorithms that are constantly reading in new data and making slight adjustments, and we're arriving at wildly different outcomes. So, totally synonymous.
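The training-time tampering Christian describes (the 'tainted data' case) can be sketched with a toy label-flipping experiment; the dataset, model, and 40 percent poisoning rate below are illustrative assumptions using scikit-learn, not a reconstruction of any study discussed here.

```python
# Minimal sketch of training-time tampering ("tainted data"): poison the
# training set by flipping a chunk of one class's labels, then compare a
# simple classifier trained on clean vs. tainted labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Targeted poisoning: relabel 40% of the positive-class training examples
# as negative, nudging the model toward calling real positives negative.
rng = np.random.default_rng(0)
positives = np.where(y_train == 1)[0]
flipped = rng.choice(positives, size=int(0.4 * len(positives)), replace=False)
y_tainted = y_train.copy()
y_tainted[flipped] = 0

tainted_model = LogisticRegression(max_iter=1000).fit(X_train, y_tainted)

print("clean accuracy:  ", clean_model.score(X_test, y_test))
print("tainted accuracy:", tainted_model.score(X_test, y_test))
```

The tainted model learns a skewed decision boundary from data that looks perfectly ordinary to anyone who only glances at it, which is the core of the offline-training tampering case.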
Jim Collison [30:35]
One of the things I see I’m going to go off script a little bit on this one or off our notes a little bit. One of the things I see in here is we often put these algorithms in or these learning opportunities and isolation and one of the things humans do really well as we learn and we learn in packs like we learn from each other, we learn as you and I are learning you’re learning something and I may see what you learn and then learn from that. In other words, to get the validation, right, and I have lots of, I have lots of things around me for me to bump into, literally and figuratively, that keeps me learning, at least from a norms perspective. And it keeps me from going crazy, like from going from learning something wildly off the low, no, now I should run out into the street, right? You know, we have people around us to say, you know, when you’re young stop, right to do that. We also have a thing called pain, which machines don’t have where they don’t like were reinforced bad behavior is reinforced the or the incorrect behavior is reinforced, whether it’s physical pain or social pain or shame, or correction that comes in the form of something that really makes us change. Where I kind of wonder this is the weakness of an algorithm is it doesn’t experience any kind of pain, it doesn’t have a reason not to go down that behavior from it from incorrect training, you know, type thing hand on the stove, even though I know it, I may try it the first time. Second time somebody tells me to do it, I’m probably not gonna, because I have that lesson learned. So I don’t know, it seems like we were training without any at times in isolation. We’re training without social pressure, so to speak to stay inside the guardrails into or in three. We don’t we don’t have any kind of feedback mechanism for the machine to remember like, Oh, yeah, that was a bad. It was a bad decision. Right? That, am I that are often thinking.
Christian Johnson [32:31]
I mean, that’s so part of the whole intention of deep neural network training is that there is that feedback mechanism that if it gets something wrong, or in a semi supervised approach, right, it gets something wrong and you say, hey, you were wrong. it corrects and adjust for that, right? It just comes down to
Jim Collison [32:51]
at what it’s isolated, though, a lot of times. In other words, we have one researcher who’s doing that, who’s working with it or whatever, where it’s not getting a deep It’s not getting a D, the crowd, so to speak. I mean, if you ever want to figure out how right or wrong you are just puts it right on social media like in Yeah. That’s a really bad example because I don’t think you want to learn based on that. But you know, sometimes I feel like in this area, we’ve got a handful of researchers training it where they’re not thinking through all the real world’s solutions or situations.
Christian Johnson [33:22]
Yeah, and I think that’s fair, right. There are plenty of people who do the kind of one off rodeo machine learning example, where is that? But definitely some of the deeper AI examples. overcome that right. Yeah, not semi supervised. It’s fully unsupervised learning approaches. And the question is when you get to a fully unsupervised approach, can you truly pass a blush test that says, I might error correction rate or my false positive rate is sufficiently low enough that most permutation Or disruptions in my environment is not going to be substantive enough to change the fundamental learning outcome of the algorithm.
Jim Collison [34:07]
It takes humans, though, years sometimes to learn those permutations. Oh, yeah. And I sometimes wonder if we are not giving it enough, even in a fully supervised approach, not giving it enough experience to have all the possible scenarios baked in, and then when it does land on one, we blame it, you know, some really obscure case which a human would have missed along the way. It would have fooled us too. And we're kind of hard on the machine because of that, you know.
Christian Johnson [34:36]
Yeah, no, I mean, right on. It is one of these things where we will need to decide, again, what the formal evaluative criteria are, and I think that is why you're finding that, because there is no easy answer for the general use case, the answer quite honestly is to run away from it, right? In another Wired article, by the way, they seem to be very on top of the subject, which I appreciate, because anyone can go and buy a Wired magazine for a couple bucks a
Jim Collison [35:10]
month and you’ll learn these things regularly. But they’re offering a $5 a year plan right now.
Christian Johnson [35:15]
There you have it. Sign up and it will show up at your door. It'll be like the era before, I think.
Jim Collison [35:21]
it’s I think it’s only live version.
Christian Johnson [35:26]
All right. Well, the title of the article is 'Tainted Data Can Teach Algorithms the Wrong Lessons.' We just spent the last half hour talking about that. But one of its quotes is a very telling quote about what we've been discussing. Quote: a recent survey of executives by Accenture found that 75 percent believe their business would be threatened within five years if they don't deploy AI. Amid this urgency, security is rarely a consideration. So we painted that brush, right? Everyone's sitting in the boardroom trying to figure out how to get the next big sexy AI deployed out in the wild so that they can rake in the profits, and security is completely an afterthought in this case, because most humans will say, oh, AI is still kind of that new, edgy, trendy thing; I don't expect it to be this secure, robust service yet. But in the same breath, people will be aghast if, you know, Siri does something wrong or orders something wrong, right? So there's AI, quote unquote 'AI,' which has, again, become a way overbroad term, but we're just going to say AI that we've come to know and depend on. And if you think about it, Siri, or any type of learning assistant, is one of the longest-running examples of continuous learning and improvement, right? I think Siri, over anything else on the market, has had this huge leg up on being able to learn from past mistakes. I don't remember what year Siri was introduced. I know the iPhone was around, oh, '06? No.
Jim Collison [37:04]
So 2010. Let’s just say maybe,
Christian Johnson [37:06]
yeah, so let’s, let’s even just say that, you know, it’s had a full year decade of learning, right? And that data is powerful. And it’s taken 10 years to kind of get to that level of maturity. But in the same breath, I can still find a bunch of ways to make it just not have an answer, which isn’t the end of the world. But that’s better than having it definitively do something in a positive action that is otherwise completely wrong. Yeah, right. So there’s a difference between an AI service that just isn’t mature enough yet to do what you want it to do, and something that is blazingly confident that it’s doing the right thing for you, when in reality, it’s a polar opposite. And so the fact that there’s just this huge rush to get the technology out, and there are not the formal proofs and rebuttals that we’ve discussed for the security mechanisms. This is a wide open surface area. And we’re not going to see it just with the ML algorithms, right? We’re going to see it with just basic sensor data. And those sensors might be feeding an AI algorithm and it might be adversarial in nature. But there’s plenty of examples where I don’t have to have any kind of fundamental understanding of the algorithm I’m trying to coerce or influence I just need to have an under standard understanding of how the sensor works. A great example of this is the new hot Tesla’s right, their stock is going up big right now, Ilan is feeling very confident. They can go zero to 60. And you know, four seconds to your head whips back in the seat. You have this large sexy display in the center that does virtually anything you could imagine on the road and you’re like, Wow, my Tesla is just an amazing machine. It’s a amazing feat of human engineering. And there’s obviously been some safety concerns that have arisen from that. Like, what happens when you get pulled over completely intoxicated for a DUI and the defense that you present to the judges? Well, the car was driving. So I couldn’t have been DUI because I was in auto pilot mode.
Christian Johnson [39:24]
Of course, that usually doesn't prevent you from getting arrested under suspicion of DUI, but it'll be interesting to see how that argument plays out in the courtroom. So what does Tesla have to do? Well, they go introduce a sensor so that, you know, every couple of minutes, if your hands aren't on the steering wheel, it's going to disengage Autopilot, so you're not going to be sleeping on the job, or you're not going to be, you know, away from the wheel having the T-Swift drink-it-up party of your life, which, you know, Jim is thinking about for his next Cox streaming session. But what do I have to do? I just have to prove to the sensor that I'm a loyal and safe driver. So what do I need to do? Find something with a little weight. Find something with a little moisture, like would be on a human fingertip. So go to YouTube, and what do you find? People have found all sorts of creative solutions for meeting the criteria of that sensor, the most creative of which, I thought, was an orange. Shove a single orange between the top part of the steering column and the center where you would honk the horn: it puts enough weight on the wheel, it's got a nice little moisture on the top bar, and you can cruise for hours on end, because you're being a safe driver with your Autopilot. So we have put that level of sophistication and investment into these types of things, where you're going to go pay at least $45,000 for a base-model Tesla, up to like $130,000. And I'm not necessarily saying that that feature in and of itself is a deal breaker. But what happens when it's more than just 'I'm fooling my car's sensor to do Autopilot'? What happens when someone doesn't like you? They know you're a big-value target, they know you have a lot of money because you went and bought a Tesla that cost you $130,000, and they want you gone. So what do they do? They figure out what your common driving patterns are, they figure out, you know, you're usually on this road between this time and this time, and they stick up some sign on the road. And that sign, to the AI algorithm in your Tesla Autopilot, looks like a beautiful sign that says to stop. Only problem is, you're driving 70 miles an hour on a highway, and your car just comes to a full stop, and the guy behind you is driving, you know, more than 70 miles an hour because he wants to get home from work, and next thing you know, you're in a total wreck. Why? Because I mutated the environment ever so slightly. If I go and look visually outside, I'm not going to see a stop sign, but I introduced something arbitrary into the environment, and your Tesla said, slow down, slam the brakes on. So what is going to be the trade-off that we assess for the types of things that can lead to, literally, loss of human life? And how are we going to weigh that against ease, comfort, cost, accessibility, usability, right? And I'm not talking about the 99 percent, right? I'm talking about those one percent edge cases that become game stoppers. It's something that every company should be asking themselves when promoting any type of product responsibly that has these types of technologies: what is the worst-case scenario? What is the probability associated with that worst-case scenario? Do you even have a formal quantitative method of asserting that? And then how are you going to reduce the risk? Okay, the risk is X; how do you get to X minus Y? And I think we're going to find very quickly,
Christian Johnson [43:25]
we will get to a point where we'll be able to accurately measure X, but we still don't have real-world mechanisms that can be generically applied to get to a risk value of X minus Y. And that will cost us. And when we talk about the cyber domain, where systems will start being increasingly defended by AI over, you know, signature-based or whatever other method is out there in traditional cybersecurity, it becomes the same thing. Right now I'm not talking about the cyber-physical environments I was discussing earlier, but about those plain-vanilla things that once looked really well locked down and now all of a sudden start to slip back through, because we've introduced a new technology to an old problem, and now we've discovered that the new technology isn't as fully baked as we thought it was yet. So it's going to be, I think, some learning pain. And I think most people will dismiss this episode as not being applicable to them at the time, only because it's a frontier technology, meaning these technologies are just starting to get really prevalent, really picking up in the last few years. And so the pain, we're going to start to feel it within five to 10 years, is my guess. The question becomes, what are those critical, like...
Christian Johnson [44:47]
It’s, it’s one thing to say you can do it. It’s another thing to say doesn’t meet the criteria of general purpose. Hackers going to do this on a daily basis, which usually the criteria for that is it cheap and as easy to Do and is it reproducible? I think we definitely have reproducible. I would argue we don’t necessarily have easy to do yet, because there’s a lot of things that still have to go into your favor in order to exert that much control over environment, right? It’s one thing to be able to do it in a test lab and show that it’s reproducible in the test lab. It’s a whole nother thing to do it in a real world situation. And I don’t necessarily know that it’s cheap yet, right? It still smells to me like there’s a lot of time investment and effort needed to research that specific use case, to get to a type of outcome that you might want for what I would call full on adversarial machine learning. Now, you might have much more luck meeting the criteria of cheap and easily accessible for just I want to fool a sensor and you have maybe an unpredictable outcome that you can’t control quite fully. And I think we’ve seen that all the time. I think that predates adversarial machine learning. But when you think about the growth and the explosion of IoT devices. We have very strong data that correlates to we also have an explosion of sensors. Sometimes those IoT devices themselves are the sensor. Other times a device will have dozens of sensors. And as those continue to get packaged into more and more of everything that we do, this surface area just keeps kind of growing.
Jim Collison [46:27]
You mentioned price. In the show notes you'll have a link to an article: researchers use a $5 speaker to hack IoT devices, smartphones, automobiles. Talk a little bit about that, because that gets to the point of what happens when the hack gets so affordable. Right now some of these are still really, really costly, much like crypto, right? Having to crack a crypto code is pretty expensive. But what happens when we use a $5 speaker?
Christian Johnson [46:56]
Yeah, so the intention here was, again, talking about how you incrementally gain access into influencing a system without any real-world access, right? So for example, your iPhone could be fully locked: I'm not going to get in, I'm not going to know the PIN, it's encrypted at rest, all of these things. However, your iPhone is currently running Apple Maps or Google Maps or the maps of your choice, and you want to perturb its environment in such a way that you exert some level of influence over it. So, for example, this might not have any real-world outcome for you, but all we want to prove is that we can coerce the device into thinking something it's not supposed to, and then we can keep expanding that use case, right? So when we talk about doing this on the cheap: almost all the devices, almost everyone today who is digitally connected in some way, is carrying an accelerometer in their pocket, right? Your Google Maps knows that you're 10 miles over the speed limit; it's watching your accelerometer in relation to the posted speed limit and telling you, hey, slow down. What this article showed, again in a controlled lab environment, is that I can go buy a $5 speaker, I can put it at a very specific note and frequency, and my accelerometer will register that as movement. And so now I've got a sensor that says I went from no movement to a little bit of movement. That by itself sounds pretty harmless, pretty benign, what's the big deal? But when you get a few more of those things coming together, you start to add up those little cheap things in a way that a more complex attack vector shows itself. And they talked about manipulating the sensors on a Fitbit and similar IoT devices, and again, you know, what was reported to the Fitbit recorded steps that didn't actually happen. And so anyone who reads that is going to laugh. They're like, you're off your rocker, this is not an issue, adversarial machine learning sounds like a joke. And for many use cases like this, you would be correct. Like, who cares that an extra two steps were recorded? If you want to cheat yourself into thinking you are athletically fit, there are many other ways to do it without this level of effort. So why bother? But that wasn't the point of that particular article, right? It was just that you could get very cheap and inexpensive things to mutate an environment to do something that a system wasn't programmed to do. I think on the surface that's quite a common conclusion to arrive at; we wouldn't question it on its premise, it seems like a pretty normal, digestible thing. But then, when that's on a backdrop of what we just talked about, with the whole intro to adversarial machine learning and the spectrum of unsolved problems, it starts to look a bit dicier a whole lot quicker.
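A minimal sketch of the idea behind the $5-speaker result, assuming a deliberately naive step counter that counts threshold crossings in accelerometer data; the sampling rate, threshold, and injected 2 Hz tone are made-up parameters, not those from the actual research.

```python
# Toy illustration of acoustic signal injection: a step counter that counts
# peaks in accelerometer magnitude will happily count "steps" produced by a
# resonant tone, even if the device never moves.
import numpy as np

FS = 100                      # samples per second (assumed)
t = np.arange(0, 10, 1 / FS)  # ten seconds of data

def count_steps(accel, threshold=1.5):
    """Count upward threshold crossings as steps (deliberately naive)."""
    above = accel > threshold
    return int(np.sum(above[1:] & ~above[:-1]))

# Device sitting still: just sensor noise around zero (gravity removed).
still = 0.05 * np.random.randn(len(t))
print("steps while still:", count_steps(still))                  # ~0

# A speaker driving the accelerometer's resonant frequency injects a clean
# sinusoid on top of the noise, which the naive counter reads as walking.
injected = still + 2.0 * np.sin(2 * np.pi * 2.0 * t)             # 2 Hz "steps"
print("steps under acoustic injection:", count_steps(injected))  # ~20
```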
Jim Collison [50:04]
well and I think there’s a great example in with all the work I’ve done and since the last time we were on Christian you know had somebody go through my car so I put in some ring cameras guy got a decent home automation stuff, kind of lock down my security a bunch more. But you know I got is I was in there I got digging through, like, Hey, I can set my phone that when I get a certain distance, a certain radius from my house to do certain things, or the opposite is true, like so when I pull in the driveway. Maybe I wanted to do some things that are helpful to me to unlock things and to set things up is someone through it with a fight or speaker could spoof my phone. This isn’t true, by the way, but let’s just say they could. I’m not that good. I don’t have automatic locks. So this wouldn’t work in my house. But see, I did it naked spoof it to trigger that. That geo instance, that geo fence to Break the geo fence and all of a sudden now, I’m somewhere else but my home is now opened itself up, so to speak. And there’s somebody there waiting for that to happen. I don’t think that’s too far. I don’t you know, that’s not a stretch, like, all I have to do is spoof spoof the geo fence, aspect of that. And, and boom, it’s it’s taken care of. So again, security convenience. That’s great. I in my case, we had this conversation where I almost put a smart lock on the door between the garage and the house. And I was like, yeah, this would be great, because then I can open it on my phone before I come in so that I don’t because my hands are full. That’s a convenience thing. Right. But then again, spoofed that opens up the garage door that opens up the door in between, and so you’re right, a very inexpensive solution to to create disruption on on a convenience, right? Yeah,
Christian Johnson [51:57]
exactly. That’s spot
Jim Collison [52:00]
Right on, I don’t think we’re that far away. Right now we’re in until you put all that I didn’t think about that till you started putting all those things together and I’m kind of like, oh, wait a minute geo fencing, if I can, if I can. Now it’s where we may be far away from being able to, to set a tone on that changes the geo fence, right, that’s GPS and some other things. But again, that that’s kind of what I think of like, hey, if it makes me move, and they can do a certain number of things. Right, do I really want to trust those? Those pieces? Same thing for cameras, right? I’ve got cameras available and watching but are there things that can be done with those cameras? You know, it’s funny, we had movies for years in the 80s and 90s, that would do all this. We couldn’t do it yet. Like, you know, they would hack in, and then change the picture thinking like Ocean’s 11 and notions swell, right? They changed the picture to fool the security guard and stuff like that. Now, that’s actually those all those things. possible, tragically, you don’t even need that, you know, you know, you need to be there to splice into the, we thought it was cool. Now we’re like, oh, gosh, man, that’s that’s some serious stuff. So yeah, I just think that’s maybe a little more real kind of real than we think. I think of, you know, you say town, which is using AI now to start identifying what it’s seeing in the picture. So they also have, they also have license plate identification and facial. They’re doing some stuff with facial recognition. Why wouldn’t I say hey, recognize my face? Unlock the front door. Yeah. And then could that How could that be spoofed or trained differently? How long does it take for me to figure out a good picture of Jim Collison space that has the right features that I just hold up the door to?
Christian Johnson [53:47]
Which they swear can never happen, right? But yeah, what happens if I then 3D print your face? How would I do that? Well, I could get pictures of you head-on, from the side, from the other side. Like, can I recreate a 3D CAD model of Jim's face and present, like, your head on a stick, right?
Jim Collison [54:10]
Yeah, not like in the Lord of the Rings, anyway.
Christian Johnson [54:14]
Hunger Games, Game of Thrones, I guess.
Jim Collison [54:16]
Yikes, dude. Yeah, that got dark really, really fast. What else, Christian? Anything else as we think about this?
Christian Johnson [54:25]
that’s the brush. I wanted to paint. I wanted to do some storytelling telling. We’ve talked about some of the stories. I think
Jim Collison [54:32]
One more thing. It says 'To cripple AI, hackers are turning data against itself.' This is another article from Wired. Did you mention that in the
Christian Johnson [54:41]
Yeah, so not specifically. Again, Wired to the rescue here. They're just
Jim Collison [54:45]
ahead of all of us. I like it.
Christian Johnson [54:47]
I know. It’s unusual to because I have not cited them in quite a bit in some of the stuff we’re doing. But you know, they’re basically just calling out that. They called out several the examples that I together through other articles, right so they talk about a picture of a turtle is seen as a rifle. self driving car blows past a stop sign because a carefully crafted sticker bamboozled it’s computer vision. And I glass frame confuses facial recognition tech and I’m thinking a random dude is actress. Mila Oh, this is embarrassing. Milla Jovovich close enough, I guess she’s famous because I couldn’t tell you things. So that shows me where my my Hollywood knowledge is. But it really just kind of stitches together the spectrum of attacks that the generational machine learning models that we have available to date are are going to have to deal with and they’re currently just completely unprepared to deal with it. And they kind of go through a lot of the specific lab examples that we’ve discussed about how an environment gets mutated. what the risks are. What’s interesting is that this article says specifically quote, that sir, there’s no reason to panic simply simply slapping these stickers on a stop sign won’t crash a self driving car. He explains that self driving cars use multiple sensors and algorithms and don’t make decisions on any single machine learning models. So although are working full of single ml model, it doesn’t imply that fooling It is enough to cause physical harm. Well, to me that says, I’ve proved it. Now I just have to figure out how to make it probable that it happens. Yeah. So not a great defense against what we’ve discussed.
Jim Collison [56:42]
I think we have some interesting times still ahead for this, you know. We joke all the time about the robots taking over, and Skynet and those things, and then you hear about some of these things and, well, I don't know if we're as smart as we think we are, or that AI is as smart as we think it is. Certainly it has made huge strides. I mean, we are seeing some really amazing things, including, you know, in the Amazon space, having a store that is just able to kind of track your purchases in the store. Sure, there are ways to fool that, right? I'm sure there are. But apparently, for a lot of testers, it seems to be working pretty well. So, you know, we're beginning to make progress in those areas. I think sometimes we paint this world as if one day we wake up and it's different, and I actually think it's slow; it's more like the frog in the boiling water, where the temperature just keeps getting turned up over time. I also think it's going to take a lot longer for any of that stuff to happen than we think, all right, but it will have a significant impact on us by the time it gets here, in greater ways than we think. So it'll be interesting to see. I'm not sure. You know, I'm 50; I've got 30 good years left on the planet, let's just say, if I take care of myself. Will I see some of that? If I were to ask you that question, Christian, for the next 30 years, and I'm not asking you to make a prediction on any single technology, but from a significance standpoint, AI becoming the real stuff that we're talking about, do you think I get to see it in the next 30 years? I'm bullish on it. All right, I'm bullish on it, I don't know. I guess we'll find out; we'll have done 10 more podcasts in 30 years. So
Christian Johnson [58:44]
they’ll be able to start publishing my show for me, I won’t have to do anything.
Jim Collison [58:47]
Maybe by, maybe by 90, maybe by my 90th birthday, we'll have show number 90 out for folks. Actually, this is an area, to bring this back home: AI, you know, we're starting to see a lot of language recognition now, right, when we think about facial and language, and for what I do in podcasting this is super important. And, you know, we've talked about deepfakes, I think we've talked about that here on this show in the past, being able to create a deepfake, or detect a fake, on both of those. But I think there are some good things that we can... by the way, this current Windows update that's going on right now? Brutal. It's tough. Brutal. Yeah.
Christian Johnson [59:33]
I did not even mention on this show, which I probably would have if we didn't do such a thematic show today, the Microsoft CryptoAPI debacle. Just, yada yada.
Jim Collison [59:48]
This is making me reboot twice. So Ken brought up an interesting point that this is the first update for folks who were pushed over to Windows 10 from Windows 7. Like, this is their first update, welcome to Windows 10. By the way, it's a pretty massive update that is taking two reboots that I know of. I just started the second one on this machine; I thought, well, you know, while we're talking I'll start updating this to see kind of how bad this is. It's okay, in the grand scheme of things it hasn't ruined anything yet, but it's taking a while to get through on a pretty fast machine, and it's taken a good chunk of the show, at least a half an hour. So I kind of wonder, you know, will I see it? And I'm not as bullish as you are; I actually think it'll be a lot slower than we think. I'd rather the effort go, because of my age, into the biomechanical or biotech world, where we actually upgrade ourselves. I mean, that's happening, right?
Christian Johnson [1:01:01]
Yeah. Oh, yeah.
Jim Collison [1:01:03]
Yeah, I think it’s slower though. I think it’s slower and it’s harder to do then facial recognition or Oh, sure everybody can see the back to the voice and facial there. There are things in archives, like I want to go back and be able to find things I said, when I said them and have it take me right to that spot on demand. Yeah, basically. Yeah. And we’re kind of getting there. I’ve got I’ve got a team working on that here at a university and using some of the Amazon services that are available to them in we’re making some progress. It’s not great right now, but we’re in the early early, early phases and stages of it. I think I’ve seen some stuff from both Microsoft on this. Microsoft’s got a good jump on it actually sits sounds weird, but they actually have a really good jump on this and are doing some really, really interesting things of being able to take you to the spot. I can actually put like, hey, I want these words. And I want you to cut that out make a video and it just does it and you’re like, Oh, that’s pretty cool. So some some great stuff. Christian thanks again for for jumping in here. Always good to see you on cyber cyber frontiers couple reminders for individuals if you’re looking for high speed hosting both data and web Christian does both kind of optimized for podcasters great service. If you’re thinking about doing that, or you need help with anything along those lines, Maple Grove partners com they they power the lightning fast. We are the platform for podcasters Yeah, it’s pretty great. Still spooky, how like, when I go to update plugins, they’re like, boom, done, like a click a boom, boom, done.
Christian Johnson [1:02:43]
I now measure directly The Average Guy's response-time latencies, and for, you know, what I will call the eastern seaboard of the United States, the response times are averaging 420 milliseconds right now. So that's for anything end to end, like, until it's on your screen. So, you know, take it, take it.
Jim Collison [1:03:04]
Yeah. No, super great. maplegrovepartners.com, just reach out to Christian, he'll get you set up as well. If you have questions for us, send us an email: jim@theaverageguy.tv (the updates are finally done), or christian@theaverageguy.tv. Find me on Twitter at @jcollison; you can find him at board whisper. And one more thing, since we only do this every two months: if you're thinking about it and you liked it, share it. Just share it with somebody. I think this is a unique enough kind of show, from a podcast perspective, for the person in your life that's kind of that cybersecurity nerd. So get out there. Thanks for listening. We'll be back next time. We'll say goodbye.
Transcribed by https://otter.ai
Contact Christian: christian@theaverageguy.tv
Contact the show at jim@theaverageguy.tv
Music courtesy of Ryan King. Check out the Die Hard Cafe band and other original works at:
http://diehardcafe.bandcamp.com/ / http://cokehabitgo.tumblr.com/tagged/my-music
http://theaverageguy.tv is powered by Maplegrove Partners web hosting. Get secure, reliable, high-speed hosting from people you know and trust. For more information visit http://maplegrovepartners.com