Tying everything together – Solving a Machine Learning problem in the Cloud (Part 4 of 4)

>>Welcome to Part 4 of what I
did at the Toronto AI user group. The exciting conclusion
awaits. Make sure you tune in. [MUSIC]>>So now what I’m going to
do is I’m actually going to make a model. By the way, I talked about all
these things, they’re pretty cool. So I’m going to do a
more complex example. Obviously, machine-learning is wonderful and deep
learning is awesome. So I’m going to apply
this to a love of mine, which is, I’m going to
make a convolutional neural network to detect
whether it’s tacos or burritos. Which is an important
data science problem. Because there is some general world confusion and we need to fix it. Okay, so I did this thing here, so I can stop that. So what I’m going to do is, I went to the wrong screen.
This is the right one. Okay, so what I’m going to
do is I’m going to show you a three-step process with
TensorFlow to build this model. So what I’m going to do
is I’m going to show you that it already ran. So let’s go to the experiments here and I’ll give
it a second to load. Here’s the latest run of my
seer experiment because it sees Mexican food, which
is really important. What you’ll see in a
second as it loads up is you’ll see the three-step
pipeline that we do. The pipeline is basically
prepare the images, do the training, and then register
the model. Yeah, there it is. So you’re probably wondering, well, what preparation are
you doing for images? Well, these things
are fairly finicky, like if someone puts an alpha channel in one of your pictures, boom. Because I planned for three RGB
channels, and sometimes in some models they switch the channels or
they put the channels first, and so you have to pre-process
this stuff a little bit. So what I’ve done, as you can see in my prep, I’m basically using TensorFlow. I’m basically looping through all the files and
pushing out TF records. You can see here in the
data when I do the prep, it’s basically a bunch of TF records. Anyone use TF records
before, by the way? Let me explain what
TF records are then. TF records are just basically a way to store tensor data
in an intelligent way. Do you want to add anything?
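For readers who have not used TF records, here is a minimal sketch of the kind of prep step being described: drop any alpha channel, resize, and write each image with its label into a TFRecord file. It is not the code from the talk. The image size, the feature names `image` and `label`, the label encoding, and the synthetic images are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

IMAGE_SIZE = 224  # assumed target size; the talk does not state one


def to_rgb(image: np.ndarray) -> np.ndarray:
    """Drop an alpha channel if present so every image has 3 channels."""
    return image[..., :3]


def make_example(image: np.ndarray, label: int) -> tf.train.Example:
    """Resize one image and pack it, with its label, into a tf.train.Example."""
    resized = tf.image.resize(to_rgb(image), [IMAGE_SIZE, IMAGE_SIZE])
    encoded = tf.io.serialize_tensor(tf.cast(resized, tf.uint8)).numpy()
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def write_records(images, labels, path: str) -> None:
    """Write all (image, label) pairs into a single TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for image, label in zip(images, labels):
            writer.write(make_example(image, label).SerializeToString())


# Synthetic stand-ins for real photos: one RGBA image and one RGB image.
images = [
    np.random.randint(0, 256, size=(300, 400, 4), dtype=np.uint8),
    np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8),
]
labels = [0, 1]  # hypothetical encoding: 0 = burrito, 1 = taco

record_path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
write_records(images, labels, record_path)

# Reading the file back confirms that both records were written.
record_count = sum(1 for _ in tf.data.TFRecordDataset(record_path))
```

Doing the resizing and channel cleanup once, at write time, means the training code can assume every record already has the same shape.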
Because maybe you know more about TF
records than I do.>>Oh, good. I’m always worried; I actually know a lot of
people know more than me. So if you have something
to add, please add it. So basically it’s taking
all of these images, resizing them to the
appropriate size, making sure they have
the right channel ordering and then stuffing
them into this TF record. The reason why is because it
makes it easier to train, and I’ll show you shortly how. Once this data
preparation step stops, and by the way, in
Azure machine learning, when you do this stuff, we actually store all
of the steps that were run. This data source is a parameter, so I can change where
it’s sourcing the data from at any time and it will
load different images. You can see here, you can look at all the details
of everything that happened. Here’s the snapshot
of the code that ran. Here are the logs that it output. This is literally what it did. So notice that now because
I’m running it in the Cloud, you have visibility and traceability
to what actually happened. Because usually you just
run it on your machine, which is nice, but it doesn’t
play well with others. Now, as you’re noticing, data science practice is
now involving bigger teams. How do you interact with
them appropriately? Well, say Sally left, and she’s the best data
scientist you have, and then Bill comes in. Bill usually starts from the beginning; as a data scientist
I’d be like, “Okay, let me try logistic
regression, decision trees. If it doesn’t work, then let’s
go with something else.” But if Sally already did that, and you have a record of
the experiments that happened, Bill can start at the right spot. Okay, and that’s what
this is all about. When you’re doing model training. For example, in this case,
when it loads a detail, there’s a way in that,
let me get to the model because I didn’t show you the
actual training algorithm. So this is the prep step. By the way, if you’re wondering, like, I want access to this, if you go to GitHub.com and
then go to Cloud-scale ML, it’s under this thing called seer. So everything is in here. You can totally do all of this stuff. If you want, I can run it
on my machine right now. I can make it run on my machine. Let me do that so that
you can see. LS, run. Oh sorry, conda activate TF, run. That’s not the one I want; which one is it? Pipeline, pipeline.command. It’s going to delete
all those folders and then we’re going to
keep our hands off. But here’s the thing, I can run this all in probably
under five minutes. There, it’s building the TF records. Taking burritos and tacos and making the TF records, and this is going to take a while. Well, not too long. It’s probably like 600 images. It’s not that much and
I’ll let it do that. So you can see that I
am running it locally. Right now you’re not
going to hear the GPU, but you will in a second. Here we go. We’re getting into it. Oh, did you see this beauty? Look at this beautifulness.
That’s cool, right? So yeah, you can run it locally, but who’s going to know
I ran this locally? How am I going to share this work
with other people on my team? So what I’m saying is, as I said before, with all of this stuff, basically, when it comes to Azure, you can use whatever you want. You can do the logging directly
from here on your local machine and log to Azure machine learning so that you can have stuff
that looks like this. I’ll give it a second to load up. But basically, you
don’t have to run it. I don’t care where you run it. Use just the logging
mechanism of the experiments, and you can log all
of the metrics there. So for example, I actually
ran this one in the Cloud, but let’s just say you ran it
locally and you’d logged out. In this case, I don’t have it logging because it’s running
in an offline mode. But basically now you
can track anything. For example, accuracy, loss. This is what I’m actually logging and you can log it out as
a chart or as a table. I don’t know why things are
taking so long to load up. But basically you have a
chart of the accuracy of this model and the loss function. So just do the logging here or
maybe you save the models there. I don’t care what you do. As long as I’m solving
your problem, I’m happy. So if you have fantastically
good machines, yeah. The way that you log
these things, hold on, looks like there’s no data in there
yet because I actually re-ran it unfortunately when
I did the other one. So let me show you how this
actually works for training. You’re probably
wondering how the logs happen and you can take this file. So basically in this
case, when I train, and this is why I really like TF records is because now you
can do something like this. This is the training dataset. What you can do is you can create this TF record dataset
with this train array.>>Basically it’s an
array of the name of the files of the TF records. Now, you can do some
crazy stuff like, here’s the parse record to pull
out the vectors and matrices, here’s the number of
parallel calls we do, and it creates this pipeline of data that you can use
when you’re training. This is actually really cool. Usually, I skip over this
bit because the people I sometimes talk to don’t
understand this bit, but you do. This is why I’m starting to like
the front part of TensorFlow. You can see here that
I am model stealing, otherwise known as transfer learning. I’m taking MobileNet
and I’m just putting it into a sequential model
with an output on a softmax, which is cool. Then you can see here that
I’m optimizing this stuff. Now, here is the bit
that’s interesting, that gets to your point,
how I logged this stuff. Notice that I have a
couple of callback things. One is called the model checkpoints, so that it just outputs
models whenever it has one. But here’s one that I created
called an AML callback. Notice that it’s called log AML. Notice that on the callbacks, when I do the fit, I’m passing log AML in. What does this do?
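The input pipeline described a little earlier (a TFRecord dataset built from a list of file names, a parse step run with several parallel calls, then shuffle, batch, and prefetch) can be sketched roughly as follows. This is an illustration, not the code from the talk. The tiny image shape, the feature names, the helper `write_fake_records`, and the synthetic records are assumptions; the five parallel calls, the 10,000-record shuffle buffer, and the prefetch of five follow the numbers mentioned in the talk.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

IMAGE_SHAPE = (32, 32, 3)  # small assumed shape to keep the sketch fast
FEATURES = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def write_fake_records(path: str, count: int = 10) -> None:
    """Hypothetical helper: write random images so the pipeline has input."""
    with tf.io.TFRecordWriter(path) as writer:
        for index in range(count):
            image = np.random.randint(0, 256, size=IMAGE_SHAPE, dtype=np.uint8)
            encoded = tf.io.serialize_tensor(image).numpy()
            feature = {
                "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index % 2])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def parse_record(serialized):
    """Turn one serialized Example back into an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.parse_tensor(parsed["image"], out_type=tf.uint8)
    image = tf.reshape(image, IMAGE_SHAPE)
    image = tf.cast(image, tf.float32) / 255.0  # scale pixels to [0, 1]
    return image, parsed["label"]


def make_dataset(files, batch_size=4, shuffle_buffer=10_000):
    """Build the parse -> shuffle -> batch -> prefetch pipeline."""
    return (
        tf.data.TFRecordDataset(files)
        .map(parse_record, num_parallel_calls=5)
        .shuffle(buffer_size=shuffle_buffer)
        .batch(batch_size)
        .prefetch(buffer_size=5)
    )


train_files = [os.path.join(tempfile.mkdtemp(), "train.tfrecord")]
write_fake_records(train_files[0])

train = make_dataset(train_files)
batch_images, batch_labels = next(iter(train))
```

A dataset built this way can be passed straight to a Keras `fit` call, which is what makes the TF record format convenient for training.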
I’m glad you asked. It turns out, and this is code
that I’ve never shown. I’m so happy that I get to show it to people, because nobody looks at this. Basically, an AML callback
is an implementation of a callback function in Keras that
says anytime something is run, log it to the AML run. So if I put that callback
into my Keras fit, it’s basically going to shoot metrics up to Azure machine
learning, like TensorBoard. Yeah? Can you make it so that you can create a new run and run it locally? Yeah, that’s a good question.
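A callback of the kind described here can be sketched as a small subclass of the Keras `Callback` class that forwards whatever metrics Keras reports to a logging function. The class name `MetricLoggingCallback` and the `log_fn` parameter are hypothetical and are not the names used in the talk. With the Azure ML v1 SDK, one plausible choice for `log_fn` would be the `log(name, value)` method of the run returned by `Run.get_context()`, but that wiring is an assumption and is left out so the sketch runs anywhere.

```python
import tensorflow as tf


class MetricLoggingCallback(tf.keras.callbacks.Callback):
    """Forward every metric Keras reports to a logging function."""

    def __init__(self, log_fn):
        super().__init__()
        self.log_fn = log_fn  # any callable taking (name, value)

    def on_train_batch_end(self, batch, logs=None):
        self._log_all(logs)

    def on_epoch_end(self, epoch, logs=None):
        self._log_all(logs)

    def _log_all(self, logs):
        # Keras passes the current metrics as a dict, or None if there are none.
        for name, value in (logs or {}).items():
            self.log_fn(name, float(value))


# Stand-in logger that just collects metrics in memory.
history = []
log_metrics = MetricLoggingCallback(lambda name, value: history.append((name, value)))

# Keras normally calls these hooks itself during
# model.fit(..., callbacks=[log_metrics]); calling one directly
# here shows what would be logged.
log_metrics.on_epoch_end(0, logs={"loss": 0.42, "accuracy": 0.9})
```

Because the callback only depends on a callable, the same training script can log to the cloud, to a file, or nowhere at all.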
So you can shoot that up. If it doesn’t work, let me know
because they told me it does.>>Run it on your local
machine, but log out. Now, TensorBoard is awesome too. If you’re doing TensorBoard, you can actually save those
into the output folder of your run on the actual
Cloud-based one, and you can call TensorBoard
on the run in the Cloud, or you can just do these metrics. Now, the thing is again,
when I say I’m serious, this is a buffet. You take whatever you want. If you are just going
to have the salad because that’s what you want, great. If you already have
powerful machines that do the thing, I’m a fan. But you can use other aspects of
this stuff that may be helpful. If it’s not helpful,
then I want to know why. Great. Tell me why, and
I mean that honestly. It’s a great question. This is
again, you can just take this. Notice that the AML
callbacks are using this. They’re just inheriting from
the Keras callback class, and these are just the
things that I override, like when the batch ends, I’m going to log all the metrics. So I’m using whatever the TensorFlow dataset
does with a TF record set, whatever it does, it does. I’m not doing anything
more than that. So I don’t know how it does it. All I know is that when I
create a TensorFlow dataset, I’m using TF records. What does it do with memory? Well, I guess I do know a little bit. You can notice that, when
you’re looking at TF records, it’s pretty smart about
how many it’s pulling out. So it loads it up from memory, and then I have this map
function which parses the TF record into a batch output. Right now, I’m doing
five parallel calls. So it basically has a memory pipeline of things that
it’s queuing up for it to do, and then it shuffles it. If the buffer size is, say, 10,000, that means it needs to load up
10,000 records to do that shuffle. Then I create the batch size, and then I prefetch
with a buffer size of five. So depending on how you order these, it will do the thing. If you look at the TensorFlow
documentation, it’ll tell you about this. I’m not doing it; I’m just letting the framework handle the memory
management and stuff. So that’s how it does it. This is how I do the thing. Then the last thing that I do, and this may be a part of
your workflow as well, is the model register. The last thing I’m doing
is I’m basically finding the best model that it output, and I’m just saving it to
Azure machine learning. If you only want to
do this bit, great. Now, you can version your models. Now, what you’re
probably asking is, is a model a single file or is it
a bunch of files? The answer is yes, do
whatever you want. For example, a PyTorch YOLO implementation that I’ve seen happens to be like four or five files. So you put that all in there.
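As a rough illustration of the "find the best model that it output" step, the sketch below picks the checkpoint with the lowest validation loss from an output folder. How the talk's code actually selects the best model is not shown, so the file naming scheme (epoch and validation loss embedded in the name), the function `find_best_checkpoint`, and the fake files are all assumptions.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming scheme, e.g. "model.03-0.21.h5" = epoch 3, validation loss 0.21.
CHECKPOINT_PATTERN = re.compile(r"^model\.(\d+)-(\d+\.\d+)\.h5$")


def find_best_checkpoint(folder: Path) -> Optional[Path]:
    """Return the checkpoint with the lowest validation loss, or None."""
    best_path, best_loss = None, float("inf")
    for path in sorted(folder.iterdir()):
        match = CHECKPOINT_PATTERN.match(path.name)
        if match is None:
            continue  # ignore files that are not checkpoints
        loss = float(match.group(2))
        if loss < best_loss:
            best_path, best_loss = path, loss
    return best_path


# Fake checkpoint files so the sketch runs without any training.
output_dir = Path(tempfile.mkdtemp())
for name in ["model.01-0.65.h5", "model.02-0.31.h5", "model.03-0.44.h5", "notes.txt"]:
    (output_dir / name).touch()

best = find_best_checkpoint(output_dir)

# With the Azure ML v1 SDK (azureml-core), the chosen file or a whole folder
# could then be registered with something like
#   Model.register(workspace=ws, model_path=str(best), model_name="seer")
# where `ws` is a Workspace object and the model name is hypothetical.
```

Since the registration path can point at either a single file or a folder, the same approach covers models that are made up of several files.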
So the question is, well, how do you use this when you’re
actually going to run it? In Azure machine learning, we have endpoints that you can
create. All you need for this is a Python file that
has an init and a run. Basically, the raw data comes in. You load up the model in
the init so that it can store it in the global model. Then you just run it. I could just run this locally
and you can see it will do it, but you can put this in
as a service as well. I’m running out of time. So I made this work and you can download the
code and try it yourself. There’s a dataset as well. You folks know that
convolutional neural networks, especially with MobileNet, will do pretty well at this
in a pretty short order. So I showed you the pipeline. So let’s do a little
review and then I want to answer any questions. I want the most hostile
of questions because my purpose here isn’t
to sell you something, it’s to solve problems, and insofar as our stuff
doesn’t solve problems, I’m happy to learn about this stuff. Because then I can
write it down and tell the people in the group. I tell them all the time, you need
to fix this, this and this. But if I say you need to fix
this, and so-and-so from TD, or wherever, said at the time that they needed
it, they’re like, oh. Whatever you say has 10
times the weight of what I say. David? So I want to know. So just a little
review of what we did. We talked about foundations of
linear models and neural networks. We did convolutional neural networks, looked at Azure Machine Learning, we looked at
machine-learning pipelines, and we did a little bit more. If you go to that link, it’ll ask you to sign up,
but you don’t have to. You can just get some
links that will show you a little bit of the docs
of Azure machine learning, and then this other
thing called MLOps, which is DevOps for machine learning. I forgot to add this. I don’t
know why I didn’t have this. If you’ve ever used something for continuous integration
and continuous deployment, we have something called
Azure DevOps that does this. Notice, that in our particular case, we can have pipelines
that take your code. Let me go back, I’m too
fast. No, I’m not too fast. We have things that when
you check this code in, will automatically run a new
Azure machine learning pipeline. Then we have CD things, which is continuous delivery, so that anytime a new
model is checked in to Azure machine learning
service, it will kick off an integration build for you. The cool thing about these
is you can actually add steps that check if your model actually works
or if your model is racist, or if your model has ethical issues, and you can check those
before you put it out. There’s another thing we have. It’s called, oh, shoot, Marunouchi stuff. What’s it called? Yeah, we have the
interpretability toolkit, which you can run on your models, which will run and say, if it’s a black box
model, it will tell you. If you change these inputs, this is what will happen. It will also tell you which features
affect your model the most. Some people think, well, if I take out gender
or race or whatever, then my model is ethical. But it turns out that there
can be combinations of features that can produce
that same effect. So you’re able to do that, and
so you’ll be able to see that. For example, I did a video on this and I can get a link
to you-all if you want, where Marunouchi shows this, and she showed, if you change gender, does it change pay scale? It’s interesting. Actually, it was job recidivism, whether they would
leave or come back for a job. So we have some stuff that will automatically do that for
you as well in the Cloud. But again, like I said, this is a buffet. It’s not a seven course meal where you have to sit and you
feel awkward and you’re like, oh, I just got a call.
No, I didn’t hear it. Well, I did but, it’s not like that. It’s whatever you want to
use that’s helpful to you. We want to be able to
help you with that. [MUSIC]

