Neural Networks Demystified [Part 1: Data and Architecture]

Let’s say you want to predict some output value y given some input value x. For example, maybe you want to predict your score on a test based on how many hours you sleep and how many hours you study the night before. To use a machine learning approach, we first need some data. Let’s say that for the last three tests you recorded your number of hours studying, your number of hours sleeping, and your score on the test. We’ll use the programming language Python to store the data in two-dimensional “numpy” arrays.

Now that we have some data, we’re going to use it to train a model to predict how well you’ll do on your next test, based on how many hours you sleep and how many hours you study. This is called a supervised regression problem. It’s supervised because our examples have inputs and outputs. It’s a regression problem because we’re predicting your test score, which is a continuous output. If we were predicting your letter grade, this would be called a classification problem, not a regression problem.

There is an overwhelming number of models within machine learning. Here we’re going to use a particularly interesting one called an artificial neural network. These are loosely based on how the neurons in your brain work, and they have recently been particularly successful at solving really big, really hard problems.

Before we throw our data into the model, we need to account for the differences in the units of our data. Both of our inputs are in hours, but our output is a test score, scaled between 0 and 100. Neural networks are smart, but not smart enough to guess the units of our data. It’s a bit like asking our model to compare apples to oranges, when most learning models really only want to compare apples to apples. The solution is to scale our data, so that our model only sees standardized units. Here we’re going to take advantage of the fact that all our data is positive and simply divide by the maximum value for each variable, effectively scaling the result to between 0 and 1.

Now we can build our neural net. We know our network must have two inputs and one output, because these are the dimensions of our data. We’ll call our output layer y hat, because it’s an estimate of y, but not the same as y. Any layer between our input and output layers is called a hidden layer. Recently, researchers have built networks with many, many hidden layers; these are known as deep belief networks, giving rise to the term deep learning. Here we’re going to use one hidden layer with three hidden units, but if we wanted to build a deep neural network, we would just stack a bunch of these layers together.

In neural net diagrams, circles represent neurons and lines represent synapses. Synapses have a really simple job: they take a value from their input, multiply it by a specific weight, and output the result. Neurons are a little more complicated: their job is to add together the outputs from other synapses and apply an activation function. Certain activation functions allow neural nets to model complex nonlinear patterns that simpler models may miss. For our neural net, we’ll use sigmoid activation functions. Next, we’ll build out our neural net in Python.
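The setup described above can be sketched in a few lines of Python with numpy. The specific hours and scores below are illustrative placeholders, and the variable names (`W1`, `W2`, `forward`) are my own, not necessarily the video's code:

```python
import numpy as np

# X = (hours sleeping, hours studying); y = test score, one row per test.
# These numbers are made up for illustration.
X = np.array([[3.0, 5.0],
              [5.0, 1.0],
              [10.0, 2.0]])
y = np.array([[75.0], [82.0], [93.0]])

# Scale by the per-column maximum so every value lies in [0, 1].
X = X / np.amax(X, axis=0)
y = y / 100.0  # test scores are out of 100

def sigmoid(z):
    """Sigmoid activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Architecture: 2 inputs -> 3 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))  # synapse weights, input -> hidden
W2 = rng.standard_normal((3, 1))  # synapse weights, hidden -> output

def forward(X):
    """One forward pass: each layer sums its weighted inputs, then applies sigmoid."""
    z2 = X @ W1          # weighted sums arriving at the hidden layer
    a2 = sigmoid(z2)     # hidden-layer activations
    z3 = a2 @ W2         # weighted sums arriving at the output layer
    return sigmoid(z3)   # y_hat, the network's estimate of y

y_hat = forward(X)       # shape (3, 1): one prediction per test
```

With random, untrained weights, `y_hat` is of course a poor estimate of `y`; training the weights to reduce that error is the subject of the later parts.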


  1. I love your video but the background music is very distracting. At the very least it's too loud. Just my opinion. Thanks for reading.

  2. Great video, but just a side note: scaling the target variable (y) is almost always unnecessary. Also, scaling the features is only necessary if their values vary a lot relative to each other. Since both features here are in hours (and you can expect them to be relatively close to each other, not orders of magnitude apart, for instance), it is not necessary to scale them either.

  3. Coming back to this video a year later, I realize that this is super unhelpful to those learning how to program neural nets. I don't recommend anybody be taught to visualize neural nets as layered sets of neurons. For programming, and for logical understanding it is better to use matrix multiplication and an activation function. The graph is honestly just a clever way of explaining why this mathematical setup has any relation to a brain. I wish I had known this sooner, and it would have made the calculus a lot more bearable.

  4. Hi,
    This is one of the best videos on ANNs. I rewound many times to understand the code.
    Do you have a finalized program where I can get the result?
    I am a little confused about how the weights are corrected during training. Am I correct that at the beginning a random value is assigned to all weights; on each training sample the error is determined; if it is negative, a small delta is added to the current weight (to all weights?) to see which way to go to minimize the error; and we stop when the error becomes very small? Please correct or affirm.

  5. I recommend a great but paid tutorial about Neural Networks, How to Code and Train.
    You can see a quick tutorial overview here

  6. Very awesome explanation. Please make more videos on different algorithms like SVM and logistic regression, and also on unsupervised learning algorithms.

  7. What are the input weights to get a better prediction?? Please help, project tomorrow!! Just need the first weights before the hidden layer!! :/

  8. This series of videos about Neural Networks is complemented well by the amazing and clear lectures from Dr Yaser Abu-Mostafa (CalTech). Link to the course's webpage:

  9. I am a newbie, so bear with me. I wanted to ask: why choose this specific activation function when there are many? How can you know that this function should be applied here? Are there guidelines for which function to choose when?

  10. I wish it were without that loud background music… or at least had subtitles, so someone who really wants to learn could mute the video and read along.

  11. Why do we need to scale data? I mean, wouldn't the weights just be bigger to compensate for how hours are much smaller than the test score? Even if we were using different data as input, like hours of sleep and, say, the student's IQ, wouldn't the weights learned by the model be bigger for hours of sleep (in order to map them from 0–10 to 0–100) and smaller for the IQ (in order to map it from, say, 80–180 to 0–100)? I understand that scaling the data makes the weights feel more intuitive when you look at which is bigger and which isn't, but the machine doesn't need to understand that, does it? All it has to do is predict the output.

  12. I am a full-time deep learning professional by now. This was the first video I ever watched about neural nets, and it is still one of the best video series I know. Keep up the amazing work!

  13. 2 input values, but why 3 neurons in the hidden layer? Why not 2? Why not 4? I don't get it. How do you determine the number of neurons or hidden layers?

  14. The number of pretentious, entitled comments is astounding…whether or not I like the music is irrelevant. Learn to cope…this video simplified a complex topic well and it was much appreciated!

  15. I'm here late, but honestly the music isn't much of a problem. Perhaps it just needs to be quieter relative to the voice.

  16. Peace be upon you. Thank you very much for this valuable information, and I hope Arabic subtitles will be added. Regards.

  17. Great explanation. I tried it and it works quite well. But if the range of values in the outcome variable is high, for example house prices, it does not converge well. Probably simple normalization by the maximum value is not enough, or perhaps stochastic gradient descent would work. Any suggestions on what kind of feature scaling to use in such cases?

  18. This is absolutely awesome!!! You have provided – IMHO – the best introduction to and explanation of neural networks EVER! Thank you so much for this. As advertised, you really have demystified this for me. My only regret is that I wasted so much time on other videos before literally stumbling across this. Thanks again.
