
An intuitive and philosophical understanding of machine learning

Well, judging by you being here and the title above, I think you and I are quite alike in that we've both had trouble finding free online resources for complete beginners that give a deep, comprehensive and explorative understanding of Machine Learning.

But why is this?

Well, all of these resources seem to assume quite a developed knowledge of high school maths and coding, and they completely neglect the wider discussions to do with machine learning, focusing only on the technical aspects.

So I’m going to do the opposite. I’m going to assume you have a very basic understanding of how to code as well as high school maths and I’ll be using lots of illustrations and real coded examples to comprehensively cover wider issues in the field so that you can get the intuitive understanding you’re looking for.

Well, let's look at what it generally means to learn. Google tells us that learning is "The acquisition of knowledge or skills through study, experience, or being taught".

So I guess the subject name "Machine Learning" implies that computers are capable of performing such a process: given some data and an algorithm describing how to study that data, a machine should be able to acquire a posteriori knowledge (knowledge proceeding from observations or experiences) by forming models with predictive capability about the subject area of the data, thus creating an artificial intelligence.

I like our definition: “being able to form models of predictive capability with useful data” — let’s roll with it.

If you don't get what that means yet, don't worry, we'll explain it in detail now:

When you were just a child, you had already formed your own vague (but hugely effective) model of gravity, and the data you used to do this came from everyday life: observing leaves fall from trees, watching rain descend from the clouds, along with countless other examples. You didn't necessarily know any of the technicalities of how gravity worked, but your vague mental predictive model would at least grant you enough knowledge to mentally map it out as an invisible pulling force accelerating objects towards it.

Then, with time and a greater variety of data, you would be able to build upon this model to form a much more complex and predictive one: recognising the parabolic trajectory of projectiles (the arc shape that objects follow when projected into the air at an angle).

Parabolic trajectory of a football

This model in fact has such predictive capability as to allow us to predict where ball P will be in space after a certain amount of time X, so that, as a goalkeeper in a game of football, you can catch it with great precision.

And to do all this, the extremely basic version of the learning algorithm we were following was defined by natural selection and evolution. However, being capable of higher-order reasoning, humans can consciously explore and scrutinise these models of reality through reflection, which is why we can very precisely define our models and improve upon the learning algorithm initially defined by evolution.

E.g. a mathematical description of the parabolic model above says that:

y = ax^2 + bx + c

Can you see it? With the mathematical model above, in a vacuum, we can (to perfect accuracy) predict an output y given an input x by computing ax^2 + bx + c.

Don't worry, you're not meant to know what any of this means yet. It's just to illustrate the fact that with mathematics, humans can concisely, clearly and (given the right conditions) perfectly define models with predictive capability.

The mathematical model above is called a quadratic function and we’ll be looking much closer at some mathematical models like this later which leads us to our next point.
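As a first taste of the "real coded examples" promised earlier, here's a minimal Python sketch of a quadratic model. The coefficient values below are made up purely for illustration:

```python
# A quadratic model: y = a*x**2 + b*x + c
# The coefficients here are arbitrary, chosen only for illustration.
def quadratic_model(x, a=-4.9, b=10.0, c=0.0):
    """Predict an output y for an input x using a quadratic function."""
    return a * x**2 + b * x + c

# Given an input x (e.g. time in seconds), predict an output y (e.g. height).
print(quadratic_model(0.0))  # 0.0
print(quadratic_model(1.0))  # roughly 5.1
```

The point is simply that once the coefficients are fixed, the model deterministically maps any input to a prediction.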

Models can be malformed or incorrect, with little to no predictive capability, if certain conditions aren't met during an agent's learning process (an agent being "a person or thing that takes an active role or produces a specified effect").

A challenge for you.

Given that learning can be defined as "being able to form models of predictive capability with useful data", I want you to take a few moments to think about what can cause this process to give us a bad model.

Got it?

The answer is bad data and/or a bad learning algorithm.

Operant conditioning on Pigeons

Its basic premise is that rewards and punishments bring about changes in an agent's behaviour as it finds correlations between certain actions and those actions' ability to bring about particular consequences, desired or undesired. So after a few epochs (periods of time) of training the pigeons, the situation was adapted so that food would still dispense from the apparatus, but after a random period of time, essentially meaning there was no longer any pattern linking key pecking to food dispensation.

Having said this, B. F. Skinner noticed something quite strange… the pigeons began exhibiting odd behaviour. One was flapping its wings erratically, another was moving its head in circular motions, and another's behaviour, I quote:

So although the pigeons' behaviour no longer had an objective causal link with the dispensing of food, the pigeons still did it, because sometimes they'd perform a particular action and food would dispense by chance, which only reinforced their fallacious thinking. The pigeons had developed a model of reality that no longer had predictive capability, so they were wasting their time performing all of these odd behaviours.

So what happened here?

It's called superstition: finding a pattern where there is none, even when any explanation for that pattern would contradict logical axioms (propositions assumed to be true) and require the suspension of basic laws of physics for the link to make sense.

The pigeons' learning algorithm is very simple and narrow compared to a human's; they're incapable of human-like higher-order reasoning. This makes them very bad at scrutinising and adapting their models. Because of this, they cannot analyse these apparent links with basic logical axioms in mind, which is why they succumbed to the logical fallacy of inferring causation from correlation: believing that a correlation implies a causal link between the analysed features.

So while their model of reality was perfect for predicting outcomes in a situation where pecking keys did in fact lead to the dispensation of food, the model suddenly lost all predictive capability when nuances of the environment changed and that link no longer existed. All of this applies to today's machine learning models.

And here we’ve found one of the main limiting factors in machine learning.

It yields wonderful performance when it only has one narrow task to perform, but because a machine lacks higher-order, general intelligence, a machine learning algorithm can very easily become utterly useless when context changes. The machine learning algorithm can become superstitious, in a way, as there's no AI yet intelligent enough to understand context and philosophical axioms.

This is why biased data (an unrepresentative aggregate of data with respect to one or more features) leads to biased predictions, and why the large majority of models only have useful predictive capability for an extremely narrowly defined task, with a human picking all the "relevant" datasets beforehand. There are concerns about the potentially huge impacts this can have on a wider society that perceives artificial intelligence as an isolated black box whose algorithm is completely impartial to the psychological baggage of humankind.

Remember earlier when I touched upon deep learning? Well, we can think of deep learning as a particular implementation of machine learning which better handles the nuances in our environment through the use of one or more hidden layers:

Neural Network

This is called a neural network, and its topology is loosely inspired by how real animal brains work, an example of which is shown here:

Biological Neuron

And this translation from a biological to a digital neuron is illustrated below.

In a short while, I'll get you to code this artificial neuron with me so that it can perform a task for us called linear regression, but before then, I'd like to give you a high-level overview of what's going to happen.

So when inputs go through the network:

Our network will output a theorised output, which we'll call "y^". This is then compared to the real output "y", and the difference between them, (y^ − y), is our loss (aka error): it shows how far off our model's predictions are from the real outcomes of reality. In other words, it measures our model's predictive capability.
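To make that concrete, here's a minimal Python sketch of a single artificial neuron producing a theorised output and a loss. The inputs, weights, bias and "real" output below are all made up purely for illustration:

```python
# A sketch of a single artificial neuron's forward pass.
# The weights emphasise or de-emphasise each input signal; the bias shifts the result.
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: the neuron's theorised output y^."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

inputs = [1.0, 2.0]       # made-up input signals
weights = [0.5, -0.25]    # arbitrary starting weights
bias = 0.1

y_hat = neuron(inputs, weights, bias)  # the model's prediction, y^
y = 0.3                                # the "real" output from our data
loss = y_hat - y                       # (y^ - y), our error
```

The smaller the loss, the closer the neuron's predictions are to reality.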

But this error isn't all a bad thing. Because of it, we actually have an indicator of how well our model is doing: a performance metric we can optimise our model against. Combined with humanity's amazing invention of mathematics, this means we can go backwards through the neural network, tweaking our weights to emphasise or phase out particular signals, decreasing our error and aligning our machine learning model's predictions with reality.

And we do this multiple times. We'll keep going forwards and backwards through the neural net again and again, an appropriate number of times, minimising the loss as much as possible within a reasonable time. This is called gradient descent via back-propagation.
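Here's a minimal, hedged sketch of that forwards-and-backwards loop for the simplest possible model, a straight line y^ = w*x + b. The data, learning rate and epoch count below are made up for illustration, and the loss used is the squared error, a common variant of the (y^ − y) difference described above:

```python
# Tiny gradient descent sketch for a linear model y_hat = w*x + b.
# The data is generated from y = 2x + 1, so we expect w -> 2 and b -> 1.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]

w, b = 0.0, 0.0          # start with arbitrary weights
learning_rate = 0.05

for epoch in range(2000):            # forwards and backwards, many times
    dw, db = 0.0, 0.0
    for x, y in data:
        y_hat = w * x + b            # forward pass: the prediction
        error = y_hat - y            # how far off we are
        dw += 2 * error * x          # gradient of the squared error w.r.t. w
        db += 2 * error              # gradient of the squared error w.r.t. b
    # step downhill, against the gradient
    w -= learning_rate * dw / len(data)
    b -= learning_rate * db / len(data)

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Each pass nudges the weights a little further down the cost curve, which is exactly the repeated forwards-and-backwards process described above.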

Gradient descent

This graph serves to visualise what I just said. You might notice that the y axis is labelled "Cost", which is just another word for the error or loss we earlier defined as (y^ − y).

So when we get into calculus later, you'll learn that we'll be able to take a partial derivative (the gradient/slope at a point on a curve) of the loss with respect to the z axis (slope) or the x axis (intercept) shown in the illustration.

In plain English, all this means is that we'll be able to isolate specific features, e.g. slope or intercept, and then work out the gradient (steepness) of the loss with respect to each isolated feature.

So what actually is a gradient? It's a vector, meaning that as well as having magnitude (it can get bigger or smaller), a gradient also has direction (it can be a positive or a negative slope). So when we get the slope at a particular point on the graph, this slope can point the gradient descent runner in the direction it needs to go to find the global minimum (the point on the graph where the cost is at its lowest).

So we know at a high level how gradient descent with back-propagation works, but now for a deeper mathematical understanding…

Gradient descent graph again

A quick note: "Δ" means "change in".

So let’s first make a few observations of this graph:

A Computational graph

So what the above is showing is a computational graph.

This is just a graph showing the order of operations applied to the signals as they are transformed by those operations.
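A minimal Python sketch of that idea (the operations and values below are made up for illustration), where each node applies one operation, in order, to the signals flowing through it:

```python
# A computational graph is just an ordered chain of operations.
# Here we compute y_hat = (x * w) + b as two nodes: multiply, then add.
def multiply(a, b):
    return a * b

def add(a, b):
    return a + b

x, w, b = 2.0, 0.5, 1.0
z = multiply(x, w)     # first node: x * w
y_hat = add(z, b)      # second node: z + b

# Recording the order of operations like this is what later lets us
# walk the graph backwards, applying the chain rule node by node
# (back-propagation).
```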

Now we have a somewhat basic understanding of how gradient descent via back-propagation works, and in the next part we'll jump straight into code, implementing a lot of the theory discussed in this part of the series.

So here we have it: a (somewhat erratically structured) high-level overview of what machine learning is, and just a taste of the most popular ways it's being implemented today.

So the key thing to take away is that machine learning, as it's generally implemented today, is really just mathematical optimisation: tweaking the weights of particular parameters until your model is an accurate model of reality with good predictive capability.

In the next section, part 2 (in the works…), we're actually going to code our first simple machine learning algorithm and then reflect upon the process, exploring how similar techniques are used in the real world to execute machine learning.

Can't wait to see you there. In the meantime, if anything above particularly interested you, there will be some sources for additional learning below.
