How to teach a computer to recognize dogs and cakes in images?

Learn how to recognize images using Jupyter Notebooks

To be a muffin, or not to be a muffin - that is the question

Artificial intelligence is a fascinating topic that evokes a lot of emotions, because the development of new technologies brings both opportunities and threats.

Some artificial intelligence technologies have been around for a long time, but advances in computational power and numerical optimization routines, together with the availability of huge amounts of data, have led to great breakthroughs in this field.

Artificial intelligence is widely used to provide personalized recommendations when we shop or simply search for information on the web. More advanced inventions include self-driving cars, which, put simply, decide on the vehicle's next movements based on data collected from the various sensors installed in it.

autonomus_cars
Using artificial intelligence in automated vehicles.

When it comes to threats, you probably know a few science-fiction movies in which rebellious machines gain self-awareness and try to take over the world. Such images have given rise to many myths and fears around artificial intelligence. Let's try to answer a basic question - is artificial intelligence a threat or a chance for development?

Is it possible to understand AI?

First of all, we should state what this mysterious Artificial Intelligence (AI) actually is.

The adjective artificial does not usually cause us much trouble, because we associate it with an unnatural object created by humans - a machine, a purely engineered being. The word intelligence, however, can cause a lot of problems. Why? Because it is an abstract concept that we can understand as the ability to solve problems or deal with various situations. To get a better idea of what intelligence is, I will refer here to the methods of measuring it.

Among the most classic intelligence (IQ) tests are Raven's Progressive Matrices. I am convinced that many of you have come across them: each item is a matrix of geometric shapes of various colors whose last element is missing. The person being tested must grasp the relationship between the elements of the pattern and pick the missing element from the options given below the matrix. This is how general intelligence - the ability to recognize patterns - is tested.

Artificial intelligence is any form of intelligence that is not natural and is implemented by a machine or, viewed more closely, by a human-made program - an algorithm. It is also important that the tasks are performed by the device itself, without the need for constant supervision by the user.

Raven_Matrix
An IQ test item in the style of a Raven's Progressive Matrices test.

Combination of several disciplines

In addition to artificial intelligence, it is worth knowing, at least by name, a few closely related fields, among them Machine Learning, Data Science, and Deep Learning. Machine Learning is a subfield of artificial intelligence, which in turn is a subfield of computer science. Simply put, machine learning enables the use of adaptive solutions in the field of artificial intelligence.

data_learning_type.png
Fields of AI.

We can think of machine learning as a set of techniques for extracting hidden knowledge from data. Based on the kind of data available and the research question at hand, a scientist will choose to train an algorithm using a specific learning model:

  • supervised learning (with a teacher): occurs when all data presented to the machine is annotated, i.e. labeled with exactly the answer we expect from the machine;
  • unsupervised learning (without a teacher): occurs when we have a large amount of data without any labels, and the machine's main task is to determine the structure of the data;
  • reinforcement learning (with a critic): learning by trial and error - the machine seeks a solution to a formulated task and is rewarded for its actions (when it does the right thing) or punished (when it is wrong); it is not given any other tips or suggestions.



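| Parameters      | Supervised learning                        | Unsupervised learning                                  | Reinforcement learning                      |
|-----------------|--------------------------------------------|--------------------------------------------------------|---------------------------------------------|
| Input Data      | Algorithms are trained using labeled data. | Algorithms are used against data which is not labeled. | AI agents get feedback about their actions. |
| Example Methods | Classification, Regression                 | Clustering, Anomaly detection                          | Q-Learning, Monte Carlo                     |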
Basic learning models.
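
To make the difference between the first two learning models more tangible, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: every image is reduced to a single made-up "brightness" number, and the function names (fit_centroids, predict, two_means) are hypothetical helpers, not part of any library. Reinforcement learning is left out to keep the example short.

```python
def fit_centroids(values, labels):
    """Supervised: average the values that share the same label."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose average is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, steps=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)  # initial group centers
    groups = [0] * len(values)
    for _ in range(steps):
        # Assign every value to the nearer of the two centers.
        groups = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low_members = [v for v, g in zip(values, groups) if g == 0]
        high_members = [v for v, g in zip(values, groups) if g == 1]
        if not low_members or not high_members:
            break  # all values ended up in one group; nothing left to refine
        low = sum(low_members) / len(low_members)
        high = sum(high_members) / len(high_members)
    return groups


# Hypothetical feature: one "brightness" number per image (made-up data).
values = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
labels = ["dog", "dog", "dog", "muffin", "muffin", "muffin"]

centroids = fit_centroids(values, labels)   # learning with a teacher
print(predict(centroids, 0.3))              # answers with a label name

print(two_means(values))                    # no labels: only group numbers
```

Note that the supervised part can answer with a class name because it has seen the labels, while the unsupervised part can only say which values belong together.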

Computer vision processes

Now we move on to the core of this introduction - an explanation of how computer vision based on neural networks actually works. Here I have to explain two things:

  • firstly, how such a machine's eyes are constructed,
  • and secondly, how data, such as images, is processed by machines and their algorithms.

Let's think about the machine learning process. The first step is to construct the algorithm itself, that is, to define certain rules or a model. The second step is to define the parameters, whose values change iteratively during the learning process.
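
These two steps can be sketched in a few lines of Python. This is a toy example under my own assumptions, not the method used in the notebooks: the model is the rule y = w * x, the single parameter w is updated with plain gradient descent on the mean squared error, and the data, the learning rate and the number of epochs are made up for illustration.

```python
def train(xs, ys, learning_rate=0.01, epochs=200):
    """Fit the model y = w * x by repeatedly nudging the parameter w."""
    w = 0.0  # the adaptive parameter, starting from an arbitrary guess
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Move w a small step in the direction that reduces the error.
        w -= learning_rate * gradient
    return w


# Made-up data generated by the rule y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = train(xs, ys)
print(round(w, 4))  # the learned parameter should end up close to 3
```

The rule (multiply the input by w) never changes during training; only the value of w does.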

Inside the computer, a modern calculating machine, we want to reproduce a certain procedure, e.g. detecting something in an image. Since we are talking about a calculating machine, it should not surprise us that we are actually going to solve a mathematical equation. For a simplified task, imagine a series of mathematical operations with some parameters whose real values are unknown - these are adaptive parameters (model weights) that will change their values during the machine learning process. The sum of the weights multiplied by the input data is called the linear combination of the inputs. It's a bit like a receipt - we multiply the number of items by their unit price and add the results together to get the final payment. Here, the unknown price of each article is a weight, the number of purchased items is the input data, and the network's answer should be the total price.
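
The receipt analogy translates almost directly into code. Below is a small sketch in plain Python; the prices and quantities are made-up numbers, and linear_combination is a hypothetical helper name chosen for this example.

```python
def linear_combination(weights, inputs):
    """Multiply each input by its weight and add the results together."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


# Made-up receipt: unit prices play the role of weights,
# the numbers of purchased items are the input data.
unit_prices = [2.5, 1.0, 4.0]
quantities = [2, 3, 1]

total = linear_combination(unit_prices, quantities)
print(total)  # 2.5*2 + 1.0*3 + 4.0*1 = 12.0
```

In a real learning task the situation is reversed: we know the quantities and the totals from many receipts, and the machine has to find the prices (weights) that explain them.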

Neural Network model

Such a single mathematical equation, together with the appropriate reaction to its result (the activation function), can be called an artificial neuron. Just as neurons in the brain form certain structures, artificial neurons are combined into complex structures called neural networks, in which we can distinguish layers. The image below shows this symbolically - each neuron is a single circle. At the beginning, on the left side, there is a layer of neurons called the input layer. In our computer vision task - classification of images of dogs or muffins - the input data is all the pixels that make up the analyzed image. On the right side, there is an output layer that gives the predictions - in our case, two classes. There may be several hidden layers in between. Each neuron in a given layer is connected to all the neurons in the preceding layer.

architecture
Simple neural network model.
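
The same structure can be written down as a short Python sketch. Please treat it as an illustration under my own assumptions: the sigmoid is just one possible activation function, bias terms are left out for simplicity, the "image" has only four pixels, and all weights are made-up numbers rather than trained values.

```python
import math


def sigmoid(z):
    """One possible activation function: squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights):
    """Linear combination of the inputs followed by the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


def layer(inputs, weight_rows):
    """Every neuron in the layer sees all outputs of the preceding layer."""
    return [neuron(inputs, weights) for weights in weight_rows]


def forward(pixels, layers):
    """Pass the data from the input layer through each following layer."""
    activations = pixels
    for weight_rows in layers:
        activations = layer(activations, weight_rows)
    return activations


# A tiny made-up "image" of four pixel intensities (the input layer).
pixels = [0.0, 0.5, 1.0, 0.25]

# Hypothetical, untrained weights: one row of weights per neuron.
hidden_layer = [
    [0.2, -0.1, 0.4, 0.0],
    [-0.3, 0.8, 0.1, 0.5],
    [0.05, 0.05, -0.6, 0.9],
]
output_layer = [
    [1.0, -1.0, 0.5],    # neuron for the class "dog"
    [-1.0, 1.0, -0.5],   # neuron for the class "muffin"
]

classes = ["dog", "muffin"]
scores = forward(pixels, [hidden_layer, output_layer])
prediction = classes[scores.index(max(scores))]
print(scores, prediction)
```

With random or hand-picked weights the prediction is meaningless; it is the training process that adjusts the weights until the output neuron for the correct class gives the highest score.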

Let the classes begin…

Armed with this knowledge, we can move on to the practical part. In the repository you will find Jupyter notebooks that you can use for further study. The practical part was written in Polish, but it is based on English-language materials from a beginner's workshop for the SJSU ML Club.

Link on YouTube (PL)

The exercises in Deep learning for Polish students were conducted in winter 2020/21 (in Polish) in cooperation with Wroclaw University of Technology and OPI as part of the NAVOICA project.

Sylwia Majchrowska
Postdoc Research Fellow

Unleashing the power of AI to solve real-world challenges, an AI Enthusiast and Problem Solver.
