Computers, Information Technology

Modern computer vision. Tasks and technologies of computer vision. Programming computer vision in Python

How do you teach a computer to understand what is shown in a picture or a photo? It seems easy to us, but to a computer an image is just a matrix of numbers (ultimately zeros and ones) from which the important information has to be extracted.

What is computer vision? It is the ability of a computer to "see"

Vision is an important source of information for people: by various estimates, it gives us 70 to 90% of all the information we receive. Naturally, if we want to build a smart machine, we need to give the computer the same ability.

The task of computer vision can only be formulated rather loosely. What does it mean to "see"? It means understanding what is located where, just by looking. This is where computer vision and human vision differ: for us, vision is a source of knowledge about the world and also a source of metric information, that is, the ability to judge distances and sizes.

The semantic core of the image

Looking at an image, we can describe it in a number of ways, that is, extract semantic information from it.

For example, looking at this photo, we can say that it was taken outdoors. That it shows a city with street traffic. That there are cars. From the architecture of the building and the Chinese characters, we can guess that this is East Asia. From the portrait of Mao Zedong we understand that this is Beijing, and anyone who has seen video footage of the place or been there in person will recognize the famous Tiananmen Square.

What else can we say about the picture when we examine it? We can pick out objects in the image: there are people, and closer to us there is a fence. Here are umbrellas, here is a building, here are posters. These are examples of very important object classes that recognition systems currently search for.

We can also extract features or attributes of objects. For example, here we can tell that this is not a portrait of just any Chinese man, but of Mao Zedong specifically.

About the car we can say that it is a moving object and that it is rigid, that is, it does not deform as it moves. About the flags we can say that they also move but are not rigid: they deform constantly. There is also wind in the scene. We can tell this from the waving flag, and we can even determine the wind direction, for example that it blows from left to right.

The importance of distances and lengths in computer vision

Metric information, meaning all kinds of distances, is very important in computer vision. For a Mars rover, for example, it is critical: a command from Earth takes about 20 minutes to arrive, and the reply takes just as long. A round trip therefore takes about 40 minutes. If the rover's movements are planned by commands from Earth, this delay has to be taken into account.

Computer vision technologies have also been successfully integrated into video games. From video you can build three-dimensional models of objects and people, and from users' photos you can reconstruct three-dimensional models of cities and then walk through them.

Computer vision is a fairly broad field, closely intertwined with several other sciences. It partly overlaps with image processing, and for historical reasons the two fields are sometimes treated as one and the same.

Analysis and pattern recognition - the way to create a higher mind

Let us look at these concepts one by one.

Image processing is the family of algorithms whose input and output are both images: we take an image and transform it in some way.
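To make the "image in, image out" idea concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows of integers, and it uses a 3x3 box blur purely as an illustration; the function name `box_blur` and the sample data are made up for this example and do not come from any particular library.

```python
def box_blur(image):
    """Return a new image in which each pixel is the mean of its 3x3 neighbourhood.

    `image` is assumed to be a non-empty list of equal-length rows of integers.
    Pixels outside the image are simply ignored at the borders.
    """
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            result[y][x] = total // count  # integer mean keeps values pixel-like
    return result


# A single bright pixel on a dark background (hypothetical data).
image = [
    [0, 0, 0],
    [0, 90, 0],
    [0, 0, 0],
]
blurred = box_blur(image)  # the bright spot is smeared over its neighbours
```

The output has the same shape as the input, which is what distinguishes image processing from image analysis, where the output is a conclusion about the image.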

Image analysis is the part of computer vision that works with a two-dimensional image and draws conclusions from it.

Pattern recognition is an abstract mathematical discipline that recognizes data given in the form of vectors. That is, the input is a vector and we need to do something with it. Where the vector came from is not that important to us.
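As an illustration of working purely with vectors, the sketch below uses a nearest-neighbour rule, one of the simplest recognition methods. It is chosen here only as an example; the text does not prescribe any particular method, and the feature vectors and labels are invented.

```python
import math


def nearest_neighbour(sample, training_set):
    """Return the label of the training vector closest to `sample`.

    `training_set` is a list of (vector, label) pairs. The classifier never
    asks where the vectors came from: pixels, sound, or anything else.
    """
    best_label, best_distance = None, math.inf
    for vector, label in training_set:
        distance = math.dist(sample, vector)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical two-dimensional feature vectors with class labels.
training_set = [
    ((1.0, 1.0), "A"),
    ((1.5, 0.5), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]
label = nearest_neighbour((1.0, 1.2), training_set)
```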

Computer vision originally meant reconstructing three-dimensional structure from two-dimensional images. The field has since grown broader and can be treated more generally as making decisions about physical objects based on their images. In that sense it is a task of artificial intelligence.

In parallel with computer vision, photogrammetry developed in a completely different field, geodesy. It deals with measuring distances between objects from two-dimensional images.

Robots can "see"

The last term is machine vision. Machine vision means the vision of robots, that is, the solution of particular production problems. We can say that computer vision is one big science that partly unites several others, and when computer vision is put to a specific application, it becomes machine vision.

The field of computer vision has many practical applications, above all in the automation of production. It is becoming more efficient for enterprises to replace manual labor with machines. A machine does not get tired, does not sleep, has no fixed working hours, and is ready to work 365 days a year. So with machines we can get a guaranteed result by a definite time, which is attractive. Every task for a computer vision system also has a visual side, and there is nothing better than seeing the result right away in the picture, while the work is still at the computation stage.

On the threshold of the world of artificial intelligence

Another attraction of the field is that it is hard! A significant part of the brain is devoted to vision, and it is believed that teaching a computer to "see", that is, to use computer vision to the full, is an AI-complete task. If we can solve the vision problem at the human level, we will most likely solve the AI problem at the same time. Which is very good! Or not so good, if you have watched "Terminator 2".

Why is vision difficult? Because images of the same object can vary greatly depending on external factors. Objects look different depending on the viewpoint.

Take, for example, one and the same figure photographed from different angles. Most interestingly, the figure may show one eye, two eyes, or one and a half. And depending on the context (say, a photo of a man in a T-shirt with eyes painted on it), there may be more than two eyes.

The computer does not yet understand, but already "sees"

Another complicating factor is lighting: the same scene looks different under different lighting. Object size can vary too, and this holds for objects of any class. Can we say that a person is 2 meters tall? Not in general. A person can be 2.3 m tall or 80 cm tall and, as with objects of other types, both still belong to the same class.
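The effect of lighting can be shown with numbers. The sketch below assumes a deliberately simplified lighting model in which brighter light multiplies every pixel by a gain and adds an offset; real lighting changes are far more complicated. Under that assumption, subtracting the mean and dividing by the standard deviation makes the two versions of the scene identical again. The function name `normalise` and the pixel values are illustrative.

```python
from statistics import mean, pstdev


def normalise(pixels):
    """Shift pixels to zero mean and scale them to unit standard deviation."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        # A perfectly flat image carries no contrast to normalise.
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]


# The same hypothetical scene under two lighting conditions.
scene = [10, 20, 30, 40]
brighter = [2 * p + 15 for p in scene]  # gain 2, offset 15 (assumed model)

normalised_scene = normalise(scene)
normalised_brighter = normalise(brighter)
# The raw pixel lists differ, but the normalised lists agree up to rounding.
```

Normalisation of this kind only removes the simplest, global lighting changes; shadows and highlights alter an image in ways no single gain and offset can describe.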

Living objects in particular undergo all kinds of deformation: people's hair, athletes, animals. Look at pictures of running horses: it is impossible to predict what their manes and tails will do. And what about objects overlapping one another in the image? Feed such a picture to a computer and even the most powerful machine will struggle to give the right answer.

The next difficulty is camouflage. Some objects and animals blend into their surroundings quite skillfully: the spots match and so does the color. Nevertheless we can see them, although not always from a distance.

Another problem is motion. Objects in motion undergo unimaginable deformations.

Many object classes are highly variable. Take, for example, the objects in the two photos below, both of the class "armchair".

And this is something you can sit on, too. But teaching a machine that things so different in shape, color and material are all "armchairs" is very difficult. That is the task. Applying the methods of computer vision means teaching the machine to understand, analyze and make assumptions.

Integration of computer vision into various platforms

Computer vision began to reach the mass market back in 2001, when the first face detectors appeared. They were created by two authors, Viola and Jones. Theirs was the first fast and sufficiently reliable algorithm, and it demonstrated the power of machine learning methods.
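The text does not describe how the Viola-Jones detector works. One ingredient it is known for is the integral image, which lets the sum of any rectangle of pixels be computed with four lookups, so simple "bright region minus dark region" features become very cheap. The sketch below shows only that ingredient on a tiny made-up image; it is not the full detector, which also involves training and a cascade of classifiers. All function names are chosen for this example.

```python
def integral_image(image):
    """Build a table where table[y][x] is the sum of image rows < y and columns < x."""
    height, width = len(image), len(image[0])
    # An extra leading row and column of zeros removes special cases at the border.
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            table[y + 1][x + 1] = (
                image[y][x] + table[y][x + 1] + table[y + 1][x] - table[y][x]
            )
    return table


def rect_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1, in four lookups."""
    return table[bottom][right] - table[top][right] - table[bottom][left] + table[top][left]


def two_rect_feature(table, top, left, height, width):
    """Two-rectangle feature: sum of the left half minus sum of the right half."""
    half = width // 2
    left_sum = rect_sum(table, top, left, top + height, left + half)
    right_sum = rect_sum(table, top, left + half, top + height, left + 2 * half)
    return left_sum - right_sum


# Hypothetical 2x4 image: bright on the left, dark on the right.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
table = integral_image(image)
total = rect_sum(table, 0, 0, 2, 4)             # sum of the whole image
contrast = two_rect_feature(table, 0, 0, 2, 4)  # large value: strong vertical edge
```

Because the table is built once per image, every later rectangle sum costs the same regardless of the rectangle's size, which is part of what makes this family of detectors fast.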

Computer vision now has a fairly new practical application: recognizing a person by their face.

It is not yet possible to recognize a person the way films show it, from arbitrary angles and under arbitrary lighting. But deciding whether two photos show the same person or different people, when the photos resemble passport photos and differ only somewhat in lighting or pose, can be done with a high degree of confidence.

The requirements for passport photos are largely dictated by the characteristics of face recognition algorithms.

For example, if you have a biometric passport, then in some modern airports you can use an automatic passport control system.

The unresolved problem of computer vision is the ability to recognize arbitrary text

Perhaps you have used a text recognition system. One of them is FineReader, which is very popular on the Russian-language internet (Runet). Where there are forms to be filled in, they scan perfectly and the system recognizes the information very well. But with arbitrary text in an image, things are much worse. This task remains unsolved.

Games involving computer vision, motion capture

A separate large area is the creation of three-dimensional models and motion capture (which has been implemented quite successfully in computer games). The first such program using computer vision is a system for interacting with a computer by means of gestures. A good deal was discovered in the course of creating it.

The algorithm itself is quite simple, but tuning it required building a generator of artificial images of people in order to obtain a million pictures. With their help a supercomputer selected the algorithm parameters with which it now works best.

That is how a million images and a week of supercomputer time produced an algorithm that uses 12% of the power of a single processor and perceives a person's pose in real time. This is the Microsoft Kinect system (2010).

Content-based image search lets you upload a photo to the system, which then returns all pictures with the same content taken from the same angle.

More examples of computer vision: three-dimensional and two-dimensional maps are now made with its help. Maps for car navigation systems are updated regularly using data from dashboard cameras.

There is a database of billions of geotagged photos. By uploading a picture to it, you can determine where the picture was taken and even from which viewpoint. Naturally, this works only if the place is popular enough that tourists have been there and taken plenty of photos of the area.

Robots are everywhere

Robotics is everywhere now; there is no getting by without it. There are cars with special cameras that recognize pedestrians and road signs and pass warnings to the driver (in a sense, this is a computer vision program that assists the motorist). And there are fully automated robotic cars, but they cannot rely on the camera system alone and need a lot of additional information.

A modern camera is an analog of a camera obscura

Let's talk about the digital image. Modern digital cameras are built on the principle of the camera obscura. Only instead of a hole through which light enters and projects the outline of the object onto the back wall of the camera, we have a special optical system called a lens. Its job is to collect a wide beam of light and transform it so that all the rays pass through one virtual point, producing a projection and forming an image on the film or sensor.
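The "all rays pass through one point" idea is usually written as the pinhole projection: a point at depth Z in front of the camera lands on the image plane at its sideways offset scaled by f / Z, where f is the distance from the point of projection to the image plane. The sketch below assumes this idealized model, with coordinates measured from the camera and no lens distortion; the function name and the numbers are illustrative.

```python
def project(point, focal_length):
    """Project a 3D point (x, y, z) in camera coordinates onto the image plane.

    Idealized pinhole model: every ray passes through the origin, and the
    image plane lies at distance `focal_length` from it.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# The same hypothetical object corner seen at two different distances.
near = project((2.0, 1.0, 4.0), focal_length=1.0)  # (0.5, 0.25)
far = project((2.0, 1.0, 8.0), focal_length=1.0)   # (0.25, 0.125)
```

Doubling the distance halves the projected coordinates, which is why distant objects look smaller and why metric information cannot be read off a single image without extra knowledge.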

The sensor (matrix) of a modern digital camera consists of separate elements, pixels. Each pixel measures the total energy of the light that falls on it and outputs one number. So instead of a continuous image, a digital camera gives us a set of measurements of the brightness of the light that reached each individual pixel. That is why, when we enlarge an image, we see not smooth lines and sharp contours but a grid of squares colored in different tones: the pixels.
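A toy model of this measurement, under the assumption that the incoming light can itself be represented as a fine grid of energy values: each sensor pixel covers a square block of that grid and reports a single number, the total energy inside the block. The function name `sense` and the data are made up for this sketch.

```python
def sense(light, pixel_size):
    """Simulate a sensor: sum the light energy falling on each square pixel.

    `light` is a fine grid of energy values; each sensor pixel covers a
    `pixel_size` x `pixel_size` block of it and outputs one number.
    """
    rows = len(light) // pixel_size
    cols = len(light[0]) // pixel_size
    image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0
            for y in range(r * pixel_size, (r + 1) * pixel_size):
                for x in range(c * pixel_size, (c + 1) * pixel_size):
                    total += light[y][x]
            row.append(total)
        image.append(row)
    return image


# Hypothetical incoming light: a uniform patch and a thin diagonal line.
light = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 4],
    [0, 0, 4, 0],
]
image = sense(light, pixel_size=2)  # [[4, 0], [0, 8]]
```

The diagonal line in the lower right corner collapses into the single value 8: whatever happens inside one pixel is lost, which is why enlarging a digital image reveals a grid rather than finer detail.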

Below you can see the first digital image in the world.

But what is missing from this image? Color. And what is color?

Psychological perception of color

Color is what we see. The color of one and the same object will be different for a person and for a cat, because humans and animals have different optical systems, that is, different vision. Color is therefore a psychological property of our vision that arises when we observe objects and light, not a physical property of the object or the light. Color is the result of the interaction of the components of light, the scene and our visual system.

Programming computer vision in Python using libraries

If you decide to study computer vision seriously, be prepared for difficulties right away: this is not the easiest science and it hides a number of pitfalls. But "Programming Computer Vision with Python" by Jan Erik Solem is a book that sets everything out in the simplest possible language. It introduces methods for recognizing various objects in 3D, working with stereo images, virtual reality and many other computer vision applications. The book has plenty of examples in Python. The explanations are kept general, so as not to overload the reader with heavy scientific detail. The book suits students, hobbyists and enthusiasts. This book and others about computer vision can be downloaded online in PDF format.

There is currently an open library of computer vision, image processing and general-purpose numerical algorithms: OpenCV. It has interfaces for many modern programming languages and its source code is open. If you use Python as the programming language for computer vision, this library is supported there as well. In addition, it is constantly evolving and has a large community.
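As a rough illustration of what using OpenCV from Python can look like, the sketch below detects faces with one of the pre-trained Haar cascade files that ship with the `opencv-python` package. It assumes that package is installed; the function name `detect_faces` and the file name in the usage comment are hypothetical, and the parameter values are common starting points rather than recommended settings.

```python
def detect_faces(image_path):
    """Return a list of (x, y, width, height) boxes for faces found in an image file."""
    # Imported inside the function so that this sketch can be loaded
    # even on a machine where OpenCV is not installed.
    import cv2

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(f"could not read image: {image_path}")

    # The cascade detector works on grayscale images.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Pre-trained frontal face cascade bundled with the opencv-python package.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(value) for value in box) for box in faces]


# Example use ("photo.jpg" is a placeholder file name):
# for x, y, w, h in detect_faces("photo.jpg"):
#     print(f"face at ({x}, {y}), size {w}x{h}")
```

Like any detector of this kind, it works best on frontal, evenly lit faces.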

Microsoft provides API services that can train neural networks to work with images of faces. These too can be used for computer vision with Python as the programming language.
