In this video, I will talk about motion in video. Motion is the main difference between static images and video. Motion itself is a powerful visual cue, and many actions are defined by motion. Sometimes it is enough to see the motion of a sparse set of points to recognize object properties and actions. For example, from the dynamics of point lights attached to parts of the human body, you can recognize the gender, body type, and mood of a person. I recommend that you check out the cool demo from BioMotion Lab to see for yourself.

Suppose the points of an observed scene are moving relative to the camera. The vector field of 2D projections of the scene points' motion vectors onto the image is called the motion field. We need to measure the motion field to obtain motion features for subsequent recognition. Sometimes an object does move, but we cannot see its motion. Try to imagine a smooth gray sphere with a uniform texture rotating around its axis: all points of the sphere are moving, but we cannot distinguish one point from another, so the sphere appears static to us.

Optical flow is the vector field of the apparent motion of pixels between frames. Optical flow is what we can actually estimate from video, and we can treat it as an estimate of the true motion field. Optical flow estimation is one of the key problems in video analysis.

Optical flow estimation can be regarded as a dense correspondence problem. Let the vector field (u, v) be the optical flow field. To estimate it, for each point (x, y) in the first frame we need to find the corresponding point (x + u, y + v) in the second frame, which is the projection of the same scene point as the point in the first frame.

There are two main ways to visualize the result of optical flow estimation. The first is to directly draw the motion vectors, but then we can draw them only for a sparse set of points; otherwise, the image becomes unreadable. The other way is color coding: for each possible motion vector, we specify a color. Usually, vector orientation is coded by hue and vector length by saturation. In this case, we can visualize the motion vector of every pixel in the image.

To evaluate an optical flow estimation procedure, we need to compare the estimated optical flow field with the ground truth optical flow field. There are several metrics; the two most often used are angular error and endpoint error. Angular error is the angle between the estimated and the ground truth optical flow vectors; it is computed as the arccosine of the dot product of the normalized vectors. Endpoint error is the distance between the endpoints of the estimated and the ground truth optical flow vectors.
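To make these two metrics concrete, here is a minimal NumPy sketch (not from the lecture; the array layout and function names are my own assumptions). Flow fields are assumed to be stored as arrays of shape (H, W, 2), and the angular error follows the common convention of extending each 2D vector (u, v) to the 3D vector (u, v, 1) before normalizing, as in the Middlebury benchmark.

```python
import numpy as np

def endpoint_error(flow_est, flow_gt):
    """Per-pixel endpoint error: Euclidean distance between the endpoints
    of the estimated and ground-truth flow vectors."""
    return np.linalg.norm(flow_est - flow_gt, axis=-1)

def angular_error(flow_est, flow_gt):
    """Per-pixel angular error in degrees. Each 2D flow vector (u, v) is
    extended to (u, v, 1) and normalized; the error is the angle between
    the resulting 3D vectors (the usual Middlebury convention)."""
    u_e, v_e = flow_est[..., 0], flow_est[..., 1]
    u_g, v_g = flow_gt[..., 0], flow_gt[..., 1]
    num = u_e * u_g + v_e * v_g + 1.0
    den = np.sqrt(u_e**2 + v_e**2 + 1.0) * np.sqrt(u_g**2 + v_g**2 + 1.0)
    # Clip to guard against rounding slightly outside [-1, 1] before arccos.
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
```

Averaging the per-pixel values over the image gives the scalar scores that benchmarks usually report as average angular error and average endpoint error.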
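The color-coding visualization described above can also be written as a short sketch. Again, this is an illustration rather than lecture code (matplotlib's HSV-to-RGB conversion and the normalization scheme are my assumptions): orientation is mapped to hue and length to saturation, so pixels with no motion come out white.

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def flow_to_color(flow, max_mag=None):
    """Color-code a dense flow field of shape (H, W, 2): hue encodes
    vector orientation, saturation encodes vector length."""
    u, v = flow[..., 0], flow[..., 1]
    mag = np.hypot(u, v)
    ang = np.arctan2(v, u)                          # orientation in [-pi, pi]
    if max_mag is None:
        max_mag = mag.max() + 1e-8                  # normalize by the longest vector
    hsv = np.zeros(flow.shape[:2] + (3,))
    hsv[..., 0] = (ang + np.pi) / (2 * np.pi)       # hue <- orientation
    hsv[..., 1] = np.clip(mag / max_mag, 0.0, 1.0)  # saturation <- length
    hsv[..., 2] = 1.0                               # full value, so zero motion is white
    return hsv_to_rgb(hsv)                          # RGB image with values in [0, 1]
```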
But how can we obtain ground truth optical flow? This is a very tricky problem. Unlike image classification or object detection, we cannot simply ask a human annotator to label the images, because correct optical flow vectors have to be specified for every pixel. As a result, the amount of available ground truth data for optical flow estimation is very limited.

In the Middlebury optical flow dataset, which was published in 2011, three ways of generating ground truth data were used. The first is frame-by-frame capture of a scene under both ordinary and fluorescent lighting: many dots that are visible only under fluorescent lighting are placed on the surfaces of the objects, and from these dots high-quality optical flow can be estimated by existing methods. The second is fully synthetic image generation from a 3D scene model. The third is interpolation of video frames from a high-speed camera, where only the first and last frames of a sequence are kept. In this case, we do not know the true optical flow, but we can compare the interpolated frame with the true intermediate frame captured by the camera.

The second dataset is the KITTI Vision Benchmark Suite, a collection of data obtained by laser scanning of an urban environment from a car. The optical flow dataset is one dataset from this collection; there are others. To obtain it, a complicated procedure was used: 3D models are separately fitted to the static scene and to each of the moving objects, and from these models the correct optical flow between images is reconstructed. Due to the complexity of this procedure, the dataset contains fewer than 200 image pairs.

The third contemporary source of ground truth optical flow is based on the open-source 3D movie Sintel. For this movie, all 3D data is available. From various parts of the movie, more than 1,000 training and 500 test frames were extracted. Because different rendering options can be used for the same moment in the movie, we can obtain images with different levels of rendering complexity. For example, the slide shows that for the first image no shading was used; for the second image, complex shading and lighting are added; and for the third image, effects such as motion blur and depth of field are added. This allows us to evaluate optical flow estimation methods under various conditions.