This week is devoted to video analysis. Video cameras are the main source of new data on the Internet. Cameras are widely available: almost every mobile device, whether a phone, a tablet, or a PC, has a video camera nowadays. Video surveillance cameras are also mounted everywhere. In big cities we have hundreds of thousands of surveillance cameras. Also, robotic systems are rapidly developing. Their cameras range from common dashboard cameras to cameras on autonomous vehicles. Plus, we have a lot of new TV and movie content every day.

What is a video? A video is just an ordered set of frames of the same resolution; usually the frames are taken at regular time intervals. When constructing video processing algorithms, we divide video into two classes. A video stream is an ongoing video for online processing. When processing a video stream, we don't know the future frames. A video sequence is a video of fixed length. All frames are available at once, so we can process a video sequence as a whole object.

A video is a much larger object than an image. The frame rate of consumer video usually lies in the range from 3-5 frames per second to 30 or 50 frames per second. The resolution can be up to Full HD or 4K right now. So the uncompressed data stream from Full HD video can reach 300 megabytes per second, which is more than the throughput of a 1 gigabit Ethernet LAN. Thus, we usually work with compressed video, where some information is lost. For example, many super-resolution algorithms that actually reconstruct higher resolution rely on information that is removed from the video during compression. So only those algorithms that hallucinate higher resolution can be applied.

Currently, most video cameras only record video, with little or no automatic recognition. The amount of video data is enormous, so it's very difficult to work with such an amount without automatic analysis. Also, most of the video is stored on local drives, and it's not readily available.
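The data rate quoted above for uncompressed Full HD video can be checked with a short calculation. This is only a rough sketch: it assumes 1920x1080 frames, 3 bytes per pixel (24-bit color), 50 frames per second, and decimal megabytes. None of these values are stated in the lecture, and the function name is made up for illustration.

```python
# Rough estimate of the uncompressed data rate of a video.
# Assumptions (not from the lecture): 24-bit color, i.e. 3 bytes per pixel,
# and decimal units (1 MB = 10**6 bytes, 1 Gbit = 10**9 bits).

def uncompressed_rate_bytes_per_s(width, height, fps, bytes_per_pixel=3):
    """Bytes per second needed to carry raw frames with no compression."""
    return width * height * bytes_per_pixel * fps

# Nominal throughput of 1 gigabit Ethernet, converted from bits to bytes.
GIGABIT_ETHERNET_BYTES_PER_S = 1_000_000_000 // 8

# Full HD at 50 frames per second (assumed values).
full_hd_rate = uncompressed_rate_bytes_per_s(1920, 1080, 50)

print(f"Full HD, 50 fps: {full_hd_rate / 1e6:.2f} MB/s")
print(f"1 Gbit Ethernet: {GIGABIT_ETHERNET_BYTES_PER_S / 1e6:.2f} MB/s")
print("Fits in the link:", full_hd_rate <= GIGABIT_ETHERNET_BYTES_PER_S)
```

Under these assumptions the raw stream comes out at roughly 311 MB/s, against a nominal 125 MB/s for the link, which is why compression is needed in practice.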
Privacy issues also limit video availability, especially when we are working with video surveillance systems.

What do we want from video analysis? First, we want to detect all interesting objects in the video. Then we usually want to identify their properties, including human pose estimation, attribute estimation, person identification, et cetera. Then we want to recognize people's actions and recognize events that are happening in the video.

Some video analysis applications can process video offline, but for many applications situational awareness is required. So we need to extract semantic data from video in real time, and this information should be sufficient for an adequate immediate reaction. There are two main examples of such applications: smart surveillance systems and robotics.

Also, we know that object appearance varies significantly between different viewpoints. For example, some surveillance cameras are mounted at people's height, so people are large and seen at high resolution. Other surveillance cameras are mounted on top of a building to overview the situation, so each person is seen as a very small dot. Current image recognition and detection algorithms cannot reach sufficient accuracy and speed simultaneously for both scenarios. So, in practice, working video analysis systems are usually obtained only when the algorithms are tailored to a specific video scenario.