Towards Safe city video surveillance

Thousands of cities around the world are already equipped with video surveillance systems for monitoring and law enforcement to ensure the safety of inhabitants within a Safe-city project framework. IP-based video surveillance systems continue to develop: they are no longer limited to simple video monitoring but are becoming a rather powerful intelligent tool that can ensure safety and provide accurate and useful information to end-users.

The scale of Safe-city systems varies, but they usually take the form of video surveillance systems consisting of tens or hundreds of cameras installed on city streets, highways, squares, railway stations, and courtyards. For a system to be able to prevent or investigate a particular event, high picture quality and megapixel cameras are required. According to conservative estimates, thousands of hours of "heavy" megapixel video are received in an archive for processing every day. This requires powerful software and hardware tools, and fast video analysis is needed to find correct and accurate information.

Our company is a software developer. That is why I will address Safe-city issues from a software standpoint.

Large-scale Safe-city video surveillance systems face the following list of challenges:

  • Expensive high-power processors required;
  • A lot of time consumed watching video archives;
  • Low speed or lack of reaction on an operator’s part in response to unforeseen events.

Focus on servers

A server is the heart of a video system and its most important part. The hardware and software components of a system work together, which is why a computer’s operation can be optimized through software. Not all software packages make the best use of server resources. That is why the first thing you should pay attention to when designing any large-scale system (especially a city video surveillance system) is how a particular software package works with the computing resources. It is crucially important to choose the one with the lowest resource consumption.

Nowadays, analysing video streams without full decompression is the most efficient solution to this challenge. This technology optimizes a server’s work with incoming video and, as a result, reduces its load by up to 4 times. Ultimately, one server may handle up to 4 times more IP cameras without compromising on quality or reliability.

Video stream decompression happens in several stages, and the last stages require significant computational resources. With the above-mentioned technology, these last stages are not performed: the video stream processing algorithms analyze data, detect, or search for certain characteristics without complete decompression. This increases processing speed and reduces the CPU load by 4 times on average, resulting in a lower overall system cost.
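The idea of analysing a stream without full decompression can be illustrated with a minimal sketch. It assumes that per-macroblock motion vectors have already been read from the compressed bitstream, so no pixels are reconstructed. The article does not say which compressed-domain data the technology actually uses, so the input layout, the function name `moving_blocks`, and the threshold are hypothetical.

```python
from math import hypot

def moving_blocks(motion_vectors, min_magnitude=2.0):
    """Return (row, col) of macroblocks whose motion vector is long enough.

    motion_vectors: 2-D list of (dx, dy) tuples, one per macroblock.
    This layout is a hypothetical stand-in for data parsed from the
    bitstream; no pixel data is decoded anywhere in this function.
    """
    hits = []
    for r, row in enumerate(motion_vectors):
        for c, (dx, dy) in enumerate(row):
            if hypot(dx, dy) >= min_magnitude:
                hits.append((r, c))
    return hits

# Made-up vectors for a frame of 2 x 3 macroblocks.
frame_vectors = [
    [(0, 0), (0, 1), (0, 0)],
    [(0, 0), (3, 4), (2, 2)],
]
print(moving_blocks(frame_vectors))
```

The work per frame here is proportional to the number of macroblocks rather than the number of pixels, which is the kind of saving the paragraph above describes.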

How to find what you are looking for in thousands of video hours

When speaking of a video system as an intelligent mechanism that is able to provide necessary and accurate information, we cannot ignore the opportunities presented by video analysis. The hundreds of cameras in any Safe city capture thousands of events every day and observe millions of people. That is why finding the right person or event in the video without special tools is a very difficult and tedious task.

Detection in changing circumstances

The accurate detection of moving objects is extremely important for city video surveillance and video analysis. The performance of a motion detector is highly affected by real-life conditions such as shadows, noise, changes in lighting and background, heavy traffic, etc.

Most of the existing methods of motion detection take the presence of noise into account. On some motion detectors, the noise level can only be set manually for the whole screen. Other motion detectors allow you to set different noise levels for different parts of the screen. In the first case, the motion detector’s accuracy is reduced because the actual noise level varies across the screen. A single threshold value for the whole screen leads either to false alarms, if it has been set too low, or to overlooked events, if it has been set too high.

In order to optimize the motion detector’s operation, the following requirements should be met:

  1. Stable operation in a noisy environment, with automatic calculation and assignment of different noise levels for different parts of the screen.
  2. Automatic adaptation to slowly and rapidly changing lighting conditions.
  3. Stability when the background is changing.
  4. Automatic detection and removal of a moving object’s shadow.
  5. High data processing speed (parallel processing of multiple real time video streams should be possible).
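The first requirement, automatic per-region noise levels, can be sketched as follows. This is a minimal illustration, not any vendor's detector: it assumes the screen is already split into named regions and that each region has a history of frame-difference values recorded while nothing was moving. The rule "mean plus k standard deviations" and all names and numbers are assumptions.

```python
from statistics import mean, pstdev

def region_thresholds(history, k=3.0):
    """Compute one noise threshold per screen region.

    history: dict mapping region name -> frame-difference values that were
    observed while no real motion took place (hypothetical calibration data).
    """
    return {region: mean(v) + k * pstdev(v) for region, v in history.items()}

def detect(frame_diff, thresholds):
    """Return the regions whose current difference exceeds their own threshold."""
    return [r for r, value in frame_diff.items() if value > thresholds[r]]

history = {
    "trees": [8, 12, 10, 9, 11],     # noisy area: swaying branches
    "pavement": [1, 2, 1, 2, 1],     # quiet area
}
thresholds = region_thresholds(history)
print(detect({"trees": 13, "pavement": 6}, thresholds))
```

With these numbers, a value of 13 in the noisy "trees" region is treated as noise, while a value of 6 in the quiet "pavement" region is reported as motion. A single screen-wide threshold could not get both cases right.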

Objects and events search

By date and time

This type of search is not intelligent, and it is available from all software vendors. It is the simplest, but it requires precise input. If a video system operator knows what time a particular event occurred and which camera captured it, no additional video analysis tools are necessary. A search by time will be enough.

The difficulty, as is usually the case, is that the time an event occurred is unknown.

Search by features

Search by features allows you to find an object with given visual attributes: type, shape, colour characteristics, or position within a snapshot. Using various search filters by features, an operator can find anything in a video, even with minimal input.

Let us consider an incident on a city street, the theft of a bag, for example, which should be investigated after the fact. Let us apply video analysis tools. We can say with high probability that the scene of the crime is known. Accordingly, we can pinpoint the camera that captured it and select the spot on the screen where the event occurred. We also know that the action was performed by a person, so we configure the search: select the appropriate object type and set its proportions on the screen. Finally, we know the colours of the offender’s clothes. For example, a man was wearing a blue jacket and black pants. We can create a template and paint it with the appropriate colours. We also know the approximate time interval in which the theft occurred. We input the time interval and begin searching.

As a result, the system displays a set of search results corresponding to the specified parameters. Even though the displayed results match those parameters, a human should make the final determination. The search results are displayed as a set of images; an operator selects the desired one and can then view the corresponding segment of the video archive. Thus, the system will find all people who appear within the specified period, in the specified area of the screen, dressed in "blue top, black bottom" clothing, but the offender should be picked out and finally identified by the operator. If the operator does not possess all of the above data, the corresponding filters should be left blank. Instead of colouring a template, an operator can simply upload a photo of the person from a computer or choose the object’s picture from the video archive.

Search by features is universal: you can search not only for people, but also, say, for cars or even city facilities. The probability of overlooking the correct information is small because the analysis is based not so much on searching for absolutely identical properties as on rejecting non-conforming ones. The more precisely an operator sets the characteristics of an object, the more accurate the results will be, with fewer "extra" results.

Some software vendors also provide search by height. It allows you to set a person’s height numerically but requires very precise camera calibration. Search by height may be very useful in addition to other video analysis tools. Nevertheless, it should be clear that calibrating street cameras requires a huge amount of work. Search by proportions, which requires no additional hardware configuration, can be an excellent alternative to search by height. Besides, in street theft situations, it is more convenient to note the approximate size of a person on the screen than to specify a height, which is unlikely to be known.
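The rejection principle and the "leave a filter blank" behaviour can be sketched like this. It is a simplified illustration under stated assumptions: detections are assumed to be already indexed with a camera, a time, an object type, and two colour labels, whereas a real system compares colour characteristics rather than exact labels. The `Detection` and `Query` classes and all sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Detection:
    camera: str
    time: datetime
    kind: str            # e.g. "person" or "car"
    top_colour: str
    bottom_colour: str

@dataclass
class Query:
    # Every filter is optional; None means "left blank by the operator".
    camera: Optional[str] = None
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    kind: Optional[str] = None
    top_colour: Optional[str] = None
    bottom_colour: Optional[str] = None

def conflicts(d: Detection, q: Query) -> bool:
    """True if the detection contradicts any filter that was filled in."""
    return (
        (q.camera is not None and d.camera != q.camera)
        or (q.start is not None and d.time < q.start)
        or (q.end is not None and d.time > q.end)
        or (q.kind is not None and d.kind != q.kind)
        or (q.top_colour is not None and d.top_colour != q.top_colour)
        or (q.bottom_colour is not None and d.bottom_colour != q.bottom_colour)
    )

def search(archive, q):
    """Keep everything that is not rejected by the query."""
    return [d for d in archive if not conflicts(d, q)]

# Made-up archive index for one camera.
archive = [
    Detection("cam7", datetime(2024, 5, 1, 14, 5), "person", "blue", "black"),
    Detection("cam7", datetime(2024, 5, 1, 14, 9), "person", "red", "black"),
    Detection("cam7", datetime(2024, 5, 1, 14, 12), "car", "blue", "blue"),
    Detection("cam7", datetime(2024, 5, 1, 18, 0), "person", "blue", "black"),
]
query = Query(camera="cam7",
              start=datetime(2024, 5, 1, 14, 0), end=datetime(2024, 5, 1, 15, 0),
              kind="person", top_colour="blue", bottom_colour="black")
matches = search(archive, query)
print(len(matches), "candidate(s) for the operator to review")
```

Leaving filters blank, as in `Query(kind="person")`, widens the result set instead of failing, and the operator still makes the final choice among the candidates.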

IP cameras in a Safe city mostly monitor street space, which means that recording takes place under constantly changing conditions. Colours are distorted by lighting and by the quality of the recording IP cameras. At present, computers cannot recognize colours invariantly, which is why “search by features” looks not for exact colours but for colour characteristics, or rather combinations of colours (blue jacket + black jeans). Colour combinations are more stable under such distortions than an individual colour. That is why “search by features” is more accurate and resistant to non-ideal conditions.

Situational control

Safe city software developers have implemented a number of solutions and modules for crime prevention and investigation. They allow one to search not only for objects, but for events as well.


The tracking module allows you to specify a certain area on a screen or a control line. If this line is crossed, a warning signal or alarm is sent to a video system operator. This module is useful for access control in areas with restricted or forbidden entry, for example, industrial zones, specially guarded sites, or areas that are dangerous for people. A person falling onto subway rails can also be detected by tracking: passenger platforms should be demarcated in such a way that crossing a certain line means that somebody has fallen onto the tracks, and an alarm is raised.
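A control-line check of this kind can be sketched with a standard segment intersection test. This is an illustrative sketch, not the module's actual implementation: it assumes the tracker supplies an object's position in consecutive frames as (x, y) pixel coordinates, and the function names are hypothetical.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): -1, 0 or 1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def crosses(line, prev_pos, cur_pos):
    """True if the move prev_pos -> cur_pos strictly crosses the control line.

    Touching the line without passing to the other side is not counted.
    """
    a, b = line
    return (_orient(a, b, prev_pos) * _orient(a, b, cur_pos) < 0
            and _orient(prev_pos, cur_pos, a) * _orient(prev_pos, cur_pos, b) < 0)

# Hypothetical platform edge and a tracked position over three frames.
platform_edge = ((0, 5), (10, 5))
track = [(2, 2), (3, 4), (4, 6)]

for before, after in zip(track, track[1:]):
    if crosses(platform_edge, before, after):
        print("ALARM: control line crossed between", before, "and", after)
```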

This module can also be configured to issue messages when somebody or something stays within a predetermined surveillance area for longer than a set time. In many cities there are parking sites with limited periods of free parking. If a camera is installed, the screen is divided into parking zones and a free parking time limit is set. An operator then receives information about every case in which the time limit is exceeded.
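The parking example reduces to comparing how long each zone has been occupied against the limit. A minimal sketch, assuming the video analytics already reports when a vehicle was first seen in each zone; the zone names, times, and the `overstays` helper are made up for illustration.

```python
from datetime import datetime, timedelta

def overstays(first_seen, now, limit):
    """Return the zones whose occupant has stayed longer than the limit.

    first_seen: dict mapping zone name -> time the current vehicle appeared.
    """
    return sorted(zone for zone, since in first_seen.items() if now - since > limit)

first_seen = {
    "bay-1": datetime(2024, 5, 1, 9, 0),
    "bay-2": datetime(2024, 5, 1, 10, 40),
    "bay-3": datetime(2024, 5, 1, 8, 15),
}
now = datetime(2024, 5, 1, 11, 0)
print(overstays(first_seen, now, limit=timedelta(hours=1)))
```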

Suspect inter-camera tracking

This is an intelligent function that builds the movement pathway of an object across multiple cameras within a video system. Inter-camera tracking can be used to track people on Safe city streets: searching for missing persons, investigating crimes, etc.

In order to use inter-camera tracking, you need an interactive “search by features” function. It is necessary to upload or draw a map of the site using an editor program and to indicate all cameras that will be connected to this module. Then you can initiate a search by features.

The search for a pathway begins with the first camera. By specifying the parameters and characteristics of the person in question, or by uploading a photo, an operator gets search results that match the criteria. After selecting the person in question, the operator launches a further search on the next cameras within the system, then reviews the results again, chooses the desired one, and continues searching. Thus, inter-camera tracking works in a few steps: search on the first camera and choose a sample, then search on the next camera and choose a sample, and so on. The module analyzes the area’s map, determines when the selected object could have been seen by a particular camera, and provides relevant results as a set of tracks, that is, object snapshots grouped per IP channel. Images are grouped based on the object’s continuous movement within a camera’s field of view. The pathway is built step by step, either until the object disappears from the sight of all cameras or until the operator has received sufficient information.

The more iterations are performed in suspect inter-camera tracking, the more accurate the results. However, because the system calculates an estimated time of an object’s transition between neighbouring cameras, the location of cameras on the site’s map should be indicated precisely.
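The role of camera positions in estimating transition times can be sketched as follows. The sketch assumes straight-line distances on the map in metres and a plausible range of walking speeds. Real routes follow streets, and the article does not describe the actual estimation method, so the function names, speeds, and sample tracks are all assumptions.

```python
from datetime import datetime, timedelta
from math import dist

def arrival_window(cam_from, cam_to, left_at, min_speed=0.5, max_speed=2.5):
    """Estimate when an object leaving cam_from could appear at cam_to.

    cam_from, cam_to: (x, y) map positions in metres.
    min_speed, max_speed: assumed speed range in metres per second.
    """
    distance = dist(cam_from, cam_to)
    earliest = left_at + timedelta(seconds=distance / max_speed)
    latest = left_at + timedelta(seconds=distance / min_speed)
    return earliest, latest

def plausible(candidates, window):
    """Keep only the tracks whose timestamp falls inside the window."""
    earliest, latest = window
    return [name for name, seen_at in candidates if earliest <= seen_at <= latest]

cameras = {"A": (0.0, 0.0), "B": (30.0, 40.0)}   # 50 m apart
left_a = datetime(2024, 5, 1, 14, 0, 0)
window = arrival_window(cameras["A"], cameras["B"], left_a)

candidates = [
    ("track-1", left_a + timedelta(seconds=5)),     # too early to be the same person
    ("track-2", left_a + timedelta(seconds=45)),
    ("track-3", left_a + timedelta(seconds=300)),   # too late
]
print(plausible(candidates, window))
```

If camera B were placed wrongly on the map, the window would shift and the correct track could be filtered out, which is why precise camera locations matter.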

Slip and fall detectors

These detectors are used to detect situations that are dangerous for people automatically and to prevent accidents. The detector monitors two types of falls: from a height and on even ground. The first case applies, for example, to a person falling onto subway tracks or into water from an embankment. The second applies to a slip and fall on a street, due to either poor health or crime. Technologically, a fall detector responds either to a virtual line crossing (similar to the tracking module) or to an abrupt change in a moving object’s size followed by the absence of motion or limited motion.
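The second trigger, an abrupt size change followed by little or no motion, can be sketched on top of tracker output. It assumes that the height of the object's bounding box and its frame-to-frame displacement are available for each frame. The thresholds and the `looks_like_fall` helper are hypothetical and would need tuning on real footage.

```python
def looks_like_fall(heights, displacements, shrink_ratio=0.6,
                    still_frames=3, max_move=2.0):
    """Detect a sudden drop in height followed by (almost) no movement.

    heights: bounding-box height per frame, in pixels.
    displacements: movement of the box centre per frame, in pixels.
    """
    for i in range(1, len(heights)):
        if heights[i] <= heights[i - 1] * shrink_ratio:
            after = displacements[i + 1:i + 1 + still_frames]
            if len(after) == still_frames and all(m <= max_move for m in after):
                return True
    return False

# Made-up tracks: a fall, and a person who bends down and walks on.
fall = looks_like_fall([180, 178, 90, 88, 87, 88], [5, 6, 20, 1, 0, 1])
bend = looks_like_fall([180, 178, 90, 92, 150, 178], [5, 6, 4, 5, 6, 5])
print(fall, bend)
```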

Abandoned objects detection

This module is a great help, especially for the prevention of terrorist attacks in crowded areas of a Safe city, as well as acts of vandalism. It is based on timed registration of changes in the background, which occur when new objects appear or previously observed objects disappear. The system allows you to adjust the sizes of objects to look for, the area of the screen where detection is performed, and the period of time after which an item is considered abandoned.
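The three settings named above (object size, screen area, and waiting period) can be sketched as a simple filter over stationary objects. It assumes a background-change stage already reports each new static object with a bounding box and the time it first appeared. The `StaticObject` class and all values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StaticObject:
    object_id: int
    box: tuple          # (x, y, width, height) in pixels
    first_seen: float   # seconds since the start of recording

def abandoned(objects, now, zone, min_area, max_area, wait_seconds):
    """Return ids of objects that satisfy the size, zone and time settings.

    zone: (x1, y1, x2, y2) rectangle in which detection is enabled.
    """
    zx1, zy1, zx2, zy2 = zone
    flagged = []
    for obj in objects:
        x, y, w, h = obj.box
        cx, cy = x + w / 2, y + h / 2
        in_zone = zx1 <= cx <= zx2 and zy1 <= cy <= zy2
        size_ok = min_area <= w * h <= max_area
        waited = now - obj.first_seen >= wait_seconds
        if in_zone and size_ok and waited:
            flagged.append(obj.object_id)
    return flagged

objects = [
    StaticObject(1, (100, 200, 40, 30), first_seen=10.0),    # bag left 110 s ago
    StaticObject(2, (120, 210, 40, 30), first_seen=100.0),   # appeared only 20 s ago
    StaticObject(3, (300, 50, 200, 150), first_seen=0.0),    # too large, e.g. a parked van
]
result = abandoned(objects, now=120.0, zone=(0, 100, 640, 480),
                   min_area=500, max_area=5000, wait_seconds=60)
print(result)
```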


Crowd monitoring

This module registers crowds of people and helps prevent accidents in streets, squares, and railway stations. An operator sets a threshold value: a gathering of more people than this number is considered a crowd. A crowd detector relies on three basic methods.

1. The first one analyzes the area occupied by moving people, determines how many pixels correspond to one moving person (taking perspective into account), and divides the total area occupied by the crowd by the area of one person. The result is the number of people gathered together. The crowd’s position on the screen and people overlapping each other are taken into account.

2. The second method counts people’s heads. The program classifies parts of the human body, separates heads from bodies, and then counts the heads. This method requires good image quality. It does not work if the crowd is far away from the camera, if each person occupies only a small number of pixels, or if the camera has a low resolution.

3. The third method works with characteristic points of the human figure, e.g. specific angles of the human body, closed contours, etc. This method estimates the number of such points in the crowd and the number per person, and uses statistics from the previous two methods.


Counting people based on head images

The crowd detector combines the three methods by averaging their results and comparing the average with the threshold number of people set by the operator. If the threshold is exceeded, it reacts accordingly.
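The area-based estimate and the averaging step can be written out as a short sketch. The pixel counts, the head and point-based estimates, and the threshold below are made-up numbers. A real detector would also correct for perspective and overlap, which is omitted here.

```python
def estimate_by_area(crowd_pixels, pixels_per_person):
    """Method 1: area occupied by the crowd divided by the area of one person."""
    return crowd_pixels / pixels_per_person

def crowd_alarm(estimates, threshold):
    """Average the estimates and compare the result with the operator's threshold."""
    average = sum(estimates) / len(estimates)
    return average, average > threshold

by_area = estimate_by_area(crowd_pixels=48000, pixels_per_person=1200)
by_heads = 36      # hypothetical output of the head-counting method
by_points = 44     # hypothetical output of the characteristic-point method

average, is_crowd = crowd_alarm([by_area, by_heads, by_points], threshold=30)
print(average, is_crowd)
```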

Face and license plate recognition

These features are widely used in Safe city video surveillance systems and are discussed in detail in our other articles. Let us just note that license plate recognition can be used effectively for speed limit enforcement on highways, automatic admission of vehicles onto an enterprise’s property, registering and investigating traffic accidents, etc. In most cases, face recognition requires special conditions and places high demands on the samples used for comparison. At the same time, there are technologies that can be used successfully for face recognition on a citywide scale.

Instant reaction to the events

It is very important either to prevent certain events or to be able to react to them instantly, rather than investigate them some time after they have occurred. This is especially true when we are talking about people and events on a citywide scale. That is why the key element of intelligent video functions is their capacity for instant event notification.

There are all kinds of alarm messages, such as instant alerts when a certain object appears or a particular event occurs. An alarm can be triggered by any specified video analytics feature. It can also be triggered if any component of the video system fails to work properly, for example, if a camera is off or a server hangs. An alarm signal can be displayed on an operator’s screen or sent as an e-mail message, directing the operator’s attention to a particular camera. Often, an instant reaction to a certain event can prevent negative effects. So, an "alert" in a video system is not just a nice bonus but an essential tool for the prevention of negative situations.

Three basic rules of a safe city

Any city is a constantly changing and moving living organism, with a wide variety of events taking place every minute. That is why we need to take serious responsibility in building security and monitoring systems, to ensure law and order on the streets for the safety of city inhabitants.

There are three main rules that a Safe city video surveillance system should follow:

  1. The processing and analysis of video archives should be reliable and done in a timely fashion.
  2. A video system should possess good analytical functions in order to be able to investigate accidents and conduct people searches.
  3. Instant alerts and notification functions should be in place to prevent undesirable events or eliminate their negative effects quickly.