Computer vision is a field of data science focused on building systems with artificial vision capabilities: systems that understand visual signals and make decisions based on information extracted from video streams. In essence, computer vision systems act as artificial eyes and brains that can serve many devices and systems.
Softarex has been accumulating experience in Computer Vision for the last 20 years. We have implemented many computer vision systems for different areas and published more than 100 research articles in scientific journals and blogs; you can find some of them on our blog.
A wide range of tasks can be solved using computer vision and deep learning technologies. Here are the main models and problems they solve:
To simplify a solution, predefined markers such as ArUco markers, QR codes, or barcodes can be used. With these, detection, recognition, tracking, spatial localization, and scene calibration all become much easier.
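As a toy illustration of why markers simplify recognition (this is not a real ArUco decoder; `read_marker_bits` is our own hypothetical helper), decoding a marker reduces to thresholding a known bit grid, which is far more robust than open-set object recognition:

```python
import numpy as np

def read_marker_bits(img, grid=4):
    """Read the bit grid of a synthetic square marker image.

    `img` is a 2D grayscale array containing only the marker's inner
    bit area (border already stripped). Each cell is averaged and
    thresholded -- a trivially simple, robust decision rule.
    """
    h, w = img.shape
    cell_h, cell_w = h // grid, w // grid
    bits = np.zeros((grid, grid), dtype=int)
    for r in range(grid):
        for c in range(grid):
            cell = img[r * cell_h:(r + 1) * cell_h,
                       c * cell_w:(c + 1) * cell_w]
            bits[r, c] = 1 if cell.mean() > 127 else 0
    return bits

# Build a synthetic 40x40 marker from a known 4x4 bit pattern.
pattern = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]])
marker = np.kron(pattern, np.ones((10, 10))) * 255
print(read_marker_bits(marker))
```

In production, a library such as OpenCV's ArUco module would also handle detection, perspective correction, and error-correcting codes.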
It is often non-trivial to translate a business problem into specific technical tasks, since the same goal can be achieved by different approaches. The choice between them is determined by the constraints of the scene and the availability of a dataset.
When developing a solution from scratch, you must first analyze the scene. Doing so lets you determine the best placement for camera installation: one that provides the best possible view with minimal overlap. Additionally, cameras should be installed close enough to the monitored objects to capture high detail, which in turn improves the accuracy of the computer vision algorithms.
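A quick way to sanity-check camera placement is to estimate the level of detail a camera delivers at a given distance. The sketch below uses a simple pinhole approximation; `pixels_per_meter` is our own illustrative helper, and the camera figures are hypothetical:

```python
import math

def pixels_per_meter(resolution_px, fov_deg, distance_m):
    # Width of the scene strip covered at the given distance
    # (pinhole camera model, horizontal axis).
    scene_width_m = 2 * distance_m * math.tan(math.radians(fov_deg) / 2)
    return resolution_px / scene_width_m

# Hypothetical camera: 1920 px wide, 90-degree horizontal FOV, 5 m away.
ppm = pixels_per_meter(1920, 90.0, 5.0)
print(round(ppm, 1))  # -> 192.0 pixels of detail per meter of scene
```

If the task needs, say, 250 px/m for reliable recognition, this camera is too far away or its field of view is too wide.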
Next, define the tasks and the ways to solve them. There is a choice between using a multitask end-to-end model or developing a multistage algorithm built from single-task models. End-to-end solutions minimize the number of configurable parameters in the whole system, but require preparing a large dataset annotated for all tasks. A multistage algorithm requires tuning at each stage, but ultimately needs a much smaller amount of training data.
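The multistage idea can be sketched as composed single-task stages. The stubs below stand in for trained networks (the functions and boxes are purely illustrative); the point is that each stage can be tuned or retrained in isolation:

```python
# A minimal sketch of a multistage pipeline: each stage is a small,
# independently tunable single-task model (stubbed with plain functions).

def detect_regions(frame):
    # Stage 1: return candidate boxes as (x1, y1, x2, y2).
    # Stub: fixed boxes; a real system would run a detector here.
    return [(0, 0, 10, 10), (20, 20, 25, 25)]

def classify_region(frame, box):
    # Stage 2: assign a label to one region.
    # Stub: area heuristic; a real system would run a classifier here.
    x1, y1, x2, y2 = box
    return "large" if (x2 - x1) * (y2 - y1) > 50 else "small"

def pipeline(frame):
    # Stages compose; swapping or retraining one stage does not
    # require re-annotating data for the others.
    return [(box, classify_region(frame, box))
            for box in detect_regions(frame)]

print(pipeline(frame=None))
```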
Machine learning requires the preparation of a meaningful dataset. Data collection for such tasks takes a lot of time and can include searching for available open-source datasets for similar tasks, manually cleaning datasets of outliers, or even labeling and preparing datasets from scratch.
Introducing computer vision into the acquisition process itself speeds up dataset development. For example, adding Multiple Object Tracking functions to the data preparation software lets the operator annotate video far faster: after the first frame is marked, the software predicts the object's position in subsequent frames, and the operator only needs to correct or confirm the prediction.
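The prediction step can be as simple as extrapolating motion between the operator's last two boxes. A minimal sketch, assuming a constant-velocity model (`propose_next_box` is our own illustrative helper; real annotation tools use full trackers):

```python
# Tracker-assisted annotation: given the operator's boxes on the two
# previous frames, propose a box for the next frame. The operator then
# confirms or corrects the proposal instead of drawing from scratch.

def propose_next_box(prev_box, curr_box):
    # Each box is (x, y, w, h); extrapolate each component linearly.
    return tuple(c + (c - p) for p, c in zip(prev_box, curr_box))

frame1 = (100, 50, 40, 40)   # operator-drawn
frame2 = (110, 52, 40, 40)   # operator-confirmed
print(propose_next_box(frame1, frame2))  # proposal for frame 3
```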
The method of iterative training with pseudo-labeling also dramatically reduces model development time. Under this methodology, a small set of data is labeled and a first model is trained. This model is then used to label more data, with the main emphasis placed on the examples where the model makes errors. After the dataset is expanded, the model is retrained and the target metrics are checked. If the model falls within the specified error bounds, the process is complete; otherwise, retraining is repeated.
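One iteration of this loop can be sketched with a toy model. Here a 1D threshold "classifier" (all names and numbers are illustrative) is fit on a small labeled seed set, then pseudo-labels an unlabeled pool; examples near the decision boundary are flagged for manual review, mimicking the emphasis on cases where the model errs:

```python
# A toy sketch of iterative training with pseudo-labeling.

def fit_threshold(xs, ys):
    # Midpoint between class means stands in for "training" a model.
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def pseudo_label(threshold, pool, margin=1.0):
    # Confident predictions go straight into the training set;
    # low-confidence ones (near the threshold) go to manual review.
    confident, review = [], []
    for x in pool:
        label = (x, 1 if x > threshold else 0)
        (review if abs(x - threshold) < margin else confident).append(label)
    return confident, review

seed_x, seed_y = [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]
t = fit_threshold(seed_x, seed_y)                    # midpoint: 5.0
confident, review = pseudo_label(t, [0.5, 4.8, 5.3, 9.5])
print(t, confident, review)
```

After the reviewed labels are merged in, the model would be refit on the enlarged set and the loop repeated until the target metrics are met.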
Computer vision specialists must define how trained models will be evaluated (for example, whether segmentation needs pixel-accurate masks or a rough estimate is sufficient) and choose a loss function that reflects what the model should focus on during training.
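A small numeric sketch of how the loss encodes priorities: for segmentation, a Dice-style loss rewards mask overlap directly, while pixelwise binary cross-entropy weights every pixel equally. Both implementations below are standard textbook forms written in NumPy:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # 1 minus the (soft) Dice coefficient: small when masks overlap well.
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-6):
    # Pixelwise binary cross-entropy, averaged over all pixels.
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

# A 4-pixel "mask": a good prediction vs. its inversion.
target = np.array([0., 0., 1., 1.])
good   = np.array([0.1, 0.1, 0.9, 0.9])
print(dice_loss(good, target) < dice_loss(1 - good, target))  # True
print(bce_loss(good, target) < bce_loss(1 - good, target))    # True
```

On imbalanced masks (a few foreground pixels in a large image), the two losses diverge sharply, which is exactly why this choice matters.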
Deep models need significant computational resources, so a balance between accuracy and cost must be found. Both training deep networks and running inference may require a GPGPU cluster.
Scalability and bandwidth can become a problem when the vision system has a large number of cameras. In that case, deploying intermediate computing nodes or edge servers is a good solution.
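A back-of-the-envelope check shows when edge nodes become necessary (the camera counts and bitrates below are illustrative, not benchmarks):

```python
# Does the aggregate camera traffic exceed the central uplink?

def uplink_saturated(n_cameras, stream_mbps, uplink_mbps):
    return n_cameras * stream_mbps > uplink_mbps

# E.g. 64 cameras at 8 Mbps each against a 500 Mbps uplink:
print(uplink_saturated(64, 8, 500))    # 512 Mbps needed -> True
# With edge nodes forwarding only detection metadata (~0.1 Mbps/camera):
print(uplink_saturated(64, 0.1, 500))  # -> False
```

Processing video at the edge and shipping only compact results upstream is what makes such systems scale.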
Mobile deployment involves optimizing models for real-time inference, which requires developing optimized algorithms for a wide range of platforms: about 30 different SoC models now cover 50% of the market, and each needs an individual optimization approach. That is a lot of work for low-level optimization engineers.
If you are planning a project in Manufacturing, such as tracking within a factory, you will likely face issues with varying equipment sizes and shapes. In such tasks, automatic calibration of parameters for a specific production site is an integral part of the system.
All software related to computer vision tasks needs to be highly performant to provide real-time video processing. Therefore, the last step before deployment is to compile the model into a more optimized format using libraries such as TensorRT, TVM, TensorFlow Lite, CoreML, and OpenVINO.
Software implementations of video processing algorithms need to be tailored to your specific tasks; generic, universal algorithms usually cannot deliver the required performance.
It is also important to note that special care must be taken when developing computer vision systems for tasks currently performed by human workers, such as production line monitoring. To make human behavior more predictable, it may be necessary to refine the protocols of the production process so as to minimize possible overlaps and interference from workers.