How to Create an Automated Information Extraction System for Document Images Using Graph Convolutional Networks

As everything goes digital, the demand for machine-based document digitization is higher than ever. Every organization wants its documents digitized, because digital documents are easy to search, while maintaining hard copies is both expensive and tedious. Paper documents also become unreadable after some years as the paper ages, and hard copies can be destroyed by criminals, natural disasters, and other events. All of this makes document digitization inevitable. But digitizing documents manually is both expensive and time-consuming. So how can we digitize documents quickly and cheaply? That's where deep learning systems come in. Document pages can be captured with cameras, and those images can be fed into a deep learning system that recognizes the text and extracts information from the images. The extracted information can then be used to fill a predefined template for a particular document type and stored on a local system or in the cloud.

Rule-Based Approach
Our first attempt at the problem was very basic. We used an OCR system to perform text recognition on the document images. Once the text was extracted, we applied hand-crafted rules based on regular expressions to pull the information out of the documents. Text in documents often follows patterns: dates appear in formats such as dd-mm-yyyy or yyyy-mm-dd, and addresses tend to be text segments separated by commas. But what were the limitations of this first approach?

Second Approach
Having realized the limitations of the rule-based approach, we decided to research a better one. After some research we realized that the information extraction problem can be represented in the form of graphs. The intuition behind using graphs comes from the way we humans identify important information in documents. If you are given an identity card and asked to find the person's name, date of birth, and address, you might look for key-value pairs on the card, or you might infer the information from the location of the text in the image. For example, the topmost text on the card is often the organization that issued it.

Graph Convolutional Neural Network
Exploring the computer vision literature, we found that a GCN is a type of network that combines visual and textual information to build a graph, and then classifies the graph nodes to identify the category of each piece of text, thereby extracting the information from the document. At a high level, the whole pipeline can be broken down into the following steps.

Performing OCR: First, we run OCR on the document image and extract the texts and their corresponding bounding boxes.

Feature Extraction: The textual information is passed to a transformer module, which converts it into feature vectors. The bounding boxes obtained in the previous step are used to crop the image regions containing those texts; these crops are passed through a convolutional neural network to extract visual features.

Graph Convolution: We now have textual features from the transformer and visual features from the CNN. Both sets of features are passed to the graph neural network, which models the texts as nodes.
The relationships between these nodes are established with the help of the visual features obtained from the convolutional neural network. Once the nodes are established by the GNN, they can be processed further and classified into labels (a minimal sketch of such a graph-convolution layer follows at the end of this section).

Node Classification: A BiLSTM layer and a CRF layer follow the GNN layer; they take the graph features and classify the nodes into their labels, e.g. Name, Company, etc. Say we want to extract the name of the organization from an ID card; we would train our model to classify the text containing that name differently from the rest of the text.

Challenges
Like any other project, this one came with its own challenges.

Low Availability of Data
One of the biggest problems with this type of project was getting enough data for our experiments. Despite an intensive search we could not find a good dataset; in fact, even finding a reasonable number of ID card images was not possible. So we decided to create our own dataset with a mix of natural and synthetic images. We built tools to generate synthetic card images and a tool to annotate these images automatically. Within a few days we had a good amount of data for experimentation.

Model Size and Computational Requirements
The model we used for experimentation was based on Wenwen Yu et al. It was too heavy to deploy, so we needed to modify the architecture of the networks involved to make the model smaller and less computationally demanding. Using the intuition that textual features matter less than visual features for node classification, we modified the transformer and CNN blocks, and we also modified the GNN. After several experiments we arrived at an architecture that is both small and accurate.

Results
We tested our model on around 100 card images. The metric we used for evaluation was MEF (Mean Entity F1 score). The MEF of our model on the test data was approximately 99.17, which is quite good.

Conclusion
In this article, we learned how Graph Convolutional Networks can be used to extract information from document images and help with document digitization. A proper implementation of this approach can yield a robust and accurate system that saves an organization a lot of time and money.
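To make the graph-convolution step concrete, here is a minimal sketch of a single graph-convolution layer over OCR text boxes. This is not the model described above (which fuses transformer and CNN features and adds BiLSTM-CRF layers); it only illustrates how node features are propagated over an adjacency matrix. It assumes PyTorch is available, and all names and dimensions are placeholders.

import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """One graph-convolution layer: each node mixes its own features with its neighbours'."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # node_feats: (N, in_dim) fused text+visual features, one row per OCR box
        # adj: (N, N) adjacency built from the spatial layout of the boxes
        adj = adj + torch.eye(adj.size(0))                        # add self-loops
        deg_inv = adj.sum(dim=1, keepdim=True).clamp(min=1.0).reciprocal()
        return torch.relu(self.linear(deg_inv * (adj @ node_feats)))  # average over neighbours

# Hypothetical usage: 5 text boxes with 64-dimensional features.
features = torch.randn(5, 64)
adjacency = (torch.rand(5, 5) > 0.5).float()
layer = SimpleGraphConv(64, 32)
print(layer(features, adjacency).shape)  # torch.Size([5, 32])

A classification head (in the article, BiLSTM + CRF) would then map each node's output vector to a label such as Name or Company.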

Read More

Quantum Computing Concepts and Implementation in Python

Quantum computing is a fascinating concept in science and technology, with huge scope for use in everyday business processes in the future. In this post we discuss quantum computing concepts and see how they can be implemented using Python.

Quantum physics is a highly complex and extensive subject; its theories and concepts can confuse most of us, including the experts. Nevertheless, researchers are making progress in applying the concepts of quantum physics to computing and building working systems. Quantum computation might sound like something from the future, but we are very much moving in that direction, albeit with tiny steps. IBM, Microsoft, and D-Wave Systems (partnering with NASA) have made quantum computers available in the cloud for public use. Yes, you can actually use a quantum computer from the cloud for free.

Of course, it's easier said than done. Quantum computing is not a substitute for classical computing; it's an extension or a diversification, where classical and quantum computing go hand in hand. Given the cost of building a single quantum computer, using the cloud version is the best choice for most of us. But where does Python come into the picture? And what exactly is quantum computing? Let's explore the topic and understand how to implement quantum computing concepts in Python.

An Introduction to Quantum Computing
The term 'quantum' comes from quantum mechanics, the framework in physics that describes and helps us understand the behaviour of nature at the scale of particles such as electrons and photons. Quantum computing is the process of using quantum mechanics to solve highly complicated problems. We use classical computing to solve problems that are difficult for humans; quantum computing targets problems that classical computing cannot solve practically, working through huge volumes of complex data in a short time. The easiest way to describe quantum computing is as a branch of quantum information science that exploits the phenomena of superposition and entanglement.

Superposition and Entanglement
The smallest particles in nature, such as electrons, photons, and neutrons, behave as quantum particles. Superposition is when a quantum system is present in more than one state at the same time; it's an inherent ability of the quantum system. As a loose analogy, think of a time machine that lets a person be in more than one place at once; similarly, when a particle is present in multiple states at once, it is in superposition. Entanglement is the correlation between quantum particles: the particles are connected in such a way that even if they were at opposite ends of the world, they would still be in sync and 'dance' simultaneously. The distance between the particles doesn't matter, because the entanglement between them is that strong. Einstein described this phenomenon as 'spooky action at a distance'.

Quantum Computer
A quantum computer is a device or system that performs quantum calculations. It stores and processes data in the form of qubits (quantum bits). A quantum computer can speed up processes of classical computing and solve problems that are beyond the scope of a classical computer.
As an illustration, if a classical computer takes five seconds to work through a problem that suits quantum hardware, a quantum computer could return the answer in a few thousandths of a second.

Quantum Bits (the Qubit Concept)
A quantum bit, or qubit, is the unit of data storage in quantum computers. A qubit is realized in a quantum system such as an electron or a photon, and every qubit obeys the principles of superposition and entanglement. That makes qubits hard for scientists to generate and manage, because a qubit can represent combinations of zero and one (0 & 1) at the same time (superposition). Scientists use laser beams or microwaves to manipulate qubits. Although a measured qubit collapses to a state of 0 or 1, entanglement remains in force: when the two qubits of an entangled pair are placed at a distance, they stay connected, and a change in the state of one qubit is automatically reflected in the state of the related qubit. Such connected groups of qubits are far more powerful than the single binary digits used in classical computing.

Classical Computing vs. Quantum Computing
Now that you have a basic idea of quantum computing, it's time to look at the differences between classical and quantum computing. These differences can be grouped by physical structure and by working process.

Architecture-Level (Physical Structure) Differences

Phenomenon and Behaviour
In classical (conventional) computing, an electric circuit can be in only a single state at any given point in time, and the circuits follow the laws of classical physics. In quantum computing, the particles follow the rules of superposition and entanglement and adhere to the laws of quantum mechanics.

Information Storage
Information in classical computing is stored as bits (0 and 1) based on voltage or charge, and binary codes represent the information. In quantum computing, the same information is stored in qubits, for example in the polarization of a photon or the spin of an electron. Qubits cover the binary states (0 & 1) and their superpositions to represent information.

Building Blocks
Conventional computers use CMOS transistors as their basic building blocks. Data is processed in the CPU (Central Processing Unit), which contains an ALU (Arithmetic and Logic Unit), a control unit, and processor registers. Quantum computers use SQUIDs (Superconducting Quantum Interference Devices) or quantum transistors as basic building blocks, and data is processed in a QPU (Quantum Processing Unit) of interconnected qubits.

Working Process Differences
The way data is represented is the major difference between a classical computer and a quantum computer. A bit in classical computing takes the value of either 0 or 1; a qubit in quantum computing can take the value 0 or 1, or both simultaneously in a superposition.
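To tie this back to the Python angle promised above, here is a minimal sketch of superposition and entanglement in code. It assumes the Qiskit library (the excerpt itself does not name a specific framework) and simply prepares a two-qubit Bell state.

from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

circuit = QuantumCircuit(2)
circuit.h(0)       # Hadamard gate: puts qubit 0 into a superposition of 0 and 1
circuit.cx(0, 1)   # CNOT gate: entangles qubit 1 with qubit 0

state = Statevector.from_instruction(circuit)
print(state)       # amplitudes of |00> and |11> are each 1/sqrt(2); |01> and |10> are 0

Measuring either qubit collapses the pair together, so the outcomes are always correlated, which is exactly the entanglement described above.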

Read More

Smart Video Generation from Text Using Deep Neural Networks

Creating animated videos doesn't have to be a laborious process anymore. Artificial intelligence and deep neural networks process datasets to create videos in far less time. This blog details the different AI models and techniques used for smart video generation from text.

It's no surprise that creating animated videos takes time. It's hard work and involves many man-hours, and even with today's technology animated videos are still not easy to produce. The entry of artificial intelligence, however, has brought new developments. Researchers from the Allen Institute for Artificial Intelligence and the University of Illinois worked together to create an AI model called CRAFT, which stands for Composition, Retrieval, and Fusion Network. The CRAFT model took text descriptions (captions) from users and generated scenes in the style of the famous cartoon series The Flintstones. CRAFT is entirely different from pixel-generation models, in which each pixel value is conditioned on previously generated pixels to create a video; instead it uses text-to-entity segment retrieval to collect data from a video database. The model was trained on more than 25,000 videos, each clip three seconds and 75 frames long. Every video was individually annotated with details of the characters in the scene and information about what the scene dealt with. That is still labor-intensive, since a team has to add the captions to each scene. So how can AI experts generate video from text using automated video-generation models? First, let's look at the problems of creating videos from different points of view.

Problems in Creating Videos
The major problems in creating animated videos fall into the following categories.

Problems from the General Point of View

Time-Consuming and Effort-Intensive
There's a high demand for animated videos, leading to a gap between demand and supply. Kids and adults love animated videos, games, and similar content, but the supply isn't as large as viewers would like. The technology simply hasn't reached the stage where we can generate content in minutes and meet rising expectations; video generation is still a time-consuming and laborious process that requires a lot of resources and input data.

Computers Are Not Enough
It might seem that computers are the answer to everything. However, computers and the existing software are not advanced enough to transform the video creation process. While researchers and experts are working on new applications to create videos quickly, we still have to wait for a higher level of innovation.

Problems from the Deep Learning Point of View

Manually Adding Text
Artificial intelligence has helped develop video-generation software that speeds up the process, but even AI doesn't solve everything yet. For example, some videos don't have captions, yet you still need to create a video from existing clips. What do you do? You have to add the captions manually so that the software can convert the text to video. Imagine doing that for thousands of video clips!

Improper Labeling
The problem doesn't end with manually adding captions; you have to label the videos as well. With so many clips to work on, it's quite possible to mislabel something or give the wrong caption to a couple of videos. What if you notice the error only after the smart video has been generated from the given captions? That leads to more wasted resources and poor-quality videos.
More than the CRAFT Model
While the CRAFT model is indeed a worthy invention, the world needs something better and more advanced. Moreover, the CRAFT model is limited to creating cartoons and cannot work with all kinds of video clips.

Introduction to NLP and CV
We've seen the challenges faced by the video industry and by AI researchers. Wouldn't it be great to find a solution that overcomes them? That's exactly what we'll be doing in this blog. First, though, we need a basic idea of the two major concepts at the heart of smart video generation from text: NLP (Natural Language Processing) and CV (Computer Vision), two branches of artificial intelligence.

Natural Language Processing (NLP)
NLP can be described, in layman's terms, as a medium of communication between a human and a machine. Just as we use languages to communicate with each other, computers use their own language (binary code) to exchange information. Things get complex when a human has to communicate with a machine: the machine must process and understand what you say and write. NLP models train a computer not only to read what you write or speak but also to understand the emotion and intent behind the words. How else would a computer know you're being sarcastic? Applications such as sentiment classification, named entity recognition, chatbots (our virtual friends), question-answering systems, and story generation have been built with NLP models to make computers smarter than before.

Computer Vision (CV)
Computer vision is yet another vital aspect of artificial intelligence. Consider a scenario where you spot a familiar face in a crowd. If you know the person well, you'll usually be able to recognize them among a group of strangers. But if you don't? What if you need to identify someone by watching CCTV recordings? Overwhelming, isn't it? Now, what if the computer could identify a person from a series of videos on your behalf? It would save you a great deal of time, effort, and confusion. But how does the computer do it? That's where CV enters the picture (pun intended): AI developers provide the model with annotated image datasets to train it to correctly identify a person based on their features.

Possible Approaches Other than the CRAFT Model
Researchers have been toiling to find ways to use artificial intelligence and deep learning to facilitate video generation from text. The solutions involve using

Read More

Scaling Up Deep Learning Model Serving Using OpenCV

According to one study, 80% of the models built by data scientists never make it to production. The reason is that the production environment has several constraints: it could be inference time, or in some cases the hardware. Hence, to make a model production-ready, we first need to think about which model we will use in production. Different experiments are carried out to decide on the model, and the trade-off between hardware and accuracy is compared. 'Accuracy' here doesn't mean the accuracy metric alone; it covers whatever metric is appropriate for the use case we are working on. In this article, we will explore model optimization for CPU environments.

Benefits of Model Optimization in Terms of Business

Reduces Deployment Cost
With model optimization, our models run efficiently using less memory and fewer computational resources, which reduces the cost of deploying them in production.

Model Optimization Boosts Your Earnings
Model optimization reduces the latency of the model, meaning more requests can be served in less time. In other words, at the same deployment cost you can serve more users and earn more revenue.

Why Use OpenCV to Serve Our Model?

Because It's Fast and Memory-Efficient
OpenCV is fast and memory-efficient. Memory consumption during inference is often low compared with other frameworks, and the inference speed is also high. Even models trained with the Darknet framework run faster with OpenCV, because the cv2.dnn module is optimized for inference on Intel CPUs.

OpenCV Is Optimized for Intel CPUs
Since OpenCV was originally designed by Intel, it is optimized for inference on Intel CPUs. In this case we will be optimizing an SSD MobileNet model that has been trained on the COCO dataset.

Let's Optimize Our Model
We need to perform the following steps to optimize the model.

Freezing and Optimizing the Model
Freezing converts the weights from variables to constants, so we can freeze the model and optimize it at the same time. Fortunately, the TensorFlow Object Detection API provides a single script for both: export_tflite_ssd_graph.py. The script performs optimizations such as stripping unused and identity nodes and removing dropouts. A quantization option is also provided, but that type of optimization is not suited to CPUs that don't support float16 operations (although this is not true of all CPUs). To convert the model, we need to install the TensorFlow Object Detection API for TensorFlow 1.x and run the script with the following arguments:

pipeline_config_path: the path to the configuration file used for training the network.
trained_checkpoint_prefix: the path to the best checkpoint.
output_directory: the path where the optimized model will be stored. The optimized model will be in protobuf (.pb) format.

Let's say our trained model checkpoints and configuration are stored in the trained_checkpoints folder; then we can do the conversion using the following command.
python object_detection/export_tflite_ssd_graph.py \
  --pipeline_config_path trained_checkpoints/mobilenetv1.config \
  --trained_checkpoint_prefix trained_checkpoints/model.ckpt \
  --output_directory trained_checkpoints/optimized_model.pb

Generating the pbtxt File for OpenCV Prediction
For TensorFlow models, the DNN module's readNetFromTensorflow function expects both the protobuf (.pb) file, which actually contains the weights, and a configuration file in pbtxt format, which describes the topology of the model. These configurations are called text graphs in technical terms. The OpenCV repository provides helper scripts for writing text graphs. Since we chose an SSD model, the script we use is tf_text_graph_ssd.py (it would be different for R-CNN models). The script expects three arguments:

input: the path to the optimized model.
config: the path to the configuration file used for training the model.
output: the path where the pbtxt file will be saved.

Wow, the Script Was Great, but Where Do We Find It?
The script lives in the OpenCV repository (https://github.com/opencv/opencv.git) in the samples/dnn folder. Assuming our optimized model also resides in trained_checkpoints, we can generate the pbtxt file with the following command (a hedged inference sketch that uses both generated files appears at the end of this section):

python tf_text_graph_ssd.py \
  --input trained_checkpoints/optimized_model.pb \
  --config trained_checkpoints/mobilenetv1.config \
  --output trained_checkpoints/model_conf.pbtxt

We Have Optimized Our Model. So What's Next?
Now let's roll up the curtain and see the magic happening behind the scenes.

Removal of Dropouts
Any deep learning practitioner who has trained a neural network will be familiar with dropout, which is implemented as a layer in some deep learning frameworks such as TensorFlow. Dropout randomly turns off a certain percentage of neurons during training, which prevents the model from overfitting the data. During inference these layers are not needed; if they remain in the network they are never used yet still consume memory. In this step we remove the dropout layers and make the model more efficient.

Removal of Unused and Identity Nodes
Some nodes in the model never get used; they only increase its memory and computation footprint, so they are removed during optimization. There are also nodes that simply produce identity results and are therefore redundant and removable.

Conversion of Variables to Constants
During training, the weights are stored as variables so that they can be updated by backpropagation of the errors. Once training is done the weights no longer change, so there is no need to keep them as variables; they can instead be converted to less memory-consuming constants.

Pruning
During training, some weight values approach zero. The neurons corresponding to those weights essentially never fire and are therefore redundant; removing them can drastically reduce the size of the network.

Quantization
In quantization, we cast the weights of the neural network to smaller data types, for example from float32 to float16 or int8. Quantization is hardware-specific, i.e. some hardware supports both float32 and float16 operations while other hardware doesn't. This
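Here is the hedged inference sketch referred to above: it loads the optimized .pb and the generated .pbtxt with OpenCV's DNN module and runs a single image through the SSD detector. The file names, input size, and confidence threshold are our own placeholders, not values prescribed by the article.

import cv2
import numpy as np

# Load the frozen graph and the text graph generated in the previous steps.
net = cv2.dnn.readNetFromTensorflow(
    "trained_checkpoints/optimized_model.pb",
    "trained_checkpoints/model_conf.pbtxt",
)

image = cv2.imread("sample.jpg")
blob = cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True, crop=False)
net.setInput(blob)
detections = net.forward()  # SSD-style output of shape (1, 1, N, 7)

h, w = image.shape[:2]
for det in detections[0, 0]:
    score = float(det[2])
    if score > 0.5:  # placeholder confidence threshold
        x1, y1, x2, y2 = (det[3:7] * np.array([w, h, w, h])).astype(int)
        print(f"class {int(det[1])}: score {score:.2f}, box ({x1}, {y1}, {x2}, {y2})")

If you want to pin inference to the CPU explicitly, cv2.dnn also lets you set a preferable target, e.g. net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU).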

Read More

Shoplifting – A Big Concern for the Retail Industry

The Big Question
Have you ever wondered whether the CCTV cameras we use in workplaces, retail stores, jewelry shops, and similar places are underutilized compared to what they are actually capable of? Many use cases can be addressed with your existing CCTV cameras whenever an anomalous event happens around them. In this blog we talk about one of the major concerns of the retail industry, shoplifting, along with how we at DataToBiz approached a solution to the problem.

These days computer vision and deep learning are becoming prime choices for automating daily work in many places, because they give businesses an edge in security. Until now, however, mostly big enterprises have unleashed the potential of automated systems. This time, we at DataToBiz have come up with a solution that any business, small or big, can use to prevent daily losses. Most shop owners nowadays install CCTV cameras in their shops, but broadly speaking they limit themselves to two purposes: first, keeping recordings of the previous 'n' days; second, monitoring the live CCTV stream for anomalies.

How Can You Save Your Business from Daily Loss?
DataToBiz has taken a step forward to make better use of your existing real-time CCTV feed and save manpower for your business. Any crime generates significant losses, human or economic or both. One of the major forms of crime in retail shops is shoplifting: "the action of stealing goods from a shop while pretending to be a customer". Monitoring this kind of activity is the second motive of every retail shop owner, but the traditional way of doing it demands extra manpower, which leads to recurring expenditure, and even then it isn't an efficient solution. So the problem needs to be solved in a completely automated way.

Motivation behind the Shoplifting Solution
According to the 2018 National Retail Security Survey (NRSS), inventory shrink, a loss of inventory related to theft, shoplifting, error, or fraud, had an impact of $46.8 billion on the U.S. retail economy in 2017. According to a survey released by the shoplifting prevention association and the Metropolitan Police Department of Japan, the loss is estimated at 4,615 billion yen per year, which works out to about 12.6 billion yen per day. That daily figure of 12.6 billion yen is roughly the price of 126 Tesla Model S cars (big enough, right?). And realistically, no amount of manpower can continuously watch for all such cases every day, nor would it be feasible for any business.

Technical Approach to the Solution
Fig. 1 – Workflow of the Solution

What's New in Our Proposed Solution?
We have implemented a 3DCNN (3-Dimensional Convolutional Neural Network) to process the CCTV video stream and extract spatiotemporal features from the frames. Spatiotemporal features differ from those of traditional 2DCNN models in that an extra dimension, the temporal segment, is captured. The 3DCNN feature extractor takes a batch of frames as input; from those frames it selects some only for capturing 'appearance' features and others for capturing 'motion'-related features. Let's take two different 3DCNN models proposed by Facebook as examples and look at the way they select frames for feature extraction.
C3D vs. SlowFast – by Facebook
Example 1 – The C3D feature extractor selects the first 'x' frames out of the total 'y' frames in a batch to extract appearance-related features and uses all the remaining frames to extract motion-related features.
Example 2 – The SlowFast (4x16) model takes a total of 64 frames as input. It selects 4 frames, at an interval of 16 frames, for extracting spatial features; in parallel, it selects 32 frames, at an interval of 2 frames, for extracting temporal features.
Note: a complete explanation of 3DCNN models is beyond the scope of this blog (a minimal feature-extraction sketch follows at the end of this section).

The Final Loop – Getting Results
After extracting features, a model is built that performs some preprocessing to bring all the features to a fixed shape and then performs regression or classification on them. Whether to do classification or regression depends on the chosen feature-extraction model and the target use case. The threshold above which the model treats an event as anomalous also differs from use case to use case, because some human activities are comparatively easy to detect (e.g. running, eating) and some are hard (shoplifting, shooting). Once a shoplifting event is confirmed by the model, a dedicated pipeline sends notifications (messages, sounds, etc.) to the staff members present, along with a screenshot of that particular event.

Conclusion
The proposed solution is a fully automated way to address one of the biggest concerns of retail shop owners, jewelers, museums, and others. It can save them manpower along with the losses they have had to bear until now. DataToBiz has expertise in developing state-of-the-art computer vision algorithms and running inference on edge devices in real time. Our highly experienced AI engineers will help you build a vision system customized for your requirements.
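Here is the minimal feature-extraction sketch mentioned above. It is not the production model; it uses torchvision's off-the-shelf R3D-18 video backbone purely to show the tensor shapes involved when extracting spatiotemporal features from a short clip of frames. All sizes are placeholders.

import torch
from torchvision.models.video import r3d_18

# 3D-CNN backbone; pretrained Kinetics weights could be passed here instead of None.
backbone = r3d_18(weights=None)
backbone.fc = torch.nn.Identity()   # drop the classifier, keep the 512-d clip feature
backbone.eval()

# Dummy clip: (batch, channels, frames, height, width)
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    features = backbone(clip)
print(features.shape)  # torch.Size([1, 512])

A small classification (or regression) head on top of such clip features, plus a use-case-specific threshold, produces the anomaly score described in "The Final Loop" above.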

Read More

Nvidia DeepStream – A Simplistic Guide

Nvidia DeepStream is an AI framework that helps utilize the full potential of Nvidia GPUs, both on Jetson devices and on discrete GPUs, for computer vision. It powers edge devices such as the Jetson Nano and other members of the Jetson family to process parallel video streams in real time. DeepStream uses GStreamer pipelines (written in C) to bring the input video into the GPU, which ultimately processes it faster for the subsequent stages.

What Is DeepStream Used for?
Video analytics plays a vital role in the transportation, healthcare, retail, and physical security industries. DeepStream is Nvidia's IVA (Intelligent Video Analytics) SDK, and it enables you to attach and remove video streams without affecting the rest of the environment. Nvidia has been working on improving its deep learning stack to provide developers with even better and more accurate SDKs for creating AI-based applications. DeepStream is a bundle of plug-ins that facilitates a deep learning video-analysis pipeline. Developers don't have to build the entire application from scratch; they can use the DeepStream SDK to speed up the process and reduce the time and effort invested in the project. Being a streaming analytics toolkit, it helps create smart systems that analyze videos using artificial intelligence and deep learning.

DeepStream is flexible and runs on discrete GPUs (Nvidia architecture) and on system-on-chip platforms such as the Nvidia Jetson devices. It makes it easy to build complex applications with the following:
Multiple streams
Numerous deep learning frameworks
Several models working in tandem
Various models combined in series or in parallel to create an ensemble
Customized pre- and post-processing
Computing at different precisions
Working with Kubernetes
Scaling is easy with DeepStream, as it allows you to deploy applications in stages, which helps maintain accuracy and minimize the risk of errors.

Components of DeepStream
DeepStream has a plugin-based architecture. Its graph-based pipeline interface allows high-level component interconnection and enables heterogeneous parallel processing using multithreading on both the GPU and the CPU. Here are the major components of DeepStream and their high-level functions.

Metadata
Metadata is generated by the graph at every stage. From it we can read many important fields such as the type of object detected, ROI coordinates, object classification, source, etc.

Decoder
The decoders decode the input video (H.264 and H.265) and support decoding multiple streams simultaneously. Bit depth and resolution are taken as parameters.

Video Aggregator (nvstreammux)
It accepts n input streams and converts them into sequential batched frames. It uses low-level APIs to access both the GPU and the CPU for this process.

Inferencing (nvinfer)
This element runs inference with the chosen model; all model-related work is done through nvinfer. It supports primary and secondary modes and various clustering methods.

Format Conversion and Scaling (nvvidconv)
It converts formats from YUV to RGBA/BGRA, scales the resolution, and handles image rotation.

Object Tracker (nvtracker)
It uses CUDA and is based on the KLT reference implementation. The default tracker can also be replaced with other trackers.

Screen Tiler (nvstreamtiler)
It manages the output videos, roughly the equivalent of OpenCV's imshow function.

On-Screen Display (nvosd)
It manages everything drawn on the screen: lines, bounding boxes, circles, ROIs, etc.
Sink
The sink, as the name suggests, is the last element of the pipeline, where the normal flow ends.

Flow of Execution in Nvidia DeepStream
Decoder -> Muxer -> Inference -> Tracker (if any) -> Tiler -> Format Conversion -> On-Screen Display -> Sink

A DeepStream app consists of two parts: the config file and its driver file (which can be in C or in Python).

Example of Config File
An illustrative config sketch is shown at the end of this section; for more information, refer to the DeepStream documentation.

Different Modes for Running Inference on Nvidia DeepStream
With DeepStream we can choose between three network modes:
FP32
FP16
Int8
Performance varies with the network mode: Int8 is the fastest and FP32 the slowest but most accurate. Note that the Jetson Nano cannot run Int8.

DataToBiz has expertise in developing state-of-the-art computer vision algorithms and running inference on edge devices in real time. For more details, contact us.
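Since the original config example is not reproduced above, here is a hedged illustration of what a deepstream-app configuration might look like. The section names follow the DeepStream reference app, but all values (file paths, resolutions, sink type) are placeholders of our own, not values from the article.

[source0]
enable=1
type=3                  # 3 = multi-URI source (file:// or rtsp://)
uri=file:///opt/samples/sample_720p.h264
num-sources=1

[streammux]
batch-size=1
width=1280
height=720
batched-push-timeout=40000

[primary-gie]
enable=1
# Model-specific settings (engine file, labels, thresholds) live in this nvinfer config
config-file=config_infer_primary.txt

[osd]
enable=1

[sink0]
enable=1
type=2                  # 2 = EGL-based windowed sink

This top-level file wires the pipeline together (source, muxer, inference, OSD, sink), while the nvinfer element reads its own config file for everything related to the model itself.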

Read More

Face Recognition: ONNX to TensorRT Conversion Problem for the ArcFace Model

Have you also been fascinated by the idea of running inference with a face recognition model on a Jetson Nano? I failed to run TensorRT inference on the Jetson Nano, because the PReLU activation function is not supported in TensorRT 5.1; the channel-wise PReLU operator is only available from TensorRT 6. In this blog post, I will explain the steps required to convert an ONNX model to TensorRT and the reason why my steps failed to run TensorRT inference on the Jetson Nano.

Steps to run TensorRT inference on Jetson Nano:
The first step is to import the model, which includes loading it from a saved file on disk and converting it to a TensorRT network from its native framework or format. Our example loads the model in ONNX format, i.e. the ArcFace face recognition model.
Next, an optimized TensorRT engine is built based on the input model, the target GPU platform, and other specified configuration parameters.
The last step is to provide input data to the TensorRT engine to perform inference. The sample uses input data bundled with the model from the ONNX model zoo.

Sample code
Now let's convert the downloaded ONNX model into a TensorRT engine, arcface_trt.engine. The TensorRT module is pre-installed on the Jetson Nano; the release shipped by the NVIDIA JetPack SDK is TensorRT 5.1. Firstly, ensure that the onnx package is installed on the Jetson Nano by running the following command:

import onnx

If this command gives an error, then onnx is not installed on the Jetson Nano. Follow these steps to install it:

sudo apt-get install cmake
sudo apt-get install protobuf-compiler
sudo apt-get install libprotoc-dev
pip install --no-binary onnx onnx==1.5.0

Now onnx is ready to run on the Jetson Nano with all its dependencies satisfied. Next, download the ONNX model using the following command:

wget https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx

Simply run the following script as the next step (we are using the TensorRT Python API for the conversion):

import os
import tensorrt as trt

batch_size = 1
TRT_LOGGER = trt.Logger()

def build_engine_onnx(model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30
        builder.max_batch_size = batch_size
        # Load the ONNX model and parse it in order to populate the TensorRT network.
        with open(model_file, 'rb') as model:
            parser.parse(model.read())
        return builder.build_cuda_engine(network)

# The downloaded ArcFace model
onnx_file_path = './resnet100.onnx'
engine = build_engine_onnx(onnx_file_path)

engine_file_path = './arcface_trt.engine'
with open(engine_file_path, "wb") as f:
    f.write(engine.serialize())

After running the script, we get the error "Segmentation fault (core dumped)". After a lot of research we found that there is no issue with the script itself; there are other reasons why we face this problem, and they are discussed in the following paragraphs.

What are the reasons the model conversion failed?
The Jetson Nano is an ARM-architecture device on which TensorRT 5.1 comes pre-installed. The NVIDIA JetPack SDK image written to the SD card does not include TensorRT 6. It is possible to convert other models to TensorRT and run inference on top of them, but not ArcFace: the ArcFace model cannot be converted because it contains a PReLU activation function, which is only supported from TensorRT 6, and we are unable to upgrade TensorRT from 5.1 to 6.
So, unless and until NVIDIA provides a JetPack SDK OS image with TensorRT 6, the ArcFace model specifically cannot be converted.

Why can't we upgrade from TensorRT 5.1 to TensorRT 6?
The TensorRT 6 installation packages available are built for the AMD64 architecture and cannot run on the Jetson Nano, which is an ARM-architecture device. That is why the ArcFace ONNX model conversion fails.

Future Work and Conclusion
As soon as the NVIDIA JetPack SDK releases an OS image with TensorRT 6, the ArcFace ONNX model can be converted to TensorRT and we can run inference on top of it. I am all ears for your thoughts and ideas on making this happen while NVIDIA takes its time updating the JetPack SDK. We at DataToBiz always strive to use the latest tools and technologies to stay ahead of our competitors. Contact us for further details.

About the Author: Sushavan is a B.Tech student in Computer Engineering at Lovely Professional University. He worked as an intern at DataToBiz for 6 months.
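As a quick way to confirm that the PReLU operator really is what blocks the conversion, here is a small hedged sketch of our own (not part of the original workflow) that uses the onnx Python package to list the PRelu nodes in the downloaded model:

import onnx

# Load the ArcFace model downloaded earlier and count its PRelu nodes.
model = onnx.load("./resnet100.onnx")
prelu_nodes = [node.name for node in model.graph.node if node.op_type == "PRelu"]
print(f"Found {len(prelu_nodes)} PRelu nodes")  # a non-zero count means TensorRT 5.1 will reject the model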

Read More

Fixing WiFi connectivity on Nvidia Jetson Nano

Are you struggling with a WiFi connectivity issue on the Nvidia Jetson Nano? If, after going through the official Nano forums, you are planning to buy an expensive WiFi module just because of the reported connectivity drops, check this blog first. It contains a verified and tested five-minute fix for the buggy driver, so that the Jetson Nano never loses its connectivity.

What is the NVIDIA Jetson Nano?
Nvidia is a multinational computer systems design company based in California, US. It's no surprise that some of the best software applications and developer kits belong to this company. The Jetson Nano™ is one such developer kit module that can power countless artificial intelligence-based systems. The Jetson Nano™ kit is a cost-effective way to build AI systems in less time. It helps create a range of embedded IoT (Internet of Things) apps, AI robots, intelligent gateways, and other artificial intelligence software and solutions. The kit allows developers to build low-power AI systems and entry-level NVRs (Network Video Recorders); from image processing to object recognition, it has all you need to build an AI application. Another advantage of the Jetson Nano™ is that it was created with beginners in mind: if you are still learning, or want to learn, about artificial intelligence and robotics, grab the starter kit and get going. It ships with ready-to-try projects that help you understand the use and purpose of AI in the real world.

Fixing WiFi connectivity on the Nvidia Jetson Nano
Firstly, check WiFi module compatibility on the official Jetson Nano forum. For testing purposes we are setting up the Edimax EW-7811Un, the module for which the most issues are reported:

sudo apt-get update
sudo apt-get install git linux-headers-generic build-essential dkms

git clone the following repository: https://github.com/pvaret/rtl8192cu-fixes

Check whether your device uses this driver from the list below:
ASUSTek USB-N13 rev. B1 (0b05:17ab)
Belkin N300 (050d:2103)
D-Link DWA-121 802.11n Wireless N 150 Pico Adapter [RTL8188CUS]
Edimax EW-7811Un (7392:7811)
Kootek KT-RPWF (0bda:8176)
OurLink 150M 802.11n (0bda:8176)
Plugable USB 2.0 Wireless N 802.11n (0bda:8176)
TP-Link TL-WN725N (0bda:8176)
TP-Link TL-WN821N v4 (0bda:8178)
TP-Link TL-WN822N (0bda:8178)
TP-Link TL-WN823N (only models that use the rtl8192cu chip)
TRENDnet TEW-648UBM N150

sudo dkms add ./rtl8192cu-fixes
sudo dkms install 8192cu/1.11
sudo depmod -a
sudo cp ./rtl8192cu-fixes/blacklist-native-rtl8192.conf /etc/modprobe.d/
echo "options rtl8xxxu ht40_2g=1 dma_aggregation=1" | sudo tee /etc/modprobe.d/rtl8xxxu.conf
sudo iw dev wlan0 set power_save off
sudo reboot now

DataToBiz's hack: enable auto-login so that WiFi never loses its connectivity over SSH. Here is how to make it happen:

sudo nano /etc/gdm3/custom.conf

In this file, uncomment the following:
I. AutomaticLoginEnable = true
II. AutomaticLogin = user   (put your user name here, e.g. jetson)

After following the steps above, the Edimax module will never drop the connection. Enjoy playing around with the Jetson Nano with no connectivity issues!

At DataToBiz, we have been experimenting with various edge devices to solve business problems. Contact us if you are looking for computer vision solutions on the Nvidia Jetson Nano.

About the Author: Aanchal is a deep learning engineer at DataToBiz with expertise in deep learning technologies, currently working with various IoT devices for the computer vision product SensiblyAI and related client projects.

Read More

How to Reduce the Cost of Google BigQuery Data Processing?

"MySQL server has gone away" or "Lost connection to MySQL server during query" - these are some of the most common errors a developer or a DBA sees on his or her computer screen, [….]

Read More