Hand gestures are a natural way for humans to interact with computers across a variety of applications. Deep learning, which is effective for image recognition, is used here to recognize hand gestures that are captured dynamically. In particular, a convolutional neural network (CNN) is used for better performance. The model is trained on static hand gesture images. The CNN is built from scratch, without using a pre-trained model.
Gesture recognition is especially important in real-time applications. Because digital cameras are now widely available, many researchers are investigating applications of gesture recognition. Many challenges remain, however, because of the complexity of gesture recognition. This work addresses the problem with deep learning, using a convolutional neural network, since deep learning is effective for image recognition.
Here, the ASL dataset containing hand gestures for the digits 0-9 is used. Preprocessing plays a vital role in extracting the gesture from a static image; it consists of background subtraction and image binarization. Features are then extracted from all the images after binarization.
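The two preprocessing steps can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows with pixel values from 0 to 255, a reference background frame of the same size, and a threshold of 128. The function names, the absolute-difference form of background subtraction, and the threshold value are illustrative assumptions, not details taken from the project.

```python
def subtract_background(frame, background):
    """Absolute per-pixel difference between a frame and a reference background.

    Assumption: both images are grayscale lists of rows with equal dimensions.
    """
    return [
        [abs(f - b) for f, b in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


def binarize(gray, threshold=128):
    """Map each pixel to 1 if it is at or above the threshold, else 0.

    Assumption: the threshold of 128 is a placeholder, not the project's value.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]


if __name__ == "__main__":
    frame = [[10, 200], [180, 40]]
    background = [[10, 20], [30, 40]]
    foreground = subtract_background(frame, background)
    print(binarize(foreground))
```

In practice the threshold would be tuned to the lighting conditions of the captured images.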
Convolutional neural networks are made of neurons with learnable weights and biases. Every neuron receives several inputs and takes a weighted sum over them. The sum is then passed through an activation function, which produces the output.
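The computation of a single neuron can be sketched as follows. The function names are hypothetical, and ReLU is chosen as the activation only for illustration.

```python
def relu(x):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


if __name__ == "__main__":
    # 0.5*1.0 + 1.0*2.0 + 0.5 = 3.0, which ReLU leaves unchanged
    print(neuron([1.0, 2.0], [0.5, 1.0], 0.5))
```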
In this paper, hand gestures are recognized using deep learning techniques. The proposed system is trained on static image datasets. The network is a convolutional neural network built without pre-trained models. A gesture is captured using a webcam and given as input to the network for recognition.
GESTURE RECOGNITION USING CNN DEEP LEARNING
First, the dataset is loaded into the network for training. Preprocessing is done before the features are extracted. Training is carried out in the convolutional neural network. After training, an input image is captured from a webcam. The input image is then tested to recognize the gesture. A confusion matrix is produced from the resulting outputs, along with the mean accuracy.
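The evaluation step can be sketched as below. This is a minimal illustration that assumes integer class labels from 0 to 9 and takes accuracy to be the fraction of correct predictions; the function names are hypothetical and the project's exact accuracy definition is not stated.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    true_labels = [0, 1, 2, 2]
    predicted_labels = [0, 1, 1, 2]
    matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
    print(matrix, accuracy(matrix))
```

Off-diagonal entries show which gestures are confused with one another, which is often more informative than the single accuracy figure.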
A ConvNet is a popular machine learning algorithm. It is one of the techniques of deep learning and is a learning model used to perform classification tasks on images, video, text, and sound. CNNs give especially good results at identifying patterns in an image, which makes them suitable for recognizing hand gestures, faces, and other objects. An advantage of a CNN is that it does not require manual feature extraction to train the model. A CNN can also tolerate some scaling and rotation of the input, although this robustness depends on the variety in the training data.
imageInputLayer: An imageInputLayer is where the size of the input image is specified; here, 128-by-128-by-1 is used. These numbers represent the height, width, and number of channels. In this case, the input data is a grayscale image, hence the number of channels is 1.
Convolutional Layer: The input arguments for this layer are the filter size, the number of filters, and the padding. Here, a filter size of 10 is used, which specifies a 10 x 10 filter. The number of filters is 10, so the layer produces 10 feature maps. A padding of 1 adds a one-pixel border of zeros around the input. Note that, at a stride of 1, this keeps the output the same size as the input only for a 3 x 3 filter; with a 10 x 10 filter, a 128 x 128 input yields a 121 x 121 output.
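The spatial size of a convolutional layer's output follows from the standard formula (input + 2 x padding - filter) / stride + 1. The sketch below applies it to the values given above, assuming a stride of 1; the function name is hypothetical.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output height or width of a convolution along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1


if __name__ == "__main__":
    # 128-pixel input, 10 x 10 filter, padding of 1, assumed stride of 1
    print(conv_output_size(128, 10, padding=1))
    # For comparison: a 3 x 3 filter with padding of 1 preserves the size
    print(conv_output_size(128, 3, padding=1))
```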
ReLU Layer: The ReLU (rectified linear unit) layer applies a nonlinear activation function that sets negative values to zero. It is placed after a batch normalization layer, whose purpose is to decrease the sensitivity to network initialization and increase the pace of training.
Max Pooling Layer: Max pooling is a downsampling technique applied to the output of convolutional layers. In this architecture, poolSize is set to 3 and the stride (step size) is 3.
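Max pooling with a pool size of 3 and a stride of 3 can be sketched as follows. The sketch assumes a single-channel feature map stored as a list of rows and no padding, so windows that do not fit entirely inside the map are dropped; the function name is hypothetical.

```python
def max_pool2d(feature_map, pool_size=3, stride=3):
    """Keep the maximum value of each pool_size x pool_size window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, height - pool_size + 1, stride):
        pooled_row = []
        for c in range(0, width - pool_size + 1, stride):
            window = [
                feature_map[r + i][c + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


if __name__ == "__main__":
    # 6 x 6 map with values 0..35; pooling reduces it to 2 x 2
    feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]
    print(max_pool2d(feature_map))
```

Because the stride equals the pool size, the windows do not overlap and each dimension shrinks by roughly a factor of 3.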
Fully Connected Layer: The fully connected layer follows the max pooling layer. In this layer, every neuron is connected to all the neurons of the previous layer. The input argument given for this layer is 10, which indicates 10 classes.
Softmax Layer: The fully connected layer is followed by a softmax layer, which is a normalization technique. This layer generates positive numbers as output such that their sum is one. The classification layer uses these numbers for classification.
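The softmax normalization can be sketched in a few lines. Subtracting the maximum score before exponentiating is a common numerical-stability measure and is an addition of this sketch, not something stated in the project.

```python
import math


def softmax(scores):
    """Turn raw scores into positive numbers that sum to one."""
    largest = max(scores)  # subtracted for numerical stability
    exponentials = [math.exp(s - largest) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]


if __name__ == "__main__":
    probabilities = softmax([1.0, 2.0, 3.0])
    print(probabilities, sum(probabilities))
```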
Classification Layer: The classification layer is the final layer of the architecture. It assigns each input to a class based on the probabilities obtained from the softmax layer and also calculates the cost function.
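The role of the classification layer can be illustrated with a short sketch. It assumes the cost function is the cross-entropy loss, which is the usual choice for multi-class classification; the text above does not name the cost function, and the function names here are hypothetical.

```python
import math


def classify(probabilities):
    """Return the index of the class with the highest probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def cross_entropy(probabilities, true_class):
    """Cross-entropy loss for one sample (assumed cost function)."""
    return -math.log(probabilities[true_class])


if __name__ == "__main__":
    probabilities = [0.1, 0.7, 0.2]
    print(classify(probabilities))
    print(cross_entropy(probabilities, true_class=1))
```

The loss is small when the network assigns a high probability to the correct class and grows as that probability approaches zero.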
The maximum number of epochs is set to 15, and the initial learning rate is 0.001.
Hardware: Camera
Software: MATLAB