Written by TKS Alumni, Alishba Imran.
There are about 2 million amputees in the U.S. alone, and that number is expected to nearly double to 3.6 million by 2050. However, current prosthetics struggle with accurate grasp and grip control. In cheaper prosthetics, grasping (the ability to interact with real-world objects) must be controlled manually by the user, which requires a lot of training and is still often inaccurate.
Although there are more automated ways to control grasp, these prosthetics can end up costing $100,000+, making them difficult for most users to afford.
To solve this problem, I’ve been working alongside mentors from Kindred.ai on building a cheaper, more automated and more accurate grasping program for a robotic prosthetic arm using deep learning.
Before I get into how I built this, let's first look at how today's prosthetics work and dig deeper into the problem.
Your brain controls the muscles in your limbs by sending electrical commands down the spinal cord and then through peripheral nerves to the muscles. This information pathway would be blocked if you had a limb amputated. The peripheral nerves would still carry electrical motor command signals generated in the brain, but the signals would meet a dead end at the site of amputation and never reach the amputated muscles.
We essentially try to achieve this same thing with a prosthetic arm so that the wearer can still do daily tasks (though most prosthetics do this without returning any sense of touch to the person). Today, when the wearer of a prosthetic arm wants to grab something, there are three main ways this signal is sent:
Body-powered prostheses are the most commonly used artificial arm. A body-powered prosthetic arm is held in place by suction or by a strap system that rests on your shoulders. A cable and harness system controls the device.
There are two main types of body-powered hand prostheses:
Myoelectric prostheses do not require a harness and are controlled by nerves and electrical impulses from the wearer’s residual limb. When we move our muscles, small electrical fields are generated around them, which can be measured by electrodes. Sensors or electrodes in the prosthetic socket detect your muscle contractions and send commands to operate the high performance, battery-operated prosthetic motors.
They often use two electrode sites to sense muscle contractions from two major muscle groups.
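To make the two-site scheme concrete, here is a toy sketch of how a controller might turn two electrode readings into an open/close command. The RMS window, threshold value, and decision rule are all illustrative assumptions, not how any particular commercial controller works:

```python
import numpy as np

def myoelectric_command(emg_open, emg_close, threshold=0.3):
    """Toy two-site myoelectric controller (illustrative, not a real product).

    emg_open / emg_close - raw EMG samples from the two electrode sites
    threshold            - assumed activation level; real systems calibrate this
    Returns "open", "close", or "hold".
    """
    # Muscle activation is commonly estimated as the RMS of the EMG window.
    rms_open = np.sqrt(np.mean(np.square(emg_open)))
    rms_close = np.sqrt(np.mean(np.square(emg_close)))
    if rms_open > threshold and rms_open > rms_close:
        return "open"
    if rms_close > threshold:
        return "close"
    return "hold"
```

In a real device the thresholds are calibrated per user, which is part of why training time is such a burden.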
The IMES (Implantable Myoelectric Sensor) system is a device used to improve the signal quality and consistency of myoelectric signals for prosthetic control.
An IMES is implanted into each targeted muscle that will be used to control a function of the prosthetic arm. Two devices are needed per degree of freedom (DOF): one to control the fingers opening and another to control them closing.
To summarize, these are the main problems with current prosthetic arms:
What if there was a way to get the full functionality of a prosthetic arm with shorter training times, at a much lower price, and with more precise grasp and grip control?
That was my goal with this project: I mounted a USB camera onto a 3D-printed prosthetic arm that uses a Convolutional Neural Network (CNN) to detect objects and identify ways for the arm to manipulate them, creating a cheaper and more effective alternative to current prosthetic arms. Specifically:
For the rest of the article, I’ll be diving deeper into how I built the CNN model for object detection and grasping:
My system consists of a 3D-printed prosthetic arm, an NVIDIA GPU, and a USB camera:
For the robotic arm to interact with the identified object we need a grasping implementation which has the following sub-systems:
The architecture I used includes a grasp region proposal network for identifying potential grasp regions. Given RGB-D image inputs, the network then partitions grasp configuration estimation into regression over the bounding box parameters and classification of the orientation angle.
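That regression/classification split can be sketched as two small heads on top of the pooled ResNet-50 features. A minimal sketch: the weights below are random placeholders (in the real network they are learned end to end), and the 19 angle bins (18 orientations plus one "no grasp" class) are an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ANGLE_BINS = 19   # assumption: 18 orientation bins + one "no grasp" class
FEAT_DIM = 2048       # ResNet-50's final pooled feature dimension

# Hypothetical head weights; in practice these are trained jointly.
W_reg = rng.normal(scale=0.01, size=(FEAT_DIM, 4))              # box params
W_cls = rng.normal(scale=0.01, size=(FEAT_DIM, NUM_ANGLE_BINS))  # angle bins

def grasp_heads(features):
    """Split grasp estimation into bbox regression + angle classification."""
    bbox = features @ W_reg        # regression: continuous (x, y, w, h) offsets
    logits = features @ W_cls      # classification: quantized orientation
    # Softmax over the angle bins, stabilised by subtracting the row max.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    return bbox, probs

feats = rng.normal(size=(5, FEAT_DIM))   # e.g. 5 pooled region proposals
bbox, probs = grasp_heads(feats)
```

Treating the angle as classification over coarse bins, rather than regressing it directly, sidesteps the wrap-around ambiguity of angle regression.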
Given corresponding RGB and depth images of a novel object, we want to be able to identify the grasp configurations for potential grasp candidates of an object so that we can manipulate it.
We can use the 5-dimensional grasp rectangle as the grasp representation here, which describes the location, orientation, and opening distance of a parallel gripper prior to closing on an object. The 2D oriented rectangle, shown below, depicts the gripper's location (x, y), orientation θ, and opening distance (h); an additional parameter describing the length (w) is added for the bounding-box grasp configuration.
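A minimal sketch of that 5-parameter representation, converting (x, y, θ, h, w) into the four corners of the oriented rectangle. The coordinate conventions (image x/y axes, counter-clockwise θ) are assumptions for illustration:

```python
import numpy as np

def grasp_rectangle_corners(x, y, theta, h, w):
    """Corners of a 5-D grasp rectangle (x, y, theta, h, w).

    (x, y) - gripper centre in image coordinates
    theta  - gripper orientation in radians (assumed counter-clockwise)
    h      - opening distance of the parallel gripper
    w      - length of the gripper plates (the bounding-box parameter)
    Returns a (4, 2) array of corner coordinates.
    """
    dx, dy = w / 2.0, h / 2.0
    # Corners in the gripper's local frame, before rotation.
    local = np.array([[-dx, -dy], [dx, -dy], [dx, dy], [-dx, dy]])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    # Rotate into image coordinates, then translate to the centre.
    return local @ R.T + np.array([x, y])
```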
As seen in c), each element in the feature map is an anchor and corresponds to multiple candidate grasp-proposal bounding boxes.
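To make the anchor idea concrete, here is a sketch of how candidate boxes are generated for one feature-map cell: each anchor spawns boxes at several scales and aspect ratios. The scale and ratio values below are illustrative placeholders, not the ones this network actually used:

```python
import numpy as np

def anchors_for_cell(cx, cy, scales=(54, 108), ratios=(0.5, 1.0, 2.0)):
    """Candidate grasp-proposal boxes centred on one feature-map anchor.

    cx, cy - anchor centre in image coordinates
    scales - box side lengths in pixels (illustrative values)
    ratios - width/height aspect ratios (illustrative values)
    Returns boxes as rows of (cx, cy, w, h).
    """
    boxes = []
    for s in scales:
        for r in ratios:
            # Keep the box area equal to s**2 while varying the aspect ratio.
            w = s * np.sqrt(r)
            h = s / np.sqrt(r)
            boxes.append((cx, cy, w, h))
    return np.array(boxes)
```

The proposal network then scores each of these candidates and regresses offsets for the promising ones.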
I used ResNet-50, a 50-layer residual network, for feature extraction and grasp prediction. This architecture can handle multiple objects in a scene and performs significantly better than AlexNet (which is otherwise commonly used for this task). These are the key features of this model:
I used the Cornell Dataset which consists of 885 images of 244 different objects, with several images taken of each object in various orientations or poses. Each distinct image is labelled with multiple ground truth grasps corresponding to possible ways to grab the object.
The Cornell dataset is preprocessed to fit the input format of the ResNet-50 network, which mostly consists of resizing the images (to 227×227) and substituting the depth data in for one of the color channels.
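The preprocessing can be sketched as follows. Which color channel the depth replaces (blue, here), the nearest-neighbour resize, and the min-max depth normalisation are all assumptions for illustration; a real pipeline would also handle missing depth values and mean subtraction:

```python
import numpy as np

def preprocess_rgbd(rgb, depth, size=227):
    """Resize an RGB-D pair and substitute depth for the blue channel.

    rgb   - (H, W, 3) color image
    depth - (H, W) depth map
    Returns a (size, size, 3) "RG-D" array ready for the network.
    """
    def resize(img):
        # Nearest-neighbour resize via index sampling (avoids extra deps).
        h, w = img.shape[:2]
        rows = np.arange(size) * h // size
        cols = np.arange(size) * w // size
        return img[rows][:, cols]

    rgb = resize(rgb).astype(np.float32)
    d = resize(depth).astype(np.float32)
    # Normalise depth to [0, 255] so it matches the color channels' range.
    d = (d - d.min()) / max(d.max() - d.min(), 1e-6) * 255.0
    rgd = rgb.copy()
    rgd[..., 2] = d   # assumption: depth replaces the blue channel
    return rgd
```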
b) showcases the top grasp outputs for several objects; c) output grasps (red) and ground-truth grasps (green) showing that the system may output grasps for which there is no ground truth; (d) multi-grasp output for several objects. The green rectangles are ground truth and the red rectangles represent predicted grasps for each unseen object.
Reinforcement Learning (RL) is the area of machine learning concerned with learning what to do, i.e. how to map situations to actions. The goal is to maximize a numerical reward signal, but instead of being told which actions to take, the learner must discover which actions yield the most reward.
The ultimate goals with this section are to figure out:
Using RL, we can get an agent to learn the optimal policy for a sequential decision-making task without complete knowledge of the environment.
The agent first explores the environment by taking actions, then updates its policy according to the reward function to maximize the reward. We can use Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), or Proximal Policy Optimization (PPO) to train the agent.
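Of those three, PPO has the simplest core: its clipped surrogate objective fits in a few lines. This is the standard per-sample loss from the PPO paper, written out as a sketch rather than code from my project:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective, returned as a loss to minimise.

    ratio     - pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage - advantage estimate for each sample
    eps       - clip range; 0.2 is the commonly used default
    """
    unclipped = ratio * advantage
    # Clipping the ratio keeps each update close to the old policy.
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # Pessimistic bound: take the element-wise minimum, then negate.
    return -np.minimum(unclipped, clipped).mean()
```

The clip is what makes PPO stable without TRPO's second-order trust-region machinery.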
We could also use imitation learning, where the learner tries to mimic an expert's actions in order to achieve the best performance, possibly by implementing the DAgger algorithm.
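As a sketch of what DAgger actually does: roll out the current policy, have the expert relabel every state it visited, aggregate that data, and refit. The `expert`, `env_states`, and `predict` helpers here are hypothetical stand-ins, not part of any implementation from this project:

```python
def dagger(expert, env_states, predict, rounds=3):
    """Minimal DAgger loop over hypothetical helpers.

    expert(s)          -> expert action for state s
    env_states(policy) -> list of states the policy actually visits
    predict(S, A)      -> policy fit on aggregated state/action pairs
    """
    data_s, data_a = [], []
    policy = lambda s: 0  # start from a trivial policy
    for _ in range(rounds):
        visited = env_states(policy)             # run the learner's policy
        data_s += visited
        data_a += [expert(s) for s in visited]   # expert relabels those states
        policy = predict(data_s, data_a)         # supervised refit on aggregate
    return policy
```

The key trick is that the expert labels states the *learner* reaches, so the policy learns to recover from its own mistakes instead of only seeing the expert's trajectories.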
I know I just dropped a bunch of algorithms, but if you're interested, I'll be going deeper into these in a future article.
That’s it for now ✌