Visual-based Robotic Tracking of an Arbitrarily Defined Object


The ultimate goal of this project is to manipulate arbitrary objects with a robotic manipulator through an on-board vision system. Currently, we are able to track any single object using PySOT (an open-source toolkit developed by the SenseTime Video Intelligence Research Team), which implements state-of-the-art single-object tracking algorithms, including SiamRPN and SiamMask. Together with an RGB-D camera (Intel RealSense D435), the 3D position of the designated object can be computed, allowing it to be manipulated by a robotic arm (KUKA LBR iiwa14 R820).


  1. Take a snapshot using the RGB-D camera.
  2. Select the object of interest in the snapshot.
  3. The online object tracking and segmentation algorithm then tracks the designated object in the pixel frame.
  4. Convert the object's 2D pixel position to a 3D position in the camera frame using the aligned depth stream and the librealsense SDK.
  5. Transform the 3D position of the object into the robot's frame.
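The pixel-to-camera-frame conversion in step 4 can be sketched as a pinhole-model deprojection. This is the computation that librealsense's `rs2_deproject_pixel_to_point` performs (ignoring lens distortion); the intrinsics below are illustrative placeholders, not the actual D435 calibration values.

```python
import numpy as np

def deproject_pixel_to_point(pixel, depth, intrinsics):
    """Map a pixel (u, v) plus its aligned depth reading (meters) to a
    3D point in the camera frame, using the pinhole camera model."""
    u, v = pixel
    fx, fy = intrinsics["fx"], intrinsics["fy"]
    cx, cy = intrinsics["ppx"], intrinsics["ppy"]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.array([x, y, depth])

# Hypothetical intrinsics for illustration only (fx, fy in pixels;
# ppx, ppy the principal point). Real values come from the camera's
# calibration via the librealsense SDK.
K = {"fx": 615.0, "fy": 615.0, "ppx": 320.0, "ppy": 240.0}

# A pixel at the principal point deprojects straight down the optical axis.
point = deproject_pixel_to_point((320.0, 240.0), 0.5, K)
```

In practice one would query the intrinsics from the aligned depth stream and call the SDK's deprojection routine directly; the sketch above only shows the underlying math.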

Now the robot knows where the object of interest is located, and manipulating it becomes possible.
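The camera-to-robot conversion in step 5 amounts to applying a homogeneous transform to the camera-frame point. The 4x4 matrix below is a hypothetical camera pose chosen for illustration (camera 1 m above the robot base, looking straight down); the real transform comes from the hand-eye calibration of the actual setup.

```python
import numpy as np

def camera_to_robot(point_cam, T_robot_cam):
    """Apply a 4x4 homogeneous transform to a 3D point, returning
    the point expressed in the robot's base frame."""
    p_homogeneous = np.append(point_cam, 1.0)
    return (T_robot_cam @ p_homogeneous)[:3]

# Hypothetical extrinsics: camera mounted 1 m above the robot base,
# optical axis pointing down (camera z maps to robot -z).
T_robot_cam = np.array([
    [1.0,  0.0,  0.0, 0.0],
    [0.0, -1.0,  0.0, 0.0],
    [0.0,  0.0, -1.0, 1.0],
    [0.0,  0.0,  0.0, 1.0],
])

# An object 0.5 m in front of the camera ends up 0.5 m above the base.
p_robot = camera_to_robot(np.array([0.0, 0.0, 0.5]), T_robot_cam)
```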


We have two demonstrations of this workflow, and the method can be extended to more advanced applications such as pick-and-place.

Poke an Object

Track an Object