Autonomous Item Delivery Robot Using Boston Dynamics Spot

The goal of this project was to create an autonomous item delivery robot built on the Boston Dynamics Spot platform. The system was designed to identify and deliver specific items to designated locations using computer vision and a flexible, programmable control system. The project combined robotics, embedded systems, machine learning, and software development into a fully functional solution capable of real-time item recognition and delivery.

To accomplish this, we used Boston Dynamics’ Spot robot as the mobile base platform, leveraging its advanced mobility and built-in sensors. The computational backbone of the system was an NVIDIA Jetson, which processed visual data using a custom-trained TensorFlow image recognition model. This allowed the robot to recognize specific objects or packages and determine the correct delivery path based on visual input. We also made extensive use of Spot’s built-in SDK and Mission Control system to program and execute delivery tasks.

My Role:

I was primarily responsible for integrating the image recognition model with the hardware and control architecture. This included deploying the TensorFlow model on the Jetson, ensuring it could perform real-time inference under constrained conditions, and building the communication pipeline between the model, the Jetson system, and Spot’s control interface.
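To make the deployment concrete, below is a minimal sketch of what the Jetson-side recognition loop might look like, assuming a TensorFlow SavedModel and an OpenCV camera feed; the model path, input size, and label names are placeholders rather than the actual project values.

```python
# inference_loop.py -- minimal sketch of the Jetson-side recognition loop.
# Assumes a TensorFlow SavedModel whose output is a vector of class probabilities;
# the model path, input size, and label set below are placeholders.
import cv2
import numpy as np
import tensorflow as tf

MODEL_DIR = "models/item_classifier"         # placeholder path
LABELS = ["package_a", "package_b", "none"]  # placeholder label set
INPUT_SIZE = (224, 224)

model = tf.saved_model.load(MODEL_DIR)
infer = model.signatures["serving_default"]

def classify_frame(frame_bgr):
    """Run one camera frame through the model and return (label, confidence)."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, INPUT_SIZE)
    batch = tf.convert_to_tensor(resized[np.newaxis], dtype=tf.float32) / 255.0
    scores = list(infer(batch).values())[0].numpy()[0]  # assumes softmax output
    idx = int(np.argmax(scores))
    return LABELS[idx], float(scores[idx])

def main():
    cap = cv2.VideoCapture(0)  # camera index/pipeline is deployment-specific
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            label, conf = classify_frame(frame)
            print(f"prediction={label} confidence={conf:.2f}")
    finally:
        cap.release()

if __name__ == "__main__":
    main()
```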

In addition, I developed a custom API that connected the image recognition results with the robot’s Mission Control system. This API served as the core logic engine, translating the model’s predictions into commands that Spot could execute—such as moving to a location, picking up or placing an item, and completing a delivery mission.
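A simplified sketch of that bridge is shown below, using Spot SDK calls I am confident exist (client creation, authentication, stand, and trajectory commands). The hostname, credentials, and label-to-waypoint table are illustrative stand-ins for the real Mission Control integration, and lease acquisition, E-Stop registration, and motor power-on are omitted for brevity.

```python
# delivery_api.py -- simplified sketch of the prediction-to-command bridge.
# The hostname, credentials, and label-to-waypoint table are illustrative placeholders;
# lease acquisition, E-Stop, and power-on are omitted, and in the real system these
# actions were driven through Spot's Mission Control rather than raw commands.
import time

import bosdyn.client
from bosdyn.client.frame_helpers import ODOM_FRAME_NAME
from bosdyn.client.robot_command import (RobotCommandBuilder, RobotCommandClient,
                                          blocking_stand)

# Hypothetical mapping from recognized labels to delivery waypoints (x, y, heading).
DELIVERY_WAYPOINTS = {
    "package_a": (2.0, 0.0, 0.0),
    "package_b": (0.0, 3.0, 1.57),
}

def connect(hostname, username, password):
    """Create an authenticated, time-synced connection to Spot."""
    sdk = bosdyn.client.create_standard_sdk("DeliveryAPI")
    robot = sdk.create_robot(hostname)
    robot.authenticate(username, password)
    robot.time_sync.wait_for_sync()
    return robot

def dispatch(robot, label):
    """Translate a recognized label into a trajectory command Spot can execute."""
    if label not in DELIVERY_WAYPOINTS:
        return False  # unrecognized item: leave the current mission untouched
    x, y, heading = DELIVERY_WAYPOINTS[label]
    command_client = robot.ensure_client(RobotCommandClient.default_service_name)
    blocking_stand(command_client)
    cmd = RobotCommandBuilder.synchro_se2_trajectory_point_command(
        goal_x=x, goal_y=y, goal_heading=heading, frame_name=ODOM_FRAME_NAME)
    command_client.robot_command(cmd, end_time_secs=time.time() + 20.0)
    return True
```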

My work also involved debugging and fine-tuning performance to ensure the system could handle dynamic environments and make decisions based on new visual input without requiring constant human intervention.

Challenges Faced and How I Overcame Them:

Integrating TensorFlow with Spot’s SDK and Mission Control:

One of the biggest initial challenges was integrating our TensorFlow-based image recognition model with Spot’s proprietary SDK and Mission Control system. These systems were not built to work together out of the box, and we had to bridge different programming environments and data formats.

I addressed this by writing a middleware service on the Jetson that could take the output of the image recognition model and convert it into a structured format readable by the Spot SDK. I also had to carefully map the image recognition labels to mission task triggers within Spot’s Mission Control. This involved a significant amount of trial and error, as well as close reading of the Spot SDK documentation to understand how to create custom missions dynamically based on external input.
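The sketch below illustrates the general shape of that middleware under assumed names: the message schema, trigger identifiers, and confidence cutoff are invented for illustration, not the production definitions.

```python
# middleware.py -- sketch of the Jetson middleware that normalizes model output
# into a structured message and maps labels to mission triggers.
# The schema fields, trigger names, and cutoff are illustrative, not the real ones.
import json
import time
from dataclasses import asdict, dataclass

# Hypothetical mapping from classifier labels to Mission Control task triggers.
LABEL_TO_TRIGGER = {
    "package_a": "deliver_to_station_1",
    "package_b": "deliver_to_station_2",
}

@dataclass
class Detection:
    label: str
    confidence: float
    timestamp: float

def to_mission_message(det: Detection, min_confidence: float = 0.8):
    """Convert a raw detection into a JSON message the Spot-side service consumes.

    Returns None when the label is unknown or the confidence is too low,
    so no mission trigger fires on uncertain input.
    """
    trigger = LABEL_TO_TRIGGER.get(det.label)
    if trigger is None or det.confidence < min_confidence:
        return None
    return json.dumps({"trigger": trigger, **asdict(det)})

# Example: a confident known label becomes a trigger message; an unknown label does not.
print(to_mission_message(Detection("package_a", 0.93, time.time())))
print(to_mission_message(Detection("coffee_mug", 0.97, time.time())))
```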

Optimizing Image Recognition for Real-Time Use on the Jetson:

Another key challenge was achieving fast, reliable image recognition on the Jetson platform. The original TensorFlow model was not optimized for real-time performance and caused noticeable delays in decision-making.

To solve this, I converted the TensorFlow model to run through TensorRT, NVIDIA’s high-performance inference optimizer and runtime. This greatly reduced latency and improved throughput. I also used multi-threading to decouple the inference pipeline from the main mission-control logic, allowing the two to run in parallel without bottlenecking each other.
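A rough sketch of both steps follows: an offline TF-TRT conversion of the SavedModel, and a queue-based worker thread that keeps inference off the mission-control loop. The model paths and FP16 precision are assumptions, not the exact settings we shipped.

```python
# optimize_and_decouple.py -- sketch of the two optimizations described above:
# (1) converting the SavedModel with TF-TRT for faster Jetson inference, and
# (2) running inference in its own thread so mission logic never blocks on it.
# Paths and FP16 precision are assumptions, not the exact production settings.
import queue
import threading

from tensorflow.python.compiler.tensorrt import trt_convert as trt

SAVED_MODEL_DIR = "models/item_classifier"      # placeholder
TRT_MODEL_DIR = "models/item_classifier_trt"    # placeholder

def convert_to_tensorrt():
    """One-time offline step: rewrite supported ops as TensorRT engines."""
    params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=SAVED_MODEL_DIR, conversion_params=params)
    converter.convert()
    converter.save(TRT_MODEL_DIR)

# Runtime decoupling: the inference thread feeds a queue, mission logic drains it.
predictions = queue.Queue(maxsize=1)  # keep only the freshest result

def inference_worker(classify_frame, get_frame, stop_event):
    """Continuously classify frames without blocking the mission-control loop."""
    while not stop_event.is_set():
        result = classify_frame(get_frame())
        try:
            predictions.get_nowait()   # drop any stale prediction
        except queue.Empty:
            pass
        predictions.put(result)

def start_inference_thread(classify_frame, get_frame):
    stop_event = threading.Event()
    worker = threading.Thread(target=inference_worker,
                              args=(classify_frame, get_frame, stop_event),
                              daemon=True)
    worker.start()
    return stop_event
```

The mission logic simply polls the queue for the latest prediction on its own schedule, so a slow inference pass delays the next prediction rather than the robot's control loop.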

Building a Stable and Flexible API:

The API I developed had to act as a bridge between multiple systems—handling real-time inference outputs, interacting with Spot’s mission control, and managing potential interruptions or edge cases (such as unrecognized objects or conflicting commands).

Designing this API required careful thought around system architecture, reliability, and scalability. I implemented robust error handling, including timeouts and fail-safes, so that the robot could recover from unexpected conditions without manual intervention. I also designed the API to be modular, so that new features or object types could be added easily later on.
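The snippet below sketches the timeout/fail-safe pattern in isolation; the function names and the five-second budget are illustrative assumptions rather than the API's actual interface.

```python
# api_reliability.py -- sketch of the timeout/fail-safe pattern in the API layer.
# The handler names and the 5-second budget are illustrative assumptions.
import concurrent.futures
import logging

logger = logging.getLogger("delivery_api")
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def run_with_timeout(step, timeout_s=5.0, fallback=None):
    """Run one mission step with a hard time budget.

    On timeout or error the caller receives the fallback result (e.g. "hold
    position") instead of hanging; a truly stuck step is abandoned in its
    worker thread rather than blocking the rest of the mission.
    """
    future = _pool.submit(step)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        logger.warning("step %s timed out after %.1fs", step.__name__, timeout_s)
    except Exception:
        logger.exception("step %s failed", step.__name__)
    return fallback() if fallback is not None else None
```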

Testing in Dynamic Environments:

Once the technical pieces were working, testing in real-world scenarios brought new challenges—lighting conditions, object positioning, and environmental variability all affected image recognition and path planning.

I conducted extensive tests in varied environments and iteratively improved both the model and the mission logic to increase accuracy and robustness. This included augmenting the training dataset, implementing real-time confidence scoring, and setting thresholds for decision-making to avoid misclassification errors.
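As an illustration of the confidence gating, the sketch below rejects low-confidence or ambiguous predictions instead of acting on them; the specific threshold values are placeholders, not the tuned numbers from testing.

```python
# confidence_gate.py -- sketch of the confidence gating added after field testing.
# Threshold values and the margin rule are illustrative, not the tuned numbers.
import numpy as np

CONFIDENCE_THRESHOLD = 0.85   # below this, treat the frame as "no decision"
MARGIN_THRESHOLD = 0.20       # top-1 must beat top-2 by at least this much

def gated_prediction(scores, labels):
    """Return a label only when the model is confidently unambiguous, else None."""
    order = np.argsort(scores)[::-1]
    top1, top2 = scores[order[0]], scores[order[1]]
    if top1 < CONFIDENCE_THRESHOLD or (top1 - top2) < MARGIN_THRESHOLD:
        return None  # defer: keep observing rather than risk a misdelivery
    return labels[order[0]]

# Example: an ambiguous frame is rejected, a clear winner passes.
labels = ["package_a", "package_b", "none"]
print(gated_prediction(np.array([0.55, 0.40, 0.05]), labels))  # None (ambiguous)
print(gated_prediction(np.array([0.93, 0.05, 0.02]), labels))  # "package_a"
```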