LLM Agent Control of an Autonomous Robot

Generalized software for LLMs to control autonomous robots using LangChain and WebRTC-based real-time communication

Overview

This project explores using Large Language Models (LLMs) to control a ROS2-based robot (A24). By integrating LangChain for text-based control and LiveKit WebRTC for speech-based interaction, we demonstrate how LLMs can bridge the gap between humans and robots. The project also leverages the OpenAI API for real-time decision-making, enabling natural-language queries to be translated into robotic actions. Be sure to check out the associated publication on my website for this project.

Objectives

  • Use LangChain to allow text-to-text control of the A24 robot.
  • Use LiveKit WebRTC to enable speech-to-speech interaction with the robot.
  • Enhance robot capabilities with ROS2, Nav2, and custom tools for robust navigation and control.
  • Demonstrate tool chaining with LangChain for complex queries.

Key Features

  1. LangChain Integration
    • Added tools to LangChain, allowing the robot to:
      • Navigate to specific coordinates.
      • Capture and describe images using OpenAI’s API.
      • Determine distances from depth cameras.
      • Echo ROS2 topics for debugging and odometry readings.
    • Tool chaining enables handling multi-step queries, e.g., determining the robot’s location and the current time in a single request (see the LangChain tool sketch after this list).
  2. LiveKit WebRTC
    • Developed a speech-to-speech interaction system using LiveKit’s real-time communication platform.
    • Integrated a pipeline: Speech-to-Text (STT) → LLM query → Text-to-Speech (TTS).
    • Allowed seamless interaction through a smartphone or other devices (see the voice-pipeline sketch after this list).
  3. ROS2 Integration
    • Controlled robot navigation using the Nav2 stack and SLAM Toolbox.
    • Used a Gazebo simulation to validate the system’s performance.
    • Enhanced localization with Adaptive Monte Carlo Localization (AMCL).
  4. Simulation Setup
    • Simulated the A24 robot in a Gazebo environment representing a home.
    • Generated maps using slam_toolbox and navigated autonomously within the environment.
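
The exact tool and agent definitions live in agent_code/; the sketch below only illustrates the general pattern of exposing a ROS2 command as a LangChain tool for a tool-calling agent. It assumes the langchain, langchain-openai, and rclpy packages, and the node, tool, and model names used here (llm_agent_node, move_forward, gpt-4o) are illustrative rather than the project's actual identifiers.

```python
import rclpy
from geometry_msgs.msg import Twist
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# Minimal ROS2 setup: one node and a velocity publisher (node name is hypothetical).
rclpy.init()
node = rclpy.create_node("llm_agent_node")
cmd_vel_pub = node.create_publisher(Twist, "/cmd_vel", 10)

@tool
def move_forward(speed: float = 0.2) -> str:
    """Drive the robot forward at the given linear speed in m/s."""
    msg = Twist()
    msg.linear.x = speed
    cmd_vel_pub.publish(msg)
    return f"Moving forward at {speed} m/s."

tools = [move_forward]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You control a ROS2 robot. Use the available tools to act."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(ChatOpenAI(model="gpt-4o"), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

print(executor.invoke({"input": "Move forward."})["output"])
```

The navigation, image-description, depth, and topic-echo tools follow the same pattern: each wraps a ROS2 publisher, subscriber, service, or action call behind a typed, documented Python function that the LLM can select and chain.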
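
The LiveKit side implements the STT → LLM query → TTS pipeline described above. A minimal sketch, assuming the livekit-agents framework with its OpenAI and Silero plugins; the plugin choices and class names here (e.g., VoicePipelineAgent) are assumptions based on that framework, not necessarily what agent_code/ uses:

```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero

async def entrypoint(ctx: JobContext):
    # Join the room audio-only; the smartphone or browser client streams the user's speech.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # STT -> LLM -> TTS pipeline; robot-control functions could be attached as callable tools.
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
    )
    agent.start(ctx.room)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Because recognition, the LLM query, and synthesis all run server-side, the client device only needs to stream audio, which keeps on-device computation minimal.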

Results

LangChain

  • Successfully executed commands like:
    • “Move forward.”
    • “Where are you?” (coordinates read from the /odom topic; see the sketch after this list).
  • Demonstrated robust tool chaining to solve multi-step queries.
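
As a concrete example, the “Where are you?” query can be served by a tool that reads a single message from /odom. A minimal sketch, assuming rclpy and a running robot or simulation publishing nav_msgs/Odometry; the node and tool names are illustrative:

```python
import rclpy
from langchain_core.tools import tool
from nav_msgs.msg import Odometry

rclpy.init()
node = rclpy.create_node("odom_reader")  # hypothetical node name

@tool
def where_are_you() -> str:
    """Return the robot's current (x, y) position read from the /odom topic."""
    latest = {}
    sub = node.create_subscription(
        Odometry, "/odom", lambda msg: latest.update(pose=msg.pose.pose), 10
    )
    # Spin until one odometry message arrives, giving up after roughly 5 seconds.
    for _ in range(50):
        rclpy.spin_once(node, timeout_sec=0.1)
        if latest:
            break
    node.destroy_subscription(sub)
    if not latest:
        return "No odometry received."
    p = latest["pose"].position
    return f"I am at x={p.x:.2f}, y={p.y:.2f} in the odom frame."
```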

LiveKit

  • Enabled speech-to-speech queries such as:
    • “Make the robot move forward.”
    • “Stop the robot.”
  • Highlighted limitations in tool chaining compared to LangChain.

ROS2 & Navigation

  • Generated accurate maps using SLAM Toolbox and navigated autonomously.
  • Tested navigation with waypoints, achieving high accuracy in localization (see the waypoint sketch below).
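
The waypoint tests follow the pattern sketched below using the nav2_simple_commander API; the coordinates are placeholders and the project's own test scripts may differ:

```python
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult

rclpy.init()
navigator = BasicNavigator()
navigator.waitUntilNav2Active()  # blocks until AMCL and the Nav2 lifecycle nodes are up

def make_pose(x: float, y: float) -> PoseStamped:
    """Build a map-frame goal pose facing forward."""
    pose = PoseStamped()
    pose.header.frame_id = "map"
    pose.header.stamp = navigator.get_clock().now().to_msg()
    pose.pose.position.x = x
    pose.pose.position.y = y
    pose.pose.orientation.w = 1.0
    return pose

# Placeholder waypoints inside the simulated home environment.
waypoints = [make_pose(1.5, 0.0), make_pose(2.0, 1.0), make_pose(0.0, 0.0)]
navigator.followWaypoints(waypoints)

while not navigator.isTaskComplete():
    pass  # isTaskComplete() spins the node; progress is available via navigator.getFeedback()

if navigator.getResult() == TaskResult.SUCCEEDED:
    print("All waypoints reached.")
```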

Tools and Libraries Used

  • LangChain: LLM framework for integrating tools.
  • LiveKit WebRTC: Real-time speech-to-speech interaction.
  • OpenAI API: For natural language processing and image descriptions.
  • ROS2: Robot Operating System for middleware and control.
  • Nav2: Navigation stack for autonomous movement.
  • slam_toolbox: For mapping and localization.
  • Gazebo: Robot simulation environment.

Repository Structure

  • agent_code/:
    • Contains LangChain and LiveKit agents.
    • Includes tools for image recognition, navigation, and querying ROS2 topics.
  • launch/:
    • Launch files for AMCL, SLAM, and navigation (see the launch-file sketch after this list).
  • maps/:
    • Includes pre-generated maps for testing navigation.
  • urdf/:
    • Unified Robot Description Format files for the A24 robot.
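
A ROS2 launch file of this kind typically composes existing bringup launches. The sketch below shows that pattern, assuming the upstream slam_toolbox and nav2_bringup packages; the included launch-file names are the upstream defaults, not necessarily the ones this repository wires together:

```python
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource

def generate_launch_description():
    # Online SLAM from slam_toolbox, using simulation time for Gazebo runs.
    slam = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(os.path.join(
            get_package_share_directory("slam_toolbox"),
            "launch", "online_async_launch.py")),
        launch_arguments={"use_sim_time": "true"}.items(),
    )
    # Nav2 planners, controllers, and behavior servers.
    nav2 = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(os.path.join(
            get_package_share_directory("nav2_bringup"),
            "launch", "navigation_launch.py")),
        launch_arguments={"use_sim_time": "true"}.items(),
    )
    return LaunchDescription([slam, nav2])
```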

Challenges and Learnings

  • Tool Chaining:
    LangChain demonstrated better tool chaining capabilities than LiveKit, allowing multi-step task execution.

  • Sim2Real:
    Testing in Gazebo provided insights into real-world deployment challenges, especially in localization and navigation.

Future Work

  • Integrate the system with a physical A24 robot for real-world testing.
  • Expand the toolset for LangChain and LiveKit to handle more complex tasks.
  • Improve LiveKit’s ability to chain tools for multi-step queries.

Conclusion

This project showcases the power of LLMs in human-robot interaction. By combining LangChain’s tool integration with LiveKit’s real-time communication capabilities, we demonstrate a scalable, low-computation solution for controlling autonomous robots using natural language.