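  • [promptulate] A powerful framework for LLM automation and application development. It supports multi-turn conversation, role presets, conversation storage, tool extensions, and more. It works out of the box and can be accessed directly without a proxy. With promptulate, you can easily build your own GPT application.
  • [cushy-storage] An ORM framework based on disk caching that makes it easy to store data and objects.
  • [cushy-serial] A lightweight Python serial library. You can create a serial program easily.
  • [cushy-socket] A lightweight Python socket library. You can create a TCP/UDP connection easily.
  • [broadcast-service] A powerful Python publish-subscribe framework that supports synchronous and asynchronous calls, scheduled tasks, topic management, and more.
  • A Markdown image-link converter that lets you easily convert web image URLs to local paths or to URLs on an image host of your choice.
  event: a platform where university students can find competition teammates, communicate, and share.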





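This article outlines an approach that uses prompt techniques to implement Robo-specific functions, such as turning user input into structured information or building specific instructions. Below, it describes how to build a robot that can be controlled with ChatGPT, covering the technical background and the design approach.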



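  1. Use the navigation and grasping tasks of a composite robot (mobile manipulator), indoors or outdoors, as the test scenario. Given an instruction (for example: "Go to that BYD in front and check whether anything important was left behind."), use an LLM (such as ChatGPT) to output the robot's perception, planning, and control algorithms.
  2. Verify this process in closed loop in a simulation environment.
  3. Make it deployable in real-world application scenarios, developing the LLM-based robot framework through ROS or RoboSDK.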








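A simple implementation can follow Microsoft's PromptCraft-Robotics. The PromptCraft-Robotics repository provides a community where people can test and share interesting prompt examples for large language models (LLMs) in the robotics domain. It also provides a sample robot simulator (built on Microsoft AirSim) integrated with ChatGPT so that users can get started.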



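First, we define a set of high-level robot APIs or a function library. This library can be specific to a particular robot, and it should map to existing low-level implementations from the robot's control stack or perception library. It is very important to use descriptive names for the high-level APIs so that ChatGPT can infer their behavior.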

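Next, we write a text prompt for ChatGPT that describes the task goal and explicitly states which functions of the high-level library are available. The prompt can also contain information about task constraints, or about how ChatGPT should form its answer (using a specific programming language, using auxiliary parsing elements).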

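The user stays in the loop to evaluate ChatGPT's code output, either by inspecting it directly or by running it in a simulator. If needed, the user gives ChatGPT feedback in natural language on the quality and safety of the answer. When the user is satisfied with the solution, the final code can be deployed to the robot.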
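The first two steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: `RobotAPI` and `build_prompt` are hypothetical names that do not come from PromptCraft-Robotics, and the method bodies are left empty where a real robot would call into its control stack. The point is that descriptive method names and docstrings can be turned directly into the function list that the prompt exposes to ChatGPT.

```python
import inspect


class RobotAPI:
    """Hypothetical high-level API; each method would wrap a low-level call."""

    def turn(self, angle: float) -> None:
        """Turn the robot by a given number of degrees."""

    def move(self, distance: float) -> None:
        """Move the robot straight forward by a given distance in meters."""


def build_prompt(api: type, task: str, constraints: str) -> str:
    """List every public method (name, parameters, docstring), then the task."""
    lines = ["You may only use the following functions:"]
    for name, fn in inspect.getmembers(api, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        params = [p for p in inspect.signature(fn).parameters if p != "self"]
        lines.append(f"- {name}({', '.join(params)}): {inspect.getdoc(fn)}")
    lines.append(f"Constraints: {constraints}")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


PROMPT = build_prompt(RobotAPI, "go to the chairs", "Reply with one call per line.")
print(PROMPT)
```

Generating the function list from the API itself keeps the prompt and the library in sync; the third step (human review before deployment) stays manual.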


Imagine I am a robot equipped with a camera and a depth sensor. I am trying to perform a task, and you should help me by sending me commands. You are only allowed to give me the following commands:

  • turn(angle): turn the robot by a given number of degrees
  • move(distance): moves the robot straight forward by a given distance in meters.

On each step, I will provide you with the objects in the scene as a list of <object name, distance, angle in degrees>. You should reply with only one command at a time.
The distance is in meters, and the direction angle in degrees with respect to the robot’s orientation. Negative angles are to the left and positive angles are to the right. If a command is not valid, I will ignore it and ask you for another command. If there is no relevant information in the scene, use the available commands to explore the environment.

Task: go to the chairs
Objects in the scene: <door, 0.53 m, 22>, <chair, 4.84 m, -21>, <chair, 2.12 m, -24>, <window, 6.74 m, -3>, <window, 7.17 m, 6>, <door, 0.61 m, 18>, <stairs, 1.42 m, 0>

move(1.5) # Move forward by 1.5 meters to explore the environment. 

Objects in the scene: <seating, 5.05 m, 21>, <chair, 3.93 m, -25>, <window, 4.55 m, 22>, <window, 5.24 m, -4>, <stairs, 3.49 m, 13>, <window, 5.67 m, 7>, <window, 4.37 m, 23>

turn(-25) # Turn to the left to face the chair that is 3.93 meters away. 

Objects in the scene: <chair, 4.85 m, -14>, <chair, 4.76 m, -9>, <chair, 1.57 m, -23>, <chair, 2.66 m, -24>, <window, 4.96 m, 10>, <window, 4.19 m, 25>

turn(-10) # Turn to the right to explore the environment further. 

Objects in the scene: <door, 5.07 m, -25>, <chair, 5.36 m, -6>, <table, 2.17 m, -24>, <chair, 5.01 m, -1>, <chair, 1.80 m, -16>, <chair, 3.15 m, -16>, <window, 4.30 m, 19>

turn(-6) # Turn to the right to face the chair that is 5.36 meters away. 

Objects in the scene: <counter, 1.24 m, -24>, <door, 5.93 m, -17>, <chair, 5.48 m, 1>, <table, 2.60 m, -14>, <chair, 4.92 m, 6>, <chair, 2.02 m, -7>, <chair, 3.53 m, -7>, <chair, 3.16 m, -18>

turn(1) # Turn to the left to explore the environment further. 

Objects in the scene: <door, 5.07 m, -25>, <chair, 5.36 m, -6>, <table, 2.17 m, -24>, <chair, 5.01 m, -1>, <chair, 1.80 m, -16>, <chair, 3.15 m, -16>, <window, 4.30 m, 19>

move(5.5) # Move forward towards the chair that is 5.36 meters away. 

Objects in the scene: <chair, 0.96 m, -10>, <chair, 0.99 m, 20>, <chair, 1.77 m, -20>

move(1) # Move forward to approach the chairs. 

Full Navigation
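On the robot side, the exchange above needs two small pieces: something that renders detections in the `<object name, distance, angle>` format, and something that accepts only the two allowed commands and ignores everything else. The sketch below is one possible way to do that, not the code used by the original demo. `format_scene` and `parse_command` are hypothetical names, and rejecting negative distances for `move` is an assumption based on the prompt saying the robot moves "straight forward".

```python
import re
from typing import List, Optional, Tuple

SceneObject = Tuple[str, float, float]  # (name, distance in meters, angle in degrees)

# Accept exactly one turn(<number>) or move(<number>), with an optional "# comment".
COMMAND_RE = re.compile(r"^\s*(turn|move)\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*(?:#.*)?$")


def format_scene(objects: List[SceneObject]) -> str:
    """Render detections in the format the prompt promises to the model."""
    items = ", ".join(f"<{name}, {dist:.2f} m, {angle:g}>" for name, dist, angle in objects)
    return f"Objects in the scene: {items}"


def parse_command(reply: str) -> Optional[Tuple[str, float]]:
    """Return (command, value), or None so the caller can ask for another command."""
    match = COMMAND_RE.match(reply)
    if match is None:
        return None
    name, value = match.group(1), float(match.group(2))
    if name == "move" and value < 0:
        return None  # assumption: move() only goes forward
    return name, value
```

A reply such as `turn(-25) # Turn to the left ...` parses to `("turn", -25.0)`, while a reply containing prose, an unknown function, or several commands returns `None` and would be ignored, as the prompt states.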



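  • When GPT is used, the timeliness of its responses cannot be guaranteed. How do we deal with the uncertainty in how long GPT takes to return an instruction?
  • If the interval between instruction updates is too long, how should the intermediate process between two instructions be handled?
  • How should the detection mechanism be handled when some of the sensors fail?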
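These questions are left open here. As one possible direction for the first of them (an assumption of this sketch, not something the source proposes), the control loop could put a deadline on every LLM request and fall back to a safe command when the reply is late. `command_with_deadline`, `SAFE_COMMAND`, and the `stop()` command are hypothetical, and `slow_llm` merely simulates a slow model.

```python
import concurrent.futures
import time
from typing import Callable

SAFE_COMMAND = "stop()"  # hypothetical command that halts the robot


def command_with_deadline(ask_llm: Callable[[str], str], prompt: str, timeout_s: float) -> str:
    """Return the LLM's command, or SAFE_COMMAND if it does not arrive in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ask_llm, prompt)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return SAFE_COMMAND
    finally:
        pool.shutdown(wait=False)  # do not block the control loop on a late reply


def slow_llm(prompt: str) -> str:
    """Stand-in for a model call that takes longer than the deadline."""
    time.sleep(0.3)
    return "move(1)"


print(command_with_deadline(slow_llm, "Task: go to the chairs", timeout_s=0.05))
```

A late reply is simply discarded in this sketch; whether it should instead be re-validated against a fresh scene is a separate design decision.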

Overall, this is only a simple demo. For more prompt demos of this kind, see https://github.com/microsoft/PromptCraft-Robotics



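In the early stage of the project, we can rely on LangChain and use the agent + tool approach to build a RoboAgent that integrates task planning, task analysis, instruction generation, and task execution. To explain how RoboAgent can handle complex Robo tasks, I will first introduce the related prompt-technique concepts: LangChain, agent, tool, and ReAct.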



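If you want to build complex LLM applications, I strongly recommend LangChain (admittedly with some bias, since I am one of LangChain's developers). LangChain is a powerful framework designed to help developers build end-to-end applications with language models. It provides a set of tools, components, and interfaces that simplify creating applications powered by large language models (LLMs) and chat models. LangChain makes it easy to manage interactions with language models, chain multiple components together, and integrate additional resources such as APIs and databases.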


paper: https://arxiv.org/pdf/2210.03629.pdf


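  • Thought: given this Question, what should I do next.
  • Action: perform an action. ReAct has three kinds of actions. The first is Search[entity], which returns the first 5 sentences of the entity's Wikipedia page if that page exists, and otherwise uses the Wikipedia search engine to find the top 5 similar entities. The second is Lookup[string], which returns the next sentence in the page that contains the string, simulating the Ctrl+F function of a browser. The third is Finish[answer], which finishes the current task with the answer.
  • Observation: the result returned by the external tool, which is fed back to ChatGPT as new prompt input.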
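The three action types are easy to handle programmatically because each is written as `Kind[argument]`. The following is a minimal sketch of how a program might parse an Action line and simulate the Ctrl+F style lookup. `parse_action` and `lookup` are hypothetical helper names, not part of the ReAct paper's code, and the page is represented simply as a list of sentences.

```python
import re
from typing import List, Optional, Tuple

ACTION_RE = re.compile(r"^(Search|Lookup|Finish)\[(.*)\]$")


def parse_action(line: str) -> Optional[Tuple[str, str]]:
    """Parse 'Action 2: Lookup[eastern sector]' into ('Lookup', 'eastern sector')."""
    _, _, body = line.partition(":")  # drop the 'Action n' label
    match = ACTION_RE.match(body.strip())
    return (match.group(1), match.group(2)) if match else None


def lookup(sentences: List[str], keyword: str, start: int = 0) -> Optional[Tuple[int, str]]:
    """Return the index and text of the next sentence containing keyword."""
    for index in range(start, len(sentences)):
        if keyword.lower() in sentences[index].lower():
            return index, sentences[index]
    return None


PAGE = [
    "The Colorado orogeny was an episode of mountain building.",
    "The eastern sector extends into the High Plains.",
]
print(parse_action("Action 2: Lookup[eastern sector]"))
print(lookup(PAGE, "eastern sector"))
```

Calling `lookup` again with `start` set to the previous index plus one yields the next match, which mirrors pressing Ctrl+F repeatedly.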

Below are some ReAct prompt demos:

Question: What is the elevation range for the area that the eastern sector of the
Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector
of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in
Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called
the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I
need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the
High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer
is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]

Question: Musician and satirist Allie Goertz wrote a song about the "The Simpsons"
character Milhouse, who Matt Groening named after who?
Thought 1: The question simplifies to "The Simpsons" character Milhouse is named after
who. I only need to search Milhouse and find who it is named after.
Action 1: Search[Milhouse]
Observation 1: Milhouse Mussolini Van Houten is a recurring character in the Fox animated
television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.
Thought 2: The paragraph does not tell who Milhouse is named after, maybe I can look up
"named after".
Action 2: Lookup[named after]
Observation 2: (Result 1 / 1) Milhouse was named after U.S. president Richard Nixon, whose
middle name was Milhous.
Thought 3: Milhouse was named after U.S. president Richard Nixon, so the answer is
Richard Nixon.
Action 3: Finish[Richard Nixon]

Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The
Saimaa Gesture?
Thought 1: I need to search Adam Clayton Powell and The Saimaa Gesture, and find which
documentary is about Finnish rock groups.
Action 1: Search[Adam Clayton Powell]
Observation 1: Could not find [Adam Clayton Powell]. Similar: [’Adam Clayton Powell
III’, ’Seventh Avenue (Manhattan)’, ’Adam Clayton Powell Jr. State Office
Building’, ’Isabel Washington Powell’, ’Adam Powell’, ’Adam Clayton Powell
(film)’, ’Giancarlo Esposito’].
Thought 2: To find the documentary, I can search Adam Clayton Powell (film).
Action 2: Search[Adam Clayton Powell (film)]
Observation 2: Adam Clayton Powell is a 1989 American documentary film directed by
Richard Kilberg. The film is about the rise and fall of influential
African-American politician Adam Clayton Powell Jr.[3][4] It was later aired
as part of the PBS series The American Experience.
Thought 3: Adam Clayton Powell (film) is a documentary about an African-American
politician, not Finnish rock groups. So the documentary about Finnish rock
groups must instead be The Saimaa Gesture.
Action 3: Finish[The Saimaa Gesture]

Question: What profession does Nicholas Ray and Elia Kazan have in common?
Thought 1: I need to search Nicholas Ray and Elia Kazan, find their professions, then
find the profession they have in common.
Action 1: Search[Nicholas Ray]
Observation 1: Nicholas Ray (born Raymond Nicholas Kienzle Jr., August 7, 1911 - June 16,
1979) was an American film director, screenwriter, and actor best known for
the 1955 film Rebel Without a Cause.
Thought 2: Professions of Nicholas Ray are director, screenwriter, and actor. I need
to search Elia Kazan next and find his professions.
Action 2: Search[Elia Kazan]
Observation 2: Elia Kazan was an American film and theatre director, producer, screenwriter
and actor.
Thought 3: Professions of Elia Kazan are director, producer, screenwriter, and actor.
So profession Nicholas Ray and Elia Kazan have in common is director,
screenwriter, and actor.
Action 3: Finish[director, screenwriter, actor]

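The earlier part serves as the few-shot examples, whose purpose is to teach ChatGPT how to use the tools. The final Question is the one we actually want to ask ChatGPT. For this question, ChatGPT answers with the help of external tools. The concrete result is as follows: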

Question: Author David Chanoff has collaborated with a U.S. Navy admiral who served as the ambassador to the United Kingdom under which President?
Thought 1: I need to search David Chanoff and the U.S. Navy admiral, find the ambassador to the United Kingdom, then find the President they served under.
Action 1: Search[David Chanoff]
Observation 1: David Chanoff is an American author and journalist. He is the co-author of
several books, including Inside the Mirage: America's Fragile Partnership
with Saudi Arabia (2004) and The Game Player: Confessions of the CIA's
Original Political Operative (1989).
Thought 2: I need to search the U.S. Navy admiral David Chanoff collaborated with.
Action 2: Search[U.S. Navy admiral]
Observation 2: Admiral Charles R. Larson was a United States Navy four-star admiral who
served as the United States Ambassador to the United Kingdom from 1998 to 2001.
Thought 3: Admiral Charles R. Larson served as the United States Ambassador to the
United Kingdom from 1998 to 2001. So the President they served under is
Bill Clinton.
Action 3: Finish[Bill Clinton]
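A trace like the one above comes from a loop that alternates between the model and the tools: the program appends each tool result as an Observation and calls the model again until a Finish action appears. The sketch below shows that loop under stated assumptions. `react_loop` and `scripted_llm` are hypothetical names, the model is replaced by canned replies so the example runs offline, and the Search tool returns stub text instead of querying Wikipedia.

```python
import re
from typing import Callable, Dict, Optional

ACTION_RE = re.compile(r"Action \d+: (\w+)\[(.*)\]")


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    few_shot: str = "",
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model output and tool observations until Finish[...] appears."""
    prompt = f"{few_shot}Question: {question}\n"
    for step in range(1, max_steps + 1):
        prompt += f"Thought {step}:"
        reply = llm(prompt)  # expected shape: " <thought>\nAction <n>: Kind[arg]"
        prompt += reply.rstrip() + "\n"
        match = ACTION_RE.search(reply)
        if match is None:
            return None  # the model left the expected format
        kind, arg = match.groups()
        if kind == "Finish":
            return arg
        tool = tools.get(kind)
        observation = tool(arg) if tool else f"Unknown action {kind}"
        prompt += f"Observation {step}: {observation}\n"
    return None  # step budget exhausted without an answer


def scripted_llm(replies):
    """Stand-in for a real model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


ANSWER = react_loop(
    "Which President?",
    scripted_llm([
        " I need to search David Chanoff.\nAction 1: Search[David Chanoff]",
        " The page names the admiral.\nAction 2: Finish[example answer]",
    ]),
    {"Search": lambda query: f"(stub page text for {query})"},
)
print(ANSWER)
```

With a real model, generation is usually stopped before the model writes its own Observation line, so that observations always come from the tools rather than from the model's imagination.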

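In LLM prompt engineering, an Agent is a higher-level executor responsible for scheduling and dispatching complex tasks. After the user gives the Agent a request, the Agent uses Action Plan Generation internally to break the request down into a series of plans. The Agent then executes each plan automatically and, using the ReAct prompting technique, observes the output of each step, draws its own conclusion from that output, and continues the task based on the conclusion until it believes it has obtained the desired result.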
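The control flow described here (plan, execute each step, observe, decide whether to continue) can be written down independently of any particular LLM. In the sketch below, `run_agent` and the three `stub_*` functions are hypothetical; in a real agent the planning and the "am I done" judgment would each be an LLM call, and `execute` would dispatch to a tool.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (plan step, observation) pairs


def run_agent(
    request: str,
    plan: Callable[[str, History], List[str]],
    execute: Callable[[str], str],
    is_done: Callable[[str, History], bool],
    max_rounds: int = 3,
) -> History:
    """Plan, execute each step, record the observation, stop when the goal is met."""
    history: History = []
    for _ in range(max_rounds):
        for step in plan(request, history):        # Action Plan Generation
            history.append((step, execute(step)))  # act and observe
            if is_done(request, history):          # the agent judges its own progress
                return history
    return history


def stub_plan(request: str, history: History) -> List[str]:
    """Fixed plan for the demo; already executed steps are not repeated."""
    executed = {step for step, _ in history}
    steps = ["navigate to the car", "inspect the car interior", "report findings"]
    return [step for step in steps if step not in executed]


def stub_execute(step: str) -> str:
    return f"ok: {step}"


def stub_is_done(request: str, history: History) -> bool:
    return bool(history) and history[-1][0] == "report findings"


HISTORY = run_agent(
    "Go to the car in front and check whether anything was left behind.",
    stub_plan, stub_execute, stub_is_done,
)
print(HISTORY)
```

Passing the history back into `plan` is what lets the agent replan after an unexpected observation; `max_rounds` bounds the loop if the goal is never reached.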




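  • RoboQueryTool: a tool for querying Robo instructions
  • RoboInfoTool: a tool for querying Robo's current information
  • RoboActionTool: a tool for Robo action instructions. This may end up not being a single RoboActionTool but a set of concrete action implementations, such as moving forward and moving backward.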
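To make the tool list concrete, here is a plain-Python sketch of how the three tools could be registered and dispatched by name. Only the tool names come from the list above. The `Tool` dataclass, the descriptions, the demo state, and the stub behaviors are all assumptions made for illustration, and the sketch deliberately avoids any specific framework API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    description: str  # shown to the LLM so it can pick a tool
    run: Callable[[str], str]


ROBOT_STATE = {"position": "(0.0, 0.0)", "battery": "87%"}  # made-up demo state
SUPPORTED = {"forward": "move forward by N meters", "backward": "move backward by N meters"}

TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in (
        Tool("RoboQueryTool", "Look up a supported Robo instruction by name.",
             lambda name: SUPPORTED.get(name, "unknown instruction")),
        Tool("RoboInfoTool", "Read one field of Robo's current state.",
             lambda key: ROBOT_STATE.get(key, "unknown field")),
        Tool("RoboActionTool", "Send an action such as 'forward 1.5' to Robo.",
             lambda command: f"executed: {command}"),  # stub: nothing is actuated
    )
}


def describe_tools(tools: Dict[str, Tool]) -> str:
    """Render the tool list for inclusion in the agent prompt."""
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools.values())


def call_tool(tools: Dict[str, Tool], name: str, argument: str) -> str:
    """Dispatch an action chosen by the LLM; unknown names become an observation."""
    tool = tools.get(name)
    return tool.run(argument) if tool else f"unknown tool: {name}"


print(describe_tools(TOOLS))
print(call_tool(TOOLS, "RoboInfoTool", "battery"))
```

Splitting RoboActionTool into one tool per concrete action, as the last bullet suggests, would only mean adding more entries to the registry.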











  • github: promptulate
  • github: PromptCraft
  • Google jointly releases the SayCan model, which lets robots give reasonable answers and also "do what they say"
  • github: LangChain
  • ChatGPT for Robotics: Design Principles and Model Abilities

