System Implementation

Our system consists of two main devices, the HoloLens 2 and the Spot robot, and we describe our implementation in terms of these two parts.

HoloLens

Communication

On the HoloLens side, we use Unity's ROS TCP Connector package for communication between Unity and ROS. Follow Mode and Arm Mode require sending messages continuously, so we track the time elapsed since the last message and only publish when it exceeds the publishing interval we set. We use the same message type but different topic names for the different modes; although the message types are identical, the concrete content differs per mode. All topics are registered when the application is launched.
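The following is a minimal sketch of this throttled publishing pattern using the Unity ROS TCP Connector API; the topic name, interval value, and message content are illustrative placeholders rather than our exact configuration.

```csharp
using Unity.Robotics.ROSTCPConnector;
using RosMessageTypes.Geometry;
using UnityEngine;

public class ThrottledPosePublisher : MonoBehaviour
{
    // Placeholder topic name; each mode uses its own topic in our system.
    public string topicName = "/hololens/follow_target";
    // Minimum time between two messages, in seconds.
    public float publishInterval = 0.5f;

    private ROSConnection ros;
    private float timeSinceLastMessage;

    void Start()
    {
        ros = ROSConnection.GetOrCreateInstance();
        // Topics are registered once when the application starts.
        ros.RegisterPublisher<PoseMsg>(topicName);
    }

    void Update()
    {
        timeSinceLastMessage += Time.deltaTime;
        if (timeSinceLastMessage < publishInterval)
            return;              // Not enough time has elapsed; skip this frame.
        timeSinceLastMessage = 0f;

        // Example payload: the current camera pose (the concrete content
        // differs per mode in our system).
        var cam = Camera.main.transform;
        var msg = new PoseMsg
        {
            position = new PointMsg(cam.position.x, cam.position.y, cam.position.z),
            orientation = new QuaternionMsg(cam.rotation.x, cam.rotation.y,
                                            cam.rotation.z, cam.rotation.w)
        };
        ros.Publish(topicName, msg);
    }
}
```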

Mode Switching

Since we provide several modes for controlling the robot and a GUI is not available to our target users, naively assigning a separate voice command to every action in every mode would produce too many commands. It is therefore necessary to reuse voice commands across modes. In addition, some voice commands are tailored to a specific mode: grasp, rotate hand, and stop rotate hand are only callable when the current mode is Arm Mode, and select item and delete selection are only available in Select Mode. To satisfy these requirements and encapsulate the varying behavior behind a single object, we use the State pattern, as shown in the class diagram in Figure 1. We design an interface called OperationMode, which RosPublisherScript holds as an attribute. Once the ChangeMode function is called, a specific mode (Follow Mode, Arm Mode, or Select Mode) is assigned to this attribute, and the Activate() and Terminate() functions call mode.Activate() and mode.Terminate(), respectively, so that the same calls produce mode-specific behavior.
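A minimal sketch of this structure is shown below; the member names follow the class diagram, while the method bodies are illustrative placeholders rather than our actual implementation.

```csharp
using UnityEngine;

// State pattern: RosPublisherScript delegates mode-specific behavior to
// whichever OperationMode instance is currently assigned.
public interface OperationMode
{
    void Activate();    // called when the user switches into this mode
    void Terminate();   // called when the user switches away from this mode
    void SendPose();    // publishes the mode-specific message
}

public class RosPublisherScript : MonoBehaviour
{
    private OperationMode mode;

    // Assigns one of FollowMode, ArmMode, or SelectMode (triggered by voice commands).
    public void ChangeMode(OperationMode newMode) { mode = newMode; }

    // The same calls produce different behavior depending on the current mode.
    public void Activate()  { mode?.Activate(); }
    public void Terminate() { mode?.Terminate(); }

    void Update()
    {
        // Each mode decides what (and whether) to publish on this frame.
        mode?.SendPose();
    }
}
```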

Another important issue is that the Spot robot expects different message contents for different modes. For example, Follow Mode continuously sends the target position to the robot, Select Mode sends the target position only intermittently, and Arm Mode continuously sends the head pose. Moreover, the coordinate transformations of position and rotation between the Unity frame and the Azure Anchor frame differ between modes. Therefore, we implement a different SendPose() method for each mode, and this method is called from RosPublisherScript::Update().

For the mode-specific commands, we use a boolean flag to record whether the current mode is the selected one, and only execute a command when its mode is active. To make it easier to create instances of the different modes and to edit their public attributes, we create a game object for each mode and attach the corresponding mode class as a script, as sketched below.
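To make this concrete, a mode class might look like the following sketch; the field names and the topic name are assumptions used for illustration, and only the overall structure (a MonoBehaviour script implementing OperationMode, guarded by an active flag) reflects the design described above.

```csharp
using Unity.Robotics.ROSTCPConnector;
using RosMessageTypes.Geometry;
using UnityEngine;

// Attached to its own game object so its public fields can be edited in the
// Unity inspector; implements the OperationMode interface from Figure 1.
public class SelectMode : MonoBehaviour, OperationMode
{
    public Transform eyeGazeCursor;                           // shared with FollowMode
    private const string Topic = "/hololens/select_target";   // placeholder topic name

    private bool isActive;              // records whether this mode is currently selected
    private bool hasPendingSelection;
    private Vector3 pendingSelection;

    void Start() => ROSConnection.GetOrCreateInstance().RegisterPublisher<PoseMsg>(Topic);

    public void Activate()  { isActive = true; }
    public void Terminate() { isActive = false; }

    // Mode-specific voice command: it only has an effect while Select Mode is active.
    public void OnSelectItem()
    {
        if (!isActive) return;
        pendingSelection = eyeGazeCursor.position;   // remember the selected target
        hasPendingSelection = true;
    }

    public void SendPose()
    {
        // Called every frame by RosPublisherScript.Update(), but Select Mode only
        // publishes intermittently, when a new selection is pending. The position
        // would be converted to the Spatial Anchor frame before sending.
        if (!hasPendingSelection) return;
        hasPendingSelection = false;

        var msg = new PoseMsg
        {
            position = new PointMsg(pendingSelection.x, pendingSelection.y, pendingSelection.z),
            orientation = new QuaternionMsg(0, 0, 0, 1)
        };
        ROSConnection.GetOrCreateInstance().Publish(Topic, msg);
    }
}
```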

Figure 1

Class Diagram for state pattern. The RosPublisherScript class holds an OperationMode interface, which is implemented by FollowMode, SelectMode, and ArmMode.

Eye-gaze Tracking

Since the application uses eye gaze to control the robot's motion, accurate eye-gaze tracking is essential. The eye-gaze ray is obtained through the HoloLens 2 eye-gaze API, and we directly set the eye cursor position to the intersection between the gaze ray and the spatial mesh constructed by the HoloLens 2.

A challenge in this step arises when the cursor's collider is enabled. Initially, we moved the cursor to the intersection point between the eye gaze and the mesh in the EyeGazeCursor::Update() function every frame. However, the eye-gaze intersection interface provided by the MRTK takes all game objects into account: once the cursor is moved to the intersection point, the gaze ray hits the cursor itself, and that point is mistakenly used by the MRTK as the new hit point for the cursor. As a result, the cursor moves straight toward the camera. We solved this problem by disabling the box collider of the eye-gaze cursor. As shown in Figure 1, the cursor is held by FollowMode and SelectMode as an attribute, which enables these two modes to send messages based on the cursor's position.
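A minimal sketch of the cursor update, assuming the MRTK 2 eye-gaze provider API (CoreServices.InputSystem.EyeGazeProvider), is shown below; disabling the collider in Start() corresponds to the fix described above.

```csharp
using Microsoft.MixedReality.Toolkit;
using UnityEngine;

public class EyeGazeCursor : MonoBehaviour
{
    void Start()
    {
        // Fix: disable the cursor's own collider so the gaze ray cannot hit the
        // cursor and feed its position back as a new hit point.
        var box = GetComponent<BoxCollider>();
        if (box != null) box.enabled = false;
    }

    void Update()
    {
        var gaze = CoreServices.InputSystem?.EyeGazeProvider;
        if (gaze == null || !gaze.IsEyeTrackingEnabledAndValid) return;

        // Move the cursor to where the eye-gaze ray hits the spatial mesh.
        transform.position = gaze.HitPosition;
    }
}
```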

Head Tracking

Head tracking is the most challenging part of the project because it involves many coordinate transformations. Since our goal is to let the robot arm mimic the motion of the user's head, a local pose of the robot hand in the robot's frame needs to be specified. As shown in Figure 2, we implement a head motion monitor and a virtual robot to extract the head motion and compute this local pose. We denote the transformation of an object B in object A's local frame as

T^A_B = (P^A_B, R^A_B),
where P and R represent position and rotation respectively. Then we have Equation 1:

The initial hand pose can be hard-coded as the offset used by the real robot, and the initial pose of the virtual robot can then be calculated as:

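Although the exact form depends on the frame conventions of the implementation, one consistent way to write these relations, assuming standard rigid-transform composition with world frame W, virtual-robot frame V, and head (camera) frame H, is:

P^V_H = (R^W_V)^{-1} (P^W_H - P^W_V),    R^V_H = (R^W_V)^{-1} R^W_H

T^W_V = T^W_H (T^V_{hand,0})^{-1}

Here T^V_{hand,0} denotes the hard-coded initial hand offset; the first line gives the head's relative pose in the virtual robot's frame, and the second places the virtual robot so that the head's initial relative pose coincides with that offset.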
Figure 2

The transformation diagram to compute the robot arm's pose.

After initialization, as the user moves, the head pose is directly taken from the global pose of the camera, and the hand's relative pose is updated using Equation 1. Instead of explicitly calculating the head pose in the virtual robot's frame, we create a head tracker game object as a child of the virtual robot object, so the relative transformation can be read directly from headTracker.transform.localPosition.
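In Unity terms, this amounts to the following per-frame update; the object names are illustrative, and packaging the resulting local pose into the message sent to the robot is omitted.

```csharp
using UnityEngine;

public class HeadMotionMonitor : MonoBehaviour
{
    public Transform headTracker;   // a child of the virtual robot object

    void Update()
    {
        // The head pose is simply the global pose of the HoloLens camera.
        headTracker.position = Camera.main.transform.position;
        headTracker.rotation = Camera.main.transform.rotation;

        // Because headTracker is parented to the virtual robot, Unity provides
        // the relative pose directly.
        Vector3 localPosition = headTracker.localPosition;
        Quaternion localRotation = headTracker.localRotation;
        // localPosition / localRotation are then used as the commanded hand pose.
    }
}
```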

An important issue for arm control is that the user cannot keep watching the target object while moving their head. To solve this, we store the local transformation of the virtual robot in the camera (head) frame and use it to re-initialize the virtual robot's pose when arm control is activated again. Another issue is that the gripper angle may not be suitable for grasping an item, so we allow the user to continuously rotate the gripper by tilting their head to adjust the angle.
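A sketch of this re-initialization, assuming the saved offset is applied directly when Arm Mode is activated again (method and field names are illustrative):

```csharp
using UnityEngine;

public class ArmModeAnchoring : MonoBehaviour
{
    public Transform virtualRobot;
    private Vector3 savedLocalPosition;      // virtual robot pose in the camera (head) frame
    private Quaternion savedLocalRotation;

    // Called when the user leaves Arm Mode.
    public void SaveRobotPoseRelativeToHead()
    {
        Transform cam = Camera.main.transform;
        savedLocalPosition = cam.InverseTransformPoint(virtualRobot.position);
        savedLocalRotation = Quaternion.Inverse(cam.rotation) * virtualRobot.rotation;
    }

    // Called when Arm Mode is activated again: place the virtual robot at the
    // same pose relative to the current head pose, so the user keeps the
    // target object in view.
    public void RestoreRobotPoseRelativeToHead()
    {
        Transform cam = Camera.main.transform;
        virtualRobot.position = cam.TransformPoint(savedLocalPosition);
        virtualRobot.rotation = cam.rotation * savedLocalRotation;
    }
}
```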

Spatial Anchor

Azure Spatial Anchors are used to co-localize the HoloLens and the Spot robot; we rely on the Microsoft Azure Spatial Anchors package for Unity, and the anchor-creation code follows the official Azure tutorial. When creating an anchor, we first check whether an anchor already exists at the desired location and only create a new one if none is found. Before sending any position, we transform it into the anchor's local space, which can be done directly with anchor.transform.InverseTransformPoint(). Because Unity and the anchor use different coordinate systems, we additionally remap the position we send from (x, y, z) to (z, -x, y).
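For illustration, a small helper for this conversion might look as follows, where anchorObject stands for the game object whose transform is aligned with the Spatial Anchor:

```csharp
using UnityEngine;
using RosMessageTypes.Geometry;

public static class AnchorFrameUtil
{
    // Converts a Unity world-space position into the Spatial Anchor's local
    // frame and remaps the axes from Unity's convention to the one expected
    // on the robot side: (x, y, z) -> (z, -x, y).
    public static PointMsg ToAnchorFrame(Transform anchorObject, Vector3 worldPosition)
    {
        Vector3 local = anchorObject.InverseTransformPoint(worldPosition);
        return new PointMsg(local.z, -local.x, local.y);
    }
}
```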

Spot Robot and ROS

Spatial Anchor Localization

As mentioned above, Spatial Anchors from Microsoft are used to co-localize the HoloLens and the Spot robot. The Spot robot needs to recognize the coordinate frame of the Spatial Anchor, which is achieved with Microsoft's Spatial Anchor ROS package. In short, we use the visual information collected by the Spot robot's cameras, receive the Spatial Anchor ID published by the HoloLens on a ROS topic, and query that ID through Microsoft Azure. The coordinate frame of the resolved Spatial Anchor is then added to the ROS frame transformation tree.

Frame Transformation

We work with several coordinate frames (Unity, Spatial Anchor, and ROS), each expressed in a different coordinate system: Unity uses a left-handed y-up system, the Spatial Anchor uses a right-handed y-up system, and ROS uses a right-handed z-up system. To handle these differences, we convert all three into the right-handed z-up convention using the spot-mr-core ROS package before applying frame transformations with the ROS tf package. This way, the destination coordinates sent by the HoloLens can be used directly in the ROS coordinate system.

Spot Robot Movement

For the robot movement, once we obtain the destination coordinate (Δx, Δy) in the robot's body frame via the frame transformation, the robot simultaneously rotates by an angle θ about the z-axis to face the target direction and walks to the target position.

The desired pose is then published on the ROS topic /spot/go_to_pose in the robot's body frame: the desired position (Δx, Δy) and the desired orientation Quaternion(0, 0, sin(θ/2), cos(θ/2)).
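Since the destination is expressed in the body frame, the heading angle follows directly from the destination point, and the published orientation is the corresponding rotation about the z-axis (components in x, y, z, w order):

θ = atan2(Δy, Δx),    q = (0, 0, sin(θ/2), cos(θ/2))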

Spot Robot Driver

We use the Spot Robot ROS Driver to wrap the original Spot robot driver into ROS. We use the ROS topic /spot/go_to_pose to control the movement of the robot, the ROS service /spot/gripper_pos to control the pose of the robot arm, and the ROS service /spot/gripper_angle_open to open and close the gripper.

However, the original ROS service /spot/gripper_pos does not allow adjusting the execution time: every command takes a fixed 5 seconds, which is too slow for our task, since we send a new command every 0.5 seconds. To overcome this, we modify the driver and pass one additional parameter that specifies how long the command should take. This way, the robot arm can follow the user's commands fluently.