IEEE Robotics and Automation Letters
Number of Followers: 9 · Hybrid journal (may contain Open Access articles) · ISSN (Online): 2377-3766 · Published by IEEE
- DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model
Authors: Zhenhua Xu;Yujia Zhang;Enze Xie;Zhen Zhao;Yong Guo;Kwan-Yee K. Wong;Zhenguo Li;Hengshuang Zhao;
Pages: 8186 - 8193
Abstract: Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, fine-tuning on domain-specific data enables DriveGPT4 to achieve results close to, or even better than, GPT4-V in terms of autonomous driving grounding.
PubDate: WED, 07 AUG 2024 09:16:59 -04
Issue No: Vol. 9, No. 10 (2024)
- Leg-KILO: Robust Kinematic-Inertial-Lidar Odometry for Dynamic Legged Robots
Authors: Guangjun Ou;Dong Li;Hanmin Li;
Pages: 8194 - 8201
Abstract: This letter presents a robust multi-sensor fusion framework, Leg-KILO (Kinematic-Inertial-Lidar Odometry). When lidar-based SLAM is applied to legged robots, high-dynamic motion (e.g., trot gait) introduces frequent foot impacts, leading to IMU degradation and lidar motion distortion. Direct use of IMU measurements can cause significant drift, especially in the z-axis direction. To address these limitations, we tightly couple leg odometry, lidar odometry, and a loop closure module based on graph optimization. For leg odometry, we propose a kinematic-inertial odometry using an on-manifold error-state Kalman filter, which incorporates the constraints from our proposed contact height detection to reduce height fluctuations. For lidar odometry, we present an adaptive scan slicing and splicing method to alleviate the effects of high-dynamic motion. We further propose a robot-centric incremental mapping system that enhances map maintenance efficiency. Extensive experiments are conducted in both indoor and outdoor environments, showing that Leg-KILO has lower drift compared to other state-of-the-art lidar-based methods, especially during high-dynamic motion. To benefit the legged robot community, a lidar-inertial dataset containing leg kinematic data and the code are released.
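As an illustrative aside: the abstract's key claim — that a contact-height constraint from leg kinematics reins in z-axis drift caused by degraded IMU data — can be shown with a deliberately simplified 1D filter. Everything below (dynamics, noise levels, the scalar Kalman gain) is invented for illustration and is far simpler than the paper's on-manifold error-state filter:

```python
import random

def simulate(steps=200, dt=0.01, bias=0.05):
    """Toy 1D height filter: integrating a biased accelerometer alone
    drifts in z, while a contact-height pseudo-measurement (the stance
    foot is on the ground, so body height ~= 0.5 m) keeps the error
    bounded. All dynamics and noise values are invented."""
    random.seed(0)
    z_true = 0.5
    z_est, v_est, p = z_true, 0.0, 0.01   # filtered height, velocity, covariance
    z_imu, v_imu = z_true, 0.0            # dead-reckoned (IMU-only) for contrast
    q, r = 1e-4, 1e-3                     # process / measurement noise
    for _ in range(steps):
        a = bias + random.gauss(0.0, 0.1)        # biased, noisy accelerometer
        v_est += a * dt; z_est += v_est * dt; p += q   # predict
        v_imu += a * dt; z_imu += v_imu * dt           # IMU-only drifts away
        k = p / (p + r)                          # scalar Kalman gain
        z_est += k * (z_true - z_est)            # contact-height update
        p *= 1.0 - k
    return z_est, z_imu
```

Running this, the filtered height stays within a few millimetres of the true 0.5 m while the IMU-only estimate drifts by roughly 0.1 m over two simulated seconds.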
PubDate: THU, 08 AUG 2024 09:17:48 -04
Issue No: Vol. 9, No. 10 (2024)
- Benchmarking and Simulating Bimanual Robot Shoe Lacing
Authors: Haining Luo;Yiannis Demiris;
Pages: 8202 - 8209
Abstract: Manipulation of deformable objects is a challenging domain in robotics. Although it has been gaining attention in recent years, long-horizon deformable object manipulation remains largely unexplored. In this letter, we propose a benchmark for the bi-manual Shoe Lacing (SL) task for evaluating and comparing long-horizon deformable object manipulation algorithms. SL is a difficult sensorimotor task in everyday life as well as the shoe manufacturing sector. Due to the complexity of the shoe structure, SL naturally requires sophisticated long-term planning. We provide a rigorous definition of the task and protocols to ensure the repeatability of SL experiments. We present 6 benchmark metrics for quantitatively measuring the ecological validity of approaches towards bi-manual SL. We further provide an open-source simulation environment for training and testing SL algorithms, as well as details of the construction and usage of the environment. We evaluate a baseline solution according to the proposed metrics in both reality and simulation.
PubDate: WED, 07 AUG 2024 09:16:59 -04
Issue No: Vol. 9, No. 10 (2024)
- AerialVL: A Dataset, Baseline and Algorithm Framework for Aerial-Based Visual Localization With Reference Map
Authors: Mengfan He;Chao Chen;Jiacheng Liu;Chunyu Li;Xu Lyu;Guoquan Huang;Ziyang Meng;
Pages: 8210 - 8217
Abstract: Visual localization plays an essential role in the autonomous flight of Unmanned Aerial Vehicles (UAVs), especially in Global Navigation Satellite System (GNSS) denied environments. Existing aerial-based visual localization methods mainly focus on eliminating image variance between the database map and captured frames. However, there is a lack of public datasets and baselines for method comparison, which impedes the development of aerial-based visual localization. To address this issue, we construct AerialVL, a large-scale dataset, which is collected using a UAV flying at different altitudes, along various routes, and during diverse time periods. AerialVL consists of 11 image sequences covering approximately 70 km of trajectory and includes a reference satellite image database corresponding to the flight area. Leveraging AerialVL, we perform thorough evaluations on various mainstream solutions designed for aerial-based visual localization for the first time. This evaluation encompasses visual place recognition, visual alignment localization and visual odometry, serving as comparison baselines. Furthermore, we present a general aerial-based visual localization framework, which unifies various methods and integrates them into a modular architecture. We note that across all flight trajectories, the proposed framework achieves higher localization accuracy and robustness than existing methods.
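For context on the visual place recognition baseline mentioned above: at its simplest, VPR reduces to nearest-neighbour search over global image descriptors against the satellite reference database. A minimal sketch, assuming descriptor extraction happens elsewhere:

```python
import numpy as np

def retrieve(query, database, top_k=3):
    """Rank reference-map descriptors by cosine similarity to the query
    descriptor and return the indices of the top_k matches.
    query: (D,) descriptor; database: (N, D) reference descriptors."""
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return np.argsort(-(db @ q))[:top_k]
```

Real VPR pipelines add learned descriptors, geometric re-ranking, and sequence information; this is only the retrieval core.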
PubDate: FRI, 09 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
- DrPlanner: Diagnosis and Repair of Motion Planners for Automated Vehicles Using Large Language Models
Authors: Yuanfei Lin;Chenran Li;Mingyu Ding;Masayoshi Tomizuka;Wei Zhan;Matthias Althoff;
Pages: 8218 - 8225
Abstract: Motion planners are essential for the safe operation of automated vehicles across various scenarios. However, no motion planning algorithm has achieved perfection in the literature, and improving its performance is often time-consuming and labor-intensive. To tackle the aforementioned issues, we present ${\mathtt {DrPlanner}}$, the first framework designed to automatically diagnose and repair motion planners using large language models. Initially, we generate a structured description of the planner and its planned trajectories from both natural and programming languages. Leveraging the profound capabilities of large language models, our framework returns repaired planners with detailed diagnostic descriptions. Furthermore, our framework advances iteratively with continuous feedback from the evaluation of the repaired outcomes. Our approach is validated using both search- and sampling-based motion planners for automated vehicles; experimental results highlight the need for demonstrations in the prompt and show the ability of our framework to effectively identify and rectify elusive issues.
PubDate: FRI, 09 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
- LOG-LIO2: A LiDAR-Inertial Odometry With Efficient Uncertainty Analysis
Authors: Kai Huang;Junqiao Zhao;Jiaye Lin;Zhongyang Zhu;Shuangfu Song;Chen Ye;Tiantian Feng;
Pages: 8226 - 8233
Abstract: Uncertainty in LiDAR measurements, stemming from factors such as range sensing, is crucial for LIO (LiDAR-Inertial Odometry) systems as it affects the accurate weighting in the loss function. While recent LIO systems address uncertainty related to range sensing, the impact of incident angle on uncertainty is often overlooked by the community. Moreover, the existing uncertainty propagation methods suffer from computational inefficiency. This letter proposes a comprehensive point uncertainty model that accounts for both the uncertainties from LiDAR measurements and surface characteristics, along with an efficient local uncertainty analytical method for LiDAR-based state estimation problems. We employ a projection operator that separates the uncertainty into the ray direction and its orthogonal plane. Then, we derive incremental Jacobian matrices of eigenvalues and eigenvectors w.r.t. points, which enables a fast approximation of uncertainty propagation. This approach eliminates the requirement for redundant traversal of points, significantly reducing the time complexity of uncertainty propagation from $\mathcal {O} (n)$ to $\mathcal {O} (1)$ when a new point is added. Simulations and experiments on public datasets are conducted to validate the accuracy and efficiency of our formulations.
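The efficiency idea — avoiding a full re-traversal of points when one is added — can be illustrated with running moments. Note the paper goes further and derives incremental Jacobians of the eigenvalues and eigenvectors themselves; this sketch only shows the O(1) moment update:

```python
import numpy as np

class IncrementalCovariance:
    """Keep running sums so the mean and covariance of a growing point
    set update in O(1) per insertion instead of re-traversing all
    points (a toy stand-in for LOG-LIO2's incremental analysis)."""
    def __init__(self, dim=3):
        self.n = 0
        self.s = np.zeros(dim)            # running sum of points
        self.ss = np.zeros((dim, dim))    # running sum of outer products
    def add(self, p):
        p = np.asarray(p, dtype=float)
        self.n += 1
        self.s += p
        self.ss += np.outer(p, p)
    def mean(self):
        return self.s / self.n
    def cov(self):
        m = self.mean()
        return self.ss / self.n - np.outer(m, m)
```

The local surface normal then comes from `np.linalg.eigh` on the fixed-size 3×3 covariance, which costs the same no matter how many points have been absorbed.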
PubDate: THU, 08 AUG 2024 09:17:48 -04
Issue No: Vol. 9, No. 10 (2024)
- GOReloc: Graph-Based Object-Level Relocalization for Visual SLAM
Authors: Yutong Wang;Chaoyang Jiang;Xieyuanli Chen;
Pages: 8234 - 8241
Abstract: This letter introduces a novel method for object-level relocalization of robotic systems. It determines the pose of a camera sensor by robustly associating the object detections in the current frame with 3D objects in a lightweight object-level map. Object graphs, considering semantic uncertainties, are constructed for both the incoming camera frame and the pre-built map. Objects are represented as graph nodes, and each node employs unique semantic descriptors based on our devised graph kernels. We extract a subgraph from the target map graph by identifying potential object associations for each object detection, then refine these associations and pose estimations using a RANSAC-inspired strategy. Experiments on various datasets demonstrate that our method achieves more accurate data association and significantly increases relocalization success rates compared to baseline methods.
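The "RANSAC-inspired strategy" for refining object associations can be sketched generically: hypothesize a transform from a minimal sample of putative matches, then keep the hypothesis that the most matches agree with. A 2D toy version (the paper works with 3D objects and semantic graph descriptors):

```python
import numpy as np

def ransac_rigid_2d(src, dst, iters=200, tol=0.05, seed=0):
    """Estimate a 2D rigid transform (R, t) from putative point
    correspondences src[i] <-> dst[i], some of which are outliers.
    Returns (R, t, inlier_count) of the best-scoring hypothesis."""
    rng = np.random.default_rng(seed)
    best = (None, None, -1)
    n = len(src)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)   # minimal sample
        a, b = src[j] - src[i], dst[j] - dst[i]
        theta = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        t = dst[i] - R @ src[i]
        err = np.linalg.norm((src @ R.T + t) - dst, axis=1)
        inliers = int((err < tol).sum())
        if inliers > best[2]:
            best = (R, t, inliers)
    return best
```

A final least-squares refit on the inlier set (omitted here) is the usual last step.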
PubDate: TUE, 13 AUG 2024 09:16:48 -04
Issue No: Vol. 9, No. 10 (2024)
- SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition
Authors: Raktim Gautam Goswami;Naman Patel;Prashanth Krishnamurthy;Farshad Khorrami;
Pages: 8242 - 8249
Abstract: Large-scale LiDAR mappings and localization leverage place recognition techniques to mitigate odometry drifts, ensuring accurate mapping. These techniques utilize scene representations from LiDAR point clouds to identify previously visited sites within a database. Local descriptors, assigned to each point within a point cloud, are aggregated to form a scene representation for the point cloud. These descriptors are also used to re-rank the retrieved point clouds based on geometric fitness scores. We propose SALSA, a novel, lightweight, and efficient framework for LiDAR place recognition. It consists of a Sphereformer backbone that uses radial window attention to enable information aggregation for sparse distant points, an adaptive self-attention layer to pool local descriptors into tokens, and a multi-layer-perceptron Mixer layer for aggregating the tokens to generate a scene descriptor. The proposed framework outperforms existing methods on various LiDAR place recognition datasets in terms of both retrieval and metric localization while operating in real-time.
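The pooling step described above — aggregating per-point local descriptors into a compact representation — boils down to a softmax-weighted sum. SALSA's layer is learned and adaptive; this fixed-weight sketch only shows the basic mechanism:

```python
import numpy as np

def attention_pool(descriptors, query, temperature=1.0):
    """Pool N local descriptors (N, D) into one (D,) vector with a
    single query vector (D,): softmax-weighted sum, the core operation
    behind attention-based pooling layers."""
    scores = descriptors @ query / temperature
    scores -= scores.max()          # numerical stability
    w = np.exp(scores)
    w /= w.sum()
    return w @ descriptors
```

With a zero query the result degenerates to the mean of the descriptors; a sharp query (low temperature) selects the best-matching descriptor almost exclusively.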
PubDate: WED, 07 AUG 2024 09:16:59 -04
Issue No: Vol. 9, No. 10 (2024)
- SoftSling: A Soft Robotic Arm Control Strategy to Throw Objects With Circular Run-Ups
Authors: Diego Bianchi;Giulia Campinoti;Costanza Comitini;Cecilia Laschi;Alessandro Rizzo;Angelo Maria Sabatini;Egidio Falotico;
Pages: 8250 - 8257
Abstract: In this letter, we present SoftSling, a soft robot control strategy designed for accurately throwing objects following circular run-ups. SoftSling draws inspiration from ancient slingers, who rotated a sling loaded with a projectile at high speeds to fight and hunt, releasing the object by letting go of the sling's end. Our study aims to replicate this behavior by exploiting the embodied intelligence of soft robots under periodic actuation input, that enables them to generate self-stabilizing motions. The periodic input parameters for moving along a circular-like path are generated by a neural network based on the weight of the object and the target position. Subsequently, a separate neural network model predicts the release time by considering the gripper opening delay and the object positions during motion. We tested this strategy on a modular soft robot, I-Support, by throwing three objects of varying weights into 140-mm square target boxes. We achieved a success rate ranging from 75% to 88% for different objects, with the heaviest object yielding the highest success rate. Our research contributes to integrating soft robots into everyday life, enabling them to perform complex and dynamic tasks.
PubDate: TUE, 13 AUG 2024 09:16:48 -04
Issue No: Vol. 9, No. 10 (2024)
- LVDiffusor: Distilling Functional Rearrangement Priors From Large Models Into Diffusor
Authors: Yiming Zeng;Mingdong Wu;Long Yang;Jiyao Zhang;Hao Ding;Hui Cheng;Hao Dong;
Pages: 8258 - 8265
Abstract: Object rearrangement, a fundamental challenge in robotics, demands versatile strategies to handle diverse objects, configurations, and functional needs. To achieve this, the AI robot needs to learn functional rearrangement priors to specify precise goals that meet the functional requirements. Previous methods typically learn such priors from either laborious human annotations or manually designed heuristics, which limits scalability and generalization. In this letter, we propose a novel approach that leverages large models to distill functional rearrangement priors. Specifically, our approach collects diverse arrangement examples using both LLMs and VLMs and then distills the examples into a diffusion model. During test time, the learned diffusion model is conditioned on the initial configuration and guides the positioning of objects to meet functional requirements. In this way, we balance zero-shot generalization with time efficiency. Extensive experiments in multiple domains, including real-world scenarios, demonstrate the effectiveness of our approach in generating compatible goals for object rearrangement tasks, significantly outperforming baseline methods.
PubDate: FRI, 02 AUG 2024 09:17:24 -04
Issue No: Vol. 9, No. 10 (2024)
- Learning Prehensile Dexterity by Imitating and Emulating State-Only Observations
Authors: Yunhai Han;Zhenyang Chen;Kyle A Williams;Harish Ravichandar;
Pages: 8266 - 8273
Abstract: When humans acquire physical skills (e.g., tool use) from experts, we tend to first learn from merely observing the expert. But this is often insufficient. We then engage in practice, where we try to emulate the expert and ensure that our actions produce similar effects on our environment. Inspired by this observation, we introduce Combining IMitation and Emulation for Motion Refinement (CIMER) – a two-stage framework to learn dexterous prehensile manipulation skills from state-only observations. CIMER's first stage involves imitation: simultaneously encode the complex interdependent motions of the robot hand and the object in a structured dynamical system. This results in a reactive motion generation policy that provides a reasonable motion prior, but lacks the ability to reason about contact effects due to the lack of action labels. The second stage involves emulation: learn a motion refinement policy via reinforcement learning that adjusts the robot hand's motion prior such that the learned object motion is reenacted. CIMER is both task-agnostic (no task-specific reward design or shaping) and intervention-free (no additional teleoperated or labeled demonstrations). Detailed experiments with prehensile dexterity reveal that i) imitation alone is insufficient, but adding emulation drastically improves performance, ii) CIMER outperforms existing methods in terms of sample efficiency and the ability to generate realistic and stable motions, iii) CIMER can either zero-shot generalize or learn to adapt to novel objects from the YCB dataset, even outperforming expert policies trained with action labels in most cases.
PubDate: FRI, 16 AUG 2024 09:16:40 -04
Issue No: Vol. 9, No. 10 (2024)
- Load-Carrying Assistance of Articulated Legged Robots Based on Hydrostatic Support
Authors: Wu Fan;Zhe Dai;Wenyu Li;Tao Liu;
Pages: 8274 - 8281
Abstract: This letter proposes a novel mechanical structure for traditional articulated-legged robots that uses a linkage-based approach and hydrostatic transmission to reduce the joint load caused by gravity. The wide application of legged robots is limited by their weight-bearing capacity and low energy efficiency. To address these issues, we built HyELeg2, a motor-actuated bipedal robot with the assistance of a hydraulic auxiliary mechanism. HyELeg2 employs one passive hydraulic cylinder per leg to counterbalance the joint torque induced by the body mass at both knee and hip joints. During the stance phase of walking, the cylinder can provide upward support force passively to reduce the energy cost of motors. To control the flow efficiently, a rotary-cage valve (RCV) was developed with low energy consumption and low flow resistance. Comparative experiments were conducted to investigate the energy-saving effects of HyELeg2 under different speed, gait frequency, and load conditions. The results indicate that hydraulic assistance saves energy by more than 65% to perform the same walking tasks, and greatly reduces the cost of transport (CoT). This study has high practical value in the legged robot field.
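The counterbalancing principle reduces, in a planar simplification, to subtracting the cylinder's counter-torque from the gravity torque at a joint. The geometry and every number below are invented for illustration and are not taken from HyELeg2's linkage:

```python
def knee_torque(mass, lever_arm, support_force=0.0, support_arm=0.0, g=9.81):
    """Static joint torque (N*m) a motor must hold: gravity torque of
    the supported mass minus the counter-torque of a passive cylinder.
    Planar single-lever simplification, not the paper's linkage."""
    return mass * g * lever_arm - support_force * support_arm
```

With illustrative numbers, a 30 kg load on a 5 cm moment arm demands about 14.7 N·m; a 250 N passive cylinder acting on a 4 cm arm cuts that to about 4.7 N·m, a reduction of roughly two thirds.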
PubDate: WED, 07 AUG 2024 09:16:59 -04
Issue No: Vol. 9, No. 10 (2024)
- Multi-Step Continuous Decision Making and Planning in Uncertain Dynamic Scenarios Through Parallel Spatio-Temporal Trajectory Searching
Authors: Delun Li;Siyuan Cheng;Shaoyu Yang;Wenchao Huang;Wenjie Song;
Pages: 8282 - 8289
Abstract: Autonomous driving in urban scenarios faces uncertain dynamic changes, especially in China, where a dense mixture of cars, cyclists and pedestrians travel together on roads with random uncertain behaviors and high-risk road crossing. This letter proposes a Multi-step Continuous Decision Making and Spatio-temporal Trajectory Planning framework to achieve stable continuous decision making and high-quality trajectory planning in such uncertain and highly dynamic environments. Firstly, a 3D spatio-temporal probabilistic map is constructed to represent the uncertain future driving environment. Based on the map, parallel spatio-temporal trajectory search is performed to obtain multi-strategy feasible spatio-temporal trajectories that satisfy the short-term deterministic and long-term uncertain environmental constraints. Then considering the continuity and consistency of decision making, risk-aware rolling-fusion of trajectory sequences is proposed, achieving efficient and exploratory far-end planning with a stable and safe near-end driving trajectory. To validate the proposed framework, we collected the Hard Case data from real Chinese urban roads, containing challenging scenarios such as dense traffic flows, mixed vehicle-pedestrian roads, and complex intersections, which are widely recognized barriers to the successful real-world deployment of autonomous driving. Moreover, the SMARTS simulator is used to build closed-loop simulation scenarios to verify the effectiveness of the framework. Experimental results show the superior performance of our proposed framework in complex uncertain dynamic scenarios.
PubDate: WED, 14 AUG 2024 09:16:41 -04
Issue No: Vol. 9, No. 10 (2024)
- Mixing Left and Right-Hand Driving Data in a Hierarchical Framework With LLM Generation
Authors: Jiazhe Guo;Cheng Chang;Zhiheng Li;Li Li;
Pages: 8290 - 8297
Abstract: Data-driven trajectory prediction is critical in autonomous vehicles, which requires high-quality data. However, discussions about the compatibility of data collected from different countries remain limited, with a typical issue being the different driving rules in various countries. Therefore, we propose a hierarchical framework for mixing left and right-hand driving data to support trajectory prediction. Integrated with a proposed LLM-based sample generation method, the framework utilizes mirroring, MMD and sample generation incrementally to reduce the domain gap between datasets. By testing the mixed results on two typical trajectory datasets, we demonstrate that this method enhances the performance of models trained on left-hand driving data when applied to right-hand driving scenarios.
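The mirroring stage mentioned in the abstract has a simple geometric core: reflecting a trajectory across the driving axis turns left-hand-traffic data into right-hand-traffic data. A sketch under the assumption that trajectories are rows of (x, y, heading); the MMD filtering and LLM sample generation in the paper are separate stages:

```python
import numpy as np

def mirror_trajectory(traj):
    """Reflect a (T, 3) trajectory of (x, y, heading_rad) rows across
    the longitudinal x-axis: lateral offset and heading change sign,
    mapping left-hand-traffic motion to right-hand-traffic motion."""
    out = np.array(traj, dtype=float, copy=True)
    out[:, 1] *= -1.0     # lateral position flips side
    out[:, 2] *= -1.0     # heading reflects across the x-axis
    return out
```

Mirroring is an involution — applying it twice recovers the original trajectory — which makes it a cheap, lossless first step before the learned domain-gap reduction.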
PubDate: WED, 14 AUG 2024 09:16:41 -04
Issue No: Vol. 9, No. 10 (2024)
- Language-Grounded Dynamic Scene Graphs for Interactive Object Search With Mobile Manipulation
Authors: Daniel Honerkamp;Martin Büchner;Fabien Despinoy;Tim Welschehold;Abhinav Valada;
Pages: 8298 - 8305
Abstract: To fully leverage the capabilities of mobile manipulation robots, it is imperative that they are able to autonomously execute long-horizon tasks in large unexplored environments. While large language models (LLMs) have shown emergent reasoning skills on arbitrary tasks, existing work primarily concentrates on explored environments, typically focusing on either navigation or manipulation tasks in isolation. In this work, we propose MoMa-LLM, a novel approach that grounds language models within structured representations derived from open-vocabulary scene graphs, dynamically updated as the environment is explored. We tightly interleave these representations with an object-centric action space. Given object detections, the resulting approach is zero-shot, open-vocabulary, and readily extendable to a spectrum of mobile manipulation and household robotic tasks. We demonstrate the effectiveness of MoMa-LLM in a novel semantic interactive search task in large realistic indoor environments. In extensive experiments in both simulation and the real world, we show substantially improved search efficiency compared to conventional baselines and state-of-the-art approaches, as well as its applicability to more abstract tasks.
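The structured representation grounding the language model can be pictured as a graph that grows with exploration. The minimal dictionary-based stand-in below only captures room–object containment and name lookup; the paper's scene graphs additionally store geometry, open-vocabulary embeddings, and are tied to an action space:

```python
from collections import defaultdict

class SceneGraph:
    """Minimal dynamic scene graph: rooms hold objects, and new
    detections update the graph as the environment is explored.
    A toy stand-in for MoMa-LLM's open-vocabulary representation."""
    def __init__(self):
        self.rooms = defaultdict(set)
    def observe(self, room, obj):
        """Record that `obj` was detected in `room`."""
        self.rooms[room].add(obj)
    def find(self, name):
        """Rooms containing an object whose label mentions `name`."""
        return sorted(r for r, objs in self.rooms.items()
                      if any(name in o for o in objs))
```

An LLM-driven searcher would query such a structure ("where might a mug be?") and direct exploration toward unvisited rooms when the query comes back empty.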
PubDate: FRI, 09 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
- AMVP: Adaptive Multi-Volume Primitives for Auto-Driving Novel View Synthesis
Authors: Dexin Qi;Tao Tao;Zhihong Zhang;Xuesong Mei;
Pages: 8306 - 8313
Abstract: Synthesizing high-quality novel views is critical to extending training data for auto-driving scenes. However, existing novel view synthesis techniques rely on a single-volume radiance field with uniform spatial resolution, constraining their model capacity and resulting in artifacts in synthesized auto-driving views. This letter introduces AMVP, a novel neural representation that models auto-driving scenes using multiple local primitives with adaptive spatial resolution. AMVP addresses the lack of representation capability of detail-rich regions by adaptively subdividing the scene into multiple local volumes. Each local volume is assigned a tailored resolution based on its geometric complexity, as determined by a density prior. Subsequently, multi-volume primitives are introduced to enable sharing a global feature table among local volumes, addressing the GPU memory inefficiency caused by the duplicated allocation. In addition, the letter proposes resolution-aware confidence, a mechanism that suppresses artifacts arising from frequency ambiguity. This mechanism adaptively reduces high-frequency components based on the spatial resolution of each local volume and the distance of the sampling point from the optical center. Experimental results on benchmark auto-driving datasets demonstrate that the proposed AMVP achieves superior rendering quality while using a similar number of parameters compared to existing methods.
PubDate: THU, 15 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
- DiPGrasp: Parallel Local Searching for Efficient Differentiable Grasp Planning
Authors: Wenqiang Xu;Jieyi Zhang;Tutian Tang;Zhenjun Yu;Yutong Li;Cewu Lu;
Pages: 8314 - 8321
Abstract: Grasp planning is an important task for robotic manipulation. Though it is a richly studied area, a standalone, fast, and differentiable grasp planner that can work with robot grippers of different DOFs has not been reported. In this work, we present DiPGrasp, a grasp planner that satisfies all these goals. DiPGrasp takes a force-closure geometric surface matching grasp quality metric. It adopts a gradient-based optimization scheme on the metric, which also considers parallel sampling and collision handling. This not only drastically accelerates the grasp search process over the object surface but also makes it differentiable. We apply DiPGrasp to three applications, namely grasp dataset construction, mask-conditioned planning, and pose refinement. For dataset generation, as a standalone planner, DiPGrasp has clear advantages over speed and quality compared with several classic planners. For mask-conditioned planning, it can turn a 3D perception model into a 3D grasp detection model instantly. As a pose refiner, it can optimize the coarse grasp prediction from the neural network, as well as the neural network parameters. Finally, we conduct real-world experiments with the Barrett hand and Schunk SVH 5-finger hand.
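The combination of parallel sampling and gradient-based local search described above can be sketched with a toy differentiable cost. The paper's metric is a force-closure surface-matching score over gripper geometry; here a 2D "grasp point" is simply pulled onto a unit-circle "object surface" from many seeds at once:

```python
import numpy as np

def refine(poses, cost, grad, iters=100, lr=0.05):
    """Gradient-descent refinement of many candidate poses in parallel;
    returns (best_pose, best_cost). Stands in for DiPGrasp's
    differentiable metric with a toy analytic cost."""
    p = np.array(poses, dtype=float)
    for _ in range(iters):
        p -= lr * grad(p)
    c = cost(p)
    return p[np.argmin(c)], float(c.min())

def cost(p):
    """Toy metric: squared distance of each 2D point to a unit circle."""
    return (np.linalg.norm(p, axis=1) - 1.0) ** 2

def grad(p):
    """Analytic gradient of `cost` w.r.t. each point."""
    r = np.linalg.norm(p, axis=1, keepdims=True)
    return 2.0 * (r - 1.0) * p / np.maximum(r, 1e-9)
```

Running all seeds in parallel is what makes this style of planner fast on a GPU, and the same gradients can flow back into an upstream neural network for end-to-end refinement.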
PubDate: FRI, 16 AUG 2024 09:16:40 -04
Issue No: Vol. 9, No. 10 (2024)
- PRIME: Scaffolding Manipulation Tasks With Behavior Primitives for Data-Efficient Imitation Learning
Authors: Tian Gao;Soroush Nasiriany;Huihan Liu;Quantao Yang;Yuke Zhu;
Pages: 8322 - 8329
Abstract: Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizons. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior primitive-based framework designed for improving the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, followed by learning a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10–34% higher success rates in simulation over state-of-the-art baselines and 20–48% on physical hardware.
PubDate: THU, 15 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
- Modifying Adaptive Cruise Control Systems for String Stable Stop-and-Go Wave Control
Authors: Fangyu Wu;Joy Carpio;Matthew Bunting;Matthew Nice;Daniel Work;Jonathan Sprinkle;Jonathan Lee;Sharon Hornstein;Alexandre Bayen;
Pages: 8330 - 8337
Abstract: This letter addresses the important issue of energy inefficiency and air pollution resulting from stop-and-go waves on highways by introducing a novel controller called the Attenuative Kerner's Model (AKM). The objective of AKM is to enhance an existing Adaptive Cruise Control (ACC) system to improve vehicle following in stop-and-go waves. It is designed as a hybrid controller that is compatible with a wide range of commercial vehicles equipped with ACC. The article demonstrates the local string stability of the controller. Next, it presents a comparative analysis of AKM against two benchmarks: a human driver and a commercial ACC system, through numerical simulations and physical experiments using a 2022 Cadillac XT5. The findings reveal that AKM substantially outperforms both the human driver and the ACC in controlling low-speed, stop-and-go waves. The results indicate that AKM could act as an additional control layer for existing ACC systems, potentially improving their operational efficiency and reducing pollution emissions, thus contributing to more sustainable highway transportation.
PubDate: THU, 08 AUG 2024 09:17:48 -04
Issue No: Vol. 9, No. 10 (2024)
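The general shape of such a wave-attenuation experiment can be sketched with a generic constant-time-gap car-following law; this is not the paper's AKM controller, and the gains, gap parameters, actuator limits, and leader speed profile below are all illustrative assumptions.

```python
import numpy as np

dt, horizon = 0.1, 30.0                       # 30 s simulation at 10 Hz
k_gap, k_vel = 0.5, 1.0                       # illustrative feedback gains
s0, t_gap = 2.0, 1.5                          # standstill gap [m], time gap [s]

def leader_speed(t):
    """Leader cruises at 15 m/s, dips to 5 m/s between t = 5 s and 10 s."""
    return 5.0 if 5.0 <= t < 10.0 else 15.0

v = 15.0                                      # follower speed, starts in equilibrium
gap = s0 + t_gap * v                          # spacing to the leader
gaps = []
for step in range(int(horizon / dt)):
    v_l = leader_speed(step * dt)
    # Constant-time-gap ACC: drive the gap to s0 + t_gap * v, damp the speed error.
    a = k_gap * (gap - s0 - t_gap * v) + k_vel * (v_l - v)
    a = float(np.clip(a, -3.0, 2.0))          # comfort/actuator limits
    gap += (v_l - v) * dt
    v += a * dt
    gaps.append(gap)
```

With these (overdamped) gains the follower absorbs the leader's speed dip without collision and settles back to the leader's speed; string-stability analysis asks whether such a disturbance shrinks as it propagates down a platoon.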
-
- SEG-Net: Deep Learning Grasping With a Soft Enveloping Gripper
Authors: Yufei Hao;Bin Peng;Hui Zhang;Chang Wang;Rui Wang;Jianhua Zhang;
Pages: 8338 - 8345
Abstract: The emergence of non-fingered soft bioinspired grippers poses a challenge for learning-based grasping control due to the lack of a model describing grasping robustness and a dataset for training. In this letter, we propose a comprehensive pipeline encompassing grasping evaluation, dataset generation, deep neural network construction and training, as well as experimental verification for a soft enveloping gripper to investigate its learning-based grasping methods. The core of our approach lies in the development of a grasping quality model based on the force balance between objects and the deformed gripper, enabling us to evaluate the robustness of grasps. Using this model, we synthesize approximately 10K grasp scenes and 30K grasp poses for our dataset, each containing four pixel-wise heatmaps representing depth information, grasp depth, grasp axis, and quality assessment. Subsequently, we construct SEG-Net, which takes depth images as input and outputs the best grasping point along with the corresponding grasp axis and depth. Following training and fine-tuning on our dataset, we validate the performance through simulations as well as experiments. Results demonstrate that our proposed method effectively enables automatic grasping using the soft enveloping gripper.
PubDate: THU, 15 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
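Reading a grasp off pixel-wise heatmaps, as SEG-Net does, reduces to an argmax over the quality map followed by indexing the companion maps at that pixel. The tiny maps below are made-up stand-ins for the network's outputs, shown only to illustrate the decoding step.

```python
import numpy as np

# Hypothetical 3x4 network outputs: grasp quality plus per-pixel depth/axis maps.
quality = np.array([[0.1, 0.2, 0.3, 0.1],
                    [0.2, 0.4, 0.9, 0.3],
                    [0.1, 0.2, 0.5, 0.2]])
depth_map = np.full_like(quality, 0.05)       # grasp depth per pixel [m]
axis_map = np.full_like(quality, 1.57)        # grasp axis angle per pixel [rad]

# Best grasping point = pixel of highest predicted quality.
row, col = np.unravel_index(np.argmax(quality), quality.shape)
best = dict(pixel=(row, col),
            depth=depth_map[row, col],
            axis=axis_map[row, col])
```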
-
- Multi-Vehicle Trajectory Planning at V2I-Enabled Intersections Based on Correlated Equilibrium
Authors: Wenyuan Wang;Peng Yi;Yiguang Hong;
Pages: 8346 - 8353
Abstract: Generating trajectories that ensure vehicle safety and improve traffic efficiency remains a challenging task at intersections. Many existing works utilize Nash equilibrium (NE) for trajectory planning at intersections. However, NE-based planning can hardly guarantee that all vehicles are in the same equilibrium, leading to a risk of collision. In this letter, we propose a framework for trajectory planning based on Correlated Equilibrium (CE) when Vehicle-to-Infrastructure (V2I) communication is also enabled. Recommendation via CE allows all vehicles to reach a safe, consensual equilibrium while retaining the rationality of NE-based methods: no vehicle has an incentive to deviate. The Intersection Manager (IM) first collects the trajectory library and the personal preference probabilities over the library from each vehicle in a low-resolution spatial-temporal grid map. Then, the IM optimizes the recommendation probability distribution for each vehicle's trajectory by minimizing the overall collision probability under the CE constraint. Finally, each vehicle samples a trajectory from the low-resolution map to construct a safety corridor and derives a smooth trajectory with a local refinement optimization. We conduct comparative experiments at a crossroad intersection involving two and four vehicles, validating the effectiveness of our method in balancing vehicle safety and traffic efficiency.
PubDate: THU, 15 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
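The incentive property behind the CE constraint, that no vehicle can gain in expectation by deviating from its recommendation, can be checked directly on a small matrix game. The 2x2 chicken-style payoffs and distributions below are textbook illustrations, not the letter's intersection model.

```python
import numpy as np

def max_deviation_gain(p, u1, u2):
    """Largest expected gain any player obtains by deviating from a
    recommendation drawn from joint distribution p.
    A correlated equilibrium has no positive gain."""
    gains = []
    n = p.shape[0]
    for a in range(n):                        # recommendation to player 1 (row)
        for alt in range(n):
            gains.append(float(np.sum(p[a, :] * (u1[alt, :] - u1[a, :]))))
    for a in range(n):                        # recommendation to player 2 (column)
        for alt in range(n):
            gains.append(float(np.sum(p[:, a] * (u2[:, alt] - u2[:, a]))))
    return max(gains)

# Chicken-style game: action 0 = yield, 1 = go.
u1 = np.array([[6.0, 2.0], [7.0, 0.0]])
u2 = u1.T
ce = np.array([[1/3, 1/3], [1/3, 0.0]])       # classic CE: never recommend (go, go)
uniform = np.full((2, 2), 0.25)               # not a CE: deviation pays off
```

The IM's optimization in the letter can be read as searching over such joint distributions for one that minimizes collision probability while keeping every deviation gain non-positive.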
-
- Diverse Controllable Diffusion Policy With Signal Temporal Logic
Authors: Yue Meng;Chuchu Fan;
Pages: 8354 - 8361
Abstract: Generating realistic simulations is critical for autonomous system applications such as self-driving and human-robot interaction. However, today's driving simulators still have difficulty generating controllable, diverse, and rule-compliant behaviors for road participants: rule-based models cannot produce diverse behaviors and require careful tuning, whereas learning-based methods imitate the policy from data but are not designed to follow the rules explicitly. Moreover, real-world datasets are by nature “single-outcome,” making it hard for learning methods to generate diverse behaviors. In this letter, we leverage Signal Temporal Logic (STL) and diffusion models to learn a controllable, diverse, and rule-aware policy. We first calibrate the STL on real-world data, then generate diverse synthetic data using trajectory optimization, and finally learn the rectified diffusion policy on the augmented dataset. We test on the NuScenes dataset, and our approach achieves the most diverse rule-compliant trajectories compared to other baselines, with a runtime 1/17 that of the second-best approach. In closed-loop testing, our approach reaches the highest diversity, the highest rule satisfaction rate, and the lowest collision rate. Our method can generate varied characteristics conditioned on different STL parameters at test time. A case study on human-robot encounter scenarios shows that our approach can generate diverse and close-to-oracle trajectories.
PubDate: FRI, 16 AUG 2024 09:16:40 -04
Issue No: Vol. 9, No. 10 (2024)
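The quantitative STL semantics underlying such rule calibration are simple for basic formulas: "always" takes a minimum of the predicate margin over the trace, "eventually" a maximum, and the sign of the result says whether the rule holds. The speed-limit predicate below is an illustrative rule, not one of the paper's calibrated formulas.

```python
import numpy as np

def rob_always_leq(signal, threshold):
    """Robustness of G(signal <= threshold): minimum margin over the trace.
    Positive => satisfied with slack, negative => violated."""
    return float(np.min(threshold - signal))

def rob_eventually_geq(signal, threshold):
    """Robustness of F(signal >= threshold): maximum margin over the trace."""
    return float(np.max(signal - threshold))

speed = np.array([1.0, 2.0, 3.0, 2.0])        # toy speed trace
```

Because these robustness values are (sub)differentiable in the trace, they can serve both to calibrate STL parameters against data and to guide trajectory optimization toward rule-compliant behavior.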
-
- Positioning in Congested Space by Combining Vision-Based and Proximity-Based Control
Authors: John Thomas;François Chaumette;
Pages: 8362 - 8369
Abstract: In this letter, we consider positioning in congested space within the framework of sensor-based control using vision and proximity sensors. Vision acts as the primary sensing modality for performing the positioning task, while proximity sensors complement it by ensuring that the robotic platform does not collide with objects in the workspace. Sensor information is combined in a shared manner using the QP formalism, where ideas from safety-critical control are used to express inequality constraints. The proposed method is validated through various real experiments.
PubDate: THU, 15 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
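The core of such a shared formulation, tracking the vision-based command unless a proximity constraint caps motion toward an obstacle, has a closed form in one dimension. The barrier gain and distances below are illustrative; the actual method solves a multi-constraint QP over the full robot velocity, not this scalar version.

```python
def safe_velocity(v_des, dist, d_min, gamma):
    """1-D sketch of a safety-critical velocity filter: the QP
        min (v - v_des)^2   s.t.   v <= gamma * (dist - d_min)
    (v counted positive toward the obstacle) solves in closed form."""
    bound = gamma * (dist - d_min)            # CBF-style velocity budget
    return min(v_des, bound)
```

With `v_des = 1.0` m/s, `dist = 0.6` m, `d_min = 0.5` m, and `gamma = 2`, the filter caps the approach speed near 0.2 m/s; far from obstacles the vision-based command passes through unchanged.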
-
- RTONet: Real-Time Occupancy Network for Semantic Scene Completion
Authors: Quan Lai;Haifeng Zheng;Xinxin Feng;Mingkui Zheng;Huacong Chen;Wenqiang Chen;
Pages: 8370 - 8377
Abstract: The comprehension of 3D semantic scenes holds paramount significance in autonomous driving and robotics technology. Nevertheless, the simultaneous achievement of real-time processing and high precision in complex, expansive outdoor environments poses a formidable challenge. In response to this challenge, we propose a novel occupancy network named RTONet, which is built on a teacher-student model. To enhance the ability of the network to recognize various objects, the decoder incorporates dilated convolution layers with different receptive fields and utilizes a multi-path structure. Furthermore, we develop an automatic frame selection algorithm to augment the guidance capability of the teacher network. The proposed method outperforms the existing grid-based approaches in semantic completion (mIoU), and achieves the state-of-the-art performance in terms of real-time inference speed while exhibiting competitive performance in scene completion (IoU) on the SemanticKITTI benchmark.
PubDate: WED, 14 AUG 2024 09:16:41 -04
Issue No: Vol. 9, No. 10 (2024)
-
- Balancing and Hopping With a Ring Screw Actuation on One Leg
Authors: Federico Allione;Antonios E. Gkikakis;Christian Di Natali;Darwin Caldwell;
Pages: 8378 - 8385
Abstract: This letter presents the prototype of a human-sized, untethered athletic hopping and balancing underactuated monopedal robot designed to withstand crash landings. Experimental results in 2D of the robot balancing on an unstable contact point, hopping vertically, landing back, and balancing within a narrow point of 2.5 cm are presented and discussed. Furthermore, the first application of a new linear transmission mechanism called the ring screw, an alternative to the ball screw, is presented. The hopping ability of two robots, one equipped with a commercially available ball screw and one with the ring screw, is compared, with the latter achieving a hopping height of 34 cm, three times higher than the former, thanks to its capability to provide extra power derived from its higher speed limit.
PubDate: MON, 19 AUG 2024 09:17:38 -04
Issue No: Vol. 9, No. 10 (2024)
-
- Energy-Optimal Asymmetrical Gait Selection for Quadrupedal Robots
Authors: Yasser G. Alqaham;Jing Cheng;Zhenyu Gan;
Pages: 8386 - 8393
Abstract: Symmetrical gaits, such as trotting, are commonly employed in quadrupedal robots for their simplicity and stability. However, the potential of asymmetrical gaits, such as bounding and galloping, which are prevalent in their natural counterparts at high speeds or over long distances, is less clear in the design of locomotion controllers for legged machines. This study systematically examines five distinct asymmetrical quadrupedal gaits on a legged robot, aiming to uncover the fundamental differences in footfall sequences and the consequent energetics across a broad range of speeds. Utilizing a full-body model of a quadrupedal robot (Unitree A1), we developed a hybrid system for each gait, incorporating the desired footfall sequence and rigid impacts. To identify the most energy-optimal gait, we applied optimal control methods, framing it as a trajectory optimization problem with specific constraints and a work-based cost of transport as the objective function. Our results show that, in the context of asymmetrical gaits, when minimizing cost of transport across the entire stride, the front leg pair primarily propels the system forward, while the rear leg pair acts more like an inverted pendulum, contributing significantly less to the energetic output. Additionally, while bounding, characterized by two aerial phases per cycle, is the most energy-optimal gait at higher speeds, the energy expenditure of gaits at speeds below 1 m/s depends heavily on the robot's specific design.
PubDate: THU, 15 AUG 2024 09:17:26 -04
Issue No: Vol. 9, No. 10 (2024)
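A work-based cost of transport, as used in the objective above, can be computed from joint torques and velocities over a stride. The single-joint profile, robot mass, and distance below are made-up numbers for illustration; the paper's cost is evaluated inside a trajectory optimization, not post hoc like this.

```python
import numpy as np

def cost_of_transport(tau, omega, dt, mass, distance, g=9.81):
    """Work-based CoT: positive actuator work divided by m * g * d (dimensionless)."""
    power = tau * omega                            # mechanical power samples [W]
    positive_work = float(np.sum(np.maximum(power, 0.0)) * dt)
    return positive_work / (mass * g * distance)

# One joint exerting 2 N*m at 3 rad/s for 1 s while the robot covers 1 m.
tau = np.full(100, 2.0)
omega = np.full(100, 3.0)
cot = cost_of_transport(tau, omega, dt=0.01, mass=10.0, distance=1.0)
```

Counting only positive work reflects the assumption that negative (braking) work is not recovered, which is the usual convention for legged-robot energetics.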
-
- Task-Informed Grasping of Partially Observed Objects
Authors: Cristiana de Farias;Brahim Tamadazte;Maxime Adjigble;Rustam Stolkin;Naresh Marturi;
Pages: 8394 - 8401
Abstract: In this letter, we address the problem of task-informed grasping in scenarios where only incomplete or partial object information is available. Existing methods, which either focus on task-aware grasping or grasping under partiality, typically require extensive data and long training durations. In contrast, we propose a one-shot task-informed methodology that enables the transfer of grasps computed for a stored object model in the database to another object of the same category that is partially perceived. Our method leverages the reconstructed shapes from Gaussian Process Implicit Surfaces (GPIS) and employs the Functional Maps (FM) framework to transfer task-specific grasping functions. By defining task functions on the objects' manifolds and incorporating an uncertainty metric from GPIS, our approach provides a robust solution for part-specific and task-oriented grasping. Validated through simulations and real-world experiments with a 7-axis collaborative robotic arm, our methodology demonstrates a success rate exceeding 90% in achieving task-informed grasps on a variety of objects.
PubDate: MON, 19 AUG 2024 09:17:38 -04
Issue No: Vol. 9, No. 10 (2024)
-
- OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments
Authors: Yinan Deng;Jiahui Wang;Jingyu Zhao;Xinyu Tian;Guangyan Chen;Yi Yang;Yufeng Yue;
Pages: 8402 - 8409
Abstract: Environment representations endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks. Open-vocabulary representation, powered by Vision-Language models (VLMs), possesses inherent advantages, including zero-shot learning and open-set cognition. However, existing open-vocabulary maps are primarily designed for small-scale environments, such as desktops or rooms, and are typically geared towards limited-area tasks involving robotic indoor navigation or in-place manipulation. They face challenges in direct generalization to outdoor environments characterized by numerous objects and complex tasks, owing to limitations in both understanding level and map structure. In this work, we propose OpenGraph, a novel open-vocabulary hierarchical graph representation designed for large-scale outdoor environments. OpenGraph initially extracts instances and their captions from visual images, enhancing textual reasoning by encoding captions. Subsequently, it achieves 3D incremental object-centric mapping with feature embedding by projecting images onto LiDAR point clouds. Finally, the environment is segmented based on lane graph connectivity to construct a hierarchical representation. Validation results from SemanticKITTI and real-world scenes demonstrate that OpenGraph achieves high segmentation and query accuracy.
PubDate: MON, 19 AUG 2024 09:17:38 -04
Issue No: Vol. 9, No. 10 (2024)
-
- Visual-Tactile Perception Based Control Strategy for Complex Robot Peg-in-Hole Process via Topological and Geometric Reasoning
Authors: Gaozhao Wang;Xing Liu;Zhengxiong Liu;Panfeng Huang;Yang Yang;
Pages: 8410 - 8417
Abstract: Peg-hole-insertion processes of diverse shapes are typical contact-rich tasks, which require an accurate representation of an object's shape, pose, and peg-hole contact states. The visual-tactile sensor can perceive the relative moving trend between the gripper and the grasped object, which can be applied to perceiving peg-hole contact states. To complete peg-hole insertion tasks, this manuscript proposes a method that uses the visual-tactile sensor to estimate the relative position of peg and hole. Furthermore, it introduces topological and geometric reasoning to characterize the insertion process, which can be used for pegs and holes of various polygonal shapes. In experiments with five different shapes of peg and hole, errors of peg-hole relative position estimation using the proposed method are mostly within 5 degrees, which meets the needs of insertion tasks. Moreover, insertion processes become smoother when the topological and geometric reasoning is adopted, indicating the effectiveness of the reasoning process.
PubDate: WED, 31 JUL 2024 09:18:01 -04
Issue No: Vol. 9, No. 10 (2024)
-
- MBRVO: A Blur Robust Visual Odometry Based on Motion Blurred Artifact Prior
Authors: Jialu Zhang;Jituo Li;Jiaqi Li;Yue Sun;Xinqi Liu;Zhi Zheng;Guodong Lu;
Pages: 8418 - 8425
Abstract: How to estimate camera pose from motion-blurred images remains a challenge for visual odometry. Blurring artifacts are inevitably caused by exposure during camera motion. While current visual odometry treats them as noise, we argue that it is necessary to extract the latent information in blur artifacts, as they contain prior knowledge of camera motion. Based on this, we propose a blur-robust visual odometry that improves the accuracy of camera pose estimation through the exposure trajectory. Specifically, we first use the exposure trajectory to guide pixel matching between neighboring frames. A blur mask is then generated based on the magnitude of the exposure trajectory. The mask makes the pose module pay less attention to feature information in severely blurred regions. Experiments show that our proposed end-to-end visual odometry achieves competitive performance on most sequences of motion-blurred datasets.
PubDate: WED, 14 AUG 2024 09:16:41 -04
Issue No: Vol. 9, No. 10 (2024)
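The blur-mask idea, downweighting features where the estimated exposure trajectory is long, can be sketched as a threshold on the per-pixel trajectory magnitude. The threshold and toy magnitudes below are illustrative assumptions; the paper's mask feeds an attention mechanism rather than a hard binary gate.

```python
import numpy as np

def blur_mask(traj_magnitude, threshold):
    """1 where blur is mild (features are trustworthy), 0 where it is severe."""
    return (traj_magnitude <= threshold).astype(np.float32)

# Hypothetical per-pixel exposure-trajectory lengths (in pixels).
mag = np.array([[0.0, 3.0],
                [1.0, 5.0]])
mask = blur_mask(mag, threshold=2.0)
```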
-
- Corrections to “A Sim-to-Real Deep Learning-Based Framework for Autonomous Nano-Drone Racing”
Authors: Lorenzo Lamberti;Elia Cereda;Gabriele Abbate;Lorenzo Bellone;Victor Javier Kartsch Morinigo;Michał Barciś;Agata Barciś;Alessandro Giusti;Francesco Conti;Daniele Palossi;
Pages: 8426 - 8426
Abstract: Presents corrections to the article “A Sim-to-Real Deep Learning-Based Framework for Autonomous Nano-Drone Racing”.
PubDate: MON, 26 AUG 2024 09:16:56 -04
Issue No: Vol. 9, No. 10 (2024)
-
- Unsupervised Meta-Testing With Conditional Neural Processes for Hybrid Meta-Reinforcement Learning
Authors: Suzan Ece Ada;Emre Ugur;
Pages: 8427 - 8434
Abstract: We introduce Unsupervised Meta-Testing with Conditional Neural Processes (UMCNP), a novel hybrid few-shot meta-reinforcement learning (meta-RL) method that uniquely combines, yet distinctly separates, parameterized policy gradient-based (PPG) and task inference-based few-shot meta-RL. Tailored for settings where the reward signal is missing during meta-testing, our method increases sample efficiency without requiring additional samples in meta-training. UMCNP leverages the efficiency and scalability of Conditional Neural Processes (CNPs) to reduce the number of online interactions required in meta-testing. During meta-training, samples previously collected through PPG meta-RL are efficiently reused for learning task inference in an offline manner. UMCNP infers the latent representation of the transition dynamics model from a single test task rollout with unknown parameters. This approach allows us to generate rollouts for self-adaptation by interacting with the learned dynamics model. We demonstrate our method can adapt to an unseen test task using significantly fewer samples during meta-testing than the baselines in 2D-Point Agent and continuous control meta-RL benchmarks, namely cartpole with an unknown angle-sensor bias and a walker agent with randomized dynamics parameters.
PubDate: WED, 14 AUG 2024 09:16:41 -04
Issue No: Vol. 9, No. 10 (2024)
-
- Online Incremental Dynamic Modeling Using Physics-Informed Long Short-Term Memory Networks for the Pneumatic Artificial Muscle
Authors: Shuopeng Wang;Rixin Wang;Junjie Yang;Lina Hao;
Pages: 8435 - 8442
Abstract: The pneumatic artificial muscle (PAM) is widely applied in various scenarios due to its compliance and high efficiency. However, online modeling that can accommodate streaming data remains an unresolved issue when data cannot be obtained offline. This letter proposes an online incremental modeling method based on a physics-informed LSTM (PI-LSTM) architecture. The modified three-element model is treated as physics knowledge and integrated into the PI-LSTM architecture, enabling the representation of physical constraints through neural networks. Subsequently, the elastic weight consolidation (EWC) method is utilized to combine online operational data with the offline PI-LSTM model, allowing the model to be updated using online data. Finally, online dynamic modeling experiments conducted on PAMs under different loads and driving conditions demonstrate the precision of the proposed method. Additionally, the experiments confirm that the proposed method effectively mitigates the catastrophic forgetting problem that can arise from online mini-batch data.
PubDate: TUE, 20 AUG 2024 09:17:14 -04
Issue No: Vol. 9, No. 10 (2024)
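The EWC mechanism the abstract mentions adds a quadratic penalty that anchors weights the offline model relied on, which is what counteracts catastrophic forgetting during online updates. The parameter vectors, Fisher values, and λ below are illustrative stand-ins for the PI-LSTM's weights.

```python
import numpy as np

def ewc_penalty(theta, theta_offline, fisher, lam):
    """Elastic weight consolidation: lam/2 * sum_i F_i * (theta_i - theta*_i)^2.
    Large Fisher values F_i freeze weights important to the offline model;
    small ones leave weights free to adapt to online data."""
    return float(0.5 * lam * np.sum(fisher * (theta - theta_offline) ** 2))

theta = np.array([1.0, 2.0])                  # current (online-updated) weights
theta_off = np.array([0.0, 0.0])              # weights of the offline model
fisher = np.array([1.0, 2.0])                 # diagonal Fisher information
penalty = ewc_penalty(theta, theta_off, fisher, lam=2.0)
```

In training, this penalty is simply added to the task loss on each online mini-batch.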
-
- Lighthouse Localization of Miniature Wireless Robots
Authors: Said Alvarado-Marin;Cristobal Huidobro-Marin;Martina Balbi;Trifun Savić;Thomas Watteyne;Filip Maksimovic;
Pages: 8443 - 8450
Abstract: In this letter, we apply lighthouse localization, originally designed for virtual reality motion tracking, to positioning and localization of indoor robots. We first present a lighthouse decoding and tracking algorithm on a low-power wireless microcontroller with hardware implemented in a cm-scale form factor. One-time scene solving is performed on a computer using a variety of standard computer vision techniques. Three different robotic localization scenarios are analyzed in this work. The first is a planar scene with a single lighthouse with a four-point pre-calibration. The second is a planar scene with two lighthouses that self-calibrates with either multiple robots in the experiment or a single robot in motion. The third extends to a 3D scene with two lighthouses and a self-calibration algorithm. The absolute accuracy, measured against a camera-based tracking system, was found to be 7.25 mm RMS for the 2D case and 11.2 mm RMS for the 3D case. This demonstrates the viability of lighthouse tracking both for small-scale robotics and as an inexpensive and compact alternative to camera-based setups.
PubDate: FRI, 24 MAY 2024 09:16:33 -04
Issue No: Vol. 9, No. 10 (2024)
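At its geometric core, planar lighthouse localization intersects two bearing rays from base stations at known poses, which reduces to a 2x2 linear solve. The station positions and robot location below are made-up test values, and real sweep decoding, calibration, and noise handling are omitted.

```python
import numpy as np

def intersect_bearings(b1, a1, b2, a2):
    """Intersect two 2-D rays b_i + t_i * (cos a_i, sin a_i) at the robot position."""
    d1 = np.array([np.cos(a1), np.sin(a1)])
    d2 = np.array([np.cos(a2), np.sin(a2)])
    # Solve b1 + t1*d1 = b2 + t2*d2 as a 2x2 linear system in (t1, t2).
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.asarray(b2) - np.asarray(b1))
    return np.asarray(b1) + t[0] * d1

# Robot at (1, 2); stations at the origin and (4, 0) measure azimuths to it.
b1, b2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
a1 = np.arctan2(2.0, 1.0)                     # bearing seen by station 1
a2 = np.arctan2(2.0, -3.0)                    # bearing seen by station 2
pos = intersect_bearings(b1, a1, b2, a2)
```

The solve fails only when the two rays are parallel, which is why base-station geometry matters for accuracy.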
-
- Multi-View Registration of Partially Overlapping Point Clouds for Robotic Manipulation
Authors: Yuzhen Xie;Aiguo Song;
Pages: 8451 - 8458
Abstract: Point cloud registration is a fundamental task in intelligent robots, aiming to achieve globally consistent geometric structures and providing data support for robotic manipulation. Due to the limited view of measurement devices, it is necessary to collect point clouds from multiple views to construct a complete model. Previous multi-view registration methods rely on sufficient overlap and registering all pairs of point clouds, resulting in slow convergence and high cumulative errors. To solve these challenges, we present a multi-view registration method based on the point-to-plane model and pose graph. We introduce a robust kernel into the objective function to diminish registration errors caused by mismatched points. Additionally, an enhanced Euclidean clustering method is proposed for extracting object point clouds. Subsequently, by establishing pose constraints on non-adjacent frames of point clouds, the cumulative error is reduced, achieving global optimization based on the pose graph. Experimental results demonstrate the robustness of our method with respect to overlap ratios, successfully registering point clouds with overlap ratios exceeding 30%. In comparison to other techniques, our method reduces the E(R) of multi-view registration by 13.54% and E(t) by 18.72%, effectively reducing the cumulative error.
PubDate: MON, 19 AUG 2024 09:17:38 -04
Issue No: Vol. 9, No. 10 (2024)
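The role of a robust kernel in such an objective can be illustrated with Huber weights applied to point-to-plane residuals inside an iteratively reweighted least-squares loop. The kernel width δ and the residual values below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def huber_weight(residual, delta):
    """IRLS weight for the Huber kernel: quadratic near zero, linear in the
    tails, so mismatched correspondences with large residuals are downweighted
    instead of dominating the least-squares fit."""
    r = np.abs(residual)
    return np.where(r <= delta, 1.0, delta / r)

# Point-to-plane residuals n_i . (R p_i + t - q_i); one gross mismatch at 4.0.
residuals = np.array([0.05, -0.1, 0.5, 4.0])
weights = huber_weight(residuals, delta=1.0)
```

Each registration iteration then solves a weighted point-to-plane least-squares problem with these weights, recomputing them as the residuals shrink.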
-
- Graph-Based Spatial Reasoning for Tracking Landmarks in Dynamic Laparoscopic Environments
Authors: Jie Zhang;Yiwei Wang;Song Zhou;Huan Zhao;Chidan Wan;Xiong Cai;Han Ding;
Pages: 8459 - 8466
Abstract: Accurate anatomical landmark tracking is crucial yet challenging in laparoscopic surgery due to the changing appearance of landmarks during dynamic tool-anatomy interactions and visual domain shifts between cases. Unlike appearance-based detection methods, this work proposes a novel graph-based approach to reconstruct the entire target landmark area by explicitly modeling the evolving spatial relations over time among scenario entities, including observable regions, surgical tools, and landmarks. Considering tool-anatomy interactions, we present the Tool-Anatomy Interaction Graph (TAI-G), a spatio-temporal graph that captures spatial dependencies among entities, attribute interactions within entities, and temporal dependencies of spatial relations. To mitigate domain shifts, geometric segmentation features are designated as node attributes, representing domain-invariant image information in the graph space. Message passing with attention helps propagate information across TAI-G, enhancing robust tracking by reconstructing landmark data. Evaluated on laparoscopic cholecystectomy, our framework demonstrates effective handling of complex tool-anatomy interactions and visual domain gaps to accurately track landmarks, showing promise in enhancing the stability and reliability of intricate surgical tasks.
PubDate: MON, 19 AUG 2024 09:17:38 -04
Issue No: Vol. 9, No. 10 (2024)
-
- Collaborative Constrained Target-Reaching Control in a Multiplayer Reach-Avoid Game
Authors: Haiyan Zhao;Rongxin Cui;Weisheng Yan;
Pages: 8467 - 8474
Abstract: For a high-value attacker, relocating it is more valuable than destroying it. This relocation issue involves luring an adversarial agent along a predefined path to reach a goal in a non-convex domain. Here, we define it as a collaborative constrained target-reaching (CTR) problem. We introduce a novel virtual defense channel to define a symmetric, dynamically extended target set, enabling us to treat the CTR problem as an aggregation of individual two-player reach-avoid (RA) games and obtain analytical strategies for defenders. First, we describe the partition of the game space and construct barriers using explicit policy methods and geometric analysis. This allows us to determine whether a solution to the game exists based on the players' initial conditions. Second, we develop nonlinear state feedback strategies using a suitable risk metric. These strategies are based on prescribed performance control, offering a viable framework for practical scenarios with control errors. Finally, simulations and experiments validate the effectiveness of our method.
PubDate: MON, 19 AUG 2024 09:17:38 -04
Issue No: Vol. 9, No. 10 (2024)
-
- Evolving Robotic Hand Morphology Through Grasping and Learning
Authors: Bangchu Yang;Li Jiang;Wenhao Wu;Ruichen Zhen;
Pages: 8475 - 8482
Abstract: Creatures can co-evolve their biological structures and behaviors under environmental pressures. Leveraging biomimetic evolution algorithms (referred to as co-design or co-optimization), a diverse range of robots with environmental adaptation has been generated. However, implementing these evolutionary methods or results in real-world robots, especially in the case of robotic hands, was not easy. In this context, this work presents a comprehensive self-optimization scheme for robotic hands that encompasses both software and hardware components. This scheme enables robots to autonomously refine their morphology through the integration of hardware gradients and reinforcement learning within parallel environments, thereby enhancing their adaptability to a variety of grasping tasks. For the hardware aspect, we developed a reconfigurable hand prototype with 37 variable hardware parameters (i.e., joint stiffness, the length of phalanges, finger location, and palm curvature) adjusted by mechanical components. Leveraging the adjustable hardware and 20 motors, this hand achieves full actuation and can dynamically adjust its morphology. The training results indicate that the fitness score of the self-optimizing hand exceeds that of original designs in this instance. The hardware parameters can be further fine-tuned in response to task variations. Moreover, the evolved hardware parameters are transferred to a real-world reconfigurable hand, demonstrating its grasping and adaptivity capabilities.
PubDate: THU, 08 AUG 2024 09:17:48 -04
Issue No: Vol. 9, No. 10 (2024)
-
- IMU Augment Tightly Coupled Lidar-Visual-Inertial Odometry for Agricultural Environments
Authors: Quoc Hung Hoang;Gon-Woo Kim;
Pages: 8483 - 8490
Abstract: This letter presents a new tightly coupled LiDAR-visual-inertial odometry scheme for agricultural autonomous machinery under structureless environments and fluctuation uncertainties. By proposing a robust adaptive filter, the effects of unknown disturbances and noises are significantly reduced. Meanwhile, the IMU orientation is effectively estimated by an error-state Kalman filter (ESKF). The IMU attitude estimate is integrated to significantly improve the accuracy of both LiDAR and visual odometry. Hence, the suggested approach achieves accurate output performance, smooth trajectories, and robustness against uncertainties. Finally, the effectiveness of the proposed LiDAR-visual-inertial odometry is confirmed through real-time experiments in different scenarios.
PubDate: THU, 08 AUG 2024 09:17:48 -04
Issue No: Vol. 9, No. 10 (2024)
-