SmolVLA-SewerBot: A Compact Vision-Language-Action Model for Autonomous Sewer Inspection in Resource-Constrained Environments
Abstract
We introduce SmolVLA-SewerBot, a compact vision-language-action (VLA) model optimized for deployment on battery-powered robotic platforms operating in confined underground spaces. The architecture extends the SmolVLA framework with three adaptations specific to sewer inspection: a dual-camera visual encoder that processes simultaneous front and wrist camera feeds; a language-conditioned action decoder that accepts natural-language task specifications in Hindi and English; and an adaptive precision inference engine that dynamically adjusts quantization levels according to available battery charge and computational headroom. Trained on 12,000 episodes of teleoperated sewer inspection data collected from Indian municipal sewer systems, the model achieves an 87% task completion rate across four canonical sewer maintenance subtasks (navigate, assess, extract, deposit) when deployed on an NVIDIA Jetson Orin Nano. Benchmarked against full-precision baselines, our adaptive quantization approach retains 94% of baseline accuracy while reducing inference energy consumption by 3.8x, extending operational time on a single battery charge from 45 minutes to 2.7 hours. The model weights and evaluation code are released on Hugging Face.
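The abstract does not state how the adaptive precision engine maps battery and compute state to a quantization level. The following is a minimal sketch of one possible policy, assuming three hypothetical precision tiers (FP16, INT8, INT4) and illustrative thresholds (`BATTERY_HIGH`, `BATTERY_LOW`, `HEADROOM_MIN`). The names `Precision`, `DeviceState`, and `select_precision` are hypothetical and are not taken from the released code. The sketch covers only the tier-selection decision, not the quantized inference itself.

```python
from dataclasses import dataclass
from enum import Enum


class Precision(Enum):
    """Hypothetical precision tiers; the paper does not list the levels it uses."""
    FP16 = 16
    INT8 = 8
    INT4 = 4


@dataclass(frozen=True)
class DeviceState:
    battery_frac: float      # remaining charge as a fraction, 0.0-1.0
    compute_headroom: float  # idle fraction of the accelerator budget, 0.0-1.0


# Illustrative thresholds only (assumptions, not values from the paper).
BATTERY_HIGH = 0.6
BATTERY_LOW = 0.25
HEADROOM_MIN = 0.2


def select_precision(state: DeviceState) -> Precision:
    """Return the highest precision tier the current battery and compute headroom allow."""
    if not (0.0 <= state.battery_frac <= 1.0 and 0.0 <= state.compute_headroom <= 1.0):
        raise ValueError("battery_frac and compute_headroom must lie in [0, 1]")
    # Very low charge: fall back to the cheapest tier regardless of spare compute.
    if state.battery_frac < BATTERY_LOW:
        return Precision.INT4
    # Ample charge and spare compute: run at the highest tier.
    if state.battery_frac >= BATTERY_HIGH and state.compute_headroom >= HEADROOM_MIN:
        return Precision.FP16
    # Everything in between uses the middle tier.
    return Precision.INT8


if __name__ == "__main__":
    for s in [DeviceState(0.9, 0.5), DeviceState(0.9, 0.1),
              DeviceState(0.4, 0.8), DeviceState(0.1, 0.9)]:
        print(s, select_precision(s).name)  # FP16, INT8, INT8, INT4 respectively
```

On real hardware the battery and headroom readings would come from platform telemetry, which this sketch does not model. A deployed policy would also likely add hysteresis so the tier does not oscillate when a reading hovers near a threshold; that is omitted here for brevity.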
Citation
Chanda, S. (2026). "SmolVLA-SewerBot: A Compact Vision-Language-Action Model for Autonomous Sewer Inspection in Resource-Constrained Environments." Saral Systems Council Working Paper SSC-WP-2026-003. DOI: 10.xxxx/ssc-wp-2026-003