January 2024 - May 2024
Co-developed TOM: Text-to-Mapping, an AI-powered system for automated book detection on large bookshelves, achieving a 73.33% true positive rate and an 85% true negative rate in book identification. Designed a four-stage computer vision pipeline: Meta AI’s SAM performs pixel-level segmentation, OpenAI’s CLIP classifies the segments and filters out non-book objects, the Google Cloud Vision API extracts text together with its location, and a final matching stage assigns the extracted text chunks to book segments to identify the desired books on a shelf. Planning to integrate TOM with a robotic system to enable real-time book retrieval assistance in libraries.
TBD