Bal Narendra Sapa
AI Engineer | Software Engineer | Retrieval-Augmented Generation (RAG) | LLM Systems
About
I am an AI Engineer and Software Engineer specializing in Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), vector search systems, and scalable AI infrastructure.
I hold a Master of Science in Data Science from the University of New Haven and a Bachelor of Technology in Computer Science from Rajiv Gandhi University of Knowledge Technologies (IIIT Basar). My work focuses on designing and deploying production-grade AI systems that solve real-world business problems using modern machine learning and cloud technologies.
My interests include:
- Large Language Models (LLMs)
- Retrieval-Augmented Generation (RAG)
- AI Infrastructure and MLOps
- Distributed Systems
- Information Retrieval
- Open Source Software
Work Experience
Software Engineer
PPK VirtueServices Pvt. Ltd. | Hyderabad, India
Oct 2025 – Apr 2026
Worked on enterprise AI applications focused on document intelligence, retrieval systems, and scalable LLM deployments.
Key Contributions
- Designed and implemented semantic caching using LangChain, Redis, and Redis Search to reduce response latency.
- Built bulk document ingestion pipelines using Celery workers for large-scale document processing.
- Implemented hybrid search capabilities by combining dense vector retrieval with BM25 sparse retrieval.
- Added OCR support for extracting text from scanned documents.
- Improved LLM response quality through prompt engineering and retrieval optimization.
- Optimized GPU memory utilization for large-context inference workloads.
- Managed deployments using Docker, Docker Compose, Azure Virtual Machines, and the Milvus vector database.
- Enabled processing of document contexts approaching 30,000 tokens on GPUs with 16 GB of memory.
Technologies
Python, LangChain, Redis, Redis Search, Celery, Docker, Azure, Milvus, OCR, BM25, Llama Models
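The hybrid search contribution listed above combines a dense vector ranking with a BM25 ranking. One common way to merge two such rankings is reciprocal rank fusion (RRF). The sketch below is an illustration only: the resume does not state which fusion method was used, and the function name, the constant k=60, and the document IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense retriever and a BM25 retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF needs only rank positions, so it avoids having to normalize BM25 scores against vector similarity scores, which live on different scales.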
AI Engineer
Sunrise Software Solutions Corporation | Connecticut, USA
Jan 2024 – Sep 2025
Worked on document-based conversational AI systems powered by open-source Large Language Models.
Key Contributions
- Developed a production-grade RAG chatbot platform using Llama and Falcon models.
- Built REST APIs for user management, document management, and AI interactions using Django REST Framework.
- Designed strategies for document chunking, retrieval, and vector storage management.
- Integrated Hugging Face Text Generation Inference (TGI) for scalable model serving.
- Implemented per-user vector store management using Azure Blob Storage.
- Deployed AI workloads using Docker, Azure DevOps, and Runpod Serverless infrastructure.
Technologies
Python, Django, DRF, LangChain, Azure Blob Storage, Docker, Azure DevOps, Hugging Face TGI, Runpod
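The document chunking work mentioned above can be pictured with a simple fixed-size splitter that overlaps neighboring chunks. This is a minimal sketch under stated assumptions, not the strategy used in the platform: the function name, the character-based sizes, and the default values are hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks that overlap their neighbors.

    The overlap keeps a sentence that straddles a boundary visible in
    both adjacent chunks, which helps retrieval find it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Production splitters usually break on sentence or token boundaries instead of raw characters; the overlap idea stays the same.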
Open Source Contributions
LangChain Contributor
Pull Request: https://github.com/langchain-ai/langchain/pull/7914
Contributed vector store serialization functionality to LangChain.
The contribution enabled vector stores to be serialized and stored programmatically, making them easier to persist, transport, and manage across environments.
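To illustrate what serializing a vector store means in practice, here is a toy round trip to bytes and back. It is a sketch only and does not reproduce the LangChain API or the code in the pull request: TinyVectorStore, to_bytes, and from_bytes are hypothetical names, and JSON is an assumed encoding chosen for readability.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TinyVectorStore:
    """Toy in-memory store holding texts alongside their embedding vectors."""

    texts: list = field(default_factory=list)
    vectors: list = field(default_factory=list)

    def add(self, text, vector):
        self.texts.append(text)
        self.vectors.append([float(x) for x in vector])

    def to_bytes(self) -> bytes:
        # Encode the whole store as UTF-8 JSON so it can be written to a
        # database column, object storage, or sent over the network.
        payload = {"texts": self.texts, "vectors": self.vectors}
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "TinyVectorStore":
        payload = json.loads(data.decode("utf-8"))
        return cls(texts=payload["texts"], vectors=payload["vectors"])


store = TinyVectorStore()
store.add("hello world", [0.1, 0.2, 0.3])

blob = store.to_bytes()                      # bytes, ready to persist anywhere
restored = TinyVectorStore.from_bytes(blob)  # rebuilt in another process or host
print(restored == store)  # True
```

Working with bytes instead of a local file path is what lets a store move between environments without a shared filesystem.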
Selected Projects
Drivable Area and Lane Segmentation
Links
- GitHub: https://github.com/balnarendrasapa/road-detection
- Dataset: https://huggingface.co/datasets/bnsapa/road-detection
- Demo: https://huggingface.co/spaces/bnsapa/road-detection
Built a computer vision system for drivable-area detection and lane segmentation using a custom annotated dataset collected and labeled by the project team.
Highlights
- Custom dataset creation and annotation
- Semantic segmentation
- Hugging Face deployment
- Open-source release
E-Commerce FAQ Chatbot using PEFT and LoRA
Links
- Kaggle: https://www.kaggle.com/code/balnarendrasapa/fine-tuning-falcon-7b-with-faq-e-com-dataset
- GitHub: https://github.com/balnarendrasapa/faq-llm
- Model: https://huggingface.co/bnsapa/faq-llm
Fine-tuned Falcon-7B with LoRA, a Parameter-Efficient Fine-Tuning (PEFT) method, to build an e-commerce FAQ assistant.
Highlights
- Falcon-7B fine-tuning
- LoRA adaptation
- Streamlit deployment
- Public Hugging Face model release
Cybersecurity Named Entity Recognition
Links
- GitHub: https://github.com/balnarendrasapa/cybersecurity-ner
- Dataset: https://huggingface.co/datasets/bnsapa/cybersecurity-ner
- Demo: https://huggingface.co/spaces/bnsapa/cybersecurity-ner
Developed a domain-specific NER model for cybersecurity using DistilBERT and MITRE-based datasets.
Highlights
- Domain-specific NLP
- DistilBERT fine-tuning
- Custom dataset publication
- Hugging Face deployment
Face Mask Detection
Links
- GitHub: https://github.com/balnarendrasapa/mask-detection
- Colab: https://colab.research.google.com/github/balnarendrasapa/mask-detection/blob/master/face_mask_detection.ipynb
- Demo: https://huggingface.co/spaces/bnsapa/mask-detection
Built a transfer-learning-based mask detection system using a pretrained VGG19 model and a dataset containing approximately 12,000 images.
Certifications
- Microsoft Azure AI Engineer Associate (AI-102)
- Microsoft Azure AI Fundamentals (AI-900)
Education
University of New Haven
Master of Science in Data Science | Aug 2022 – Dec 2023
GPA: 3.81 / 4.00
Rajiv Gandhi University of Knowledge Technologies (IIIT Basar)
Bachelor of Technology in Computer Science | 2018 – 2022
GPA: 9.01 / 10.00
Technical Skills
Programming Languages
- Python
- JavaScript
- SQL
AI & Machine Learning
- Large Language Models (LLMs)
- Retrieval-Augmented Generation (RAG)
- LangChain
- Hugging Face
- Fine-Tuning (LoRA, PEFT)
- Vector Databases
- Information Retrieval
- Computer Vision
- NLP
Backend & Infrastructure
- Django
- FastAPI
- REST APIs
- Docker
- Celery
- Redis
- Azure
- Milvus
- Azure Blob Storage
Data & Analytics
- Pandas
- NumPy
- Machine Learning
- Data Science
Contact
- Email: bnsapa2000@gmail.com
- LinkedIn: https://www.linkedin.com/in/balnarendrasapa
- GitHub: https://github.com/balnarendrasapa
- Hugging Face: https://huggingface.co/bnsapa
- Kaggle: https://www.kaggle.com/balnarendrasapa