SEEK (Semantic Embedding & Extraction Kit)
Tech Stack
Python, Hugging Face, FAISS, OpenRouter, DeepSeek
• Developed a scalable Retrieval-Augmented Generation (RAG) pipeline to extract and retrieve insights from a PDF corpus of more than 100 GB, covering data extraction, semantic chunking, and embedding with the BAAI/bge-base-en-v1.5 model from Hugging Face.
• Implemented vector search with FAISS and found IVFPQ indexing well suited to large-scale retrieval (under 100 ms latency), combining it with query expansion to improve recall.
• Deployed a zero-cost GenAI chatbot powered by a DeepSeek LLM via OpenRouter, producing citation-backed, hallucination-resistant answers, and validated the pipeline’s economic and technical feasibility through performance benchmarking.
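To illustrate the retrieval idea behind the IVFPQ index mentioned above, here is a minimal pure-Python sketch of the inverted-file (IVF) half only. It is not the project's code and does not use FAISS: the class `ToyIVFIndex`, the helper names, and the random test data are all hypothetical, and product quantization (the "PQ" half) is left out for brevity. The parameter names `nlist` and `nprobe` follow FAISS terminology.

```python
import random


def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the coarse quantizer)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: l2(v, centroids[c]))
            cells[nearest].append(v)
        for c, members in enumerate(cells):
            if members:  # keep the old centroid if a cell ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyIVFIndex:
    """Hypothetical inverted-file index: one posting list per centroid."""

    def __init__(self, vectors, nlist, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist, seed=seed)
        self.lists = [[] for _ in range(nlist)]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_cells(v, 1)[0]].append(i)

    def _nearest_cells(self, q, n):
        """Indices of the n centroids closest to q."""
        order = sorted(range(len(self.centroids)),
                       key=lambda c: l2(q, self.centroids[c]))
        return order[:n]

    def search(self, q, k, nprobe):
        """Scan only the nprobe closest cells and return the top-k vector ids."""
        candidates = [i for c in self._nearest_cells(q, nprobe)
                      for i in self.lists[c]]
        candidates.sort(key=lambda i: l2(q, self.vectors[i]))
        return candidates[:k]


def brute_force(vectors, q, k):
    """Exact top-k search over every vector, for comparison."""
    return sorted(range(len(vectors)), key=lambda i: l2(q, vectors[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
    query = [rng.gauss(0, 1) for _ in range(16)]
    index = ToyIVFIndex(data, nlist=10)
    print("approximate:", index.search(query, k=5, nprobe=2))
    print("exact:      ", brute_force(data, query, k=5))
```

Raising `nprobe` scans more cells, trading latency for recall; with `nprobe` equal to `nlist` the search is exhaustive and matches the brute-force result. A real IVFPQ index additionally stores each vector as a compact product-quantized code rather than the full vector, which is what keeps memory manageable at large scale.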

