awinml - Overview

Ashwin Mathur

AI Engineer · Agentic RAG & Reranking · LLM Fine-Tuning & RL · Domain-Specific AI

I build LLM systems for domain-specific applications in finance, biomedicine, and law, spanning retrieval, agents, and model training. I've contributed to Haystack, MTEB, HuggingFace, and scikit-learn, and co-authored MMTEB, published at ICLR 2025. I develop open-source AI at AVNLP.

LLM Training & RL Alignment

Repository	Description
BioThink	Self-reflective biomedical QA training with QLoRA + GRPO, in which the model learns to generate structured self-reflection tokens under six reward functions; evaluated on seven metrics via LLM-as-a-Judge.
RAG Model Training	Fine-tuning LLMs for Adaptive-RAG, Corrective RAG, RQ-RAG, Self-RAG, Agentic RAG, and ReZero via SFT and GRPO across finance, biomedical, and open-domain QA.
GRPO	Four GRPO implementations comparing format/correctness rewards, DeepSpeed vs. PyTorch training, frozen/server/periodic reference models, and vLLM vs. Transformers rollout generation.
LLM Finetuning	SFT, DPO, KTO, ORPO, PPO, and GRPO pipelines with QLoRA/LoRA/DoRA/P-Tuning/Prefix-Tuning adapter training across ARC, FactScore, TriviaQA, PopQA, Earnings Calls, and GSM8K.
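
Several of the repositories above rely on GRPO, whose core idea is to score each sampled completion relative to the other completions drawn for the same prompt. The following is a minimal sketch of that group-relative advantage step, assuming one scalar reward per completion. The function name, the `eps` value, and the choice of population standard deviation are illustrative assumptions and are not taken from these repositories; real implementations differ in such details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one prompt's group of sampled completions.

    Hypothetical helper for illustration: each advantage is the reward's
    distance from the group mean, scaled by the group's standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a correctness reward (made-up values)
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.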

Retrieval Augmented Generation and Agents

Repository	Description
RAG Pipelines	Domain-specific RAG pipelines combining LangGraph orchestration, BAML structured generation, Milvus Hybrid Search, 3-layer metadata enrichment, and instruction-following rerankers for Medical and Financial QA.
DSPy Optimizers	DSPy RAG optimization with Weaviate Hybrid Search, Query Rewriting, Sub-Query Decomposition using MIPROv2/COPRO/BootstrapFewShot optimizers on FreshQA, HotpotQA, TriviaQA, and PubMedQA.
VectorDB	Haystack and LangChain retrieval pipelines spanning Dense/Sparse/Hybrid search, Reranking, Parent-Child Retrieval, Query Enhancement, and Multi-Tenancy across Pinecone, Weaviate, Milvus, Qdrant, and Chroma.

Information Retrieval & Ranking

Repository	Description
LLM Rankers	LLM rankers using Pairwise, Setwise, and Listwise techniques with RankZephyr/RankLlama, Pydantic-validated structured generation, and efficient zero-shot sorting.
Pairwise Ranking Prompting	Zero-shot pairwise reranking with All-Pairs, Heapsort, and Sliding-K strategies, using bidirectional comparison for position-bias mitigation and Pydantic-validated outputs.
Reciprocal Rank Fusion and LLM Rankers	Hybrid retrieval combining Reciprocal Rank Fusion with Diversity, Lost-in-the-Middle, and Similarity rankers, evaluated on BEIR (NDCG, MAP, Recall, Precision).
LLM Blender	LLM ensembling framework using PairRanker for cross-attention candidate ranking and GenFuser for top-K output fusion, packaged as a Haystack component.
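
As an illustration of the Reciprocal Rank Fusion technique named in the table above, here is a minimal sketch that merges ranked lists from several retrievers. It assumes each retriever returns document IDs in best-first order. The function name and the sample IDs are hypothetical, and `k=60` is a commonly used default rather than a value taken from the repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings of document IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over every list it
    appears in, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense and a sparse retriever
dense_results = ["a", "b", "c"]
sparse_results = ["b", "c", "a"]
fused = reciprocal_rank_fusion([dense_results, sparse_results])
```

Because the fusion uses only ranks, it needs no score calibration between retrievers, which is what makes it convenient for hybrid dense and sparse search.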

Open-Source Contributions

Haystack - Built the Haystack evaluation framework (eval, EvaluationResult, calculate_metrics) and four metrics (EM, F1, SAS, MRR); added HuggingFace TEI Embedders and a sentence-transformer Diversity Ranker.
MTEB - Added the complete LegalBench Benchmark (160+ legal classification and retrieval datasets) and four Japanese benchmarks (JMTEB Clustering, JSICK, JaGovFaqs, NLPJournal).
Haystack Core Integrations - Implemented INSTRUCTOR Embedders, Optimum Embedders (ONNX runtime), Llama.cpp Generator, Pinecone Document Store, and Cohere V3 Embed model support.
HuggingFace Transformers, Evaluate - Contributed BioGPTForSequenceClassification and Trainer-free ViT pre-training scripts to Transformers, and scikit-learn integration guides to Evaluate.
scikit-learn, imbalanced-learn - Contributed three core scikit-learn features: OOB fitted scores for Gradient Boosting, sparse-matrix support for silhouette_samples, and multiclass average_precision_score.
voyage-embedders-haystack - Full Haystack integration for Voyage AI: text/document embedders, reranker, multimodal embeddings, and contextualized chunk embeddings; published on PyPI.

Publications

MMTEB: Massive Multilingual Text Embedding Benchmark (ICLR 2025)

Largest multilingual text embedding benchmark: 500+ tasks across 250+ languages and 10 task categories. Contributed the complete LegalBench suite - 160+ legal domain classification and retrieval datasets.