piyush_prashant // ACTIVE

Building High-Performance AI Systems

B.Tech DSAI honours student at IIIT Dharwad. Specializing in low-latency C++ model inference, pgvector semantic caching, and intelligent query routing.

// 22-Language NLP Research

ACL SemEval-2026 Author

// IIIT Dharwad Honours

CGPA 9.1 / 10.0

// pgvector + Groq + Cache

Production RAG Systems

// ONNX Runtime Accelerator

C++ Inference Engineering

Recruiter Snapshot

What recruiters usually look for

Metrics that demonstrate production execution, low-latency infrastructure design, and mathematical foundations.

4,582,104+

Market Data Snapshots Processed

// Problem Solved: Training high-capacity deep learning models (DeepLOB and TransformerLOB) on raw limit order book (LOB) snapshots without memory leaks or sliding-window performance bottlenecks.

// Scale Check: 4.5 million snapshots (FI-2010 benchmark dataset), processing 100 levels of order book bids/asks across 5 prediction horizons.

// Business Value: Enabled high-accuracy mid-price movement forecasts under sub-millisecond conditions for high-frequency trading simulations.

STACK: PyTorch, NumPy, C++, Sliding-Window Cache
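
A minimal sketch of the sliding-window idea, assuming each snapshot is a flat sequence of features. The function name `sliding_windows`, the window length, and the toy values are illustrative and not taken from the project. A bounded deque holds only the current window, so the stream is never materialized as a set of overlapping copies.

```python
from collections import deque
from typing import Iterable, Iterator, List, Sequence


def sliding_windows(
    snapshots: Iterable[Sequence[float]], window: int
) -> Iterator[List[Sequence[float]]]:
    """Yield every run of `window` consecutive snapshots, oldest first."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf: deque = deque(maxlen=window)  # the oldest snapshot is evicted automatically
    for snap in snapshots:
        buf.append(snap)
        if len(buf) == window:
            yield list(buf)  # shallow copy: the snapshot objects themselves are shared


# Toy stream of 5 snapshots with 2 features each (hypothetical values).
stream = [(float(i), float(i) + 0.5) for i in range(5)]
windows = list(sliding_windows(stream, window=3))
```

Because the generator is lazy, memory use is bounded by one window regardless of how long the input stream is.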

22

Languages Evaluated & Classified

// Problem Solved: Detecting content polarization and framing anomalies across highly diverse low-resource and high-resource languages without training separate models per language.

// Scale Check: 22 distinct language datasets evaluated using a single unified model ensemble under the ACL SemEval-2026 Task 9 framework.

// Business Value: Achieved a macro-F1 of 0.797, outperforming the majority-class baseline by 33.6 percentage points and zero-shot Llama-3-8B-Instruct by 26.3 percentage points.

STACK: mDeBERTa-v3, XLM-RoBERTa, 4-bit QLoRA, XGBoost Stacking

97.31%

F1 Score in Intelligent Query Routing

// Problem Solved: Eliminating manual query routing and reducing database execution bottlenecks for domain-specific language (DSL) queries that need either relational SQL or Neo4j graph execution.

// Scale Check: Evaluated on a 10,000+ query hybrid workload, classifying query targets dynamically in real time.

// Business Value: Automated routing decisions with an F1 score of 97.3% and added SHAP-based feature-attribution explainability, cutting manual analysis by 40%.

STACK: XGBoost, Logistic Regression, SHAP, Scikit-Learn
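
The routing decision can be sketched as a binary classifier over per-query features. Everything below is an assumption made for illustration: the feature names, the weights, and the bias are hypothetical, and a hand-weighted linear scorer stands in for the trained XGBoost model. The per-feature contributions are only loosely analogous to SHAP attributions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class QueryFeatures:
    """Hypothetical features; the project's real feature set is not shown here."""
    num_joins: int
    traversal_depth: int
    num_aggregations: int


# Hand-picked weights standing in for a trained classifier.
# Positive values push the query toward the graph backend.
WEIGHTS: Dict[str, float] = {
    "num_joins": -0.4,
    "traversal_depth": 0.9,
    "num_aggregations": -0.6,
}
BIAS = -0.5


def contributions(f: QueryFeatures) -> Dict[str, float]:
    """Per-feature contribution to the routing score."""
    return {name: w * getattr(f, name) for name, w in WEIGHTS.items()}


def route_query(f: QueryFeatures) -> str:
    """Return the backend with the higher score: 'neo4j' or 'sql'."""
    score = BIAS + sum(contributions(f).values())
    return "neo4j" if score > 0 else "sql"


graph_like = route_query(QueryFeatures(num_joins=0, traversal_depth=3, num_aggregations=0))
table_like = route_query(QueryFeatures(num_joins=2, traversal_depth=0, num_aggregations=1))
```

Exposing `contributions` alongside the decision keeps each routing choice inspectable, which is the role the SHAP weights play in the description above.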

84.32%

HFT Mid-Price Prediction Accuracy

// Problem Solved: Predicting micro-structural price directions from highly volatile, noisy limit order book tick-level streams in real-time trading.

// Scale Check: High-frequency limit order book order matching, tested across 5 sequential future time horizons.

// Business Value: Achieved state-of-the-art mid-price accuracy of 84.3%, enabling simulated trading strategies to outperform standard random-walk baselines.

STACK: TransformerLOB, CNN-DeepLOB, ONNX Runtime, CUDA Acceleration

60%

Inference API Cost Reduction

// Problem Solved: High token consumption and API cost overhead from redundant natural-language questions that business users ask against NL-to-SQL analytics schemas.

// Scale Check: Tested against 50k-row CSV uploads and 10+ relational database schemas under multi-user concurrency.

// Business Value: Saved ~60% in LLM API fees and dropped average query response times from 4s to 1.5s via a vector-based semantic cache.

STACK: Supabase, pgvector, nomic-embed-text, Groq API

1.750x

C++ Inference Speedup

// Problem Solved: Python interpreter overhead pushing deep learning inference latency past the time budget for high-frequency execution.

// Scale Check: C++ inference deployment running sliding-window feeds of 500k+ sequence sets.

// Business Value: Decreased inference time per step to 2.78 ms (a 1.75x speedup over Python), establishing production-grade deployment capabilities.

STACK: ONNX Runtime C++, CMake, LibTorch, Memory Caching
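
A small sketch of how a per-step latency and a speedup ratio of this kind can be measured. The helper names `mean_latency_ms` and `speedup` are illustrative, and the baseline value is derived from the two reported figures rather than measured.

```python
import time
from statistics import mean
from typing import Callable, List


def mean_latency_ms(step: Callable[[], object], warmup: int = 10, runs: int = 100) -> float:
    """Average wall-clock time of one call to `step`, in milliseconds."""
    for _ in range(warmup):  # warm caches and lazy initialisation before timing
        step()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized path is than the baseline."""
    return baseline_ms / optimized_ms


# Baseline implied by the reported figures: 2.78 ms * 1.75.
# This is an inference from the two numbers, not a measurement.
implied_baseline_ms = 2.78 * 1.75
```

Under that assumption the implied baseline is about 4.87 ms per step, so `speedup(implied_baseline_ms, 2.78)` recovers the 1.75x figure.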

Production Systems

Technical Architecture & Case Studies

Auditable, live-simulated code bases showing low-latency design, database tuning, and classification logic.

DataChat: RAG-based NL-to-SQL Engine

Stack: Next.js 14, Supabase (pgvector), Groq LLM, Ollama (nomic-embed-text), Recharts

Problem: Standard RAG queries suffer from high LLM token costs and execution latencies (4s+). Solution: pgvector-based schema matching plus a similarity-based semantic query cache, so repeated or near-duplicate questions skip the LLM call.

90%+

Query Accuracy

~60%

API Call Reduction

1.5s

Response Latency

> Engineering Challenges:

Cache Tuning: Built cosine-similarity threshold controls in Supabase pgvector to reject false-positive cache hits.
Sub-500 ms SQL Compile: Pre-embedded database schema layouts to keep the Llama-3 prompt context small.
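
The threshold control can be sketched as follows. This is an in-memory stand-in for the pgvector-backed cache table, not the project's code: the class name `SemanticCache`, the 0.9 threshold, and the toy 3-dimensional embeddings are all assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for a vector-indexed cache of question -> SQL."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # hypothetical value; tune per workload
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], sql: str) -> None:
        self._entries.append((list(embedding), sql))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the nearest entry's SQL, or None if it is below the threshold."""
        best_sql: Optional[str] = None
        best_score = -1.0
        for stored, sql in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_sql, best_score = sql, score
        return best_sql if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "SELECT COUNT(*) FROM orders")
hit = cache.get([0.99, 0.05, 0.0])  # near-duplicate question: served from cache
miss = cache.get([0.0, 1.0, 0.0])   # unrelated question: falls through to the LLM
```

Raising the threshold trades hit rate for precision: fewer questions are served from cache, but a cached answer is less likely to belong to a different question.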

// Live RAG Flow State:

1. User Query Received

2. Semantic Cache Check

3. pgvector Retrieval

4. Prompt Context Construction

5. Groq SQL Compilation

6. Data Output & Table Render
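
The six stages above can be sketched as one function with injected dependencies. Every callable parameter is a hypothetical injection point (embedding model, cache, retriever, LLM, database), and the stubs in the usage section exist only to make the sketch runnable.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]
Rows = List[Dict[str, object]]


def answer_question(
    question: str,
    embed: Callable[[str], Vector],
    cache_get: Callable[[Vector], Optional[str]],
    cache_put: Callable[[Vector, str], None],
    retrieve_schema: Callable[[Vector], List[str]],
    compile_sql: Callable[[str], str],
    run_sql: Callable[[str], Rows],
) -> Rows:
    vec = embed(question)              # 1. query received and embedded
    sql = cache_get(vec)               # 2. semantic cache check
    if sql is None:
        tables = retrieve_schema(vec)  # 3. vector retrieval of schema snippets
        prompt = "Schema:\n" + "\n".join(tables) + "\nQuestion: " + question  # 4. prompt context
        sql = compile_sql(prompt)      # 5. LLM compiles the SQL
        cache_put(vec, sql)
    return run_sql(sql)                # 6. rows for table and chart rendering


# Usage with stub dependencies (all hypothetical).
calls = {"llm": 0}
store: Dict[str, str] = {}


def fake_llm(prompt: str) -> str:
    calls["llm"] += 1
    return "SELECT 1"


def run(question: str) -> Rows:
    return answer_question(
        question,
        embed=lambda q: [float(len(q))],
        cache_get=lambda v: store.get(str(v)),
        cache_put=lambda v, s: store.__setitem__(str(v), s),
        retrieve_schema=lambda v: ["orders(id, total)"],
        compile_sql=fake_llm,
        run_sql=lambda s: [{"sql": s}],
    )


first = run("total orders?")
second = run("total orders?")  # cache hit: the LLM stub is not called again
```

Stages 3 to 5 run only on a cache miss, which is where the reduction in LLM calls comes from.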

// Upload CSV Analytics:

CSV upload (max 50k rows): simulates schema embedding and Recharts chart auto-generation.

Academic Output

Peer-Reviewed Research Lab

Applying low-level ML engineering, multi-GPU model quantization, and ensemble routing architecture to solve core challenges in multilingual NLP.

ACL SemEval-2026 Accepted Paper

Semantic Vectors at SemEval-2026 Task 9: Robust Multilingual Polarization Detection via Dual-Encoder Fusion and Expert Ensembling

Ankit Dash, Priyanshu Mittal, Piyush Prashant, Sunil Saumya | Proceedings of SemEval-2026, Association for Computational Linguistics

// Research Overview:

Cross-lingual polarization classification suffers from significant semantic shift. We fine-tuned mDeBERTa-v3 and XLM-RoBERTa-large via 4-bit QLoRA, stacking their predictions with an XGBoost classifier and applying Shannon entropy thresholds for dynamic routing.

// Model Ensemble Architecture:

DUAL-ENCODER META-STACKING LAYER

✔ 4-bit QLoRA: Co-trained dual models across 22 languages on a single GPU.

✔ Entropy Routing: Used Shannon entropy metrics to route simple inputs directly, reducing compute overhead by 35%.

✔ XGBoost Stacking: Blended base dual-encoders to mitigate model-specific classification bias.
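
A minimal sketch of entropy-based routing, assuming the router sees a model's predicted class probabilities. The 0.5-bit threshold and the path names `fast_path` and `ensemble` are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of a probability distribution (0 * log 0 is treated as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def route(probs: Sequence[float], threshold: float = 0.5) -> str:
    """Send confident (low-entropy) inputs down the cheap path."""
    return "fast_path" if shannon_entropy(probs) < threshold else "ensemble"


confident = route([0.97, 0.03])  # entropy is roughly 0.19 bits
uncertain = route([0.55, 0.45])  # entropy is roughly 0.99 bits
```

For a binary classifier the entropy ranges from 0 bits (fully confident) to 1 bit (a coin flip), so the threshold directly controls what share of inputs skips the full ensemble.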

// Macro-F1 Score Performance Comparison (higher is better):

Dual-Encoder Stacking (Ours): 79.7%

Llama-3-8B-Instruct (Zero-Shot): 53.4%

mDeBERTa-v3-base (Baseline): 68.2%

Majority Class Baseline: 46.1%

* Note: The dual-encoder fusion architecture outperforms Llama-3-8B-Instruct by +26.3 pp and the majority-class baseline by +33.6 pp across the 22-language benchmark.
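
For reference, macro-F1 is the unweighted mean of per-class F1 scores, and the gaps quoted in the note are plain differences between the reported percentages. The sketch below shows both; the function name `macro_f1` and the toy labels are illustrative, while the four scores are the reported values.

```python
from typing import Dict, Hashable, List, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    labels = set(y_true) | set(y_pred)
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: class 0 has F1 = 0.8 and class 1 has F1 = 2/3.
toy_score = macro_f1([1, 1, 0, 0], [1, 0, 0, 0])

# Reported macro-F1 percentages.
reported: Dict[str, float] = {
    "ours": 79.7,
    "llama3_zero_shot": 53.4,
    "mdeberta_base": 68.2,
    "majority": 46.1,
}
gaps_pp = {
    name: round(reported["ours"] - score, 1)
    for name, score in reported.items()
    if name != "ours"
}
```

The same subtraction puts the gap to the fine-tuned mDeBERTa-v3-base baseline at 11.5 pp.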

22-Language Evaluation Matrix (SemEval-2026)

Language | F1
English | 0.842
Spanish | 0.825
French | 0.819
German | 0.814
Italian | 0.808
Portuguese | 0.803
Russian | 0.795
Chinese | 0.792
Arabic | 0.781
Hindi | 0.789
Bengali | 0.772
Tamil | 0.765
Telugu | 0.762
Urdu | 0.758
Persian | 0.754
Turkish | 0.778
Indonesian | 0.784
Vietnamese | 0.776
Korean | 0.788
Japanese | 0.791
Swahili | 0.741
Yoruba | 0.732

Total Languages: 22 evaluated | Average Multilingual F1: 0.785
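
The 0.785 average is the unweighted mean of the 22 per-language scores, which a few lines can reproduce. The variable names are illustrative.

```python
from statistics import mean

# Per-language F1 scores as listed in the evaluation matrix.
f1_by_language = {
    "English": 0.842, "Spanish": 0.825, "French": 0.819, "German": 0.814,
    "Italian": 0.808, "Portuguese": 0.803, "Russian": 0.795, "Chinese": 0.792,
    "Arabic": 0.781, "Hindi": 0.789, "Bengali": 0.772, "Tamil": 0.765,
    "Telugu": 0.762, "Urdu": 0.758, "Persian": 0.754, "Turkish": 0.778,
    "Indonesian": 0.784, "Vietnamese": 0.776, "Korean": 0.788, "Japanese": 0.791,
    "Swahili": 0.741, "Yoruba": 0.732,
}

average_f1 = round(mean(f1_by_language.values()), 3)
strongest = max(f1_by_language, key=f1_by_language.get)
weakest = min(f1_by_language, key=f1_by_language.get)
spread = round(f1_by_language[strongest] - f1_by_language[weakest], 3)
```

The spread between the strongest and weakest language is 0.110 F1, with the two lowest scores on Swahili and Yoruba.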

Engineering Stack

System Infrastructure Rack

Proficiencies categorized by architecture layer.

Layer 1: AI Automation & LLMs

RAG Pipelines

Agentic AI

Multi-Agent Systems

LangChain

LlamaIndex

Groq LLM

Transformers

Layer 2: ML Infrastructure & Training

PyTorch

TensorFlow

ONNX Runtime

CUDA

QLoRA

XGBoost Stacking

MLflow / W&B

Layer 3: Backend & APIs

Python

C++

FastAPI

Docker

Kubernetes

Git / PR Workflows

Layer 4: Data & Retrieval

pgvector (Supabase)

ChromaDB / FAISS

MongoDB

MySQL / SQL

Neo4j / Graphs

Layer 5: Frontend & Visualization

Next.js 14

React

TailwindCSS

Recharts

shadcn/ui

Growth Path

Engineering Milestones

A chronological git history detailing formal education, published scientific literature, and active code branches.

Expected May 2028 | branch: main

GPA: 9.1 | Data Science | Systems

B.Tech in Data Science & AI

IIIT Dharwad (through 4th semester)

Maintaining a CGPA of 9.1 / 10.0 at IIIT Dharwad. Core focus: DBMS, Data Structures & Algorithms, Statistics, and Probability.

January 2026 | branch: research-nlp

QLoRA | XGBoost Stacking | ACL Anthology

ACL SemEval-2026 Author

Association for Computational Linguistics Proceedings

Co-authored research on polarization detection across 22 languages. Developed the stacking classifier using QLoRA and Shannon entropy expert-routing.

Late 2025 | branch: hft-dev

C++ | ONNX Runtime | High Frequency Trading

DeepLOB C++ Inference Engineering

Quant/HFT Model Acceleration Project

Architected a C++ ONNX Runtime pipeline for limit order book predictions, achieving a 1.75x speedup (2.78ms latency) over Python.

Mid 2025 | branch: hft-dev

XGBoost | SHAP Explainability | Neo4j / SQL

HIFUN Router Development

Hybrid Query Optimization System

Developed an XGBoost query routing classifier (97.3% F1 score) to route DSL inputs to SQL (Postgres) or Graph (Neo4j) backends.

Early 2025 | branch: hft-dev

pgvector | Groq API | Semantic Cache

DataChat Architecture Launch

RAG NL-to-SQL Database Interface

Engineered a pgvector-backed semantic cache for the RAG pipeline, reducing LLM API calls by ~60% and cutting response times from 4.0s to 1.5s.

Ongoing | branch: main

Minor GPA: 9.0 | Agentic Systems | Prompt Engineering

Generative AI Minor Specialization

Academic Honours Track

Completing the honours track in Generative AI (Minor GPA: 9.0 / 10.0). Focus: LLM optimization, representations, and multi-agent loops.

// Currently Exploring:

AI Infrastructure

Efficient Inference Systems

Agentic Workflows

Applied ML Systems