systems_engineer_run // ACTIVE

Building Production-Grade AI Systems for the Next Era of Computing

B.Tech Data Science & AI student at IIIT Dharwad working on RAG systems, agentic AI pipelines, multilingual ML research, and low-latency inference infrastructure.

Resume (docx)
// 22-Language NLP Research
ACL SemEval-2026 Author
// IIIT Dharwad Honours
CGPA 9.1 / 10.0
// pgvector + Groq + Cache
Production RAG Systems
// ONNX Runtime Accelerator
C++ Inference Engineering
Recruiter Snapshot

What recruiters usually look for

Metrics that prove production execution, low-latency infrastructure design, and mathematical foundation.

4,582,104+
Market Data Snapshots Processed
// Problem Solved: Training high-capacity deep learning models (DeepLOB & TransformerLOB) on raw limit order book (LOB) snapshots without memory leakage or sliding-window performance bottlenecks.
// Scale Check: 4.5 million snapshots (FI-2010 benchmark dataset), processing 100 levels of order book bids/asks across 5 prediction horizons.
// Business Value: Enabled high-accuracy mid-price movement forecasts under sub-millisecond conditions for high-frequency trading simulations.
STACK: PyTorch, NumPy, C++, Sliding-Window Cache
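// Illustrative Sketch:

The sliding-window pattern named in this card can be sketched as a generator that holds only one window of snapshots in memory at a time. This is a minimal example under stated assumptions, not the project's code: the name `sliding_windows` and the use of `collections.deque` are hypothetical, and real LOB snapshots would be feature arrays rather than the integers used here.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(snapshots: Iterable[T], window: int) -> Iterator[Tuple[T, ...]]:
    """Yield consecutive fixed-size windows over a stream of snapshots.

    Hypothetical helper: only `window` items are buffered at any time,
    so the full stream never has to be materialized in memory.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buffer: deque = deque(maxlen=window)  # oldest snapshot is dropped automatically
    for snapshot in snapshots:
        buffer.append(snapshot)
        if len(buffer) == window:
            yield tuple(buffer)


if __name__ == "__main__":
    # Integers stand in for order book snapshots.
    for w in sliding_windows(range(5), window=3):
        print(w)
```

A generator keeps peak memory proportional to the window size rather than the stream length, which is one plausible way to avoid the bottlenecks the card describes.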
22
Languages Evaluated & Classified
// Problem Solved: Detecting content polarization and framing anomalies across highly diverse low-resource and high-resource languages without training separate models per language.
// Scale Check: 22 distinct language datasets evaluated using a single unified model ensemble under the ACL SemEval-2026 Task 9 framework.
// Business Value: Achieved a macro-F1 of 0.797, outperforming the majority-class baseline by +33.6 percentage points and zero-shot Llama-3-8B-Instruct by +26.3 percentage points.
STACK: mDeBERTa-v3, XLM-RoBERTa, 4-bit QLoRA, XGBoost Stacking
97.31%
F1 Score in Intelligent Query Routing
// Problem Solved: Eliminating manual query routing and reducing database execution bottlenecks for domain-specific language (DSL) queries that require either relational SQL or Neo4j graph queries.
// Scale Check: Evaluated on a hybrid workload of 10,000+ queries, classifying query targets in real time.
// Business Value: Automated routing decisions with an F1 score of 97.3% and incorporated SHAP-based feature weight explainability, cutting manual analysis by 40%.
STACK: XGBoost, Logistic Regression, SHAP, Scikit-Learn
84.32%
HFT Mid-Price Prediction Accuracy
// Problem Solved: Predicting micro-structural price directions from highly volatile, noisy limit order book tick-level streams in real-time trading.
// Scale Check: High-frequency limit order book order matching, tested across 5 sequential future time horizons.
// Business Value: Achieved state-of-the-art mid-price accuracy of 84.3%, enabling simulated trading strategies to outperform standard random-walk baselines.
STACK: TransformerLOB, CNN-DeepLOB, ONNX Runtime, CUDA Acceleration
60%
Inference API Cost Reduction
// Problem Solved: High token consumption and API cost overhead from redundant natural-language questions that business users ask against NL-to-SQL analytics schemas.
// Scale Check: Tested against 50k-row CSV uploads and 10+ relational database schemas under multi-user concurrency.
// Business Value: Saved ~60% in LLM API fees and cut average query response times from 4s to 1.5s via a vector-based semantic cache.
STACK: Supabase, pgvector, nomic-embed-text, Groq API
1.750x
C++ Inference Speedup
// Problem Solved: Python interpreter overhead and execution latency in deep learning model inference, which exceeded the maximum time budget for high-frequency execution.
// Scale Check: C++ inference deployment running sliding-window feeds of 500k+ sequence sets.
// Business Value: Decreased inference time per step to 2.78ms (a 1.75x speedup), establishing production-grade deployment capabilities.
STACK: ONNX Runtime C++, CMake, LibTorch, Memory Caching
Production Systems

Technical Architecture & Case Studies

Auditable, live-simulated code bases showing low-latency design, database tuning, and classification logic.

DataChat: RAG-based NL-to-SQL Engine

Stack: Next.js 14, Supabase (pgvector), Groq LLM, Ollama (nomic-embed-text), Recharts

// Why This Matters:

Natural-language database queries typically incur significant API token fees and high latency (often 4s+ per LLM query). DataChat addresses both with pgvector schema matching and a semantic cache.

90%+
Query Accuracy
~60%
API Call Reduction
1.5s
Response Latency

> Engineering Challenges:

  • Semantic Cache Tuning: Tuned cosine-distance thresholds in Supabase pgvector to balance the cache-hit rate against the risk of serving a cached answer to a question that only looks similar.
  • High-Speed Context Generation: Pre-embedded database schemas and mapped their layout structure to minimize the context length passed to Groq Llama-3, achieving sub-500ms SQL generation times.
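// Illustrative Sketch:

The threshold trade-off in the first bullet can be illustrated with a small in-memory cache. This is a sketch under assumptions, not DataChat's implementation: the class name `SemanticCache`, the default `max_distance` of 0.15, and the toy two-dimensional embeddings are all hypothetical. In pgvector the same comparison would normally be expressed with its cosine-distance operator (`<=>`) inside a SQL query.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache: returns a stored answer when a query embedding is close enough."""

    def __init__(self, max_distance: float = 0.15) -> None:
        # Assumed value. A lower threshold means fewer hits but fewer wrong matches.
        self.max_distance = max_distance
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], answer: str) -> None:
        self._entries.append((list(embedding), answer))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        best_answer: Optional[str] = None
        best_distance = self.max_distance
        for stored, answer in self._entries:
            distance = cosine_distance(embedding, stored)
            if distance <= best_distance:
                best_answer, best_distance = answer, distance
        return best_answer  # None signals a cache miss


if __name__ == "__main__":
    cache = SemanticCache(max_distance=0.1)
    cache.put([1.0, 0.0], "SELECT COUNT(*) FROM orders")
    print(cache.get([0.99, 0.05]))  # near-duplicate question: hit
    print(cache.get([0.0, 1.0]))    # unrelated question: miss
```

Raising `max_distance` increases the hit rate and the API savings, at the cost of occasionally answering a different question than the one asked.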
// Live RAG Flow State:
1. User Query Received
2. Semantic Cache Check
3. pgvector Retrieval
4. Prompt Context Construction
5. Groq SQL Compilation
6. Data Output & Table Render
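// Illustrative Sketch:

The six steps above can be sketched as one function with every external dependency injected as a callable. This is a hedged outline, not the production code: `answer_query` and all of its parameter names are hypothetical, the prompt template is invented for illustration, and the sketch assumes the cache stores generated SQL (the page does not say whether SQL or final results are cached).

```python
from typing import Any, Callable, Optional


def answer_query(
    question: str,
    cache_lookup: Callable[[str], Optional[str]],
    retrieve_schema: Callable[[str], str],
    generate_sql: Callable[[str], str],
    run_sql: Callable[[str], Any],
    cache_store: Callable[[str, str], None],
) -> Any:
    """Hypothetical orchestration of the flow listed above."""
    # Steps 1-2: receive the query and check the semantic cache first.
    sql = cache_lookup(question)
    if sql is None:
        # Step 3: retrieve only the schema fragments relevant to the question.
        schema = retrieve_schema(question)
        # Step 4: build a compact prompt (template is an assumption).
        prompt = f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
        # Step 5: ask the LLM for SQL, then remember it for similar questions.
        sql = generate_sql(prompt)
        cache_store(question, sql)
    # Step 6: execute and hand the rows back for rendering.
    return run_sql(sql)


if __name__ == "__main__":
    store: dict = {}  # exact-match dict standing in for a semantic cache
    rows = answer_query(
        "How many orders?",
        cache_lookup=store.get,
        retrieve_schema=lambda q: "orders(id, total)",
        generate_sql=lambda p: "SELECT COUNT(*) FROM orders",
        run_sql=lambda s: [("executed", s)],
        cache_store=store.__setitem__,
    )
    print(rows)
```

Because the LLM call sits behind the cache check, a repeated question skips steps 3 to 5 entirely, which is where the API savings would come from.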
Academic Output

Peer-Reviewed Research Lab

Applying low-level ML engineering, multi-GPU model quantization, and ensemble routing architectures to core challenges in multilingual NLP.

ACL SemEval-2026 Accepted Paper

Semantic Vectors at SemEval-2026 Task 9: Robust Multilingual Polarization Detection via Dual-Encoder Fusion and Expert Ensembling

Ankit Dash, Priyanshu Mittal, Piyush Prashant, Sunil Saumya | Proceedings of SemEval-2026, Association for Computational Linguistics

// Research Overview:

Multilingual polarization detection suffers from heavy syntactic variation and cross-lingual representation drift. This work fine-tunes mDeBERTa-v3-base and XLM-RoBERTa-large with 4-bit QLoRA, a parameter-efficient fine-tuning method. The two encoders feed an XGBoost meta-classifier, combined with a Shannon entropy routing mechanism that dynamically routes predictions to expert nodes.

// Model Ensemble Architecture:
DUAL-ENCODER META-STACKING LAYER
QLoRA Alignment: Fine-tuned embedding matrices across 22 languages simultaneously using single GPU setups, maintaining mathematical representation mapping.
Entropy-Based Routing: A Shannon entropy threshold lets simple inputs bypass the stack and be routed directly, saving 35% of computation.
Expert Meta-Classifier: Blended mDeBERTa-v3 and XLM-R models via XGBoost meta-stacking to resolve model-specific bias profiles.
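// Illustrative Sketch:

The entropy-based routing idea can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function names, the 0.5-bit threshold, and the choice to measure entropy on a single base model's class probabilities are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of a probability distribution; 0 means fully confident."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def route(
    base_probs: Sequence[float],
    meta_predict: Callable[[Sequence[float]], int],
    threshold: float = 0.5,  # assumed value, would be tuned on validation data
) -> Tuple[int, str]:
    """Return (predicted class, path taken)."""
    if shannon_entropy(base_probs) < threshold:
        # Low uncertainty: accept the base prediction and skip the stack.
        label = max(range(len(base_probs)), key=lambda i: base_probs[i])
        return label, "direct"
    # High uncertainty: defer to the (more expensive) meta-classifier.
    return meta_predict(base_probs), "meta"


if __name__ == "__main__":
    dummy_meta = lambda probs: 1  # stand-in for the XGBoost meta-classifier
    print(route([0.98, 0.02], dummy_meta))  # confident input takes the direct path
    print(route([0.55, 0.45], dummy_meta))  # ambiguous input goes to the meta path
```

The computation saved depends on how many inputs fall below the threshold, so the threshold trades accuracy on borderline cases against the share of inputs that skip the stack.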
// Macro-F1 Score Performance Comparison (higher is better):
Dual-Encoder Stacking (Ours): 79.7%
Llama-3-8B-Instruct (Zero-Shot): 53.4%
mDeBERTa-v3-base (Baseline): 68.2%
Majority Class Baseline: 46.1%
* Note: The dual-encoder fusion architecture outperforms Llama-3-8B-Instruct by +26.3 pp and the majority-class baseline by +33.6 pp across the 22-language benchmark.
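// Illustrative Sketch:

The percentage-point gaps in the note follow directly from the macro-F1 figures listed in the comparison. The short check below reproduces them; the dictionary keys and variable names are only illustrative labels.

```python
# Macro-F1 scores (in percent) as listed in the comparison above.
scores = {
    "ours": 79.7,
    "llama3_8b_zero_shot": 53.4,
    "mdeberta_v3_base": 68.2,
    "majority_class": 46.1,
}

# Gap in percentage points between the ensemble and each reference system.
gaps = {
    name: round(scores["ours"] - value, 1)
    for name, value in scores.items()
    if name != "ours"
}

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: +{gap} pp")
```

Percentage points are plain differences between two percentages, so a move from 53.4% to 79.7% is +26.3 pp rather than a 26.3% relative improvement.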
22-Language Evaluation Matrix (SemEval-2026)
EN | English | F1: 0.842
ES | Spanish | F1: 0.825
FR | French | F1: 0.819
DE | German | F1: 0.814
IT | Italian | F1: 0.808
PT | Portuguese | F1: 0.803
RU | Russian | F1: 0.795
ZH | Chinese | F1: 0.792
AR | Arabic | F1: 0.781
HI | Hindi | F1: 0.789
BN | Bengali | F1: 0.772
TA | Tamil | F1: 0.765
TE | Telugu | F1: 0.762
UR | Urdu | F1: 0.758
FA | Persian | F1: 0.754
TR | Turkish | F1: 0.778
ID | Indonesian | F1: 0.784
VI | Vietnamese | F1: 0.776
KO | Korean | F1: 0.788
JA | Japanese | F1: 0.791
SW | Swahili | F1: 0.741
YO | Yoruba | F1: 0.732
Total Languages: 22 evaluated
Average Multilingual F1: 0.785
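// Illustrative Sketch:

The reported average can be reproduced as the unweighted mean of the 22 per-language F1 scores in the matrix. The snippet below is only a consistency check; the variable names are illustrative.

```python
from statistics import fmean

# Per-language F1 scores copied from the evaluation matrix above.
f1_by_language = {
    "EN": 0.842, "ES": 0.825, "FR": 0.819, "DE": 0.814, "IT": 0.808,
    "PT": 0.803, "RU": 0.795, "ZH": 0.792, "AR": 0.781, "HI": 0.789,
    "BN": 0.772, "TA": 0.765, "TE": 0.762, "UR": 0.758, "FA": 0.754,
    "TR": 0.778, "ID": 0.784, "VI": 0.776, "KO": 0.788, "JA": 0.791,
    "SW": 0.741, "YO": 0.732,
}

# Unweighted mean: every language counts equally, regardless of dataset size.
average_f1 = round(fmean(f1_by_language.values()), 3)

if __name__ == "__main__":
    print(len(f1_by_language), average_f1)
```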
Engineering Stack

System Infrastructure Rack

Proficiencies categorized by architecture layers.

Layer 1: AI Automation & LLMs
RAG Pipelines
Agentic AI
Multi-Agent Systems
LangChain
LlamaIndex
Groq LLM
Transformers
Layer 2: ML Infrastructure & Training
PyTorch
TensorFlow
ONNX Runtime
CUDA
QLoRA
XGBoost Stacking
MLflow / W&B
Layer 3: Backend & APIs
Python
C++
FastAPI
Docker
Kubernetes
Git / PR Workflows
Layer 4: Data & Retrieval
pgvector (Supabase)
ChromaDB / FAISS
MongoDB
MySQL / SQL
Neo4j / Graphs
Layer 5: Frontend & Visualization
Next.js 14
React
TailwindCSS
Recharts
shadcn/ui
Growth Path

Engineering Milestones

A chronological git history detailing formal education, published scientific literature, and active code branches.

Expected May 2028 | branch: main
GPA: 9.1, Data Science, Systems

B.Tech in Data Science & AI

IIIT Dharwad (through 4th semester)

Maintaining a CGPA of 9.1 / 10.0. Coursework includes Database Management Systems, Data Structures, Statistics, and Probability.

January 2026 | branch: research-nlp
QLoRA, XGBoost Stacking, ACL Anthology

ACL SemEval-2026 Author

Association for Computational Linguistics Proceedings

Co-authored a paper on robust multilingual polarization detection across 22 languages. Engineered the stacked classifier using QLoRA and Shannon entropy expert-routing layers.

Late 2025 | branch: hft-dev
C++, ONNX Runtime, High Frequency Trading

DeepLOB C++ Inference Engineering

Quant/HFT Model Acceleration Project

Architected a C++ deployment pipeline for limit order book tick analysis. Exported CNN/Transformer models to ONNX, achieving a 1.75x inference speedup over the Python baseline.

Mid 2025 | branch: hft-dev
XGBoost, SHAP Explainability, Neo4j / SQL

HIFUN Router Development

Hybrid Query Optimization System

Co-developed an ML routing node that automatically classifies the target backend of each DSL query. Evaluated on a benchmark workload of 10k+ queries, reaching an F1 score of 97.3%.

Early 2025 | branch: hft-dev
pgvector, Groq API, Semantic Cache

DataChat Architecture Launch

RAG NL-to-SQL Database Interface

Engineered an agentic database querying workflow built on pgvector and a semantic caching layer. Achieved a ~60% reduction in API costs and cut average response times to 1.5s.

Ongoing | branch: main
Minor GPA: 9.0, Agentic Systems, Prompt Engineering

Generative AI Minor Specialization

Academic Honors Track

Completing dedicated honors coursework focused on LLM optimization, deep representations, prompt architecture, and multi-agent loops. Minor GPA: 9.0 / 10.0.

// Currently Exploring:
AI Infrastructure
Efficient Inference Systems
Agentic Workflows
Applied ML Systems