MSc Researcher · Ontario Tech University

Dvip Patel

AI engineer & graduate researcher working on trustworthy NLP for Requirements Engineering — LLM governance, knowledge discovery, and empirical software engineering.

MSc researcher working at the intersection of NLP and Requirements Engineering (RE). My thesis builds a four-study pipeline that takes raw stakeholder prose to structured, audit-grade knowledge artifacts — spanning LLM-based parsing with tag governance, transformer classification with local-context modeling, and multi-signal implicit keyword discovery grounded in an industrial FinTech/SaaS corpus of 110 requirements and 1,997 HLJ artifacts. Across three papers (CASCON 2025 published, RE'26 Main under review, RE'26 RE@Next! in revision), I argue that governance and distributional alignment — not raw model scale — are the primary levers for trustworthy RE automation.

dvippatel.math@gmail.com · GitHub · LinkedIn · Oshawa, Ontario, Canada

Current Focus

Research

Thesis in progress

Toward Automated Requirements Engineering: Empirical and Architectural Foundations for Structured Parsing and Knowledge Discovery

MSc, Software Engineering · Ontario Tech University
Supervisor: Prof. Sanaa Alwidian

A four-study pipeline that automates requirements engineering (RE) using NLP and AI — from raw stakeholder prose to implicit domain knowledge discovery. The work spans structured JSON parsing, transformer-based classification with local context, and multi-signal keyword discovery, grounded in an industrial FinTech and SaaS corpus.

Pages: 182
Studies: 4
Papers: 3
HLJ artifacts: 1,997
Requirements: 110

What I'm building

Studies 1 & 2 · CASCON 2025 · published

Structured LLM parsing with tag governance

Multi-model pipeline (GPT-4.1, Claude Opus 4, Meta-70B) that converts raw requirements into confidence-scored HLJ artifacts. A 7-stage governance layer — harvest, filter, cluster, validate, whitelist, audit, drift — takes v2 precision to 0.95 and catches prompt-leak exploitation before it reaches downstream stages.

Study 3 · RE'26 Main · under review

Local context for requirement classification

Empirical study across MPNet and DeBERTa-v3, frozen / LoRA / full fine-tuning, and context windows k=0–3. Structured divergence features move F1 by +16 points; distributional alignment beats scale (4K domain-aligned samples match or exceed 15K mixed samples).
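
A minimal sketch of what a context window of size k means for sentence-level classification: each target sentence is paired with up to k neighbors on either side. The function name and the plain-list representation are assumptions for illustration; this is not the study's feature pipeline, which uses structured features rather than raw concatenation.

```python
from typing import List, Tuple


def context_windows(
    sentences: List[str], k: int
) -> List[Tuple[List[str], str, List[str]]]:
    """Return (left context, target, right context) for every sentence.

    Hypothetical helper: k=0 yields the target alone, k=1 adds one
    neighbor on each side, and so on. Windows are clipped at the
    document edges.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    windows = []
    for i, target in enumerate(sentences):
        left = sentences[max(0, i - k):i]
        right = sentences[i + 1:i + 1 + k]
        windows.append((left, target, right))
    return windows


# Made-up sentences, used only to show the shape of the output.
doc = [
    "The system is cloud hosted.",
    "Users shall reset passwords.",
    "Resets expire in 24 hours.",
]
for left, target, right in context_windows(doc, k=1):
    print(left, "|", target, "|", right)
```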

Study 4 · RE'26 RE@Next! · in revision

Implicit keyword discovery with audit trails

Five-phase engine: domain dictionary (13,725 entries), enriched co-occurrence graph (275K edges), three discovery signals — neighbor transfer, graph walks, UMAP+HDBSCAN gap detection — gated by a bounded LLM-as-judge that only tiebreaks the borderline band.
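
The "bounded" part of the judge can be sketched as a gate that only reaches the LLM when a candidate's score falls in a borderline band. The thresholds, the function name, and the judge callable below are illustrative assumptions, not the values or interfaces used in the engine.

```python
from typing import Callable, List


def decide(
    score: float,
    judge: Callable[[], bool],
    low: float = 0.4,
    high: float = 0.6,
) -> str:
    """Accept or reject a candidate keyword.

    The judge is called only when low <= score < high; the band edges
    here are placeholder values.
    """
    if score >= high:
        return "accept"  # clear pass: no LLM call
    if score < low:
        return "reject"  # clear fail: no LLM call
    return "accept" if judge() else "reject"  # borderline band: tiebreak


calls: List[int] = []


def fake_judge() -> bool:
    """Stand-in for an LLM call; records each invocation."""
    calls.append(1)
    return True


results = [decide(s, fake_judge) for s in (0.9, 0.5, 0.1)]
print(results, len(calls))
```

Only the middle candidate reaches the judge, so the number of LLM calls is bounded by the size of the borderline band rather than by the number of candidates.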

Cross-cutting · Thesis · ongoing

Infrastructure & reproducibility

Every tag and every implicit keyword carries full provenance: discovery signal, per-signal scores, neighbor IDs, graph paths, cluster membership, SBERT similarity, dictionary match, and LLM judgment when invoked. Drift detection runs across pipeline versions.
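
One way to picture such a provenance record is as a small typed structure with one field per item listed above. The class name, field names, and example values are hypothetical; the actual schema is not shown on this page.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Provenance:
    """Hypothetical audit record for one implicit keyword."""

    keyword: str
    signals: List[str]                  # e.g. "neighbor_transfer", "graph_walk"
    signal_scores: Dict[str, float]     # one score per contributing signal
    neighbor_ids: List[str] = field(default_factory=list)
    graph_paths: List[List[str]] = field(default_factory=list)
    cluster_id: Optional[int] = None
    sbert_similarity: Optional[float] = None
    dictionary_match: bool = False
    llm_judgment: Optional[str] = None  # stays None unless the judge was invoked


# Made-up example values, for illustration only.
record = Provenance(
    keyword="chargeback",
    signals=["graph_walk"],
    signal_scores={"graph_walk": 0.72},
    graph_paths=[["refund", "dispute", "chargeback"]],
)
print(asdict(record))
```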

The throughline across all three papers: governance and distributional alignment, not raw model scale, are the primary levers for trustworthy RE automation. LLMs are powerful but don't belong in every role — in this work, they parse and they tiebreak, but they never generate ground-truth knowledge without an audit trail.

Research Network

Citation Graph

Explore the relationships between papers, research questions, datasets, and findings. Click a node for details, drag to rearrange, scroll to zoom.

Interactive Exploration

Parameter Tweaking

Adjust pipeline parameters and see how different configurations affect model performance. Results are pre-computed from actual experiment runs.

Parameters

Adjust parameters to explore pre-computed results

Clustering: 0.80 (range 0.6–0.9)
Deduplication: 0.83 (range 0.75–0.92)
NLU Validation: 0.68 (range 0.55–0.78)
Scoring: 9 (range 3–12)
Model
Token Filter: 0.40 (range 0.3–0.6)

Results

F1: 84.7%
Precision: 94.9%
Recall: 79.9%

Confusion Matrix

72    4
16   68

Selected Work

Publications

  1. [3] Published · Open Research Object

    Improving Reliability of LLMs in RE with Structured Confidence & Tag Governance

    Dvip Patel, Sanaa Alwidian · CASCON 2025


    A modular multi-model LLM pipeline that converts raw stakeholder requirements into High-Level JSON (HLJ) artifacts with confidence-scored fields, paired with a versioned tag-governance system that catches prompt-leak exploitation, hallucinated tags, and low-agreement outputs before they reach downstream stages.

    LLM governance · Requirements Engineering · Tag validation · HLJ artifacts · Multi-model benchmarking
  2. [2] In Revision

    From Explicit to Implicit: Towards Traceable Keyword Discovery in Requirements Engineering

    Dvip Patel, Amarachi Nwosu, Sanaa Alwidian · RE'26, RE@Next!, 2026

    Explicit keyword extraction — even at audit-grade precision — hits a structural ceiling of roughly 1.5 canonical keywords per HLJ artifact, with Jaccard agreement below 0.11 across KeyBERT, RAKE, and YAKE. This paper presents a 5-phase implicit keyword discovery engine combining a 13,725-entry domain dictionary, a 275,164-edge enriched co-occurrence graph, UMAP+HDBSCAN clustering, and a bounded LLM-as-judge that tiebreaks only the borderline scoring band. Every implicit keyword traces back to its discovery signal(s) with full per-signal evidence.

    Implicit knowledge discovery · Graph-based NLP · UMAP / HDBSCAN · LLM-as-judge · Audit trails
  3. [1] Under Review

    Towards Improving Sentence-Level Requirements Identification via Explicit Local Context Modeling

    Dvip Patel, Sanaa Alwidian · RE'26, Main Track, 2026

    An empirical study of sentence-level requirement classification on 110 real-world FinTech and SaaS documents (~5,700 candidate sentences, balanced to 15K mixed / 4K domain-only). We compare all-mpnet-base-v2 (110M) and DeBERTa-v3-base (184M) across frozen / LoRA / full fine-tuning and context window sizes k=0–3, and show that structured local-context features — not raw concatenation — are the critical signal.

    Requirements classification · Local context modeling · LoRA fine-tuning · DeBERTa-v3 · Empirical software engineering
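
The Jaccard agreement quoted for the explicit-extraction baseline in [2] is the standard set-overlap measure, intersection size divided by union size. The sketch below shows the computation on two made-up keyword sets; how the paper aggregates it across extractors and artifacts is not stated here, so the pairwise form is an assumption.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up keyword sets standing in for the output of two extractors.
extractor_a = {"payment", "refund", "invoice"}
extractor_b = {"refund", "ledger", "audit", "invoice"}

# Two shared keywords out of five distinct ones.
print(jaccard(extractor_a, extractor_b))
```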

Where I've worked

Experience

  1. Graduate Researcher (MSc, Software Engineering) · Ontario Tech University

    Sep 2024 – Present

    Oshawa, ON · Supervisor: Prof. Sanaa Alwidian

    • Designing a four-study thesis pipeline that automates Requirements Engineering from raw stakeholder prose to implicit domain-knowledge discovery, grounded in an industrial FinTech/SaaS corpus of 110 requirements and 1,997 High-Level JSON (HLJ) artifacts.
    • Built and benchmarked a multi-model LLM parsing pipeline (GPT-4.1, Claude Opus 4, Meta-70B) with a versioned tag-governance stack (Harvest → Filter → Cluster → Validate → Whitelist → Audit → Drift); Opus 4 and Meta-70B reach F1=0.85 / precision=0.95 at the strictest v2 stage — published at CASCON 2025.
    • Conducted an empirical study of sentence-level requirement classification (all-mpnet-base-v2, DeBERTa-v3-base) across frozen / LoRA / full fine-tuning and context windows k=0–3, showing structured local context adds +16 F1 points and that 4K domain-aligned samples (F1=0.894) beat 15K mixed (F1=0.883) — submitted to RE'26 Main.
    • Architected a 5-phase implicit keyword discovery engine combining neighbor transfer, graph walks over a 275,164-edge enriched co-occurrence graph, and UMAP+HDBSCAN cluster-gap detection, with a bounded LLM-as-judge tiebreaker and full per-keyword provenance — in revision at RE'26 RE@Next! with Amarachi Nwosu.
    • Co-designed a domain dictionary of 13,725 entries / 35,799 lookup keys / 108 detected abbreviations used as synonym normalizer, confidence booster, stoplist, and novelty flagger across the pipeline.
    • Translated research into 3 papers across CASCON and RE'26; conducted literature reviews, experimental design, ablation planning (hop depth, signal composition, dictionary impact), and supervisory reporting.
  2. Migration Engineer · Palomino Systems

    Nov 2025 – Present

    Remote

    • Designed and shipped Laravel Upgrader, a fully autonomous CLI-driven AI migration system built on a 10-step pipeline (analysis → planning → transformation → validation → self-healing) processing 10k+ LOC/run.
    • Achieved end-to-end autonomous migration of small-to-medium Laravel applications at under $80 cost/run, with 85–90% automated issue resolution via detect → fix → retry loops.
    • Reduced migration effort by 80%+ and runtime by 40–60%; validation layers cut post-migration defects by 70%+.
    • Led client communication directly with Palomino stakeholders, presenting pipeline architecture and translating technical tradeoffs into prioritized recommendations.
    • Owned the full codebase end-to-end, driving iterative improvements under tight client feedback cycles.
  3. Software Engineer · Mediabridge

    Apr 2025 – Present

    Remote

    • Led development of a modular Ad Builder Canvas Engine and dynamic campaign system, reducing frontend effort by 60%+; served as primary technical point of contact for client-side feature discussions.
    • Engineered multi-tenant backend services with RBAC access control and event-driven notification delivery; reduced deployment time by 70% via CI/CD automation.
    • Proposed and prioritized feature roadmap improvements directly with client stakeholders, translating business needs into system decisions.
  4. Laravel Developer · Finserve Infotech

    Jan 2024 – Dec 2024

    India

    • Delivered 4 ERP/POS systems, improving workflow efficiency by 30%+.

Built & Shipped

Projects

Laravel Upgrader

Autonomous 10-step AI migration pipeline for Laravel codebases.

CLI-driven migration system that ingests legacy Laravel code and runs analysis → planning → transformation → validation → self-healing loops. Processes 10k+ LOC per run at under $80 cost, with 85–90% automated issue resolution.

Python · LLM Orchestration · Laravel · CLI
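
The detect → fix → retry loop behind the self-healing step can be sketched in a few lines. Everything below (function names, the attempt cap, and the toy detector and fixer) is a hypothetical illustration under simple assumptions, not the Laravel Upgrader implementation.

```python
from typing import Callable, List, Tuple


def self_heal(
    detect: Callable[[str], List[str]],
    fix: Callable[[str, List[str]], str],
    code: str,
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Run detect -> fix -> retry until no issues remain or attempts run out.

    Returns (final_code, remaining_issues). All names are illustrative.
    """
    issues = detect(code)
    attempts = 0
    while issues and attempts < max_attempts:
        code = fix(code, issues)
        issues = detect(code)  # re-validate after every fix
        attempts += 1
    return code, issues


def toy_detect(src: str) -> List[str]:
    """Toy detector: flag use of a legacy global helper."""
    return ["legacy helper: array_get"] if "array_get(" in src else []


def toy_fix(src: str, issues: List[str]) -> str:
    """Toy fixer: swap the legacy helper for its facade equivalent."""
    return src.replace("array_get(", "Arr::get(")


healed, remaining = self_heal(toy_detect, toy_fix, "$v = array_get($data, 'key');")
print(healed, remaining)
```

Capping the number of attempts keeps cost per run bounded when a fix does not converge; unresolved issues are returned instead of being silently dropped.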

RE NLP System — Thesis Pipeline

Research

Four-study pipeline for automated Requirements Engineering — structured parsing, context-aware classification, and implicit keyword discovery.

End-to-end research system behind my MSc thesis at Ontario Tech University. The pipeline converts raw stakeholder prose into audit-grade knowledge artifacts across four studies:

  1. Studies 1 & 2 (CASCON 2025): Multi-model LLM parsing (GPT-4.1, Claude Opus 4, Meta-70B) into High-Level JSON artifacts with versioned tag governance (v0 → v2; v2 precision 0.95 / F1 0.85).
  2. Study 3 (RE'26 Main, under review): Sentence-level requirement classification with structured local-context features (+16 F1 at k=1; best F1 0.894 on 4K domain-aligned samples).
  3. Study 4 (RE'26 RE@Next!, in revision): 5-phase implicit keyword discovery over a 275,164-edge enriched co-occurrence graph, UMAP+HDBSCAN cluster-gap detection, and a bounded LLM-as-judge tiebreaker.
  4. Cross-cutting infrastructure: 1,997 HLJ artifacts, 13,725-entry domain dictionary, FAISS index over 768-dim SBERT vectors, full per-decision audit logging, and drift monitoring across pipeline runs.

Python · PyTorch · HuggingFace · SBERT / MPNet · DeBERTa-v3 · FAISS · UMAP · HDBSCAN · LoRA

The Toolkit

Skills & Education

Technical competencies

Research Areas

Requirements Engineering, NLP for Software Engineering, LLM Governance, Knowledge Discovery, Empirical SE

Languages

Python, TypeScript / JavaScript, PHP (Laravel), SQL, Bash

AI / ML

PyTorch, HuggingFace Transformers, SBERT / MPNet, DeBERTa-v3, FAISS, LoRA Fine-tuning, LLM Applications, RAG Pipelines, UMAP + HDBSCAN

Systems

Distributed Pipelines, Pipeline Orchestration, Feedback Loops, Constraint-based Validation, Agentic Systems

Backend / Cloud

Laravel, Flask, REST APIs, Microservices, RBAC, Docker, AWS, CI/CD, GitHub Actions

Education

  • MASc, Software Engineering (Thesis)
    Ontario Tech University
    2024 – Present · Oshawa, ON, Canada

    Thesis: “Toward Automated Requirements Engineering: Empirical and Architectural Foundations for Structured Parsing and Knowledge Discovery”

Certifications

  • AWS Certified Cloud Practitioner · Amazon Web Services (2024)
  • OVIN Microcredential · Ontario Vehicle Innovation Network (2024)

Let's talk

Get in touch

I'm open to research collaboration, AI engineering work, and good conversations about NLP for software engineering. The fastest way to reach me is email.

© 2026 Dvip Patel