shivamk3r.com
Agentic engineering workflows · LLM and multimodal evaluation systems · Workflow-backed review platforms · Production full-stack AI products

AI Agent Systems Builder

Shivam Kumar

I build production-grade AI systems, agentic workflows, evaluation platforms, and scalable full-stack products for teams that need reliable execution, not demos.

Shivam Kumar smiling in a casual dark overshirt and mustard tee

Working model

Serious AI systems need more than prompts.

The strongest agentic products are built like operational systems: clear task boundaries, async execution, evaluation artifacts, human review, observability, and deployment discipline.

Agent workflows that finish work

I design systems where agents plan, execute, review, and leave artifacts that humans can trust.

Evaluation as product infrastructure

I care about scoring, review loops, retry behavior, and result provenance because they make AI systems operable.

Full-stack ownership

I move across product surfaces, APIs, workers, data models, and deployment paths without losing sight of the overall system design.

Practical ML systems

My background spans LLM platforms, multimodal benchmarks, computer vision, and production automation.

Selected projects

Systems across evaluation and applied ML.

A few representative projects from multimodal benchmarking and computer vision.

Multimodal benchmarking

VLMBench / multimodal model evaluation

An evaluation tool for comparing multimodal language models with a consistent prediction-to-ground-truth methodology.

  • Evaluated model outputs across image and text tasks using repeatable scoring flows.
  • Built around provider APIs, cloud storage, structured samples, and result analysis.
  • Focused on making model comparisons auditable and easier for teams to reason about.
Python · GCP · OpenAI API · Gemini API · Evaluation design
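
As an illustration of what a consistent prediction-to-ground-truth flow can look like, here is a minimal Python sketch. It is not VLMBench's actual code: the `Sample` schema, the `normalize`, `score_model`, and `compare_models` helpers, and the choice of exact-match scoring are all assumptions made for the example. The point it shows is that every model is scored by the same function against the same samples, and that each score is backed by a per-sample record that can be audited later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One structured evaluation sample (hypothetical schema)."""
    sample_id: str
    ground_truth: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is not scored as an error."""
    return " ".join(text.lower().split())


def score_model(samples, predictions):
    """Score one model's predictions with exact match (an assumed metric).

    `predictions` maps sample_id -> model output. A missing prediction
    is treated as an empty string and therefore counted as incorrect.
    Returns (accuracy, per-sample records).
    """
    records = []
    for sample in samples:
        predicted = predictions.get(sample.sample_id, "")
        correct = normalize(predicted) == normalize(sample.ground_truth)
        records.append(
            {
                "sample_id": sample.sample_id,
                "prediction": predicted,
                "ground_truth": sample.ground_truth,
                "correct": correct,
            }
        )
    accuracy = sum(r["correct"] for r in records) / len(records) if records else 0.0
    return accuracy, records


def compare_models(samples, predictions_by_model):
    """Apply the same scoring function to every model so results are comparable."""
    return {
        model_name: score_model(samples, predictions)[0]
        for model_name, predictions in predictions_by_model.items()
    }


if __name__ == "__main__":
    samples = [Sample("a", "Cat"), Sample("b", "two dogs")]
    predictions_by_model = {
        "model_x": {"a": " cat ", "b": "two  dogs"},
        "model_y": {"a": "dog"},  # no prediction for sample "b"
    }
    print(compare_models(samples, predictions_by_model))
```

Keeping the per-sample records alongside the aggregate accuracy is what makes a comparison auditable: a reviewer can trace any score back to the exact predictions that produced it.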

Computer vision and applied ML

DSM reconstruction ML system

A patented machine-learning system for generating Digital Surface Models from imagery, reducing reliance on external elevation sources.

  • Built data collection and preprocessing workflows for large-scale model training.
  • Worked across computer vision modeling, evaluation, and production integration.
  • Extended operational coverage for geospatial workflows where elevation data was limited.
Python · PyTorch · OpenCV · GCP · Computer vision