DIVYANSHUSONI

Building products that feel right!

Interested in working together? Let's talk.

hello[at]divyanshusoni[dot]com
GitHub
LinkedIn
X
Dev.to
RSS
© 2026 Divyanshu Soni

The views expressed here are my own and do not reflect those of my employer.

Back to Blogs
February 8, 2026

How AI Coding Tools Are Rewiring Software Development

Software development is changing fast. Agents write code; engineers review and orchestrate. Here's what agentic engineering looks like from the inside: the tools, the costs, and the new skills that matter.

On this page

  • Introduction
  • Coding workflow shift among software developers
  • A Real Example: Building Equipment Tracking for BuildTrack
  • What Actually Happened
  • What Broke
  • The Time Math
  • Vibe Coding vs Agentic Engineering
  • The Tools I Actually Use
  • The Opus 4.6 Moment
  • What Actually Matters in 4.6
  • The Benchmark Reality
  • Cost Reality Check
  • What Changed in the Developer's Workflow with Claude Code
  • Agents Still Make Mistakes
  • The Expertise Gap
  • The Token Tax
  • The Security Reality
  • The Design Bottleneck
  • The De-skilling Risk
  • What I Think Is Coming
  • Wrapping Up

Introduction#

Remember the post below by Andrej Karpathy about "vibe coding" last year? That casual idea where you just let the LLM handle things and go with the flow?

I chuckled at it back then. It felt like a fun experiment for side projects - cool demos, weekend hacks, nothing serious.

A year later, I'm eating my words.

 

My workflow has completely flipped. I went from writing code with occasional AI suggestions to describing what I want and occasionally stepping in when things derail. It happened faster than I expected, and honestly, it still catches me off guard some days.

Let me walk you through what I've actually experienced.


 

Coding workflow shift among software developers#

Here's the thing nobody tells you about working with AI agents: it doesn't feel like coding anymore. It feels like managing.

 

A typical session for me or any software developer using AI tools now looks something like this:

 

  1. Open a GitHub issue
  2. Assign it to an AI agent
  3. Make a cup of coffee (or two, depending on complexity). Haha, just kidding. You sit there and wait for the magic to happen!
  4. Come back to find it either opened a perfect PR... or confidently broke three things while fixing one

On good days, it's magic. On bad days, you're debugging AI-generated spaghetti.

 

Working this way, I'm not writing software. I'm commissioning it. Read the post below from Karpathy:

Karpathy himself said he went from 80% manual coding to 80% agent coding in just a few weeks.

 

 

Boris Cherny at Anthropic mentioned that 100% of their code is now written by Claude - he shipped 22 PRs one day, 27 the next. All AI-generated.

 

These aren't press releases. These are real people sharing their actual workflows.

 

blog image

 

That diagram above? That's the developer's new daily life. The loop between "Agent Writes Code" and "Agent Runs Tests" happens without them touching anything.


A Real Example: Building Equipment Tracking for BuildTrack#

Let me tell you about something I built recently. I've been working on BuildTrack - a construction project management platform. One of the features was an enterprise-grade Equipment/Asset Tracking system. You can find this project on my GitHub.

 

Here's what that meant: database models for Equipment, EquipmentAssignment, EquipmentMaintenance, EquipmentDocument. API routes with Zod validation for CRUD operations, check-in/check-out flows, maintenance scheduling, utilization analytics. Plus Terraform infrastructure, Docker containerization, AWS Secrets Manager integration, the whole stack.

 

A year ago, this would've been a two-week sprint minimum. With AI agents? Most of it shipped in days.

What Actually Happened#

I started with the Prisma schema. Described the data model I wanted - equipment items, who they're assigned to, maintenance records, related documents. The agent generated the schema, but here's where it got interesting: it also fixed relations I hadn't specified. When I said "Equipment belongs to a Tenant and can be assigned to Users," it inferred the back-relations I'd forgotten on the User model.

 

Then came the API routes. I described the endpoints I needed. The agent wrote them with proper Zod validation, error handling, pagination - stuff I would've added but might have been lazy about in a first pass.

What Broke#

Not everything was smooth:

 

  1. The Prisma client issue. After regenerating the schema, TypeScript started throwing errors. The agent had added models but the Prisma client wasn't regenerated. It took me a few minutes of confusion before I realized the agent hadn't run npx prisma generate. Simple fix, but it shows you can't fully check out.

  2. Overcomplicated the maintenance scheduler. First version had like 400 lines of code for something that could've been 80. I had to say "simplify this, we don't need support for recurring schedules yet" and it immediately cut it down.

  3. Docker networking assumptions. The agent assumed I wanted a specific MongoDB setup. It worked, but the credentials were hardcoded. I had to explicitly ask for environment variable configuration.

The Time Math#

| Task | Estimated (Manual) | Actual (AI-Assisted) |
| --- | --- | --- |
| Database schema + migrations | 3-4 hours | 15 minutes |
| API routes (6 endpoints) | 1-2 days | 1 hour |
| Terraform infrastructure | 1 day | 1 hour |
| Docker + CI/CD setup | Half day | 1 hour |
| Debugging AI mistakes | N/A | 2 hours |

Total: ~2 days (including assisting AI + generating code) instead of ~2 weeks. Even with the debugging overhead, it's still a massive win.

 

The point isn't that AI did everything perfectly. It didn't. The point is that the bottleneck shifted from writing code to reviewing code - and reviewing is faster.


Vibe Coding vs Agentic Engineering#

Let's clear something up. What we're doing now isn't vibe coding anymore.

Vibe coding was casual - you'd ask the AI something vague and hope for the best. It worked for throwaway projects. It almost worked for real stuff.

 

What's happening now is different. People are calling it agentic engineering, and the name matters:

  • Agentic because you're orchestrating agents, not writing code directly
  • Engineering because there's still skill, expertise, and craft involved

 

It's not magic. It's a different kind of problem-solving.

blog image

The difference is oversight. I'm still responsible for what ships. I just changed how I get there.


The Tools I Actually Use#

Everyone asks about tools. Here's my honest setup:

 

| Tool | What I Use It For | Why It Works |
| --- | --- | --- |
| Claude Code | Complex refactors, terminal workflows | Deep reasoning across files |
| Cursor | When I need precise control | Great VS Code integration |
| Copilot Workspace | High-level planning & PR management | Task-to-plan |
| Antigravity IDE | Full task delegation & E2E features | Agent orchestration |
| OpenAI Codex | Heavy-duty agentic workflows | Autonomous command center |

No single tool dominates. I switch between them depending on the task. Multi-tool workflow is the reality.


The Opus 4.6 Moment#

Three days ago, Anthropic dropped Opus 4.6. I haven't had the chance to put it through its paces in full yet, but I'm seeing a lot of noise around it: its applications, the much larger context window it provides, and, of course, the higher rates that come with it.

 

Here's my take as someone who's been building with these models daily: this release changes the math on what's practical for agentic workflows.

What Actually Matters in 4.6#

The headline features aren't just marketing:

1M token context window (in beta). That's not a typo. You can now load an entire medium-sized codebase into context. Previously, I had to carefully curate which files to include. Now I can feed it the whole src/ directory and let it figure out what's relevant. The catch? Premium pricing kicks in above 200K tokens ($10/$37.50 per million input/output vs the standard $5/$25).

 

Agent teams. Claude Code now lets you spin up multiple agents that work in parallel. You can test this on a codebase review: three agents running simultaneously, each analyzing different parts of the code. They coordinate autonomously. You can jump into any subagent with Shift+Up/Down. It feels weird at first, like managing a small team instead of using a tool. See the demo below by Lydia Hallie (@AnthropicAI), in which she shows how Claude Code can spin up multiple agents in parallel.

 

Context compaction. This is subtle but huge for long-running tasks. When the conversation approaches the context limit, Claude automatically summarizes older context and keeps going. No more "let me start a new chat because we hit the limit."

 

Adaptive thinking. The model now decides when to think deeper. Previous versions were binary: extended thinking on or off. Opus 4.6 reads the room. For complex architecture decisions, it thinks longer. For simple refactors, it moves fast.

The Benchmark Reality#

I'm not usually a benchmarks person, but some numbers here caught my attention:

 

  • Terminal-Bench 2.0 (agentic coding): Opus 4.6 hits the top score
  • Humanity's Last Exam (complex reasoning): Leads all frontier models
  • GDPval-AA (knowledge work tasks): Outperforms GPT-5.2 by ~144 Elo points
  • MRCR v2 (needle in haystack, 1M context): 76% accuracy vs Sonnet 4.5's 18.5%

 

That last one is the practical differentiator. Context rot was the reason agentic workflows would break down after 30 minutes. The model would "forget" something critical you mentioned earlier. Opus 4.6 holds context better than anything I've used.

Cost Reality Check#

Let me break down what this actually costs in practice:

| Scenario | Context Size | Input Cost | Output Cost |
| --- | --- | --- | --- |
| Standard request | < 200K tokens | $5/MTok | $25/MTok |
| Long context | > 200K tokens | $10/MTok | $37.50/MTok |
| US-only inference | Any | 1.1x standard | 1.1x standard |
| Fast mode (beta) | Any | 6x standard | 6x standard |
| Batch processing | Any | 50% discount | 50% discount |

 

The batch processing discount is interesting for background agents. If you're running overnight code reviews or test generation, that 50% cut adds up.
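As a sanity check, the pricing table above can be turned into a small cost helper. This is just a sketch of the math using the figures quoted in this post, which may change:

```typescript
// Prices in USD per million tokens, as quoted above. These are the
// figures from this post, not an authoritative price list.
const STANDARD = { input: 5, output: 25 };
const LONG_CONTEXT = { input: 10, output: 37.5 };

export function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  opts: { batch?: boolean } = {},
): number {
  // Long-context rates kick in once the request exceeds 200K input tokens.
  const rate = inputTokens > 200_000 ? LONG_CONTEXT : STANDARD;
  let cost =
    (inputTokens / 1_000_000) * rate.input +
    (outputTokens / 1_000_000) * rate.output;
  if (opts.batch) cost *= 0.5; // batch processing discount
  return cost;
}
```

So a typical interactive request (100K in, 10K out) runs about $0.75, and batching the same work overnight halves it.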

 

My cost optimization strategy:

  • Use standard context for interactive sessions
  • Batch non-urgent tasks overnight
  • Reserve the 1M context for genuine large-codebase scenarios
  • Fast mode only when latency matters more than cost

What Changed in the Developer's Workflow with Claude Code#

  1. Bigger context, fewer sessions. Instead of breaking work into multiple conversations, you can load everything once and let it rip.

  2. Agent teams for reviews. You can spin up parallel agents for PR reviews: one for logic, one for security, one for style. They catch things a single pass would miss.

  3. Let it think. Stop micromanaging the /effort setting. The adaptive thinking is good enough now that I trust it to calibrate.

  4. Compaction for long sessions. You no longer need to restart conversations when context fills up. Just let compaction handle it.

The release notes mention that Anthropic builds Claude with Claude. Their engineers use Claude Code daily. Opus 4.6 is what they've been testing internally. That shows.


Agents Still Make Mistakes#

🔴 Caution

I want to be real here. AI agents are not perfect. Not even close.

 

The mistakes aren't simple syntax errors anymore. They're subtle conceptual mistakes - the kind a hasty junior dev might make:

 

  • They assume things without asking. If you don't provide enough context, the agent decides on your behalf and runs with it.
  • They don't clarify. Instead of saying "I'm not sure what you meant," they guess and proceed.
  • They overcomplicate. They'll write 1000 lines when 100 would do, and you have to tell them to simplify.
  • They leave dead code. Refactors often leave zombie functions lying around.

 

Karpathy put it perfectly:

ℹ️ Note

"They will implement an inefficient, bloated construction over 1000 lines and it's up to you to be like 'couldn't you just do this instead?' and they'll say 'of course!' and cut it down to 100 lines."

 

So yeah, oversight still matters. Drawing from the discussions above, the pie chart below reflects how developers coding with AI tools spend their time today.

blog image


The Expertise Gap#

 

Here's something that really hit me. As an OpenAI co-founder recently suggested, we are in a "step function" transition. For top-tier engineers, the tool of first resort is no longer the editor; it's the agent.

 

 

The AI isn't replacing the engineer; it is automating the "slop" (the boilerplate and the manual wiring) so the engineer can focus on the system architecture. But here’s the catch: You cannot automate what you do not understand.

 

 

To bridge this gap, the modern workflow has shifted from "writing code" to "curating trajectories":

  • Creating AGENTS.md: Documenting where the AI struggles so it learns the nuances of your specific codebase.
  • Building Skills: Writing reusable tools (MCP servers or CLIs) so your agent can interact with internal infrastructure directly.
  • Refusing Slop: Maintaining a strict bar for human accountability. If you don't understand the code the agent generated, you aren't an engineer—you're a spectator.

 

 

The Verdict: AI multiplies your existing knowledge. If your knowledge is zero, 0×100 is still zero. But if you know how to architect a system, AI turns your 2-week roadmap into a 2-day sprint.

 

blog image

The AI multiplies your existing knowledge. It doesn't replace it.


The Token Tax#

Nobody talks about the cost side enough. Running agentic workflows isn't cheap.

 

Andrew Pignanelli from General Intelligence Company shared that his company spent around $4000 per engineer per month on Opus tokens in January 2026. That's a real line item in the budget now.

 

Source: https://www.generalintelligencecompany.com/writing/agent-native-engineering

But here's the flip side: his engineers shipped an average of 20 PRs a day. Sometimes hundreds of commits daily. 20% more spend for 3-4x output? That math works out.

 

The scary part is runaway agents. One developer asked Claude to remind him to check his kid's homework. Token usage? 3.2 million.

What happened? The agent thought it needed to check the homework itself. It scanned directories, sent every image to a multimodal model, searched websites, found nothing, and finally sent the simple reminder it should've started with.
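A cheap defense against that failure mode is a hard per-task token budget that aborts the loop before it spirals. A minimal sketch; the class and numbers are hypothetical, not part of any agent SDK:

```typescript
// Hypothetical guard: every LLM call charges its tokens against a
// fixed per-task budget, and the loop dies loudly when it's exhausted
// instead of silently burning millions of tokens.
export class TokenBudget {
  private used = 0;

  constructor(private readonly limit: number) {}

  charge(tokens: number): void {
    this.used += tokens;
    if (this.used > this.limit) {
      throw new Error(`Token budget exceeded: ${this.used}/${this.limit}`);
    }
  }

  remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}
```

A "remind me about homework" task might get a 10K budget; a full-codebase refactor gets millions. The point is that the ceiling is explicit, not discovered on the invoice.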

 

 

What I do to manage costs:

  1. Route aggressively - small models for boilerplate, frontier models for hard stuff
  2. Cache context - don't re-index the codebase every request
  3. Batch related tasks together
  4. Monitor token usage by task type
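The routing idea in point 1 can be sketched as a simple dispatcher. The task categories and model names here are placeholders for illustration, not a real model lineup:

```typescript
// "Route aggressively": cheap model for mechanical work, frontier
// model only for tasks that need hard reasoning. Names are placeholders.
type TaskKind = "boilerplate" | "refactor" | "architecture" | "debugging";

export function pickModel(kind: TaskKind): string {
  switch (kind) {
    case "boilerplate":
    case "refactor":
      return "small-fast-model"; // cheap, good enough for mechanical edits
    case "architecture":
    case "debugging":
      return "frontier-model"; // expensive, reserved for hard problems
  }
}
```

In practice the classifier itself can be a tiny model or a heuristic on the prompt; the savings come from never sending boilerplate to the expensive tier.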

The Security Reality#

Here's the uncomfortable truth: we're generating code faster than we can audit it.

 

⚠️ Warning

Some predictions say 30% of new security vulnerabilities by 2027 will stem from AI-generated logic. Not because AI writes malicious code - but because we trust AI output without proper review.

 

 

 

Now look at the post below:

A post from https://x.com/simonw

Simon Willison has been warning about something else. Systems that give agents access to email, browsers, and external services create what he calls the "lethal trifecta":

 

  • AI with access to private data
  • Ability to take actions in the world
  • Vulnerability to prompt injection

He calls it his "most likely to result in a Challenger disaster" scenario.

 

What I've added to my workflow:

  • Never merge AI-generated code without automated security scanning
  • Static analysis, dependency checks, secrets detection: non-negotiable
  • PII-redacting filters before hitting the LLM
  • Adversarial testing targeting AI-specific vulnerabilities
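As an illustration of the PII-redacting step, here's a minimal filter sketch. The regexes are deliberately simplistic and the function is hypothetical; real deployments should use a proper PII detection library rather than three patterns:

```typescript
// Illustrative only: strip obvious PII from text before it is sent to
// an LLM. Each pattern maps to a placeholder token.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"], // email addresses
  [/\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, "[PHONE]"], // US-style phone numbers
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"], // US social security numbers
];

export function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text,
  );
}
```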

The Design Bottleneck#

Andrew Pignanelli, CEO of General Intelligence Company, and his team learned this the hard way. They sped up engineering significantly in December. By January, they were way behind on design and UX.

 

Makes sense when you think about it. Engineers can now spin up features at the speed of thought. But those features still need to look good and feel intuitive.

That traditional 1:20 designer-to-engineer ratio? Probably too low now.

 

blog image

 

If you're only focused on building, you might end up with functional but clunky products.


The De-skilling Risk#

I think about this one a lot. Heavy AI reliance means you code less. Code less means your manual skills atrophy.

Karpathy noticed it in himself - his ability to write code manually is already starting to fade.

 

Here's the problem: if you can't write code, how do you know when the AI output is subtly wrong?

Generation (writing code) and discrimination (reviewing code) are different cognitive skills. You can lose one while keeping the other, but it's risky.

 

💡 Tip

What we should do: rotate through "AI-off" development sprints quarterly. It feels weird, almost nostalgic. But it's risk mitigation. You need people who can debug when the AI fails.


What I Think Is Coming#

A few predictions people are throwing around for end of 2026:

  • IDEs as we know them could largely go away
  • Human code review decreases significantly
  • Humans only review product changes and major infrastructure
  • Engineers become more like product managers, and vice versa

Longer term speculation:

  • Background agents handle nearly everything by default
  • Frontend code largely automated
  • Teams get smaller but more capable
  • Designers matter more; pure coders matter less. You need to have product-level thinking.

 

I don't know which of these will prove accurate. But they're worth thinking about.


Wrapping Up#

My LinkedIn still says "Software Engineer." The job looks nothing like it did 18 months ago.

I'm not primarily a writer of code anymore. I'm a specifier, reviewer, writer, prompter, and orchestrator.

This isn't the death of engineering skill; it's a redistribution. The premium now is on architectural judgment, quality assessment, and the meta-skill of working effectively with AI.

Something shifted around December 2025. LLM agent capabilities crossed some threshold of coherence and caused a phase shift in how we build software.

Despite all the rough edges, programming feels more fun now. The drudgery is removed. What remains is the creative part. Less feeling blocked, more courage to try things.

But there's a split coming. Engineers who primarily liked coding will have a different experience than those who primarily liked building.

I liked building.

Build accordingly.


This post incorporates perspectives from Andrej Karpathy, Simon Willison, Boris Cherny, and others at the frontier. Views are my own. I'm still figuring this out like everyone else.

 

Support my work

If this post was useful, consider supporting my open source work and independent writing.

Sponsor on GitHub · Buy me a coffee
