Vibe Coding: Great for MVP But Not Ready for Production

Vibe coding is a new term that has entered our lives with AI coding tools like Cursor. It means coding by prompting alone. We ran several benchmarks to test vibe coding tools and, based on that experience, prepared this detailed guide.

There are many different AI code editors with different features. The most preferred ones are:

These tools have similar features: they use AI models to generate code, modify existing code, and explore a codebase based on the prompt given by the user. They can even run terminal commands and fix errors by reading the resulting error messages.
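The run-and-fix behavior described above can be illustrated with a minimal sketch. It is not how any particular tool is implemented: `fix_loop`, `toy_fixer`, and the idea of returning a replacement command are all hypothetical simplifications (a real assistant would call an LLM and usually edit files rather than swap the command).

```python
import subprocess
import sys
from typing import Callable, List, Optional, Tuple

# Type of the hypothetical "model": it sees the failed command and its
# error output, and returns a new command to try (or None to give up).
Fixer = Callable[[List[str], str], Optional[List[str]]]


def fix_loop(cmd: List[str], propose_fix: Fixer,
             max_attempts: int = 3) -> Tuple[bool, List[Tuple[List[str], int]]]:
    """Run cmd; on failure, hand stderr to propose_fix and retry its suggestion."""
    history: List[Tuple[List[str], int]] = []
    for _ in range(max_attempts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        history.append((cmd, result.returncode))
        if result.returncode == 0:
            return True, history
        next_cmd = propose_fix(cmd, result.stderr)
        if next_cmd is None:
            break
        cmd = next_cmd
    return False, history


def toy_fixer(cmd: List[str], stderr: str) -> Optional[List[str]]:
    """Stand-in for an LLM call (assumption): reacts only to a NameError."""
    if "NameError" in stderr:
        return [sys.executable, "-c", "x = 1; print(x)"]
    return None


# The first command fails because x is undefined; the "fix" defines it.
ok, history = fix_loop([sys.executable, "-c", "print(x)"], toy_fixer)
print(ok, [code for _, code in history])
```

In this toy run the first attempt fails with a NameError and the second attempt succeeds. The `max_attempts` cap is a design choice worth keeping in any real loop, since a model can keep proposing fixes that do not converge.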

Some of them also support MCP (Model Context Protocol).

Figure 1: Cursor’s path from $1M to $100M MRR.1

Cursor grew from $1M to $100M ARR in 2 years, a rapid rise that shows the importance of the topic and the popularity of the tools.

How does it work?

These tools are powered by AI: they ship their own LLM, integrate with LLMs such as Claude Opus 4.5 and GPT 5.2 Codex, or connect to self-hosted LLMs.

Model performance varies. Some models are better at planning, while others are better at implementation.

Also, some users report that some models are too “self-confident” and add many unnecessary and unwanted features to the project. Clear prompting significantly increases the accuracy of the outcome.

What are the best practices of vibe coding?

Planning is key: every feature should be planned in detail.

Writing the plan in .cursorrules, or in an equivalent file if you use a tool other than Cursor, helps the AI tool stay aligned.

Users also mention that having the AI write down every applied feature in a separate file helps it follow the guidelines more strictly.

The tools tend to hallucinate in large codebases. Separating tasks and writing down every step helps the tool stay aligned with the goals.
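The planning advice above can be sketched as a small script that turns a feature plan into a plain-text file the tool can read. The layout, the `Feature` class, and `render_plan` are assumptions made for illustration; no tool is known to require this exact format, and the output could be saved as .cursorrules or as any other notes file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    """One planned feature, broken into small written-down steps."""
    name: str
    steps: List[str]
    done: bool = False


def render_plan(project_rules: List[str], features: List[Feature]) -> str:
    """Render rules and a numbered feature checklist as plain text."""
    lines = ["# Project rules"]
    lines += [f"- {rule}" for rule in project_rules]
    lines.append("")
    lines.append("# Feature plan")
    for number, feature in enumerate(features, start=1):
        mark = "x" if feature.done else " "
        lines.append(f"{number}. [{mark}] {feature.name}")
        lines += [f"   - {step}" for step in feature.steps]
    return "\n".join(lines) + "\n"


plan = render_plan(
    ["Do not add features that are not listed in the feature plan."],
    [
        Feature("Login form", ["Add email field", "Validate input"], done=True),
        Feature("Logout button", ["Clear the session"]),
    ],
)
print(plan)
```

Keeping each feature as its own numbered entry with explicit steps mirrors the idea of separating tasks, and marking finished entries gives the tool a record of what has already been applied.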

Do not forget to use a code review tool before publishing the project to ensure safety.

How will it affect the future of software engineers?

This is a controversial topic:

Optimists claim that these tools make software development faster and easier. By their account, one month’s worth of work can be done in one day. These tools also allow non-developers to build software without the coding skills it would otherwise require.

Pessimists, on the other hand, say that these tools are killing developers’ coding skills. A junior developer who relies on Cursor is not learning any new skills, and this is a problem for the future. Also, AI handling every task is a huge threat to software development as it is currently defined.

It may also lead to security issues; therefore, high-security sectors will not adopt AI-generated code for a while.

As Karpathy said, most people now just “see stuff, say stuff, run stuff, and copy paste stuff”. This will make ideas more important than coding skills in software engineering.

A realistic point of view

A software project usually needs several developers and designers. With these tools, a technical user who is not a developer can build their own project and earn money from it.

The definition of software development will likely change in the coming years. Those with strong skills and creativity will survive, and most of today’s work (especially in web and app development) will be replaced by AI.

Please note that none of the tools produced complete software in our benchmarks, but this does not mean that they are incapable of it. To keep the benchmarks as objective as possible, we did not use further prompting to fix the issues in the codebases.

You can read them in more detail by following the links:

Cursor vs. Windsurf vs. Replit:

We ran 2 tasks with Cursor, Windsurf, Replit, Claude Code, and Cline.

Screenshot-to-Code:

We tested v0, Bolt, and Lovable by giving them 5 Figma design screenshots and asking them to turn these into code. v0 and Bolt were the most successful tools, with success rates above 70%.

AI Website Creator:

We prompted v0, Bolt, Lovable, and CerebrasCoder to create a website. The leader of the benchmark is v0, with a 90% success rate.

AI Coding Benchmark:

We tested AI coding assistants across 5 different criteria. The benchmarked tools are Cursor, Amazon Q, GitLab, Replit, Cody, Gemini, Codeium, Codiumate, GitHub Copilot, and Tabnine. The overall leader of this benchmark is Cursor.

LLM Coding Benchmark – LMC Eval:

We benchmarked leading LLMs on 100 different logic/math coding questions. OpenAI’s o1 and o3-mini are the leaders of this benchmark.

RevEval – AI Code Review Eval:

We benchmarked leading AI code review tools on 309 PRs, since vibe coding has significantly increased the need for them. Among the tools tested, CodeRabbit achieved the highest average success rate (80.3%), followed by Greptile (69.5%), GitHub Copilot (69.1%), and Cursor Bugbot (62.3%).

Is AI-generated code safe to use?

AI coding assistants usually generate safe code, but users must be aware that they can hallucinate or leave backdoors in the system. Therefore, the generated code should always be checked by a human expert. Building throwaway weekend projects with AI-assisted development seems easy, but scaling such a project and making it safe for customers still requires an experienced developer. Therefore, users should not treat it as “copy-paste stuff” but should understand the workflow.

Cite this research


Cem Dilmegani and Şevval Alper (2026) - "Vibe Coding: Great for MVP But Not Ready for Production". Published online at AIMultiple.com. Retrieved January 21, 2026, from: https://aimultiple.com/vibe-coding [Online Resource]

Dilmegani, C., & Alper, Ş. (2026, January 21). Vibe Coding: Great for MVP But Not Ready for Production. AIMultiple. https://aimultiple.com/vibe-coding

@misc{dilmegani2026, author = {Dilmegani, Cem and Alper, Şevval}, title = {{Vibe Coding: Great for MVP But Not Ready for Production}}, year = {2026}, month = jan, howpublished = {\url{https://aimultiple.com/vibe-coding}}, note = {AIMultiple. Retrieved January 21, 2026} }

Cem Dilmegani

Principal Analyst

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per Similarweb), including 55% of the Fortune 500, every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led the technology strategy and procurement of a telco while reporting to the CEO. He has also led the commercial growth of the deep tech company Hypatos, which went from 0 to a 7-digit annual recurring revenue and a 9-digit valuation within 2 years. Cem’s work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.


Researched by

Şevval Alper

AI Researcher

Şevval is an AIMultiple AI researcher specializing in LLMs, AI agents and quantum technologies.
