<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://vatchechamlian.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://vatchechamlian.com/" rel="alternate" type="text/html" /><updated>2026-02-10T20:19:44+00:00</updated><id>https://vatchechamlian.com/feed.xml</id><title type="html">Vatché Chamlian</title><subtitle>Thinker, Tinker, AI Builder</subtitle><author><name>Vatché</name></author><entry><title type="html">Your RAG Pipeline is Lying to You: Understanding the RAG Triad</title><link href="https://vatchechamlian.com/rag-triad-evaluation.html" rel="alternate" type="text/html" title="Your RAG Pipeline is Lying to You: Understanding the RAG Triad" /><published>2026-02-10T00:00:00+00:00</published><updated>2026-02-10T00:00:00+00:00</updated><id>https://vatchechamlian.com/rag-triad-evaluation</id><content type="html" xml:base="https://vatchechamlian.com/rag-triad-evaluation.html"><![CDATA[<p>About two and a half years ago I wrote an article called <a href="./ai-gets-cheat-sheet-retrieval-augmented-generation-explained.html">AI Gets a Cheat Sheet: Retrieval Augmented Generation Explained</a> where I broke down what RAG is, how retrieval technologies work, and why getting your data prepped matters before you can even think about plugging this stuff in. That article was focused on the <em>what</em> and the <em>why</em> of RAG, and at the time the ecosystem was still pretty early. People were just starting to wrap their heads around vector databases, semantic search, and how all of it tied back to foundation models.</p>

<p>A lot has changed since then. RAG has gone from being a novel technique that most people hadn’t heard of to something that is now embedded (pun intended) in production systems across industries. Companies have built RAG-powered chatbots, internal knowledge assistants, customer support systems, and document retrieval tools. The problem is that many of these systems were stood up quickly, and the teams running them don’t have a reliable way to know when things are going wrong, or more importantly, <em>why</em> things are going wrong.</p>

<p>That’s where the RAG Triad comes in.</p>

<p>Before we get into the framework itself, I want to make sure the terminology is clear since a few of these terms are going to come up repeatedly throughout the article. When we talk about <strong>retrieval</strong> in the context of RAG, we’re referring to the step where the system searches through your documents, knowledge base, or other data sources to find the most relevant pieces of information based on a user’s question. Think of it as the system going to look something up before it tries to answer. <strong>Generation</strong> is the step where a large language model (often referred to as an LLM, which is the AI model doing the heavy lifting, like GPT or Claude) takes whatever was retrieved and uses it to construct a response in natural language. The retrieval step finds the information, and the generation step turns that information into an answer. Everything we’re going to talk about below evaluates how well those two steps are working together, and where things tend to fall apart.</p>

<h2 id="what-is-the-rag-triad">What is the RAG Triad?</h2>

<p>The RAG Triad is an evaluation framework that breaks down the quality of a RAG system into three distinct metrics: <strong>Context Relevance</strong>, <strong>Faithfulness</strong> (sometimes called Groundedness), and <strong>Answer Relevance</strong>. Each metric evaluates a different stage of the pipeline, and each one can fail independently of the others. This is important because when your RAG system gives a bad answer, the root cause could be in your retrieval step, your generation step, or both, and the fix is completely different depending on which one is actually broken.</p>

<p>I’ve said this before in the context of security and I’ll say it again here: you can’t fix what you can’t measure. If you’re tweaking prompts and hoping the output gets better without actually measuring which part of the system is failing, you’re essentially guessing. And guessing at scale, especially when these systems are customer-facing, is a great way to erode trust fast.</p>

<h2 id="the-three-legs-and-how-each-one-breaks">The Three Legs (and How Each One Breaks)</h2>

<h3 id="context-relevance">Context Relevance</h3>

<p>Context relevance asks a simple question: did your retrieval step actually pull back the right documents for the user’s query? This is the foundation of everything, because if the wrong context gets fed into the LLM, it doesn’t matter how good your model is or how tight your prompt engineering is. The answer will be wrong from the start.</p>

<p>Here’s an example that illustrates the problem well. A user asks “What’s our refund policy for enterprise customers?” and the system retrieves your general FAQ page instead of the enterprise contract terms. The model generates a response that sounds confident and well-structured, but it’s based on the wrong source material. The user might not even realize the answer is incorrect, and that’s the dangerous part. The system didn’t hallucinate in the traditional sense; it faithfully summarized the wrong documents.</p>

<p>When context relevance scores are low, the problem lives in your retrieval pipeline, and the people who need to be looking at this are typically your data engineers or ML engineers, whoever built and maintains the system that decides which documents get pulled for a given query.</p>

<p>The things they’d look at start with the <strong>chunking strategy</strong>. Chunking is the process of breaking your documents into smaller pieces before they get stored in the system. When a user asks a question, the system doesn’t search through entire documents; it searches through these smaller chunks and returns the ones that seem most relevant. If your chunks are too large, the system might return a big block of text where only one sentence is actually relevant, which dilutes the quality of what the LLM has to work with. If your chunks are too small, you might lose important context because a key piece of information got split across two separate chunks that the system doesn’t know belong together.</p>

<p>They’d also look at the <strong>embedding model</strong>, which is the component that converts both your document chunks and the user’s query into numerical representations (called vectors) so the system can measure how similar they are. If your embedding model wasn’t trained on content that resembles your domain-specific data, it might not understand the relationships between terms that matter in your industry, and the similarity scores it produces won’t reflect actual relevance.</p>

<p>Another common fix is adding a <strong>re-ranker</strong> between your initial search results and the LLM. The way this works is that your vector search might return, say, the top 20 chunks that seem relevant based on similarity scores, but similarity doesn’t always equal relevance. A re-ranker (tools like Cohere Rerank are popular for this) takes those initial results and rescores them using a more sophisticated model that’s specifically designed to evaluate whether a piece of text actually answers a given question. The result is that the chunks that make it to the LLM are much more likely to be the right ones.</p>

<p>You might also look at <strong>query rewriting</strong>, which is where you transform the user’s natural language question into something that maps better to how your documents are structured. A user might ask a casual question in a way that doesn’t match the terminology used in your knowledge base, and rewriting the query before it hits the search step can dramatically improve what comes back.</p>

<h3 id="faithfulness-groundedness">Faithfulness (Groundedness)</h3>

<p>Faithfulness measures whether the LLM stuck to what the retrieved context actually says, or whether it started making things up. This is the hallucination metric, and it’s probably the one that keeps most teams up at night because the outputs can look incredibly convincing even when they’re fabricated.</p>

<p>The retrieved documents might be exactly right, but the model adds details, invents statistics, or draws conclusions that aren’t supported by anything in the context. For instance, the context might say “revenue grew in Q3” and the model outputs “revenue grew 15% in Q3” when no percentage was mentioned anywhere in the source material. That 15% sounds specific and credible, which makes it more dangerous than an obviously wrong answer would be.</p>

<p>When faithfulness scores are low, the problem is in your generation step, not your retrieval. This is where the person managing the LLM configuration and the <strong>system prompt</strong> (the set of instructions that tell the model how to behave, what to prioritize, and what constraints to follow) needs to step in. In many teams this is a developer or an AI/ML engineer, but increasingly it’s also someone in a prompt engineering or AI product role.</p>

<p>The fixes here look different from the retrieval side. You’d start by tightening the system prompt to explicitly instruct the model to only use the provided context when generating its response, and to say “I don’t know” or “I don’t have enough information to answer that” when the context doesn’t contain what’s needed. You’d also want to actually test whether the model follows through on that instruction, because telling it to do something and having it consistently do it are two different things.</p>

<p>Another lever is <strong>temperature</strong>, which is a setting that controls how much creative freedom the LLM has when generating text. A higher temperature means the model is more likely to introduce variety and take liberties with its phrasing, which is great for creative writing but not what you want when the goal is to faithfully represent source material. Lowering the temperature makes the model more conservative and more likely to stick closely to what it was given.</p>

<p>You can also improve faithfulness by filtering out low-relevance chunks before they even reach the LLM. If the model receives five chunks but only two of them are actually relevant, the other three become noise that the model might draw from when constructing its answer, increasing the chance it says something that isn’t grounded in the right information.</p>

<h3 id="answer-relevance">Answer Relevance</h3>

<p>Answer relevance is the one that tends to sneak past teams because the system can retrieve decent context, not hallucinate at all, and still completely miss the point of what the user was asking. Someone asks “How do I cancel my subscription?” and gets a technically accurate paragraph about subscription pricing tiers. The information is correct, the model didn’t make anything up, but it didn’t answer the actual question.</p>

<p>This is often a query understanding problem, and figuring out who owns the fix can be tricky because it could be a retrieval issue, a generation issue, or both. If the retrieval step returned documents about pricing instead of cancellation, that’s a retrieval problem and falls back to the data/ML engineering team. But if the retrieval step returned the right documents and the model just chose to summarize the wrong part, that’s a generation and prompt engineering problem.</p>

<p>The system misinterprets what the user is actually asking for, or the prompt doesn’t sufficiently guide the model to focus on answering the specific question rather than just summarizing whatever relevant-looking context it received. <strong>Query classification</strong> and <strong>intent detection</strong> (techniques that try to understand what the user is really asking before the search even happens) can help on the retrieval side, while better prompt engineering can address the generation side. Sometimes the fix is as straightforward as restructuring your system prompt to tell the model “answer the user’s specific question, don’t just summarize the context.”</p>

<h2 id="how-you-actually-evaluate-this">How You Actually Evaluate This</h2>

<p>The most common approach today is using an LLM as a judge. You take a separate LLM call (not the same one generating your answers) and use it to evaluate the output of your RAG pipeline. For each query, you capture three things: the original question, the retrieved context chunks, and the generated answer. Then you send those to the evaluator model with specific prompts designed for each metric.</p>

<p>For context relevance, you’d ask something like “Given this user question, rate how relevant each retrieved chunk is on a scale of 0 to 1.” For faithfulness, you’d ask “Can every claim in this answer be directly traced back to the retrieved context?” For answer relevance, you’d ask “Does this answer actually address what the user was asking?”</p>

<p>Tools like <a href="https://docs.ragas.io/">RAGAS</a>, <a href="https://docs.confident-ai.com/">DeepEval</a>, and <a href="https://www.trulens.org/">TruLens</a> have these evaluator prompts built in so you don’t have to write them from scratch. You run them against a dataset of real or synthetic queries and get scores for each dimension. This is not a one-time exercise either; you want to be running these evaluations regularly, especially as your document corpus changes, as your chunking strategy evolves, or as you swap out models.</p>

<p>You can also layer in human evaluation where actual people review a sample of outputs. This is more expensive and slower, but it catches things that LLM judges miss. Most mature teams that I’ve talked to do both: automated eval for coverage and speed, human eval for calibration and edge cases.</p>

<h2 id="the-key-insight">The Key Insight</h2>

<p>The thing that I keep coming back to is that the RAG Triad gives you a diagnostic framework, not just a quality score. When something goes wrong (and it will), you can pinpoint whether the problem is in retrieval, generation, or query understanding, and then target your fix accordingly. This is fundamentally different from the approach I see a lot of teams taking, which is essentially just tweaking prompts randomly and hoping the output looks better on the next batch of test queries.</p>

<p>It reminds me a lot of what I’ve seen in security over the years. You can’t just throw tools at a problem and hope it goes away. You need to understand the system as a whole, identify where the actual vulnerability is, and then apply the right remediation. The RAG Triad is that diagnostic layer for your AI pipeline.</p>

<p>If you’re building or maintaining RAG systems and haven’t started thinking about structured evaluation yet, I’d strongly recommend looking into RAGAS or DeepEval as a starting point. The setup time is minimal compared to the visibility you get, and once you can see which metric is failing, the path forward becomes a lot clearer.</p>

<p>As always, let me know what you think. If you’re evaluating RAG systems in production, I’d love to hear what’s working and what’s not. You can find me on <a href="https://www.linkedin.com/in/chamlian/">LinkedIn</a>.</p>]]></content><author><name>Vatché</name></author><category term="AI" /><category term="Data Science &amp; Analytics" /><category term="Artificial Intelligence" /><category term="Data Science &amp; Analytics" /><category term="RAG" /><category term="Technical Tutorials" /><summary type="html"><![CDATA[A practical guide to the RAG Triad evaluation framework, breaking down Context Relevance, Faithfulness, and Answer Relevance as diagnostic metrics for identifying and fixing failures in retrieval-augmented generation pipelines.]]></summary></entry><entry><title type="html">Building a Cloud-Connected Pwnagotchi: From Raspberry Pi to AWS Lambda</title><link href="https://vatchechamlian.com/pwnagotchi-build.html" rel="alternate" type="text/html" title="Building a Cloud-Connected Pwnagotchi: From Raspberry Pi to AWS Lambda" /><published>2026-01-28T00:00:00+00:00</published><updated>2026-01-28T00:00:00+00:00</updated><id>https://vatchechamlian.com/pwnagotchi-build</id><content type="html" xml:base="https://vatchechamlian.com/pwnagotchi-build.html"><![CDATA[<p>When you watch movies about hacking, the hack is always seamless, things just work,those of us that have worked in this space know that is not the case. There are endless variables that can go wrong and cause the most simple of tasks to become a frustrating ordeal and it is not until you have done something a bunch of times that you understand the nuances of how to proceed. When I discovered the pwnagotchi—an AI-powered WiFi security tool that runs on a Raspberry Pi—I knew I had to build one. After using it for a few months I was frustrated with the process of having to manually download the pcap files and run hashcat myself. For a while I thought about how I wanted to take it further, I wanted it to communicate with me via my phone (like the Flipper Zero). I also wanted to be able to crack the handshakes automatically, but I did not want to have to deal with the effort of setting up and managing a cracking rig.</p>

<p>What if every WiFi handshake my pwnagotchi captured could automatically upload to the cloud? What if I could get real-time notifications on my phone and decide which networks to crack with a simple Telegram command? What if the cracking happened on AWS spot instances instead of my subpar laptop? This would save me time and effort of going back and forth to the device, copying files, and running hashcat on a device that is not nearly as powerful as some cloud instances.</p>

<h2 id="what-is-a-pwnagotchi">What is a Pwnagotchi?</h2>

<p>If you’re not familiar, a <a href="https://pwnagotchi.ai/">pwnagotchi</a> is a “pet” that learns from WiFi networks around it. It’s built on a Raspberry Pi Zero and uses AI to optimize its handshake capture strategies. Think of it as a Tamagotchi for hackers—except instead of feeding it virtual food, you’re feeding it WiFi handshakes.</p>

<p>There are already great tutorials out there for building a basic pwnagotchi, so I won’t rehash the standard setup. I used <a href="https://github.com/jayofelony/pwnagotchi">jayofelony’s fork</a> which includes updated plugins, better hardware compatibility, and active maintenance, this made the initial setup much smoother than the original.</p>

<p>Instead, this post focuses on what I added on top: a complete cloud pipeline for automated cracking.</p>

<h2 id="the-hardware-stack">The Hardware Stack</h2>

<p>Let’s start with what I’m running:</p>

<ul>
  <li>
    <p><strong>Raspberry Pi Zero WH</strong> – The brains of the operation
<img src="/assets/img/posts/20260128/pizero-wh.jpg" alt="Raspberry Pi Zero WH" /></p>
  </li>
  <li>
    <p><strong>Waveshare 2.13” e-Paper Display (V4)</strong> – For that classic pwnagotchi face
<img src="/assets/img/posts/20260128/waveshare-hat.jpg" alt="Waveshare 2.13&quot; e-Paper Display (V4)" />
<img src="/assets/img/posts/20260128/waveshare-screen.jpg" alt="Waveshare 2.13&quot; e-Paper Display (V4)" /></p>
  </li>
  <li>
    <p><strong>PiSugar Battery Module</strong> – Older V2 model, with 5V output.
<img src="/assets/img/posts/20260128/pisugar.jpg" alt="piSugar Battery Module" /></p>
  </li>
  <li>
    <p><strong>3D Printed Case with Button</strong> – Protection and style
<img src="/assets/img/posts/20260128/PLA-printed-case.jpg" alt="PLA-printed-case" /></p>
  </li>
  <li>
    <p><strong>Bluetooth Tethering</strong> – Connected to my phone for internet access</p>
  </li>
</ul>

<p>The Waveshare display is perfect for this project. It’s low power, highly visible even in sunlight, and gives the pwnagotchi that retro digital pet aesthetic, I love the faces this thing makes.</p>

<h2 id="assembly-tips">Assembly Tips</h2>

<p>When it comes to the assembly, there are a few things that I think help. First, knowing when the pwnagotchi is booting is difficult with a case, but if you use the transparent nut/bolt that comes with the piSugar, it will magnify the green LED, which makes it much easier to see what is going on.</p>

<p><img src="/assets/img/posts/20260128/example-of-led-being-brighter.jpg" alt="piSugar LED" /></p>

<p>Second, I recommend using a 3D printed case. There are so many variations on the build that it’s best to find one that you like and print it out, I ended up going with PETG for the sake of durability, but PLA is fine for prototyping, in the images below you will see the PLA case I poorly printed during prototyping.</p>

<p>Third, ensure that you have a USB A to Micro USB with data transfer. Setting this up is infinitely easier when you can plug the Pi directly into your computer and see the console output, ssh into it, and SCP files back and forth.</p>

<h2 id="the-architecture">The Architecture</h2>

<p>Most pwnagotchi setups require you to manually SSH in, grab the PCAP files, and run hashcat yourself. There is a service that can be used to crack the pcap files, <a href="https://wpa-sec.stanev.org/">Distributed WPA PSK auditor</a>, but I wanted a fully automated pipeline and a quicker turnaround.</p>

<p>Here’s how it works:</p>

<ol>
  <li><strong>Pwnagotchi captures a WiFi handshake</strong> and saves the PCAP file</li>
  <li><strong>Custom S3 upload plugin</strong> automatically uploads the file to an S3 bucket’s <code class="language-plaintext highlighter-rouge">staging/</code> folder</li>
  <li><strong>Lambda function triggers</strong> when a new file arrives it then sends a Telegram notification</li>
  <li><strong>I receive a message</strong> on my phone with the network SSID, BSSID, and job ID</li>
  <li><strong>I reply via Telegram</strong> with <code class="language-plaintext highlighter-rouge">/approve [job-id]</code> or <code class="language-plaintext highlighter-rouge">/reject [job-id]</code> (clicking on the approve or rejected link copies it to your clipboard)</li>
  <li><strong>Another Lambda function</strong> processes my command and moves the file to <code class="language-plaintext highlighter-rouge">approved/</code> or <code class="language-plaintext highlighter-rouge">rejected/</code></li>
  <li><strong>A job launcher Lambda</strong> detects approved files and spins up an EC2 spot instance</li>
  <li><strong>The EC2 instance</strong> runs GPU-accelerated hashcat with the rockyou wordlist</li>
  <li><strong>Results are uploaded</strong> back to S3 and I get a Telegram notification with the cracked password</li>
</ol>

<p>The entire system is serverless except for the short-lived EC2 instances that actually do the cracking. This keeps costs low, I only pay for compute when I’m actively cracking a handshake.</p>

<h2 id="code-deep-dive">Code Deep Dive</h2>

<p>Let me show you some of the key pieces. All the sensitive info has been replaced with placeholders, but you’ll get the idea.</p>

<h3 id="terraform-s3-bucket-structure">Terraform: S3 Bucket Structure</h3>

<p>The foundation of this system is a well-organized S3 bucket. I used Terraform to define the entire infrastructure as code:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Main S3 bucket for storing pcap files</span>
<span class="nx">resource</span> <span class="s2">"aws_s3_bucket"</span> <span class="s2">"pcap_bucket"</span> <span class="p">{</span>
  <span class="nx">bucket</span> <span class="p">=</span> <span class="nx">local</span><span class="err">.</span><span class="nx">bucket_name</span>

  <span class="nx">tags</span> <span class="p">=</span> <span class="nx">merge</span><span class="err">(</span><span class="nx">var</span><span class="err">.</span><span class="nx">tags</span><span class="err">,</span> <span class="p">{</span>
    <span class="nx">Name</span> <span class="p">=</span> <span class="s2">"${local.name_prefix}-bucket"</span>
  <span class="p">}</span><span class="err">)</span>
<span class="p">}</span>

<span class="c1"># Enable server-side encryption</span>
<span class="nx">resource</span> <span class="s2">"aws_s3_bucket_server_side_encryption_configuration"</span> <span class="s2">"pcap_bucket"</span> <span class="p">{</span>
  <span class="nx">bucket</span> <span class="p">=</span> <span class="nx">aws_s3_bucket</span><span class="err">.</span><span class="nx">pcap_bucket</span><span class="err">.</span><span class="nx">id</span>

  <span class="nx">rule</span> <span class="p">{</span>
    <span class="nx">apply_server_side_encryption_by_default</span> <span class="p">{</span>
      <span class="nx">sse_algorithm</span> <span class="p">=</span> <span class="s2">"AES256"</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="c1"># Block public access (security best practice)</span>
<span class="nx">resource</span> <span class="s2">"aws_s3_bucket_public_access_block"</span> <span class="s2">"pcap_bucket"</span> <span class="p">{</span>
  <span class="nx">bucket</span> <span class="p">=</span> <span class="nx">aws_s3_bucket</span><span class="err">.</span><span class="nx">pcap_bucket</span><span class="err">.</span><span class="nx">id</span>

  <span class="nx">block_public_acls</span>       <span class="p">=</span> <span class="kc">true</span>
  <span class="nx">block_public_policy</span>     <span class="p">=</span> <span class="kc">true</span>
  <span class="nx">ignore_public_acls</span>      <span class="p">=</span> <span class="kc">true</span>
  <span class="nx">restrict_public_buckets</span> <span class="p">=</span> <span class="kc">true</span>
<span class="p">}</span>

<span class="c1"># Lifecycle policy to auto-delete old files</span>
<span class="nx">resource</span> <span class="s2">"aws_s3_bucket_lifecycle_configuration"</span> <span class="s2">"pcap_bucket"</span> <span class="p">{</span>
  <span class="nx">bucket</span> <span class="p">=</span> <span class="nx">aws_s3_bucket</span><span class="err">.</span><span class="nx">pcap_bucket</span><span class="err">.</span><span class="nx">id</span>

  <span class="c1"># Auto-delete completed pcaps after 30 days</span>
  <span class="nx">rule</span> <span class="p">{</span>
    <span class="nx">id</span>     <span class="p">=</span> <span class="s2">"delete-completed"</span>
    <span class="nx">status</span> <span class="p">=</span> <span class="s2">"Enabled"</span>

    <span class="nx">filter</span> <span class="p">{</span>
      <span class="nx">prefix</span> <span class="p">=</span> <span class="s2">"completed/"</span>
    <span class="p">}</span>

    <span class="nx">expiration</span> <span class="p">{</span>
      <span class="nx">days</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">s3_lifecycle_expiration_days</span>
    <span class="p">}</span>
  <span class="p">}</span>

  <span class="c1"># Auto-delete rejected pcaps after 30 days</span>
  <span class="nx">rule</span> <span class="p">{</span>
    <span class="nx">id</span>     <span class="p">=</span> <span class="s2">"delete-rejected"</span>
    <span class="nx">status</span> <span class="p">=</span> <span class="s2">"Enabled"</span>

    <span class="nx">filter</span> <span class="p">{</span>
      <span class="nx">prefix</span> <span class="p">=</span> <span class="s2">"rejected/"</span>
    <span class="p">}</span>

    <span class="nx">expiration</span> <span class="p">{</span>
      <span class="nx">days</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">s3_lifecycle_expiration_days</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The folder structure is simple but effective:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">staging/</code> – New uploads go here</li>
  <li><code class="language-plaintext highlighter-rouge">approved/</code> – Files I’ve approved for cracking</li>
  <li><code class="language-plaintext highlighter-rouge">processing/</code> – Currently being cracked</li>
  <li><code class="language-plaintext highlighter-rouge">completed/</code> – Finished jobs</li>
  <li><code class="language-plaintext highlighter-rouge">rejected/</code> – Networks I chose not to crack</li>
  <li><code class="language-plaintext highlighter-rouge">results/</code> – Cracked passwords and job metadata</li>
</ul>

<p>The lifecycle policies ensure I don’t pay for storage forever. Old files automatically delete after 30 days.</p>

<h3 id="lambda-upload-handler">Lambda: Upload Handler</h3>

<p>When a new PCAP file hits the <code class="language-plaintext highlighter-rouge">staging/</code> folder, this Lambda function begins to work:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">"""
Pwnagotchi PCAP Cracker - Upload Handler Lambda
Triggered when a new pcap is uploaded to S3 staging/ folder
Sends Telegram notification with approval/reject options
"""</span>

<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">boto3</span>
<span class="kn">import</span> <span class="nn">urllib3</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
<span class="kn">import</span> <span class="nn">uuid</span>

<span class="c1"># Initialize AWS clients
</span><span class="n">s3</span> <span class="o">=</span> <span class="n">boto3</span><span class="p">.</span><span class="n">client</span><span class="p">(</span><span class="s">'s3'</span><span class="p">)</span>
<span class="n">http</span> <span class="o">=</span> <span class="n">urllib3</span><span class="p">.</span><span class="n">PoolManager</span><span class="p">()</span>

<span class="c1"># Environment variables (set in Lambda configuration)
</span><span class="n">TELEGRAM_BOT_TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">'TELEGRAM_BOT_TOKEN'</span><span class="p">]</span>  <span class="c1"># &lt;YOUR_BOT_TOKEN&gt;
</span><span class="n">TELEGRAM_CHAT_ID</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">'TELEGRAM_CHAT_ID'</span><span class="p">]</span>      <span class="c1"># &lt;YOUR_CHAT_ID&gt;
</span><span class="n">S3_BUCKET</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">'S3_BUCKET'</span><span class="p">]</span>                    <span class="c1"># &lt;YOUR_BUCKET_NAME&gt;
</span>

<span class="k">def</span> <span class="nf">extract_pcap_metadata</span><span class="p">(</span><span class="n">bucket</span><span class="p">,</span> <span class="n">key</span><span class="p">):</span>
    <span class="s">"""
    Extract SSID and BSSID from pcap file
    For MVP: parse filename (format: SSID_BSSID.pcap)
    """</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="n">key</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">'/'</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
    
    <span class="k">if</span> <span class="s">'_'</span> <span class="ow">in</span> <span class="n">filename</span> <span class="ow">and</span> <span class="n">filename</span><span class="p">.</span><span class="n">endswith</span><span class="p">(</span><span class="s">'.pcap'</span><span class="p">):</span>
        <span class="n">parts</span> <span class="o">=</span> <span class="n">filename</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">'.pcap'</span><span class="p">,</span> <span class="s">''</span><span class="p">).</span><span class="n">split</span><span class="p">(</span><span class="s">'_'</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">parts</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">2</span><span class="p">:</span>
            <span class="n">ssid</span> <span class="o">=</span> <span class="n">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
            <span class="n">bssid</span> <span class="o">=</span> <span class="s">'_'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
            <span class="k">return</span> <span class="n">ssid</span><span class="p">,</span> <span class="n">bssid</span>
    
    <span class="c1"># Fallback: use filename as SSID
</span>    <span class="n">ssid</span> <span class="o">=</span> <span class="n">filename</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">'.pcap'</span><span class="p">,</span> <span class="s">''</span><span class="p">)</span>
    <span class="n">bssid</span> <span class="o">=</span> <span class="s">'unknown'</span>
    <span class="k">return</span> <span class="n">ssid</span><span class="p">,</span> <span class="n">bssid</span>


<span class="k">def</span> <span class="nf">send_telegram_message</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
    <span class="s">"""Send message to Telegram using Bot API"""</span>
    <span class="n">url</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"https://api.telegram.org/bot</span><span class="si">{</span><span class="n">TELEGRAM_BOT_TOKEN</span><span class="si">}</span><span class="s">/sendMessage"</span>
    
    <span class="n">payload</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">'chat_id'</span><span class="p">:</span> <span class="n">TELEGRAM_CHAT_ID</span><span class="p">,</span>
        <span class="s">'text'</span><span class="p">:</span> <span class="n">message</span><span class="p">,</span>
        <span class="s">'parse_mode'</span><span class="p">:</span> <span class="s">'Markdown'</span>
    <span class="p">}</span>
    
    <span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="n">request</span><span class="p">(</span>
        <span class="s">'POST'</span><span class="p">,</span>
        <span class="n">url</span><span class="p">,</span>
        <span class="n">body</span><span class="o">=</span><span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">payload</span><span class="p">).</span><span class="n">encode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">),</span>
        <span class="n">headers</span><span class="o">=</span><span class="p">{</span><span class="s">'Content-Type'</span><span class="p">:</span> <span class="s">'application/json'</span><span class="p">}</span>
    <span class="p">)</span>
    
    <span class="k">if</span> <span class="n">response</span><span class="p">.</span><span class="n">status</span> <span class="o">!=</span> <span class="mi">200</span><span class="p">:</span>
        <span class="k">raise</span> <span class="nb">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s">"Failed to send Telegram message: </span><span class="si">{</span><span class="n">response</span><span class="p">.</span><span class="n">status</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>


<span class="k">def</span> <span class="nf">lambda_handler</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
    <span class="s">"""Main Lambda handler"""</span>
    <span class="k">for</span> <span class="n">record</span> <span class="ow">in</span> <span class="n">event</span><span class="p">[</span><span class="s">'Records'</span><span class="p">]:</span>
        <span class="n">bucket</span> <span class="o">=</span> <span class="n">record</span><span class="p">[</span><span class="s">'s3'</span><span class="p">][</span><span class="s">'bucket'</span><span class="p">][</span><span class="s">'name'</span><span class="p">]</span>
        <span class="n">key</span> <span class="o">=</span> <span class="n">record</span><span class="p">[</span><span class="s">'s3'</span><span class="p">][</span><span class="s">'object'</span><span class="p">][</span><span class="s">'key'</span><span class="p">]</span>
        
        <span class="c1"># Only process .pcap files in staging/
</span>        <span class="k">if</span> <span class="ow">not</span> <span class="n">key</span><span class="p">.</span><span class="n">startswith</span><span class="p">(</span><span class="s">'staging/'</span><span class="p">)</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">key</span><span class="p">.</span><span class="n">endswith</span><span class="p">(</span><span class="s">'.pcap'</span><span class="p">):</span>
            <span class="k">continue</span>
        
        <span class="c1"># Generate unique job ID
</span>        <span class="n">job_id</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="p">.</span><span class="n">uuid4</span><span class="p">())[:</span><span class="mi">8</span><span class="p">]</span>
        
        <span class="c1"># Extract metadata
</span>        <span class="n">ssid</span><span class="p">,</span> <span class="n">bssid</span> <span class="o">=</span> <span class="n">extract_pcap_metadata</span><span class="p">(</span><span class="n">bucket</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span>
        
        <span class="c1"># Format Telegram message
</span>        <span class="n">message</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"""
🆕 *New Network Captured*

📡 *SSID:* `</span><span class="si">{</span><span class="n">ssid</span><span class="si">}</span><span class="s">`
🔗 *BSSID:* `</span><span class="si">{</span><span class="n">bssid</span><span class="si">}</span><span class="s">`
📦 *Job ID:* `</span><span class="si">{</span><span class="n">job_id</span><span class="si">}</span><span class="s">`
🕐 *Time:* </span><span class="si">{</span><span class="n">datetime</span><span class="p">.</span><span class="n">utcnow</span><span class="p">().</span><span class="n">strftime</span><span class="p">(</span><span class="s">'%Y-%m-%d %H</span><span class="si">:</span><span class="o">%</span><span class="n">M</span><span class="si">:</span><span class="o">%</span><span class="n">S</span><span class="s">')</span><span class="si">}</span><span class="s"> UTC

*Reply with:*
✅ `/approve </span><span class="si">{</span><span class="n">job_id</span><span class="si">}</span><span class="s">` - Start cracking
❌ `/reject </span><span class="si">{</span><span class="n">job_id</span><span class="si">}</span><span class="s">` - Ignore this network
"""</span>
        
        <span class="c1"># Send notification
</span>        <span class="n">send_telegram_message</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Telegram notification sent for job </span><span class="si">{</span><span class="n">job_id</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="p">{</span><span class="s">'statusCode'</span><span class="p">:</span> <span class="mi">200</span><span class="p">,</span> <span class="s">'body'</span><span class="p">:</span> <span class="s">'Notifications sent'</span><span class="p">}</span>
</code></pre></div></div>

<p>The beauty of this setup is that I get an instant notification every time my pwnagotchi catches a new network and updates while it goes through the steps. With a lot of companies sharing spaces or having a few floors of a building there is a huge chance that you will get multiple handshakes from neighboring networks, which you are not authorized to attack. By having the ability to approve or reject a network I can ensure that I am only cracking handshakes that I am authorized to crack.</p>

<p><img src="/assets/img/posts/20260128/telegram-screenshot.png" alt="Screenshot of Telegram notification" /></p>

<p>Here’s what the notification looks like when a new network is captured:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>🆕 New Network Captured

📡 SSID: CoffeeShop_Guest
🔗 BSSID: a4:12:34:56:78:9a
📦 Job ID: f3a8b2c1
🕐 Time: 2026-01-28 14:32:15 UTC
📁 File: CoffeeShop_Guest_a4_12_34_56_78_9a.pcap

Reply with:
✅ /approve f3a8b2c1 - Start cracking
❌ /reject f3a8b2c1 - Ignore this network
</code></pre></div></div>

<p>I tap <code class="language-plaintext highlighter-rouge">/approve f3a8b2c1</code> and get an immediate confirmation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>✅ Approved!

📦 Job ID: f3a8b2c1
📡 SSID: CoffeeShop_Guest
📁 Status: Moved to approved/

⚡ The cracking job will start automatically.
You'll receive a notification when complete.
</code></pre></div></div>

<p>A minute later, the EC2 instance spins up and I get a status update:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>⚙️ Cracking Started

📦 Job ID: f3a8b2c1
📡 SSID: CoffeeShop_Guest
🖥️ Instance: i-0a1b2c3d4e5f6g7h8
⏱️ Status: Processing...

Running hashcat on g4dn.xlarge
Estimated time: 5-15 minutes
</code></pre></div></div>

<p>And finally, when the password is cracked:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>🎉 Password Cracked!

📦 Job ID: f3a8b2c1
📡 SSID: CoffeeShop_Guest
🔓 Password: coffee2026

⏱️ Time taken: 8 minutes 32 seconds
💰 Cost: $0.42
📁 Full results saved to S3
</code></pre></div></div>

<p>Or, if it fails:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❌ Cracking Failed

📦 Job ID: f3a8b2c1
📡 SSID: CoffeeShop_Guest

Password not found in rockyou.txt wordlist.
⏱️ Time taken: 12 minutes
💰 Cost: $0.62

Try a different wordlist or move on to the next target.
</code></pre></div></div>

<p>This real-time feedback is very helpful because it lets me know exactly what is going on with the pipeline, and if there are any issues I can issue a <code class="language-plaintext highlighter-rouge">/kill [job-id]</code> command to terminate the EC2 instance and investigate what went wrong, and the pcap file is still in S3 so I can try again later if I want to.</p>

<h3 id="lambda-telegram-webhook">Lambda: Telegram Webhook</h3>

<p>This Lambda function acts as a webhook endpoint for the Telegram bot. When I send a command like <code class="language-plaintext highlighter-rouge">/approve abc123</code>, it processes the request and moves files around in S3:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">handle_approve</span><span class="p">(</span><span class="n">job_id</span><span class="p">,</span> <span class="n">chat_id</span><span class="p">):</span>
    <span class="s">"""Handle /approve command"""</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Processing /approve for job </span><span class="si">{</span><span class="n">job_id</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    
    <span class="c1"># Find the job files
</span>    <span class="n">pcap_key</span><span class="p">,</span> <span class="n">json_key</span> <span class="o">=</span> <span class="n">find_job_by_id</span><span class="p">(</span><span class="n">job_id</span><span class="p">)</span>
    
    <span class="k">if</span> <span class="ow">not</span> <span class="n">pcap_key</span><span class="p">:</span>
        <span class="n">send_telegram_message</span><span class="p">(</span>
            <span class="sa">f</span><span class="s">"❌ Job `</span><span class="si">{</span><span class="n">job_id</span><span class="si">}</span><span class="s">` not found. It may have already been processed."</span><span class="p">,</span>
            <span class="n">chat_id</span>
        <span class="p">)</span>
        <span class="k">return</span>
    
    <span class="k">try</span><span class="p">:</span>
        <span class="c1"># Move files from staging/ to approved/
</span>        <span class="n">dest_pcap</span><span class="p">,</span> <span class="n">dest_json</span> <span class="o">=</span> <span class="n">move_files</span><span class="p">(</span><span class="s">'staging'</span><span class="p">,</span> <span class="s">'approved'</span><span class="p">,</span> <span class="n">pcap_key</span><span class="p">,</span> <span class="n">json_key</span><span class="p">)</span>
        
        <span class="c1"># Update metadata status
</span>        <span class="n">metadata</span> <span class="o">=</span> <span class="n">update_metadata_status</span><span class="p">(</span><span class="n">dest_json</span><span class="p">,</span> <span class="s">'approved'</span><span class="p">)</span>
        
        <span class="c1"># Send confirmation
</span>        <span class="n">ssid</span> <span class="o">=</span> <span class="n">metadata</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'ssid'</span><span class="p">,</span> <span class="s">'Unknown'</span><span class="p">)</span>
        <span class="n">message</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"""
✅ *Approved!*

📦 *Job ID:* `</span><span class="si">{</span><span class="n">job_id</span><span class="si">}</span><span class="s">`
📡 *SSID:* `</span><span class="si">{</span><span class="n">ssid</span><span class="si">}</span><span class="s">`
📁 *Status:* Moved to approved/

⚡ The cracking job will start automatically.
You'll receive a notification when complete.
"""</span>
        <span class="n">send_telegram_message</span><span class="p">(</span><span class="n">message</span><span class="p">,</span> <span class="n">chat_id</span><span class="p">)</span>
        
    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="n">send_telegram_message</span><span class="p">(</span><span class="sa">f</span><span class="s">"❌ Error: </span><span class="si">{</span><span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="si">}</span><span class="s">"</span><span class="p">,</span> <span class="n">chat_id</span><span class="p">)</span>
</code></pre></div></div>

<p>I also added <code class="language-plaintext highlighter-rouge">/status</code> and <code class="language-plaintext highlighter-rouge">/kill</code> commands. <code class="language-plaintext highlighter-rouge">/status [job-id]</code> shows where a job is in the pipeline, and <code class="language-plaintext highlighter-rouge">/kill [job-id]</code> terminates a running EC2 instance if I change my mind mid-crack or if it is taking too long.</p>

<h3 id="pwnagotchi-configuration">Pwnagotchi Configuration</h3>

<p>On the pwnagotchi side, the config is straightforward. Here’s a sanitized version of my <code class="language-plaintext highlighter-rouge">config.toml</code>:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Basic device configuration</span>
<span class="py">main.name</span> <span class="p">=</span> <span class="s">"[yourpwnagotchi]"</span>
<span class="py">main.whitelist</span> <span class="p">=</span> <span class="p">[</span>
    <span class="s">"[YourHomeNetwork]"</span><span class="p">,</span>
    <span class="s">"[YourNeighborsNetwork]"</span>
<span class="p">]</span>

<span class="c"># PiSugar battery support</span>
<span class="py">main.plugins.pisugar2.enabled</span> <span class="p">=</span> <span class="kc">true</span>

<span class="c"># Bluetooth tethering to phone</span>
<span class="py">main.plugins.bt-tether.enabled</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">main.plugins.bt-tether.share_internet</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">main.plugins.bt-tether.phone-name</span> <span class="p">=</span> <span class="s">"[YourPhone]"</span>
<span class="py">main.plugins.bt-tether.phone</span> <span class="p">=</span> <span class="s">"android"</span>
<span class="py">main.plugins.bt-tether.ip</span> <span class="p">=</span> <span class="s">"[YOUR:Phone:IP:Address]"</span>
<span class="py">main.plugins.bt-tether.mac</span> <span class="p">=</span> <span class="s">"[YOUR:MAC:ADDRESS]"</span>

<span class="c"># Waveshare display configuration</span>
<span class="py">ui.display.enabled</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">ui.display.type</span> <span class="p">=</span> <span class="s">"waveshare_4"</span>
<span class="py">ui.invert</span> <span class="p">=</span> <span class="kc">true</span>

<span class="c"># Web interface access (if you want to use the web interface)</span>
<span class="py">ui.web.address</span> <span class="p">=</span> <span class="s">"0.0.0.0"</span>
<span class="py">ui.web.username</span> <span class="p">=</span> <span class="s">"[admin]"</span>
<span class="py">ui.web.password</span> <span class="p">=</span> <span class="s">"[your_secure_password]"</span>
</code></pre></div></div>

<p>The S3 upload plugin I wrote isn’t included here, but it’s a simple Python script that watches the handshakes directory and uses boto3 to upload new PCAP files whenever they’re created.</p>

<h2 id="the-cracking-pipeline">The Cracking Pipeline</h2>

<p>I didn’t include the actual cracking script code here (it’s basically a bash wrapper around hashcat), but here’s what happens:</p>

<ol>
  <li><strong>EC2 Spot Instance Launches</strong> – I use g4dn.xlarge instances with NVIDIA GPUs</li>
  <li><strong>User Data Script Runs</strong> – Installs hashcat, aircrack-ng, and downloads the rockyou wordlist</li>
  <li><strong>PCAP Conversion</strong> – Converts the PCAP to a hashcat-compatible format</li>
  <li><strong>GPU-Accelerated Cracking</strong> – Runs hashcat with optimized settings</li>
  <li><strong>Results Upload</strong> – If a password is found, it’s uploaded to S3 <code class="language-plaintext highlighter-rouge">results/</code> folder</li>
  <li><strong>Telegram Notification</strong> – I get a message with the cracked password (or a failure notice)</li>
  <li><strong>Instance Terminates</strong> – No manual cleanup needed</li>
</ol>

<p>The whole thing is cost-effective because spot instances are cheap and I’m only running them for minutes at a time. A typical cracking job costs less than dollar.</p>

<h2 id="community-and-resources">Community and Resources</h2>

<p>If you’re interested in building your own pwnagotchi, here are some great resources:</p>

<ul>
  <li><strong>Official Site</strong>: <a href="https://pwnagotchi.ai/">pwnagotchi.ai</a> – Start here for the basics</li>
  <li><strong>jayofelony’s Fork</strong>: <a href="https://github.com/jayofelony/pwnagotchi">github.com/jayofelony/pwnagotchi</a></li>
  <li><strong>Discord</strong>: The pwnagotchi community is active and helpful (Pwnagotchi Unofficial)</li>
  <li><strong>Reddit</strong>: Several subreddits dedicated to WiFi security and pwnagotchi builds</li>
</ul>

<h2 id="final-thoughts">Final Thoughts</h2>

<p>This project taught me a lot about integrating hardware with serverless architecture. The pwnagotchi itself is a fun hardware hack, but connecting it to AWS Lambda, S3, and EC2 made it something that you can use regularly.</p>

<p>The key insight for me was this: <strong>you don’t need everything running 24/7</strong>. The pwnagotchi runs on battery. The S3 bucket just sits there. Lambda functions only charge for executions. EC2 instances spin up when needed and die when done.</p>

<p>It’s a model that works for a lot of side projects. Build the always-on parts cheap (or free), and use on-demand compute for the heavy lifting.</p>

<p>If you build something similar, I’d love to hear about it. And if you’re part of the pwnagotchi community, feel free to reach out, I’m always looking to learn new tricks.</p>

<p>Happy hacking! 🤖🔓</p>]]></content><author><name>Vatché</name></author><category term="AI" /><category term="Security" /><category term="AWS" /><category term="Hardware" /><category term="Raspberry Pi" /><category term="Pwnagotchi" /><category term="AWS Lambda" /><category term="Security Research" /><category term="Raspberry Pi" /><category term="Terraform" /><category term="Serverless" /><summary type="html"><![CDATA[A complete guide to building a cloud-connected Pwnagotchi that automatically uploads WiFi handshakes to S3, sends Telegram notifications for approval, and spins up EC2 spot instances for GPU-accelerated password cracking.]]></summary></entry><entry><title type="html">Design Thinking and Software Development: Why Product Management Skills Matter More Than Ever</title><link href="https://vatchechamlian.com/design-thinking-and-development.html" rel="alternate" type="text/html" title="Design Thinking and Software Development: Why Product Management Skills Matter More Than Ever" /><published>2026-01-21T00:00:00+00:00</published><updated>2026-01-21T00:00:00+00:00</updated><id>https://vatchechamlian.com/design-thinking-and-development</id><content type="html" xml:base="https://vatchechamlian.com/design-thinking-and-development.html"><![CDATA[<h1 id="why-product-management-matters">Why Product Management Matters</h1>

<p>I remember sitting in a conference room years ago, surrounded by product managers, developers, and designers. We were three sprints into a project and nobody could agree on what “done” actually meant. The requirements document was 47 pages long. The developers were building features that nobody asked for. The designers were frustrated. And the client? They just wanted something that solved their problem.</p>

<p>That project taught me something that I carry with me to this day: the quality of your input determines the quality of your output. Back then, we were talking about requirements documents and user stories. Today, we might be talking about prompts and AI-generated code. But the fundamental truth remains the same whether you are writing every line by hand or letting AI handle the typing.</p>

<p>Here is the thing: tools change, but the hard problem stays the same. You can mass produce code faster than ever, but that does not help if you are building the wrong thing. You cannot interview your users with a keyboard shortcut. You cannot synthesize conflicting feedback into a coherent product vision with a CLI command. You cannot decide which problem is worth solving first by asking an LLM.</p>

<p>That is product management. And in a world where writing code is getting easier every year, the ability to define what gets built clearly, precisely, testably, is the skill that matters most.</p>

<h2 id="the-problem-is-not-new-but-it-is-getting-louder">The Problem Is Not New, But It Is Getting Louder</h2>

<p>Y Combinator reported that 25% of startups in their Winter 2025 batch had codebases that were 95% AI-generated. That statistic makes a lot of people nervous about the future of software development. But here is what it actually reveals: the bottleneck was never typing speed.</p>

<p>The developers getting the best results, whether they are using AI tools or not, are the ones who actually understand what they are trying to build before they start.</p>

<p>In May 2025, a Swedish vibe coding app called Lovable was found to have security vulnerabilities in 170 out of 1,645 web applications it helped create. The code looked fine, it ran fine, but it was fundamentally broken because nobody stopped to ask the right questions first.</p>

<p>That is not an AI problem. That is a requirements problem. And requirements problems have been shipping broken software since long before anyone heard of large language models.</p>

<h2 id="enter-design-thinking-yes-the-sticky-note-people">Enter Design Thinking (Yes, the Sticky Note People)</h2>

<p>I know what you are thinking. “Design thinking? That is for UX designers and people who like whiteboards.”</p>

<p>The LUMA Institute has been teaching human-centered design for years. Their framework breaks down into three core skills: Looking, Understanding, and Making (A for aligning or adapting depending on who you ask). It is not complicated, but it is powerful, and it is exactly what most development processes are missing.</p>

<p><strong>Looking</strong> is about observing human experience. Who are your users? What are they actually trying to accomplish? What frustrates them?</p>

<p><strong>Understanding</strong> is about analyzing challenges and opportunities. What patterns emerge? What assumptions are you making?</p>

<p><strong>Making</strong> is about envisioning future possibilities. What does success look like? How will you know when you get there?</p>

<p>Sound familiar? It should. These are the exact questions you need to answer before you write a single line of code, regardless of how that code gets written.</p>

<h2 id="from-observations-to-user-stories-the-methods-that-actually-work">From Observations to User Stories: The Methods That Actually Work</h2>

<p>Here is where design thinking gets practical. LUMA does not just give you philosophy. It gives you a framework, specific methods for moving from “I have a bunch of observations” to “I know exactly what to build.” Here are the modules I use most often, with some examples.</p>

<h3 id="rose-thorn-bud-sorting-the-signal-from-the-noise">Rose, Thorn, Bud: Sorting the Signal from the Noise</h3>

<p>When you are in the Looking phase, you collect a lot of information. User interviews, support tickets, analytics data, competitor analysis. It is overwhelming. Rose, Thorn, Bud helps you make sense of it.</p>

<p>The method is simple. Take all your observations and sort them into three categories:</p>

<p><strong>Roses</strong> are things that are working well. These are the features users love, the workflows that feel smooth, the moments of delight. Do not skip this category. Understanding what works is just as important as understanding what does not.</p>

<p><strong>Thorns</strong> are pain points. These are the frustrations, the workarounds, the things that make users curse under their breath. Every thorn is a potential user story waiting to happen.</p>

<p><strong>Buds</strong> are opportunities. These are the “what if” moments. The features users wish existed. The adjacent problems you could solve. Buds often become your most innovative features.</p>

<p>Here is what this looks like in practice. Say you are building an expense tracking app for small business owners. After interviewing ten users, you might end up with:</p>

<p><strong>Roses:</strong></p>
<ul>
  <li>Users love being able to photograph receipts on their phone</li>
  <li>The monthly summary report saves them hours at tax time</li>
  <li>Integration with their bank account means less manual entry</li>
</ul>

<p><strong>Thorns:</strong></p>
<ul>
  <li>Categorizing expenses is tedious and error-prone</li>
  <li>No way to split a single receipt across multiple expense categories</li>
  <li>Forgot to log expenses until a week later, then could not remember the details</li>
</ul>

<p><strong>Buds:</strong></p>
<ul>
  <li>Several users mentioned wishing they could see spending trends over time</li>
  <li>One user manually tracks which expenses are tax-deductible versus not</li>
  <li>A few users share expense responsibilities with a business partner</li>
</ul>

<p>Now you have raw material for user stories. But which ones should you build first?</p>

<h3 id="importancedifficulty-matrix-picking-your-battles">Importance/Difficulty Matrix: Picking Your Battles</h3>

<p>Not all user stories are created equal. Some will take months to build and barely move the needle. Others can be shipped in a week and transform the user experience. The Importance/Difficulty Matrix helps you see the difference.</p>

<p>Draw a 2x2 grid. The vertical axis is Difficulty (how hard is this to implement?). The horizontal axis is Importance (how much does this matter to users?). Start with the importance and do your best to ensure that each item is in its own spot, no overlapping (if possible). Once you have the importance you can start to move each item up towards the proper difficulty level. This exercise helps you identify easy wins, important considerations for difficult initiatives, etc.</p>

<p>Plot each potential feature or user story on the grid.</p>

<p><strong>High Importance, Low Difficulty (bottom right)</strong>: These are your quick wins. Do these first. They deliver value fast and build momentum.</p>

<p><strong>High Importance, High Difficulty (top right)</strong>: These are your major projects. Important but need significant investment. Plan carefully.</p>

<p><strong>Low Importance, Low Difficulty (bottom left)</strong>: Fill-in work. Nice to have but not urgent. Good for junior developers or slow weeks.</p>

<p><strong>Low Importance, High Difficulty (top left)</strong>: Time sinks. Avoid these. They burn resources without delivering proportional value.</p>

<p>Taking our expense app thorns and buds:</p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>Importance</th>
      <th>Difficulty</th>
      <th>Quadrant</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Auto-categorize expenses using AI</td>
      <td>High</td>
      <td>Medium</td>
      <td>Quick Win</td>
    </tr>
    <tr>
      <td>Split receipts across categories</td>
      <td>High</td>
      <td>Low</td>
      <td>Quick Win</td>
    </tr>
    <tr>
      <td>Spending trend visualizations</td>
      <td>Medium</td>
      <td>Medium</td>
      <td>Major Project</td>
    </tr>
    <tr>
      <td>Tax-deductible expense flagging</td>
      <td>High</td>
      <td>Low</td>
      <td>Quick Win</td>
    </tr>
    <tr>
      <td>Multi-user expense sharing</td>
      <td>Medium</td>
      <td>High</td>
      <td>Time Sink?</td>
    </tr>
    <tr>
      <td>Smart reminders to log expenses</td>
      <td>High</td>
      <td>Low</td>
      <td>Quick Win</td>
    </tr>
  </tbody>
</table>

<p><img src="./assets/img/posts/20260121/design-thinking-example-importance-difficulty-matrix.png" alt="Importance/Difficulty Matrix" /></p>

<p>Look at that. Four quick wins jumped out immediately. Those become your first sprint. The multi-user sharing might seem cool, but the effort-to-value ratio is terrible right now. Maybe later, once you have nailed the core experience.</p>

<h3 id="affinity-clustering-finding-the-patterns-you-missed">Affinity Clustering: Finding the Patterns You Missed</h3>

<p>Sometimes your observations do not fit neatly into categories. You have sticky notes everywhere and no clear picture. Affinity Clustering helps you find the hidden structure.</p>

<p>The process works like this:</p>

<ol>
  <li>Write each observation, quote, or insight on its own sticky note (or digital equivalent)</li>
  <li>Start grouping notes that seem related. Do not think too hard. Go with your gut.</li>
  <li>As clusters form, give them names. What theme connects these observations?</li>
  <li>Look at the clusters. Which ones are biggest? Which ones surprise you?</li>
</ol>

<p>I did this recently with support tickets for a SaaS product. The obvious clusters were what you would expect: billing issues, feature requests, bug reports. But then an unexpected cluster emerged: users trying to do things the product was never designed for. They were using a project management tool to run their entire small business.</p>

<p>That cluster became a product pivot discussion. We never would have seen it if we had just triaged tickets the normal way. This does not mean you always need to pivot, in the end we realized that a proper integration would be more beneficial, but that integration became a priority.</p>

<p>For our expense app, Affinity Clustering might reveal that three seemingly unrelated thorns are actually the same underlying problem:</p>

<ul>
  <li>“Categorizing expenses is tedious”</li>
  <li>“Forgot to log expenses until a week later”</li>
  <li>“Can’t remember what this $47.32 charge was for”</li>
</ul>

<p>Take the time to really dig into the root cause and why it matters when you try to title the cluster. Users lose the context needed to accurately log expenses as time passes.
Cluster name: <strong>“Context Decay: delays in reporting expenses lead to poor reporting, gaps in actual finances, and frustration from those filing/processing the expenses”</strong> vs <strong>“Context Decay”</strong>.</p>

<p>Now instead of three separate features, you have one user story that addresses the root cause:</p>

<blockquote>
  <p>As a small business owner, I want to log expenses immediately after purchase with minimal friction so that I capture accurate details before I forget them.</p>
</blockquote>

<p>That is a much better foundation than “make categorizing easier,” whether you are writing the code yourself or handing it off to an AI tool.</p>

<h2 id="the-user-story-clarity-that-scales">The User Story: Clarity That Scales</h2>

<p>User stories have been around since the early days of Agile. The format is simple:</p>

<blockquote>
  <p>As a [type of user], I want [some goal] so that [some reason].</p>
</blockquote>

<p>For example: “As a busy parent, I want to save my grocery list so that I do not have to remember everything at the store.”</p>

<p>But the magic is not in the format. It is in what the format forces you to do. You have to think about who the user is. You have to articulate what they actually want. And most importantly, you have to explain why it matters.</p>

<p>This is where the INVEST criteria comes in. Good user stories should be:</p>

<ul>
  <li><strong>Independent</strong>: Not tangled up with other stories</li>
  <li><strong>Negotiable</strong>: Open to conversation, not set in stone</li>
  <li><strong>Valuable</strong>: Actually useful to someone</li>
  <li><strong>Estimable</strong>: Clear enough to estimate the work</li>
  <li><strong>Small</strong>: Completable in a reasonable timeframe</li>
  <li><strong>Testable</strong>: You can verify when it is done</li>
</ul>

<p>These criteria matter whether you are handing the story to a junior developer, a senior architect, or an AI coding assistant. Vague input produces vague output, while clear input produces clear output. The tool does not change the equation.</p>

<h2 id="acceptance-criteria-the-bridge-between-what-and-done">Acceptance Criteria: The Bridge Between “What” and “Done”</h2>

<p>This is where things get really interesting. In Agile, every user story comes with acceptance criteria. These are the specific conditions that must be met for the story to be considered complete. The most common format uses Given/When/Then:</p>

<blockquote>
  <p><strong>Given</strong> [some context]
<strong>When</strong> [some action is taken]
<strong>Then</strong> [some outcome is expected]</p>
</blockquote>

<p>For example:</p>
<ul>
  <li>Given a user is logged in</li>
  <li>When they click “Save List”</li>
  <li>Then their grocery items should persist across sessions</li>
</ul>

<p>This is not just documentation, this is a testing specification. And whether you are writing tests by hand, using TDD, or letting AI generate your test suite, acceptance criteria tell you what “correct” actually means.</p>

<h2 id="putting-it-all-together-from-sticky-notes-to-shipping-code">Putting It All Together: From Sticky Notes to Shipping Code</h2>

<p>Let me walk you through the full workflow using our expense app example.</p>

<p><strong>Step 1: Looking (Rose, Thorn, Bud)</strong></p>

<p>You interview users and observe their behavior. You collect observations and sort them:</p>

<ul>
  <li><strong>Rose</strong>: “I love that I can just snap a photo of the receipt”</li>
  <li><strong>Thorn</strong>: “I always forget to log my expenses until it’s too late”</li>
  <li><strong>Thorn</strong>: “The categories never match how I actually think about spending”</li>
  <li><strong>Bud</strong>: “Wish it could just tell me what category something should be”</li>
</ul>

<p><strong>Step 2: Understanding (Affinity Clustering)</strong></p>

<p>You notice the thorns cluster around a theme: users lose context over time. The bud suggests users want intelligence, not just data entry.</p>

<p>Cluster: <strong>Context Decay and Cognitive Load: delays in reporting expenses lead to poor reporting, gaps in actual finances, and frustration from those filing/processing the expenses</strong></p>

<p>The real problem is not the interface. It is that expense tracking requires too much mental effort at the wrong time.</p>

<p><strong>Step 3: Prioritizing (Importance/Difficulty Matrix)</strong></p>

<p>You brainstorm potential solutions and plot them:</p>

<ul>
  <li>AI-powered auto-categorization: High importance, Medium difficulty → Build it</li>
  <li>Push notification reminders: High importance, Low difficulty → Build it now</li>
  <li>Voice memo for expense context: Medium importance, Low difficulty → Quick win</li>
  <li>Full accounting system integration: Medium importance, High difficulty → Later</li>
</ul>

<p><strong>Step 4: Writing User Stories</strong></p>

<p>Based on your analysis, you write:</p>

<blockquote>
  <p>As a small business owner who makes frequent purchases, I want expenses to be automatically categorized based on the merchant and amount so that I spend less mental energy on bookkeeping.</p>
</blockquote>

<p><strong>Step 5: Defining Acceptance Criteria</strong></p>

<blockquote>
  <p><strong>Given</strong> a user photographs a receipt from a merchant they have used before
<strong>When</strong> the app processes the image
<strong>Then</strong> it should auto-suggest the same category used previously with 90%+ confidence</p>

  <p><strong>Given</strong> a user photographs a receipt from a new merchant
<strong>When</strong> the app processes the image
<strong>Then</strong> it should suggest a category based on merchant type and purchase amount
<strong>And</strong> allow the user to confirm or change the category with one tap</p>

  <p><strong>Given</strong> a user overrides an auto-suggested category
<strong>When</strong> they encounter the same merchant again
<strong>Then</strong> the system should learn from the correction and suggest the new category</p>
</blockquote>

<p><strong>Step 6: Building</strong></p>

<p>At this point, you have everything you need to build the feature correctly. If you are coding by hand, you have clear requirements and testable criteria that you can put into a ticketing system (e.g. Jira, Trello, etc). If you are using AI tools, your prompt practically writes itself (but you should still use a ticketing system and create a feature branch based on that ticket, but I digress):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>You are building an intelligent expense categorization feature. Here is the context:

User Story: As a small business owner who makes frequent purchases, I want 
expenses to be automatically categorized based on the merchant and amount 
so that I spend less mental energy on bookkeeping.

The core insight from user research: Users lose context over time and find 
manual categorization tedious. They want the system to learn their preferences.

Acceptance Criteria:
1. Given a user photographs a receipt from a merchant they have used before
   When the app processes the image
   Then it should auto-suggest the same category used previously with 90%+ confidence

2. Given a user photographs a receipt from a new merchant
   When the app processes the image
   Then it should suggest a category based on merchant type and purchase amount
   And allow the user to confirm or change the category with one tap

3. Given a user overrides an auto-suggested category
   When they encounter the same merchant again
   Then the system should learn from the correction and suggest the new category

Tech stack: React Native, TypeScript, Supabase, OpenAI API for OCR
Design constraints: Must work offline with sync, categorization UI must be 
completable in under 3 seconds

Please implement this feature including:
- Data model for storing merchant-category associations and user corrections
- Service layer for category prediction logic
- React Native components for the categorization UI
- Unit tests that verify each acceptance criterion
</code></pre></div></div>

<p>Compare that to “build me an expense categorization feature.” The difference is not subtle, and it does not matter whether a human or an AI is on the receiving end.</p>

<h2 id="qa-testing-what-actually-matters">QA: Testing What Actually Matters</h2>

<p>Here is something that keeps me up at night: AI can generate tests for the code it writes. Sounds great, right? The problem is that AI-generated tests often only cover the happy path. They test what the code does, not what it should do.</p>

<p>This is why acceptance criteria matter so much. They are not just documentation. They are your test specification. When you write:</p>

<blockquote>
  <p><strong>Given</strong> a user with no internet connection
<strong>When</strong> they try to save a task
<strong>Then</strong> the task should be queued locally and synced when connection is restored</p>
</blockquote>

<p>You have just written a test case! A test case that reflects real user needs that you discovered through the Looking and Understanding phases of design thinking.</p>

<p>The teams that are shipping quality software are the ones who:</p>

<ol>
  <li>Start with design thinking to understand the actual problem</li>
  <li>Use methods like Rose/Thorn/Bud, Affinity Clustering, and Importance/Difficulty to synthesize insights</li>
  <li>Write user stories with clear acceptance criteria</li>
  <li>Use those criteria to verify the implementation, however it was built</li>
  <li>Review the output against the original user story, not just “does it run”</li>
</ol>

<h2 id="metrics-that-actually-mean-something">Metrics That Actually Mean Something</h2>

<p>When you have clear user stories and acceptance criteria, measuring success becomes almost trivial. Did we meet the acceptance criteria? Yes or no. Does the feature solve the problem described in the user story? Yes or no.</p>

<p>Compare this to the alternative: “We shipped 47 features this quarter.” Great. Did any of them matter?</p>

<p>Some metrics that actually work:</p>

<ul>
  <li><strong>Acceptance Criteria Pass Rate</strong>: What percentage of your acceptance criteria are met on first deployment?</li>
  <li><strong>User Story Completion</strong>: Not “did we build it” but “did we solve the problem”</li>
  <li><strong>Iteration Count</strong>: How many times did you have to go back and refine? (Fewer is better and cheaper, and it usually means your original specification was clearer)</li>
  <li><strong>Post-Deploy Defects by Criteria Type</strong>: Where are the bugs? In the functionality? The edge cases? The performance? This tells you where your acceptance criteria need more rigor.</li>
</ul>

<h2 id="a-word-of-caution">A Word of Caution</h2>

<p>I am not saying design thinking and user stories will solve all your development problems. Complex systems still have complex bugs. Edge cases still slip through. Requirements still change mid-project.</p>

<p>But here is the thing: those problems are way easier to catch when you know what you were trying to build in the first place. When you have clear acceptance criteria, you can actually test whether the implementation is correct. When you have done the design thinking work, you can recognize when you are solving the wrong problem.</p>

<p>The developers I know who are thriving are not the ones who type the fastest or use the newest tools. They are the ones who got better at thinking clearly about what they are building.</p>

<h2 id="getting-started">Getting Started</h2>

<p>If you want to try this approach, here is what I would suggest:</p>

<ol>
  <li>
    <p><strong>Start with Rose/Thorn/Bud</strong>: Before you decide what to build, understand the landscape. Talk to users, review support tickets, observe behavior. Sort everything into roses (working well), thorns (pain points), and buds (opportunities). Do not skip this step. You cannot write good user stories about problems you do not understand.</p>
  </li>
  <li>
    <p><strong>Cluster your observations</strong>: Spread out your roses, thorns, and buds. Group the ones that seem related. Name the clusters. This is where you discover that five seemingly unrelated complaints are actually one underlying problem. These clusters become the foundation for your user stories.</p>
  </li>
  <li>
    <p><strong>Prioritize with Importance/Difficulty</strong>: Now you have clusters of insights. Plot them on the matrix. Which problems are high importance and low difficulty? Those are your starting points. The matrix does not tell you what features to build. It tells you which <em>problems</em> to solve first.</p>
  </li>
  <li>
    <p><strong>Let user stories emerge</strong>: For each prioritized problem, write a user story. The story should flow naturally from your research. “As a [user you interviewed], I want [solution to the thorn you identified] so that [benefit that addresses the underlying cluster].” If you did the design thinking work, the user stories should almost write themselves.</p>
  </li>
  <li>
    <p><strong>Define acceptance criteria</strong>: For each user story, write Given/When/Then statements that describe what success looks like. These come directly from your observations. What did users say they needed? What would make the thorn disappear?</p>
  </li>
  <li>
    <p><strong>Store the context in your codebase</strong>: Create a directory (something like <code class="language-plaintext highlighter-rouge">/docs/stories</code> or <code class="language-plaintext highlighter-rouge">/.context</code>) and add it to your <code class="language-plaintext highlighter-rouge">.gitignore</code>. Save your user stories and acceptance criteria as markdown or text files. When you are working with AI coding tools like Claude Code or Cursor, they can reference these files directly. This means you do not have to paste the same context into every prompt. The AI has access to the full picture: the user story, the acceptance criteria, the reasoning behind your decisions. Update these files as your understanding evolves. Even if you are not using AI tools, having this documentation in your repo keeps everyone aligned.</p>
  </li>
  <li>
    <p><strong>Build with confidence</strong>: With your context in place, you know exactly what you are building and how to verify it. Whether you are writing code by hand, pair programming, or using AI assistance, the foundation is solid.</p>
  </li>
</ol>

<p>Notice what is missing from this list: “Pick a feature.” You do not start by deciding what to build, you start by understanding what problems exist. The features reveal themselves through the process.</p>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>The way we write code is changing. AI tools are getting better every month. But the teams that will thrive are not the ones who learn to use the newest IDE or write the fanciest prompts. They are the ones who learn to think clearly about what they are building before they start.</p>

<p>This is product management. Whether your title says “PM” or not, the moment you start defining problems, prioritizing solutions, and writing acceptance criteria, you are doing product work. Design thinking gives you the methods. User stories give you the format. Acceptance criteria give you the tests. Together, they give you the ability to build software that actually solves problems.</p>

<p>The irony is that as building gets easier, the human skills of understanding users, synthesizing insights, and defining success become more valuable, not less. The sticky notes on the whiteboard are not obsolete. They are now the most important part of the process.</p>

<p>The revolution is not in the tools, it is in what you feed them, and feeding them well is a product management problem.</p>]]></content><author><name>Vatché</name></author><category term="AI" /><category term="Development" /><category term="Claude Code" /><category term="Product Management" /><category term="Design Thinking" /><category term="Claude Code" /><category term="Development &amp; DevOps" /><category term="Artificial Intelligence" /><category term="Workflow Optimization" /><category term="Product Management" /><summary type="html"><![CDATA[A deep dive into the intersection of design thinking and software development, exploring how product management and user-centered design improve outcomes whether you're using AI tools or writing code by hand.]]></summary></entry><entry><title type="html">The Multi-Agent Approach: How Claude Code’s Creator Actually Uses the Tool</title><link href="https://vatchechamlian.com/orchestrating-agents-claude.html" rel="alternate" type="text/html" title="The Multi-Agent Approach: How Claude Code’s Creator Actually Uses the Tool" /><published>2026-01-06T00:00:00+00:00</published><updated>2026-01-06T00:00:00+00:00</updated><id>https://vatchechamlian.com/orchestrating-agents-claude</id><content type="html" xml:base="https://vatchechamlian.com/orchestrating-agents-claude.html"><![CDATA[<p>When Boris Cherny, the creator of Claude Code, shared his personal workflow, it revealed something fascinating: the most powerful way to use AI coding assistants isn’t to replace your terminal with a single chatbot. Instead, it’s about orchestrating multiple AI agents working in parallel, each on their own task, while you 
context-switch between them like a conductor managing an orchestra.</p>

<p><a href="https://x.com/bcherny/status/2007179832300581177">Boris Cherny’s Post on X</a></p>

<p>After months of testing various AI coding platforms (as I detailed in my <a href="/vibe-coding-reviews.html">vibe coding review</a>), I’ve come to appreciate that the implementation details matter as much as the underlying AI model. Boris’s approach represents a fundamentally different philosophy from the “conversational app builder” platforms I tested. His focus is on production workflows rather than rapid prototyping.</p>

<h2 id="the-parallel-processing-paradigm">The Parallel Processing Paradigm</h2>

<h3 id="running-5-15-claudes-simultaneously">Running 5-15 Claudes Simultaneously</h3>

<p>Boris runs 5 Claude agents in his terminal (numbered tabs 1-5) plus 5-10 more on claude.ai/code, all working simultaneously. He uses iTerm2 system notifications to know when any agent needs input, allowing him to work on multiple features across different branches without waiting for any single agent to complete.</p>

<p>This isn’t just about speed. It’s about matching how developers actually think. While Claude works on implementing feature A, you can be planning feature B with another agent, reviewing feature C’s output, and testing feature D. The cognitive overhead of managing multiple agents is lower than you’d expect because each one maintains its own context and conversation history.</p>

<p><strong>Key setup elements:</strong></p>
<ul>
  <li>iTerm2 configured for system notifications when agents need input</li>
  <li>Numbered terminal tabs (1-5) for quick identification</li>
  <li>Browser sessions on claude.ai/code for additional parallelism</li>
  <li>Mobile app sessions kicked off throughout the day for background work</li>
  <li><code class="language-plaintext highlighter-rouge">--teleport</code> command for moving sessions between terminal and web</li>
</ul>

<p>The mobile workflow is particularly clever: start a few complex tasks from your phone in the morning, let them run while you commute or grab coffee, then check results when you’re at your desk.</p>

<h2 id="the-claudemd-strategy-team-knowledge-management">The CLAUDE.md Strategy: Team Knowledge Management</h2>

<p>One of the most underappreciated features Boris highlights is the shared <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> file, which is a repository-level instruction set that the entire team maintains and Claude reads before every interaction.</p>

<h3 id="how-it-works">How It Works</h3>

<p>Every time Claude makes a mistake or the team establishes a new pattern, it gets added to <code class="language-plaintext highlighter-rouge">CLAUDE.md</code>. Here’s a real example from the Claude Code repository:</p>

<p>Note: Bun is a fast JavaScript runtime and package manager, similar to Node.js and npm but significantly faster. The Claude Code team has standardized on it for their workflow.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Development Workflow</span>

<span class="gs">**Always use `bun`, not `npm`.**</span>

<span class="gh"># 1. Make changes</span>

<span class="gh"># 2. Typecheck (fast)</span>
bun run typecheck

<span class="gh"># 3. Run tests</span>
bun run test -- -t "test name"      # Single suite
bun run test:file -- "glob"         # Specific files

<span class="gh"># 4. Lint before committing</span>
bun run lint:file -- "file1.ts"     # Specific files
bun run lint                         # All files

<span class="gh"># 5. Before creating PR</span>
bun run lint:claude &amp;&amp; bun run test
</code></pre></div></div>

<p>This creates a compounding knowledge base. Instead of repeatedly telling Claude “don’t use enums, use string literal unions,” you add it once to <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> and it applies to all future interactions for all team members.</p>

<h3 id="github-integration-living-documentation">GitHub Integration: Living Documentation</h3>

<p>The Claude Code team uses the Claude Code GitHub Action to update <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> directly from code reviews. This means improvements to the knowledge base happen as a natural part of the development workflow.</p>

<p><strong>How it works:</strong></p>

<p>During code review, you can tag the Claude bot:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@.claude add to CLAUDE.md to never use enums, always prefer literal unions
</code></pre></div></div>

<p>The bot responds with a plan, makes the change to CLAUDE.md, and commits it. This is what Dan Shipper calls “Compounding Engineering.” Each code review makes the system slightly smarter for everyone on the team.</p>

<p><strong>To set this up:</strong></p>

<ol>
  <li>Install the Claude Code GitHub Action from the GitHub Marketplace</li>
  <li>Configure it with your repository</li>
  <li>Grant it write access to your repo</li>
  <li>Start tagging @.claude in your code reviews</li>
</ol>

<p>Within a few weeks, your <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> will evolve from a basic template into a comprehensive guide that captures your team’s collective knowledge.</p>

<h2 id="plan-mode-the-secret-to-one-shot-success">Plan Mode: The Secret to One-Shot Success</h2>

<p>Boris’s sessions typically start in Plan mode (triggered with <code class="language-plaintext highlighter-rouge">shift+tab</code> twice). This is crucial: instead of letting Claude jump straight to implementation, he iterates on the <em>approach</em> first:</p>

<ol>
  <li><strong>Describe the goal</strong> in Plan mode</li>
  <li><strong>Refine the plan</strong> through back-and-forth until it’s solid</li>
  <li><strong>Switch to auto-accept edits</strong> mode</li>
  <li><strong>Let Claude implement</strong> (usually succeeds in one shot)</li>
</ol>

<p>This mirrors my recommendation from the vibe coding article about establishing approach before implementation, but Boris takes it further by using a dedicated mode rather than just prompting for it.</p>

<p>The time investment in planning pays off exponentially. A good plan lets Claude work autonomously without breaking existing functionality—the exact failure mode I encountered repeatedly with vibe coding platforms.</p>

<h2 id="slash-commands-automating-inner-loops">Slash Commands: Automating Inner Loops</h2>

<p>Slash commands save you from repeated prompting and make it possible for Claude to use your workflows too. Boris creates custom slash commands for every “inner loop” workflow that he does many times a day. What Boris is referring to as “slash commands” are actually what Claude Code calls “skills.”</p>

<p>If you don’t know how to create skills, check out my <a href="/claude-code-skills-guide.html">guide</a>.</p>

<p><strong>What are inner loop workflows?</strong></p>

<p>These are the small, repetitive tasks you do constantly during development:</p>
<ul>
  <li>Running tests</li>
  <li>Committing and pushing code</li>
  <li>Checking type errors</li>
  <li>Running the linter</li>
  <li>Deploying to staging</li>
</ul>

<p>Commands are stored in <code class="language-plaintext highlighter-rouge">.claude/commands/</code> and checked into git so the whole team can use them.</p>

<p><strong>Example: The <code class="language-plaintext highlighter-rouge">/commit-push-pr</code> command</strong></p>

<p>This is Boris’s most-used command, running dozens of times per day:</p>

<p>Create <code class="language-plaintext highlighter-rouge">.claude/commands/commit-push-pr.sh</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Commits changes, pushes to remote, and creates a PR</span>

<span class="nb">set</span> <span class="nt">-e</span>  <span class="c"># Exit on any error</span>

<span class="nb">echo</span> <span class="s2">"📝 Preparing to commit and push..."</span>

<span class="c"># Pre-compute git status to avoid back-and-forth with Claude</span>
<span class="nv">MODIFIED_FILES</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--name-only</span><span class="si">)</span>
<span class="nv">BRANCH</span><span class="o">=</span><span class="si">$(</span>git branch <span class="nt">--show-current</span><span class="si">)</span>
<span class="nv">UPSTREAM_BRANCH</span><span class="o">=</span><span class="si">$(</span>git rev-parse <span class="nt">--abbrev-ref</span> <span class="nt">--symbolic-full-name</span> @<span class="o">{</span>u<span class="o">}</span> 2&gt;/dev/null <span class="o">||</span> <span class="nb">echo</span> <span class="s2">""</span><span class="si">)</span>

<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$MODIFIED_FILES</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"❌ No changes to commit"</span>
    <span class="nb">exit </span>1
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">"Modified files:"</span>
<span class="nb">echo</span> <span class="s2">"</span><span class="nv">$MODIFIED_FILES</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">""</span>

<span class="c"># Get commit message from Claude or user</span>
<span class="nb">read</span> <span class="nt">-p</span> <span class="s2">"Commit message: "</span> COMMIT_MSG

<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$COMMIT_MSG</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"❌ Commit message required"</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Commit changes</span>
git add <span class="nt">-A</span>
git commit <span class="nt">-m</span> <span class="s2">"</span><span class="nv">$COMMIT_MSG</span><span class="s2">"</span>

<span class="nb">echo</span> <span class="s2">"✅ Committed changes"</span>

<span class="c"># Push to remote</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$UPSTREAM_BRANCH</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span>git push <span class="nt">-u</span> origin <span class="s2">"</span><span class="nv">$BRANCH</span><span class="s2">"</span>
<span class="k">else
    </span>git push
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">"✅ Pushed to remote"</span>

<span class="c"># Create PR if gh CLI is available</span>
<span class="k">if </span><span class="nb">command</span> <span class="nt">-v</span> gh &amp;&gt; /dev/null<span class="p">;</span> <span class="k">then
    </span><span class="nb">read</span> <span class="nt">-p</span> <span class="s2">"Create PR? (y/n): "</span> CREATE_PR
    <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$CREATE_PR</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"y"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span>gh <span class="nb">pr </span>create <span class="nt">--fill</span>
        <span class="nb">echo</span> <span class="s2">"✅ PR created"</span>
    <span class="k">fi
fi</span>
</code></pre></div></div>

<p>Make it executable:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chmod</span> +x .claude/commands/commit-push-pr.sh
</code></pre></div></div>

<p><strong>Why inline bash matters:</strong></p>

<p>Notice the command pre-computes git status at the start:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">MODIFIED_FILES</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--name-only</span><span class="si">)</span>
<span class="nv">BRANCH</span><span class="o">=</span><span class="si">$(</span>git branch <span class="nt">--show-current</span><span class="si">)</span>
</code></pre></div></div>

<p>This information is gathered once and available throughout the script. Without this, Claude would need to:</p>
<ol>
  <li>Run <code class="language-plaintext highlighter-rouge">git status</code></li>
  <li>Wait for response</li>
  <li>Ask what files to commit</li>
  <li>Wait for response</li>
  <li>Run <code class="language-plaintext highlighter-rouge">git commit</code></li>
  <li>And so on…</li>
</ol>

<p>By pre-computing everything, the command runs quickly without back-and-forth.</p>

<p><strong>More useful commands to create:</strong></p>

<p><code class="language-plaintext highlighter-rouge">.claude/commands/test-focused.sh</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Run tests for files that changed</span>

<span class="nv">CHANGED_FILES</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--name-only</span> | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s1">'\.(ts|js|tsx|jsx)$'</span><span class="si">)</span>

<span class="k">for </span>file <span class="k">in</span> <span class="nv">$CHANGED_FILES</span><span class="p">;</span> <span class="k">do</span>
    <span class="c"># Convert source file to test file path</span>
    <span class="nv">TEST_FILE</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="nv">$file</span> | <span class="nb">sed</span> <span class="s1">'s/src/src/__tests__/; s/\.tsx\?/.test.ts/'</span><span class="si">)</span>
    
    <span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$TEST_FILE</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"Testing </span><span class="nv">$TEST_FILE</span><span class="s2">..."</span>
        npm <span class="nb">test</span> <span class="nt">--</span> <span class="s2">"</span><span class="nv">$TEST_FILE</span><span class="s2">"</span>
    <span class="k">fi
done</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">.claude/commands/quick-check.sh</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Fast checks before committing</span>

<span class="nb">echo</span> <span class="s2">"Running quick checks..."</span>

<span class="c"># Type check</span>
<span class="nb">echo</span> <span class="s2">"1/3 Type checking..."</span>
npm run typecheck

<span class="c"># Lint</span>
<span class="nb">echo</span> <span class="s2">"2/3 Linting..."</span>  
npm run lint

<span class="c"># Quick tests (not full suite)</span>
<span class="nb">echo</span> <span class="s2">"3/3 Running changed file tests..."</span>
.claude/commands/test-focused.sh

<span class="nb">echo</span> <span class="s2">"✅ All checks passed!"</span>
</code></pre></div></div>

<p>Now Claude can run <code class="language-plaintext highlighter-rouge">/quick-check</code> before committing, or you can run it manually. The key is identifying your most common workflows and automating them.</p>

<h2 id="subagents-specialized-ai-workers">Subagents: Specialized AI Workers</h2>

<p>Subagents are like specialized team members with specific expertise. Rather than asking the generalist Claude to both implement code AND simplify it, you hand off to a specialist agent once implementation is done.</p>

<p>Boris maintains several subagents in <code class="language-plaintext highlighter-rouge">.claude/agents/</code> for common post-processing tasks:</p>

<ul>
  <li><strong>code-simplifier</strong>: Refactors Claude’s output after implementation</li>
  <li><strong>verify-app</strong>: Runs comprehensive end-to-end tests on Claude Code itself</li>
  <li><strong>build-validator</strong>: Checks build integrity</li>
  <li><strong>code-architect</strong>: Reviews large changes for architectural consistency</li>
</ul>

<p><strong>How to create your first subagent:</strong></p>

<p>Create a file <code class="language-plaintext highlighter-rouge">.claude/agents/code-simplifier.md</code>:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Code Simplifier Agent</span>

You are a specialist in code simplification and refactoring.
Your job is to take working code and make it more maintainable 
without changing behavior.

<span class="gu">## Your responsibilities:</span>
<span class="p">1.</span> Remove duplicate code
<span class="p">2.</span> Extract complex logic into well-named functions
<span class="p">3.</span> Simplify conditional statements
<span class="p">4.</span> Improve variable and function names
<span class="p">5.</span> Add helpful comments for complex logic

<span class="gu">## What you should NOT do:</span>
<span class="p">-</span> Do not change functionality
<span class="p">-</span> Do not add new features
<span class="p">-</span> Do not remove tests
<span class="p">-</span> Do not modify public APIs

<span class="gu">## Process:</span>
<span class="p">1.</span> Read the files that were just modified
<span class="p">2.</span> Identify simplification opportunities
<span class="p">3.</span> Make changes one file at a time
<span class="p">4.</span> Run tests after each change
<span class="p">5.</span> Report what was simplified
</code></pre></div></div>

<p><strong>Using subagents:</strong></p>

<p>After Claude implements a feature, you can hand off to a subagent:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>You: "Great! Now invoke the code-simplifier agent to clean this up"
Claude: [Calls code-simplifier subagent]
Code Simplifier: [Reviews and refactors the code]
</code></pre></div></div>

<p>Start with one or two subagents for your most common post-processing tasks, then add more as you identify patterns.</p>

<h2 id="the-verification-loop-2-3x-quality-improvement">The Verification Loop: 2-3x Quality Improvement</h2>

<p>Boris emphasizes that the single most important factor for quality is <strong>giving Claude a way to verify its work</strong>. Without this feedback loop, Claude can’t iterate to fix problems.</p>

<p>For Claude Code itself, Boris uses the Claude Chrome extension to:</p>
<ol>
  <li>Open a browser</li>
  <li>Test the UI</li>
  <li>Iterate until code works and UX feels good</li>
</ol>

<p>This automated testing happens for <em>every</em> change landed to claude.ai/code. The extension can click through interfaces, verify visual elements, and report issues back to Claude for fixes.</p>

<p><strong>Verification looks different per domain:</strong></p>
<ul>
  <li><strong>CLI tools</strong>: Run the tool and verify output</li>
  <li><strong>Web apps</strong>: Browser automation testing</li>
  <li><strong>Mobile apps</strong>: Simulator testing</li>
  <li><strong>APIs</strong>: Automated integration tests</li>
  <li><strong>Data pipelines</strong>: Sample data validation</li>
</ul>

<p>The principle is universal: create a fast, reliable way for Claude to check its own work, and quality will dramatically improve.</p>

<h2 id="hooks-automated-code-quality">Hooks: Automated Code Quality</h2>

<p>Hooks in Claude Code let you run commands automatically at specific points in the workflow. Boris uses PostToolUse hooks to automatically format code after Claude makes changes.</p>

<p><strong>What are hooks?</strong></p>

<p>Think of hooks as automatic quality checks that run without you having to remember them. They’re triggered by specific events in Claude’s workflow:</p>

<ul>
  <li><strong>PostToolUse</strong>: Runs after Claude uses a tool (like editing a file)</li>
  <li><strong>PreToolUse</strong>: Runs before Claude uses a tool</li>
  <li><strong>Stop</strong>: Runs when a long task completes</li>
  <li><strong>Error</strong>: Runs when something goes wrong</li>
</ul>

<p><strong>Setting up your first hook:</strong></p>

<p>Create or edit <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> in your project:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"PostToolUse"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"matcher"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Write|Edit"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"hooks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"npm run format || true"</span><span class="w">
        </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This configuration does several things:</p>
<ul>
  <li><strong>matcher</strong>: Triggers only when Claude writes or edits files (not when reading)</li>
  <li><strong>command</strong>: Runs your code formatter</li>
  <li><strong>|| true</strong>: Ensures the hook doesn’t fail if formatting has warnings</li>
</ul>

<p>If you’re using Bun (like the Claude Code team), it would look like:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"PostToolUse"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"matcher"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Write|Edit"</span><span class="p">,</span><span class="w"> 
      </span><span class="nl">"hooks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"bun run format || true"</span><span class="w">
        </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p><strong>Why this matters:</strong></p>

<p>Without this hook, you’d need to:</p>
<ol>
  <li>Let Claude make changes</li>
  <li>Remember to run the formatter</li>
  <li>Commit the formatting changes separately</li>
  <li>Or worse, have CI fail because of formatting issues</li>
</ol>

<p>With the hook, formatting happens automatically after every file change. Claude’s code is already formatted by the time you review it.</p>

<p><strong>Other useful hooks:</strong></p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"PostToolUse"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"matcher"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Write|Edit"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"hooks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"npm run format || true"</span><span class="w">
        </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w"> 
          </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"npm run lint:fix || true"</span><span class="w">
        </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">],</span><span class="w">
  </span><span class="nl">"Stop"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"matcher"</span><span class="p">:</span><span class="w"> </span><span class="s2">"*"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"hooks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">".claude/commands/verify-app.sh"</span><span class="w">
        </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This setup:</p>
<ol>
  <li>Formats and lints code after every edit</li>
  <li>Runs comprehensive verification when tasks complete</li>
</ol>

<p>Start with just the formatting hook, then add more as you identify patterns.</p>

<h2 id="permission-management-security-without-friction">Permission Management: Security Without Friction</h2>

<p>Rather than using <code class="language-plaintext highlighter-rouge">--dangerously-skip-permissions</code>, Boris pre-allows safe commands through <code class="language-plaintext highlighter-rouge">/permissions</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Bash(bq query:*)
Bash(bun run build:*)
Bash(bun run lint:file:*)
Bash(bun run test:*)
Bash(bun run test:file:*)
Bash(bun run typecheck:*)
Bash(test:*)
Bash(cc:*)
Bash(comm:*)
Bash(find:*)
</code></pre></div></div>

<p>These permissions are stored in <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> and shared with the team. It’s a middle ground: Claude can work autonomously on common operations while still requiring confirmation for potentially dangerous commands.</p>

<p>For sandbox environments or very long-running tasks, Boris will occasionally use <code class="language-plaintext highlighter-rouge">--permission-mode=dontAsk</code> to let Claude “cook without being blocked,” but this is reserved for isolated contexts.</p>

<h2 id="mcp-integration-external-tool-access">MCP Integration: External Tool Access</h2>

<p>Claude Code uses MCP (Model Context Protocol) to interact with Boris’s entire tool ecosystem:</p>

<ul>
  <li><strong>Slack</strong>: Search conversations, post updates</li>
  <li><strong>BigQuery</strong>: Run analytics queries via <code class="language-plaintext highlighter-rouge">bq</code> CLI</li>
  <li><strong>Sentry</strong>: Pull error logs automatically</li>
  <li><strong>Custom internal tools</strong>: Anything with a CLI interface</li>
</ul>

<p>The Slack MCP configuration is checked into <code class="language-plaintext highlighter-rouge">.mcp.json</code> and shared:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"mcpServers"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"slack"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://slack.mcp.anthropic.com/mcp"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This means Claude can autonomously:</p>
<ul>
  <li>Search for relevant Slack discussions when debugging</li>
  <li>Post status updates to team channels</li>
  <li>Query production analytics to verify changes</li>
  <li>Pull error logs to understand user issues</li>
</ul>

<p>The friction of context-switching between tools disappears when Claude can access them directly.</p>
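<p>You don’t have to write <code class="language-plaintext highlighter-rouge">.mcp.json</code> by hand. Assuming a current Claude Code install, the <code class="language-plaintext highlighter-rouge">claude mcp</code> subcommands can register the server and write it to the shared project config for you:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Register the remote (HTTP) Slack server in the shared project config (.mcp.json)
claude mcp add --scope project --transport http slack https://slack.mcp.anthropic.com/mcp

# Confirm the server is registered and reachable
claude mcp list
</code></pre></div></div>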

<h2 id="the-ralph-wiggum-plugin-long-running-task-safety">The Ralph Wiggum Plugin: Long-Running Task Safety</h2>

<p>For very long-running tasks (deployments, migrations, extensive refactors), Boris uses the “ralph-wiggum” plugin. This plugin was originally created by Geoffrey Huntley and implements a verification step when tasks complete.</p>

<p>The plugin is named after a character from The Simpsons who is famous for enthusiastically declaring “I’m helping!” while unknowingly making situations worse. The name is perfect because it captures a real risk with AI: agents that work unsupervised for hours might produce code that seems complete but actually breaks things.</p>

<p>The ralph-wiggum plugin ensures that when Claude finishes a multi-hour task unsupervised, the results are actually correct before merging. It runs a comprehensive verification suite and can even alert you if something seems off.</p>

<p>Combined with Stop hooks, this creates a safety net: long tasks can run overnight or while you’re in meetings, with automatic verification before results are committed.</p>

<h2 id="the-model-choice-opus-45-with-extended-thinking">The Model Choice: Opus 4.5 with Extended Thinking</h2>

<p>Boris runs Opus 4.5 with extended thinking enabled for everything, even though it’s slower than Sonnet. His reasoning might surprise you:</p>

<blockquote>
  <p>“You have to steer it less and it’s better at tool use, so it is almost always faster than using a smaller model in the end.”</p>
</blockquote>

<p>This contradicts conventional wisdom about using faster models for simple tasks. But Boris’s experience shows that the end-to-end time from prompt to working code is actually lower with Opus because:</p>

<ol>
  <li><strong>Fewer correction cycles:</strong> Opus gets it right more often on the first try</li>
  <li><strong>Better tool use:</strong> Less back-and-forth when running commands</li>
  <li><strong>Deeper understanding:</strong> Handles complex refactors that would require multiple iterations with smaller models</li>
</ol>

<p>The “extended thinking” mode builds on the model’s chain-of-thought reasoning, letting it work through problems more thoroughly before responding. You’ll see Claude’s reasoning process in real time, which helps you understand its approach and catch potential issues early.</p>

<p>Think of it this way: a slower, more capable model that succeeds in one attempt is faster than a quick model that requires three rounds of corrections.</p>
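<p>If you want to try this yourself, model selection is a one-flag change. A small sketch (the environment variable and model ID shown are examples; check your install’s docs for the current names):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Start a session on Opus explicitly instead of the default model
claude --model opus

# Or set it once for your shell
export ANTHROPIC_MODEL=claude-opus-4-5
</code></pre></div></div>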

<h2 id="multi-branch-parallelism-avoiding-conflicts">Multi-Branch Parallelism: Avoiding Conflicts</h2>

<p>A critical detail that Boris mentions: when running multiple agents on the same codebase, each agent works on its own feature branch in its own repository clone.</p>

<p><strong>Why this approach matters:</strong></p>

<p>When you run 3-5 agents simultaneously all making changes to the main branch, you’re going to hit merge conflicts constantly. It becomes a mess of competing edits where Agent 1’s changes conflict with Agent 2’s changes, and you spend more time resolving conflicts than actually developing.</p>

<p>Instead, here’s the workflow Boris uses:</p>

<p><strong>Setting up isolated workspaces:</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a directory for your parallel work</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> ~/code/myproject-parallel

<span class="c"># Clone your repo multiple times</span>
<span class="nb">cd</span> ~/code/myproject-parallel
git clone git@github.com:yourorg/myproject.git agent1
git clone git@github.com:yourorg/myproject.git agent2  
git clone git@github.com:yourorg/myproject.git agent3

<span class="c"># In each clone, create a feature branch</span>
<span class="nb">cd </span>agent1 <span class="o">&amp;&amp;</span> git checkout <span class="nt">-b</span> feature/authentication
<span class="nb">cd</span> ../agent2 <span class="o">&amp;&amp;</span> git checkout <span class="nt">-b</span> feature/database-migration
<span class="nb">cd</span> ../agent3 <span class="o">&amp;&amp;</span> git checkout <span class="nt">-b</span> docs/api-updates
</code></pre></div></div>

<p>Now each agent has:</p>
<ul>
  <li>Its own working directory</li>
  <li>Its own feature branch</li>
  <li>Complete isolation from other agents</li>
  <li>Full context for its specific task</li>
</ul>

<p><strong>Benefits of this approach:</strong></p>
<ul>
  <li>PRs remain clean and focused</li>
  <li>Merge conflicts are rare (each branch diverges from main separately)</li>
  <li>Each agent has complete context for its branch</li>
  <li>You can abandon failed experiments without affecting other work</li>
  <li>Code reviews are clearer because each PR has a single purpose</li>
</ul>

<p><strong>Managing the workflow:</strong></p>

<p>Open a terminal tab for each agent workspace:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Tab 1: Authentication feature</span>
<span class="nb">cd</span> ~/code/myproject-parallel/agent1
claude

<span class="c"># Tab 2: Database migration</span>
<span class="nb">cd</span> ~/code/myproject-parallel/agent2
claude

<span class="c"># Tab 3: Documentation updates</span>
<span class="nb">cd</span> ~/code/myproject-parallel/agent3
claude
</code></pre></div></div>

<p>This maps to the cognitive model of working on multiple features. Each lives in its own mental and physical space until it’s ready to merge back to main.</p>
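<p>If full clones feel heavy, <code class="language-plaintext highlighter-rouge">git worktree</code> gives you the same per-branch isolation from a single repository. This is an alternative to the clone-based setup described above, not part of it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># One repository, several working trees: same isolation, less duplication
cd ~/code/myproject
git worktree add -b feature/authentication ../myproject-auth
git worktree add -b feature/database-migration ../myproject-db

# Each directory sits on its own branch but shares a single .git store
git worktree list
</code></pre></div></div>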

<h2 id="comparing-approaches-vibe-coding-vs-claude-code">Comparing Approaches: Vibe Coding vs. Claude Code</h2>

<p>Having tested both paradigms extensively, the difference is clear:</p>

<p><strong>Vibe Coding Platforms</strong> (Replit, Tempo, Lovable, etc.):</p>
<ul>
  <li>Optimized for rapid prototyping and demos</li>
  <li>Single-agent, conversational interface</li>
  <li>Struggle with iterative refinement</li>
  <li>Break down when adding features to existing code</li>
  <li>Best for initial generation</li>
</ul>

<p><strong>Claude Code</strong> (Boris’s Workflow):</p>
<ul>
  <li>Optimized for production development</li>
  <li>Multi-agent parallel processing</li>
  <li>Designed for iterative improvement</li>
  <li>Handles complex refactors through planning and verification</li>
  <li>Best for real applications</li>
</ul>

<p>The vibe coding platforms excel at the first 20% of development, which is getting a working prototype fast. Claude Code, used properly, excels at the remaining 80%, which includes refining, extending, and maintaining real applications over time.</p>

<h2 id="lessons-for-your-workflow">Lessons for Your Workflow</h2>

<p>Even if you’re not ready to run 15 parallel Claude agents, several principles apply immediately:</p>

<h3 id="1-document-patterns-in-claudemd">1. Document Patterns in CLAUDE.md</h3>

<p>Start with a simple file:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Project Guidelines</span>

<span class="gu">## Tech Stack</span>
<span class="p">-</span> Use TypeScript strict mode
<span class="p">-</span> Prefer functional components in React
<span class="p">-</span> Use Tailwind for styling

<span class="gu">## Testing</span>
<span class="p">-</span> Write tests before implementation
<span class="p">-</span> Use Jest for unit tests
<span class="p">-</span> Use Playwright for E2E tests

<span class="gu">## Common Mistakes to Avoid</span>
<span class="p">-</span> Don't use <span class="sb">`any`</span> type
<span class="p">-</span> Don't commit console.logs
<span class="p">-</span> Don't skip error handling
</code></pre></div></div>

<p>Update it whenever Claude makes a mistake or you establish a new pattern.</p>

<h3 id="2-use-plan-mode-for-complex-tasks">2. Use Plan Mode for Complex Tasks</h3>

<p>Before implementing anything non-trivial:</p>
<ol>
  <li>Switch to Plan mode</li>
  <li>Describe your goal</li>
  <li>Iterate on the approach</li>
  <li>Only then switch to implementation</li>
</ol>

<p>This single change will dramatically improve your success rate.</p>

<h3 id="3-create-slash-commands-for-repetitive-workflows">3. Create Slash Commands for Repetitive Workflows</h3>

<p>Identify the 3-5 workflows you do most often:</p>
<ul>
  <li>Running tests</li>
  <li>Committing and pushing</li>
  <li>Building and deploying</li>
  <li>Generating documentation</li>
  <li>Running type checks</li>
</ul>

<p>Create slash commands for these and share them with your team.</p>

<h3 id="4-build-verification-loops">4. Build Verification Loops</h3>

<p>For every project, invest in making verification fast and automated:</p>
<ul>
  <li>Set up hot-reload for web apps</li>
  <li>Create test scripts that run in &lt;10 seconds</li>
  <li>Build sample data generators for testing</li>
  <li>Set up automated E2E tests for critical paths</li>
</ul>

<p>Then give Claude access to run these verifications.</p>

<h3 id="5-start-with-2-3-parallel-agents">5. Start with 2-3 Parallel Agents</h3>

<p>You don’t need 15 agents on day one. Start with 2-3:</p>
<ul>
  <li><strong>Agent 1</strong>: Feature implementation</li>
  <li><strong>Agent 2</strong>: Test writing</li>
  <li><strong>Agent 3</strong>: Documentation updates</li>
</ul>

<p>Even this modest parallelism will change how you work.</p>

<h2 id="the-future-of-ai-assisted-development">The Future of AI-Assisted Development</h2>

<p>Boris’s workflow represents a mature approach to AI-assisted development. It’s not about replacing developers or eliminating coding—it’s about orchestrating AI agents as force multipliers.</p>

<p>The developers who will thrive aren’t those who can prompt the hardest, but those who can:</p>
<ul>
  <li>Architect systems that AI agents can navigate</li>
  <li>Create verification loops that enable autonomy</li>
  <li>Build knowledge bases (like <code class="language-plaintext highlighter-rouge">CLAUDE.md</code>) that compound over time</li>
  <li>Manage parallel workstreams effectively</li>
  <li>Integrate AI into existing tool ecosystems</li>
</ul>

<p>This echoes my conclusion from <a href="/coding-beyond-ai.html">Coding Beyond AI</a>: the future belongs to those who can architect, direct, and orchestrate AI tools, not those who simply use them as fancy autocomplete.</p>

<h2 id="getting-started-your-path-to-multi-agent-development">Getting Started: Your Path to Multi-Agent Development</h2>

<p>If you’re using Claude Code (or any AI coding assistant), here’s how to progressively build up to Boris’s workflow. Each step builds on the previous one, and you should only move to the next step once you’re comfortable with the current one.</p>

<h3 id="step-1-create-your-claudemd-file">Step 1: Create Your CLAUDE.md File</h3>

<p>Start by creating a file called <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> in your project’s root directory. This will be Claude’s instruction manual for your codebase.</p>

<p><strong>Initial template to get started:</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Project Guidelines</span>

<span class="gu">## Tech Stack</span>
<span class="p">-</span> Primary language: [Your language here]
<span class="p">-</span> Framework: [Your framework]
<span class="p">-</span> Package manager: [npm, yarn, bun, etc.]

<span class="gu">## Testing</span>
<span class="p">-</span> Test framework: [Jest, Vitest, etc.]
<span class="p">-</span> Command to run tests: [your command]

<span class="gu">## Common Commands</span>
<span class="p">-</span> Start dev server: [your command]
<span class="p">-</span> Build for production: [your command]
<span class="p">-</span> Run linter: [your command]

<span class="gu">## Common Mistakes to Avoid</span>
(Start empty - you'll add to this as you go)
</code></pre></div></div>

<p>As you work with Claude, anytime it makes a mistake or you establish a new pattern, add it to the “Common Mistakes to Avoid” section. For example:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## Common Mistakes to Avoid</span>
<span class="p">-</span> Don't use <span class="sb">`any`</span> type in TypeScript
<span class="p">-</span> Don't commit console.logs to production code
<span class="p">-</span> Always include error handling in async functions
<span class="p">-</span> Prefer functional components over class components in React
</code></pre></div></div>

<p>This file compounds in value over time. After a month of updates, Claude will know your codebase’s quirks better than most new team members.</p>

<h3 id="step-2-learn-plan-mode-for-complex-tasks">Step 2: Learn Plan Mode for Complex Tasks</h3>

<p>Plan mode is accessed by pressing <code class="language-plaintext highlighter-rouge">Shift+Tab</code> twice in Claude Code. It changes how Claude approaches your request.</p>

<p><strong>Without Plan mode:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>You: "Add authentication to the app"
Claude: [Immediately starts writing code]
</code></pre></div></div>

<p><strong>With Plan mode:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>You: "Add authentication to the app"
Claude: [Provides a detailed plan]
  1. Set up authentication provider (e.g., Auth0, Supabase)
  2. Create login/signup components
  3. Add protected route wrapper
  4. Implement token storage
  5. Add logout functionality
  
You: "Looks good, but let's use session storage instead of local storage"
Claude: [Updates plan]
You: "Perfect, proceed with implementation"
Claude: [Implements the updated plan]
</code></pre></div></div>

<p><strong>When to use Plan mode:</strong></p>
<ul>
  <li>Implementing new features</li>
  <li>Large refactors</li>
  <li>Anything that touches multiple files</li>
  <li>When you’re not 100% sure of the approach</li>
</ul>

<p><strong>When to skip Plan mode:</strong></p>
<ul>
  <li>Fixing typos or simple bugs</li>
  <li>Updating documentation</li>
  <li>Making obvious, small changes</li>
</ul>

<p>Start using Plan mode for any task that would take more than 5 minutes to implement manually.</p>

<h3 id="step-3-set-up-your-first-slash-command">Step 3: Set Up Your First Slash Command</h3>

<p>Slash commands automate repetitive workflows. Let’s create a simple one for running your test suite.</p>

<p>Create a directory structure:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>your-project/
  .claude/
    commands/
      test.sh
</code></pre></div></div>

<p><strong>Example test.sh:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Runs the test suite with common options</span>

<span class="nb">echo</span> <span class="s2">"Running test suite..."</span>

<span class="c"># Run tests with coverage</span>
npm <span class="nb">test</span> <span class="nt">--</span> <span class="nt">--coverage</span> <span class="nt">--watchAll</span><span class="o">=</span><span class="nb">false</span>

<span class="c"># Check if tests passed</span>
<span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"✅ All tests passed!"</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"❌ Tests failed. Review output above."</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>
</code></pre></div></div>

<p>Make it executable:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chmod</span> +x .claude/commands/test.sh
</code></pre></div></div>

<p>Now when you type <code class="language-plaintext highlighter-rouge">/test</code> in Claude Code, it will run this script. You can create commands for:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">/commit-push</code> - Commits and pushes changes</li>
  <li><code class="language-plaintext highlighter-rouge">/deploy</code> - Deploys to staging</li>
  <li><code class="language-plaintext highlighter-rouge">/format</code> - Runs code formatter</li>
  <li><code class="language-plaintext highlighter-rouge">/typecheck</code> - Runs TypeScript type checking</li>
</ul>

<p>The key is identifying the workflows you do most often and automating them. Start with just one or two commands, then add more as you identify patterns.</p>
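<p>For instance, a <code class="language-plaintext highlighter-rouge">/typecheck</code> command from that list might look like this (a hypothetical sketch assuming a TypeScript project with <code class="language-plaintext highlighter-rouge">tsc</code> available):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# .claude/commands/typecheck.sh
# Runs the TypeScript compiler in check-only mode

echo "Running TypeScript type check..."

if npx tsc --noEmit; then
    echo "✅ No type errors"
else
    echo "❌ Type errors found. See output above."
    exit 1
fi
</code></pre></div></div>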

<h3 id="step-4-build-your-first-verification-loop">Step 4: Build Your First Verification Loop</h3>

<p>Verification loops let Claude check its own work, which Boris identified as the single most important factor for quality.</p>

<p><strong>For a web application:</strong></p>

<p>Create a simple test script that Claude can run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># .claude/commands/verify-app.sh</span>

<span class="nb">echo</span> <span class="s2">"Starting app verification..."</span>

<span class="c"># Start the dev server in the background</span>
npm run dev &amp;
<span class="nv">DEV_PID</span><span class="o">=</span><span class="nv">$!</span>

<span class="c"># Wait for server to start</span>
<span class="nb">sleep </span>5

<span class="c"># Check if app is responding</span>
<span class="k">if </span>curl <span class="nt">-s</span> http://localhost:3000 <span class="o">&gt;</span> /dev/null<span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"✅ App started successfully"</span>
    
    <span class="c"># Run basic smoke tests</span>
    npm run <span class="nb">test</span>:e2e
    
    <span class="nv">RESULT</span><span class="o">=</span><span class="nv">$?</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"❌ App failed to start"</span>
    <span class="nv">RESULT</span><span class="o">=</span>1
<span class="k">fi</span>

<span class="c"># Clean up</span>
<span class="nb">kill</span> <span class="nv">$DEV_PID</span>

<span class="nb">exit</span> <span class="nv">$RESULT</span>
</code></pre></div></div>

<p><strong>For an API:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># .claude/commands/verify-api.sh</span>

<span class="nb">echo</span> <span class="s2">"Verifying API endpoints..."</span>

<span class="c"># Start API server</span>
npm run start:test &amp;
API_PID=$!
# Stop the server on exit, even if one of the checks below fails early
trap 'kill $API_PID' EXIT

<span class="nb">sleep </span>3

<span class="c"># Test key endpoints</span>
<span class="nb">echo</span> <span class="s2">"Testing /health endpoint..."</span>
curl <span class="nt">-f</span> http://localhost:8080/health <span class="o">||</span> <span class="nb">exit </span>1

<span class="nb">echo</span> <span class="s2">"Testing /api/users endpoint..."</span>
curl <span class="nt">-f</span> http://localhost:8080/api/users <span class="o">||</span> <span class="nb">exit </span>1

<span class="nb">echo</span> <span class="s2">"✅ All endpoints responding"</span>

<span class="nb">kill</span> <span class="nv">$API_PID</span>
</code></pre></div></div>

<p>Now when Claude makes changes, you can ask it to run <code class="language-plaintext highlighter-rouge">/verify-app</code> or <code class="language-plaintext highlighter-rouge">/verify-api</code> to check its work. Even better, set up a hook to run verification automatically.</p>
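<p>One way to wire that up is a <code class="language-plaintext highlighter-rouge">Stop</code> hook in <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> that runs the verification script whenever Claude finishes responding. A sketch, assuming the script lives where we created it above:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./.claude/commands/verify-app.sh"
          }
        ]
      }
    ]
  }
}
</code></pre></div></div>

<p>Claude Code surfaces the hook’s output, so a failing verification doesn’t pass silently.</p>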

<h3 id="step-5-start-running-multiple-agents">Step 5: Start Running Multiple Agents</h3>

<p>Once you’re comfortable with the above, you’re ready to run multiple Claude agents in parallel.</p>

<p><strong>Terminal setup (iTerm2 on Mac):</strong></p>

<ol>
  <li>Open iTerm2 Preferences (Cmd+,)</li>
  <li>Go to Profiles &gt; Your Profile &gt; Terminal</li>
  <li>Enable “Notifications when idle for” - set to 5 seconds</li>
  <li>Check “Send notification when current session’s…”</li>
</ol>

<p>This will give you a notification when Claude needs your input.</p>

<p><strong>Running 2-3 agents to start:</strong></p>

<p>Open 3 terminal tabs and number them:</p>
<ul>
  <li>Tab 1: Feature implementation</li>
  <li>Tab 2: Test writing</li>
  <li>Tab 3: Documentation updates</li>
</ul>

<p>In each tab, start a Claude session:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Tab 1</span>
claude

<span class="c"># In Claude Code</span>
<span class="o">&gt;</span> Implement user authentication feature

<span class="c"># Tab 2  </span>
claude

<span class="c"># In Claude Code</span>
<span class="o">&gt;</span> Write comprehensive tests <span class="k">for </span>the authentication feature

<span class="c"># Tab 3</span>
claude

<span class="c"># In Claude Code</span>
<span class="o">&gt;</span> Update README and API documentation <span class="k">for </span>authentication
</code></pre></div></div>

<p>Now you can work on all three in parallel. When Tab 1 finishes implementation, you’ll get a notification. Review it, and while Claude in Tab 2 is still writing tests, you can start Tab 1 on the next feature.</p>

<p><strong>Managing multiple branches:</strong></p>

<p>Each agent should work on its own branch (ideally in its own clone, as described earlier) to avoid conflicts:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Tab 1</span>
git checkout <span class="nt">-b</span> feature/auth
claude

<span class="c"># Tab 2</span>
git checkout <span class="nt">-b</span> feature/auth-tests  
claude

<span class="c"># Tab 3</span>
git checkout <span class="nt">-b</span> docs/auth
claude
</code></pre></div></div>

<p>This keeps work isolated until you’re ready to merge.</p>

<h3 id="step-6-integrate-with-your-tools-via-mcp">Step 6: Integrate with Your Tools via MCP</h3>

<p>MCP (Model Context Protocol) lets Claude interact with your external tools. Start with one or two integrations that would save you the most time.</p>

<p><strong>Example: Slack integration</strong></p>

<p>Create <code class="language-plaintext highlighter-rouge">.mcp.json</code> in your project root:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"mcpServers"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"slack"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://slack.mcp.anthropic.com/mcp"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"token"</span><span class="p">:</span><span class="w"> </span><span class="s2">"your-slack-token"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>One note: since <code class="language-plaintext highlighter-rouge">.mcp.json</code> is checked into git, keep tokens and other secrets out of it; Claude Code prompts you to authenticate with a remote server the first time you connect. With the server configured, Claude can search Slack for context:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>You: "Check our Slack discussions about the authentication implementation"
Claude: [Searches Slack, finds relevant threads]
</code></pre></div></div>

<p><strong>Common MCP integrations:</strong></p>
<ul>
  <li>Slack for team communications</li>
  <li>Linear/Jira for issue tracking</li>
  <li>Sentry for error monitoring</li>
  <li>DataDog for metrics</li>
  <li>Any tool with a CLI or API</li>
</ul>

<p>Start with whichever tool you find yourself manually checking most often during development.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Boris Cherny’s workflow isn’t just about using Claude Code effectively. It’s a blueprint for the future of software development. The multi-agent approach, combined with team knowledge bases, automated verification, and thoughtful integration of AI into existing tools, creates a development environment that’s both more powerful and more maintainable than traditional workflows.</p>

<p>The irony is that this “advanced” setup isn’t actually that complicated. Most of it is configuration files checked into git, commands that run automatically, and patterns that your team already follows (now just documented for AI to understand).</p>

<p>The barrier isn’t technical complexity. It’s the shift in thinking from “AI as tool” to “AI as colleague.” Once you make that leap, the possibilities expand dramatically.</p>

<p>Start small. Pick one or two practices from this article and implement them this week. Add your CLAUDE.md file. Try Plan mode. Create a verification script. The compounding returns will surprise you.</p>

<hr />

<p><em>Have you experimented with multi-agent workflows? Share your experiences in the comments below, or reach out to me on <a href="https://linkedin.com/in/chamlian/">LinkedIn</a> to discuss your AI development strategies.</em></p>

<p><strong>My Claude Code Skills Repo to get you started:</strong></p>
<ul>
  <li><a href="https://github.com/angakh/claude-code-skills">https://github.com/angakh/claude-code-skills</a></li>
</ul>

<p><strong>Related Reading:</strong></p>
<ul>
  <li><a href="/claude-code-skills-guide.html">Creating Custom Skills in Claude Code</a></li>
  <li><a href="/coding-beyond-ai.html">Coding Beyond AI</a></li>
</ul>]]></content><author><name>Vatché</name></author><category term="AI" /><category term="Development" /><category term="Claude Code" /><category term="Claude Code" /><category term="Development &amp; DevOps" /><category term="Artificial Intelligence" /><category term="Workflow Optimization" /><summary type="html"><![CDATA[Inside Boris Cherny's production workflow—running 15+ parallel Claude agents, maintaining team-wide CLAUDE.md files, and leveraging advanced features most developers miss.]]></summary></entry><entry><title type="html">Creating Custom Skills in Claude Code: Automating Your Development Workflow</title><link href="https://vatchechamlian.com/claude-code-skills-guide.html" rel="alternate" type="text/html" title="Creating Custom Skills in Claude Code: Automating Your Development Workflow" /><published>2025-12-13T00:00:00+00:00</published><updated>2025-12-13T00:00:00+00:00</updated><id>https://vatchechamlian.com/claude-code-skills-guide</id><content type="html" xml:base="https://vatchechamlian.com/claude-code-skills-guide.html"><![CDATA[<p>If you’ve used Claude Code for any length of time, you’ve probably found yourself repeatedly prompting Claude to do the same tasks. Run tests. Format code. Commit changes. Deploy to staging. These repetitive prompts slow you down and introduce inconsistency.</p>

<p>The solution? Custom skills, which Claude Code calls “slash commands.” These are reusable scripts that Claude (and you) can invoke with a simple command like <code class="language-plaintext highlighter-rouge">/test</code> or <code class="language-plaintext highlighter-rouge">/deploy</code>. They’re essentially bash scripts with superpowers, and they’re one of the most underutilized features of Claude Code.</p>

<h2 id="what-are-skills">What Are Skills?</h2>

<p>Skills in Claude Code are executable scripts stored in your project’s <code class="language-plaintext highlighter-rouge">.claude/commands/</code> directory. They can be written in bash, Python, Node.js, or any language that can run on your system. Once created, both you and Claude can invoke them by name.</p>

<p><strong>The key difference from regular scripts:</strong></p>
<ul>
  <li>Skills are <strong>git-tracked</strong> so your whole team shares them</li>
  <li>Skills are <strong>discoverable</strong> by Claude without additional prompting</li>
  <li>Skills can <strong>pre-compute context</strong> to avoid back-and-forth with the AI</li>
  <li>Skills appear in <strong>Claude’s autocomplete</strong> when you type <code class="language-plaintext highlighter-rouge">/</code></li>
</ul>

<h2 id="your-first-skill-running-tests">Your First Skill: Running Tests</h2>

<p>Let’s start with something simple but useful. A skill that runs your test suite with the right options.</p>

<p>Create the file <code class="language-plaintext highlighter-rouge">.claude/commands/test.sh</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Runs the test suite with coverage</span>

<span class="nb">echo</span> <span class="s2">"Running test suite..."</span>

<span class="c"># Detect which test runner you're using</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"package.json"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    if </span><span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"vitest"</span> package.json<span class="p">;</span> <span class="k">then
        </span>npm run <span class="nb">test</span> <span class="nt">--</span> <span class="nt">--coverage</span>
    <span class="k">elif </span><span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"jest"</span> package.json<span class="p">;</span> <span class="k">then
        </span>npm <span class="nb">test</span> <span class="nt">--</span> <span class="nt">--coverage</span> <span class="nt">--watchAll</span><span class="o">=</span><span class="nb">false
    </span><span class="k">else
        </span>npm <span class="nb">test
    </span><span class="k">fi
elif</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"pytest.ini"</span> <span class="o">]</span> <span class="o">||</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"pyproject.toml"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span>pytest <span class="nt">--cov</span>
<span class="k">elif</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"go.mod"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span>go <span class="nb">test</span> ./... <span class="nt">-cover</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"❌ Could not detect test framework"</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Report results</span>
<span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"✅ All tests passed!"</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"❌ Tests failed. Review output above."</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>
</code></pre></div></div>

<p>Make it executable:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chmod</span> +x .claude/commands/test.sh
</code></pre></div></div>

<p>Now you (or Claude) can simply type <code class="language-plaintext highlighter-rouge">/test</code> and it will run with the appropriate test runner and options.</p>

<h2 id="the-anatomy-of-a-good-skill">The Anatomy of a Good Skill</h2>

<p>Looking at that test skill, notice a few important patterns:</p>

<h3 id="1-clear-feedback">1. Clear Feedback</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"Running test suite..."</span>
<span class="nb">echo</span> <span class="s2">"✅ All tests passed!"</span>
</code></pre></div></div>

<p>Skills should tell you what they’re doing and what happened. Claude uses this output to understand if the skill succeeded.</p>

<h3 id="2-exit-codes-matter">2. Exit Codes Matter</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"✅ All tests passed!"</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"❌ Tests failed."</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>
</code></pre></div></div>

<p>Return <code class="language-plaintext highlighter-rouge">0</code> for success, non-zero for failure. Claude uses this to know if it should continue or investigate the error.</p>

<h3 id="3-context-detection">3. Context Detection</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if </span><span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"vitest"</span> package.json<span class="p">;</span> <span class="k">then
    </span>npm run <span class="nb">test</span> <span class="nt">--</span> <span class="nt">--coverage</span>
<span class="k">elif </span><span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"jest"</span> package.json<span class="p">;</span> <span class="k">then
    </span>npm <span class="nb">test</span> <span class="nt">--</span> <span class="nt">--coverage</span> <span class="nt">--watchAll</span><span class="o">=</span><span class="nb">false</span>
</code></pre></div></div>

<p>Good skills adapt to your project automatically. This test skill works with multiple test frameworks without you needing to remember which one you’re using.</p>

<h2 id="a-more-complex-example-smart-commit">A More Complex Example: Smart Commit</h2>

<p>Let’s create a skill that makes committing code intelligent. It will:</p>
<ul>
  <li>Check for uncommitted changes</li>
  <li>Run tests before committing</li>
  <li>Generate a meaningful commit message suggestion</li>
  <li>Handle the git workflow</li>
</ul>

<p>Create <code class="language-plaintext highlighter-rouge">.claude/commands/commit.sh</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Smart commit with tests and meaningful messages</span>

<span class="nb">set</span> <span class="nt">-e</span>  <span class="c"># Exit on any error</span>

<span class="c"># Check for changes</span>
<span class="nv">MODIFIED_FILES</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--name-only</span><span class="si">)</span>
<span class="nv">STAGED_FILES</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--cached</span> <span class="nt">--name-only</span><span class="si">)</span>

<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$MODIFIED_FILES</span><span class="s2">"</span> <span class="o">]</span> <span class="o">&amp;&amp;</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$STAGED_FILES</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"❌ No changes to commit"</span>
    <span class="nb">exit </span>1
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">"📝 Changes detected in:"</span>
<span class="nb">echo</span> <span class="s2">"</span><span class="nv">$MODIFIED_FILES</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">""</span>

<span class="c"># Run tests first</span>
<span class="nb">echo</span> <span class="s2">"🧪 Running tests before commit..."</span>
./.claude/commands/test.sh

<span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-ne</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"❌ Tests failed. Fix them before committing."</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Generate commit message suggestion</span>
<span class="nb">echo</span> <span class="s2">""</span>
<span class="nb">echo</span> <span class="s2">"Analyzing changes for commit message..."</span>

<span class="c"># Get the types of files changed</span>
<span class="nv">HAS_TESTS</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$MODIFIED_FILES</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="nt">-c</span> <span class="s2">"test</span><span class="se">\|</span><span class="s2">spec"</span> <span class="o">||</span> <span class="nb">true</span><span class="si">)</span>
<span class="nv">HAS_DOCS</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$MODIFIED_FILES</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="nt">-c</span> <span class="s2">"README</span><span class="se">\|\.</span><span class="s2">md"</span> <span class="o">||</span> <span class="nb">true</span><span class="si">)</span>
<span class="nv">HAS_CONFIG</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$MODIFIED_FILES</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="nt">-c</span> <span class="s2">"config</span><span class="se">\|\.</span><span class="s2">json</span><span class="se">\|\.</span><span class="s2">yaml"</span> <span class="o">||</span> <span class="nb">true</span><span class="si">)</span>

<span class="c"># Suggest a commit message prefix</span>
<span class="k">if</span> <span class="o">[</span> <span class="nv">$HAS_TESTS</span> <span class="nt">-gt</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"💡 Suggested prefix: test: "</span>
<span class="k">elif</span> <span class="o">[</span> <span class="nv">$HAS_DOCS</span> <span class="nt">-gt</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"💡 Suggested prefix: docs: "</span>
<span class="k">elif</span> <span class="o">[</span> <span class="nv">$HAS_CONFIG</span> <span class="nt">-gt</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"💡 Suggested prefix: config: "</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"💡 Suggested prefix: feat: or fix: "</span>
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">""</span>
<span class="nb">read</span> <span class="nt">-p</span> <span class="s2">"Commit message: "</span> COMMIT_MSG

<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$COMMIT_MSG</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"❌ Commit message required"</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Stage and commit</span>
git add <span class="nt">-A</span>
git commit <span class="nt">-m</span> <span class="s2">"</span><span class="nv">$COMMIT_MSG</span><span class="s2">"</span>

<span class="nb">echo</span> <span class="s2">"✅ Committed: </span><span class="nv">$COMMIT_MSG</span><span class="s2">"</span>

<span class="c"># Ask about pushing</span>
<span class="nb">read</span> <span class="nt">-p</span> <span class="s2">"Push to remote? (y/n): "</span> SHOULD_PUSH
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$SHOULD_PUSH</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"y"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nv">BRANCH</span><span class="o">=</span><span class="si">$(</span>git branch <span class="nt">--show-current</span><span class="si">)</span>
    git push <span class="nt">-u</span> origin <span class="s2">"</span><span class="nv">$BRANCH</span><span class="s2">"</span>
    <span class="nb">echo</span> <span class="s2">"✅ Pushed to origin/</span><span class="nv">$BRANCH</span><span class="s2">"</span>
<span class="k">fi</span>
</code></pre></div></div>

<p>Now <code class="language-plaintext highlighter-rouge">/commit</code> becomes a smart workflow that ensures quality before committing.</p>

<h2 id="skills-that-pre-compute-context">Skills That Pre-Compute Context</h2>

<p>One of the most powerful patterns is pre-computing information that Claude would otherwise need to ask about. This eliminates back-and-forth and makes skills faster.</p>

<p><strong>Example: Status skill</strong></p>

<p>Create <code class="language-plaintext highlighter-rouge">.claude/commands/status.sh</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Shows comprehensive project status</span>

<span class="nb">echo</span> <span class="s2">"📊 Project Status Report"</span>
<span class="nb">echo</span> <span class="s2">"========================"</span>
<span class="nb">echo</span> <span class="s2">""</span>

<span class="c"># Git status</span>
<span class="nb">echo</span> <span class="s2">"🔀 Git Status:"</span>
<span class="nv">BRANCH</span><span class="o">=</span><span class="si">$(</span>git branch <span class="nt">--show-current</span><span class="si">)</span>
<span class="nv">MODIFIED</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--name-only</span> | <span class="nb">wc</span> <span class="nt">-l</span><span class="si">)</span>
<span class="nv">STAGED</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--cached</span> <span class="nt">--name-only</span> | <span class="nb">wc</span> <span class="nt">-l</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"  Branch: </span><span class="nv">$BRANCH</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">"  Modified files: </span><span class="nv">$MODIFIED</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">"  Staged files: </span><span class="nv">$STAGED</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">""</span>

<span class="c"># Dependency status</span>
<span class="nb">echo</span> <span class="s2">"📦 Dependencies:"</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"package.json"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nv">OUTDATED</span><span class="o">=</span><span class="si">$(</span>npm outdated | <span class="nb">tail</span> <span class="nt">-n</span> +2 | <span class="nb">wc</span> <span class="nt">-l</span> <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"0"</span><span class="si">)</span>
    <span class="nb">echo</span> <span class="s2">"  Outdated packages: </span><span class="nv">$OUTDATED</span><span class="s2">"</span>
<span class="k">fi
</span><span class="nb">echo</span> <span class="s2">""</span>

<span class="c"># Test status</span>
<span class="nb">echo</span> <span class="s2">"🧪 Last Test Run:"</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"coverage/coverage-summary.json"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nv">COVERAGE</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat </span>coverage/coverage-summary.json | <span class="nb">grep</span> <span class="nt">-o</span> <span class="s1">'"lines":{"pct":[0-9.]*'</span> | <span class="nb">grep</span> <span class="nt">-o</span> <span class="s1">'[0-9.]*$'</span><span class="si">)</span>
    <span class="nb">echo</span> <span class="s2">"  Coverage: </span><span class="nv">$COVERAGE</span><span class="s2">%"</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"  No coverage data available"</span>
<span class="k">fi
</span><span class="nb">echo</span> <span class="s2">""</span>

<span class="c"># Build status</span>
<span class="nb">echo</span> <span class="s2">"🏗️  Build Status:"</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-d</span> <span class="s2">"dist"</span> <span class="o">]</span> <span class="o">||</span> <span class="o">[</span> <span class="nt">-d</span> <span class="s2">"build"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"  ✅ Build artifacts present"</span>
<span class="k">else
    </span><span class="nb">echo</span> <span class="s2">"  ⚠️  No build artifacts found"</span>
<span class="k">fi</span>
</code></pre></div></div>

<p>Now when Claude needs to understand your project state, it can run <code class="language-plaintext highlighter-rouge">/status</code> once and get everything, rather than running multiple separate commands.</p>

<h2 id="skills-for-different-workflows">Skills for Different Workflows</h2>

<p>Here are templates for common development workflows:</p>

<h3 id="code-quality-checks">Code Quality Checks</h3>
<p><code class="language-plaintext highlighter-rouge">.claude/commands/quality.sh</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nb">echo</span> <span class="s2">"Running code quality checks..."</span>

<span class="c"># Linting</span>
<span class="nb">echo</span> <span class="s2">"1/3 Linting..."</span>
npm run lint

<span class="c"># Type checking</span>
<span class="nb">echo</span> <span class="s2">"2/3 Type checking..."</span>
npm run typecheck

<span class="c"># Tests</span>
<span class="nb">echo</span> <span class="s2">"3/3 Testing..."</span>
./.claude/commands/test.sh

<span class="nb">echo</span> <span class="s2">"✅ All quality checks passed!"</span>
</code></pre></div></div>

<h3 id="quick-fix-workflow">Quick Fix Workflow</h3>
<p><code class="language-plaintext highlighter-rouge">.claude/commands/quick-fix.sh</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Auto-fix common issues</span>

<span class="nb">echo</span> <span class="s2">"Applying automatic fixes..."</span>

<span class="c"># Fix linting issues</span>
npm run lint <span class="nt">--</span> <span class="nt">--fix</span>

<span class="c"># Format code</span>
npm run format

<span class="c"># Update imports</span>
npm run organize-imports

<span class="nb">echo</span> <span class="s2">"✅ Auto-fixes applied. Review changes before committing."</span>
</code></pre></div></div>

<h3 id="deployment">Deployment</h3>
<p><code class="language-plaintext highlighter-rouge">.claude/commands/deploy-staging.sh</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Deploy to staging environment</span>

<span class="nb">set</span> <span class="nt">-e</span>

<span class="nb">echo</span> <span class="s2">"🚀 Deploying to staging..."</span>

<span class="c"># Run quality checks first</span>
./.claude/commands/quality.sh

<span class="c"># Build production bundle</span>
<span class="nb">echo</span> <span class="s2">"Building production bundle..."</span>
npm run build

<span class="c"># Deploy (adjust for your platform)</span>
<span class="nb">echo</span> <span class="s2">"Deploying to staging server..."</span>
<span class="c"># Examples:</span>
<span class="c"># vercel deploy --prod</span>
<span class="c"># aws s3 sync ./dist s3://staging-bucket</span>
<span class="c"># ssh staging "cd /app &amp;&amp; git pull &amp;&amp; pm2 restart app"</span>

<span class="nb">echo</span> <span class="s2">"✅ Deployed to staging!"</span>
</code></pre></div></div>

<h2 id="making-skills-discoverable">Making Skills Discoverable</h2>

<p>Claude automatically discovers skills in <code class="language-plaintext highlighter-rouge">.claude/commands/</code>, but you can help both Claude and your team by documenting them.</p>

<p>Create <code class="language-plaintext highlighter-rouge">.claude/commands/README.md</code>:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Project Skills</span>

Quick reference for available skills:

<span class="gu">## Development</span>
<span class="p">-</span> <span class="sb">`/test`</span> - Run test suite with coverage
<span class="p">-</span> <span class="sb">`/quality`</span> - Run all quality checks (lint, typecheck, test)
<span class="p">-</span> <span class="sb">`/quick-fix`</span> - Auto-fix linting and formatting issues

<span class="gu">## Git Workflow</span>
<span class="p">-</span> <span class="sb">`/commit`</span> - Smart commit with tests and message suggestions
<span class="p">-</span> <span class="sb">`/status`</span> - Show comprehensive project status

<span class="gu">## Deployment</span>
<span class="p">-</span> <span class="sb">`/deploy-staging`</span> - Deploy to staging environment
<span class="p">-</span> <span class="sb">`/deploy-prod`</span> - Deploy to production (requires confirmation)

<span class="gu">## Usage</span>
Type <span class="sb">`/`</span> in Claude Code to see available skills with autocomplete.
</code></pre></div></div>

<h2 id="advanced-skills-with-parameters">Advanced: Skills with Parameters</h2>

<p>Skills can accept parameters, though the syntax is bash-standard rather than special:</p>

<p><code class="language-plaintext highlighter-rouge">.claude/commands/test-file.sh</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Run tests for a specific file</span>
<span class="c"># Usage: /test-file path/to/test</span>

<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"Usage: /test-file &lt;path-to-test-file&gt;"</span>
    <span class="nb">exit </span>1
<span class="k">fi

</span><span class="nv">TEST_FILE</span><span class="o">=</span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>

<span class="k">if</span> <span class="o">[</span> <span class="o">!</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$TEST_FILE</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"❌ File not found: </span><span class="nv">$TEST_FILE</span><span class="s2">"</span>
    <span class="nb">exit </span>1
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">"Running tests in </span><span class="nv">$TEST_FILE</span><span class="s2">..."</span>
npm <span class="nb">test</span> <span class="nt">--</span> <span class="s2">"</span><span class="nv">$TEST_FILE</span><span class="s2">"</span>
</code></pre></div></div>

<p>Claude can invoke this with: <code class="language-plaintext highlighter-rouge">/test-file src/utils/helper.test.ts</code></p>

<h2 id="best-practices">Best Practices</h2>

<p>After creating dozens of skills, here are the patterns that work best:</p>

<h3 id="1-start-small">1. Start Small</h3>
<p>Don’t try to automate everything at once. Start with your most repetitive task and build from there.</p>

<h3 id="2-make-them-idempotent">2. Make Them Idempotent</h3>
<p>Skills should be safe to run multiple times. If something is already done, they should recognize that and skip it.</p>
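<p>A crude but illustrative sketch of what idempotence looks like in practice (the <code class="language-plaintext highlighter-rouge">src</code>/<code class="language-plaintext highlighter-rouge">dist</code> layout is an assumption, and the freshness check is a heuristic):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# Hypothetical idempotent build step: skip the work if it's already done

# If dist/ exists and no source file is newer than it, there's nothing to do
if [ -d "dist" ] &amp;&amp; [ -z "$(find src -newer dist -print -quit)" ]; then
    echo "✅ Build already up to date, skipping"
    exit 0
fi

npm run build
</code></pre></div></div>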

<h3 id="3-fail-fast-with-clear-messages">3. Fail Fast with Clear Messages</h3>
<p>If something is wrong, exit immediately with a clear explanation of what and why.</p>

<h3 id="4-chain-skills">4. Chain Skills</h3>
<p>Skills can call other skills. Your <code class="language-plaintext highlighter-rouge">/commit</code> skill calls <code class="language-plaintext highlighter-rouge">/test</code>, for example.</p>

<h3 id="5-version-control-them">5. Version Control Them</h3>
<p>Commit your <code class="language-plaintext highlighter-rouge">.claude/commands/</code> directory. When teammates clone the repo, they get all your skills immediately.</p>

<h2 id="when-to-create-a-skill-vs-using-a-script">When to Create a Skill vs. Using a Script</h2>

<p><strong>Create a skill when:</strong></p>
<ul>
  <li>You or Claude do it more than twice a week</li>
  <li>It has project-specific context</li>
  <li>The team should use the same approach</li>
  <li>You want it discoverable via <code class="language-plaintext highlighter-rouge">/</code> autocomplete</li>
</ul>

<p><strong>Use a regular script when:</strong></p>
<ul>
  <li>It’s one-off or rarely used</li>
  <li>It’s system-level, not project-specific</li>
  <li>It needs to be called from other scripts programmatically</li>
</ul>

<h2 id="real-world-impact">Real-World Impact</h2>

<p>After implementing a suite of skills, you’ll notice:</p>

<ol>
  <li><strong>Faster Development</strong> - No more typing out test commands or remembering flags</li>
  <li><strong>Consistent Workflows</strong> - Everyone on the team runs tests the same way</li>
  <li><strong>Better AI Collaboration</strong> - Claude can execute complex workflows autonomously</li>
  <li><strong>Documentation in Code</strong> - Your skills become living documentation of how things should be done</li>
</ol>

<p>The Boris Cherny workflow I detailed in my <a href="/orchestrating-agents-claude.html">multi-agent article</a> relies heavily on well-designed skills. His team uses them dozens of times per day, and they’re a key reason multiple Claude agents can work effectively in parallel.</p>

<h2 id="getting-started-today">Getting Started Today</h2>

<p>Here’s your action plan:</p>

<ol>
  <li><strong>Create the directory</strong>: <code class="language-plaintext highlighter-rouge">mkdir -p .claude/commands</code></li>
  <li><strong>Create your first skill</strong>: Start with <code class="language-plaintext highlighter-rouge">/test</code> using the example above</li>
  <li><strong>Make it executable</strong>: <code class="language-plaintext highlighter-rouge">chmod +x .claude/commands/test.sh</code></li>
  <li><strong>Try it</strong>: Type <code class="language-plaintext highlighter-rouge">/test</code> in Claude Code</li>
  <li><strong>Commit it</strong>: <code class="language-plaintext highlighter-rouge">git add .claude/ &amp;&amp; git commit -m "feat: add test skill"</code></li>
</ol>

<p>Within a week of using your first skill, you’ll identify three more to create. Within a month, you’ll have a suite of skills that fundamentally change how you develop.</p>

<p>The time investment is minimal (10-15 minutes per skill), but the compounding returns are substantial. Each skill you create makes both you and Claude more effective at your specific workflow.</p>

<hr />

<p><strong>My Claude Code Skills Repo to get you started:</strong>
If you want to give this a shot, I created a repo with a few skills I use daily. You can fork it and add it to your project. As always, let me know what you think.</p>
<ul>
  <li><a href="https://github.com/angakh/claude-code-skills">https://github.com/angakh/claude-code-skills</a></li>
</ul>

<p><strong>Related Reading:</strong></p>
<ul>
  <li><a href="/orchestrating-agents-claude.html">The Multi-Agent Approach: How Claude Code’s Creator Uses the Tool</a></li>
</ul>]]></content><author><name>Vatché</name></author><category term="AI" /><category term="Development" /><category term="Claude Code" /><category term="Claude Code" /><category term="Development &amp; DevOps" /><category term="Workflow Automation" /><summary type="html"><![CDATA[A practical guide to creating custom skills (slash commands) in Claude Code to automate your development workflows and make Claude more effective at your specific tasks.]]></summary></entry><entry><title type="html">The ‘95% AI Failure’ Headlines: How Nuanced Research Became Sensational News</title><link href="https://vatchechamlian.com/nada.html" rel="alternate" type="text/html" title="The ‘95% AI Failure’ Headlines: How Nuanced Research Became Sensational News" /><published>2025-08-22T00:00:00+00:00</published><updated>2025-08-22T00:00:00+00:00</updated><id>https://vatchechamlian.com/nada</id><content type="html" xml:base="https://vatchechamlian.com/nada.html"><![CDATA[<p>You’ve probably seen the headlines from Fortune, Yahoo Finance, and others: “MIT Study Shows 95% of AI Pilots Fail.” It’s been making rounds across tech media, spooking investors, and reinforcing every AI skeptic’s worldview. But when you dig into the actual research from MIT’s NANDA project, a different story emerges—one that reveals more about sensationalist journalism than AI failure rates.</p>

<p>When you’re reading something (yes, even this post), you should always question the sources. Who is saying what? What is their agenda? For example, if an oil company is releasing their research on climate impact, you should be skeptical, right?</p>

<blockquote>
  <p>“I completely trust BP Oil’s research on the impact of their 87-day crude oil spill in the Gulf of Mexico.”<br />
— No One Ever</p>
</blockquote>

<h2 id="who-or-what-is-nanda">Who or What is NANDA?</h2>

<p>NANDA is a research project at MIT focused on building decentralized AI infrastructure. NANDA stands for Networked Agents and Decentralized AI, which I believe should have the acronym NADA (I’m just being a punk). This is actually important technology that’s going to be essential for the future of agentic AI. Think of it as DNS for AI agents—NANDA is working on what they call the “NANDA Index Quilt.”</p>

<p>What is the NANDA Index Quilt?</p>

<blockquote>
  <p>“agents, resources, and tools across platforms, organizations and protocols. Through such an approach, we allow for global interoperability, discoverability, and flexible governance of agents”<br />
— p.2, Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts</p>
</blockquote>

<h2 id="the-methodology-raises-questions">The Methodology Raises Questions</h2>

<p>When NANDA’s actual report surfaced, some issues became apparent:</p>

<p><strong>Small Sample Size</strong>: Despite claims of analyzing “300 public implementations,” the real methodology reveals just 52 interviews and 153 survey responses. That’s not exactly a comprehensive industry survey.</p>

<p><strong>Strict Success Definition</strong>: They define “failure” as lacking “rapid revenue acceleration” or “measurable P&amp;L impact.” This excludes efficiency gains, process improvements, cost savings, and capability building—basically any outcome that isn’t immediate revenue growth. For example, I helped a Fortune 100 company with an audit issue. What normally took four people 2-3 months can now be done in under ten minutes with a customized AI agent. Those four people are now free to work on more important things, but apparently that doesn’t count as “success.”</p>

<p><strong>Selective Focus</strong>: Their own data shows a 67% success rate for purchased AI solutions and documents companies achieving “$2-10M annually” in savings. Yet somehow this becomes a “95% failure” narrative in the headlines.</p>

<h2 id="what-other-research-is-showing">What Other Research Is Showing</h2>

<p>While NANDA’s small sample painted a mixed picture, larger, more comprehensive studies tell a different story:</p>

<p><strong>Deloitte’s State of GenAI Report:</strong></p>
<ul>
  <li>74% of organizations’ most advanced AI initiatives meet or exceed ROI expectations</li>
  <li>20% report ROI exceeding 30%</li>
  <li><a href="https://www2.deloitte.com/us/en/pages/consulting/articles/state-of-generative-ai-in-enterprise.html">State of Generative AI in the Enterprise 2024</a></li>
</ul>

<p><strong>McKinsey’s Global AI Survey:</strong></p>
<ul>
  <li>71% of organizations regularly use GenAI across 1,491 participants in 101 countries</li>
  <li>Majority report cost reductions within business units</li>
  <li>Companies with systematic approaches show significantly higher success rates</li>
  <li><a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">The state of AI: How organizations are rewiring to capture value</a></li>
</ul>

<p><strong>Boston Consulting Group’s Study:</strong></p>
<ul>
  <li>Companies investing over $50 million in GenAI show significantly higher success rates</li>
  <li>Success correlates with systematic upskilling and strategic implementation</li>
  <li><a href="https://www.bcg.com/publications/2024/from-potential-to-profit-with-genai">BCG AI Radar: From Potential to Profit with GenAI</a></li>
</ul>

<p>These studies, with much larger sample sizes and longer observation periods, suggest the reality is far more positive than these clickbait headlines implied.</p>

<h2 id="what-the-data-actually-shows">What the Data Actually Shows</h2>

<p>Here’s what NANDA’s research actually reveals:</p>

<ul>
  <li><strong>90% of employees</strong> regularly use AI tools for work (the “shadow AI economy”)</li>
  <li><strong>External partnerships</strong> achieve 67% success rates vs 33% for internal builds</li>
  <li><strong>Multiple documented cases</strong> of companies achieving millions in measurable savings</li>
  <li><strong>Main barriers are organizational</strong>, not technological</li>
</ul>

<p>This paints a picture of widespread AI adoption with clear best practices emerging, not the catastrophic failure narrative that made headlines.</p>

<h2 id="the-real-issue-media-sensationalism">The Real Issue: Media Sensationalism</h2>

<p>To be honest, most of what I initially thought was bias was actually something simpler: lazy journalism. NANDA’s research, while flawed, isn’t portraying a catastrophe. They’re documenting what they call the “GenAI Divide”…some organizations succeed, most struggle with implementation.</p>

<p>The catastrophe narrative came from media outlets that grabbed one statistic (95% don’t show rapid revenue acceleration) and turned it into “AI is failing everywhere.” That’s not what NANDA said, but it’s what gets clicks.</p>

<h2 id="the-bigger-picture">The Bigger Picture</h2>

<p>This bothers me for two reasons:</p>

<ol>
  <li><strong>How quickly media turned nuanced research into clickbait headlines</strong></li>
  <li><strong>How a small, obviously limited sample got extrapolated to industry-wide conclusions</strong></li>
</ol>

<p>Do you know how many companies are using AI right now? All of them. A study of 52 companies doesn’t represent that reality.</p>

<p>As someone who’s spent years advocating for democratized technology, I find that sloppy research and sensationalist reporting undermine trust in both academia and the technology industry.</p>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>Before you make any strategic decisions based on headlines about “95% failure,” consider reading the actual source. Just because Fortune regurgitated an article doesn’t mean their clickbait headline reflects reality.</p>

<p>The real story is more nuanced: AI adoption is massive, external partnerships work better than internal builds, and organizations are achieving meaningful value when they approach implementation strategically. That’s not as dramatic as “95% failure,” but it’s the truth and a lot more useful for making actual business decisions. If someone tells you “it’s just plug ‘n play”, run.</p>

<p>Do your own research. Look at the methodology. And remember that clickbait headlines are often just that: headlines, not truth.</p>

<hr />

<p><em>This analysis is based on NANDA’s own published research and publicly available information about their institutional partnerships and commercial interests.</em></p>

<p>Here are some of the other articles I read to write this post (which is why it took me a week to post this!):</p>

<ul>
  <li><a href="https://aicommission.org/2025/08/mit-report-95-of-generative-ai-pilots-at-companies-are-failing/">MIT report: 95% of generative AI pilots at companies are failing</a></li>
  <li><a href="https://fortune.com/2025/08/21/an-mit-report-that-95-of-ai-pilots-fail-spooked-investors-but-the-reason-why-those-pilots-failed-is-what-should-make-the-c-suite-anxious/">An MIT report that 95% of AI pilots fail spooked investors. But it’s the reason why those pilots failed that should make the C-suite anxious</a></li>
  <li><a href="https://nanda.media.mit.edu/">MIT’s NANDA’s Website</a></li>
  <li><a href="https://nanda.media.mit.edu/assets/pdf/nanda-whitepaper.pdf">MIT’s NANDA’s Whitepaper</a></li>
  <li><a href="https://nanda.media.mit.edu/decentralized_AI_perspective.pdf">NANDA’s paper on decentralized AI tech</a></li>
  <li><a href="https://arxiv.org/pdf/2507.14263">NANDA’s paper on the NANDA Index Quilt</a></li>
</ul>]]></content><author><name>Vatché</name></author><category term="Opinion" /><category term="AI" /><category term="Business" /><category term="Research" /><category term="Artificial Intelligence" /><category term="Business &amp; Innovation" /><category term="Media Analysis" /><category term="Research Methodology" /><category term="AI Adoption" /><summary type="html"><![CDATA[95% failure rate in gen AI pilots? For real? Not really, no.]]></summary></entry><entry><title type="html">Full Page Screenshots in Web Browsers</title><link href="https://vatchechamlian.com/fullscreen-screenshots.html" rel="alternate" type="text/html" title="Full Page Screenshots in Web Browsers" /><published>2025-07-27T00:00:00+00:00</published><updated>2025-07-27T00:00:00+00:00</updated><id>https://vatchechamlian.com/fullscreen-screenshots</id><content type="html" xml:base="https://vatchechamlian.com/fullscreen-screenshots.html"><![CDATA[<p>This is not a typical post, but I never knew about this feature and it’s super useful. Have you ever taken multiple screenshots of a long web page? Have you ever printed a web page to pdf so you could share it with someone? Have you ever downloaded an extension to take a full-page screenshot? Well aparently web browsers offer various ways to capture full-page screenshots, allowing you to save entire web pages including content that extends beyond the visible screen area. Without having to jump through hoops.</p>

<h2 id="google-chrome-worst">Google Chrome (worst)</h2>

<p>Chrome provides a hidden full-page screenshot feature within its Developer Tools:</p>

<ol>
  <li><strong>Open Developer Tools</strong>: Press <code class="language-plaintext highlighter-rouge">Ctrl + Shift + I</code> (Windows) or <code class="language-plaintext highlighter-rouge">Cmd + Option + I</code> (Mac)</li>
  <li><strong>Open Command Menu</strong>: Press <code class="language-plaintext highlighter-rouge">Ctrl + Shift + P</code> (Windows) or <code class="language-plaintext highlighter-rouge">Cmd + Shift + P</code> (Mac)</li>
  <li><strong>Run Screenshot Command</strong>: Type “screenshot” and select “Capture full size screenshot”</li>
</ol>

<p>The screenshot will be automatically saved to your default Downloads folder.</p>
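
<p>If you ever need to script this, Chrome can also take screenshots from the terminal in headless mode. This is a rough sketch, not an official recipe: the binary name varies by OS (<code class="language-plaintext highlighter-rouge">google-chrome</code>, <code class="language-plaintext highlighter-rouge">chromium</code>, or the full macOS app path), and since <code class="language-plaintext highlighter-rouge">--screenshot</code> captures the window, you approximate “full page” by setting a tall window size:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Capture a tall window of the page without opening DevTools
google-chrome --headless --screenshot=page.png --window-size=1280,5000 "https://example.com"
</code></pre></div></div>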

<p><img src="./assets/img/posts/20250727/chrome-screenshot.png" alt="Chrome Full Page Screenshot" /></p>

<h2 id="mozilla-firefox-method-two-is-super-easy">Mozilla Firefox (method two is super easy)</h2>

<p>Firefox offers multiple methods for taking full-page screenshots:</p>

<h3 id="method-1-developer-tools">Method 1: Developer Tools</h3>
<ol>
  <li><strong>Open Developer Tools</strong>: Right-click on the page and select “Inspect Element” or press <code class="language-plaintext highlighter-rouge">Ctrl + Shift + I</code> (Windows) or <code class="language-plaintext highlighter-rouge">Cmd + Option + I</code> (Mac)</li>
  <li><strong>Access Settings</strong>: Click the “…” (three-dot) menu in the Developer Tools toolbar</li>
  <li><strong>Enable Screenshot Tool</strong>: Select “Settings” and check “Take a screenshot of the entire page” under “Available Toolbox Buttons”</li>
  <li><strong>Take Screenshot</strong>: Click the camera icon that appears in the toolbar</li>
</ol>

<h3 id="method-2-built-in-screenshot-tool">Method 2: Built-in Screenshot Tool</h3>
<ol>
  <li>Right-click anywhere on the webpage</li>
  <li>Select “Take Screenshot”</li>
  <li>Click “Save full page” in the screenshot interface</li>
  <li>Click “Download” to save the image</li>
</ol>

<p><img src="./assets/img/posts/20250727/firefox-screenshot.png" alt="Firefox Full Page Screenshot" />
<img src="./assets/img/posts/20250727/firefox-screenshot-tool.png" alt="Firefox Screenshot Tool" /></p>

<h2 id="microsoft-edge-easiest">Microsoft Edge (easiest)</h2>

<p>Edge includes a dedicated “Web Capture” feature for screenshots:</p>

<ol>
  <li><strong>Open Web Capture</strong>: Click the menu (…) button and select “Web capture” or press <code class="language-plaintext highlighter-rouge">Ctrl + Shift + S</code></li>
  <li><strong>Capture Full Page</strong>: Select “Capture full page” from the options</li>
  <li><strong>Save or Edit</strong>: Choose to save, copy, or annotate the screenshot before saving</li>
</ol>

<p><img src="./assets/img/posts/20250727/edge-screenshot.png" alt="Edge Screenshot Tool" /></p>

<h2 id="safari">Safari</h2>

<p>Safari on Mac provides screenshot functionality through its developer tools:</p>

<ol>
  <li><strong>Enable Developer Menu</strong>: Go to Safari &gt; Settings &gt; Advanced and check “Show features for web developers”</li>
  <li><strong>Open Web Inspector</strong>: From the Develop menu, select “Show Web Inspector”</li>
  <li><strong>Capture Screenshot</strong>: In the Elements tab, right-click on the <code class="language-plaintext highlighter-rouge">&lt;html&gt;</code> element and select “Capture Screenshot”</li>
</ol>]]></content><author><name>Vatché</name></author><category term="LifeHacks" /><summary type="html"><![CDATA[Did you know you can take full-page screenshots in your browser without the need of an extension?]]></summary></entry><entry><title type="html">Battle of the IDEs*!</title><link href="https://vatchechamlian.com/battle-of-the-ides.html" rel="alternate" type="text/html" title="Battle of the IDEs*!" /><published>2025-07-15T00:00:00+00:00</published><updated>2025-07-15T00:00:00+00:00</updated><id>https://vatchechamlian.com/battle-of-the-ides</id><content type="html" xml:base="https://vatchechamlian.com/battle-of-the-ides.html"><![CDATA[<p>When ChatGPT first came out, one of the first things I did with it was ask it to write me code. This was two years ago, and a lot has changed. First it was extensions in VS Code, then it was good extensions in VS Code, then it was GitHub’s Copilot, Cursor, and now Kiro. We are going to cover my impressions of these tools in this article, but before we do, I’d like to suggest something for new and junior developers. These tools can accelerate what you do, but understanding what they produce, why it is good or not so good, and whether the code is secure or highly exploitable is very important. This comes with experience, but that should not stop you from using these tools. Instead, I would recommend that when you are working on something, you ask the assistant to explain it to you. Or try writing it yourself first, then see what changes it makes and ask it why it made those changes. Stay curious.</p>

<p>The tools that I have tested are as follows:</p>
<ul>
  <li>VS Code with Augment</li>
  <li>VS Code with CoPilot</li>
  <li>Cursor</li>
  <li>Kiro</li>
  <li>Claude Code (I know, its not an IDE)</li>
</ul>

<h2 id="vs-code-with-augment-or-copilot">VS Code with Augment or CoPilot</h2>

<p>If you are used to working in VS Code, this is one way you can stay within your comfort zone and try the different AI tools that are at your disposal. Personally, I felt that in this matchup Augment surpassed CoPilot purely based on its context. The interface with Augment is fairly easy: it provides you with explanations using natural language for prompting, and inline code changes that look a lot like cherry-picking commits (“Do you want to accept this change?” vs. it just writing everything for you). As it goes through your code and starts to make “changes” (you have to accept the changes for them to be applied), you get to see what the changes are. At the end of the vibe coding session, or “pair coding” as Dharmesh calls it, it will provide you with an update on what it did, why it did it, and how you can verify or test the changes. In addition, it gives you the option to apply all the changes it has made, or you can open each file and accept the changes one by one.</p>

<p>CoPilot does very similar things, but it had a lot of issues with the problems I was trying to solve; the context was not there. It was making changes in what seemed like a vacuum.</p>

<h3 id="the-extension-philosophy">The Extension Philosophy</h3>

<p>This represents the “evolution” approach to AI integration—taking what works and gradually adding AI capabilities. VS Code’s strength lies in its massive ecosystem, with over 30,000 extensions and millions of users, offering proven stability and extensive customization options. However, these AI features can feel bolted on rather than native, and you’re limited by legacy architecture decisions that weren’t designed with AI-first workflows in mind. They are still pretty cool though, so if you are on the fence, grab your favorite drink, load up your calming Spotify playlist, and explore.</p>

<h3 id="note">NOTE:</h3>

<p>Both Augment and CoPilot require network bandwidth for cloud-based AI processing, which can introduce latency that affects typing responsiveness. For enterprise development, consider that your code is being sent to external servers, which may raise privacy and security concerns depending on your organization’s requirements.</p>

<h2 id="cursor">Cursor</h2>

<p>The sad thing about VS Code’s implementation is that Cursor feels like what VS Code should have been. You can import your settings from VS Code, and now you have what behaves like VS Code but with the AI already built in. Cursor’s context capability and coding are extremely well rounded, but like many of the LLMs that try to code, it sometimes falls short; issues with dependencies and even syntax errors are common. I am sure that as time goes on all of these LLMs will get better, but depending on what you are working on, these can be a real hindrance and counterproductive. Remember, the whole point of using AI in coding is to help you move quickly, but if you are spending more time troubleshooting code that you didn’t write and that you don’t understand, what are you really doing?</p>

<h3 id="example-of-a-cursor-issue">Example of a Cursor Issue</h3>

<p>I had a JSON template that I was using to display company data for a demo. I wanted to create a few different companies and use that same JSON template. I asked Cursor to take the data I had and generate a JSON file in the same manner as the one that already existed. This should not be a difficult task; if anything, this is exactly the type of work you would ask AI to do. Where did it go wrong? Everywhere! The JSON was complete garbage! Not the content, but the structure and syntax were totally broken. This is something that it should have easily picked up on. But it didn’t; it totally missed it.</p>

<h3 id="the-ai-first-philosophy">The AI First Philosophy</h3>

<p>Cursor represents the “revolution” approach, designed from the ground up with AI collaboration in mind. It’s not trying to retrofit AI into an existing editor; it’s built specifically for the AI era. This means:</p>

<ul>
  <li><strong>Cohesive AI integration</strong> throughout the entire development experience</li>
  <li><strong>Modern architecture</strong> optimized for AI features from day one</li>
  <li><strong>Multi-model support</strong> letting you choose between different AI providers</li>
  <li><strong>Seamless AI chat integration</strong> without leaving your coding context</li>
</ul>

<h3 id="the-modern-developer-experience">The Modern Developer Experience</h3>

<p>Cursor feels like what an IDE should be in 2024, with a clean, modern interface that doesn’t feel cluttered with legacy features. The <strong>visual diff interface</strong> shows exactly what the AI wants to change before you accept it, and the <strong>intelligent code editing</strong> can modify existing code based on natural language instructions. However, as noted, the smaller ecosystem and potential for AI-generated errors mean you need to stay vigilant about code quality. New files and code have to be reviewed thoroughly before committing to a repo; you don’t want to be that guy.</p>

<h2 id="kiro">Kiro</h2>

<p>Kiro is very similar to Cursor in the sense that you install it, import from VS Code (if you want), and it is ready to go. It has a cool splash page when you launch the IDE where you can Vibe or Spec. Vibe is for when you have a particular task you want to accomplish or you want to prototype something. Spec is when you use natural language to plan out what you are trying to do, ideate, and then build. This is similar to how Replit and Lovable approach coding. It is a robust IDE that feels lightweight, but its ability to understand your codebase is not 100% there. For example, I asked it to review the code for ieps.ai and provide me with some insight into what my application does. It broke everything down to the best of its ability, but it totally missed an entire layer of complexity. I am using S3 buckets for storage and it assumed that everything was stored locally. It also missed Lambda function calls and conversational agents.</p>

<p>This really surprised me because I am a huge fan of Claude Code, and Kiro and Claude Code are built on the same underlying LLM!</p>

<p>But Kiro has something that I have not really seen in the other IDEs: Agent Hooks, Agent Steering, and MCP Servers are available out of the box. You still need to connect them, but the capability is right there. These additional features allow for customization on a level that I have not seen elsewhere yet.</p>

<h3 id="amazons-entry-into-ai-powered-development">Amazon’s Entry into AI Powered Development</h3>

<p>As Amazon’s entry into the AI IDE space, Kiro brings some interesting innovations but also reveals the challenges of building truly context-aware development environments. The <strong>Vibe and Spec modes</strong> represent a thoughtful approach to different development workflows. For example, in Vibe mode you can quickly prototype or work on a specific task, while in Spec mode you can plan and set up some guidelines prior to starting development. In previous posts regarding vibe coding I have covered how important this is.</p>

<h3 id="context-limitations-and-learning-curve">Context Limitations and Learning Curve</h3>

<p>This really highlights a common challenge across AI-powered IDEs. The fact that it missed critical infrastructure components like S3 buckets and Lambda functions means it has a hard time deciphering modern distributed architectures. This emphasizes the importance of maintaining your understanding of what the AI is doing rather than blindly trusting its analysis. If a junior developer asked Kiro what this app was doing, they might try to implement something that is already there, screwing up more of the code and then trying to figure out what went wrong.</p>

<h2 id="claude-code-not-an-ide">Claude Code (not an IDE)</h2>

<p>In my opinion, Claude Code is the closest to having someone you trust working next to you. The interface is not ideal for many, since it’s a CLI (command line interface), but for someone who grew up on vim, it feels like home. Claude’s reasoning, context, and ability to code surpass all of the above options. When it starts to tackle a problem it provides you with an approach and asks if you agree with it before it starts. It shows you how many tokens it is using and provides you with a summary of what it just did. It is awesome! BUT! There are two drawbacks that I hope get resolved soon. First, those working on Windows will need to enable virtualization and install Ubuntu (via the Microsoft Store, not that bad but still an extra step). Second, it doesn’t have a chat history. As you reach the limit of its context window you will get a context percentage at the bottom of the prompt window, and as it starts to count down you will begin to sweat.</p>

<p>Over a decade ago I committed this code to GitHub: <a href="https://github.com/angakh/screenrc">screenrc</a>. I was working for a presidential campaign in Boston, and the internet we had at the campaign headquarters was line of sight to the Prudential building. This meant that crappy weather would disrupt your connection. If you were ssh’d into a remote box running something and got disconnected, your work was lost. So I used “screen” and it was helpful. I am sharing this because if you are one of the unfortunate ones that only has access to a Windows machine, you may want to run screen in Ubuntu and have a few “tabs” open as you work. You can have one for Claude Code, one for git commits, etc.</p>
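
<p>If you want to give screen a try, the basics are only a few commands (these are the default keybindings; a custom .screenrc may remap them):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Start a named session for Claude Code
screen -S claude

# Inside the session: Ctrl-a c opens a new "tab", Ctrl-a n cycles through them
# Detach with Ctrl-a d; everything keeps running

# Reattach later, even after a dropped SSH connection
screen -r claude
</code></pre></div></div>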

<p>If you are on a Mac? It is flawless. You can drag and drop images to help guide the front-end development of an app. I really hope, for the sake of the Windows users, that they offer a better solution for integrating with Claude Code, because as of this writing it still feels like the best of the bunch.</p>

<h3 id="the-conversational-approach">The Conversational Approach</h3>

<p>Claude Code represents a fundamentally different philosophy—a <strong>conversational interface</strong> that feels like pair programming with an expert rather than just autocomplete with a dash of ADHD and Red Bull. The <strong>deep reasoning capabilities</strong> mean it can explain not just what code does, but why, and the <strong>safety-focused design</strong> helps avoid common security pitfalls.</p>

<h3 id="terminal-native-benefits-and-challenges">Terminal Native Benefits and Challenges</h3>

<p>The <strong>terminal native approach</strong> makes it lightweight and fast, easily integrated into existing workflows, and perfect for remote development. However, the lack of chat history and context window limitations can be frustrating during longer coding sessions. The Windows setup requirements (virtualization and Ubuntu) add friction for some developers, but for those comfortable with command-line environments, it offers an unparalleled collaborative coding experience.</p>

<h3 id="the-trust-factor">The Trust Factor</h3>

<p>What sets Claude Code apart is its transparency—showing token usage, asking for approval before major changes, and providing clear summaries of actions taken. For me this built a level of trust that I just don’t have with the other IDE’s.</p>

<h2 id="the-real-battle-philosophy-vs-features">The Real Battle: Philosophy vs. Features</h2>

<p>This isn’t just about which tool has the most features…it’s about fundamentally different philosophies of how AI should integrate into development workflows, and from what I’ve seen from hands on experience, each approach has real world implications.</p>

<h3 id="the-extension-approach-vs-code">The Extension Approach (VS Code)</h3>

<p>VS Code represents the “evolution” philosophy: take what works and gradually add AI capabilities. This approach offers:</p>

<p><strong>Pros:</strong></p>
<ul>
  <li>Familiar interface for existing users</li>
  <li>Massive ecosystem and community</li>
  <li>Proven stability and reliability</li>
  <li>Freedom to mix and match AI providers</li>
</ul>

<p><strong>Cons:</strong></p>
<ul>
  <li>AI features can feel bolted-on (as experienced with CoPilot’s context issues)</li>
  <li>Potential for feature conflicts</li>
  <li>Limited by legacy architecture decisions</li>
  <li>Inconsistent AI integration across different extensions</li>
</ul>

<h3 id="the-ai-native-approach-claude-code-cursor-kiro">The AI-Native Approach (Claude Code, Cursor, Kiro)</h3>

<p>These tools represent the “revolution” philosophy: design for AI first workflows from day one.</p>

<p><strong>Pros:</strong></p>
<ul>
  <li>Cohesive AI integration (Cursor’s seamless experience)</li>
  <li>Modern architecture optimized for AI features</li>
  <li>Consistent user experience</li>
  <li>Purpose built for AI collaboration (Claude Code’s conversational approach)</li>
</ul>

<p><strong>Cons:</strong></p>
<ul>
  <li>Smaller ecosystems</li>
  <li>Less proven in production environments</li>
  <li>Potential vendor lock-in</li>
  <li>Learning curve for existing developers</li>
  <li>Context understanding limitations (as seen with Kiro missing infrastructure components)</li>
</ul>

<h2 id="my-take-what-actually-matters">My Take: What Actually Matters</h2>

<p>After testing all these tools extensively, here’s what I’ve learned:</p>

<p><strong>Context is King</strong>: The tools that understand your entire project (like Augment) consistently outperform those that work in isolation. When CoPilot was making changes “in a vacuum,” it became clear that AI without proper context is just fancy autocomplete.</p>

<p><strong>Trust Through Transparency</strong>: Claude Code’s approach of explaining its reasoning and asking for approval creates a collaborative relationship rather than a “just trust me” dynamic. You learn while you code.</p>

<p><strong>The Debugging Tax</strong>: Remember, if you’re spending more time troubleshooting AI generated code that you don’t understand, you’re not actually moving faster. This is especially important for junior developers who might be tempted to accept everything the AI suggests.</p>

<p><strong>Interface Matters Less Than You Think</strong>: While Cursor’s modern UI is appealing, Claude Code’s CLI approach proves that good AI collaboration transcends interface preferences. What matters is the quality of the AI’s reasoning and its ability to work with your specific codebase.</p>

<h3 id="for-individual-developers">For Individual Developers</h3>

<p><strong>Just an idea</strong> - All of the IDEs mentioned in this post have free tiers. Trying them out has never been easier, and I would strongly suggest you create branches of an existing repo and tackle the same problem in each IDE, with each branch named after the IDE. Use the same prompt and see what you get.</p>

<p>If you’re comfortable with VS Code and want to start gradually, <strong>Augment</strong> provides the best context aware experience within a familiar environment.</p>

<p>For those ready to embrace AI first development, <strong>Cursor</strong> offers the most polished experience, but be prepared to verify and understand the code it generates.</p>

<p>If you’re comfortable with command line interfaces and prioritize AI reasoning quality over flashy UIs, <strong>Claude Code</strong> provides the most trustworthy collaborative experience.</p>

<h3 id="for-teams-and-learning">For Teams and Learning</h3>

<p><strong>Claude Code</strong> is excellent for learning because it explains its reasoning, making it ideal for junior developers who want to understand not just what code does, but why.</p>

<p><strong>Cursor</strong> works well for teams that need a modern, collaborative environment with good visual feedback on changes.</p>

<p>Avoid tools with poor context understanding (like Kiro missing key infrastructure) for complex projects—the time spent correcting misunderstandings negates the productivity benefits.</p>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>The “winner” depends entirely on your workflow, experience level, and what you value most. For me it is hands down Claude Code and it’s not even close. But regardless of which tool you choose, remember to <strong>stay curious</strong>, ask the AI to explain its decisions, and never stop understanding what your code actually does.</p>

<p>The real victory isn’t finding the perfect AI assistant…it’s learning to collaborate effectively with AI while maintaining your skills and understanding as a developer.</p>]]></content><author><name>Vatché</name></author><category term="AI" /><category term="Development" /><category term="Artificial Intelligence" /><category term="Opinion" /><summary type="html"><![CDATA[A hands-on review of AI-powered IDEs and coding assistants, from incumbents like VS Code to the new Kiro IDE from Amazon.]]></summary></entry><entry><title type="html">Vibe Coding Platforms: The Promise vs. Reality of AI-Powered App Development</title><link href="https://vatchechamlian.com/vibe-coding-reviews.html" rel="alternate" type="text/html" title="Vibe Coding Platforms: The Promise vs. Reality of AI-Powered App Development" /><published>2025-05-29T00:00:00+00:00</published><updated>2025-05-29T00:00:00+00:00</updated><id>https://vatchechamlian.com/vibe-coding-reviews</id><content type="html" xml:base="https://vatchechamlian.com/vibe-coding-reviews.html"><![CDATA[<p>One of the biggest barriers to developing applications has always been coding. If you have a brilliant idea but lack programming skills, you’d typically need to hire a developer or learn to code yourself. Enter “vibe coding” platforms—AI-powered tools that promise to build applications through natural conversation. But do they live up to the hype?</p>

<p>I’ve spent months testing these platforms, investing real money to access full feature sets across multiple services. From invite-only beta platforms to established players, I tested Envato, Build.ai, Builder.ai, Replit, Lovable, Tempo, Emergent, and several others. Here’s what I discovered about the current state of conversational app development.</p>

<h2 id="the-testing-ground-a-real-world-project">The Testing Ground: A Real-World Project</h2>

<p>For consistency, I asked each platform to build the same application: an event management tool for my wife’s special events role at a local private school. The requirements included OAuth authentication, user profiles, event creation and management, team member invitations, and budget tracking—a reasonably complex application that would test each platform’s capabilities.</p>

<h2 id="what-they-all-get-right-the-magic-of-first-impressions">What They All Get Right: The Magic of First Impressions</h2>

<p>The initial results were genuinely impressive. Every platform I tested could take my description and generate a mostly functional application from a single prompt. Within minutes, I had working prototypes with:</p>

<ul>
  <li>OAuth authentication systems</li>
  <li>User profiles and management</li>
  <li>Event creation and editing interfaces</li>
  <li>Team invitation functionality</li>
  <li>Budget tracking components</li>
  <li>Basic responsive design</li>
</ul>

<p>This first iteration capability is transformative. For rapid prototyping or proof-of-concept development, these tools are unmatched. The speed from idea to working demo is remarkable and represents a genuine breakthrough in application development accessibility.</p>

<h2 id="the-pricing-puzzle-different-models-different-pain-points">The Pricing Puzzle: Different Models, Different Pain Points</h2>

<p>The platforms take notably different approaches to monetization:</p>

<h3 id="credit-based-systems-emergent">Credit-Based Systems (Emergent)</h3>
<ul>
  <li>Pay-per-use model with real-time credit consumption</li>
  <li>Deployment costs ~50 credits ($20 USD)</li>
  <li>Costs escalate quickly with iterations</li>
  <li>Transparent but expensive for extensive development</li>
</ul>

<h3 id="subscription--microtransactions-replit">Subscription + Microtransactions (Replit)</h3>
<ul>
  <li>$25/month base subscription</li>
  <li>Agent checkpoints: $0.25 each</li>
  <li>Assistant checkpoints: $0.05 each</li>
  <li>Free deployments</li>
  <li>Occasionally waives fees for AI-caused errors</li>
</ul>

<h3 id="the-hidden-truth-builderais-revelation">The Hidden Truth: Builder.ai’s Revelation</h3>
<p>One particularly eye-opening discovery was Builder.ai, which marketed itself as an AI coding platform but actually employed human developers working behind the scenes. This “smoke and mirrors” approach highlights the importance of understanding what’s actually powering these platforms.</p>

<h2 id="the-critical-flaw-where-ai-development-breaks-down">The Critical Flaw: Where AI Development Breaks Down</h2>

<p>Here’s where every platform I tested failed: <strong>iteration and feature addition</strong>. The moment you try to modify or extend the initial application, the AI systems struggle with code organization and context management. I encountered numerous examples of this breakdown:</p>

<h3 id="case-study-1-the-svg-disaster">Case Study 1: The SVG Disaster</h3>
<p>When requesting an update to an SVG code snippet, one platform generated malformed code with a closing tag <code class="language-plaintext highlighter-rouge">&lt;/svg&gt;vg&gt;</code>, causing compilation errors that required additional credits to resolve.</p>

<h3 id="case-study-2-the-button-color-catastrophe">Case Study 2: The Button Color Catastrophe</h3>
<p>I requested simple form validation that would change a button’s color to green when all fields were completed. The AI successfully implemented this feature, but somehow broke:</p>
<ul>
  <li>Login functionality</li>
  <li>File upload capabilities</li>
  <li>User account creation</li>
  <li>API endpoint connections</li>
  <li>Nearly every other system component</li>
</ul>

<p>The button turned green perfectly, but the application became unusable.</p>

<h2 id="the-solution-hybrid-development-approach">The Solution: Hybrid Development Approach</h2>

<p>The most effective strategy I discovered combines these platforms’ strengths with traditional development tools:</p>

<ol>
  <li><strong>Use vibe coding for rapid prototyping</strong></li>
  <li><strong>Export to GitHub</strong> (most platforms offer this)</li>
  <li><strong>Continue development locally</strong> with traditional AI coding assistants</li>
  <li><strong>Leverage better context windows</strong> in tools like Claude, Copilot, or Cursor</li>
</ol>

<p>This approach gives you the speed of initial AI generation with the control and reliability of established development workflows.</p>
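
<p>In practice the hand-off is just a normal git workflow. A minimal sketch, with a placeholder URL standing in for whatever repo the platform exported:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># After exporting the prototype to GitHub from the vibe coding platform
git clone https://github.com/your-user/your-prototype.git
cd your-prototype

# Continue iterating locally with your preferred AI coding assistant
git checkout -b feature/local-iteration
</code></pre></div></div>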

<h2 id="platform-comparison">Platform Comparison</h2>

<table>
  <thead>
    <tr>
      <th>Platform</th>
      <th>Pricing Model</th>
      <th>GitHub Integration</th>
      <th>Best For</th>
      <th>Major Limitations</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Replit</td>
      <td>$25/mo + checkpoints</td>
      <td>✅</td>
      <td>Full development cycle</td>
      <td>Complex feature additions</td>
    </tr>
    <tr>
      <td>Emergent</td>
      <td>Credit-based (~$20/deploy)</td>
      <td>✅</td>
      <td>One-off prototypes</td>
      <td>Cost escalation</td>
    </tr>
    <tr>
      <td>Tempo</td>
      <td>Subscription</td>
      <td>✅</td>
      <td>Rapid prototyping</td>
      <td>Limited customization</td>
    </tr>
    <tr>
      <td>Lovable</td>
      <td>Subscription</td>
      <td>✅</td>
      <td>UI-focused apps</td>
      <td>Limited backend complexity</td>
    </tr>
  </tbody>
</table>

<h2 id="replit-the-developers-choice">Replit: The Developer’s Choice</h2>

<p>Among all platforms tested, Replit emerged as the most developer-friendly option. It provides:</p>

<ul>
  <li><strong>Integrated development environment</strong> with full IDE capabilities</li>
  <li><strong>Built-in deployment pipeline</strong> with autoscaling</li>
  <li><strong>Complete GitHub integration</strong> and management</li>
  <li><strong>Object storage and database</strong> creation tools</li>
  <li><strong>Comprehensive logging and console</strong> access</li>
  <li><strong>Cost bypass mechanism</strong> through GitHub sync (changes pulled from GitHub don’t count as checkpoints)</li>
</ul>

<p>However, Replit has limitations with complex operations like PDF processing, where external services (Lambda functions with S3 storage) become necessary.</p>

<h2 id="pro-tips-for-ai-assisted-coding">Pro Tips for AI-Assisted Coding</h2>

<p>Through extensive testing, I’ve identified several strategies that dramatically improve results:</p>

<h3 id="1-code-organization-strategy">1. Code Organization Strategy</h3>
<p>AI models tend to cram everything into single files. Use this prompt to improve maintainability:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Please review [filename].tsx and break up the functions into separate files. 
I'd like to organize the code into these categories: services, handlers, 
endpoints, and middleware. Each category should be in its own file.
</code></pre></div></div>

<h3 id="2-planning-before-coding">2. Planning Before Coding</h3>
<p>Always establish approach before implementation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Do not write any code yet. First, provide me with your approach to 
[describe your goal] and ask me if I agree with it. We should ensure 
we're in agreement before writing any code.
</code></pre></div></div>

<h3 id="3-controlled-implementation">3. Controlled Implementation</h3>
<p>Once you’ve agreed on the approach:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I agree with this approach. Please update only one file at a time and 
ask if I have questions about the changes. If I don't have questions, 
you can write [Option A: the complete file] or [Option B: only the 
changed code with clear start/end markers].
</code></pre></div></div>

<p><strong>Option A</strong> works better for non-integrated tools.<br />
<strong>Option B</strong> enables longer conversations by conserving the context window.</p>

<h2 id="ide-integration-augment-leads-the-pack">IDE Integration: Augment Leads the Pack</h2>

<p>For traditional IDE-based development, I recommend <strong>Augment</strong>. It excels at:</p>
<ul>
  <li>Rapid code indexing</li>
  <li>Contextual code suggestions</li>
  <li>Natural language code queries</li>
  <li>Seamless VS Code integration</li>
</ul>

<h2 id="the-verdict-promise-partially-delivered">The Verdict: Promise Partially Delivered</h2>

<p>Vibe coding platforms represent a genuine breakthrough in application development accessibility, but they’re not the complete solution they promise to be. They excel at:</p>

<ul>
  <li><strong>Rapid prototyping</strong></li>
  <li><strong>Initial application generation</strong></li>
  <li><strong>Lowering barriers to entry</strong></li>
  <li><strong>Proof-of-concept development</strong></li>
</ul>

<p>However, they struggle with:</p>
<ul>
  <li><strong>Iterative development</strong></li>
  <li><strong>Complex feature additions</strong></li>
  <li><strong>Code maintainability</strong></li>
  <li><strong>Context management</strong></li>
</ul>

<h2 id="looking-forward">Looking Forward</h2>

<p>The future likely belongs to hybrid approaches that combine the rapid generation capabilities of vibe coding platforms with the precision and control of traditional development tools. As these platforms mature and improve their iteration capabilities, they may eventually deliver on their full promise.</p>

<p>For now, treat them as powerful prototyping tools that can jumpstart your development process, but be prepared to transition to traditional development methods for serious application building.</p>

<p>The democratization of app development is happening, just not quite as seamlessly as the marketing suggests. The key is understanding these tools’ strengths and limitations, then using them strategically within a broader development workflow. Like I mentioned in my previous post about <a href="https://www.thecodewhisperer.com/are-coding-skills-following-the-typists-path">Are Coding Skills Following the Typists Path</a>, the future belongs to those who can effectively prompt, architect, direct, and integrate with AI tools.</p>

<hr />

<p><em>Have you experimented with vibe coding platforms? Share your experiences and insights in the comments below.</em>
</p>]]></content><author><name>Vatché</name></author><category term="AI" /><category term="Development" /><category term="No-Code" /><category term="Artificial Intelligence" /><category term="Development &amp; DevOps" /><summary type="html"><![CDATA[A hands-on review of AI-powered vibe coding platforms—testing the promise of building apps through conversation and revealing where they excel and fail.]]></summary></entry><entry><title type="html">From ‘Works on My Machine’ to ‘Works for Everyone’</title><link href="https://vatchechamlian.com/from-works-on-my-machine-to-works-for-everyone.html" rel="alternate" type="text/html" title="From ‘Works on My Machine’ to ‘Works for Everyone’" /><published>2025-05-08T00:00:00+00:00</published><updated>2025-05-08T00:00:00+00:00</updated><id>https://vatchechamlian.com/from-works-on-my-machine-to-works-for-everyone</id><content type="html" xml:base="https://vatchechamlian.com/from-works-on-my-machine-to-works-for-everyone.html"><![CDATA[<p>A long time ago, when I was working in the Drupal CMS space, I was introduced to Lando. It was one of the first times I had seen a Docker container impact the workflow of a project. It was not easy to set up initially, but in the end the result was so positive that it could not be ignored. Development environments have evolved significantly since then to solve the “works on my machine” problem.</p>

<p>In this post we will be getting technical, so if that is not your thing, don’t feel bad about hitting the back button.</p>

<tweet>Remember when "works on my machine" was a valid excuse? Docker containers ended that era and we're all better for it.</tweet>

<h2 id="1-docker-and-dev-containers">1. Docker and Dev Containers</h2>

<h3 id="what-are-dev-containers-and-how-do-they-work">What are Dev Containers and how do they work?</h3>

<p>Dev Containers are development environments containerized using Docker that allow developers to use a consistent, pre-configured environment. They encapsulate dependencies, runtimes, and tools needed for development.</p>

<p>Dev Containers work by leveraging Docker’s containerization technology but with a focus on development rather than deployment. When a developer opens a project with Dev Container support (in VS Code or other compatible IDEs), the IDE builds and runs the container, then connects to it for development tasks like editing, debugging, and running code.</p>

<h3 id="how-do-devcontainerdockerfile-and-devcontainerdevcontainerjson-work-together">How do “.devcontainer/Dockerfile” and “.devcontainer/devcontainer.json” work together?</h3>

<p>These two files form the foundation of a Dev Container:</p>

<p><strong>“.devcontainer/Dockerfile”</strong>: Defines the base container image and steps to install required tools and dependencies.</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> python:3.11</span>
<span class="k">RUN </span>apt-get update <span class="o">&amp;&amp;</span> apt-get <span class="nb">install</span> <span class="nt">-y</span> <span class="se">\
</span>    git <span class="se">\
</span>    curl <span class="se">\
</span>    <span class="o">&amp;&amp;</span> <span class="nb">rm</span> <span class="nt">-rf</span> /var/lib/apt/lists/<span class="k">*</span>
<span class="k">RUN </span>pip <span class="nb">install </span>poetry
</code></pre></div></div>

<p><strong>“.devcontainer/devcontainer.json”</strong>: Configures how the Dev Container integrates with the IDE and environment.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Python Project"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"build"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"dockerfile"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Dockerfile"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"context"</span><span class="p">:</span><span class="w"> </span><span class="s2">".."</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"customizations"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"vscode"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"extensions"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"ms-python.python"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ms-python.vscode-pylance"</span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"forwardPorts"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">8000</span><span class="p">],</span><span class="w">
  </span><span class="nl">"postCreateCommand"</span><span class="p">:</span><span class="w"> </span><span class="s2">"poetry install"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The Dockerfile builds the container, while the devcontainer.json file configures how the IDE interacts with it, including IDE extensions to install, ports to forward, and commands to run after container creation.</p>
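
<p>You don’t even need an IDE to bring one of these up. Here is a minimal sketch using the open source Dev Containers CLI, assuming Node.js is installed and the project contains the two files above:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Install the reference CLI for the Dev Containers spec
npm install -g @devcontainers/cli

# Build and start the container defined in .devcontainer/
devcontainer up --workspace-folder .

# Run a command inside it, e.g. the project's tests
devcontainer exec --workspace-folder . poetry run pytest
</code></pre></div></div>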

<h3 id="benefits-for-team-collaboration">Benefits for team collaboration</h3>

<p><strong>Consistency</strong>: Every team member works in the exact same environment, eliminating “works on my machine” problems</p>

<p><strong>Onboarding</strong>: New developers can be productive within minutes by simply opening the project in their IDE</p>

<p><strong>Isolation</strong>: Projects with different dependencies don’t conflict with each other</p>

<p><strong>Version control</strong>: The development environment itself is versioned alongside the code</p>

<h3 id="how-they-help-achieve-parity-with-production">How they help achieve parity with production</h3>

<p>Dev Containers can use the same base images as production containers; shared dependencies ensure development behavior matches production; environment variables can be configured similarly to production; and service dependencies (databases, message queues) can be included via Docker Compose.</p>
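
<p>One quick way to sanity-check parity is to build the dev and production images side by side and compare their runtimes. This sketch assumes a production <code class="language-plaintext highlighter-rouge">Dockerfile</code> at the repo root; the image tags are placeholders:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical: dev and prod Dockerfiles share the same base image
docker build -f Dockerfile -t myapp:prod .
docker build -f .devcontainer/Dockerfile -t myapp:dev .

# The interpreter versions should match across both images
docker run --rm myapp:prod python --version
docker run --rm myapp:dev python --version
</code></pre></div></div>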

<tweet>Dev containers don't just solve "works on my machine"—they solve "works exactly like production" too.</tweet>

<h2 id="2-cloud-development-environments">2. Cloud Development Environments</h2>

<h3 id="what-are-cloud-ides-like-googles-project-idx-or-github-codespaces">What are cloud IDEs like Google’s Project IDX or GitHub Codespaces?</h3>

<p>Cloud Development Environments provide fully functional development environments hosted in the cloud and accessible through web browsers or local IDEs. They eliminate the need to set up local development environments completely.</p>

<p><strong>GitHub Codespaces</strong>: Pre-configured cloud environments integrated with GitHub repositories</p>

<p><strong>Google’s Project IDX</strong>: Google’s cloud development platform designed for web and mobile app development</p>

<p><strong>GitPod</strong>: Open source cloud development environments that can integrate with GitHub, GitLab, and Bitbucket</p>

<h3 id="how-do-they-differ-from-local-dev-containers">How do they differ from local Dev Containers?</h3>

<p><strong>Resource allocation</strong>: Cloud environments use cloud resources instead of local computer power</p>

<p><strong>Access</strong>: Accessible from any device with a web browser</p>

<p><strong>Setup time</strong>: Instant access without local Docker installation or configuration</p>

<p><strong>Cost model</strong>: Usually involves usage-based pricing rather than local hardware costs</p>

<p><strong>Performance</strong>: Network latency can affect the development experience</p>

<h3 id="configuration-files-they-use">Configuration files they use</h3>

<p><strong>GitHub Codespaces</strong>: Uses the same “.devcontainer” configuration as local Dev Containers</p>

<p><strong>Project IDX</strong>: Uses “.idx/dev.nix” configuration files based on the Nix package manager</p>

<p>Example “.idx/dev.nix” for Project IDX:</p>

<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span> <span class="nv">pkgs</span><span class="p">,</span> <span class="o">...</span> <span class="p">}:</span> <span class="p">{</span>
  <span class="nv">channel</span> <span class="o">=</span> <span class="s2">"stable"</span><span class="p">;</span>
  
  <span class="nv">packages</span> <span class="o">=</span> <span class="p">[</span>
    <span class="nv">pkgs</span><span class="o">.</span><span class="nv">nodejs_20</span>
    <span class="nv">pkgs</span><span class="o">.</span><span class="nv">yarn</span>
    <span class="nv">pkgs</span><span class="o">.</span><span class="nv">python311</span>
  <span class="p">];</span>
  
  <span class="nv">idx</span><span class="o">.</span><span class="nv">extensions</span> <span class="o">=</span> <span class="p">[</span>
    <span class="s2">"dbaeumer.vscode-eslint"</span>
    <span class="s2">"esbenp.prettier-vscode"</span>
  <span class="p">];</span>
  
  <span class="nv">idx</span><span class="o">.</span><span class="nv">previews</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nv">enable</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
    <span class="nv">previews</span> <span class="o">=</span> <span class="p">[</span>
      <span class="p">{</span>
        <span class="nv">command</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"npm"</span> <span class="s2">"run"</span> <span class="s2">"dev"</span><span class="p">];</span>
        <span class="nv">manager</span> <span class="o">=</span> <span class="s2">"web"</span><span class="p">;</span>
        <span class="nv">id</span> <span class="o">=</span> <span class="s2">"web"</span><span class="p">;</span>
      <span class="p">}</span>
    <span class="p">];</span>
  <span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="advantages-and-limitations">Advantages and limitations</h3>

<p><strong>Advantages:</strong></p>
<ul>
  <li>Work from anywhere with internet access</li>
  <li>No local setup required</li>
  <li>Consistent environment for all team members</li>
  <li>Easily scalable resources for intensive tasks</li>
  <li>Collaboration features like real-time pair programming</li>
</ul>

<p><strong>Limitations:</strong></p>
<ul>
  <li>Requires internet connectivity</li>
  <li>Potential latency issues</li>
  <li>Monthly costs for team usage</li>
  <li>Less control over the underlying infrastructure</li>
  <li>Privacy/security concerns with proprietary code in cloud environments</li>
</ul>

<h2 id="3-other-approaches">3. Other Approaches</h2>

<h3 id="how-do-tools-like-docker-compose-fit-into-development-workflows">How do tools like Docker Compose fit into development workflows?</h3>

<p>Docker Compose allows developers to define and run multi-container Docker applications. It’s often used alongside Dev Containers to set up supporting services needed for development (databases, caches, message queues), create a network of interconnected services that mirror production, and manage environment variables and volumes across multiple containers.</p>

<p>Example “docker-compose.yml”:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s1">'</span><span class="s">3'</span>
<span class="na">services</span><span class="pi">:</span>
  <span class="na">app</span><span class="pi">:</span>
    <span class="na">build</span><span class="pi">:</span> <span class="s">.</span>
    <span class="na">ports</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">8000:8000"</span>
    <span class="na">volumes</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">.:/app</span>
    <span class="na">depends_on</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">db</span>
      <span class="pi">-</span> <span class="s">redis</span>
  
  <span class="na">db</span><span class="pi">:</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">postgres:14</span>
    <span class="na">environment</span><span class="pi">:</span>
      <span class="na">POSTGRES_PASSWORD</span><span class="pi">:</span> <span class="s">devpassword</span>
      <span class="na">POSTGRES_USER</span><span class="pi">:</span> <span class="s">devuser</span>
    <span class="na">volumes</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">pgdata:/var/lib/postgresql/data</span>
  
  <span class="na">redis</span><span class="pi">:</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">redis:7</span>
    <span class="na">ports</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">6379:6379"</span>
<span class="na">volumes</span><span class="pi">:</span>
  <span class="na">pgdata</span><span class="pi">:</span>
</code></pre></div></div>

<h3 id="differences-between-dev-environments-and-docker-compose">Differences between Dev Environments and Docker Compose</h3>

<p><strong>Dev Containers</strong> focus on the development environment itself (IDE integration, extensions, tools)</p>

<p><strong>Docker Compose</strong> orchestrates multiple services that work together</p>

<p>Dev Containers can integrate with Docker Compose to provide both aspects</p>
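<p>As a sketch of that integration (service and path names are illustrative), a “devcontainer.json” can point at an existing Compose file and treat one of its services as the development container:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "name": "app-dev",

  // Reuse the Compose file; "app" is the service the editor attaches to
  "dockerComposeFile": "docker-compose.yml",
  "service": "app",
  "workspaceFolder": "/app",

  // Stop the whole Compose stack when the dev container shuts down
  "shutdownAction": "stopCompose"
}
</code></pre></div></div>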

<h3 id="role-of-package-managers-like-uv-and-task-runners-like-just">Role of package managers like “uv” and task runners like “just”</h3>

<p><strong>Modern package managers</strong> like “uv” (for Python, written in Rust) improve dependency management speed and reliability. I highly recommend “uv”; it is so much faster than a traditional pip workflow.</p>
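<p>A typical “uv” workflow looks something like this (project and package names are placeholders):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Scaffold a project with a pyproject.toml
uv init myproject
cd myproject

# Add a dependency; uv resolves it, updates the lockfile,
# and installs it into a managed virtual environment
uv add requests

# Reproduce the exact locked environment on another machine
uv sync

# Run a command inside the project environment
uv run -- pytest
</code></pre></div></div>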

<p><strong>Task runners</strong> like “just” provide a consistent interface for common development tasks</p>

<p>Example “justfile”:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>default:
    @just --list

# Run project unit tests
test:
    uv run -- pytest

# Run MLflow server
mlflow:
    uv run -- mlflow server --host 127.0.0.1 --port 5000

# Serve latest registered model locally
serve:
    uv run -- mlflow models serve -m models:/mymodel/latest -h 0.0.0.0 -p 8080
</code></pre></div></div>

<p>These tools help standardize common development tasks across the team, regardless of the environment they’re working in.</p>


<h2 id="4-best-practices">4. Best Practices</h2>

<h3 id="when-to-choose-each-approach">When to choose each approach</h3>

<p><strong>Dev Containers</strong>: For teams with complex development environments who want IDE integration</p>

<p><strong>Cloud Development</strong>: For distributed teams, or when onboarding needs to be extremely fast</p>

<p><strong>Docker Compose</strong>: For applications with multiple interconnected services</p>

<p><strong>Package managers/task runners</strong>: As complementary tools in any environment</p>

<h3 id="ensuring-development-matches-production">Ensuring development matches production</h3>

<p>To keep development aligned with production:</p>
<ul>
  <li>Use the same base images and version tags when possible (sketched below)</li>
  <li>Document all dependencies explicitly</li>
  <li>Use infrastructure-as-code to define both environments</li>
  <li>Test in a staging environment that mirrors production before deployment</li>
  <li>Include all critical services in the development environment</li>
</ul>
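<p>The first point is the cheapest win: pin the development image to the exact tag production runs, not a floating “latest”. A minimal sketch, assuming a Node.js service (the specific tag is illustrative):</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Same base image and tag as the production Dockerfile
FROM node:20.11-bookworm-slim

WORKDIR /app

# Install exactly what the lockfile specifies
COPY package.json package-lock.json ./
RUN npm ci
</code></pre></div></div>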

<h3 id="trade-offs-between-simplicity-and-completeness">Trade-offs between simplicity and completeness</h3>

<p><strong>Simple environments</strong> are faster to set up but may miss edge cases</p>

<p><strong>Complete environments</strong> catch more issues but require more resources and maintenance</p>

<p>Start with the minimal viable environment and incrementally add complexity as needed</p>

<p>Focus on matching the aspects of production that affect development most directly</p>

<h3 id="managing-environment-variables">Managing environment variables</h3>

<p>A few ground rules keep environment variables manageable:</p>
<ul>
  <li>Use “.env” files for development-specific variables</li>
  <li>Never commit production secrets to version control</li>
  <li>Consider tools like “direnv” to manage environment switching (a one-line example follows the snippet below)</li>
  <li>Use secret management services for production environments</li>
  <li>Define default values in the codebase with clear documentation</li>
</ul>

<p>Example approach with “.env.example” and “.gitignore”:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># .env.example (committed to version control)</span>
<span class="nv">DATABASE_URL</span><span class="o">=</span>postgresql://devuser:devpassword@db:5432/devdb
<span class="nv">REDIS_URL</span><span class="o">=</span>redis://redis:6379/0
<span class="nv">API_KEY</span><span class="o">=</span>example_key_for_development

<span class="c"># .gitignore</span>
.env
</code></pre></div></div>
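<p>On the “direnv” point above, a one-line “.envrc” is enough to load the “.env” file automatically whenever you enter the project directory (run “direnv allow” once to approve it):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># .envrc -- direnv's stdlib "dotenv" function loads ./.env into the shell
dotenv
</code></pre></div></div>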

<h2 id="real-world-scenario-full-stack-web-application">Real-world scenario: Full-stack web application</h2>

<p>For a typical full-stack web application with a React frontend, Node.js API, and PostgreSQL database:</p>

<p><strong>Dev Container approach:</strong></p>
<ul>
  <li>“.devcontainer/Dockerfile” with Node.js, PostgreSQL client tools</li>
  <li>“.devcontainer/devcontainer.json” with VS Code extensions for React, Node</li>
  <li>“docker-compose.yml” for PostgreSQL service</li>
</ul>

<p><strong>Cloud IDE approach:</strong></p>
<ul>
  <li>GitHub Codespaces configuration with the same Dev Container setup</li>
  <li>Environment variables set through the Codespaces secrets</li>
</ul>

<p><strong>Local only approach:</strong></p>
<ul>
  <li>“docker-compose.yml” with services for frontend, backend, and database</li>
  <li>Volume mounts for live code reloading</li>
</ul>

<p><strong>Hybrid approach:</strong></p>
<ul>
  <li>Dev Container for the development environment</li>
  <li>Docker Compose for service dependencies</li>
  <li>Task runner (“just” or npm scripts) for common commands</li>
  <li>Environment managed through “.env” files with “.env.example” templates</li>
</ul>

<p>The best solution depends on your team’s specific needs, but containerized environments, whether local or cloud-based, have become a well-proven way to ensure consistency and reduce onboarding friction.</p>

<p>The evolution from “works on my machine” to “works for everyone” represents more than just a technical advancement—it’s a fundamental shift in how we think about development environments. We’ve moved from treating environment setup as a necessary evil to embracing it as a core part of our development workflow.</p>

<p>Whether you choose local dev containers, cloud development environments, or a hybrid approach, the key is consistency and reproducibility. The days of spending hours debugging environment-specific issues are largely behind us, replaced by systems that ensure every developer on your team can be productive from day one.</p>

<p>The infrastructure choices you make today will determine how smoothly your team scales tomorrow. Choose tools that grow with your team and make the complex simple, not the simple complex.</p>

<p>I hope you found this post helpful, thanks for reading.</p>]]></content><author><name>Vatché</name></author><category term="opinion" /><category term="Development &amp; DevOps" /><summary type="html"><![CDATA[Development environments have evolved from 'works on my machine' nightmares to consistent, shareable containers. Here's your guide to Docker dev containers, cloud IDEs, and best practices.]]></summary></entry></feed>