Setting Up a Python Skill: Object Recognition with Claude Code

Introduction

Claude Code is especially useful when you are picking up a skill you have not worked with before. You know what you want to build, but you are not sure which libraries to use, how to structure the project, or what the common pitfalls are.

This post walks through exactly that — setting up a Python object recognition project using Claude Code, starting from zero domain knowledge.

The Goal

I wanted a Python script that could take an image or a webcam feed and detect objects in it — drawing bounding boxes and labels around things it recognizes (cars, people, bottles, etc.).

I did not know whether to use OpenCV, a YOLO model, TensorFlow, PyTorch, or something else entirely. That is where Claude Code comes in.

Step 1: Picking the Stack

Rather than reading comparison articles, I described my goal directly:

"I want to do real-time object detection in Python — either from a webcam or a static image. I want labeled bounding boxes as output. What is the simplest stack that is production-quality and easy to get running locally?"

Claude Code recommended YOLOv8 via the ultralytics package combined with OpenCV for image display. The reasoning it gave:

  • ultralytics wraps the full YOLO pipeline in a simple Python API — no manual model loading or preprocessing
  • Pre-trained weights are downloaded automatically on first run
  • OpenCV handles webcam capture and image rendering with minimal boilerplate

That was enough to commit to a direction.

Step 2: Scaffolding the Project

I asked Claude Code to create the initial project structure:

"Set up a Python project for object recognition with ultralytics YOLOv8 and OpenCV. I want a virtual environment, a requirements file, and a script that runs detection on a static image."

It created:

object-recognition/
├── requirements.txt
├── detect.py
└── images/
    └── .gitkeep

requirements.txt
ultralytics>=8.0.0
opencv-python>=4.8.0

detect.py
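import sys

import cv2
from ultralytics import YOLO

# Nano model: the smallest and fastest YOLOv8 variant; weights download on first use.
model = YOLO("yolov8n.pt")

def detect_image(path: str) -> None:
    """Run detection on one image file and show the annotated result."""
    results = model(path)
    annotated = results[0].plot()  # returns a BGR array with boxes and labels drawn
    cv2.imshow("Detection", annotated)
    cv2.waitKey(0)  # block until any key is pressed
    cv2.destroyAllWindows()

if __name__ == "__main__":
    # Fail with a usage message instead of an IndexError when no path is given.
    if len(sys.argv) != 2:
        sys.exit("usage: python detect.py <image-path>")
    detect_image(sys.argv[1])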

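Then it told me how to set up the virtual environment. The commands below are for macOS or Linux; on Windows the activation script is .venv\Scripts\activate instead: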

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
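
Before running the script, it can help to confirm that both packages are importable inside the virtual environment. The following is a minimal sketch and was not part of the original session; the names missing_modules, REQUIRED, and report are hypothetical. Note that the pip package opencv-python is imported as cv2.

```python
import importlib.util

def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names, not pip names: opencv-python installs the module cv2.
REQUIRED = ["ultralytics", "cv2"]

def report() -> str:
    """Build a one-line status message for the required modules."""
    missing = missing_modules(REQUIRED)
    if missing:
        return "Missing: " + ", ".join(missing) + " (is the virtual environment active?)"
    return "All required modules found"
```

Calling print(report()) from a Python prompt started inside the activated environment shows whether the install step succeeded.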

Running python detect.py images/test.jpg downloaded the YOLOv8n weights on first run and opened a window with bounding boxes drawn over detected objects.
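
The script above only draws the detections. If you also want them as data, the result object exposes class IDs, confidences, and box corners. The following is a sketch under the assumption that the result comes from a detection model; the helper name summarize and its min_conf parameter are hypothetical and were not part of the generated code.

```python
from typing import Any

def summarize(result: Any, min_conf: float = 0.0) -> list[dict]:
    """Turn one ultralytics result into plain dicts: label, confidence, box corners."""
    boxes = result.boxes
    rows = []
    for cls_id, conf, xyxy in zip(
        boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist()
    ):
        if conf < min_conf:
            continue  # skip detections below the caller's threshold
        rows.append({
            "label": result.names[int(cls_id)],  # names maps class id -> class name
            "confidence": round(conf, 3),
            "box": [round(v, 1) for v in xyxy],  # x1, y1, x2, y2 in pixels
        })
    return rows
```

Inside detect_image, this could be used as: for row in summarize(results[0]): print(row).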

Step 3: Adding Webcam Support

Once the static image version worked, I asked for webcam support:

"Add a mode that reads from the webcam live instead of a static file."

Claude Code updated detect.py to accept a --webcam flag:

detect.py
import argparse
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")

def detect_image(path: str) -> None:
    results = model(path)
    annotated = results[0].plot()
    cv2.imshow("Detection", annotated)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

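def detect_webcam() -> None:
    cap = cv2.VideoCapture(0)  # 0 = default camera
    if not cap.isOpened():
        raise SystemExit("Could not open the default camera (index 0)")
    try:
        while True:
            ret, frame = cap.read()
            if not ret:  # camera disconnected or stream ended
                break
            results = model(frame, verbose=False)
            annotated = results[0].plot()
            cv2.imshow("Webcam Detection", annotated)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        # Release the camera and close the window even if inference raises.
        cap.release()
        cv2.destroyAllWindows()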

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--webcam", action="store_true")
    parser.add_argument("image", nargs="?")
    args = parser.parse_args()

    if args.webcam:
        detect_webcam()
    elif args.image:
        detect_image(args.image)
    else:
        parser.print_help()
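
With the argument handling above, passing both --webcam and an image path runs the webcam and silently ignores the image. One way to make the two modes explicit is an argparse mutually exclusive group. This is a sketch of an alternative, not what Claude Code generated; build_parser is a hypothetical helper name.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one input source."""
    parser = argparse.ArgumentParser(
        description="Run object detection on an image or a webcam feed"
    )
    # Exactly one source must be given: argparse exits with an error for both or neither.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--webcam", action="store_true",
                        help="read frames from the default camera")
    source.add_argument("image", nargs="?", help="path to a static image")
    return parser
```

With this parser, the final else branch that prints help is no longer needed, because argparse reports the missing source itself.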

Step 4: Debugging a Display Issue

On my machine the webcam window opened but frames were laggy. I described the symptom:

"The webcam feed runs but feels slow — maybe 3–4 fps. The CPU is spiking. Is there something obvious I am doing wrong?"

Claude Code suggested that passing a raw numpy frame to model() was triggering an extra format conversion on every frame, and noted that verbose=False, which the loop already sets, suppresses per-frame console output that would otherwise add latency. The change it proposed was a smaller input size:

results = model(frame, verbose=False, imgsz=320)

That brought it up to a comfortable framerate with no loss in detection quality that I noticed. The trade-off is that a smaller input size makes small or distant objects harder to detect.
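
To compare settings such as imgsz with numbers rather than by feel, a small frame-rate counter is enough. The following is a minimal sketch, not part of the original session; the class name FpsMeter and its window parameter are hypothetical. The optional now argument exists only so the meter can be exercised with fixed timestamps.

```python
import time
from collections import deque
from typing import Optional

class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window: int = 30) -> None:
        self._stamps = deque(maxlen=window)  # timestamps of the most recent frames

    def tick(self, now: Optional[float] = None) -> float:
        """Record one frame; return the current estimate (0.0 until two frames are seen)."""
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

In the webcam loop, create one FpsMeter before the loop and call tick() once per iteration, printing the returned value every so often while you try different input sizes.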

Step 5: Filtering by Class

The final ask was filtering, so that only specific object classes (for example, people and cars) are shown:

"How do I filter so only certain classes are shown in the output?"

Claude Code pointed to the classes argument, which takes a list of class IDs to keep:

CLASSES = [0, 2]  # 0 = person, 2 = car in the COCO dataset

results = model(frame, verbose=False, imgsz=320, classes=CLASSES)

Claude Code also showed me where to find the full COCO class list so I could add or remove classes without guessing.
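
Hard-coded IDs are easy to get wrong, so one option is to look them up by name. The loaded model exposes an ID-to-name table as model.names. The following is a sketch; the helper name class_ids is hypothetical and was not part of the generated code.

```python
def class_ids(names: dict[int, str], wanted: list[str]) -> list[int]:
    """Map class names to IDs using an id -> name table such as model.names."""
    by_name = {name: idx for idx, name in names.items()}
    unknown = [w for w in wanted if w not in by_name]
    if unknown:
        # Fail loudly on typos rather than silently filtering everything out.
        raise ValueError(f"Unknown class names: {unknown}")
    return sorted(by_name[w] for w in wanted)
```

Usage would look like CLASSES = class_ids(model.names, ["person", "car"]), which gives the same [0, 2] as the hand-written list for the COCO-trained weights.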

What Made This Work

A few things made Claude Code particularly useful here:

It explained its choices. When it recommended YOLOv8 over alternatives, it gave concrete reasons I could evaluate — not just "this is popular."

It read error output directly. When installation produced a warning about a missing CUDA driver, I pasted it in and Claude Code explained that PyTorch would fall back to running on the CPU automatically, and that this was fine for development.
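
If you want to see which device will be used instead of relying on the fallback, PyTorch can report it. The following is a small sketch, not part of the original session; pick_device is a hypothetical helper name, and it assumes a single-GPU machine when CUDA is available.

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # installed as a dependency of ultralytics
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"
```

The result can be passed through the device argument of the inference call, for example model(frame, verbose=False, device=pick_device()).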

Iteration was fast. Each step was a short description. I did not need to find documentation, read through it, figure out which parts applied, and then translate that into code. The loop from idea to running code was much tighter.

Conclusion

Starting a new skill is often the hardest part: not the implementation itself, but the initial decisions about what to use and how to structure things. Claude Code shortens that ramp by letting you describe what you want and get grounded, specific guidance rather than generic tutorials.

The object recognition project went from zero to a working webcam feed with filtered detections in a single session. The code is straightforward enough that I understood every line by the time we were done — which is the goal.
