Setting Up a Python Skill: Object Recognition with Claude Code
Author: Douglas Montanus
Introduction
One of the places Claude Code shines is when you are picking up a skill you have not worked with before. You know what you want to build, but you are not sure which libraries to use, how to structure the project, or what the common pitfalls are.
This post walks through exactly that — setting up a Python object recognition project using Claude Code, starting from zero domain knowledge.
The Goal
I wanted a Python script that could take an image or a webcam feed and detect objects in it — drawing bounding boxes and labels around things it recognizes (cars, people, bottles, etc.).
I did not know whether to use OpenCV, a YOLO model, TensorFlow, PyTorch, or something else entirely. That is where Claude Code comes in.
Step 1: Picking the Stack
Rather than reading comparison articles, I described my goal directly:
"I want to do real-time object detection in Python — either from a webcam or a static image. I want labeled bounding boxes as output. What is the simplest stack that is production-quality and easy to get running locally?"
Claude Code recommended YOLOv8 via the ultralytics package combined with OpenCV for image display. The reasoning it gave:
- ultralytics wraps the full YOLO pipeline in a simple Python API, with no manual model loading or preprocessing
- Pre-trained weights are downloaded automatically on first run
- OpenCV handles webcam capture and image rendering with minimal boilerplate
That was enough to commit to a direction.
Step 2: Scaffolding the Project
I asked Claude Code to create the initial project structure:
"Set up a Python project for object recognition with ultralytics YOLOv8 and OpenCV. I want a virtual environment, a requirements file, and a script that runs detection on a static image."
It created:
object-recognition/
├── requirements.txt
├── detect.py
└── images/
    └── .gitkeep
requirements.txt:
ultralytics>=8.0.0
opencv-python>=4.8.0
detect.py:
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")  # nano model: fast, good enough for most use cases


def detect_image(path: str) -> None:
    results = model(path)
    annotated = results[0].plot()  # draws boxes and labels onto the frame
    cv2.imshow("Detection", annotated)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


if __name__ == "__main__":
    import sys

    detect_image(sys.argv[1])
Then it told me how to set up the virtual environment:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
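Before running the script, it can help to confirm that the virtual environment is active and that both packages can be found. The following is a minimal sketch using only the standard library. The helper names in_virtualenv and missing_modules are hypothetical and were not part of what Claude Code generated. Note that the opencv-python package is imported under the module name cv2.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # opencv-python installs the module named cv2.
    print("missing modules:", missing_modules(["ultralytics", "cv2"]))
```

If the second line prints an empty list, both dependencies are visible to the interpreter you are about to use.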
Running python detect.py images/test.jpg downloaded the YOLOv8n weights on first run and opened a window with bounding boxes drawn over detected objects.
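The annotated window shows what was found, but a text summary can be handy as well. The sketch below counts detections per label from plain class IDs. It assumes the IDs come from results[0].boxes.cls.tolist() and the ID-to-label mapping from model.names, as exposed by recent ultralytics releases; check both against your installed version. The function name summarize_detections and the sample data are hypothetical.

```python
from collections import Counter


def summarize_detections(class_ids, names):
    """Count detections per label.

    class_ids: iterable of numeric class IDs (floats are accepted because
               tensor-to-list conversions often yield floats).
    names:     mapping from integer class ID to label.
    """
    counts = Counter(names[int(class_id)] for class_id in class_ids)
    # Most frequent labels first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical sample standing in for real model output.
    sample_names = {0: "person", 2: "car"}
    sample_ids = [0.0, 2.0, 0.0]
    for label, count in summarize_detections(sample_ids, sample_names):
        print(f"{label}: {count}")
```

Keeping the helper free of any ultralytics import means it can be tried out without loading a model.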
Step 3: Adding Webcam Support
Once the static image version worked, I asked for webcam support:
"Add a mode that reads from the webcam live instead of a static file."
Claude Code updated detect.py to accept a --webcam flag:
import argparse

from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")


def detect_image(path: str) -> None:
    results = model(path)
    annotated = results[0].plot()
    cv2.imshow("Detection", annotated)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


def detect_webcam() -> None:
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = model(frame, verbose=False)
        annotated = results[0].plot()
        cv2.imshow("Webcam Detection", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--webcam", action="store_true")
    parser.add_argument("image", nargs="?")
    args = parser.parse_args()
    if args.webcam:
        detect_webcam()
    elif args.image:
        detect_image(args.image)
    else:
        parser.print_help()
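One detail of the argument handling above: if both --webcam and an image path are given, the webcam branch wins silently, and a mistyped image path only fails once the model tries to read it. The following is one possible stricter variant, offered as a sketch rather than as what Claude Code produced. The function names build_parser and choose_mode are hypothetical, and rejecting the combined case is my own design assumption.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Same two arguments as the script above, with help text added."""
    parser = argparse.ArgumentParser(description="Run object detection.")
    parser.add_argument("--webcam", action="store_true", help="read from the webcam")
    parser.add_argument("image", nargs="?", help="path to a static image")
    return parser


def choose_mode(argv):
    """Return ("webcam", None), ("image", Path) or ("help", None).

    Invalid combinations end in parser.error(), which exits with status 2.
    """
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.webcam and args.image:
        # Assumption: mixing both inputs is treated as a usage error.
        parser.error("pass either --webcam or an image path, not both")
    if args.webcam:
        return "webcam", None
    if args.image:
        path = Path(args.image)
        if not path.is_file():
            parser.error(f"image not found: {path}")
        return "image", path
    return "help", None


if __name__ == "__main__":
    import sys

    print(choose_mode(sys.argv[1:]))
```

The returned tuple can then be dispatched to detect_webcam() or detect_image() exactly as in the original script.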
Step 4: Debugging a Display Issue
On my machine the webcam window opened but frames were laggy. I described the symptom:
"The webcam feed runs but feels slow — maybe 3–4 fps. The CPU is spiking. Is there something obvious I am doing wrong?"
Claude Code identified that passing a raw NumPy frame to model() was triggering an extra format conversion on every frame, and that verbose=False suppresses per-frame console output, which would otherwise add latency. It also suggested dropping to a smaller input size:
results = model(frame, verbose=False, imgsz=320)
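That brought the feed up to a comfortable framerate. At imgsz=320 the model processes roughly a quarter of the pixels it would at the usual ultralytics default of 640, so very small or distant objects may be missed, but detection quality held up for typical use.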
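To put a number on "comfortable" rather than judging by eye, a rolling frames-per-second estimate is enough. The class below is a minimal sketch using only the standard library. The name FpsMeter and its window parameter are hypothetical, and the optional now argument exists only so the arithmetic can be checked with fixed timestamps.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the most recent timestamps."""

    def __init__(self, window=30):
        # Only the newest `window` timestamps are kept.
        self._stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame and return the current estimate.

        Returns 0.0 until at least two frames have been recorded.
        """
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    meter = FpsMeter(window=3)
    for stamp in (0.0, 0.25, 0.5):
        fps = meter.tick(stamp)
    print(f"{fps:.1f} fps")
```

In the webcam loop, calling meter.tick() once per iteration after the model call gives a value that can be printed or drawn onto the frame before it is shown.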
Step 5: Filtering by Class
The final ask was filtering — only showing detections for specific object classes (e.g., only people and cars):
"How do I filter so only certain classes are shown in the output?"
Claude Code pointed me to the classes argument, which takes a list of class IDs:
CLASSES = [0, 2]  # 0 = person, 2 = car in the COCO dataset
results = model(frame, verbose=False, imgsz=320, classes=CLASSES)
Claude Code also showed me where to find the full COCO class list so I could add or remove classes without guessing.
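Numeric IDs are easy to get wrong, so it may be more comfortable to filter by label and translate to IDs in one place. The sketch below inverts an ID-to-label mapping. It assumes the full mapping is available as model.names when ultralytics is installed; the function name class_ids_for is hypothetical, and the dictionary here is only a small excerpt of the COCO labels for illustration.

```python
def class_ids_for(names, wanted):
    """Translate labels into sorted class IDs using an ID-to-label mapping.

    Raises ValueError listing any labels that are not in the mapping.
    """
    by_label = {label: class_id for class_id, label in names.items()}
    unknown = [label for label in wanted if label not in by_label]
    if unknown:
        raise ValueError(f"unknown class labels: {unknown}")
    return sorted(by_label[label] for label in wanted)


if __name__ == "__main__":
    # Excerpt of the COCO mapping; model.names is assumed to hold all entries.
    coco_excerpt = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle"}
    print(class_ids_for(coco_excerpt, ["car", "person"]))
```

The resulting list can be passed as the classes argument in place of the hard-coded CLASSES constant, and a misspelled label fails loudly instead of silently filtering everything out.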
What Made This Work
A few things made Claude Code particularly useful here:
It explained its choices. When it recommended YOLOv8 over alternatives, it gave concrete reasons I could evaluate — not just "this is popular."
It read error output directly. When installation produced a warning about a missing CUDA driver, I pasted it in and Claude Code explained that PyTorch would fall back to running on the CPU automatically, and that this was fine for development.
Iteration was fast. Each step was a short description. I did not need to find documentation, read through it, figure out which parts applied, and then translate that into code. The loop from idea to running code was much tighter.
Conclusion
Starting a new skill is often the hardest part: not the implementation itself, but the initial decisions about what to use and how to structure things. Claude Code shortens that ramp considerably by letting you describe what you want and get grounded, specific guidance rather than generic tutorials.
The object recognition project went from zero to a working webcam feed with filtered detections in a single session. The code is straightforward enough that I understood every line by the time we were done — which is the goal.