What You'll Build

A computer vision-powered self-checkout. You'll create and train a custom model based on your favorite drinks and combine it with a vision service. The machine can then be deployed to a Raspberry Pi or spare laptop and used as a checkout!

computer vision powered checkout; gif showing a beverage being shown to the camera with a bounding box around it and a confidence level for its detection

Prerequisites

What You'll Need

What You'll Learn

  1. In the Viam app under the LOCATIONS tab, create a machine by typing in a name and clicking Add machine. add machine
  2. Click View setup instructions. setup instructions
  3. To install viam-server on your device, select the operating system you are running. For example, I'll be using a MacBook Air as my device, so I'll select Mac: select platform
  4. Follow the instructions that are shown for your platform.
  5. The setup page will indicate when the machine is successfully connected. machine connected

With a machine configured, we now need a way to capture images. Let's add a webcam next!

  1. In the Viam app, find the CONFIGURE tab.
  2. Click the + icon in the left-hand menu and select Component.
  3. Select camera, and find the webcam module. Leave the default name camera-1 for now, then click Create. This adds the module for working with a standard USB camera or other webcam that streams camera data. find and add the webcam module for the camera component
  4. Notice that adding this module adds the camera hardware component called camera-1. You'll see a collapsible card on the right, where you can configure the camera component, and the corresponding camera-1 part listed in the left sidebar. added webcam
  5. To configure the camera component, you need to set the video_path of the intended device. You can quickly find which devices are connected to your machine by adding a discovery service. Click Add webcam discovery service in the prompt that appears. add Discovery service to find webcam
  6. Notice that this adds the discovery-1 service and find-webcams module to your machine in the left sidebar. Corresponding cards to these items also appear on the right.
  7. Click Save in the top right to save and apply your configuration changes.
  8. Expand the TEST panel of the discovery-1 card. Here, you'll find attributes of all discoverable cameras connected to your machine. Find the video_path of the device you'd like to use as your webcam, then copy the value. For example, I'll use my MacBook Air's built-in FaceTime camera, so I'll copy 3642F2CD-E322-42E7-9360-19815B003AA6. find and copy video_path value
  9. Paste the copied video_path value into your camera component's video_path input, which is in the Attributes section: paste video_path value
  10. Click Save in the top right once more to save and apply your configuration changes.
  11. Expand your camera component's TEST panel. If things are properly configured, you should see the video streaming from your camera. test camera feed
  12. With your camera added and working, you can now delete the discovery-1 service and find-webcams module as you'll no longer need them. Click the ... next to each item, then select Delete.
    delete component
  13. You will be prompted to confirm the deletion; select Delete.
    confirm component deletion
  14. Finally, Save your configuration changes.

Great, your machine now has eyes!
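Behind the scenes, the UI steps above are saved as machine configuration JSON. A rough sketch of the camera portion is below; your video_path will be the value you copied, and exact field names can vary slightly by viam-server version:

```json
{
  "components": [
    {
      "name": "camera-1",
      "namespace": "rdk",
      "type": "camera",
      "model": "webcam",
      "attributes": {
        "video_path": "3642F2CD-E322-42E7-9360-19815B003AA6"
      }
    }
  ]
}
```

You can view and edit this directly by switching the CONFIGURE tab from Builder to JSON mode.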

To train a custom model based on your beverages, you'll need a dataset that the LiteRT framework (previously known as TensorFlow Lite) can use. Here, you'll create a dataset and add some images of your beverages using the webcam you configured in the last step. (Note that the rest of this codelab will still refer to TensorFlow Lite in some places.)

  1. In the Viam app, find the DATASETS tab.
  2. Click the + Create dataset button and give your dataset a name, like beverages. Click the Create dataset button again to save. create a dataset
  3. Switch back to your machine in the Viam app. You can navigate back by going to Fleet > All Machines Dashboard, then clicking on the name of your machine.
  4. Expand your camera component's TEST panel. Here, you'll see the live feed of your camera as well as the "Add image to dataset" icon, which looks like a camera. add image to dataset button
  5. Using the live feed as a viewfinder, position your webcam so that you can place one of your beverages fully in the frame. The less visual clutter in the background the better! positioning a beverage for capture
  6. When you are happy with the image, click the Add image to dataset button. adding image to dataset
  7. In the list of datasets that appears, select the dataset you wish to add your captured image to. For example, beverages:
    selecting the dataset
  8. Confirm the dataset you've selected, then click Add.
    add image to selected dataset
  9. A success message will appear at the top-right once your image is added to your selected dataset.
    successful image added to dataset
  10. Repeat steps 5 - 8 to capture at least 10 images of each beverage you will be detecting. Be sure to vary angles and positions!

(Example dataset images: San Pellegrino peach, blueberry Topo Chico, and Spindrift, each captured from several different angles.)

Phew, that was a lot, but your custom model will thank you! Let's make our images smarter by annotating them in the next step.

Having images to train a model is a good start. However, they won't be useful unless TensorFlow has a bit more information to work with. In this step, you'll draw bounding boxes around your beverages and label them accordingly.

  1. In the Viam app, find the DATASETS tab.
  2. Click on the name of your dataset, for example beverages:
    datasets overview
  3. Here, you'll see all of the images you've captured, neatly grouped in their own space. beverages dataset overview
  4. Select one image from the dataset. A side panel will appear on the right-hand side. You can see details about this image, such as any objects annotated, associated tags, which datasets the image belongs to, among other details. Click on the Annotate button in this panel. dataset side panel when single image selected
  5. The selected image opens to a larger screen. To detect an object within an image, a label must be given. Create an appropriate label for the beverage you have selected, for example spindrift_pog: image label creation - spindrift_pog
  6. With the appropriate label now chosen, hold the Command or Windows key down while you use your mouse to draw a bounding box around your beverage. gif of bounding box being drawn around drink
  7. In the OBJECTS panel on the right, you'll see your beverage listed, with an object count of 1. If you hover over this item, you'll see the spindrift_pog label appear in the image and the bounding box fill with color. drink annotated
  8. Repeat this for the rest of the images that match the label. You can quickly navigate between images by pressing the > (right arrow) or < (left arrow) keys on your keyboard.
  9. Once you get to a new beverage, create another descriptive label, draw the bounding box, and repeat for the rest of the images of the same beverage. Double-check that each image has only one label and detects the correct beverage! (Multiple labels, and therefore bounding boxes, are allowed and make sense for more complex detections. Since we are just trying to accurately detect one beverage at a time, one label per image is recommended for this codelab.) new drink label and annotation
  10. When you are finished annotating all of your images, you can exit the annotation editor by clicking the "X" in the top-left corner. Notice that a breakdown of your bounding box labels is calculated and displayed: label breakdown
  11. Be sure that all your images are labeled, that there are no Unlabeled images left, and that there are at least 10 images of each beverage you are planning to detect.

Great work annotating all of that. (so..many..beverages...) Your model will be the better for it. Let's finally train your custom model!

  1. In your dataset overview, click Train model located within the left-hand panel. train model button
  2. Select your model training options. For now, leave the default selections of New model and Built-in (TensorFlow Lite). Confirm that the correct dataset is selected. Click Next steps. train a model options
  3. Give your model a name, for example beverage-detector. naming your custom model
  4. Select Object detection as the Task type. Notice that the labels you've created are auto-detected from the images in your dataset and selected in the Labels* section: custom model task type and labels
  5. Click Train model. This will kick off the training job for your custom model. training job in progress
  6. If you click on the ID of your training job, you can view more details on the job's overview. You can view any relevant logs while the job runs. training job overview
  7. Wait until your training job is complete. It may take up to 15 minutes, so feel free to open up one of your beverages! Once it is finished, you'll see the status of your job change to Completed. training job complete

Well done, you've just created your own custom model tailored to your favorite drinks! Let's add it to our machine.

  1. In the Viam app, find the CONFIGURE tab.
  2. Click the + icon in the left-hand menu and select Service.
  3. Select ML model, and find the TFLite CPU module. Click Add module. Leave the default name mlmodel-1 for now, then click Create. This adds support for running TensorFlow Lite models on resource-constrained devices. find and add the TFLite CPU module for the ML model service
  4. Notice adding this module adds the ML Model service called mlmodel-1 and the tflite_cpu module from the Viam registry. You'll see configurable cards on the right and the corresponding parts listed in the left sidebar. ML model service added
  5. In the Configure panel of the mlmodel-1 service, leave the default deployment selection of Deploy model on machine. In the Model section, click Select model. selecting a custom model
  6. Find and select the custom model you've just trained, for example beverage-detector. Notice that you can select from any custom models you create (located within the My Organization tab) or from the Viam registry (located within the Registry tab). Click Select.
    finding and choosing custom model
  7. Confirm that the correct model is selected, then click Select.
    confirming custom model selection
  8. Click Save in the top right to save and apply your configuration changes.
  9. Your custom model is now configured and will be used by the ML model service. Notice that a Version option is also configurable. If you decide to train new versions of your beverages model, you have the ability to set specific versions based on your needs. custom model added
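As with the camera, these steps are stored as configuration JSON. An illustrative sketch of the ML model service entry is below; the exact model string and package-reference syntax depend on the tflite_cpu module version, so treat these values as placeholders rather than something to copy verbatim:

```json
{
  "name": "mlmodel-1",
  "type": "mlmodel",
  "model": "tflite_cpu",
  "attributes": {
    "model_path": "${packages.beverage-detector}/beverage-detector.tflite"
  }
}
```

When you select Deploy model on machine, the Viam app manages downloading the model package to your device for you, so you rarely need to edit this by hand.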
With your model deployed, you now need a vision service to connect it to your camera.

  1. In the Viam app, find the CONFIGURE tab.
  2. Click the + icon in the left-hand menu and select Service.
  3. Select vision, and find the ML Model module. Give your vision service a more descriptive name, for example beverage-vision-service. Click Create. While the camera component lets you access what your machine sees, the vision service interprets the image data. find and add vision service
  4. Notice that your service is now listed in the left sidebar and a corresponding configuration card is added on the right.
  5. In the Configure panel of your vision service, set the ML Model to your ML Model service, for example mlmodel-1. setting the vision service's ML model
  6. Move the Minimum confidence threshold slider to 0.5. This sets the vision service to only show results where its beverage detection confidence level is at least 50%. setting confidence threshold for vision service
  7. Find and select your camera component in the Depends on section, for example camera-1. setting dependent camera for vision service
  8. Click Save in the top right to save and apply your configuration changes. This might take a few moments.
  9. Expand the TEST panel of your vision service. You'll see a live feed of your configured webcam and a Labels section. Test out your CV-powered checkout! Try showing your beverages to your webcam. When a beverage is detected, an item will appear with the object the vision service thinks it is seeing and its confidence level. You can also try showing multiple beverages and see how well your model detects them! computer vision powered checkout; gif showing a beverage being shown to the camera with a bounding box around it and a confidence level for its detection
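The vision service configuration above corresponds to JSON along these lines. The mlmodel_name attribute is how the vision service finds your ML model service; the confidence-threshold attribute name shown here is an assumption and may differ by module version:

```json
{
  "name": "beverage-vision-service",
  "type": "vision",
  "model": "mlmodel",
  "attributes": {
    "mlmodel_name": "mlmodel-1",
    "default_minimum_confidence": 0.5
  }
}
```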

Congratulations! You've just built a working computer vision-powered checkout with a custom model trained on your favorite beverages!
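To make this feel like a real checkout, you can turn the vision service's detections into a receipt. The sketch below is plain Python with no hardware required: the detection dictionaries mimic the shape of vision service results (a class_name and a confidence), and the labels and prices are entirely hypothetical, so swap in your own.

```python
# Sketch: turn detections into a checkout receipt.
# Labels and prices are hypothetical; detection dicts mimic the
# (class_name, confidence) shape of vision service results.

PRICES = {
    "spindrift_pog": 1.79,
    "topo_chico_blueberry": 2.29,
    "san_pellegrino_peach": 1.99,
}

def build_receipt(detections, min_confidence=0.5):
    """Filter detections by confidence and total up known items."""
    items = []
    total = 0.0
    for det in detections:
        if det["confidence"] < min_confidence:
            continue  # mirrors the vision service's confidence threshold
        price = PRICES.get(det["class_name"])
        if price is None:
            continue  # label not in the price table; skip it
        items.append((det["class_name"], price))
        total += price
    return items, round(total, 2)

detections = [
    {"class_name": "spindrift_pog", "confidence": 0.91},
    {"class_name": "topo_chico_blueberry", "confidence": 0.42},  # dropped: below 0.5
    {"class_name": "san_pellegrino_peach", "confidence": 0.77},
]
items, total = build_receipt(detections)
print(items, total)  # two items, totaling 3.78
```

In a live version, you would replace the hard-coded detections list with results fetched from your machine (for example, via one of Viam's SDKs) and keep the pricing logic the same.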

Congratulations! You've just built a computer vision-powered checkout! 🥳 Using your own images, favorite drinks, and the built-in TensorFlow Lite framework, you've created a custom model to detect the beverages that make you smile and can be deployed anywhere. And through Viam's modular platform, you combined your custom model with a vision service to enable a CV-powered checkout! Do let me know if you've built this!

What You Learned

Real-world applications for CV-powered checkout

This project is a great way to learn about combining different components to produce something useful; it has practical applications as well:

Specifically for beverages, some real-world use cases can include:

Extend your CV-checkout with Viam

Right now, you can detect your favorite drinks using your custom model. But there are other things you can do! As an example, you could:

Related Resources