
Auditing Webshop Listings: How We Turned BigQuery into an AI Data Quality Engine

E-commerce platforms struggle with mismatched product titles and images. We built a solution for a Kaggle challenge that uses BigQuery and Gemini to act as an automated catalog auditor, classifying millions of listings for consistency and providing actionable insights to improve data quality.

TAMÁS PÓTSA

September 23, 2025

We built an automated product catalog auditor using BigQuery and Gemini to solve a common e-commerce problem: inconsistencies between product descriptions and images. Developed as our entry to the BigQuery AI - Building the Future of Data hackathon, hosted by Google Cloud on Kaggle, the solution can classify millions of listings, identify the root causes of errors, and provide actionable insights to improve data quality at scale.

In this article, we demonstrate how to use BigQuery's built-in AI capabilities to process mixed-format data (product titles and images) and tackle real-world business challenges directly within the database.

When a picture isn't worth a thousand words

You've probably experienced it before: you search for a "blue cotton t-shirt," click on a promising result, and the image shows a red polyester jacket. This mismatch between a product's title and its image is a huge headache for e-commerce platforms. It leads to customer confusion, erodes trust, and drives traffic away.

For companies with millions of products, manually reviewing every listing is simply impractical. This is where AI comes in. The hackathon challenged us to use BigQuery's AI features to solve a real problem involving unstructured data, and title-image inconsistency was a perfect fit.

A strict AI auditor built with SQL

We developed a solution that transforms BigQuery into a powerful, automated data quality engine. The core of our project is a single SQL query that uses the AI.GENERATE function to pass product titles and image URLs to the gemini-2.5-flash model.

We instructed the model to act as a strict catalog auditor and classify each product into one of four categories:

OK: The title and image are a perfect match.
MISMATCH: It's the same type of product, but a key attribute like brand, size, or model is different or missing.
ERROR: The title and image are for completely different products.
UNCERTAIN: There isn't enough evidence to make a confident decision.

Critically, we also asked the model to explain each decision by returning three extra fields:

Reasons: A human-readable explanation for its decision.
Salient Image Tags: Keywords describing the key visual features of the image.
Confidence Score: A numerical score (from 0 to 1) of its certainty.

(You can find code and prompts for this in the notebook linked at the end of the article.)
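The actual query and prompt live in the notebook. As a rough illustration of the output contract described above, the Python sketch below validates a single model response after it has been read back from BigQuery. It assumes the model was asked to return JSON with the hypothetical keys `label`, `reasons`, `salient_image_tags` and `confidence`; the notebook's real field names may differ.

```python
import json
from dataclasses import dataclass

# The four labels the auditor prompt allows.
ALLOWED_LABELS = {"OK", "MISMATCH", "ERROR", "UNCERTAIN"}


@dataclass(frozen=True)
class AuditResult:
    label: str
    reasons: str
    salient_image_tags: tuple[str, ...]
    confidence: float


def parse_audit_result(raw: str) -> AuditResult:
    """Parse and validate one model response.

    Assumes hypothetical JSON keys: label, reasons, salient_image_tags,
    confidence. Raises ValueError on an unknown label or a score outside [0, 1].
    """
    data = json.loads(raw)
    label = str(data["label"]).strip().upper()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    tags = tuple(str(t) for t in data.get("salient_image_tags", []))
    return AuditResult(label, str(data.get("reasons", "")), tags, confidence)


if __name__ == "__main__":
    # Illustrative response, modelled on the Nescafe example later in the article.
    raw = (
        '{"label": "mismatch", "reasons": "Image shows 220ml; volume missing '
        'from title.", "salient_image_tags": ["can", "220ml"], "confidence": 0.93}'
    )
    print(parse_audit_result(raw))
```

Rejecting unknown labels and out-of-range scores up front keeps malformed responses from silently skewing the label counts in the findings that follow.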

Key findings from our AI audit

After running the analysis, we uncovered several key insights into the catalog's data quality.

1. Inconsistency is widespread

The audit revealed a nearly even split between consistent (OK) and inconsistent (MISMATCH + ERROR) listings. This confirmed our hypothesis that catalog quality was a significant issue, driven by incomplete titles (missing brand or volume) and noisy product photos.
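In BigQuery this tally is a simple GROUP BY on the label column. For readers following along outside the warehouse, here is a minimal Python sketch of the same split, assuming the labels have already been exported into a list. It counts MISMATCH and ERROR together as inconsistent and keeps UNCERTAIN separate; the sample labels are invented for illustration and are not our results.

```python
from collections import Counter


def consistency_split(labels: list[str]) -> dict[str, float]:
    """Return the share of consistent (OK), inconsistent (MISMATCH + ERROR)
    and UNCERTAIN labels; any other label is counted in the total only."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {"consistent": 0.0, "inconsistent": 0.0, "uncertain": 0.0}
    return {
        "consistent": counts["OK"] / total,
        "inconsistent": (counts["MISMATCH"] + counts["ERROR"]) / total,
        "uncertain": counts["UNCERTAIN"] / total,
    }


if __name__ == "__main__":
    sample = ["OK", "MISMATCH", "OK", "ERROR", "UNCERTAIN", "OK", "MISMATCH", "OK"]
    print(consistency_split(sample))
    # → {'consistent': 0.5, 'inconsistent': 0.375, 'uncertain': 0.125}
```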

2. The model is confident in its decisions

The vast majority of classifications had a confidence score between 0.9 and 1.0. This high level of certainty suggests we can trust most of the model's automated judgments, while the UNCERTAIN category effectively isolates the few ambiguous listings that require human review.
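Checking a claim like "most scores fall between 0.9 and 1.0" takes a share and a histogram. In BigQuery, something like `COUNTIF(confidence >= 0.9) / COUNT(*)` (with `confidence` as a hypothetical column name) gives the share directly. The Python sketch below does the same on an exported list of scores and also buckets them into ten equal-width bins; the sample scores are invented.

```python
from collections import Counter


def share_at_least(scores: list[float], threshold: float = 0.9) -> float:
    """Fraction of scores greater than or equal to `threshold`."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


def confidence_histogram(scores: list[float], bins: int = 10) -> Counter:
    """Count scores into equal-width bins over [0, 1].

    Bin i covers [i/bins, (i+1)/bins), up to floating-point rounding at bin
    edges; a score of exactly 1.0 lands in the top bin. Scores outside
    [0, 1] raise ValueError.
    """
    hist: Counter = Counter()
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        hist[min(int(s * bins), bins - 1)] += 1
    return hist


if __name__ == "__main__":
    scores = [0.95, 0.98, 1.0, 0.92, 0.55]  # invented sample scores
    print(share_at_least(scores))  # → 0.8
    print(sorted(confidence_histogram(scores).items()))
```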

3. The biggest problems were brand and volume

By analyzing the keywords in the model's "reasons," we found the primary drivers of mismatches. "Brand" was mentioned over 15,000 times, with "volume" and "type" being the next most common culprits. This tells us that the most significant source of catalog error isn't wildly incorrect listings, but rather missing or conflicting details about specific attributes.
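A keyword analysis like this can be reproduced with a whole-word, case-insensitive count over the reason strings. This Python sketch counts every mention, so a reason that says "brand" twice contributes two; counting each reason at most once would be an equally valid choice. The keyword list is an illustrative assumption, and whole-word matching means variants such as "brands" are not counted.

```python
import re
from collections import Counter

# Illustrative keyword list (an assumption); the article reports "brand",
# "volume" and "type" as the most frequent terms in the model's reasons.
KEYWORDS = ("brand", "volume", "type", "size", "model")


def count_reason_keywords(
    reasons: list[str], keywords: tuple[str, ...] = KEYWORDS
) -> Counter:
    """Count every whole-word, case-insensitive mention of each keyword
    across all reason strings."""
    patterns = {k: re.compile(rf"\b{re.escape(k)}\b", re.IGNORECASE) for k in keywords}
    counts: Counter = Counter({k: 0 for k in keywords})
    for text in reasons:
        for k, pattern in patterns.items():
            counts[k] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    reasons = [  # invented sample reasons
        "Brand is missing from the title.",
        "Title states 250 ml, but neither the volume nor the brand is visible.",
        "Different product type: the image shows shopping bags.",
    ]
    print(count_reason_keywords(reasons).most_common(3))
    # → [('brand', 2), ('volume', 1), ('type', 1)]
```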

A look at real examples

To see the auditor in action, let's look at a few examples.

✅ Perfect matches (OK)

TP-LINK Wireless N Router TL-WR940N 450Mbps: A fantastic example where the model perfectly matched the brand, model number, and technical specifications in the title to the image.
Safi Dermasafe Night Moisturiser 50 gr: The AI correctly identified the brand, product line, type, and volume, confirming a perfect match.

⚠️ Attribute mismatches (MISMATCH)

Nescafe Éclair Latte: The title was simple, but the image clearly showed a volume (220ml) that was missing from the title, triggering a MISMATCH.
MARKS & SPENCER Rose Hand & Body Lotion 250 ml: Here, the title included the brand and volume, but these details were not clearly visible on the bottle in the image, leading to a MISMATCH.

❌ Clear error (ERROR)

Plastic Kangaroo Toy: The title had nothing to do with the image, which showed Victoria’s Secret shopping bags and cupcakes. This is a classic example of a severe cataloging error that the system easily flagged.

Turning insights into action

This solution provides a scalable, data-driven framework for e-commerce catalog management. Retailers can use these AI-generated insights to:

Automatically approve tens of thousands of OK listings, saving countless hours of manual work.
Prioritize human review for the listings flagged as MISMATCH or ERROR, focusing attention where it's needed most.
Identify and fix systemic issues. For example, if "brand" is a recurring reason for mismatches, they can enforce stricter data entry rules for that field.
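The triage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Listing` record, the 0.9 auto-approval threshold and the ERROR-before-MISMATCH ordering are all hypothetical choices, not rules from our pipeline. UNCERTAIN listings and low-confidence OK listings are queued for review after the flagged ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:  # hypothetical record for illustration
    listing_id: str
    label: str         # OK, MISMATCH, ERROR or UNCERTAIN
    confidence: float  # model's self-reported certainty in [0, 1]


# Lower number = reviewed earlier. ERROR before MISMATCH is an assumption.
REVIEW_PRIORITY = {"ERROR": 0, "MISMATCH": 1, "UNCERTAIN": 2}
LOWEST_PRIORITY = 3  # low-confidence OK listings and any unknown label


def route(listings: list[Listing], min_auto_confidence: float = 0.9):
    """Split listings into (auto_approved, review_queue).

    OK listings at or above the hypothetical `min_auto_confidence` threshold
    are approved automatically; everything else is queued for human review,
    ordered by REVIEW_PRIORITY. The sort is stable, so input order is kept
    within each label.
    """
    approved: list[Listing] = []
    queue: list[Listing] = []
    for item in listings:
        if item.label == "OK" and item.confidence >= min_auto_confidence:
            approved.append(item)
        else:
            queue.append(item)
    queue.sort(key=lambda x: REVIEW_PRIORITY.get(x.label, LOWEST_PRIORITY))
    return approved, queue


if __name__ == "__main__":
    listings = [
        Listing("a", "OK", 0.97),
        Listing("b", "MISMATCH", 0.92),
        Listing("c", "ERROR", 0.99),
        Listing("d", "UNCERTAIN", 0.40),
        Listing("e", "OK", 0.60),
    ]
    approved, queue = route(listings)
    print([x.listing_id for x in approved])  # → ['a']
    print([x.listing_id for x in queue])     # → ['c', 'b', 'd', 'e']
```

Keeping the threshold as a parameter makes it easy to tighten auto-approval if spot checks turn up wrongly approved OK labels.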

By embedding AI directly within the database, we've shown that BigQuery can be more than just a data warehouse: it can be an active, intelligent engine for ensuring data quality and solving real-world business problems.

We embraced the hackathon's Multimodal Pioneer approach, combining text and images to tackle a real-world business problem directly within BigQuery. The project shows that you don't need a separate system to process mixed-format data: everything can happen in a single environment that feels like an extension of SQL, yielding a powerful, scalable, end-to-end AI workflow that turns overlooked data into actionable insights.

If you’d like to dig into the details, you can find the notebook for this solution at this link.
