AI Price Estimation From Images: 3 Attempts, 1 Working Solution
Why Does Everyone Say "AI Can't Measure From Photos"?
In 2024, researchers at Google published SpatialVLM, a study testing how well vision-language models understand spatial relationships. The results were sobering: when asked to estimate distances and dimensions from photos, state-of-the-art models landed within the correct range (0.5× to 2× of reality) only 37.2% of the time. Almost two-thirds of estimates were off by more than a factor of two.
A follow-up study, SpatiaLab (2026), confirmed the problem runs deep — earlier benchmarks actually overestimated how well these models perceive space. The real numbers are worse.
The fundamental issue is called monocular ambiguity: from a single 2D image without a reference point, it's physically impossible to recover absolute 3D dimensions. A 30cm pot photographed up close looks identical to a 3m planter shot from a distance. No amount of training data changes this — it's not an AI limitation, it's geometry.
And AI-generated images make it even harder. Real photos at least carry EXIF metadata (focal length, sensor size) that could theoretically anchor perspective calculations. Generated images have none of that.
Interestingly, the research also revealed where the bottleneck actually lies: the vision-language handoff problem. The visual encoder does correctly represent spatial information internally — but the language model can't extract it when generating text responses. The model "sees" the dimensions more accurately than it can "say" them.
So when we decided to build automatic price estimates for AI-generated garden designs, the academic consensus was clear: don't measure. Classify.
Here's what the industry does instead:
| Company | Approach | Measures from photos? |
|---|---|---|
| Zillow Zestimate | Classifies features (granite vs laminate), uses comparable sales data | No — 1M+ training samples, classification only |
| SimplyWise | Classifies project type → regional price tables | No — ±10-15% accuracy, no pixel measurement |
| Hover | 8-10 photos → 3D reconstruction + human QA | Yes — but needs multiple angles and takes ~1 hour |
| AI Garden Planner, Planner 5D, etc. | Visualization only — no pricing | N/A |
Nobody in the AI garden design space ($1.72B market, growing at 21.4% CAGR) offers price estimates from generated designs. Not one competitor. We decided to try anyway.
How Our AI Generates Garden Designs (And Why That Makes Pricing Hard)
Before diving into pricing, it helps to understand what we're pricing. Our AI Garden Designer lets users upload a photo of their actual garden. The AI then generates a photorealistic visualization of how that space could look with modular raised beds.
Each image goes through a three-stage pipeline:
- Agent 1: brand-unaware architect (Gemini Flash). Takes the user's vision, density, and season. Designs a coherent garden scene as JSON (elements list, planting character, style mood). This agent has no knowledge of our product, so it doesn't default to "raised beds everywhere." Ask for a wildflower meadow and you get a wildflower meadow. Ask for a pergola lounge and you get a pergola lounge.
- Agent 2: brand-aware categorizer (Gemini Flash). Reads the scene plan and decides which elements are Brick-compatible (yes for raised beds, benches, low walls, planters, and stairs; no for pergolas, trellises, and trees). Picks 0–2 product reference photos from a pool of three, depending on which Brick elements are present. Builds the final image prompt with a conditional Brick paragraph that only appears when `brick_count > 0`.
- Image generator (Gemini Pro image model). Receives the final prompt + 0–2 reference photos. Renders the photorealistic garden.
This split matters for pricing. A "wild meadow" request ends up with zero Brick structures and zero reference photos — there's nothing to price. A "terraced kitchen garden with herbs and a bench" ends up with two reference photos and a full Brick paragraph injected into the prompt. The downstream price estimator only fires when the scene actually contains Brick-compatible structures; the planting-only scenes correctly return €0 and no quote email.
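A minimal sketch of that gate, assuming the scene plan arrives as a flat list of element names. The `BRICK_COMPATIBLE` set, `planPricing`, and the returned fields are hypothetical names for illustration; in the real pipeline the categorization is done by an LLM call (Agent 2), not by a lookup table.

```javascript
// Hypothetical sketch: element names, the set below, and the return shape are
// illustrative assumptions, not the production Agent 2 logic.
const BRICK_COMPATIBLE = new Set(['raised_bed', 'bench', 'low_wall', 'planter', 'stairs']);

function planPricing(sceneElements) {
  const brickCount = sceneElements.filter((e) => BRICK_COMPATIBLE.has(e)).length;
  return {
    brickCount,
    runEstimator: brickCount > 0,           // planting-only scenes skip the analyzer
    sendQuoteEmail: brickCount > 0,         // and get no quote email
    estimateEur: brickCount > 0 ? null : 0, // null = filled in later by the analyzer
  };
}

console.log(planPricing(['wildflowers', 'tree', 'path']).runEstimator); // false
console.log(planPricing(['raised_bed', 'herbs', 'bench']).brickCount);  // 2
```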
Within that constraint, the AI still has creative freedom on everything else — how many structures, where they go, what shapes they take, how they relate to the existing garden. An L-shape wrapping a tree? A bench integrated into a retaining wall? A set of stairs following a slope? All up to the model.
Users control a density slider (0-100) that maps to approximately 0-20 structures. At density 25, you get a naturalistic garden with a few subtle beds nestled among wildflowers. At density 80, you get a fully organized outdoor living space with distinct zones connected by pathways. The AI picks what types of structures make sense for the scene.
This creative freedom is the whole point of the tool. Nobody wants a configurator that outputs the same three rectangular beds every time. But it creates a fundamental pricing problem: every generated image is unique. There's no predefined bill of materials. No SKU list. Just a photorealistic picture of wooden structures that could be anything from a single planter to an elaborate multi-zone garden.
So how do you price something that doesn't exist yet, from an image that was generated 5 seconds ago?
Attempt #1: Just Ask AI for Dimensions
Our first approach was the most naive one: give Gemini 2.5 Pro the generated garden image and ask it to estimate dimensions in meters.
```
// The prompt we shipped to production
You are an expert at estimating dimensions of garden structures
from photographs.

For EACH distinct wooden structure you can identify
(raised beds, benches, walls, stairs, planters),
estimate its dimensions in meters:
- length_m: the longest horizontal dimension
- width_m: the shorter horizontal dimension (depth)
- height_m: the vertical dimension

Return JSON:
{ "structures": [
  { "type": "raised_bed",
    "length_m": 2.0, "width_m": 1.0, "height_m": 0.6 }
]}
```
Pricing was straightforward geometry: calculate the wall surface area of each structure and multiply by €125/m²:
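```javascript
// V1: wall area per structure type, priced at €125 per m²
const PRICE_PER_M2_V1 = 125;

function wallArea(s) {
  switch (s.type) {
    case 'wall':   return 2 * s.length_m * s.height_m;   // 2 × length × height
    case 'stairs': return s.length_m * s.height_m * 1.5; // length × height × 1.5
    // raised_bed: 2 × (length + width) × height (benches/planters assumed to use it too)
    default:       return 2 * (s.length_m + s.width_m) * s.height_m;
  }
}
const priceV1 = (s) => wallArea(s) * PRICE_PER_M2_V1; // EUR
```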
It worked. Better than we expected. A bed that was actually 1.8m long would come back as 1.4m or 2.2m — the individual dimensions were imprecise, but the geometry compensated: when length was overestimated, height tended to be underestimated. The price estimate ended up within ±20-25% of reality. For a free, instant estimate from an AI-generated image, that felt surprisingly useful.
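The compensation effect is easy to see with hypothetical numbers. The sketch below applies the V1 raised-bed formula to an invented 1.8 × 1.0 × 0.6 m bed: when length is overestimated and height underestimated, the wall-area product stays close to the truth; when both are overestimated, the errors compound instead.

```javascript
// Hypothetical numbers: opposite-signed errors in length and height partly
// cancel in the wall-area product, while same-signed errors compound.
function bedWallArea(length, width, height) {
  return 2 * (length + width) * height;
}

const truth = bedWallArea(1.8, 1.0, 0.6);        // 3.36 m²
const compensating = bedWallArea(2.2, 1.0, 0.5); // length too long, height too short
const compounding = bedWallArea(2.2, 1.0, 0.7);  // both too large

console.log((compensating / truth).toFixed(2)); // 0.95
console.log((compounding / truth).toFixed(2));  // 1.33
```

Nothing guarantees the errors will be anticorrelated; this only shows why a single fuzzy dimension doesn't translate one-to-one into price error.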
The model was especially decent at counting structures — if the image showed 3 raised beds and a bench, it generally found 3 raised beds and a bench. It understood what our Brick system looks like. The dimensions were fuzzy, but the structure detection was solid.
But then we read the papers. SpatialVLM's 37.2% accuracy rate. Google's own documentation cautioning against spatial measurement from single images. Stack Overflow threads full of "this is fundamentally impossible." We got spooked.
"This can't work long-term," we told ourselves. "We're just getting lucky. Let's do it the right way — the way everyone recommends."
Attempt #2: The "Proper" Way — Catalog Classification
The recommended approach is clear: don't measure, classify. Identify what type of structure it is, assign it to a size category, look up a fixed price. No measurement, no ambiguity. This is what Zillow does. This is what SimplyWise does. This is what the research says to do.
The idea was simple:
```javascript
// Classify structure type + size → fixed price lookup
const PRICE_TABLE = {
  raised_bed: { small: 50, medium: 100, large: 180 },
  wall:       { small: 25, medium: 50,  large: 90 },
  bench:      { small: 30, medium: 60,  large: 100 },
  stairs:     { small: 45, medium: 90,  large: 140 },
  planter:    { small: 15, medium: 30,  large: 55 }
};
```
But we ran into a problem we didn't anticipate, and it had nothing to do with AI accuracy.
Our AI garden designer generates creative designs. A user uploads a photo of their garden, and the Gemini Pro image model creates a unique visualization with modular raised beds arranged to fit that specific space. The structures it generates are varied: L-shapes, curves that follow a garden path, beds integrated into slopes, benches connected to raised beds, tiered arrangements that blur the line between "stairs" and "wall."
To make catalog classification work, we would have had to constrain the image generator. "Only generate these 5 types. Only generate these 3 sizes. Keep everything rectangular." That would have made pricing accurate — but it would have killed the thing that makes the tool valuable: the creative, personalized designs.
We were facing a fundamental trade-off: accurate pricing vs. creative freedom in generated images.
And even when we tried to make classification work without constraining the generator, the results were poor:
- "Small/Medium/Large" meant nothing to the model. Without a reference object in the image, the same bed was "small" in one analysis and "large" in the next. There's no physical anchor for these words — "medium" is a language concept, not a measurement.
- Creative structures don't fit neat categories. Is an L-shaped bed one "large" raised bed or two "medium" ones? Is a bench integrated into a raised bed a "bench" or part of the bed? The categories were too rigid for what the generator actually produced.
- We found ourselves adding hacks. An overcounting discount (-15% for each structure above 3, because the model hallucinated extras). A reclassification step. A manual override table. Each hack was a sign that the approach didn't fit our use case.
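For illustration, this is roughly what V2 pricing amounts to once the hacks pile up: a table lookup per classified structure, then the overcount discount on the total. `catalogEstimate` is a hypothetical name, and applying the 15% multiplicatively per extra structure is an assumption; the description above doesn't say exactly how the discount was computed.

```javascript
// Hypothetical V2 sketch: classify, look up, then apply the overcount hack.
// Assumption: "-15% per structure above 3" means ×0.85 per extra structure.
const PRICE_TABLE = {
  raised_bed: { small: 50, medium: 100, large: 180 },
  wall:       { small: 25, medium: 50,  large: 90 },
  bench:      { small: 30, medium: 60,  large: 100 },
  stairs:     { small: 45, medium: 90,  large: 140 },
  planter:    { small: 15, medium: 30,  large: 55 },
};

function catalogEstimate(structures) {
  // Unknown type/size combinations price at €0 in this sketch.
  const base = structures.reduce(
    (sum, s) => sum + (PRICE_TABLE[s.type]?.[s.size] ?? 0), 0);
  const extra = Math.max(0, structures.length - 3);
  return Math.round(base * 0.85 ** extra);
}

const classified = [
  { type: 'raised_bed', size: 'medium' },
  { type: 'raised_bed', size: 'large' },
  { type: 'bench', size: 'small' },
  { type: 'planter', size: 'small' },
];
console.log(catalogEstimate(classified)); // 276
```

Note how every number in the result hinges on the model's "small/medium/large" call, which is exactly the judgment it couldn't make consistently.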
The core problem: catalog pricing assumes a catalog. It works for Zillow because houses have known types (ranch, colonial, split-level) with decades of comparable sales data. It works for SimplyWise because construction projects map to standardized categories. Our AI generates unique designs every time — there's no catalog to classify against.
We never shipped this version. Instead, we went back to what actually worked — measurement — but with a crucial insight.
Attempt #3: Make the Product the Ruler
The research was right about one thing: you can't recover absolute dimensions from a single image without a reference point. But it was wrong about one assumption — that no reference point exists.
Our product has a built-in ruler.
The Brick modular system uses larch planks that are 120mm tall and 60mm thick. When stacked horizontally, each plank forms one visible layer — a distinct 12cm band in every generated image. This is a physical constant of the product. It's the same in every image, every design, every angle. And the image generator already knows about it — Agent 2 injects "thick planks (120mm tall × 60mm thick), stacked horizontally with joints offset row by row like brickwork" into every prompt where Brick structures appear, so the planks are rendered consistently.
With V1, we'd asked: "How many meters long is this bed?" — a question that requires solving the monocular ambiguity problem.
With V3, we ask: "How many plank layers do you see, and how many times longer is the wall compared to its height?" — questions that require only counting and estimating a proportion. Both are things vision models do well.
```
// The actual prompt in production (v3)
SCALE REFERENCE: Each horizontal plank layer = exactly 12cm
(0.12m) tall. Count layers to get the height, then estimate
length relative to the known height.

MEASURE each structure:
- layers: count visible horizontal plank layers (each = 12cm)
- length_ratio: how many times longer the wall is vs its height
- visible_faces: how many wall faces are visible

VERIFY: Typical gardens have 2-5 structures.
If you found >6, you likely overcounted.

Return JSON:
{"structures": [
  {"reasoning": "4 horizontal layers visible, wall extends about 3.5x the height, front and side visible",
   "type": "raised_bed",
   "layers": 4,
   "length_ratio": 3.5,
   "visible_faces": 2}
]}
```
The pricing engine does the arithmetic:
```javascript
const LAYER_HEIGHT_M = 0.12;
const PRICE_PER_M2 = 120;
const PRICE_PER_STAIR_STEP_M = 58; // stairs price separately: per step, per metre of width

function calculatePrice(s) {
  const height = s.layers * LAYER_HEIGHT_M; // 4 layers = 0.48m

  if (s.type === 'stairs') {
    // Stairs: per-step × width, not m² (vertical rises break the wall math)
    const width = s.length_ratio * LAYER_HEIGHT_M;
    // 6 steps × 1.2m × €58 = €418
    return Math.round(s.layers * width * PRICE_PER_STAIR_STEP_M);
  }

  const length = height * s.length_ratio; // 0.48m × 3.5 = 1.68m

  if (s.type === 'wall') {
    // Single-face wall: no opposite face, no sides
    // 1.68 × 0.48 × €120 = €97
    return Math.round(length * height * PRICE_PER_M2);
  }

  // raised_bed / planter / bench: closed box, perimeter × height.
  // depth_ratio is optional; without it, depth defaults to the height.
  const depth = height * (s.depth_ratio || 1.0);
  const perimeter = 2 * (length + depth);
  const totalM2 = perimeter * height; // 2 × (1.68 + 0.48) × 0.48 = 2.07 m²
  return Math.round(totalM2 * PRICE_PER_M2); // 2.07 × €120 = €249
}
```
A note on the math above: the original V3 shipped with faceArea × visible_faces — counting only the wall faces visible in the photo. In production we noticed this systematically undercounted by 20–50% on box-type structures because it ignored the back wall and one side. V3.2 replaced it with a perimeter-based formula for closed boxes, a simple length×height for single-face walls, and a per-step formula for stairs. Stairs needed their own track because layer-counting breaks down on vertical rises — a staircase's "wall area" is meaningless; you pay for blocks per step, not for visible faces.
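To see the size of that undercount, here is a sketch comparing the two formulas on the 4-layer, 3.5-ratio bed from the pricing code above. The V3.0 formula isn't shown in full here, so `areaV30` assumes `faceArea` meant length × height; `boxDims`, `areaV30`, and `areaV32` are hypothetical names. Under that assumption, two visible faces cover about 78% of the closed-box area, i.e. roughly 22% low, near the bottom of the 20–50% range.

```javascript
// Hypothetical reconstruction of V3.0 (assumption: faceArea = length × height)
// versus the V3.2 closed-box perimeter formula.
const LAYER_HEIGHT_M = 0.12;

function boxDims(layers, lengthRatio, depthRatio = 1.0) {
  const height = layers * LAYER_HEIGHT_M;
  return { height, length: height * lengthRatio, depth: height * depthRatio };
}

function areaV30(s) {
  const { height, length } = boxDims(s.layers, s.length_ratio);
  return length * height * s.visible_faces; // only the faces the photo shows
}

function areaV32(s) {
  const { height, length, depth } = boxDims(s.layers, s.length_ratio, s.depth_ratio);
  return 2 * (length + depth) * height; // all four walls of the box
}

const bed = { type: 'raised_bed', layers: 4, length_ratio: 3.5, visible_faces: 2 };
const shortfall = 1 - areaV30(bed) / areaV32(bed);
console.log(areaV30(bed).toFixed(2), areaV32(bed).toFixed(2), `${Math.round(shortfall * 100)}% low`);
// 1.61 2.07 22% low
```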
Why this works where V1 and V2 didn't:
- Counting is what vision models do well. Horizontal lines in stacked plank structures are high-contrast, repetitive visual features. Counting discrete layers is fundamentally different from estimating "how many meters" — it's pattern recognition, not spatial reasoning.
- Ratios are easier than absolutes. "This wall is about 3.5 times longer than it is tall" is a visual proportion judgment. The model doesn't need to know the absolute size — just the shape. This sidesteps the monocular ambiguity entirely.
- The scale reference is real. 12cm per layer isn't an assumption; it's a manufacturing spec baked into both the physical product and the image generation prompt. The generator renders the planks to that spec because its prompt says so, and the analyzer's prompt states the same 12cm, so both sides of the pipeline work from one number.
- Creative freedom is preserved. Unlike V2's catalog approach, we don't constrain what structures the generator can create. L-shapes, curves, integrated benches: anything goes. Layer counting works on any shape because it derives wall dimensions from layers and proportions, not from predefined categories.
- The AI observes, code calculates. We separated the task into what AI does well (visual pattern recognition) and what code does well (arithmetic). Neither does the other's job. The `reasoning` field forces the model to describe what it sees before giving numbers, which surfaces bad estimates in the logs and keeps outputs grounded.
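One way to keep that split honest is to treat the analyzer's JSON as untrusted observations and check them in code before any arithmetic runs. The sketch is hypothetical: `checkObservations` and its rules are assumptions (only the more-than-six-structures warning comes from the prompt), and it just returns a list of problems that could be logged next to each structure's `reasoning`.

```javascript
// Hypothetical guard: validate model observations before pricing them.
function checkObservations(result) {
  const problems = [];
  const structures = Array.isArray(result?.structures) ? result.structures : [];
  if (structures.length === 0) problems.push('no structures returned');
  if (structures.length > 6) problems.push(`possible overcount: ${structures.length} structures`);

  structures.forEach((s, i) => {
    if (typeof s.reasoning !== 'string' || s.reasoning.trim() === '') {
      problems.push(`#${i}: missing reasoning`);
    }
    if (!Number.isInteger(s.layers) || s.layers < 1) {
      problems.push(`#${i}: layers must be a positive integer, got ${s.layers}`);
    }
    if (!(typeof s.length_ratio === 'number' && s.length_ratio > 0)) {
      problems.push(`#${i}: length_ratio must be a positive number, got ${s.length_ratio}`);
    }
  });
  return problems;
}

const observed = JSON.parse(
  '{"structures":[{"reasoning":"4 horizontal layers visible","type":"raised_bed","layers":4,"length_ratio":3.5,"visible_faces":2}]}'
);
console.log(checkObservations(observed)); // []
```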
What Changed Between the Approaches
| | V1: Direct Measurement | V2: Catalog Classification | V3: Layer Counting |
|---|---|---|---|
| What we ask AI | "How many meters?" | "What type and size?" | "How many layers? What ratio?" |
| Anchor point | None (guessing) | Fixed catalog (constraining) | 12cm plank layer (physical) |
| Creative freedom | Full | Constrained (needs predefined types) | Full |
| Accuracy | ±20-25% (unpredictable) | Inconsistent (never shipped) | ±20% (predictable) |
| Price range | ±20% symmetric | Fixed lookup (no range) | ±20% symmetric |
| Model | Gemini 2.5 Pro (~$0.005) | Gemini 2.5 Flash (~$0.001) | Gemini 2.5 Flash (~$0.001) |
| Status | Worked, but abandoned after research | Never shipped — too constraining | In production |
A note on the price range. We show ±20% around the central estimate, a deliberately wide band that reflects the real uncertainty of measuring from a single photo. A 4-layer bed might come back as 3 or 5 layers; a length ratio of 3.0 might be 2.5 or 3.5. The math multiplies those errors, so the honest communication is "somewhere between X and Y," not a false-precision single number. When our human quote team takes over from the estimate, their final quotes typically land within the band.
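The band itself is simple arithmetic. A sketch, assuming the range is computed on the central estimate and rounded to whole euros (the rounding is our assumption); `priceRange` is a hypothetical name, and €249 is the raised-bed example from the pricing code earlier.

```javascript
// Hypothetical helper: a symmetric ±20% band around the central estimate.
function priceRange(centralEur, spread = 0.2) {
  return {
    low: Math.round(centralEur * (1 - spread)),
    high: Math.round(centralEur * (1 + spread)),
  };
}

const { low, high } = priceRange(249);
console.log(`€${low} to €${high}`); // €199 to €299
```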
From Generated Image to Price Estimate in 5 Seconds
Here's what happens after a user generates a garden design:
For registered users, the price estimate triggers automatically, with no button click needed. The generated image is resized to 1024px and sent to a second AI model (Gemini 2.5 Flash, configured for vision analysis at temperature 0.2, low enough to keep layer counts consistent between runs). This is a different model call than the one that generated the image: the generator creates, the analyzer measures.
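A minimal sketch of the analyzer call's inputs, assuming the image has already been resized and JPEG-encoded to base64 and that Google's Node `@google/genai` SDK is the client (the article doesn't name the library). `buildAnalyzerRequest` is hypothetical, and `responseMimeType` is our assumption rather than something stated above; check the field names against the SDK reference before relying on them.

```javascript
// Hypothetical request builder. The shape mirrors the { model, contents, config }
// parameters of generateContent() in @google/genai (assumption to verify).
function buildAnalyzerRequest(jpegBase64, prompt) {
  return {
    model: 'gemini-2.5-flash',
    contents: [
      {
        role: 'user',
        parts: [
          { text: prompt },
          { inlineData: { mimeType: 'image/jpeg', data: jpegBase64 } },
        ],
      },
    ],
    config: {
      temperature: 0.2,                     // low temperature for consistent counts
      responseMimeType: 'application/json', // ask for bare JSON (assumption)
    },
  };
}

// With the SDK, usage would look roughly like:
//   const response = await ai.models.generateContent(buildAnalyzerRequest(b64, PROMPT));
//   const observations = JSON.parse(response.text);
const req = buildAnalyzerRequest('BASE64_PLACEHOLDER', 'SCALE REFERENCE: ...');
console.log(req.config.temperature); // 0.2
```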
The analyzer returns a JSON with its reasoning for each structure: "4 horizontal layers visible, wall extends about 3.5× the height, front and side visible." Our code multiplies layers by 0.12m, applies the ratio, calculates m², and sums everything up.
The result appears directly below the generated image — a green panel with a per-structure breakdown table. Each row shows: structure type, dimensions (height × length), visible faces, wall area in m², and estimated price. The total shows X.XX m² × €120/m² with the price range in large text. No black box — users can see exactly how the estimate was calculated and judge for themselves whether the layer count looks right.
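A sketch of how the analyzer's JSON could be turned into those table rows and the total line. It handles only closed-box structures with the V3.2 perimeter formula (walls and stairs would need their own branches), the 3-layer planter is an invented example, and `breakdown` plus the row field names are hypothetical.

```javascript
// Hypothetical breakdown builder for closed-box structures only
// (raised beds, planters, benches).
const LAYER_HEIGHT_M = 0.12;
const PRICE_PER_M2 = 120;

function breakdown(structures) {
  const rows = structures.map((s) => {
    const height = s.layers * LAYER_HEIGHT_M;
    const length = height * s.length_ratio;
    const depth = height * (s.depth_ratio || 1.0);
    const areaM2 = 2 * (length + depth) * height;
    return {
      type: s.type,
      dims: `${height.toFixed(2)} m × ${length.toFixed(2)} m`,
      faces: s.visible_faces,
      areaM2: Number(areaM2.toFixed(2)),
      priceEur: Math.round(areaM2 * PRICE_PER_M2),
    };
  });
  const totalM2 = rows.reduce((sum, r) => sum + r.areaM2, 0);
  const totalEur = rows.reduce((sum, r) => sum + r.priceEur, 0);
  return { rows, summary: `${totalM2.toFixed(2)} m² × €${PRICE_PER_M2}/m² ≈ €${totalEur}` };
}

const { rows, summary } = breakdown([
  { type: 'raised_bed', layers: 4, length_ratio: 3.5, visible_faces: 2 },
  { type: 'planter', layers: 3, length_ratio: 1.0, visible_faces: 2 },
]);
console.table(rows);
console.log(summary); // 2.59 m² × €120/m² ≈ €311
```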
Simultaneously, an email arrives with the same breakdown plus the garden image. If the user doesn't respond within 3 days, a single reminder follows: "Still thinking about your garden?" with a one-click button to request an exact quote from a human. Everything after image generation (the price estimate, the arithmetic, and the email) costs about $0.001 per design.
The Economics: $0.134 Per Image, $0.001 Per Price Quote
Building an AI-powered tool is one thing. Making it economically sustainable is another. Here's what the numbers actually look like.
Image generation costs $0.134 per image. We use Gemini's Pro image model — the most expensive tier. We tried the cheaper Flash model early on. The output quality wasn't good enough: textures looked flat, wood grain was inconsistent, the Brick plank proportions drifted. For a tool where visual quality is the product, saving 60% on generation cost while producing images that don't look convincing wasn't a trade-off worth making. Pro only, no fallback.
Price estimation costs $0.001 per quote. Here the calculus is reversed: we use Gemini 2.5 Flash for the vision analysis. Counting plank layers and estimating proportions doesn't require the same model that generates photorealistic images. Flash handles counting tasks reliably at a fraction of the cost. Choosing the right model for each task (Pro where visual quality matters, Flash where the job is narrow and well-defined) is the difference between a sustainable and an unsustainable product.
A typical user session looks like this:
| Step | Model | Cost |
|---|---|---|
| Generate garden designs (typical session: 2 images × $0.134) | Gemini Pro (image) | $0.268 |
| Price estimate | Gemini 2.5 Flash (vision) | $0.001 |
| Pricing calculation + email | Node.js (no API call) | $0.000 |
| Total per session | | ~$0.27 |
Every user gets 1 free generation without registration. Providing an email unlocks 3 more (4 total per session). Beyond that, users purchase credit packs — 3 images for €1 up to 50 for €10. At $0.134 per generation, the margins work out to roughly 40-60% depending on the pack size.
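The margin range follows from simple arithmetic. The sketch reproduces it for the smallest and largest packs, assuming 1.08 USD per EUR and ignoring payment fees and taxes (both assumptions; neither is stated above). Under those assumptions it lands at roughly 38% to 63%.

```javascript
// Rough gross margin per credit pack, before payment fees and taxes.
// EUR_TO_USD is an assumed exchange rate, not a figure from the article.
const COST_PER_IMAGE_USD = 0.134;
const EUR_TO_USD = 1.08;

function packMargin(images, priceEur) {
  const revenueUsd = priceEur * EUR_TO_USD;
  const costUsd = images * COST_PER_IMAGE_USD;
  return (revenueUsd - costUsd) / revenueUsd;
}

for (const [images, priceEur] of [[3, 1], [50, 10]]) {
  console.log(`${images} images for €${priceEur}: ${Math.round(packMargin(images, priceEur) * 100)}%`);
}
// 3 images for €1: 63%
// 50 images for €10: 38%
```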
The price estimate itself is always free: at $0.001 per quote, gating it behind a paywall would cost more in lost engagement than it saves in API fees. And the pricing math (layers × 0.12 m for height, height × ratio for length, wall area × €120/m², or a per-step rate for stairs) runs entirely in our code with zero API calls. Once Gemini Flash returns the layer counts, everything else is deterministic arithmetic.
We also optimize input costs at every step. User-uploaded photos are preprocessed with Sharp — resized to max 2048px and stripped of EXIF data before hitting the API. For price quote analysis, the generated image is further compressed to 1024px JPEG. A small pool of product reference photos is cached locally and Agent 2 picks 0–2 of them per scene (not always attached — meadow-only scenes get zero references). The generation prompt is kept under 150 words — above 200, the image model starts ignoring parts of the instruction.
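The "resize to max N px" step is a fit-inside computation. A minimal sketch of the target dimensions, assuming a 4032 × 3024 phone photo as input; `fitWithin` is a hypothetical helper. In Sharp, the equivalent is `resize({ width: N, height: N, fit: 'inside', withoutEnlargement: true })`, and Sharp drops EXIF and other metadata from its output unless you explicitly ask to keep it.

```javascript
// Output size for "resize to a maximum side of maxSide px",
// keeping the aspect ratio and never upscaling.
function fitWithin(width, height, maxSide) {
  const scale = Math.min(1, maxSide / Math.max(width, height));
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

console.log(fitWithin(4032, 3024, 2048)); // { width: 2048, height: 1536 }
console.log(fitWithin(2048, 1536, 1024)); // { width: 1024, height: 768 }
console.log(fitWithin(800, 600, 2048));   // { width: 800, height: 600 }
```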
The Business Model: We Lose Money on Generation. That's the Point.
Let's be honest about the economics. Most users generate 2-5 images using their free allowance and never buy a credit pack. The few who do buy credits don't come close to covering the total API costs for all users. On pure generation revenue, we're operating at a loss.
That's intentional. The AI Garden Designer isn't a product — it's a funnel.
Here's what we actually get from a user who generates a garden design and enters their email:
- A warm lead with purchase intent. Someone who uploads a photo of their garden, generates a design with raised beds, and reviews a price estimate is not a casual browser. They're actively considering a garden project. That's qualitatively different from someone who clicked an ad.
- A personalized price anchor. The user now has a specific number in their head — "my garden would cost around €350." That's far more effective than a generic product page listing plank prices per piece.
- A visual they've already fallen in love with. They generated the design themselves. They chose the density, the style, the arrangement. There's ownership in that image that no catalog photo can match.
The email sequence reinforces this. Immediately after generating a design, the user receives a price quote email with their garden image embedded — the specific design they created, not a stock photo. The email includes a per-structure breakdown (type, wall area, estimated price) and a prominent button to request an exact quote from a human.
If they don't respond within three days, a single reminder arrives: "Still thinking about your garden?" — same image, same price range, same one-click button. Just one reminder, not a drip campaign. We want to be helpful, not annoying.
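The "one reminder, then stop" policy boils down to a small eligibility check that a scheduled job could run. The field names (`quoteEmailSentAt`, `requestedQuote`, `reminderSentAt`, `unsubscribed`) and the periodic-job setup are assumptions for illustration, not the production schema.

```javascript
// Hypothetical rule for the single follow-up email.
const REMINDER_DELAY_MS = 3 * 24 * 60 * 60 * 1000; // three days

function shouldSendReminder(lead, now = Date.now()) {
  // Never remind someone who already asked for a quote, was already reminded, or unsubscribed.
  if (lead.requestedQuote || lead.reminderSentAt || lead.unsubscribed) return false;
  return now - lead.quoteEmailSentAt >= REMINDER_DELAY_MS;
}

const sentAt = Date.parse('2025-05-01T10:00:00Z');
const lead = { quoteEmailSentAt: sentAt, requestedQuote: false, reminderSentAt: null, unsubscribed: false };
console.log(shouldSendReminder(lead, sentAt + 2 * 86400000)); // false (only two days)
console.log(shouldSendReminder(lead, sentAt + 3 * 86400000)); // true
console.log(shouldSendReminder({ ...lead, reminderSentAt: sentAt }, sentAt + 10 * 86400000)); // false
```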
Below the generated design on the website, there are always two CTAs: a link to the 3D Configurator where they can spec out exact dimensions, and a link to browse the e-shop. The journey from "I wonder what my garden could look like" to "I'm configuring my order" can happen in a single session.
On privacy: email submission is always accompanied by a link to our privacy policy and a clear note that users can unsubscribe anytime. The price quote email is transactional — the user explicitly requested a price estimate. Marketing emails (newsletter) require a separate explicit opt-in checkbox. We store only what's needed: email, locale, the design image, and the price breakdown. GDPR compliance isn't just a legal requirement — it's the only way to build trust with people who are giving you their contact details alongside a photo of their home.
The Bigger Lesson: Ask AI to Observe, Not to Answer
The mistake in V1 wasn't using AI for spatial tasks — it was asking the model to produce the final answer directly. "How many meters long is this?" requires the model to solve monocular ambiguity, convert visual features to physical units, and produce a calibrated number. That's three hard problems stacked together.
V3 breaks it into pieces. "How many horizontal layers?" is a counting task — one of the most reliable things vision models do. "How many times longer than tall?" is a proportion estimation — also reliable, because ratios are scale-invariant. The conversion from layers to meters, and from ratios to absolute dimensions, is deterministic code with a known physical constant.
The same principle applies beyond our use case:
- Don't ask "how tall is this building?" — ask "how many floors?" and multiply by standard floor height.
- Don't ask "how wide is this room?" — ask "how many tiles across?" and multiply by tile size.
- Don't ask "how long is this fence?"; ask "how many posts?" and multiply the number of gaps (posts minus one) by the standard spacing.
If your product or scene contains any repeated, visible, dimensionally consistent element, you already have a ruler. You don't need the AI to measure — you just need it to count.
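The same count-times-unit pattern carries over directly. A sketch in which the floor height, tile size, and post spacing are placeholder values we assumed, not real standards; only the 12cm plank layer comes from this article. Note the fence case, where n posts in a straight run enclose only n − 1 gaps.

```javascript
// "Count, then multiply by a known unit." Unit sizes are illustrative
// assumptions except PLANK_LAYER_M, the Brick layer height from this article.
const FLOOR_HEIGHT_M = 3.0;
const TILE_SIZE_M = 0.6;
const POST_SPACING_M = 2.4;
const PLANK_LAYER_M = 0.12;

const buildingHeight = (floors) => floors * FLOOR_HEIGHT_M;
const roomWidth = (tilesAcross) => tilesAcross * TILE_SIZE_M;
// n posts in a straight run bound n - 1 gaps, so subtract one before multiplying.
const fenceLength = (posts) => Math.max(0, posts - 1) * POST_SPACING_M;
const bedHeight = (layers) => layers * PLANK_LAYER_M;

console.log(buildingHeight(5).toFixed(1)); // 15.0
console.log(roomWidth(7).toFixed(1));      // 4.2
console.log(fenceLength(10).toFixed(1));   // 21.6
console.log(bedHeight(4).toFixed(2));      // 0.48
```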
Try It Yourself
Upload a photo of your garden, let the AI design it with modular raised beds, and get an instant price estimate. The whole process takes about 30 seconds. The design and the price estimate are free.
Frequently Asked Questions
How accurate are AI-generated price estimates from garden images?
Our system achieves approximately ±20% accuracy around the central estimate. We show the range explicitly so users see honest uncertainty instead of false-precision single numbers — human quotes from our team typically land inside that band.
What AI model is used for the price estimation?
We use Google's Gemini 2.5 Flash for vision analysis. Each estimate costs approximately $0.001 (one-tenth of a cent). We switched from the more expensive Gemini 2.5 Pro after finding that Flash performs comparably for our specific use case of counting structural layers.
Can AI really measure dimensions from a single photo?
Not directly. Research shows AI vision models' absolute measurements fall outside 0.5× to 2× of the true value about 63% of the time. Our approach sidesteps this by using the product's own structure (12cm plank layers) as a built-in scale reference. The AI counts layers and estimates proportions, then our code does the math.
Why not use GPT-4 Vision instead of Gemini?
Gemini Flash is approximately 4× cheaper with comparable spatial reasoning performance for our specific use case. Since we're making one API call per estimate, cost per call matters — at $0.001 each, we can offer unlimited free estimates.
Can this approach work for other products?
Yes — if your product has any known, visible, dimensionally consistent feature that appears in images. Brick courses in masonry, floor tiles, standard lumber widths, cinder blocks — anything with a fixed real-world dimension that AI can count can serve as a scale reference.
Is the price estimate a binding quote?
No, it's an indicative estimate to help you plan. You can request an exact quote with one click — a human reviews the design and provides a precise price within 24 hours.