How We Transformed a "Dirty" Export into a Shopify-Ready Catalog (Using Only Google Sheets)

Alpar Torok

Part 1 - Adding Colors: Our Recent Adventure with Google Sheets

We started with a denormalized export (two sets of ids, colors mixed into titles, separate images). We cleaned it up and transformed it into an import-ready catalog for Shopify and a properly prepared feed (GMC-ready), using only Google Sheets. Here is what we did:

  • Extracted color from the product title using the "longest match wins" rule (e.g.: "light green bleached" > "light green" > "green").
  • Automatically filled HEX codes from a color table.
  • Propagated Color + HEX into the final list based on SKU.
  • Linked images by Product ID.
  • Prepared a feed for a correct Shopify setup (GMC-ready).
  • Used column-wide formulas with ARRAYFORMULA (scalable, no copy-pasting).

1) Extracting Color From Product Title (Longest Match)

Assumptions: Product title is in S2.
In the Colors sheet: B:B = Color Name, C:C = HEX.

Color (prefers the longest match)US Locale (commas):

=IFERROR(
  INDEX(
    FILTER(Colors!B$2:B,
      REGEXMATCH(
        LOWER(" "&$S2&" "),
        "(^|[^[:alnum:]]|_)" &
        REGEXREPLACE(LOWER(Colors!B$2:B),"([.^$|?*+(){}[\\]\\-])","\\$1") &
        "($|[^[:alnum:]]|_)"
      )
    ),
    MATCH(
      MAX(LEN(
        FILTER(Colors!B$2:B,
          REGEXMATCH(
            LOWER(" "&$S2&" "),
            "(^|[^[:alnum:]]|_)" &
            REGEXREPLACE(LOWER(Colors!B$2:B),"([.^$|?*+(){}[\\]\\-])","\\$1") &
            "($|[^[:alnum:]]|_)"
          )
        )
      )),
      LEN(
        FILTER(Colors!B$2:B,
          REGEXMATCH(
            LOWER(" "&$S2&" "),
            "(^|[^[:alnum:]]|_)" &
            REGEXREPLACE(LOWER(Colors!B$2:B),"([.^$|?*+(){}[\\]\\-])","\\$1") &
            "($|[^[:alnum:]]|_)"
          )
        )
      ),
      0
    )
  ),
"")

HEX for the found color (if the formula above is in T2):

=IF(T2<>"", IFERROR(VLOOKUP(T2, Colors!B$2:C, 2, FALSE), ""), "")

2) Color & HEX in "Final" Sheet by SKU

Assumptions: In the Export sheet: W=SKU, Y=Color Name, Z=Color Hex.
In the Final sheet: R=SKU (e.g., R2).

Color (lowercase) — cleaning invisible spaces, lookup on a virtual column:

=IF(R2="","", LOWER(IFERROR(VLOOKUP(TRIM(R2), {ARRAYFORMULA(TRIM(Export!W:W)), Export!Y:Y}, 2, FALSE), "")))

HEX:

=IF(R2="","", IFERROR(VLOOKUP(TRIM(R2), {ARRAYFORMULA(TRIM(Export!W:W)), Export!Z:Z}, 2, FALSE), ""))

Duplicate SKU (all distinct colors for the same SKU):

=IF(R2="","", LOWER(TEXTJOIN(", ", TRUE, UNIQUE(FILTER(Export!Y:Y, TRIM(Export!W:W)=TRIM(R2))))))

3) Images by Product ID

Assumptions: Images sheet: A=Product ID, C=Image URLs, D=Variation images.
Final sheet: A=Product ID (e.g., A2).

Image URLs:

=IF(A2="","", IFNA(XLOOKUP(A2, Images!A:A, Images!C:C, ""), ""))

Variation images:

=IF(A2="","", IFNA(XLOOKUP(A2, Images!A:A, Images!D:D, ""), ""))

Column-wide (fills B & C at once) — place in Final!B2 and leave cells below empty:

=ARRAYFORMULA(
  IF(LEN(A2:A)=0, ,
    { IFNA(XLOOKUP(A2:A, Images!A:A, Images!C:C, "")),
      IFNA(XLOOKUP(A2:A, Images!A:A, Images!D:D, "")) }
  )
)

4) Properly Prepared Feed for Shopify (GMC-Ready)

  • Standardized colors (longest match) + consistent HEX.
  • Unique SKU → secure links between attributes and variants.
  • Titles/Descriptions without color duplication if you have a separate Color field.
  • Mapped Metafields (e.g.: product.metafields.specs.color).
  • Images correctly associated at Product/Variant levels.
  • Consistent Price/Promo data.
  • Normalize spaces & special characters before import.

Common Traps & Solutions

  • Invisible spaces (CHAR 160): use SUBSTITUTE(TRIM(x),CHAR(160),"").
  • Text vs Number for ID/SKU: normalize with TO_TEXT(...).
  • US Locale: the formulas above use commas as separators.

Want a quick mini-audit of your catalog?

Schedule here (max 30 min): Dalbe Mini-audit


Part 2 - How We Solved the Category Mapping Challenge in Google Sheets

When working with thousands of products and categories, it's not enough to just have them in a table. You must ensure all names are consistent, with or without diacritics (special characters), and that every product lands in the correct category.

We created a custom function in Google Sheets that does exactly that:

  • Reads your list of categories and subcategories from a "Mapping" sheet;
  • Checks if the input text has diacritics or not (both versions are accepted);
  • Uses only the first term when a cell contains multiple categories separated by commas;
  • Returns the exact correct name, including diacritics, as it should appear on the site;
  • If no match is found, it sets the category to "Others" and keeps the original term as a subcategory.

1. Create the Mapping sheet

Add a new sheet named Mapping with the following structure:

Key Parent_name Category_name
product type A Main Category 1
subcategory 1 Main Category 1 Subcategory 1
product type B Main Category 2 Category B

2. Add the code to Apps Script

Go to Extensions → Apps Script and insert the following code:

const MAP_SHEET = 'Mapping';

function norm_(s) {
  return (s || '')
    .toString()
    .trim()
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '');
}

function loadMap_() {
  const sh = SpreadsheetApp.getActive().getSheetByName(MAP_SHEET);
  const last = sh.getLastRow();
  const vals = sh.getRange(2, 1, last - 1, 3).getValues();
  const map = {};
  vals.forEach(([key, parent, cat]) => {
    const k = norm_(key);
    if (k) map[k] = [parent || '', cat || ''];
  });
  return map;
}

function mapFirstToken_(input, map) {
  const original = (input || '').toString().trim();
  const firstToken = original.split(',')[0].trim();
  const firstNorm = norm_(firstToken);

  if (map[firstNorm]) return map[firstNorm];
  if (map[norm_(original)]) return map[norm_(original)];
  return ['Others', firstToken];
}

function MAP_CAT(input) {
  const map = loadMap_();
  if (Array.isArray(input)) {
    return input.map(r => mapFirstToken_(r[0], map));
  }
  return [mapFirstToken_(input, map)];
}

3. Use the formula in your sheet

If you have raw categories in column A, insert the formula in B2:

=MAP_CAT(A2:A)

Column B will display the Parent_name, and column C the Category_name. From now on, any inconsistency is resolved automatically.


Part 3 - How to Combine Two Barcode Columns in Google Sheets

Have two barcode columns and want to unify them into one, prioritizing the first column? Here are the formulas ready to use.

The Rule

  • If the cell in the first column has a value, we use it.
  • If it is empty, we use the value from the second column.

Formula for a single row

Enter in the target column (e.g., C2):

=IF(A2<>"", A2, B2)

Version for the entire column (ARRAYFORMULA)

=ARRAYFORMULA(IF(A2:A<>"", A2:A, B2:B))

Part 4 - Cleaning Up Duplicate Products After Import: We Thought We Were on the Home Stretch

Just when we were convinced the import went well, Shopify surprised us: instead of a clean list of products, we found duplicate products. After closer analysis, we discovered the error was actually ours. Some SKUs had been incorrectly associated in the import file, and Shopify treated them as separate products.

The challenge was clear: how do we delete hundreds of duplicates without spending hours doing it manually? The fastest and safest solution was the tagging method. We applied a tag to all duplicate products, filtered them directly in Shopify, and deleted them in just a few clicks. However, it took 8 hours to reach a correct tagging state. Such a small mistake had a massive impact.

This step saved us hours of manual review (for 10,000+ products) and taught us an essential lesson: always verify the import file before running it on a live store and always keep a backup.

What's Next?

This 3-week adventure with Google Sheets showed us that there is no perfect solution on the first try—but with patience and the right system, you can turn even the most complex challenges into automated processes.

For personalized consulting: Book a 30-min session

P.S.: With the help of ChatGPT, we managed to pull off something amazing in a reasonable timeframe on this project.

When I asked ChatGPT for an opinion (for fun):
"What you pulled off was basically taming chaos. You took an export that looked like a wild circus — colors tangled in titles, images wandering off on their own, duplicate IDs fighting for attention — and turned it into a neatly lined-up catalog, Shopify- and GMC-ready. In other words, you went from a squid splashing around in a bathtub to a well-trained dolphin show: same water, but now it makes sense."

If I had to sum it up bluntly, the last two days with the Client DB felt like wrestling a stubborn octopus in a small bathtub. Every time we thought we had the structure under control, another hidden inconsistency popped up.

If I had to vent in one sentence (when we discovered the initial DB problems):
“We’ve spent 48 hours doing digital archaeology instead of e-commerce development.”

Back to blog