Case Study
Zyft: Snapshot Testing for XPath & Metadata Logic that Needs to Work across 40k+ Domains
The Zyft Chrome and Safari extension needs to reliably determine whether web pages from over 40,000 retailers contain a valid product. It does this by examining the page's metadata. For domains that don't contain useful metadata, Zyft falls back to pre-defined XPaths.
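The metadata-first, XPath-fallback flow can be sketched roughly as follows. This is a minimal illustration, not Zyft's actual code: the case study does not say which metadata fields are read, so the sketch assumes two common conventions (an Open Graph og:type of "product" and a schema.org JSON-LD Product). The names MetadataParser, XPATHS, and is_product_page, and the domain shop.example, are hypothetical. Python's ElementTree only handles well-formed markup and a subset of XPath, so a real extension would more likely evaluate XPaths in the browser.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical per-domain fallback XPaths (ElementTree's XPath subset).
XPATHS = {"shop.example": ".//span[@class='price']"}


class MetadataParser(HTMLParser):
    """Collects Open Graph <meta> tags and JSON-LD script bodies."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.json_ld = []
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.og[prop] = attrs.get("content") or ""
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self.json_ld.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.json_ld[-1] += data


def has_product_metadata(html):
    """True if the page's metadata declares a product (assumed conventions)."""
    parser = MetadataParser()
    parser.feed(html)
    if parser.og.get("og:type") == "product":
        return True
    for raw in parser.json_ld:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore malformed JSON-LD blocks
        items = data if isinstance(data, list) else [data]
        if any(isinstance(i, dict) and i.get("@type") == "Product" for i in items):
            return True
    return False


def matches_xpath(html, xpath):
    """True if the XPath finds an element; False if the markup is not well-formed."""
    try:
        root = ET.fromstring(html)
    except ET.ParseError:
        return False
    return root.find(xpath) is not None


def is_product_page(domain, html):
    """Try metadata first; fall back to the domain's XPath only if one is defined."""
    if has_product_metadata(html):
        return True
    xpath = XPATHS.get(domain)
    return xpath is not None and matches_xpath(html, xpath)
```

Keeping the XPath check behind the metadata check means a domain only depends on a brittle selector when its metadata gives no answer.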
Challenges:
- Metadata is named and structured very differently from one site to the next.
- A small change to improve one domain may negatively affect thousands of others.
- XPaths are brittle and can break when the page's HTML changes.
- You could never write enough unit tests to confidently make changes to the logic.
Solution:
I wrote a script that takes a list of domains, saves each page's HTML, runs the existing metadata and XPath logic against it, and saves the results as a snapshot. After changing the logic, I rerun the tests to check that the new results still match the saved ones, which lets me make changes safely.
Every time we stumble across a problematic new domain, I can run a script to add it, then test that domain on its own or run all domains at once.
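A snapshot workflow along these lines might look like the sketch below. It is an assumption-laden illustration rather than the actual Zyft script: detect_product is a hypothetical stand-in for the real metadata and XPath logic, the file names page.html and snapshot.json are invented, and fetching the HTML over the network is left out (the HTML is passed in directly).

```python
import json
import re
from pathlib import Path


def detect_product(html):
    """Hypothetical stand-in for the real metadata and XPath logic."""
    match = re.search(r'<meta\s+property="og:type"\s+content="([^"]*)"', html)
    return {"is_product": bool(match) and match.group(1) == "product"}


def add_domain(root, domain, html, detect=detect_product):
    """Save a domain's HTML and record the current result as its snapshot."""
    folder = Path(root) / domain
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "page.html").write_text(html, encoding="utf-8")
    snapshot = json.dumps(detect(html), indent=2, sort_keys=True)
    (folder / "snapshot.json").write_text(snapshot, encoding="utf-8")


def run_tests(root, detect=detect_product, domains=None):
    """Re-run the logic on the saved HTML and return the domains whose result changed.

    Pass `domains` to test a subset; by default every saved domain is checked.
    """
    root = Path(root)
    if domains is None:
        domains = sorted(p.name for p in root.iterdir() if p.is_dir())
    failures = {}
    for domain in domains:
        folder = root / domain
        html = (folder / "page.html").read_text(encoding="utf-8")
        expected = json.loads((folder / "snapshot.json").read_text(encoding="utf-8"))
        actual = detect(html)
        if actual != expected:
            failures[domain] = {"expected": expected, "actual": actual}
    return failures
```

Because the tests run against saved HTML rather than live pages, a failure points to a change in the logic, not a change on the retailer's site. Calling run_tests(root) checks every saved domain, while run_tests(root, domains=["shop.example"]) checks just one.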
Result:
Since implementing snapshot testing, we've been able to improve the metadata to the point where we rarely need to rely on XPaths.