e55d7df1

By: Tom Sydney Kerckhove <syd@cs-syd.eu>

Deduplicate pages by canonical URL for title/description checks

When multiple URLs redirect to the same page (e.g. auth-required
account pages all redirecting to /auth/login), seocheck stores each
as a separate result keyed by the original request URL. Since they
all render the same login page, the duplicate title/description
check wrongly flags them as violations.

Fix: use the canonical URL (from <link rel="canonical">) as the
page identity when checking for duplicates. Pages that share the
same canonical URL are the same page and should not be counted as
separate entries.
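
The idea can be sketched as follows. This is an illustrative Python sketch, not seocheck's actual code: the names PageResult, page_identity and duplicate_titles are hypothetical, and falling back to the request URL when a page has no canonical link is an assumption the commit message does not state.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageResult:
    """Hypothetical crawl result for one requested URL."""
    request_url: str               # URL the crawler originally requested
    canonical_url: Optional[str]   # href of <link rel="canonical">, if any
    title: str


def page_identity(page: PageResult) -> str:
    # Assumption: without a canonical link, the request URL identifies the page.
    return page.canonical_url or page.request_url


def duplicate_titles(pages):
    """Return titles used by more than one distinct page identity."""
    identities_by_title = defaultdict(set)
    for page in pages:
        # A set collapses results that share the same canonical URL.
        identities_by_title[page.title].add(page_identity(page))
    return {
        title: sorted(identities)
        for title, identities in identities_by_title.items()
        if len(identities) > 1
    }


example_pages = [
    # Two auth-required URLs that both redirect to the login page.
    PageResult("/account/settings", "/auth/login", "Log in"),
    PageResult("/account/billing", "/auth/login", "Log in"),
    # Two genuinely different pages that share a title.
    PageResult("/about", "/about", "About"),
    PageResult("/about-us", "/about-us", "About"),
]

print(duplicate_titles(example_pages))
```

With this grouping, the two account URLs collapse into the single identity /auth/login and are no longer reported, while the two distinct "About" pages still are. The same grouping would apply to descriptions.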

Suite timing

        Time to Start  Worker time  Duration  Time to finish
Config  0s             2s           2s        2s
Eval    0s             1m53s        1m53s     1m54s
Build   1m40s          1m41s        58s       2m39s
Test    -              -            -         -
Deploy  -              -            -         -
Suite   0s             3m37s        2m38s     2m39s
