Unsolved Why does Moz Crawler start with the HTTP/1.1 version?
-
We have run Moz Pro's Site Crawl for example-domain.com
Why has Moz's crawler decided to cite http://www.example-domain.com (i.e. the non-secure version) as the zero crawl depth page and report the actually live https://www.example-domain.com (i.e. the secure HTTP/2 version) as a crawl depth of 2?
Surely the main live page should be the first crawled and reported as crawl depth = 0?
-
Hope you're doing well. I have a question regarding the behavior of the Moz crawler and a concern related to crawling issues on my website.
I've observed that the Moz crawler initiates its sessions with the HTTP/1.1 version. Could you please provide insights into why this is the case? I'm curious about whether this is a standard protocol for the Moz crawler and if there are any implications for website owners.
Additionally, I'm experiencing crawling issues with my website, CFMS BILL STATUS. Despite my efforts to optimize crawlability, I continue to encounter challenges. The website seems to have difficulty being effectively crawled by Moz.
Could you offer guidance or suggestions on how to address these crawling issues and ensure that CFMS BILL STATUS is properly indexed by the Moz crawler?
-
I'll provide a comprehensive response to the prompt, aiming for clarity and incorporating insights from Moz's best practices:
Understanding Crawl Depth and the Issue:
Crawl depth refers to the number of clicks (or links) it takes a crawler to reach a specific page from the starting point (usually the homepage).
In this case, Moz's crawler is reporting a crawl depth of 0 for the non-secure HTTP version of the homepage (http://www.example-domain.com), while the secure HTTPS version (https://www.example-domain.com) has a crawl depth of 2. This discrepancy suggests a potential issue with how the site is configured or how the crawler is interpreting it.
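As a rough illustration of the definition above, crawl depth can be modelled as a breadth-first walk from the starting URL, where every hop adds one level. This is only a sketch: treating a redirect as a hop is an assumption about how a crawler might count, and the hop graph (including the non-www URL) is hypothetical, chosen to show one way the https homepage could end up at depth 2.

```python
from collections import deque


def crawl_depths(start_url, hops):
    """Breadth-first walk: depth is the number of hops from start_url."""
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in hops.get(url, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


# Hypothetical hop graph: each redirect or link counts as one hop (assumption).
hops = {
    "http://www.example-domain.com": ["https://example-domain.com"],
    "https://example-domain.com": ["https://www.example-domain.com"],
    "https://www.example-domain.com": ["https://www.example-domain.com/about"],
}

depths = crawl_depths("http://www.example-domain.com", hops)
for url, depth in depths.items():
    print(depth, url)
```

If the crawl were seeded with the https URL instead, that page would be depth 0 and the http version would not appear as the starting point.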
Potential Reasons for the Discrepancy:
- Redirect Configuration: If http://www.example-domain.com redirects to https://www.example-domain.com, the crawler might initially treat the non-secure version as the starting point (crawl depth = 0) and the secure version as a secondary page (crawl depth = 2).
- Canonical Tags: If the canonical tag on https://www.example-domain.com points to http://www.example-domain.com, Moz might prioritize the non-secure version.
- Sitemap and Internal Linking: Ensure your sitemap lists the https version of URLs and that internal links use https URLs consistently.
- Crawler Settings: Some tools allow specifying which version (http or https) to prioritize. Check for such settings in Moz Pro.
- Historical Data: If the site recently migrated from http to https, historical data might influence crawl behavior.
Resolving the Issue:
- Review Redirects: Ensure redirects are set up correctly to prioritize https.
- Check Canonical Tags: Verify that canonical tags point to the https version.
- Update Sitemap and Internal Links: Use https URLs consistently.
- Adjust Crawler Settings: If possible, prioritize https in Moz Pro's settings.
- Contact Moz Support: If the issue persists, seek guidance from Moz support.
-
@AKCAC When using Moz Pro's Site Crawl for your website and encountering a situation where the non-secure (http) version of your domain is reported as having a crawl depth of zero, while the secure (https) version shows a greater crawl depth, there are several potential reasons and implications to consider:
-
Redirect Configuration: The most common reason for this is how redirects are set up on your site. If http://www.example-domain.com is the primary address that Moz encounters due to your server's configuration, and it redirects to https://www.example-domain.com, Moz might initially treat the non-secure version as the starting point (crawl depth = 0) and the secure version as a secondary page (thus a greater crawl depth).
-
Canonical Tags: Check your canonical tags. If the canonical tag on your https pages points to the http version, Moz (and other search engines) might treat the http version as the primary page.
-
Sitemap and Internal Linking: Ensure that your sitemap lists the https version of your URLs and that internal linking on your site uses https URLs. If your internal links or sitemap reference the http version, crawlers may initially prioritize these.
-
Crawler Settings: In some tools, including Moz, you can specify which version of the site (http or https) to prioritize in a crawl. Check if such a setting is influencing the crawl behavior.
-
Historical Data: If your site recently migrated from http to https, and Moz has historical data from previous crawls, it might temporarily reflect the older structure until it fully updates its index with the new configuration.
-
DNS and Server Configuration: Verify your DNS and server settings to ensure that they correctly redirect all http traffic to https and that the https version is set as the primary endpoint.
-
Robots.txt File: Make sure your robots.txt file doesn't unintentionally block or deprioritize https URLs.
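To check the robots.txt point without waiting for a re-crawl, the rules can be tested locally with Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt content is a made-up sample, and the user agent name `rogerbot` is assumed to be the one Moz's site crawler identifies as, so confirm it against Moz's own documentation.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up sample rules, for illustration only.
sample_robots = """\
User-agent: *
Disallow: /private/
"""

# "rogerbot" is an assumed user agent name for Moz's crawler.
home_ok = is_allowed(sample_robots, "rogerbot", "https://www.example-domain.com/")
private_ok = is_allowed(
    sample_robots, "rogerbot", "https://www.example-domain.com/private/page"
)
print(home_ok, private_ok)
```

Note that robots.txt rules match on the URL path, so a rule cannot single out https over http within one file; each scheme and host serves its own robots.txt.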
Steps to Resolve the Issue:
- Ensure Consistent Redirects: All http URLs should 301 redirect to their https counterparts.
- Update Canonical Tags: Canonical tags on all pages should point to the https versions.
- Verify Sitemap and Internal Links: Both should consistently use and reference https URLs.
- Re-crawl the Site: After making changes, re-run the Moz Site Crawl.
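The redirect step in the list above can be verified by following the chain one hop at a time and recording each status code. The sketch below keeps the HTTP call behind a `fetch` callable so it can be tried offline; `trace_redirects`, `fetch`, and the canned responses are all hypothetical names and data, not part of any Moz tooling.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_hops=10):
    """Follow redirects hop by hop and return [(url, status), ...].

    fetch(url) must return (status_code, location_header_or_None)
    without following redirects itself.
    """
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return chain


# Canned responses standing in for a real server (illustration only).
fake_responses = {
    "http://www.example-domain.com/": (301, "https://www.example-domain.com/"),
    "https://www.example-domain.com/": (200, None),
}

chain = trace_redirects("http://www.example-domain.com/", fake_responses.get)
for hop_url, status in chain:
    print(status, hop_url)
```

A healthy setup shows a single 301 from the http URL straight to the final https URL. Extra hops, or a 302 in place of the 301, are the kind of thing worth fixing before re-crawling. For live checks, `fetch` could wrap a HEAD request made with the standard `http.client` module.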
-
-
Moz Crawler, like many web crawlers, typically starts with HTTP/1.1 because it is a widely accepted and supported protocol for communication between web clients and servers. HTTP/1.1 was the latest version of the HTTP protocol at the time of Moz Crawler's implementation, offering improvements over its predecessor, HTTP/1.0. It provides features such as persistent connections, chunked transfer encoding, and the ability to pipeline multiple requests, which make data transmission more efficient. Starting with HTTP/1.1 lets Moz Crawler use these features when talking to web servers.
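For context, over TLS the HTTP version is usually settled during the handshake through ALPN: the client offers a list of protocols and the server picks one. A client that offers only `http/1.1` will be served HTTP/1.1 even by a server that supports HTTP/2. The sketch below uses Python's standard `ssl` module to see what a server selects when both are offered. Whether Moz's crawler offers `h2` is not established in this thread, and the `describe` helper is a hypothetical convenience.

```python
import socket
import ssl


def negotiated_protocol(host, port=443, timeout=5.0):
    """Offer h2 and http/1.1 via ALPN and return the server's choice."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # Returns "h2", "http/1.1", or None if ALPN was not negotiated.
            return tls.selected_alpn_protocol()


def describe(alpn):
    """Map an ALPN identifier to a readable HTTP version label."""
    names = {"h2": "HTTP/2", "http/1.1": "HTTP/1.1"}
    return names.get(alpn, "no ALPN result (clients typically fall back to HTTP/1.1)")


if __name__ == "__main__":
    # Needs network access; the host is the placeholder used in this thread.
    print(describe(negotiated_protocol("www.example-domain.com")))
```

Seeing `HTTP/2` here only shows that the server can speak it. The protocol version logged for a given crawler still depends on what that crawler offers.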
-
The crawl depth reported by tools like Moz Pro is determined by the level of clicks it takes to reach a particular page from the homepage or root domain. It's not solely based on whether the page is HTTP or HTTPS.
In your scenario, if Moz Pro is reporting that the HTTP version (http://www.example-domain.com) has a crawl depth of 0, it means that this page is directly accessible from the root domain. On the other hand, if the HTTPS version (https://www.example-domain.com) is reported as having a crawl depth of 2, it implies that it takes two clicks (or two levels deep) from the homepage to reach this particular HTTPS page.
There could be various reasons for such a situation, such as the site structure, internal linking, or redirects. It's not uncommon for websites to have different versions (HTTP and HTTPS) of their pages, and the crawler may follow links or redirects differently, leading to variations in crawl depth.
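One way to look for the internal-linking cause mentioned above is to scan a page's HTML for same-host links that still use the http scheme. This is a minimal sketch with the standard `html.parser`; the sample markup, the `other.example` host, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def insecure_internal_links(html, base_url):
    """Return same-host links that use http rather than https."""
    host = urlsplit(base_url).hostname
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [
        link
        for link in collector.links
        if urlsplit(link).scheme == "http" and urlsplit(link).hostname == host
    ]


# Hypothetical page fragment, for illustration only.
sample_page = """
<a href="/pricing">Pricing</a>
<a href="http://www.example-domain.com/contact">Contact</a>
<a href="https://other.example/">Elsewhere</a>
"""

flagged = insecure_internal_links(sample_page, "https://www.example-domain.com/")
print(flagged)
```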
To further investigate, you may want to examine your site's internal linking structure, make sure that there are no unexpected redirects or canonicalization issues, and ensure that your preferred version (HTTPS in this case) is correctly configured and prioritized in your website settings and sitemap. Additionally, Moz Pro may provide more detailed insights into the specific reasons for the reported crawl depth if you review the crawl report or log files.
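The canonicalization check suggested above can be done by reading the `rel="canonical"` link out of the homepage HTML and confirming it uses https. The sketch below uses the standard `html.parser`; the sample markup deliberately shows the misconfigured case and is hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup showing a canonical that still points at http.
sample_html = (
    '<html><head>'
    '<link rel="canonical" href="http://www.example-domain.com/">'
    '</head></html>'
)

canonical = find_canonical(sample_html)
uses_https = canonical is not None and canonical.startswith("https://")
print(canonical, uses_https)
```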
Related Questions
-
Solved Moz Link Explorer slow to find external links
I have a site with 48 linking domains and 200 total links showing in Google Search Console. These are legit and good quality links. Since creating a campaign 2 months ago, Moz link explorer for the same site only shows me 2 linking domains and 3 total links. I realise Moz cannot crawl with the same speed and depth as Google but this is poor performance for a premium product and doesn't remotely reflect the link profile of the domain. Is there a way to submit a sitemap or list of links to Moz for the purpose of crawling and adding to Link Explorer?
Link Explorer | mathewphotohound | 0
-
GoogleBot still crawling HTTP/1.1 years after website moved to HTTP/2
Whole website moved to the https://www. HTTP/2 version 3 years ago. When we review log files, it is clear that, for the home page, GoogleBot continues to access only via the HTTP/1.1 protocol.
- Robots file is correct (simply allowing all and referring to the https://www. sitemap)
- Sitemap is referencing https://www. pages, including the homepage
- Hosting provider has confirmed the server is correctly configured to support HTTP/2 and provided evidence of access via HTTP/2 working
- 301 redirects set up for non-secure and non-www versions of the website, all to the https://www. version
- Not using a CDN or proxy
- GSC reports the home page as correctly indexed (with the https://www. version canonicalised) but still has the non-secure version of the website as the referring page in the Discovery section. GSC also reports the homepage as being crawled every day or so.
Totally understand it can take time to update the index, but we are at a complete loss to understand why GoogleBot continues to only go through the HTTP/1.1 version, not 2.
Possibly related issue, and of course what is causing concern, is that new pages of the site seem to index and perform well in SERP ... except the home page. This never makes it to page 1 (other than for brand name) despite rating multiples higher in terms of content, speed etc. than other pages which still get indexed in preference to the home page.
Any thoughts, further tests, ideas, direction or anything will be much appreciated!
Technical SEO | AKCAC | 1
-
Unsolved Crawling only the Home of my website
Hello,
I don't understand why Moz crawls only the homepage of our website https://www.modelos-de-curriculum.com. We added the website correctly and asked for all the pages to be crawled, but the tool finds only the homepage. Why? We are testing the tool before subscribing, but we need to be sure that it works for our website. Please help us if you can.
Product Support | Azurius | 0
-
Unsolved how to add my known backlinks manually to moz
Hello,
I have a cryptocurrency website and I found backlinks listed in my Google webmasters dashboard, but those backlinks don't show in my Moz dashboard even after 45 days. So my question is: can I add those backlinks to Moz, just to check my website's real DA score? Thanks,
Moz Local | icogems | 0
-
How to index e-commerce marketplace product pages
Hello! We are an online marketplace that submitted our sitemap through Google Search Console 2 weeks ago. Although the sitemap has been submitted successfully, out of ~10000 links (we have ~10000 product pages), we only have 25 that have been indexed. I've attached images of the reasons given for not indexing the platform. gsc-dashboard-1 gsc-dashboard-2 How would we go about fixing this?
Technical SEO | fbcosta | 0
-
Dynamic Canonical Tag for Search Results Filtering Page
Hi everyone, I run a website in the travel industry where most users land on a location page (e.g. domain.com/product/location) before performing a search by selecting dates and times. This then takes them to a pre-filtered dynamic search results page with options for their selected location on a separate URL (e.g. /book/results). The /book/results page can only be accessed on our website by performing a search, and URLs with search parameters from this page have never been indexed in the past.
We work with some large partners who use our booking engine who have recently started linking to these pre-filtered search results pages. This is not being done on a large scale and at present we only have a couple of hundred of these search results pages indexed.
I could easily add a noindex or self-referencing canonical tag to the /book/results page to remove them, however it’s been suggested that adding a dynamic canonical tag to our pre-filtered results pages pointing to the location page (based on the location information in the query string) could be beneficial for the SEO of our location pages. This makes sense as the partner websites that link to our /book/results page are very high authority and any way that this could be passed to our location pages (which are our most important in terms of rankings) sounds good, however I have a couple of concerns.
- Is using a dynamic canonical tag in this way considered spammy / manipulative?
- Whilst all the content that appears on the pre-filtered /book/results page is present on the static location page where the search initiates and which the canonical tag would point to, it is presented differently and there is a lot more content on the static location page that isn’t present on the /book/results page. Is this likely to see the canonical tag being ignored / link equity not being passed as hoped, and are there greater risks to this that I should be worried about?
I can’t find many examples of other sites where this has been implemented but the closest would probably be booking.com. https://www.booking.com/searchresults.it.html?label=gen173nr-1FCAEoggI46AdIM1gEaFCIAQGYARS4ARfIAQzYAQHoAQH4AQuIAgGoAgO4ArajrpcGwAIB0gIkYmUxYjNlZWMtYWQzMi00NWJmLTk5NTItNzY1MzljZTVhOTk02AIG4AIB&sid=d4030ebf4f04bb7ddcb2b04d1bade521&dest_id=-2601889&dest_type=city& Canonical points to https://www.booking.com/city/gb/london.it.html
In our scenario however there is a greater difference between the content on both pages (and booking.com have a load of search results pages indexed, which is not what we’re looking for). Would be great to get any feedback on this before I rule it out. Thanks!
Technical SEO | GAnalytics | 1
-
Is it normal for Moz to report on nofollow pages in crawl diagnostics?
I have a dev version of my website, for example, devwww.website.com. The htaccess page has a noindex and nofollow request, but I got crawl issues reported from these pages in my Moz report. Does this mean that I don't have the development site hidden from search like I thought I did?
Moz Pro | houstonbrooke | 0
-
Ultimate Ranking Tool integrating Analytics / Adwords / Google WM Tools
I currently use SEOMOZ Campaigns and Advanced Web Ranking for monitoring our KW rankings and those of the competition. AWR is a brilliant tool with so many different reports, methods of viewing etc. SEOMOZ campaigns are good but don't come close to the monitoring power of AWR (e.g. I monitor over 50 competitors on over 1000 KWs on a daily basis with AWR and receive a variety of set emailed reports on the data). However, one thing that SEOMOZ campaigns have that is useful is the traffic data, but this is still a bit basic and I think could be improved.
The problem with AWR is that it doesn't integrate with your Analytics / Adwords / Google WM Tools, so it is only showing you half the picture. Knowing how your site ranks for each keyword is helpful, but it would be nice to understand the value of each keyword. For example, being able to see your rank position and how much traffic that keyword has sent you over time would be helpful. It would also be nice to see the number of searches that are performed for that keyword each month.
For example, let's say I saw that I was ranking at number 11 for “hover mower” and getting 500 hits per month. Two months from now, if I was ranking at position 7, it would be nice to be able to immediately see how that changed the amount of traffic I was receiving for the term. Is a position of 11 (first item on page two) better than position 10 (last item on page one)? If you can link it to your analytics, you could then link it to your goals and goal values to get a complete picture of where your keywords rank, the value of the rank, and the improvement on that value when rank changes.
I've browsed around for such software but can't find anything like this. Does anyone know of any software that can do this, or something close to this? Many thanks
Moz Pro | James77 | 0