In this article we will talk about the importance of sitemaps through the following questions:
- What is a sitemap?
- Why do we need it?
- When do we need it?
- How to create a good sitemap?
Once you understand these four questions, you will have an idea of what a good sitemap looks like and be able to create one yourself.
What is a sitemap?
Well, this is a simple question: a sitemap is a file that lists the URLs of your pages. In reality, though, we usually don't put every page's URL inside, because some pages on our website are not meant for the public, or because we want to control the crawl budget.
So in general, you should only put a page inside a sitemap if you want it to appear on the public internet.
Why do we need sitemaps?
A sitemap helps crawlers find your pages more conveniently and directly. This matters a lot for crawl budget, especially on large websites.
To better understand the necessity of the sitemap, let’s imagine a scenario.
A wallpaper website has just released 40 new wallpaper detail pages. These 40 URLs are totally new and unknown to crawlers if no other pages link to them (through internal or external links). This is where the sitemap plays its role: the website owner puts the 40 new URLs inside a sitemap file that crawlers already know about, and when they visit the sitemap periodically, they discover the 40 new URLs.
Therefore, a sitemap is a place that crawlers from Google, Bing, OpenAI and other services check regularly, crawling the pages they discover in it.
When do we need a sitemap?
According to Google's sitemap documentation, you might need a sitemap in the following cases:
- Your site has a lot of pages.
- Your site is a new site.
- Your site has a lot of rich media content.
Still have no idea? No worries, I will go through them with you one by one, except the last one.
How large does a website need to be before a sitemap is required?
Technically, there is no specific number here. Some websites have millions of pages that are still well connected, which means they have no orphan pages (pages that no other page links to). But this can be hard to achieve, especially when you have a lot of pages and no professional tools to help.
Rule of thumb: you should build a sitemap when you have more than 300 pages.
If your website does not have a good internal link system, you should always build a sitemap for your website.
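To make the idea of orphan pages concrete, here is a minimal Python sketch that lists pages no other page links to. The `links` dictionary and its paths are made-up sample data. In practice you would build it from a crawl of your own site.

```python
def find_orphan_pages(links: dict[str, list[str]], home: str) -> list[str]:
    """Return pages that no other page links to (the home page is exempt)."""
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page linking to itself does not count
                linked_to.add(target)
    return sorted(page for page in links if page != home and page not in linked_to)


# Hypothetical link map: each key is a page, each value is the pages it links to.
links = {
    "/": ["/tools", "/blog"],
    "/tools": ["/tools/robots-txt-tester"],
    "/tools/robots-txt-tester": ["/"],
    "/blog": [],
    "/wallpapers/new-40": [],  # nothing links here
}

print(find_orphan_pages(links, home="/"))  # ['/wallpapers/new-40']
```

Any page this reports is one that crawlers can only reach through a sitemap or an external link.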
Why does a new site need sitemaps?
New websites are usually unknown to search engines unless you tell them.
There are two main ways search engines learn that your site exists:
- Other websites have pages that link to your website.
- You add your website to Google Search Console or another search engine's webmaster tool.
In the first situation, crawlers follow an external link that points to one of your pages, then keep crawling the links found on that page.
In the second situation, the search engines know about you because you submitted your website to them, so they will send crawlers to crawl the pages they can find. Submitting a sitemap at the same time gives those crawlers a list of URLs to start from, instead of leaving them to find every page by following links.
How to create a good sitemap?
Finally, we are about to get our hands dirty. But before we begin to create a sitemap, it is always a good idea to have a full map of your website in mind: you need to know the complete URL path structure of the website and the content or function of each path.
For example, SERPary has a home page and a page that lists all of its SEO tools. The sitemap URL counter and the robots.txt tester are two tool pages under the /tools path, so you can tell that every page under /tools is a specific SEO tool.
So with a rough map in mind, you can build a sitemap system that contains the URLs of all the pages you want to get indexed.
Now let's begin to build a sitemap. Below are the exact steps you can follow.
Decide which URLs should be included
A common mistake is trying to put everything into a sitemap. That’s not the right approach.
You should only include pages that:
- Return a 200 status code
- Are indexable (not blocked by noindex meta tag)
- Have actual SEO value
And you should exclude:
- Login or admin pages (commonly seen on SaaS websites)
- Parameter URLs (e.g. ?sort=, ?filter=)
- Duplicate pages
Again, only include the pages you wish to make public.
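The mechanical parts of these rules are easy to express in code. Below is a rough Python sketch, assuming you already have a list of pages with their status code and noindex flag. The `Page` class, the `EXCLUDED_PREFIXES` tuple and the sample URLs are all hypothetical, and judging "actual SEO value" or spotting duplicates still needs a human or a separate check.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Page:
    url: str
    status: int
    noindex: bool = False


# Assumed private path prefixes; adjust to your own site.
EXCLUDED_PREFIXES = ("/login", "/admin")


def belongs_in_sitemap(page: Page) -> bool:
    """Apply the mechanical inclusion rules: 200, indexable, no parameters, not private."""
    parts = urlsplit(page.url)
    if page.status != 200 or page.noindex:
        return False
    if parts.query:  # parameter URLs such as ?sort= or ?filter=
        return False
    return not parts.path.startswith(EXCLUDED_PREFIXES)


pages = [
    Page("https://example.com/tools", 200),
    Page("https://example.com/tools?sort=name", 200),
    Page("https://example.com/admin/users", 200),
    Page("https://example.com/old-page", 404),
    Page("https://example.com/drafts/post", 200, noindex=True),
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)  # ['https://example.com/tools']
```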
Structure your sitemap properly
When your site grows, using only one sitemap file will quickly become messy.
Instead, you should organize it like this:
- sitemap_index.xml (a <sitemapindex> file)
- sitemap-pages.xml
- sitemap-tools.xml
- sitemap-blog.xml
Below is what a sitemapindex file looks like. It is a sitemap file that lists multiple sitemaps.
<?xml version='1.0' encoding='utf-8'?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap_general.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap_tools.xml</loc>
  </sitemap>
</sitemapindex>
This makes your sitemap:
- Easier to manage
- Easier to debug
- Easier to scale
However, be aware that you cannot nest a sitemapindex inside another sitemapindex. Crawlers like Google's cannot process it.
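If you would rather generate the index file than write it by hand, here is one possible sketch using Python's standard library. The function name `build_sitemap_index` and the example.com URLs are placeholders, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Build a <sitemapindex> document that points to each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="utf-8", xml_declaration=True).decode("utf-8")


index_xml = build_sitemap_index([
    "https://example.com/sitemap-pages.xml",
    "https://example.com/sitemap-tools.xml",
    "https://example.com/sitemap-blog.xml",
])
print(index_xml)
```

Every URL passed in should point to a regular sitemap, not to another index file.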
Keep each sitemap within limits
Search engines have clear limits for sitemap files:
- Maximum 50,000 URLs
- Maximum 50MB (uncompressed)
Even if you don’t hit the limit, it’s better to:
- Keep each file around 10k–30k URLs
- Split early rather than too late
This will save you a lot of trouble in the future.
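Splitting is simple to automate. The sketch below cuts a URL list into chunks that stay under the 50,000 URL limit. The default of 30,000 per file and the sample URLs are assumptions for illustration, and the code does not check the 50MB size limit.

```python
from typing import Iterator

MAX_URLS_PER_SITEMAP = 50_000  # hard limit per sitemap file


def split_into_sitemaps(urls: list[str], chunk_size: int = 30_000) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each with at most `chunk_size` entries."""
    if not 0 < chunk_size <= MAX_URLS_PER_SITEMAP:
        raise ValueError(f"chunk_size must be between 1 and {MAX_URLS_PER_SITEMAP}")
    for start in range(0, len(urls), chunk_size):
        yield urls[start:start + chunk_size]


# Hypothetical site with 70,000 wallpaper pages.
urls = [f"https://example.com/wallpaper/{i}" for i in range(70_000)]
chunks = list(split_into_sitemaps(urls))
print([len(chunk) for chunk in chunks])  # [30000, 30000, 10000]
```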
Use important fields correctly
A sitemap URL entry usually contains these fields:
- <loc> (required)
- <lastmod> (recommended)
- <changefreq> (optional, no longer used by Google)
- <priority> (optional, no longer used by Google)
In practice:
- <lastmod> is the most important one: it tells crawlers when the page was last modified
- <changefreq> and <priority> are mostly ignored
Make sure your lastmod reflects the real update time. Don’t fake it.
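As an illustration, here is a small Python sketch that writes only <loc> and <lastmod> for each page and leaves out the two fields that are mostly ignored. The function name, URLs and dates are invented for the example. In a real setup the date should come from the page's actual modification time.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries: list[tuple[str, date]]) -> str:
    """Build a <urlset> document with <loc> and <lastmod> for each entry."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc
        # YYYY-MM-DD, taken from the real modification date of the page
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="utf-8", xml_declaration=True).decode("utf-8")


urlset_xml = build_urlset([
    ("https://example.com/tools", date(2024, 4, 20)),
    ("https://example.com/blog/hello", date(2024, 5, 1)),
])
print(urlset_xml)
```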
Only include canonical URLs
Each page should only appear once in your sitemap.
For example, avoid including:
- /page
- /page/
- /page?ref=xxx
Instead, only include the canonical version.
This helps:
- Avoid duplicate crawling
- Prevent SEO signal dilution
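One way to enforce this is to normalize every URL before it goes into the sitemap and drop the repeats. The sketch below assumes your canonical form has no query string and no trailing slash. That is only an assumption, so if your site uses a different convention (or declares canonicals with a rel="canonical" tag), follow that instead.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Drop the query string and fragment, and strip a trailing slash (except on the root)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def unique_canonical_urls(urls: list[str]) -> list[str]:
    """Return each canonical URL once, keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        canonical = canonicalize(url)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


candidates = [
    "https://example.com/page",
    "https://example.com/page/",
    "https://example.com/page?ref=xxx",
    "https://example.com/",
]
print(unique_canonical_urls(candidates))
# ['https://example.com/page', 'https://example.com/']
```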
Automate the generation process
Manually updating a sitemap is not realistic in the long run.
A better approach is:
- Generate from your database
- Use scheduled tasks (cron jobs)
- Or generate it dynamically on the backend
For most modern websites, automation is not optional. It is required.
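Here is what "generate from your database" could look like in its simplest form. The `pages` table and its columns (`url`, `updated_at`, `is_public`) are an invented schema for the sake of the example, and an in-memory SQLite database stands in for your real one.

```python
import sqlite3


def fetch_sitemap_entries(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (url, updated_at) for every public page, sorted by URL."""
    rows = conn.execute(
        "SELECT url, updated_at FROM pages WHERE is_public = 1 ORDER BY url"
    )
    return [(url, updated_at) for url, updated_at in rows]


# Hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, updated_at TEXT, is_public INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("https://example.com/tools", "2024-04-20", 1),
        ("https://example.com/admin/users", "2024-04-22", 0),
        ("https://example.com/blog/hello", "2024-05-01", 1),
    ],
)

entries = fetch_sitemap_entries(conn)
print(entries)
```

A scheduled task could run a script like this once a day and rewrite the sitemap files from the result, so new pages show up without anyone editing XML by hand.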
Submit your sitemap to search engines
Creating a sitemap is not enough. You also need to tell search engines about it.
You should:
- Submit it via Google Search Console
- Submit it via Bing Webmaster Tools
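Also add a Sitemap line like the one below to your robots.txt file, so crawlers can always find your sitemap (the URL here is only an example):
Sitemap: https://example.com/sitemap_index.xml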
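Assuming the line in question is the standard `Sitemap:` directive, you can check that your robots.txt really declares it with Python's built-in `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). The robots.txt content below is a made-up sample.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_declared_in(robots_txt: str) -> list[str]:
    """Return the sitemap URLs listed in the given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


# Sample robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap_index.xml
"""

print(sitemaps_declared_in(robots_txt))  # ['https://example.com/sitemap_index.xml']
```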
Monitor and improve over time
A sitemap is not a “set and forget” task.
You should regularly check:
- Submitted vs indexed pages
- Crawl status
- Coverage issues
If pages are not getting indexed:
- Check content quality
- Check internal linking
- Check if you included low-value pages
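For the "submitted vs indexed" check, a simple comparison already goes a long way. This sketch assumes you have exported the list of indexed URLs from your webmaster tool into a plain list. Both lists here are sample data.

```python
def not_indexed(submitted: list[str], indexed: list[str]) -> list[str]:
    """Return submitted URLs that are missing from the indexed list, in submitted order."""
    indexed_set = set(indexed)
    return [url for url in submitted if url not in indexed_set]


submitted = [
    "https://example.com/tools",
    "https://example.com/blog/hello",
    "https://example.com/blog/thin-page",
]
indexed = [
    "https://example.com/tools",
    "https://example.com/blog/hello",
]

print(not_indexed(submitted, indexed))  # ['https://example.com/blog/thin-page']
```

The URLs it returns are the ones to review for content quality and internal links.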
Not sure how many URLs you have on your website? Try our sitemap URL counter for free to count the number of URLs in your sitemap.
Don’t rely on the sitemap alone
One important thing to understand:
A sitemap helps discovery, but it does not replace internal links.
Search engines mainly rely on:
- Internal linking structure
- Page importance signals
So the right approach is:
- Use internal links as the foundation
- Use the sitemap as support
Once you understand all these steps, you will realize that a good sitemap is not just about generating a file, but about making clear decisions on what deserves to be indexed.