Frequently asked questions
As with all XML files, any data values (including URLs) must use entity escape codes for the following characters: ampersand (&), single quote ('), double quote ("), less than (<), and greater than (>). You should also make sure that all URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard. If you are using a script to generate your URLs, you can generally URL escape them as part of that script, but you will still need to entity escape them afterwards. For instance, the following Python session entity escapes http://www.example.com/view?widget=3&count>2:
$ python
Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
>>> import xml.sax.saxutils
>>> xml.sax.saxutils.escape("http://www.example.com/view?widget=3&count>2")
The resulting URL from the example above is:
http://www.example.com/view?widget=3&amp;count&gt;2
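The two steps described above (URL escaping, then entity escaping) might look like this in current Python 3. This is a minimal sketch, not a reference implementation: the function name sitemap_loc and the set of characters treated as safe are assumptions made for illustration. Because the greater-than sign is percent-encoded in the first step here, only the ampersand is left for the entity-escaping step.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the two quote characters
# have to be supplied explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

# Assumption: URL delimiters (and "%" in already-encoded input) are left alone.
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def sitemap_loc(raw_url):
    """URL-escape, then entity-escape, a URL for use as a Sitemap data value."""
    # Step 1: percent-encode characters that are not allowed in a URL,
    # encoding non-ASCII characters as UTF-8.
    url_escaped = quote(raw_url, safe=SAFE_CHARS)
    # Step 2: replace XML-special characters with entity escape codes.
    return escape(url_escaped, QUOTE_ENTITIES)


print(sitemap_loc("http://www.example.com/view?widget=3&count>2"))
# prints http://www.example.com/view?widget=3&amp;count%3E2
```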
Yes. Your Sitemap files must use UTF-8 encoding.
Use W3C Datetime encoding for the lastmod timestamps and all other dates and times in this protocol. For example, 2004-09-22T14:12:14+00:00.
This encoding allows you to omit the time portion of the ISO 8601 format; for example, 2004-09-22 is also valid. However, if your site changes frequently, you are encouraged to include the time portion so crawlers have more complete information about your site.
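Both forms can be produced with Python's standard datetime module. The sketch below is illustrative only; the helper names w3c_datetime and w3c_date are assumptions, not part of the protocol.

```python
from datetime import date, datetime, timezone


def w3c_datetime(moment):
    """Format a timezone-aware datetime with seconds and a UTC offset."""
    if moment.tzinfo is None:
        # Without an offset the timestamp would be ambiguous to crawlers.
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")


def w3c_date(day):
    """Format a date without the time portion."""
    return day.isoformat()


print(w3c_datetime(datetime(2004, 9, 22, 14, 12, 14, tzinfo=timezone.utc)))
# prints 2004-09-22T14:12:14+00:00
print(w3c_date(date(2004, 9, 22)))
# prints 2004-09-22
```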
For static files, this is the actual file update date. You can use the UNIX date command to get this date (the options shown are those of GNU date):
$ date --iso-8601=seconds -u -r /home/foo/www/bar.php
2004-10-26T08:56:39+00:00
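If the Sitemap is generated by a script, the same value can be read from the file's modification time. A minimal Python sketch, assuming the hypothetical helper name lastmod_for_file:

```python
import os
from datetime import datetime, timezone


def lastmod_for_file(path):
    """Return the file's modification time as a UTC W3C Datetime string."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return stamp.isoformat(timespec="seconds")
```

For example, lastmod_for_file("/home/foo/www/bar.php") returns a string in the same format as the date command above.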
For many dynamic URLs, you may be able to easily compute a lastmod date based on when the underlying data was changed or by using some approximation based on periodic updates (if applicable). Using even an approximate date or timestamp can help crawlers avoid crawling URLs that have not changed. This will reduce the bandwidth and CPU requirements for your web servers.
It is strongly recommended that you place your Sitemap at the root directory of your web server; that is, place it at http://example.com/sitemap.xml.
In some situations, you may want to produce different Sitemaps for different paths on your site, for example if security permissions in your organization compartmentalize write access to different directories.
We assume that if you have the permission to upload http://example.com/path/sitemap.xml, you also have permission to report metadata under http://example.com/path/.
All URLs listed in the Sitemap must reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it can't include URLs from http://subdomain.example.com. If the Sitemap is located at http://www.example.com/myfolder/sitemap.xml, it can't include URLs from http://www.example.com.
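The location rule can be checked before a URL is written to a Sitemap. The sketch below is one possible reading of the rule, not a definitive implementation: the function name allowed_in_sitemap is hypothetical, and requiring the same protocol as well as the same host is an assumption that goes slightly beyond the text above.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url, page_url):
    """Return True if page_url sits on the Sitemap's host, at or below its directory."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Directory that holds the Sitemap, always ending in "/".
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    # Assumption: protocol and host (including any port) must both match.
    same_origin = (sitemap.scheme, sitemap.netloc.lower()) == (
        page.scheme,
        page.netloc.lower(),
    )
    return same_origin and page.path.startswith(base_dir)


root_map = "http://www.example.com/sitemap.xml"
folder_map = "http://www.example.com/myfolder/sitemap.xml"
print(allowed_in_sitemap(root_map, "http://subdomain.example.com/page.html"))
# prints False
print(allowed_in_sitemap(folder_map, "http://www.example.com/index.html"))
# prints False
print(allowed_in_sitemap(folder_map, "http://www.example.com/myfolder/page.html"))
# prints True
```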
Sitemaps should be no larger than 10MB (10,485,760 bytes) and can contain a maximum of 50,000 URLs. These limits help to ensure that your web server does not get bogged down serving very large files. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 10MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a file size of 10MB. A Sitemap index file can include up to 1,000 Sitemaps and must not exceed 10MB (10,485,760 bytes). You can also use gzip to compress your Sitemaps.
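Splitting a large URL list and producing the index file could be sketched as follows. The limits are the ones stated above; the helper names chunk_urls and build_sitemap_index are illustrative assumptions, and the sketch only enforces the URL and Sitemap counts, not the byte-size limit.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 1_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of urls, each small enough for one Sitemap."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_urls):
    """Return the text of a Sitemap index file listing the given Sitemaps."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a Sitemap index can list at most 1,000 Sitemaps")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        # Entity-escape each location, as for any other XML data value.
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(url))
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"
```

A site with 120,001 URLs would end up with three Sitemap files (50,000 + 50,000 + 20,001 URLs) and one index file that lists them.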
You can list the URLs that change frequently in a small number of Sitemaps and then use the lastmod tag in your Sitemap index file to identify those Sitemap files. Search engines can then incrementally crawl only the changed Sitemaps.
Once you have created your Sitemap, let search engines know about it by submitting directly to them, pinging them, or adding the Sitemap location to your robots.txt file.
Yes. You need to include the protocol (for instance, http) in your URL. You also need to include a trailing slash in your URL if your web server requires one. For example, http://www.example.com/ is a valid URL for a Sitemap, whereas www.example.com is not.
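A generator script can reject URLs that lack a protocol before they reach the Sitemap. A small sketch, assuming the hypothetical helper name has_protocol_and_host and assuming that only http and https URLs are of interest:

```python
from urllib.parse import urlsplit


def has_protocol_and_host(url):
    """Return True if url starts with a protocol and names a host."""
    parts = urlsplit(url)
    # Assumption: only web URLs are expected in this Sitemap.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(has_protocol_and_host("http://www.example.com/"))
# prints True
print(has_protocol_and_host("www.example.com"))
# prints False
```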
No. Please list only one version of a URL in your Sitemaps. Including multiple versions of URLs may result in incomplete crawling of your site.
Yes. Remove session IDs from the URLs you list; including them may result in incomplete and redundant crawling of your site.
No. The position of a URL in the Sitemap is not likely to impact how it is used or regarded by search engines.
Please include both URLs.
Please use gzip to compress your Sitemaps. Remember, your Sitemap must be no larger than 10MB (10,485,760 bytes), whether compressed or not.
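Compression and the size check can be combined in one step. This is a sketch under stated assumptions: the helper name compress_sitemap is hypothetical, and the limit is applied to both the uncompressed and the compressed bytes.

```python
import gzip

MAX_SITEMAP_BYTES = 10_485_760  # 10MB, as stated above


def compress_sitemap(xml_text):
    """Encode a Sitemap as UTF-8, check its size, and return gzip-compressed bytes."""
    raw = xml_text.encode("utf-8")
    if len(raw) > MAX_SITEMAP_BYTES:
        raise ValueError("Sitemap is larger than 10MB; split it into several files")
    compressed = gzip.compress(raw)
    if len(compressed) > MAX_SITEMAP_BYTES:
        raise ValueError("compressed Sitemap is larger than 10MB")
    return compressed
```

The returned bytes can be written to a file such as sitemap.xml.gz (the file name is only an example).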
The "priority" hint in your Sitemap only indicates the importance of a particular URL relative to other URLs on your own site and does not imply any effect on the ranking of your pages in search results.
Yes. An XML schema is available for Sitemap files at http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd, and a schema for Sitemap index files is available at http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd. You can also read more about validating your Sitemap.
See the documentation available from each search engine for more details about submission and usage of Sitemaps.
Last Updated: 27 February 2008