Reprinted from the free SEO blog: 贵族宝贝/1193.html
Original address: 贵族宝贝libo

A few days ago I was using the robots tool on Baidu's webmaster platform while writing a robots.txt file, and it pointed out several mistakes in my rules. Looking around afterwards, I found that many people overlook the very same errors. So today I will enumerate the common robots.txt misunderstandings I have run into in actual use.

Misunderstanding 1. The Allow rule covers a wider scope than the Disallow rule and is placed in front of it.

For example, many people write:

    Allow: /
    Disallow: /mulu/

wanting spiders to crawl every URL on the site except those under /mulu/, and to ban crawling of everything below /mulu/. But search engine spiders match robots.txt rules from top to bottom, and the first matching rule wins, so the broad Allow on the first line causes the second command to fail: the rule "do not crawl the /mulu/ directory" never takes effect. The right way is to put the Disallow on top of the Allow; then both rules work as intended.

Misunderstanding 2. Not starting the path of a Disallow or Allow command with a slash "/".

This one is also very easy to ignore. For example, a rule written as Disallow: 123.html has no effect at all. The leading slash "/" anchors the path at the root directory of the site; without it, the spider cannot tell which URL the rule refers to, so the rule is simply wrong. It should be Disallow: /123.html; written this way, the rule blocks the file /123.html and takes effect.

Misunderstanding 3. Blocking a directory without the trailing slash "/".

For example, to block crawling of the /mulu/ directory, some people write Disallow: /mulu. Is that correct? Not quite. Disallow: /mulu does block every URL under /mulu/, but it also blocks every page whose path merely begins with /mulu, such as /mulu123.html; it is equivalent to Disallow: /mulu*. If you only want to block the directory itself, you must end the rule with a slash "/": Disallow: /mulu/.
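The ordering problem in the first misunderstanding can be reproduced with Python's standard `urllib.robotparser`, which also evaluates rules top to bottom and stops at the first match. One caveat worth flagging: modern crawlers differ, and Google, for instance, prefers the longest matching path rather than the first one, so rule order matters less there. The example.com URLs below are hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def parser_for(lines):
    """Build a RobotFileParser from in-memory robots.txt lines (no network)."""
    rp = RobotFileParser()
    rp.parse(lines)
    return rp

# Broad Allow placed first: under first-match evaluation it always wins,
# so the Disallow below it is never reached.
wrong = parser_for([
    "User-agent: *",
    "Allow: /",          # matches every URL
    "Disallow: /mulu/",  # dead rule: a URL never gets this far
])

# Disallow placed above the Allow: the directory is actually blocked.
right = parser_for([
    "User-agent: *",
    "Disallow: /mulu/",
    "Allow: /",
])

print(wrong.can_fetch("*", "http://example.com/mulu/page.html"))  # True: block failed
print(right.can_fetch("*", "http://example.com/mulu/page.html"))  # False: blocked
print(right.can_fetch("*", "http://example.com/index.html"))      # True: rest still allowed
```

Swapping the two rules is the entire fix; no rule text changes, only the order.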
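The missing-leading-slash problem from the second misunderstanding can be checked the same way: a sketch with `urllib.robotparser`, again using a made-up example.com URL.

```python
from urllib.robotparser import RobotFileParser

# Missing the leading slash: the path cannot be anchored at the site root,
# so the rule matches nothing and has no effect.
no_slash = RobotFileParser()
no_slash.parse(["User-agent: *", "Disallow: 123.html"])

# With the leading slash, the rule blocks /123.html as intended.
with_slash = RobotFileParser()
with_slash.parse(["User-agent: *", "Disallow: /123.html"])

print(no_slash.can_fetch("*", "http://example.com/123.html"))    # True: not blocked
print(with_slash.can_fetch("*", "http://example.com/123.html"))  # False: blocked
```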
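The trailing-slash distinction from the third misunderstanding comes down to prefix matching, which `urllib.robotparser` also uses, so the over-blocking of /mulu123.html by Disallow: /mulu can be demonstrated directly (URLs again hypothetical):

```python
from urllib.robotparser import RobotFileParser

# No trailing slash: /mulu acts as a pure prefix, like Disallow: /mulu*
prefix = RobotFileParser()
prefix.parse(["User-agent: *", "Disallow: /mulu"])

# Trailing slash: only URLs under the /mulu/ directory are blocked.
directory = RobotFileParser()
directory.parse(["User-agent: *", "Disallow: /mulu/"])

print(prefix.can_fetch("*", "http://example.com/mulu/page.html"))     # False: blocked
print(prefix.can_fetch("*", "http://example.com/mulu123.html"))       # False: over-blocked
print(directory.can_fetch("*", "http://example.com/mulu/page.html"))  # False: blocked
print(directory.can_fetch("*", "http://example.com/mulu123.html"))    # True: untouched
```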