What is Robots.txt?
Robots.txt is a plain text file, not HTML or any other format. It gives webmasters the flexibility to allow or disallow search engine (SE) bots from crawling and indexing an area of the website.
When using a robots.txt file, you need to be careful. If it is configured incorrectly, all of your SEO results can be lost.
If your project is small and you are not sure what you are doing, it is best not to use a robots.txt file. Just leave things as they are. Quang's blog also doesn't use a robots.txt file.
However, for large projects, especially in e-commerce, using a robots.txt file is almost mandatory. The robots.txt file helps Google index your website more effectively, keeps backlink-checking tools from scanning it, and limits the duplicate content that is very common in e-commerce SEO.
Advantages of using robots.txt
Block bots during the site setup process
While the website is being built (interface design, plugin installation, site structure), things are still very messy. You should block Google's bots so that they don't index incomplete content that you don't want to appear in search results.
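As a minimal sketch of what blocking every bot during setup could look like, the snippet below holds a two-line robots.txt in a string and checks it with Python's standard `urllib.robotparser`. The domain example.com and the page paths are placeholders, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# robots.txt for a site still under construction:
# every bot (*) is kept away from every path (/).
STAGING_ROBOTS = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(STAGING_ROBOTS.splitlines())

# example.com and the paths are placeholders for illustration only.
print(parser.can_fetch("Googlebot", "https://example.com/"))            # False
print(parser.can_fetch("Googlebot", "https://example.com/draft-page"))  # False
```

Remember to remove or relax the `Disallow: /` rule once the site goes live, otherwise well-behaved search engines will keep staying away.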
Declare the sitemap
A sitemap is like a map that Google uses to discover your site. If the website has a very large number of pages to index and no sitemap, Google's bots may not have enough resources (crawl budget) to scan the whole site. As a result, Google may fail to index some important content.
A website can have more than one sitemap (e.g. an article sitemap, an image sitemap, a news sitemap...). You should use a tool to generate the sitemaps for your website, and then declare the sitemap links in the robots.txt file.
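To illustrate declaring several sitemaps, here is a small sketch that parses a robots.txt string and reads the sitemap links back. The sitemap file names and the example.com domain are assumptions made up for the example, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow allows everything; the Sitemap lines are
# independent of any User-agent group.
SITEMAP_ROBOTS = """\
User-agent: *
Disallow:

Sitemap: https://example.com/post-sitemap.xml
Sitemap: https://example.com/image-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SITEMAP_ROBOTS.splitlines())

sitemaps = parser.site_maps()  # list of declared sitemap URLs (Python 3.8+)
print(sitemaps)
```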
Block backlink-checking bots
Currently in Vietnam, the three most popular backlink-checking tools are Ahrefs, Majestic and Moz. Their bots are named AhrefsBot (Ahrefs), MJ12bot (Majestic) and rogerbot (Moz), respectively.
To prevent competitors from using these tools to analyze your backlinks, you can block their bots in the robots.txt file.
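One possible way to write such a rule is sketched below: the three bot names from the text share a single `Disallow: /` group, while every other bot stays unrestricted. The URL is a placeholder, and the rule only helps against bots that choose to obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Several User-agent lines in a row share the rules that follow them.
BACKLINK_ROBOTS = """\
User-agent: AhrefsBot
User-agent: MJ12bot
User-agent: rogerbot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(BACKLINK_ROBOTS.splitlines())

page = "https://example.com/pricing"  # placeholder URL
for bot in ("AhrefsBot", "MJ12bot", "rogerbot", "Googlebot"):
    print(bot, parser.can_fetch(bot, page))
```

With this file, the three backlink bots are refused and Googlebot is still allowed.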
Block harmful bots
In addition to backlink-checking bots, there are other types of harmful bots.
For example, Amazon, the giant of the world's e-commerce industry, blocks a bot called EtaoSpider.
Block sensitive folders
Website source code usually contains sensitive directories and files, such as wp-admin, wp-includes, phpinfo.php, cgi-bin, memcache….
You should not let search engine bots index this content, because it would then be public on the internet, and hackers could use that information to attack your system.
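A minimal sketch of such a rule set, using some of the paths named above, might look like this. The exact paths depend on your own installation, and example.com is a placeholder. Keep in mind that robots.txt only asks well-behaved bots to stay away, so truly private areas still need real access control on the server.

```python
from urllib.robotparser import RobotFileParser

# Each Disallow line blocks every URL whose path starts with that prefix.
SENSITIVE_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /cgi-bin/
Disallow: /phpinfo.php
"""

parser = RobotFileParser()
parser.parse(SENSITIVE_ROBOTS.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/phpinfo.php"))           # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/hello-world"))      # True
```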
Block bots in e-commerce
In e-commerce, there are some unique features for users such as:
- Sign up for an account
- Log in to your account
- Transaction history
- User interest (wishlist)
- Internal search bar
- Price comparison
- Sorting options (price high to low, bestsellers, alphabetical order….)
- Filter properties (manufacturer, color, price, capacity ...)
- Discontinued products (which come with 301 redirects)
These functions are indispensable for users, but they often create duplicate content and carry no content that supports your SEO keywords. Therefore, you can block these paths from being crawled in the robots.txt file.
In the robots.txt file, you can use * (which stands for any string of characters) and $ (which marks the end of the URL, useful for matching file formats such as .doc, .pdf, .ppt, .swf ...) to block the corresponding files and paths.
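Python's built-in `urllib.robotparser` does not interpret `*` and `$`, so the sketch below uses a small hand-written matcher to show how these two characters are commonly understood. The helper names `rule_to_regex` and `is_blocked`, the `sort` parameter and the sample paths are all hypothetical, and only Disallow rules are modelled (no Allow precedence).

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule containing * and $ into a regex."""
    anchored = rule.endswith("$")  # a trailing $ means "the URL must end here"
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches from the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

# Hypothetical e-commerce rules: sorted listings, PDF files, wishlist pages.
RULES = ["/*?sort=", "/*.pdf$", "/wishlist/"]

print(is_blocked("/shoes?sort=price_desc", RULES))         # True
print(is_blocked("/files/catalog.pdf", RULES))             # True
print(is_blocked("/files/catalog.pdf?download=1", RULES))  # False, URL does not end in .pdf
print(is_blocked("/shoes", RULES))                         # False
```

In a real robots.txt file these three rules would simply be written as three `Disallow:` lines under the relevant `User-agent:` line.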
Disadvantages of using robots.txt
When using a robots.txt file, be careful. If it is configured incorrectly, all of your SEO results can be lost.
How it works
The robots.txt file works by identifying a user-agent and then giving directives for that user-agent.
The directives in a robots.txt file
User-agent: declares the name of the search engine bot you want to control, for example Googlebot or Yahoo! Slurp.
Disallow: is the area that you want to fence off from search engine access.
Crawl-delay: sets how long (in seconds) a bot must wait before fetching the next page. This is useful to keep search engines from overloading your server. Not every search engine honors this directive.
#: is placed at the start of a line to turn it into a comment.
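To tie these directives together, here is a small sketch of a robots.txt file that uses all four of them, checked with Python's standard `urllib.robotparser`. The bot name ExampleBot, the /private/ folder and the 10-second delay are made-up values for illustration.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_ROBOTS = """\
# Rules for every bot
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS.splitlines())

# ExampleBot is a hypothetical bot name; it falls under the "*" group.
print(parser.can_fetch("ExampleBot", "https://example.com/private/report.html"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/blog/"))                # True
print(parser.crawl_delay("ExampleBot"))                                           # 10
```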
Notes on using robots.txt
- To be found by bots, the robots.txt file must be placed in the top-level directory of the site.
- Robots.txt is case sensitive, so the file must be named robots.txt (not Robots.txt, robots.TXT, ...).
- Do not put /wp-content/themes/ or /wp-content/plugins/ in the Disallow section. That will prevent search engines from correctly seeing the look of your blog or website.
- Some user-agents may choose to ignore your robots.txt file. This is quite common with nefarious user-agents such as:
  - Malware robots (bots carrying malicious code)
  - Email address scrapers (processes that harvest email addresses on their own)
- Robots.txt files are publicly available on the web. You only need to add /robots.txt to the end of any root domain to see that site's directives.
This means that anyone can see which pages you do or don't want crawled. So do not use this file to hide users' personal information.
Each subdomain on a root domain uses its own robots.txt file. This means that both blog.example.com and example.com should have their own robots.txt files (blog.example.com/robots.txt and example.com/robots.txt). Finally, it is generally considered best practice to indicate the location of any sitemaps associated with the domain at the end of the robots.txt file.
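The per-subdomain rule can be sketched with a tiny helper that works out which robots.txt file applies to a given page. The function name `robots_url` and the sample page URLs are hypothetical; only the standard `urllib.parse` module is used.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://blog.example.com/2020/05/some-post?ref=home"))
# https://blog.example.com/robots.txt
print(robots_url("https://example.com/shop/cart"))
# https://example.com/robots.txt
```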
The post What is Robots.txt? appeared first on SEO COMPANY WEBSITE PROFESSIONAL SEO SERVICE IMK.