Frequently Asked Questions About Robots






Do I have to submit each page on my site to get them in a database?

It is not necessary to submit multiple pages. Virtually all robots crawl your site starting from the page you submit, usually the first or home page, and follow its links to reach the other pages. As long as legitimate links lead from the first page to another page, from that page to the next, and eventually to all of your pages, multiple page submission is not needed. In fact, most robots do not want it. There are two exceptions. If your site has multiple entry points, submit each entry point. With some engines, you may also submit each page individually for immediate addition. (Most of those limit the number of pages accepted per domain per day.) Otherwise, the robot will pick up the other pages when it crawls your site on its normal schedule.
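The crawl described above can be pictured as a breadth-first walk over the site's links. The following is only a sketch: the SITE dictionary and its page names are hypothetical, and a real robot fetches pages over the network rather than reading a link table.

```python
from collections import deque


def reachable_pages(links, start):
    """Return the pages reachable from `start` by following links, breadth first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical site: each key is a page, each value lists the pages it links to.
SITE = {
    "/index.html": ["/about.html", "/products.html"],
    "/about.html": ["/index.html"],
    "/products.html": ["/products/widget.html"],
    "/products/widget.html": [],
    # No page links to this one, so a crawl from the home page never finds it.
    "/orphan.html": ["/index.html"],
}

if __name__ == "__main__":
    for page in reachable_pages(SITE, "/index.html"):
        print(page)
```

In this sketch /orphan.html is never visited from /index.html, which is the situation where a separate entry point would need its own submission.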

How does the robot follow a link?

Most follow "HREF" links. Most also follow frames to index the content of the pages they load, but do not create separate index entries for individual frames. Most robots pass better information to the database from non-framed sites than from framed sites. For indexing weight, tables are a better layout alternative than frames.
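As a rough illustration of link following, the sketch below collects HREF targets and frame sources from a page using Python's standard html.parser module. The class and function names are made up for this example, and actual robots differ in which tags they honor.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets from anchor HREFs and frame SRC attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "frame" and attributes.get("src"):
            self.links.append(attributes["src"])


def extract_links(html):
    """Return the link targets found in `html`, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<A HREF="about.html">About</A><frameset><frame src="menu.html"></frameset>'
    print(extract_links(page))
```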

How often should sites be submitted for indexing?

Generally, most robots update or index a site only when you ask them to. It then takes a while, depending on backlog and traffic, before they actually crawl (spider) the site. A few update their index listings automatically every 30 days or so; for those engines, you do not need to alert them to updated or outdated pages. To be certain, or to expedite the update of your page or site, you may wish to resubmit it so that the robot indexes it on the next crawl. There is a fine line between submitting just often enough and abusing the process: the term "spamming" also applies to website submissions, and each engine has its own guidelines.

If I submit my site to one location, does it become submitted for others as well?

In some cases, yes; in most cases, no. Some companies own more than one search engine but use only one database. Some companies own and use multiple databases based on certain parameters. Some search engines read other companies' databases. That is especially true of metasearch engines.

How long does it take for a submission to show in the database rankings?

Depending on the timing of the submission, the database owner and the schedule and activity of the robot, it can take anywhere from seconds to several weeks to get listed in an index.

What factors determine the database rankings?

Rankings are based on the actual search activity of millions of Internet users. New sites submitted or found during a crawl are listed in the search engine and given an opportunity to be found by searchers. Most engines favorably weight sites that users visit and spend time at for particular search topics; those sites are then ranked higher than sites that are consistently ignored. Most robots enforce legitimacy guidelines that they establish themselves. Those guidelines cover the number of times a word shows up on a page, subliminal suggestions and other sneaky tactics. Generally, rankings are based on keywords. Some engines base search results solely on comparing the user's search query to the content of millions of Web pages; they keep no list that matches certain search terms or keywords with special results. Basic factors affecting a page's ranking include the words in the title, keyword meta tags, word frequency in the document, and document length. Some people create pages to maliciously "spoof" search engines. Spoofing makes a search engine return pages that are irrelevant to the search, or rank pages higher than their content warrants. Common spoofing techniques include repeating words, inserting meta tags unrelated to the document's content, and using words that cannot be read because of their small size or color. If a search robot detects spoofing, it will significantly downgrade the page's ranking or eliminate the page from the database.
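Word repetition, one of the spoofing techniques mentioned above, can be measured in a simple way: find the most frequent word and see what share of the page it takes up. This is only an illustrative sketch. The function name is invented here, engines do not publish their actual checks, and no particular threshold is implied.

```python
import re
from collections import Counter


def top_word_share(text):
    """Return the fraction of all words taken up by the single most frequent word."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    _, count = Counter(words).most_common(1)[0]
    return count / len(words)


if __name__ == "__main__":
    stuffed = "cheap cheap cheap cheap flights"
    normal = "compare prices on flights to many cities"
    print(top_word_share(stuffed))
    print(top_word_share(normal))
```

A page where one word accounts for most of the text scores close to 1.0, while ordinary prose scores much lower.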

What kind of pages result in better rankings?

Most robots see static pages best. However, with the increasing popularity of generated pages, some crawl documents regardless of whether they are generated statically or dynamically (on the fly). Because dynamic pages are hard to detect, most crawlers still avoid some types of dynamically generated (and potentially infinite) URL spaces by ignoring links to URLs that contain the following characters or strings: ?, =, @, &, cgi-, CGI-, and Javascript.
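The URL filter described above amounts to a substring check. The sketch below uses the characters and strings listed in the answer; matching them case-insensitively is an assumption of this example, and the function name is made up.

```python
# Markers taken from the list above, lowercased; "cgi-" also covers "CGI-".
SKIP_MARKERS = ("?", "=", "@", "&", "cgi-", "javascript")


def looks_dynamic(url):
    """Return True if the URL contains any marker that such a crawler would skip."""
    lowered = url.lower()
    return any(marker in lowered for marker in SKIP_MARKERS)


if __name__ == "__main__":
    for url in (
        "http://example.com/about.html",
        "http://example.com/cgi-bin/search?q=robots",
        "javascript:void(0)",
    ):
        print(url, "skipped" if looks_dynamic(url) else "crawled")
```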

Is there a meta keyword description limit?

Because some engines accept only a specific amount of text for each item, you should limit descriptions to about 150 characters and keywords to 75 characters. Some engines allow more and some less. Some will truncate text that is over the limit, but some will simply ignore the page and possibly subsequent pages.
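A quick length check before submitting can catch overlong meta text. The sketch below uses the 150 and 75 character figures suggested above. The function and constant names are invented for this example, and individual engines may apply different limits.

```python
DESCRIPTION_LIMIT = 150  # suggested limit for the description, in characters
KEYWORDS_LIMIT = 75      # suggested limit for the keywords, in characters


def check_meta_lengths(description, keywords):
    """Return a list of warnings for meta values longer than the suggested limits."""
    problems = []
    if len(description) > DESCRIPTION_LIMIT:
        problems.append(
            f"description is {len(description)} characters (limit {DESCRIPTION_LIMIT})"
        )
    if len(keywords) > KEYWORDS_LIMIT:
        problems.append(
            f"keywords are {len(keywords)} characters (limit {KEYWORDS_LIMIT})"
        )
    return problems


if __name__ == "__main__":
    warnings = check_meta_lengths(
        "Frequently asked questions about search engine robots.",
        "robots, spiders, crawlers, search engines, indexing",
    )
    print(warnings or "within the suggested limits")
```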

What happens if I make a spelling mistake when I submit my site?

Resubmit the corrected URL information. The corrected information cancels incorrect URLs and invalid E-Mail addresses. In the case of some robots, this will occur during a normal re-indexing of the site at the scheduled interval.

How many pages can I submit at one time?

This borders on a grey area of the standard guidelines. It is best to limit submissions to a few a day, unless the submitter information tells you otherwise. On the most forgiving site I have seen, you can submit no more than 50 URLs from the same domain in a 24-hour period. If you are working on many different sites with different domains, you can submit 50 URLs from each one in a 24-hour period. Some engines allow only one per domain.
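If you submit for many domains, it can help to keep your own count so that you stay under an engine's limit. The sketch below tracks submissions per domain over a rolling 24-hour window, using the 50-URL figure from above as its default. The class name and its parameters are hypothetical, and the real limit depends on the engine.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class SubmissionBudget:
    """Track URL submissions per domain within a rolling time window."""

    def __init__(self, per_domain_limit=50, window_seconds=24 * 60 * 60):
        self.limit = per_domain_limit
        self.window = window_seconds
        self._history = defaultdict(list)  # domain -> submission timestamps

    def try_submit(self, url, now):
        """Record a submission at time `now` (seconds); return False if over the limit."""
        domain = urlsplit(url).hostname or ""
        # Keep only the submissions that still fall inside the window.
        recent = [t for t in self._history[domain] if now - t < self.window]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self._history[domain] = recent
        return allowed


if __name__ == "__main__":
    budget = SubmissionBudget(per_domain_limit=2)
    print(budget.try_submit("http://example.com/a.html", now=0))
    print(budget.try_submit("http://example.com/b.html", now=10))
    print(budget.try_submit("http://example.com/c.html", now=20))
```

Passing the time in explicitly keeps the sketch easy to test; a real script would likely pass time.time().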

Why doesn't my URL get indexed the first time?

There are several reasons submissions fail. The host server is non-operational during spidering. The URL is submitted without the "http://" prefix to an engine that requires it, or with the prefix to an engine that does not want it. The host server has a robots.txt exclusion file that blocks the robot. Pages containing frames require special formatting; see HTML guidelines on the Internet for information on how to structure your tags. Pages requiring a cookie will not be indexed by most robots. Some engines must be able to resolve the DNS name; if there is a problem resolving the DNS name, use a static IP address. URLs containing special characters such as ?, =, %, & are a problem for some robots. Engines with pages-per-domain-per-day limits often max out; as a result, submissions from large domains with user homepages often have trouble getting indexed.
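To see whether a robots.txt file is what keeps a page out, you can test its rules against the URL. The sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" user agent, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that keeps all robots out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "http://example.com/index.html",
        "http://example.com/private/page.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```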

Why is my site no longer in a search engine index?

Your site may have dropped out of the system for a variety of technical reasons. It is possible that your server was busy or down at the time that a crawler attempted to reach your site. In that or other cases with the same effect, resubmit the site.

Information about robots
A working example of a no-robot page on our site
Co-operating Sponsors and Technology used on our Website
The above links were last checked on 8/1/2005.

International Copyright Violation
Registered® Trademark™ and Copyright© 1973 - CSG, Computer Support Group, Inc. and CSGNetwork.Com All Rights Reserved