Learn what search engines do and how they work. Easy step-by-step tutorial with a video for beginners.
If you are a developer, designer, small business owner, marketing professional, website owner, or thinking of creating a personal blog or website for your business, then you need to understand how search engines work.
A clear understanding of how search works can help you create a website that search engines can understand, which brings a number of added benefits.
It’s the first step you need to take before even dealing with Search Engine Optimization (SEO) or any other SEM (Search Engine Marketing) tasks.
In this guide, you’ll learn the three main processes (crawling, indexing, and ranking) that search engines follow to find, organize, and present information to users.
What Does a Search Engine Do?
Have you ever wondered how many times per day you use Google or any other search engine to search the web?
Is it 5 times, 10 times or even sometimes more? Did you know that Google alone handles more than 2 trillion searches per year?
The numbers are huge. Search engines have become part of our daily life. We use them as a learning tool, a shopping tool, for fun and leisure but also for business.
It’s not an exaggeration to say that we have reached a point where we depend on search engines for almost anything we do.
And the reason this is happening is very simple. We know that search engines, and Google in particular, have answers to all our questions and queries.
What happens though when you type a query and click search? How do search engines work internally and how do they decide what to show in the search results and in what order?
How Do Search Engines Work
Search engines are complex computer programs.
Before they even allow you to type a query and search the web, they have to do a lot of preparation work so that when you click “Search”, you are presented with a set of precise and quality results that answer your question or query.
What does ‘preparation work’ include? Three main stages. The first stage is discovering the information, the second is organizing it, and the third is ranking it.
These stages are generally known as crawling, indexing, and ranking.
Step 1: Crawling
Search engines have a number of computer programs called web crawlers (hence the word crawling) that are responsible for finding information that is publicly available on the Internet.
To simplify a complicated process, it’s enough for you to know that the job of these software crawlers (also known as search engine spiders) is to scan the Internet and find the servers (also known as web servers) hosting websites.
They create a list of all the web servers to crawl, the number of websites hosted by each server, and then start work.
They visit each website and, using different techniques, try to find out how many pages it has and what kind of content each page holds, whether text, images, videos, or any other format (CSS, HTML, JavaScript, etc.).
When visiting a website, besides taking note of the number of pages, they also follow any links (pointing either to pages within the site or to external websites), and thus they discover more and more pages.
They do this continuously and they also keep track of changes made to a website so that they know when new pages are added or deleted, when links are updated, etc.
If you take into account that there are more than 130 trillion individual pages on the Internet today and on average thousands of new pages are published on a daily basis, you can imagine that this is a lot of work.
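To make the link-following idea concrete, here is a minimal sketch of a crawler in Python. It is only an illustration under stated assumptions: the FAKE_WEB dictionary, the example.com URLs, and the crawl function are all hypothetical, and the "web" lives in memory so the code runs without a network connection. Real crawlers fetch pages over HTTP and handle far more cases (politeness delays, duplicates, errors, scheduling).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML of that page.
FAKE_WEB = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog": (
        '<a href="/blog/post-1">Post</a> '
        '<a href="https://other.example/">External</a>'
    ),
    "https://example.com/blog/post-1": "<p>No links here.</p>",
    "https://other.example/": '<a href="https://example.com/">Back</a>',
}


def crawl(seed, fetch=FAKE_WEB.get):
    """Breadth-first crawl from a seed URL; returns URLs in discovery order."""
    discovered = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched, skip it
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                discovered.append(absolute)
                queue.append(absolute)
    return discovered


if __name__ == "__main__":
    for page in crawl("https://example.com/"):
        print(page)
```

Starting from a single page, the sketch discovers internal pages and an external website purely by following links, which is the behavior described above.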
Why care about the crawling process?
Your first concern when optimizing your website for search engines is to ensure that they can access it correctly. If they cannot ‘read’ your website, you shouldn’t expect much in terms of high rankings or search engine traffic.
As explained above, crawlers have a lot of work to do and you should try and make their job easier.
There are a number of things to do to make sure that crawlers can discover and access your website in the fastest possible way without problems.
- Use robots.txt to specify which pages of your website you don’t want crawlers to access, for example your admin or backend pages. Keep in mind that robots.txt is only a request to crawlers and not access control, so pages that must stay private also need proper protection such as a login.
- Big search engines like Google and Bing have tools (aka webmaster tools) you can use to give them more information about your website (number of pages, structure, etc.) so that they don’t have to find it themselves.
- Use an XML sitemap to list all important pages of your website so that the crawlers can know which pages to monitor for changes and which to ignore.
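As a rough illustration of the first and last points, the sketch below uses Python's standard library to check URLs against a robots.txt file the way a well-behaved crawler would, and to build a very small XML sitemap. The robots.txt rules, the example.com URLs, and the helper names build_robot_parser and build_sitemap are assumptions made for this example, not something prescribed by any search engine.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of two areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backend/
"""


def build_robot_parser(robots_txt):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    robots = build_robot_parser(ROBOTS_TXT)
    for url in ("https://example.com/blog", "https://example.com/admin/login"):
        allowed = robots.can_fetch("*", url)
        print(url, "allowed" if allowed else "blocked")

    print(build_sitemap(["https://example.com/", "https://example.com/blog"]))
```

In practice you would usually let your CMS or an SEO plugin generate both files. The point of the sketch is only to show what the two files express.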
Step 2: Indexing
Crawling alone is not enough to build a search engine.
Information identified by the crawlers needs to be organized, sorted and stored so that it can be processed by the search engine algorithms before being made available to the end-user.
This process is called Indexing.
Search engines don’t store all the information found on a page in their index but they keep things like: when it was created/updated, title and description of the page, type of content, associated keywords, incoming and outgoing links, and a lot of other parameters that are needed by their algorithms.
Google likes to compare its index to the index at the back of a book (a really big book).
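The book analogy maps nicely onto a data structure commonly called an inverted index: for each word, you keep the list of pages where it appears. The sketch below is a toy version under stated assumptions. The PAGES data and the function names are hypothetical, and a real search engine index stores many more parameters per page (dates, links, content type, and so on) than the words shown here.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> text content.
PAGES = {
    "/cake": "How to make a chocolate cake",
    "/bulb": "How to change a light bulb",
    "/shop": "Buy refurbished laptops",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    index = build_index(PAGES)
    print(lookup(index, "chocolate cake"))
    print(lookup(index, "how to"))
```

Because the index is built ahead of time, answering a query becomes a quick lookup instead of a scan through every page.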
Why care about the indexing process?
It’s very simple: if your website is not in their index, it will not appear for any searches.
This also implies that the more pages you have in the search engine indexes, the better your chances of appearing in the search results when someone types a query.
Notice that I used the phrase ‘appear in the search results’, which means in any position and not necessarily in the top positions or pages.
In order to appear in the first 5 positions of the SERPs (search engine results pages), you have to optimize your website for search engines using a process called Search Engine Optimization, or SEO for short.
How to find how many pages of your website are included in the Google index?
There are two ways to do that.
Open Google and use the site: operator followed by your domain name, for example site:reliablesoft.net. You will find out roughly how many pages related to the particular domain are included in the Google index.
The second way is to create a free Google Search Console account and add your website.
Then look at the Coverage report and in particular the VALID AND INDEXED pages.
Step 3: Ranking
Search Engine Ranking Algorithms
The third and final step in the process is for search engines to decide which pages to show in the SERPs and in what order when someone types a query.
This is achieved through the use of search engine ranking algorithms.
In simple terms, these are pieces of software that use a number of rules to analyze what the user is looking for and decide what information to return.
These decisions are based on the information available in their index.
How do search engine algorithms work?
Over the years search engine ranking algorithms have evolved and become really complex.
At the beginning (think 2001) it was as simple as matching the user’s query with the title of the page but this is no longer the case.
Google’s ranking algorithm takes into account more than 255 rules before making a decision and nobody knows for sure what these rules are.
And this includes Larry Page and Sergey Brin (Google’s founders), who created the original algorithm.
Things have changed a lot and now machine learning and computer programs are responsible for making decisions based on a number of parameters that are outside the boundaries of the content found on a web page.
To make it easier to understand, here is a simplified process of how search engine ranking factors work:
Step 1: Analyze User Query
The first step is for search engines to understand what kind of information the user is looking for.
To do that, they analyze the user’s query (search terms) by breaking it down into a number of meaningful keywords.
A keyword is a word that has a specific meaning and purpose.
For example, when you type “How to make a chocolate cake”, search engines know from the words “how to” that you are looking for instructions on how to make a chocolate cake, and thus the returned results will contain cooking websites with recipes.
If you search for “Buy refurbished ….”, they know from the words buy and refurbished that you are looking to buy something and the returned results will include eCommerce websites and online shops.
Machine learning has helped them associate related keywords together. For example, they know that the meaning of this query “how to change a light bulb” is the same as this “how to replace a light bulb”.
They are also clever enough to interpret spelling mistakes, understand plurals and in general extract the meaning of a query from natural language (either written or verbal in case of Voice search).
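To give a feel for what analyzing a query means, here is a deliberately simple sketch. Real search engines rely on machine learning for this step. The hand-written STOP_WORDS, SYNONYMS, and INTENT_MARKERS tables and the analyze_query function below are hypothetical stand-ins that only cover the examples from this section.

```python
import re

# Hypothetical lookup tables standing in for what real engines learn from data.
STOP_WORDS = {"a", "an", "the", "to"}
SYNONYMS = {"replace": "change"}  # map related words to one canonical form
INTENT_MARKERS = {"how": "informational", "buy": "transactional"}


def analyze_query(query):
    """Break a query into an intent label and a list of normalized keywords."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    # The first intent marker found decides the intent; default is "unknown".
    intent = next(
        (INTENT_MARKERS[t] for t in tokens if t in INTENT_MARKERS), "unknown"
    )
    keywords = [
        SYNONYMS.get(t, t)
        for t in tokens
        if t not in STOP_WORDS and t not in INTENT_MARKERS
    ]
    return {"intent": intent, "keywords": keywords}


if __name__ == "__main__":
    print(analyze_query("How to change a light bulb"))
    print(analyze_query("how to replace a light bulb"))
    print(analyze_query("Buy refurbished laptops"))
```

With these tables, the two light bulb queries produce the same intent and the same keywords, which mirrors the idea that search engines treat them as the same question.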
Step 2: Finding matching pages
The second step is to look into their index and decide which pages can provide the best answer for a given query.
This is a very important stage in the whole process for both search engines and website owners.
Search engines need to return the best possible results in the fastest possible way to keep their users happy, and website owners want their websites to be picked up so that they get traffic and visits.
This is also the stage where good SEO techniques can influence the decision made by the algorithms.
To give you an idea of how matching works, these are the most important factors:
Title and content relevancy – how relevant are the title and content of the page to the user’s query?
Type of content – if the user is asking for images, the returned results will contain images and not text.
Quality of the content – content needs to be thorough, useful and informative, unbiased, and cover both sides of a story.
Quality of the website – The overall quality of a website matters. Google will not show pages from websites that don’t meet their quality standards.
Date of publication – For news-related queries, Google wants to show the latest results so the date of publication is also taken into account.
The popularity of a page – This doesn’t have to do with how much traffic a website has but how other websites perceive the particular page.
A page that has a lot of references (backlinks) from other websites is considered more popular than pages with no links and thus has a better chance of being picked up by the algorithms. Earning these links is part of what is known as Off-Page SEO.
Language of the page – Users are served pages in their language and it’s not always English.
Webpage Speed – Websites that load fast (think 2-3 seconds) have a small advantage compared to websites that are slow to load.
Device Type – Users searching on mobile are served mobile-friendly pages.
Location – Users searching for results in their area, e.g. “Italian restaurants in Ohio”, will be shown results related to their location.
That’s just the tip of the iceberg. As mentioned before, Google uses more than 255 factors in its algorithms to ensure that its users are happy with the results they get.
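Nobody outside the search engines knows the real formula, but the general idea of combining several signals into one score can be sketched in a few lines. Everything in this example is an assumption made for illustration: the Page fields, the WEIGHTS values, the 3-second speed threshold, and the sample pages are invented and do not reflect how Google or any other search engine actually weighs its factors.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str
    backlinks: int       # number of other websites linking to this page
    load_seconds: float  # how long the page takes to load


# Hypothetical weights; real ranking factors and their weights are not public.
WEIGHTS = {"title": 3.0, "body": 1.0, "backlinks": 0.5, "speed": 0.5}


def score(page, keywords):
    """Combine relevance, popularity, and speed into a single number."""
    title_words = page.title.lower().split()
    body_words = page.body.lower().split()
    title_hits = sum(1 for k in keywords if k in title_words)
    body_hits = sum(body_words.count(k) for k in keywords)
    if title_hits == 0 and body_hits == 0:
        return 0.0  # irrelevant pages never rank, however popular or fast
    popularity = math.log1p(page.backlinks)  # diminishing returns on links
    speed_bonus = 1.0 if page.load_seconds <= 3 else 0.0
    return (
        WEIGHTS["title"] * title_hits
        + WEIGHTS["body"] * body_hits
        + WEIGHTS["backlinks"] * popularity
        + WEIGHTS["speed"] * speed_bonus
    )


def rank(pages, keywords):
    """Return the URLs of relevant pages, best score first."""
    scored = [(score(p, keywords), p.url) for p in pages]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [url for s, url in scored if s > 0]


if __name__ == "__main__":
    pages = [
        Page("/blog/cake-story", "My weekend", "I ate chocolate cake", 0, 5.0),
        Page("/shop/laptops", "Refurbished laptops", "buy laptops", 100, 1.0),
        Page("/recipes/cake", "Chocolate cake recipe",
             "make a chocolate cake at home", 20, 2.0),
    ]
    print(rank(pages, ["chocolate", "cake"]))
```

Note how the laptop page is left out despite having the most backlinks. In this sketch, popularity and speed only help pages that are relevant to the query in the first place.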
Why care how search engine ranking algorithms work?
In order to get traffic from search engines, your website needs to appear in the top positions on the first page of the results.
It is statistically proven that the majority of users click one of the top 5 results (both desktop and mobile).
Appearing on the second or third page of the results will get you little, if any, traffic.
Traffic is just one of the benefits of SEO. Once you get to the top positions for keywords that make sense for your business, there are many more added benefits.
Knowing how search engines work can help you adjust your website and increase your rankings and traffic.
Conclusion
Search engines have become very complex computer programs. Their interface may be simple but the way they work and make decisions is far from simple.
The process starts with crawling and indexing. During this phase, the search engine crawlers gather as much information as possible for all the websites that are publicly available on the Internet.
They discover, process, sort, and store this information in a format that can be used by search engine algorithms to make a decision and return the best possible results back to the user.
The amount of data they have to digest is enormous and the process is completely automated. Human intervention is limited to designing the rules used by the various algorithms, but even this step is gradually being taken over by computers with the help of artificial intelligence.
As a webmaster, your job is to make their crawling and indexing job easier by creating websites that have a simple and straightforward structure.
Once they can “read” your website without issues, you then need to ensure that you give them the right signals so that their ranking algorithms pick your website when a user types a relevant query (that’s SEO).