How Search Engines Work – An Introductory Guide
Understanding How Search Engines Work
This chapter is a non-technical, step-by-step discussion of how search engines work. Later chapters go into more detail on the technical workings of search engines.
The main focus of this chapter is to help you understand how search engines collect data from all over the internet and deliver relevant results for users’ search queries. Understanding how search engines work can help you rank your websites and web pages organically on the first page of Google.
NOTE: This explanation is based on the Google search engine, as Google is the most widely used search engine in the world. The terms “search engine” and “Google” are used interchangeably depending on the context; both mean the same thing here.
- How Does The Google Search Engine Collect Information?
- How Does Google Retrieve And Rank Pages?
- What do search engines see on a web page while crawling and indexing?
- What can’t Google see on a web page while crawling and indexing?
- What major factors does Google consider when deeming a web page important?
- What are the negative ranking factors considered by Google?
- Important elements on a web page that need to be search engine optimized
- What are Semantic Keywords?
In this digital era, people use search engines to find answers to their queries. Whatever the query, a search engine returns the best answers it can find in a fraction of a second, in the form of Search Engine Result Pages (SERPs). It can respond this quickly because of two processes it carries out in advance: “crawling” and “indexing”. Understanding crawling and indexing is crucial to learning how search engines work.
To give relevant results for users’ queries, Google has developed an automated program called a crawler or spider (Google’s crawler is known as Googlebot). Its job is to crawl and discover all the web pages on the World Wide Web that are publicly available to the search engine.
NOTE: The terms “crawler”, “spider”, and “search engine” are used interchangeably where the context allows; they mean the same thing here.
The spider starts by crawling a set of trusted sites (called “seed sites”) and recording information about them. It then follows the internal and external links on those seed sites to discover other web pages, and repeats the process on every newly discovered page until it has found as many web pages on the internet as possible.
The spider uses this link structure, the network of links leading from one page to another, to reach trillions of web pages and documents on the internet. Crawling is therefore at the core of how a search engine works.
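The seed-and-follow process described above amounts to a breadth-first traversal of the web’s link graph. The following is a minimal Python sketch, not Google’s crawler: the web is a hypothetical in-memory dictionary (`WEB`) rather than pages fetched over HTTP, and the example URLs, `crawl`, and `max_pages` are illustrative names.

```python
from collections import deque

# Hypothetical, in-memory "web": each page maps to the pages it links to.
# A real crawler would download each page over HTTP and extract its links.
WEB = {
    "seed.example/": ["seed.example/about", "blog.example/"],
    "seed.example/about": ["seed.example/"],
    "blog.example/": ["blog.example/post-1", "shop.example/"],
    "blog.example/post-1": ["seed.example/about"],
    "shop.example/": [],
    "orphan.example/": ["seed.example/"],  # no page links here, so it is never discovered
}


def crawl(seeds, max_pages=1000):
    """Breadth-first crawl: start from trusted seed pages and follow their links."""
    discovered = set(seeds)  # pages already found (avoids crawling a page twice)
    queue = deque(seeds)     # pages waiting to be crawled
    crawled = []             # pages in the order they were crawled
    while queue and len(crawled) < max_pages:
        page = queue.popleft()
        crawled.append(page)
        for link in WEB.get(page, []):
            if link not in discovered:
                discovered.add(link)
                queue.append(link)
    return crawled


if __name__ == "__main__":
    print(crawl(["seed.example/"]))
```

Note that `orphan.example/` is never reached: a page that no discovered page links to stays invisible to a crawler that only follows links.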
Indexing As Part of Crawling:
While crawling a web page, the spider notes the terms it considers significant on that page and adds them to an index, a database of significant terms (also called “keywords”).
Suppose the spider is crawling a website about bodybuilding. It notes all the significant keywords related to bodybuilding and adds them to its index.
For example, one of the indexed keywords on the bodybuilding website could be “tips to build muscle”. This keyword is indexed and mapped (associated) with the web page that contains it, so that when a user searches for that keyword (“tips to build muscle”), the mapped page is considered as a candidate for the Search Engine Result Pages (SERPs).
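The keyword-to-page mapping described above is commonly implemented as an inverted index. Here is a minimal Python sketch under simplifying assumptions: the pages are a hypothetical dictionary of URL to text, every word counts as a term, and `build_index` and `lookup` are illustrative names. Real search indexes also store word positions, weights, and much more.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages (URL -> visible text).
PAGES = {
    "fit.example/muscle": "Ten tips to build muscle fast",
    "fit.example/diet": "A protein diet that helps build muscle",
    "fit.example/cardio": "Cardio tips for beginners",
}


def tokenize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: map each term to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def lookup(index, query):
    """Return the pages that contain every word of the query (a simple AND query)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    idx = build_index(PAGES)
    print(lookup(idx, "tips to build muscle"))
```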
The index is a huge database of phrases, words, and so on, from which the search engine pulls results for user queries. Along with significant terms and keywords, search engines also record several other details, such as the number of other pages that link to the page being crawled, the internal and external links from that page to other pages, the clickable text of those links (called “anchor text”), and the mapping of significant terms to web pages.
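The extra link details mentioned above can be pictured as simple per-page tallies. A minimal sketch, assuming the crawler has already recorded each link as a hypothetical `(linking page, linked page, anchor text)` tuple; `summarize_links` and the sample URLs are illustrative names:

```python
from collections import defaultdict

# Hypothetical links recorded during a crawl: (linking page, linked page, anchor text).
CRAWLED_LINKS = [
    ("blog.example/post", "fit.example/muscle", "tips to build muscle"),
    ("news.example/health", "fit.example/muscle", "muscle building guide"),
    ("news.example/health", "fit.example/muscle", "this guide"),
    ("fit.example/home", "fit.example/muscle", "build muscle"),
    ("fit.example/muscle", "fit.example/home", "home"),
]


def summarize_links(links):
    """For each linked page, collect the distinct pages linking to it and the anchor texts used."""
    linking_pages = defaultdict(set)   # target -> set of distinct source pages
    anchor_texts = defaultdict(list)   # target -> every anchor text pointing at it
    for source, target, anchor in links:
        linking_pages[target].add(source)
        anchor_texts[target].append(anchor)
    return linking_pages, anchor_texts


if __name__ == "__main__":
    pages, anchors = summarize_links(CRAWLED_LINKS)
    print(len(pages["fit.example/muscle"]), anchors["fit.example/muscle"])
```

Counting distinct linking pages (a set) rather than raw links keeps one page that links twice from being counted twice; whether and how Google weights repeated links is not public.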
Crawling and indexing are very complex processes, and this chapter gives only an overview of them. Later chapters cover them in more detail.
The paragraphs above explained how a search engine gathers information about anything and everything through “crawling” and “indexing”. How, then, does Google retrieve and rank pages for a query?
The answer is through two important factors: relevance and importance.
Relevance is the first factor Google’s ranking algorithm uses to determine search results. Google has a massive database of keywords, including phrases, significant terms, and words, along with a map that relates each of them to the links of particular web pages.
When a user types a query into the search box (for example, “tips to lose weight”), Google looks up that phrase in its database and retrieves the links of pages that contain it.
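Finding pages that contain a whole phrase, rather than just its individual words, is usually done with a positional index that records where each word occurs. The sketch below is a simplified illustration with hypothetical pages and illustrative function names (`build_positional_index`, `phrase_search`); Google’s actual retrieval is far more elaborate and not public.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages (URL -> text).
PAGES = {
    "a.example": "Easy tips to lose weight at home",
    "b.example": "Lose weight? Tips: eat well, sleep, walk",
    "c.example": "Weight loss methods that work",
}


def tokenize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(pages):
    """Map each term to {url: [positions where the term occurs in that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, term in enumerate(tokenize(text)):
            index[term][url].append(position)
    return index


def phrase_search(index, phrase):
    """Return the URLs whose text contains the words of `phrase` side by side, in order."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    # Step 1: candidate pages must contain every word of the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    # Step 2: keep a page only if word i occurs at position start + i.
    matches = set()
    for url in candidates:
        for start in index[terms[0]][url]:
            if all(start + i in index[term][url] for i, term in enumerate(terms)):
                matches.add(url)
                break
    return matches


if __name__ == "__main__":
    idx = build_positional_index(PAGES)
    print(phrase_search(idx, "tips to lose weight"))
```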
Google does not have to limit the SERPs to pages containing the exact keywords; it can also show pages whose keywords closely match them. For example, for “tips to lose weight”, Google can also bring in results from pages that talk about “weight loss methods”. Such closely related keywords are called “semantic keywords” (explained in detail at the end of this chapter).
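One simple way to picture this is query expansion: the query is widened with related phrases before pages are matched. In the sketch below, the `RELATED` table is a hypothetical, hand-written assumption and the function names are illustrative; Google’s semantic matching is learned from data and its details are not public.

```python
# Hypothetical, hand-written table of related phrases (an assumption for illustration).
RELATED = {
    "tips to lose weight": ["weight loss methods", "how to lose weight fast"],
}

# Hypothetical crawled pages (URL -> text).
PAGES = {
    "a.example": "Easy tips to lose weight at home",
    "c.example": "Weight loss methods that work",
    "d.example": "Best running shoes of the year",
}


def expand_query(query):
    """Return the normalized query followed by any related phrases from the table."""
    normalized = " ".join(query.lower().split())
    return [normalized] + RELATED.get(normalized, [])


def semantic_search(query, pages):
    """Return, sorted, the URLs whose text contains the query or a related phrase."""
    phrases = expand_query(query)
    return sorted(
        url for url, text in pages.items()
        if any(phrase in text.lower() for phrase in phrases)
    )


if __name__ == "__main__":
    print(semantic_search("Tips to lose weight", PAGES))
```

Without the expansion step, only `a.example` would match; with it, the “weight loss methods” page is returned as well.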
A web page becomes more relevant when it contains keywords, terms, and words related to the user’s query, and when relevant pages link to it, both from within the same site and from other sites (links from other sites are called backlinks).
Once Google has collected a set of relevant web pages for a query (for example, “tips to lose weight”) from its massive database, which ones does it rank higher on the Search Engine Result Pages (SERPs)? It ranks higher the pages it considers important.
The search engine judges how important a web page is largely by how many other trusted web pages or websites link to it (these links are usually called “backlinks” in the SEO world). The linking pages might be social media pages or other websites that already have a large number of visitors and are considered trustworthy by search engines.
Though this explanation sounds simple, retrieving exactly the right web pages for a user’s query is a very challenging task for search engines. Search engines use many ranking factors and very complex ranking algorithms to reach their final results. Ultimately, though, it is the combination of relevance and importance that determines the ranking order of web pages on search engines.
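The best-known published way to turn backlinks into an importance score is PageRank, described by Google’s founders in 1998. The sketch below is a classic power-iteration version using the commonly cited damping factor of 0.85; the link graph is hypothetical, and Google’s present-day ranking combines many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank computed by power iteration.

    `links` maps each page to the list of pages it links to. Returns a dict of
    scores that sum to 1; a higher score means a more "important" page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base score (the "random jump" part).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three sites link to yoursite.example.
LINKS = {
    "news.example": ["yoursite.example"],
    "blog.example": ["yoursite.example", "news.example"],
    "forum.example": ["yoursite.example"],
    "yoursite.example": ["news.example"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(f"{page:20} {score:.3f}")
```

Because three pages link to `yoursite.example`, it ends up with the highest score, while pages that nobody links to keep only the small base score.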
During crawling and indexing, the search engine works mainly from the HTML source of the web page. To view a page’s HTML source, right-click the page in your browser and click the “View Page Source” option, as shown in the figure below.
The figure below shows the HTML source of the web page.
The HTML source contains various tags and attributes. The parts of the code that search engines are mainly interested in are as follows.
a. The content of the page. This is the information that users see and read on the website.
b. The <title> tag of the web page. Its text is shown in the browser’s title bar or tab, and search engines usually display it as the clickable headline of the page’s listing in the SERPs.
c. The meta description tag (<meta name="description" content="...">) in the HTML source of the web page.
The meta description is often used as the description shown under a web link in the Search Engine Result Pages (SERPs). See below.
d. The alt attribute of the images on the web page (<img alt="...">). The alt attribute describes the image in text; it is shown when the browser cannot load the image and is read aloud by screen readers.
e. The six heading tags: <h1>, <h2>, <h3>, <h4>, <h5>, and <h6>.
Google analyzes a web page by looking for keywords or search terms in the sections of the HTML source listed above. If you want your page to rank high on Google, make sure the keywords you are optimizing for appear naturally in all five of these places.
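To see what these five places look like to a program, the following sketch uses Python’s standard `html.parser` module to pull out the title, meta description, image alt text, headings, and visible text, and then reports where a target keyword appears. `SeoTagExtractor`, `keyword_report`, and the sample page are illustrative assumptions; this is a plain substring check, not how Google scores pages.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SeoTagExtractor(HTMLParser):
    """Collect the title, meta description, image alt text, headings and body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.alts = []
        self.headings = []    # list of (tag, text) pairs
        self.body_text = ""   # all text outside <title>
        self._current = None  # "title" or a heading tag while inside one

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._current = tag
        elif tag in HEADINGS:
            self._current = tag
            self.headings.append((tag, ""))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "img" and attrs.get("alt"):
            self.alts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
            return
        self.body_text += data
        if self._current in HEADINGS:
            tag, text = self.headings[-1]
            self.headings[-1] = (tag, text + data)


def keyword_report(html, keyword):
    """Report, for each of the five places, whether it contains the keyword."""
    page = SeoTagExtractor()
    page.feed(html)
    kw = keyword.lower()
    return {
        "content": kw in page.body_text.lower(),
        "title": kw in page.title.lower(),
        "meta_description": kw in (page.meta_description or "").lower(),
        "alt": any(kw in alt.lower() for alt in page.alts),
        "headings": any(kw in text.lower() for _, text in page.headings),
    }


SAMPLE_HTML = """<html><head><title>Tips to Build Muscle</title>
<meta name="description" content="Simple tips to build muscle at home.">
</head><body>
<h1>Tips to Build Muscle</h1>
<img src="squat.jpg" alt="Man doing a squat">
<p>Train each muscle group twice a week.</p>
</body></html>"""

if __name__ == "__main__":
    print(keyword_report(SAMPLE_HTML, "build muscle"))
```

For this sample page, the report shows the keyword missing only from the image’s alt text, which is exactly the kind of gap the advice above is about.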
Apart from the places listed above, Google can also read the <noscript> tag. The <noscript> tag defines alternate content for users whose browsers do not support scripts or have scripting turned off.
Though Google is brilliant enough to crawl and index trillions of web pages on the internet, it gathers information mainly from text. It cannot fully understand or analyze the contents of the following:
- Contents of Images
- Flash Files
- Audio Data
- Video Data
- Program code
Google weighs many factors when deciding how important a web page is. The major ones are described below.
a. Quality Content:
Google treats the following as signs that a web page has low-quality content:
- Frequent grammatical mistakes
- Poor sentence formation
- Long sentences without any punctuation marks
- Long paragraphs
Google tends to rank websites and web pages higher when their content is fresh and easy to read.
b. User Engagement On Site:
Google uses several ways to gauge the user experience on a website. The following parameters are among the most important:
- Bounce Rate: The percentage of visits in which the visitor views only one page on your website before leaving.
- Time on Site: How long visitors stay on your website. If most readers leave after a short time, it suggests your web page has little of interest to offer, which gives Google a reason to demote it in its rankings.
- Page Views by Visitors: The average number of pages viewed per visitor on your website.
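The three engagement parameters above are simple ratios. A minimal sketch, assuming a hypothetical log of visits in which each visit records the pages viewed and the seconds spent; `engagement_metrics` is an illustrative name, and pages are averaged per visit (one visitor may make several visits). How Google itself measures or uses such signals is not public.

```python
# Hypothetical visit log: pages viewed during each visit and seconds spent on the site.
VISITS = [
    {"pages": ["/home"], "seconds": 8},
    {"pages": ["/home", "/blog", "/contact"], "seconds": 240},
    {"pages": ["/blog"], "seconds": 30},
    {"pages": ["/home", "/blog"], "seconds": 95},
]


def engagement_metrics(visits):
    """Compute bounce rate (%), average time on site (seconds) and pages per visit."""
    if not visits:
        raise ValueError("need at least one visit")
    n = len(visits)
    bounces = sum(1 for v in visits if len(v["pages"]) == 1)  # single-page visits
    return {
        "bounce_rate_pct": 100.0 * bounces / n,
        "avg_time_on_site_s": sum(v["seconds"] for v in visits) / n,
        "pages_per_visit": sum(len(v["pages"]) for v in visits) / n,
    }


if __name__ == "__main__":
    print(engagement_metrics(VISITS))
```

For the sample log this gives a 50% bounce rate, 93.25 seconds average time on site, and 1.75 pages per visit.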
c. Link Analysis (Including Anchor Text):
Google analyzes the links going out of your web page (outbound links) as well as the links pointing to your site from other websites (inbound links). Links within your website (internal links) are also analyzed as a ranking factor.
Google checks who is linking to your website or web page and what they are saying about it. Your website’s relevance and importance grow in Google’s eyes when you have links from authoritative, trusted websites. A single link from an authoritative, relevant, and trusted website is more powerful than many links from non-authoritative, irrelevant websites.
Google also checks whether your outbound links take readers to relevant, trusted websites. The wording of a link’s anchor text (its clickable text) is also considered a ranking factor for your website or web page.
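On the page side, link analysis starts by separating internal links from outbound links and recording each link’s anchor text. A minimal sketch using Python’s standard `html.parser` and `urllib.parse` modules; `LinkExtractor`, `classify_links`, and the sample URLs are hypothetical names. Inbound links cannot be seen from your own page: a crawler finds them on other sites’ pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect (absolute URL, anchor text) pairs for every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # URL of the <a> tag we are currently inside, if any
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)  # resolve relative links
                self._text = ""

    def handle_data(self, data):
        if self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join(self._text.split())  # collapse whitespace
            self.links.append((self._href, anchor))
            self._href = None


def classify_links(html, page_url):
    """Split a page's links into internal and outbound lists of (URL, anchor text)."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    host = urlparse(page_url).netloc
    internal, outbound = [], []
    for url, anchor in parser.links:
        if urlparse(url).netloc == host:
            internal.append((url, anchor))
        else:
            outbound.append((url, anchor))
    return internal, outbound


PAGE_HTML = (
    '<p>Read our <a href="/guides/protein">protein guide</a> or the '
    '<a href="https://health.example/study">muscle growth study</a>.</p>'
)

if __name__ == "__main__":
    internal, outbound = classify_links(PAGE_HTML, "https://fit.example/blog/post")
    print("internal:", internal)
    print("outbound:", outbound)
```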
d. Social Media Signals:
As more and more people spend time on social media such as Facebook, Twitter, and Google Plus, search engines use it to help rank websites and web pages. Google ranks certain web pages and websites higher when it notices their popularity through the number of shares, since people share content they value.
e. Site Architecture:
Google’s algorithm considers website architecture an important factor when displaying search results. Most importantly, the way the search engine spider navigates your website depends on how internal links are placed on it. Internal links help the spider navigate your website by discovering internal pages quickly and accurately.
f. Page Load Time:
Search engines give more weight to websites that load faster. A website with a quick response time has a better chance of climbing the search engine rankings and outperforming its competitors.
Use Google’s PageSpeed Insights tool to check your page speed.
Many spammers try to rank high on Google’s Search Engine Result Pages (SERPs) through unscrupulous means. Google treats these practices as negative ranking factors and penalizes websites and web pages found using them. Some of the factors that lower your page rankings are listed below:
- Building inbound links artificially, which results in low-quality inbound links.
- Installing malware: Google penalizes websites that have been found installing malware.
- Showing one thing to users and something else to the search engine (called “cloaking”).
- Selling links: pages on sites that sell links are penalized by Google.
- Slow page speed: Visitors may leave your site before the page even loads, which goes against Google’s aim of satisfying its users. A slow site is also harder for the search engine spider to crawl in full, especially if it contains thousands of pages.
The following 20 elements on your web page should be carefully optimized for search engines to help your web pages rank high on search engine result pages.
- Title tag
- Meta Description tag
- Meta Keywords tag (Google has ignored it for ranking since 2009)
- Heading tags (H1 to H6)
- Textual content
- Alt attributes on images
- Fully qualified links
- HTML sitemap
- XML sitemap
- Textual navigation
- Canonical elements
- Structured data markup
- URL structures of the webpage
- Placement of JavaScript/CSS files
- Ordered and unordered lists
- robots.txt file
- Backlinks
- Image names
- Dedicated IP address
- Website with security certificate installed (HTTPS)
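Of the elements above, the robots.txt file is the one well-behaved crawlers check before crawling a site: it tells them which paths they may fetch. Python’s standard `urllib.robotparser` module can evaluate such rules; the robots.txt content, URLs, and the `build_robots_parser` helper below are hypothetical, for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from
# the site's root, e.g. https://fit.example/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://fit.example/sitemap.xml
"""


def build_robots_parser(robots_txt):
    """Parse robots.txt text with Python's standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record a load time; can_fetch() refuses everything until one is set
    return parser


if __name__ == "__main__":
    robots = build_robots_parser(ROBOTS_TXT)
    for url in ["https://fit.example/blog/post", "https://fit.example/admin/login"]:
        verdict = "allowed" if robots.can_fetch("Googlebot", url) else "blocked"
        print(url, "->", verdict)
    print("Sitemaps:", robots.site_maps())  # site_maps() needs Python 3.8+
```

Python’s parser applies rules in the order they appear in the file, while Google documents that the most specific (longest) matching rule wins, so results can differ for files with overlapping Allow and Disallow rules.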
All the above elements will be explained in detail in further chapters.
Google is constantly researching ways to make the user experience as satisfying as possible. For this reason, the search engine is inching closer to understanding human behavior when it comes to searching the web.
It is trying to act like a best friend who understands your mind and wants to help you with all the possible options during your search. Semantic keywords are keywords that are related, in one way or another, to the actual keyword the user types into Google’s search box.
Google suggests these keywords to give the user a better search experience. Using its semantic indexing capabilities, Google pulls results out of its massive database that it believes are substantially related to the query. Most of the time, semantic keywords take the form of questions the user is trying to find answers to.
For example, if the actual keyword is “tips to lose weight”, the semantic keywords suggested by Google are shown in the picture below:
Congratulations! You have finished the third chapter, “How Search Engines Work”. I hope you enjoyed reading it.
All the best for the next chapter, “11 Important Google Algorithm Updates Every SEO Professional Should Know”. In it, you will learn how Google regularly updates its ranking algorithm to present relevant results to search users. These updates are intended to move high-quality websites to the top of the SERPs.
Feel free to comment below on whether this blog post was useful. If it was, please do me a favor and share it with others who might benefit.
Subhash.K.U is a professional programmer turned digital marketing enthusiast. He is one of the most sought-after marketing consultants for small and medium-scale businesses. He founded Subhash Digital Academy to teach professional digital marketing skills to students, entrepreneurs, and working professionals. He holds a Bachelor’s degree in Electrical Engineering and is an Oracle Certified Programmer. He also holds Google AdWords, Facebook Blueprint, and HubSpot Marketing certificates. He is the co-author of the best-selling book Cracking The C, C++ and Java Interview, published by McGraw Hill. He is now writing another book on marketing and entrepreneurship.