There are three broad categories that cover most web search queries: informational, navigational, and transactional. These are also called "do, know, go." Although this model of searching was not theoretically derived, the classification has been empirically validated with actual search engine queries.
Informational queries – Queries that cover a broad topic for which there may be thousands of relevant results.
Navigational queries – Queries that seek a single website or web page of a single entity.
Transactional queries – Queries that reflect the intent of the user to perform a particular action, like purchasing a car or downloading a screen saver.
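The three categories can be illustrated with a minimal rule-based sketch. The cue words and the URL-like pattern below are hypothetical choices made for illustration, not features taken from any published classifier, and a real system would likely rely on richer signals such as click behavior.

```python
import re

# Hypothetical cue lists chosen for illustration; not taken from any study.
TRANSACTIONAL_CUES = {"buy", "purchase", "download", "order", "book"}
NAVIGATIONAL_PATTERN = re.compile(
    r"(www\.|\.com\b|\.org\b|\.net\b|\bhomepage\b|\blogin\b)"
)


def classify_query(query: str) -> str:
    """Return 'navigational', 'transactional', or 'informational'."""
    text = query.lower().strip()
    terms = set(text.split())
    # A URL fragment or a site-seeking word suggests the user wants one site.
    if NAVIGATIONAL_PATTERN.search(text):
        return "navigational"
    # An action verb suggests the user wants to do something.
    if terms & TRANSACTIONAL_CUES:
        return "transactional"
    # Default: treat everything else as a request for information.
    return "informational"
```

Under these assumed rules, classify_query("download screen saver") is labeled transactional and classify_query("history of the printing press") falls through to informational.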
Search engines often support a fourth type of query that is used far less frequently:
Connectivity queries – Queries that report on the connectivity of the indexed web graph.
Characteristics
Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by. Nevertheless, research studies began to appear in 1998. A 2001 study that analyzed queries from the Excite search engine showed some interesting characteristics of web search:
The average length of a search query was 2.4 terms.
About half of the users entered a single query while a little less than a third of users entered three or more unique queries.
Close to half of the users examined only the first one or two pages of results.
Less than 5% of users used advanced search features.
Among the top four most frequently used terms were "and", "of", and "sex".
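Statistics of the kind listed above can be computed from a query log with a few lines of code. The following is a minimal sketch, assuming the log is available as (user, query) pairs; the toy data and all function names are invented for illustration and do not reproduce the Excite study.

```python
from collections import defaultdict

# Toy log of (user_id, query) pairs; the data is invented for illustration.
LOG = [
    ("u1", "cheap flights"),
    ("u2", "weather"),
    ("u2", "weather boston"),
    ("u2", "boston weather forecast today"),
    ("u3", "python"),
    ("u3", "python"),
]


def average_query_length(log):
    """Mean number of whitespace-separated terms per query."""
    lengths = [len(query.split()) for _, query in log]
    return sum(lengths) / len(lengths)


def unique_queries_per_user(log):
    """Map each user to the number of distinct queries they entered."""
    seen = defaultdict(set)
    for user, query in log:
        seen[user].add(query)
    return {user: len(queries) for user, queries in seen.items()}


def share_of_users(counts, predicate):
    """Fraction of users whose unique-query count satisfies predicate."""
    return sum(1 for n in counts.values() if predicate(n)) / len(counts)
```

For example, share_of_users(unique_queries_per_user(LOG), lambda n: n == 1) gives the fraction of users who entered a single unique query, and passing lambda n: n >= 3 gives the fraction who entered three or more.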
A study of the same Excite query logs revealed that 19% of the queries contained a geographic term. Studies also show that, in addition to queries being short, there are predictable patterns in how users change their queries. A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same user were repeat queries and that 87% of the time the user would click on the same result. This suggests that many users use repeat queries to revisit or re-find information. This analysis is supported by a Bing search engine blog post stating that about 30% of queries are navigational queries.
In addition, much research has shown that query term frequency distributions conform to power law, or "long tail", distribution curves. That is, a small portion of the terms observed in a large query log are used most often, while the remaining terms are used less often individually. This example of the Pareto principle allows search engines to employ optimization techniques such as index or database partitioning, caching and pre-fetching. Studies have also been conducted on discovering linguistically oriented attributes that can recognize whether a web query is navigational, informational or transactional.
However, a 2011 study found that the average length of queries had grown steadily over time and that the average length of non-English queries had increased more than that of English queries. Google implemented the Hummingbird update in August 2013 to handle longer search queries, since more searches are conversational. For longer queries, natural language processing helps, since parse trees of queries can be matched with those of answers and their snippets. For multi-sentence queries, where keyword statistics and tf–idf are not very helpful, the parse thicket technique can be used to structurally represent complex questions and answers.
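The long-tail observation and the repeat-query measurement can both be checked on a log with simple counting. The sketch below is illustrative only: the function names are hypothetical, terms are split on whitespace, and a repeat is counted as an exact string match with an earlier query by the same user, which may differ from the definitions used in the cited studies.

```python
from collections import Counter


def term_frequencies(queries):
    """Count how often each term occurs across all queries."""
    counts = Counter()
    for query in queries:
        counts.update(query.lower().split())
    return counts


def head_coverage(counts, k):
    """Fraction of all term occurrences accounted for by the k most frequent terms."""
    total = sum(counts.values())
    head = sum(n for _, n in counts.most_common(k))
    return head / total


def repeat_rate(user_queries):
    """Fraction of one user's queries that repeat an earlier query by that user."""
    seen, repeats = set(), 0
    for query in user_queries:
        if query in seen:
            repeats += 1
        seen.add(query)
    return repeats / len(user_queries)
```

A high head_coverage for a small k is the kind of skew that makes caching results for the most frequent terms worthwhile, since a small cache can then serve a large share of the traffic.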
Structured queries
With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can be applied. A user who is looking for documents that cover several topics or facets may want to describe each of them by a disjunction of characteristic words, such as vehicles OR cars OR automobiles. A faceted query is a conjunction of such facets; e.g. a query such as (electronic OR computerized OR DRE) AND (voting OR elections OR election OR balloting OR electoral) is likely to find documents about electronic voting even if they omit one of the words "electronic" and "voting", or even both.
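The construction of a faceted query can be sketched as follows. This is a minimal illustration, assuming each facet is given as a list of single-word synonyms; the function names and the facet lists in the usage note are hypothetical, and the matcher is a simplified stand-in for a search engine's Boolean evaluation (it splits documents on whitespace and ignores punctuation and stemming).

```python
def build_faceted_query(facets):
    """Join each facet's terms with OR, then join the facets with AND."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in facets]
    return " AND ".join(groups)


def matches(document, facets):
    """True if the document contains at least one term from every facet."""
    words = set(document.lower().split())
    return all(
        any(term.lower() in words for term in terms)
        for terms in facets
    )
```

With facets = [["electronic", "computerized"], ["voting", "elections", "balloting"]], build_faceted_query(facets) yields the string (electronic OR computerized) AND (voting OR elections OR balloting), and matches("computerized balloting systems", facets) is true even though the document contains neither "electronic" nor "voting".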