ABIS Infor - 2015-03
Windows 7 and 8 search trouble
Abstract
Windows Explorer allows us to search the hard disk for files, based on part of the file name or on a keyword in the content of the file. Very useful and also very easy to use, but... do I actually always find what I searched for? A short excursion into the Windows search landscape!
A simple search action... or not?
When using the search function of the Explorer in Windows 7 or Windows 8, you may encounter some surprises. At least, if you know which files should appear, and you observe that the Explorer did not find some of them! Or if you were used to the old Search Assistant (the little dog) of Windows XP.
Just try the following: navigate with your Explorer to the library "Documents", create a new text file there named Abc.txt, and enter the following text into that file: "Here is one of the ABIS courses."
In the search field in the top right-hand corner, enter the two letters ab. After at most a few seconds the file appears in the list of found files (which possibly contains some other files, of course). It's also visually clear why: the two letters "Ab" in the name, as well as the "AB" in the file content, are explicitly marked in yellow by the Explorer.
Now rename the file to "Text-Abc.txt.gif". The file is still found (because the name still contains "Ab"), but it's no longer found because of its content: the extension ".gif" indicates a binary file (although this is not the case here), and files of this kind are in principle never scanned for their content by the search functionality of the Explorer.
And a surprise ...
Now rename the file to "TextAbc.txt.gif". Surprise: the file is no longer found! So not only must the name contain the characters "ab"; there should actually be a "word" in its name starting with "ab" (case-insensitive, of course). The same holds also for the content: only words starting with "ab" are found. Verify this for yourself, e.g. by searching for "BIS" in text files containing the word "ABIS".
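The rule observed above can be imitated in a few lines of Python. This is only a sketch of the observed behaviour, not the Explorer's actual implementation: the function names are made up for this illustration, and the assumption that a "word" ends at every character that is not a letter or a digit is ours.

```python
import re

def words(text):
    """Split text into lower-case words at every non-alphanumeric character
    (an assumption; the real Windows word breaker may behave differently)."""
    return [w.lower() for w in re.split(r"[^0-9A-Za-z]+", text) if w]

def has_word_starting_with(text, term):
    """True if at least one word in text starts with term (case-insensitive)."""
    term = term.lower()
    return any(w.startswith(term) for w in words(text))

print(has_word_starting_with("Text-Abc.txt.gif", "ab"))  # True: the word "abc"
print(has_word_starting_with("TextAbc.txt.gif", "ab"))   # False: "textabc" starts with "te"
print(has_word_starting_with("Here is one of the ABIS courses.", "bis"))  # False
```

The same function serves for file names and for file content, which matches the experiment: "BIS" is not found in a text containing "ABIS".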
Restore the extension .txt for the file and you notice it will be found again, but now only because of its content. Also extensions .asc, .csv and a few more "text" extensions will work. Just try this! Finally rename to TextAbc.txt.
There could be other reasons, apart from the extension, why a file that contains a certain word is still not found. Right-click the file TextAbc.txt, choose "Properties", click on "Advanced" under Attributes, and uncheck "Indexing" (second check box). Now the file will no longer be found: apparently only files with (1) the correct extension, and moreover (2) having the attribute "Indexed" will be eligible for content-based search!
Notice also that all subdirectories (and everything further down) will be scanned, starting from the point where the search action was launched. So navigating to C:\ and entering a search term there will start scanning your complete (first) hard disk.
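The two conditions can be written down as a small Python check. This is a sketch under assumptions: the set TEXT_EXTENSIONS is a hypothetical sample (the real list is configurable), and the function name is invented here. The "Indexing" check box corresponds to the Windows file attribute "not content indexed", for which Python's stat module provides a constant.

```python
import os
import stat

# Hypothetical sample of extensions whose content is indexed.
TEXT_EXTENSIONS = {".txt", ".asc", ".csv"}

def eligible_for_content_search(file_name, attributes,
                                text_extensions=TEXT_EXTENSIONS):
    """Apply both conditions: (1) a "text" extension and (2) indexing allowed.

    attributes is the numeric Windows attribute value of the file.
    """
    extension = os.path.splitext(file_name)[1].lower()  # last extension only
    indexing_allowed = not (attributes & stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)
    return extension in text_extensions and indexing_allowed

print(eligible_for_content_search("TextAbc.txt", 0))      # True
print(eligible_for_content_search("TextAbc.txt.gif", 0))  # False: wrong extension
print(eligible_for_content_search(
    "TextAbc.txt", stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED))  # False: not indexed
```

On Windows, the attribute value of an existing file can be read with os.stat(path).st_file_attributes; that field is not available on other platforms.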
More search possibilities
The search field of the Windows File Explorer provides some additional possibilities. For example, searching for name:Text will only find files based on their name, not their content. And since the file TextAbc.txt has reappeared: re-enable its indexing attribute, so that we can again search it by content.
A second, similar possibility is a search term of the following form: content:ab which limits the search to content only. content:BIS still finds nothing, but content:IS of course does: not because of the word "ABIS" but because of the word "is".
Search terms can easily be combined, by just specifying all of them in the search field, separated by a blank. Our file will e.g. be found with the combined search term course name:txt content:ab
Indeed: the name actually contains a word starting with "txt", viz. the extension! Also notice that "combining" means "reducing": a file will be shown only when all search terms are satisfied. In order to search for either one or another search term, the word "OR" should be used, e.g. content:abis OR name:doc
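The way terms combine can be made concrete with a toy evaluator in Python. It is a sketch only: it knows just the prefixes name: and content:, treats a blank as "and" and the word OR as "or", and reuses the assumed beginning-of-word rule. The function names and the dictionary layout are hypothetical, and the real search syntax is richer than this.

```python
import re

def starts_word(text, term):
    """True if some word in text starts with term (case-insensitive)."""
    found = re.split(r"[^0-9A-Za-z]+", text.lower())
    return any(w.startswith(term.lower()) for w in found if w)

def term_matches(file, token):
    """Evaluate one search term against a file given as a dict."""
    field, separator, term = token.partition(":")
    if not separator:  # no prefix: the name or the content may match
        return starts_word(file["name"], token) or starts_word(file["content"], token)
    return starts_word(file[field], term)  # field must be "name" or "content"

def query_matches(file, query):
    """Blank-separated terms must all match; OR separates alternatives."""
    alternatives = query.split(" OR ")
    return any(all(term_matches(file, t) for t in alternative.split())
               for alternative in alternatives)

sample = {"name": "TextAbc.txt", "content": "Here is one of the ABIS courses."}
print(query_matches(sample, "course name:txt content:ab"))  # True
print(query_matches(sample, "content:abis OR name:doc"))    # True
print(query_matches(sample, "content:bis name:txt"))        # False
```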
Notice that the search field does not understand any wildcards such as a star: actually, all characters except for letters and digits will be ignored in the search box. For example, if you enter the search term "-" (a dash): not only files having a dash in their names but really all files will be shown! Indeed, after ignoring the punctuation, "nothing" is left over, so all files with a name starting with the empty text will be shown...
Actually, punctuation is not really ignored, but can be used to enter a pair of search terms, referring to two consecutive words. For example, the combined search term content:course content:abis asks for files containing these two words "somewhere"; but the search term content:abis-course searches for a word starting with "abis", immediately followed by a word starting with "course".
Apart from "content" and "name", a dozen other search prefixes exist; try experimenting with e.g. size:>20 or size:<=20 or datemodified:today or type:text or attributes:16 (meaning: "is directory") ...
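The difference between "somewhere" and "immediately followed by" can be sketched in Python as follows. The function names are invented for this illustration, and splitting at every non-alphanumeric character is an assumption about how the search box breaks a term into words.

```python
import re

def split_words(text):
    """Lower-case words, split at every non-alphanumeric character."""
    return [w for w in re.split(r"[^0-9A-Za-z]+", text.lower()) if w]

def matches_sequence(text, term):
    """True if text contains consecutive words starting with the
    consecutive parts of term (e.g. "abis-course")."""
    prefixes = split_words(term)
    if not prefixes:
        return True  # only punctuation entered: nothing left, everything matches
    found = split_words(text)
    for start in range(len(found) - len(prefixes) + 1):
        window = found[start:start + len(prefixes)]
        if all(w.startswith(p) for w, p in zip(window, prefixes)):
            return True
    return False

text = "Here is one of the ABIS courses."
print(matches_sequence(text, "abis-course"))  # True: "abis" directly before "courses"
print(matches_sequence(text, "course-abis"))  # False: wrong order
print(matches_sequence(text, "-"))            # True: the empty search term
```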
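The number 16 in attributes:16 is the numeric Windows file attribute flag for "directory". Python's stat module exposes these flags as constants, so a small table is enough to decode such a value. The helper name describe and the selection of flags below are choices made for this sketch; whether the Explorer compares the value exactly or bit by bit is not verified here.

```python
import stat

# A selection of Windows file attribute flags (each one is a single bit).
FLAGS = {
    "read-only": stat.FILE_ATTRIBUTE_READONLY,
    "hidden": stat.FILE_ATTRIBUTE_HIDDEN,
    "system": stat.FILE_ATTRIBUTE_SYSTEM,
    "directory": stat.FILE_ATTRIBUTE_DIRECTORY,
    "archive": stat.FILE_ATTRIBUTE_ARCHIVE,
    "not content indexed": stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED,
}

def describe(value):
    """List the names of all flags from FLAGS that are set in value."""
    return [name for name, bit in FLAGS.items() if value & bit]

print(describe(16))  # ['directory']
print(describe(18))  # ['hidden', 'directory']
```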
Windows index
Searching based on file content has been deliberately limited, for reasons of efficiency: not the file contents but a global index will be searched when a certain search term is entered. This index contains all individual words of all indexed files (content) and of file names, in alphabetical order. This explains the quick result for a search term like content:AB, even in C:\ and it also explains why searching is only supported for beginning-of-word search terms. This index is automatically updated when the content of an indexed file or the name of a file changes. Apart from that, this index can be configured at will: have a look at the "Indexing options" (from the Control Panel). There, it becomes apparent that not all locations are indexed, and you also see which file extensions are indexed for content and which ones only for "properties". Content will be indexed for e.g. extensions txt, csv, doc, docx, pdf, cmd, rtf, sql, xls and xlsx, but e.g. not for extensions exe, bmp, gif, jpeg, vnc or xyz (unless you have changed this...)
Nice to know
Of course, a search term will never return a result for which you are not authorised, such as the content of a file for which you have no read permission, or the name of a file in a directory for which you have no listing permission.
It is also possible to explicitly search in non-indexed locations, by adapting the Folder Options of the Explorer. Or to search in files contained in a zip archive. Or find hidden files. But the content of non-indexed files or files with the wrong extension will never be scanned...
Notice finally that, concerning the index, the collected words in there are "normalised", i.e.: accents are removed and upper case letters will be stored as lower case.
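Such a normalisation can be approximated in Python with the standard unicodedata module. This is a sketch of the idea only; the function name normalise is invented here, and the exact rules applied by the Windows index may differ.

```python
import unicodedata

def normalise(word):
    """Remove accents and convert to lower case."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c))
    return without_accents.lower()

print(normalise("Café"))  # cafe
print(normalise("ÉTÉ"))   # ete
```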
For those who want to know: the global index used by the Explorer is contained in the system file called
C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb
Alternative search possibilities
There exists other software (apart from the Windows File Explorer) which allows you to search your file system. An interesting alternative is called SearchMyFiles, and has been authored by Nir Sofer (see http://nirsoft.net/). This fantastic software is free to use. The graphical interface consists of two windows: one resembles the Explorer, and will list the files found by the search; the other provides a lot of search options. The most important advantage (compared with the Explorer) is the guarantee that everything you ask for will also effectively be found. Authorisation restrictions still hold, of course, so only files with read access and folders with listing access are returned. If necessary, launch SearchMyFiles as administrator. The most important downside (but this is a deliberate choice) is the speed: since no index is used, and particularly when you want a content-based search, every file has to be scanned from beginning to end. Over and over again, for every new search. But when you explicitly choose not to search by content, SearchMyFiles is remarkably fast, often even faster than the Explorer!
The different search options are self-explanatory. Just as with the Explorer, each additionally entered search term will reduce the number of files found. The search criteria are clearly subdivided into categories: (1) search locations (indeed, possibly several); (2) file name, this time including a wildcarding possibility ("*"); (3) content (either text, or binary, or wildcarded, as you choose); (4) file size; (5) attributes: read-only, hidden, system, ...; (6) creation date and modification date. Leaving a field empty (or leaving the "*") means: no limitation for that criterion.
Apart from Nir Sofer's software, there are innumerable other search programs for Windows, some of them even non-graphical, like "findstr", which is pre-installed on Windows. It would lead us too far to go deeper into this matter.
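The principle of searching without an index can be sketched in Python with the standard library. This is not how SearchMyFiles is implemented; it only illustrates the trade-off. The function scan and its parameters are hypothetical, and only two of the criteria (file name with wildcards, text content) are covered.

```python
import fnmatch
import os

def scan(folder, name_pattern="*", text=None):
    """Yield the paths below folder whose file name matches name_pattern
    and, if text is given, whose content contains text anywhere."""
    for directory, _subdirs, file_names in os.walk(folder):
        for file_name in file_names:
            if not fnmatch.fnmatch(file_name.lower(), name_pattern.lower()):
                continue
            path = os.path.join(directory, file_name)
            if text is not None:
                try:
                    with open(path, "rb") as handle:
                        data = handle.read()  # whole file, on every search
                except OSError:
                    continue  # no read permission: skip the file
                if text.lower().encode("utf-8") not in data.lower():
                    continue
            yield path

# Example call (the folder is an assumption):
# for path in scan(r"C:\Users\me\Documents", "*abc*", "bis"):
#     print(path)
```

Because every file is opened and read, the extension and the indexing attribute play no role, and a text like "bis" is found in the middle of "ABIS". The price is that the complete scan is repeated for every search.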
Conclusion
Finding a particular file can be a real challenge. Sometimes one would even start questioning oneself, in case no files show up. Hopefully, this text has given you some insight into the complex matter of searching, so your next search action should cause you less frustration!
References:
- http://windows.microsoft.com/en-us/windows/improve-windows-searches-using-index-faq
- http://windows.microsoft.com/en-us/windows-8/search-file-explorer
- http://windows.microsoft.com/en-us/windows7/change-advanced-indexing-options
- http://windowssecrets.com/forums/showthread.php/157500-Windows-7-search-doesn't-find-text-strings
- http://nirsoft.net/utils/search_my_files.html