GroupDocs.Search
24.12.0-alpha-20241227013536
dotnet add package GroupDocs.Search --version 24.12.0-alpha-20241227013536
NuGet\Install-Package GroupDocs.Search -Version 24.12.0-alpha-20241227013536
<PackageReference Include="GroupDocs.Search" Version="24.12.0-alpha-20241227013536" />
paket add GroupDocs.Search --version 24.12.0-alpha-20241227013536
#r "nuget: GroupDocs.Search, 24.12.0-alpha-20241227013536"
// Install GroupDocs.Search as a Cake Addin
#addin nuget:?package=GroupDocs.Search&version=24.12.0-alpha-20241227013536&prerelease
// Install GroupDocs.Search as a Cake Tool
#tool nuget:?package=GroupDocs.Search&version=24.12.0-alpha-20241227013536&prerelease
Advanced Document Search & Indexing .NET API
GroupDocs.Search for .NET is a powerful full-text search API that allows you to search through over 70 document formats in your applications. To search instantly across thousands of documents, you must first add them to an index.
- No additional software is required to search through documents of supported formats.
- A wide variety of indexing and search options is provided to cover different requirements.
- A wide selection of full-text search types is available through queries in text or object form.
- Externally pluggable text recognition in images and built-in reverse image search are supported.
- High indexing and search performance is achieved by unique algorithms and data structures, optimizations and multi-threaded execution.
- Various ways of visualizing search results in the text of documents are supported.
- Index scaling and load balancing are provided out of the box.
Supported Document Formats
https://docs.groupdocs.com/search/net/supported-document-formats/
Supported Features
- Create index on disk or in memory: The search index can be stored on disk or entirely in memory to speed up its operation.
- Character replacement during indexing: Can be used to convert all text to lowercase characters or to remove diacritics from text.
- Specifying a type for each character: Different types of characters are processed and indexed in different ways.
- Custom text extractors: Possibility of implementing an extractor of any document format in addition to those supplied out of the box.
- Removing documents from index: Ability to remove documents from the index.
- Document attributes: This is a special feature designed for marking indexed documents with text labels without the need for re-indexing.
- Document filtering during indexing: The ability to filter documents for indexing by various file properties, as well as combine such filters with different logic.
- Document renaming: Renaming indexed documents without the need for reindexing.
- Extraction in separate process: Data for indexing can be extracted from documents in a separate process.
- Separate data extraction: Possibility to separate the operations of extracting data from a document and adding the extracted data to the index.
- Indexing additional fields: Additional fields can be added to each indexed document in addition to those already in the document file itself.
- Indexing from different sources: Documents can be indexed from different sources: file, stream, structure.
- Indexing metadata of documents: It is possible to index document content and metadata together or separately.
- Indexing password-protected documents: Passwords for protected documents can be supplied in several ways.
- Indexing with stop words: Stop words are frequently used words that do not carry a semantic meaning and can be removed from an index to reduce its size.
- Merge indexes: Merging indexes to improve search performance.
- OCR support: Ability to connect an external module for optical character recognition on images, either separate or embedded in documents.
- Index optimization: Optimization increases search performance in the index.
- Storing text of indexed documents: Accelerates text generation with search results highlighting.
- Text file encoding detection: Automatically detects encoding of a text file.
- Update index: This operation is used to reindex documents that have been changed, deleted or added to indexed folders.
- Index scaling: Scaling the index becomes necessary when the number of indexed documents increases significantly.
Supported Search Types
- Simple word search: Searches for the exact occurrence of a word in the indexed documents.
- Boolean search: Combines multiple search terms using logical operators AND, OR, NOT.
- Regular expression search: Uses patterns and expressions to search for complex text structures.
- Faceted search: Searches on specific document fields.
- Case sensitive search: Differentiates between uppercase and lowercase characters in the search query.
- Flexible fuzzy search: Finds words with similar spelling, allowing for minor typos or spelling errors.
- Synonym search: Searches for words and their synonyms to expand search results.
- Homophone search: Finds words that sound the same but have different spellings.
- Wildcard search: Uses special wildcard characters to match varying characters or word fragments.
- Phrase search with wildcards: Searches for a specific phrase, allowing for variations in the distance between words in the sequence.
- Search for different word forms: Matches different grammatical forms of a word, such as plural or tense variations.
- Date range search: Searches documents for dates within a given range, in the specified date formats.
- Numeric range search: Searches documents for integer numbers within a given range.
- Search by chunks (pages): Sequential search in the index segment by segment.
- Combining different types of search into one search query: Mixes multiple search types, such as combining boolean and wildcard searches in a single query.
- Alias substitution in search queries: Replaces defined aliases with their full meanings during the search.
- Spell check during search: Automatically corrects minor spelling mistakes in the search query.
- Keyboard layout correction during search: Adjusts the search query for different keyboard layouts or language settings.
- Search queries in text or flexible object form: Search queries can be built in text form or in a more flexible object form.
- Highlighting search results: Search results can be highlighted in the text of the entire document or in small fragments of text, in plain text or HTML format.
- Multiple simultaneous thread safe search: Performs multiple independent searches simultaneously.
- Thread safe search during indexing, updating, or merging operations: Ensures safe searching while the index is being modified.
- Reverse image search: Searches for images that are similar to a given reference image.
Getting Started
1. Create an index
First of all, you need to create an index. Documents will be processed and added to the index in a special format that provides very high search speed. The following example shows how to create an index on disk.
string indexFolder = @"c:/MyIndex/";
Index index = new Index(indexFolder);
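To illustrate why an index makes searching fast, here is a minimal sketch of an inverted index, a common data structure for full-text search. It is an assumption for illustration only: the source does not describe the internal format GroupDocs.Search uses, and the document names and the `Lookup` helper below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory "documents"; a real index reads files from disk.
var documents = new Dictionary<string, string>
{
    ["a.txt"] = "Einstein developed the theory of relativity",
    ["b.txt"] = "Newton formulated the laws of motion",
    ["c.txt"] = "Einstein and Newton both studied gravity",
};

// Build the inverted index: lowercase term -> sorted set of documents containing it.
var invertedIndex = new Dictionary<string, SortedSet<string>>();
foreach (var (name, text) in documents)
{
    foreach (var term in text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (!invertedIndex.TryGetValue(term, out var postings))
        {
            postings = new SortedSet<string>();
            invertedIndex[term] = postings;
        }
        postings.Add(name);
    }
}

// A lookup is a single dictionary access instead of a scan over every document.
IEnumerable<string> Lookup(string word) =>
    invertedIndex.TryGetValue(word.ToLowerInvariant(), out var hits)
        ? hits
        : Enumerable.Empty<string>();

Console.WriteLine(string.Join(", ", Lookup("Einstein"))); // a.txt, c.txt
```

The cost of tokenizing every document is paid once at indexing time, which is why adding documents is slower than querying them.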
2. Add files to the index
Once the index is created, you can add the documents you want to search. Indexing takes some time because the data is converted into a searchable format. The following example shows how to perform indexing synchronously.
string documentsFolder = @"c:/MyDocuments/";
index.Add(documentsFolder);
3. Search in the index
After indexing your documents, you can search the index. The example below shows how to perform a simple search in the index.
string query = "Einstein";
SearchResult result = index.Search(query);
4. Highlight search results
Search results can be seen highlighted in the text of the entire document or in fragments of text. The following example shows how to highlight search results in the text of an entire document in HTML format.
if (result.DocumentCount > 0)
{
    FoundDocument document = result.GetFoundDocument(0);
    OutputAdapter outputAdapter = new FileOutputAdapter(OutputFormat.Html, @"c:\Highlighted.html");
    DocumentHighlighter highlighter = new DocumentHighlighter(outputAdapter);
    index.Highlight(document, highlighter);
}
Fuzzy search
Below is a complete code example of a fuzzy search that tolerates up to 2 character differences.
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
string query = "Deoxyribonucleic";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.Add(documentsFolder);
// Configuring the fuzzy search options
SearchOptions options = new SearchOptions();
options.FuzzySearch.Enabled = true;
options.FuzzySearch.FuzzyAlgorithm = new TableDiscreteFunction(2);
// Search in index
SearchResult result = index.Search(query, options);
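As a rough illustration of what "2 character differences" can mean, the sketch below computes the classic Levenshtein edit distance between two words. This is an assumption for illustration: the source does not state which similarity measure GroupDocs.Search applies internally, and `EditDistance` and `IsFuzzyMatch` are hypothetical helpers, not part of the library.

```csharp
using System;

// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions that turn `a` into `b`.
static int EditDistance(string a, string b)
{
    var previous = new int[b.Length + 1];
    var current = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) previous[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int insertion = current[j - 1] + 1;
            int deletion = previous[j] + 1;
            current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
        }
        // Reuse the two row buffers instead of allocating a full matrix.
        (previous, current) = (current, previous);
    }
    return previous[b.Length];
}

// Hypothetical helper: accept a candidate word when it is within the tolerance.
static bool IsFuzzyMatch(string query, string candidate, int maxDifferences) =>
    EditDistance(query.ToLowerInvariant(), candidate.ToLowerInvariant()) <= maxDifferences;

// Two substitutions (y -> i, c -> k): within a tolerance of 2.
Console.WriteLine(IsFuzzyMatch("Deoxyribonucleic", "Deoxiribonukleic", 2));
// Five deleted characters: outside a tolerance of 2.
Console.WriteLine(IsFuzzyMatch("Deoxyribonucleic", "Ribonucleic", 2));
```

Under this measure, a tolerance of 2 lets a query match words containing up to two typos while still rejecting words that merely share a long suffix.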
Search for different word forms
The following listing shows a complete code example for searching for different word forms in the index.
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.Add(documentsFolder);
// Enabling search for word forms
SearchOptions options = new SearchOptions();
options.UseWordFormsSearch = true;
// Search in the index
SearchResult result = index.Search("relative", options);
// The following words can be found:
// relative
// relatives
// relatively
Reverse image search
The following code example demonstrates all stages of the reverse image search.
string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";
// Creating an index
Index index = new Index(indexFolder);
// Setting the image indexing options
IndexingOptions indexingOptions = new IndexingOptions();
indexingOptions.ImageIndexingOptions.EnabledForContainerItemImages = true;
indexingOptions.ImageIndexingOptions.EnabledForEmbeddedImages = true;
indexingOptions.ImageIndexingOptions.EnabledForSeparateImages = true;
// Indexing documents in a document folder
index.Add(documentFolder, indexingOptions);
// Setting the image search options
ImageSearchOptions imageSearchOptions = new ImageSearchOptions();
imageSearchOptions.HashDifferences = 10;
imageSearchOptions.MaxResultCount = 100;
imageSearchOptions.SearchDocumentFilter =
    SearchDocumentFilter.CreateFileExtension(".zip", ".png", ".jpg");
// Creating a reference image for search
SearchImage searchImage = SearchImage.Create(@"c:\MyDocuments\image.png");
// Search in the index
ImageSearchResult result = index.Search(searchImage, imageSearchOptions);
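The sketch below shows one common way a threshold on hash differences can work: counting the bits in which two image hashes differ (Hamming distance). This is an assumption for illustration; the source does not document how GroupDocs.Search computes image hashes or exactly what `HashDifferences` measures. The 64-bit hash values and the `HashDifference` helper are hypothetical.

```csharp
using System;
using System.Numerics;

// Number of bit positions in which two 64-bit hashes differ (Hamming distance).
static int HashDifference(ulong hashA, ulong hashB) =>
    BitOperations.PopCount(hashA ^ hashB);

// Hypothetical hash values; a real system would derive them from image pixels.
ulong reference = 0xF0F0_F0F0_F0F0_F0F0UL;
ulong candidate = 0xF0F0_F0F0_F0F0_F0FFUL; // differs in the 4 lowest bits
ulong unrelated = ~reference;              // every bit flipped

// Plays the role of the threshold set in the example above (assumed semantics).
const int maxDifferences = 10;

Console.WriteLine(HashDifference(reference, candidate) <= maxDifferences); // True
Console.WriteLine(HashDifference(reference, unrelated) <= maxDifferences); // False
```

With this kind of measure, a lower threshold returns only near-identical images, while a higher one admits looser matches at the cost of more false positives.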
Tags
Aspose | GroupDocs | Advanced Document Search | Indexing API | .NET Search Library | Semantic Search API | Boolean Search | Fuzzy Search | Metadata Search | Entity Recognition API | Sentiment Analysis | Custom Entity Extraction | Document Classification API | Full-Text Search | Field Search | Regular Expressions Search | Proximity Search | Custom Search Ranking | Indexing Optimization | Distributed Search Network | Reverse Image Search | Search API | .NET Document Search | Document Indexing API | GroupDocs.Search for .NET | Text Search API | Search Results Highlighting | Document Metadata Search | Snippets Extraction | Wildcard Search | Search API for .NET
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
.NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.1 is compatible. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies for .NETStandard 2.1
- Microsoft.Extensions.DependencyModel (>= 2.1.0)
- Microsoft.ML.OnnxRuntime (>= 1.20.1)
- Microsoft.NETCore.Portable.Compatibility (>= 1.0.1)
- Microsoft.Win32.Registry (>= 4.7.0)
- Newtonsoft.Json (>= 13.0.3)
- SkiaSharp (>= 2.88.8)
- SkiaSharp.NativeAssets.Linux.NoDependencies (>= 2.88.6)
- System.CodeDom (>= 4.4.0)
- System.Diagnostics.Debug (>= 4.3.0)
- System.Diagnostics.PerformanceCounter (>= 4.5.0)
- System.Drawing.Common (>= 6.0.0)
- System.Formats.Asn1 (>= 9.0.0)
- System.IO.FileSystem.Primitives (>= 4.3.0)
- System.Net.Primitives (>= 4.3.1)
- System.Reflection.Emit (>= 4.7.0)
- System.Reflection.Emit.ILGeneration (>= 4.7.0)
- System.Security.Cryptography.Pkcs (>= 7.0.3)
- System.Security.Permissions (>= 4.6.0)
- System.Security.Principal.Windows (>= 4.7.0)
- System.Text.Encoding.CodePages (>= 8.0.0)