Hypercane's principal focus is sampling documents from a web archive collection.
Input a list of memento URLs, TimeMap URLs, or live web resource URLs, then grow and reduce that list to produce an intelligent sample according to our predefined algorithms.
Extract simple reports from the collection of archived pages, like lists of entities, terms, metadata, and images.
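Here is a small example of such a report. The sketch below lists the images that each archived page references by collecting `src` attributes from `<img>` tags with Python's built-in `html.parser`. It records each `src` value exactly as written in the page and does not resolve relative URLs. The class and function names and the dict-of-HTML input are hypothetical and do not reflect Hypercane's own report format.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, as written in the page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        # Self-closing <img/> tags are also routed here by the default
        # handle_startendtag implementation.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)


def image_report(pages):
    """pages: hypothetical dict mapping URI-M -> HTML text.

    Returns a dict mapping URI-M -> list of image src values.
    """
    report = {}
    for urim, html in pages.items():
        collector = ImageCollector()
        collector.feed(html)
        collector.close()
        report[urim] = collector.images
    return report
```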
Hypercane was developed to provide maximum flexibility when selecting mementos from a collection. With Hypercane's advanced features, anyone can design their own sampling algorithms.
Convert between lists of memento URLs, TimeMap URLs, and live web resource URLs. Use the Memento Protocol to find the memento URLs that correspond to live web resource URLs.
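In the Memento Protocol (RFC 7089), a TimeMap lists the mementos of an original resource. The Python below is a rough sketch of one such conversion. It extracts memento URLs and their datetimes from a TimeMap in link format. It is not Hypercane's implementation. The function name `mementos_from_timemap` and the sample TimeMap are hypothetical, and the regex-based parsing assumes well-formed input with quoted attribute values. It does not handle every edge case of the link format.

```python
import re
from email.utils import parsedate_to_datetime

# Each link-format entry looks like: <URI>; rel="..."; datetime="..."
# The attribute text runs until the next "<", so commas inside datetime
# values (e.g. "Tue, 20 Jun 2000 ...") stay with their entry.
ENTRY_RE = re.compile(r'<([^>]+)>\s*;([^<]*)')


def mementos_from_timemap(timemap_text):
    """Return (memento_uri, datetime) pairs from a link-format TimeMap.

    Assumes quoted rel and datetime attributes; entries whose rel does not
    include the "memento" relation type (original, self, timegate, ...)
    are skipped.
    """
    results = []
    for uri, attrs in ENTRY_RE.findall(timemap_text):
        rel = re.search(r'rel="([^"]+)"', attrs)
        # rel may hold several relation types, e.g. "first memento".
        if rel is None or "memento" not in rel.group(1).split():
            continue
        dt = re.search(r'datetime="([^"]+)"', attrs)
        results.append((uri, parsedate_to_datetime(dt.group(1)) if dt else None))
    return results
```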
Filter your list of URLs by whether their content is off-topic, near-duplicate, or contains a given pattern, among other criteria.
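One common way to detect near-duplicates is to compare sets of word shingles using Jaccard similarity. The sketch below combines that technique with a regular-expression exclusion filter. It is an illustration under assumptions and is not necessarily the method Hypercane uses. The input format (a list of `(urim, text)` pairs), the function names, the 3-word shingle size, and the 0.8 threshold are all hypothetical choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (tuples) in lowercased text."""
    words = re.findall(r"\w+", text.lower())
    # Texts shorter than k words yield a single, shorter shingle.
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_mementos(docs, threshold=0.8, exclude_pattern=None):
    """docs: list of (urim, text) pairs, in priority order.

    Drops documents whose text matches exclude_pattern, then keeps only the
    first document of each group of near-duplicates.
    """
    pattern = re.compile(exclude_pattern) if exclude_pattern else None
    kept, kept_shingles = [], []
    for urim, text in docs:
        if pattern and pattern.search(text):
            continue
        s = shingles(text)
        if any(jaccard(s, other) >= threshold for other in kept_shingles):
            continue  # near-duplicate of a document already kept
        kept.append(urim)
        kept_shingles.append(s)
    return kept
```

Because each document is compared only against documents already kept, the input order decides which member of a near-duplicate group survives.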
Cluster your URLs based on content using a variety of clustering algorithms and features.
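As a minimal illustration of content-based clustering, the sketch below applies leader clustering to term-frequency vectors and compares them with cosine similarity. This simple algorithm was chosen only because it needs nothing beyond the standard library. It does not represent any specific algorithm Hypercane offers. The names and the 0.5 similarity threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.5):
    """docs: list of (urim, text) pairs.

    Each document joins the first cluster whose leader (first member) is at
    least `threshold` similar; otherwise it becomes the leader of a new cluster.
    Returns a list of clusters, each a list of URI-Ms.
    """
    leaders, clusters = [], []
    for urim, text in docs:
        vec = tf_vector(text)
        for i, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[i].append(urim)
                break
        else:
            leaders.append(vec)
            clusters.append([urim])
    return clusters
```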
Score your URLs using various scoring functions based on features such as Memento Damage, site categorization, URL length, and BM25.
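BM25 is one of the features listed above. The sketch below computes Okapi BM25 scores for a query over a small collection. It uses the typical parameter values k1 = 1.5 and b = 0.75 and a non-negative IDF variant, ln((N - n + 0.5) / (n + 0.5) + 1). The function name, the dict input, and the parameter values are assumptions for illustration. Hypercane's own BM25 scoring may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(docs, query, k1=1.5, b=0.75):
    """docs: dict mapping URI-M -> text. Returns dict URI-M -> BM25 score."""
    if not docs:
        return {}
    tokenized = {urim: tokenize(text) for urim, text in docs.items()}
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = (sum(len(t) for t in tokenized.values()) / n_docs) or 1.0
    terms = tokenize(query)
    # Document frequency: how many documents contain each query term.
    df = {term: sum(1 for t in tokenized.values() if term in t) for term in set(terms)}
    scores = {}
    for urim, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in terms:
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) with document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avgdl))
        scores[urim] = score
    return scores
```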
Order your URLs by Memento-Datetime, Publication Date, or their scores from the Score Action.
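RFC 7089 defines Memento-Datetime as an HTTP response header whose value uses the HTTP-date format. Python's `email.utils.parsedate_to_datetime` can therefore parse it. The sketch below orders URI-Ms by that header value, or by a precomputed score. The function names and the dict-based inputs are hypothetical.

```python
from email.utils import parsedate_to_datetime


def order_by_memento_datetime(headers_by_urim):
    """headers_by_urim: dict mapping URI-M -> Memento-Datetime header value.

    Returns URI-Ms from oldest to newest.
    """
    return sorted(headers_by_urim, key=lambda urim: parsedate_to_datetime(headers_by_urim[urim]))


def order_by_score(scores, descending=True):
    """scores: dict mapping URI-M -> numeric score. Highest first by default."""
    return sorted(scores, key=scores.get, reverse=descending)
```

Parsing the header into timezone-aware datetimes, rather than sorting the raw strings, matters because HTTP-date strings begin with the weekday name and do not sort chronologically as text.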