With Hypercane you can create a sample of documents from a large web archive collection.

Get Started with Hypercane

Hypercane's Core Actions

Hypercane's principal focus is sampling documents from a web archive collection.


Sample

Input a list of memento URLs, TimeMap URLs, or live web resource URLs, then grow and reduce that list to produce an intelligent sample according to our predefined algorithms.

Report

Extract simple reports from the collection of archived pages, like lists of entities, terms, metadata, and images.

Synthesize

Convert your list of URLs into input for Raintale, WARCs for the Archives Unleashed Toolkit, or other formats for further exploration outside of Hypercane.

What Gems Will You Discover In A Web Archive Collection?

Let's Get Started!

Photo by Erick Zajac on Unsplash.

Hypercane's Advanced Actions

Hypercane was developed to offer as much flexibility as possible when selecting mementos from a collection. With Hypercane's advanced actions, anyone can design their own sampling algorithms.


Identify

Convert a list of memento URLs, TimeMap URLs, or live web resource URLs into one another. Use the Memento Protocol to find the memento URLs that correspond to live web resource URLs.
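To illustrate the Memento Protocol side of this action, here is a minimal Python sketch that pulls memento URLs out of a TimeMap serialized in link format, as described in RFC 7089. It is not Hypercane's own code. The sample TimeMap text, the example.org and archive.example.org URLs, and the function name mementos_from_timemap are all assumptions made for illustration.

```python
import re
from email.utils import parsedate_to_datetime

# Hypothetical TimeMap body in application/link-format (RFC 7089).
SAMPLE_TIMEMAP = (
    '<http://example.org/>; rel="original",\n'
    '<http://archive.example.org/timegate/http://example.org/>; rel="timegate",\n'
    '<http://archive.example.org/web/20100101000000/http://example.org/>; '
    'rel="first memento"; datetime="Fri, 01 Jan 2010 00:00:00 GMT",\n'
    '<http://archive.example.org/web/20200101000000/http://example.org/>; '
    'rel="last memento"; datetime="Wed, 01 Jan 2020 00:00:00 GMT"\n'
)

# One link entry: <URI> followed by zero or more ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def mementos_from_timemap(text):
    """Return (memento_url, datetime) pairs found in a link-format TimeMap."""
    results = []
    for uri, raw_params in LINK_RE.findall(text):
        params = dict(PARAM_RE.findall(raw_params))
        # rel may hold several space-separated values, e.g. "first memento".
        rels = params.get("rel", "").split()
        if "memento" in rels and "datetime" in params:
            results.append((uri, parsedate_to_datetime(params["datetime"])))
    return results


if __name__ == "__main__":
    for url, when in mementos_from_timemap(SAMPLE_TIMEMAP):
        print(when.isoformat(), url)
```

The sketch only handles quoted parameter values, which is enough for the sample above. A real TimeMap would be fetched from an archive over HTTP and may need a more tolerant parser.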

Filter

Filter your list of URLs by whether their content is off-topic, is a near-duplicate, matches a given pattern, and more.
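As a rough picture of what near-duplicate filtering can involve, the Python sketch below compares documents by Jaccard similarity over word shingles and keeps only the first document in each near-duplicate group. This technique is an assumption chosen for illustration and is not necessarily the one Hypercane uses. The function names, the shingle size, and the 0.8 threshold are hypothetical.

```python
def shingles(text, size=3):
    """Return the set of word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        # Short texts become a single shingle; empty texts have none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; treated as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(documents, threshold=0.8):
    """Keep the first document of each near-duplicate group.

    `documents` maps a URL to its extracted text (an assumed input shape).
    """
    kept = []  # (url, shingle set) for every document retained so far
    for url, text in documents.items():
        candidate = shingles(text)
        if all(jaccard(candidate, seen) < threshold for _, seen in kept):
            kept.append((url, candidate))
    return [url for url, _ in kept]


if __name__ == "__main__":
    docs = {
        "urim-1": "the quick brown fox jumps over the lazy dog",
        "urim-2": "the quick brown fox jumps over the lazy dog",
        "urim-3": "a different page about web archives and mementos",
    }
    print(drop_near_duplicates(docs))
```

Pairwise comparison like this grows quadratically with the number of documents, so a large collection would likely need a hashing-based approach instead.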

Cluster

Cluster your URLs based on content using a variety of clustering algorithms and features.

Score

Score your URLs using various scoring functions based on features such as Memento Damage, site categorization, URL length, and BM25.
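To give a sense of what a BM25-based score looks like, here is a self-contained Python sketch of Okapi BM25 over a small set of documents. It is a generic textbook formulation and not Hypercane's implementation. The function name bm25_scores, the parameter defaults k1=1.5 and b=0.75, the whitespace tokenizer, and the input shape are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    `documents` maps a URL to its extracted text (an assumed input shape).
    Tokenization is a naive lowercase whitespace split.
    """
    tokenized = {url: text.lower().split() for url, text in documents.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return {}
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for url, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores[url] = score
    return scores


if __name__ == "__main__":
    docs = {
        "urim-1": "web archive collection of mementos",
        "urim-2": "cooking recipes for dinner",
        "urim-3": "archive archive archive",
    }
    for url, score in bm25_scores("archive", docs).items():
        print(url, round(score, 3))
```

Documents that never mention a query term score 0.0, and repeated mentions raise the score with diminishing returns.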

Order

Order your URLs based on Memento-Datetime, publication date, or their scores from the Score action.
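Ordering by Memento-Datetime can be sketched in a few lines of Python, since RFC 7089 specifies that the header value uses the RFC 1123 date format. The function name order_by_memento_datetime, the input shape, and the archive.example.org URLs are assumptions for illustration and do not reflect Hypercane's internals.

```python
from email.utils import parsedate_to_datetime


def order_by_memento_datetime(mementos, newest_first=False):
    """Sort (url, memento_datetime_header) pairs chronologically.

    Each header value is expected in RFC 1123 form, for example
    "Wed, 01 Jan 2020 00:00:00 GMT".
    """
    return sorted(
        mementos,
        key=lambda pair: parsedate_to_datetime(pair[1]),
        reverse=newest_first,
    )


if __name__ == "__main__":
    sample = [
        ("http://archive.example.org/web/2020/http://example.org/",
         "Wed, 01 Jan 2020 00:00:00 GMT"),
        ("http://archive.example.org/web/2010/http://example.org/",
         "Fri, 01 Jan 2010 00:00:00 GMT"),
        ("http://archive.example.org/web/2015/http://example.org/",
         "Thu, 01 Jan 2015 00:00:00 GMT"),
    ]
    for url, header in order_by_memento_datetime(sample):
        print(header, url)
```

Parsing the header into a datetime before sorting matters because the raw strings begin with a weekday name and would not sort chronologically as text.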

Hypercane is a software package developed as part of the Dark and Stormy Archives (DSA) Project.

Hypercane uses the DSA Project's MementoEmbed, OTMT, and AIU software packages.