
Extract Every URL from Any Text


URL Extractor

Extract all URLs (http/https) from text and list unique results.

1. Paste Text: Paste text containing URLs.
2. Review Extracted URLs: All HTTP/HTTPS URLs are found and listed.
3. Copy Results: Copy the unique list of extracted URLs.


What Is URL Extractor?

A URL Extractor is a text processing utility that scans input text and lists every embedded HTTP and HTTPS URL. Developers and data analysts use it to quickly isolate web links from unstructured text such as documents, emails, or social media posts. It replaces the tedious, error-prone manual work of finding and copying individual URLs from large blocks of text.

The tool uses a regular expression with the pattern /https?:\/\/[^\s<>"{}|\\^`[\]]+/gi to match URLs in the input text, ensuring that both HTTP and HTTPS links are captured. What makes this tool different is its ability to clean trailing punctuation from extracted URLs, removing characters like periods, commas, and parentheses that may be attached to the end of a link. It also deduplicates the list of extracted URLs, presenting only unique links in the output.
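
As an illustration, here is a minimal sketch of that matching step in TypeScript; the sample text and variable names are made up for demonstration and are not the tool's actual source:

```ts
// Minimal sketch of the matching step; sample text and names are illustrative.
const URL_PATTERN = /https?:\/\/[^\s<>"{}|\\^`[\]]+/gi;

const sample = "Docs at https://example.com/docs?page=2, mirror at http://example.org.";
console.log(sample.match(URL_PATTERN));
// => [ 'https://example.com/docs?page=2,', 'http://example.org.' ]
// The trailing comma and period are captured here and removed by the cleanup step.
```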

When you use it to extract URLs from text, the input string is passed through the regular expression matcher, which returns an array of matches. If no matches are found, the tool reports that no URLs were found. Otherwise it cleans and deduplicates the extracted URLs and returns the unique links in a format suitable for further processing or analysis, acting in effect as a link extractor or URL finder.
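
Putting the pieces together, a hedged sketch of that flow might look like the following; the trailing-punctuation pattern and the no-match message are assumptions based on this description, not the tool's exact source:

```ts
// Hedged sketch of the extract -> clean -> deduplicate flow described above.
// The trailing-punctuation pattern and no-match message are assumptions.
const URL_PATTERN = /https?:\/\/[^\s<>"{}|\\^`[\]]+/gi;
const TRAILING_PUNCTUATION = /[.,;:!?'")\]]+$/;

function extractUrls(input: string): string {
  const matches = input.match(URL_PATTERN);
  if (!matches) {
    return "No URLs found.";
  }
  const cleaned = matches.map((url) => url.replace(TRAILING_PUNCTUATION, ""));
  // A Set keeps only unique values; join produces one URL per line.
  return Array.from(new Set(cleaned)).join("\n");
}

console.log(extractUrls("Visit https://example.com, then https://example.com."));
// => https://example.com
```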

Why Use URL Extractor?

  • Extract all links from documents or web content
  • Automatic cleanup of trailing punctuation
  • Deduplication of found URLs
  • Works with complex URLs including query strings

Common Use Cases

Link Auditing

Extract all URLs from content for link checking and validation.

Research

Collect referenced URLs from academic papers or articles.

SEO Analysis

Identify outbound links in web page content for SEO review.

Data Mining

Pull URLs from log files, emails, or text databases.

Technical Guide

The extractor relies on React's useCallback hook to memoize the onProcess function so it is not recreated on every render, keeping the regular expression matching and URL cleaning logic stable across renders. Under the hood, the JavaScript regex engine scans the input string against the pattern /https?:\/\/[^\s<>"{}|\\^`[\]]+/gi, efficiently finding URLs even in large texts. The global and case-insensitive flags (gi) ensure that every URL is matched regardless of its position in the text or its case.
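
A hedged sketch of what that wiring might look like follows; the component shape, prop names, and the TextToolLayout API are assumptions inferred from this description rather than the tool's actual source:

```tsx
"use client";

import { useCallback } from "react";
// TextToolLayout and its props are assumed from the description in this guide.
import { TextToolLayout } from "@/components/shared";

const URL_PATTERN = /https?:\/\/[^\s<>"{}|\\^`[\]]+/gi;

export default function UrlExtractorTool() {
  // useCallback memoizes onProcess so it is not recreated on every render.
  const onProcess = useCallback((input: string): string => {
    const matches = input.match(URL_PATTERN);
    if (!matches) return "No URLs found.";
    const cleaned = matches.map((url) => url.replace(/[.,;:!?'")\]]+$/, ""));
    return Array.from(new Set(cleaned)).join("\n");
  }, []);

  return (
    <TextToolLayout
      title="URL Extractor"
      outputLabel="Extracted URLs"
      onProcess={onProcess}
    />
  );
}
```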

When a match is found, the extracted URL is passed through a second regex replacement, using JavaScript's String.prototype.replace, to strip any trailing punctuation characters from the end of the string. The cleaned URLs are then collected and passed through a Set, which stores only unique values, so each URL appears exactly once in the output even if it occurs many times in the original text. The TextToolLayout component from the @/components/shared directory renders the input field, output label, and extracted URLs, providing a simple, intuitive interface for the extractor.
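
A concrete illustration of the cleanup-and-deduplication step; the exact punctuation character set is an assumption:

```ts
// Assumed trailing-punctuation pattern; the tool's exact character set may differ.
const TRAILING_PUNCTUATION = /[.,;:!?'")\]]+$/;

const raw = [
  "https://example.com/page.",  // sentence-ending period
  "https://example.com/page,",  // mid-sentence comma
  "https://example.com/page",   // already clean
];

const cleaned = raw.map((url) => url.replace(TRAILING_PUNCTUATION, ""));
// Sets store only unique values, so the three variants collapse to one entry.
const unique = Array.from(new Set(cleaned));
console.log(unique); // [ 'https://example.com/page' ]
```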

The extractor's output is a plain text string containing the list of unique URLs, one per line, which can be easily copied or further processed using other tools or scripts. The use of React's client-side rendering capabilities allows the extractor to run entirely in the browser, without requiring any server-side infrastructure or network requests, making it a self-contained and lightweight solution for URL extraction tasks.

Tips & Best Practices

  • Trailing punctuation after URLs is automatically cleaned
  • Only HTTP and HTTPS URLs are extracted
  • Duplicates are removed from the results
  • URLs in HTML attributes and source code are also found


Frequently Asked Questions

Q: Does it find FTP or other protocol URLs?
A: Currently only HTTP and HTTPS URLs are extracted.

Q: How does it handle URLs at the end of sentences?
A: Trailing periods, commas, and other punctuation are automatically stripped from extracted URLs.

Q: Are duplicate URLs removed?
A: Yes, each unique URL appears only once in the output.

Q: Does it validate the URLs?
A: It extracts URL patterns but does not check if they are reachable or valid.

Q: Can it find URLs without http/https?
A: No, only URLs with an explicit http:// or https:// protocol are matched.

About This Tool

URL Extractor is a free online tool by FreeToolkit.ai. All processing happens directly in your browser — your data never leaves your device. No registration or installation required.