URLs

Canonicalize a URL

Canonicalization turns a relative URL into a complete URL.

To canonicalize a URL, use this function:

indieweb_utils.canonicalize_url(url: str, domain: str = '', full_url: str = '', protocol: str = 'https') → str[source]

Return a canonical URL for the given URL.

Parameters:

url (str) – The URL to canonicalize.
domain (str) – The domain to use for the canonical URL.
full_url (str or None) – Optional full URL to use for the canonical URL.
protocol (str or None) – Optional protocol to use for the canonical URL.

Returns:

The canonical URL.

Return type:

str

import indieweb_utils

url = "/contact"
domain = "jamesg.blog"
protocol = "https"

canonical_url = indieweb_utils.canonicalize_url(
    url, domain, protocol=protocol
)

print(canonical_url) # https://jamesg.blog/contact/

This function returns a URL with a protocol, host, and path.

The domain of the resource is needed so that it can be added to the URL during canonicalization if the URL is relative.

A complete URL returned by this function looks like this:

https://indieweb.org/POSSE
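
For intuition, the same kind of resolution can be sketched with Python's standard library. This is a minimal illustration, not the indieweb_utils implementation: the helper name resolve_relative_url is hypothetical, and details such as trailing-slash handling may differ from what canonicalize_url() returns.

```python
from urllib.parse import urljoin, urlsplit


def resolve_relative_url(url: str, domain: str, protocol: str = "https") -> str:
    """Hypothetical helper: resolve `url` against protocol://domain."""
    parts = urlsplit(url)
    # A URL that already has a scheme and a host is returned unchanged.
    if parts.scheme and parts.netloc:
        return url
    # urljoin resolves a relative reference against the base URL.
    return urljoin(f"{protocol}://{domain}/", url)


print(resolve_relative_url("/contact", "jamesg.blog"))  # https://jamesg.blog/contact
print(resolve_relative_url("https://indieweb.org/POSSE", "jamesg.blog"))  # https://indieweb.org/POSSE
```

The domain is only consulted when the input lacks a scheme and host, which mirrors why the domain argument matters for relative URLs.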

Add hashtags and person tags to a string

The autolink_tags() function replaces hashtags (#) with links to tag pages on a specified site. It also replaces person tags (e.g. @james) with provided names and links to the person’s profile.

This function is useful for enriching posts.

To use this function, pass in the following arguments:

indieweb_utils.autolink_tags(text: str, tag_prefix: str, people: dict, tags: List[str] | None = None) → str[source]

Replace hashtags (#) and person tags (@) with links to the respective tag page and profile URL.

Parameters:

text (str) – The text to process.
tag_prefix (str) – The prefix to prepend to identified tags.
people (dict) – A dictionary of people to link to.
tags (List[str]) – A list of tags to link to (optional).

Returns:

The processed text.

Return type:

str

Example:

import indieweb_utils

note = "I am working on a new #muffin #recipe with @jane"

# tag to use, name of person, domain of person
people = { "jane": ("Jane Doe", "https://jane.example.com") }

note_with_tags = indieweb_utils.autolink_tags(note, "/tag/", people, tags=["muffin", "recipe"])

If a tags value is provided, this function only substitutes hashtags that appear in that list. This ensures that the function does not create links that your application cannot resolve.

If you do not provide a tags value, the function will create links for all hashtags, as it is assumed that your application can resolve all hashtags.

Tagging people is enabled by providing a dictionary with information on all of the people you can tag.

If a person in an @ link is not in the tag dictionary, this function will not substitute that given @ link.

Here is an example value for a person tag database:

{
    "james": (
        "James' Coffee Blog",
        "https://jamesg.blog/"
    )
}

This function maps the @james tag to the name “James’ Coffee Blog” and the URL “https://jamesg.blog/”. More people can be added as keys to the dictionary.
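
The substitution rules described above can be sketched with regular expressions. This is an illustrative sketch only, not the library's code: the function name link_tags_sketch and the exact HTML it emits are assumptions made for the example.

```python
import re
from typing import Dict, List, Optional, Tuple


def link_tags_sketch(
    text: str,
    tag_prefix: str,
    people: Dict[str, Tuple[str, str]],
    tags: Optional[List[str]] = None,
) -> str:
    """Hypothetical sketch of the hashtag and person tag rules."""

    def replace_hashtag(match: re.Match) -> str:
        tag = match.group(1)
        # With a tags list, only listed tags are linked; without one, all are.
        if tags is not None and tag not in tags:
            return match.group(0)
        return f'<a href="{tag_prefix}{tag}">#{tag}</a>'

    def replace_person(match: re.Match) -> str:
        handle = match.group(1)
        # People missing from the dictionary are left untouched.
        if handle not in people:
            return match.group(0)
        name, url = people[handle]
        return f'<a href="{url}">{name}</a>'

    text = re.sub(r"#(\w+)", replace_hashtag, text)
    return re.sub(r"@(\w+)", replace_person, text)


people = {"jane": ("Jane Doe", "https://jane.example.com")}
print(link_tags_sketch("New #muffin #recipe with @jane and @bob", "/tag/", people, tags=["muffin"]))
```

In this run only #muffin and @jane become links; #recipe is skipped because it is not in the tags list, and @bob is skipped because he is not in the people dictionary.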

Remove tracking parameters from a URL

The remove_tracking_params() function removes tracking parameters from a URL.

This function removes all utm_* parameters from a URL by default. You can specify your own parameter names or parameter prefixes to remove by passing a list of strings to the custom_params argument.

indieweb_utils.remove_tracking_params(url: str, custom_params: list) → str[source]

Remove all UTM tracking parameters from a URL.

Parameters:

url (str) – The URL to remove tracking parameters from.
custom_params (list) – A list of custom parameters to remove.

Returns:

The URL without tracking parameters.

Return type:

str

Example:

import indieweb_utils

url = "https://jamesg.blog/indieweb/?utm_source=twitter&utm_medium=social&utm_campaign=webmention"

url_without_tracking = indieweb_utils.remove_tracking_params(url)

print(url_without_tracking) # https://jamesg.blog/indieweb/
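
To illustrate prefix-based removal, here is a minimal standard-library sketch. It is not the library's implementation; the helper name strip_params_sketch and its prefixes argument are hypothetical stand-ins for the behavior described above.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params_sketch(url: str, prefixes: Iterable[str] = ("utm_",)) -> str:
    """Hypothetical helper: drop query parameters whose names start with a prefix."""
    parts = urlsplit(url)
    prefixes = tuple(prefixes)
    # Keep only the parameters whose names match none of the prefixes.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(prefixes)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_params_sketch("https://example.com/?page=2&utm_source=x&ref=abc", prefixes=("utm_", "ref")))
# https://example.com/?page=2
```

Parameters that do not match a prefix, such as page above, are preserved in their original order.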

Check if a URL is of a given domain

The is_site_url() function checks if a URL is of a given domain.

indieweb_utils.is_site_url(url: str, domain: str) → bool[source]

Determine if a URL is a site URL.

Parameters:

url (str) – The URL to check.
domain (str) – The domain to check against.

Returns:

Whether or not the URL is a site URL.

Return type:

bool

Raises:

ValueError – If the URL does not include a scheme.

Example:

import indieweb_utils

url = "https://jamesg.blog/indieweb/"
domain = "jamesg.blog"

is_site_url = indieweb_utils.is_site_url(url, domain)

print(is_site_url) # True
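
The documented behavior, including the ValueError for URLs without a scheme, can be approximated as follows. This is a sketch under assumptions rather than the library's code: same_site_sketch is a hypothetical name, and comparing hostnames exactly (no subdomain matching) is a choice made for the example.

```python
from urllib.parse import urlsplit


def same_site_sketch(url: str, domain: str) -> bool:
    """Hypothetical helper: True if the URL's host equals `domain`."""
    parts = urlsplit(url)
    # Without a scheme the host cannot be parsed reliably, so fail loudly.
    if not parts.scheme:
        raise ValueError("URL must include a scheme, such as https://")
    # hostname is lowercased by urlsplit and excludes any port.
    return parts.hostname == domain.lower()


print(same_site_sketch("https://jamesg.blog/indieweb/", "jamesg.blog"))  # True
print(same_site_sketch("https://example.com/", "jamesg.blog"))  # False
```

Callers that accept user-supplied URLs would typically wrap the call in a try/except ValueError block.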

Reduce characters in a URL slug

The slugify() function takes a URL and removes all characters that are not:

Alphanumeric characters
Periods
Dashes
Underscores

You can override the default list of allowed characters by passing a list of strings to the allowed_chars argument. Alphanumeric characters are always allowed, whatever list you pass.

indieweb_utils.slugify(url: str, remove_extension: bool = False, allowed_chars: list = ['-', '/', '_', '.']) → str[source]

Turn a URL into a slug. Only alphanumeric characters, periods, dashes, and underscores are allowed in the resulting slug, unless an allowed_chars list is provided.

Parameters:

url (str) – The URL to slugify.
remove_extension (bool) – Whether or not to remove the file extension from the slug.
allowed_chars (list) – A list of allowed characters.

Returns:

A slugified URL.

Example:

from indieweb_utils import slugify

slugify("https://jamesg.blog/indieweb.html", True) # https://jamesg.blog/indieweb/
slugify("indieweb.html", True) # /indieweb/
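
The character filtering step on its own can be sketched as below. This covers only the allowed-character rule, not extension removal or anything else slugify() does; filter_chars_sketch is a hypothetical helper, not part of indieweb_utils.

```python
from typing import Sequence


def filter_chars_sketch(
    text: str, allowed_chars: Sequence[str] = ("-", "/", "_", ".")
) -> str:
    """Hypothetical helper: keep alphanumerics plus the allowed characters."""
    # str.isalnum() also accepts non-ASCII letters and digits.
    return "".join(ch for ch in text if ch.isalnum() or ch in allowed_chars)


print(filter_chars_sketch("my post!.html"))  # mypost.html
print(filter_chars_sketch("a b/c?d=e", allowed_chars=("-",)))  # abcde
```

Passing a shorter allowed_chars list tightens the filter, while alphanumeric characters are kept in every case.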