URLs
Canonicalize a URL
Canonicalization turns a relative URL into a complete URL.
To canonicalize a URL, use this function:
- indieweb_utils.canonicalize_url(url: str, domain: str = '', full_url: str = '', protocol: str = 'https') str [source]
Return a canonical URL for the given URL.
- Parameters:
- Returns:
The canonical URL.
- Return type:
import indieweb_utils url = "/contact" domain = "jamesg.blog" protocol = "https" endpoints = indieweb_utils.canonicalize_url( url, domain, protocol=protocol ) print(webmention_endpoint) # https://jamesg.blog/contact/
This function returns a URL with a protocol, host, and path.
The domain of the resource is needed so that it can be added to the URL during canonicalization if the URL is relative.
A complete URL returned by this function looks like this:
https://indieweb.org/POSSE
Remove tracking parameters from a URL
The remove_tracking_params() function removes tracking parameters from a URL.
This function removes all utm_* parameters from a URL by default. You can specify your own parameters or starts of parameters to remove by passing in a list of strings to the custom_params argument.
- indieweb_utils.remove_tracking_params(url: str, custom_params: list) str [source]
Remove all UTM tracking parameters from a URL.
- Parameters:
- Returns:
The URL without tracking parameters.
- Return type:
Example:
import indieweb_utils url = "https://jamesg.blog/indieweb/?utm_source=twitter&utm_medium=social&utm_campaign=webmention" url_without_tracking = indieweb_utils.remove_tracking_params(url) print(url_without_tracking) # https://jamesg.blog/indieweb/
Check if a URL is of a given domain
The is_site_url() function checks if a URL is of a given domain.
- indieweb_utils.is_site_url(url: str, domain: str) bool [source]
Determine if a URL is a site URL.
- Parameters:
- Returns:
Whether or not the URL is a site URL.
- Return type:
- Raises:
ValueError – If the URL does not include a scheme.
Example:
import indieweb_utils url = "https://jamesg.blog/indieweb/" domain = "jamesg.blog" is_site_url = indieweb_utils.is_site_url(url, domain) print(is_site_url) # True
Reduce characters in a URL slug
The slugify() function takes a URL and removes all characters that are not:
Alphanumeric characters
Periods
Dashes
Underscores
You can override the default list of characters to remove by passing in a list of strings to the allowed_chars argument (This argument still enforces the alphanumeric requirement).
- indieweb_utils.slugify(url: str, remove_extension: bool = False, allowed_chars: list = ['-', '/', '_', '.']) str [source]
Turn a URL into a slug. Only alphanumeric characters, periods, dashes, and underscores are allowed in the resulting slug, unless an allowed_chars list is provided.
- Parameters:
- Returns:
A slugified URL.
Example:
from indieweb.utils import slugify
slugify(”https://jamesg.blog/indieweb.html”, True) # https://jamesg.blog/indieweb/ slugify(“indieweb.html”, True) # /indieweb/