Redshift’s UDFs (User Defined Functions) permit to execute, with some limitations, certain Python libraries and custom code.
In my case, I wanted to find a way to extract the country code from a phone number in E.164 format. UDFs are a perfect fit for this, the implementation in SQL would most certainly require creating custom views, and hacking your way around, while in python, we can use the library phone-iso3166
>>> from phone_iso3166.country import * >>> phone_country('+1 202-111-1111') 'US' >>> phone_country('+34645678901') 'ES'
To upload a library to Redshift, we first need to check it follows the structure:
directory/ __init__.py extra_files.py subdirectory/ __init__.py other_files.py
In our case, phone-iso3166 is already in that structure. Now we need to zip the library:
tar -xvzf phone-iso3166.tar.gz zip -r phone-iso3166.zip phone-iso3166
With our zipped library, we need to upload it to S3. I did this part manually into a bucket named s3://redshift/custom-udfs/
Now, connect to Redshift and issue:
CREATE LIBRARY phone_iso3166 LANGUAGE plpythonu FROM 's3://redshift/custom-udfs/phone_iso3166.zip' CREDENTIALS 'aws_access_key_id=<your-aws-key>;aws_secret_access_key=<your-aws-pass>' region as '<your-region>';
Last step:
CREATE OR REPLACE FUNCTION udf_phone_country (phone_number VARCHAR(64)) RETURNS VARCHAR(64) IMMUTABLE as $$ from phone_iso3166.country import phone_country, InvalidPhone try: return phone_country(phone_number) except: return None $$ LANGUAGE plpythonu;
You should be all set to use this in your queries:
SELECT analytics.udf_phone_country('+14151111975'), analytics.udf_phone_country('+7652112311'), analytics.udf_phone_country('+34626472918')
That returns:
"US","RU","ES"