Skip to main content
Test Double company logo
Services
Services Overview
Holistic software investment consulting
Software Delivery
Accelerate quality software development
Product Management
Launch modern product orgs
Legacy Modernization
Renovate legacy software systems
DevOps
Scale infrastructure smoothly
Upgrade Rails
Update Rails versions seamlessly
Technical Recruitment
Build tech & product teams
Technical Assessments
Uncover root causes & improvements
Case Studies
Solutions
Accelerate Quality Software
Software Delivery, DevOps, & Product Delivery
Maximize Software Investments
Product Performance, Product Scaling, & Technical Assessments
Future-Proof Innovative Software
Legacy Modernization, Product Transformation, Upgrade Rails, Technical Recruitment
About
About
What's a test double?
Approach
Meeting you where you are
Founder's Story
The origin of our mission
Culture
Culture & Careers
Double Agents decoded
Great Causes
Great code for great causes
EDI
Equity, diversity & inclusion
Insights
All Insights
Hot takes and tips for all things software
Leadership
Bold opinions and insights for tech leaders
Developer
Essential coding tutorials and tools
Product Manager
Practical advice for real-world challenges
Say Hello
Test Double logo
Menu
Services
BackGrid of dots icon
Services Overview
Holistic software investment consulting
Software Delivery
Accelerate quality software development
Product Management
Launch modern product orgs
Legacy Modernization
Renovate legacy software systems
Cycle icon
DevOps
Scale infrastructure smoothly
Upgrade Rails
Update Rails versions seamlessly
Technical Recruitment
Build tech & product teams
Technical Assessments
Uncover root causes & improvements
Case Studies
Solutions
Solutions
Accelerate Quality Software
Software Delivery, DevOps, & Product Delivery
Maximize Software Investments
Product Performance, Product Scaling, & Technical Assessments
Future-Proof Innovative Software
Legacy Modernization, Product Transformation, Upgrade Rails, Technical Recruitment
About
About
About
What's a test double?
Approach
Meeting you where you are
Founder's Story
The origin of our mission
Culture
Culture
Culture & Careers
Double Agents decoded
Great Causes
Great code for great causes
EDI
Equity, diversity & inclusion
Insights
Insights
All Insights
Hot takes and tips for all things software
Leadership
Bold opinions and insights for tech leaders
Developer
Essential coding tutorials and tools
Product Manager
Practical advice for real-world challenges
Say hello
Developers
Developers
Developers
Software tooling & tips

How to achieve natural sorting in Postgres using custom collations

Sorting alphanumeric data can be tricky! Discover how to use custom collations in Postgres to achieve natural sorting and make your data handling a breeze.
Kenneth Bogner
|
December 11, 2022
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Recently, a situation came up where some alphanumeric data I was working with needed to be sorted in a way that felt more “natural.” Normally, when you’re sorting an alphanumeric list of data, it sorts by character code points. For example,

BAC012
ABC102
CAB210
ABC103

gets sorted as

ABC102
ABC103
BAC012
CAB210

That seems right … so what do I mean by “natural?” The “natural” descriptor is meant to provide an alternative label for how the numeric part of the alphanumeric data gets ordered. When we sort numbers, it’s sometimes awkward to sort them by character code points.

101
11
2
2003
31

That ordering isn’t wrong; but it’s usually not what we want. When sorting numbers, it’s more “natural” to sort numbers as numbers.

2
11
31
101
2003

Naturally, numbers, by default, get treated as numbers and sorted from low to high value, but when they are specifically part of an alphanumeric value, they’re sorted according to the code point, or value tied to that character.

Postgres to the rescue

With that understanding of “natural,” the problem I was working to solve led me to learning that you can create your own collation objects in Postgres. Within that section of the documentation, I found a nifty example provided.

Numeric ordering, sorts sequences of digits by their numeric value, for example: A-21 < A-123 (also known as natural sort).
CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');

Apparently this was a common enough desire, that they simply give an example in the docs. It seemed like exactly what I was looking for; the only problem was that I didn’t really know what anything after numeric meant in the CREATE COLLATION clause. And I wanted to understand it!

This is where the small task of simply creating and using the collation became a rabbit hole to Wonderland.

Down the rabbit hole of Postgres docs

To get a better understanding, I read all the previous context in the Postgres docs and I followed the links about Unicode to discover much more than I expected. One example: I thought I knew what a collation is for, but if asked, I wouldn’t have had an answer. Now I do.

In case you’re like me and collations are a fuzzy concept, hopefully I can explain by simply saying it’s used for determining the sort order of a character set. Before, I think I saw collations as something that defined a character set. But according to Unicode, it’s specifically for defining how characters in a character set should get ordered. This sounds perfect if I want to treat numbers different from non-numbers in an alphanumeric string.

ICU but do UC me?

Now that I have that title out of my system, we can start breaking down what the CREATE COLLATION clause is even saying.

But! Before I could look more into what ICU is, my brain needed to know if it was an acronym. So, for anyone with a brain like mine, I eventually found it (after more digging than I ever expected it to require). ICU is an acronym for “International Components for Unicode.”.

With that out of the way, the ICU provider gives context about how to interpret the locale value given. It’s a common Unicode syntax, and it’s also helpful for defining collations that are more than just language+country. There’s a lot about ICU, but for our purposes, the locale value is where I’ll shift focus.

Nested rabbit holes

Both of the CREATE COLLATION clauses use the same provider, but have two separate locale values that allegedly do the same thing: en-u-kn-true and en@colNumeric=yes. The Postgres docs mention that en-u-kn-true is a “’language tag’ per BCP 47” while 'en@colNumeric=yes' is the “traditional ICU-specific locale syntax.” They also mention that the language tag is preferred but isn’t supported by older ICU versions. To be entirely honest, I didn’t look further into it. If you choose the one that makes the most sense to you right now, it might be fine. YMMV.

With that said, it’s possible you’re thinking what I was: what does that mean?

[Note: The separator can also be an underscore for backward compatibility reasons only. A dash is the preferred approach if your database supports it.]

en-u-kn-true

The en-u-kn-true value is known as a Unicode locale identifier. Each section is preferably separated by a -.

  •  en: The first part of the string refers to the language this locale string is being created for. If you wanted a region, you could do en_US or en_GB, but en is also acceptable (case also seems to matter).
  •  u: Figuring this out took some effort. Despite there being a lot written, so much of it had implicit assumptions. Eventually I tracked it down here and subsequently RFC6067 to determine that it is a Unicode locale language tag extension. By itself, it doesn’t mean much.
  •  kn: With the u- prefix before it, I looked at more extensions under that BCP 47 specification and found kn to be the Unicode locale extension for numeric ordering here.
  •  true: This last part simply enables the kn extension to treat any sequence of decimal digits as digits for sorting numerically.

Overall, the u-kn-true is the key section in the language tag that enables a numeric ordering within alphanumeric strings. For the second string, I’d assume it’s just a different way to say the same thing. (Spoiler alert: it is.)

en@colNumeric=yes

  •  en: This is the same as above.
  •  colNumeric=yes: The best place I’ve found for finding this information is in the ICU GitHub repository. Within that file, the available values are broken down along with a small note about what each do. I tried a few sources, but nothing I found went over the traditional ICU-specific syntax as clearly. Per this page, the @ performs a similar duty to u in the BCP 47 language tag by specifying what follows is an extension.

Information for determining what is right is sadly not cohesive. You really have to know what you’re looking for first. So if you’re needing to create a collation with different requirements, it may take some time to track down exactly what you want. Hopefully, I’ll have saved you some time.

Applying the collation

Now for the straightforward part! Once I had the collation created, I applied it to the columns I wanted with:

ALTER TABLE my_table ALTER COLUMN my_column TYPE character varying(255) COLLATE numeric

Originally, I assumed I would need to group numeric values together and sort more manually, but thankfully Postgres is exceptional and manages this ordering..

Your monthly dose of better dev

Test Double Dispatch: hot takes, smart tools, and coding shortcuts to make your life easier—delivered every month.

Subscribe now

Related Insights

🔗
How to fake time in a Postgres database
🔗
How to back up Heroku Postgres databases: a complete guide
🔗
How to completely uninstall and reinstall Homebrew Postgres

Explore our insights

See all insights
Developers
Developers
Developers
C# and .NET tools and libraries for the modern developer

C# has a reputation for being used in legacy projects and is not often talked about related to startups or other new business ventures. This article aims to break a few of the myths about .NET and C# and discuss how it has evolved to be a great fit for almost any kind of software.

by
Patrick Coakley
Leadership
Leadership
Leadership
Turning observability into a team strength without a big overhaul

By addressing observability pain points one at a time, we built systems and practices that support rapid troubleshooting and collaboration.

by
Gabriel Côté-Carrier
Developers
Developers
Developers
Why I actually enjoy PR reviews (and you should, too)

PR reviews don't have to be painful. Discover practical, evidence-based approaches that turn code reviews into team-building opportunities while maintaining quality and reducing development friction.

by
Robert Komaromi
Letter art spelling out NEAT

Join the conversation

Technology is a means to an end: answers to very human questions. That’s why we created a community for developers and product managers.

Explore the community
Test Double Executive Leadership Team

Learn about our team

Like what we have to say about building great software and great teams?

Get to know us
Test Double company logo
Improving the way the world builds software.
What we do
Services OverviewSoftware DeliveryProduct ManagementLegacy ModernizationDevOpsUpgrade RailsTechnical RecruitmentTechnical Assessments
Who WE ARE
About UsCulture & CareersGreat CausesEDIOur TeamContact UsNews & AwardsN.E.A.T.
Resources
Case StudiesAll InsightsLeadership InsightsDeveloper InsightsProduct InsightsPairing & Office Hours
NEWSLETTER
Sign up hear about our latest innovations.
Your email has been added!
Oops! Something went wrong while submitting the form.
Standard Ruby badge
614.349.4279hello@testdouble.com
Privacy Policy
© 2020 Test Double. All Rights Reserved.

Your monthly dose of better dev

Test Double Dispatch: hot takes, smart tools, and coding shortcuts to make your life easier—delivered every month.

Subscribe now