Pydantically Perfect blog series
Welcome to Pydantically Perfect, the blog series where we explore how to solve data-related problems in Python using Pydantic, a feature-rich data validation library whose core validation engine is written in Rust. Whether you're a seasoned developer or just starting out, we're hoping to give you actionable insights you can start applying right now to make your code more robust and reliable with stronger typing.
If you're a newcomer here, we encourage you to take a look at our first installment: Pydantically perfect: A beginner’s guide to Pydantic for Python type safety
Where you are in the Pydantic for Python blog series:
- A beginner's guide to Pydantic for Python type safety
- You are here: Seamlessly handle non-pythonic naming conventions
- Field report in progress: Normalize legacy data
- Field report in progress: Declare rich validation rules
- Field report in progress: Build shareable domain types
- Field report in progress: Add your custom logic
- Field report in progress: Apply alternative validation rules
- Field report in progress: Validate your app configuration
- Field report in progress: Put it all together with a FHIR example
The validation problem: receiving data in a different naming convention
The generally accepted field naming convention in the Python ecosystem is snake_case. Our Pydantic models should reflect this to be consistent with the rest of our code base.
What if we get data from other APIs that use camelCase instead? This will require some tweaks on our Pydantic models.
Let's imagine a basic Person model with a few fields using the standard snake_case naming convention. We get back an error when validating data in camelCase because the validation engine fails to match the incoming keys to our fields:
from pydantic import BaseModel


class Person(BaseModel):
    first_name: str
    last_name: str


data = {
    "firstName": "John",
    "lastName": "Smith",
}

Person.model_validate(data)
# pydantic_core._pydantic_core.ValidationError: 2 validation errors for Person
# first_name
#   Field required [type=missing, input_value={'firstName': 'John', 'lastName': 'Smith'}, input_type=dict]
#     For further information visit https://errors.pydantic.dev/2.11/v/missing
# last_name
#   Field required [type=missing, input_value={'firstName': 'John', 'lastName': 'Smith'}, input_type=dict]
#     For further information visit https://errors.pydantic.dev/2.11/v/missing
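To see exactly which fields the engine could not match, we can catch the error and inspect it instead of reading the traceback. Here's a minimal sketch of that idea; the missing_fields variable is just a name we picked for illustration.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    first_name: str
    last_name: str


data = {"firstName": "John", "lastName": "Smith"}

try:
    Person.model_validate(data)
except ValidationError as exc:
    # Each error is a dict: "loc" is the path of the field that failed
    # and "type" is a machine-readable error code.
    missing_fields = [err["loc"] for err in exc.errors() if err["type"] == "missing"]
else:
    missing_fields = []

print(missing_fields)
# [('first_name',), ('last_name',)]
```

Both snake_case fields are reported as missing, while the camelCase keys in the input are simply ignored.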
How do we solve this? We'll walk through different solutions and rank them from worst to best.
Worst solution: break our naming convention
The quickest way to get it to green with what we know so far would be to break our naming convention and rename our model's fields to use camelCase.
from pydantic import BaseModel


class Person(BaseModel):
    firstName: str
    lastName: str


data = {
    "firstName": "John",
    "lastName": "Smith",
}

Person.model_validate(data)
# Person(firstName='John', lastName='Smith')
This isn't the way to go because it allows this external complexity to creep into our code everywhere that we use a Person object.
Instead, let's lean on Pydantic features to handle the different naming convention for us.
Better solution: use field aliases
The Pydantic Field metadata enables us to set aliases. These aliases are alternative names available to Pydantic when validating and serializing.
For example, here we could define aliases for our two fields like so:
from pydantic import BaseModel, Field


class Person(BaseModel):
    first_name: str = Field(alias="firstName")
    last_name: str = Field(alias="lastName")


data = {
    "firstName": "John",
    "lastName": "Smith",
}

Person.model_validate(data)
# Person(first_name='John', last_name='Smith')
Now, the rest of our code outside of Pydantic doesn't have to care about the external camelCase naming convention and we can stay true to our snake_case naming convention. That external complexity is contained solely within our Pydantic layer.
That said, the drawback of this method is that it requires an alias to be defined for every field. This small example doesn't look like much, but aliasing 20 different models with 10 fields each could quickly become a bother. If only there was a way to automate this…
Best solution: use alias generators
Of course, Pydantic has a way to automate this: alias generators.
Every Pydantic model has a model_config attribute that enables us to set configuration for the whole model by assigning it a ConfigDict. One of these configuration options is to pass in an alias_generator function that will automatically build aliases for all the fields in the model. Pydantic also offers a few out-of-the-box functions for our convenience, including a to_camel function that transforms the names into camelCase.
Let's see what it might look like:
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)

    first_name: str
    last_name: str


data = {
    "firstName": "John",
    "lastName": "Smith",
}

Person.model_validate(data)
# Person(first_name='John', last_name='Smith')
This works well for us. There's no need now to specify aliases individually and all of our fields are converted to be validated from camelCase with just a single line of code.
What if one of the automatically generated aliases doesn't neatly match the incoming data? We can override an automatically generated alias by defining a field alias like in the previous solution. The explicit field alias takes precedence over the generated one, which lets us handle edge cases without losing out on the automatic alias generation for the other fields.
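Here's a small sketch of that edge case. The person_id field and the "personID" key are hypothetical: we're assuming an external API that capitalizes "ID", which to_camel would not produce on its own.

```python
from pydantic import BaseModel, ConfigDict, Field
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)

    # These two fields get their aliases from the generator
    first_name: str
    last_name: str
    # Hypothetical edge case: to_camel would generate "personId",
    # but the external API sends "personID", so we set it by hand.
    person_id: int = Field(alias="personID")


data = {
    "firstName": "John",
    "lastName": "Smith",
    "personID": 42,
}

person = Person.model_validate(data)
print(person)
# first_name='John' last_name='Smith' person_id=42
```

Only the one awkward field needs an explicit alias; the others keep relying on the generator.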
Taking it further: having a shared model configuration
What if we had 20 different models with that same problem? We would still need one line of code per model to apply that configuration. How could we simplify that if we consider it to be too much?
Pydantic enables us to share our model configurations by defining our own parent class. In practice, it'd mean that we could create an OurBaseModel class inheriting from Pydantic's own BaseModel with an alias_generator configuration. Then, any new model inheriting from OurBaseModel would have that alias_generator by default.
It would look like this:
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class OurBaseModel(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)


class Person(OurBaseModel):
    first_name: str
    last_name: str


data = {
    "firstName": "John",
    "lastName": "Smith",
}

Person.model_validate(data)
# Person(first_name='John', last_name='Smith')
Now, we could create our 20 models by inheriting directly from our OurBaseModel class, and they would all correctly handle data in camelCase.
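A child model can still add its own settings on top of the shared ones, because Pydantic merges a child's model_config with its parent's instead of replacing it. The sketch below assumes a hypothetical Address model that also wants whitespace stripped from its strings; str_strip_whitespace is only an illustrative choice of option.

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class OurBaseModel(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)


class Address(OurBaseModel):
    # Merged with the parent's configuration: the alias generator
    # is still applied, and strings are now stripped as well.
    model_config = ConfigDict(str_strip_whitespace=True)

    street_name: str


address = Address.model_validate({"streetName": "  Main Street  "})
print(address)
# street_name='Main Street'
```

This keeps the shared base class small while leaving room for per-model tweaks.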
Outside the box: custom alias generators
Pydantic offers the following three alias generators out-of-the-box:
- to_pascal, for PascalCase
- to_camel, for camelCase
- to_snake, for snake_case
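These generators are plain functions that take a string and return a string, so we can call them directly to check what alias they would produce for a given name. A quick sketch:

```python
from pydantic.alias_generators import to_camel, to_pascal, to_snake

pascal = to_pascal("first_name")
camel = to_camel("first_name")
snake = to_snake("firstName")

print(pascal)  # FirstName
print(camel)   # firstName
print(snake)   # first_name
```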
What happens if we encounter a naming convention that isn't covered, like kebab-case or Pascal_Snake_Case? We can just make our own!
The alias_generator configuration accepts a function, so we can write a custom name transformation function and pass it in. Here's an example for kebab-case:
from pydantic import BaseModel, ConfigDict


def to_kebab(name: str) -> str:
    return name.replace("_", "-")


class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_kebab)

    first_name: str
    last_name: str


data = {
    "first-name": "John",
    "last-name": "Smith",
}

Person.model_validate(data)
# Person(first_name='John', last_name='Smith')
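The same approach would cover Pascal_Snake_Case. Here's one possible sketch; to_pascal_snake is our own helper, not something Pydantic ships, and it assumes the field names are already in snake_case.

```python
from pydantic import BaseModel, ConfigDict


def to_pascal_snake(name: str) -> str:
    # Capitalize each underscore-separated word: "first_name" -> "First_Name"
    return "_".join(word.capitalize() for word in name.split("_"))


class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_pascal_snake)

    first_name: str
    last_name: str


data = {
    "First_Name": "John",
    "Last_Name": "Smith",
}

person = Person.model_validate(data)
print(person)
# first_name='John' last_name='Smith'
```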
The serialization problem: sending data in another naming convention
We've explored aliases and alias generation in the context of accepting incoming data in another naming convention, but another likely context would be serializing to another naming convention.
For example, our REST APIs are expected company-wide to use camelCase. In that case, we'd want to serialize to that standard but still use snake_case in our Python code. This will require a bit more explanation because of the default Pydantic behaviors, so bear with me while we take a little detour.
Pydantic's intent is to try to provide us with useful default behaviors, but still empower us to configure it differently if our use case is a bad match. Being aware of these configuration options and when we should use them will be very helpful.
When dealing with aliases, Pydantic's default behavior isn't consistent across validation and serialization:
- Validation: Use the field alias
- Serialization: Use the field name
For our use case of serializing to another naming convention, we will need to override Pydantic's default behavior somewhere.
We can override these default behaviors at two different levels:
- The function calls, by passing in by_alias or by_name arguments in the model_dump and/or model_validate functions.
- The model configuration, by passing in the right boolean flags in the model_config dictionary. For better or worse, these flags have a larger impact radius because they'll affect all usage of that model.
If we wanted to serialize with field aliases at the function call level, it would look like this:
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)

    first_name: str
    last_name: str


data = {
    "firstName": "John",
    "lastName": "Smith",
}

person = Person.model_validate(data)

# Default behavior
person.model_dump()
# {'first_name': 'John', 'last_name': 'Smith'}

# Explicitly asking to use aliases
person.model_dump(by_alias=True)
# {'firstName': 'John', 'lastName': 'Smith'}
If we wanted to serialize with field aliases at the model configuration level, it would look like this:
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(
        alias_generator=to_camel,
        serialize_by_alias=True,  # Overridden default behavior here
    )

    first_name: str
    last_name: str


data = {
    "firstName": "John",
    "lastName": "Smith",
}

person = Person.model_validate(data)

# Overridden default behavior
person.model_dump()
# {'firstName': 'John', 'lastName': 'Smith'}

# We can still explicitly serialize by name
person.model_dump(by_alias=False)
# {'first_name': 'John', 'last_name': 'Smith'}
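Since a REST API usually sends JSON rather than Python dictionaries, it's worth noting that model_dump_json accepts the same by_alias argument. A short sketch at the function call level; payload is just a variable name we chose:

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)

    first_name: str
    last_name: str


person = Person.model_validate({"firstName": "John", "lastName": "Smith"})

# Serialize straight to a JSON string using the camelCase aliases
payload = person.model_dump_json(by_alias=True)
print(payload)
# {"firstName":"John","lastName":"Smith"}
```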
A gotcha: Pydantic aliases and Python constructors
One of the gotchas of the default validation behavior is that having aliases will break the model constructor. Passing in the field names in our Python code while the validation process looks for field aliases will raise a validation error:
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)

    first_name: str
    last_name: str


Person(first_name="John", last_name="Smith")
# pydantic_core._pydantic_core.ValidationError: 2 validation errors for Person
# firstName
#   Field required [type=missing, input_value={'first_name': 'John', 'last_name': 'Smith'}, input_type=dict]
#     For further information visit https://errors.pydantic.dev/2.11/v/missing
# lastName
#   Field required [type=missing, input_value={'first_name': 'John', 'last_name': 'Smith'}, input_type=dict]
#     For further information visit https://errors.pydantic.dev/2.11/v/missing
To solve this, we have essentially two options:
- Switch the validation behavior to use field names by default rather than aliases, just like the serialization behavior does. The associated tradeoff is that we'll need to explicitly validate by aliases when needed.
- Configure the validation behavior to look for both field names and aliases. Pydantic would look at both for each field and take the first one that exists and contains a valid value. That option is great if we're comfortable with the tradeoff of the validation being less strict in all cases.
Here's how we can configure the model to validate by field name instead of field aliases:
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(
        alias_generator=to_camel,
        # Overridden default behavior: validate by field name only.
        # validate_by_alias defaults to True, so it has to be switched off
        # explicitly, otherwise aliases would still be accepted as well.
        validate_by_name=True,
        validate_by_alias=False,
    )

    first_name: str
    last_name: str


# Using the constructor works!
Person(first_name="John", last_name="Smith")
# Person(first_name='John', last_name='Smith')

# Using data matching the aliases also works
# as long as we specify to use the alias!
data = {
    "firstName": "John",
    "lastName": "Smith",
}

person = Person.model_validate(data, by_alias=True)
# Person(first_name='John', last_name='Smith')
Here's how we can configure the model to validate with both field names and field aliases:
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Person(BaseModel):
    model_config = ConfigDict(
        alias_generator=to_camel,
        # Validate with field names AND aliases
        validate_by_name=True,
        validate_by_alias=True,
    )

    first_name: str
    last_name: str


# Using the constructor works!
Person(first_name="John", last_name="Smith")
# Person(first_name='John', last_name='Smith')

# Using data matching the aliases also works!
data = {
    "firstName": "John",
    "lastName": "Smith",
}

person = Person.model_validate(data)
# Person(first_name='John', last_name='Smith')
Conclusion: what's next for the Pydantically Perfect series?
Now that we've covered using aliases to manage different naming conventions, it'll be worthwhile to deepen our coverage of features related to aliases. The next problem we'll be walking through will be leveraging aliases to normalize inconsistent data. This issue can typically happen when we receive data from different systems or when data formats change over time.
Our goal isn't to go through all of Pydantic's features, but rather to provide a curated list of Pydantic features we found helpful when adopting it. If you're looking for a larger overview or want to know more without waiting for future posts, we encourage you to take a look at the Official Pydantic documentation.