Welcome to Pydantically Perfect, the blog series where we explore how to solve data-related problems in Python using Pydantic, a feature-rich data validation library whose core is written in Rust. Whether you're a seasoned developer or just starting out, we hope to give you actionable insights you can apply right now to make your code more robust and reliable through stronger typing.
If you're a newcomer here, we encourage you to take a look at our first installment: Pydantically perfect: A beginner’s guide to Pydantic for Python type safety.
Where you are in the Pydantic for Python blog series:
- A beginner's guide to Pydantic for Python type safety
- Seamlessly handle non-pythonic naming conventions
- You are here: Normalize legacy data in Python
- Field report in progress: Declare rich validation rules
- Field report in progress: Build shareable domain types
- Field report in progress: Add your custom logic
- Field report in progress: Apply alternative validation rules
- Field report in progress: Validate your app configuration
- Field report in progress: Put it all together with a FHIR example
The problem: inconsistent data
We're trying to parse error responses from a legacy system. The issue is that the response structure varies depending on the endpoint queried and where the error occurred internally. We can't predict in advance which structure we'll receive.
The inconsistent part for us is where the user ID lives, and we need that user ID to properly log the error in our systems.
There are three different formats:
1. The user ID is an attribute of the root.
2. The user ID is nested inside a user object.
3. The user ID is nested inside a list of user objects. It's unclear why, but there's always exactly one user object.
data_format_one = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"user_id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
}
data_format_two = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"user": {
"id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
},
}
data_format_three = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"users": [
{
"id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
}
],
}
What are our goals here?
1. We want to handle this complexity exactly once rather than letting it spread through our application. Repeatedly checking where the user ID is stored would muddle business logic that should only care about the value of the user ID.
2. We want to lean on Pydantic to contain that complexity as much as possible. That will limit the amount of custom validation logic we’ll need to write without compromising on robustness or quality because Pydantic is a dedicated validation library.
We'll get there step by step, making things slightly better each time.
First step: mapping the models
We can start by creating the first version of our models in Pydantic. The three models below directly map the existing structures:
from datetime import datetime
from uuid import UUID
from pydantic import BaseModel
class ErrorFormatOne(BaseModel):
timestamp: datetime
error_message: str
error_type: str
user_id: UUID
class NestedUser(BaseModel):
id: UUID
class ErrorFormatTwo(BaseModel):
timestamp: datetime
error_message: str
error_type: str
user: NestedUser
class ErrorFormatThree(BaseModel):
timestamp: datetime
error_message: str
error_type: str
users: list[NestedUser]
Notice how the models contain more advanced types like datetime, UUID, and other Pydantic models? Natively handling advanced types is one of Pydantic's biggest strengths, and we'll dive deeper into it in a later installment of the series.
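Here's a quick check of that coercion, reusing data_format_one and the ErrorFormatOne model from above: the ISO 8601 timestamp string becomes a real datetime and the user ID string becomes a UUID.

error = ErrorFormatOne.model_validate(data_format_one)

print(type(error.timestamp))  # <class 'datetime.datetime'>
print(type(error.user_id))  # <class 'uuid.UUID'>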
Why aren't these models satisfying yet?
They simply map the structure without any abstraction, which means the responsibility of handling the structure discrepancies would be forwarded to the rest of our code. That's exactly the opposite of what we want.
The rest of our code shouldn't have to check whether to use error_model.user_id, error_model.user.id, or error_model.users[0].id. Let's address this inconsistency in accessing the user ID.
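To make that concrete, here's a rough sketch of the branching we'd otherwise end up copying around the codebase (log_user_error is a hypothetical helper, not part of our actual system):

def log_user_error(error_model) -> None:
    # Every caller would need to know all three shapes just to find the user ID.
    if isinstance(error_model, ErrorFormatOne):
        user_id = error_model.user_id
    elif isinstance(error_model, ErrorFormatTwo):
        user_id = error_model.user.id
    else:  # ErrorFormatThree
        user_id = error_model.users[0].id
    print(f"Error for user {user_id}: {error_model.error_message}")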
The trap: combining everything as optional
From here, it could be tempting to directly combine the three models, but mark the different sources of user ID as optional to accommodate the fact that we'd only ever have one of them at once. It would look something like this:
from datetime import datetime
from uuid import UUID
from pydantic import BaseModel, Field
class NestedUser(BaseModel):
id: UUID
class Error(BaseModel):
timestamp: datetime
error_message: str
error_type: str
user_id: UUID | None = Field(default=None)
    user: NestedUser | None = Field(default=None)
users: list[NestedUser] | None = Field(default=None)
# This data has no user ID and won't raise an error
invalid_data = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
}
Error.model_validate(invalid_data)
# Error(
# timestamp=datetime.datetime(2025, 9, 8, 15, 16, 3, tzinfo=TzInfo(UTC)),
# error_message="'NoneType' object has no attribute 'lower'",
# error_type="AttributeError",
# user_id=None,
# user=None,
# users=None,
# )
While the code does combine all structures into a single model, there are two main drawbacks:
- We're still not handling the structural discrepancies. The main issue of muddling our business logic with inconsistent ways of accessing the user ID is just as present as at the beginning, and that would make the rest of our code unnecessarily complex.
- We've lost the validation that a user ID is present. As you can see in the invalid_data example, no validation error is raised to let us know that there's no valid user ID. Our own custom logic would have to double-check that there is a valid user ID, which is the opposite of what we want from bringing in Pydantic validation. We want to lean on Pydantic's tried and true validation process so we can focus on business logic instead (see the quick check after this list).
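For contrast, here's roughly what we'd be giving up. The step-one ErrorFormatOne model, where user_id is required, rejects the same payload outright:

from pydantic import ValidationError

try:
    ErrorFormatOne.model_validate(invalid_data)
except ValidationError as exc:
    print(exc)
    # 1 validation error for ErrorFormatOne
    # user_id
    #   Field required [type=missing, ...]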
Let's hold off on combining the models for now and focus on normalizing the structure.
Step two: flattening the models
Among Pydantic's alias features, AliasPath lets us flatten the structure. We give AliasPath one or more keys (or list indexes) and Pydantic will follow them into the nested data when populating an attribute.
Let's update the relevant error models to reach inside their nested structure and extract a user_id attribute directly. We'll also test the changed models with our example data:
from datetime import datetime
from uuid import UUID
from pydantic import AliasPath, BaseModel, Field
class ErrorFormatOne(BaseModel):
timestamp: datetime
error_message: str
error_type: str
user_id: UUID
class ErrorFormatTwo(BaseModel):
timestamp: datetime
error_message: str
error_type: str
# Equivalent to reaching for `input_data["user"]["id"]`
user_id: UUID = Field(validation_alias=AliasPath("user", "id"))
class ErrorFormatThree(BaseModel):
timestamp: datetime
error_message: str
error_type: str
# Equivalent to reaching for `input_data["users"][0]["id"]`
user_id: UUID = Field(validation_alias=AliasPath("users", 0, "id"))
data_format_two = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"user": {
"id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
},
}
ErrorFormatTwo.model_validate(data_format_two)
# ErrorFormatTwo(
# timestamp=datetime.datetime(2025, 9, 8, 15, 16, 3, tzinfo=TzInfo(UTC)),
# error_message="'NoneType' object has no attribute 'lower'",
# error_type='AttributeError',
# user_id=UUID('e1c3cd56-ed1f-4291-9dea-fd54f9b379c2')
# )
data_format_three = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"users": [
{
"id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
}
],
}
ErrorFormatThree.model_validate(data_format_three)
# ErrorFormatThree(
# timestamp=datetime.datetime(2025, 9, 8, 15, 16, 3, tzinfo=TzInfo(UTC)),
# error_message="'NoneType' object has no attribute 'lower'",
# error_type='AttributeError',
# user_id=UUID('e1c3cd56-ed1f-4291-9dea-fd54f9b379c2')
# )
Why use validation_alias rather than alias?
In our previous post, Pydantically perfect: seamlessly handle non-Pythonic naming conventions, we used the alias argument because it assigns that value to both the validation and serialization aliases. However, the more advanced alias features don't conceptually make sense as part of a serialization alias. That's why alias only accepts string values, while validation_alias also accepts richer alias types such as AliasPath and AliasChoices.
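As a quick illustration of that split, here's a minimal sketch with a made-up Event model: validation_alias controls where the value is read from during validation, while serialization_alias controls the key used when dumping by alias.

from pydantic import AliasPath, BaseModel, Field

class Event(BaseModel):
    user_id: str = Field(
        validation_alias=AliasPath("user", "id"),
        serialization_alias="userId",
    )

event = Event.model_validate({"user": {"id": "abc-123"}})
print(event.model_dump(by_alias=True))  # {'userId': 'abc-123'}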
With the changes above, all models' user IDs can now be accessed directly with error_model.user_id.
We're not quite there yet though because there are still three different models. We want Pydantic to handle all of that complexity at once with a single model.
Step three: combining the models
The only complexity in combining all three models into one is how user_id is populated. All the other fields are the same.
We can lean into the other main Pydantic alias feature: AliasChoices.
AliasChoices lets us provide a list of potential sources for the field value, and the first one to exist will be the one used in the validation process. The best part? It also accepts AliasPath values, so we can provide one option per format and Pydantic will handle it all:
from datetime import datetime
from uuid import UUID
from pydantic import AliasChoices, AliasPath, BaseModel, Field
class Error(BaseModel):
timestamp: datetime
error_message: str
error_type: str
user_id: UUID = Field(
validation_alias=AliasChoices(
"user_id",
AliasPath("user", "id"),
AliasPath("users", 0, "id"),
)
)
data_format_one = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"user_id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
}
Error.model_validate(data_format_one)
# Error(
# timestamp=datetime.datetime(2025, 9, 8, 15, 16, 3, tzinfo=TzInfo(UTC)),
# error_message="'NoneType' object has no attribute 'lower'",
# error_type='AttributeError',
# user_id=UUID('e1c3cd56-ed1f-4291-9dea-fd54f9b379c2')
# )
data_format_two = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"user": {
"id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
},
}
Error.model_validate(data_format_two)
# Error(
# timestamp=datetime.datetime(2025, 9, 8, 15, 16, 3, tzinfo=TzInfo(UTC)),
# error_message="'NoneType' object has no attribute 'lower'",
# error_type="AttributeError",
# user_id=UUID("e1c3cd56-ed1f-4291-9dea-fd54f9b379c2"),
# )
data_format_three = {
"timestamp": "2025-09-08T15:16:03Z",
"error_message": "'NoneType' object has no attribute 'lower'",
"error_type": "AttributeError",
"users": [
{
"id": "e1c3cd56-ed1f-4291-9dea-fd54f9b379c2",
}
],
}
Error.model_validate(data_format_three)
# Error(
# timestamp=datetime.datetime(2025, 9, 8, 15, 16, 3, tzinfo=TzInfo(UTC)),
# error_message="'NoneType' object has no attribute 'lower'",
# error_type="AttributeError",
# user_id=UUID("e1c3cd56-ed1f-4291-9dea-fd54f9b379c2"),
# )
At this point, we have a single model that can accept and normalize all three different structures we can receive. The Pydantic engine will fully encapsulate this complexity for us and none of our code past this point will have to know about structural discrepancies. Mission accomplished!
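As a final sanity check, and unlike the all-optional trap from earlier, the combined model still refuses data that has no user ID anywhere. Reusing the invalid_data payload from above, we'd expect something like:

from pydantic import ValidationError

try:
    Error.model_validate(invalid_data)
except ValidationError as exc:
    print(exc)
    # 1 validation error for Error
    # user_id
    #   Field required [type=missing, ...]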
Conclusion: what's next for the Pydantically Perfect series?
With this, we've expanded our coverage of alias features to include normalizing inconsistent legacy data. The next post will move away from aliases to take a deeper look into describing models with a rich set of validation rules and constraints strictly with Pydantic features.
Our goal isn't to go through all of Pydantic's features, but rather to provide a curated list of Pydantic features we found helpful when adopting it.
If you're looking for a larger overview or want to know more without waiting for future posts, we encourage you to take a look at the official Pydantic documentation.
Gabriel Côté-Carrier is a senior software consultant at Test Double, and has experience in full-stack development, leading teams, and teaching others.
Kyle Adams is a staff software consultant at Test Double who lives for that light bulb moment when a solution falls perfectly in place or an idea takes root.