Developers

Software tooling & tips

IndyPy Talk: Pydantically perfect in every way

In this IndyPy talk, Kyle Adams helps you learn advanced Pydantic techniques to bring order to chaotic, real world data.

January 26, 2026

Bring order to Python data chaos

In the real world, data is messy. Required information is left out, a number is typed with an O instead of a zero, and don't even get me started on date formatting. Python has a reputation as being the language for wrangling info; however, how can we protect our apps from all that data debris?

Pydantic is a library for modeling data, but that simple task belies its power to bring order to the data chaos. If you've ever asked yourself, "what's in this dictionary again?", this talk is for you. Validation, serialization, mapping, conversions … we'll cover a variety of ways to wrangle your bits and bytes into data zen.

Kyle Adams delivered this talk at the IndyPy November 2025 meetup, hosted by Six Feet Up.

More Pydantic for Python resources

We also have a Pydantic for Python blog series unfolding:

A beginner's guide to Pydantic for Python type safety
Seamlessly handle non-pythonic naming conventions
Normalize legacy data
Declare rich validation rules
Field report in progress: Build shareable domain types
Field report in progress: Add your custom logic
Field report in progress: Apply alternative validation rules
Field report in progress: Validate your app configuration
Field report in progress: Put it all together with a FHIR example

Transcript

0:01

[Music] Hi, I'm Kyle Adams and tonight

0:09

we're going to talk about Pydantically perfect in every way. This presentation is going to be a deep dive into

0:16

using Pydantic, which is a speedy data modeling and validation library, to

0:22

wrangle ordinary data. Now, first I want to give you a little

0:28

bit about me. I'm a staff software consultant at a company called Test Double

0:35

and we're a software consultancy based in Columbus, Ohio.

0:40

And we are known for or sorry, we have remote workers even though we're based in Ohio, we have

0:46

remote workers across the US and Canada. Now, we're known primarily for Ruby and

0:53

JavaScript, but we actually work across a wide swath of tech, including Python,

0:59

which is why we're here tonight. So,

1:05

uh, our studios as you sit down at your keyboard and to

1:13

work on your next work and you take a look at it. Doesn't look too bad. You're going to be pulling in

1:20

patient data and it's probably going to look something like this. So, we've got an ID, first name, then

1:28

another record with an ID, the first name. Pretty sensible, straightforward,

1:34

short and sweet. Now, let's open the attached example to see what it actually looks like.

1:43

Well, we've got a problem. We've got Pascal snakes uh as I refer to this uh

1:51

casing scheme. Uh we've got nested data. We've got a

1:56

value that is nested within an object that is nested within a list instead of just being directly

2:04

uh associated with the ID key. And we've got whatever's happening here where

2:10

we've had an ID that's in one place and then in another place it's called MRN.

2:16

So we've got some data cats to herd. Fortunately, we have a cat and his name

2:24

is Pydantic. Now some of you may be wondering

2:32

uh and so I want to take a moment uh before we get into the really technical advanced wizardry stuff

2:39

to give a brief overview of what Pydantic is

2:45

and there may be people who are asking what is this Pydantic library? Well, as

2:51

I've mentioned, uh, Pydantic is a Python library for data modeling and

2:57

validation. But let's take a look at a basic example to see that in action.

3:05

So the first thing uh that we have is we have to define our model.

3:11

The first thing we're going to do is define our model. Now I feel like I'm too close, but uh if if there was more

3:18

feedback from the shot, let me know. So we've defined a name, which is a

3:24

string, a breed, which can either be a Labrador or a Chihuahua, and an ID,

3:32

which is an integer. The next thing we do is we load our data. And we're doing that here using

3:40

model_validate, and that's going to both parse your data

3:46

and validate it. Now, there are actually two ways to load data into a Pydantic model depending on

3:54

the context. The first way is with a function and that's what we've seen already in this

4:00

example because this is a a good way to load your data in if you don't control

4:05

your data. For example, if you're getting it from a third party API.
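
As a rough sketch of the model described here: a name, a breed limited to two values, and an integer ID, loaded with model_validate. The class name `Dog` and the sample payload are assumptions for illustration, not the code from the slides.

```python
from typing import Literal

from pydantic import BaseModel


class Dog(BaseModel):
    name: str
    breed: Literal["Labrador", "Chihuahua"]
    id: int


# Data we do not control, for example a payload from a third-party API.
# The values here are made up for illustration.
payload = {"name": "Rex", "breed": "Labrador", "id": 1}

# model_validate parses the dictionary and validates every field.
dog = Dog.model_validate(payload)
print(dog.name, dog.breed, dog.id)
```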

4:12

Now the second way is with the constructor and keyword args, and you see this a lot

4:19

of times in blog posts about Pydantic or uh documentation.

4:26

This is a really good way to use if you control the data. Another good example would be for tests. Once we've

4:34

loaded our data in, then we can use it to do whatever it is that we want to do for either our uh you know, maybe we've

4:42

got a uh brilliant money-making idea, but in this case, it's moving healthcare

4:47

data around. So once we get the data into the model,

4:54

importantly we can also get it back out of the model using model_dump and that

4:59

will serialize it to either a Python dictionary uh or a JSON string.
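
A minimal sketch of the second loading style and of getting data back out, assuming the same hypothetical `Dog` model. In Pydantic v2, model_dump returns a dictionary and the companion method model_dump_json returns a JSON string.

```python
import json
from typing import Literal

from pydantic import BaseModel


class Dog(BaseModel):
    name: str
    breed: Literal["Labrador", "Chihuahua"]
    id: int


# Constructor with keyword arguments: handy when we control the data,
# for example in tests. The values are made up.
dog = Dog(name="Rex", breed="Chihuahua", id=2)

as_dict = dog.model_dump()       # plain Python dictionary
as_json = dog.model_dump_json()  # JSON string

print(as_dict)
print(as_json)
```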

5:07

And the last thing I want to highlight is that we're protected. So if we try to pass in bad data and in this case I try

5:14

to pass in a dog whose name is Spot, has a breed of boxer, and an ID of two,

5:23

then it's going to throw a validation error

5:28

and I want to go through that validation error a little bit because there's some good info in here.

5:33

So the first thing is it tells me how many validation errors there were. That's really important when you're

5:40

dealing with a complex object with a lot of data in it. Uh our example is pretty

5:46

simple. It's pretty straightforward to see where the error is at. But when you have large objects, that can be a lot

5:53

harder thing to track down. Not just where it is, but how many because you might have multiple validation errors

5:59

too. So the next thing is it shows me

6:07

exactly the the the kind of information that I need to see in order to debug

6:13

where my validation error is at. It tells me what field is problematic. In

6:18

this case, it's breed. Tells me what the input should be. In this case, I'm

6:24

limited to Labrador or Chihuahua. and it tells me what I actually sent in

6:29

which was boxer. So that's my error.
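
The error walkthrough above can be reproduced with a sketch like the following, again assuming the hypothetical `Dog` model. It reads the same details (error count, field, message, and the input that was sent) from the ValidationError programmatically.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Dog(BaseModel):
    name: str
    breed: Literal["Labrador", "Chihuahua"]
    id: int


errors = []
try:
    Dog.model_validate({"name": "Spot", "breed": "boxer", "id": 2})
except ValidationError as exc:
    print(exc.error_count(), "validation error(s)")
    errors = exc.errors()
    for err in errors:
        # loc is the problematic field, msg says what was expected,
        # input is what we actually sent in.
        print(err["loc"], err["msg"], err["input"])
```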

6:35

Now when we were playing around with Pydantic the first time um as we were learning it

6:42

this was kind of a light bulb moment for us because we realized in Python

6:50

type hints give you build-time type enforcement

6:55

but paired with Pydantic, that build-time type enforcement now becomes runtime

7:03

type enforcement. And that was a like we didn't realize it at the time, but that became a very

7:11

helpful thing when dealing with APIs that would spew a lot of really messy data to have that kind of runtime data

7:18

enforcement uh data typing enforcement in place. So now you all have graduated Pydantic

7:26

101. Don't expect any diplomas in the mail. Uh, so we're going to move on to

7:31

the more advanced wizardry. Back to our problems.

7:38

We're going to take them one at a time. We're going to start with Pascal snakes.

7:45

So, Pydantic's solution to this is something called aliases.

7:51

And Pydantic handle. Let's, sorry, let's take a look at what we

7:58

might do if we didn't know about aliases. So we could just mimic the same casing

8:07

structure in our model. Why would this be bad? Well, we're going

8:14

to see this. This is going to be a recurring theme tonight. It's allowing the complexity from the data to

8:20

seep into our code. And then so that means any code that interacts with our patient is also going to have to use

8:26

these kind of non-Pythonic uh uh naming conventions

8:32

case conventions. Aliases on the other hand

8:38

they let us confine the external complexity to the Pydantic layer.

8:44

Here we're using the Pydantic Field to define the alias as part of the

8:53

attribute's metadata. These aliases are alternative names

8:58

available to Pydantic when validating or serializing.
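
A minimal sketch of field-level aliases. The external key names `ID` and `First_Name` are assumptions about what the "Pascal snake" payload on the slides looks like.

```python
from pydantic import BaseModel, Field


class Patient(BaseModel):
    # Pythonic attribute names inside our code,
    # externally cased names confined to the alias.
    id: int = Field(alias="ID")
    first_name: str = Field(alias="First_Name")


patient = Patient.model_validate({"ID": 1, "First_Name": "Jim"})
print(patient.id, patient.first_name)
```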

9:04

The problem with a field-level alias is that in our example we only have two

9:13

fields. It's very easy to do, but when you have 20 different models

9:18

with 10 to 15 fields each, it gets to be a lot of typing. So, there's it'd be

9:25

really nice if there were a way to automate all of this typing. Of course,

9:30

Pydantic has a way to automate all of that. Uh, and it's called alias generators.

9:36

So, our second pass here, our third pass is going to be using an alias generator.

9:44

Every Pydantic model has a model_config attribute in it. And this model_config

9:50

attribute contains kind of the default configuration if you don't touch it at all. But we can override that with

9:57

the ConfigDict object, and we can pass in an alias generator.

10:04

uh and this alias generator will be used to create aliases for every field in the

10:11

model. Now, Pydantic also offers a few

10:16

out-of-the-box alias generators for our convenience, including the to_camel

10:22

function that we see here, which transforms the names into camel case.
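
For reference, a sketch of a model-wide alias generator using the built-in to_camel function from pydantic.alias_generators. The payload is made up.

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Patient(BaseModel):
    # Every field gets an alias generated from its name.
    model_config = ConfigDict(alias_generator=to_camel)

    id: int
    first_name: str


patient = Patient.model_validate({"id": 1, "firstName": "Jim"})
print(to_camel("first_name"), patient.first_name)
```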

10:28

However, our actual data isn't in camel case.

10:36

It's in snake Pascal snake case. So, what are we going to do about that?

10:41

Well, it turns out we can also create our own alias generators.

10:46

Alias generators are just functions that take in strings and return other strings. So here we're taking in our

10:55

snake case field name and we're converting it to a Pascal

11:01

snake case uh field alias.

11:07

Oops. Uh so once the custom alias generator is

11:13

ready to go, we can pass it into our model config as the alias generator.
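
A custom alias generator might look like the sketch below. The function name `to_pascal_snake`, its implementation, and the `last_name` field are assumptions for illustration; the talk's own helper may differ.

```python
from pydantic import BaseModel, ConfigDict


def to_pascal_snake(name: str) -> str:
    """Turn a snake_case name such as 'first_name' into 'First_Name'."""
    return "_".join(word.capitalize() for word in name.split("_"))


class Patient(BaseModel):
    # An alias generator is just a function from str to str.
    model_config = ConfigDict(alias_generator=to_pascal_snake)

    first_name: str
    last_name: str  # hypothetical second field


patient = Patient.model_validate({"First_Name": "Jim", "Last_Name": "Doe"})
print(patient.first_name, patient.last_name)
```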

11:20

Now, aliases are awesome. We should definitely do more of them,

11:27

but there are some gotchas to know about them.

11:33

So, Pydantic's default behavior with regards to aliases is inconsistent

11:39

across validation, which is when you're reading data in, and serialization, which is when you're dumping the data

11:45

back out again. When we're doing validation, Pydantic

11:50

prefers the field alias. But when you're doing serialization or

11:57

writing out, it wants to use the field name. I'm sorry, I keep I'm trying to

12:04

hold this at just the right place. Um so the first trap happens when

12:11

validating where Pydantic's default is to use the field alias

12:17

and that trap is constructing. Uh remember when we talked about how we could load data into a model with

12:24

keyword arguments in the constructor?

12:30

Anyone want to venture a guess as to what's going to happen here where we have our to Pascal snake alias

12:38

generator and we're trying to construct a patient using snake case in the

12:43

constructor. If you guessed validation error, pat

12:48

yourself on the back and let's take a look at what's going on here.

12:57

So we have

13:03

uh our input is using snake case

13:09

but Pydantic is expecting Pascal snake case. So it's saying, hey,

13:18

first name is required and you didn't give it to me. We gave it to them. We just used the

13:24

wrong case for it. So how do we fix this?
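
The trap can be reproduced with a sketch like this one, assuming a hypothetical `to_pascal_snake` generator. With default settings, the snake case keyword argument is not recognized, so the aliased field is reported as missing.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


def to_pascal_snake(name: str) -> str:
    """Hypothetical generator: 'first_name' becomes 'First_Name'."""
    return "_".join(word.capitalize() for word in name.split("_"))


class Patient(BaseModel):
    model_config = ConfigDict(alias_generator=to_pascal_snake)

    first_name: str


error = None
try:
    # Snake case in the constructor, but validation expects the alias.
    Patient(first_name="Jim")
except ValidationError as exc:
    error = exc.errors()[0]
    print(error["type"], error["loc"])
```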

13:31

We can switch the validation behavior to use field names by default rather than the aliases by setting validate_by_name

13:40

in the model config. The trade-off is that anytime we want to

13:46

validate by alias, we have to now explicitly set it using that by_alias

13:53

setting. So as you can see

14:00

um we'll need to anytime that we want to do

14:08

a model_validate we now need to explicitly set by_alias. So there's another way, and that's

14:16

that we can say, okay, we want to validate by name or by alias by setting those both

14:21

to true and the trade-off here

14:26

is that uh this is a great option if you don't mind your validation being a little bit

14:34

less strict. Uh, and what Pydantic does

14:39

is it's checking both the names and the aliases

14:45

to see if they exist and if they are set to a valid value. If they do and

14:50

they're set to a valid value, it'll use that value.

14:55

Now, bit of foreshadowing here, but I want to remind you that when serializing, Pydantic uses the field

15:02

name. The second trap is when we try to take the data in our model and serialize it

15:09

back to another format like a Python dictionary or a JSON string. Here's what

15:14

that behavior looks like in the code. Now, using a different case scheme when

15:21

dumping out could be a problem if we have downstream systems that still

15:26

expect it to be in Pascal snake case. We need to override Pydantic's default

15:33

behavior when doing that model_dump. We can do that by setting a by_alias

15:39

argument. Uh we can do that at the function level or as we've seen uh previously we can

15:48

set it at the model level by setting this serialize_by_alias to true.

15:57

So now we can safely serialize and get the results that we want with the Pascal snake case in the output

16:04

and avoid that gotcha. Now for our next spell, we're going to

16:12

look at nested data. So what do we do when the data that we

16:18

want isn't at the level in the data structure that we would like it at?

16:24

Pydantic's solution should look familiar, as alias paths are adjacent to aliases.

16:31

As with our Pascal snakes issue, our first pass might

16:37

be to represent the nested values

16:42

(here we have Jim nested inside of value, nested inside of a list, nested inside of first name) as nested models.

16:50

So here now we're um nesting our name inside of a value model which gets

16:57

nested inside of a list which gets nested inside first name which gets nested inside of patient.

17:04

Now I will admit there's a certain amount of simplicity to this approach

17:10

but let's look at what happens when we try to access our patient’s first name.

17:16

We have did I lose? Okay, sorry. We have this long value

17:23

where, instead of just being patient first name, it's patient first name index zero dot value.
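
The nested-model first pass might look like this sketch. The payload shape and the class name `NameValue` are assumptions based on the description above (a value inside an object inside a list under the first name key).

```python
from pydantic import BaseModel, Field


class NameValue(BaseModel):
    value: str = Field(alias="Value")


class Patient(BaseModel):
    # The model mirrors the payload's nesting one to one.
    first_name: list[NameValue] = Field(alias="First_Name")


patient = Patient.model_validate({"First_Name": [{"Value": "Jim"}]})

# Every caller has to know about the list and the wrapper object.
first_name = patient.first_name[0].value
print(first_name)
```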

17:31

So alias paths let us dig into our data structure and pluck out just the values

17:36

that we want. Let's take a closer look at how that works. We define a path that navigates from the

17:43

root of the object down to the data that we want. In this case, we need to go down to the first name field, then to

17:50

index zero in our list, and then into the value field.

17:56

Once we have our path, we can pass that path into the AliasPath constructor.

18:01

Two notes here. For this example, I've defined the path and the alias path separately

18:08

so I could fit them onto one slide. But in real world code, you'd almost always

18:14

just define the path in line. For my second note, some of you may be

18:19

wondering why we're using validation_alias here rather than alias.

18:24

Unfortunately, as you get deeper into more advanced alias features, you lose the ability to be bidirectional. What I

18:31

mean by bidirectional is that when you read data in, it does the same things as when you write data out.

18:38

So in this case, we can no longer serialize our first name field back out to a nested path. So if I do

18:45

this model_dump, it's just going to put it in first name Jim. Consequently, that's why we're using validation_alias

18:51

here: alias path only works on input,

18:58

which is validation, and it doesn't apply to output, which is serialization.

19:04

So back to our updated code, how does it look to access a first name now that

19:12

we're using alias path? Much better. Now it's just the patient first name that we expected.
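
A minimal sketch of the alias path approach, assuming the same hypothetical payload shape. The path is written inline, as suggested for real-world code, and the dump shows that the nesting is not restored on output.

```python
from pydantic import AliasPath, BaseModel, Field


class Patient(BaseModel):
    # Walk from the root: "First_Name" key, index 0, then "Value".
    first_name: str = Field(validation_alias=AliasPath("First_Name", 0, "Value"))


patient = Patient.model_validate({"First_Name": [{"Value": "Jim"}]})
print(patient.first_name)

# The alias path applies to validation only, so the dump is flat.
dumped = patient.model_dump()
print(dumped)
```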

19:20

All right, we are at our final uh incantation here and this one is going

19:26

to deal with the discrepancy between ID in our first patient and MRN in our

19:33

second patient and I will say MRN here is a medical record number. It's a common identifier in healthcare systems.

19:41

Uh so that's what MRN is and it's time to solve the problem of

19:49

multiple paths to the data that we want. How do we tell Pydantic about

19:54

these multiple paths? Well, we're going to use alias choices often in

19:59

conjunction with alias path. We could try to deal with the problem by creating two different models.

20:08

The first model would deal with uh the ID that's deeply nested

20:14

and the second would handle our MRN model.

20:21

The problem here is the same as with our other examples. We're letting the complexity of the data structure seep

20:28

into our code. And so now every bit of code that deals with the patient has to know, does this patient have an MRN? Am

20:34

I checking the MRN for the ID? or does it have a deeply nested ID?

20:40

So instead, let's write a single model and let alias choices abstract away that

20:46

complexity. Here we can see that alias choices lets us specify multiple paths at which we

20:53

might find the user's ID. In this case, either under an MRN attribute or

20:59

following an alias path to a more deeply nested number. And again, I've split alias choices onto

21:07

its own line for brevity's sake. However, you'd likely inline it into that field definition in the real code.
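
A sketch of alias choices combined with an alias path. The two payload shapes (a flat `MRN` key and a nested `ID` key) are assumptions drawn from the description; the slides may nest them differently.

```python
from pydantic import AliasChoices, AliasPath, BaseModel, Field


class Patient(BaseModel):
    # The ID may live under "MRN" or be nested under "ID".
    id: int = Field(
        validation_alias=AliasChoices(
            "MRN",
            AliasPath("ID", 0, "Value"),
        )
    )


nested = Patient.model_validate({"ID": [{"Value": 101}]})
flat = Patient.model_validate({"MRN": 202})
print(nested.id, flat.id)
```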

21:16

So, now we've talked about how to address all of our problems.

21:22

We can kick back and relax, right? Well, what does the full solution look like,

21:27

though? So, let's step through this. uh we have all of our imports and we have a few

21:33

utility functions here. We've already talked about to Pascal case. So I just want to point out this gen

21:41

path function. Uh we have a pattern with our alias paths that you may have noticed.

21:48

Uh specifically we navigate a list and then we get the value uh from the first

21:53

item in that list for each of our attributes in our model. and gen path

21:59

helps DRY up that pattern a little bit. So looking at the model

22:06

in our model config, we use to Pascal snake to deal with making sure

22:13

that we can use nice Pythonic attribute names and we use alias path as generated by

22:21

our gen path function to give us access to the deeply nested data. And then

22:27

finally, we use alias choices to normalize across multiple paths access

22:34

to the user's ID. Now we're back to our problematic data

22:40

here. Let's see how it does when we run it through our new Pydantic model.
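
Putting the pieces together, the full solution could be sketched roughly as below. This is a reconstruction under assumptions, not the code from the slides: the helper names `to_pascal_snake` and `gen_path`, the payload keys, and the sample records are all hypothetical.

```python
from pydantic import AliasChoices, AliasPath, BaseModel, ConfigDict, Field


def to_pascal_snake(name: str) -> str:
    """Hypothetical generator: 'first_name' becomes 'First_Name'."""
    return "_".join(word.capitalize() for word in name.split("_"))


def gen_path(key: str) -> AliasPath:
    """Path to the value of the first item in the list stored under key."""
    return AliasPath(key, 0, "Value")


class Patient(BaseModel):
    # Kept to mirror the talk; fields with an explicit validation_alias
    # are read through that alias instead of the generated one.
    model_config = ConfigDict(alias_generator=to_pascal_snake)

    id: int = Field(validation_alias=AliasChoices(gen_path("ID"), gen_path("MRN")))
    first_name: str = Field(validation_alias=gen_path("First_Name"))


# Made-up records in the two shapes described in the talk.
records = [
    {"ID": [{"Value": 1}], "First_Name": [{"Value": "Jim"}]},
    {"MRN": [{"Value": 2}], "First_Name": [{"Value": "Pam"}]},
]

patients = [Patient.model_validate(record) for record in records]
for patient in patients:
    # Client code only ever sees id and first_name.
    print(patient.id, patient.first_name)
```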

22:47

Perfect. This is exactly what we want. We've abstracted away the complexity of

22:52

the third party API, the third party data schema, and now our client code

22:58

doesn't have to know anything about that. Just ask for the patient ID, gets the ID, asks for the first name, gets

23:05

the first name. So, we've hit the portion of my talk

23:10

here where I'm going to open up for questions. And while we're talking, I'm going to throw my contact info up there.

23:16

Uh if there are any questions that you guys have that we don't get to talk about, uh feel free to reach out to me

23:21

on LinkedIn or email me. I'd love to set up a Zoom call. Uh this is the the kind

23:27

of stuff that I love to talk about. Uh so any questions

23:34

actually here, I'll give you this mic and you can ask us everyone can hear it.

23:39

Have you created any Pydantic models for OWL files? For what kind of files? OWL. O-W-L.

23:46

they're uh I have not okay they're a transfer mechanism for medical terminology records from one

23:52

system to another. Yeah, that seemed relevant. That's interesting because I have not created Pydantic models for OWL

23:58

files, but I've created lots of Pydantic models for FHIR, for HL7 FHIR stuff,

24:04

and that's another data transfer um schema that's common in the healthcare

24:09

world, and I will say modeling FHIR

24:14

in pretty much anything is a lot of work. FHIR is a very big specification if you've ever dug into

24:20

it. All right, more questions.

24:26

Well, you all got more questions. Have there been downsides to

24:33

the static typing Pydantic? Has it caused you to spend more time someplace or has

24:41

it always been a total upside? Yeah, I I think probably the most difficult thing uh is the learning

24:48

curve. It's the human aspect of it. Um and it honestly it's it's less to do

24:54

with Pydantic and more to do with with uh like runtime enforcement of typing. Uh

24:59

like that's kind of a shift if you've spent a big portion of your career working in a very uh I I don't want to

25:06

say Python's loosely typed because it's very complicated. It's strictly typed, but it's dynamically typed. And

25:12

that's a whole other conversation. Uh but yeah, it is most people come into

25:18

using Pydantic with uh a mindset uh that it's a bit of a shift uh and it can be a

25:25

bit jarring to see all these red squigglies show up in your code that weren't there before.

25:32

Other questions for Kyle? I will also say uh sorry to to build on that a little bit more. This talk

25:39

actually comes out of a series of of exercises that we created when we were

25:44

rolling Pydantic out at the healthcare organization that I was working at. Uh so this actually came out of that

25:51

challenge of uh getting people up to speed on Pydantic.

25:58

How many people in the room are using static typing and Pydantic?

26:04

Oh, so lots of people still to adopt. have a question.

26:12

My question is like I I use different integration products

26:17

like MuleSoft, Dell Boomi and stuff like that. Like if I have to deal with the

26:23

data which comes from different source systems like you have all kinds of crazy data coming but we want a kind of a

26:30

single schema. Yeah. Like if I have to use this library

26:36

like am I running it in like a lambda like like how would I use this library

26:43

Python library on top of what my integration platform is to kind of validate the data

26:50

maybe you know unify it. Yeah. And put it somewhere else like you know CSV or JSON. Um

26:58

so what we did uh is to say okay we want to keep the

27:05

mess as close to the boundaries of our systems as possible and then this this

27:10

particular uh engagement um it was a collection of microservices that were all talking with each other. So we

27:17

wanted to keep when when messy data came in from all the different platforms that we were integrating with, we wanted to

27:24

clean up that data as soon as possible like right at that boundary. And we used

27:29

Pydantic to define a whole set of models

27:35

uh that then we packaged up and it became uh like a schema that could be

27:40

used across multiple applications. you could just in every application, every microservice that you spun up, you could

27:47

download this this one library and would define all of the essentially domain

27:52

models uh for the whole system. So patients, uh facilities, doctors, all of

28:00

those domain models were defined in this library and they were all defined in Pydantic. Uh and so then that gave us a

28:07

tool for any microservice, any integration I did, it could clean up

28:13

that data, put it into our internal uh schema, our internal domain models, and

28:19

then our internal systems kind of all worked seamlessly because they all had the same view. They all knew what a

28:26

doctor was. They all knew what an employee was. They all knew what all these domain models were. So does that

28:32

does that answer your question? Yeah. Any other questions or

28:42

so? Python for a bit now for the last couple of years is bringing in uh typing

28:47

into the language, and we're seeing it more and more feature-rich in that area

28:54

with 3.13 and 3.14. Yeah. Now rolling out. Do

29:00

you think Pydantic's significance will devolve now with the typing that's

29:08

coming into Python itself? Yeah. So I would say Pydantic is actually built to take really good advantage of

29:14

all of that new stuff. And that goes back to that those yellow slides where I talked about the the light bulb moment.

29:21

uh and that all of these new things that are being added by Python are all

29:26

build-time things, like you have to run mypy or Pyright to do static analysis on your code and flag anything

29:33

that might be a problem. Uh if you run it without doing that static analysis in production, Python's not going to

29:38

complain if you store an integer in a uh

29:44

variable with a string uh type on it. Uh and so what Pydantic does is it kind of

29:50

fills in that gap. Python's defining all of these build-time

29:57

uh type hints and features around those. Pydantic is taking those over and enforcing them at runtime.
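
The gap described here can be seen in a small sketch: the interpreter ignores a mismatched annotation on a plain variable, while a Pydantic model rejects the same mismatch at runtime. The `Dog` model is a hypothetical stand-in.

```python
from pydantic import BaseModel, ValidationError

# A static checker such as mypy or Pyright would flag this line,
# but Python itself runs it without complaint.
label: str = 42


class Dog(BaseModel):
    name: str


runtime_errors = []
try:
    # Pydantic enforces the same kind of type hint at runtime.
    Dog(name=42)
except ValidationError as exc:
    runtime_errors = exc.errors()
    print(runtime_errors[0]["msg"])
```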

30:07

There's still no performance improvements because you're using typing in your code like like compiled

30:13

languages like yeah it is there there's no performance improvements. What we did find and and

30:19

there are lots of great talks online about this uh we found it reduced the

30:25

number of bugs uh that we could run into and and one of the analogies I I've seen

30:32

that I love for this um unit testing and and various types of testing if you

30:37

think about your bugs, it's like a big circle, like a Venn diagram. Unit tests punch holes in that big

30:45

circle of all possible bugs So it looks like a little bit like Swiss cheese, but when you have uh um types that are

30:54

enforced at runtime, that like slices a whole slice out of that uh realm of that

31:00

circle of all the possible bugs. Uh so that you don't even have to worry about these kinds of problems.

31:06

Yeah. Whole whole classes of issues just go away. Yeah. Yep. Now, there are there are other issues. I don't want to I'm not

31:13

here to be a strong typing uh evangelist. Uh but we did find that in

31:20

our real world use. Excellent. Any other questions for Kyle?

31:26

Can you repeat repeat the question? Yeah, go ahead. Is there a

31:36

Oh, like a competitor. Is there a competitor to Pydantic? Yeah. Is there an open source library that does similar things to

31:42

Pydantic? There is. I think it's called Marshmallow, and so we had initially

31:48

built our microservices on Flask, specifically using a library for Flask called Flask-RESTX that gave

31:56

REST API capabilities to Flask. Flask-RESTX is currently rewriting because

32:02

they had a lot of these capabilities baked in. they realized they weren't able to keep up with the rate of

32:09

innovation. They were rewriting to um specifically to support Marshmallow, but

32:14

also with kind of second-class support for Pydantic. Now, I'm not sure if

32:21

they're maybe rethinking those plans because that was way back when Marshmallow and Pydantic were both getting started and Pydantic's kind of

32:27

come out on top. Um, but yeah, that's there are other libraries out there that

32:34

do this. Uh, they're just not they don't have the community that Pydantic has.

32:40

Awesome. There's any other questions? I think we're good. Let's give a big round of applause for Kyle. Oh, uh, one more thing here.

32:48

One more thing here. Uh I I want to leave you guys if if you leave here with only one thing in your head, I want that

32:54

to be: Pydantic aliases are useful and I should read through the docs. Or we

32:59

can simplify that to aliases are useful. Uh now still read the docs.

33:05

I will also say uh this is not the end. Um I mentioned that this came out of a

33:11

series of exercises we did and we actually turned them into a series of blog posts. So I I tell everyone, you

33:17

know, I don't expect you to remember anything that I said here tonight. Um, but what I do hope you remember is if

33:22

you forget, that you remember that our blog is at testdouble.com/insights

33:28

and you can go read this Pydantically perfect blog series to find uh the the stuff I covered tonight is in the first

33:34

three blog posts. We actually have six more that are in the works right now. So there's going to be a lot of information

33:40

on kind of advanced Pydantic details out there. So, when you're dealing with gnarly data, come back to uh the

33:47

Test Double blog and figure out uh if we've got anything in there that might help

33:52

you. Excellent. Super awesome. All right, thank you very much. Let's give a big round of applause for Kyle.

33:59

[Music]

Kyle Adams is a staff software consultant at Test Double who lives for that light bulb moment when a solution falls perfectly in place or an idea takes root.
