Natural Prodcast Episode 24 - Jackie Winter and SMC - DOE Joint Genome Institute

Jaclyn Winter, University of Utah

On this bonus episode of Natural Prodcast, it’s the Self Promotion Episode!

Dan chats with new-ish co-host Jackie Winter, from the University of Utah, about her secondary metabolism research on novel microbes and bioactive compounds in the Great Salt Lake.

Then, Dan talks (maybe way too much) about the Secondary Metabolism Collaboratory, or SMC, JGI’s new data portal for natural product biosynthetic gene clusters.

Logo for the Secondary Metabolism Collaboratory (SMC)

Transcript

DAN UDWARY: Hey, everyone! Welcome to the “Self Promotion Episode” of Natural Prodcast. A little bit of a bonus episode, since I’m off schedule, but this is an important one that I wanted to get out there as soon as possible. Not one of our normal interviews, but instead it’s two interviews in one. My new co-host, Jackie Winter, has been on a few episodes now, and I figured we were way overdue to give her a proper introduction and a chance to talk about all the cool work she’s doing over at the University of Utah. So, you’ll be hearing about microbiology and natural products from the Great Salt Lake, and you’re going to learn, like me, what a “haboob” is. Then, we switch interview chairs and I talk about SMC – which is the new Secondary Metabolism Collaboratory data portal I’ve been working on with my colleagues Drew Doering and Bryce Foster and too many others to name at JGI for the last two years or so. SMC is now released and ready for the public to tear it up. I’m super excited about this thing, and I hope you will be too. If this episode went a little long, it’s definitely because we like talking about our science. So, sorry, but not really sorry. I haven’t mentioned them enough lately, but as always I have transcripts and show notes at naturalprodcast.com. I went a little overboard with links on this, since I’m hoping this episode and the transcripts will be one thing I can point people to when it comes to getting out information about both SMC, and Jackie, so please do hop over there and check them out. Thanks for listening!

DAN UDWARY: Ok, we’re recording now.

JACLYN WINTER: Ooh, yay.

DAN UDWARY: So I get all the podcast stats, right? And so I see all the download numbers. And so one of the things I’ve noticed is that whenever it’s me or whenever we’re talking about promoting something, like any of the primer episodes where we’re trying to explain things to people, those get terrible download numbers. [Laughs]

I mean, not terrible. They’re like a third less than the normal amount that we get, which is already a fairly small number. We’re a very niche podcast. And so maybe if we can combine two things where we interview you and we also talk about my stuff so that we have a self-promotion episode. Then maybe people will actually listen to this.

JACLYN WINTER: I think that’d be fun. Yeah. Let’s do it. What if we just start like swearing, bring out our New Yorkers in us.

DAN UDWARY: That’s OK. That’s fine.

JACLYN WINTER: Dropping F bombs. We should do some kind of special sound effect like a kazoo.

DAN UDWARY: I do have some beeps and stuff that I can easily drop in. Not a problem. But yeah, I don’t know. Maybe we start talking about you, because you’ve been on a couple of the podcasts now that we’ve put out. And I gave you a little introduction during my intro part of the first one that you were on. But that’s not nearly enough. And so I thought it would be good to spend maybe a couple of minutes just talking to you and introducing you more properly to the audience, I guess.

JACLYN WINTER: We’ll see what happens.

DAN UDWARY: OK. One of the– I don’t know. Maybe we ask you the boilerplate questions that I try to tend to ask to most people. So one of those is like, what’s your origin story in natural products? Why are you doing this?

JACLYN WINTER: I’ll kind of go back to where it all started. I think for me, natural products– I became interested in natural products as an undergrad. So for me, it was studying sort of the interface of chemistry and biology because I was getting dual bachelor’s in both areas.

And I took a class in natural products. That was my first exposure, my junior year as an undergrad. And to me, looking at the chemistry of the molecules and then looking at how they were made was really interesting and intriguing to me. And I could combine my passions in different areas to study how these molecules are made, how do you test the bioactivity, how do you tinker with the machinery.

And so it really is a blend of a lot of different disciplines. And so then going to grad school and learning about chemical biology, which I don’t think it was called that back then. I think we’re still trying to– it had many names and iterations.

But for me, it was a combination of different passions and not having to really like pick or choose one different discipline and then also having an output where you could make a difference with human health and increasing our longevity and agriculture. And there’s so many applications. So it was really the draw for that as well.

DAN UDWARY: Before then, were you always a science-y kid?

JACLYN WINTER: I was not actually. So in high school, I didn’t– well, I guess– I mean, a few people I guess know what they want to do in high school, but I was taking AP classes in English and history so that didn’t– I wasn’t interested in science as much, and my parents weren’t scientists either.

So it was sort of just taking classes and getting exposure from different faculty members. And I was kind of saying like, hey, actually, I actually like this, and it’s more interesting to me than other areas.

DAN UDWARY: Uh-huh. Sure. Yeah. Yeah, yeah.

Maybe if you could talk about what your– I don’t know– research background pedigree is. I hate to like sort of like pin people to people, but you’ve worked with some cool people in the past. So you want to tell the audience about those?

JACLYN WINTER: Sure. I’ve had some great mentors along the journey. And so I started my PhD in Brad Moore’s group and joined his lab when he was at University of Arizona. And I think a month later, we packed up and moved to SIO, UCSD.

So I was with Brad during my PhD, and then I did a short postdoc with Christian Hertweck in Jena in Germany and kind of expanded learning about natural products. So in Brad’s lab, I worked on identifying and characterizing haloperoxidases from bacteria.

And with Christian, I started getting a little bit more into fungi and learning the differences of all the genetic differences and then expanded on that knowledge by doing a second postdoc with Yi Tang at UCLA and then kind of really gained more of the bioengineering aspect with fungi, kind of going along the track of always working on natural products but just different tools and how to really access and manipulate these pathways to get new compounds.

And lots of– had some really amazing mentors.

DAN UDWARY: For sure. And for those that don’t know, I met you when I was a postdoc in Brad’s lab. We made that move from Arizona to San Diego that– not in the same car, but at the same time.

JACLYN WINTER: Yeah. I had the incubator in my car. No–

DAN UDWARY: Oh, that’s right. Yeah. We all moved a lot of weird stuff.

JACLYN WINTER: Crossing the border like “nothing to see here”. We were the first Breaking Bad, I guess. But no, you trained me in how to genome mine, so thank you for that. And you had to put up with me as a young grad student. Oh, my gosh, I’m sorry.

DAN UDWARY: Oh, no, no.

JACLYN WINTER: So sorry. [CHUCKLES]

DAN UDWARY: Nothing to be sorry about. Yeah, no, your project was really cool. The vanadium haloperoxidases are some really cool enzymes, and it was fun to see that project take shape. Yeah.

JACLYN WINTER: Yeah, that was a golden age right there when we were all in the lab together and the ring tank party. But that’s when a bunch of PIs moved to SIO, and it was really a remarkable time of bringing together quite a few groups. And I mean, that was such an amazing atmosphere.

DAN UDWARY: Yeah, yeah. Historically cool time. So Bill Fenical was [and is] still there. And Bill Gerwick moved around the same time and Pieter Dorrestein. Who else?

JACLYN WINTER: And Ted Molinski.

DAN UDWARY: Ted Molinski. Yep.

JACLYN WINTER: Yeah. And Paul had started his independent lab. And so there would–

DAN UDWARY: Like Paul Jensen. Mm-hmm.

JACLYN WINTER: And Mike Burkart. Yeah, there were so many people there. It was fun.

DAN UDWARY: Yeah, it was a great time to be there. It was really, really fun, energetic atmosphere, a lot of new people learning new things from each other. It was the best that I think natural products can be at times like– So it’s a very interdisciplinary field, and a lot of people have a lot of different sort of tools and techniques that they bring to it. And we try to learn as much as we can to apply those to our own little individual problems and things. And so there’s a lot of good crossover. I guess there still is. Like, it’s still a great place from what I understand, but yeah, it was a fun, exciting time to be there, for sure.

JACLYN WINTER: Yes, yes.

DAN UDWARY: So I guess in your words, what are you working on now?

JACLYN WINTER: Oh, man.

DAN UDWARY: What’s different now that you are a professor at University of Utah?

JACLYN WINTER: It’s scary being a professor.

DAN UDWARY: Yeah. I know that.

JACLYN WINTER: I’m scared to be in charge.

DAN UDWARY: [CHUCKLES] Yeah.

JACLYN WINTER: But it is– we have a lot of projects going on. And one that sort of happened serendipitously that I never anticipated starting– that’s not why I came to the University of Utah– is looking at the natural product potential of microorganisms from Great Salt Lake, which is a hypersaline terminal lake about 30 minutes away from campus. And we’re finding some really, really interesting compounds, and we’re also identifying some new microorganisms that may be of interest to you and folks at JGI.

These organisms we’re finding– at least by looking at their full genomes — are really not like anything that’s been identified to date and characterized to date. So we’re one, looking at how unique they are, how they survive in this hypersaline environment, which can range from 8% to 28% salinity. We have toxic metals, so these concentrations of arsenic and lead and mercury are pretty high as well. And these organisms have no problem living there.

So we’re looking at how they adapt to these extreme environments. And then can we use that, for example, like bioremediation applications? So we have some pathways and some molecules that we’re really pushing in that endeavor, which is pretty cool.

DAN UDWARY: Yeah. So I guess what made you want to be a professor and do this stuff as opposed to– I think like a perfectly viable career track in natural products, and especially if you’re in the drug discovery angle, is to look in industry. But you’re a professor. Why’d you do that?

JACLYN WINTER: So I actually– well, I mean, not having family in science, it was hard to see what are the different career paths. And luckily, when I was an undergraduate at SUNY Fredonia, I had the opportunity to intern at Merck Pharmaceutical. And for me, that was– I mean, it was a great opportunity, but it also reinforced that I probably am not cut out for industry. It wasn’t as appealing for me. I mean, it was a great experience. Don’t get me wrong.

But I also was kind of wavering between going into academia and industry. And I always liked being a TA. I liked teaching. I like mentoring. And it really then catalyzed that, OK, this is the track. For me, it’s going into academia.

And then being in Brad’s lab, just having opportunities to mentor students, teaching classes, and really setting the stage for what I really wanted to do and getting to– so I think it’s getting exposure– was that I found that I really enjoyed mentoring. I enjoyed teaching and just helping the next generation of researchers in natural products and in other areas. So for me, it was– I like interacting with new people and seeing that light bulb go on when all of a sudden like, you know, that moment when they’re going, oh, my gosh, I actually understand this. And that’s pretty remarkable.

DAN UDWARY: Yeah, cool, cool.

JACLYN WINTER: I think being a professor is you get those a lot. And you need those a lot. You need those moments to get you through the day. So for me, it was if a door opens, take an opportunity to explore it, and you’ll find something about yourself. And maybe you’ll see you like or dislike something.

DAN UDWARY: But besides the Salt Lake stuff, are there other projects you’re really keen on or stuff you want to try to get into or maybe haven’t had the chance to yet? I mean, you’ve been there a few years now, but it always feels like we’re just getting started in a lot of ways.

JACLYN WINTER: You always have new projects. You always have new ideas that you want to pursue. There’s just not enough time and money. I mean, we have a huge marine fungal project in collaboration with Bill Fenical at Scripps. So we’re looking at biosynthesis of marine fungal metabolites.

We’re also looking– now we’ve started a newer project with Great Salt Lake– again, kind of going back to that– is toxicity of these heavy metals when they do become airborne and what happens when we breathe these in, especially individuals. We live in Salt Lake Valley, too, so we’re exposed to these metals and these toxic compounds. So we’re now trying to see the implications of those effects and how that compares to Owens Lake in California and what happened there and just trying to see what we can do.

DAN UDWARY: Is that human health effects or environmental or both?

JACLYN WINTER: Human health effects. Oh, and both. I mean, we’re also looking at environmental effects as well but mostly human effects.

DAN UDWARY: OK, yeah. And what? Because the lake is sort of consistently drying out, or it’s periodic at least? You get a lot of wind blowing things around that are salts that dry out? Is that the idea?

JACLYN WINTER: That is correct, yeah. So they’re actually these– it’s called dust hotspots on the lake. So the lake is [INAUDIBLE] with climate change. I mean, right now– like last year, we had a great winter, so a lot of snowpack when it melted– or a lot of water when the snowpack melted went into the lake.

And so the levels were decreasing at an alarming rate, but it has risen a little bit but still not to where it needs to be. But because you have all these– it’s a terminal lake, so anything that goes into the lake with the snowpack melt is retained and concentrated. And we have a lot of the mining going on. And so there’s certain areas of the lake that have really toxic concentrations of certain metals and whatnot.

And when the lake desiccates, they become– the wind blows through, and we get some gnarly windstorms coming through. And so then they become airborne. And then certain areas of the lake, depending on where you are, you will have these dust hotspots. And it almost looks like these like the dust haboob sometimes coming off the lake. And you just– we run screaming.

DAN UDWARY: The what?

JACLYN WINTER: Those dust haboobs that come in. You see them in aerosols.

DAN UDWARY: Haboobs?

JACLYN WINTER: Yeah, the haboobs.

DAN UDWARY: I’ve never heard that word. I don’t know what that is. [CHUCKLES]

JACLYN WINTER: We need to add that in, a dust haboob.

DAN UDWARY: Apparently. [CHUCKLES]

JACLYN WINTER: It’s sort of like this cloud of dust that you can’t see through. And we know what’s in that. And you have these different types of particulate like 2.5, 10 PPMs [PM2.5 and PM10] and just the size of the particle and how far they can actually travel. And that’s now starting to get into is like, what’s actually becoming airborne?

And not only metals, but microorganisms as well can go along for the ride. So now we’re trying to see the compounding effect, especially if people are immunocompromised. Like, what happens when they’re breathing this stuff in?

DAN UDWARY: Yeah. Yeah, yeah. OK. I don’t actually know. How close is Salt Lake City to the Great Salt Lake?

JACLYN WINTER: I can actually see it. I’m in my office right now. And so if we go to the south arm, it’s about 30-minute drive. So if you fly into Salt Lake City, the airport’s right on Great Salt Lake.

DAN UDWARY: I assume it’s close, but I didn’t know how close, close. I haven’t been there.

JACLYN WINTER: Yeah, I don’t know mileage-wise. I should know that. But it’s very close to the university.

DAN UDWARY: Yeah, that sounds pretty important then.

JACLYN WINTER: It’s there, and, unfortunately, it affects us all. And a lot of times, though, it’s unfortunate people are– it’s one of those like, oh, it’s not my backyard, so I don’t care-type locality. But in reality, I mean, these things become airborne, and they’re going to go with the trade winds. They’re going to get moved around the US.

I mean, it’s– I mean, we see what happens with the wildfires, right? I mean, you’re in Northern California, and it doesn’t matter where they are. You still get the air. Air is not really segregated. It moves.

DAN UDWARY: Yeah. Where do you see your work going, say, in the long term, like a decade from now or whatever?

JACLYN WINTER: Oh, wow. I would really– I mean, we have a unique environment in our backyard. And what would be really fun to start to look at and to compare is sort of adaptability, what we’re looking more for like microorganism adaptability to these extreme environments.

And Great Salt Lake has existed for thousands of years, but we’d like to start comparing evolution in real time to maybe like hypersaline environments that are formed from like hurricanes– like in Puerto Rico, you get these hypersaline lakes– and start comparing and contrasting systems and then also comparing to the marine environment, looking at more like holistic or big picture view, if we’re still around in a decade. We have the toxic dusts.

DAN UDWARY: Yeah, just that change over time to the system, huh? Yeah.

JACLYN WINTER: It would be– it’s fun or to see in a decade if we can find a new antibiotic agent from a Great Salt Lake microorganism because we do have a lot of wastewater treatment or wastewater going into Salt Lake. And so a lot of organisms coming from the hospital setting and agricultural areas are going into the lake, and these organisms actually have quite a few antibiotic-resistant genes in there. And so we’re trying to use those as screening tools.

And then also, if we do have something that’s active against them, maybe this has a new scaffold and target, new mechanism of action. So what would be great in a decade is to actually have something that is promising, a promising agent that we could start pushing through the pipe.

DAN UDWARY: Yeah. Very cool. OK. Yeah, I would imagine– what? There must be a lot of competition as the lake gets smaller. These organisms that live in the water column are probably increasingly going to compete with one another, right? And so there must be some sort of chemical warfare happening there.

JACLYN WINTER: Oh, absolutely. And these organisms are pretty gnarly. We have an E. coli strain that we’ve been working on that we sequenced the genome. And it has 17 resistance genes on a plasmid that’s actually– it’s pretty scary what this thing is resistant against.

But it is. It’s sort of– you can see evolution in real time is how these organisms are adapting to increasing salinity and what other organisms are there. And so the chemistry has got to be new, or at least we hope. This is what we’re using to get funding, but.

DAN UDWARY: Well, you’ve seen some new chemistry already, right?

JACLYN WINTER: Mm-hmm. We have. Yeah. It reminds me almost working with Bill back in the ’50s and ’60s of the golden age of antibiotic discoveries. Everything that we’re finding seems to be a new structure, and the microorganisms are novel. I mean, we have Streptomyces, but they’re new species. And they have very similar [genomes] to other Streptomyces, for example. So there’s just– we need more people, honestly, to really explore this, which is exciting, because there’s just–it’s uncharted. It’s underexplored. And there’s just so much potential there. So it gets you going in the day. It’s kind of like, what are you going to find today?

DAN UDWARY: Very cool. All right. Good, good.

JACLYN WINTER: Well, we just had that– I think we’re having one of those thunder snowstorms coming.

DAN UDWARY: Oh, really?

JACLYN WINTER: Now we’re kind of having a storm haboob coming through. So if I lose you, it’s because we lost electricity. That was just– sorry, there was a huge thunder that just went off.

DAN UDWARY: Yeah, that’s OK. You get thunder snowstorms?

JACLYN WINTER: Yeah, thunder snow.

DAN UDWARY: [CHUCKLES] Oh, my God. I’ve never seen that.

JACLYN WINTER: It’s never a good thing.

DAN UDWARY: [CHUCKLES]

JACLYN WINTER: It doesn’t end well for anyone. I’ve had to walk home a couple of times last time when we’ve had those because it just dumps snow so fast and it’s icy. And so I’m just like out getting in my car.

DAN UDWARY: Can the lightning travel down when it’s snowing, or does it just stay up in the clouds, it just booms?

JACLYN WINTER: I usually don’t see the– we don’t see the lightning. We just hear the thunder. And so the lightning is somewhere, but I don’t know if it’s cloud to cloud. And we just don’t see it.

DAN UDWARY: OK. [CHUCKLES] Wow, all right.

JACLYN WINTER: Yeah. I mean, you grew up in New York, and so you get the lake-effect snow. We get lake-effect snow from Great Salt Lake too. So that moves across, and I’m looking at my window because if it starts snowing, I’m going to start– I will drop some f-bombs.

DAN UDWARY: OK. So then I don’t know how to transition. [CHUCKLES]

JACLYN WINTER: Do you want me to ask you about SMC and–

DAN UDWARY: Yeah, I guess we reverse it, right? You can interview me, and otherwise, I’m just going to babble. [CHUCKLES]

JACLYN WINTER: Yeah, I guess– how do you want me to introduce that? Because you have a primer for you. Or do you want me to ask anything in particular?

DAN UDWARY: No, you don’t have to introduce me. That’s fine. I don’t really care about that. The thing I wanted to be able to put out to people is that well, by the time this comes out, SMC will be in its first release version. And so yeah, that should be around the end of the month or slightly before, just depending, and on a few sort of dot the i’s, cross the t’s kinds of things. So yeah, by the time this comes out, then SMC will be like out of beta and will be a real thing. [CHUCKLES]

JACLYN WINTER: Right. That’s super, super cool. OK. OK. Let me– I’m trying to think how to–

DAN UDWARY: Yeah, I’m trying not to step on you. It doesn’t matter. You can– yeah. We can pull back the curtain, and you can just ask some questions because this is all very artificial. Like, some of it, you already know, but we’re really talking to other people, right?

JACLYN WINTER: Yeah. OK. So let’s see. So with your background in natural products, this is just pretty amazing with spearheading this endeavor at JGI. So SMC is a new platform coming out. And I’m wondering if you can just kind of share some information of– we’ll just call it your baby in being released and how this can help the community.

DAN UDWARY: Yeah. So OK. So SMC is the Secondary Metabolism Collaboratory. We all know what secondary metabolism is, hopefully if we’re listening to this. But the word “collaboratory” is sort of a made-up word that — I think collaborative laboratory is the idea here.

And so maybe I should back up and sort of give a little bit of a historical context. JGI, for many years, has been doing work in secondary metabolism. The very first CSP [Community Science Program] I ever worked on, back in Brad’s lab (we talked about it in his podcast), was the Salinispora genome project. And that was a JGI CSP project.

They sequenced that genome for us at a time when they would do one genome at a time. And the whole purpose of that was that it was one of the first genomes that had been done specifically in order to explore its secondary metabolism, and the BGCs that we knew that it had a few of. And we never expected it to have as many as it did and how that became sort of a trend in genomics.

So anyway, yeah, JGI has been working in secondary metabolism for a while now, a long time. But when our new director, Nigel, Nigel Mouncey (who also has a podcast episode if you want to go back and listen to it), when he came in, he created the Secondary Metabolites Science Program. And I am part of that under him.

We started thinking about what we needed to do in secondary metabolism in order to help our users. So for those who are familiar with JGI’s IMG data portal, the Integrated Microbial Genomes, there was a subsection of that called ABC, the Atlas of Biosynthetic Clusters. And at the time, it was a really good resource for secondary metabolism BGCs at a time when it was kind of difficult to always find them. antiSMASH was around, and they used that on all of IMG’s genomes. But IMG is microbial genomes, mostly bacteria and archaea.

And we knew that secondary metabolism is everywhere, right? And so we also sequence a lot of fungi, and we sequence plants. And so we started thinking about if we had a different approach to this, how it should go. And so that’s where the Secondary Metabolism Collaboratory, SMC, came from.

So we wanted this to be a more comprehensive data portal for BGCs, biosynthetic gene clusters, that came from everywhere and not just from bacteria. So the release version of it will have all of the bacteria. [CHUCKLES] And over the next few weeks or months, we’ll keep chugging on putting in the rest of the archaea and fungal genomes and then work on metagenomes.

So yeah. So we’ve taken every public sequence that we can possibly find, as far as we know, for the most part, and put them through a couple of different tools, antiSMASH and a machine learning tool called emeraldBGC, along with some other domain analyses like NCBI CD-Search and InterProScan. And all of those have ended up in the SMC database. There’s a web front end for it, and it’s just a massive pile of data. When I looked this morning, we were up to 1.1 million genomes.

JACLYN WINTER: Wow.

DAN UDWARY: And I think we’ve got another 200,000 or 300,000 bacteria to go in before we go to the 1.0 release. And then we’ll have however many– I should know the numbers, but I don’t know the numbers of archaea and fungi. But we’ve already run them through, and they just have to sort of go into the pipeline for addition.

So yeah. So it’s a big data repository of all the BGCs that we can put together. What makes it a little bit different beyond that– and sorry, I’m just going to keep babbling unless you would interrupt me.

JACLYN WINTER: No, no, I actually was going to ask. So this is a repository of BGCs from public databases. And so what if a researcher– like, my lab, we have a lot of our own genomes that are not publicly available because sometimes we want to protect the clusters we’re working on and not get scooped.

And so what if a researcher did want to deposit their genomes or particular clusters to SMC? Is there a way to do that?

DAN UDWARY: Well, so there’s good news and bad news on that. So one of the things that we’re trying to do with SMC– and it’s a little bit different than what we often do at the JGI– is that SMC is what’s referred to as a FAIR repository, F-A-I-R is: findable, accessible, interoperable, and reusable. So all of it–

JACLYN WINTER: Today’s the acronyms and abbreviations.

DAN UDWARY: I know. Well, it’s JGI, right? So we’re going to have acronyms.

So the idea– you can go look up what all of that entails, but we’ve tried to adhere very strictly to those principles. And part of the findable part is that we don’t have secret data. So all of the data that will be in SMC comes from public resources. You can submit your own genomes or whatever you’d like, whatever D– you can submit any DNA sequence really, and it will go through SMC’s pipeline.

But then it will become public. Like, by submitting it, you are checking a box and agreeing to making that data public. So yeah. So if you don’t want your data to be out there for whatever reasons– and there are totally valid reasons for that– then SMC is not a great place to send your data.

JACLYN WINTER: [CHUCKLES] But just if researchers did have a variety of genomes that they’re going when we have so many people in our lab. You could ha– let’s work together with other groups as long as they’re– people are recognized for their contributions just to move this research forward.

DAN UDWARY: One of the things in this and the motivation for this and something that I really believe in is the ideas of open science. And I think– you and I came up at a time when there was a lot more– I don’t know– secrecy around the science and what you’re working on. And for good reasons, especially in the early days.

Drug discovery was a potentially lucrative business. And so there’s always a potential of stumbling across a $1 billion molecule. I think in the last– I don’t know– two decades or so, people have realized that– especially in academic circles, none of us are getting rich off of the things that we discover. That– it’s a very long process in order to go from the first stage of discovery to a molecule that’s actually helping people and being beneficial in terms of medicine.

And that’s something we should strive for and we all should try to do because microorganisms are probably going to kill us all unless we find enough antibiotics, right?

JACLYN WINTER: Absolutely.

DAN UDWARY: So we really do need these molecules. And I think in my opinion– and people are definitely free to disagree with me. But as the government scientist that I am, and a former academic– and I’ve also worked in industry, and so I’ve seen the other side of secrecy. I think what helps the humanity, I guess, the most, what helps humankind is to be able to have access to the information and the things that could potentially help people.

If we all mistakenly believe that we’re going to get very wealthy off of antibiotics, then you really haven’t been paying attention.

And so we’ve tried to build SMC to be a transparent and open system. And so what that means–and I’m fully prepared for people to hate this. It’s like the one, like, potentially fatal flaw that I really want to get feedback from the community on, is that when you work on something in the Collaboratory, your posts are all public. Things that you do to BGCs– you can add your own annotations and do things to the data — That all becomes public, and everyone can see your name attached to those changes, those posts, whatever you’ve attached to the information. That creates accountability systems, and it also creates just an open atmosphere that we’re all sharing here. We’re all working together on these larger problems.

And so what I’m hoping we can do is kind of use the data to lift all the boats because anytime somebody is working on something or making some posts or whatever that becomes public and everybody sees that people are working on that, that lets you sort of plant a little bit of a flag, maybe, but also gets other people’s attention and brings more people into a problem or whatever– or helps people see your solutions, too, and gets your name out there in terms of being someone who’s working in this field.

So I think it’s a good thing. I think people get a little bit nervous about seeing their name and their– I don’t know– reputation, I guess, attached to science, but I think it’s totally cool if we all make mistakes and sort of work on things together. And working together is the only way that we solve some of these really difficult problems in secondary metabolism.

JACLYN WINTER: I agree. And I think a lot of times, too, when we even look at publications, it’s always positive data, right? No one ever publishes–

DAN UDWARY: Absolutely.

JACLYN WINTER: But the negative data which you could have found out– hey, I tried this method, or I tried this knock-on. I tried these promoters to express this natural product cluster. It just didn’t work. And you could save someone time and resources, and that could be a student’s PhD.

DAN UDWARY: And thousands of dollars.

JACLYN WINTER: Like you said, we’re all working together to address a problem. And what we figure out in the lab, whether it’s positive or negative, can help someone else. And that’s what I think is really important.

That’s great that SMC is really– one of the goals of that is to make science more transparent and that we are all in this together. And to try to get a drug through, you need a huge– you need the village, right? And–

DAN UDWARY: Yeah, yeah. And when we were thinking about what a BGC database really should be and what it would have to be is– anybody who’s worked in our field knows that getting an annotation for a BGC, it’s not a consensus, right? Every tool will tell you something different about the genes or homology to something.

There’s a lot– every BGC is a really complicated unique snowflake, and it’s a system. And trying to understand that system is just not possible from– completely understanding the system is not possible from the tools that we have. antiSMASH is a great tool. It’s the best thing that we have. And its rule-based analysis is second to none. And it’s the most valuable thing I think our community has. But it’s not the end of the story.

There’s always some kind of weird thing that happens in every BGC. And one nucleotide difference can throw everything off in terms of whatever is happening in a biosynthetic pathway. And we’ve all seen those examples, hopefully if we’re in that field. So yeah. So at the end of the day, we can create a database of all these things, but we thought it was really important to have all of the information that we could figure out about whatever is in the genes in the BGC or the putative BGCs. We don’t really know the boundaries most of the time.

So being able to put all that information in one place. It’s a lot of computation. It’s taken us a long time to actually generate this, and I’m not quite sure how we sustainably keep going in the future. And we’ll have a lot of work to do to actually figure that out. But yeah, getting all the data in one place has been like the first thing.

But thinking about what that data means is definitely not the end of the story. And that’s where other people come in.

JACLYN WINTER: So I guess because you have– these are all biosynthetic gene clusters. You’re pulling from genomes. And so kind of comparing it to other repositories out there like MIBiG, for example, those just characterized BGCs that have been affiliated or– there’s been some kind of experimental analysis that can correlate those clusters to a certain class or set of molecules. And so you have that information, but then you also have uncharacterized BGCs from–

DAN UDWARY: Yeah, yeah.

JACLYN WINTER: –a whole slew of organisms. This is a huge repository.

DAN UDWARY: It’s pretty big. Yeah. [CHUCKLES] So we figured in terms of also positioning this. And where we thought this would fit into the community is– on the one side, you’ve got– and if you see me talk, I have a figure on this in my generic talk that I give on this. But on the one side, you’ve got the NCBI and IMG and all of the other great sequence repositories. None of those things really spend the extra compute time that you would need in order to annotate the secondary metabolism, right? None of them run antiSMASH or any other tool really to annotate them. So those are just sitting there as the big sequence repositories.

And on the other side, you’ve got a very small repository which is MIBiG. It’s about 3,000 now, I think, maybe. I can’t remember.

JACLYN WINTER: I think a little bit more maybe, but nah, it’s around that. Yeah, a couple thou–

DAN UDWARY: Right around 3,000, something like that, experimentally characterized, well understood. For the most part, you really know what each gene in a pathway is probably doing, or you have a good sense of it because people have investigated those. And so we thought SMC sort of fits somewhere in the middle.

antismash-db is another great database. They have a new version coming out pretty soon, I understand. And they’ve taken all of the really good high-quality genomes and run antiSMASH on those and put them into a repository. And so that’s a really valuable resource, but it’s only one tool, right?

And also, it’s fairly static. It’s the output of what antiSMASH gives you, right? And so there isn’t– hopefully, people will take sort of the stuff from our stuff and antismash-db and sort of be able to start working on them and translating them over to the MIBiG side of things. But yeah, putting all the things together.

We thought we wanted to be comprehensive. We wanted to get– even the fragmented low-quality genomes, like even a BGC fragment can be useful in some way if you’re doing comparative analysis. We want to make sure we had everything to the extent that we can. Yeah.

JACLYN WINTER: Wow. So I guess with this huge repository, like where– and I mean, with the ability to sequence genomes just so easily now, I mean, how big is this going to get?

DAN UDWARY: Good question. I mean, I can– if you go to the website, which is smc.jgi.doe.gov, the stats page is showing me that right now we have just over a million BGC– or sorry, just over a million DNA sources. And so a “source” is some source of DNA. It could be a genome — Everything in there is genome right now. But eventually, it will be– it could be a viral sequence. It could be a contig. It could be whatever somebody has submitted to us. It could be a metagenome.

We’re going to run metagenomes. That’s going to take a while. [CHUCKLES] But yeah. I know there’s a big upload happening. And so we should have somewhere around 10, 11, 12 million BGCs in the database on the release, and that’s the bacterial stuff.

Could be more. [CHUCKLES] I’m not quite sure. So yeah. So it’s a lot. It’s a lot.

Like, 10, 12, 15 million BGCs is definitely kind of the baseline of where we’ll be at. I think we’ll probably double that when we get to metagenomes, I would imagine, depending on how strict they get in terms of contig length and all those kinds of things because metagenomes can be pretty noisy and I know that there’s a lot of fragmented — identifiable BGCs but highly fragmented, in most of the metagenome sequences that we’ve already analyzed.

But that’s a whole other issue that we’ll tackle. So yeah, I don’t know. I think over time, it’s just going to continue to grow.

If you hear Nigel talk, he is very interested in sequencing a lot more natural product-heavy culture collections, especially older culture collections and things that have sat kind of dormant. So yeah. We’re just going to continue to grow, and hopefully, the system that we have built will continue to scale. So far it’s– so far, so good. But we’ll add more hardware and more software as we need to go.

JACLYN WINTER: You’re going to single-handedly cause like massive brownouts in California when you’re doing the computational analysis of metagenomes.

DAN UDWARY: Well, the nice thing is that so far, all of the computation has been done on JGI’s new supercomputer called Dori. I think it’s a moderately sized machine. It’s not super huge, but it’s allowed us to chug through an awful lot of stuff in a pretty quick fashion. So yeah, we’ve been basically running ever since antiSMASH 7.0 was released.

JACLYN WINTER: Oh. So I guess when will SMC be available then for the whole community to start using?

DAN UDWARY: Well, it’s available right now, Jackie. So beta version 2 has been going on for a couple of months. Everyone is welcome to go and use the site. The data that’s in there will not be changing, so part of the accessible and reusable portion of FAIR is that we have static addresses for all of the BGC numbers and ID numbers of the sources and everything.

So none of that is going to change. It’s only just– we’re just building onto it. So the site is usable now. The website is fully accessible, and most of the features that we expect are in there. The only things missing are the BLAST databases, which we’ll generate as soon as we have the new sets of data completely populated.

And that database will also be available for download if you want to run your own SQL, or I guess it’s Postgres, analysis. And we also have a pretty full set of APIs. So if you want to access the data remotely or programmatically, you’re more than welcome to do that. We think a lot of people will want to use that for– will want to use this data in order to do comparative analysis. In order to do that, you’ve got to, like, say, download all the things that you want to download or get the sequences or genes or whatever it is that you want to do your work on. And so that will be an option as well.

And by the time we release, we should have some good documentation and tutorials that– the API documentation is all there, but yeah, all that stuff is just sort of being finished up at the moment. And by the time you hear this, hopefully, it’s all available. [CHUCKLES]
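[Editor’s sketch: the comparative analysis Dan describes, where you first pull down the records you care about and then work on them locally, might look something like the following in Python. This is only an illustration under stated assumptions. The records are made up, and the field names (`bgc_id`, `source_id`, `bgc_class`) and both helper functions are hypothetical; they are not SMC’s actual schema or API, which is described in the site’s own documentation.]

```python
from collections import defaultdict

# Hypothetical records standing in for rows fetched from a downloaded
# database copy or an API. Field names are illustrative assumptions only.
records = [
    {"bgc_id": "bgc-001", "source_id": "src-A", "bgc_class": "NRPS"},
    {"bgc_id": "bgc-002", "source_id": "src-A", "bgc_class": "terpene"},
    {"bgc_id": "bgc-003", "source_id": "src-B", "bgc_class": "NRPS"},
    {"bgc_id": "bgc-004", "source_id": "src-C", "bgc_class": "T1PKS"},
    {"bgc_id": "bgc-005", "source_id": "src-B", "bgc_class": "terpene"},
    {"bgc_id": "bgc-006", "source_id": "src-C", "bgc_class": "NRPS"},
]


def sources_by_class(rows):
    """Map each BGC class to the sorted list of sources that carry it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["bgc_class"]].add(row["source_id"])
    return {cls: sorted(srcs) for cls, srcs in grouped.items()}


def shared_classes(rows, min_sources=2):
    """Return the classes seen in at least `min_sources` distinct sources."""
    by_class = sources_by_class(rows)
    return sorted(cls for cls, srcs in by_class.items() if len(srcs) >= min_sources)


if __name__ == "__main__":
    print(sources_by_class(records))
    print(shared_classes(records))
```

[The point of the sketch is only the shape of the workflow: fetch once, then group and compare locally, rather than querying the server for every comparison.]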

JACLYN WINTER: That’s amazing. It’s just amazing, though, to think about a couple decades ago– it feels like yesterday, but the first genome, like you said, Salinispora, was coming out. S[treptomyces] coelicolor had just come out and where we are now, where we’re really realizing the potential, the natural product potential of these organisms and how easy it is to sequence just anything that you want and then what do you do with that data.

So having this one location with everything is going to be really helpful, I think, and push hopefully some molecule. We’ll discover some compounds together.

DAN UDWARY: Yeah, hopefully, third time is the charm. So this is the third iteration of this project that I’ve done through my career. So when I was at URI, University of Rhode Island, and I was an assistant professor– I was working on a database– it was a lot smaller at the time– of natural product clusters.

And then when I went to Warp Drive, they had a massive amount of sequence, and we organized that into a data structure so that we could do the work that we needed to do with all of that sequence data. And so now this is the third iteration of this thing. And I think I finally got it right.

JACLYN WINTER: Yay.

DAN UDWARY: I hope. I hope. It’s the thing that I want it to be, at least.

JACLYN WINTER: It’s going to be exciting to see what data comes out of this and sort of what tangible products do come out of this endeavor.

DAN UDWARY: Yeah, I’m really excited to be surprised by the community and [CHUCKLES] figure out what crazy things people want to do and how they’re going to break my machine in order to do it. [CHUCKLES]

JACLYN WINTER: Challenge them.

DAN UDWARY: I’m into it. Yeah, totally.

JACLYN WINTER: Yes, break the machine. Break Dori.

DAN UDWARY: [CHUCKLES] Well, this [webserver] doesn’t run on Dori. It’s OK. But yeah. So yeah.

And if people try to do things and it doesn’t work, please contact me because I am more than happy to work with you and get you whatever data it is that you want. We will figure it out. If there isn’t an easy way for you to figure out how to do it, either I’ll show you how to do it, or we’ll create the path in order for you to be able to do it.

I think this thing exists for the scientific community, not for my research. Nigel’s already mad that I don’t publish enough, but [CHUCKLES] I am not looking to take anybody’s data or scoop anybody’s data. But I want to help you as that’s what the JGI is, is a “user facility”. And so you don’t have to be a user either for me to try to help you out. I’m really keen to see what people can do with this information when it’s all gathered into one place.

JACLYN WINTER: I think that that’s– it’s just important for our community. Now, because this is a newer platform– and sometimes it’s a little– trying to use different tools and applications, settings within SMC, are you going to any conferences coming up like more natural products-based conferences where– are you doing like a workshop potentially for like undergrads, grad students, postdocs, for trainees sort of like we do with GNPS, the Global Natural Products Social Molecular Networking platform, where you can teach people in real time how to use the platform?

Or is that something that you have to kind of figure out–

DAN UDWARY: Yeah. So we actually just ran a workshop at the JGI User Meeting. The JGI User Meeting is always a good place to hear about what’s going on with JGI users and to get training on some of our platforms. By the time we release this, I will also–that’s a good idea. I will put out those training materials. I already promised to do that to somebody, and I haven’t done it yet. So yeah. There will be workshop [materials].

The workshop training materials that we use, I’ll just put them out to the public, and people can go nuts replicating that. It sort of gives you some ideas of things to do without holding your hand, I hope, too much in terms of using the web interface and using the API to get data that you want. And hopefully, people can build on that and get started. But yeah, I’m always happy to talk to people or present or whatever.

I’ve been putting my head down a lot, just trying to get this thing done, working with our pretty small team that’s put this together. There’s only a few of us. It’s been sort of a little–

JACLYN WINTER: Huge undertaking.

DAN UDWARY: –skunkworks project. Yeah. Kind of a little avenue for us to try out new things. Yeah. So sure. I’m totally happy to talk to anybody who wants to use it, but I don’t have any immediate plans right now for more workshops or whatever.

JACLYN WINTER: We’ll just put that out to scientific organizers for conferences. Please contact Dan. He will come give a workshop.

DAN UDWARY: Yeah, we could do that. We could do that. Mm-hmm.

JACLYN WINTER: That’d be fun.

DAN UDWARY: I hope so. [CHUCKLES]

JACLYN WINTER: Right, I mean, think about– if you kind of go back to when– at Scripps, we had the class, and you spearheaded the genome mining class. And we got a paper out of that, and you taught everybody how to look at the different Frankia genomes. And we had three genomes. We did comparative analysis. We did structure prediction. And it was amazing. And now you have– just to the exponential expression of all these genomes and what can happen with all that information. But I think it’s taking data and the different stories you can tell by using it.

DAN UDWARY: Yeah. And down the road, I mean, there’s all kinds of things that I want to see happen with this. And so one of the things that I’ll probably start with is sort of throwing out challenges to the community for sort of like, here’s a BGC that appears on all these genomes. We don’t know what it is. Like, what do you think, right?

Those kinds of things, those kind of little community challenges, I think, are a good way to get people engaged and sort of seed some ideas for people in the community, I think. Also, I want to– well, I don’t know. I don’t know if I’ll talk about this but might want to explore like some gamification, like giving people points for doing things or whatever sort of like monitor or– I don’t know– award people who are active in doing this kind of science and give them some recognition.

I think that’s really important, especially for students and people who are sort of coming up to be able to show that they’re a force in the field. I think it’s a good thing.

JACLYN WINTER: Yeah, and they’re thinking about–

DAN UDWARY: I’m sort of thinking about how to do that.

JACLYN WINTER: –some problems.

DAN UDWARY: Yeah. So I’m thinking about how we can do that. And there’s still a ton of things that I know that we want to do in terms of enabling more comparative analysis and especially the visualizations.

Right now, the only real visuals are sort of the gene displays of the different annotation tracks. And I think we can improve on that. And we are working on making a great big BiG-SLiCE map out of all this. BiG-SLiCE is a comparative analysis tool that puts BGCs into gene families. And so we’re figuring out the best ways to do that with– so Satria Kautsar, who is the BiG-SLiCE guy, works for JGI now. And he’s helping us out a lot with that. Yeah, other kinds of comparative things, Corason and– I don’t know. There’s a lot of different directions and things that we’re trying to figure out how best to do on the scale that we need to do it on, right?

Really, that’s the trick, is like any time you think, oh, we could do this analysis, but can we do this analysis on 20 million BGCs, right? [CHUCKLES] That becomes a problem.

JACLYN WINTER: My laptop, I think, would explode. I know it would explode. It has a hard time running Zoom and Outlook at the same time.

DAN UDWARY: JGI has a lot of compute capabilities, and so I think providing these kinds of things to the community is one of the things that we can do that I think is valuable. But yeah, figuring out how to do them is the trick. And so if you have ideas about that, anybody out there, talk to me, and we will figure out how to make it happen.

JACLYN WINTER: I can’t wait to see what happens in like the next–

DAN UDWARY: Me too.

JACLYN WINTER: –1, 5, 10 years.

DAN UDWARY: I mean, I just hope people use it, right? There’s something to building a system where you have built it the way that you want to use it. And so SMC’s not perfect in that regard, but I think we’re closer than anything that I’ve been able to use in the past in terms of secondary metabolism and BGCs. I think we’re getting real close to the tool that I want it to be.

I’m so nervous about this whole thing. I’ve lived with it for like two years, basically, and I don’t know. [CHUCKLES] I’m worried it’s going to crash and burn.

JACLYN WINTER: I don’t think so. I think there’s going to be a lot of interest because it is– there’s no database that has everything. And with what Ben Shen was doing, too, where he goes, you have all the genomes, and you might have the same cluster in all these different strains. But only one produces the compound.

So I think having options is going to be key. And then also, I don’t know what kind of metadata you have with the genomic information, but where are these strains coming from? Can you make any kind of association with the potential, like, what’s the ecological role? So there’s a lot there that the databases don’t have.

DAN UDWARY: Yeah. I should say there’s a balance there in terms of the metadata. So we’ve pulled everything from public sources, and we have the original accession number that it came from, say, the NCBI or GenBank, RefSeq, IMG. So you can go back to those sources and get that data.

We’re not a genome repository, right? So the genomes are in there, at least the genomes that have BGCs. The stuff that didn’t have identifiable BGCs, which usually is about a third of the bacteria that we run, then those are not in there. As new tools come out or new versions of antiSMASH, we’ll sort of– have to rerun. And we hope they have a system in place that’ll be able to keep up with that.

JACLYN WINTER: Just keep building the pipeline.

DAN UDWARY: Yeah, it won’t be the full pipeline. It won’t have to be every time, right? It’ll just be– as there’s a new tool, then– I think we got it worked out, TBD. But hopefully, it won’t be a months-long process of regenerating new data or new annotations for things. But yeah.

So the point is, like, because we have pulled from places that have that information, it should be tied back. It’s just not– we can’t put everything into SMC itself. Like, that’s– I don’t know.

But we do have the taxonomy. We do have sort of the kingdom through species level information as it’s come from the sources, GC-content, and other things. We don’t, say, have geographic information of wherever the thing was sampled from if that’s available. We don’t have all of that stuff in there.

So that might make querying some of that a little bit more– I don’t know– tricky. And maybe that’s something we’ll address in the future. We’ll have to figure it out. Yeah, if that’s something–

JACLYN WINTER: I think that could be helpful, too, for context of the compound.

DAN UDWARY: Yeah, yeah, yeah. I can see it, for sure. Just–

JACLYN WINTER: There’s just one more piece of–

DAN UDWARY: One more data field.

JACLYN WINTER: –data content.

DAN UDWARY: Yeah, that we have to put together.

JACLYN WINTER: Not all the people who submit also include that as well, those identifiers, which can be frustrating at times. So people submitting posters, please include that information.

DAN UDWARY: [CHUCKLES] Always.

JACLYN WINTER: Always include as much as you can.

DAN UDWARY: Yeah. And that’s some of the– one of the things too. The older data won’t have that, right? The older genome sequences and even the older genome annotations are not always so good. We’ve basically redone gene annotations for all of the stuff that’s gone through. And that also takes a chunk of time.

JACLYN WINTER: I’m excited to use it. We’ll have people in my lab start looking at it.

DAN UDWARY: Yeah, please do and give me feedback whenever. I’m really happy to hear whatever people feel like they need or don’t understand or whatever. If you click on the About SMC link on the main page, then you’ll see a roadmap of where I think we’re going in terms of what developments are coming next as you see new version numbers.

So version 1.0 will be the first release, but those version numbers aren’t going to correlate to data releases as a lot of sequence repositories do. Our version numbers will correlate to feature releases, new things that you can do with the data. And the data will just keep rolling in over time as it goes, so there won’t ever be a sort of a version 2 SMC release.

There might be a version 2, but it will be the features of the website or APIs or whatever rather than a specific data release. So just use it. I mean, basically, if there’s information that’s of use to you in there, there’s already a massive amount of stuff there, and it’s just going to keep rolling in. So go crazy.

All right, Jackie. Thanks so much. I’ll talk to you later.

JACLYN WINTER: I’ll talk to you later. Have a good rest of your day.

DAN UDWARY: Yep. You too. Bye.

JACLYN WINTER: Bye.

Show Notes: