As a computational biologist, what roles have you played in drug discovery?

I've done everything from supporting drug discovery labs to translational research groups.

Why is sequence analysis important in drug discovery?

In discovery, you can use sequence analysis for target discovery screening. You can also start looking for patient selection biomarkers. Is there a particular genetic profile, either at the expression level or the DNA sequence level, with various mutations? Are there certain mutations that are present in patients that make them more or less susceptible to your drug being effective? That is an extremely exciting and active area in the pharmaceutical industry. It can really make the difference between success and failure to be able to know before treatment, yes, this drug should work, or no, it probably won't work. Additionally, you're not wasting patients' precious time. Especially in a field like oncology, where patients don't have much time, you're not wasting their time with trial and error and multiple lines of therapy with drugs that just won't work. And so all of that involves sequence analysis. It touches on every aspect of the drug development process.

Where do you think the biggest challenges are within sequence analysis?

I would say the challenges are actually dropping pretty darn fast because the technology is advancing every year. Storage is still an issue. Even in my most recent role, we had to move sequence data, and the fastest way to do that was to load it onto a hard drive and drop it in a FedEx box. Instead of moving hundreds of gigabytes or terabytes of data over the internet, it's faster to ship it in a box. Local storage is not the problem; it's the transmission of a large amount of data from one place to another. These days, once you have the data where it needs to be, you can find enough computing power to run your sequencing project, but getting the data to the machines is still the bottleneck. Patient-derived samples are a challenge, too. Collecting them is painful: biopsies are invasive, and sick people don't want to have to give multiple samples. Once they are taken, they're generally formalin-fixed and paraffin-embedded, so any nucleic acid material is going to be degraded to some degree. There are ways to try to extract and use that sort of prepared sample tissue for sequencing, but the quality of the sequence is always going to take a hit. If you're a small company, the technology is also expensive: these machines cost a lot of money. Similarly, computational biologists are becoming much more common, but still, they're not everywhere, and everybody wants to work for the biggest and the best and make the most income. The labor pool is growing, but it's still limited. In a way, nobody wants to work on sequence analysis anymore. They all want to make the next great learning model. The focus is not so much on number crunching and data analysis; it's now on advanced AI and ML. Everybody wants to be working on the new, hot, shiny technology, and that's not sequence analysis. So that is going to be a challenge soon.
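The point about shipping a hard drive being faster than sending data over the internet can be checked with rough arithmetic. The sketch below is only an illustration: the function names, the link speeds, the 80% link efficiency, and the 24-hour courier time are all assumptions chosen for the example, not figures from the interview.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to send `size_tb` terabytes over a network link.

    All parameters are illustrative assumptions:
      size_tb    - data size in decimal terabytes (1 TB = 1e12 bytes)
      link_mbps  - nominal link speed in megabits per second
      efficiency - fraction of the nominal speed actually sustained
    """
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    size_bits = size_tb * 1e12 * 8                      # terabytes -> bits
    effective_bits_per_second = link_mbps * 1e6 * efficiency
    return size_bits / effective_bits_per_second / 3600  # seconds -> hours


def faster_to_ship(size_tb: float, link_mbps: float,
                   shipping_hours: float = 24.0, efficiency: float = 0.8) -> bool:
    """Return True when an assumed courier delivery beats the network transfer."""
    return transfer_hours(size_tb, link_mbps, efficiency) > shipping_hours


if __name__ == "__main__":
    # Hypothetical scenarios: (data size in TB, link speed in Mbit/s)
    for size_tb, link_mbps in [(1, 1000), (10, 100), (10, 1000)]:
        hours = transfer_hours(size_tb, link_mbps)
        verdict = "ship the drive" if faster_to_ship(size_tb, link_mbps) else "use the network"
        print(f"{size_tb} TB at {link_mbps} Mbit/s: {hours:.1f} h -> {verdict}")
```

Under these assumptions, 10 TB over a 100 Mbit/s link takes roughly 278 hours, well over a week, so an overnight courier wins easily; a single terabyte over a gigabit link takes under three hours, so the network wins. Real transfers also depend on disk copy time at both ends, which this sketch ignores.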

Do you really need a computational biologist to do sequence analysis now?

If you're doing cookie-cutter, well-established methodologies that are well-developed, validated, and documented, then no, you don't.

Can AI and ML be helpful with sequence analysis?

With a well-curated data set, AI and ML can definitely help.

What are your thoughts on AlphaFold, which performs AI predictions of protein structures?

I think AlphaFold is an absolute game-changer.

What do you think is the cutting edge of AI and ML in drug discovery?

It's spatial, which is kind of the next generation of single cell, along with multi-omics.

Do you think we're going to be creating models of biological systems?

If you'd asked me that when I was in grad school, I would have said humanity does not have the mathematics that can describe a biological system. But now, that is probably the direction it will have to go in. Will it require a quantum computer? Maybe? It might be after my lifetime, but I will say now with some confidence that at some point, humanity will be able to have accurate, reliable computational simulations of living systems. And that statement kind of scares me. I know there's a lot of work being done in the digital twin space. Limited first-step scenarios, but digital twins are online and being used in clinical trials now. That's kind of the beginning of it.

What do you think is needed to drive these new frontiers forward? New algorithms, new frameworks, or something else?

It's all of it. We need new ways of thinking about the problem.

If you had a magic wand to change anything in the drug discovery process, what would you change?

I would make all the data well-annotated and available to everyone.

Do you need a computational biologist for sequence analysis?

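As computational biology continues to advance in drug discovery, new challenges and possibilities keep emerging. Sequence analysis has long been a core aspect of bioinformatics. For this article, we spoke with Jefferson Parker, Ph.D., founder of NullSet Informatics Solutions and an expert in data analysis for the life sciences, about the new frontiers of sequence analysis in drug discovery.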

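Jefferson Parker, Ph.D., began his research career at MIT, where he investigated xenobiotic metabolism in the Gram-positive soil bacterium Rhodococcus aetherovorans. He ran into data overload while trying to annotate a genome in order to develop DNA microarrays, and that led him into the world of computing. Since then, he has worked at the intersection of biology, computing, and mathematics. His career has taken him through small and mid-sized pharmaceutical companies, large pharmaceutical companies, and consulting organizations, including Novartis and Thomson Reuters. Along the way, Jefferson earned a graduate certificate in applied statistics from Pennsylvania State University and a master's degree in computer science from Boston University. Jefferson is now charting a new path with his own bioinformatics consulting firm, NullSet Informatics Solutions, which provides data and analytics, data modeling, and technical project management services.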