Bots and Robots, gather round, for a random, high-level, and completely non-scientific introduction to bioinformatics is about to begin! 
Are you ready to learn about bioinformatics without all the boring details? Then you've come to the right place.

We'll start with a general view of the field, and then zoom in to the details, just like a David Fincher film. But instead of mind-bending plots and psychological thrillers, we'll be talking about DNA and proteins. And unlike a Fincher movie, you'll actually understand what's going on.

But before we begin, let's get one thing straight: this is not a real introduction to bioinformatics. This is an introduction to understanding bioinformatics at a high level. So, if you're looking for a detailed explanation of how to align a sequence or predict protein structure, you're out of luck. Instead, we're about to discover when we'll learn about it.

My goal for each post is to develop some understanding that I can build on in the next post.

 

Why bioinformatics? I strongly believe that biology has always been the backbone of research. In the 21st century, every generational company will be built through software, and within the ML field, everyone has the opportunity to get in on the ground floor and be on the team building it.
Thus, learning about bioinformatics is getting the best of both worlds!

 

I would like to start by mentioning Scott H. Young's TED Talk, which inspired me to research the form of this series. Looking to follow MIT's CS curriculum, he used OpenCourseWare material and dedicated 12 months to following it. Based on the School's Course Catalog

Throughout my exploration, I gradually realized that simply writing down an introduction to a whole field might be more difficult than I expected. Fortunately, this preliminary exploration led me to a few inspirations and gave me a clear direction for how this series of posts should be written.
Despite its name, Bioinformatics 101, this page is meant to be an overview: a sort of table of contents from which I can, each time, source a clear list of subjects worth exploring.

As mentioned in my Preface, this is my 'topic-bucket' meant to provide me with prompts to know where my next post is headed.

 

For each subject, you should expect me to research it on Wikipedia, Google, and MIT's OpenCourseWare, and finally to try to write a small blog post about it. Contrary to what I expected early on, there is no reason for my blog posts to be paper-like, even though their subjects are academic. It could be interesting to extend any blog post by looking up its topic on arXiv to see the most recently published papers.

Without further ado, here's the outline:

[ The Following Sections Are Under Construction ]

Biology

Local and Global Alignment (BLAST, NW, SW, PAM, BLOSUM)

Mapping : Library Complexity & Short Read Alignment

Template preparation, Sequencing and Imaging, Genome Alignment and Assembly Approaches

Proteomics[1]

Introduction to Protein Structure, Structure Comparison & Classification

Protein structure prediction and protein-protein interactions. 

Predicting Protein Structure & Interactions

Analysis and alignment of Protein-Protein Interaction networks

Protein folding
 

Genomics

Comparative Genomics & Gene Regulation

Causality, Natural Computing and Engineering Genomes

Genomic data, genome assembly, gene prediction, and functional genomics.

Markov Models of Genomics & Protein Features

 

ChIP-seq Analysis, DNA-Protein Interactions

 

RNA

Introduction to RNA, RNA Structure and modeling

RNA-seq Analysis, Expression, Isoforms, Splicing 

RNA Secondary Structure, Biological Functions & Predictions

Modeling of Sequence Motifs

Single cell RNA-sequencing 

scRNA-seq, dimensionality reduction 
 

Dimensionality Reduction, Genetics, and Variation

 GWAS and Rare variants

 

Minimum free energy and partition function with the Nearest neighbor energy model

Stochastic prediction of RNA secondary structures 

Comparative modeling of RNAs

Simultaneous folding and alignment of structured RNAs.

RNA sequence-structure maps and simulating the evolution of RNA populations

Pseudo-knots and RNA-RNA interaction predictions 

RNA functional structures are made of theoretically predicted overrepresented blocks

RNA 3D Modeling & structure prediction 

Protein & Proteomics (again)

Introduction to Protein structure

Protein secondary structure prediction using Neural Networks

Protein residue contact prediction 

Protein fold recognition and threading 

Molecular dynamics simulation 

Protein 3D structure prediction 

Structural bioinformatics and machine learning for drug design

Minimalist models: Protein folding on HP lattice models 

3D Genomics 

 

Networks :

Gene Regulatory Networks

Protein Interaction Networks

Logic Modeling of Cell Signaling Networks

 

Introduction to Chromatin : Structure & Classification

Chromatin and gene regulation 

 

eQTLs

Quantitative Trait Loci (QTLs)

Human Genetics, SNPs, and Genome-Wide Association Studies

 

DNA Accessibility, Promoters and Enhancers

Transcription Factors, DNA methylation 

Gene Expression, Splicing 

 

Drug Discovery and Design

Discovery and design of new drugs, including virtual screening, molecular dynamics simulations, and cheminformatics.

Biomedical Informatics

Bioinformatics in medicine and healthcare : electronic health records, clinical decision support systems, and personalized medicine.

Electronic health records and patient data 

Imaging applications in healthcare

ML for health data

Synthetic Biology

Modules & Therapeutic Systems

Biological Engineering

Fundamentals of Biological Engineering

Introduction to Cell and molecular biology, biochemistry, and genetics.

Bioinformatics

Computational methods for sequence alignment, gene annotation, and phylogenetic analysis.

Bioengineering

Introduction to gene therapy, genetic engineering, and synthetic biology.

Bioprocess Engineering

Design and optimization of biological systems for the production of chemicals, drugs, and other products.

Biomedical Imaging

Principles and techniques of biomedical imaging : X-ray, CT, MRI, ultrasound, and PET/CT.

Tissue Engineering and Regenerative Medicine

Tissue-engineered products and therapies, including stem cells, scaffolds, and biomaterials. (I feel like this is starting to get very far-fetched)

 

Computer Science

Systems Biology and Synthetic Biology

Computational methods used in systems biology : network inference, model building, data integration.

Biomedical Data Science

Data analysis, data visualization and ML methods used in biomedical research.

ML Foundations, CNN

Recurrent Neural Networks, Graph Neural Networks

Neural Networks Review

Interpretability, Dimensionality Reduction

Generative Models, GANs, VAE

Video processing, structure determination

Imaging and Cancer

EHRs and data mining 

Neuroscience

 

 

If you're reading this, it's too late. Thank you for giving me the benefit of the doubt despite the obvious chaos.

But you asked for it, folks! A brief introductory list to bioinformatics. If you feel passionate about a missing topic, feel free to comment with the topic that is missing.

As with any Fincher movie, this is just a small glimpse into my project: from implementing bioinformatics methods to exploring the latest papers in genomics and proteomics, we'll eventually be covering it all.

So, whether you're a student, a researcher, or just an avid reader wondering if this can get any worse, stay tuned for more posts about bioinformatics.

 

In the meantime, keep learning, keep exploring, and remember to never stop asking questions. Everything is constantly evolving, and there's always more to discover.

Thank you for reading. And as usual, Godspeed :)

 

Bottom note: I think it would be interesting to find a way to express each blog post in a concrete format. Off the top of my head, I am thinking of generating an artistic representation specific to the topic at hand.
More to think about.

You might have noticed that this list is far from polished: it is non-exhaustive, too long in places, and repetitive. I wish I could truthfully say that it was created this way on purpose, but it wasn't: I'll try to edit it as we go!

[1] I would emphasize this part, as I have a keen interest in proteins.
