Computational Biology

Awesome Computational Biology

A curated collection of databases, software, and papers related to computational biology.

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioural, and social systems. - Wikipedia

Databases

scRNA

Compound

Pathway

Mass Spectra

  • MassBank - Open source databases and tools for mass spectrometry reference spectra.
  • MoNA (MassBank of North America) - A metadatabase of metabolite mass spectra, metadata, and associated compounds.

Protein

Genome

Disease

  • KEGG DRUG - Comprehensive drug information resource for approved drugs.
  • DrugBank - A database of drugs and drug targets maintained by the University of Alberta.

Interaction

  • Drug Gene Interaction
      • DGIdb - A database of drug-gene interactions and the druggable genome.
      • Comparative Toxicogenomics Database - A database of chemical-gene interactions, chemical-disease associations, gene-disease associations, and chemical-phenotype associations.
      • SNAP - A dataset collection that includes drug-gene interactions.
      • Therapeutics Data Commons - A collection of datasets for tasks such as drug-target interaction, drug response, and drug-drug interaction.
  • Drug (-Cell line) Response
      • NCI60 - A database of drug screening results on 60 cancer cell lines.
      • Genomics of Drug Sensitivity in Cancer (GDSC) - A drug sensitivity database covering about 1000 human cancer cell lines and hundreds of compounds.
      • Cancer Cell Line Encyclopedia - A database of about 1000 cancer cell lines.
      • CellMiner Cross Database (CellMinerCDB) - An integration of multiple cancer cell line databases.
  • Chemical Protein Interaction
      • STITCH - A database of chemical-protein interactions.
      • BindingDB - A database of compounds and their targets.
      • PDBBind - A database of experimentally measured binding affinity data for biomolecular complexes.
      • CrossDocked2020 - A large-scale dataset for machine learning in structure-based virtual screening.
  • Protein-Protein Interaction
      • STRING - Protein-protein interaction networks for several organisms.
      • BioGRID - A database of protein, genetic, and chemical interactions.
      • HIPPIE - A human protein-protein interaction database.
  • Knowledge Graph
      • Drug Mechanism Database (DrugMechDB) - A database of mechanisms of action, each describing the path from a drug to a disease.
      • DRKG (Drug Repurposing Knowledge Graph) - A biological knowledge graph built for drug repurposing.
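
The knowledge graph resources above describe biology as typed links between entities, and a mechanism of action can be read as a path from a drug to a disease. The following is a minimal sketch of that idea, not the schema or API of DrugMechDB or DRKG: the triples, entity names, relation names, and the `find_path` helper are all hypothetical, made up for illustration.

```python
from collections import deque

# Hypothetical toy triples (head, relation, tail); not taken from any database.
TRIPLES = [
    ("drug_A", "inhibits", "protein_P"),
    ("protein_P", "participates_in", "pathway_W"),
    ("pathway_W", "contributes_to", "disease_D"),
    ("drug_A", "binds", "protein_Q"),
]


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def find_path(triples, source, target):
    """Breadth-first search along edge direction.

    Returns the shortest list of triples linking source to target,
    an empty list if source == target, or None if no path exists.
    """
    adjacency = build_adjacency(triples)
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


path = find_path(TRIPLES, "drug_A", "disease_D")
for head, relation, tail in path:
    print(f"{head} --{relation}--> {tail}")
```

Edges are followed only from head to tail, so searching from the disease back to the drug finds nothing in this toy graph. A real knowledge graph would usually need entity identifiers and typed nodes on top of this.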

Clinical Trial

API

Preprocess

  • Chemistry Development Kit - Open source Java libraries for cheminformatics.
  • RDKit - An open source toolkit for cheminformatics and machine learning.
  • Scanpy - A single-cell RNA-seq (scRNA-seq) analysis library in Python.
  • Seurat - A single-cell RNA-seq (scRNA-seq) analysis library in R.
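
A common first preprocessing step for scRNA-seq count matrices is to scale every cell to the same total count and then apply a log(1 + x) transform. Scanpy exposes this as `sc.pp.normalize_total` and `sc.pp.log1p`, and Seurat as `NormalizeData`. The sketch below shows only the arithmetic in plain Python. It does not call either library, and the function names, the tiny count matrix, and the default `target_sum` of 10,000 are assumptions chosen for illustration.

```python
import math


def scale_to_target_sum(counts, target_sum=10_000.0):
    """Scale each cell (row) so its counts sum to target_sum.

    Cells with zero total counts are left unchanged to avoid dividing by zero.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 1.0
        normalized.append([value * scale for value in cell])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) element-wise to compress the dynamic range."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Hypothetical raw counts: rows are cells, columns are genes.
counts = [
    [3, 0, 7],
    [0, 0, 0],
    [1, 1, 2],
]

normalized = scale_to_target_sum(counts)
log_normalized = log1p_transform(normalized)
print(log_normalized)
```

In practice the library functions are preferable because they operate on sparse matrices and keep the result attached to the annotated data object.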

Machine Learning Tasks and Models

Drug Response Prediction

  • drGAT - A drug response prediction model that uses an attention mechanism to provide gene-level explainability.
  • MOFGCN - GCN + heterogeneous network.
  • DeepDSC - Autoencoder + fully connected NN.
  • DGDRP - Multi-view embedding NN.
  • DeepAEG - GNN embedding + attention.
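
Several of the models above build on graph neural networks. As a rough illustration of the shared building block, here is one generic graph-convolution step, ReLU(D^-1/2 (A + I) D^-1/2 H W), written in plain Python. This is a sketch of a standard GCN layer, not the implementation of MOFGCN, DeepAEG, or any other listed model. The two-node graph, feature values, and weights are hypothetical.

```python
import math


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step on a graph with non-negative edge weights."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Degrees are at least 1 because of the self-loops.
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization: D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    transformed = matmul(matmul(norm, features), weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Hypothetical example: node 0 is a drug, node 1 is a cell line, linked by one edge.
adjacency = [[0.0, 1.0],
             [1.0, 0.0]]
features = [[1.0],
            [3.0]]
weights = [[1.0]]

hidden = gcn_layer(adjacency, features, weights)
print(hidden)
```

After one step each node's representation mixes its own features with those of its neighbors. Stacking several such layers lets information travel further across a drug and cell-line network.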

Drug Repurposing

Drug Target Interaction

  • NeoDTI - A library for drug-target interaction prediction.

Compound Protein Interaction

  • MCPINN - A library for drug discovery using compound-protein interaction and machine learning.
  • TransformerCPI - A library for compound-protein interaction prediction using a Transformer.

Pre-trained embedding

LLM for biology

  • AI4Chem/ChemLLM-7B-Chat - An LLM for chemistry and molecular science.
  • BioGPT - An LLM for biomedical text generation.
  • GeneGPT - An LLM that uses several APIs to access biomedical information.
  • GenePT - A foundation LLM for single-cell data.
  • scPRINT - A model pretrained on 50M cells to denoise and perform zero imputation on single-cell RNA-seq profiles.