Computational Biology

Awesome Computational Biology

A knowledge collection of databases, software and papers related to computational biology.

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioural, and social systems. - Wikipedia

Databases

scRNA

Compound

  • PubChem - One of the largest chemical databases, covering compounds, genes and proteins.
  • ChEBI - Chemical database focused on small chemical compounds.
  • ChEMBL - Database of bioactive molecules with drug-like properties.
  • ChemSpider - Chemical structure database.
  • KEGG COMPOUND - Collection of small molecules and biopolymers.
  • LIPID MAPS - Database of lipids.
  • Rhea - Database of chemical reactions.
  • Drug Repurposing Hub - Collection of drug repurposing data, including drugs, mechanisms of action (MoA), and targets.

Pathway

Mass Spectra

  • MassBank - Open source databases and tools for mass spectrometry reference spectra.
  • MoNA (MassBank of North America) - Metadatabase of metabolite mass spectra, metadata and associated compounds.

Protein

Genome

Disease

  • KEGG DRUG - Comprehensive drug information resource for approved drugs.
  • DrugBank - A database of drugs and their targets, maintained by the University of Alberta.

Interaction

  • Drug Gene Interaction
      • DGIdb - A database of drug-gene interactions and the druggable genome.
      • Comparative Toxicogenomics Database - A database of chemical-gene interactions, chemical-disease associations, gene-disease associations, and chemical-phenotype associations.
      • SNAP - A dataset that contains drug-gene interactions.
      • Comparative Toxicogenomics Database - A database for drug-target interactions.
      • Therapeutics Data Commons - A collection of datasets for many tasks, such as drug-target interaction, drug response, and drug-drug interaction.
  • Drug (-Cell line) Response
      • NCI60 - A database focused on 60 cancer cell lines tested with many drugs.
      • Genomics of Drug Sensitivity in Cancer (GDSC) - A database of drug sensitivity covering 1000 human cancer cell lines and hundreds of compounds.
      • Cancer Cell Line Encyclopedia - A database covering 1000 cancer cell lines.
  • Chemical Protein Interaction
      • STITCH - A database of chemical-protein interactions.
      • BindingDB - A database of compounds and targets.
  • Protein-Protein Interaction
      • STRING - Protein-protein interaction networks for several organisms.
      • BioGRID - Database of protein, genetic and chemical interactions.
      • HIPPIE - Human protein-protein interaction database.

Preprocess

  • Chemistry Development Kit - A software library for cheminformatics and machine learning.
  • RDKit - A software library for cheminformatics and machine learning.
  • Scanpy - scRNA-seq analysis library in Python.
  • Seurat - scRNA-seq analysis library in R.

Machine Learning Tasks and Models

Drug Repurposing

  • DeepPurpose - A deep learning library for drug repurposing and related tasks.
  • DRKG - A biological knowledge graph for drug repurposing.

Drug Target Interaction

  • NeoDTI - A library for drug-target interaction prediction.

Compound Protein Interaction

  • MCPINN - A library for drug discovery using compound-protein interaction and machine learning.
  • TransformerCPI - A library for compound-protein interaction prediction using a Transformer.

Pre-trained embedding

LLM for biology