Computational Biology
Awesome Computational Biology ¶
A knowledge collection of databases, software and papers related to computational biology.
Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioural, and social systems. - Wikipedia
Databases¶
scRNA¶
- Gene Expression Omnibus - Public functional genomics database.
- Single Cell PORTAL - Public database for single cell RNA.
- Single Cell Expression Atlas - Public database for single cell RNA.
Compound¶
- PubChem - One of the largest chemical databases, covering compounds, genes and proteins.
- ChEBI - Chemical database focused on small chemical compounds.
- ChEMBL - Database of bioactive molecules with drug-like properties.
- ChemSpider - Chemical structure database.
- KEGG COMPOUND - Collection of small molecules and biopolymers.
- LIPID MAPS - Database of lipids.
- Rhea - Database of chemical reactions.
- Drug Repurposing Hub - Collection of drug repurposing data, including drugs, mechanisms of action (MoA), targets, etc.
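Several of the compound databases above can be queried programmatically. As a minimal sketch, the snippet below builds a lookup URL for PubChem's PUG REST interface using only the standard library. The helper name `pubchem_property_url` and the default property list are illustrative assumptions, not part of any official client, and the code only constructs the URL without sending a request.

```python
from urllib.parse import quote

# Base path of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that looks up properties for a compound name.

    Hypothetical helper for illustration; it does not perform the request.
    """
    # Percent-encode the name so spaces and slashes are safe inside the path.
    encoded_name = quote(name, safe="")
    property_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{property_list}/JSON"


url = pubchem_property_url("aspirin")
print(url)
```

The resulting URL can be passed to any HTTP client. Check PubChem's usage policy before sending requests in bulk.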
Pathway¶
- PathwayCommons - Database of Pathways and Interactions.
- KEGG PATHWAY - Collection of drawn pathway maps.
- WikiPathways - Database of biological pathways.
Mass Spectra¶
- MassBank - Open source databases and tools for mass spectrometry reference spectra.
- MoNA (MassBank of North America) - Meta-database of metabolite mass spectra, metadata and associated compounds.
Protein¶
- THE HUMAN PROTEIN ATLAS - One of the largest human protein databases, covering cells, tissues, and organs.
- PROTEIN DATA BANK - Database of the 3D shapes of proteins, nucleic acids, and complex assemblies.
- UniProt - A collection of functional information on proteins.
- AlphaFold Protein Structure Database - Database of 3D protein structures.
Genome¶
- Human Genome Resources at NCBI - Database of image, proteomics, transcriptomics and systems biology.
- GenBank - Database of genetic sequences offered by NCBI.
- UCSC Genome Browser - Genome browser offered by UCSC.
- cBioPortal - Database of cancer genomics. It provides an overview of data from many patients.
Disease¶
- KEGG DRUG - Comprehensive drug information resource for approved drugs.
- DrugBank - A database of drugs and targets maintained by the University of Alberta.
Interaction¶
- Drug Gene Interaction
  - DGIdb - A database of drug-gene interactions and the druggable genome.
  - Comparative Toxicogenomics Database - A database of chemical-gene interactions, chemical-disease associations, gene-disease associations, and chemical-phenotype associations.
  - SNAP - A dataset containing drug-gene interactions.
  - Comparative Toxicogenomics Database - A database for drug-target interactions.
  - Therapeutics Data Commons - A database covering many tasks, such as drug-target interaction, drug response, and drug-drug interaction.
- Drug (-Cell line) Response
  - NCI60 - A database focused on 60 cancer cell lines tested with many drugs.
  - Genomics of Drug Sensitivity in Cancer (GDSC) - A database of drug sensitivity covering 1000 human cancer cell lines and hundreds of compounds.
  - Cancer Cell Line Encyclopedia - A database of 1000 cancer cell lines.
- Chemical Protein Interaction
  - STITCH - A database of chemical-protein interactions.
  - BindingDB - A database of compounds and targets.
- Protein-Protein Interaction
  - STRING - Protein-protein interaction networks for several organisms.
  - BioGRID - Database of protein, genetic and chemical interactions.
  - HIPPIE - Human protein-protein interaction database.
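Interaction databases like the ones listed above are often distributed as edge lists with a confidence score per pair. The sketch below shows one way to turn such a list into an undirected network while dropping low-confidence edges. It assumes scores on a 0 to 1000 scale, like the combined score that STRING reports. The protein names, scores, and the `build_network` helper are hypothetical and only serve as toy input.

```python
from collections import defaultdict

# Hypothetical toy edges: (protein_a, protein_b, score on an assumed 0-1000 scale).
TOY_EDGES = [
    ("P1", "P2", 900),
    ("P1", "P3", 750),
    ("P2", "P3", 400),
    ("P4", "P5", 990),
]


def build_network(edges, min_score=700):
    """Return an undirected adjacency map keeping edges with score >= min_score."""
    network = defaultdict(set)
    for protein_a, protein_b, score in edges:
        if score >= min_score:
            # Store both directions because the interaction is undirected.
            network[protein_a].add(protein_b)
            network[protein_b].add(protein_a)
    return dict(network)


network = build_network(TOY_EDGES)
for protein in sorted(network):
    print(protein, sorted(network[protein]))
```

The threshold is a design choice: a higher `min_score` gives a sparser but more reliable network, while a lower one keeps more candidate interactions.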
Preprocess¶
- Chemistry Development Kit - A software library for cheminformatics and machine learning.
- RDKit - A software library for cheminformatics and machine learning.
- Scanpy - scRNA analysis library in Python.
- Seurat - scRNA analysis library in R.
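A common first preprocessing step for scRNA count data is library-size normalization followed by a log transform. The sketch below illustrates the idea in plain Python on a toy matrix. It does not use the Scanpy or Seurat APIs, and the function name `normalize_log1p`, the `target_sum` default of 10,000, and the toy counts are assumptions chosen for illustration.

```python
import math


def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell's counts to sum to target_sum, then apply log(1 + x).

    `counts` is a list of cells, each a list of raw gene counts.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # Empty cell: nothing to scale, keep zeros.
            normalized.append([0.0 for _ in cell])
            continue
        scale = target_sum / total
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


# Hypothetical toy matrix: 3 cells x 3 genes.
toy_counts = [
    [0, 5, 15],
    [2, 2, 0],
    [0, 0, 0],
]
result = normalize_log1p(toy_counts)
for row in result:
    print([round(value, 3) for value in row])
```

In practice the libraries listed above provide their own, much faster implementations that operate on sparse matrices. This version is only meant to make the arithmetic explicit.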
Machine Learning Tasks and Models¶
Drug Repurposing¶
- DeepPurpose - A deep learning library for drug repurposing and related tasks.
- DRKG - A library for biological knowledge graphs.
Drug Target Interaction¶
- NeoDTI - A library for Drug Target Interaction.
Compound Protein Interaction¶
- MCPINN - A library for drug discovery using Compound Protein Interaction and Machine Learning.
- TransformerCPI - A library for Compound Protein Interaction prediction using Transformer.
Pre-trained embedding¶
- Evolutionary Scale Modeling - A library for protein embeddings.
- ChemBERTa-2 - A library for chemical embedding and prediction.
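Embeddings from pre-trained models are typically compared with a similarity measure such as cosine similarity. The sketch below computes it for plain Python lists. The vectors are hypothetical placeholders rather than output from any of the models above, and real embeddings usually have hundreds of dimensions or more.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Similarity is undefined for a zero vector; return 0.0 by convention here.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings for three molecules or proteins.
embedding_a = [1.0, 0.0, 1.0]
embedding_b = [1.0, 0.0, 1.0]
embedding_c = [0.0, 1.0, 0.0]

print(cosine_similarity(embedding_a, embedding_b))
print(cosine_similarity(embedding_a, embedding_c))
```

Identical directions give a value close to 1, and orthogonal vectors give 0, which makes the measure convenient for nearest-neighbour searches over embedded compounds or proteins.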
LLM for biology¶
- AI4Chem/ChemLLM-7B-Chat - LLM for chemistry and molecular science.
- BioGPT - LLM for biomedical text generation.