Welcome to the FIXED PDB documentation. This guide provides an overview of our platform, its features, and how you can utilize our curated database for your research.

Introduction

The Protein Data Bank (PDB) is one of the most foundational resources in structural biology, offering over 200,000 experimentally determined 3D structures of proteins, nucleic acids, and macromolecular complexes. These structures—derived from X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy—are indispensable for understanding biological mechanisms, guiding drug discovery, and advancing computational modeling. However, raw PDB files often contain structural imperfections that limit their direct usability in simulation-based workflows.

Common deficiencies include missing side-chain or backbone atoms, incomplete polypeptide chains due to unresolved electron density, nonstandard or modified amino acid residues (e.g., phosphorylated serine), and the absence of hydrogen atoms in most X-ray structures. While these gaps reflect experimental realities, they pose significant challenges for molecular dynamics simulations, docking studies, virtual screening, and other force field–dependent computational methods that require topologically complete and chemically consistent inputs.

To bridge this gap between experimental data and computational readiness, we developed FIXED PDB (https://fixedpdb.com)—a fully curated, publicly accessible database of over 150,000 repaired and simulation-ready PDB structures. Using an automated pipeline built around PDBFixer (part of the OpenMM suite), we systematically process each entry to:

  • Identify and model missing residues and atoms using idealized geometry,
  • Replace nonstandard residues with canonical counterparts to ensure compatibility with AMBER, CHARMM, GROMACS, and other molecular modeling packages,
  • Preserve all original heteroatoms (ligands, cofactors, ions),
  • Retain critical metadata such as CRYST1, REMARK, HELIX, and SHEET records—carefully renumbered to reflect corrected residue sequences,
  • Apply localized energy minimization only to newly added atoms, leaving the original experimental coordinates untouched.

Our mission is to eliminate the repetitive, error-prone preprocessing step that researchers typically perform before every computational experiment. By delivering biologically faithful, structurally complete, and force-field–compatible models, FIXED PDB accelerates research in structural bioinformatics, rational drug design, and biomolecular simulation.

The project is led by a multidisciplinary team of experts from the Department of Medicinal Chemistry at Isfahan University of Medical Sciences, with deep experience in computational drug design, structural biology, and cheminformatics. All source code, including the Python-based batch-processing pipeline, is openly available on GitHub, ensuring full transparency, reproducibility, and community collaboration.

Whether you're a graduate student visualizing a protein for the first time, a computational chemist running large-scale docking campaigns, or a bioinformatician building structural datasets, FIXED PDB provides a trusted, ready-to-use foundation for your work—free of charge and without registration.

Getting Started

FIXED PDB is designed to offer researchers, students, and computational scientists immediate access to high-quality, simulation-ready protein structures—without the need for manual preprocessing. Whether you’re new to structural biology or an experienced modeler running large-scale docking pipelines, this guide will help you quickly get up and running.

Quick Start Guide

Follow these simple steps to find and download your first curated PDB file:

  1. Visit the Homepage: Go to https://fixedpdb.com. You’ll see a clean, responsive interface with a prominent search bar at the top.
  2. Search for a Structure: Enter a PDB ID (e.g., 1TIM), protein name (e.g., “kinase”), organism (e.g., “Homo sapiens”), or any keyword in the search field and press Search.
  3. Review Results: The results table displays key metadata including classification, organism, experimental method, resolution, and deposition date. Each entry includes a direct link to download the fixed PDB file.
  4. Explore in 3D (Optional): Click the More Info button next to any entry to reveal an embedded LiteMol viewer. This allows you to inspect secondary structure, ligand binding sites, and overall fold directly in your browser—no external software required.
  5. Download Files:
    • Download individual files using the PDB File link.
    • Use the Download All button to batch-download all results from your current search as ZIP archives (files are split into manageable batches of 1,000).
    • Export search metadata (e.g., IDs, classifications, resolution) as a CSV file using the Download CSV button for integration into scripts or analysis pipelines.

Navigating the Website

The FIXED PDB interface is built for intuitive exploration and efficient data retrieval. Here’s a breakdown of its key components:

  • Header Navigation:
    • Search / Home: Returns you to the main search page.
    • Explore: Directs you to the full database listing with advanced filtering options.
    • About: Provides background on the project’s scientific mission and methodology.
    • FAQs: Answers common questions about file differences, conformational changes, and data coverage.
    • Contact: Connects you with our team via a web form or direct contact details.
    • Our Team: Introduces the researchers behind FIXED PDB from Isfahan University of Medical Sciences.
    • GitHub: Links to the open-source codebase at github.com/nazari210/fixedpdb, where you can inspect the curation pipeline, report issues, or contribute improvements.
    • Help: This documentation hub, designed to onboard new users and support advanced workflows.
  • Advanced Search Panel:

    Click the Advanced Search button below the main search bar to reveal powerful filtering options:

    • Search by Resolution Range (e.g., ≤2.0 Å for high-quality X-ray structures).
    • Filter by Experimental Method (X-ray, NMR, or Electron Microscopy).
    • Specify a Date Range for deposition dates.
    • Combine multiple criteria (e.g., “human kinases solved by X-ray with resolution < 2.5 Å”).
    • Sort results by any column (ID, organism, resolution, etc.) using the arrow buttons next to each field.
  • Results Table & Viewer:

    Each result row shows essential information at a glance. Expanding a row reveals:

    • Full protein title and deposition date.
    • Link to the original RCSB PDB entry for reference.
    • Interactive 3D molecular viewer powered by LiteMol.js.

  • Responsive Design & Accessibility:

    The site adapts seamlessly to desktops, tablets, and mobile devices. A built-in dark mode (toggle via the moon/sun icon in the top-right) reduces eye strain during extended use.

For Developers & Power Users

If you're integrating FIXED PDB into automated workflows:

  • All search parameters are passed via URL query strings (e.g., ?id=1abc&organism=human&min_res=1.0&max_res=2.5), enabling programmatic access.
  • Bulk downloads return ZIP files compatible with standard extraction tools and scripting environments (Python, Bash, etc.).
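
As a rough illustration of the scripting side, the sketch below unpacks one bulk-download archive with Python's standard zipfile module and keeps only members that follow the {pdb_id}_fixed.pdb naming. The helper name extract_fixed_pdbs is hypothetical, and the flat archive layout is an assumption rather than documented behavior.

```python
import zipfile
from pathlib import Path


def extract_fixed_pdbs(zip_source, dest_dir):
    """Extract *_fixed.pdb members from a bulk-download ZIP; return their paths.

    zip_source may be a path or a file-like object. A flat layout is assumed.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Keep only the base name so unexpected folder prefixes
            # cannot write outside dest_dir.
            base = Path(name).name
            if not base.endswith("_fixed.pdb"):
                continue
            target = dest / base
            target.write_bytes(archive.read(name))
            extracted.append(target)
    return sorted(extracted)
```

Calling extract_fixed_pdbs("batch_1.zip", "structures/") would leave one file per structure in structures/; the archive name here is only an example.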

With FIXED PDB, you skip the tedious cleanup step and move straight to analysis, modeling, or visualization—all while maintaining scientific rigor and reproducibility.

Exploring the Database

FIXED PDB is built to empower researchers with rapid, precise, and intuitive access to over 150,000 curated protein structures. Whether you're looking for a single PDB entry or filtering thousands of structures by experimental criteria, our database offers flexible search capabilities, detailed result displays, and powerful export tools—all designed to streamline your scientific workflow.

Searching for Data

FIXED PDB supports two complementary approaches to data discovery:

1. Quick Search via the Main Search Bar

Located prominently on the homepage and the Explore page, the main search bar accepts free-text queries and matches against multiple metadata fields simultaneously. You can search by:

  • PDB ID (e.g., 1TIM, 7XYZ) for direct access to a specific structure.
  • Protein name or title (e.g., “hemoglobin”, “EGFR kinase”) to find all related entries.
  • Organism (e.g., “Homo sapiens”, “Escherichia coli”) to restrict results to a biological source.
  • Classification (e.g., “hydrolase”, “transferase”), as given by the classification keyword in the entry’s HEADER record.
  • Experimental method (e.g., “X-RAY”, “NMR”, “ELECTRON MICROSCOPY”).
  • Deposition date or resolution (e.g., typing “2.0” may match structures with ~2.0 Å resolution).

The system performs a broad, case-insensitive substring match across these fields, making it forgiving and user-friendly for exploratory searches.
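
The matching behavior described above can be approximated offline, for example on rows loaded from a CSV export. This is a minimal sketch of case-insensitive substring matching across several fields; the field names and sample rows are hypothetical, and the site's server-side implementation may differ.

```python
# Field names are assumptions for illustration, not the site's schema.
SEARCH_FIELDS = ("id", "title", "organism", "classification", "method")


def quick_search(rows, query, fields=SEARCH_FIELDS):
    """Return rows where any listed field contains query, ignoring case."""
    needle = query.casefold()
    return [
        row for row in rows
        if any(needle in str(row.get(field, "")).casefold() for field in fields)
    ]


# Made-up rows, used only to exercise the function.
rows = [
    {"id": "1abc", "title": "Example kinase domain", "organism": "Homo sapiens",
     "classification": "Transferase", "method": "X-RAY"},
    {"id": "2xyz", "title": "Example hydrolase", "organism": "Escherichia coli",
     "classification": "Hydrolase", "method": "NMR"},
]
hits = quick_search(rows, "KINASE")
```

Because every field is checked, a short query such as "2.0" or "coli" can match in more than one column, which is why the quick search suits exploration better than reproducible filtering.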

2. Advanced Search for Precision Filtering

For reproducible, criteria-driven queries, click the Advanced Search button (available on the Explore Data page) to reveal a structured filtering panel. This allows you to combine multiple constraints:

  • PDB ID prefix or exact match
  • Organism name or taxonomic identifier
  • Resolution range (e.g., 1.0–2.5 Å) using interactive sliders or numeric inputs—ideal for selecting high-quality X-ray structures.
  • Deposition date range (e.g., structures released between 2015 and 2023).
  • Experimental method (filter exclusively for X-ray, NMR, or Cryo-EM entries).
  • Protein classification or title keywords

Example use case: “Find all human GPCRs solved by X-ray crystallography with resolution ≤ 2.8 Å, deposited after 2018.”

All advanced search parameters are encoded in the URL (e.g., ?organism=Homo+sapiens&method=X-RAY&max_res=2.8), enabling you to bookmark or share complex queries with collaborators.

Understanding Search Results

Search results are displayed in a clean, responsive table with consistent metadata derived from the original RCSB PDB header and enhanced by our curation pipeline. Each row includes:

  • ID: The 4-character PDB identifier (e.g., 1AON), linked directly to the downloadable fixed file.
  • Classification: The macromolecular class (e.g., “Oxidoreductase”, “Viral Protein”).
  • Organism: The source species (e.g., “Mus musculus”, “SARS-CoV-2”).
  • Download: A direct link to the {pdb_id}_fixed.pdb file, ready for simulations or visualization.
  • More Info: A button that expands the row to reveal additional context:
    • Full protein title and biological function.
    • Experimental method (X-ray, NMR, EM) and resolution (for X-ray/EM).
    • Deposition date (formatted as DD-Mon-YY).
    • A live LiteMol 3D viewer that renders the structure in your browser—no plugins required.
    • A link to the original RCSB PDB entry for reference and validation.

The results table is:

  • Sortable: Click the small ↑/↓ arrows next to any column header to sort results (e.g., by resolution, date, or ID).
  • Paginated: Default page size is 25 entries, adjustable to 50 or 100 via the dropdown above the table.
  • Exportable: Use the Download CSV button to obtain a spreadsheet of all matching IDs and metadata for integration into scripts or analysis pipelines.
  • Bulk-downloadable: The Download All button fetches all matching PDB files in ZIP archives (split into batches of 1,000 for performance).

Every fixed structure retains critical experimental annotations:

  • CRYST1 records (unit cell parameters) are preserved from the source.
  • HELIX and SHEET assignments are renumbered to reflect corrected residue sequences.
  • Heteroatoms (ligands, cofactors, ions) are kept intact—only missing protein atoms are added.
  • No hydrogen atoms are added (consistent with X-ray PDB conventions), but the heavy-atom topology is complete for force fields like AMBER or CHARMM.
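
To spot-check these properties on a downloaded file, something like the following sketch may help. It counts a few record types and hydrogen atoms using the fixed-column layout of the PDB format (record name in columns 1-6, element symbol in columns 77-78). The function name summarize_pdb is hypothetical.

```python
from collections import Counter


def summarize_pdb(text):
    """Count selected record types and hydrogen atoms in PDB-format text."""
    counts = Counter()
    hydrogens = 0
    for line in text.splitlines():
        record = line[:6].strip()
        if record in {"CRYST1", "HELIX", "SHEET", "ATOM", "HETATM"}:
            counts[record] += 1
        if record in {"ATOM", "HETATM"}:
            # The element symbol sits in columns 77-78 of coordinate records.
            if line[76:78].strip().upper() in {"H", "D"}:
                hydrogens += 1
    return dict(counts), hydrogens
```

For a file that matches the list above, the second return value would be expected to be 0 and the CRYST1 count to be 1 for crystallographic entries.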

By combining intuitive search, transparent metadata, and one-click access to simulation-ready files, FIXED PDB transforms raw structural data into actionable scientific insight—faster and more reliably than manual preprocessing ever could.

Understanding Our Fixing Process

The core mission of FIXED PDB is to transform raw Protein Data Bank (PDB) entries—often riddled with gaps and inconsistencies—into computationally ready, structurally complete, and force-field–compatible models. This is achieved through a standardized, automated pipeline built on PDBFixer (part of the OpenMM suite), enhanced with custom logic to preserve experimental context and biological fidelity. Below is a step-by-step breakdown of our curation workflow.

1. Source Data Integrity

All structures originate directly from the RCSB PDB. We download the original PDB file (.ent or .pdb) without modification, ensuring traceability to the experimental source. Critical metadata—including resolution, experimental method, deposition date, protein title, organism, and classification—is extracted from header records (HEADER, TITLE, COMPND, SOURCE, EXPDTA, REMARK 2) and stored in our MySQL database for search and indexing.
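
The header extraction can be pictured with a small sketch. It reads HEADER, TITLE, and EXPDTA records by their fixed columns as defined in the PDB file format; the function name and the returned keys are hypothetical, and the project's own parser may be organized differently.

```python
def parse_header_records(lines):
    """Extract a few metadata fields from PDB header lines (fixed columns)."""
    meta = {"title": ""}
    for line in lines:
        record = line[:6].strip()
        if record == "HEADER":
            meta["classification"] = line[10:50].strip()   # columns 11-50
            meta["deposition_date"] = line[50:59].strip()  # columns 51-59
            meta["id"] = line[62:66].strip()               # columns 63-66
        elif record == "TITLE":
            # Columns 11-80 hold the text; continuation lines are appended.
            meta["title"] = (meta["title"] + " " + line[10:80].strip()).strip()
        elif record == "EXPDTA":
            meta["method"] = line[10:79].strip()
    return meta
```

Slicing by column rather than splitting on whitespace matters here because classifications and titles routinely contain spaces.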

2. Structural Repair via PDBFixer

Using PDBFixer, we apply the following corrections in a reproducible, script-driven manner (see src/pdb_processor.py in our GitHub repository):

  • Missing Residue Detection: PDBFixer compares the residue sequence in the SEQRES records (if present) against the actual atomic coordinates. Gaps in the polypeptide chain are identified as missing residues.
  • Terminal Gap Exclusion: Missing residues at the very N- or C-termini are intentionally not modeled. This avoids introducing biologically unsupported extensions beyond the experimentally observed chain.
  • Nonstandard Residue Replacement: Modified amino acids (e.g., phosphoserine, hydroxyproline, selenocysteine) are replaced with their standard counterparts (SER, PRO, CYS). This ensures compatibility with classical force fields and simulation packages (AMBER, CHARMM, GROMACS) that require canonical residue types.
  • Missing Atom Addition: All missing backbone and side-chain heavy atoms are added using idealized geometry based on the local residue conformation and standard bond lengths/angles.
  • Local Energy Minimization: Only the newly added atoms undergo 500 steps of steepest descent minimization (via OpenMM) to relieve steric clashes. The original experimental coordinates remain completely untouched.
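
The first four steps above map onto PDBFixer's public API roughly as in the sketch below. This is an illustration under assumptions, not the contents of src/pdb_processor.py: the function names are hypothetical, the restricted minimization of new atoms is not reproduced, and record post-processing is left out. The terminal-gap filter follows the (chain index, residue index) keys that PDBFixer uses in its missingResidues dictionary.

```python
def drop_terminal_gaps(missing, chain_lengths):
    """Remove gaps at chain termini from a PDBFixer-style missingResidues dict.

    Keys are (chain_index, residue_index). A gap at index 0 precedes the first
    resolved residue; a gap at index == chain length follows the last one.
    """
    return {
        key: names for key, names in missing.items()
        if key[1] not in (0, chain_lengths[key[0]])
    }


def repair_structure(in_path, out_path):
    """Run the repair steps on one file (no minimization in this sketch)."""
    # Imported lazily so drop_terminal_gaps works without OpenMM installed.
    from pdbfixer import PDBFixer
    from openmm.app import PDBFile

    fixer = PDBFixer(filename=in_path)
    fixer.findMissingResidues()
    lengths = [len(list(chain.residues())) for chain in fixer.topology.chains()]
    fixer.missingResidues = drop_terminal_gaps(fixer.missingResidues, lengths)
    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    with open(out_path, "w") as handle:
        # keepIds=True preserves the original chain IDs and residue numbers.
        PDBFile.writeFile(fixer.topology, fixer.positions, handle, keepIds=True)
```

Keeping the terminal-gap filter as a separate pure function makes that one decision easy to test in isolation from the rest of the workflow.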

3. Metadata Preservation & Record Correction

A key innovation of FIXED PDB is the careful reconciliation of repaired atomic coordinates with original experimental annotations. Our custom post-processing scripts (see helix_fixer.py and sheet_fixer.py) perform the following:

  • HELIX and SHEET Record Adjustment:
    • Secondary structure assignments from the original PDB header (HELIX, SHEET) are preserved.
    • Residue numbering in these records is dynamically re-aligned to reflect any inserted residues. For each chain, we compute an offset (addnum) between the original and corrected sequences and apply it to HELIX/SHEET start/end positions.
    • This ensures that secondary structure annotations remain accurate and interpretable in downstream visualization (e.g., PyMOL, ChimeraX, LiteMol).
  • REMARK and Footer Preservation: Experimental details and refinement statistics from the original file’s footer are retained to provide full scientific context.
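
To make the offset idea concrete, here is a minimal sketch that shifts the start and end residue numbers of one HELIX record by a per-chain offset. It relies on the fixed columns of the PDB format (chain IDs in columns 20 and 32, sequence numbers in columns 22-25 and 34-37). The function name and the offsets dictionary are hypothetical stand-ins for the addnum logic in helix_fixer.py, which may compute and apply offsets differently.

```python
def shift_helix_record(line, offsets):
    """Return a HELIX line with start/end residue numbers shifted per chain.

    offsets maps chain ID -> integer offset; other record types pass through.
    """
    if not line.startswith("HELIX"):
        return line
    init_chain, end_chain = line[19], line[31]
    init_seq = int(line[21:25]) + offsets.get(init_chain, 0)
    end_seq = int(line[33:37]) + offsets.get(end_chain, 0)
    # Rebuild the line, keeping every other column exactly as it was.
    return f"{line[:21]}{init_seq:>4}{line[25:33]}{end_seq:>4}{line[37:]}"
```

With offsets = {"A": 3}, a helix spanning residues 10 to 20 of chain A would be rewritten to span 13 to 23. SHEET records place their residue numbers in different columns and would need their own slices.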

4. Conformational Changes: Loop Modeling and Fidelity

The most noticeable structural adjustments occur in loop regions where missing residues are inserted. PDBFixer uses a knowledge-based approach to sample plausible conformations for these gaps. While this introduces minor backbone deviations compared to the original (incomplete) model, it is essential for:

  • Creating a topologically continuous protein chain.
  • Enabling molecular dynamics simulations that require bonded connectivity.
  • Supporting accurate ligand docking into binding sites that may be partially occluded by unresolved loops.

Crucially, core secondary structure elements (α-helices, β-sheets) and the positions of all originally resolved atoms remain unchanged. Heteroatoms, including ligands, cofactors, ions, and water molecules, are preserved exactly as deposited. This ensures that biologically critical features (e.g., active sites, metal coordination spheres) are not altered by our pipeline.

5. Output Format and File Naming

The final repaired structure is saved as a standard PDB file named {pdb_id}_fixed.pdb. It includes:

  • A corrected ATOM section with complete residue/atom topology.
  • Unmodified HETATM records for all non-polymer entities.
  • Updated HELIX and SHEET records with renumbered residues.
  • Original CRYST1, REMARK, and footer content.
  • A new REMARK indicating processing by FIXED PDB and the OpenMM/PDBFixer version used.

This format is immediately compatible with simulation packages (GROMACS, AMBER, NAMD), docking tools (AutoDock, Glide), and visualization software—eliminating the need for manual preprocessing.

For full transparency, our entire pipeline—including Python scripts and configuration files—is openly shared on GitHub. Researchers can reproduce, validate, or extend our methodology as needed.

Using the Tools

FIXED PDB provides a suite of integrated tools to visualize, download, and programmatically access curated protein structures. Whether you're inspecting a single entry or automating large-scale data retrieval, our platform is designed for flexibility and scientific utility.

1. 3D Molecular Viewer (LiteMol Integration)

Every search result in FIXED PDB includes an interactive 3D molecular viewer powered by LiteMol, a lightweight and high-performance visualization library. This eliminates the need for external software to inspect structural features.

How to Use the Viewer:
  1. Perform a search on the Explore Data page.
  2. Click the “More Info” button next to any PDB entry.
  3. A collapsible row will expand, showing protein metadata on the left and the 3D viewer on the right.
  4. The structure loads automatically and is rendered as a cartoon representation by default—ideal for visualizing secondary structure elements (α-helices as ribbons, β-sheets as arrows).

Viewer Controls & Features:
  • Rotation: Click and drag to rotate the molecule.
  • Zoom: Scroll with your mouse wheel or pinch on touch devices.
  • Pan: Hold Shift + drag to move the molecule within the viewport.
  • Context Menu: You can access rendering options (e.g., switch to ball-and-stick, surface, or wireframe views).
  • Ligand Visibility: Heteroatoms (ligands, ions, cofactors) are automatically loaded and can be highlighted via the LiteMol context menu.
  • Responsive Design: The viewer adapts to screen size—ideal for use on laptops, tablets, or mobile devices.

The viewer loads the exact {pdb_id}_fixed.pdb file hosted on our servers, ensuring you’re inspecting the same curated structure you’ll download.

2. Data Download Options

FIXED PDB supports multiple download strategies to accommodate individual researchers and large-scale computational pipelines.

A. Individual File Download
  • From the search results table, click the “PDB File” link in the Download column.
  • The file is named {pdb_id}_fixed.pdb and is served directly from https://fixedpdb.com/data/{pdb_id}_fixed.pdb.
  • No login or registration is required—downloads begin immediately.
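
For scripts that already know which IDs they need, the URL pattern above can be built directly. The helper below is a small sketch: the name fixed_file_url is hypothetical, and lowercasing the ID is an assumption based on the lowercase IDs in this guide's JSON example.

```python
import re

BASE_URL = "https://fixedpdb.com/data"


def fixed_file_url(pdb_id):
    """Return the download URL for a 4-character PDB ID such as '1TIM'."""
    pdb_id = pdb_id.strip()
    if not re.fullmatch(r"[1-9][A-Za-z0-9]{3}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    # Lowercase file names are an assumption, not confirmed behavior.
    return f"{BASE_URL}/{pdb_id.lower()}_fixed.pdb"
```

Validating the ID before building the URL keeps typos from turning into confusing 404 responses later in a pipeline.
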

B. Bulk Download (All Search Results)

For large-scale studies, use the “Download All” button above the results table:

  • All PDB files matching your current search query are packaged into ZIP archives.
  • Files are batched into groups of 1,000 structures per ZIP to ensure reliable delivery and avoid browser timeouts.
  • A real-time progress bar shows download status, including completed files, failures, and batch count.
  • You can cancel the download at any time using the Cancel Download button.
  • Failed downloads (e.g., due to network issues) are logged in the browser console for debugging.

C. Metadata Export (CSV)
  • Click the “Download CSV” button to export all search result metadata (PDB ID, classification, organism, resolution, deposition date, method).
  • This CSV file is ideal for integration into Python, R, or Excel-based analysis workflows.
  • It includes the original RCSB link and our fixed file URL for cross-referencing.
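
As one possible way to use the export in Python, the sketch below filters rows by resolution with the standard csv module. The column names "id" and "resolution" are assumptions; check the header row of your own export and pass the actual names.

```python
import csv
import io


def high_resolution_ids(csv_text, max_resolution=2.0,
                        id_col="id", res_col="resolution"):
    """Return IDs whose resolution is at or below max_resolution (in Å)."""
    ids = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            resolution = float(row[res_col])
        except (KeyError, TypeError, ValueError):
            continue  # e.g. NMR entries without a numeric resolution
        if resolution <= max_resolution:
            ids.append(row[id_col])
    return ids
```

Rows with an empty or non-numeric resolution are skipped rather than treated as zero, so NMR entries do not slip through a "high resolution" filter.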

3. Programmatic Access & Scripting

FIXED PDB is fully compatible with automated workflows. Every search query generates a unique, persistent URL that encodes all parameters, enabling reproducible and scriptable access.

URL-Based Query Structure

All search and filter parameters are passed as URL query strings. For example:

https://fixedpdb.com/searchdb.php?
  organism=Homo+sapiens&
  method=X-RAY&
  min_res=1.0&
  max_res=2.5&
  class=kinase&
  page=1&
  rows-per-page=50

You can construct these URLs manually or generate them dynamically in scripts.

Example: Fetching Structures via Python

import requests
from urllib.parse import urlencode

# Define your search criteria
params = {
    'organism': 'Homo sapiens',
    'method': 'X-RAY',
    'min_res': '1.0',
    'max_res': '2.5',
    'class': 'kinase',
    'page': 1,
    'rows-per-page': 25
}

# Build the URL
base_url = "https://fixedpdb.com/searchdb.php"
url = f"{base_url}?{urlencode(params)}"

# Fetch the page (for scraping or parsing); fail early on network or HTTP errors
response = requests.get(url, timeout=30)
response.raise_for_status()
# Note: For bulk file URLs, use the /get_all_results.php endpoint (see below)

Advanced: Bulk File List via API Endpoint

To retrieve a machine-readable list of all PDB files matching a query (without HTML), use the get_all_results.php endpoint:

GET https://fixedpdb.com/get_all_results.php?organism=Escherichia+coli&method=NMR

This returns a JSON object like:

{
  "files": [
    {"id": "1abc", "url": "https://fixedpdb.com/data/1abc_fixed.pdb"},
    {"id": "2xyz", "url": "https://fixedpdb.com/data/2xyz_fixed.pdb"},
    ...
  ]
}

This is ideal for integration with workflow managers or large-scale docking pipelines.
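
A response of that shape can be turned into an ID-to-URL mapping with a few lines of Python. The sketch assumes the body matches the example above exactly (a top-level "files" list whose entries carry "id" and "url"); the function name is hypothetical.

```python
import json


def file_urls_from_response(body):
    """Map PDB ID -> fixed-file URL from a get_all_results.php JSON body."""
    payload = json.loads(body)
    return {entry["id"]: entry["url"] for entry in payload.get("files", [])}


# Sample body mirroring the structure shown in this guide.
sample = '{"files": [{"id": "1abc", "url": "https://fixedpdb.com/data/1abc_fixed.pdb"}]}'
urls = file_urls_from_response(sample)
```

Parsing the body as text keeps the function independent of the HTTP client, so it works equally with requests, urllib, or a saved response file.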

Best Practices for Scripting:
  • Include delays between requests to avoid overwhelming the server (time.sleep(0.5) in Python).
  • Parse the CSV export for metadata instead of scraping HTML when possible.
  • Use the rows-per-page=100 parameter to minimize the number of HTTP requests.
  • All fixed PDB files are static and cached—once downloaded, they won’t change unless reprocessed in future database updates.
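
Two of these practices, pausing between requests and reusing files already on disk, can be combined in a short download loop. The following is a sketch only: the function names are hypothetical, and the fetch parameter exists so the loop can be exercised without touching the network.

```python
import time
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher: plain HTTP GET with a timeout."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_missing(urls, dest_dir, delay=0.5, fetch=fetch_bytes):
    """Download each URL into dest_dir, skipping files already on disk."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # files are static, so a local copy can be reused
        target.write_bytes(fetch(url))
        downloaded.append(target)
        time.sleep(delay)  # pause between requests to spare the server
    return downloaded
```

Rerunning the same call after an interrupted job only fetches what is still missing, which keeps repeated runs cheap for both sides.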

With these tools, FIXED PDB seamlessly bridges interactive exploration and high-throughput computation—making it a versatile resource for education, research, and industrial drug discovery.

Who Can Use Our Website?

FIXED PDB is intentionally designed as a universal resource—bridging the gap between raw experimental structural data and computational readiness. Its curated, simulation-ready files serve a broad spectrum of users, from undergraduate students to industry professionals. Below is a detailed breakdown of how different user communities can benefit from our platform.

1. Researchers & Academics

For academic scientists conducting structural biology, biophysics, or molecular modeling research, FIXED PDB eliminates a major bottleneck: the repetitive and error-prone manual curation of PDB files before simulations or analyses. Whether you're:

  • Preparing structures for molecular dynamics (MD) simulations using AMBER, GROMACS, or NAMD,
  • Running protein–ligand or protein–protein docking with AutoDock, Glide, or HADDOCK,
  • Performing comparative modeling or binding site analysis, or
  • Writing a manuscript that requires consistent, high-quality structural inputs,

our database provides publication-ready models that maintain experimental integrity while resolving topological gaps.

2. Students & Educators

Teaching and learning structural concepts becomes significantly more intuitive when students can visualize complete, biologically plausible protein folds—without being distracted by missing loops or broken chains. FIXED PDB supports education by offering:

  • Interactive 3D viewing via the embedded LiteMol.js viewer—no software installation required—allowing students to explore secondary structure, domain organization, and ligand binding in real time.
  • Consistent structural representations that align with textbook diagrams, making it easier to correlate sequence with 3D architecture.
  • Reliable download formats compatible with free educational tools like PyMOL Edu and UCSF Chimera.
  • Searchable metadata (organism, classification, resolution) that enables classroom exercises in bioinformatics and evolutionary biology.

Instructors can build assignments or demonstrations around specific protein families (e.g., “Find all human kinases with resolution ≤ 2.0 Å”) and distribute direct download links to fixed structures.

3. Drug Discovery Professionals

In pharmaceutical and biotech settings, speed and accuracy are critical. FIXED PDB accelerates early-stage drug discovery by delivering force-field–compatible target structures that are immediately suitable for:

  • Virtual screening campaigns across hundreds of targets without manual cleanup.
  • Structure-based lead optimization, where accurate side-chain placement and loop geometry influence binding affinity predictions.
  • Target validation through rapid structural assessment of disease-relevant proteins (e.g., SARS-CoV-2 proteases, oncogenic kinases).
  • Homology modeling templates that are topologically complete and residue-numbering–consistent.

Because all heteroatoms (including co-crystallized ligands, cofactors, and metal ions) are preserved exactly as deposited, active site geometry remains intact—crucial for reliable docking and pharmacophore modeling.

4. Bioinformaticians & Computational Scientists

For developers and data scientists building large-scale structural pipelines, FIXED PDB offers a standardized, machine-readable resource that integrates seamlessly into automated workflows:

  • Programmatic access via parameterized URLs (e.g., ?organism=Homo+sapiens&method=X-RAY&max_res=2.5) enables scripted queries and bulk metadata retrieval.
  • Bulk download endpoints (via get_all_results.php) return structured JSON lists of file URLs, ideal for integration with workflow managers like Snakemake, Nextflow, or Apache Airflow.
  • CSV metadata exports facilitate downstream statistical analysis, machine learning feature engineering, or database enrichment.
  • Reproducible curation logic: the entire processing pipeline (including HELIX/SHEET renumbering and nonstandard residue replacement) is open-source on GitHub, allowing users to audit, extend, or replicate the methodology.

This eliminates the need to maintain in-house PDB preprocessing scripts—reducing technical debt and ensuring consistency across collaborative projects.

5. Structural Biologists & Crystallographers

Even experimentalists benefit from FIXED PDB when:

  • Preparing figures for publication that require uninterrupted polypeptide chains for clear visualization.
  • Sharing simulation-ready versions of their structures with computational collaborators.

In all cases, FIXED PDB respects the original experimental data—only adding atoms where gaps exist, never altering resolved coordinates. This makes it a trustworthy extension of the RCSB archive rather than a substitute.

Ultimately, FIXED PDB is built for anyone who needs a protein structure that “just works”—whether you’re running a 100,000-CPU-hour simulation or showing your first alpha-helix to a classroom of undergraduates.

How Our Database Works

FIXED PDB is built on a robust, scalable architecture that bridges automated structural curation with intuitive web-based access. From data ingestion to file delivery, every component is designed for scientific rigor, reproducibility, and performance. Below is a detailed breakdown of our end-to-end system.

1. Database Architecture

The FIXED PDB backend combines a relational database with static file storage to balance structured querying and high-throughput delivery:

  • MySQL Database: Hosts all searchable metadata extracted from PDB headers, including:
    • PDB ID, protein title, classification, and source organism
    • Experimental method (X-ray, NMR, EM), resolution (for X-ray/EM), and deposition date
    • Original RCSB URL and our fixed file URL
  • PHP Application Layer: Serves the dynamic web interface (searchdb.php, get_all_results.php, download_csv.php), handling user requests, query construction, and pagination logic.
  • Static File Hosting: All curated PDB files ({pdb_id}_fixed.pdb) are stored as static assets on our server and served directly via Apache/Nginx for maximum speed and reliability—bypassing PHP for raw file downloads.

This hybrid design ensures that metadata queries (e.g., “Find all kinases with resolution < 2.0 Å”) are fast and flexible, while file downloads benefit from the efficiency of direct HTTP delivery.

2. Data Ingestion Pipeline

Our pipeline continuously pulls structures from the official RCSB PDB repository using automated scripts:

  • Source: All entries originate directly from RCSB.org via their rsync endpoints.
  • Selection Criteria: We prioritize entries with clear experimental evidence (X-ray, EM, NMR) and exclude purely theoretical models.
  • Metadata Extraction: Critical header records—HEADER, TITLE, COMPND, SOURCE, EXPDTA, REMARK 2—are parsed to populate our MySQL database fields.
  • Update Cycle: The database is refreshed weekly to incorporate newly released structures and updates from RCSB.

3. Curation Workflow Automation

The core of FIXED PDB is a fully automated, parallelized curation pipeline written in Python. The workflow (as implemented in src/pdb_processor.py and batch_process.py) follows these steps:

  1. PDBFixer Processing:
    • Missing residues are identified by comparing SEQRES records against atomic coordinates.
    • Terminal gaps (N-/C-termini) are intentionally skipped to avoid biologically unsupported extensions.
    • Nonstandard residues (e.g., phosphorylated serine) are replaced with standard counterparts to ensure force-field compatibility.
    • Missing heavy atoms are added using idealized geometry.
    • Localized energy minimization (500 steps, OpenMM) is applied only to newly added atoms.
  2. HELIX/SHEET Record Correction:
    • Secondary structure assignments from the original file are preserved.
    • Residue numbering in HELIX and SHEET records is dynamically adjusted using per-chain offsets (addnum), ensuring correct alignment with the repaired topology.
    • Implemented in dedicated modules: helix_fixer.py and sheet_fixer.py.
  3. REMARK Preservation:
    • Original REMARK records and footer content are retained.
    • A new REMARK is appended indicating processing by FIXED PDB and the OpenMM/PDBFixer version.
  4. Parallel Execution: The batch_process.py script uses Python’s multiprocessing.Pool to fix hundreds of structures concurrently—scaling with available CPU cores.
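
The parallel step can be pictured with a minimal multiprocessing.Pool sketch. This is not the contents of batch_process.py: fix_one is a placeholder worker, and the serial path for processes=1 is an addition for debugging convenience.

```python
from multiprocessing import Pool


def fix_one(pdb_id):
    """Placeholder worker: a real worker would repair the file for pdb_id."""
    return pdb_id, f"{pdb_id}_fixed.pdb"


def fix_many(pdb_ids, worker=fix_one, processes=None):
    """Apply worker to every ID and return the results sorted by ID.

    processes=None lets Pool use all available cores; processes=1 runs
    serially, which is easier to debug.
    """
    if processes == 1:
        return sorted(worker(pdb_id) for pdb_id in pdb_ids)
    with Pool(processes=processes) as pool:
        # imap_unordered yields results as workers finish, in any order.
        return sorted(pool.imap_unordered(worker, pdb_ids))
```

When run as a script, the call to fix_many should sit under an if __name__ == "__main__": guard, since platforms that start workers with spawn re-import the main module.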

4. Metadata Indexing

To enable fast, responsive search across >150,000 entries, we employ several indexing strategies:

  • Full-Text Search: MySQL’s MATCH ... AGAINST syntax powers the main keyword search, indexing PDB ID, title, classification, organism, and method fields.
  • B-Tree Indexes: Used for range queries (e.g., resolution 1.0–2.5 Å, date ranges) and exact-match filters (e.g., method = “X-RAY”).
  • Date Formatting: Deposition dates are stored in DD-Mon-YY format (e.g., “15-Jan-23”) to match RCSB conventions; range comparisons convert them to date values with STR_TO_DATE.
  • Resolution Parsing: Numeric resolution values are extracted from strings like “1.85 ANGSTROMS” for accurate sorting and filtering.
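
The same two conversions are easy to reproduce client-side, for instance when post-processing a CSV export. The sketch below uses a regular expression for the resolution and datetime.strptime for the date; the function names are hypothetical, and month-name parsing assumes the default English (C) locale.

```python
import re
from datetime import datetime


def parse_resolution(text):
    """Return the resolution in Å from text like '1.85 ANGSTROMS', else None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*ANGSTROMS", text, flags=re.IGNORECASE)
    return float(match.group(1)) if match else None


def parse_deposition_date(text):
    """Parse a DD-Mon-YY string such as '15-Jan-23' into a date."""
    # %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    return datetime.strptime(text.strip(), "%d-%b-%y").date()
```

Two-digit years are the main caveat: Python's pivot places 69-99 in the 1900s, which covers the earliest PDB depositions from the 1970s, but any other parser should be checked for the pivot it uses.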

This indexing strategy ensures sub-second response times even for complex queries involving multiple criteria.

5. File Hosting & Delivery

User experience and reliability are central to our delivery system:

  • File Structure: Each repaired file is saved as {pdb_id}_fixed.pdb in a flat directory (/data/) for simplicity and fast access.
  • Direct HTTP Downloads: Files are served as static assets—users receive them immediately without PHP overhead.
  • Bulk Downloads: The “Download All” feature uses JavaScript (JSZip + FileSaver.js) to fetch files in batches of 1,000, package them into ZIP archives client-side, and trigger browser downloads—avoiding server-side memory limits.
  • CSV Metadata Export: The download_csv.php endpoint dynamically generates a CSV of all search result metadata (ID, organism, resolution, etc.) for integration into scripts or spreadsheets.
  • Programmatic Access: The get_all_results.php endpoint returns a JSON list of file URLs matching any search query, enabling easy integration with workflow managers (Snakemake, Nextflow) or custom pipelines.
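
As a rough illustration of programmatic access, the sketch below builds a query URL and parses the response in Python. Several details are assumptions rather than documented behavior: that get_all_results.php lives at the site root, that the query parameter names (such as `q`) exist, that the response is a flat JSON array of URL strings, and that the /data/ directory is exposed under the same path on the web server. Check these against the live site before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://fixedpdb.com"


def fixed_file_url(pdb_id, base_url=BASE_URL):
    """URL of one repaired file, assuming /data/ is served at the site root."""
    return f"{base_url}/data/{pdb_id}_fixed.pdb"


def search_url(params, base_url=BASE_URL):
    """Build a get_all_results.php URL; the parameter names are hypothetical."""
    return f"{base_url}/get_all_results.php?{urlencode(params)}"


def extract_file_urls(payload):
    """Parse a response body, assumed to be a JSON array of URL strings."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of file URLs")
    return [item for item in data if isinstance(item, str)]


def fetch_file_urls(params, timeout=30):
    """Query the endpoint over HTTP and return the list of file URLs."""
    with urlopen(search_url(params), timeout=timeout) as response:
        return extract_file_urls(response.read().decode("utf-8"))
```

In a Snakemake or Nextflow workflow, a list returned by `fetch_file_urls` could feed a download rule directly, one job per URL.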

Together, these components create a platform that is both scientifically trustworthy and practically efficient—eliminating preprocessing bottlenecks while maintaining full compatibility with simulation and visualization tools.

Frequently Asked Questions (FAQs)

FIXED PDB is designed to streamline computational workflows, but questions often arise about how our files differ from original PDB entries, what changes are made during curation, and how to best use our platform. Below are answers to the most common inquiries from researchers, students, and industry professionals.

1. What makes FIXED PDB files different from those downloaded directly from RCSB.org?

While RCSB PDB provides the original, experimentally determined structures, many entries contain gaps or inconsistencies that hinder computational use. FIXED PDB adds value by:

  • Completing missing polypeptide chains: Missing backbone and side-chain heavy atoms are added using idealized geometry.
  • Replacing nonstandard residues: Modified amino acids (e.g., phosphoserine, selenocysteine) are replaced with their standard counterparts (SER, CYS) to ensure compatibility with AMBER, CHARMM, GROMACS, and other force fields.
  • Preserving experimental context: All heteroatoms (ligands, cofactors, ions) are retained verbatim.
  • Correcting secondary structure records: HELIX and SHEET annotations are renumbered to reflect the corrected residue sequence, ensuring accurate visualization in tools like PyMOL or ChimeraX.

In short: we fix only what’s broken—never altering resolved coordinates or biologically critical features.

2. Do the fixed structures exhibit any conformational changes compared to the original experimental models?

Yes—but only in regions that were originally unresolved.

  • All originally observed atoms remain unchanged.
  • Core secondary structures (α-helices, β-sheets) are never modified.
  • Only newly added atoms undergo local energy minimization (500 steps, OpenMM) to relieve steric clashes—no global relaxation is performed.

These adjustments are essential for creating a topologically continuous chain required by molecular dynamics engines and docking software.

3. Are hydrogen atoms added to the fixed structures?

No. In alignment with standard X-ray crystallography conventions, FIXED PDB files do not include hydrogen atoms. This maintains consistency with the original PDB archive and allows users to add hydrogens according to their specific needs (using simulation package tools with desired protonation states).

4. What happens to terminal residues that are missing in the original structure?

Missing residues at the extreme N- or C-termini are intentionally excluded from modeling. This conservative approach avoids introducing biologically unsupported extensions beyond the experimentally observed chain—a common source of error in automated loop modeling.

5. How can I request a structure that isn’t currently in the FIXED PDB database?

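We welcome requests! If you need a specific PDB entry that hasn’t been processed yet, you can send us its PDB ID through the Contact Form or by email (see “Requesting New Files” under Support & Contact).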

We prioritize high-impact targets (e.g., drug discovery candidates, educational examples) and typically process requests within 1–3 business days.

6. Can I use FIXED PDB files for molecular dynamics simulations or docking studies?

Yes—this is precisely our intended use case. Every file is:

  • Topologically complete (no missing backbone atoms).
  • Force-field compatible (standard residue types only).
  • Geometrically reasonable (minimized only where necessary).

They can be loaded into GROMACS, AMBER, NAMD, AutoDock, Glide (Schrödinger Suite), and similar platforms without further structural repair. Hydrogens are not included, so add them with your package’s own tools and the protonation states you need.
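
If you want to confirm these properties on a downloaded file before starting a run, a small check along the following lines may help. It is a minimal sketch, not part of the FIXED PDB pipeline: the function name is hypothetical, the residue list covers only the 20 standard amino acids and the common nucleotides, and hydrogens are detected solely from the element columns (77 to 78) of ATOM records.

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "A", "C", "G", "U", "DA", "DC", "DG", "DT",
}


def check_fixed_pdb(lines):
    """Summarize ATOM records: count, nonstandard residue names, hydrogens."""
    atoms = 0
    hydrogens = 0
    nonstandard = set()
    for line in lines:
        if not line.startswith("ATOM  "):
            continue  # HETATM records (ligands, ions) are expected and skipped
        atoms += 1
        resname = line[17:20].strip()           # columns 18-20
        if resname not in STANDARD_RESIDUES:
            nonstandard.add(resname)
        if line[76:78].strip().upper() in {"H", "D"}:  # columns 77-78
            hydrogens += 1
    return {"atoms": atoms, "nonstandard": sorted(nonstandard), "hydrogens": hydrogens}
```

A typical call would be `check_fixed_pdb(open("1abc_fixed.pdb"))`, where the file name is only an example. An empty `nonstandard` list and a hydrogen count of zero match what this FAQ describes.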

7. Is the original RCSB PDB entry still accessible?

Absolutely. Each search result includes a direct link to the original RCSB page under “More Info,” allowing you to compare our fixed version with the source data, access experimental details, or retrieve the unmodified file if needed.

8. How often is the database updated?

FIXED PDB is updated weekly to incorporate newly released structures from RCSB. Our pipeline automatically processes all new entries that meet our inclusion criteria (primarily X-ray and cryo-EM structures with resolvable chains).

9. Is the source code for the curation pipeline available?

Yes! The entire processing workflow, including PDBFixer integration and HELIX/SHEET renumbering, is open-source under the MIT License on GitHub: github.com/nazari210/fixedpdb. We encourage inspection, reuse, and contribution from the community.

Support & Contact

We’re committed to ensuring FIXED PDB meets the evolving needs of the global research community. Whether you have a technical question, need assistance with a specific structure, or want to suggest a new feature, our team is here to help.

1. Contact Form (Recommended for General Inquiries)

The easiest way to reach us is through the Contact Form on our website. This form:

  • Securely transmits your message directly to our support inbox.
  • Collects your name, email, subject, and detailed message to ensure we can respond accurately.
  • Is monitored daily by our core team at Isfahan University of Medical Sciences.

Please allow 1–3 business days for a response. For urgent matters, include “URGENT” in the subject line.

2. Direct Contact Information

For direct communication, you may reach us via the following channels:

  • Email: info@fixedpdb.com. Ideal for technical questions, bug reports, collaboration proposals, or requests for bulk data access.
  • Phone: +98 913 46 96 246. Available during business hours (Iran Standard Time) for brief inquiries or scheduling discussions.
  • Institutional Address: Department of Medicinal Chemistry, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Hezar Jerib Street, Isfahan, Iran.

All correspondence is treated confidentially and will be addressed by a member of our team with expertise in computational chemistry, structural biology, or software development.

3. Requesting New Files

While FIXED PDB currently hosts over 150,000 curated structures, you may need a specific PDB entry that hasn’t been processed yet. We welcome such requests! Here’s how to submit one:

  1. Identify the PDB ID(s) (e.g., 8XYZ) from the RCSB PDB website.
  2. Use the Contact Form or email us with:
    • The PDB ID(s) you need.
    • Your intended use case (e.g., “molecular dynamics study of SARS-CoV-2 spike protein”).
    • Any urgency or deadline (if applicable).
  3. We’ll process your request using our standard pipeline (PDBFixer + HELIX/SHEET correction) and notify you once the fixed file is available for download—typically within 1–3 business days.

Note: We prioritize requests based on scientific impact, educational value, and feasibility (e.g., single-chain X-ray structures are processed faster than multi-model NMR ensembles).

4. Technical Support & Bug Reports

If you encounter issues with:

  • File downloads or broken links,
  • Search functionality or metadata inaccuracies,
  • 3D viewer rendering or website responsiveness,
  • Discrepancies in fixed PDB files (e.g., missing atoms not added, incorrect residue replacement),

please include the following in your report:

  • PDB ID in question,
  • Browser and operating system (for UI issues),
  • Screenshot or error message (if applicable),
  • Expected vs. observed behavior.

This helps us reproduce and resolve the issue efficiently.

5. Collaboration & Academic Use

FIXED PDB is actively used in academic teaching and in research labs. If you’re:

  • An educator building a structural biology course,
  • A student preparing for national science competitions,
  • A researcher integrating our data into a publication or pipeline,

we encourage you to contact us. We can provide:

  • Custom dataset exports,
  • Guidance on best practices for using fixed PDB files,
  • Co-authorship discussions for substantial collaborations.

Our team has over 300 hours of teaching experience and is passionate about supporting the next generation of scientists.

Thank you for using FIXED PDB. Your feedback helps us build a more robust, reliable, and user-centered resource for structural bioinformatics.

Additional Resources

FIXED PDB is built on transparency, reproducibility, and open science. To empower researchers, educators, and developers, we provide full access to our codebase, technical documentation, and integration guidelines. Whether you want to audit our pipeline, adapt it for your own projects, or contribute improvements, the following resources are available:

GitHub Repository

The entire FIXED PDB project, including data curation scripts and documentation, is openly hosted on GitHub under the MIT License: github.com/nazari210/fixedpdb

The repository includes:

  • Core Processing Pipeline (src/):
    • pdb_processor.py: Integrates PDBFixer to add missing atoms/residues and replace nonstandard residues.
    • helix_fixer.py & sheet_fixer.py: Correct HELIX and SHEET record numbering using per-chain offset logic (addnum).
    • file_combiner.py: Merges the repaired atomic model with original header/footer blocks to preserve CRYST1, REMARK, and other experimental metadata.
  • Batch Execution Tools (batch_process.py): Uses Python’s multiprocessing.Pool for high-throughput parallel processing across thousands of PDB files.
  • Setup & Configuration: Includes setup_directories.py and modular config files to reproduce the pipeline locally.
  • Detailed README: Provides installation instructions, usage examples, and project architecture diagrams.
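
The batch pattern mentioned above can be sketched as follows. This is an illustration of the multiprocessing.Pool approach, not the repository's batch_process.py: all function names are hypothetical, and the worker simply copies the file where the real pipeline would run the repair steps.

```python
from multiprocessing import Pool
from pathlib import Path


def output_path(src, out_dir):
    """Map e.g. 1abc.pdb to <out_dir>/1abc_fixed.pdb."""
    return Path(out_dir) / f"{Path(src).stem}_fixed.pdb"


def process_one(job):
    """Handle one (source, output directory) pair and report success or failure."""
    src, out_dir = job
    try:
        text = Path(src).read_text()
        # Placeholder: the actual repair (PDBFixer, record renumbering) goes here.
        output_path(src, out_dir).write_text(text)
        return str(src), True, ""
    except OSError as exc:
        return str(src), False, str(exc)


def run_batch(paths, out_dir, workers=None):
    """Process many files in parallel; workers=None uses all available cores."""
    jobs = [(path, out_dir) for path in paths]
    with Pool(processes=workers) as pool:
        return list(pool.imap_unordered(process_one, jobs))
```

Returning a status tuple instead of raising keeps one unreadable file from aborting the whole batch. On platforms that start workers with spawn (Windows, macOS), call `run_batch` from inside an `if __name__ == "__main__":` block.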

We welcome bug reports, feature requests, and pull requests from the community. Contributions help improve the reliability and scope of this resource for everyone.