
Inscinstech's CMC knowledge base v2.2: What's in it and why it matters

When customers ask "what makes inscinstech.ai different from a wrapper around a general-purpose LLM?", the honest answer has three parts:

  1. The agents (InBeacon · InPrism · InAnvil · InForge) — the productized workflows
  2. The device-software loop (NestoPure · OligoMS · CDSystem) — the connection to physical instrumentation
  3. The CMC knowledge base v2.2 — the domain corpus the agents are calibrated on

This post is about (3). It is the most-asked-about piece and the least-public-facing. Let us open the hood.

What it is

AI4CMC v2.2 (referred to below simply as v2.2) is an Inscinstech-internal knowledge base built over five years of biopharma process development work. It is proprietary: not shared, not exfiltrated, not used to train shared models. It is structured: every entry is annotated with metadata for retrieval. It is calibrated: paired with real wet-lab outcomes wherever those exist.

The v2.2 designation reflects the third major revision (v1 was an internal wiki; v2 was the first structured form; v2.2 is the production form serving the agents today).
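To make "annotated with metadata for retrieval" concrete, here is a minimal sketch of what a structured entry and a metadata filter could look like. The field names, the `KBEntry` class, the `find_entries` helper, and the sample values are all hypothetical illustrations, not the actual v2.2 schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBEntry:
    """Hypothetical shape of one knowledge-base entry (not the real schema)."""
    entry_id: str                    # made-up identifier scheme
    category: str                    # e.g. "Polishing", "Viral safety"
    title: str
    tags: frozenset = frozenset()    # retrieval metadata
    has_outcome_data: bool = False   # True when paired with wet-lab outcomes


def find_entries(entries, category=None, required_tags=()):
    """Return entries in the given category that carry every required tag."""
    required = frozenset(required_tags)
    return [
        e for e in entries
        if (category is None or e.category == category) and required <= e.tags
    ]


# Illustrative entries only; titles echo the topic list in this post.
entries = [
    KBEntry("E1", "Polishing", "AEX in flow-through mode", frozenset({"AEX", "mAb"}), True),
    KBEntry("E2", "Polishing", "Mixed-mode resin selection", frozenset({"mixed-mode"})),
    KBEntry("E3", "Viral safety", "Low-pH inactivation protocols", frozenset({"low-pH", "mAb"})),
]

hits = find_entries(entries, category="Polishing", required_tags=["AEX"])
print([e.title for e in hits])
```

The point of the sketch is only that retrieval can filter on annotations before any text search happens.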

What is in it

The 82+ curated entries break down roughly as:

Capture (chromatography) — 18 entries

  • Protein A behaviors across mAb classes (IgG1/2/4, IgM, IgA fusions)
  • IEX (CEX and AEX) parameter ranges for typical mAb pI windows
  • HIC behavior for hydrophobic mAbs and ADCs
  • Bind/elute vs flow-through strategies

Polishing — 14 entries

  • AEX in flow-through mode (the workhorse)
  • Mixed-mode resin selection
  • Hydrophobic charge induction
  • SEC for aggregate removal vs analytical sizing

Viral safety — 9 entries

  • Low-pH inactivation protocols (pH 3.5 vs pH 3.7 vs pH 4.0)
  • UVC inactivation parameters and validation
  • Nano-filtration (Planova 20N vs 75N selection)
  • Acceptance criteria across regulatory regions

Ultrafiltration / TFF — 8 entries

  • Pellicon 3 vs Sartocon Slice vs Hydrosart trade-offs
  • Membrane MWCO selection
  • Diafiltration volume optimization

Formulation — 11 entries

  • Buffer system selection (acetate vs citrate vs phosphate vs histidine)
  • Excipient choice for stability vs viscosity vs immunogenicity
  • Container-closure compatibility data

Impurity characterization — 9 entries

  • HCP risk profiles by cell line
  • Leached Protein A acceptance criteria
  • Residual DNA assays and limits
  • Aggregate quantification methods

FDA review document distillations — 8 entries

  • Cross-precedent comparisons (e.g., "all mAb biosimilar reviews 2020-2025")
  • Common review questions and the precedent answers
  • Specification-setting precedents

Process precedents — 5 entries

  • Reference process trees for mAb, ADC, oligo, biosimilar, BsAb
  • "If your molecule looks like X, your process probably looks like Y"
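As a quick sanity check, the per-category counts listed above can be tallied to confirm they account for the 82 entries quoted at the top of this section. The dictionary below simply restates the counts from this post; the variable names are illustrative.

```python
# Per-category entry counts, copied from the breakdown in this post.
ENTRY_COUNTS = {
    "Capture (chromatography)": 18,
    "Polishing": 14,
    "Viral safety": 9,
    "Ultrafiltration / TFF": 8,
    "Formulation": 11,
    "Impurity characterization": 9,
    "FDA review document distillations": 8,
    "Process precedents": 5,
}

total = sum(ENTRY_COUNTS.values())

# Print each category with its count and share of the corpus.
for name, count in ENTRY_COUNTS.items():
    print(f"{name:<36} {count:>3}  {count / total:6.1%}")
print(f"{'Total':<36} {total:>3}")
```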

What is not in it

Three things we explicitly do not put in v2.2:

  1. Customer-specific data. A customer's wet-lab feedback may calibrate v2.2 entries (with consent, on Bespoke tier). The customer's raw data never enters v2.2 directly.
  2. Unpublished IP. Anything that touches a customer's proprietary chemistry stays in the customer's tenant namespace.
  3. Speculation. If an entry is not backed by Inscinstech process history, FDA review precedent, or peer-reviewed literature, it does not enter v2.2.
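The three exclusions above amount to an admission rule, which can be sketched as a small gate function. This is an illustration of the policy as stated in this post, not Inscinstech's actual ingestion code; the `Backing` enum and the `is_admissible` function and its parameter names are assumptions.

```python
from enum import Enum


class Backing(Enum):
    """Kinds of evidence this post says can back an entry."""
    PROCESS_HISTORY = "Inscinstech process history"
    FDA_PRECEDENT = "FDA review precedent"
    PEER_REVIEWED = "peer-reviewed literature"


def is_admissible(backing, contains_customer_raw_data=False,
                  touches_unpublished_customer_ip=False):
    """Apply the three exclusion rules to a candidate entry.

    backing: set of Backing values supporting the candidate (may be empty).
    """
    if contains_customer_raw_data:        # rule 1: raw customer data stays out
        return False
    if touches_unpublished_customer_ip:   # rule 2: stays in the customer's tenant
        return False
    return len(set(backing)) > 0          # rule 3: no backing means speculation


print(is_admissible({Backing.PEER_REVIEWED}))
print(is_admissible(set()))
print(is_admissible({Backing.PROCESS_HISTORY}, contains_customer_raw_data=True))
```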

How the agents use it

Each of the four agents touches v2.2 differently:

  • InForge uses v2.2 as its primary source. When you ask "what's a reasonable polishing strategy for this molecule?", InForge searches v2.2 plus FDA review docs and synthesizes an answer with citations to the underlying entry.
  • InAnvil uses v2.2 for calibration. The developability scoring uses open-source tools (BioPhi, TAP, SAP, Boltz-2, ProteinMPNN, etc.), but the risk thresholds are calibrated against v2.2's real-outcome data. A "low risk" call from InAnvil means "based on what we have seen in v2.2, this is consistent with successfully manufactured molecules."
  • InPrism uses v2.2 as one of many corpora for literature search. The agent will tell you when v2.2 is the source vs PubMed vs FDA guidance vs a customer-uploaded PDF.
  • InBeacon does not touch v2.2 directly — it is an intelligence agent, not a domain agent.
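The source attribution described for InPrism (telling you whether a result came from v2.2, PubMed, FDA guidance, or a customer-uploaded PDF) can be pictured as search hits that each carry a source label. The sketch below is an assumption about the general shape of such results; `Hit`, `group_by_source`, and the placeholder snippets are hypothetical and do not reflect InPrism's real output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str    # e.g. "v2.2", "PubMed", "FDA guidance", "customer PDF"
    snippet: str   # the cited paragraph (placeholder text here)


def group_by_source(hits):
    """Group snippets by the corpus they came from, preserving hit order."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.source].append(hit.snippet)
    return dict(grouped)


# Placeholder hits for illustration only.
hits = [
    Hit("v2.2", "placeholder paragraph A"),
    Hit("PubMed", "placeholder paragraph B"),
    Hit("v2.2", "placeholder paragraph C"),
]

for source, snippets in group_by_source(hits).items():
    print(f"{source}: {len(snippets)} cited paragraph(s)")
```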

What "calibrated against real outcomes" actually means

A specific example. The InAnvil aggregation risk threshold for mAbs is set so that:

  • If the prediction says "low risk" (≤ 0.5% HMW projected), v2.2 retrospective data shows that 93% of such molecules made it through process development without an aggregation issue.
  • If the prediction says "medium risk" (above 0.5% and up to 2% HMW projected), that success rate drops to ~70%.
  • If the prediction says "high risk" (> 2% HMW projected), that success rate drops below 40%.

These thresholds were not picked from a paper. They were chosen by looking at v2.2 outcomes and placing the cut points where the observed success rates separate clearly enough to act on. As v2.2 grows, the thresholds are recalibrated.
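The tiering and the retrospective check can be sketched in a few lines. The cut points (0.5% and 2% projected HMW) come from this post; the function names, the choice to treat exactly 0.5% as low and exactly 2% as medium, and the toy records are assumptions for illustration. The toy records are synthetic and are not v2.2 data.

```python
def aggregation_risk_tier(projected_hmw_percent):
    """Map a projected HMW percentage to the risk tier described above."""
    if projected_hmw_percent < 0:
        raise ValueError("projected HMW percentage cannot be negative")
    if projected_hmw_percent <= 0.5:
        return "low"
    if projected_hmw_percent <= 2.0:
        return "medium"
    return "high"


def success_rate_by_tier(records):
    """Compute the fraction of molecules per tier that had no aggregation issue.

    records: iterable of (projected_hmw_percent, succeeded) pairs.
    """
    totals, wins = {}, {}
    for hmw, succeeded in records:
        tier = aggregation_risk_tier(hmw)
        totals[tier] = totals.get(tier, 0) + 1
        wins[tier] = wins.get(tier, 0) + (1 if succeeded else 0)
    return {tier: wins[tier] / totals[tier] for tier in totals}


# Synthetic records, invented purely to exercise the functions.
toy_records = [
    (0.2, True), (0.4, True),    # low tier
    (1.0, True), (1.5, False),   # medium tier
    (2.5, True), (3.0, False),   # high tier
]

rates = success_rate_by_tier(toy_records)
print(rates)
```

Recalibration, in this picture, means rerunning the retrospective computation as outcomes accumulate and moving the cut points if the tiers stop separating cleanly.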

That is what "calibrated against real outcomes" means in practice — and that is the layer a general-purpose LLM cannot provide.

How v2.2 evolves

Three update paths:

  1. Process completions. When a real CMC project completes inside Inscinstech or with a Bespoke partner (with consent), the outcome data updates the relevant entry.
  2. Regulatory updates. When the FDA, NMPA, or EMA publishes new guidance, v2.2 entries that reference the old version are flagged for review. We do not silently drift.
  3. Quarterly review. Every quarter, the team triages newly published literature for entries that should be added or updated.
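The regulatory update path (flagging entries that cite a superseded guidance version) lends itself to a simple check. The sketch below assumes each entry records which version of each guidance document it cites; that bookkeeping, the `flag_for_review` function, and the placeholder identifiers are hypothetical, not a description of the actual tooling.

```python
def flag_for_review(cited_versions, guidance_id, current_version):
    """Return the IDs of entries that cite an outdated version of a guidance.

    cited_versions: dict mapping entry_id -> {guidance_id: version_cited}.
    """
    return sorted(
        entry_id
        for entry_id, cited in cited_versions.items()
        if guidance_id in cited and cited[guidance_id] != current_version
    )


# Placeholder identifiers; no real guidance documents or versions implied.
cited_versions = {
    "E1": {"GUIDANCE-A": "rev1"},
    "E2": {"GUIDANCE-A": "rev2", "GUIDANCE-B": "rev1"},
    "E3": {"GUIDANCE-B": "rev1"},
}

# GUIDANCE-A has just been republished as rev2: only E1 still cites rev1.
print(flag_for_review(cited_versions, "GUIDANCE-A", "rev2"))
```

Flagged entries go to a human reviewer rather than being rewritten automatically, which matches the "we do not silently drift" commitment above.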

Maintainer: the Inscinstech CMC team plus external scientific advisors. Review cadence: quarterly. Public changelog: a redacted version (no customer data, no unpublished IP) is being prepared for /resources/changelog.

How to access it

Tier-by-tier:

  • Free / Pro: No direct access. v2.2 informs InAnvil and InForge outputs but the entries themselves are not retrievable.
  • Team: Query access — InPrism can search v2.2 alongside other corpora and return cited paragraphs.
  • Enterprise: Full retrieval. You can request specific entries and see the underlying outcome data (anonymized).
  • Bespoke: Co-build relationship — your wet-lab outcomes calibrate v2.2 entries specific to your pipeline; you share IP on the resulting predictions.
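The tier list above can be summarized as a lookup table. This is a sketch of the mapping as described in this post, with hypothetical names (`Access`, `TIER_ACCESS`, `access_for`); it is not the product's real entitlement logic, and it deliberately does not assume that higher tiers inherit the capabilities of lower ones, since the post does not say so.

```python
from enum import Enum


class Access(Enum):
    NONE = "no direct access; v2.2 only informs agent outputs"
    QUERY = "query access through InPrism with cited paragraphs"
    FULL = "full retrieval, including anonymized outcome data"
    CO_BUILD = "co-build; customer outcomes calibrate pipeline-specific entries"


TIER_ACCESS = {
    "free": Access.NONE,
    "pro": Access.NONE,
    "team": Access.QUERY,
    "enterprise": Access.FULL,
    "bespoke": Access.CO_BUILD,
}


def access_for(tier):
    """Look up the v2.2 access level for a tier name (case-insensitive)."""
    try:
        return TIER_ACCESS[tier.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


print(access_for("Team").value)
```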

Why this matters

If you have ever asked a general-purpose LLM "what's a reasonable Protein A elution buffer for this mAb?", you got a generic answer. The answer was probably correct — chromatography buffer chemistry has not changed much in 20 years. But generic answers do not encode the institutional knowledge: which resin you actually have in your facility, what your team has historically had problems with, what the FDA reviewers at the relevant Office have asked about in the past.

v2.2 is the encoding of that institutional knowledge for the customers and partners who choose to leverage it. It is the part that is hard to build and hard to copy. The agents are the surface; the corpus is the moat.

For the public-facing version of how this fits together: /products. For the technical specification of what each agent does with it: /products/inforge and /products/inanvil.
