2026-05-12 · 8 min

Inscinstech's CMC knowledge base v2.2: What's in it and why it matters

Inscinstech CMC team · CMC + Knowledge engineering

When customers ask "what makes inscinstech.ai different from a wrapper around a general-purpose LLM?", the honest answer has three parts:

1. The product (InCortex): the productized CMC process-development workflow
2. The device-software loop (NestoPure · OligoMS · CDSystem): the connection to physical instrumentation
3. The CMC knowledge base v2.2: the domain corpus InCortex is calibrated on

This post is about (3), the knowledge base. It is the piece customers ask about most and the one we have said the least about publicly. Let us open the hood.

What it is

AI4CMC v2.2 is an Inscinstech-internal knowledge base built over five years of biopharma process development work. It is proprietary — not shared, not exfiltrated, not used to train shared models. It is structured — every entry is annotated with metadata for retrieval. It is calibrated — paired with real wet-lab outcomes wherever those exist.
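To make "structured" and "calibrated" concrete, here is a minimal sketch of what one entry could look like as a record. It is illustrative only: the class name and every field (`entry_id`, `category`, `tags`, `sources`, `outcomes`) are assumptions for this example, not the actual v2.2 schema.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeEntry:
    """Hypothetical shape of a knowledge-base entry; not the real v2.2 schema."""
    entry_id: str
    category: str                                        # e.g. "Polishing"
    title: str
    tags: list[str] = field(default_factory=list)        # retrieval metadata
    sources: list[str] = field(default_factory=list)     # what backs the entry
    outcomes: list[dict] = field(default_factory=list)   # paired wet-lab results, if any

    @property
    def is_calibrated(self) -> bool:
        # In this sketch, "calibrated" means at least one real outcome is attached.
        return len(self.outcomes) > 0


# Placeholder values for illustration.
entry = KnowledgeEntry(
    entry_id="pol-001",
    category="Polishing",
    title="AEX in flow-through mode",
    tags=["AEX", "flow-through", "mAb"],
    sources=["process history"],
)
print(entry.is_calibrated)  # False: no outcomes attached yet
```

Keeping outcomes on the entry itself is one possible design; it makes "wherever those exist" a simple check on whether the list is empty.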

The v2.2 designation reflects the third major revision (v1 was an internal wiki; v2 was the first structured form; v2.2 is the production form serving the agents today).

What is in it

The 82+ curated entries break down roughly as:

Capture (chromatography) — 18 entries

Protein A behaviors across mAb classes (IgG1/2/4, IgM, IgA fusions)
IEX (CEX and AEX) parameter ranges for typical mAb pI windows
HIC behavior for hydrophobic mAbs and ADCs
Bind/elute vs flow-through strategies

Polishing — 14 entries

AEX in flow-through mode (the workhorse)
Mixed-mode resin selection
Hydrophobic charge induction
SEC for aggregate removal vs analytical sizing

Viral safety — 9 entries

Low-pH inactivation protocols (pH 3.5 vs pH 3.7 vs pH 4.0)
UVC inactivation parameters and validation
Nano-filtration (Planova 20N vs 75N selection)
Acceptance criteria across regulatory regions

Ultrafiltration / TFF — 8 entries

Pellicon 3 vs Sartocon Slice vs Hydrosart trade-offs
Membrane MWCO selection
Diafiltration volume optimization

Formulation — 11 entries

Buffer system selection (acetate vs citrate vs phosphate vs histidine)
Excipient choice for stability vs viscosity vs immunogenicity
Container-closure compatibility data

Impurity characterization — 9 entries

HCP risk profiles by cell line
Leached Protein A acceptance criteria
Residual DNA assays and limits
Aggregate quantification methods

FDA review document distillations — 8 entries

Cross-precedent comparisons (e.g., "all mAb biosimilar reviews 2020-2025")
Common review questions and the precedent answers
Specification-setting precedents

Process precedents — 5 entries

Reference process trees for mAb, ADC, oligo, biosimilar, BsAb
"If your molecule looks like X, your process probably looks like Y"

What is not in it

Three things we explicitly do not put in v2.2:

Customer-specific data. A customer's wet-lab feedback may calibrate v2.2 entries (with consent, on Bespoke tier). The customer's raw data never enters v2.2 directly.
Unpublished IP. Anything that touches a customer's proprietary chemistry stays in the customer's tenant namespace.
Speculation. If an entry is not backed by Inscinstech process history, FDA review precedent, or peer-reviewed literature, it does not enter v2.2.
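The three exclusions above amount to a simple admission rule. The sketch below shows one way to express it; the `Evidence` enum, the `is_admissible` function, and its boolean flags are hypothetical names chosen for this example, not part of any Inscinstech tooling.

```python
from enum import Enum


class Evidence(Enum):
    PROCESS_HISTORY = "Inscinstech process history"
    FDA_PRECEDENT = "FDA review precedent"
    PEER_REVIEWED = "peer-reviewed literature"


def is_admissible(evidence: set[Evidence],
                  contains_customer_raw_data: bool,
                  contains_unpublished_ip: bool) -> bool:
    """Return True only if a candidate entry passes all three exclusion rules."""
    if contains_customer_raw_data or contains_unpublished_ip:
        return False
    # With no backing evidence at all, the candidate counts as speculation.
    return len(evidence) > 0


print(is_admissible({Evidence.PEER_REVIEWED}, False, False))  # True
print(is_admissible(set(), False, False))                     # False
```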

How the agents use it

InCortex uses v2.2 in two ways:

As its primary source. When you ask "what's a reasonable polishing strategy for this molecule?", InCortex searches v2.2 plus FDA review docs and synthesizes an answer with citations to the underlying entry.
For calibration. InCortex's DoE and process-prediction models produce risk and outcome estimates; the thresholds behind them are calibrated against v2.2's real-outcome data. A "low risk" call means "based on what we have seen in v2.2, this is consistent with successfully manufactured molecules."

v2.2 is also only one of several corpora: each answer tells you whether its source is v2.2, PubMed, FDA guidance, or a customer-uploaded PDF.
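The source labeling described above can be pictured as a search that never drops the corpus name from a result. The following is a toy sketch, not how InCortex retrieves: the `Hit` type, the `search` function, the naive keyword matching, and the sample documents are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    corpus: str  # which corpus the passage came from
    ref: str     # identifier of the entry or document
    text: str


def search(corpora: dict[str, dict[str, str]], query: str) -> list[Hit]:
    """Naive keyword search across several corpora, keeping the source label."""
    terms = query.lower().split()
    hits = []
    for corpus_name, docs in corpora.items():
        for ref, text in docs.items():
            if all(term in text.lower() for term in terms):
                hits.append(Hit(corpus_name, ref, text))
    return hits


# Placeholder documents, invented for this example.
corpora = {
    "v2.2": {"pol-001": "AEX in flow-through mode for mAb polishing"},
    "customer-upload": {"report.pdf": "Polishing notes: AEX screening results"},
}
for hit in search(corpora, "AEX polishing"):
    print(f"[{hit.corpus}:{hit.ref}] {hit.text}")
```

Because each hit carries its corpus and reference, a synthesized answer can cite the underlying entry and say which corpus it came from.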

What "calibrated against real outcomes" actually means

A specific example. The InCortex aggregation risk threshold for mAbs is set so that:

If the prediction says "low risk" (≤ 0.5% HMW projected), v2.2 retrospective data shows that 93% of such molecules made it through process development without an aggregation issue.
If the prediction says "medium risk" (> 0.5% and ≤ 2% HMW projected), that success rate drops to ~70%.
If the prediction says "high risk" (> 2% HMW projected), the success rate drops below 40%.
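The banding above can be written as a small function. This is a sketch under stated assumptions: the name `risk_band` is hypothetical, and a value of exactly 0.5% is treated as low risk and exactly 2% as medium risk, following the "≤ 0.5%" and "> 2%" limits quoted in the list.

```python
def risk_band(projected_hmw_pct: float) -> str:
    """Map a projected HMW percentage to the risk bands quoted above."""
    if projected_hmw_pct < 0:
        raise ValueError("projected HMW percentage cannot be negative")
    if projected_hmw_pct <= 0.5:
        return "low"
    if projected_hmw_pct <= 2.0:
        return "medium"
    return "high"


# Retrospective success rates as quoted in the text:
# medium is approximate, high is an upper bound.
RETROSPECTIVE_RATE = {"low": "93%", "medium": "~70%", "high": "< 40%"}

band = risk_band(1.2)
print(band, RETROSPECTIVE_RATE[band])  # medium ~70%
```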

These thresholds were not picked from a paper. They were picked by looking at v2.2 outcomes and choosing cut points at which the risk call is reliable enough to act on. As v2.2 grows, the thresholds are recalibrated.

That is what "calibrated against real outcomes" means in practice — and that is the layer a general-purpose LLM cannot provide.
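Checking a set of cut points against retrospective outcomes comes down to tallying, per band, how many molecules got through without an aggregation issue. Below is a minimal sketch of that tally. The function name, the record layout, and the records themselves are made up for illustration and are not v2.2 data.

```python
from collections import defaultdict


def band_success_rates(records, low_cut=0.5, high_cut=2.0):
    """Compute the observed success rate per risk band.

    records: iterable of (projected_hmw_pct, succeeded) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [successes, total]
    for hmw, succeeded in records:
        band = "low" if hmw <= low_cut else "medium" if hmw <= high_cut else "high"
        counts[band][0] += int(succeeded)
        counts[band][1] += 1
    return {band: wins / total for band, (wins, total) in counts.items()}


# Made-up records for illustration only.
records = [
    (0.2, True), (0.4, True), (0.5, True), (0.3, False),
    (1.0, True), (1.5, False),
    (2.5, False), (3.0, False), (4.0, True),
]
rates = band_success_rates(records)
print(rates["low"], rates["medium"])  # 0.75 0.5
```

Recalibration, in this picture, means re-running the tally as new outcomes arrive and moving `low_cut` or `high_cut` if the bands stop separating outcomes clearly.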

How v2.2 evolves

Three update paths:

Process completions. When a real CMC project completes inside Inscinstech or with a Bespoke partner (with consent), the outcome data updates the relevant entry.
Regulatory updates. When the FDA, NMPA, or EMA publishes new guidance, v2.2 entries that reference the old version are flagged for review. We do not silently drift.
Quarterly review. Every quarter, the team triages newly published literature for entries that should be added or updated.
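The regulatory update path is essentially a version check: any entry that cites an older version of a guidance document gets flagged. A minimal sketch follows, assuming each entry records which guidance versions it cites. The entry layout, the identifiers, and the function name are hypothetical.

```python
def flag_for_review(entries, guidance_id, new_version):
    """Return ids of entries that cite an older version of the given guidance."""
    flagged = []
    for entry in entries:
        cited = entry["guidance"].get(guidance_id)
        if cited is not None and cited != new_version:
            flagged.append(entry["id"])
    return flagged


# Placeholder entries and guidance identifiers, invented for this example.
entries = [
    {"id": "vs-003", "guidance": {"guidance-A": "rev1"}},
    {"id": "vs-004", "guidance": {"guidance-A": "rev2"}},
    {"id": "fm-001", "guidance": {}},
]
print(flag_for_review(entries, "guidance-A", "rev2"))  # ['vs-003']
```

Flagging rather than auto-updating keeps a reviewer in the loop, which matches the "flagged for review" wording above.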

Maintainer: the Inscinstech CMC team plus external scientific advisors. Review cadence: quarterly. Public changelog: a redacted version (no customer data, no unpublished IP) is being prepared for /resources/changelog.

How to access it

Tier-by-tier:

Free / Pro: No direct access. v2.2 informs InCortex outputs but the entries themselves are not retrievable.
Team: Query access — InCortex can search v2.2 alongside other corpora and return cited paragraphs.
Enterprise: Full retrieval. You can request specific entries and see the underlying outcome data (anonymized).
Bespoke: Co-build relationship — your wet-lab outcomes calibrate v2.2 entries specific to your pipeline; you share IP on the resulting predictions.
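The tier list above can be summarized as a lookup table. In the sketch below, the `Access` enum and the helper functions are hypothetical. It also assumes that access levels are cumulative, so that Bespoke includes everything Enterprise has, which the list does not state explicitly.

```python
from enum import Enum


class Access(Enum):
    NONE = 0       # v2.2 informs outputs; entries are not retrievable
    QUERY = 1      # search v2.2 and get cited paragraphs
    RETRIEVAL = 2  # request entries and see anonymized outcome data
    CO_BUILD = 3   # customer outcomes calibrate pipeline-specific entries


TIER_ACCESS = {
    "Free": Access.NONE,
    "Pro": Access.NONE,
    "Team": Access.QUERY,
    "Enterprise": Access.RETRIEVAL,
    "Bespoke": Access.CO_BUILD,
}


def can_query(tier: str) -> bool:
    return TIER_ACCESS[tier].value >= Access.QUERY.value


def can_retrieve_entries(tier: str) -> bool:
    # Assumes cumulative levels: anything at or above RETRIEVAL qualifies.
    return TIER_ACCESS[tier].value >= Access.RETRIEVAL.value


print(can_query("Team"), can_retrieve_entries("Team"))  # True False
```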

Why this matters

If you have ever asked a general-purpose LLM "what's a reasonable Protein A elution buffer for this mAb?", you got a generic answer. The answer was probably correct — chromatography buffer chemistry has not changed much in 20 years. But generic answers do not encode the institutional knowledge: which resin you actually have in your facility, what your team has historically had problems with, what the FDA reviewers at the relevant Office have asked about in the past.

v2.2 is the encoding of that institutional knowledge for the customers and partners who choose to leverage it. It is the part that is hard to build and hard to copy. The product is the surface; the corpus is the moat.

For the public-facing version of how this fits together: /products/incortex.
