Inscinstech's CMC knowledge base v2.2: What's in it and why it matters
When customers ask "what makes inscinstech.ai different from a wrapper around a general-purpose LLM?", the honest answer has three parts:
- The agents (InBeacon · InPrism · InAnvil · InForge) — the productized workflows
- The device-software loop (NestoPure · OligoMS · CDSystem) — the connection to physical instrumentation
- The CMC knowledge base v2.2 — the domain corpus the agents are calibrated on
This post is about the third piece. It is the most-asked-about piece and the least public-facing. Let us open the hood.
What it is
AI4CMC v2.2 is an Inscinstech-internal knowledge base built over five years of biopharma process development work. It is proprietary — not shared, not exfiltrated, not used to train shared models. It is structured — every entry is annotated with metadata for retrieval. It is calibrated — paired with real wet-lab outcomes wherever those exist.
The v2.2 designation reflects the third revision (v1 was an internal wiki; v2 was the first structured form; v2.2 is the production form serving the agents today).
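To make "structured" and "calibrated" concrete, here is one way an entry record could be modeled in Python. Every field name (`entry_id`, `category`, `sources`, `tags`, `outcomes`) is a hypothetical illustration, not the actual v2.2 schema. The sketch also simplifies "calibrated" to "has at least one attached wet-lab outcome."

```python
from dataclasses import dataclass, field

@dataclass
class Outcome:
    """One wet-lab result paired with an entry (hypothetical shape)."""
    molecule_class: str  # e.g. "mAb", "ADC", "BsAb"
    succeeded: bool      # did the step behave as the entry describes?

@dataclass
class Entry:
    """Hypothetical knowledge-base entry: content plus retrieval metadata."""
    entry_id: str
    category: str        # e.g. "Capture", "Polishing", "Viral safety"
    title: str
    sources: list[str]   # provenance backing the entry
    tags: list[str] = field(default_factory=list)
    outcomes: list[Outcome] = field(default_factory=list)

    @property
    def is_calibrated(self) -> bool:
        # Simplification: "calibrated" = at least one real outcome attached.
        return len(self.outcomes) > 0

entry = Entry(
    entry_id="CAP-01",
    category="Capture",
    title="Protein A behavior for IgG1 mAbs",
    sources=["process_history"],
    tags=["protein_a", "igg1"],
    outcomes=[Outcome("mAb", True)],
)
print(entry.category, entry.is_calibrated)  # Capture True
```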
What is in it
The 82+ curated entries break down roughly as:
Capture (chromatography) — 18 entries
- Protein A behaviors across mAb classes (IgG1/2/4, IgM, IgA fusions)
- IEX (CEX and AEX) parameter ranges for typical mAb pI windows
- HIC behavior for hydrophobic mAbs and ADCs
- Bind/elute vs flow-through strategies
Polishing — 14 entries
- AEX in flow-through mode (the workhorse)
- Mixed-mode resin selection
- Hydrophobic charge induction
- SEC for aggregate removal vs analytical sizing
Viral safety — 9 entries
- Low-pH inactivation protocols (pH 3.5 vs pH 3.7 vs pH 4.0)
- UVC inactivation parameters and validation
- Nanofiltration (Planova 20N vs 75N selection)
- Acceptance criteria across regulatory regions
Ultrafiltration / TFF — 8 entries
- Pellicon 3 vs Sartocon Slice vs Hydrosart trade-offs
- Membrane MWCO selection
- Diafiltration volume optimization
Formulation — 11 entries
- Buffer system selection (acetate vs citrate vs phosphate vs histidine)
- Excipient choice for stability vs viscosity vs immunogenicity
- Container-closure compatibility data
Impurity characterization — 9 entries
- HCP risk profiles by cell line
- Protein A leach acceptance criteria
- Residual DNA assays and limits
- Aggregate quantification methods
FDA review document distillations — 8 entries
- Cross-precedent comparisons (e.g., "all mAb biosimilar reviews 2020-2025")
- Common review questions and the precedent answers
- Specification-setting precedents
Process precedents — 5 entries
- Reference process trees for mAb, ADC, oligo, biosimilar, BsAb
- "If your molecule looks like X, your process probably looks like Y"
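As a quick arithmetic check, the eight category counts above sum to 82, the headline figure. A few lines of Python tally them and print each category's share of the total:

```python
# Entry counts per category, copied from the breakdown above.
counts = {
    "Capture (chromatography)": 18,
    "Polishing": 14,
    "Viral safety": 9,
    "Ultrafiltration / TFF": 8,
    "Formulation": 11,
    "Impurity characterization": 9,
    "FDA review document distillations": 8,
    "Process precedents": 5,
}

total = sum(counts.values())
print(total)  # 82

# Largest categories first, with each one's share of the corpus.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<36} {n:>3}  {n / total:.0%}")
```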
What is not in it
Three things we explicitly do not put in v2.2:
- Customer-specific data. A customer's wet-lab feedback may calibrate v2.2 entries (with consent, on Bespoke tier). The customer's raw data never enters v2.2 directly.
- Unpublished IP. Anything that touches a customer's proprietary chemistry stays in the customer's tenant namespace.
- Speculation. If an entry is not backed by Inscinstech process history, FDA review precedent, or peer-reviewed literature, it does not enter v2.2.
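The three exclusion rules can be read as an admission gate. The sketch below is a hypothetical illustration of that logic, not Inscinstech's actual ingestion code; the provenance labels in `ALLOWED_SOURCES` and the two boolean flags are assumptions made for the example.

```python
# Hypothetical provenance labels for the three admissible kinds of backing.
ALLOWED_SOURCES = {"process_history", "fda_review_precedent", "peer_reviewed_literature"}

def admissible(sources: set[str], has_customer_raw_data: bool,
               touches_unpublished_ip: bool) -> bool:
    """Return True only if a candidate entry passes all three exclusion rules."""
    if has_customer_raw_data:
        return False  # raw customer data never enters directly
    if touches_unpublished_ip:
        return False  # belongs in the customer's tenant namespace instead
    # No speculation: require at least one admissible backing source.
    return bool(sources & ALLOWED_SOURCES)

print(admissible({"peer_reviewed_literature"}, False, False))  # True
print(admissible({"vendor_blog"}, False, False))               # False
```

Note that the gate checks the two data-handling rules before provenance: an entry with excellent literature backing is still rejected if it carries raw customer data.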
How the agents use it
Each of the four agents touches v2.2 differently:
- InForge uses v2.2 as its primary source. When you ask "what's a reasonable polishing strategy for this molecule?", InForge searches v2.2 plus FDA review docs and synthesizes an answer with citations to the underlying entries.
- InAnvil uses v2.2 for calibration. The developability scoring uses open-source tools (BioPhi, TAP, SAP, Boltz-2, ProteinMPNN, etc.), but the risk thresholds are calibrated against v2.2's real-outcome data. A "low risk" call from InAnvil means "based on what we have seen in v2.2, this is consistent with successfully manufactured molecules."
- InPrism uses v2.2 as one of many corpora for literature search. The agent will tell you when v2.2 is the source vs PubMed vs FDA guidance vs a customer-uploaded PDF.
- InBeacon does not touch v2.2 directly — it is an intelligence agent, not a domain agent.
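InPrism's habit of naming the source behind each answer amounts to source attribution during retrieval. Below is a minimal sketch, assuming each corpus returns scored hits and that those scores are comparable across corpora (a simplification; real systems often need score normalization or rank fusion). The `Hit` type, the `merge_with_attribution` function, and the `V22_ROLE` table are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical summary of each agent's relationship to v2.2, per the list above.
V22_ROLE = {
    "InForge": "primary source",
    "InAnvil": "threshold calibration",
    "InPrism": "one corpus among several",
    "InBeacon": None,  # intelligence agent; does not read v2.2 directly
}

@dataclass(frozen=True)
class Hit:
    corpus: str   # e.g. "v2.2", "PubMed", "FDA guidance", "customer PDF"
    text: str
    score: float  # assumed comparable across corpora

def merge_with_attribution(*result_lists: list[Hit], k: int = 5) -> list[str]:
    """Merge hits from several corpora by score, keeping each source label visible."""
    merged = sorted((h for hits in result_lists for h in hits),
                    key=lambda h: h.score, reverse=True)
    return [f"[{h.corpus}] {h.text}" for h in merged[:k]]

v22_hits = [Hit("v2.2", "AEX flow-through is the usual polishing workhorse", 0.91)]
pubmed_hits = [Hit("PubMed", "Mixed-mode polishing of a bispecific", 0.84)]
for line in merge_with_attribution(v22_hits, pubmed_hits, k=2):
    print(line)
```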
What "calibrated against real outcomes" actually means
A specific example. The InAnvil aggregation risk threshold for mAbs is set so that:
- If the prediction says "low risk" (≤ 0.5% HMW projected), v2.2 retrospective data shows that 93% of such molecules made it through process development without an aggregation issue.
- If the prediction says "medium risk" (> 0.5% to 2% HMW projected), the rate drops to ~70%.
- If the prediction says "high risk" (> 2% HMW projected), the rate drops below 40%.
These thresholds were not taken from a paper. They were chosen by examining v2.2 outcomes and placing the cut-points where the resulting risk calls are reliably actionable. As v2.2 grows, they are recalibrated.
That is what "calibrated against real outcomes" means in practice — and that is the layer a general-purpose LLM cannot provide.
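The banding and the retrospective check behind it can be sketched in a few lines. `risk_band` encodes the three bands above, treating exactly 0.5% as low and exactly 2% as medium (an assumption about the boundaries). `band_success_rates` computes, for each band, the fraction of past molecules that cleared process development without an aggregation issue. The history is a made-up toy sample, not v2.2 data, and the sketch does not show how the cut-points themselves are chosen.

```python
from collections import defaultdict

def risk_band(projected_hmw_pct: float) -> str:
    """Map a projected HMW percentage to the low / medium / high bands."""
    if projected_hmw_pct <= 0.5:
        return "low"
    if projected_hmw_pct <= 2.0:
        return "medium"
    return "high"

def band_success_rates(history: list[tuple[float, bool]]) -> dict[str, float]:
    """Per band: fraction of molecules that cleared without an aggregation issue."""
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for projected, ok in history:
        band = risk_band(projected)
        totals[band] += 1
        successes[band] += int(ok)
    return {band: successes[band] / totals[band] for band in totals}

# Toy, made-up history: (projected HMW %, cleared without aggregation issue?)
history = [(0.3, True), (0.4, True), (0.45, False),
           (1.2, True), (1.8, False), (3.0, False)]
print(band_success_rates(history))
```

One plausible reading of "recalibrated" in this framing: rerun the per-band calculation as the history grows, and revisit a cut-point when a band's success rate no longer supports the same call.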
How v2.2 evolves
Three update paths:
- Process completions. When a real CMC project completes inside Inscinstech or with a Bespoke partner (with consent), the outcome data updates the relevant entry.
- Regulatory updates. When the FDA, NMPA, or EMA publishes new guidance, v2.2 entries that reference the old version are flagged for review. We do not silently drift.
- Quarterly review. Every quarter, the team triages newly published literature for entries that should be added or updated.
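The regulatory path (flag for review, never rewrite silently) can be sketched as a lookup over the guidance each entry cites. The `GuidanceRef` fields, document IDs, and version strings below are hypothetical, and the sketch assumes any version other than the newly published one counts as outdated. The function only returns entry IDs for human review; it changes nothing on its own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GuidanceRef:
    agency: str   # "FDA", "NMPA", or "EMA"
    doc_id: str   # hypothetical guidance identifier
    version: str  # version the entry was written against

def flag_for_review(cited: dict[str, list[GuidanceRef]],
                    agency: str, doc_id: str, new_version: str) -> list[str]:
    """Return IDs of entries citing a different version of newly published guidance.

    Nothing is edited here: flagged entries go to human review.
    """
    return [
        entry_id
        for entry_id, refs in cited.items()
        if any(r.agency == agency and r.doc_id == doc_id and r.version != new_version
               for r in refs)
    ]

cited = {
    "VS-03": [GuidanceRef("FDA", "GUID-A", "2019")],
    "VS-04": [GuidanceRef("FDA", "GUID-A", "2025")],
    "FRM-02": [GuidanceRef("EMA", "GUID-B", "2021")],
}
print(flag_for_review(cited, "FDA", "GUID-A", "2025"))  # ['VS-03']
```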
Maintainer: the Inscinstech CMC team plus external scientific advisors. Review cadence: quarterly. Public changelog: a redacted version (no customer data, no unpublished IP) is being prepared for /resources/changelog.
How to access it
Tier-by-tier:
- Free / Pro: No direct access. v2.2 informs InAnvil and InForge outputs but the entries themselves are not retrievable.
- Team: Query access — InPrism can search v2.2 alongside other corpora and return cited paragraphs.
- Enterprise: Full retrieval. You can request specific entries and see the underlying outcome data (anonymized).
- Bespoke: Co-build relationship — your wet-lab outcomes calibrate v2.2 entries specific to your pipeline; you share IP on the resulting predictions.
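The tier list can be read as an ordered set of access levels. This sketch encodes Free through Enterprise as stated above. Placing Bespoke at full retrieval is an assumption, since the text describes Bespoke as a co-build relationship rather than an access level. All names are hypothetical.

```python
from enum import IntEnum

class Access(IntEnum):
    NONE = 0    # v2.2 informs outputs, but entries are not retrievable
    QUERY = 1   # InPrism search over v2.2, returns cited paragraphs
    FULL = 2    # request specific entries plus anonymized outcome data

# Tier-to-access table per the list above; Bespoke = FULL is an assumption.
TIER_ACCESS = {
    "Free": Access.NONE,
    "Pro": Access.NONE,
    "Team": Access.QUERY,
    "Enterprise": Access.FULL,
    "Bespoke": Access.FULL,
}

def can(tier: str, needed: Access) -> bool:
    """True if the tier's access level meets or exceeds the requested level."""
    return TIER_ACCESS[tier] >= needed

print(can("Team", Access.QUERY), can("Team", Access.FULL))  # True False
```

Using `IntEnum` keeps the levels ordered, so a single `>=` comparison expresses "this tier includes everything below it."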
Why this matters
If you have ever asked a general-purpose LLM "what's a reasonable Protein A elution buffer for this mAb?", you got a generic answer. The answer was probably correct — chromatography buffer chemistry has not changed much in 20 years. But generic answers do not encode the institutional knowledge: which resin you actually have in your facility, what your team has historically had problems with, what the FDA reviewers at the relevant Office have asked about in the past.
v2.2 is the encoding of that institutional knowledge for the customers and partners who choose to leverage it. It is the part that is hard to build and hard to copy. The agents are the surface; the corpus is the moat.
For the public-facing version of how this fits together: /products. For the technical specification of what each agent does with it: /products/inforge and /products/inanvil.