SCIB is a framework for benchmarking single-cell data integration methods. In the original publication, the authors benchmarked batch effect removal methods in terms of scalability and usability, and assessed them with a range of metrics covering both removal of technical (batch) effects and conservation of biological signal.

In this notebook, we will use the API from SCIB to run different batch effect removal methods on the human pancreas data from `GSE81076`, `GSE85241`, `GSE86469`, `GSE84133` and `GSE81608`, then compute the evaluation metrics for the integrated results.

In [1]:

```
import scanpy as sc
import pandas as pd
import os
import sys
# Required step before importing scib
RHOME = os.path.join(sys.executable.split('/bin/')[0],'lib/R')
from rpy2.rinterface_lib import openrlib
openrlib.R_HOME = RHOME
import scib
```

In [4]:

```
import urllib.request
pancreas_data, _ = urllib.request.urlretrieve(
'https://www.dropbox.com/s/xbu8a1uw36ip5ia/human_pancreas_norm_complexBatch.h5ad?dl=1',
'human_pancreas_norm_complexBatch.h5ad'
)
# Import adata
adata = sc.read_h5ad(pancreas_data)
integration_result = {}
bench_result = {}
BATCH_KEY = 'tech'
CELLTYPE_KEY = 'celltype'
```

Find highly-variable features

In [7]:

```
# Total-count normalize (library-size correct) to 10,000 reads/cell
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)
# log transform the data.
sc.pp.log1p(adata)
# identify highly variable genes.
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
# keep only highly variable genes:
adata = adata[:, adata.var['highly_variable']]
```

In this notebook, we will run five methods, **Harmony, BBKNN, Scanorama, Combat, Saucie**, for illustration purposes. SCIB provides a simple API to run other batch effect removal methods as below:

`scib.integration.<method>(adata, batch=<batch_key>)`

You just need to replace the method name in the function above with your method of choice. For example, the call to run `bbknn` with the batch key `batch` is `scib.integration.bbknn(adata, batch='batch')`.

The full list of supported Python methods is: **scanorama, trvae, trvaep, scgen, mnn, bbknn, scvi, scanvi, combat, saucie, desc**

Note that methods that involve a training step, such as trvae, trvaep, scgen, scvi and scanvi, will run faster on a GPU.

In [10]:

```
# Run PCA prior to running Harmony
sc.tl.pca(adata)
# Run Harmony via scanpy's external API (backed by harmonypy)
sc.external.pp.harmony_integrate(adata, BATCH_KEY)
```

In [11]:

```
eval_metrics = {
'isolated_labels_asw_': True,
'silhouette_':True,
'hvg_score_':True,
'graph_conn_':True,
'pcr_':True,
'isolated_labels_f1_':True,
'nmi_':True,
'ari_':True,
'cell_cycle_':True,
'kBET_':True,
'ilisi_':True,
'clisi_':True
}
```

In [12]:

```
# Compute the evaluation metrics for Harmony.
# adata carries both the unintegrated data and the Harmony embedding,
# so it is passed as both the original and the integrated object.
bench_result['harmony'] = scib.me.metrics(
adata,
adata,
batch_key = BATCH_KEY,
label_key = CELLTYPE_KEY,
verbose = True,
embed = 'X_pca_harmony',
organism = 'human',
**eval_metrics
)
```

In [13]:

```
bench_result['harmony'].rename(columns = {0 : 'harmony'}, inplace = True)
bench_result['harmony']
```

Out[13]:

In [14]:

```
# Run bbknn
integration_result['bbknn'] = scib.integration.bbknn(adata, BATCH_KEY)
# Eval bbknn
bench_result['bbknn'] = scib.me.metrics(
adata,
integration_result['bbknn'],
batch_key = BATCH_KEY,
label_key = CELLTYPE_KEY,
verbose = True, embed = 'X_pca',
organism = 'human',
**eval_metrics
)
```

In [15]:

```
# Rename column
bench_result['bbknn'].rename(columns = {0 : 'bbknn'}, inplace = True)
bench_result['bbknn']
```

Out[15]:

In [16]:

```
# Run scanorama
integration_result['scanorama'] = scib.integration.scanorama(adata, BATCH_KEY)
integration_result['scanorama'].obs[BATCH_KEY] = adata.obs[BATCH_KEY]
integration_result['scanorama'].obs[CELLTYPE_KEY] = adata.obs[CELLTYPE_KEY].astype('category')
# Bench scanorama
bench_result['scanorama'] = scib.me.metrics(
adata,
integration_result['scanorama'],
batch_key = BATCH_KEY,
label_key = CELLTYPE_KEY,
verbose = True,
embed = 'X_pca',
organism = 'human',
**eval_metrics
)
```

In [17]:

```
# Rename column
bench_result['scanorama'].rename(columns = {0 : 'scanorama'}, inplace = True)
bench_result['scanorama']
```

Out[17]:

In [18]:

```
# Run combat
integration_result['combat'] = scib.integration.combat(adata, BATCH_KEY)
# Bench combat
bench_result['combat'] = scib.me.metrics(
adata,
integration_result['combat'],
batch_key = BATCH_KEY,
label_key = CELLTYPE_KEY,
verbose = True,
embed = 'X_pca',
organism = 'human',
**eval_metrics
)
```

In [19]:

```
# Rename column
bench_result['combat'].rename(columns = {0 : 'combat'}, inplace = True)
bench_result['combat']
```

Out[19]:

In [20]:

```
# Run saucie
integration_result['saucie'] = scib.integration.saucie(adata, BATCH_KEY)
# Bench saucie
bench_result['saucie'] = scib.me.metrics(
adata,
integration_result['saucie'],
batch_key = BATCH_KEY,
label_key = CELLTYPE_KEY,
verbose = True,
embed = 'X_pca',
organism = 'human',
**eval_metrics
)
```

In [21]:

```
# Rename column
bench_result['saucie'].rename(columns = {0 : 'saucie'}, inplace = True)
bench_result['saucie']
```

Out[21]:

In [22]:

```
merged_benchscores = pd.concat(bench_result.values(), axis = 1)
```

In [23]:

```
# Metrics for batch effect removal
batch_effect_removal = {
'PCR_batch',
'ASW_label/batch',
'graph_conn',
'iLISI',
}
# Metrics for conservation of biological variance
biological_conservation_metrics = {
'NMI_cluster/label',
'ARI_cluster/label',
'ASW_label',
'cLISI',
'isolated_label_F1',
'isolated_label_silhouette',
'cell_cycle_conservation',
'hvg_overlap'
}
```

The overall score is computed as a weighted mean: 40% for the batch correction metrics and 60% for the biological conservation metrics.

In [24]:

```
total_scores = {}
for k in merged_benchscores.keys():
batch_score = merged_benchscores[k].loc[batch_effect_removal].sum()/len(batch_effect_removal)*0.4
conservation_score = merged_benchscores[k].loc[biological_conservation_metrics].sum()/len(biological_conservation_metrics)*0.6
total_scores[k] = batch_score + conservation_score
```

In [25]:

```
totalscore_df = pd.DataFrame.from_dict({"Total score": total_scores}, orient="index")
merged_benchscores = pd.concat([merged_benchscores, totalscore_df])
merged_benchscores = merged_benchscores.T
```

In [26]:

```
# Move Total score column up
cols = merged_benchscores.columns.tolist()
cols = cols[-1:] + cols[:-1]
merged_benchscores = merged_benchscores[cols]
```

In [27]:

```
# Sort benchscore table by Total score
merged_benchscores.sort_values(by = 'Total score', ascending = False)
```

Out[27]:

All evaluation metrics below are scaled from 0 to 1. Higher is better.

Batch identity is a covariate that should not contribute to the variance of the expression data. For each principal component (PC) computed from the expression data, we fit a linear regression with the PC as the response and the one-hot-encoded batch identity as the covariate. The variance-weighted average of the resulting $R^2$ values over all PCs gives the contribution of batch to the variance of the expression data.

The PCR score computed by `scib` is based on the difference in this batch contribution before and after batch effect removal.
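The regression step can be sketched with toy data; the variable names and the additive batch shift below are illustrative, not the scib implementation:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Toy data: 200 cells x 50 genes with a strong additive batch shift
batch = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 50)) + batch[:, None] * 2.0

pca = PCA(n_components=10).fit(X)
pcs = pca.transform(X)
onehot = np.eye(2)[batch]  # one-hot encoded batch covariate

# R^2 of the batch -> PC regression, for each PC
r2 = np.array([
    LinearRegression().fit(onehot, pcs[:, i]).score(onehot, pcs[:, i])
    for i in range(pcs.shape[1])
])
# Variance-weighted average R^2 = batch contribution to total variance
var = pca.explained_variance_
batch_contribution = float((r2 * var).sum() / var.sum())
print(batch_contribution)  # large here, since the batch shift dominates PC1
```

Running the same computation before and after integration and comparing the two contributions gives the PCR score.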

In [28]:

```
merged_benchscores[['PCR_batch']].sort_values(by = 'PCR_batch', ascending = False)
```

Out[28]:

Assesses whether the kNN graph representation (of the integrated data) directly connects all cells with the same cell identity label.
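The idea can be sketched on a tiny adjacency matrix (a toy construction, not scib's code): for each label, take the subgraph of cells carrying that label and measure the fraction of them inside its largest connected component.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy kNN graph over 6 cells; cell 5 is disconnected from its label group
edges = [(0, 1), (1, 2), (3, 4)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1
labels = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

per_label = []
for lab in np.unique(labels):
    mask = labels == lab
    sub = csr_matrix(adj[np.ix_(mask, mask)])
    n_comp, comp = connected_components(sub, directed=False)
    # fraction of this label's cells inside its largest connected component
    per_label.append(np.bincount(comp).max() / mask.sum())
graph_conn = float(np.mean(per_label))
print(graph_conn)  # 'A' fully connected (1.0), 'B' split 2/3 -> mean 5/6
```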

In [29]:

```
merged_benchscores[['graph_conn']].sort_values(by = 'graph_conn', ascending = False)
```

Out[29]:

Determines whether the label composition of a k nearest neighborhood of a cell is similar to the expected (global) label composition. The test is repeated for a random subset of cells, and the results are summarized as a rejection rate over all tested neighborhoods.
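A toy approximation of this test, using a chi-squared goodness-of-fit check per neighborhood (the real kBET implementation differs in its test statistic and neighborhood handling):

```python
import numpy as np
from scipy.stats import chisquare
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# Toy embedding: two batches of 150 cells that overlap completely (well mixed)
batch = np.repeat([0, 1], 150)
emb = rng.normal(size=(300, 2))

k = 25
nn = NearestNeighbors(n_neighbors=k).fit(emb)
_, idx = nn.kneighbors(emb)

expected = np.bincount(batch) / len(batch) * k  # global batch composition
tested = rng.choice(len(batch), size=100, replace=False)  # random subset
rejected = 0
for i in tested:
    observed = np.bincount(batch[idx[i]], minlength=2)
    _, p = chisquare(observed, f_exp=expected)
    rejected += p < 0.05
rejection_rate = rejected / len(tested)
print(rejection_rate)  # low for well-mixed data
```

kBET summarizes such rejection rates over all tested neighborhoods; a low rejection rate indicates good batch mixing.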

In [30]:

```
merged_benchscores[['kBET']].sort_values(by = 'kBET', ascending = False)
```

Out[30]:

Compares the overlap of two cell identities in the integrated dataset, Louvain cluster cell label and cell-type label. NMI scores of 0 or 1 correspond to uncorrelated clustering or a perfect match, respectively.

Louvain clustering was performed at resolutions from `0.1` to `2` in steps of `0.1`, and the clustering output with the highest NMI with the label set was used.

**scib** uses the NMI implementation from `scikit-learn (v.0.22.1)`
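A minimal illustration of the two endpoints with scikit-learn (toy labels, no Louvain resolution sweep):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 250)            # 4 cell types, 1000 cells
random_clusters = rng.integers(0, 4, size=1000)  # clustering independent of labels

perfect_nmi = nmi(labels, labels)          # clustering identical to labels
random_nmi = nmi(labels, random_clusters)
print(perfect_nmi)  # 1.0 for a perfect match
print(random_nmi)   # close to 0 for uncorrelated clustering
```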

In [31]:

```
merged_benchscores[['NMI_cluster/label']].sort_values(by = 'NMI_cluster/label', ascending = False)
```

Out[31]:

Compares the overlap of two cell identities in the integrated dataset: the NMI-optimized Louvain clustering and the cell-type labels. An `ARI` of 0 or 1 corresponds to random labeling or a perfect match, respectively.

**scib** uses the ARI implementation from `scikit-learn (v.0.22.1)`
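Unlike the raw Rand index, ARI is chance-adjusted, so a random labeling scores near 0 (toy illustration):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score as ari

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 250)            # 4 cell types, 1000 cells
random_clusters = rng.integers(0, 4, size=1000)

perfect_ari = ari(labels, labels)
random_ari = ari(labels, random_clusters)
print(perfect_ari)  # 1.0 for a perfect match
print(random_ari)   # ~0 for random labeling (chance-adjusted)
```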

In [32]:

```
merged_benchscores[['ARI_cluster/label']].sort_values(by = 'ARI_cluster/label', ascending = False)
```

Out[32]:

ASW is commonly used to determine the separation of clusters, where 1 represents dense, well-separated clusters, while 0 or −1 corresponds to overlapping clusters or strong misclassification. For more information, please see: "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis".

The silhouette width measures the relationship between the `within-cluster` distances of a cell and the `between-cluster` distances of that cell to the `closest` cluster. Averaging over all silhouette widths of a set of cells yields the ASW, which ranges between −1 and 1.

Cell-type ASW (`ASW_label`) is the silhouette computed on the cell-type labels, while batch ASW (`ASW_label/batch`) measures batch mixing.
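A toy silhouette computation with scikit-learn; the [0, 1] rescaling shown here mirrors how scib reports silhouette-based scores (an assumption on my part; check scib's documentation for its exact scaling):

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two tight, well-separated toy clusters in 2-D
X = np.vstack([rng.normal(0, 0.2, size=(50, 2)),
               rng.normal(5, 0.2, size=(50, 2))])
labels = np.repeat([0, 1], 50)

asw = silhouette_score(X, labels)  # raw silhouette, in [-1, 1]
scaled = (asw + 1) / 2             # rescaled to [0, 1], higher is better
print(scaled)                      # close to 1 for well-separated clusters
```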

In [33]:

```
# ASW_label/batch
merged_benchscores[['ASW_label/batch']].sort_values(by = 'ASW_label/batch', ascending = False)
```

Out[33]:

In [34]:

```
# ASW_label
merged_benchscores[['ASW_label']].sort_values(by = 'ASW_label', ascending = False)
```

Out[34]:

LISI is a diversity score assessing batch mixing (iLISI) and cell-type separation (cLISI). This evaluation metric was proposed by Korsunsky et al. 2019 (Harmony).

`iLISI` defines the effective number of datasets in a neighborhood. Neighborhoods represented by only a single dataset get an iLISI of 1, while neighborhoods with an equal number of cells from 2 datasets get an iLISI of 2. Note that even under ideal mixing, if the datasets have different numbers of cells, iLISI will be less than 2.

To assess accuracy, the authors use "cell-type LISI" (cLISI), the same mathematical measure, but applied to cell-type labels instead of dataset labels. Accurate integration should maintain a cLISI of 1, reflecting a separation of unique cell types throughout the embedding.

To extend LISI to graph-based integration outputs, the developers of `scib` developed **graph LISI**, which uses the integrated graph structure as an embedded space for distance calculation.
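The core of LISI is an inverse Simpson's index over the label composition of a neighborhood. The real method weights neighbors with a Gaussian kernel; this sketch uses raw counts:

```python
import numpy as np

def inverse_simpson(neighborhood_labels):
    # Effective number of categories (batches or cell types) in a neighborhood
    _, counts = np.unique(neighborhood_labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 / np.sum(p ** 2))

one_batch = inverse_simpson([0] * 30)            # 1.0: a single batch
balanced = inverse_simpson([0] * 15 + [1] * 15)  # 2.0: two equally mixed batches
skewed = inverse_simpson([0] * 25 + [1] * 5)     # between 1 and 2: unbalanced mixing
print(one_batch, balanced, skewed)
```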

In [35]:

```
# iLISI
merged_benchscores[['iLISI']].sort_values(by = 'iLISI', ascending = False)
```

Out[35]:

In [36]:

```
# cLISI
merged_benchscores[['cLISI']].sort_values(by = 'cLISI', ascending = False)
```

Out[36]:

Evaluates how well the isolated cell label (the cell-type label present in the fewest batches) separates from other cell identities. There are two versions: the best clustering of the isolated label (F1 score) and the global ASW (average silhouette width) of the isolated label. Both yield a value between 0 and 1, where 1 means that all cells carrying the isolated label, and no others, are captured in one cluster.
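The F1 variant can be sketched as scoring each cluster as a one-vs-rest predictor of the isolated label and keeping the best-scoring cluster (toy labels below; scib's own optimization differs in detail):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy example: 'delta' is the isolated label; cluster 2 captures it exactly
labels   = np.array(['alpha'] * 40 + ['beta'] * 40 + ['delta'] * 20)
clusters = np.array([0] * 40 + [1] * 40 + [2] * 20)

is_isolated = labels == 'delta'
# Treat membership in each cluster as a prediction of the isolated label,
# then keep the best cluster
best_f1 = max(f1_score(is_isolated, clusters == c) for c in np.unique(clusters))
print(best_f1)  # 1.0: one cluster contains exactly the isolated cells
```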

In [37]:

```
# isolated_label_F1
merged_benchscores[['isolated_label_F1']].sort_values(by = 'isolated_label_F1', ascending = False)
```

Out[37]:

In [38]:

```
# isolated_label_silhouette
merged_benchscores[['isolated_label_silhouette']].sort_values(by = 'isolated_label_silhouette', ascending = False)
```

Out[38]:

Evaluates how well the cell-cycle effect is conserved before and after integration. Values close to 0 indicate low conservation, while 1 indicates complete conservation of the variance explained by the cell cycle.

The authors computed cell-cycle scores using Scanpy's `score_genes_cell_cycle` function with a reference gene set from `Tirosh et al.`, then computed the variance contribution of the resulting S and G2/M phase scores using principal component regression (PCs as the response values and the `G2/M` and `S` scores as the covariates), performed for each batch separately.

Then the differences in variance before, $Var_{before}$, and after, $Var_{after}$, integration were aggregated into a final score between 0 and 1, using the equation

$${\mathrm{CC\ conservation}} = 1 - \frac{\left| {\mathrm{Var}}_{\mathrm{after}} - {\mathrm{Var}}_{\mathrm{before}} \right|}{{\mathrm{Var}}_{\mathrm{before}}}$$

In [39]:

```
merged_benchscores[['cell_cycle_conservation']].sort_values(by = 'cell_cycle_conservation', ascending = False)
```

Out[39]:
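Plugging numbers into the formula above (the variance contributions here are hypothetical):

```python
def cc_conservation(var_before, var_after):
    # CC conservation = 1 - |Var_after - Var_before| / Var_before
    return 1 - abs(var_after - var_before) / var_before

print(cc_conservation(0.05, 0.05))            # 1.0: variance fully conserved
print(round(cc_conservation(0.05, 0.02), 3))  # 0.4: partial conservation
```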

Computes the overlap of highly variable genes (HVGs) identified before and after correction within each batch. The overall HVG score is the mean of the per-batch HVG overlap coefficients.
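An overlap coefficient between two gene sets can be sketched as follows (hypothetical gene lists; not scib's exact per-batch procedure):

```python
def overlap_coefficient(before, after):
    # |A ∩ B| / min(|A|, |B|): 1.0 when the smaller set is fully contained
    before, after = set(before), set(after)
    return len(before & after) / min(len(before), len(after))

hvg_before = ['INS', 'GCG', 'SST', 'PPY', 'KRT19']   # hypothetical HVG lists
hvg_after  = ['INS', 'GCG', 'SST', 'REG1A', 'PRSS1']
score = overlap_coefficient(hvg_before, hvg_after)
print(score)  # 0.6: 3 of the 5 genes are retained
```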

In [40]:

```
merged_benchscores[['hvg_overlap']].sort_values(by = 'hvg_overlap', ascending = False)
```

Out[40]:

To illustrate the integration results of the different methods, we draw UMAP plots with cells colored by batch and by cell type.

In [41]:

```
sc.tl.umap(integration_result['bbknn'])
sc.pl.umap(integration_result['bbknn'], color = ['tech', 'celltype'])
```

In [42]:

```
sc.tl.umap(integration_result['combat'])
sc.pl.umap(integration_result['combat'], color = ['tech', 'celltype'])
```

In [43]:

```
sc.tl.umap(integration_result['saucie'])
sc.pl.umap(integration_result['saucie'], color = ['tech', 'celltype'])
```

In [44]:

```
sc.tl.umap(integration_result['scanorama'])
sc.pl.umap(integration_result['scanorama'], color = ['tech', 'celltype'])
```

In [45]:

```
# Find neighbors for adata using the Harmony embedding
sc.pp.neighbors(adata, use_rep="X_pca_harmony")
```

In [46]:

```
sc.tl.umap(adata)
sc.pl.umap(adata, color = ['tech', 'celltype'])
```