This notebook uses CellphoneDB to infer the cell-cell communication in the scRNA-Seq dataset GSE96981: Multilineage communication regulates human liver bud development from pluripotency
In the origin publication, the authors screened for the potential ligand-receptor interactions between Mesenchymal cells (MC), and Hepatic cell (HE) in a late-developed human liver bud. By counting the frequency of each MC-HE cell pair that express both the ligand and receptor for an interaction, they found a list of interactions used for experimental verification.
In this notebook, we use CellphoneDB v3 to screen for such potential interactions. One of the interaction that we found using CellPhoneDB, VEFGA signalling (VEFGA-KDR) was confirmed experimentally in the publication to be a driver of human liver bud development
# Import required packages
import pandas as pd
import scanpy as sc
import numpy as np
import sys
import os
Set the PATH
environment variable to run the cellphonedb
cli
os.environ['PATH'] = os.path.dirname(sys.executable) + ":" + os.environ['PATH']
We will use the dataset GSE96981_data.lb.late
, which is the scRNA-Seq data of late-developed human liver bud at five time points.
os.system('wget https://ftp.ncbi.nlm.nih.gov/geo/series/GSE96nnn/GSE96981/suppl/GSE96981_data.lb.late.csv.gz')
Read the expression data into a dataframe
liverbud_late = pd.read_csv('GSE96981_data.lb.late.csv.gz', index_col=0)
meta_cols = ['ident', 'tSNE_1', 'tSNE_2', 'PC1', 'PC2', 'PC3', 'PC4', 'PC5', 'nGene', 'orig.ident']
metadata = liverbud_late[meta_cols]
matrix = liverbud_late.drop(meta_cols, axis=1)
metadata
matrix
Create a scanpy object to store the data
adata_lblate = sc.AnnData(matrix, dtype=np.float32)
adata_lblate.obs = metadata.loc[:, ['ident', 'nGene', 'orig.ident']]
adata_lblate.obsm['X_pca'] = metadata.loc[:, ['PC1', 'PC2', 'PC3', 'PC4', 'PC5']].values
adata_lblate.obsm['X_tsne'] = metadata.loc[:, ['tSNE_1', 'tSNE_2']].values
adata_lblate
Get the cell type annotation from the authors
os.system('wget -O lb_late_celltype.tsv https://www.dropbox.com/s/1dh5b46ik6noqit/lb_late_celltype.tsv?dl=1')
adata_lblate.obs['Cell type'] = pd.read_csv('lb_late_celltype.tsv', sep='\t')['Cell type'].to_list()
sc.pl.tsne(adata_lblate, color=['orig.ident', 'Cell type'], size=50)
We plot some marker genes provided in the publication. The marker genes define three main populations in the dataset: mesenchymal cells (MC), epithelial cells (EC), and Hepatic cells (HC)
sc.pl.tsne(adata_lblate, color=['AFP', 'PECAM1', 'TTR', 'COL1A2'], size=50, ncols=2, color_map='YlOrRd')
Save the scanpy object to a h5ad file
adata_lblate.write_h5ad('liverbud_late.h5ad')
Save the cell annotations into a text file as well
adata_lblate.obs.to_csv('lb_late_meta.tsv', sep = '\t')
Execute cellphonedb
from system
os.system('cellphonedb method statistical_analysis \
lb_late_meta.tsv \
liverbud_late.h5ad \
--counts-data hgnc_symbol \
--output-path lb_late_cellphonedb \
--threshold 0.1')
os.system('ls -lah ./lb_late_cellphonedb')
p-values
for the all the interacting partners: p.value refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. The p-value results are stored in the pvalues.txt
file
pvalues = pd.read_csv('lb_late_cellphonedb/pvalues.txt', sep='\t')
pvalues.head()
Mean values for all the interacting partners are stored in the file means.txt
. Mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0.
means = pd.read_csv('lb_late_cellphonedb/means.txt', sep='\t')
means.set_index('interacting_pair', inplace=True)
means.head()
Extract the pvalues of all cell type pairs
pvalues_matrix = pvalues[[
'interacting_pair', 'EC|EC', 'MSC|MSC', 'MSC|EC', 'EC|MSC',
'Hepatic|MSC', 'MSC|Hepatic', 'Hepatic|Hepatic', 'EC|Hepatic', 'Hepatic|EC'
]].set_index('interacting_pair')
pvalues_matrix
Search for interactions related to one cell type pair
# We want to find significant interactions from Hepatic to epithelial cells
PAIR='Hepatic|EC'
PVALUE_THRESHOLD=0.05
sig_pair = pvalues_matrix[pvalues_matrix[PAIR]< PVALUE_THRESHOLD]
sig_pair[PAIR].sort_values()
Get the mean expression of both ligand and receptor
means.loc[sig_pair.index, PAIR].sort_values()
Construct the dataframe for the dotplot
pair_df = pd.DataFrame.from_dict({
'mean': means.loc[sig_pair.index][PAIR],
'pvalue': sig_pair[PAIR]
})
pair_df['pair'] = PAIR
pair_df.head()
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
The helper function for dotplot
def dotplot(df, figsize=(20, 1)):
mpl.rcParams['axes.grid'] = False
mpl.rcParams['legend.loc'] = 'upper center'
mpl.rcParams['legend.frameon'] = False
fig, ax = plt.subplots(figsize=figsize)
sns.set_style('whitegrid')
sns.scatterplot(x=df.index, y='pair',
size='mean', sizes = (100, 300),
hue=df['pvalue'], data=df, palette='crest')
plt.setp(ax.get_xticklabels(), rotation=90)
plt.legend(bbox_to_anchor=(1.04,0.5), loc='center left', borderaxespad=0)
plt.show()
We plot the potential interactions between two cell types: Hepatic and Epithelial cells. The size of the dot is correlated with the total mean
expression of the ligand and receptor. The color of the dot is related to the p-value
of the interaction
dotplot(pair_df)
We can filter out the interactions that involve protein complex to retain only simple ligand-receptor pair
simple_int = pair_df.index.str.endswith('complex')
simple_df = pair_df.loc[~simple_int]
simple_df.head()
dotplot(simple_df)
We can also filter the interactions by the mean expression threshold
THRESHOLD = 2
hvg_df = simple_df[simple_df['mean'] > THRESHOLD]
dotplot(hvg_df)
These significant interactions above could be a list of potential interactions for downstream verification. Actually, the VEGFA-KDR
interaction (VEGFA signalling) in the list was confirmed experimentally using a KDR inhibitor experiment. The experimental results show that LB development is impaired in the presence of KDR inhibitor.
Helper function to extract the top interactions
def get_top_interactions(pair, pvalue=0.05, mean_exp=2, only_simple=True):
"""
Extract the top interactions of one celltype pair
@param pair: cell type pair. e.g 'MC|EC'
@param pavalue: the pvalue threshold to filter significant interactions
@param mean_exp: the mean expression threshold to filter interactions
@param only_simple: whether only simple interactions are considered
@return list: the top interactions
"""
### filter by pvalues
sig_df = pvalues_matrix[pvalues_matrix[pair]< pvalue]
sig_df = sig_df[pair]
mean_df = means.loc[sig_df.index][pair]
pair_df = pd.DataFrame.from_dict({'mean': mean_df, 'pvalue': sig_df})
pair_df['pair'] = pair
### filter by mean
hvg_df = pair_df[pair_df['mean'] > mean_exp]
### filter by complex interaction
if only_simple:
simple_int = hvg_df.index.str.contains("complex")
hvg_df = hvg_df[~simple_int]
return hvg_df.index.to_list()
Get the top interactions of our interested cell type pairs
pairs = ['Hepatic|Hepatic', 'Hepatic|EC', 'Hepatic|MSC', 'MSC|EC', 'EC|MSC']
top_inters = {p: get_top_interactions(p) for p in pairs}
Get the unique set of interactions
shared_inters = []
for p in pairs:
shared_inters += top_inters[p]
shared_inters = list(set(shared_inters))
print("Total shared interactions: ", len(shared_inters))
Draw the heatmap of top interactions with values is p-values
top_df_pvalues = pvalues_matrix.loc[shared_inters, pairs]
sns.clustermap(top_df_pvalues, yticklabels=True, cmap="YlGnBu")
top_df_means = means.loc[shared_inters, pairs]
sns.clustermap(top_df_means, yticklabels=True, cmap="YlGnBu")
Draw the dotplot of the top interactions
### set the interacting_pair column
top_df_means['interacting_pair'] = top_df_means.index
top_df_pvalues['interacting_pair'] = top_df_pvalues.index
### Melt the dataframes
melt_df_means = top_df_means.melt(id_vars=['interacting_pair'], value_name="mean").set_index("interacting_pair")
melt_df_pvalues = top_df_pvalues.melt(id_vars=['interacting_pair'], value_name="pvalue").set_index("interacting_pair")
top_df = melt_df_means
top_df['pvalue'] = melt_df_pvalues['pvalue']
top_df = top_df.rename(columns={"variable": "pair"})
top_df = top_df.sort_values(["pair", "pvalue"])
Draw the dotplot of all cell type pairs
dotplot(top_df, figsize=(20, 4))