scRNA-seq#
Here, we will:
- read a single .h5ad file as an AnnData and seed a versioned dataset with it
- append a new data batch (a new .h5ad file) to create a new version of the dataset
- look at an overview of ingested files and cell markers
- query the data and store analytical results as plots
- annotate the data by a cell type prediction
- discuss migrating a lakehouse of files to a single TileDB SOMA store of the same data
Setup#
!lamin init --storage ./test-scrna --schema bionty
saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-29 14:44:56)
saved: Storage(id='975nKuX0', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-09-29 14:44:56, created_by_id='DzTjkKse')
loaded instance: testuser1/test-scrna
did not register local instance on hub (if you want, call `lamin register`)
import lamindb as ln
import lnschema_bionty as lb
import pandas as pd
ln.track()
loaded instance: testuser1/test-scrna (lamindb 0.54.3)
notebook imports: lamindb==0.54.3 lnschema_bionty==0.31.2 pandas==1.5.3
Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-09-29 14:44:58, created_by_id='DzTjkKse')
Run(id='Pd5UweAXC3cCH1aOMdr1', run_at=2023-09-29 14:44:58, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
Access#
Let us look at the data of Conde et al., Science (2022).
These data are available in standardized form from the CellxGene data portal.
Here, we'll use it to seed a growing in-house store of scRNA-seq data managed with the corresponding metadata in LaminDB registries.
Note
If you're not interested in managing large collections of in-house data and you'd just like to query public data, please take a look at CellxGene census, which exposes all datasets hosted in the data portal as a concatenated TileDB SOMA store.
lb.settings.species = "human"
By calling ln.dev.datasets.anndata_human_immune_cells below, we download the dataset from the CellxGene portal and pre-populate some LaminDB registries.
adata = ln.dev.datasets.anndata_human_immune_cells(
    populate_registries=True  # this pre-populates registries
)
adata
AnnData object with n_obs × n_vars = 1648 × 36503
    obs: 'donor', 'tissue', 'cell_type', 'assay'
    var: 'feature_is_filtered', 'feature_reference', 'feature_biotype'
    uns: 'cell_type_ontology_term_id_colors', 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'
This AnnData is already standardized using the same public ontologies underlying lnschema-bionty, so we expect validation to be simple.
Nonetheless, LaminDB focuses on building clean in-house registries.
Note
In the next notebook, we'll look at the more difficult case of a non-standardized dataset that requires curation.
Validate#
Validate genes in .var#
lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
148 gene identifiers can't be validated because they are not currently in the Gene registry. Let's inspect them to see what to do:
inspector = lb.Gene.inspect(adata.var.index, lb.Gene.ensembl_gene_id)
148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
detected 35 Gene terms in Bionty for ensembl_gene_id: 'ENSG00000274175', 'ENSG00000276017', 'ENSG00000198712', 'ENSG00000277196', 'ENSG00000273748', 'ENSG00000198786', 'ENSG00000198727', 'ENSG00000274792', 'ENSG00000276345', 'ENSG00000212907', 'ENSG00000277475', 'ENSG00000198804', 'ENSG00000276760', 'ENSG00000278633', 'ENSG00000198938', 'ENSG00000198886', 'ENSG00000277400', 'ENSG00000198899', 'ENSG00000198695', 'ENSG00000278704', ...
add records from Bionty to your Gene registry via .from_values()
couldn't validate 113 terms: 'ENSG00000273837', 'ENSG00000215271', 'ENSG00000271870', 'ENSG00000258808', 'ENSG00000270672', 'ENSG00000280710', 'ENSG00000272880', 'ENSG00000272354', 'ENSG00000263464', 'ENSG00000244952', 'ENSG00000259820', 'ENSG00000256892', 'ENSG00000254561', 'ENSG00000286228', 'ENSG00000268955', 'ENSG00000262668', 'ENSG00000272267', 'ENSG00000280095', 'ENSG00000227902', 'ENSG00000233776', ...
if you are sure, create new records via ln.Gene() and save to your registry
The log says that 35 of the non-validated IDs can be found in the Bionty reference. Let's register them:
records = lb.Gene.from_values(inspector.non_validated, lb.Gene.ensembl_gene_id)
ln.save(records)
did not create Gene records for 113 non-validated ensembl_gene_ids: 'ENSG00000112096', 'ENSG00000182230', 'ENSG00000203812', 'ENSG00000204092', 'ENSG00000215271', 'ENSG00000221995', 'ENSG00000224739', 'ENSG00000224745', 'ENSG00000225932', 'ENSG00000226377', 'ENSG00000226380', 'ENSG00000226403', 'ENSG00000227021', 'ENSG00000227220', 'ENSG00000227902', 'ENSG00000228139', 'ENSG00000228906', 'ENSG00000229352', 'ENSG00000231575', 'ENSG00000232196', ...
The remaining 113 are legacy IDs that are not present in the current Ensembl assembly (e.g. ENSG00000112096).
We'd still like to register them, but won't dive into the details of converting them from an old Ensembl version to the current one.
validated = lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id)
records = [lb.Gene(ensembl_gene_id=gene_id) for gene_id in adata.var.index[~validated]]
ln.save(records)
113 terms (0.30%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
Now all genes pass validation:
lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
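The validate, inspect and register loop above can be pictured as a three-way triage of identifiers. The following is a minimal conceptual sketch in plain Python, not LaminDB's implementation: the function name triage and the two sets standing in for the in-house registry and the public reference are hypothetical, and the toy membership is illustrative only.

```python
# Conceptual sketch only: plain sets stand in for the in-house Gene registry
# and the public reference. All names here are hypothetical.
def triage(identifiers, registry, public_reference):
    """Split identifiers into validated / found in reference / unknown."""
    validated, in_reference, unknown = [], [], []
    for identifier in identifiers:
        if identifier in registry:
            validated.append(identifier)
        elif identifier in public_reference:
            in_reference.append(identifier)
        else:
            unknown.append(identifier)
    return validated, in_reference, unknown


# Toy data: which set each ID belongs to is an assumption for illustration.
registry = {"ENSG00000170477", "ENSG00000257935"}
public_reference = registry | {"ENSG00000198712"}
measured = ["ENSG00000170477", "ENSG00000198712", "ENSG00000112096"]

validated, in_reference, unknown = triage(measured, registry, public_reference)

# Step 1: register everything the public reference knows about.
registry.update(in_reference)
# Step 2: register the remaining identifiers as new records,
# mirroring what the notebook does for the legacy IDs.
registry.update(unknown)

all_validated = all(identifier in registry for identifier in measured)
print(validated, in_reference, unknown, all_validated)
```

In the notebook's numbers, the same split corresponds to 148 non-validated identifiers, of which 35 are found in the reference and 113 are registered as new records.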
Our in-house Gene registry provides rich metadata for each gene measured in the AnnData:
lb.Gene.filter().df().head(10)
id | symbol | stable_id | ensembl_gene_id | ncbi_gene_ids | biotype | description | synonyms | species_id | bionty_source_id | updated_at | created_by_id
---|---|---|---|---|---|---|---|---|---|---|---
10U0zkroKVT0 | KRT4 | None | ENSG00000170477 | 3851 | protein_coding | keratin 4 [Source:HGNC Symbol;Acc:HGNC:6441] | CK4|CYK4|K4 | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
K3Wk2Weq7nAK | LHX5-AS1 | None | ENSG00000257935 | | lncRNA | LHX5 antisense RNA 1 [Source:HGNC Symbol;Acc:H... | LOCUS4010 | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
wflpKBzsK0Av | OPN1LW | None | ENSG00000102076 | 5956 | protein_coding | opsin 1, long wave sensitive [Source:HGNC Symb... | CBP|RCP|COD5|CBBM | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
w9e5Qnb9wKc6 | STYX | None | ENSG00000198252 | 6815 | protein_coding | serine/threonine/tyrosine interacting protein ... | | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
0d0GdvYbsgD6 | CCDC158 | None | ENSG00000163749 | 339965 | protein_coding | coiled-coil domain containing 158 [Source:HGNC... | FLJ25770 | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
Ye0yKQnymf0P | None | None | ENSG00000286914 | | lncRNA | novel transcript | | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
PXHy9K7Pa0xj | MKNK1-AS1 | None | ENSG00000269956 | 100507423 | lncRNA | MKNK1 antisense RNA 1 [Source:HGNC Symbol;Acc:... | | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
lCG4pWbZk0Hc | None | None | ENSG00000255021 | | lncRNA | novel transcript | | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
ZWC2HZdTOTdH | None | None | ENSG00000258922 | | lncRNA | novel transcript | | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
ykfHbfg3YTFv | None | None | ENSG00000250237 | | lncRNA | novel transcript | | uHJU | 3xbK | 2023-09-29 14:45:09 | DzTjkKse
There are about 36k genes in the registry, all for species "human".
lb.Gene.filter().df().shape
(36503, 11)
Validate metadata in .obs#
adata.obs.columns
Index(['donor', 'tissue', 'cell_type', 'assay'], dtype='object')
ln.Feature.validate(adata.obs.columns)
1 term (25.00%) is not validated for name: donor
array([False, True, True, True])
One feature is not validated: "donor". Let's register it:
feature = ln.Feature(name="donor", type="category", registries=[ln.ULabel])
ln.save(feature)
Tip
You can also use features = ln.Feature.from_df(df) to bulk create features with types.
All metadata columns are now validated:
ln.Feature.validate(adata.obs.columns)
array([ True, True, True, True])
Next, let's validate the corresponding labels of each feature.
Some of the metadata labels can be typed using dedicated registries like CellType:
validated = lb.CellType.validate(adata.obs.cell_type)
received 32 unique terms, 1616 empty/duplicated terms are ignored
2 terms (6.20%) are not validated for name: germinal center B cell, megakaryocyte
Register the non-validated cell types; they can all be loaded from a public ontology through Bionty:
records = lb.CellType.from_values(adata.obs.cell_type[~validated], "name")
ln.save(records)
now recursing through parents: this only happens once, but is much slower than bulk saving
lb.ExperimentalFactor.validate(adata.obs.assay)
lb.Tissue.validate(adata.obs.tissue);
Because we didn't mount a custom schema that contains a Donor registry, we use the ULabel registry to track donor ids:
ln.ULabel.validate(adata.obs.donor);
received 12 unique terms, 1636 empty/duplicated terms are ignored
12 terms (100.00%) are not validated for name: D496, 621B, A29, A36, A35, 637C, A52, A37, D503, 640C, A31, 582C
Donor labels are not validated, so let's register them:
donors = [ln.ULabel(name=name) for name in adata.obs.donor.unique()]
ln.save(donors)
ln.ULabel.validate(adata.obs.donor);
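The log lines above report unique terms separately from ignored ones: for the donor column, 12 unique terms plus 1636 ignored entries add up to the 1648 cells in the AnnData. A minimal sketch of that bookkeeping in plain Python follows. The helper summarize_terms and the toy column are hypothetical and only illustrate the idea; they are not LaminDB's implementation.

```python
def summarize_terms(values, registry):
    """Return (unique_terms, n_ignored, not_validated) for a column of labels."""
    # Keep first-seen order while dropping empty values and duplicates.
    unique_terms = list(dict.fromkeys(v for v in values if v))
    n_ignored = len(values) - len(unique_terms)
    not_validated = [term for term in unique_terms if term not in registry]
    return unique_terms, n_ignored, not_validated


# Toy per-cell donor column: one value per cell, with repeats and one empty entry.
donor_column = ["D496", "621B", "D496", "A29", "", "621B"]
label_registry = set()  # stands in for an initially empty label registry

unique_terms, n_ignored, not_validated = summarize_terms(donor_column, label_registry)

# Registering the missing labels makes a second validation pass come back clean.
label_registry.update(not_validated)
_, _, still_missing = summarize_terms(donor_column, label_registry)
print(unique_terms, n_ignored, not_validated, still_missing)
```

Validating only the unique terms keeps the check proportional to the number of distinct labels rather than the number of cells.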
Register#
modalities = ln.Modality.lookup()
experimental_factors = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
features = ln.Feature.lookup()
Register data#
When we create a File object from an AnnData, we'll automatically link its feature sets and get information about unmapped categories:
file = ln.File.from_anndata(
    adata, description="Conde22", field=lb.Gene.ensembl_gene_id, modality=modalities.rna
)
file.save()
The file has the following 2 linked feature sets:
file.features
Features:
var: FeatureSet(id='EJQhGDAUMVCAW7a878Is', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-29 14:45:40, modality_id='Tkw6vO00', created_by_id='DzTjkKse')
'KRT4', 'LHX5-AS1', 'OPN1LW', 'STYX', 'CCDC158', 'None', 'None', 'MKNK1-AS1', 'None', 'None', 'LNPEP', 'LINC02485', 'None', 'None', 'IGHEP2', 'CYFIP1', 'A4GNT', 'SLC14A2', 'None', 'None', ...
obs: FeatureSet(id='KEEZXO20pmTjLPROaTDE', n=4, registry='core.Feature', hash='NUCABLKrrAle7o2cv7hj', updated_at=2023-09-29 14:45:45, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
donor (0, core.ULabel):
cell_type (0, bionty.CellType):
assay (0, bionty.ExperimentalFactor):
tissue (0, bionty.Tissue):
Register metadata links#
Let us first link external labels for the entire file:
file.labels.add(species.human, feature=features.species)
file.labels.add(experimental_factors.single_cell_rna_sequencing, feature=features.assay)
Next, we parse the columns of adata.obs for additional metadata:
file.labels.add(adata.obs.cell_type, feature=features.cell_type)
file.labels.add(adata.obs.assay, feature=features.assay)
file.labels.add(adata.obs.tissue, feature=features.tissue)
file.labels.add(adata.obs.donor, feature=features.donor)
file.features
Features:
var: FeatureSet(id='EJQhGDAUMVCAW7a878Is', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-29 14:45:40, modality_id='Tkw6vO00', created_by_id='DzTjkKse')
'KRT4', 'LHX5-AS1', 'OPN1LW', 'STYX', 'CCDC158', 'None', 'None', 'MKNK1-AS1', 'None', 'None', 'LNPEP', 'LINC02485', 'None', 'None', 'IGHEP2', 'CYFIP1', 'A4GNT', 'SLC14A2', 'None', 'None', ...
obs: FeatureSet(id='KEEZXO20pmTjLPROaTDE', n=4, registry='core.Feature', hash='NUCABLKrrAle7o2cv7hj', updated_at=2023-09-29 14:45:45, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
donor (12, core.ULabel): '621B', '640C', 'D496', 'A52', 'A36', '582C', 'A29', 'D503', 'A37', 'A31', ...
cell_type (32, bionty.CellType): 'progenitor cell', 'T follicular helper cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'dendritic cell, human', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'classical monocyte', 'regulatory T cell', 'group 3 innate lymphoid cell', 'CD4-positive helper T cell', 'non-classical monocyte', ...
assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v2', '10x 3' v3', '10x 5' v1'
tissue (17, bionty.Tissue): 'mesenteric lymph node', 'lung', 'sigmoid colon', 'thymus', 'lamina propria', 'bone marrow', 'ileum', 'jejunal epithelium', 'spleen', 'thoracic lymph node', ...
external: FeatureSet(id='wc2uklEtrF5kDstnGzpN', n=1, registry='core.Feature', hash='wXguPMg-nDtWwOHbNrZ_', updated_at=2023-09-29 14:45:47, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
species (1, bionty.Species): 'human'
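The per-feature counts in the output above can be read as the size of a set of labels kept for each feature. Here is a minimal sketch of such a structure in plain Python. The class LabelLinks is a hypothetical stand-in for illustration, not the object behind file.labels, and the repeated value in the toy assay column is made up to show deduplication.

```python
from collections import defaultdict


class LabelLinks:
    """Hypothetical stand-in for the label links of one file."""

    def __init__(self):
        # feature name -> set of distinct labels linked under that feature
        self._by_feature = defaultdict(set)

    def add(self, labels, feature):
        # Accept a single label or an iterable of per-cell labels.
        if isinstance(labels, str):
            labels = [labels]
        self._by_feature[feature].update(labels)

    def summary(self):
        """Number of distinct labels per feature."""
        return {feature: len(labels) for feature, labels in self._by_feature.items()}


links = LabelLinks()
# File-level labels that apply to the whole file.
links.add("human", feature="species")
links.add("single-cell RNA sequencing", feature="assay")
# Labels parsed from a per-cell column; duplicates collapse into the set.
links.add(["10x 5' v2", "10x 3' v3", "10x 5' v1", "10x 3' v3"], feature="assay")

summary = links.summary()
print(summary)
```

Under this reading, a file-level assay label and the assay labels parsed from the per-cell column end up in the same set, which is consistent with the four assay labels listed above.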
The file is now queryable by everything we linked:
file.describe()
File(id='nV0w72HVEfJeK6lgb7BO', suffix='.h5ad', accessor='AnnData', description='Conde22', size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', updated_at=2023-09-29 14:45:45)
Provenance:
storage: Storage(id='975nKuX0', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-09-29 14:44:56, created_by_id='DzTjkKse')
transform: Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-09-29 14:45:40, created_by_id='DzTjkKse')
run: Run(id='Pd5UweAXC3cCH1aOMdr1', run_at=2023-09-29 14:44:58, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-29 14:44:56)
Features:
var: FeatureSet(id='EJQhGDAUMVCAW7a878Is', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-29 14:45:40, modality_id='Tkw6vO00', created_by_id='DzTjkKse')
'KRT4', 'LHX5-AS1', 'OPN1LW', 'STYX', 'CCDC158', 'None', 'None', 'MKNK1-AS1', 'None', 'None', 'LNPEP', 'LINC02485', 'None', 'None', 'IGHEP2', 'CYFIP1', 'A4GNT', 'SLC14A2', 'None', 'None', ...
obs: FeatureSet(id='KEEZXO20pmTjLPROaTDE', n=4, registry='core.Feature', hash='NUCABLKrrAle7o2cv7hj', updated_at=2023-09-29 14:45:45, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
donor (12, core.ULabel): '621B', '640C', 'D496', 'A52', 'A36', '582C', 'A29', 'D503', 'A37', 'A31', ...
cell_type (32, bionty.CellType): 'progenitor cell', 'T follicular helper cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'dendritic cell, human', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'classical monocyte', 'regulatory T cell', 'group 3 innate lymphoid cell', 'CD4-positive helper T cell', 'non-classical monocyte', ...
assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v2', '10x 3' v3', '10x 5' v1'
tissue (17, bionty.Tissue): 'mesenteric lymph node', 'lung', 'sigmoid colon', 'thymus', 'lamina propria', 'bone marrow', 'ileum', 'jejunal epithelium', 'spleen', 'thoracic lymph node', ...
external: FeatureSet(id='wc2uklEtrF5kDstnGzpN', n=1, registry='core.Feature', hash='wXguPMg-nDtWwOHbNrZ_', updated_at=2023-09-29 14:45:47, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
species (1, bionty.Species): 'human'
Labels:
species (1, bionty.Species): 'human'
tissues (17, bionty.Tissue): 'mesenteric lymph node', 'lung', 'sigmoid colon', 'thymus', 'lamina propria', 'bone marrow', 'ileum', 'jejunal epithelium', 'spleen', 'thoracic lymph node', ...
cell_types (32, bionty.CellType): 'progenitor cell', 'T follicular helper cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'dendritic cell, human', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'classical monocyte', 'regulatory T cell', 'group 3 innate lymphoid cell', 'CD4-positive helper T cell', 'non-classical monocyte', ...
experimental_factors (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v2', '10x 3' v3', '10x 5' v1'
ulabels (12, core.ULabel): '621B', '640C', 'D496', 'A52', 'A36', '582C', 'A29', 'D503', 'A37', 'A31', ...
Create a dataset from the file#
dataset = ln.Dataset(file, name="My versioned scRNA-seq dataset", version="1")
dataset
Dataset(id='nV0w72HVEfJeK6lgb7BO', name='My versioned scRNA-seq dataset', version='1', hash='WEFcMZxJNmMiUOFrcSTaig', transform_id='Nv48yAceNSh8z8', run_id='Pd5UweAXC3cCH1aOMdr1', file_id='nV0w72HVEfJeK6lgb7BO', created_by_id='DzTjkKse')
Let's inspect the features measured in this dataset, which were inherited from the file:
dataset.features
Features:
var: FeatureSet(id='EJQhGDAUMVCAW7a878Is', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-29 14:45:40, modality_id='Tkw6vO00', created_by_id='DzTjkKse')
'KRT4', 'LHX5-AS1', 'OPN1LW', 'STYX', 'CCDC158', 'None', 'None', 'MKNK1-AS1', 'None', 'None', 'LNPEP', 'LINC02485', 'None', 'None', 'IGHEP2', 'CYFIP1', 'A4GNT', 'SLC14A2', 'None', 'None', ...
obs: FeatureSet(id='KEEZXO20pmTjLPROaTDE', n=4, registry='core.Feature', hash='NUCABLKrrAle7o2cv7hj', updated_at=2023-09-29 14:45:45, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
donor (0, core.ULabel):
cell_type (0, bionty.CellType):
assay (0, bionty.ExperimentalFactor):
tissue (0, bionty.Tissue):
external: FeatureSet(id='wc2uklEtrF5kDstnGzpN', n=1, registry='core.Feature', hash='wXguPMg-nDtWwOHbNrZ_', updated_at=2023-09-29 14:45:47, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
species (0, bionty.Species):
This all looks good, so let's save it:
dataset.save()
Annotate by linking labels:
dataset.labels.add(experimental_factors.single_cell_rna_sequencing, feature=features.assay)
dataset.labels.add(species.human, feature=features.species)
dataset.labels.add(adata.obs.cell_type, feature=features.cell_type)
dataset.labels.add(adata.obs.assay, feature=features.assay)
dataset.labels.add(adata.obs.tissue, feature=features.tissue)
dataset.labels.add(adata.obs.donor, feature=features.donor)
For version 1 of the dataset, the dataset and the file match each other, but they're tracked independently and are queryable through their respective registries.
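To make the distinction between a file and a versioned dataset concrete, here is a minimal sketch in plain Python of how a dataset version could reference files and grow when a new batch is appended. The names FileRecord, DatasetRecord and new_version are hypothetical and chosen for illustration; they are assumptions and do not describe LaminDB's schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record for one stored file."""
    id: str
    description: str


@dataclass
class DatasetRecord:
    """Hypothetical record for one version of a dataset."""
    name: str
    version: str
    files: list = field(default_factory=list)


def new_version(previous, new_file):
    """Create the next dataset version by appending one file to the previous one."""
    return DatasetRecord(
        name=previous.name,
        version=str(int(previous.version) + 1),
        files=[*previous.files, new_file],
    )


# Version 1 wraps exactly one file, so dataset and file describe the same data.
v1 = DatasetRecord(
    name="My versioned scRNA-seq dataset",
    version="1",
    files=[FileRecord(id="file-1", description="Conde22")],
)
# Appending a new batch yields version 2 and leaves version 1 untouched.
v2 = new_version(v1, FileRecord(id="file-2", description="new batch"))
print(v1.version, len(v1.files), v2.version, len(v2.files))
```

Keeping the two record types separate is what allows a later version to span several files while each file remains individually queryable.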
dataset.describe()
Dataset(id='nV0w72HVEfJeK6lgb7BO', name='My versioned scRNA-seq dataset', version='1', hash='WEFcMZxJNmMiUOFrcSTaig', updated_at=2023-09-29 14:45:51)
Provenance:
transform: Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-09-29 14:45:51, created_by_id='DzTjkKse')
run: Run(id='Pd5UweAXC3cCH1aOMdr1', run_at=2023-09-29 14:44:58, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
file: File(id='nV0w72HVEfJeK6lgb7BO', suffix='.h5ad', accessor='AnnData', description='Conde22', size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', updated_at=2023-09-29 14:45:51, storage_id='975nKuX0', transform_id='Nv48yAceNSh8z8', run_id='Pd5UweAXC3cCH1aOMdr1', created_by_id='DzTjkKse')
created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-29 14:44:56)
Features:
var: FeatureSet(id='EJQhGDAUMVCAW7a878Is', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-29 14:45:40, modality_id='Tkw6vO00', created_by_id='DzTjkKse')
'KRT4', 'LHX5-AS1', 'OPN1LW', 'STYX', 'CCDC158', 'None', 'None', 'MKNK1-AS1', 'None', 'None', 'LNPEP', 'LINC02485', 'None', 'None', 'IGHEP2', 'CYFIP1', 'A4GNT', 'SLC14A2', 'None', 'None', ...
obs: FeatureSet(id='KEEZXO20pmTjLPROaTDE', n=4, registry='core.Feature', hash='NUCABLKrrAle7o2cv7hj', updated_at=2023-09-29 14:45:45, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
donor (12, core.ULabel): '621B', '640C', 'D496', 'A52', 'A36', '582C', 'A29', 'D503', 'A37', 'A31', ...
cell_type (32, bionty.CellType): 'progenitor cell', 'T follicular helper cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'dendritic cell, human', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'classical monocyte', 'regulatory T cell', 'group 3 innate lymphoid cell', 'CD4-positive helper T cell', 'non-classical monocyte', ...
assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v2', '10x 3' v3', '10x 5' v1'
tissue (17, bionty.Tissue): 'mesenteric lymph node', 'lung', 'sigmoid colon', 'thymus', 'lamina propria', 'bone marrow', 'ileum', 'jejunal epithelium', 'spleen', 'thoracic lymph node', ...
external: FeatureSet(id='wc2uklEtrF5kDstnGzpN', n=1, registry='core.Feature', hash='wXguPMg-nDtWwOHbNrZ_', updated_at=2023-09-29 14:45:47, modality_id='jUAc2M1C', created_by_id='DzTjkKse')
species (1, bionty.Species): 'human'
Labels:
species (1, bionty.Species): 'human'
tissues (17, bionty.Tissue): 'mesenteric lymph node', 'lung', 'sigmoid colon', 'thymus', 'lamina propria', 'bone marrow', 'ileum', 'jejunal epithelium', 'spleen', 'thoracic lymph node', ...
cell_types (32, bionty.CellType): 'progenitor cell', 'T follicular helper cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'dendritic cell, human', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'classical monocyte', 'regulatory T cell', 'group 3 innate lymphoid cell', 'CD4-positive helper T cell', 'non-classical monocyte', ...
experimental_factors (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v2', '10x 3' v3', '10x 5' v1'
ulabels (12, core.ULabel): '621B', '640C', 'D496', 'A52', 'A36', '582C', 'A29', 'D503', 'A37', 'A31', ...
And we can access the file like so:
dataset.file
File(id='nV0w72HVEfJeK6lgb7BO', suffix='.h5ad', accessor='AnnData', description='Conde22', size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', updated_at=2023-09-29 14:45:51, storage_id='975nKuX0', transform_id='Nv48yAceNSh8z8', run_id='Pd5UweAXC3cCH1aOMdr1', created_by_id='DzTjkKse')
dataset.view_flow()