Build a SummarizedExperiment using an h5ad file for the counts
Usage
dataset_h5ad(
h5ad_file_counts,
h5ad_file_tpm = NULL,
cell_id_col = "ID",
cell_type_col = "cell_type",
cells_in_obs = TRUE,
name = "SimBu_dataset",
spike_in_col = NULL,
additional_cols = NULL,
filter_genes = TRUE,
variance_cutoff = 0,
type_abundance_cutoff = 0,
scale_tpm = TRUE
)
Arguments
- h5ad_file_counts
(mandatory) path to an h5ad file with raw count data
- h5ad_file_tpm
path to an h5ad file with TPM data (optional)
- cell_id_col
(mandatory) name of the column in the AnnData cell annotation (obs or var, see cells_in_obs) with unique cell IDs; 0 for rownames
- cell_type_col
(mandatory) name of the column in the AnnData cell annotation (obs or var, see cells_in_obs) with the cell type names
- cells_in_obs
boolean, if TRUE, cell identifiers are taken from the obs layer of the AnnData object; if FALSE, they are taken from the var layer
- name
name of the dataset; used to build new unique cell IDs
- spike_in_col
name of the annotation column containing spike-in counts, which can be used to re-scale counts; mandatory for the spike_in scaling factor in the simulation
- additional_cols
list of additional annotation column names that should also be stored in the dataset object
- filter_genes
boolean, if TRUE, removes all genes with zero expression across all samples, as well as genes with variance below variance_cutoff
- variance_cutoff
numeric, only applied if filter_genes is TRUE: removes all genes with variance below the chosen cutoff
- type_abundance_cutoff
numeric, removes all cells whose cell type appears fewer times than the given value; this removes low-abundance cell types
- scale_tpm
boolean, if TRUE (default), the cells in the TPM matrix are scaled so that the values of each cell sum to 1e6
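The gene and cell-type filters described above can be illustrated with a small base R sketch on a toy genes x cells matrix. This is not SimBu's internal code: the helpers filter_genes_sketch and filter_rare_types_sketch are hypothetical, and the sketch assumes genes are rows, cells are columns, variance is the per-gene sample variance from stats::var, and "below the cutoff" means strictly less than it.

```r
# Hypothetical helpers sketching what filter_genes/variance_cutoff and
# type_abundance_cutoff describe; this is NOT SimBu's internal code.

# Toy count matrix: 4 genes (rows) x 6 cells (columns)
counts <- matrix(
  c(0, 0, 0, 0, 0, 0,   # gene1: zero in every cell
    5, 5, 5, 5, 5, 5,   # gene2: expressed but constant (variance 0)
    1, 9, 2, 8, 3, 7,   # gene3: high variance (11.6)
    0, 4, 0, 6, 0, 2),  # gene4: moderate variance (6.4)
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)
cell_types <- c("T", "T", "B", "T", "B", "NK")

# Drop genes with zero total expression, then genes whose per-gene sample
# variance (stats::var, n - 1 denominator) is below variance_cutoff.
filter_genes_sketch <- function(m, variance_cutoff = 0) {
  expressed <- rowSums(m) > 0
  gene_var <- apply(m, 1, stats::var)
  m[expressed & gene_var >= variance_cutoff, , drop = FALSE]
}

# Drop cells whose cell type occurs fewer than type_abundance_cutoff times.
filter_rare_types_sketch <- function(m, types, type_abundance_cutoff = 0) {
  type_counts <- table(types)
  abundant <- names(type_counts)[as.vector(type_counts) >= type_abundance_cutoff]
  keep <- types %in% abundant
  list(matrix = m[, keep, drop = FALSE], types = types[keep])
}

rownames(filter_genes_sketch(counts, variance_cutoff = 1))
# "gene3" "gene4"
filter_rare_types_sketch(counts, cell_types, type_abundance_cutoff = 2)$types
# "T" "T" "B" "T" "B"
```

With the default cutoffs of 0, only the all-zero gene1 is removed and no cells are dropped; raising the cutoffs removes the constant gene2 and the single NK cell, respectively.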
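The scale_tpm option amounts to dividing each cell's values by that cell's total and multiplying by 1e6. A minimal base R sketch, assuming genes are rows and cells are columns; scale_to_million_sketch is a hypothetical helper, not a SimBu function.

```r
# Hypothetical helper illustrating scale_tpm = TRUE; NOT SimBu's implementation.
# Assumes genes in rows and cells in columns.
scale_to_million_sketch <- function(tpm) {
  totals <- colSums(tpm)                   # per-cell totals
  sweep(tpm, 2, totals, FUN = "/") * 1e6   # each column now sums to 1e6
}

tpm <- matrix(c(1, 3,
                2, 2,
                1, 5),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))
scaled <- scale_to_million_sketch(tpm)
colSums(scaled)  # every cell sums to 1e6
```

A cell whose total is 0 would produce NaN values in this sketch, so such cells would need to be removed or handled separately beforehand.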
Value
Returns a SummarizedExperiment object
Examples
h5 <- system.file('extdata', 'anndata.h5ad', package='SimBu')
ds_h5ad <- SimBu::dataset_h5ad(h5ad_file_counts = h5,
name = "h5ad_dataset",
cell_id_col = 'ID', # this will use the 'ID' column of the metadata as cell identifiers
cell_type_col = 'cell_type', # this will use the 'cell_type' column of the metadata as cell type info
cells_in_obs = TRUE) # in case your cell information is stored in the var layer, switch to FALSE
#> Filtering genes...
#> Created dataset.
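The returned object can be inspected with the standard SummarizedExperiment accessors. A short sketch, assuming the SimBu and SummarizedExperiment packages are installed and that cells are stored as columns, following the SummarizedExperiment convention; the exact assay names and colData columns depend on the SimBu version and arguments used, so they are printed here rather than assumed.

```r
library(SummarizedExperiment)

h5 <- system.file("extdata", "anndata.h5ad", package = "SimBu")
ds_h5ad <- SimBu::dataset_h5ad(h5ad_file_counts = h5,
                               name = "h5ad_dataset",
                               cell_id_col = "ID",
                               cell_type_col = "cell_type",
                               cells_in_obs = TRUE)

dim(ds_h5ad)             # genes x cells kept after filtering
assayNames(ds_h5ad)      # names of the stored expression matrices
head(colData(ds_h5ad))   # per-cell annotation, e.g. cell IDs and cell types
```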