API Reference
Complete API documentation for SATAY Tools modules.
CLI Module
Core Modules
Script to map FASTQ files to the yeast genome using the STAR aligner. It takes a directory of FASTQ files and aligns them to a specified yeast genome reference.
- satay.fastq_to_bam.find_fastq_files(fastq_dir, suffix=None)
Find all FASTQ files in the given directory
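A minimal sketch of what a directory scan of this kind could look like. It is not the implementation of find_fastq_files: the name list_fastq_sketch, the extension list, and the way suffix is interpreted are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of FASTQ extensions; the real function may recognise others.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def list_fastq_sketch(fastq_dir, suffix=None):
    """Return sorted FASTQ paths in fastq_dir (hypothetical helper).

    If suffix is given, only files ending with it are returned;
    otherwise any of the assumed FASTQ extensions matches.
    """
    suffixes = (suffix,) if suffix else FASTQ_SUFFIXES
    return sorted(
        path
        for path in Path(fastq_dir).iterdir()
        if path.is_file() and path.name.endswith(suffixes)
    )
```

The scan is deliberately non-recursive; whether the real function descends into subdirectories is not stated in this reference.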
- satay.fastq_to_bam.find_paired_files(fastq_files)
Group FASTQ files into pairs if they appear to be paired-end reads
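The pairing step can be illustrated with a short sketch. The naming convention assumed here (mates marked _R1/_R2 or _1/_2 directly before the extension) and the helper name pair_fastq_sketch are illustrative assumptions; how find_paired_files actually detects pairs is not documented on this page.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (or _1 / _2).
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastq_sketch(fastq_files):
    """Group paths into (mate1, mate2) tuples; mate2 is None when unpaired."""
    by_sample = defaultdict(dict)
    unpaired = []
    for path in map(Path, fastq_files):
        match = MATE_PATTERN.match(path.name)
        if match:
            by_sample[match["sample"]][match["mate"]] = path
        else:
            unpaired.append(path)

    pairs = []
    for sample in sorted(by_sample):
        mates = by_sample[sample]
        if "1" in mates and "2" in mates:
            pairs.append((mates["1"], mates["2"]))
        else:
            # Only one mate was found, so treat it as single-end.
            pairs.extend((path, None) for path in mates.values())
    pairs.extend((path, None) for path in sorted(unpaired))
    return pairs
```

Files that do not match the assumed pattern fall through as single-end entries rather than raising an error.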
- satay.fastq_to_bam.map_fastq_to_bam(fastq_dir, output_dir, genome_fasta, threads, suffix=[], single_end=True, limit_bam_sort_ram=2000000000)
- satay.fastq_to_bam.run_star_alignment(fastq_file1, fastq_file2, output_dir, genome_dir, sample_name, threads, logger, limit_bam_sort_ram=2000000000)
Run STAR alignment for single or paired-end reads
- satay.fastq_to_bam.setup_logger(log_dir)
Set up the logger to write to both console and file
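Logging to both console and file is usually done by attaching two handlers to one logger. The sketch below uses only the standard logging module; the function name make_logger_sketch, the log file name run.log, and the message format are assumptions, not details of setup_logger.

```python
import logging
from pathlib import Path


def make_logger_sketch(log_dir, name="satay_sketch"):
    """Return a logger that writes to the console and to log_dir/run.log.

    The file name 'run.log' is a placeholder chosen for this example.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages from being echoed by the root logger

    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        handlers = (
            logging.StreamHandler(),
            logging.FileHandler(log_dir / "run.log"),
        )
        for handler in handlers:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```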
- satay.fastq_to_bam.verify_star_index(genome_dir, logger)
Verify that STAR actually created a valid index.
- satay.process_satay_bams.count_insertions_over_intervals(interval_files: List[str | Path], filtered_insertions_file: str | Path) -> None
Count insertions over specified genomic intervals using bedtools map.
- Parameters:
Notes
For each interval file, creates an output file with the format: {filtered_insertions_stem}_{interval_file_stem}.cnts containing the count of unique insertions and sum of insertions per interval.
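The output naming pattern in the note above can be reproduced with pathlib. This sketch only builds the file name; the helper counts_output_name and the example paths are hypothetical, and the directory the real function writes to is not stated here.

```python
from pathlib import Path


def counts_output_name(filtered_insertions_file, interval_file):
    """Build '{filtered_insertions_stem}_{interval_file_stem}.cnts'."""
    insertions = Path(filtered_insertions_file)
    intervals = Path(interval_file)
    return f"{insertions.stem}_{intervals.stem}.cnts"


# Hypothetical input paths, used only to show the pattern.
print(counts_output_name("results/sampleA_filtered.bed", "annotations/genes.gff"))
```

Note that Path.stem strips only the final extension, so an interval file named genes.gff.gz would contribute genes.gff to the name.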
- satay.process_satay_bams.filter_bam(output_bam)
Filter BAM files and convert them to BED format. Alignments with q < 10 are discarded.
- satay.process_satay_bams.filter_insertions(merged_file)
Remove insertions supported by only 1 read.
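The filtering rule can be sketched on in-memory records. The record layout (chromosome, position, strand, read count) and the name drop_singletons are assumptions for illustration; the on-disk format that filter_insertions reads is not described on this page.

```python
from typing import List, NamedTuple


class Insertion(NamedTuple):
    """Assumed record layout for one insertion site."""
    chrom: str
    position: int
    strand: str
    n_reads: int


def drop_singletons(insertions: List[Insertion], min_reads: int = 2) -> List[Insertion]:
    """Keep only insertions supported by at least min_reads reads."""
    return [ins for ins in insertions if ins.n_reads >= min_reads]
```

With the default min_reads=2, this removes exactly the sites supported by a single read.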
- satay.process_satay_bams.map_sample(sample, bam_dir, output_dir, interval_files)
Process a sample
- satay.process_satay_bams.merge_bams(sample, output_dir, bam_dir, bam_suffix='Aligned.sortedByCoord.out.bam')
Merge BAM files for each sample.
- satay.process_satay_bams.merge_cnts_files(interval_file: str | Path, counts_dir: str | Path, name: str = '', format: str = 'gff') -> Tuple[pandas.DataFrame, pandas.DataFrame]
Merge count files from multiple samples into consolidated transposon and read count matrices.
- Parameters:
- Returns:
Two DataFrames containing merged transposon counts and read counts respectively
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
- satay.process_satay_bams.merge_files(sorted_insertions_file)
Merge intervals mapping to same location using sorted files.
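Working on sorted input means identical locations are adjacent, so they can be collapsed in a single pass. The sketch below shows that idea with itertools.groupby; the tuple layout (chromosome, position, strand) and the name merge_same_location are assumptions, and the real merge_files may use a different tool or record format.

```python
from itertools import groupby


def merge_same_location(sorted_records):
    """Collapse consecutive identical (chrom, position, strand) records.

    Returns one (chrom, position, strand, n_reads) tuple per location.
    The input must already be sorted, because groupby only merges
    neighbouring items.
    """
    return [
        (*location, sum(1 for _ in group))
        for location, group in groupby(sorted_records)
    ]
```

If the input is not sorted, the same location can appear more than once in the output, which is why the sorted precondition matters.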
- satay.process_satay_bams.process_orientation(filtered_bam_bed)
Process BED files based on orientation.
- satay.da_analysis.gene_da(counts_file, sample_data_file, output_dir, filter, comp_col, baseline, a, gff_file='', ids_to_keep=['locus_tag', 'gene', 'product'], sample_id_col='sample_id', n_cpus=None)
Run complete differential analysis pipeline.
- satay.da_analysis.load_filter_deseq(counts_file, sample_data_file, filter=100, comp_col='conc', baseline='0nMaF', a=0.01, sample_id_col='sample_id', n_cpus=None)
Load count data and sample metadata, filter low-count genes, run DESeq2.
- satay.da_analysis.merge_annotations(res_df, gff_file, ids_to_keep=['locus_tag', 'gene', 'product'])
Merge DESeq2 results with gene annotations from GFF file.
- satay.da_analysis.run_deseq(count_df, sample_data, condition_col, baseline, a=0.01, n_cpus=None)
Run DESeq2 analysis on count data.
- Parameters:
count_df – DataFrame with count data (samples as rows, genes as columns)
sample_data – DataFrame with sample metadata
condition_col – Column name for condition/treatment in sample_data
baseline – Baseline condition for comparisons
a – Alpha value for significance testing
- Returns:
(vst_counts DataFrame, results DataFrame)
- Return type: