Koverage.scripts
combineCoverage.py
Combine the coverage statistics from all samples
This script will take the coverage information from each sample's coverage file and output the population-wide counts for each contig.
collect_coverage_stats - Read and add the counts from all samples
print_sample_coverage - Print the combined coverage statistics for all contigs
collect_coverage_stats(input_file)
Combine the mapped coverage stats for all samples.
| Parameters: |
|
|---|
| Returns: |
|
|---|
print_sample_coverage(output_file, all_coverage)
Print the combined coverage statistics from collect_coverage_stats().
| Parameters: |
|
|---|
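The combining step described above amounts to summing per-contig counts across samples. The following is a minimal sketch of that idea only; the three-column layout (sample, contig, count) and the name `combine_counts` are assumptions made for illustration, not the actual file format or API of combineCoverage.py.

```python
import csv
import io
from collections import defaultdict


def combine_counts(handle):
    """Sum per-contig counts across all samples.

    Assumes (hypothetically) tab-separated rows of: sample, contig, count.
    """
    totals = defaultdict(int)
    for _sample, contig, count in csv.reader(handle, delimiter="\t"):
        totals[contig] += int(count)
    return dict(totals)


# Two samples report counts for contigA, one sample for contigB.
demo = io.StringIO("s1\tcontigA\t10\ns2\tcontigA\t5\ns1\tcontigB\t3\n")
combined = combine_counts(demo)
```

Here `combined` maps each contig to its population-wide total.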
combineKmerCoverage.py
Combine the kmer-based coverage statistics from all samples
This script will take the kmer-based coverage information from each sample's coverage file and output the population-wide counts for each contig.
collect_kmer_coverage_stats - Read and add the kmer counts from all samples
print_kmer_coverage - Print the combined kmer coverage statistics for all contigs
collect_kmer_coverage_stats(input_file)
Combine the kmer coverage stats for all samples.
| Parameters: |
|
|---|
| Returns: |
|
|---|
print_kmer_coverage(allCoverage, output_file, lines_per_batch=1000)
Print the combined kmer coverage statistics from collect_kmer_coverage_stats().
| Parameters: |
|
|---|
kmerScreen.py
Screen the sample for reference sampled kmers
This script will parse the reference sampled kmers and query them against the sample's Jellyfish database.
trimmed_variance - Calculate variance from a list of integers
output_print_worker - Take output lines from a queue and print to zstandard-compressed TSV
process_counts - Process the kmer depths of the ref sampled kmers and the sample jellyfish database
output_print_worker(out_queue=None, out_file=None)
Worker to take the output lines for printing, compress with zstandard, and print to the output file.
| Parameters: |
|
|---|
process_counts(kmer_counts, sample_name, contig_name)
Process the kmer depths of the ref sampled kmers and the sample jellyfish database.
| Parameters: |
|
|---|
| Returns: |
|
|---|
ref_kmer_parser_worker(ref_kmers=None, jellyfish_db=None, out_queue=None, sample_name=None, cmd=None)
Parse the processed reference kmer file (zstd-compressed) and query kmers from the Jellyfish database.
| Parameters: |
|
|---|
trimmed_variance(data, trim_frac=0.05)
Calculate the variance after removing the top trim_frac fraction of values as outliers (the top 5% by default).
| Parameters: |
|
|---|
| Returns: |
|
|---|
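As an illustration of the trimming idea, the sketch below drops the largest values before computing the variance. It is an assumption-laden sketch, not the function's actual implementation: the name `trimmed_variance_sketch`, the use of the sample variance, and the rounding-down of the trim count are all choices made here.

```python
import statistics


def trimmed_variance_sketch(data, trim_frac=0.05):
    """Variance of `data` after discarding the top `trim_frac` of values.

    Assumptions: sample variance (n - 1 denominator), and the number of
    trimmed values is rounded down.
    """
    values = sorted(data)
    n_trim = int(len(values) * trim_frac)
    kept = values[: len(values) - n_trim] if n_trim else values
    if len(kept) < 2:
        # Variance is undefined for fewer than two points; return 0.0 here.
        return 0.0
    return statistics.variance(kept)


# One extreme depth (100) dominates the untrimmed variance.
depths = [1, 2, 3, 4, 100]
trimmed = trimmed_variance_sketch(depths, trim_frac=0.2)
untrimmed = trimmed_variance_sketch(depths, trim_frac=0.0)
```

Trimming removes the single outlier, so `trimmed` is far smaller than `untrimmed`.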
minimapWrapper.py
Run minimap2, parse its output, calculate counts on the fly
This script will run minimap2 to align a sample's reads against the reference FASTA. We use a wrapper instead of a snakemake rule to avoid additional reads and writes for every sample. PAF files of alignments can optionally be saved.
worker_mm_to_count_paf_queues - read minimap2 output and pass to queues for processing and saving PAF
worker_mm_to_count_queues - read minimap2 output and pass to queue for processing only
worker_paf_writer - read minimap2 output from queue and write to zstandard-zipped file
worker_count_and_print - read minimap2 output from queue, calculate counts, print to output files
build_mm2cmd - return the minimap2 command based on presence of R2 file
start_workers - start queues and worker threads
build_mm2cmd(**kwargs)
Return the minimap2 command
| Parameters: |
|
|---|
| Returns: |
|
|---|
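The sketch below shows one way a command could be assembled depending on whether an R2 file is present. The keyword names (`ref`, `r1`, `r2`, `threads`, `preset`) are hypothetical, since the real function takes `**kwargs` whose keys are not listed here; `-t` and `-x sr` are standard minimap2 options, and without `-a` minimap2 writes PAF.

```python
def build_mm2cmd_sketch(ref, r1, r2=None, threads=1, preset="sr"):
    """Return a minimap2 argument list; R2 is appended only when given.

    All parameter names are assumptions made for this illustration.
    """
    cmd = ["minimap2", "-t", str(threads), "-x", preset, ref, r1]
    if r2:
        cmd.append(r2)
    return cmd


paired = build_mm2cmd_sketch("ref.fa", "s1_R1.fq.gz", "s1_R2.fq.gz", threads=4)
single = build_mm2cmd_sketch("ref.fa", "s1_R1.fq.gz")
```

Returning a list rather than a string lets the command be passed to `subprocess.Popen` without shell quoting.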
contig_lens_from_fai(file_path)
Collect the sequence IDs from the reference fasta file
| Parameters: |
|
|---|
| Returns: |
|
|---|
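Judging by the function name, contig information is read from a FASTA index (.fai), whose first two tab-separated columns are the sequence name and its length. A minimal sketch under that assumption follows; the helper name and the dictionary return type are illustrative, not the documented return value.

```python
import io


def contig_lengths_from_fai_handle(handle):
    """Map sequence ID -> length from an open .fai file handle.

    Only the first two columns (name, length) of each row are used.
    """
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lengths[fields[0]] = int(fields[1])
    return lengths


demo_fai = io.StringIO(
    "contigA\t1000\t9\t60\t61\n"
    "contigB\t250\t1035\t60\t61\n"
)
lengths = contig_lengths_from_fai_handle(demo_fai)
```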
start_workers(queue_counts, queue_paf, pipe_minimap, **kwargs)
Start workers for reading the minimap output and parsing to queue(s) for processing
| Parameters: |
|
|---|
worker_count_and_print(count_queue, contig_lengths, **kwargs)
Collect the alignments from the minimap2 queue and calculate counts on the fly
| Parameters: |
|
|---|
worker_mm_to_count_paf_queues(pipe, count_queue, paf_queue)
Read minimap2 output and slot into queues for collecting coverage counts, and saving the paf file.
| Parameters: |
|
|---|
worker_mm_to_count_queues(pipe, count_queue)
Read minimap2 output and slot into queues for collecting coverage counts
Args:
pipe (pipe): minimap2 pipe for reading
count_queue (Queue): queue onto which lines are put for counting
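The reader pattern described here, a thread that drains a pipe into a queue while another consumer processes it, can be sketched as follows. The `None` sentinel used to signal end of input and the name `pipe_to_queue` are assumptions for this illustration; an in-memory stream stands in for the minimap2 pipe.

```python
import io
import queue
import threading


def pipe_to_queue(pipe, count_queue):
    """Forward each line from `pipe` to `count_queue`, then signal the end."""
    for line in pipe:
        count_queue.put(line.rstrip("\n"))
    count_queue.put(None)  # assumed sentinel: no more lines


# Stand-in for minimap2 stdout: two lines of made-up PAF-like text.
demo_pipe = io.StringIO("read1\t150\tcontigA\nread2\t150\tcontigB\n")
count_queue = queue.Queue()
reader = threading.Thread(target=pipe_to_queue, args=(demo_pipe, count_queue))
reader.start()

received = []
while True:
    item = count_queue.get()
    if item is None:
        break
    received.append(item)
reader.join()
```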
worker_paf_writer(paf_queue, paf_dir, sample, chunk_size=100)
Read minimap2 output from queue and write to zstd-zipped file
| Parameters: |
|
|---|
refSampleKmer.py
Sample kmers from the reference FASTA file
This script will parse the reference FASTA and sample kmers from each contig.
parse_fasta - Read the fasta(.gz) file
contigs_to_queue - Parse the fasta file and populate the processing queue with the id and sequence
string_to_kmers - Sample kmers from a sequence
process_contigs - Parse the processing queue, get sampled kmers, push output lines to writing queue
output_printer - Print the output lines to zstandard-zipped file
contigs_to_queue(file, queue_put, available_threads, queue_hold=1000)
Parse the fasta file and populate the processing queue with the id and sequence
| Parameters: |
|
|---|
output_printer(queue, outfile, chunk_size=1000)
Print the output lines to zstandard-zipped file
| Parameters: |
|
|---|
parse_fasta(file)
Read the fasta(.gz) file
| Parameters: |
|
|---|
| Returns: |
|
|---|
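Reading a FASTA file that may or may not be gzip-compressed can be sketched as below. Both helper names are hypothetical, and choosing the opener from the `.gz` suffix is an assumption rather than the script's documented behaviour.

```python
import gzip
import io


def open_maybe_gzip(path):
    """Open `path` as text, using gzip when the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fasta(handle):
    """Yield (sequence_id, sequence) pairs from an open FASTA handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""  # ID is the first header word
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


demo_fasta = io.StringIO(">contigA some description\nACGT\nACGT\n>contigB\nTTTT\n")
records = list(iter_fasta(demo_fasta))
```

Writing the parser as a generator keeps only one contig in memory at a time, which suits feeding a processing queue.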
process_contigs(in_queue, out_queue, **kwargs)
Parse the processing queue, get sampled kmers, push output lines to writing queue
| Parameters: |
|
|---|
string_to_kmers(seq, **kwargs)
Sample kmers from a sequence
| Parameters: |
|
|---|
| Returns: |
|
|---|
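One simple way to sample kmers from a sequence is to take a kmer at a fixed step along it. The sketch below assumes that strategy; the parameter names `k` and `step` and their defaults are hypothetical, as the real function receives its settings through `**kwargs`.

```python
def sample_kmers(seq, k=21, step=10):
    """Return kmers of length `k` starting at every `step`-th position.

    Sequences shorter than `k` yield an empty list.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


kmers = sample_kmers("ACGTACGTAC", k=4, step=3)
too_short = sample_kmers("ACG", k=4, step=3)
```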
sampleCoverage.py
Calculate the coverage stats for a sample for each contig
This script will parse the raw count summary for a sample and calculate the output coverage stats for each contig.
calculate_coverage_stats_from_counts - Read in the library size and the counts from minimapWrapper.py, calculate rpm, rpkm, and rpk, write the counts
calculate_coverage_stats_from_counts(**kwargs)
Read in the library size and the counts from minimapWrapper.py, calculate rpm, rpkm, and rpk, write counts for sample
Kwargs
count_file (str): filepath to pickle file of contig lengths and np.array count objects
bin_width (int): bin width
output_file (str): filepath of output file for writing
sample (str): sample name
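The three normalised measures named above follow their usual definitions: reads per million mapped reads (RPM), reads per kilobase of contig (RPK), and reads per kilobase per million (RPKM). The sketch below applies those standard formulas to a single contig; the function name and argument names are hypothetical, and nonzero contig length and library size are assumed.

```python
def coverage_stats_sketch(count, contig_length, library_size):
    """Compute RPM, RPK and RPKM for one contig.

    count: reads assigned to the contig
    contig_length: contig length in bases (assumed > 0)
    library_size: total reads in the sample (assumed > 0)
    """
    per_million = library_size / 1e6
    per_kilobase = contig_length / 1e3
    rpm = count / per_million
    rpk = count / per_kilobase
    rpkm = rpk / per_million
    return {"rpm": rpm, "rpk": rpk, "rpkm": rpkm}


stats = coverage_stats_sketch(count=500, contig_length=2000, library_size=1_000_000)
```

With one million reads in the library, RPM equals the raw count and RPKM equals RPK.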