2025-01-21 14:18:52,519::INFO::main: Starting joint PCA
2025-01-21 14:18:58,478::INFO::__init__: JointPCADecomposer :: configuration:
==================== Joint PCA Configuration ====================
mcools :
  - DE-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
  - DE-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
  - DE-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
  - ESC-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
  - ESC-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
  - ESC-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
  - HB-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
  - HB-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
  - HB-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
  - iHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
  - iHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
  - iHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
  - mHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
  - mHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
  - mHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
resolution : 50000
assembly : hg38
output : output_2025_01_21_14_18
components : 32
chrom_limit : 22
method : PCA
exclusion_list : None
percentile_top : 99.5
percentile_bottom : 1.0
batch_size : 10000
log_level : INFO
===================================================================
2025-01-21 14:18:58,496::INFO::get_chromosome_sizes: Loaded chromosome sizes for specified assembly: hg38
2025-01-21 14:18:58,496::INFO::get_chromosome_sizes: Chromosome sizes: name
chr1 248956422
chr2 242193529
chr3 198295559
chr4 190214555
chr5 181538259
chr6 170805979
chr7 159345973
chr8 145138636
chr9 138394717
chr10 133797422
chr11 135086622
chr12 133275309
chr13 114364328
chr14 107043718
chr15 101991189
chr16 90338345
chr17 83257441
chr18 80373285
chr19 58617616
chr20 64444167
chr21 46709983
chr22 50818468
Name: length, dtype: int64
2025-01-21 14:18:58,496::INFO::set_union_bad_bins: Beginning to compute union set of NaN bins..
2025-01-21 15:59:07,071::INFO::set_union_bad_bins: Loaded union set of bad bins in: 1:40:08.575057.
2025-01-21 15:59:07,072::INFO::set_union_bad_bins: Percent of bins that are bad: 16.946912657149316.
2025-01-21 15:59:07,784::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'DE-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 15:59:07,817::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 16:17:40,325::INFO::decompose_cooler_file: Finished decomposition for 'DE-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool' in 0:18:32.540823.
2025-01-21 16:17:40,393::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'DE-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 16:17:40,460::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 16:36:32,012::INFO::decompose_cooler_file: Finished decomposition for 'DE-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool' in 0:18:51.618720.
2025-01-21 16:36:32,077::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'DE-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 16:36:32,102::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 16:54:50,541::INFO::decompose_cooler_file: Finished decomposition for 'DE-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool' in 0:18:18.464096.
2025-01-21 16:54:50,611::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'ESC-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 16:54:50,642::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 17:13:01,301::INFO::decompose_cooler_file: Finished decomposition for 'ESC-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool' in 0:18:10.689402.
2025-01-21 17:13:01,370::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'ESC-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 17:13:01,426::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 17:30:28,188::INFO::decompose_cooler_file: Finished decomposition for 'ESC-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool' in 0:17:26.818717.
2025-01-21 17:30:28,251::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'ESC-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 17:30:28,284::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 17:47:16,323::INFO::decompose_cooler_file: Finished decomposition for 'ESC-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool' in 0:16:48.071739.
2025-01-21 17:47:16,385::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'HB-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 17:47:16,418::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 18:04:08,445::INFO::decompose_cooler_file: Finished decomposition for 'HB-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool' in 0:16:52.059670.
2025-01-21 18:04:08,508::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'HB-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 18:04:08,558::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 18:21:15,588::INFO::decompose_cooler_file: Finished decomposition for 'HB-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool' in 0:17:07.080136.
2025-01-21 18:21:15,657::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'HB-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 18:21:15,689::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 18:38:25,520::INFO::decompose_cooler_file: Finished decomposition for 'HB-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool' in 0:17:09.862789.
2025-01-21 18:38:25,582::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'iHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 18:38:25,619::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 18:55:25,714::INFO::decompose_cooler_file: Finished decomposition for 'iHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool' in 0:17:00.132143.
2025-01-21 18:55:25,777::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'iHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 18:55:25,822::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 19:12:27,617::INFO::decompose_cooler_file: Finished decomposition for 'iHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool' in 0:17:01.839386.
2025-01-21 19:12:27,684::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'iHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 19:12:27,711::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 19:29:23,314::INFO::decompose_cooler_file: Finished decomposition for 'iHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool' in 0:16:55.630290.
2025-01-21 19:29:23,377::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'mHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 19:29:23,440::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 19:46:25,565::INFO::decompose_cooler_file: Finished decomposition for 'mHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool' in 0:17:02.187921.
2025-01-21 19:46:25,628::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'mHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 19:46:25,676::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:03:43,206::INFO::decompose_cooler_file: Finished decomposition for 'mHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool' in 0:17:17.577981.
2025-01-21 20:03:43,274::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'mHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:03:43,317::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:21:06,943::INFO::decompose_cooler_file: Finished decomposition for 'mHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool' in 0:17:23.669105.
2025-01-21 20:21:07,006::INFO::run: Model training complete. Training time: 6:02:08.509573.
2025-01-21 20:21:07,006::INFO::run: Computing embeddings using fully trained model...
2025-01-21 20:21:07,006::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'DE-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:21:07,048::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:22:24,080::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'DE-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 20:22:24,087::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:23:41,733::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'DE-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:23:41,741::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:24:43,095::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'ESC-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:24:43,103::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:25:23,307::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'ESC-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 20:25:23,314::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:25:38,728::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'ESC-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:25:38,735::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:25:54,107::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'HB-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:25:54,115::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:26:09,835::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'HB-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 20:26:09,843::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:26:25,279::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'HB-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:26:25,286::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:26:41,300::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'iHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:26:41,308::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:26:56,821::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'iHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 20:26:56,828::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:27:12,641::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'iHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:27:12,649::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:27:28,098::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'mHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:27:28,105::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:27:43,540::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'mHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool'...
2025-01-21 20:27:43,548::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:27:59,034::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'mHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool'...
2025-01-21 20:27:59,041::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2025-01-21 20:28:14,717::INFO::run: Saving results...
2025-01-21 20:28:15,187::INFO::save_model: Saved model to 'output_2025_01_21_14_18_PCA-32_50000bp_hg38_model.pkl.gz' in 0:00:00.470363.
2025-01-21 20:29:49,376::INFO::run: Saved embeddings to 'output_2025_01_21_14_18_PCA-32_50000bp_hg38_embeddings.csv.gz' and 'output_2025_01_21_14_18_PCA-32_50000bp_hg38_embeddings.pq'.
2025-01-21 20:29:49,377::INFO::run: Finished joint PCA in 6:10:50.880341.
2025-01-21 20:29:49,411::INFO::__init__: PostProcessor :: configuration:
==================== Post Processor Configuration ====================
parquet_file : output_2025_01_21_14_18_PCA-32_50000bp_hg38_embeddings.pq
output : output_2025_01_21_14_18_PCA-32_50000bp_hg38
umap_neighbours : [30, 100, 500]
kmeans_clusters : [5, 6, 7, 8, 9, 10, 15, 20]
leiden_resolutions : [0.1, 0.2, 0.3, 0.5, 0.8, 1.0]
method : PCA
log_level : INFO
========================================================================
2025-01-21 20:29:49,511::INFO::run: Running post-processing
2025-01-21 20:29:49,761::INFO::normalize_embeddings: Normalizing embeddings for DE-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:50,234::INFO::normalize_embeddings: Normalizing embeddings for DE-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
2025-01-21 20:29:50,705::INFO::normalize_embeddings: Normalizing embeddings for DE-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:51,109::INFO::normalize_embeddings: Normalizing embeddings for ESC-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:51,597::INFO::normalize_embeddings: Normalizing embeddings for ESC-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
2025-01-21 20:29:52,004::INFO::normalize_embeddings: Normalizing embeddings for ESC-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:52,479::INFO::normalize_embeddings: Normalizing embeddings for HB-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:52,901::INFO::normalize_embeddings: Normalizing embeddings for HB-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
2025-01-21 20:29:53,379::INFO::normalize_embeddings: Normalizing embeddings for HB-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:53,772::INFO::normalize_embeddings: Normalizing embeddings for iHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:54,231::INFO::normalize_embeddings: Normalizing embeddings for iHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
2025-01-21 20:29:54,639::INFO::normalize_embeddings: Normalizing embeddings for iHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:55,134::INFO::normalize_embeddings: Normalizing embeddings for mHEP-FA-DSG-DdeI-DpnII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:55,546::INFO::normalize_embeddings: Normalizing embeddings for mHEP-FA-DSG-DdeI-DpnII-P2.hg38.mapq_30.1000.mcool
2025-01-21 20:29:56,023::INFO::normalize_embeddings: Normalizing embeddings for mHEP-FA-DSG-HindIII-P1.hg38.mapq_30.1000.mcool
2025-01-21 20:29:57,562::INFO::run: Components shape: (716445, 32)
2025-01-21 20:29:57,562::INFO::run_kmeans: Running KMeans clustering with 5 clusters
2025-01-21 20:30:01,666::INFO::run_kmeans: Running KMeans clustering with 6 clusters
2025-01-21 20:30:06,605::INFO::run_kmeans: Running KMeans clustering with 7 clusters
2025-01-21 20:30:10,700::INFO::run_kmeans: Running KMeans clustering with 8 clusters
2025-01-21 20:30:16,163::INFO::run_kmeans: Running KMeans clustering with 9 clusters
2025-01-21 20:30:21,566::INFO::run_kmeans: Running KMeans clustering with 10 clusters
2025-01-21 20:30:28,916::INFO::run_kmeans: Running KMeans clustering with 15 clusters
2025-01-21 20:30:36,906::INFO::run_kmeans: Running KMeans clustering with 20 clusters
2025-01-21 20:30:47,686::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2025-01-21 22:41:54,437::INFO::run_leiden: Leiden clustering complete.
2025-01-21 22:42:00,136::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2025-01-21 23:47:36,242::INFO::run_leiden: Leiden clustering complete.
2025-01-21 23:47:40,253::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2025-01-22 01:59:51,382::INFO::run_leiden: Leiden clustering complete.
2025-01-22 01:59:55,683::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2025-01-22 03:35:56,970::INFO::run_leiden: Leiden clustering complete.
2025-01-22 03:36:01,156::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2025-01-22 07:01:58,992::INFO::run_leiden: Leiden clustering complete.
2025-01-22 07:02:03,244::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2025-01-22 10:32:38,807::INFO::run_leiden: Leiden clustering complete.
2025-01-22 10:32:43,049::INFO::run_umap: Running UMAP with 30 neighbors
2025-01-22 10:39:28,448::INFO::run_umap: Running UMAP with 100 neighbors
2025-01-22 10:51:13,670::INFO::run_umap: Running UMAP with 500 neighbors
2025-01-22 11:30:58,861::INFO::plot_scores: Plotting scores
2025-01-22 11:31:11,866::INFO::run: Saving embeddings to parquet and csv
2025-01-22 11:33:05,902::INFO::run: Finished running post-processing
2025-01-22 11:33:05,904::INFO::__init__: TrajectoryAnalyzer :: configuration:
==================== Trajectory Analysis Configuration ====================
parquet_file : output_2025_01_21_14_18_PCA-32_50000bp_hg38_embeddings.pq
output : output_2025_01_21_14_18_PCA-32_50000bp_hg38
kmeans_clusters : [5, 6, 7, 8, 9, 10, 15, 20]
leiden_neighbors : 100
umap_neighbours : [30, 100, 500]
method : PCA
log_level : INFO
=============================================================================
2025-01-22 11:33:06,180::INFO::__init__: Shape of pivoted trajectory embeddings: (57509, 484)
2025-01-22 11:33:06,181::INFO::run: Running trajectory analysis
2025-01-22 11:33:06,239::INFO::run_kmeans: Running KMeans clustering with 5 clusters
2025-01-22 11:33:07,489::INFO::run_kmeans: Running KMeans clustering with 6 clusters
2025-01-22 11:33:09,024::INFO::run_kmeans: Running KMeans clustering with 7 clusters
2025-01-22 11:33:11,057::INFO::run_kmeans: Running KMeans clustering with 8 clusters
2025-01-22 11:33:13,189::INFO::run_kmeans: Running KMeans clustering with 9 clusters
2025-01-22 11:33:16,899::INFO::run_kmeans: Running KMeans clustering with 10 clusters
2025-01-22 11:33:19,726::INFO::run_kmeans: Running KMeans clustering with 15 clusters
2025-01-22 11:33:24,372::INFO::run_kmeans: Running KMeans clustering with 20 clusters
2025-01-22 11:34:05,937::INFO::run_umap: Running UMAP with 30 neighbors
2025-01-22 11:34:27,656::INFO::run_umap: Running UMAP with 100 neighbors
2025-01-22 11:35:12,731::INFO::run_umap: Running UMAP with 500 neighbors
2025-01-22 11:39:05,334::INFO::run: Finished trajectory analysis
2025-01-22 11:39:05,373::INFO::main: Finished joint PCA