2024-09-20 09:45:52,637::INFO::main: Starting joint PCA
2024-09-20 09:45:58,556::INFO::__init__: JointPCADecomposer :: configuration:
==================== Joint PCA Configuration ====================
mcools :
  - DE.mcool
  - ESC.mcool
  - HB.mcool
  - iHEP.mcool
  - mHEP.mcool
resolution : 50000
assembly : hg38
output : output_2024_09_20_09_45
components : 32
chrom_limit : 22
method : PCA
exclusion_list : None
percentile_top : 99.5
percentile_bottom : 1.0
batch_size : 10000
log_level : INFO
===================================================================
2024-09-20 09:45:58,574::INFO::get_chromosome_sizes: Loaded chromosome sizes for specified assembly: hg38
2024-09-20 09:45:58,574::INFO::get_chromosome_sizes: Chromosome sizes:
name
chr1     248956422
chr2     242193529
chr3     198295559
chr4     190214555
chr5     181538259
chr6     170805979
chr7     159345973
chr8     145138636
chr9     138394717
chr10    133797422
chr11    135086622
chr12    133275309
chr13    114364328
chr14    107043718
chr15    101991189
chr16     90338345
chr17     83257441
chr18     80373285
chr19     58617616
chr20     64444167
chr21     46709983
chr22     50818468
Name: length, dtype: int64
2024-09-20 09:45:58,575::INFO::set_union_bad_bins: Beginning to compute union set of NaN bins..
2024-09-20 10:20:06,682::INFO::set_union_bad_bins: Loaded union set of bad bins in: 0:34:08.107370.
2024-09-20 10:20:06,682::INFO::set_union_bad_bins: Percent of bins that are bad: 13.716113999547897.
2024-09-20 10:20:07,452::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'DE.mcool'...
2024-09-20 10:20:07,517::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 10:40:25,002::INFO::decompose_cooler_file: Finished decomposition for 'DE.mcool' in 0:20:17.549065.
2024-09-20 10:40:25,077::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'ESC.mcool'...
2024-09-20 10:40:25,139::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 11:00:00,805::INFO::decompose_cooler_file: Finished decomposition for 'ESC.mcool' in 0:19:35.728288.
2024-09-20 11:00:00,881::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'HB.mcool'...
2024-09-20 11:00:00,934::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 11:19:28,382::INFO::decompose_cooler_file: Finished decomposition for 'HB.mcool' in 0:19:27.501185.
2024-09-20 11:19:28,458::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'iHEP.mcool'...
2024-09-20 11:19:28,501::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 11:39:05,267::INFO::decompose_cooler_file: Finished decomposition for 'iHEP.mcool' in 0:19:36.809253.
2024-09-20 11:39:05,342::INFO::decompose_cooler_file: Computing dimensionality reduction for input file: 'mHEP.mcool'...
2024-09-20 11:39:05,421::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 11:58:53,932::INFO::decompose_cooler_file: Finished decomposition for 'mHEP.mcool' in 0:19:48.590379.
2024-09-20 11:58:54,008::INFO::run: Model training complete. Training time: 2:12:55.433082.
2024-09-20 11:58:54,008::INFO::run: Computing embeddings using fully trained model...
2024-09-20 11:58:54,008::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'DE.mcool'...
2024-09-20 11:58:54,031::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 11:59:12,159::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'ESC.mcool'...
2024-09-20 11:59:12,166::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 11:59:29,321::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'HB.mcool'...
2024-09-20 11:59:29,328::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 11:59:46,424::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'iHEP.mcool'...
2024-09-20 11:59:46,430::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 12:00:03,550::INFO::compute_output_embeddings_single_file: Computing embeddings for input file: 'mHEP.mcool'...
2024-09-20 12:00:03,557::INFO::preprocess_matrix: Loading preprocessed matrix from disk...
2024-09-20 12:00:21,062::INFO::run: Saving results...
2024-09-20 12:00:21,597::INFO::save_model: Saved model to 'output_2024_09_20_09_45_PCA-32_50000bp_hg38_model.pkl.gz' in 0:00:00.534721.
2024-09-20 12:00:59,522::INFO::run: Saved embeddings to 'output_2024_09_20_09_45_PCA-32_50000bp_hg38_embeddings.csv.gz' and 'output_2024_09_20_09_45_PCA-32_50000bp_hg38_embeddings.pq'.
2024-09-20 12:00:59,522::INFO::run: Finished joint PCA in 2:15:00.947176.
2024-09-20 12:00:59,534::INFO::__init__: PostProcessor :: configuration:
==================== Post Processor Configuration ====================
parquet_file : output_2024_09_20_09_45_PCA-32_50000bp_hg38_embeddings.pq
output : output_2024_09_20_09_45_PCA-32_50000bp_hg38
umap_neighbours : [30, 100, 500]
kmeans_clusters : [5, 6, 7, 8, 9, 10, 15, 20]
leiden_resolutions : [0.1, 0.2, 0.3, 0.5, 0.8, 1.0]
method : PCA
log_level : INFO
========================================================================
2024-09-20 12:00:59,568::INFO::run: Running post-processing
2024-09-20 12:00:59,631::INFO::normalize_embeddings: Normalizing embeddings for DE.mcool
2024-09-20 12:00:59,782::INFO::normalize_embeddings: Normalizing embeddings for ESC.mcool
2024-09-20 12:00:59,935::INFO::normalize_embeddings: Normalizing embeddings for HB.mcool
2024-09-20 12:01:00,096::INFO::normalize_embeddings: Normalizing embeddings for iHEP.mcool
2024-09-20 12:01:00,250::INFO::normalize_embeddings: Normalizing embeddings for mHEP.mcool
2024-09-20 12:01:00,930::INFO::run: Components shape: (248105, 32)
2024-09-20 12:01:00,931::INFO::run_kmeans: Running KMeans clustering with 5 clusters
2024-09-20 12:01:02,798::INFO::run_kmeans: Running KMeans clustering with 6 clusters
2024-09-20 12:01:04,837::INFO::run_kmeans: Running KMeans clustering with 7 clusters
2024-09-20 12:01:07,010::INFO::run_kmeans: Running KMeans clustering with 8 clusters
2024-09-20 12:01:09,625::INFO::run_kmeans: Running KMeans clustering with 9 clusters
2024-09-20 12:01:12,109::INFO::run_kmeans: Running KMeans clustering with 10 clusters
2024-09-20 12:01:15,194::INFO::run_kmeans: Running KMeans clustering with 15 clusters
2024-09-20 12:01:18,948::INFO::run_kmeans: Running KMeans clustering with 20 clusters
2024-09-20 12:01:23,419::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2024-09-20 12:33:50,209::INFO::run_leiden: Leiden clustering complete.
2024-09-20 12:33:51,601::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2024-09-20 13:04:45,134::INFO::run_leiden: Leiden clustering complete.
2024-09-20 13:04:46,938::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2024-09-20 13:35:11,109::INFO::run_leiden: Leiden clustering complete.
2024-09-20 13:35:12,688::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2024-09-20 14:09:26,020::INFO::run_leiden: Leiden clustering complete.
2024-09-20 14:09:27,659::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2024-09-20 14:47:47,645::INFO::run_leiden: Leiden clustering complete.
2024-09-20 14:47:49,325::INFO::run_leiden: Running Leiden clustering with 500 neighbors.
2024-09-20 15:24:01,459::INFO::run_leiden: Leiden clustering complete.
2024-09-20 15:24:03,110::INFO::run_umap: Running UMAP with 30 neighbors
2024-09-20 15:27:21,376::INFO::run_umap: Running UMAP with 100 neighbors
2024-09-20 15:31:08,335::INFO::run_umap: Running UMAP with 500 neighbors
2024-09-20 15:45:46,388::INFO::plot_scores: Plotting scores
2024-09-20 15:45:59,130::INFO::run: Saving embeddings to parquet and csv
2024-09-20 15:46:45,769::INFO::run: Finished running post-processing
2024-09-20 15:46:45,770::INFO::__init__: TrajectoryAnalyzer :: configuration:
==================== Trajectory Analysis Configuration ====================
parquet_file : output_2024_09_20_09_45_PCA-32_50000bp_hg38_embeddings.pq
output : output_2024_09_20_09_45_PCA-32_50000bp_hg38
kmeans_clusters : [5, 6, 7, 8, 9, 10, 15, 20]
leiden_neighbors : 100
umap_neighbours : [30, 100, 500]
method : PCA
log_level : INFO
=============================================================================
2024-09-20 15:46:45,868::INFO::__init__: Shape of pivoted trajectory embeddings: (57509, 164)
2024-09-20 15:46:45,868::INFO::run: Running trajectory analysis
2024-09-20 15:46:45,892::INFO::run_kmeans: Running KMeans clustering with 5 clusters
2024-09-20 15:46:47,490::INFO::run_kmeans: Running KMeans clustering with 6 clusters
2024-09-20 15:46:49,626::INFO::run_kmeans: Running KMeans clustering with 7 clusters
2024-09-20 15:46:51,424::INFO::run_kmeans: Running KMeans clustering with 8 clusters
2024-09-20 15:46:53,433::INFO::run_kmeans: Running KMeans clustering with 9 clusters
2024-09-20 15:46:55,517::INFO::run_kmeans: Running KMeans clustering with 10 clusters
2024-09-20 15:46:57,398::INFO::run_kmeans: Running KMeans clustering with 15 clusters
2024-09-20 15:46:59,747::INFO::run_kmeans: Running KMeans clustering with 20 clusters
2024-09-20 15:47:54,262::INFO::run_umap: Running UMAP with 30 neighbors
2024-09-20 15:48:17,502::INFO::run_umap: Running UMAP with 100 neighbors
2024-09-20 15:49:01,802::INFO::run_umap: Running UMAP with 500 neighbors
2024-09-20 15:51:40,401::INFO::run: Finished trajectory analysis
2024-09-20 15:51:40,425::INFO::main: Finished joint PCA