Commit graph

51 commits

Author SHA1 Message Date
powe97 bdc6b2bcbc Short circuit on IP ban 2024-03-16 21:16:14 -04:00
powe97 5e9e464ad0 Add CSV generator 2024-03-16 02:17:31 -04:00
powe97 ba3e5c77d1 Lint 2024-03-16 01:17:49 -04:00
powe97 c3be28e520 Add JSON converter from by-institution to by-course 2024-03-16 01:14:58 -04:00
powe97 061f9b14e5 Sort keys 2024-03-16 01:07:16 -04:00
powe97 1fa7ab61af Rename transfer scraper 2024-03-16 00:32:59 -04:00
powe97 517952f977 Re-add catalog name scraping 2024-03-13 23:52:51 -04:00
powe97 f1a47dca48 stderr 2024-03-13 22:19:56 -04:00
powe97 af25410c5d Log IP 2024-03-13 22:18:13 -04:00
powe97 b017436be9 Fix handling for courses that come in as multiple courses 2024-03-13 15:48:53 -04:00
powe97 4c0517f6c4 Print title 2024-03-07 16:14:54 -06:00
powe97 779b979b9b Fix credit count parsing 2024-03-07 11:03:59 -06:00
powe97 0f3652d8cc Lint 2024-03-06 16:16:36 -06:00
powe97 7c87221256 Fix typo 2024-03-06 16:16:23 -06:00
powe97 69d8946f37 Improve credit count parsing 2024-03-06 16:13:13 -06:00
powe97 9c374bf130 Fix timeout issue (again) 2024-03-06 13:20:09 -06:00
powe97 10360ff57c Wait for table 2024-03-06 12:48:48 -06:00
powe97 de89a56808 Add more debug printing 2024-03-06 12:41:08 -06:00
powe97 81ba2fdc80 Make failing actually fail the program 2024-03-06 02:35:18 -06:00
powe97 92c3327b1a Fix debug prints 2024-03-06 02:09:42 -06:00
powe97 912b07f6f3 Add retrying first page 2024-03-06 01:18:49 -06:00
powe97 8b15438a98 Actually use the retry version of the function... 2024-03-06 01:03:23 -06:00
powe97 c98b928125 Add retrying 2024-03-05 22:54:42 -05:00
powe97 a0b9081f8f --headless 2024-03-05 21:14:32 -05:00
powe97 4f69c1d8a0 Re-get the page to try circumvent timeout 2024-03-05 21:14:00 -05:00
powe97 02b383b90b Extend timeout 2024-03-05 20:47:41 -05:00
powe97 fc72fda5de Remove jump debug print 2024-03-05 19:10:10 -05:00
powe97 52fdab6ce6 Make everything stderr print 2024-03-05 19:03:54 -05:00
powe97 ce2f22b23b Merge branch 'main' of https://github.com/quatalog/quatalog 2024-03-05 18:38:12 -05:00
powe97 6ad6f85708 Redesign scraper to not be unbearably slow 2024-03-05 18:33:54 -05:00
powe97 976b553b14 Reduce wait time 2024-03-04 17:03:13 -05:00
powe97 faf303ec27 Add termination 2024-03-03 23:53:25 -05:00
powe97 ae286917c1 Formatting 2024-03-02 02:26:58 -05:00
powe97 80f9ed1d95 Merge branch 'main' of https://github.com/quatalog/quatalog 2024-03-02 02:25:26 -05:00
powe97 bc07c559bc Uh-oh 2024-03-02 02:22:30 -05:00
powe97 30f4f49cdb Fix bug where only 1 page is scraped per school and refactor 2024-03-01 20:32:00 -05:00
powe97 baa74b8ee6 Fix issue where only 1 page per school would get scraped properly 2024-03-01 18:17:53 -05:00
powe97 5ea6816c90 Fix capitalization next to smart apostrophes (really?) 2024-03-01 17:21:45 -05:00
powe97 6b5356c84f Fix typo leading to bad capitalization 2024-03-01 15:01:20 -05:00
powe97 3b608fad41 Fix Roman numerals issue 2024-03-01 13:32:02 -05:00
powe97 d03be03aeb Move debug print to be more accurate 2024-03-01 01:50:01 -05:00
powe97 1a4542e20e Fix crashing without timeout arg and re-add --headless 2024-03-01 00:29:34 -05:00
powe97 b0acd0e745 Dammit python 2024-02-29 22:31:09 -05:00
powe97 c6e28d399a Make timeout field have default value 2024-02-29 22:28:00 -05:00
powe97 cf2abf7193 Fix partial updates when KeyboardInterrupt happens mid-institution 2024-02-29 22:13:44 -05:00
powe97 8a3e8a84d8 See previous commit 2024-02-29 21:25:53 -05:00
powe97 12d844ca28 Fix global var fuckery 2024-02-29 21:21:39 -05:00
powe97 4916feeb19 Add debug timeout to workflow 2024-02-29 21:16:07 -05:00
powe97 b304e9f8d2 Fix scraper 2024-02-29 21:02:38 -05:00
powe97 f216c45748 Add if __name__ == "__main__" and fix workflow 2024-02-29 20:49:45 -05:00