
ocannl

https://github.com/ahrefs/ocannl
Branches (10)
ludics/gh-ocannl-57-s1/root
Add regression test for decoder_only_block/decoder_only API
Exercises the new Nn_blocks.decoder_only helper with a 2-layer stack, causal mask, and forward pass, validating output shape. This ensures the new public API added in the previous commit has CI coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ee621a
ludics/gh-ocannl-57-s5/root
gh-ocannl-57: complete decoder-only transformer example on Names dataset
Use library decoder_only_block in layer_norm_test (replacing local mini_decoder_block duplicate), update README checklist and CHANGES.md. The core implementation (decoder_only_block/decoder_only in nn_blocks.ml, transformer_names.ml training + generation, dune stanza, expected output) was added in prior commits on this branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
87492d
ludics/task-2d20f48e-s1/root
Block write-write conflicts in loop fusion safety check
Address Codex review: fusible_bodies now also rejects fusion when both loop bodies write to the same tensor, preventing interleaving order changes that could produce different numerical results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
98f24c
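The write-write rule described in this commit can be illustrated with a small, self-contained sketch. It is not the OCANNL code: the `body_summary` type and `passes_write_write_check` are hypothetical names, each loop body is reduced to the set of tensor names it writes, and any other conditions checked by the real `fusible_bodies` are left out.

```ocaml
module Names = Set.Make (String)

(* Hypothetical summary of a loop body: only the tensors it writes. *)
type body_summary = { writes : Names.t }

(* Two bodies conflict when their write sets overlap. *)
let write_write_conflict a b = not (Names.disjoint a.writes b.writes)

(* Sketch of the added rule only; other fusion conditions are assumed to be
   checked elsewhere. *)
let passes_write_write_check a b = not (write_write_conflict a b)

let () =
  let body writes = { writes = Names.of_list writes } in
  let b1 = body [ "acc" ] in
  let b2 = body [ "acc" ] in
  let b3 = body [ "out" ] in
  (* Both bodies write "acc": fusing would interleave the writes. *)
  Printf.printf "b1/b2 pass: %b\n" (passes_write_write_check b1 b2);
  (* Disjoint write sets: this rule does not block fusion. *)
  Printf.printf "b1/b3 pass: %b\n" (passes_write_write_check b1 b3)
```

Under these assumptions the first pair is rejected and the second is accepted.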
ludics/task-2d20f48e-s2/root
Cross-statement CSE with hoisting to common ancestor scope
Add `Declare_local of scope_id` IR node that emits a local variable declaration and zero-initialization. Implement `hoist_cross_statement_cse` pass that scans flattened Seq siblings for alpha-equivalent Local_scope nodes across different Set statements, hoists them as Declare_local + body before the first user, and replaces all occurrences with Get_local. The pass runs after intra-statement CSE in the optimization pipeline and includes a conservative safety check that rejects hoisting when intervening statements write to tensors read by the Local_scope body. Includes regression test exercising to_doc, to_doc_cstyle, and c_syntax pp_ll paths with cross-statement deduplication.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9fb1c8
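A toy version of the hoisting idea, for illustration only. The IR below is a hypothetical miniature, not OCANNL's: `Hoisted_local`, `Local`, and `hoist` are invented names, plain structural equality stands in for alpha-equivalence, and the caller supplies the shared sub-expression rather than the pass discovering it.

```ocaml
type expr =
  | Const of float
  | Get of string (* read from a named tensor *)
  | Local of int (* reference to a hoisted local *)
  | Add of expr * expr
  | Mul of expr * expr

type stmt =
  | Hoisted_local of int * expr (* hypothetical: declare a local and compute it *)
  | Set of string * expr

(* Tensors read by an expression. *)
let rec reads = function
  | Const _ | Local _ -> []
  | Get t -> [ t ]
  | Add (a, b) | Mul (a, b) -> reads a @ reads b

let rec contains ~target e =
  e = target
  || (match e with
     | Add (a, b) | Mul (a, b) -> contains ~target a || contains ~target b
     | Const _ | Get _ | Local _ -> false)

let rec replace ~target ~by e =
  if e = target then by
  else
    match e with
    | Add (a, b) -> Add (replace ~target ~by a, replace ~target ~by b)
    | Mul (a, b) -> Mul (replace ~target ~by a, replace ~target ~by b)
    | Const _ | Get _ | Local _ -> e

(* Hoist [target] before its first user when at least two Set statements use
   it. Conservative check: give up if any statement from the first user up to
   (not including) the last user writes a tensor that [target] reads. *)
let hoist ~id ~target stmts =
  let uses =
    List.mapi
      (fun i s ->
        match s with Set (_, e) when contains ~target e -> Some i | _ -> None)
      stmts
    |> List.filter_map Fun.id
  in
  match uses with
  | first :: _ :: _ ->
      let last = List.fold_left max first uses in
      let unsafe =
        List.mapi
          (fun i s ->
            match s with
            | Set (t, _) -> i >= first && i < last && List.mem t (reads target)
            | Hoisted_local _ -> false)
          stmts
        |> List.exists Fun.id
      in
      if unsafe then stmts
      else
        List.mapi
          (fun i s ->
            let s' =
              match s with
              | Set (t, e) -> Set (t, replace ~target ~by:(Local id) e)
              | Hoisted_local _ -> s
            in
            if i = first then [ Hoisted_local (id, target); s' ] else [ s' ])
          stmts
        |> List.concat
  | _ -> stmts
```

With `Mul (Get "a", Get "b")` as the target, two `Set` statements that both contain it are rewritten to share one local, while a sequence whose first user writes to `a` is returned unchanged.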
ludics/task-fe1c593d-s1/root
feat: add block tensor literal syntax for %op PPX extension
Implement block tensor construction via list, tuple, and array syntax in %op blocks: [x1; x2] (output axis), (x1, x2) (input axis), [|x1; x2|] (batch axis). Disambiguate from ndarray constants by checking the first leaf expression.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
446069
ludics/task-fe1c593d-s4/root
fix: preserve batch/input axes in output-axis block tensor concat spec
The output-axis concat spec "bt0,...; bt1,..." dropped batch and input rows, causing shape errors when list block tensor components had input or batch axes. Use the full spec "...|...-> bt0,...; ..." to broadcast batch/input rows through concatenation. Add test 13 exercising list block tensors with input-dim components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f46315
ludics/watch-ocannl-readme-md-b61f3434-s2/root
fix: restore cross-device synchronization in device_to_device copies
When source and destination are on different devices, wait for source stream writes to complete before scheduling the copy on the destination stream. Without this, async CUDA/Metal copies could read stale data. Addresses review feedback on PR #12.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
80aec5
ludics/watch-ocannl-readme-md-b61f3434-s6/root
Remove deprecated multi-stream infrastructure from backend layer
Remove the config type (Only_devices_parallel | For_parallel_copying | Most_parallel_streams), the sharing type (Per_stream | Shared_cross_streams), cross-stream synchronization logic (wait_for_all, shared_writer_streams, host_reading_streams, host_writing_streams, reader_streams, owner_stream), the Streaming_for merge buffer variant, suggested_num_streams from all backends, round_robin/round_robin_dry_run from train.ml, and cross-stream candidate tracking. Replace cross_stream_candidates with constant_buffer_cache for per-device read-only buffer caching. Simplify device_to_device to use phys_equal pointer-identity check instead of the old same-device fast-path guard. Update documentation to reflect the cleanup.
Closes #341
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e1cc3d
watch-ocannl-README-md-b61f3434-claude
Pre-PR cleanup for watch-ocannl-README-md-b61f3434
eae4bb
watch-ocannl-README-md-b61f3434-codex
Pre-PR cleanup for watch-ocannl-README-md-b61f3434
0377b9
Refs Branches (6)
feat: add block tensor literal syntax for %op PPX
feat: add block tensor literal syntax for %op PPX extension
Implement block tensor construction via list, tuple, and array syntax in %op blocks: [x1; x2] (output axis), (x1, x2) (input axis), [|x1; x2|] (batch axis). Disambiguate from ndarray constants by checking the first leaf expression.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
446069
#443
feat: block tensor literal syntax [ta; tb], (ta, tb), [|ta; tb|]
fix: preserve batch/input axes in output-axis block tensor concat spec
The output-axis concat spec "bt0,...; bt1,..." dropped batch and input rows, causing shape errors when list block tensor components had input or batch axes. Use the full spec "...|...-> bt0,...; ..." to broadcast batch/input rows through concatenation. Add test 13 exercising list block tensors with input-dim components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f46315
#450
Fix build_files/ sharing between test executables
Make remove_dir_if_exists recursive to handle prefix subdirectories
When no build_files_prefix is set, startup cleanup removes the entire build_files/ tree. The previous Sys.remove call fails on subdirectory entries left by prefixed test runs. Recurse into subdirectories before removing them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
45d1c1
#453
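The recursive cleanup this commit describes could look roughly like the sketch below. It is an assumption about the shape of the fix, not the code from the PR: only the function name comes from the commit message, the body uses nothing beyond OCaml's standard `Sys` and `Filename` modules (`Sys.rmdir` requires OCaml 4.12 or later), and it does not treat symbolic links specially.

```ocaml
(* Sketch only: remove a file or directory tree if it exists. *)
let rec remove_dir_if_exists path =
  if Sys.file_exists path then
    if Sys.is_directory path then begin
      (* Recurse into every entry first, so the directory is empty
         by the time it is removed. *)
      Array.iter
        (fun entry -> remove_dir_if_exists (Filename.concat path entry))
        (Sys.readdir path);
      Sys.rmdir path
    end
    else Sys.remove path
```

Calling `remove_dir_if_exists "build_files"` on a tree that contains prefixed subdirectories then removes the subdirectories as well, whereas a bare `Sys.remove` on a directory entry raises `Sys_error`.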
gh-ocannl-57: decoder-only transformer example on Names dataset
gh-ocannl-57: complete decoder-only transformer example on Names dataset
Use library decoder_only_block in layer_norm_test (replacing local mini_decoder_block duplicate), update README checklist and CHANGES.md. The core implementation (decoder_only_block/decoder_only in nn_blocks.ml, transformer_names.ml training + generation, dune stanza, expected output) was added in prior commits on this branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
87492d
#454
Remove deprecated multi-streaming infrastructure (#341)
fix: restore cross-device synchronization in device_to_device copies
When source and destination are on different devices, wait for source stream writes to complete before scheduling the copy on the destination stream. Without this, async CUDA/Metal copies could read stale data. Addresses review feedback on PR #12.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
80aec5
#455
Cross-statement CSE with hoisting to common ancestor scope
Block write-write conflicts in loop fusion safety check
Address Codex review: fusible_bodies now also rejects fusion when both loop bodies write to the same tensor, preventing interleaving order changes that could produce different numerical results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
98f24c
#456