slides-shapes_and_einsum.md, collab with Claude Opus
This was driven by Claude, but I corrected things and wrote examples 1 and 2 in `Advanced: Building Custom Operations`; after some back-and-forth, Claude wrote example 3.
Prompt:
Let's write a user-centered introduction to how shapes work in OCANNL. Let's put the slides in docs/slides-shapes_and_einsum.md, and write them using slipshow navigation metadata as described in docs/CLAUDE.md. The slides should take a user from beginner to advanced in making full use of shape inference and generalized einsum notation when building neural network models. They should end up aware of how projections work, and of how to lean on shape inference or row variables / ellipsis notation so as not to commit to dimension sizes, or for example to the number of batch axes, unnecessarily. They should learn when to use the dedicated einsum operators `++`, `+*`, `@^^`, `@^+` (these operators are translated by syntax extensions to `einsum1`, `einsum`, `einmax1`, `tropical`). They should be able to use what they learned to construct a new model not yet in lib/nn_blocks.ml. Consider these sources of information: docs/syntax_extensions.md, docs/shape_inference.md, lib/nn_blocks.ml, lib/shape.mli, selected parts of lib/operation.ml, and selected parts of docs/slides-basics_backprop_training_codegen.md. Let me also provide some points that might not be stated sufficiently explicitly in other documentation.
(1) The split of axes into kinds does not enforce semantics, because the generalized einsum notation can make arbitrary use of the axes. However, it offers expressivity gains.
(2) It's important to stress the syntactic difference with NumPy: since we use `->` to separate input and output axes, it cannot mean separating the argument tensor(s) from the result tensor -- thus `=>` is used to the left of the result tensor. Similarly, we use `;` to separate arguments and (in multi-char mode) `,` to separate identifiers within a row.
(3) Remember to use kind separators where you intend to use the distinct axis kinds, e.g. use `|` after batch axes.
(4) To trigger multi-char mode there must be a comma in the spec; it can be a trailing comma, e.g. "input->output, => output->input".
(5) A reminder that, as defined in lib/operation.ml, `*` stands for tensor multiplication and `*.` stands for pointwise multiplication when working with tensor expressions (rather than low-level assignments in the `%cd` syntax).
(6) The user can define operations analogous to the `einsum1` and `einsum` operations in lib/operation.ml, for example with the max operator as the accumulation operator. This is not so scary: operations can easily be added by users even if not inside lib/operation.ml.
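To make the syntax points concrete, here is a minimal sketch of the spec-string conventions they describe. Only the transpose spec is quoted from the prompt; the other axis names and specs are illustrative assumptions, not excerpts from the slides, and the operator application syntax around the strings is not shown.

```ocaml
(* Spec strings only; how they attach to tensors via `++` / `+*` etc.
   is not shown here. Axis names are made up for illustration. *)

(* Point (2): `;` separates argument tensors, `=>` precedes the result
   row, and `->` separates input axes from output axes in one tensor. *)
let _matmul_spec = "i->o; o->p => i->p" (* binary spec, as with `+*` *)

(* Point (3): `|` closes the batch axes where distinct axis kinds are
   intended. *)
let _batched_matmul_spec = "b|i->o; b|o->p => b|i->p"

(* Point (4): any comma, even a trailing one, switches to multi-char
   mode; this example is quoted from the prompt itself. *)
let _transpose_spec = "input->output, => output->input"
```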
Summary 1 by Claude:
I've created comprehensive slides on shapes and einsum notation in
OCANNL. The slides progress from beginner to advanced concepts,
covering:
Structure:
1. Introduction - Why shape inference matters
2. Basics - NumPy differences, axis kinds
3. Row variables - Flexible broadcasting with ... and ..var..
4. Multi-character mode - When to use commas
5. Einsum operators - ++, +*, @^^, @^+
6. Convolutions - Using the + syntax
7. Dimension capture - Extracting sizes for computation
8. Custom operations - Building beyond built-ins
9. Practical patterns - Real-world examples
10. Shape inference - How it works under the hood
11. Projections - Loop generation details
12. Tips and exercise - Best practices with hands-on example
Key features of the slides:
- Uses slipshow navigation metadata (pauses, up/down, IDs)
- Progresses from simple to complex concepts
- Includes practical examples from nn_blocks.ml
- Emphasizes the "principle of least commitment"
- Covers all the points you mentioned (axis kinds, row variables,
syntactic differences, etc.)
- Ends with an exercise building custom attention
The slides teach users to leverage OCANNL's powerful shape inference
rather than over-specifying dimensions, and to use row variables
for maximum flexibility when building neural network models.
Prompt 2:
We also have the documentation docs/migration_guide.md. Review your
work in light of that file. Note that a user is likely to read both
files, although I am not sure which order to recommend -- I'm leaning
toward recommending the migration guide after the slides we are
currently working on. Note that you made some mistakes that I
corrected. Apply any updates you deem still helpful.
Summary 2 by Claude:
Corrections applied:
1. Fixed the ellipsis examples to properly show axis kinds (batch |
input -> output)
2. Corrected `*.` for pointwise multiplication (not `*`, which is
tensor multiplication)
3. Added clarification for single-char vs multi-char row variable
syntax (`..v..` vs `..var..`)
4. Fixed the matrix multiplication examples to properly show axis
kinds
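A quick sketch of the corrected forms as I read them from the list above; the concrete specs are illustrative assumptions, not excerpts from the slides.

```ocaml
(* Correction 1: ellipses spelled out per axis kind,
   batch | input -> output. *)
let _all_kinds_open = "...|...->..."

(* Correction 3: row variables are `..v..` in single-char mode and
   `..var..` in multi-char mode (the trailing comma below is assumed
   here to force multi-char mode, per the prompt's point (4)). *)
let _single_char_rows = "..b.. | i -> o"
let _multi_char_rows = "..batch.. | input -> output,"

(* Correction 2 concerns operators rather than specs: in tensor
   expressions `*` is tensor multiplication, `*.` is pointwise. *)
```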
Enhancements from migration guide:
1. Added a "Common Pitfalls" section highlighting operator
differences
2. Included the 0.5 + 0.5 trick for creating shape-inferred
constants
3. Added a connection to the migration guide in the "Next Steps"
section
4. Included "The OCANNL Way" philosophy section emphasizing the
difference from PyTorch/TF
Recommendation for reading order:
I suggest users read these slides first, then the migration guide.
The slides provide conceptual understanding of shapes and einsum
from first principles, while the migration guide offers practical
translation patterns for those coming from other frameworks. The
slides build intuition that makes the migration guide's examples
clearer.
The slides now accurately represent OCANNL's shape system with:
- Proper understanding of the three axis kinds
- Clear examples of single-char vs multi-char einsum modes
- Practical patterns from actual usage in nn_blocks.ml
- Warnings about common mistakes
- Connection to further resources
Summary 3 by Claude:
I've replaced the problematic "Custom Max-Pool" example
with the "Reduce Last N Dimensions" example. This new example:
1. Shows genuine programmatic spec building - you can't write a
literal spec when n is a runtime parameter
2. Has a real use case - reducing an arbitrary number of trailing
dimensions is useful for various pooling and aggregation operations
3. Is easy to understand - the example comment shows exactly what
spec gets generated for n=3
4. Uses single-char mode appropriately - generates sequential
letters (a, b, c, ...) for the dimension variables
The example clearly demonstrates when and why you'd need to build
specs programmatically rather than using literal strings!
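Here is a sketch of what such programmatic spec building could look like. The exact spec the slides generate is not reproduced here, so the row-variable placement and the choice to reduce output axes are assumptions; only the "sequential letters" idea and the n = 3 illustration come from the summary above.

```ocaml
(* Build a single-char spec that drops the last n output axes from the
   result, naming them a, b, c, ... sequentially. For n = 3 this yields
   "... | ... a b c => ... | ..." (placement assumed, see above). *)
let reduce_last_n_spec n =
  let letters =
    List.init n (fun i -> String.make 1 (Char.chr (Char.code 'a' + i)))
  in
  Printf.sprintf "... | ... %s => ... | ..." (String.concat " " letters)
```

The resulting string would then presumably be passed to `einsum1` (the function behind `++`), whose accumulation over axes missing from the result performs the reduction.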
Summary 4 by Claude:
I've updated the "Principle of least commitment" section to
accurately reflect how OCANNL handles different axis kinds:
1. Made it clear that "...|...->..." is needed for truly arbitrary
shapes (all three axis kinds)
2. Added context for when simpler patterns work:
- "...->..." for parameters (which typically don't have batch
axes)
- "...|..." for data tensors (which typically don't have input
axes)
3. Fixed the binary op pattern to show the full specification needed
for complete generality
4. Clarified that "... | ..d.. => ... | 0" specifically reduces
output axes
This now accurately represents OCANNL's requirement that missing
axis kinds are assumed absent, not arbitrary, which is an important
distinction for users to understand when writing maximally flexible
code.
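For quick reference, the least-commitment patterns this summary describes, restated as spec strings (all four are quoted or directly implied by the points above; the binding names are mine):

```ocaml
let _fully_general = "...|...->..." (* all three axis kinds left open *)
let _param_pattern = "...->..."     (* parameters: typically no batch axes *)
let _data_pattern = "...|..."       (* data tensors: typically no input axes *)
let _reduce_output = "... | ..d.. => ... | 0" (* reduces the output axes *)
```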
b93368 - Sep 07 15:00 +00:00