pallas_operations.efficient_attention.efficient_attention
efficient_attention(query, key, value, bias=None, deterministic=True, dropout_rng=None, attention_drop_rate=0.0, causal=True, query_chunk_size=1024, key_chunk_size=1024, dtype=jnp.float32, policy=jax.checkpoint_policies.nothing_saveable(), precision=None, float32_logits=True, prevent_cse=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `Array` | Array of shape [batch, query sequence length, num attention heads, head dim] | required |
| `key` | `Array` | Array of shape [batch, KV sequence length, num KV attention heads, head dim] | required |
| `value` | `Array` | Array of shape [batch, KV sequence length, num KV attention heads, head dim] | required |
| `bias` | `Array` | Bias to be added | `None` |
| `deterministic` | `bool` | Whether to run deterministically, i.e. without dropout | `True` |
| `dropout_rng` | `PRNGKey` | RNG key used for dropout | `None` |
| `attention_drop_rate` | `float` | | `0.0` |
| `causal` | `bool` | Whether the attention is causal (decoder-style) | `True` |
| `query_chunk_size` | `int` | Chunk size used for the query | `1024` |
| `key_chunk_size` | `int` | Chunk size used for the key | `1024` |
| `dtype` | `ArrayDType` | Data type | `float32` |
| `policy` | | Gradient checkpoint policy | `nothing_saveable()` |
| `precision` | `PrecisionLike` | | `None` |
| `float32_logits` | `bool` | | `True` |
| `prevent_cse` | `bool` | | `True` |
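The `query_chunk_size`, `key_chunk_size` and `causal` parameters suggest that attention is evaluated tile by tile rather than on the full logits matrix. The following is a minimal sketch of that general idea in plain JAX, not the library's implementation: `reference_attention` and `chunked_attention` are hypothetical helper names, the scaling by `head_dim ** -0.5` is an assumption, and the sketch assumes equal numbers of query and KV heads with no bias or dropout.

```python
import jax
import jax.numpy as jnp


def reference_attention(q, k, v, causal=True):
    """Plain softmax attention; q, k, v are [batch, seq, heads, head_dim]."""
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum("bqhd,bkhd->bhqk", q, k) * scale
    if causal:
        q_pos = jnp.arange(q.shape[1])[:, None]
        k_pos = jnp.arange(k.shape[1])[None, :]
        logits = jnp.where(k_pos <= q_pos, logits, -jnp.inf)
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("bhqk,bkhd->bqhd", weights, v)


def chunked_attention(q, k, v, causal=True, query_chunk_size=4, key_chunk_size=4):
    """Same result, computed one (query chunk, key chunk) tile at a time."""
    scale = q.shape[-1] ** -0.5
    b, q_len, h, d = q.shape
    k_len = k.shape[1]
    outputs = []
    for q_start in range(0, q_len, query_chunk_size):
        q_blk = q[:, q_start:q_start + query_chunk_size]
        n_q = q_blk.shape[1]
        # Running statistics of a numerically stable streaming softmax.
        running_max = jnp.full((b, h, n_q), -jnp.inf)
        running_sum = jnp.zeros((b, h, n_q))
        acc = jnp.zeros((b, h, n_q, d))
        for k_start in range(0, k_len, key_chunk_size):
            if causal and k_start > q_start + n_q - 1:
                break  # every key in this tile lies in the future
            k_blk = k[:, k_start:k_start + key_chunk_size]
            v_blk = v[:, k_start:k_start + key_chunk_size]
            logits = jnp.einsum("bqhd,bkhd->bhqk", q_blk, k_blk) * scale
            if causal:
                q_pos = (q_start + jnp.arange(n_q))[:, None]
                k_pos = (k_start + jnp.arange(k_blk.shape[1]))[None, :]
                logits = jnp.where(k_pos <= q_pos, logits, -jnp.inf)
            new_max = jnp.maximum(running_max, logits.max(axis=-1))
            # Rescale what was accumulated under the previous maximum.
            correction = jnp.exp(running_max - new_max)
            probs = jnp.exp(logits - new_max[..., None])
            running_sum = running_sum * correction + probs.sum(axis=-1)
            acc = acc * correction[..., None] + jnp.einsum(
                "bhqk,bkhd->bhqd", probs, v_blk
            )
            running_max = new_max
        out_blk = acc / running_sum[..., None]
        outputs.append(jnp.transpose(out_blk, (0, 2, 1, 3)))
    return jnp.concatenate(outputs, axis=1)


# Usage: small inputs in the documented [batch, seq, heads, head_dim] layout.
kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(kq, (2, 10, 3, 8))
k = jax.random.normal(kk, (2, 10, 3, 8))
v = jax.random.normal(kv, (2, 10, 3, 8))
out = chunked_attention(q, k, v, causal=True, query_chunk_size=4, key_chunk_size=3)
ref = reference_attention(q, k, v, causal=True)
```

Only one tile of logits exists at a time, so peak memory depends on the chunk sizes rather than on the product of the two sequence lengths. The chunk sizes need not divide the sequence length in this sketch; whether the library imposes such a constraint is not stated here.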
Returns:
| Type | Description |
|---|---|
| | |
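The `policy` and `prevent_cse` parameters share their names with keyword arguments of `jax.checkpoint`. Assuming they are forwarded to it (the page does not confirm this), the sketch below shows what those arguments do on a toy function. `attention_scores` and `loss` are hypothetical names used only for illustration, and the sketch passes the policy callable itself.

```python
import jax
import jax.numpy as jnp


def attention_scores(q, k):
    """Toy stand-in for one attention tile: scaled dot-product softmax."""
    logits = jnp.einsum("qd,kd->qk", q, k) / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(logits, axis=-1)


# Rematerialize intermediates in the backward pass instead of storing them.
# `nothing_saveable` marks no intermediate value as saveable.
remat_scores = jax.checkpoint(
    attention_scores,
    prevent_cse=True,
    policy=jax.checkpoint_policies.nothing_saveable,
)


def loss(fn, q, k):
    return jnp.sum(fn(q, k) ** 2)


q = jnp.ones((4, 8)) * 0.1
k = jnp.arange(32.0).reshape(4, 8) / 32.0

# Checkpointing changes memory use and recomputation, not the gradient values.
grad_plain = jax.grad(lambda q_: loss(attention_scores, q_, k))(q)
grad_remat = jax.grad(lambda q_: loss(remat_scores, q_, k))(q)
```

The trade-off is extra compute in the backward pass in exchange for lower peak memory, which is typically the reason to pair checkpointing with chunked attention on long sequences.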
Source code in src/fjformer/pallas_operations/efficient_attention/efficient_attention.py