interlab.queries.query_for_json

import json
import re
from typing import Any, TypeVar

import pydantic
from fastapi.encoders import jsonable_encoder

from treetrace import FormatStr, TracingNode

from .json_examples import generate_json_example
from .json_parsing import find_and_parse_json_block
from .json_schema import get_json_schema, get_pydantic_model
from .query_failure import ParsingFailure
from .query_model import query_model

_FORMAT_PROMPT = """\
# Format instructions:\n
{deliberation}Write the answer as a single valid JSON value, conforming to the following JSON schema:\n
```json
{schema}
```\n
The answer should contain exactly one valid JSON code block delimited by "```json" and "```".
"""


_FORMAT_PROMPT_DELIBERATE = """\
1. Deliberate about the problem and write your thoughts as free-form text containing no JSON.
2. """


_FORMAT_PROMPT_EXAMPLE = """\
Here is an example JSON instance of the given schema.\n
```json
{example}
```\n"""
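
# When assembled with with_cot=True and a generated example, the format prompt
# sent to the model (see query_for_json below) renders roughly as:
#
#   # Format instructions:
#
#   1. Deliberate about the problem and write your thoughts as free-form text containing no JSON.
#   2. Write the answer as a single valid JSON value, conforming to the following JSON schema:
#
#   ```json
#   <schema>
#   ```
#
#   The answer should contain exactly one valid JSON code block delimited by "```json" and "```".
#
#   Here is an example JSON instance of the given schema.
#
#   ```json
#   <example>
#   ```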


_FORMAT_VAR = "FORMAT_PROMPT"

TOut = TypeVar("TOut")


def query_for_json(
    model: Any,
    T: type,
    prompt: str,
    with_example: bool | TOut | str = False,
    with_cot: bool = False,
    max_repeats: int = 5,
    model_for_examples: Any = None,
) -> TOut:
    """
    Prompt `model` to produce a JSON representation of type `T`, and return it parsed and validated.

    * `model` can be a langchain completion or chat model, an interlab model, or any other callable object.
    * `T` needs to be a dataclass, a pydantic BaseModel, or a pydantic dataclass.
      When defining the classes, use field names and descriptions that will help the LLM fill in the data as you
      expect it. Recursive classes are not supported.
      After parsing, the models are also validated.
    * `prompt` is any string query to the model. If `prompt` contains "{FORMAT_PROMPT}", it will be replaced with
      format instructions, the JSON schema, and the optional JSON example. Otherwise this information is appended
      at the end of the prompt (this seems to work well).
    * `with_example=True` generates an example JSON instance of the type from its schema and adds the example
      to the prompt. Examples can help smaller LLMs or with more complex tasks, but it is so far unclear
      how much they help larger models, and there is some chance they influence the answer.
      The example is generated by an LLM (default: gpt-3.5-turbo), so it aims to be a semantically meaningful
      instance of type T relative to field names and descriptions.
      In-memory and on-disk caching of the examples for schemas is TODO.
      You can also provide your own example by passing a JSON string or JSON-serializable object in `with_example`.
      Note that a provided example is not validated (TODO: validate it).
    * `with_cot=True` adds a minimal prompt for writing chain-of-thought reasoning before writing
      out the JSON response. This may improve response quality (via CoT deliberation) but has some risks:
      the model may include JSON in its deliberation (confusing the parser) or exceed the token limit with
      lengthy deliberation.
    * `max_repeats` limits how many times the model will be queried before an exception is raised -
      all models have some chance of failing to follow the instructions, and this gives them several chances.
      Repetition is triggered when no valid JSON is found in the output, or when it fails to validate
      against the schema or any validators in the dataclasses.
      Note there is no repetition on LLM model failure (the model is expected to take care of network failures etc.).
    * `model_for_examples` can specify a model to use for generating the example JSON. By default,
      `gpt-3.5-turbo` is used.

    Returns a valid instance of `T` or raises `ParsingFailure` if all retries failed to produce valid JSON.

    *Notes:*

    - Tracing: `query_for_json` logs one TracingNode for its call, and uses `query_model`, which
      also logs TracingNodes for the LLM calls themselves by default.

    - Uses pydantic under the hood for constructing JSON schemas, flexible conversion of types to schemas,
      validation, etc.

    - The prompts ask the LLMs to wrap the JSON in markdown-style code blocks for additional robustness
      (e.g. against stray `{` or `}` somewhere in the surrounding text, which is hard to avoid reliably),
      falling back to looking for the outermost `{}`-pair.
      This may still fail, e.g. when your task itself talks about JSON, or when the JSON answer
      contains "```" as a substring. While the current version seems sufficient, there are TODOs for improvement.

    - The schema presented to the LLM is reference-free; all `$ref`s from the JSON schema are resolved.
    """
    # Ensure the prompt contains exactly one {FORMAT_PROMPT} slot,
    # appending one at the end if it is missing.
    if isinstance(prompt, str):
        fmt_count = len(re.findall(f'{"{"}{_FORMAT_VAR}{"}"}', prompt))
        if fmt_count > 1:
            raise ValueError(
                f'Multiple instances of {"{"}{_FORMAT_VAR}{"}"} found in prompt'
            )
        if fmt_count == 0:
            prompt = (
                FormatStr() + prompt + FormatStr("\n\n{" + _FORMAT_VAR + "#77777726}")
            )
    elif isinstance(prompt, FormatStr):
        if _FORMAT_VAR not in prompt.free_params():
            prompt += FormatStr("\n\n{" + _FORMAT_VAR + "#77777726}")
    else:
        raise TypeError("query_for_json only accepts str or FormatStr as `prompt`")

    deliberation = _FORMAT_PROMPT_DELIBERATE if with_cot else ""

    # Build the format instructions from the (reference-free) JSON schema of T.
    pdT = get_pydantic_model(T)
    schema = get_json_schema(pdT)
    format_prompt = _FORMAT_PROMPT.format(schema=schema, deliberation=deliberation)

    # Optionally generate (or serialize a user-provided) example instance.
    if with_example is True:
        with_example = generate_json_example(schema, model=model_for_examples)
    if with_example and not isinstance(with_example, str):
        with_example = json.dumps(jsonable_encoder(with_example))
    if with_example:
        format_prompt += _FORMAT_PROMPT_EXAMPLE.format(example=with_example)

    # Substitute the format instructions into the prompt slot.
    if isinstance(prompt, str):
        prompt_with_fmt = prompt.replace(f'{"{"}{_FORMAT_VAR}{"}"}', format_prompt)
    else:
        prompt_with_fmt = prompt.format(**{_FORMAT_VAR: format_prompt})

    with TracingNode(
        f"query for JSON of type {T}",
        kind="query",
        inputs=dict(
            prompt=prompt,
            with_example=with_example,
            with_cot=with_cot,
            max_repeats=max_repeats,
            T=str(T),
        ),
    ) as c:
        for i in range(max_repeats):
            res = query_model(model, prompt_with_fmt)
            assert isinstance(res, str)
            try:
                d = find_and_parse_json_block(res)
                # TODO: Is the following conversion/validation working for nested fields as well?
                # Convert to pydantic type for permissive conversion and validation
                d = pdT(**d)
                # Convert back to match expected type (nested types are ok)
                d = T(**d.dict())
                assert isinstance(d, T)
                c.set_result(d)
                return d
            except (ValueError, pydantic.ValidationError) as e:
                if i < max_repeats - 1:
                    continue
                # Errors on last turn get logged into tracing and propagated
                raise ParsingFailure(
                    f"model repeatedly returned a response without a valid JSON instance of {T.__name__}"
                ) from e
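
A minimal usage sketch, assuming only what the docstring above states: any
callable returning a string is accepted as `model`. `BookReview` and
`stub_model` are hypothetical names, and the stub's canned reply stands in for
an LLM following the format instructions, so the sketch runs without API access.

import dataclasses

from interlab.queries import query_for_json


@dataclasses.dataclass
class BookReview:
    # Field names and types guide the LLM when filling in the data.
    title: str
    rating: int
    summary: str


def stub_model(prompt: str) -> str:
    # A real langchain or interlab model would go here; this canned reply
    # mimics a model that wraps its answer in a "```json" code block.
    return '```json\n{"title": "Dune", "rating": 5, "summary": "A classic."}\n```'


review = query_for_json(stub_model, BookReview, "Review the book Dune.")
assert review == BookReview(title="Dune", rating=5, summary="A classic.")

With a real model in place of the stub, the same call would retry up to
`max_repeats` times on invalid output before raising `ParsingFailure`.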