gliner.serve.client module

HTTP client for the GLiNER Ray Serve deployment.

exception gliner.serve.client.GLiNERClientError

Bases: RuntimeError

Raised when the GLiNER server returns an error or is unreachable.
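Because the exception derives from RuntimeError, callers can catch it specifically or fall back to a generic RuntimeError handler. The sketch below does not import gliner: `DemoClientError`, `fetch`, and the `opener` parameter are hypothetical names used only to illustrate wrapping a transport failure in a single RuntimeError subclass. It is not the library's implementation.

```python
import urllib.error
import urllib.request


class DemoClientError(RuntimeError):
    """Hypothetical stand-in for GLiNERClientError (same base class)."""


def fetch(url, timeout=1.0, opener=urllib.request.urlopen):
    """Hypothetical helper: GET a URL and re-raise transport failures.

    `opener` is injectable so the failure path can be exercised offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Chain the original error so the root cause stays visible.
        raise DemoClientError(f"request to {url} failed: {exc}") from exc
```

Chaining with `from exc` keeps the underlying network error available on `__cause__` for logging.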

class gliner.serve.client.GLiNERClient(base_url='http://localhost:8000', route_prefix='/gliner', timeout=30.0, max_concurrency=32)

Bases: object

HTTP client for a running GLiNER Ray Serve deployment.

Example

>>> from gliner.serve import GLiNERClient
>>> client = GLiNERClient()
>>> results = client.predict(
...     "John works at Google in Mountain View", labels=["person", "organization", "location"]
... )
>>> results
{'entities': [{'start': 0, 'end': 4, 'text': 'John', 'label': 'person', ...}, ...]}

Initialize the HTTP client.

Parameters:

base_url (str) – Scheme + host + port of the Ray Serve HTTP proxy.
route_prefix (str) – Route prefix the deployment is mounted under (must match GLiNERServeConfig.route_prefix).
timeout (float) – Per-request timeout in seconds.
max_concurrency (int) – Maximum in-flight HTTP requests when predicting on a list of texts. Bounds the client-side thread pool.
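The max_concurrency bound can be pictured as a thread pool whose worker count caps the number of requests in flight. The following is a minimal sketch of that pattern using only the standard library; `map_bounded` and `fake_request` are hypothetical names, and the real client may be structured differently.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def map_bounded(fn, items, max_concurrency=32):
    """Apply fn to every item with at most max_concurrency calls in flight.

    Executor.map yields results in input order, so the output lines up
    with the input list.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(fn, items))


# Stand-in for one HTTP request; tracks the peak number of concurrent calls.
_lock = threading.Lock()
_in_flight = 0
peak = 0


def fake_request(text):
    global _in_flight, peak
    with _lock:
        _in_flight += 1
        peak = max(peak, _in_flight)
    time.sleep(0.01)  # simulate network latency
    with _lock:
        _in_flight -= 1
    return {"text": text}


results = map_bounded(fake_request, [f"doc {i}" for i in range(20)], max_concurrency=4)
```

With max_concurrency=4, `peak` never exceeds 4 even though 20 items are submitted.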

predict(text, labels, relations=None, threshold=None, relation_threshold=None, flat_ner=True, multi_label=False, adapter_id=None)

Blocking prediction. Passing a single str returns a dict; passing a list of texts returns a list of results.
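The input-dependent return shape can be sketched as a simple type dispatch. This is an illustration under assumptions, not the client's actual code: `predict_any` and `fake_predict_one` are hypothetical, and the stand-in result contains only an empty `entities` list.

```python
from typing import Callable, List, Union


def predict_any(
    text: Union[str, List[str]],
    predict_one: Callable[[str], dict],
) -> Union[dict, List[dict]]:
    """Return one dict for a single string, or a list for a list of strings."""
    if isinstance(text, str):
        return predict_one(text)
    return [predict_one(t) for t in text]


def fake_predict_one(text: str) -> dict:
    # Stand-in for a server round trip; a real response would list entities.
    return {"entities": []}


single = predict_any("John works at Google", fake_predict_one)
batch = predict_any(["John works at Google", "Paris is in France"], fake_predict_one)
```

Here `single` is a dict and `batch` is a two-element list, mirroring the str-in/dict-out and list-in/list-out behavior described above.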

async predict_async(text, labels, relations=None, threshold=None, relation_threshold=None, flat_ner=True, multi_label=False, adapter_id=None)

Async version of predict.
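An async method like this is awaited inside a coroutine, and several calls can be run concurrently with asyncio.gather. The sketch below uses a hypothetical `FakeAsyncClient` that mimics only the `text` and `labels` arguments of the documented signature; the shape of its return value is invented for the demonstration.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in with a predict_async method; not GLiNERClient."""

    async def predict_async(self, text, labels):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        # Echo the labels so each response can be matched to its request.
        return {"entities": [], "labels": labels}


async def main():
    client = FakeAsyncClient()
    # Start both requests concurrently and wait for both to finish.
    return await asyncio.gather(
        client.predict_async("John works at Google", labels=["person"]),
        client.predict_async("Paris is in France", labels=["location"]),
    )


responses = asyncio.run(main())
```

asyncio.gather returns results in the order the awaitables were passed, regardless of which finishes first.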

adapter_cache_status(adapter_id=None)

Return PolyLoRA adapter cache status from the server.

is_adapter_cached(adapter_id)

gliner.serve.client.get_client(base_url='http://localhost:8000', route_prefix='/gliner', timeout=30.0, max_concurrency=32)

Convenience constructor for GLiNERClient.