Medication Extraction Examples 🏥
LangExtract can extract structured medical information, such as medication names, dosages, routes, frequencies, and durations, from clinical text, which makes it a good fit for healthcare applications.
WARNING
Disclaimer: This demonstration is only for illustrative purposes of LangExtract's baseline capability. It does not represent a finished or approved product, is not intended to diagnose or suggest treatment of any disease or condition, and should not be used for medical advice.
Basic Named Entity Recognition (NER)
In this basic medical example, LangExtract extracts structured medication information:
python
import langextract as lx
# Text with a medication mention
input_text = "Patient took 400 mg PO Ibuprofen q4h for two days."
# Define extraction prompt
prompt_description = "Extract medication information including medication name, dosage, route, frequency, and duration in the order they appear in the text."
# Define example data with entities in order of appearance
examples = [
    lx.data.ExampleData(
        text="Patient was given 250 mg IV Cefazolin TID for one week.",
        extractions=[
            lx.data.Extraction(extraction_class="dosage", extraction_text="250 mg"),
            lx.data.Extraction(extraction_class="route", extraction_text="IV"),
            lx.data.Extraction(extraction_class="medication", extraction_text="Cefazolin"),
            lx.data.Extraction(extraction_class="frequency", extraction_text="TID"),
            lx.data.Extraction(extraction_class="duration", extraction_text="for one week")
        ]
    )
]
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt_description,
    examples=examples,
    model_id="gemini-2.5-pro",
)
# Display entities with positions
print(f"Input: {input_text}\n")
print("Extracted entities:")
for entity in result.extractions:
    position_info = ""
    if entity.char_interval:
        start, end = entity.char_interval.start_pos, entity.char_interval.end_pos
        position_info = f" (pos: {start}-{end})"
    print(f"• {entity.extraction_class.capitalize()}: {entity.extraction_text}{position_info}")
# Save and visualize the results
lx.io.save_annotated_documents([result], output_name="medical_ner_extraction.jsonl", output_dir=".")
# Generate the interactive visualization
html_content = lx.visualize("medical_ner_extraction.jsonl")
with open("medical_ner_visualization.html", "w") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)  # For Jupyter/Colab
    else:
        f.write(html_content)
print("Interactive visualization saved to medical_ner_visualization.html")

Expected Output
Input: Patient took 400 mg PO Ibuprofen q4h for two days.
Extracted entities:
• Dosage: 400 mg (pos: 13-19)
• Route: PO (pos: 20-22)
• Medication: Ibuprofen (pos: 23-32)
• Frequency: q4h (pos: 33-36)
• Duration: for two days (pos: 37-49)
Interactive visualization saved to medical_ner_visualization.html

Relationship Extraction (RE)
For more complex extractions that involve relationships between entities, LangExtract can extract medications and their associated attributes:
python
import langextract as lx
# Text with interleaved medication mentions
input_text = """
The patient was prescribed Lisinopril and Metformin last month.
He takes the Lisinopril 10mg daily for hypertension, but often misses
his Metformin 500mg dose which should be taken twice daily for diabetes.
"""
# Define extraction prompt
prompt_description = """
Extract medications with their details, using attributes to group related information:
1. Extract entities in the order they appear in the text
2. Each entity must have a 'medication_group' attribute linking it to its medication
3. All details about a medication should share the same medication_group value
"""
# Define example data with medication groups
examples = [
    lx.data.ExampleData(
        text="Patient takes Aspirin 100mg daily for heart health and Simvastatin 20mg at bedtime.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="Aspirin",
                attributes={"medication_group": "Aspirin"}
            ),
            lx.data.Extraction(
                extraction_class="dosage",
                extraction_text="100mg",
                attributes={"medication_group": "Aspirin"}
            ),
            lx.data.Extraction(
                extraction_class="frequency",
                extraction_text="daily",
                attributes={"medication_group": "Aspirin"}
            ),
            lx.data.Extraction(
                extraction_class="condition",
                extraction_text="heart health",
                attributes={"medication_group": "Aspirin"}
            ),
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="Simvastatin",
                attributes={"medication_group": "Simvastatin"}
            ),
            lx.data.Extraction(
                extraction_class="dosage",
                extraction_text="20mg",
                attributes={"medication_group": "Simvastatin"}
            ),
            lx.data.Extraction(
                extraction_class="frequency",
                extraction_text="at bedtime",
                attributes={"medication_group": "Simvastatin"}
            )
        ]
    )
]
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt_description,
    examples=examples,
    model_id="gemini-2.5-pro",
)
# Display grouped medications
print(f"Input text: {input_text.strip()}\n")
print("Extracted Medications:")
# Group by medication
medication_groups = {}
for extraction in result.extractions:
    if not extraction.attributes or "medication_group" not in extraction.attributes:
        continue
    group_name = extraction.attributes["medication_group"]
    medication_groups.setdefault(group_name, []).append(extraction)
# Print each medication group
for med_name, extractions in medication_groups.items():
    print(f"\n* {med_name}")
    for extraction in extractions:
        position_info = ""
        if extraction.char_interval:
            start, end = extraction.char_interval.start_pos, extraction.char_interval.end_pos
            position_info = f" (pos: {start}-{end})"
        print(f" • {extraction.extraction_class.capitalize()}: {extraction.extraction_text}{position_info}")

Key Features Demonstrated
- Named Entity Recognition: Extracts entities with their types (medication, dosage, route, etc.)
- Relationship Extraction: Groups related entities using attributes
- Position Tracking: Records exact positions of extracted entities in the source text
- Structured Output: Organizes information in a format suitable for healthcare applications
- Interactive Visualization: Generates HTML visualizations for exploring complex medical extractions
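
The position-tracking and structured-output points above can be checked without calling a model. The sketch below copies the entity values by hand from the expected output of the basic NER example, assumes that `end_pos` is exclusive (which is consistent with those values), confirms each span by slicing the source text, and then collects the entities into one plain dictionary. The names `reported`, `span_matches`, and `to_record` are hypothetical helpers for illustration and are not part of LangExtract.

```python
input_text = "Patient took 400 mg PO Ibuprofen q4h for two days."

# (extraction_class, extraction_text, start_pos, end_pos), copied by hand from
# the expected output of the basic NER example; not produced by running a model.
reported = [
    ("dosage", "400 mg", 13, 19),
    ("route", "PO", 20, 22),
    ("medication", "Ibuprofen", 23, 32),
    ("frequency", "q4h", 33, 36),
    ("duration", "for two days", 37, 49),
]


def span_matches(text, extraction_text, start, end):
    """Return True when text[start:end] equals the extracted string.

    Assumption: the interval is half-open (start inclusive, end exclusive).
    """
    return text[start:end] == extraction_text


def to_record(entities):
    """Collect (class, text, start, end) tuples into one dict keyed by class."""
    return {
        cls: {"text": text, "span": (start, end)}
        for cls, text, start, end in entities
    }


# Every reported span should slice back to exactly the extracted text.
all_ok = all(
    span_matches(input_text, text, start, end)
    for _, text, start, end in reported
)
record = to_record(reported)

print("all spans match:", all_ok)
print(record["medication"])
```

With real results, the same check could be run over `result.extractions` by reading `extraction_text` and `char_interval` from each entity, skipping entities whose `char_interval` is missing, as the display loops in the examples already do.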