Openize.MarkItDown for Python
Convert Office Documents and PDFs into Markdown
Convert Office documents and PDFs into clean, organized Markdown format for improved flexibility and workflow.
Openize.MarkItDown for Python is a powerful and versatile Open-Source Python package for converting documents into Markdown format. It supports a wide range of file formats, including Word, PDF, Excel, and PowerPoint, offering flexible output customization and seamless integration with LLMs for advanced processing.
The installation of this lightweight package is simple, ensuring a smooth user experience. Openize.MarkItDown is designed for scalability, leveraging the Factory & Strategy Pattern for efficient document conversion. It supports both Windows and Linux-compatible paths, making it highly adaptable across different environments.
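To illustrate what a Factory and Strategy design can look like for extension-based conversion, here is a minimal sketch. Every class and function name in it (`ConverterStrategy`, `DocxStrategy`, `PdfStrategy`, `ConverterFactory`, `to_markdown`) is hypothetical and chosen for illustration only; none of them is taken from the package's source, and the converters return placeholder text rather than real output.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ConverterStrategy(ABC):
    """Hypothetical strategy interface: one subclass per input format."""

    @abstractmethod
    def convert(self, path: Path) -> str:
        """Return Markdown text for the given file."""


class DocxStrategy(ConverterStrategy):
    def convert(self, path: Path) -> str:
        # Placeholder: a real strategy would parse the document here.
        return f"# {path.stem}\n\n(converted from Word)"


class PdfStrategy(ConverterStrategy):
    def convert(self, path: Path) -> str:
        # Placeholder: a real strategy would extract the PDF text here.
        return f"# {path.stem}\n\n(converted from PDF)"


class ConverterFactory:
    """Hypothetical factory that picks a strategy from the file extension."""

    _registry = {".docx": DocxStrategy, ".pdf": PdfStrategy}

    @classmethod
    def create(cls, path: Path) -> ConverterStrategy:
        suffix = path.suffix.lower()
        try:
            return cls._registry[suffix]()
        except KeyError:
            raise ValueError(f"Unsupported file type: {suffix!r}") from None


def to_markdown(filename: str) -> str:
    """Look up the right strategy for the file and run it."""
    path = Path(filename)
    return ConverterFactory.create(path).convert(path)
```

In a design like this, supporting a new format means adding one strategy class and one registry entry, while the calling code stays unchanged.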
With a user-friendly command-line interface, this Open-Source Python library allows effortless conversion of .docx, .pdf, .xlsx, and .pptx files into Markdown. Users can save Markdown files locally or send them to an LLM for further processing, enabling enhanced automation and workflow efficiency.
Explore our GitHub repository to contribute, suggest improvements, and enhance this Open-Source SDK: https://github.com/openize-com/openize-markitdown-python
Getting Started with Openize.MarkItDown for Python
The recommended way to install Openize.MarkItDown for Python is with pip. Use the following command to install it.
Install Openize.MarkItDown for Python via Pip
pip install openize-markitdown-python
You can also download it directly from GitHub.
Convert a Word Document to MD using Command Line Interface
Use the following commands to convert a Word document (.docx) into a Markdown (.md) file.
Convert a Word document to Markdown via CLI
# Convert a file and save locally
markitdown document.docx -o output_folder
# Process with an LLM (requires OPENAI_API_KEY environment variable)
markitdown document.docx -o output_folder --insert_into_llm
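When several files need converting, the same commands can be driven from a short script. The sketch below only builds the argument lists, using the `-o` and `--insert_into_llm` flags shown above; the helper name `build_commands` is hypothetical, and the sketch assumes the `markitdown` executable is on the PATH.

```python
import shlex


def build_commands(files, output_dir, insert_into_llm=False):
    """Build one `markitdown` argument list per input file."""
    commands = []
    for name in files:
        cmd = ["markitdown", str(name), "-o", str(output_dir)]
        if insert_into_llm:
            # Per the CLI notes above, this needs OPENAI_API_KEY to be set.
            cmd.append("--insert_into_llm")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["document.docx", "report.pdf"], "output_folder"):
        # Print the command as it would be typed in a shell.
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to run the conversion; keeping the arguments as a list avoids quoting problems with paths that contain spaces.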
Convert a PDF document to Markdown via Python API
The following code snippet converts a PDF file to Markdown.
Convert PDF to MD via Python Package
from openize.markitdown.core import MarkItDown
# Define input file and output directory
input_file = "report.pdf"
output_dir = "output_markdown"
# Create MarkItDown instance
converter = MarkItDown(output_dir)
# Convert document and save locally
converter.convert_document(input_file, insert_into_llm=False)
print("Conversion completed and saved locally.")
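The same two calls extend naturally to a whole folder. The sketch below is one possible approach, not part of the package: `find_supported` and `convert_folder` are hypothetical helper names, the extension list comes from the formats named on this page, and the code assumes that a failed conversion raises an exception.

```python
from pathlib import Path

# Formats listed on this page as supported.
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".xlsx", ".pptx"}


def find_supported(folder):
    """Return the supported documents in a folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def convert_folder(folder, output_dir):
    """Convert every supported document in `folder`; return the failures."""
    # Imported here so find_supported works without the package installed.
    from openize.markitdown.core import MarkItDown

    converter = MarkItDown(output_dir)
    failed = []
    for path in find_supported(folder):
        try:
            converter.convert_document(str(path), insert_into_llm=False)
        except Exception as exc:  # assumption: errors surface as exceptions
            failed.append((path.name, str(exc)))
    return failed
```

Because paths are handled with `pathlib`, the same script runs unchanged with Windows or Linux style folders.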
Convert a PDF document to Markdown via Python API and Send the Output to an LLM
The following code snippet converts a PDF file to Markdown and sends the output to an LLM.
Convert PDF to MD via Python Package and Send to LLM
from openize.markitdown.core import MarkItDown
# Define input file and output directory
input_file = "report.pdf"
output_dir = "output_markdown"
# Create MarkItDown instance
converter = MarkItDown(output_dir)
# Convert document and send output to LLM
converter.convert_document(input_file, insert_into_llm=True)
print("Conversion completed and data sent to LLM.")
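Since the CLI needs the OPENAI_API_KEY environment variable for LLM processing, it can help to check for the key before requesting the LLM step from Python. The sketch below assumes the Python API reads the same variable, which this page does not state explicitly; `llm_ready` and `convert_with_optional_llm` are hypothetical helper names.

```python
import os


def llm_ready(env=None):
    """Return True when a non-empty OPENAI_API_KEY is present."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def convert_with_optional_llm(input_file, output_dir, env=None):
    """Convert a file, sending it to the LLM only if a key is configured."""
    # Imported here so llm_ready can be used without the package installed.
    from openize.markitdown.core import MarkItDown

    use_llm = llm_ready(env)
    converter = MarkItDown(output_dir)
    converter.convert_document(input_file, insert_into_llm=use_llm)
    return use_llm
```

With this approach a missing key results in a local-only conversion instead of a failure partway through the LLM step.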
More Code Examples and Resources
Explore more detailed code examples at Openize Gists.