feat: adds markdown as an output for the URL component by philnash · Pull Request #11312 · langflow-ai/langflow
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/lfx/pyproject.toml`:
- Line 46: Update the markitdown dependency constraint in pyproject.toml:
replace the existing "markitdown>=0.1.4,<2.0.0" entry with
"markitdown>=0.1.5b1,<2.0.0" to avoid known security issues, then regenerate
your lockfile or run your package manager (poetry/pip-compile/etc.) and install
to ensure the updated version is used in CI and local environments.
🧹 Nitpick comments (3)
src/lfx/src/lfx/components/data_source/url.py (1)
190-206: Consider adding error handling for markdown conversion failures.
The `_markdown_extractor` method could fail if the HTML content is malformed or if encoding issues occur during `BytesIO` conversion. While `MarkItDown` might handle some edge cases, explicit error handling would improve robustness.
♻️ Suggested improvement with error handling
 @staticmethod
 def _markdown_extractor(x: str) -> str:
     """Convert HTML to Markdown format."""
-    stream = io.BytesIO(x.encode("utf-8"))
-    result = MarkItDown(enable_plugins=False).convert_stream(stream)
-    return result.markdown
+    try:
+        stream = io.BytesIO(x.encode("utf-8"))
+        result = MarkItDown(enable_plugins=False).convert_stream(stream)
+        return result.markdown
+    except Exception as e:
+        logger.warning(f"Failed to convert HTML to Markdown, falling back to text: {e}")
+        return BeautifulSoup(x, "lxml").get_text()
src/backend/tests/unit/components/data_source/test_url_component.py (2)
148-168: Docstring mentions "different format options" but only tests HTML.
The docstring says "Test URLComponent with different format options", but the test only covers the HTML format. Consider updating the docstring to be more specific.
✏️ Suggested docstring fix
 def test_url_component_html_format(self, mock_recursive_loader):
-    """Test URLComponent with different format options."""
+    """Test URLComponent with HTML format."""
     component = URLComponent()
170-191: Tests don't verify the actual extractor logic.
The test mocks `RecursiveUrlLoader.load`, which returns pre-converted content. This means the `_markdown_extractor` method (and the other extractors) is never actually invoked during the test. The extractor is passed to the loader, but since the loader is mocked, the conversion logic isn't tested.
Consider adding unit tests that directly test the extractor methods to ensure the conversion logic works correctly.
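To illustrate why patching the loader leaves the extractor untested, here is a minimal, self-contained sketch. `FakeLoader`, `fetch_documents`, and `upper_extractor` are hypothetical stand-ins, not Langflow or LangChain code. The only assumption carried over from the comment above is that the extractor is handed to a loader whose `load` method is then patched.

```python
from unittest.mock import patch


class FakeLoader:
    """Hypothetical stand-in for a loader that applies an extractor to raw HTML."""

    def __init__(self, raw_html: str, extractor):
        self.raw_html = raw_html
        self.extractor = extractor

    def load(self) -> list[str]:
        # The conversion happens here, inside load().
        return [self.extractor(self.raw_html)]


def upper_extractor(html: str) -> str:
    """Hypothetical extractor; counts its calls so we can see whether it ran."""
    upper_extractor.calls += 1
    return html.upper()


upper_extractor.calls = 0


def fetch_documents(raw_html: str) -> list[str]:
    """Hypothetical component code: builds the loader and calls load()."""
    return FakeLoader(raw_html, upper_extractor).load()


# Patching load() replaces the only place the extractor is invoked,
# so the extractor never runs and the caller sees canned content.
with patch.object(FakeLoader, "load", return_value=["canned"]):
    mocked_result = fetch_documents("<p>hi</p>")
calls_while_mocked = upper_extractor.calls

# Calling the extractor directly exercises the conversion logic itself.
direct_result = upper_extractor("<p>hi</p>")
calls_after_direct = upper_extractor.calls
```

In this sketch the call counter stays at zero while `load` is patched, which is the gap the comment describes; only the direct call touches the conversion code.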
✏️ Suggested addition: Direct extractor tests
def test_markdown_extractor_converts_html(self):
    """Test that _markdown_extractor correctly converts HTML to Markdown."""
    html = "<h1>Title</h1><p>Paragraph</p>"
    result = URLComponent._markdown_extractor(html)
    assert "Title" in result
    assert "Paragraph" in result

def test_text_extractor_strips_html(self):
    """Test that _text_extractor removes HTML tags."""
    html = "<h1>Title</h1><p>Paragraph</p>"
    result = URLComponent._text_extractor(html)
    assert "<" not in result
    assert "Title" in result

def test_html_extractor_returns_unchanged(self):
    """Test that _html_extractor returns content unchanged."""
    html = "Content"
    result = URLComponent._html_extractor(html)
    assert result == html
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 4a673cf and 10bcfeb.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
src/backend/tests/unit/components/data_source/test_url_component.py
src/lfx/pyproject.toml
src/lfx/src/lfx/components/data_source/url.py
🧰 Additional context used
📓 Path-based instructions (4)
src/backend/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/**/*.py: Use FastAPI async patterns with `await` for async operations in component execution methods
Use `asyncio.create_task()` for background tasks and implement proper cleanup with try/except for `asyncio.CancelledError`
Use `queue.put_nowait()` for non-blocking queue operations and `asyncio.wait_for()` with timeouts for controlled get operations
Files:
src/backend/tests/unit/components/data_source/test_url_component.py
src/backend/**/*component*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In Python component classes, set the `icon` attribute to a string matching the desired icon name (e.g., `icon = "AstraDB"`). The string must match the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/data_source/test_url_component.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Place backend unit tests in `src/backend/tests/` directory, component tests in `src/backend/tests/unit/components/` organized by component subdirectory, and integration tests accessible via `make integration_tests`
Use same filename as component with appropriate test prefix/suffix (e.g., `my_component.py` → `test_my_component.py`)
Use the `client` fixture (FastAPI Test Client) defined in `src/backend/tests/conftest.py` for API tests; it provides an async `httpx.AsyncClient` with automatic in-memory SQLite database and mocked environment variables. Skip client creation by marking test with `@pytest.mark.noclient`
Inherit from the correct `ComponentTestBase` family class located in `src/backend/tests/base.py` based on API access needs: `ComponentTestBase` (no API), `ComponentTestBaseWithClient` (needs API), or `ComponentTestBaseWithoutClient` (pure logic). Provide three required fixtures: `component_class`, `default_kwargs`, and `file_names_mapping`
Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes
Test both sync and async code paths, mock external dependencies appropriately, test error handling and edge cases, validate input/output behavior, and test component initialization and configuration
Use `@pytest.mark.asyncio` decorator for async component tests and ensure async methods are properly awaited
Test background tasks using `asyncio.create_task()` and verify completion with `asyncio.wait_for()` with appropriate timeout constraints
Test queue operations using non-blocking `queue.put_nowait()` and `asyncio.wait_for(queue.get(), timeout=...)` to verify queue processing without blocking
Use `@pytest.mark.no_blockbuster` marker to skip the blockbuster plugin in specific tests
For database tests that may fail in batch runs, run them sequentially using `uv run pytest src/backend/tests/unit/test_database.py` r...
Files:
src/backend/tests/unit/components/data_source/test_url_component.py
**/test_*.py
📄 CodeRabbit inference engine (Custom checks)
**/test_*.py: Review test files for excessive use of mocks that may indicate poor test design - check if tests have too many mock objects that obscure what's actually being tested
Warn when mocks are used instead of testing real behavior and interactions, and suggest using real objects or test doubles when mocks become excessive
Ensure mocks are used appropriately for external dependencies only, not for core logic
Backend test files should follow the naming convention test_*.py with proper pytest structure
Test files should have descriptive test function names that explain what is being tested
Tests should be organized logically with proper setup and teardown
Consider including edge cases and error conditions for comprehensive test coverage
Verify tests cover both positive and negative scenarios where appropriate
For async functions in backend tests, ensure proper async testing patterns are used with pytest
For API endpoints, verify both success and error response testing
Files:
src/backend/tests/unit/components/data_source/test_url_component.py
🧠 Learnings (9)
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Inherit from the correct `ComponentTestBase` family class located in `src/backend/tests/base.py` based on API access needs: `ComponentTestBase` (no API), `ComponentTestBaseWithClient` (needs API), or `ComponentTestBaseWithoutClient` (pure logic). Provide three required fixtures: `component_class`, `default_kwargs`, and `file_names_mapping`
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use same filename as component with appropriate test prefix/suffix (e.g., `my_component.py` → `test_my_component.py`)
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component versioning and backward compatibility using `file_names_mapping` fixture with `VersionComponentMapping` objects mapping component files across Langflow versions
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component build config updates by calling `to_frontend_node()` to get the node template, then calling `update_build_config()` to apply configuration changes
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Place backend unit tests in `src/backend/tests/` directory, component tests in `src/backend/tests/unit/components/` organized by component subdirectory, and integration tests accessible via `make integration_tests`
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to tests/unit/components/**/*.py : Create unit tests in `src/backend/tests/unit/components/` mirroring the component directory structure, using `ComponentTestBaseWithClient` or `ComponentTestBaseWithoutClient` base classes
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-08-05T22:51:27.961Z
Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-08-05T22:51:27.961Z
Learning: The TestComposioComponentAuth test in src/backend/tests/unit/components/bundles/composio/test_base_composio.py demonstrates proper integration testing patterns for external API components, including real API calls with mocking for OAuth completion, comprehensive resource cleanup, and proper environment variable handling with pytest.skip() fallbacks.
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
📚 Learning: 2025-12-19T18:04:08.938Z
Learnt from: Jkavia
Repo: langflow-ai/langflow PR: 11111
File: src/backend/tests/unit/api/v2/test_workflow.py:10-11
Timestamp: 2025-12-19T18:04:08.938Z
Learning: In the langflow-ai/langflow repository, pytest-asyncio is configured with asyncio_mode = 'auto' in pyproject.toml. This means you do not need to decorate test functions or classes with pytest.mark.asyncio; async tests are auto-detected and run by pytest-asyncio. When reviewing tests, ensure they rely on this configuration (i.e., avoid unnecessary pytest.mark.asyncio decorators) and that tests living under any tests/ path (e.g., src/.../tests/**/*.py) follow this convention. If a test explicitly requires a different asyncio policy, document it and adjust the config accordingly.
Applied to files:
src/backend/tests/unit/components/data_source/test_url_component.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Lint Backend / Run Mypy (3.11)
- GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Test Docker Images / Test docker images
- GitHub Check: Test Starter Templates
- GitHub Check: Update Starter Projects
- GitHub Check: Update Component Index
🔇 Additional comments (5)
src/lfx/src/lfx/components/data_source/url.py (3)
2-8: LGTM - Imports are appropriate for the new functionality.
The `io` module and `MarkItDown` imports are correctly added to support the new markdown extraction feature.
112-117: LGTM - Clear documentation of the new format option.
The info text clearly explains the three available output formats and their purposes.
251-256: LGTM - Clean extractor selection pattern.
The dict-based extractor mapping is a clean approach that's easy to extend. The fallback to `_text_extractor` provides safe behavior for unexpected format values.
src/backend/tests/unit/components/data_source/test_url_component.py (2)
10-35: LGTM - Test class follows the component testing guidelines.
The test class correctly inherits from `ComponentTestBaseWithoutClient` and provides all three required fixtures: `component_class`, `default_kwargs`, and `file_names_mapping`. Based on learnings, this follows the established testing patterns.
127-146: LGTM - Text format test covers the basic scenario.
The test verifies that the text format option works correctly with the expected content type.
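For readers unfamiliar with the pattern praised in the url.py comment on lines 251-256, a dict-based extractor lookup with a text fallback might look roughly like the sketch below. This is not the PR's code: the format keys, the function names, and the stdlib `HTMLParser` text extractor are assumptions chosen to keep the example self-contained.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and discards tags (stdlib stand-in for HTML-to-text)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def text_extractor(html: str) -> str:
    """Hypothetical text extractor: strips tags and keeps the text content."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def html_extractor(html: str) -> str:
    """Hypothetical HTML extractor: returns the content unchanged."""
    return html


# Hypothetical format names; adding a format means adding one entry here.
EXTRACTORS = {"Text": text_extractor, "HTML": html_extractor}


def select_extractor(output_format: str):
    # Unknown format values fall back to the text extractor.
    return EXTRACTORS.get(output_format, text_extractor)
```

The lookup keeps selection in one place, and the `.get` default means an unexpected format value degrades to plain text rather than raising a `KeyError`.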
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.